

Intelligent Incident Management and AI Operational Support for Modern Systems
Devz
ROLES
Product Design, UX / UI Design
IMPACTS
30+ Customers
BACKGROUND
As modern systems become increasingly complex, engineering teams face growing challenges in managing and resolving operational incidents quickly and efficiently.
Traditional incident management workflows often rely on manual triaging, fragmented communication, and multiple monitoring tools. As a result, teams spend significant time identifying root causes, coordinating responses, and documenting incidents, which can slow recovery and increase operational costs.
Sentry was designed to streamline incident management through an AI-powered AIOps platform. With an intelligent AI agent acting as a custom incident engineer, the platform can automatically triage alerts, assist with remediation actions, and support root cause analysis, enabling teams to respond faster and improve overall operational efficiency.
PROCESS OVERVIEW

PROBLEM
As infrastructure systems become more complex, engineering teams face pressure to detect and resolve incidents quickly while maintaining service reliability.
However, many existing incident management tools still fall short in several key areas:
Manual Incident Triage
Engineers must manually review alerts, which is time-consuming and risks delayed responses.
Fragmented Workflows
Switching between monitoring, messaging, and documentation tools slows coordination and clarity.
Remediation Gaps
Tools detect incidents but rarely guide corrective actions, prolonging resolution and workload.
Lack of Context
Teams struggle to gather necessary insights for root cause analysis, making troubleshooting inefficient.
Reactive Management
Solutions focus on responding to alerts rather than anticipating issues, limiting proactive prevention.
Excessive Alerts
Teams are overwhelmed by high volumes of notifications, making it hard to focus on critical issues.
RESEARCH
Goals
Enhance incident response with a user-centered design that leverages Sentry’s AI engineer, Devi, to streamline triage, remediation, and root cause analysis.
The design focuses on creating a clear, intuitive interface that empowers teams to respond to alerts efficiently while minimizing manual effort. By integrating Devi, Sentry’s AI agent, the platform intelligently guides users through incident resolution, provides actionable insights, and centralizes contextual information for root cause analysis. The experience aims to strengthen system reliability, reduce operational overhead, and deliver measurable improvements in overall incident management efficiency.
Competitive Analysis
We conducted a competitive analysis comparing leading incident management and AIOps platforms with our own product, Sentry. The analysis reviewed key capabilities such as alerting, automation, remediation support, and root cause analysis to understand how each platform supports incident response workflows. The findings revealed gaps, strengths, and opportunities that informed design decisions and improvements for Sentry’s AI-driven incident management.


PERSONA
IDEATION
Journey Map
The Sentry user journey illustrates the end-to-end workflow of incident management, from receiving alerts to preventing future issues. It highlights common pain points, user actions, and how Sentry’s AI-driven features streamline triage, remediation, root cause analysis, and reporting—turning a traditionally reactive process into a more proactive, efficient experience.

Site Map
The Sentry site map illustrates the platform’s hierarchical structure, showing how users navigate between core sections such as account management, support plans, incidents, the NOC hub, AI assistant, and calendar. It highlights the organization of dashboards, analytics, team configurations, and AI-driven features, ensuring a clear and intuitive flow for monitoring, responding to, and preventing incidents efficiently.

Lo-Fi Wireframes
After defining the main user tasks and flows, we created a first set of lo-fi wireframes and ran preliminary tests with actual users to gather initial feedback.


DESIGN & PROTOTYPE
Design Iterations
During the interaction design phase, we encountered several challenges. To address them, we conducted multiple moderated user testing sessions and went through several rounds of iteration based on the feedback gathered. Insights from focus groups and stakeholder meetings helped us better understand user needs and refine our designs accordingly. Below are some of the key design improvements we made.

Could Be Improved
Limited operational visibility made it difficult for users to quickly assess system health or ongoing incidents
Incidents were not prioritized
Relationships between services and incidents were not shown
The information layout was static and AI capabilities were underutilized

New Design Based On Feedback
A node-based service map provides a clearer view of system health and relationships between components
The My Incidents panel now surfaces alerts directly on the homepage for faster access and response
Key metrics such as Incidents, AI Activity, and Prevention Insights are grouped together to provide a quick operational overview
Integrated AI insights make the homepage experience more actionable
Hi-Fidelity Prototypes
Homepage
A visual monitoring dashboard designed to help teams quickly understand system health and track incidents within a single operational view.
The homepage provides an interactive overview of service activity and system status through a visual network map and real-time incident panels. Users can quickly monitor ongoing incidents, review alerts, and navigate related services without leaving the dashboard. The panels can slide within the interface for quick access while maintaining visibility of the system map, and the listings can also be expanded into a full-screen view for more detailed investigation. This flexible layout improves situational awareness for operations teams, helping them identify issues faster and manage incidents more efficiently.

Support Plan, Service & Incident
Centralize Your Support, Track Issues with Confidence
The dashboard offers customizable widgets that let users tailor their workspace to their roles and needs, so team members at all levels can access the information most relevant to them.
My Support Plan
The My Support Plan interface provides a clear overview of active support plans while offering insight into overall system health. Users can explore available services, monitor ongoing issues, and toggle between a list view highlighting key service metrics and a service map view that visualizes system relationships and dependencies.

Services
The Services interface helps users monitor the overall health and performance of their services. The Service Management tab displays key metrics, including total incidents and counts by status (Open, Acknowledged, Blocked, Escalated, and Resolved), enabling teams to quickly assess priorities and respond efficiently. Users can also switch to the Service Maps tab to visualize system relationships and dependencies, or to the Link Management tab to manage connections between services, providing a comprehensive and organized view of service operations.
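As a rough illustration of the status metrics described above, the sketch below tallies incidents by status. It is not Sentry's implementation; the `Status` enum and the `status_summary` function are hypothetical names introduced for this example, and only the five status labels come from the text.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    # The five status labels named in the Service Management tab.
    OPEN = "Open"
    ACKNOWLEDGED = "Acknowledged"
    BLOCKED = "Blocked"
    ESCALATED = "Escalated"
    RESOLVED = "Resolved"


def status_summary(statuses: list[Status]) -> dict[str, int]:
    """Count incidents per status and include the overall total.

    Statuses with no incidents are reported as 0 so that every
    tile on a dashboard would always have a value to display.
    """
    counts = Counter(statuses)
    summary = {status.value: counts.get(status, 0) for status in Status}
    summary["Total"] = len(statuses)
    return summary


# Hypothetical sample: three incidents for one service.
sample = [Status.OPEN, Status.OPEN, Status.RESOLVED]
summary = status_summary(sample)
```

Reporting zero counts explicitly, rather than omitting empty statuses, keeps the shape of the summary stable regardless of the data.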

Incidents
The Incident Listing View provides a clear and organized overview of all incidents, allowing users to quickly scan and manage ongoing issues. Each incident is displayed with key details such as status, priority, affected services, and timestamps, and users can customize which details appear to suit their needs, making it easy to identify critical problems at a glance. Users can sort, filter, and search incidents to focus on the most urgent items, enabling faster response times and better tracking of system reliability across all services.

Incident Details
See the Full Picture, Resolve Incidents Faster
The Incident Details page provides a comprehensive view of each incident, bringing together all the information needed to investigate and resolve issues efficiently. Users can review and edit key incident details along with related sections.
Add Incident

Incident Details

Recommended Solutions
The Recommended Solutions section helps teams quickly identify the best course of action during an incident. Sentry automatically recommends solutions by analyzing incident details, solution metadata, and past applications to similar incidents, and provides a clear rationale for each recommendation. Devi, Sentry’s AI assistant, can then apply the recommended solution automatically. If the resolution fails or key information is missing, the incident is escalated to the human incident team, with the areas needing attention highlighted. By combining AI-driven automation with human validation, teams can resolve issues faster and focus on more complex work.
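The recommend, apply, and escalate flow described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Sentry or Devi actually work: the `Incident` and `Solution` records, the tag-overlap scoring, the equal weighting, and the `apply` callback are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Solution:
    # Hypothetical solution record; all fields are illustrative assumptions.
    name: str
    tags: set[str]
    past_successes: int = 0
    past_attempts: int = 0


@dataclass
class Incident:
    # Hypothetical incident record.
    title: str
    tags: set[str] = field(default_factory=set)
    service: Optional[str] = None


def score(incident: Incident, solution: Solution) -> float:
    """Blend tag similarity (Jaccard index) with the historical success rate."""
    union = incident.tags | solution.tags
    overlap = len(incident.tags & solution.tags) / len(union) if union else 0.0
    rate = (solution.past_successes / solution.past_attempts
            if solution.past_attempts else 0.0)
    return 0.5 * overlap + 0.5 * rate  # equal weights are an arbitrary choice


def handle(incident: Incident, solutions: list[Solution],
           apply: Callable[[Solution], bool]) -> str:
    """Recommend the best-scoring solution, try it, and escalate on failure."""
    # Escalate straight away when key information is missing.
    if not incident.tags or incident.service is None:
        return "escalated: missing information"
    if not solutions:
        return "escalated: no candidate solution"
    best = max(solutions, key=lambda s: score(incident, s))
    # `apply` stands in for the automated remediation step.
    if apply(best):
        return f"resolved by {best.name}"
    return f"escalated: {best.name} failed"


# Hypothetical sample data.
incident = Incident("db latency", {"database", "latency"}, "orders-db")
solutions = [
    Solution("restart-pool", {"database", "latency"}, 8, 10),
    Solution("clear-cache", {"cache"}, 9, 10),
]
```

Passing the remediation step in as a callback keeps the decision logic testable without touching real infrastructure, and every escalation path returns a reason that a human responder could act on.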

Analytics
Turning Complex Metrics Into Clear Actionable Insights
The Analytics section turns complex incident data into actionable insights. Users can track volume, unresolved issues, MTTA/MTTR, alerting apps, and other key metrics through customizable graphs. This flexible view helps teams spot patterns, optimize workflows, and make faster, informed decisions.
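For readers unfamiliar with the metrics above, MTTA (mean time to acknowledge) and MTTR (commonly read as mean time to resolve) can be derived from incident timestamps. The sketch below is a minimal example and not Sentry's implementation; the `Incident` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative assumptions.
    opened_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None


def mean_delta(deltas: list[timedelta]) -> Optional[timedelta]:
    """Average a list of durations; None when there is nothing to average."""
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)


def mtta(incidents: list[Incident]) -> Optional[timedelta]:
    """Mean time from open to acknowledgement, over acknowledged incidents."""
    return mean_delta(
        [i.acknowledged_at - i.opened_at for i in incidents if i.acknowledged_at]
    )


def mttr(incidents: list[Incident]) -> Optional[timedelta]:
    """Mean time from open to resolution, over resolved incidents."""
    return mean_delta(
        [i.resolved_at - i.opened_at for i in incidents if i.resolved_at]
    )


# Hypothetical sample: two completed incidents and one still open.
t0 = datetime(2024, 1, 1, 12, 0)
sample_incidents = [
    Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=60)),
    Incident(t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=120)),
    Incident(t0),  # not yet acknowledged or resolved, so excluded from both means
]
```

In this sketch, incidents that have not yet been acknowledged or resolved are left out of the respective average; a real dashboard might instead track them separately as unresolved issues.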


Solutions
Actionable Intelligence, Instantly
The solution empowers teams to monitor services, track incidents, and resolve issues faster. With real-time insights, clear metrics, and streamlined workflows, it transforms complex operational data into actionable intelligence. Past incident solutions can be reused for similar future issues, helping teams respond faster and prevent repeats.
Create Solution

Solution Details

Incident Retro
Smarter, Clearer Insights After Every Incident
The Incident Retro feature provides a structured view of past incidents, allowing teams to review timelines, analyze root causes, and evaluate the response process. Powered by AI, Sentry automatically compiles incident data, communication history, and remediation actions into a clear summary, helping teams identify patterns, improve workflows, and prevent similar issues while keeping a record for reporting and improvement.
Create Retro

Retro Details

On-Call Schedule
Full Visibility Into Current and Upcoming On-Call Coverage
The On-Call Schedules section provides a clear view of current and upcoming on-call responsibilities, along with a calendar view of the project’s schedule. Users can quickly see who is on duty, when shifts change, and plan accordingly. By centralizing this information, teams can reduce confusion, ensure coverage, and respond to incidents more efficiently.

View & Edit Schedules

Team Updates
Stay Connected & Informed, Updates in One Place
The Team Updates section brings all team communications into one place, including Announcements, Jira Updates, Knowledge Sharing, Attention Requests, and Questions. By centralizing these updates, teams can quickly catch up on what matters, share insights, and respond efficiently. This streamlined approach keeps everyone informed, improves collaboration, and ensures important information is always visible.

NOC Hub
Centralized Visibility for Smooth & Efficient Operations
The NOC Hub gives teams a centralized view of system health, combining incidents, service statuses, alerts, and updates in one place. Users can monitor multiple services, prioritize critical issues, and coordinate responses efficiently. By keeping key information accessible, the NOC Hub helps teams respond faster and maintain smooth operations.
Ongoing INC

INC Details

Devi AI Assistant
AI-Powered Engineer for Teams, Reports, and Incidents
The Devi AI Assistant acts as a smart guide for operations teams, streamlining onboarding, triaging incidents, generating reports, and answering questions in real time. By providing instant guidance and structured support, Devi helps new team members get up to speed quickly, ensures incidents are prioritized effectively, and simplifies access to key information. This AI-powered assistant reduces manual effort, accelerates response times, and keeps teams informed and efficient.

Mobile App
Incident Management, Anytime, Anywhere
The Mobile App extends Sentry’s capabilities to teams on the go, providing real-time access to incidents, service statuses, and team updates directly from a smartphone. Users can view and acknowledge incidents, track on-call schedules, and receive notifications instantly, ensuring critical issues are never missed. Designed for quick interactions and easy navigation, the app empowers teams to stay connected, respond faster, and maintain operational efficiency no matter where they are.