As incidents continue to occur, teams generally respond by tracking more metrics. Two common ones are “Mean Time To Detect” (MTTD), the time between when an issue begins and when an alert fires, and “Mean Time To Mitigation” (MTTM), the time between that first alert and when you’ve contained the user impact. These metrics help you evaluate how effectively you respond to incidents, but they often fail to show where you should be improving. The answer is to extend your incident response program to also include incident analysis: a meeting where the group reviews related incidents together to identify improvements.
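To make the two definitions concrete, here is a minimal sketch that computes MTTD and MTTM from a list of incident records. The `Incident` class and its field names (`started`, `alerted`, `mitigated`) are hypothetical and are not taken from any specific incident tool. The sketch shows what these averages capture: how fast you detect and contain problems. It does not tell you which kinds of incidents keep recurring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative assumptions.
    started: datetime    # when the issue began
    alerted: datetime    # when the first alert fired
    mitigated: datetime  # when user impact was contained


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean Time To Detect: average of (alerted - started)."""
    return timedelta(seconds=mean((i.alerted - i.started).total_seconds() for i in incidents))


def mttm(incidents: list[Incident]) -> timedelta:
    """Mean Time To Mitigation: average of (mitigated - alerted)."""
    return timedelta(seconds=mean((i.mitigated - i.alerted).total_seconds() for i in incidents))


# Two illustrative incidents (made-up timestamps, not real data).
incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5), datetime(2024, 1, 1, 10, 35)),
    Incident(datetime(2024, 1, 2, 12, 0), datetime(2024, 1, 2, 12, 15), datetime(2024, 1, 2, 12, 25)),
]

print(mttd(incidents))  # 0:10:00
print(mttm(incidents))  # 0:20:00
```

Both results are single averages across every incident, which is why they say little about where to invest. Reviewing incidents in related groups is what surfaces that.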

Move past incident response to reliability