InfoQ AI/ML
Six-Day Outage: Technical and Human Lessons from a Crisis
Molly Struve shares insights from a six-day outage that nearly destroyed her company, detailing both technical and human factors critical to recovery. Key technical lessons include the importance of failure mode and effects analysis (FMEA), shadow traffic testing, and regularly exercising rollback procedures. Beyond technical measures, Struve emphasizes that psychological safety depends on involving stakeholders early and having leadership that actively defends teams during crises.