incident management
AWS introduces a new service to streamline security event response, providing automated triage, coordinated communication, and expert guidance to recover from cybersecurity threats.
Show a public status page for configured Nagios hosts and services - pgmac-net/nagios-public-status-page
(this is also posted on O’Reilly’s Radar blog. Much thanks to Daniel Schauenberg, Morgan Evans, and Steven Shorrock for feedback on this) Before I begin this post, let me say that this is intended to be a critique of the Five Whys method, not a criticism of the people who are in favor of using…
Read how Google is using System Theoretic Process Analysis (STPA) to analyze pure software systems and discover risks.
Cloudflare suffered a service outage on November 18, 2025. The outage was triggered by a bug in the generation logic for a Bot Management feature file, which affected many Cloudflare services.
Well, looks like we have a dumpster fire on DynamoDB in us-east-1 again.