Power Loss Siren: Making Facebook Resilient to Power Outages
Power outages cause the majority of unplanned server downtime in Facebook data centers. During a power outage, thousands of servers can go offline simultaneously for several hours, which can lead to service degradations. At Facebook, all data center racks are equipped with batteries that can provide backup power for a few minutes after power outages. Power Loss Siren (PLS) is a rack level, low latency, distributed power outage detection and alerting system. PLS leverages existing in-rack batteries to notify services about impending power outages and helps mitigate the impact of power outages on services. Typical mitigations include promoting remote database secondaries when primaries are experiencing power outages, routing requests away from hosts experiencing power outages, flushing memory to disk, etc. PLS also helps simplify physical infrastructure management by not requiring additional power source redundancy for critical services.