William Ting, "Self-Healing Systems: The Road to 99.99% Uptime", PyBay2016

Stop firefighting and start fireproofing! There are many tools that make on-call easier and increase availability, but we'll mostly focus on a few principles and design patterns that help make your systems more robust.

Abstract
Feature velocity is typically the higher priority early in a software project's lifecycle, but as the system matures, the effort shifts toward fireproofing it. On the Yelp Transactions Platform team we've used a combination of circuit breakers, queues, and idempotent operations to minimize downtime and middle-of-the-night pages.
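As a rough illustration of the first pattern, here is a minimal circuit breaker sketch. It is not the implementation from the talk; the class, its thresholds, and the injectable `clock` are hypothetical. After `failure_threshold` consecutive failures the breaker "opens" and fails fast instead of hammering a struggling dependency; after `reset_timeout` seconds it lets one trial call through ("half-open"), closing again on success and re-opening on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is short-circuited."""


class CircuitBreaker:
    """Hypothetical, single-threaded circuit breaker sketch.

    closed    -> calls pass through; consecutive failures are counted.
    open      -> calls fail fast with CircuitOpenError.
    half_open -> after reset_timeout, a trial call is allowed through.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # Trip after too many failures, or re-trip if a half-open trial failed.
        if self.failures >= self.failure_threshold or self.opened_at is not None:
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
```

Wrapping an outbound call as `breaker.call(lambda: client.fetch(...))` (with `client.fetch` standing in for any remote call) lets callers catch `CircuitOpenError` and degrade gracefully, for example by enqueueing the work for later. A production breaker would also need thread safety and would usually allow only one concurrent half-open trial.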

We'll look at how these design patterns help us in a distributed system, when they should be used, and the common pitfalls associated with them.
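Idempotency matters in a distributed system because a timed-out request may have succeeded, so the client has to retry without knowing whether the first attempt was applied. A common approach, sketched below under stated assumptions, is a client-supplied idempotency key: the server records the result per key and replays it on retries instead of repeating the side effect. `PaymentService` and `charge` are hypothetical names for illustration, not anything described in the talk.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaymentService:
    """Hypothetical in-memory service illustrating idempotency keys."""
    balances: Dict[str, int] = field(default_factory=dict)
    _results: Dict[str, str] = field(default_factory=dict)  # key -> charge id

    def charge(self, idempotency_key: str, account: str, amount: int) -> str:
        # Retry of a request we already processed: replay the stored result
        # without applying the side effect a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balances[account] = self.balances.get(account, 0) - amount
        charge_id = f"ch_{uuid.uuid4().hex[:8]}"
        self._results[idempotency_key] = charge_id
        return charge_id


svc = PaymentService()
key = str(uuid.uuid4())            # generated once by the client, reused on retry
first = svc.charge(key, "alice", 500)
retry = svc.charge(key, "alice", 500)  # e.g. after a network timeout
print(first == retry, svc.balances["alice"])
```

Running it prints `True -500`: the retry returns the same charge id and the account is debited only once. In a real system the balance update and the key-to-result record would have to be written atomically (for example in one database transaction), otherwise a crash between the two reintroduces double-processing; that is also what makes it safe to put such operations behind a retrying queue.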

Bio
William Ting is a longtime FOSS advocate with contributions to various projects (Pelican, autojump, pyramid_swagger, Rust, GNOME). He's currently an infrastructure engineer at Reddit and was previously on the Yelp Transaction Platform team.

https://speakerdeck.com/pybay2016/wil...
