Tutorial: Mastering Etcd Observability: A Comprehensive Guide to M... Bogdan Kanivets & Vivek Patani

Tutorial: Mastering Etcd Observability: A Comprehensive Guide to Metrics, Monitoring, and Incident Handling - Bogdan Kanivets & Vivek Patani, Apple

Etcd is a critical component of Kubernetes, and its codebase defines about 100 metrics. If you are on call, do you know which metrics to check and what they mean? When should you trigger an alert? This tutorial is a comprehensive guide to etcd observability, based on our production experience running large-scale etcd clusters. We begin by categorizing metrics according to the areas of etcd's code they come from: etcdserver, storage, peers, and maintenance, and we look at the key concepts using architecture diagrams. Next comes a hands-on session that explores common etcd operations and the metrics associated with them. Finally, we show what to expect during typical production incidents, such as disk or network issues and an overloaded etcd. We use the etcd benchmark tool to stress a local cluster and an etcd proxy to inject network failures. The audience will learn how to set up production-ready etcd monitoring and will leave with a deeper understanding of etcd internals and the logic behind its metrics.

The tutorial has only been tested on macOS and Linux environments. Link to the GitHub tutorial: https://github.com/lavacat/kubecon-et... Docker is required to follow along.
