The Site Reliability Workbook: Practical Ways to Implement SRE is the hands-on companion to the bestselling Site Reliability Engineering book. It uses concrete examples to show how to put SRE principles and practices to work, drawing on Google’s own experience and on case studies from Google Cloud Platform customers. Evernote, The Home Depot, The New York Times, and other companies describe hard-won lessons about what worked for them and what didn’t.
Table of Contents
- Chapter 1 – How SRE Relates to DevOps
- Chapter 2 – Implementing SLOs
- Chapter 3 – SLO Engineering Case Studies
- Chapter 4 – Monitoring
- Chapter 5 – Alerting on SLOs
- Chapter 6 – Eliminating Toil
- Chapter 7 – Simplicity
- Chapter 8 – On-Call
- Chapter 9 – Incident Response
- Chapter 10 – Postmortem Culture: Learning from Failure
- Chapter 11 – Managing Load
- Chapter 12 – Introducing Non-Abstract Large System Design
- Chapter 13 – Data Processing Pipelines
- Chapter 14 – Configuration Design and Best Practices
- Chapter 15 – Configuration Specifics
- Chapter 16 – Canarying Releases
- Chapter 17 – Identifying and Recovering from Overload
- Chapter 18 – SRE Engagement Model
- Chapter 19 – SRE: Reaching Beyond Your Walls
- Chapter 20 – SRE Team Lifecycles
- Chapter 21 – Organizational Change Management in SRE
- Conclusion
- Appendix A – Example SLO Document
- Appendix B – Example Error Budget Policy
- Appendix C – Results of Postmortem Analysis
Download Free PDF / Read Online
Author(s): Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, Stephen Thorne
Publisher: O’Reilly Media, Inc.
Published: July 2018
Format(s): Online
Number of pages: 512
Download / View Link(s): Read online