
OnlineProgrammingBooks.com

Legally Free Computer Books

Site Reliability Engineering

February 15, 2022

In Site Reliability Engineering: How Google Runs Production Systems, members of the SRE team explain how their engagement with the entire software lifecycle has enabled Google to build, deploy, monitor, and maintain some of the largest software systems in the world.

Table of Contents

  • Chapter 1 – Introduction
  • Chapter 2 – The Production Environment at Google, from the Viewpoint of an SRE
  • Chapter 3 – Embracing Risk
  • Chapter 4 – Service Level Objectives
  • Chapter 5 – Eliminating Toil
  • Chapter 6 – Monitoring Distributed Systems
  • Chapter 7 – The Evolution of Automation at Google
  • Chapter 8 – Release Engineering
  • Chapter 9 – Simplicity
  • Chapter 10 – Practical Alerting
  • Chapter 11 – Being On-Call
  • Chapter 12 – Effective Troubleshooting
  • Chapter 13 – Emergency Response
  • Chapter 14 – Managing Incidents
  • Chapter 15 – Postmortem Culture: Learning from Failure
  • Chapter 16 – Tracking Outages
  • Chapter 17 – Testing for Reliability
  • Chapter 18 – Software Engineering in SRE
  • Chapter 19 – Load Balancing at the Frontend
  • Chapter 20 – Load Balancing in the Datacenter
  • Chapter 21 – Handling Overload
  • Chapter 22 – Addressing Cascading Failures
  • Chapter 23 – Managing Critical State: Distributed Consensus for Reliability
  • Chapter 24 – Distributed Periodic Scheduling with Cron
  • Chapter 25 – Data Processing Pipelines
  • Chapter 26 – Data Integrity: What You Read Is What You Wrote
  • Chapter 27 – Reliable Product Launches at Scale
  • Chapter 28 – Accelerating SREs to On-Call and Beyond
  • Chapter 29 – Dealing with Interrupts
  • Chapter 30 – Embedding an SRE to Recover from Operational Overload
  • Chapter 31 – Communication and Collaboration in SRE
  • Chapter 32 – The Evolving SRE Engagement Model
  • Chapter 33 – Lessons Learned from Other Industries
  • Chapter 34 – Conclusion
  • Appendix A – Availability Table
  • Appendix B – A Collection of Best Practices for Production Services
  • Appendix C – Example Incident State Document
  • Appendix D – Example Postmortem
  • Appendix E – Launch Coordination Checklist
  • Appendix F – Example Production Meeting Minutes

Download Free PDF / Read Online

Author(s): Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy
Publisher: O’Reilly Media, Inc.
Published: April 2016
Format(s): Online
File size: –
Number of pages: 552
Download / View Link(s): Read online

Similar Books:

  1. 55 Ways to Have Fun With Google
  2. Free eBook: Google Apps: The Missing Manual
  3. Cutting Edge Robotics
  4. Guide to the Software Engineering Body of Knowledge
  5. Mastering Ethereum

Copyright © 2006–2023 OnlineProgrammingBooks.com