The Site Reliability Workbook: Practical Ways to Implement SRE

  • Create Date: 2021-06-21 09:55:48
  • Update Date: 2025-09-06
  • Author: Betsy Beyer
  • ISBN: 1492029505

Summary

In 2016, Google's Site Reliability Engineering book ignited an industry discussion on what it means to run production services today--and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.

This new workbook not only combines practical examples from Google's experiences, but also provides case studies from Google's Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn't.

Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.

You'll learn:


  • How to run reliable services in environments you don't completely control--like cloud
  • Practical applications of how to create, monitor, and run your services via Service Level Objectives
  • How to convert existing ops teams to SRE--including how to dig out of operational overload
  • Methods for starting SRE from either greenfield or brownfield

Reviews

August Schau

Great case studies and ideas for improving operations within an organization, small or large.

Andrew Hatch

Overall this is a good book and a worthy follow-up to the original. However, it is really long, and many of the case studies presented are very similar, so there are a lot of "groundhog day" moments. You will probably find you read for quite some time before identifying advice or paragraphs that are worth highlighting or note-taking. The sections on incident diagnosis require caution from the reader. There was a lot of advocacy for Root Cause Analysis, reductionism, and linear thinking - practices that will let you drift into failure in very complex and dynamic systems. More thought and understanding by the authors is needed before championing these processes.

Ines

This book is great at explaining by example, which makes it easier to absorb the information. I found some chapters "boring" because they focus on areas that are alien to me, but mostly I think this book works wonderfully for future consultation: when confused or in need of help, go directly to the chapter that explains what you need.

SolidM

A good book, but some chapters/topics are more interesting than others.

Gary Boland

Similar to the first book, it is a worthy read and covers a lot of the best practices (including extensive techniques for dealing with toil). It also comes over as product placement advertising (write your software in Go, deploy it to GCP). Despite that, it is worth your time if you are in the IT industry.

Nat

Written by a bunch of my coworkers, I really enjoyed this book. Arguably much more practical than the original SRE book. I find myself sending folks chapters from this book far more than the first.

Gui

Very interesting to see how some of the issues related to the SRE world are applied in practice. I feel I will have to go back to the first book to understand how to present simple, foundational concepts to the rest of the company in order to start a gradual process of adopting some of the practices.

Rastko Vukasinovic

The more of Google's technical writing in this area I read, the more redundancy I feel - the exceptional writing style and discussion-based approach cover for a lack of original thought compared to the first book. Still a recommended read in some of my coaching sessions: modern, high-quality approaches right from the source. Could live without it, though 😀

Ahmad hosseini

This is the second book that Google published about SRE. The first one explains the theories and principles of SRE, and this book shows you how to implement SRE at any company, from startup to giant. What is SRE? SRE is a job role, a set of practices we've found to work, and some beliefs that animate those practices. If you think of DevOps as a philosophy and an approach to working, you can argue that SRE implements some of the philosophy that DevOps describes. So, in a way, class SRE implements interface DevOps. The book provides good practices and has good case studies from Spotify, PagerDuty, Pokémon GO, etc. that can help you understand the practices. The book examines real incidents at Google and other big companies and how their SRE teams handled them. In the last part of the book, there is good advice on creating and managing SRE teams in every kind of company.

Miguel Alho

I've found a great set of ideas and practical examples to bring back into my work, even though I am not an SRE.

Mike Phung

Great book about site reliability. I have never read any book that covers this domain so well; clearly you can only write the best book about what you actually do, and Google is doing great. I most enjoyed the postmortem chapter, the guidance on how to build an SRE team, and how to design scalable systems. The chapter on configuration philosophy is also great, as is the one on the importance of reliability. The best part is about SLAs, SLOs, and SLIs, which deserve to be the main pillar of site reliability! Thank you very much!

Charlie Gorichanaz

This was actually pretty practical.

Ivan

The same SRE, but with practical examples from Google's practice. There is probably no point in reading it without the first book.

Steven

I stopped about three quarters through... I was being overloaded with SRE.

Yaroslav

Great workbook for SRE practices. I like the last few chapters, where the authors explain how change management might be done to implement SRE.

Giovani Facchini

This is a great book for those who want to learn how to build an SRE team and the main concepts that need to be taken into consideration for a successful team. On the positive front, it gives the reader a step-by-step approach on how to start: defining SLOs that determine when the team should take action, so it focuses only on what is important; having the team spend 50% to 70% of its time automating the resolution of problems and improving operations, with a smaller percentage of time spent on toil; and finally the blameless postmortem culture. Another good point is the focus on the human aspect of processes (culture, respect, challenge), since it is paramount to have a functioning, stable team. On the challenge front, it empowers team members to focus on the problem themselves and not just pass it on to someone else. On the respect front, the blameless postmortem tells us to focus on the technical problems and everything that happened for the situation (incident) to occur, and on how it could be avoided with changes in tools, automation, alerts, etc., taking the blame away from people for their mistakes. Mistakes will happen. A lot of technical detail is provided on how to solve specific problems, which is great for those who are starting out and do not have experience in an SRE team. It also focuses on many aspects of performance engineering, networks, parallelization, and distributed processing, and has great content for people interested in massive global distributed systems. The downside is the oversimplification of transitions and the marketing style of writing in some of the business cases. Since those cases came from companies trying to promote themselves, you may not find the real struggles and issues you will face in your team. At least some hardships are described and you are able to get a feel for the transition process, but at the end of the day, tacit knowledge plays a big part, and it is hard to find it in writing in a way that helps you understand the things that can happen (behavior, challenges, culture, issues, etc.). I strongly recommend this book for those who like operations, automation, and improving stability! Have fun reading.

Dimitris

Excellent read, with a lot of interesting ideas on how to change your organization into an SRE-oriented structure. There are a lot of methodologies and guides on how to achieve this, as well as real-world case studies - both of which make the book much more approachable than the first SRE book with its more theoretical approach. I found the 3rd part of the book ("Processes") the most useful one and I consider it reason enough to read this book. I wish I could have read the "Identifying and recovering from overload" chapter from it 3 or 4 years ago. I was very happy to see that this book puts a strong focus on the well-being of the people working in this field and makes it clear that burnout is an issue that is directly correlated to organizational issues.

Sebastian Gebski

Solid 4.5 stars. Surprisingly good supplement to the original SRE book. BUT be warned - it's a workbook, it's practical, but that doesn't mean it's tech-based; in fact it's more conceptual & tech-agnostic. Filled with many good examples, coming not just from Google but from various (all well-known) organizations. What did I like most? That it's tied to real-life practices - there's a full chapter on SLOs, another one on on-call duties, postmortems, etc. Some are quite specific (the data processing one felt almost out of place), but some can easily be related to any case - like the configuration one (btw, I didn't expect to learn anything new here, but I really like some of the conceptual figures for approaching this topic - ones I hadn't used before). Anything I didn't like? The 3 final chapters need some more work - it's not just about polish; it's more that they are not "selling their goal" properly - IMHO they were a tangle of thoughts that was in general hard to disagree with, but lacked the clarity & natural flow of the previous ones. Still - it's a very decent book. Highly recommended; additionally, the book is currently (for a limited time) available for free - not using this opportunity to educate yourself would be a sin.