24/7 resiliency (Google Cloud Next ’17)

By inergency On Jul 22, 2017

From Site Reliability Engineering (SRE) to Customer Reliability Engineering (CRE) and Cloud Ops, there’s a lot involved in keeping the Google cloud running, scaling and performing, across our organization and by extension for our customers. In this video, Mahesh Kallahalla, Luke Stone, and William Bonnell give you a close look into the internal procedures we use to continually improve reliability. They also discuss best practices for interacting with Google in order to reduce mean-time-to-detect.
