
Empowering the world through technology and information
Google LLC, headquartered in Mountain View, California, is a global leader in internet-related services and products, including its flagship search engine, Google Search, and the Android operating system. With over 100,000 employees, Google also offers cloud computing services through Google Cloud P...
Google offers competitive salaries, equity options, generous PTO policies, comprehensive health benefits, and a remote work policy that allows flexibi...
Google is known for its engineering-first culture, emphasizing innovation and collaboration. The company fosters a unique environment that encourages ...

Google • Mountain View, CA, USA
Google is seeking a Site Reliability Manager to lead the Site Reliability Engineering team in Mountain View. You'll manage large-scale distributed systems and ensure high reliability and performance. This role requires 5+ years of programming experience and strong leadership skills.
You have a Bachelor's degree in Computer Science or a related field and at least 5 years of experience programming in one or more languages. Your background includes 3 years of people management experience, during which you led projects and worked with teams on administration and networking tasks. You have a deep understanding of distributed systems and system design, with at least 2 years spent developing infrastructure that supports scalable applications. You have built reliable, high-performance web applications and are familiar with programming languages such as Java or C++. You also have experience with Artificial Intelligence and with experimental design methodologies such as A/B testing.
A Master's degree in Computer Science or Engineering would be a plus, as would any experience in building scalable, reliable, and highly performant web applications. You are someone who thrives in a fast-paced environment and enjoys tackling complex challenges unique to large-scale systems.
As a Site Reliability Manager at Google, you will lead the Site Reliability Engineering team and keep our services reliable and available. You will oversee on-call rotations across continents, using a follow-the-sun model to provide continuous coverage. You will design, write, and deliver software that improves the availability, scalability, latency, and efficiency of Google's services. You will also mentor your team, establish credibility through quality technical execution, and automate responses to non-exceptional service conditions so that problems do not recur. You will guide the team in optimizing existing systems and building infrastructure that replaces manual work with automation.
At Google, you will have the opportunity to manage complex challenges of scale while working with cutting-edge technologies. We foster a culture of innovation and collaboration, where your contributions will directly impact the reliability of our services. You will be part of a team that values mentorship and professional growth, providing you with the chance to develop your skills further. We encourage you to apply even if your experience doesn't match every requirement, as we believe diverse teams build better products.