Site Reliability Manager, Site Reliability Engineering

Google • Mountain View, CA, USA

Posted 2w ago • 🏛️ On-Site • Lead • Site Reliability Engineer • 📍 Mountain View

Skills & Technologies

Java, C++, Artificial Intelligence, Distributed Systems, System Design, Networking

Overview

Google is seeking a Site Reliability Manager to lead the Site Reliability Engineering team in Mountain View. You'll manage large-scale distributed systems and ensure high reliability and performance. This role requires 5+ years of programming experience and strong leadership skills.

Job Description

Who you are

You have a Bachelor’s degree in Computer Science or a related field and at least 5 years of experience programming in one or more languages. You have 3 years of people management experience, during which you led projects and worked with teams on administration and networking tasks. You have a deep understanding of distributed systems and system design, with at least 2 years spent developing infrastructure that supports scalable applications. You have built reliable, high-performance web applications and are familiar with languages such as Java or C++. You also have experience with Artificial Intelligence and with experimental design methods such as A/B testing.

Desirable

A Master's degree in Computer Science or Engineering would be a plus, as would any experience in building scalable, reliable, and highly performant web applications. You are someone who thrives in a fast-paced environment and enjoys tackling complex challenges unique to large-scale systems.

What you'll do

As a Site Reliability Manager at Google, you will lead the Site Reliability Engineering team and ensure that our services maintain high reliability and uptime. You will manage on-call rotations across continents, using a follow-the-sun model to keep services continuously available. You will design, write, and deliver software that improves the availability, scalability, latency, and efficiency of Google's services. You will also mentor your team, establish credibility through quality technical execution, and automate responses to non-exceptional service conditions to prevent problems from recurring. You will guide the team in optimizing existing systems and building infrastructure that eliminates manual work through automation.

What we offer

At Google, you will have the opportunity to manage complex challenges of scale while working with cutting-edge technologies. We foster a culture of innovation and collaboration, where your contributions will directly impact the reliability of our services. You will be part of a team that values mentorship and professional growth, providing you with the chance to develop your skills further. We encourage you to apply even if your experience doesn't match every requirement, as we believe diverse teams build better products.
