Senior Software Developer, Site Reliability Development

Google • Waterloo, ON, Canada

Posted 1d ago • On-Site • Senior • Site Reliability Engineer • Waterloo

Skills & Technologies

Java, Python, Linux, Docker, Kubernetes

Overview

Google is seeking a Senior Site Reliability Engineer to build and maintain large-scale, fault-tolerant systems. You'll leverage your expertise in software development and systems design, focusing on automation and reliability. This role requires 5+ years of experience in software development and distributed systems.

Job Description

Who you are

You have a Bachelor’s degree in Computer Science or a related field, along with 5 years of experience in software development across various programming languages. Your background includes at least 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems, and you have spent 2 years leading projects and providing technical leadership. You are passionate about Site Reliability Development, combining software and systems development to ensure the reliability and uptime of critical services.

Your expertise in coding, algorithms, and complexity analysis allows you to tackle the unique challenges of scale at Google. You thrive in a culture of intellectual curiosity and problem-solving, and you appreciate the value of collaboration and diverse perspectives in achieving success. You are committed to optimizing existing systems and building infrastructure that enhances performance through automation.

Desirable

A Master's degree in Computer Science or Engineering is preferred, as is experience with capacity planning and launch reviews. You are familiar with sustainable incident response practices and blameless postmortems, which are essential for maintaining high service reliability.

What you'll do

As a Senior Site Reliability Engineer at Google, you will manage the complexities of large-scale systems, ensuring they are reliable and efficient. You will focus on measuring and monitoring system health, availability, and latency, while also implementing automation to scale systems sustainably. Your role will involve evolving systems by advocating for changes that enhance reliability and velocity.

You will collaborate with cross-functional teams to address challenges and improve system performance. Your responsibilities will include maintaining services once they are live, conducting thorough analyses of system capacity, and leading initiatives that drive improvements in uptime and user satisfaction. You will also be involved in incident management, ensuring that responses are efficient and that lessons learned are documented for future reference.

What we offer

At Google, you will be part of a team that values innovation and excellence. We provide a supportive environment where you can grow your skills and advance your career. You will have access to cutting-edge technologies and the opportunity to work on projects that have a significant impact on millions of users worldwide. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of diverse backgrounds and perspectives to drive success.
