
Empowering the world through technology and information
Google LLC, headquartered in Mountain View, California, is a global leader in internet-related services and products, including its flagship search engine, Google Search, and the Android operating system. With over 100,000 employees, Google also offers cloud computing services through Google Cloud P...
Google offers competitive salaries, equity options, generous PTO policies, comprehensive health benefits, and a remote work policy that allows flexibi...
Google is known for its engineering-first culture, emphasizing innovation and collaboration. The company fosters a unique environment that encourages ...

Google • Waterloo, ON, Canada
Google is seeking a Senior Site Reliability Engineer to build and maintain large-scale, fault-tolerant systems. You'll apply your expertise in software development and systems design, with a focus on automation and reliability. This role requires 5+ years of experience in software development, including experience with large-scale distributed systems.
You have a Bachelor’s degree in Computer Science or a related field, along with 5 years of experience in software development across various programming languages. Your background includes at least 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems, and you have spent 2 years leading projects and providing technical leadership. You are passionate about Site Reliability Engineering, which combines software and systems engineering to ensure the reliability and uptime of critical services.
Your expertise in coding, algorithms, and complexity analysis allows you to tackle the unique challenges of scale at Google. You thrive in a culture of intellectual curiosity and problem-solving, and you appreciate the value of collaboration and diverse perspectives in achieving success. You are committed to optimizing existing systems and building infrastructure that enhances performance through automation.
A Master's degree in Computer Science or Engineering is preferred, as is experience with capacity planning and launch reviews. You are familiar with sustainable incident response practices and blameless postmortems, which are essential for maintaining high service reliability.
As a Senior Site Reliability Engineer at Google, you will manage the complexities of large-scale systems, ensuring they are reliable and efficient. You will focus on measuring and monitoring system health, availability, and latency, while also implementing automation to scale systems sustainably. Your role will involve evolving systems by advocating for changes that enhance reliability and velocity.
You will collaborate with cross-functional teams to address challenges and improve system performance. Your responsibilities will include maintaining services once they are live, conducting thorough analyses of system capacity, and leading initiatives that drive improvements in uptime and user satisfaction. You will also be involved in incident management, ensuring that responses are efficient and that lessons learned are documented for future reference.
At Google, you will be part of a team that values innovation and excellence. We provide a supportive environment where you can grow your skills and advance your career. You will have access to cutting-edge technologies and the opportunity to work on projects that have a significant impact on millions of users worldwide. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of diverse backgrounds and perspectives to drive success.