
Empowering the world through technology and information
Google LLC, headquartered in Mountain View, California, is a global leader in internet-related services and products, including its flagship search engine, Google Search, and the Android operating system. With over 100,000 employees, Google also offers cloud computing services through Google Cloud P...
Google offers competitive salaries, equity options, generous PTO policies, comprehensive health benefits, and a remote work policy that allows flexibi...
Google is known for its engineering-first culture, emphasizing innovation and collaboration. The company fosters a unique environment that encourages ...

Google • Bengaluru, Karnataka, India
Google is seeking a Site Reliability Engineer III to ensure the reliability and uptime of its services. You'll work with programming languages like Python, C++, and Java to design and troubleshoot large-scale distributed systems. This role requires a Bachelor's degree in Computer Science or a related technical field and at least 2 years of relevant experience.
You hold a Bachelor’s degree in Computer Science or a related technical field, and you have at least 2 years of experience in software development using programming languages such as Python, C++, or Java. You are familiar with designing, analyzing, and troubleshooting large-scale distributed systems, and you understand the principles of Site Reliability Engineering (SRE). You are passionate about building and running large-scale, fault-tolerant systems, and you have a mindset geared towards continuous improvement and operational excellence.
A Master's degree in Computer Science or Engineering is preferred, along with additional experience in optimizing existing systems and building infrastructure through automation. You have a strong understanding of incident response procedures and are comfortable leveraging emerging technologies like AI and Machine Learning to enhance system reliability.
As a Site Reliability Engineer at Google, you will be responsible for maintaining live services by measuring and monitoring availability, latency, and overall system health. You will implement sustainable incident response procedures to ensure that services consistently meet defined Service Level Objectives (SLOs). Your role will involve collaborating with cross-functional teams to build creative engineering solutions to operational problems, focusing on minimizing operational toil and enhancing system performance.
You will engage in practices such as blameless postmortems and proactive identification of potential outages, which are essential for iterative improvement and product quality. You will also have the opportunity to work with a breadth of tools and approaches to solve a wide spectrum of problems, contributing to the overall reliability and efficiency of Google's services.
At Google, you will be part of a diverse culture that values innovation and collaboration. You will have access to cutting-edge technologies and the opportunity to work on impactful projects that shape the future of technology. We encourage you to apply even if your experience doesn't match every requirement, as we value curiosity and a growth mindset. Join us in building better production systems and enhancing the reliability of services used by millions around the world.