
The streaming service redefining entertainment worldwide
Netflix, headquartered in Los Gatos, California, is a leading streaming service with over 238 million subscribers globally. The platform offers a vast library of movies, TV shows, and original content, including award-winning series like 'Stranger Things' and 'The Crown.' With a market valuation exc...
Employees enjoy competitive salaries, stock options, unlimited PTO, and comprehensive health benefits. Netflix also offers a flexible remote work poli...
Netflix fosters a culture of freedom and responsibility, encouraging employees to take risks and make decisions independently. The company values tran...

Netflix • USA - Remote
Netflix is seeking a Senior Site Reliability Engineer to support live streaming events by ensuring cloud infrastructure stability and reliability. You'll work with technologies like AWS, Docker, and Kubernetes to handle API traffic during high-demand events.
You have 5+ years of experience in site reliability engineering, with a strong focus on cloud infrastructure and live event support. Your expertise in monitoring and observability tools allows you to ensure high availability and performance during critical events. You are skilled in implementing load tests and analyzing system behavior under stress, which helps you identify potential bottlenecks and improve system resilience.
Your background includes hands-on experience with AWS, Docker containers, and container orchestration with Kubernetes. You understand the intricacies of microservices architecture and are adept at managing API traffic, ensuring seamless communication between services. You are passionate about driving improvements in observability and monitoring, always looking for ways to enhance system performance and reliability.
You thrive in collaborative environments and enjoy working closely with cross-functional teams to deliver exceptional user experiences. Your problem-solving skills enable you to tackle complex challenges, and you are committed to fostering a culture of diversity and inclusion within your team.
Experience with real-time streaming technologies and protocols is a plus, as is familiarity with tools like Prometheus and Grafana for monitoring and visualization. You are open to learning new technologies and methodologies, and you embrace opportunities for professional growth and development.
In this role, you will be responsible for supporting Netflix's live streaming events, ensuring that our cloud infrastructure can handle sudden increases in API traffic. You will prepare and execute load tests to validate the performance of critical applications and overall system stability. Your work will directly impact the success of live events, from planning and testing phases to the actual event launch.
You will drive continual improvements in observability and monitoring practices, with a focus on solving the thundering herd problem (many clients hitting the same services at the same moment, as when a live event starts) and enhancing system scalability. Collaborating with engineering teams, you will implement end-to-end observability solutions that provide insights into system performance and user experience.
Your role will involve analyzing data to identify trends and potential issues, allowing you to proactively address challenges before they impact users. You will also contribute to the development of best practices for incident management and response, ensuring that our systems remain reliable and performant.
At Netflix, you will be part of a dynamic team that is dedicated to delivering high-quality entertainment experiences to millions of viewers worldwide. We offer a competitive salary and benefits package, along with opportunities for professional development and growth. Our culture values innovation, collaboration, and diversity, and we encourage you to apply even if your experience doesn't match every requirement. Join us in shaping the future of entertainment and making a lasting impact on how people enjoy content.