
The language education platform for everyone
Duolingo is a leading language education platform headquartered in East Liberty, Pittsburgh, PA, with over 500 million learners worldwide. The company offers 95 courses across 38 languages and has developed the Duolingo English Test, accepted by more than 2,000 institutions globally. Since its IPO i...
Duolingo provides comprehensive medical, dental, and vision coverage for employees and their families, along with mental health support and fertility ...
Duolingo fosters a data-driven culture with a focus on continuous improvement in language education. Its team includes language learning scientists an...

Duolingo • Pittsburgh, PA
Duolingo is hiring a Senior Site Reliability Engineer to ensure the quality and scalability of its distributed systems. You'll collaborate with product and platform engineering teams and work with technologies like AWS, Docker, and Kubernetes. This role requires strong experience in system reliability and operational excellence.
You have 5+ years of experience in site reliability engineering or a related field, with a strong understanding of distributed systems and operational excellence. You are proficient in AWS and have hands-on experience with containerization technologies like Docker and orchestration tools such as Kubernetes. Your background includes scripting and automation, particularly with Python, which you use to streamline processes and reduce toil. You are familiar with monitoring and observability tools like Prometheus and Grafana, enabling you to maintain high availability and performance of systems. You thrive in collaborative environments, working closely with product and engineering teams to identify and resolve issues proactively. You are passionate about improving system reliability and scalability, advocating for best practices in incident response and postmortem analysis.
In this role, you will collaborate with internal teams to identify sources of instability in distributed systems and drive operational excellence. You will own core infrastructure, which means understanding, diagnosing, and debugging those systems in production. Your responsibilities will include providing system design consulting, developing software platforms and frameworks, and conducting launch reviews and root cause analysis. You will maintain and document sustainable postmortem and incident response practices, advocating for and implementing changes that improve reliability, scalability, and velocity. You will also reduce toil by iteratively developing tooling and automation, so that the systems you manage stay efficient and resilient.
At Duolingo, you will have limitless learning opportunities and mentorship from world-class minds in the industry. You will be part of a mission-driven team that is dedicated to making education universally available. We offer a collaborative work environment where your contributions will have a meaningful impact on millions of learners worldwide. Join us in our life-changing mission and enjoy a variety of projects with large scopes, all while doing work that is both fun and fulfilling.