
Empowering creators in a vibrant gaming universe
Roblox is an online gaming and entertainment platform headquartered in San Mateo, CA, that connects over 200 million monthly active users. The platform empowers its community to create and monetize their own games, with over $500 million paid out to developers in 2022 alone. As a leader in the...
Roblox offers competitive salaries, equity options, generous PTO policies, and a flexible remote work policy to support work-life balance. Employees a...
Roblox fosters a creator-centric culture, encouraging employees to innovate and collaborate while prioritizing user safety. The company values communi...

Roblox • San Mateo, CA, United States
Roblox is seeking a Senior Site Reliability Engineer to enhance the reliability and scalability of its platform. You'll work with technologies like Java, Python, and AWS to solve complex technical challenges. This role requires a strong background in system reliability and performance optimization.
You are a seasoned engineer with over 5 years of experience in site reliability engineering, focusing on building and maintaining scalable systems. You have a strong background in programming languages such as Java and Python, and you understand the intricacies of system performance and reliability. Your experience with cloud platforms like AWS has equipped you with the skills to design robust infrastructure that can handle millions of users. You are proficient in containerization technologies such as Docker and orchestration tools like Kubernetes, which you have used to streamline deployment processes and improve system resilience.
You have a deep understanding of Linux systems and are comfortable navigating and troubleshooting various distributions. Your expertise in version control systems like Git allows you to collaborate effectively with cross-functional teams, ensuring that code is managed and deployed efficiently. You are familiar with monitoring tools and practices, enabling you to proactively identify and resolve issues before they impact users. Your experience with incident management has taught you the importance of maintaining a calm and systematic approach during outages, ensuring that systems are restored quickly and efficiently.
You thrive in collaborative environments and enjoy working closely with developers, product managers, and other stakeholders to enhance system reliability. You are passionate about continuous improvement and are always looking for ways to optimize processes and systems. You understand that the role of a Site Reliability Engineer is not just about keeping systems running but also about driving innovation and improving user experiences.
Experience with infrastructure as code tools such as Terraform or CloudFormation is a plus, as it allows you to automate and manage infrastructure efficiently. Familiarity with database management and optimization techniques will also be beneficial in this role, as you will work with various data storage solutions to ensure high availability and performance.
In this role, you will be instrumental in driving the evolution of Roblox's systems, ensuring they meet the highest standards of performance, reliability, and efficiency. You will collaborate with cross-functional teams to build and maintain robust infrastructure that supports the company's growth and user engagement. Your responsibilities will include designing and implementing monitoring solutions to track system performance and health, as well as developing incident response strategies to minimize downtime and user impact.
You will create software and libraries that promote reliability and scalability, contributing to the overall architecture of the platform. Your work will involve analyzing system performance metrics and identifying areas for improvement, allowing you to implement changes that enhance user experiences. You will also participate in on-call rotations, providing support during incidents and ensuring that systems are restored quickly.
As a Senior Site Reliability Engineer, you will mentor junior engineers, sharing your knowledge and expertise to help them grow in their roles. You will advocate for best practices in reliability engineering and contribute to the development of a culture that prioritizes system performance and user satisfaction. Your contributions will directly impact the company's mission to connect a billion people with optimism and civility, shaping the future of human interaction through technology.
At Roblox, you will be part of a dynamic team that is dedicated to building the tools and platform that empower our community. We offer competitive compensation and benefits, including opportunities for professional development and growth. You will work in an inclusive environment that values diversity and encourages collaboration. Join us in shaping the future of our platform and delivering unparalleled value to our users.