
Find amazing deals on experiences near you
Groupon, headquartered in Chicago, Illinois, connects consumers with local businesses through its platform, offering over 300,000 deals across various categories including travel, dining, and entertainment. Founded in 2008, Groupon has served millions of customers and employs approximately 3,000 peo...
Groupon offers competitive salaries, stock options, flexible PTO, and a remote work policy to support work-life balance. Employees also benefit from w...
Groupon fosters a culture of innovation and customer-centricity, encouraging employees to explore new ideas and solutions to enhance user experiences....

Groupon • Remote - Peru
Groupon is seeking a Principal Site Reliability Engineer to lead the evolution of its platform towards AI-driven resilience. You'll design self-healing systems that ensure high availability and reliability. This role requires expertise in AI and machine learning.
You have extensive experience in site reliability engineering, with a strong focus on building and maintaining resilient systems. Your background includes designing intelligent, self-healing systems that achieve high availability targets, and you understand the importance of proactive maintenance and automation in infrastructure management. You are well-versed in using AI and machine learning to enhance system reliability and governance and to prevent incidents before they occur. Your technical skills are complemented by your ability to collaborate effectively with cross-functional teams, driving innovation and improvement in system performance.
Experience with predictive analytics and automation tools is a plus, as is familiarity with cloud infrastructure and services. You have a passion for continuous learning and keep current with the latest trends in site reliability and AI technologies. You thrive in environments that encourage risk-taking and innovation, and you are eager to contribute to a culture that celebrates success and autonomy.
In this role, you will lead the modernization of Groupon's global platform, focusing on reliability as a core component of the transformation. You will architect and maintain self-healing systems that meet or exceed 99.9% availability targets, leveraging AI and machine learning to automate infrastructure governance and incident detection. Your responsibilities will include designing and implementing monitoring and alerting systems that provide real-time insights into system performance, enabling rapid response to potential issues. You will collaborate with engineering teams to integrate reliability practices into the development lifecycle, ensuring that reliability is prioritized from the outset of new projects.
You will also be responsible for conducting post-incident reviews to identify root causes and implement preventive measures, fostering a culture of continuous improvement within the team. Your leadership will guide the evolution of the site reliability engineering practice at Groupon, influencing technical direction and mentoring junior engineers. You will have the opportunity to make a significant impact on the reliability and performance of systems that serve millions of customers daily.
Groupon offers a dynamic work environment where you can make a meaningful impact on the business. You will have the autonomy to drive initiatives and the support of a collaborative team. We provide competitive compensation and benefits, along with opportunities for professional growth and development. Join us in our mission to help local businesses thrive and transform the way customers discover experiences and services.