
Find amazing deals on experiences near you
Groupon, headquartered in Chicago, Illinois, connects consumers with local businesses through its platform, offering over 300,000 deals across various categories including travel, dining, and entertainment. Founded in 2008, Groupon has served millions of customers and employs approximately 3,000 peo...
Groupon offers competitive salaries, stock options, flexible PTO, and a remote work policy to support work-life balance. Employees also benefit from w...
Groupon fosters a culture of innovation and customer-centricity, encouraging employees to explore new ideas and solutions to enhance user experiences....

Groupon • Remote - Argentina; Remote - Brazil; Remote - Chile; Remote - Colombia; Remote - Ecuador; Remote - Mexico; Remote - Peru; Remote - Uruguay
Groupon is seeking a Principal Site Reliability Engineer to lead the evolution of its global platform with a focus on AI-driven resilience. You'll design intelligent, self-healing systems to ensure high availability and reliability. This role requires expertise in AI and infrastructure automation.
You have extensive experience in site reliability engineering, with a strong focus on building and maintaining resilient systems. Your background includes designing self-healing architectures that meet high availability targets, and you are well-versed in leveraging AI and machine learning to enhance system performance and reliability. You understand the importance of predictive maintenance and have a track record of implementing automation strategies that improve operational efficiency. You thrive in collaborative environments and are passionate about driving innovation within your team. You identify potential issues before they affect users, ensuring seamless experiences across platforms.
Experience with cloud platforms and infrastructure as code is a plus. Familiarity with monitoring and alerting tools will help you excel in this role. You are comfortable working in a fast-paced environment and can adapt to changing priorities while maintaining a focus on delivering high-quality results.
In this role, you will lead the design and implementation of self-healing systems that achieve 99.9%+ availability. You will collaborate with cross-functional teams to modernize Groupon's global platform, ensuring that reliability is at the forefront of all initiatives. Your responsibilities will include architecting solutions that utilize AI and machine learning to automate infrastructure management and incident response. You will also be involved in capacity planning and performance tuning to optimize system resources. As a Principal Site Reliability Engineer, you will mentor junior engineers and share best practices to foster a culture of reliability and innovation within the team.
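To put the 99.9%+ availability target in concrete terms, the short sketch below converts an availability percentage into the downtime it allows over a period. The function name `downtime_budget_minutes` and the 30-day default window are assumptions chosen for illustration; the posting does not say how Groupon measures availability.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Return the maximum downtime, in minutes, allowed over the period.

    Hypothetical helper for illustration; the 30-day window is an assumption.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The allowed downtime is the fraction of the period not covered by the target.
    return total_minutes * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% -> {budget:.1f} minutes of downtime per 30 days")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime in a 30-day window, which is why automated detection and recovery matter more as the target rises.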
Groupon provides a dynamic work environment where you can make a significant impact on the company's transformation journey. You will have the opportunity to work with cutting-edge technologies and contribute to projects that enhance the customer experience. We value innovation and encourage you to take risks and explore new ideas. Our remote work model allows for flexibility, and we support a healthy work-life balance. Join us in our mission to empower local businesses and create memorable experiences for our customers.