Groupon

About Groupon

Find amazing deals on experiences near you

Tech • 1K-5K employees • Founded 2008 • Chicago, Illinois, United States

Key Highlights

  • Headquartered in Chicago, Illinois
  • Over 300,000 deals available across various categories
  • Approximately 3,000 employees
  • Publicly traded since 2011 (NASDAQ: GRPN)

Groupon, headquartered in Chicago, Illinois, connects consumers with local businesses through its platform, offering over 300,000 deals across various categories including travel, dining, and entertainment. Founded in 2008, Groupon has served millions of customers and employs approximately 3,000 peo...

🎁 Benefits

Groupon offers competitive salaries, stock options, flexible PTO, and a remote work policy to support work-life balance. Employees also benefit from w...

🌟 Culture

Groupon fosters a culture of innovation and customer-centricity, encouraging employees to explore new ideas and solutions to enhance user experiences....


Principal Site Reliability Engineer (AI-first SRE)

Groupon • Remote - Ecuador

Posted 3d ago • Remote • Principal • Site reliability engineer • Ecuador

Skills & Technologies

AI, Machine learning, Infrastructure as code, Monitoring, Incident management

Overview

Groupon is seeking a Principal Site Reliability Engineer to lead the evolution of its global platform with a focus on AI-driven resilience. You'll design intelligent, self-healing systems to ensure high availability and reliability. This role requires expertise in AI and machine learning.

Job Description

Who you are

You have extensive experience in site reliability engineering, with a strong focus on building and maintaining high-availability systems. Your background includes designing self-healing architectures that leverage AI and machine learning to predict and prevent incidents before they occur. You understand the importance of reliability in a marketplace environment and are passionate about creating seamless experiences for users.

You possess a deep understanding of infrastructure as code and have successfully implemented monitoring and alerting systems that ensure operational excellence. Your experience includes working with large-scale distributed systems, and you are adept at incident management and response, ensuring minimal downtime and optimal performance.

You thrive in collaborative environments and enjoy working with cross-functional teams to drive innovation and improve system reliability. Your ability to communicate complex technical concepts to non-technical stakeholders makes you an effective leader in your field. You are committed to continuous learning and staying updated with the latest advancements in AI and machine learning technologies.

Desirable

Experience with cloud platforms such as AWS or Azure is a plus, as is familiarity with container orchestration tools like Kubernetes. You may also have experience in developing automation scripts using languages such as Python or Bash, which enhances your ability to streamline operations and improve system efficiency.

What you'll do

In this role, you will lead the modernization of Groupon's global platform, focusing on reliability as a core component of the transformation. You will architect and maintain self-healing systems that meet or exceed 99.9% availability targets, ensuring that customers enjoy fast and reliable experiences across millions of daily interactions. Your work will involve using AI and machine learning to automate infrastructure governance and enhance system resilience.

You will collaborate closely with engineering teams to implement best practices in site reliability, including capacity planning, performance tuning, and incident response strategies. Your leadership will guide the team in adopting a proactive approach to system maintenance, shifting from reactive measures to predictive solutions that enhance overall system health.

You will also be responsible for mentoring junior engineers, sharing your knowledge of reliability engineering principles, and fostering a culture of continuous improvement within the team. Your contributions will directly impact Groupon's ability to serve local businesses and customers effectively, making your role crucial to the company's success.

What we offer

Groupon provides a dynamic work environment where innovation is encouraged and success is celebrated. You will have the opportunity to make a significant impact on the company's transformation journey while working with a talented team of professionals. We offer competitive compensation and benefits, along with the flexibility of remote work.

Join us in our mission to help local businesses thrive and create memorable experiences for customers. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.
