Site Reliability Engineer (Senior or Staff), Observability

MongoDB • Ireland

Hybrid • Senior / Lead • Site Reliability Engineer • Dublin

Skills & Technologies

Splunk, Prometheus, Docker, Kubernetes, Fluent Bit, Jaeger, Vector, VictoriaMetrics

Overview

MongoDB is seeking a Senior or Staff Site Reliability Engineer for its Observability team to build and maintain the observability stack. You'll work with technologies such as Splunk, Prometheus, and Docker to ensure service reliability. The role requires strong collaboration skills and experience with observability infrastructure.

Job Description

Who you are

You have a strong background in Site Reliability Engineering with a focus on observability. You've designed and implemented observability stacks covering metrics, logging, and tracing to ensure service reliability across various platforms. You have worked with tools such as Splunk and Prometheus, and you understand the role monitoring and alerting play in maintaining service health.

You thrive in collaborative environments — you enjoy working closely with software engineering and other SRE teams to promote best practices in service instrumentation and monitoring. Your ability to communicate effectively with cross-functional teams ensures that observability standards are met and maintained across the organization.

You are proactive in identifying and troubleshooting issues, and your analytical skills allow you to define key metrics that detect incidents and quantify service performance. You have experience building reliable, fault-tolerant, self-healing systems, and you understand the complexities of operating in a multi-cloud environment.

You are comfortable participating in on-call rotations — your experience has prepared you to handle incidents effectively and to contribute to the continuous improvement of incident response processes. You are dedicated to building a culture of reliability and resilience within your team and the broader organization.

Desirable

Experience with telemetry pipelines and monitoring infrastructure is a plus — you have a keen interest in exploring new technologies and methodologies that enhance observability practices. Familiarity with cloud providers and their observability tools will help you adapt quickly to MongoDB's infrastructure.

What you'll do

As a Senior Site Reliability Engineer on the Observability team, you will define the standards and vision for the observability platform used by all engineering teams — your role will involve designing, architecting, and delivering core pieces of observability services in collaboration with various stakeholders. You will be responsible for ensuring that the observability stack is robust and meets the needs of the organization.

You will build and maintain the observability stack, which includes metrics, logging, and tracing, and you will troubleshoot and implement monitoring solutions that span multiple cloud providers. You will also identify and configure key metrics that help detect incidents and quantify service health, availability, and performance.

Collaboration is key in this role — you will partner with other SRE and software engineering teams to promote best practices in instrumenting and monitoring services. Your contributions will directly impact the reliability and performance of MongoDB's services, making them more resilient and self-healing.

You will participate in a week-long on-call rotation, staying hands-on with incident management and response, and you will draw on that experience to improve the incident response process and the overall reliability of the services.

What we offer

MongoDB offers a hybrid working model that lets you split your time between the office and working remotely, in a collaborative environment that values innovation and reliability. The company is committed to providing necessary accommodations for individuals with disabilities within the application and interview process.

You will be part of a team that is dedicated to building impactful observability solutions — your work will contribute to the overall success of MongoDB's engineering efforts, ensuring that services are reliable and performant. The culture at MongoDB encourages continuous learning and growth, providing you with opportunities to expand your skills and knowledge in the field of Site Reliability Engineering.
