
The cloud monitoring platform engineers love
Datadog (NYSE: DDOG) is a leading cloud observability platform that provides monitoring and analytics for applications, infrastructure, and logs. Trusted by over 26,000 customers including major companies like Netflix, Samsung, and Airbnb, Datadog is headquartered in New York City. The company went ...
Datadog offers competitive salaries, equity options, generous PTO policies, and a flexible remote work policy. Employees also benefit from a learning ...
Datadog fosters an engineering-first culture, with engineers making up 70% of its workforce. The company emphasizes solving complex...

Datadog • Paris, France; Sophia Antipolis, France
Datadog is hiring a Senior MLOps Engineer to design and build robust backend systems for AI infrastructure. You'll work with technologies like Python, Docker, and Kubernetes to enhance ML workflows. This role requires significant experience in MLOps and distributed systems.
You have 5+ years of experience in software engineering with a focus on MLOps, and you understand the intricacies of managing machine learning workflows at scale. Your expertise in Python allows you to build and optimize systems that support model training and deployment. You are familiar with containerization technologies like Docker and orchestration tools such as Kubernetes, which you use to build scalable and reliable infrastructure. You have a solid understanding of cloud platforms, particularly AWS, and you use its services to support machine learning operations. Your experience with MLflow and TensorFlow equips you to manage model lifecycles effectively, ensuring that models are tracked and versioned properly. You thrive in collaborative environments, working closely with applied scientists and platform teams to drive innovation in AI infrastructure.
Experience with PyTorch is a plus, as it complements your skill set in machine learning frameworks. Familiarity with distributed systems and job orchestration will help you tackle the challenges of managing training jobs across multiple data centers. You are proactive in seeking improvements in ML experimentation workflows, and you enjoy mentoring junior engineers to foster a culture of learning and growth within your team.
In this role, you will design and implement scalable systems for training orchestration, artifact tracking, and model registration across various cloud regions. You will improve and streamline ML experimentation workflows, ensuring that applied scientists can iterate rapidly and reliably. Your work will involve collaborating with cross-functional teams to shape the future of AI infrastructure at Datadog. You will be responsible for building deeply technical infrastructure that supports job orchestration and model lifecycle management. As part of a high-impact team, you will tackle critical problems that contribute to Datadog’s AI evolution. You will also engage in code reviews and contribute to the overall architecture of the systems you help build, ensuring they meet the highest standards of reliability and performance.
Datadog values a collaborative office culture that fosters creativity and teamwork. As part of a hybrid workplace, you will have the flexibility to create the work-life harmony that suits your needs. You will be part of a team at the forefront of AI development, working on projects that have a significant impact on the company's future. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of diverse backgrounds to drive innovation. Join us and be part of a mission that is transforming the way AI is integrated into business processes.