
The cloud monitoring platform engineers love
Datadog (Nasdaq: DDOG) is a leading cloud observability platform that provides monitoring and analytics for applications, infrastructure, and logs. Trusted by over 26,000 customers, including major companies such as Netflix, Samsung, and Airbnb, Datadog is headquartered in New York City. The company went ...
Datadog offers competitive salaries, equity options, generous PTO policies, and a flexible remote work policy. Employees also benefit from a learning ...
Datadog fosters an engineering-first culture: engineers make up 70% of its workforce. The company places a strong focus on solving complex...

Datadog • New York, New York, USA
Datadog is hiring an AI Research Engineer to collaborate with research scientists in developing AI-powered solutions for cloud observability and security. You'll work with technologies like Python, TensorFlow, and PyTorch to build and evaluate advanced models. This position requires experience in machine learning and data engineering.
You have a strong background in AI and machine learning, with experience building and deploying models that solve real-world problems. Your proficiency in Python allows you to implement complex algorithms and work effectively with large datasets. You understand the intricacies of data engineering and can build robust data pipelines that support model training and evaluation. You are familiar with distributed systems and have experience orchestrating training processes using frameworks like Ray. You thrive in collaborative environments, partnering with research scientists to turn innovative ideas into practical applications. You are detail-oriented and committed to ensuring the reliability and performance of the systems you develop.
Experience with cloud observability and security challenges is a plus. Familiarity with advanced forecasting, anomaly detection, and multi-modal telemetry analysis will help you excel in this role. Knowledge of site reliability engineering principles and practices will also be beneficial as you work on creating autonomous agents for incident detection and resolution.
In this role, you will build and operate datasets, training and evaluation pipelines, and internal tooling that facilitate rapid iteration and trustworthy evaluation of AI models. You will implement models and run experiments at scale, profiling them for reliability, performance, and cost. Your work will involve orchestrating distributed training and reinforcement learning processes using Ray, ensuring that the models you develop can handle the complexities of real-world data. You will collaborate closely with research scientists to refine research ideas and translate them into working systems that can be deployed in production environments. Your contributions will directly impact the development of AI agents that enhance cloud observability and security, pushing the boundaries of what is possible in this field.
At Datadog, you will be part of a dynamic team that is at the forefront of AI research and application. We offer a collaborative work environment where innovation is encouraged, and your contributions will be valued. You will have the opportunity to work on high-risk, high-reward projects that tackle real-world challenges. We provide competitive compensation and benefits, along with opportunities for professional growth and development. Join us in shaping the future of AI-powered solutions in cloud observability and security.