
Empowering AI with robust infrastructure solutions
Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...
Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...
Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Nebius AI • Amsterdam, Netherlands
Nebius AI is seeking a Senior Software Engineer to join its Hardware Infrastructure Observability team. You'll design and develop services for monitoring server fleets and data center systems, using Python and Linux. This role is based in Amsterdam.
You have 5+ years of experience in software engineering, particularly in building and maintaining infrastructure observability systems. Your expertise in Python and Linux allows you to develop robust monitoring solutions that ensure the reliability of large-scale server fleets. You are familiar with containerization technologies such as Docker and orchestration tools like Kubernetes, which you have used to streamline deployment processes and enhance system performance. Your experience with monitoring tools like Prometheus and Grafana enables you to create insightful dashboards and alerts that help maintain system health. You thrive in collaborative environments, working closely with cross-functional teams to drive improvements and resolve incidents effectively. You are proactive in investigating issues and implementing root-cause fixes, ensuring that systems remain operational and efficient.
Experience with cloud infrastructure and AI/ML systems is a plus, as is familiarity with incident response protocols and debugging techniques. You are comfortable working in a fast-paced environment and are eager to learn new technologies that can enhance your contributions to the team.
As a Senior Software Engineer at Nebius, you will be responsible for designing and developing services and agents that provide deep visibility into a large server fleet and data center engineering systems. You will evolve metrics, aggregation, and alerting pipelines to improve signal quality and ensure that the infrastructure remains healthy. Your role will involve building maintenance workflows and automation processes that facilitate safe and predictable fleet-wide changes. You will also investigate incidents hands-on, including on-host debugging, and drive root-cause fixes to enhance system reliability. Collaboration with other engineers and teams will be key as you work to improve the overall performance and efficiency of the infrastructure.
Nebius offers a competitive salary and a comprehensive benefits package, along with opportunities for professional growth within the company. You will enjoy flexible working arrangements and be part of a dynamic and collaborative work environment that values initiative and innovation. As Nebius continues to grow and expand its products, you will have the chance to contribute to exciting projects that shape the future of AI cloud infrastructure.