
Empowering AI with robust infrastructure solutions
Nebius is a Nasdaq-listed company headquartered in Amsterdam, specializing in AI infrastructure solutions. With a team of around 400 engineers, Nebius provides large-scale GPU clusters and cloud platforms designed to support the rapid growth of the AI industry. The company has established R&D and co...
Nebius offers competitive equity packages, a flexible PTO policy, and opportunities for remote work. Employees also benefit from a learning budget to ...
Nebius fosters a culture centered around engineering excellence and innovation in AI infrastructure. The company values collaboration across its globa...

Nebius AI • Amsterdam, Netherlands; London, United Kingdom; Remote - Europe
Nebius AI is seeking a Senior Machine Learning Engineer to optimize training and inference performance for large language models. You'll work with distributed systems and high-performance computing technologies. This role requires expertise in machine learning and programming in Python.
You have 5+ years of experience in machine learning engineering, particularly in optimizing training and inference for large-scale models. Your background includes working with distributed systems and high-performance computing, allowing you to effectively manage multi-GPU and multi-node setups. You are proficient in Python and have hands-on experience with frameworks such as TensorFlow and PyTorch, enabling you to implement cutting-edge AI solutions. Your strong communication skills allow you to collaborate effectively with cross-functional teams, ensuring that AI products meet both technical and business requirements.
Experience with large language models and their training processes is a plus. Familiarity with cloud computing platforms and AI infrastructure will help you excel in this role. We also encourage you to bring new ideas that contribute to the ongoing development of AI products at Nebius.
As a Senior Machine Learning Engineer at Nebius AI, you will be at the forefront of applied research and product development in AI. Your primary responsibility will be to optimize the performance of training and inference processes for large language models, ensuring they operate efficiently in a distributed environment. You will collaborate with a team of skilled engineers and researchers to develop AI-heavy products that address real-world challenges. Your role will involve designing and implementing algorithms that enhance the efficiency of model training, as well as conducting experiments to validate your approaches.
You will also be responsible for analyzing the performance of existing models and identifying areas for improvement. This may include scaling task data collection for reinforcement learning and maximizing the efficiency of LLM training on agentic trajectories. Your contributions will directly impact the capabilities of Nebius AI Studio, our inference and fine-tuning platform for AI models.
In addition to technical responsibilities, you will mentor junior engineers and contribute to a collaborative team culture that values innovation and initiative. You will have opportunities to present your findings and research to stakeholders, helping to shape the direction of our AI products.
At Nebius, we provide a competitive salary and a comprehensive benefits package that supports your professional growth. Flexible working arrangements help you balance your personal and professional life. Our collaborative work environment encourages initiative and innovation. As we expand our products and services, you will have the chance to work on technology that is shaping the future of AI and cloud computing. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.