ML Engineer, Large Language Models (LLM Training & Inference Optimization)

Nebius AI • Amsterdam, Netherlands; London, United Kingdom; Remote - Europe

Posted 2w ago • Remote • Senior, Lead • Machine learning engineer • Amsterdam • London

Skills & Technologies

Distributed systems, High-performance computing, Machine learning, Python, TensorFlow, PyTorch

Overview

Nebius AI is seeking a Senior Machine Learning Engineer to optimize training and inference performance for large language models. You'll work with distributed systems and high-performance computing technologies. This role requires expertise in machine learning and programming in Python.

Job Description

Who you are

You have 5+ years of experience in machine learning engineering, particularly in optimizing training and inference for large-scale models. Your background includes working with distributed systems and high-performance computing, allowing you to effectively manage multi-GPU and multi-node setups. You are proficient in Python and have hands-on experience with frameworks such as TensorFlow and PyTorch, enabling you to implement cutting-edge AI solutions. Your strong communication skills allow you to collaborate effectively with cross-functional teams, ensuring that AI products meet both technical and business requirements.

Desirable

Experience with large language models and their training processes is a plus. Familiarity with cloud computing platforms and AI infrastructure will help you excel in this role. You are also encouraged to bring innovative ideas to the table, contributing to the ongoing development of AI products at Nebius.

What you'll do

As a Senior Machine Learning Engineer at Nebius AI, you will be at the forefront of applied research and product development in AI. Your primary responsibility will be to optimize the performance of training and inference processes for large language models, ensuring they operate efficiently in a distributed environment. You will collaborate with a team of skilled engineers and researchers to develop AI-heavy products that address real-world challenges. Your role will involve designing and implementing algorithms that enhance the efficiency of model training, as well as conducting experiments to validate your approaches.

You will also be responsible for analyzing the performance of existing models and identifying areas for improvement. This may include scaling task data collection for reinforcement learning and maximizing the efficiency of LLM training on agentic trajectories. Your contributions will directly impact the capabilities of Nebius AI Studio, our inference and fine-tuning platform for AI models.

In addition to technical responsibilities, you will mentor junior engineers and contribute to a collaborative team culture that values innovation and initiative. You will have opportunities to present your findings and research to stakeholders, helping to shape the direction of our AI products.

What we offer

At Nebius, we provide a competitive salary and a comprehensive benefits package that supports your professional growth. You will have flexible working arrangements, allowing you to balance your personal and professional life effectively. Our dynamic and collaborative work environment encourages initiative and innovation, making it an exciting place to grow your career. As we expand our products and services, you will have the chance to work on cutting-edge technology that is shaping the future of AI and cloud computing. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.
