
Together AI • San Francisco
Together AI is seeking a Senior Backend Engineer to build and optimize its Inference Platform for advanced generative AI models. You'll work with technologies like Python, Docker, and AWS to enhance performance and scalability. This role requires strong experience in backend engineering and machine learning.
You have 5+ years of backend engineering experience, particularly in building production systems that leverage advanced AI models. Your expertise includes optimizing for performance and scalability, ensuring that applications run efficiently at scale. You thrive in environments where you can take deep technical ownership and make impactful contributions to the team.
Your technical skills include proficiency in Python and experience with containerization technologies like Docker and orchestration tools such as Kubernetes. You understand the intricacies of cloud platforms, particularly AWS, and how to leverage them for high-performance applications. You are also familiar with machine learning concepts and have experience working with generative AI models, which allows you to collaborate effectively with research teams.
You are passionate about contributing to the open-source community and have experience with projects that enhance inference performance and efficiency. You enjoy solving complex problems related to global request routing, load balancing, and resource allocation, and you have a knack for optimizing latency to ensure the best user experience.
Experience with NVIDIA Dynamo and OpenAI API is a plus, as it aligns with the technologies used in the Inference Platform. Familiarity with large-scale GPU utilization and performance optimization techniques will set you apart.
In this role, you will be responsible for shaping the core inference backbone that powers Together AI's frontier models. You will work hands-on with cutting-edge hardware, including tens of thousands of GPUs, to optimize their performance and ensure they are fully utilized. Your work will directly impact the efficiency and accessibility of generative AI models for developers, enterprises, and researchers.
You will collaborate closely with world-class researchers to bring new model architectures into production, ensuring that the latest advancements in AI are effectively integrated into the platform. Your role will involve addressing performance-critical challenges, such as optimizing global request routing and load balancing, to enhance the overall user experience.
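To give a concrete flavor of the request-routing and load-balancing problems mentioned above, here is a minimal, illustrative sketch of a least-loaded routing policy. This is not Together AI's implementation; the replica names and the `LeastLoadedRouter` class are hypothetical, and a production router would also account for factors like KV-cache locality, queue depth, and geographic latency.

```python
class LeastLoadedRouter:
    """Illustrative sketch: route each request to the replica
    with the fewest in-flight requests (least-loaded policy)."""

    def __init__(self, replicas):
        # Track the number of in-flight requests per replica.
        self.in_flight = {r: 0 for r in replicas}

    def acquire(self):
        # Pick the replica with the smallest in-flight count;
        # ties resolve to the earliest-registered replica.
        replica = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[replica] += 1
        return replica

    def release(self, replica):
        # Call when a request completes to free capacity.
        self.in_flight[replica] -= 1


# Hypothetical usage: two GPU-backed replicas.
router = LeastLoadedRouter(["gpu-a", "gpu-b"])
first = router.acquire()   # goes to the least-loaded replica
second = router.acquire()  # goes to the other replica
```

Real inference gateways layer similar policies with health checks and latency-aware weighting, but the core idea, steering traffic toward spare capacity, is the same.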
You will also contribute to and leverage open-source projects like SGLang and vLLM, pushing the boundaries of inference performance and efficiency. Your contributions will help shape the tools that advance the industry and make generative AI more accessible to a wider audience.
Together AI offers competitive compensation, equity, and benefits, reflecting the value of your contributions to the team. You will be part of a culture that emphasizes deep technical ownership and high impact, where your work will make a significant difference in the field of AI. Join us in our mission to bring the most advanced generative AI models to the world and help shape the future of technology.