
Together AI • San Francisco
Together AI is seeking a Senior Backend Engineer to build and optimize its Inference Platform for advanced generative AI models. You'll work with technologies like Python, Docker, and AWS to enhance performance and scalability. This role requires strong experience in backend engineering and machine learning.
You have 5+ years of backend engineering experience, particularly in building production systems that leverage advanced AI models. Your expertise includes optimizing for performance and scalability, ensuring that applications run efficiently at scale. You thrive in environments where you can take deep technical ownership and make impactful contributions to the team.
Your technical skills include proficiency in Python and experience with containerization technologies like Docker and orchestration tools such as Kubernetes. You understand the intricacies of cloud platforms, particularly AWS, and how to leverage them for high-performance applications. You are also familiar with machine learning concepts and have experience working with generative AI models, which allows you to collaborate effectively with research teams.
You are passionate about contributing to the open-source community and have experience with projects that enhance inference performance and efficiency. You enjoy solving complex problems related to global request routing, load balancing, and resource allocation, and you have a knack for optimizing latency to ensure the best user experience.
Experience with NVIDIA Dynamo and OpenAI API is a plus, as it aligns with the technologies used in the Inference Platform. Familiarity with large-scale GPU utilization and performance optimization techniques will set you apart.
In this role, you will be responsible for shaping the core inference backbone that powers Together AI's frontier models. You will work hands-on with cutting-edge hardware, including tens of thousands of GPUs, to optimize their performance and ensure they are fully utilized. Your work will directly impact the efficiency and accessibility of generative AI models for developers, enterprises, and researchers.
You will collaborate closely with world-class researchers to bring new model architectures into production, ensuring that the latest advancements in AI are effectively integrated into the platform. Your role will involve addressing performance-critical challenges, such as optimizing global request routing and load balancing, to enhance the overall user experience.
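To give a concrete flavor of the request-routing and load-balancing problems mentioned above, here is a minimal, illustrative sketch of a least-loaded routing policy. This is not Together AI's implementation; the replica names and the `LeastLoadedRouter` class are hypothetical, and a production router would also account for factors like KV-cache locality, queue depth, and geographic latency.

```python
class LeastLoadedRouter:
    """Illustrative sketch: route each request to the replica
    with the fewest in-flight requests (least-loaded policy)."""

    def __init__(self, replicas):
        # Track the number of in-flight requests per replica.
        self.in_flight = {r: 0 for r in replicas}

    def acquire(self):
        # Pick the replica with the smallest in-flight count;
        # ties resolve to the earliest-registered replica.
        replica = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[replica] += 1
        return replica

    def release(self, replica):
        # Call when a request completes to free capacity.
        self.in_flight[replica] -= 1


# Hypothetical usage: two GPU-backed replicas.
router = LeastLoadedRouter(["gpu-a", "gpu-b"])
first = router.acquire()   # goes to the least-loaded replica
second = router.acquire()  # goes to the other replica
```

Real inference gateways layer similar policies with health checks and latency-aware weighting, but the core idea, steering traffic toward spare capacity, is the same.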
You will also contribute to and leverage open-source projects like SGLang and vLLM, pushing the boundaries of inference performance and efficiency. Your contributions will help shape the tools that advance the industry and make generative AI more accessible to a wider audience.
Together AI offers competitive compensation, equity, and benefits, reflecting the value of your contributions to the team. You will be part of a culture that emphasizes deep technical ownership and high impact, where your work will make a significant difference in the field of AI. Join us in our mission to bring the most advanced generative AI models to the world and help shape the future of technology.