Leethub

Curated tech jobs from FAANG and top companies worldwide.

© 2026 Leethub LLC. All rights reserved.
Together AI

About Together AI

Empowering corporate mentorship for effective learning

👥 21-100 employees · 📍 CityPlace, Toronto, ON · 💰 $1.7M
B2B · HR · Learning · SaaS · Community

Key Highlights

  • Founded in 2018, headquartered in Toronto, ON
  • Raised $1.7 million in seed funding
  • Partnerships with Heineken, Reddit, and 7-Eleven
  • 4 weeks paid vacation and competitive equity packages

Together is a corporate mentorship management platform founded in 2018, headquartered in CityPlace, Toronto, ON. The platform streamlines the mentorship lifecycle, facilitating connections among employees at companies like Heineken, Reddit, and 7-Eleven. With $1.7 million in seed funding, Together a...

🎁 Benefits

Together offers competitive salaries and equity packages, 4 weeks of paid vacation, and a comprehensive health, dental, and vision plan through Honeyb...

🌟 Culture

Together fosters a culture of autonomy and impact, allowing employees to take on significant responsibilities without bureaucratic constraints. The fo...


Machine Learning Operations Lead

Together AI • San Francisco

Posted 2w ago · 🏛️ On-Site · Lead · MLOps Engineer · 📍 San Francisco

Skills & Technologies

AWS · Kubernetes · Python

Overview

Together AI is seeking a Machine Learning Operations Lead to oversee the ML API offerings and ensure operational excellence. You'll work with AWS, Kubernetes, and Python to optimize ML processes and tooling at production scale. This role is based in San Francisco and requires a strong technical background in MLOps.

Job Description

About the Role

Together AI is building the AI Inference & Model Shaping Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints, enabling developers, enterprises, and researchers to harness the latest LLMs, multimodal models, image, audio, video, and reasoning models at scale.

We are looking for an exceptional MLOps Engineering Lead to partner closely with our cross-functional engineering, infrastructure, research, and sales teams to ensure the excellence of our ML API offerings. Your primary focus will be on delivering world-class inference and fine-tuning in our public APIs and customer deployments by building automation and operations processes.

This role is ideal for a highly motivated and technically adept individual who excels in fast-paced, dynamic environments. You will be in charge of designing and scaling our ML processes and tooling at production scale, optimizing operations to ensure the availability and reliability of our services across different tenants and user loads in a multi-cluster deployment. You will serve as a passionate advocate for internal and external customers, providing feedback to the wider engineering and infrastructure teams to improve our systems and core business metrics. If you thrive in a collaborative, problem-solving environment and are driven to deliver operational excellence, we encourage you to apply for this exciting opportunity.

Responsibilities

  • Own availability and performance SLAs for production inference and fine-tuning services across serverless and dedicated deployments
  • Own and improve testing, deployment, configuration management, and monitoring practices for multi-cluster ML infrastructure, partnering closely with Infra SREs
  • Build self-serve tooling and automation to reduce operational toil and enable internal users (MLOps, customer experience) and self-serve offerings
  • Define and enforce configuration best practices for inference engines (vLLM, tvLLM, Pulsar) to prevent runtime issues
  • Lead incident response, conduct postmortems, and drive reliability improvements
  • Hire, mentor, and grow an MLOps engineering team
  • Partner with infrastructure and ML engineering teams to improve system reliability and cost efficiency

Requirements

  • 5+ years operating production ML inference or training systems at scale
  • 2+ years leading engineering teams, with experience building teams from scratch
  • Deep expertise with Kubernetes, multi-cluster orchestration, and ML serving frameworks
  • Strong track record of owning production SLAs (e.g., availability, time to first token (TTFT), and tokens per second (TPS))
  • Experience with LLM inference serving systems (vLLM, TRT-LLM, or similar)
  • Ability to influence cross-functional teams and make deployment/architecture decisions
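
The requirements above name availability, TTFT, and TPS as typical production SLAs. As a rough illustration only, the Python sketch below computes those three metrics over a batch of requests and checks them against targets. Everything in it is an assumption rather than Together AI's tooling: `RequestTiming`, `SLA`, `check_sla`, and `error_budget_minutes` are hypothetical names, TPS is taken as end-to-end output tokens per second (some teams exclude TTFT from it), and percentiles use the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record; all timestamps are in seconds."""
    sent: float          # client sent the request
    first_token: float   # first generated token arrived
    last_token: float    # last generated token arrived
    output_tokens: int   # number of generated tokens
    ok: bool             # request completed without error


@dataclass
class SLA:
    """Placeholder targets (hypothetical), not real service commitments."""
    min_availability: float  # fraction of successful requests, e.g. 0.999
    max_p95_ttft_s: float    # 95th-percentile TTFT ceiling in seconds
    min_p50_tps: float       # median tokens-per-second floor


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]; values must be non-empty."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def ttft(r: RequestTiming) -> float:
    """Time to first token: delay from sending the request to the first token."""
    return r.first_token - r.sent


def tps(r: RequestTiming) -> float:
    """End-to-end output tokens per second (one of several common definitions)."""
    elapsed = r.last_token - r.sent
    return r.output_tokens / elapsed if elapsed > 0 else 0.0


def error_budget_minutes(target: float, window_days: int) -> float:
    """Minutes of allowed unavailability for an availability target over a window."""
    return (1 - target) * window_days * 24 * 60


def check_sla(requests: list[RequestTiming], sla: SLA) -> dict[str, bool]:
    """Return pass/fail for each SLA dimension over a non-empty batch of requests."""
    if not requests:
        raise ValueError("need at least one request to evaluate an SLA")
    availability = sum(r.ok for r in requests) / len(requests)
    succeeded = [r for r in requests if r.ok]
    if not succeeded:
        # No successful requests: latency and throughput cannot be measured.
        return {"availability": False, "ttft": False, "tps": False}
    # Latency and throughput are measured on successful requests only.
    p95_ttft = percentile([ttft(r) for r in succeeded], 95)
    p50_tps = percentile([tps(r) for r in succeeded], 50)
    return {
        "availability": availability >= sla.min_availability,
        "ttft": p95_ttft <= sla.max_p95_ttft_s,
        "tps": p50_tps >= sla.min_p50_tps,
    }


if __name__ == "__main__":
    batch = [
        RequestTiming(0.0, 0.2, 2.2, 100, True),
        RequestTiming(0.0, 0.4, 4.4, 100, True),
        RequestTiming(0.0, 0.0, 0.0, 0, False),
    ]
    # prints {'availability': True, 'ttft': True, 'tps': True}
    print(check_sla(batch, SLA(min_availability=0.5, max_p95_ttft_s=0.5, min_p50_tps=20.0)))
    # about 43.2 minutes of allowed downtime for 99.9% over 30 days
    print(error_budget_minutes(0.999, 30))
```

The error-budget helper turns an availability target into allowed downtime: for 99.9% over a 30-day window, (1 - 0.999) × 43,200 minutes is roughly 43.2 minutes. Measuring latency and throughput only on successful requests keeps failed calls from skewing TTFT and TPS, while still counting them against availability.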

Nice to Have

  • Experience building internal developer platforms or self-serve tooling
  • Background in cost optimization for GPU infrastructure
  • Contributions to open-source ML infrastructure projects

Compensation

We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $160,000 - $280,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy  
