Sesame

About Sesame

Affordable healthcare access without insurance hassles

🏢 Retail · 👥 101-200 employees · 📅 Founded 2018 · 📍 Canal Street, New York, NY · 💰 $76.1m · ⭐ 4.2
Healthcare · B2C · B2B · Marketplace · eCommerce · MedTech

Key Highlights

  • Headquartered in New York, NY on Canal Street
  • $76.1 million raised in Series B funding
  • 101-200 employees, fostering a diverse workplace
  • Marketplace model for both B2C and B2B healthcare

Sesame is a healthcare marketplace platform headquartered on Canal Street in New York, NY, that enables patients to access high-quality medical care at affordable self-pay prices. With $76.1 million raised in Series B funding, Sesame has attracted significant investment, allowing users to search, co...

🎁 Benefits

Employees enjoy a flexible vacation policy, comprehensive health care coverage options, and the opportunity to work in a fun, international environmen...

🌟 Culture

Sesame fosters a unique culture focused on transparency and accessibility in healthcare, empowering patients to make informed decisions while simplify...


ML Model Serving Engineer

Sesame • San Francisco

Posted 10 months ago · 🏛️ On-Site · Mid-Level · Machine learning engineer · 📍 San Francisco

Job Description

About Sesame

Sesame believes in a future where computers are lifelike - with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice companions part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

Responsibilities:

  • Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models.

  • Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category.

  • Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving.

  • Work with the training team to identify opportunities to produce faster models without sacrificing quality.

  • Use techniques like in-flight batching, caching, and custom kernels to speed up inference.

  • Find ways to reduce model initialization times without sacrificing quality.

Required Qualifications:

  • Expert in a differentiable array computing framework, preferably PyTorch.

  • Expert in optimizing machine learning models for serving reliably at high throughput, with low latency.

  • Significant systems programming experience, e.g., work on high-performance server systems; you’d be just as comfortable with the internals of vLLM as with a complex PyTorch codebase.

  • Significant performance engineering experience, e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code.

  • Always up to date on the latest techniques for model serving optimization.

Preferred Qualifications:

  • Familiarity with high-performance LLM serving, e.g., experience with vLLM or SGLang deployment and internals.

  • Experience with a public cloud platform such as GCP, AWS, or Azure.

  • Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.

  • You like to ship and have a track record of leading complex multi-month projects without assistance.

  • You’re excited to learn new things and work in a multitude of roles.

Sesame is committed to a workplace where everyone feels valued, respected, and empowered. We welcome all qualified applicants, embracing diversity in race, gender, identity, orientation, ability, and more. We provide reasonable accommodations for applicants with disabilities—contact careers@sesame.com for assistance.

Full-time Employee Benefits: 

  • 401k matching

  • 100% employer-paid health, vision, and dental benefits 

  • Unlimited PTO and sick time 

  • Flexible spending account matching (medical FSA) 

Benefits do not apply to contingent/contract workers.
