Leethub

Curated tech jobs from FAANG and top companies worldwide.

© 2026 Leethub LLC. All rights reserved.
Together AI

About Together AI

Empowering corporate mentorship for effective learning

👥 21-100 employees • 📍 CityPlace, Toronto, ON • 💰 $1.7m
B2B • HR • Learning • SaaS • Community

Key Highlights

  • Founded in 2018, headquartered in Toronto, ON
  • Raised $1.7 million in seed funding
  • Partnerships with Heineken, Reddit, and 7-Eleven
  • 4 weeks paid vacation and competitive equity packages

Together is a corporate mentorship management platform founded in 2018, headquartered in CityPlace, Toronto, ON. The platform streamlines the mentorship lifecycle, facilitating connections among employees at companies like Heineken, Reddit, and 7-Eleven. With $1.7 million in seed funding, Together a...

🎁 Benefits

Together offers competitive salaries and equity packages, 4 weeks of paid vacation, and a comprehensive health, dental, and vision plan through Honeyb...

🌟 Culture

Together fosters a culture of autonomy and impact, allowing employees to take on significant responsibilities without bureaucratic constraints. The fo...

Site Reliability Engineer

Together AI • San Francisco

Posted 4d ago • Mid-Level • Site Reliability Engineer • 📍 San Francisco

Skills & Technologies

Ansible • Terraform • Kubernetes • Linux

Overview

Together AI is hiring a Site Reliability Engineer to ensure the reliability and performance of user-facing services and production systems. You'll work with Ansible, Terraform, and Kubernetes to build and manage infrastructure. This role requires 2+ years of experience in SRE or a related field.

Job Description

Who you are

You have 2+ years of professional experience as a Site Reliability Engineer or in a related field, demonstrating a strong understanding of operational discipline and engineering principles. Your educational background includes a Bachelor's degree in Computer Science or a related field, or equivalent work experience. You possess knowledge of Ansible, including roles and playbooks, as well as Terraform and Kubernetes, which are essential for building and managing infrastructure. Your proficiency in programming and scripting languages allows you to automate processes effectively. You have direct experience in monitoring and observability practices, ensuring that systems are reliable and performant. Your familiarity with cloud services enhances your ability to manage scalable infrastructures. You thrive in collaborative environments, working well with cross-functional teams to achieve common goals.

Desirable

Experience with additional monitoring tools and practices would be a plus, as would familiarity with incident management systems like PagerDuty. A strong interest in algorithms and distributed systems will help you identify improvements in product architecture from reliability, performance, and availability perspectives.

What you'll do

As a Site Reliability Engineer at Together AI, you will keep all user-facing services and production systems running smoothly. You will:

  • Participate in an on-call rotation and respond promptly to production incidents.
  • Build and run infrastructure with tools like Ansible, Terraform, and Kubernetes, scaling services to accommodate a massive number of concurrent users.
  • Build monitoring systems to ensure the highest quality service for customers.
  • Design and implement operational processes such as deployments and upgrades.
  • Debug production issues across all services and levels of the stack.
  • Identify improvements to the product architecture from a reliability, performance, and availability perspective.
  • Plan the growth of Together AI’s infrastructure.

What we offer

Together AI offers a collaborative work environment where you can thrive as a Site Reliability Engineer. You will have the opportunity to work with cutting-edge technologies and contribute to the reliability of critical systems. The company values your input and encourages you to apply even if your experience doesn't match every requirement. We provide competitive compensation and benefits, fostering a culture of growth and development within the team. Join us in making a significant impact on the reliability and performance of our services.
