About Cohere

AI solutions built for enterprise trust and security

🏢 Tech • 👥 501-1000 employees • 📅 Founded 2019 • 📍 Grange Park, Toronto, ON • 💰 $1.5b • ⭐ 4
B2B • Artificial Intelligence • Machine Learning • SaaS

Key Highlights

  • Headquartered in Grange Park, Toronto, ON
  • $1.5 billion in funding from top investors
  • Clients include Royal Bank of Canada, Fujitsu, and Oracle
  • Focus on AI solutions for regulated industries

Cohere, headquartered in Grange Park, Toronto, ON, specializes in enterprise-grade AI solutions tailored for regulated industries such as banking and telecom. With $1.5 billion in funding, Cohere has secured contracts with major clients including Royal Bank of Canada, Fujitsu, and Oracle, providing ...

🎁 Benefits

Cohere offers comprehensive benefits including 100% coverage for health, dental, and vision insurance premiums, a $2,000 annual education benefit, six...

🌟 Culture

Cohere's culture emphasizes security and trust in AI adoption, focusing on enterprise needs rather than consumer trends. The company prioritizes a sup...

Site Reliability Engineer, Inference Infrastructure

Cohere • Toronto

Posted 1d ago • 🏢 Hybrid • Mid-Level • Site reliability engineer • 📍 Toronto

Skills & Technologies

AWS • Docker • Kubernetes • Python • REST API

Overview

Cohere is hiring a Site Reliability Engineer to develop and operate AI platforms for advanced NLP applications. You'll work with technologies like AWS, Docker, and Kubernetes to ensure high-performance and reliable machine learning systems. This role requires experience in deploying scalable systems and a strong understanding of API management.

Job Description

Who you are

You have a strong background in site reliability engineering, with experience in building and maintaining high-performance, scalable systems. You understand the intricacies of deploying machine learning models and have a solid grasp of cloud infrastructure, particularly AWS. Your expertise in containerization technologies like Docker and orchestration tools such as Kubernetes allows you to manage complex deployments effectively. You are proficient in programming languages like Python, enabling you to automate processes and enhance system reliability. You thrive in collaborative environments, working closely with cross-functional teams to deliver optimized solutions that meet customer needs. You are passionate about AI and its potential to transform industries, and you are eager to contribute to innovative projects that push the boundaries of technology.

Desirable

Experience with monitoring and alerting tools, as well as familiarity with REST APIs, will be beneficial in this role. A background in natural language processing (NLP) or machine learning will set you apart as you work on cutting-edge AI applications.

What you'll do

As a Site Reliability Engineer at Cohere, you will be responsible for developing, deploying, and operating the AI platform that delivers large language models through user-friendly API endpoints. You will collaborate with various teams to ensure that NLP models are deployed in low-latency, high-throughput environments, maintaining high availability and performance standards. Your role will involve optimizing system performance, troubleshooting issues, and implementing best practices for reliability and scalability. You will also engage with customers to understand their needs and provide tailored solutions that enhance their experience with our AI products. Your contributions will directly impact the efficiency and effectiveness of our AI systems, helping to drive the widespread adoption of AI technologies.

What we offer

Cohere provides a supportive work environment that values mental health and well-being, offering benefits such as a separate budget for mental health care and a 100% parental leave top-up for up to six months. We encourage personal enrichment through benefits towards arts and culture, fitness, and workspace improvement. Our flexible remote work policy allows you to choose between working from our offices in Toronto, New York, San Francisco, London, or Paris, or from the comfort of your home. You will enjoy a generous vacation policy, with six weeks of vacation (30 working days) to recharge and pursue personal interests. Join us at Cohere and be part of a team that is shaping the future of AI.
