About Cohere

AI solutions built for enterprise trust and security

🏢 Tech • 👥 501-1000 employees • 📅 Founded 2019 • 📍 Grange Park, Toronto, ON • 💰 $1.5b • ⭐ 4
B2B • Artificial Intelligence • Machine Learning • SaaS

Key Highlights

  • Headquartered in Grange Park, Toronto, ON
  • $1.5 billion in funding from top investors
  • Clients include Royal Bank of Canada, Fujitsu, and Oracle
  • Focus on AI solutions for regulated industries

Cohere, headquartered in Grange Park, Toronto, ON, specializes in enterprise-grade AI solutions tailored for regulated industries such as banking and telecom. With $1.5 billion in funding, Cohere has secured contracts with major clients including Royal Bank of Canada, Fujitsu, and Oracle, providing ...

🎁 Benefits

Cohere offers comprehensive benefits including 100% coverage for health, dental, and vision insurance premiums, a $2,000 annual education benefit, six...

🌟 Culture

Cohere's culture emphasizes security and trust in AI adoption, focusing on enterprise needs rather than consumer trends. The company prioritizes a sup...

Staff Software Engineer, GPU Infrastructure (HPC)

Cohere • Canada

Posted 6h ago • 🏠 Remote • Senior • Staff engineer • 📍 Canada

Skills & Technologies

Python • Kubernetes • AWS • Docker • Machine learning • HPC

Overview

Cohere is hiring a Staff Software Engineer for its GPU Infrastructure team to build and operate superclusters for AI model training. You'll work with technologies like Python, Kubernetes, and AWS. This role requires expertise in high-performance computing (HPC) and cloud infrastructure.

Job Description

Who you are

You have extensive experience in software engineering, particularly in building and operating high-performance computing (HPC) systems. Your background includes working with cloud infrastructure and managing complex deployments across multiple environments. You are proficient in Python and have hands-on experience with container orchestration tools like Kubernetes, which you have used to streamline workflows and enhance system reliability.

Your expertise extends to cloud platforms such as AWS, where you have designed and implemented scalable solutions that meet the demanding needs of AI workloads. You understand the intricacies of GPU infrastructure and are familiar with the challenges of deploying AI models at scale. You are comfortable participating in on-call rotations, ensuring that systems remain stable and performant.

You thrive in collaborative environments, working closely with AI researchers to understand their needs and translating them into robust infrastructure solutions. Your problem-solving skills are top-notch, and you enjoy tackling complex challenges that require innovative thinking and technical acumen. You are committed to continuous learning and staying updated with the latest advancements in AI and infrastructure technologies.

Desirable

Experience with machine learning frameworks and libraries is a plus, as is familiarity with observability tools that help monitor and optimize system performance. You may also have experience with other cloud providers or hybrid cloud solutions, which can enhance your contributions to the team.

What you'll do

As a Staff Software Engineer at Cohere, you will play a critical role in building and maintaining the infrastructure that supports our cutting-edge AI models. You will collaborate with a team of engineers and researchers to design superclusters that are capable of handling large-scale AI workloads. Your responsibilities will include optimizing resource allocation, ensuring system stability, and implementing best practices for cloud infrastructure management.

You will be involved in the entire lifecycle of infrastructure development, from initial design through deployment and ongoing maintenance. This includes writing code to automate processes, developing monitoring solutions to track system performance, and troubleshooting issues as they arise. Your work will directly impact the efficiency and effectiveness of our AI model training processes, enabling us to deliver high-quality products to our customers.

In addition to technical responsibilities, you will also mentor junior engineers, sharing your knowledge and expertise to help them grow in their roles. You will participate in code reviews and contribute to the development of best practices within the team. Your insights will help shape the future of our infrastructure and ensure that we remain at the forefront of AI technology.

What we offer

Cohere provides a supportive and inclusive work environment where you can thrive. We offer a flexible remote work policy, allowing you to balance your professional and personal life effectively. Our benefits include a generous vacation policy, mental health support, and parental leave top-ups to ensure that you feel valued and supported.

You will have access to resources for personal enrichment, including opportunities to engage in arts and culture, fitness, and well-being activities. We believe in fostering a culture of continuous learning and growth, and we encourage you to apply even if your experience doesn't match every requirement. Join us in our mission to scale intelligence and make a meaningful impact in the world of AI.
