
AI solutions built for enterprise trust and security
Cohere, headquartered in Grange Park, Toronto, ON, specializes in enterprise-grade AI solutions tailored for regulated industries such as banking and telecom. With $1.5 billion in funding, Cohere has secured contracts with major clients including Royal Bank of Canada, Fujitsu, and Oracle, providing ...
Cohere offers comprehensive benefits including 100% coverage for health, dental, and vision insurance premiums, a $2,000 annual education benefit, six...
Cohere's culture emphasizes security and trust in AI adoption, focusing on enterprise needs rather than consumer trends. The company prioritizes a sup...

Cohere • Toronto
Cohere is hiring a Site Reliability Engineer to develop and operate AI platforms for advanced NLP applications. You'll work with technologies like AWS, Docker, and Kubernetes to keep machine learning systems performant and reliable. This role requires experience in deploying scalable systems and a strong understanding of API management.
You have a strong background in site reliability engineering, with experience in building and maintaining high-performance, scalable systems. You understand the intricacies of deploying machine learning models and have a solid grasp of cloud infrastructure, particularly AWS. Your expertise in containerization technologies like Docker and orchestration tools such as Kubernetes allows you to manage complex deployments effectively. You are proficient in programming languages like Python, enabling you to automate processes and enhance system reliability. You thrive in collaborative environments, working closely with cross-functional teams to deliver optimized solutions that meet customer needs. You are passionate about AI and its potential to transform industries, and you are eager to contribute to innovative projects that push the boundaries of technology.
Experience with monitoring and alerting tools, as well as familiarity with REST APIs, will be beneficial in this role. A background in natural language processing (NLP) or machine learning will set you apart as you work on cutting-edge AI applications.
As a Site Reliability Engineer at Cohere, you will be responsible for developing, deploying, and operating the AI platform that delivers large language models through user-friendly API endpoints. You will collaborate with various teams to ensure that NLP models are deployed in low-latency, high-throughput environments, maintaining high availability and performance standards. Your role will involve optimizing system performance, troubleshooting issues, and implementing best practices for reliability and scalability. You will also engage with customers to understand their needs and provide tailored solutions that enhance their experience with our AI products. Your contributions will directly impact the efficiency and effectiveness of our AI systems, helping to drive the widespread adoption of AI technologies.
Cohere provides a supportive work environment that values mental health and well-being, offering benefits such as a separate budget for mental health care and a 100% parental leave top-up for up to six months. We encourage personal enrichment through benefits for arts and culture, fitness, and workspace improvement. Our flexible remote work policy lets you work from one of our offices (Toronto, New York, San Francisco, London, or Paris) or from home. You will also have six weeks of vacation (30 working days) to recharge and pursue personal interests. Join us at Cohere and be part of a team that is shaping the future of AI.