
Empowering developers with advanced AI tools
Clarifai is a leading deep learning AI platform headquartered in Wilmington, DE, specializing in computer vision and artificial intelligence. With over $101.2 million raised in Series C funding, Clarifai empowers developers by providing API tools that automate image recognition and metadata tagging,...
Clarifai offers a work-from-home stipend, cell phone reimbursement, and comprehensive insurance including medical, dental, and vision. Employees enjoy...
Clarifai fosters a culture of innovation and collaboration, focusing on empowering developers with advanced AI tools. The company encourages continuou...

Clarifai • Remote (USA)
Clarifai is seeking a Senior Site Reliability Engineer to ensure the smooth operation and high availability of its AI platform. You'll work with Kubernetes, Python, and Golang to tackle infrastructure challenges. This role requires expertise in cloud infrastructure and microservice architecture.
You have 5+ years of experience in site reliability engineering, focusing on ensuring the availability and performance of distributed systems. Your background includes working with Kubernetes and cloud infrastructure, allowing you to effectively manage and orchestrate complex environments. You are proficient in programming languages such as Python and Golang, enabling you to develop tools and scripts that enhance system reliability. Your understanding of microservice architecture principles helps you design resilient systems that can scale efficiently. You are familiar with security best practices for cloud-based systems, ensuring that the infrastructure remains secure and compliant. Additionally, you have experience with relational databases and message queues, which are critical for maintaining data integrity and communication between services.
Experience developing custom Kubernetes operators is a plus, as operators automate the deployment and day-to-day management of complex workloads running on Kubernetes clusters. Familiarity with various RPC frameworks can enhance your ability to implement efficient communication between microservices. You are always eager to learn and adapt to new technologies, contributing to a culture of continuous improvement within the team.
In this role, you will be responsible for ensuring the smooth operation and high availability of Clarifai's core services. You will monitor system performance, identify bottlenecks, and implement solutions to enhance system reliability. Collaborating with engineering teams, you will address infrastructure challenges related to serving and training large neural networks in both cloud and on-premise environments. Your expertise will guide the development of best practices for incident management and response, ensuring that the team can quickly address any issues that arise. You will also play a key role in capacity planning, helping to forecast resource needs and optimize costs associated with cloud infrastructure.
As part of your responsibilities, you will develop and maintain CI/CD pipelines to streamline deployment processes and improve the overall efficiency of the development lifecycle. You will work closely with cross-functional teams to ensure that infrastructure changes align with product goals and user needs. Your contributions will directly impact the performance and reliability of Clarifai's AI platform, enabling organizations to leverage AI technology effectively.
Clarifai offers a collaborative and inclusive work environment where you can thrive as a Senior Site Reliability Engineer. You will have the opportunity to work on cutting-edge AI technology and contribute to projects that have a meaningful impact on various industries. We provide competitive compensation and benefits, along with opportunities for professional growth and development. Join us in our mission to empower organizations with AI-driven insights and solutions.