Platform Engineer - Reliability & Scale

LangChain • San Francisco, CA

Posted 1d ago • On-Site • Senior • Platform Engineer • San Francisco

Skills & Technologies

AWS, Docker, Kubernetes, Python, PostgreSQL

Overview

LangChain is hiring a Senior Platform Engineer to architect and operate critical systems for AI observability and app deployments. You'll work with technologies like AWS, Docker, and Kubernetes to ensure reliability and scalability. This role requires strong experience in distributed systems and data-intensive applications.

Job Description

Who you are

You have 5+ years of experience in platform engineering, focusing on building and maintaining reliable systems that support high throughput and data-intensive applications. Your background includes working with distributed systems, and you have a strong understanding of the challenges involved in scaling these systems effectively. You are proficient in cloud technologies, particularly AWS, and have hands-on experience with containerization tools like Docker and orchestration platforms such as Kubernetes.

Your expertise extends to database management, where you have optimized queries and designed schemas for performance. You are skilled in programming languages such as Python, which you use to automate processes and enhance system reliability. You understand the importance of monitoring and alerting systems and have implemented solutions that ensure high uptime and quick recovery from incidents.

You are a problem solver at heart, capable of debugging complex issues and identifying performance bottlenecks in production environments. You can influence technical decisions and shape platform strategy, and you work collaboratively with cross-functional teams to drive improvements.

Desirable

Experience with observability tools and practices is a plus, as is familiarity with AI systems and their deployment in production environments. You are comfortable working in a startup culture and are excited about the opportunity to contribute to a growing company that is making a significant impact in the AI space.

What you'll do

In this role, you will join the platform engineering team at LangChain, where you will be responsible for designing and implementing systems that support the LangSmith and LangGraph Platform products. You will work on scaling critical systems that power AI observability and app deployments, ensuring that they can handle increasing loads while maintaining reliability.

You will drive the reliability of these systems by building monitoring, alerting, and automated recovery solutions that keep uptime high. Your role will involve solving complex problems, such as debugging performance bottlenecks and optimizing database queries to enhance system efficiency.

As you shape the platform strategy, you will influence technical decisions regarding infrastructure, tooling, and operational practices. You will respond to incidents and work on continuous improvement initiatives to enhance the overall performance and reliability of the systems.

What we offer

LangChain offers a competitive salary range of $175,000 to $225,000 for Senior Engineers, along with meaningful equity and comprehensive benefits, including health and dental coverage, flexible vacation, a 401(k) plan, and life insurance. We believe in fostering a supportive and inclusive work environment where you can thrive and grow your career. Join us in our mission to make intelligent agents ubiquitous and be part of a team that is trusted by millions of developers worldwide.

