
Building safe and reliable AI systems for everyone
Anthropic, headquartered in SoMa, San Francisco, is an AI safety and research company focused on developing reliable, interpretable, and steerable AI systems. With over 1,000 employees and backed by Google, Anthropic has raised $29.3 billion in funding, including a Series F round of $13 billion.
Anthropic offers comprehensive health, dental, and vision insurance for employees and their dependents, along with inclusive fertility benefits.
Anthropic's culture is rooted in AI safety and reliability, with a focus on producing less harmful outputs than existing AI systems.

Anthropic • San Francisco, CA | New York City, NY
Anthropic is seeking an Applied Safety Research Engineer to develop methods for evaluating AI safety. You'll work with machine learning and Python to design experiments that improve model evaluations. This role requires a research-oriented mindset and experience in applied ML.
You have a strong background in applied machine learning and engineering, with experience designing experiments that improve evaluation quality. You understand the importance of creating representative test data and simulating realistic user behavior to ensure model safety, and your analytical skills let you identify gaps in evaluation coverage and inform the improvements needed. You are comfortable working at the intersection of research and engineering, and you thrive in collaborative environments where you can contribute to meaningful AI safety initiatives.
Experience with safety evaluations in AI systems is a plus, as well as familiarity with user behavior analysis and grading accuracy validation. You are passionate about ensuring AI systems are safe and beneficial for users and society.
In this role, you will design and run experiments aimed at improving the quality of AI safety evaluations. You will develop methods to generate representative test data and simulate realistic user behavior, which are crucial for validating grading accuracy. Your work will involve analyzing how various factors impact model safety behavior, including multi-turn conversations and user diversity. You will also be responsible for productionizing successful research into evaluation pipelines that run during model training and launch, directly influencing how Anthropic understands and enhances the safety of its models.
Anthropic provides a collaborative work environment with a focus on building beneficial AI systems. You will have access to competitive compensation and benefits, including optional equity donation matching, generous vacation and parental leave, and flexible working hours. Our office in San Francisco is designed to foster collaboration among colleagues, and we are committed to creating a supportive workplace culture that values your contributions.