
Building safe and reliable AI systems for everyone
Anthropic, headquartered in SoMa, San Francisco, is an AI safety and research company focused on developing reliable, interpretable, and steerable AI systems. With over 1,000 employees and backed by Google, Anthropic has raised $29.3 billion in funding, including a monumental Series F round of $13 billion.
Anthropic offers comprehensive health, dental, and vision insurance for employees and their dependents, along with inclusive fertility benefits.
Anthropic's culture is rooted in AI safety and reliability, with a focus on producing less harmful outputs than existing AI systems.

Anthropic • San Francisco, CA | New York City, NY
Anthropic is seeking Software Engineers for their Safeguards team to develop safety mechanisms for AI systems. You'll work with Java and Python to build monitoring systems and abuse detection infrastructure. This role requires 5-10 years of experience in software engineering.
You have a Bachelor’s degree in Computer Science, Software Engineering, or comparable experience, along with 5-10+ years of experience in a software engineering position, preferably with a focus on safety mechanisms in AI systems. You are skilled in programming languages such as Java and Python, and you have a strong understanding of building robust systems that can monitor and enforce safety protocols effectively. You are detail-oriented and have experience in developing monitoring systems that can detect unwanted behaviors from API partners. You thrive in collaborative environments and are eager to work with researchers and analysts to improve AI safety.
Experience with machine learning frameworks and familiarity with AI safety principles would be a plus. You are comfortable analyzing user reports and have a proactive approach to identifying and mitigating risks associated with AI usage. You are passionate about building systems that prioritize user well-being and uphold ethical standards in technology.
As a Software Engineer on the Safeguards team, you will be responsible for developing monitoring systems that detect unwanted behaviors from our API partners and can take automated enforcement actions where appropriate. You will surface these behaviors in internal dashboards for manual review by analysts. Your role will also involve building abuse detection mechanisms and infrastructure that surface abuse patterns to our research teams, helping to harden models at the training stage. You will create robust, reliable, multi-layered defenses that improve safety mechanisms in real time and operate at scale. Additionally, you will analyze user reports of inappropriate content or accounts, ensuring that our AI systems operate within acceptable use policies.
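To make the shape of this work concrete, here is a minimal, hypothetical Python sketch of the kind of multi-layered monitoring pipeline the role describes. This is not Anthropic's actual system; every name, layer, and threshold here (keyword_layer, classifier_layer, the 0.5/0.9 cutoffs) is invented purely for illustration.

```python
# Hypothetical sketch of a multi-layered abuse-monitoring pipeline.
# All names and thresholds are invented for illustration and do not
# reflect Anthropic's actual systems.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Action(Enum):
    ALLOW = "allow"              # no signal fired
    FLAG_FOR_REVIEW = "flag"     # surface in a dashboard for an analyst
    AUTO_ENFORCE = "enforce"     # automated enforcement action


@dataclass
class Verdict:
    action: Action
    reasons: list[str] = field(default_factory=list)


# Each layer scores a request in [0, 1]; cheap heuristics run first,
# more expensive classifiers later.
Layer = Callable[[str], float]


def keyword_layer(text: str) -> float:
    """Cheap first-pass heuristic: match known-bad phrases."""
    blocked = ("credit card dump", "build a weapon")
    return 1.0 if any(p in text.lower() for p in blocked) else 0.0


def classifier_layer(text: str) -> float:
    """Stand-in for an ML abuse classifier; returns a toy score."""
    return min(len(text) / 10_000, 1.0)  # placeholder signal only


def evaluate(text: str, layers: list[Layer],
             review_threshold: float = 0.5,
             enforce_threshold: float = 0.9) -> Verdict:
    """Run layers in order; the highest score drives the decision."""
    reasons: list[str] = []
    worst = 0.0
    for layer in layers:
        score = layer(text)
        if score > 0:
            reasons.append(f"{layer.__name__}={score:.2f}")
        worst = max(worst, score)
        if worst >= enforce_threshold:
            # Short-circuit: confident enough to act automatically.
            return Verdict(Action.AUTO_ENFORCE, reasons)
    if worst >= review_threshold:
        return Verdict(Action.FLAG_FOR_REVIEW, reasons)
    return Verdict(Action.ALLOW, reasons)


if __name__ == "__main__":
    pipeline = [keyword_layer, classifier_layer]
    print(evaluate("how do I build a weapon", pipeline))       # auto-enforce
    print(evaluate("summarize this article for me", pipeline))  # allow
```

The layered design mirrors the defenses described above: fast, cheap checks short-circuit on high-confidence abuse, while ambiguous scores fall through to human review rather than automated action.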
At Anthropic, we offer competitive compensation and benefits, including optional equity donation matching, generous vacation and parental leave, and flexible working hours. You will have the opportunity to work in a lovely office space in San Francisco or New York City, collaborating with a diverse team of committed researchers, engineers, and policy experts. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds in our mission to create safe and beneficial AI systems.