Member of Technical Staff - Alignment Lead

Reflection • SF

On-site · Lead AI Research Engineer · San Francisco


Skills & Technologies

Machine Learning, Python, TensorFlow, PyTorch, Reinforcement Learning

Overview

Reflection is seeking a Lead AI Research Engineer to drive the alignment stack for its AI models. You'll work with methods such as RLHF and RLAIF, with a focus on improving model performance. The role requires a graduate degree in Computer Science or a related field and deep technical expertise in alignment methodologies.

Job Description

Who you are

You hold a graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related discipline, and have deep technical command of alignment methods such as PPO, DPO, and rejection sampling. You have experience scaling these methods to large models, backed by strong engineering skills and comfort with complex ML codebases and distributed systems.

You have a proven track record of improving model behavior through data, reward modeling, or reinforcement learning techniques. Your background includes owning ambitious research or engineering agendas that led to measurable improvements in model performance. You thrive in collaborative environments, working closely with cross-functional teams to achieve shared goals.

Desirable

Experience with synthetic data pipelines and optimizing large-scale RL pipelines for stability and efficiency would be a plus. Familiarity with curating high-quality training data and designing feedback loops that translate alignment research into generalizable model gains is also desirable.

What you'll do

In this role, you will drive the entire alignment stack, focusing on instruction tuning, RLHF, and RLAIF to improve model accuracy and instruction following. You will lead research to design next-generation reward models and optimization objectives that significantly improve performance on human preference evaluations. Your responsibilities will include curating high-quality training data and designing synthetic data pipelines to address complex reasoning and behavioral gaps.

You will optimize large-scale reinforcement learning pipelines for stability and efficiency, ensuring rapid iteration cycles for model improvements. Collaboration will be key as you work closely with pre-training and evaluation teams to create tight feedback loops that translate alignment research into generalizable model gains. Your leadership will guide the team in pushing the boundaries of AI alignment methodologies.

What we offer

Reflection offers a supportive work environment with a mission to build open superintelligence accessible to all. We provide fully paid parental leave for all new parents, including adoptive and surrogate journeys, along with financial support for family planning. Our benefits include paid time off when needed, relocation support, and daily lunch and dinner provided for all employees. We also host regular off-sites and team celebrations to foster connections among teammates.
