
Unlocking knowledge with AI for smarter organizations
ReflectionAI, headquartered in Brooklyn, New York, provides an AI-driven knowledge management platform that leverages natural language processing to transform unstructured information from meetings, documents, and conversations into a searchable knowledge base, with a focus on enhancing productivity.
Employees at ReflectionAI enjoy competitive salaries, equity options, flexible remote work policies, and generous PTO to maintain a healthy work-life balance.
ReflectionAI fosters a culture of innovation and collaboration, encouraging employees to contribute ideas and solutions while prioritizing work-life balance.

Reflection • SF
Reflection is seeking a Lead AI Research Engineer to drive the alignment stack for its AI models. You'll work with methodologies like RLHF and RLAIF to improve model performance. The role requires a graduate degree in Computer Science or a related field and deep technical expertise in alignment methodologies.
You hold a graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related discipline, and possess a deep technical command of alignment methodologies such as PPO, DPO, and rejection sampling. Your experience includes scaling these methodologies to large models, showcasing your strong engineering skills and comfort with complex ML codebases and distributed systems.
You have a proven track record of improving model behavior through data, reward modeling, or reinforcement learning techniques. Your background includes owning ambitious research or engineering agendas that led to measurable improvements in model performance. You thrive in collaborative environments, working closely with cross-functional teams to achieve shared goals.
Experience with synthetic data pipelines and with optimizing large-scale RL pipelines for stability and efficiency would be a plus, as would familiarity with curating high-quality training data and designing feedback loops that translate alignment research into generalizable model gains.
In this role, you will drive the entire alignment stack, focusing on instruction tuning, RLHF, and RLAIF to enhance model accuracy and instruction following. You will lead research efforts to design next-generation reward models and optimization objectives that significantly improve human preference performance. Your responsibilities will include curating high-quality training data and designing synthetic data pipelines to address complex reasoning and behavioral gaps.
You will optimize large-scale reinforcement learning pipelines for stability and efficiency, ensuring rapid iteration cycles for model improvements. Collaboration will be key as you work closely with pre-training and evaluation teams to create tight feedback loops that translate alignment research into generalizable model gains. Your leadership will guide the team in pushing the boundaries of AI alignment methodologies.
Reflection offers a supportive work environment and a mission to build open superintelligence accessible to all. We provide fully paid parental leave for all new parents, including adoptive and surrogate journeys, along with financial support for family planning. Our benefits include paid time off when you need it, relocation support, and daily lunch and dinner for all employees. We also host regular off-sites and team celebrations to foster connections among teammates.