Member of Technical Staff - Data Quality Engineer (Pre-training)

Posted 3d ago · 🏛️ On-Site · Mid-Level · Data Quality Engineer · 📍 San Francisco

Overview

Reflection is seeking a Data Quality Engineer to ensure high standards for data used in training AI models. You'll collaborate with research teams to establish measurable quality signals. This role requires strong engineering fundamentals and a curiosity about data quality.

Job Description

Who you are

You have a strong engineering background and a deep curiosity about data quality and its impact on model performance. You understand the importance of data in AI innovation and want the data used for training models to meet high standards of quality and reliability. You thrive in collaborative environments, working closely with research and pre-training teams to translate requirements into measurable quality signals. You are detail-oriented and have experience designing and validating automated QA methods that assess data quality across large data campaigns.

Desirable

Experience with large language models (LLMs) and familiarity with various languages and modalities would be beneficial. You are comfortable working with external data vendors and providing actionable feedback to improve data quality. A background in AI or machine learning is a plus, as is experience in data management or quality assurance processes.

What you'll do

As a Data Quality Engineer at Reflection, you will own upstream data quality for LLM pre-training. You will partner closely with research and pre-training teams to ensure that training data meets the required quality standards. You will translate complex requirements into concrete, measurable quality signals that can be applied across large data campaigns. You will also design, validate, and scale automated QA methods that measure data quality reliably, so that the data used in training is both high-quality and impactful.

You will be responsible for developing processes that incorporate human-in-the-loop feedback to enhance data quality. This will involve collaborating with external data vendors to provide actionable insights and feedback on data quality issues. You will play a crucial role in shaping how models perform on critical capabilities by ensuring that the data used is reliable and meets the high standards set by the team.

What we offer

At Reflection, we are committed to building a supportive and inclusive work environment. We offer fully paid parental leave for all new parents, including those on adoptive and surrogate journeys. Our benefits include financial support for family planning, paid time off when you need it, and relocation support. We also provide daily lunch and dinner for our team members, along with regular off-sites and team celebrations to foster connections among teammates. Join us in our mission to build open superintelligence and make it accessible to all.

