Member of Technical Staff - Data Ingestion Engineer

Reflection • SF

Posted 12h ago • On-Site • Mid-Level • Data Engineer • San Francisco

Skills & Technologies

Apache Airflow, AWS, Python, SQL

Overview

Reflection is seeking a Member of Technical Staff - Data Ingestion Engineer to build and operate large-scale data ingestion systems. You'll work with technologies like Apache Airflow and AWS to improve data quality for AI models. This role requires experience in data engineering and distributed systems.

Job Description

Who you are

You have a strong background in data engineering, with experience building and operating large-scale data ingestion systems. Your expertise in Python and SQL allows you to manipulate and analyze data effectively, ensuring high-quality datasets for AI training. You are familiar with tools like Apache Airflow, which you have used to streamline data workflows and improve efficiency. You thrive in collaborative environments, working closely with researchers and engineers to understand data needs and optimize ingestion processes. You are comfortable running experiments to evaluate different data acquisition strategies and are adept at analyzing results to drive improvements. You have a keen eye for detail, identifying gaps and redundancies in ingested data to enhance overall data quality.

Desirable

Experience with cloud platforms such as AWS is a plus, as it enables you to leverage scalable infrastructure for data processing. Familiarity with distributed systems and web crawling techniques will further enhance your ability to build robust ingestion pipelines. You are open to learning new technologies and methodologies, continuously seeking ways to improve your skills and contribute to the team's success.

What you'll do

In this role, you will build and operate the ingestion systems that transform large-scale data sources into structured datasets for AI model training. You will work on web crawling, data extraction, and dataset delivery, ensuring that the data collected is reliable and well-structured. You will collaborate with the pre-training and data quality teams to close the loop between data collection and model performance, iterating quickly based on measurable impact. Your work will involve running experiments to evaluate different crawling strategies and extraction methods, and analyzing the ingested data to identify areas for improvement. You will also maintain and optimize existing ingestion systems so that they run efficiently and reliably.

What we offer

At Reflection, we provide a supportive and inclusive work environment where you can thrive. We offer competitive compensation and benefits, including fully paid parental leave and financial support for family planning. Our team enjoys a healthy work-life balance with generous paid time off and relocation support. You will have opportunities to connect with teammates through daily lunches and regular team celebrations, fostering a strong sense of community within the company. Join us in our mission to build open superintelligence and make it accessible to all.
