
Unlocking knowledge with AI for smarter organizations
ReflectionAI, headquartered in Brooklyn, New York, provides an AI-driven knowledge management platform that leverages natural language processing to transform unstructured information from meetings, documents, and conversations into a searchable knowledge base. With a focus on enhancing productivity...
Employees at ReflectionAI enjoy competitive salaries, equity options, flexible remote work policies, and generous PTO to maintain a healthy work-life ...
ReflectionAI fosters a culture of innovation and collaboration, encouraging employees to contribute ideas and solutions while prioritizing work-life b...

Reflection • SF
Reflection is seeking a Member of Technical Staff - Data Ingestion Engineer to build and operate large-scale data ingestion systems. You'll work with technologies like Apache, Airflow, and AWS to enhance data quality for AI models. This role requires experience in data engineering and distributed systems.
You have a strong background in data engineering, with experience building and operating large-scale data ingestion systems. Your expertise in Python and SQL allows you to manipulate and analyze data effectively, ensuring high-quality datasets for AI training. You are familiar with tools like Apache and Airflow, which you have used to streamline data workflows and improve efficiency. You thrive in collaborative environments, working closely with researchers and engineers to understand data needs and optimize ingestion processes. You are comfortable running experiments to evaluate different data acquisition strategies and are adept at analyzing results to drive improvements. You have a keen eye for detail, identifying gaps and redundancies in ingested data to enhance overall data quality.
Experience with cloud platforms such as AWS is a plus, as it enables you to leverage scalable infrastructure for data processing. Familiarity with distributed systems and web crawling techniques will further enhance your ability to build robust ingestion pipelines. You are open to learning new technologies and methodologies, continuously seeking ways to improve your skills and contribute to the team's success.
In this role, you will be responsible for building and operating the ingestion systems that transform large-scale data sources into structured datasets for AI model training. You will work on web crawling, data extraction, and dataset delivery, ensuring that the data collected is reliable and well-structured. You will collaborate with the pre-training and data quality teams to close the loop between data collection and model performance, iterating quickly based on measurable impact. Your work will involve running experiments to evaluate different crawling strategies and extraction methods, analyzing the ingested data to identify areas for improvement. You will also be tasked with maintaining and optimizing existing ingestion systems, ensuring they operate efficiently and effectively.
At Reflection, we provide a supportive and inclusive work environment where you can thrive. We offer competitive compensation and benefits, including fully paid parental leave and financial support for family planning. Our team enjoys a healthy work-life balance with generous paid time off and relocation support. You will have opportunities to connect with teammates through daily lunches and regular team celebrations, fostering a strong sense of community within the company. Join us in our mission to build open superintelligence and make it accessible to all.