
Simplifying healthcare access for millions
Doctolib is the leading European platform for online medical appointment scheduling, serving over 17,000 healthcare professionals and connecting them with 6 million patients every month. Headquartered in Levallois-Perret, Île-de-France, Doctolib is present in 435 healthcare facilities across France and Germany.
Employees enjoy competitive salaries, stock options, generous PTO, and a flexible remote work policy that promotes a healthy work-life balance.
Doctolib fosters a culture centered on improving healthcare access, emphasizing technology-driven solutions and a commitment to user experience.

Doctolib • Paris, France
Doctolib is seeking a Senior Data Engineer focused on AI to build and optimize data foundations for AI models. You'll work with GCP and various data technologies to ensure high-quality data for healthcare applications.
You have 5+ years of experience as a Data Engineer, with a strong focus on building scalable data pipelines and ensuring data quality for AI applications. Your expertise in Google Cloud Platform (GCP) allows you to design and maintain data infrastructures that support machine learning and AI initiatives. You are familiar with both structured and unstructured data, and you understand how to integrate various data sources into unified models ready for AI consumption.
Your background includes working with NoSQL and Vector Databases, enabling you to efficiently store and retrieve embeddings and documents. You have a solid understanding of data governance and privacy, ensuring that the data you work with is compliant and reliable. You thrive in collaborative environments, working closely with machine learning and platform teams to define data schemas and partitioning strategies that enhance performance and scalability.
Experience with large language models (LLMs) and multimodal models is a plus, as is familiarity with data quality and lineage frameworks. You are comfortable optimizing data pipelines for performance and cost, leveraging GCP native services to achieve the best results.
In your role at Doctolib, you will be responsible for building and optimizing the data foundations within the AI Team. This includes designing, building, and maintaining scalable data pipelines on GCP tailored for AI and machine learning use cases. You will implement data ingestion and transformation frameworks that power retrieval systems and training datasets for LLMs and multimodal models.
You will ensure high standards of data quality for AI model inputs, collaborating with engineers and data scientists to facilitate efficient training, evaluation, and deployment of AI models. Your work will involve architecting and managing NoSQL and Vector Databases to store and retrieve data effectively, ensuring that the data is well-structured and compliant.
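To give a flavor of the embedding storage and retrieval pattern this work involves, here is a deliberately simplified sketch: a toy in-memory store with cosine-similarity lookup standing in for a managed vector database, with all names and dimensions hypothetical.

```python
# Illustrative sketch only: a toy in-memory embedding store with
# cosine-similarity retrieval, standing in for a managed vector database.
# Document ids and embedding sizes are hypothetical.
import numpy as np


class SimpleVectorStore:
    def __init__(self) -> None:
        self._ids: list[str] = []
        self._vectors: list[np.ndarray] = []

    def add(self, doc_id: str, embedding: np.ndarray) -> None:
        """Store a document embedding, normalised so dot products give cosine similarity."""
        self._ids.append(doc_id)
        self._vectors.append(embedding / np.linalg.norm(embedding))

    def query(self, embedding: np.ndarray, top_k: int = 3) -> list[tuple[str, float]]:
        """Return the top_k most similar document ids with their similarity scores."""
        query_vec = embedding / np.linalg.norm(embedding)
        scores = np.stack(self._vectors) @ query_vec
        best = np.argsort(scores)[::-1][:top_k]
        return [(self._ids[i], float(scores[i])) for i in best]


if __name__ == "__main__":
    store = SimpleVectorStore()
    rng = np.random.default_rng(0)
    for doc_id in ("note-1", "note-2", "note-3"):
        store.add(doc_id, rng.normal(size=8))
    print(store.query(rng.normal(size=8), top_k=2))
```

In production the same pattern is delegated to a dedicated vector database with approximate nearest-neighbour indexing, but the store-and-retrieve contract stays the same.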
You will also integrate various data sources, including text, speech, images, and documents, into unified data models that are ready for AI consumption. Your role will require you to optimize the performance and cost of data pipelines using GCP services such as BigQuery, Dataflow, Pub/Sub, Cloud Storage, and Vertex AI. You will contribute to data quality and lineage frameworks, ensuring that AI models are trained on validated and reliable data.
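As a concrete, simplified illustration of the kind of ingestion pipeline described above, the following Apache Beam sketch streams events from Pub/Sub into BigQuery; the project, subscription, and table names are hypothetical placeholders, not Doctolib resources.

```python
# Illustrative sketch only: a minimal Apache Beam (Dataflow) streaming
# pipeline that reads events from Pub/Sub and writes rows to BigQuery.
# All resource names (project, subscription, table) are hypothetical.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    """Decode a Pub/Sub message into a row ready for BigQuery."""
    event = json.loads(message.decode("utf-8"))
    return {"event_id": event["id"], "payload": json.dumps(event)}


def run() -> None:
    options = PipelineOptions(streaming=True)  # Dataflow runner options would go here
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/example-sub"
            )
            | "ParseEvents" >> beam.Map(parse_event)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="event_id:STRING, payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```

The real pipelines would add validation, schema evolution, and lineage tracking on top of this skeleton.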
At Doctolib, you will join a dedicated team on a mission to transform healthcare through AI. We offer a collaborative work environment where your contributions will have a direct impact on the healthcare industry. You will have the opportunity to work with cutting-edge technologies and be part of a team that values innovation and excellence. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of diverse backgrounds and perspectives.