

Together AI • San Francisco
Together AI is hiring a Senior Research Engineer focused on LLM Evaluation and Behavioral Analysis. You'll develop evaluation frameworks and pipelines to ensure model reliability and performance. The role requires deep machine learning expertise and strong programming skills.
You have a strong background in machine learning and AI, with at least 5 years of experience in research or engineering roles focused on model evaluation and behavioral analysis. Your expertise in Python and familiarity with frameworks like TensorFlow and PyTorch enable you to build robust evaluation systems. You understand the intricacies of model behavior, including reasoning, tool use, and multi-step interactions, and are adept at identifying subtle failure modes in AI systems.
You possess a solid understanding of evaluation metrics and methodologies, allowing you to design high-quality behavioral test suites that accurately measure model performance. Your experience with SQL and data manipulation equips you to shape datasets effectively and influence model improvements based on empirical evidence. You thrive in collaborative environments, working closely with cross-functional teams to ensure that models behave intelligently and consistently in production.
Experience with CI/CD pipelines and automated testing frameworks is a plus, as is familiarity with A/B testing methodologies. You are comfortable working in fast-paced settings and can adapt to evolving project requirements while maintaining a focus on quality and reliability.
In this role, you will build and iterate on evaluation frameworks that measure model performance across dimensions such as instruction following, function calling, and long-context reasoning. You will develop specialized evaluation suites that assess argument correctness, schema adherence, and tool selection, ensuring that models handle complex tasks effectively. Your work will involve creating automated CI/CD pipelines for A/B comparisons, regression detection, and behavioral drift monitoring, all of which are crucial for maintaining high standards of model quality.
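To give a flavor of the kind of behavioral checks this work involves, here is a minimal sketch in Python of a tool-use evaluation covering tool selection, schema adherence, and argument correctness. Everything in it is hypothetical: the get_weather tool, the TOOL_SCHEMAS table, and the check_tool_call helper are illustrative stand-ins, not Together AI's actual framework.

```python
"""Minimal sketch of a behavioral eval for tool use, assuming a model emits
tool calls as JSON of the form {"name": ..., "arguments": {...}}. All names
here (get_weather, TOOL_SCHEMAS, check_tool_call) are hypothetical."""
import json

# Hypothetical tool registry: expected argument names and Python types.
TOOL_SCHEMAS = {
    "get_weather": {"city": str, "unit": str},
}

def check_tool_call(raw: str, expected_tool: str) -> list[str]:
    """Return a list of failure descriptions; an empty list means pass."""
    failures = []
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    # Tool selection: did the model pick the right tool?
    if call.get("name") != expected_tool:
        failures.append(
            f"tool selection: expected {expected_tool!r}, got {call.get('name')!r}"
        )
    # Schema adherence and argument correctness: required keys, right types.
    args = call.get("arguments", {})
    for key, typ in TOOL_SCHEMAS.get(expected_tool, {}).items():
        if key not in args:
            failures.append(f"schema adherence: missing argument {key!r}")
        elif not isinstance(args[key], typ):
            failures.append(f"argument correctness: {key!r} should be {typ.__name__}")
    return failures

# Toy test cases: one well-formed call, one with a wrong type and a missing key.
CASES = [
    ('{"name": "get_weather", "arguments": {"city": "Toronto", "unit": "celsius"}}',
     "get_weather"),
    ('{"name": "get_weather", "arguments": {"city": 7}}', "get_weather"),
]
for raw, tool in CASES:
    print(check_tool_call(raw, tool) or "pass")
```

In a CI/CD setting, pass rates from suites like this could be compared across two model checkpoints to flag regressions or behavioral drift before release; this is a simplified illustration of the idea, not a description of Together AI's pipeline.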
You will collaborate with training, post-training, inference, and product teams to identify regressions and shape datasets that drive model improvements. Your insights will directly influence how Together AI measures model quality and reliability across releases, making your contributions vital to the success of the organization.
Together AI offers a dynamic work environment where innovation and collaboration are at the forefront. You will have the opportunity to work on cutting-edge open-source LLMs and inference stacks, contributing to projects with significant impact on the AI landscape. We provide competitive compensation and benefits, along with opportunities for professional growth and development. Join us in shaping the future of AI and making a difference in the world of technology.
Apply now at Together AI.