Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI • San Francisco

Posted 2w ago · Senior · AI Research Engineer · 📍 San Francisco

Skills & Technologies

Python, Machine Learning, TensorFlow, PyTorch, SQL

Overview

Together AI is hiring a Senior Research Engineer focused on LLM evaluation and behavioral analysis. You'll develop evaluation frameworks and pipelines that help ensure model reliability and performance. The role requires machine learning expertise and strong programming skills.

Job Description

Who you are

You have a strong background in machine learning and AI, with at least 5 years of experience in research or engineering roles focused on model evaluation and behavioral analysis. Your expertise in Python and familiarity with frameworks like TensorFlow and PyTorch enable you to build robust evaluation systems. You understand the intricacies of model behavior, including reasoning, tool use, and multi-step interactions, and are adept at identifying subtle failure modes in AI systems.

You possess a solid understanding of evaluation metrics and methodologies, allowing you to design high-quality behavioral test suites that accurately measure model performance. Your experience with SQL and data manipulation equips you to shape datasets effectively and influence model improvements based on empirical evidence. You thrive in collaborative environments, working closely with cross-functional teams to ensure that models behave intelligently and consistently in production.

Desirable

Experience with CI/CD pipelines and automated testing frameworks is a plus, as is familiarity with A/B testing methodologies. You are comfortable working in fast-paced settings and can adapt to evolving project requirements while maintaining a focus on quality and reliability.

What you'll do

In this role, you will build and iterate on evaluation frameworks that measure model performance across several dimensions, including instruction following, function calling, and long-context reasoning. You will develop specialized evaluation suites that assess tool selection, argument correctness, and schema adherence, ensuring that models handle complex tasks effectively. You will also build automated CI/CD pipelines for A/B comparisons, regression detection, and behavioral drift monitoring, all of which are crucial for maintaining high standards of model quality.
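The posting does not describe Together AI's internal tooling. As a rough illustration of what per-case checks for tool selection, schema adherence, and argument correctness might look like, here is a minimal Python sketch. It assumes the model emits a JSON object with hypothetical `name` and `arguments` fields, and it uses a toy, hypothetical schema registry (`TOOL_SCHEMAS`) in place of full JSON Schema validation.

```python
import json
from dataclasses import dataclass

# Hypothetical, simplified tool registry: required/optional argument names
# mapped to the Python type each value must have. A real suite would more
# likely validate against full JSON Schema.
TOOL_SCHEMAS = {
    "get_weather": {"required": {"city": str}, "optional": {"unit": str}},
    "search_flights": {
        "required": {"origin": str, "destination": str, "date": str},
        "optional": {"max_stops": int},
    },
}


@dataclass
class CallResult:
    tool_selected: bool  # the model picked the expected tool
    schema_valid: bool   # the emitted arguments satisfy that tool's schema
    args_correct: bool   # the arguments match the reference exactly


def check_schema(tool: str, args: dict) -> bool:
    """True if args has every required key, no unknown keys, and correct types."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return False  # the model called a tool that does not exist
    allowed = {**schema["required"], **schema["optional"]}
    if any(name not in args for name in schema["required"]):
        return False
    return all(
        name in allowed and isinstance(value, allowed[name])
        for name, value in args.items()
    )


def score_call(model_output: str, expected_tool: str, expected_args: dict) -> CallResult:
    """Score one raw model output against a reference tool call."""
    try:
        call = json.loads(model_output)
        tool, args = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return CallResult(False, False, False)  # unparseable output fails every check
    if not isinstance(tool, str) or not isinstance(args, dict):
        return CallResult(False, False, False)
    tool_selected = tool == expected_tool
    schema_valid = check_schema(tool, args)
    # Exact equality; a production suite might normalise strings, dates, or units.
    args_correct = tool_selected and args == expected_args
    return CallResult(tool_selected, schema_valid, args_correct)


def summarize(results: list[CallResult]) -> dict[str, float]:
    """Aggregate per-case results into a rate for each dimension."""
    n = len(results)
    return {
        "tool_selection": sum(r.tool_selected for r in results) / n,
        "schema_adherence": sum(r.schema_valid for r in results) / n,
        "argument_correctness": sum(r.args_correct for r in results) / n,
    }


if __name__ == "__main__":
    cases = [
        ('{"name": "get_weather", "arguments": {"city": "Paris"}}', "get_weather", {"city": "Paris"}),
        ('{"name": "get_weather", "arguments": {"city": 42}}', "get_weather", {"city": "Paris"}),
        ('{"name": "search_flights", "arguments": {"origin": "SFO"}}', "get_weather", {"city": "Paris"}),
        ("not json", "get_weather", {"city": "Paris"}),
    ]
    print(summarize([score_call(out, tool, args) for out, tool, args in cases]))
    # {'tool_selection': 0.5, 'schema_adherence': 0.25, 'argument_correctness': 0.25}
```

Keeping the three dimensions separate means a drop in, say, schema adherence stays visible even when tool selection is unchanged.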

You will collaborate with training, post-training, inference, and product teams to identify regressions and shape datasets that drive model improvements. Your insights will directly influence how Together AI measures model quality and reliability across releases, making your contributions vital to the success of the organization.
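The posting does not say how regressions between releases are detected. One common statistical approach for comparing two model versions on the same fixed test suite is McNemar's exact test on per-case pass/fail outcomes; the sketch below uses it as an illustrative choice, not as Together AI's method. The function names and the dict-of-booleans input format (test-case ID mapped to pass/fail) are hypothetical.

```python
import math


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases the baseline passed and the candidate failed (regressions)
    c: cases the baseline failed and the candidate passed (fixes)
    Under the null hypothesis each discordant case is equally likely to go
    either way, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases: no evidence of a difference
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool], alpha: float = 0.05) -> dict:
    """Compare per-case pass/fail results of two model versions on the same suite."""
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("the two runs share no test-case IDs")
    regressed = [t for t in shared if baseline[t] and not candidate[t]]
    fixed = [t for t in shared if not baseline[t] and candidate[t]]
    p_value = mcnemar_exact_p(len(regressed), len(fixed))
    return {
        "baseline_pass_rate": sum(baseline[t] for t in shared) / len(shared),
        "candidate_pass_rate": sum(candidate[t] for t in shared) / len(shared),
        "regressed": regressed,
        "fixed": fixed,
        "p_value": p_value,
        # Flag only when net movement is downward and unlikely to be noise.
        "is_regression": len(regressed) > len(fixed) and p_value < alpha,
    }


if __name__ == "__main__":
    baseline = {f"case-{i}": True for i in range(10)}
    candidate = {**baseline, **{f"case-{i}": False for i in range(6)}}
    report = compare_runs(baseline, candidate)
    print(report["regressed"], report["p_value"], report["is_regression"])
```

McNemar's test looks only at the discordant cases, which suits a paired setup where both versions run on identical inputs. On small suites it is conservative: five regressions and no fixes give an exact p-value of 0.0625, which would not clear a 0.05 threshold.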

What we offer

Together AI offers a dynamic work environment where innovation and collaboration are at the forefront. You will have the opportunity to work on cutting-edge open-source-aligned LLMs and inference stacks, contributing to projects that have a significant impact on the AI landscape. We provide competitive compensation and benefits, along with opportunities for professional growth and development. Join us in shaping the future of AI and making a difference in the world of technology.
