
Transforming global content for diverse audiences
Welocalize, founded in 1997 and headquartered in Midtown South, New York, NY, is the 4th largest language service provider in North America and the 9th globally. With over 250 language options, Welocalize transforms and localizes content for 2,000+ clients, including major brands like Disney, Uber, ...
Welocalize offers competitive salaries, equity options, generous PTO policies, and a remote-friendly work environment to support work-life balance. Em...
Welocalize fosters a culture of inclusivity and global awareness, emphasizing the importance of language diversity. The company is committed to levera...

Welocalize • Cairo, Egypt
Welocalize is seeking an Arabic (Levantine) AI Evaluation Specialist to support the testing and evaluation of an Arabic language model. You'll design prompts and evaluate AI responses to enhance language models. This role requires native-level fluency in Levantine Arabic and experience in AI evaluation.
You are detail-oriented, have native-level fluency in Levantine Arabic, and hold a Bachelor's degree or equivalent experience in Linguistics, Computational Linguistics, Communications, Technical Writing, or a related analytical field. You have a strong understanding of AI evaluation processes and are familiar with prompt engineering, linguistic QA, or translation. Your familiarity with regional cultural norms and high-context communication styles, particularly in the GCC region, helps you evaluate AI systems effectively. You are committed to continuous learning and are eager to attend webinars and training sessions to refine your skills.
In this role, you will be instrumental in refining and evaluating large language models (LLMs) by designing scenario-based and edge-case prompts to test AI behavior. You will develop evaluation rubrics to assess AI responses across various criteria, including instruction-following, factuality, tone, safety, refusals, and helpfulness. Your responsibilities will include performing side-by-side evaluations of AI outputs, scoring them on a defined scale, and creating high-quality source documents that serve as the single source of truth for testing. You will also write accurate and well-structured Golden Responses that adhere to instructions and handle ambiguity effectively. Your expertise will contribute significantly to building smarter, more reliable, and more helpful AI technology.
Welocalize offers a competitive pay rate of USD 10 per hour for this remote position based in Egypt. The schedule is 40 hours a week, Monday through Friday, for a project duration of 3 months. This role provides a unique chance to engage with cutting-edge AI systems and contribute to the development of advanced language models. You will be part of a collaborative team that values your input and encourages professional growth through continuous learning and skill enhancement.