
Transforming global content for diverse audiences
Welocalize, founded in 1997 and headquartered in Midtown South, New York, NY, is the 4th largest language service provider in North America and the 9th globally. With over 250 language options, Welocalize transforms and localizes content for 2,000+ clients, including major brands like Disney, Uber, ...
Welocalize offers competitive salaries, equity options, generous PTO policies, and a remote-friendly work environment to support work-life balance. Em...
Welocalize fosters a culture of inclusivity and global awareness, emphasizing the importance of language diversity. The company is committed to levera...

Welocalize • Cairo, Egypt
Welocalize is seeking an Arabic (Gulf) AI Evaluation Specialist to support the testing and evaluation of an Arabic language model. You'll design prompts and evaluate AI responses to enhance language model performance. This role requires native-level fluency in Gulf Arabic and experience in AI evaluation.
You are a detail-oriented individual with a Bachelor's degree or equivalent experience in Linguistics, Computational Linguistics, Communications, Technical Writing, or a related analytical field. Your native-level fluency in Gulf Arabic lets you pick up the nuances of the language and culture, which is essential for evaluating AI systems effectively. You have experience in AI evaluation, prompt engineering, or linguistic QA, and you are familiar with the regional norms and high-context communication styles prevalent in the GCC region. You are eager to learn and adapt: the role requires attending webinars and continuous learning sessions to stay current on best practices in AI evaluation.
In this role, you will be instrumental in refining and evaluating large language models (LLMs) by designing scenario-based and edge-case prompts to test AI behavior. You will develop evaluation rubrics to assess AI responses across various criteria, including instruction-following, factuality, tone, safety, refusals, and helpfulness. Your responsibilities will include performing side-by-side evaluations of AI outputs and scoring them on a defined scale. You will also create high-quality source documents that serve as the single source of truth for testing and write accurate Golden Responses that handle ambiguity effectively. Your expertise will contribute to building smarter, more reliable, and helpful AI technology.
This position pays USD 10 per hour, with a commitment of 40 hours a week, Monday to Friday. The project runs for three months, starting on February 2nd. The role is remote and based in Egypt, which gives you flexibility in your work environment. As part of the team, you will take part in continuous learning and development, building your skills in AI evaluation while contributing to cutting-edge technology in the field.