Job Description

Role Overview

This position is for a Model Validator working on agentic AI systems. The role focuses on designing evaluation approaches, stress-testing agent behavior, and verifying that model outputs and tool usage align with business requirements.

Key Responsibilities

Build evaluation sets that benchmark agent performance, including by tracing the agent's reasoning paths.
Carry out adversarial tests by presenting conflicting or challenging instructions to expose weak spots in the agent.
Run regression checks to measure how much agent behavior changes across test cycles.
Validate tool calls to ensure the agent is invoking the right external APIs and databases.
Review the agent’s chains of thought and pinpoint where its logic starts to drift from the business requirements document.
Apply judge LLMs to score and assess model outputs.
Perform semantic debugging: analyze the agent’s thought trace to identify where its decisions go wrong.

Skills and Technical Expectations

Strong Python programming ability.
Hands-on familiarity with evaluation frameworks such as DeepEval and LangSmith.
Solid understanding of data- and SQL-driven testing methods.
Ability to work with model traces, reasoning paths, and output grading workflows.

Screening Criteria

SDET profiles are preferred because their coding and testing background suits evaluation development.
Prior domain exposure is an added advantage for this role.

Location and Work Mode

This is a full-time onsite opportunity based in Andhra Pradesh, India.

Model Validator - Agentic AI
