NextGen Healthcare India
Website: nextgen.com
Job details:
We are seeking a highly skilled and performance-driven AI SDET (Software Development Engineer in Test) to join our engineering team. In this role, you will be responsible for defining and executing the technical quality assurance strategy for our Agentic AI solutions. You will focus on testing complex orchestrators and sub-agent architectures built in Python using open-source frameworks (e.g., Strands, LangChain) and deployed on a managed execution infrastructure. Your primary mission is to ensure the reliability, accuracy, and efficiency of multi-agent systems while specifically optimizing for operational cost (token management) and system performance (latency and throughput).
- Design and implement comprehensive test suites for multi-agent architectures, focusing on the seamless interaction between the central orchestrator and specialized sub-agents.
- Utilize and test implementations within agentic execution frameworks (e.g., ADK, Bedrock AgentCore) and their agents, ensuring robust tool calling, state management, and memory retention.
- Develop automated monitors and test cases to track token consumption; identify and mitigate "looping" behaviors or redundant calls that drive up API costs.
- Conduct rigorous performance testing to measure and optimize end-to-end latency, specifically focusing on agent reasoning time versus API response time.
- Build and maintain automated evaluation pipelines using metrics such as faithfulness, relevancy, and correctness (e.g., FAVES) to validate LLM outputs.
- Test the decision-making capabilities of the orchestrator, ensuring it correctly routes tasks to sub-agents and handles edge cases or "agentic failures" gracefully.
- Build from-scratch Python-based automation frameworks tailored for non-deterministic AI outputs, moving beyond standard assertion-based testing.
- Integrate AI-specific testing gates into DevOps pipelines to ensure every deployment meets performance and cost benchmarks.
- Partner with AI Research Scientists and Data Engineers to understand model behavior and provide feedback on prompt engineering and agent efficiency.
Education Required
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Artificial Intelligence, or a related technical field
Experience Required
- Total Experience: 5–9 years in Software Development Engineer in Test (SDET) or Quality Engineering roles.
- AI/LLM Experience: At least 1 year of hands-on experience in testing LLM-based applications, RAG pipelines, or agentic workflows.
- Framework Experience: Proven experience with AWS Bedrock AgentCore and/or Strands. Equivalent experience with LangChain, LangGraph, LlamaIndex, or Google ADK (Agent Development Kit) is also acceptable.
- Agentic Systems: Direct experience in building or testing systems involving multi-agent coordination, tool-use (function calling), and autonomous planning.
- Cloud Experience: Strong familiarity with AWS services (Lambda, CloudWatch, Bedrock) or equivalent Google Cloud/Azure AI services.
Knowledge, Skills, and Abilities
- High proficiency in Python, including experience with asynchronous programming.
- Deep understanding of agentic patterns (ReAct, Plan-and-Execute) and the nuances of testing non-deterministic systems.
- Ability to analyze logs and traces to identify bottlenecks in agent reasoning and suggest cost-saving measures in prompt design or model selection.
- Proficiency with Pytest and experience with observability/tracing tools like LangSmith, AWS CloudWatch, or AWS X-Ray.
- Knowledge of NLP and LLM evaluation techniques, including the use of "LLM-as-a-judge" for grading complex sub-agent outputs.
- Exceptional analytical skills to debug "hallucinations" or logical errors in the orchestrator’s planning phase.
- Strong verbal and written communication skills, with the ability to articulate technical risks related to AI performance and cost to stakeholders.
Click Apply to learn more.