We are seeking an AI QA Specialist (Conversational & LLM Systems) to ensure the reliability, consistency, and measurable quality of our AI-driven conversational experiences.

In this role, you will define and maintain evaluation strategies for AI and LLM systems, creating and managing versioned datasets that cover core scenarios, edge cases, negative paths, and safety conditions. You will validate conversational behavior end to end—from intent recognition and slot extraction to state transitions, business rules, and tool or function-calling correctness.

You will play a key role in detecting regressions and evaluation drift as models or prompts evolve, defining meaningful metrics and thresholds (accuracy, precision, recall, F1), and providing clear quality signals to support release decisions. Working closely with QA and ML teams, you will integrate evaluation practices into CI/CD and help bring structure and determinism to inherently non-deterministic systems.

If you enjoy bringing rigor, data discipline, and analytical thinking to AI and conversational platforms—and ensuring quality at the intersection of language, logic, and system behavior—this may be your next mission.

May your next career adventure begin here.

Responsibilities and duties:

  • Define and maintain evaluation strategies for AI / LLM systems.
  • Create and manage versioned datasets (core, edge, negative, safety cases).
  • Validate conversational behavior:
    • Intent matching and slot extraction.
    • State transitions and business rules.
    • Tool / function calling correctness.
  • Detect regressions and evaluation drift across model or prompt changes.
  • Define metrics and thresholds (accuracy, precision, recall, F1).
  • Support release decisions with quality signals and reports.
  • Collaborate with QA and ML teams to integrate evaluation into CI/CD.

Qualifications:

  • 4+ years of experience in QA, data, ML, or AI-adjacent roles.
  • Hands-on experience testing AI / LLM or NLU-based systems.
  • Strong understanding of non-determinism and evaluation challenges.
  • Experience with structured outputs (JSON schemas, tool/function calling).
  • Strong analytical mindset and test data design skills.
  • Ability to define deterministic validation where possible.
  • Experience testing voice or conversational systems.
  • Background in data quality, analytics, or ML pipelines.
  • Experience with observability or monitoring.
  • Automotive, embedded, or AAOS (Android Automotive OS) experience.
  • Scripting skills (Python preferred).

What you’ll enjoy at Appning:

  • Work From Home Allowance to support your home office setup and comfort.
  • Comprehensive Health Insurance for every employee.
  • Generous Time Off – Extra days off based on your years with us.
  • A Day Off on Your Birthday to celebrate your way.
  • Employee Workplace Program (EWP) designed to support your wellbeing and growth.
  • Flexible Working Hours to help you balance work and life.
  • Free Breakfast and Snacks at the office.
  • Exclusive Platform Discounts through our Inspiring Benefits Program.
  • Referral Program – Bring great people and get rewarded.
  • Amazing Team Gatherings that you’ll actually look forward to.
