Senior ML Ops / LLM Ops Engineer

This role focuses on building and operating the ML Ops / LLM Ops pipeline that closes the loop between production and evaluation: ingest production signal, redact it, store it, slice it, classify it, surface the failures, mine new eval cases, and alert on regressions. You drive the toolchain decisions, the data-governance posture, and the day-to-day reliability of the pipeline itself. The Head of AI sets vision and priorities, and you own the technical execution end-to-end.

May your next career adventure begin here.

Overall responsibilities and duties

  • Design and build a source-agnostic ingestion pipeline for production ML / LLM traffic
  • Design storage tiering based on automotive and company requirements, policy-driven retention windows, and privacy requirements
  • Build slicing dashboards and the query path engineers use to debug production at 11 p.m.
  • Enable autoraters and lightweight LLM classifiers across production traffic
  • Build the rule-based triage layer for obvious failures
  • Stand up the eval-mining workflow and wire regression alerts to model and prompt deploys
  • Implement PII redaction at the ingestion boundary and safety / abuse classification on inbound content
  • Define dashboard architecture, wipeout mechanisms, and tool and hosting selection, and operate the pipeline end-to-end

Qualifications

Must Have

  • Proven experience building and operating data or ML platform systems in production, covering ingest, schema, storage, access control, alerts, and on-call
  • Hands-on experience building and running ML / LLM evaluation systems in production (offline regression sets, online autoraters, LLM-as-judge pipelines, golden datasets)
  • Hands-on experience with LLM tracing and observability tooling
  • Experience shipping PII redaction or comparable data-handling controls in a regulated or multi-tenant environment, with a pragmatic approach to data governance
  • Strong understanding of how ML and LLM-based systems fail in production: hallucination, retrieval failures, agent loops that don’t terminate, ASR / TTS degradation, and prompt or model regressions across deploys
  • Production Python proficiency; you are a hands-on engineer, not an advisor, and comfortable leveraging AI in everything you build

Nice to Have

  • Multi-tenant or white-label SaaS experience with per-tenant data isolation
  • Azure experience and the ability to weigh tradeoffs when making self-host vs managed SaaS calls
  • Experience with autorater methodology and contamination defenses
  • Knowledge of vector databases, embedding-based clustering or unsupervised failure-mode discovery
  • Experience with data-versioning tooling (LakeFS, DVC, Delta Lake)
  • GDPR / right-to-erasure work
  • Embedded, automotive, or another constrained environment context
  • Working knowledge of a language beyond English sufficient to validate non-English failure modes
  • Prior experience with cloud platforms (Microsoft Azure and AWS)
  • Prior experience with Claude Code
  • Prior experience with GitHub
  • Languages: Python primary, SQL, and some TypeScript for dashboards
  • LLM APIs: Claude (Anthropic), OpenAI, open-source models as needed
  • Familiarity with the Android / AAOS ecosystem on the client side

What success looks like

First 30 days

  • Toolchain benchmark complete, with recommendations made on analytics product, data-versioning layer, and hosting posture (self-host vs managed)
  • Ingestion pipeline running end-to-end against a staging dataset, with PII redaction at the boundary

First 90 days

  • P0 pipeline elements in production: ingestion, redaction, safety classification, storage, both dashboard surfaces (trace-level and aggregate-only), feedback channel, access control, and user-deletion paths
  • First production-mined eval cases flowing into the eval suite via the human-review PII gate, with provenance intact

First 6 months

  • Eval-set mining is a routine workflow, with new hard cases reaching the eval suite weekly
  • Cost and usage attribution per tenant and per feature is reconciled monthly and informs product and commercial decisions
  • Regression alerts are trusted by the AI team – not noisy, not silent – and tied to deploys
  • The pipeline absorbs onboarding of additional tenants without rework; data governance is enforced, not aspirational; right-to-erasure works end-to-end

What you’ll enjoy at Appning

  • Work From Home Allowance to support your home office setup and comfort.
  • Comprehensive Health Insurance for every employee.
  • Generous Time Off – Extra days off based on your years with us.
  • A Day Off on Your Birthday to celebrate your way.
  • Employee Workplace Program (EWP) designed to support your wellbeing and growth.
  • Flexible Working Hours to help you balance work and life.
  • Free Breakfast and Snacks at the office.
  • Exclusive Platform Discounts through our Inspiring Benefits Program.
  • Referral Program – Bring great people and get rewarded.
  • Amazing Team Gatherings that you’ll actually look forward to.

How to Apply?

Send the following:

  • CV
  • One code sample or GitHub repo that demonstrates your platform or data engineering work.
