Senior ML Ops / LLM Ops Engineer
This role focuses on building and operating the ML Ops / LLM Ops pipeline that closes the loop between production and evaluation: ingest production signal, redact it, store it, slice it, classify it, surface the failures, mine new eval cases, and alert on regressions. You drive the toolchain decisions, the data-governance posture, and the day-to-day reliability of the pipeline itself. The Head of AI sets vision and priorities, and you own the technical execution end-to-end.
May your next career adventure begin here.
Overall responsibilities and duties
- Design and build a source-agnostic ingestion pipeline for production ML / LLM traffic
- Design storage tiering based on automotive and company requirements, policy-driven retention windows, and privacy requirements
- Build slicing dashboards and the query path engineers use to debug production at 11 p.m.
- Enable autoraters and lightweight LLM classifiers across production traffic
- Build the rule-based triage layer for obvious failures
- Stand up the eval-mining workflow and wire regression alerts to model and prompt deploys
- Implement PII redaction at the ingestion boundary and safety / abuse classification on inbound content
- Define dashboard architecture, wipeout mechanisms, tool and hosting selection, and operate the pipeline end-to-end
Qualifications
Must Have
- Proven experience building and operating data or ML platform systems in production, covering ingest, schema, storage, access control, alerts, and on-call
- Hands-on experience building and running ML / LLM evaluation systems in production (offline regression sets, online autoraters, LLM-as-judge pipelines, golden datasets)
- Hands-on experience with LLM tracing and observability tooling
- Experience shipping PII redaction or comparable data-handling controls in a regulated or multi-tenant environment, with a pragmatic approach to data governance
- Strong understanding of how ML and LLM-based systems fail in production: hallucination, retrieval failures, agent loops that don’t terminate, ASR / TTS degradation, and prompt or model regressions across deploys
- Production Python proficiency; hands-on engineer, not advisory. Comfortable leveraging AI in everything you build
Nice to Have
- Multi-tenant or white-label SaaS experience with per-tenant data isolation
- Azure experience and the ability to weigh tradeoffs when making self-host vs managed SaaS calls
- Experience with autorater methodology and contamination defenses
- Knowledge of vector databases, embedding-based clustering, or unsupervised failure-mode discovery
- Experience with data-versioning tooling (LakeFS, DVC, Delta Lake)
- GDPR / right-to-erasure work
- Embedded, automotive, or another constrained environment context
- Working knowledge of a language beyond English sufficient to validate non-English failure modes
- Prior experience with cloud platforms (Microsoft Azure and AWS)
- Prior experience with Claude Code
- Prior experience with GitHub
- Languages: Python primary, SQL, and some TypeScript for dashboards
- LLM APIs: Claude (Anthropic), OpenAI, open-source models as needed
- Android / AAOS ecosystem as client platforms
What success looks like
First 30 days
- Toolchain benchmark complete, with recommendations made on analytics product, data-versioning layer, and hosting posture (self-host vs managed)
- Ingestion pipeline running end-to-end against a staging dataset, with PII redaction at the boundary
First 90 days
- P0 pipeline elements in production: ingestion, redaction, safety classification, storage, both dashboard surfaces (trace-level and aggregate-only), feedback channel, access control, and user-deletion paths
- First production-mined eval cases flowing into the eval suite via the human-review PII gate, with provenance intact
First 6 months
- Eval-set mining is a routine workflow, with new hard cases reaching the eval suite weekly
- Cost and usage attribution per tenant and per feature is reconciled monthly and informs product and commercial decisions
- Regression alerts are trusted by the AI team – not noisy, not silent – and tied to deploys
- The pipeline absorbs onboarding of additional tenants without rework; data governance is enforced, not aspirational; right-to-erasure works end-to-end
What you’ll enjoy at Appning
- Work From Home Allowance to support your home office setup and comfort.
- Comprehensive Health Insurance for every employee.
- Generous Time Off – Extra days off based on your years with us.
- A Day Off on Your Birthday to celebrate your way.
- Employee Workplace Program (EWP) designed to support your wellbeing and growth.
- Flexible Working Hours to help you balance work and life.
- Free Breakfast and Snacks at the office.
- Exclusive Platform Discounts through our Inspiring Benefits Program.
- Referral Program – Bring great people and get rewarded.
- Amazing Team Gatherings that you’ll actually look forward to.
How to Apply?
Send the following:
- CV
- One code sample or GitHub repo that demonstrates your platform or data engineering work.
