Overview
Why Hivel
Hivel is an AI-native engineering intelligence platform that helps teams measure and improve software delivery speed, quality, and impact.
We integrate deeply with GitHub, Jira, CI/CD systems, and developer tools to transform engineering activity into real-time insights that help leaders understand how software is actually built.
In the AI era, engineering output has increased dramatically, but visibility into and trust in that output have not kept pace. Hivel closes that gap by validating engineering signals at scale and turning them into reliable decision intelligence.
Why This Role
At Hivel, quality isn’t just about UIs or APIs; it’s about validating the correctness, reliability, and trustworthiness of insights generated from complex engineering data and AI models.
This role is responsible for ensuring that both deterministic systems (data pipelines, analytics engines) and probabilistic systems (LLMs, AI insights, model outputs) behave correctly, consistently, and predictably.
You will help build the validation layer that ensures engineering leaders can trust the insights they see.
What You’ll Do
Automation & Platform Quality
- Design scalable automation frameworks for backend, APIs, and integrations
- Build regression and validation suites across microservices and data pipelines
- Implement contract testing for internal and third-party APIs (see the contract-check sketch after this list)
- Develop simulation environments to test real-world engineering workflow scenarios
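To make the contract-testing item above concrete, here is a minimal sketch of a schema-level contract check written with requests and jsonschema; the endpoint path, payload fields, and base URL are hypothetical placeholders rather than Hivel’s actual API.

```python
# Minimal contract check: assert that a (hypothetical) internal metrics
# endpoint still returns the payload shape downstream consumers depend on.
import requests
from jsonschema import validate

BASE_URL = "http://localhost:8080"  # hypothetical service under test

# Hypothetical contract for /api/v1/metrics -- illustrative only.
METRICS_CONTRACT = {
    "type": "object",
    "required": ["team_id", "cycle_time_hours", "pr_count"],
    "properties": {
        "team_id": {"type": "string"},
        "cycle_time_hours": {"type": "number", "minimum": 0},
        "pr_count": {"type": "integer", "minimum": 0},
    },
}

def test_metrics_endpoint_honours_contract():
    resp = requests.get(f"{BASE_URL}/api/v1/metrics", params={"team_id": "demo"})
    assert resp.status_code == 200
    # Raises jsonschema.ValidationError if the response drifts from the contract.
    validate(instance=resp.json(), schema=METRICS_CONTRACT)
```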
Data & Insight Validation
- Validate correctness of metrics derived from Git/Jira/CI/CD data
- Build automated data reconciliation and anomaly detection tests (a reconciliation sketch follows this list)
- Test edge cases in derived analytics (cycle time, AI-generated vs. human-written code, productivity metrics)
- Create validation harnesses for pipeline transformations and aggregations
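As one illustration of the reconciliation work above, the sketch below recomputes a metric from raw events and flags aggregated rows that drift beyond a tolerance; it assumes pandas and hypothetical table and column names (`raw_events`, `pr_count`), not Hivel’s real schema.

```python
# Sketch: reconcile a pipeline's aggregated PR counts against an independent
# recomputation from raw events. Table and column names are hypothetical.
import pandas as pd

def reconcile_pr_counts(raw_events: pd.DataFrame,
                        aggregated: pd.DataFrame,
                        tolerance: int = 0) -> pd.DataFrame:
    """Return rows where the aggregated metric disagrees with the raw data."""
    # Recompute the metric independently from the raw event stream.
    recomputed = (
        raw_events[raw_events["event_type"] == "pr_merged"]
        .groupby("team_id")
        .size()
        .rename("recomputed_pr_count")
        .reset_index()
    )
    merged = aggregated.merge(recomputed, on="team_id", how="outer").fillna(0)
    drift = (merged["pr_count"] - merged["recomputed_pr_count"]).abs()
    # Any surviving row is a reconciliation failure worth alerting on.
    return merged[drift > tolerance]
```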
AI & LLM Evaluation
- Design evaluation frameworks for AI features like code review, telemetry analysis, and insight generation
- Build benchmark datasets and scoring pipelines for model output validation (see the scoring sketch after this list)
- Implement automated checks for hallucinations, consistency, reasoning quality, and factual accuracy
- Compare model outputs across versions, prompts, and providers
- Define acceptance thresholds for AI output reliability
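For the benchmark and acceptance-threshold items above, a scoring pipeline can start as small as the sketch below: run the model over labelled cases, score each output, and fail the run if the pass rate drops below an agreed threshold. The benchmark cases, the `generate` callable, and the threshold value are assumptions for illustration, not Hivel’s actual evaluation stack.

```python
# Sketch: score model outputs against a small labelled benchmark and enforce
# an acceptance threshold. Benchmark cases and `generate` are hypothetical.
from typing import Callable

BENCHMARK = [
    # (prompt, substring the answer is expected to contain)
    ("How many PRs did team demo merge last week?", "42"),
    ("Which repo had the longest cycle time?", "billing-service"),
]

ACCEPTANCE_THRESHOLD = 0.9  # agreed minimum pass rate before a change ships

def evaluate(generate: Callable[[str], str]) -> float:
    """Return the fraction of benchmark cases the model answers correctly."""
    passed = 0
    for prompt, expected in BENCHMARK:
        answer = generate(prompt)
        # Simple containment scoring; real harnesses might add exact match,
        # an LLM judge, or factual-consistency checks.
        if expected.lower() in answer.lower():
            passed += 1
    return passed / len(BENCHMARK)

def gate_model(generate: Callable[[str], str]) -> None:
    score = evaluate(generate)
    assert score >= ACCEPTANCE_THRESHOLD, f"model scored {score:.2f}, below threshold"
```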
Release Quality Ownership
- Define quality gates for releases (data accuracy, inference correctness, system reliability); a gate sketch follows this list
- Monitor production signals for regressions or anomalies
- Collaborate with engineering to debug failures across services, pipelines, or models
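A release quality gate of the kind described above can be expressed as a short automated check run in CI; the signal names and threshold values in this sketch are illustrative assumptions, not Hivel’s actual gates.

```python
# Sketch: a CI-time release gate that blocks promotion unless data accuracy,
# inference correctness, and reliability signals all clear their thresholds.
# Metric names and threshold values are illustrative assumptions.

QUALITY_GATES = {
    "data_accuracy_min": 0.99,          # reconciliation pass rate
    "inference_correctness_min": 0.95,  # benchmark pass rate for AI features
    "error_rate_max": 0.01,             # observed error rate in staging
}

def release_gate(signals: dict) -> list:
    """Return the failed gates; an empty list means the release may ship."""
    failures = []
    if signals["data_accuracy"] < QUALITY_GATES["data_accuracy_min"]:
        failures.append("data accuracy below threshold")
    if signals["inference_correctness"] < QUALITY_GATES["inference_correctness_min"]:
        failures.append("inference correctness below threshold")
    if signals["error_rate"] > QUALITY_GATES["error_rate_max"]:
        failures.append("staging error rate above threshold")
    return failures

if __name__ == "__main__":
    failed = release_gate({"data_accuracy": 0.995,
                           "inference_correctness": 0.97,
                           "error_rate": 0.004})
    assert not failed, f"release blocked: {failed}"
```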
What We’re Looking For
- 3–5 years of experience in QA, SDET, or automation engineering
- Strong experience in API testing, backend systems, and automation frameworks
- Hands-on experience validating data pipelines or analytics platforms
- Ability to write validation queries and data integrity checks
- Strong debugging mindset across distributed systems
- Familiarity with probabilistic system testing principles
Preferred Experience
- Experience evaluating LLM outputs or testing AI-powered features
- Familiarity with LLM evaluation concepts (prompt testing, benchmark scoring, regression baselines)
- Experience testing data-intensive platforms or developer tools
- Exposure to observability, tracing, and reliability engineering
- Knowledge of CI/CD-driven testing and release validation
What You’ll Get
- Work on one of the deepest engineering analytics platforms being built today
- Build validation systems for both deterministic software and probabilistic AI
- Direct exposure to platform architecture, data systems, and AI pipelines
- Ownership of quality across the full stack, from ingestion to insights
- Opportunity to define how AI systems should be tested and trusted
- Fast-paced, ownership-driven environment with strong technical depth