Engineering Metrics That Actually Predict Delivery Risk
The short version
A handful of metrics genuinely predict whether delivery is about to slip: the four DORA metrics (deployment frequency, lead time for changes, change-failure rate, time to restore service) plus a few flow signals (PR cycle time, review latency, work in progress, and batch size). Ignore vanity metrics like lines of code, commit counts, hours worked, and story-point "velocity." And never tie these metrics to individuals.
Leaders want one honest answer: is the roadmap going to land, or is it quietly slipping? Most engineering dashboards don't answer it because they track activity, not flow and stability. Here are the metrics that actually carry signal — and how to use them without doing harm.
The four that matter most: DORA
Years of research by the DORA (DevOps Research and Assessment) program converged on four metrics that, taken together, capture both delivery speed and stability:
- Deployment frequency — how often you ship to production. Infrequent, batched releases concentrate risk.
- Lead time for changes — commit to production. Long lead times mean slow feedback and big, risky changes.
- Change-failure rate — the share of deploys that cause a problem. Rising failure rate is an early warning that quality is eroding.
- Time to restore service — how fast you recover from an incident. Resilience matters more than never failing.
Together these balance speed against stability — which is the point. A team shipping fast with a climbing failure rate isn't healthy; neither is a stable team that ships once a quarter.
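To make the four definitions concrete, here is a minimal Python sketch that computes them for one team over one time window. The `Deploy` record shape, its field names, and the choice of medians are assumptions made for illustration; real deploy and incident data will look different, and time to restore is approximated here as the gap between the failing deploy and recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """One production deploy (hypothetical record shape)."""
    commit_at: datetime                      # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did this deploy cause a problem?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora_metrics(deploys: list[Deploy], window_days: int) -> dict[str, Optional[float]]:
    """Summarize the four DORA metrics for one team over one window."""
    if not deploys:
        raise ValueError("no deploys in window")
    failures = [d for d in deploys if d.failed]
    # Assumption: restore time is measured from the failing deploy to recovery.
    restore_hours = [
        _hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_week": len(deploys) / (window_days / 7),
        "median_lead_time_hours": median(_hours(d.commit_at, d.deployed_at) for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_restore_hours": median(restore_hours) if restore_hours else None,
    }


# Made-up sample data: four deploys in one week, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deploy(t0, t0 + timedelta(hours=4)),
    Deploy(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=8)),
    Deploy(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=12),
           failed=True, restored_at=t0 + timedelta(days=3, hours=14)),
    Deploy(t0 + timedelta(days=5), t0 + timedelta(days=5, hours=6)),
]
metrics = dora_metrics(sample, window_days=7)
print(metrics)
```

Reporting the speed pair and the stability pair side by side, rather than as one blended score, keeps the trade-off between them visible.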
Flow metrics that give earlier warning
DORA tells you about outcomes; these tell you where things are clogging, often before the outcomes move:
- PR cycle time — open to merge. The most useful single flow metric; it exposes bottlenecks in review and rework.
- Review latency — how long PRs wait for a first review. Long waits are usually the hidden tax on delivery.
- Work in progress (WIP) — how many things are in flight at once. High WIP means lots of context-switching and little finishing.
- Batch size — how large changes are. Big PRs are slower to review and far riskier to ship.
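The four flow metrics above can be computed from pull-request timestamps. The sketch below is one possible way to do it in Python; the `PullRequest` fields are hypothetical, open PRs are used as a rough proxy for WIP, and lines changed stand in for batch size. None of those choices come from a specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    """One pull request (hypothetical record shape)."""
    opened_at: datetime
    first_review_at: Optional[datetime]   # None if nobody has reviewed yet
    merged_at: Optional[datetime]         # None if still open
    lines_changed: int


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def flow_metrics(prs: list[PullRequest], as_of: datetime) -> dict[str, Optional[float]]:
    """Team-level flow metrics as of a point in time."""
    merged = [p for p in prs if p.merged_at is not None and p.merged_at <= as_of]
    reviewed = [p for p in prs if p.first_review_at is not None]
    # Assumption: a PR counts as work in progress while it is open and unmerged.
    in_flight = [
        p for p in prs
        if p.opened_at <= as_of and (p.merged_at is None or p.merged_at > as_of)
    ]
    cycle = [_hours(p.opened_at, p.merged_at) for p in merged]
    latency = [_hours(p.opened_at, p.first_review_at) for p in reviewed]
    sizes = [p.lines_changed for p in prs]
    return {
        "median_cycle_time_hours": median(cycle) if cycle else None,
        "median_review_latency_hours": median(latency) if latency else None,
        "wip_open_prs": len(in_flight),
        "median_batch_size_lines": median(sizes) if sizes else None,
    }


# Made-up sample data: two merged PRs, one under review, one not yet reviewed.
base = datetime(2024, 3, 4, 9, 0)
prs = [
    PullRequest(base, base + timedelta(hours=4), base + timedelta(hours=24), 120),
    PullRequest(base + timedelta(days=1), base + timedelta(days=2), base + timedelta(days=3), 400),
    PullRequest(base + timedelta(days=2), base + timedelta(days=2, hours=2), None, 60),
    PullRequest(base + timedelta(days=3), None, None, 900),
]
flow = flow_metrics(prs, as_of=base + timedelta(days=4))
print(flow)
```

Medians are used instead of means so that one long-lived PR does not dominate the picture.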
The pattern that predicts a slip: rising PR cycle time and review latency, climbing WIP, and a creeping change-failure rate — usually weeks before a missed milestone shows up on the roadmap. Watch the leading indicators, not just the outcome.
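One simple way to watch for that pattern is to compare the last few weeks of each leading indicator against its earlier baseline. The Python sketch below does this with weekly team-level series; the three-week window and the 20% threshold are illustrative assumptions, not recommended values, and the metric names are hypothetical.

```python
from statistics import mean


def is_rising(series: list[float], recent_weeks: int = 3, threshold: float = 1.2) -> bool:
    """True if the recent average exceeds the earlier baseline by the threshold factor.

    The defaults (3 weeks, 1.2x) are placeholder assumptions to be tuned per team.
    """
    if len(series) <= recent_weeks:
        return False  # not enough history to form a baseline
    baseline = mean(series[:-recent_weeks])
    recent = mean(series[-recent_weeks:])
    return baseline > 0 and recent >= baseline * threshold


def slip_warnings(weekly: dict[str, list[float]]) -> list[str]:
    """Names of the leading indicators that are currently trending upward."""
    return sorted(name for name, series in weekly.items() if is_rising(series))


# Made-up weekly values for one team, oldest first.
weekly = {
    "pr_cycle_time_hours": [20, 22, 21, 19, 26, 30, 34],
    "review_latency_hours": [5, 6, 5, 6, 8, 9, 10],
    "wip_open_prs": [8, 9, 8, 9, 9, 8, 9],
    "change_failure_rate": [0.10, 0.08, 0.10, 0.12, 0.14, 0.15, 0.16],
}
warnings = slip_warnings(weekly)
print(warnings)
```

Several indicators rising together is the cue to go and talk to the team. A single noisy week in one series usually is not.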
The vanity metrics to ignore
- Lines of code — more code is a cost, not an achievement.
- Commit counts — easily gamed, measures nothing useful.
- Story-point "velocity" as a productivity score — points are a planning aid, not output; comparing velocity across teams is meaningless.
- Hours worked — measures input and burnout risk, not delivery.
Two rules for using metrics well
Measure teams and systems, never individuals. The moment a metric is used to rank or evaluate people, it gets gamed and trust collapses — and you lose the signal. Delivery metrics are for finding system bottlenecks, not grading engineers.
Use them as questions, not verdicts. A rising change-failure rate doesn't tell you what's wrong — it tells you where to look. Pair the numbers with the team's context before acting.
Pull the data, don't survey for it
These metrics already exist in your tools — GitHub, GitLab, Jira, and your CI/CD pipelines. Pulling them directly gives you an objective, continuous picture instead of a quarterly survey that arrives too late to change anything. That's also why a data-driven assessment can spot delivery risk that interviews miss.
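As an illustration of pulling rather than surveying, the Python sketch below reads closed pull requests from the GitHub REST API and derives a median PR cycle time from the `created_at` and `merged_at` timestamps. It is a rough starting point under stated assumptions: it fetches only the first page of results, the `owner`, `repo`, and `token` values are placeholders you would supply, and the sample payload at the bottom is made up so the calculation can be run without network access.

```python
import json
import urllib.request
from datetime import datetime
from statistics import median
from typing import Optional


def fetch_closed_pulls(owner: str, repo: str, token: str) -> list[dict]:
    """Fetch one page of closed PRs from the GitHub REST API (pagination omitted)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls?state=closed&per_page=100"
    request = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_ts(value: str) -> datetime:
    # GitHub timestamps look like "2024-03-04T09:00:00Z".
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def median_cycle_time_hours(pulls: list[dict]) -> Optional[float]:
    """Median open-to-merge time in hours, skipping PRs closed without merging."""
    hours = [
        (parse_ts(p["merged_at"]) - parse_ts(p["created_at"])).total_seconds() / 3600
        for p in pulls
        if p.get("merged_at")
    ]
    return median(hours) if hours else None


# Made-up payload with only the two fields used here; real responses carry many more.
sample_pulls = [
    {"created_at": "2024-03-04T09:00:00Z", "merged_at": "2024-03-05T09:00:00Z"},
    {"created_at": "2024-03-05T10:00:00Z", "merged_at": "2024-03-05T16:00:00Z"},
    {"created_at": "2024-03-06T08:00:00Z", "merged_at": None},  # closed, never merged
]
cycle_time = median_cycle_time_hours(sample_pulls)
print(cycle_time)
```

With real data, the call would be `median_cycle_time_hours(fetch_closed_pulls(owner, repo, token))`, run on a schedule so the trend stays current.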
See your real delivery health
Jimmlr pulls these metrics straight from your engineering tools and turns them into an objective view of delivery risk — with a prioritized plan to fix the bottlenecks.