Every conversation with a VP of Engineering about AI training eventually arrives at the same question: "How do we measure ROI?" And almost every answer I have seen in the market is some version of lying with numbers.
The common claims: "30% productivity improvement." "2x faster code reviews." "50% reduction in development time." These numbers are either fabricated, cherry-picked from the best-case scenario, or measured so poorly that they are meaningless.
Here is how we actually measure it.
What you cannot measure (honestly)
You cannot directly measure "developer productivity gain from AI tools." The concept is not well-defined enough to measure. A developer's output in week 12 is different from week 8 not just because of AI tools — they also got more familiar with the codebase, the requirements got clearer, a blocking dependency was resolved, or they just had a better week. Isolating the AI-tool variable is nearly impossible in a real engineering organization.
Anyone who claims a clean "X% productivity gain" is either running a controlled academic study (rare and not generalizable) or making it up.
What you CAN measure
Instead of chasing a single productivity number, we track four proxy metrics that are individually imperfect but collectively paint an honest picture.
1. Tool adoption rate
The simplest metric and the one most organizations skip. What percentage of licensed engineers are actually using the tool at least weekly?
In our experience, an untrained organization settles at about 25-35% regular adoption. After structured training, that number moves to 70-85%. The delta tells you whether your training investment changed behavior. It does not tell you whether that behavior is productive — but if people are not using the tool, nothing else matters.
How to measure it: usage dashboards that track unique active users per week. Not sessions, not API calls — unique humans who used the tool for at least one substantive task. Most AI coding tools provide this data, or you can instrument it through your telemetry pipeline.
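As a minimal sketch of the "unique humans per week" calculation, assuming you can export usage events as (user, date) pairs: the function name, the event shape, and the sample data below are all hypothetical, and your tool's export format will differ.

```python
from collections import defaultdict
from datetime import date


def weekly_adoption_rate(events, licensed_users):
    """Return {(iso_year, iso_week): share of licensed users active that week}.

    events: iterable of (user_id, day) pairs, one per substantive task.
    licensed_users: set of user ids that hold a license.
    """
    if not licensed_users:
        return {}
    active = defaultdict(set)
    for user_id, day in events:
        if user_id in licensed_users:  # skip bots and unlicensed accounts
            iso = day.isocalendar()
            active[(iso[0], iso[1])].add(user_id)  # a set counts each person once
    return {
        week: len(users) / len(licensed_users)
        for week, users in sorted(active.items())
    }


# Hypothetical sample data: four licensed engineers, two weeks of events.
licensed = {"ana", "ben", "chi", "dev"}
events = [
    ("ana", date(2024, 3, 4)),
    ("ana", date(2024, 3, 5)),   # same person, same week: still one active user
    ("ben", date(2024, 3, 6)),
    ("ana", date(2024, 3, 11)),
    ("bot", date(2024, 3, 11)),  # not licensed, ignored
]
rates = weekly_adoption_rate(events, licensed)
for week, rate in rates.items():
    print(week, f"{rate:.0%}")
```

Counting set members rather than rows is what keeps heavy users from inflating the number: ten sessions from one engineer still count as one active human.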
2. Internal support ticket reduction
Before AI tool rollout, your internal experts are answering the same questions repeatedly in Slack and email. After effective training, those questions should decrease because (a) people can use the AI tool to answer straightforward questions themselves, and (b) the training materials cover the common cases.
Realistic numbers: in a 200-person engineering org, we tracked internal "how do I" messages in dedicated Slack channels. Pre-training: roughly 45 per week. Six weeks post-training: roughly 20 per week. That is a 55% reduction in a specific, measurable category. It does not mean a 55% productivity gain. It means your senior engineers got back roughly 10-15 hours per week that they had been spending on internal support.
The math, at the upper end of that range: 15 hours/week x $100/hr blended cost = $1,500/week = $78,000/year in recovered senior-engineer time (the 10-hour low end works out to $52,000/year). That number alone often exceeds the cost of the training program.
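The same arithmetic can be written as a small sketch so you can plug in your own inputs. The function names are hypothetical, and the 52-week year is an assumption (it is the figure that makes $1,500/week come out to $78,000/year).

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


def recovered_time_value(hours_per_week, blended_hourly_cost, weeks_per_year=52):
    """Return (weekly_value, yearly_value) of recovered engineer time."""
    weekly = hours_per_week * blended_hourly_cost
    return weekly, weekly * weeks_per_year


# Figures from the example above: 45 -> 20 messages/week, 10-15 hours, $100/hr.
drop = percent_reduction(45, 20)
low_weekly, low_yearly = recovered_time_value(10, 100)
high_weekly, high_yearly = recovered_time_value(15, 100)

print(f"Support messages down {drop:.1f}%")
print(f"Recovered time: ${low_yearly:,} to ${high_yearly:,} per year")
```

Reporting the low and high ends together is deliberate: a range of $52,000 to $78,000 is easier to defend than a single headline figure.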
3. New-engineer onboarding time
This one takes longer to measure but is highly convincing to leadership. Track the time from a new engineer's first day to their first merged PR in each team, before and after AI tool training is part of the onboarding process.
Realistic numbers: we have seen teams go from a median of 12 working days to first merged PR down to 8 working days when Claude Code (or similar tools) plus a CLAUDE.md for the repo are part of the onboarding kit. That is a 33% improvement in a metric leadership already cares about. It is not "AI made them 33% faster." It is "AI tools reduced the time engineers spend reading unfamiliar code and understanding project conventions, which is the primary bottleneck for new hires."
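A rough sketch of how the median could be computed from HR start dates and PR merge dates. The helper names and the two sample cohorts are hypothetical, and the working-day count ignores public holidays and leave, which you would want to handle for real data.

```python
from datetime import date, timedelta
from statistics import median


def working_days_between(start, end):
    """Count Mon-Fri days after `start`, up to and including `end`."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
    return days


def median_days_to_first_pr(hires):
    """hires: list of (first_day, first_merged_pr_day) pairs."""
    return median([working_days_between(s, m) for s, m in hires])


# Hypothetical cohorts: hires before and after the onboarding kit changed.
before = [
    (date(2024, 1, 8), date(2024, 1, 22)),
    (date(2024, 1, 8), date(2024, 1, 24)),
    (date(2024, 1, 8), date(2024, 1, 26)),
]
after = [
    (date(2024, 4, 1), date(2024, 4, 9)),
    (date(2024, 4, 1), date(2024, 4, 11)),
    (date(2024, 4, 1), date(2024, 4, 15)),
]
before_median = median_days_to_first_pr(before)
after_median = median_days_to_first_pr(after)
print(before_median, after_median)
```

The median is used instead of the mean so that one new hire who was blocked on access for three weeks does not dominate the result.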
4. Self-service resolution rate
What percentage of "I need help with X" situations does an engineer resolve without asking another human? This is hard to measure precisely, but you can approximate it by surveying engineers monthly: "In the past week, how many times did you use an AI tool to answer a question you would have otherwise asked a colleague?"
Realistic numbers: trained engineers report resolving 3-5 questions per week via AI tools that they would have previously asked a human. In a 200-person org, that is 600-1,000 avoided interruptions per week. Even if half of those are overestimates (people are bad at self-reporting), it is a meaningful reduction in interrupt-driven work.
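To make the discounting explicit, here is a small sketch of the estimate. The function name is hypothetical, and the 50% discount is simply the "half are overestimates" assumption from the paragraph above, not a measured correction factor.

```python
def avoided_interruptions(engineers, per_engineer_low, per_engineer_high, discount=0.5):
    """Return ((raw_low, raw_high), (discounted_low, discounted_high)) per week.

    discount: share of self-reported answers assumed to be overestimates.
    """
    raw = (engineers * per_engineer_low, engineers * per_engineer_high)
    discounted = tuple(count * (1 - discount) for count in raw)
    return raw, discounted


# Survey figures from the example above: 200 engineers, 3-5 questions each per week.
raw, discounted = avoided_interruptions(200, 3, 5)
print(f"Self-reported: {raw[0]}-{raw[1]} per week")
print(f"After discount: {discounted[0]:.0f}-{discounted[1]:.0f} per week")
```

Keeping the discount as a visible parameter lets a skeptical reader substitute their own assumption and see how much of the result survives.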
Putting it together
No single metric tells the story. But when you present leadership with a dashboard showing:
- Adoption went from 30% to 78%
- Internal support questions dropped 55%
- New-hire onboarding shortened by 4 days
- Self-service resolution avoids an estimated 300-500 interruptions per week, after discounting self-reports by half
you are telling a credible, defensible story. It does not require fake precision. It does not claim a "30% productivity gain." It says: "We trained people to use the tools they already had licenses for, and here is what changed in four specific areas we can measure."
The uncomfortable truth
Most AI training vendors do not want to have this conversation because honest measurement sometimes shows modest results. Not every team sees dramatic improvement. Some teams were already doing fine. Some teams have work that AI tools are not yet good at.
We would rather show you honest numbers and let you decide whether the investment is worth it than manufacture a "10x ROI" slide that falls apart the moment your CFO asks how you calculated it.