You can run flawless interviews and still make a terrible decision in the forty-five minutes where everyone compares notes. The calibration meeting deserves as much design as the interviews.
Picture the scene, because you have lived it: four interviewers, one conference room, a shortlist decision due. The most senior person opens with "so — thoughts on Rajiv?" and the next forty minutes are a drift of impressions, anecdotes and gentle convergence toward whatever the first confident voice proposed. Weeks of structured assessment, settled by a meeting with no structure at all.
The calibration meeting is where assessment evidence becomes a decision — or gets overwritten by hierarchy and recency. It deserves the same design rigour as everything upstream.
The non-negotiables
- Scores before speech. Every assessor submits written, evidence-cited ratings against the scorecard *before* the meeting. The meeting compares documents, not memories. This single rule prevents the anchoring cascade in which the first opinion becomes the room's opinion.
- Reverse-seniority airtime. The most junior assessor speaks first on each competency, the hiring principal last. Run it the other way and you are not calibrating; you are collecting agreement.
- Evidence language only. The chair's job is one relentless translation: "Tell me what you saw or heard that supports that." A remark such as "I'm not sure about her fit" must become specific or be set aside. Impressions are admissible only as hypotheses that evidence can test.
- Competency by competency, not candidate by candidate. Walk through each scorecard dimension across all finalists instead of delivering a holistic verdict on each person. This keeps the comparison anchored to the role and shows exactly where candidates differ, which is usually a narrower and more decidable question than overall impressions suggest.
- Divergence is the agenda. Where scores agree, move on. Where they split, slow down. A two-point spread between assessors signals an evidence gap, a difference in interpretation, or genuine inconsistency in how the candidate performed across rooms. All three are decision-critical information. Meetings that smooth over divergence to finish on time are discarding their most valuable data.
Failure modes to design against
- The advocate problem. Interviewers who "found" or championed a candidate argue like counsel, not assessors. Name the dynamic openly and weight their evidence, not their enthusiasm.
- Recency and contrast effects. The most recently interviewed candidate, and any candidate seen straight after a weak one, tend to be rated too generously. Side-by-side scorecard comparison is the corrective.
- Vetoes without evidence. A senior member's unexplained "no" should carry exactly as much weight as the evidence attached to it: none, until articulated. Boards that allow naked vetoes are running an oligarchy with assessment theatre attached.
- The consensus trap. The goal is not unanimity; it is a decision the evidence supports. A recorded dissent is healthier than manufactured agreement, and it makes the post-hire review far more useful.
Closing the loop
Two practices turn calibration from a meeting into a system. First, end with explicit residual risks. Every hire has them, and naming the top two, along with how each will be tested in referencing or managed in onboarding, turns anxiety into a plan. Those risks should flow directly into the reference questions and the probation design. Second, schedule the look-back: revisit the calibration record when the hire reaches the one-year mark. Which judgements held? Which assessors read which competencies well? Committees that audit themselves get measurably better; committees that don't simply repeat their mistakes with confidence.
A well-run calibration meeting takes ninety disciplined minutes and is, in our experience, the single most improvable hour in most organisations' senior hiring. It is also a standard component of every mandate in our executive search practice — several of our case studies turn on a calibration discussion that caught what individual interviews missed. If your decision meetings still open with "so — thoughts?", talk to us. The redesign takes a week and pays back on the first hire.
Frequently asked questions
When should the calibration meeting happen?
Within 48 hours of the final assessment touch, with all written scores submitted beforehand. Longer delays let memory decay and corridor conversations pre-form the consensus the meeting is supposed to test.
What if assessors disagree sharply on a candidate?
Treat it as information, not friction: a sharp split signals an evidence gap, an interpretation difference, or genuine candidate inconsistency across rooms. Resolve it with targeted referencing or an additional structured touch — not by averaging or by seniority.
Should the hiring committee aim for consensus?
Aim for an evidence-supported decision, not unanimity. Documented dissent is healthy and improves post-hire reviews. Manufactured consensus usually means hierarchy, fatigue, or advocacy won — not that the evidence agreed.
Leaders you can bet the company on.
Talk to Humane Insights about your next leadership hire or challenge.