Whitepaper

The Stori Trait Assessment: A Dual-Output Framework for Candidate Intelligence

Fraser Hill / Stori Labs — 2026

For decades, hiring has relied on proxies — resumes, behavioral interviews, and intuition. These tools have persisted not because they are accurate, but because they produce outcomes that appear acceptable. Most hires do not fail catastrophically. Most employees are “good enough.” And so the system has remained largely unchallenged.

Artificial intelligence has not created this problem. It has exposed it. Resumes are now generated and optimized by machines. Candidates rehearse with AI coaches. The signals that hiring processes were designed to read have become indistinguishable from noise.

This paper describes the measurement system designed to replace those proxies: a structured trait assessment and narrative interview framework that captures how people actually think, work, and perform — grounded in behavioral reality rather than self-reported history.

1. The problem with hiring today

The tools companies use to evaluate candidates have barely changed in fifty years. Resumes list credentials. Behavioral interviews ask candidates to recall polished highlights. Reference checks confirm employment dates. Each of these is a proxy — an indirect signal that correlates weakly, and often misleadingly, with actual job performance.

Schmidt and Hunter's (1998) meta-analysis found that unstructured interviews predict job performance at r = 0.38. Structured interviews improve to r = 0.51, which is better but still leaves roughly three quarters of the variance in performance unexplained. Meanwhile, cognitive ability tests (r = 0.51 in the same meta-analysis) match structured interviews, and personality measures (r = 0.31 for conscientiousness alone, higher in combination) add further predictive power, yet most companies still treat the interview as their primary evaluation tool.

The arrival of generative AI has made this worse. Resumes are now written by machines, which makes them hard to tell apart. Candidates rehearse with AI coaches. The 400-word resume, already a poor signal, has become noise.

The question is no longer how to interview better. It is whether the interview, as traditionally practiced, should remain the primary source of hiring data at all.

2. Eight years of original research

Following his work in executive search for leadership roles within a division of J.P. Morgan, Stori founder Fraser Hill spent eight years (2012–2020) conducting over 1,700 leadership interviews across banking, technology, and professional services. The research, published in The CEO's Greatest Asset (2020), examined what differentiates high performers from the rest — not on paper, but in how they think, decide, communicate, and respond to pressure.

The core finding: the traits that predict success are observable in conversation, but traditional interview formats are not designed to elicit them. Behavioral questions (“Tell me about a time when...”) invite rehearsed performance. They measure interview skill, not the underlying cognitive and behavioral patterns that drive results.

This research identified a set of twelve behavioral facets — observable, measurable dimensions of how people work — organized into four meta-traits. These became the foundation of the Stori Trait Assessment.

3. The Stori Trait Assessment (STA)

The STA is a dual-output questionnaire. From a single, six-minute experience, it expresses a person's self-reported disposition in two complementary languages:

Stori Traits

Four meta-traits and twelve facets, in Stori's applied language. As a self-report, this is a self-portrait of disposition, and it serves as the self-report side of Stori's cross-validation. The operative Stori meta-traits are observed from the interview (see Section 7).

Big Five Personality Profile

Openness, Conscientiousness, Extraversion, Agreeableness, Emotional Stability. The most widely studied personality framework in psychology, enabling cross-system benchmarking.

Both outputs derive from the same 120 adjective rankings. The Stori Traits tell you how someone works. The Big Five overlay places them on an established, widely researched framework. Together, they provide both practical utility and research-grounded context.

4. Assessment design: forced-choice triads

The STA uses a forced-choice triad model. Candidates see 40 blocks of three adjectives and rank each block: most like me (+2), more like me (+1), less like me (-1). Each of the 120 adjectives appears exactly once across the entire assessment.
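As an illustration of the scoring described above, the following Python sketch sums the stated rank weights (+2, +1, -1) per facet. The per-facet summation, the `facet_of` lookup, and the function name `score_triads` are assumptions made for this example; the text specifies only the weights, not how Stori aggregates them.

```python
from collections import defaultdict

# Rank weights as stated in the text: most like me, more like me, less like me.
RANK_WEIGHTS = (2, 1, -1)


def score_triads(ranked_triads, facet_of):
    """Sum rank weights per facet (an assumed aggregation, not Stori's confirmed method).

    ranked_triads: list of 3-tuples of adjectives, ordered from
                   "most like me" to "less like me".
    facet_of:      dict mapping each adjective to its facet (hypothetical lookup).
    """
    totals = defaultdict(int)
    for triad in ranked_triads:
        if len(triad) != 3:
            raise ValueError("each block must rank exactly three adjectives")
        for adjective, weight in zip(triad, RANK_WEIGHTS):
            totals[facet_of[adjective]] += weight
    return dict(totals)


# Two illustrative blocks using adjectives from the facet lists.
facet_of = {
    "Inquisitive": "Curious", "Motivated": "Driven", "Brave": "Courageous",
    "Analytical": "Curious", "Honest": "Principled", "Clear": "Articulate",
}
responses = [
    ("Inquisitive", "Motivated", "Brave"),
    ("Honest", "Analytical", "Clear"),
]
facet_scores = score_triads(responses, facet_of)
print(facet_scores)
```

Under this assumed aggregation, a facet's raw score rises only at the expense of the other adjectives in the same block, which is the property the forced-choice format relies on.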

This design is not arbitrary. Forced-choice formats are built to reduce three critical biases that undermine traditional self-report questionnaires:

  • Acquiescence bias — the tendency to agree with statements regardless of content. Forced ranking makes “agree with everything” impossible.
  • Social desirability — the tendency to present oneself favorably. When all three options are positive, there is no “right answer” to game.
  • Central tendency — the tendency to choose middle options. Triads force differentiation.

Each triad draws from three different meta-traits, with each meta-trait omitted from exactly ten triads. This rotation ensures balanced measurement without fatigue. Completion takes approximately six minutes.
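The rotation arithmetic can be checked with a short sketch. With four meta-traits there are exactly four ways to pick three of them, so cycling through those combinations across 40 triads omits each meta-trait ten times and includes it thirty times (three facets of ten adjectives each). The cyclic ordering and the name `build_rotation` are assumptions for illustration; the actual block order used by the STA is not described here.

```python
from collections import Counter
from itertools import combinations

META_TRAITS = ("Thinking", "Discipline", "Execution", "Communication")


def build_rotation(n_triads=40):
    """Return, for each triad, the three meta-traits it draws from.

    Cycling through the four possible 3-of-4 combinations omits each
    meta-trait equally often when n_triads is a multiple of four.
    """
    combos = list(combinations(META_TRAITS, 3))  # exactly four combinations
    return [combos[i % len(combos)] for i in range(n_triads)]


rotation = build_rotation()
appearances = Counter(m for triad in rotation for m in triad)
omissions = {m: len(rotation) - appearances[m] for m in META_TRAITS}
print(omissions)
```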

5. Twelve facets of human performance

Each facet is anchored by ten adjectives drawn from the International Personality Item Pool (IPIP) — a public-domain lexical base that underpins decades of peer-reviewed personality research. The adjectives were selected and refined using insights from the 1,700-interview research corpus.

Facet | Meta-trait | Anchor adjectives
Curious | Thinking | Inquisitive, Analytical, Investigative, Exploratory, Probing, Reflective, Questioning, Studious, Insightful, Curious
Imaginative | Thinking | Creative, Visionary, Inventive, Vivid, Abstract, Conceptual, Dreaming, Innovative, Artistic, Imaginative
Intuitive | Thinking | Instinctive, Discerning, Sensitive, Intuitive, Holistic, Foresighted, Subtle, Prescient, Speculative, Strategic
Driven | Discipline | Motivated, Purposeful, Ambitious, Persistent, Determined, Industrious, Self-starting, Diligent, Competitive, Goal-oriented
Principled | Discipline | Honest, Ethical, Trustworthy, Principled, Reliable, Genuine, Transparent, Moral, Fair, Authentic
Consistent | Discipline | Organized, Structured, Steady, Predictable, Consistent, Systematic, Dependable, Thorough, Regular, Focused
Courageous | Execution | Brave, Bold, Confident, Decisive, Daring, Resilient, Assertive, Fearless, Steadfast, Courageous
Adaptable | Execution | Flexible, Resourceful, Versatile, Calm, Composed, Open-minded, Easy-going, Agile, Adaptable, Patient
Accountable | Execution | Responsible, Dependable, Dutiful, Loyal, Reliable, Committed, Conscientious, Answerable, Accountable, Faithful
Articulate | Communication | Well-spoken, Clear, Coherent, Concise, Verbal, Precise, Fluent, Lucid, Eloquent, Articulate
Influential | Communication | Persuasive, Charismatic, Inspiring, Confident, Assertive, Convincing, Poised, Engaging, Energetic, Motivating
Perceptive | Communication | Perceptive, Observant, Attentive, Attuned, Receptive, Tactful, Astute, Responsive, Empathetic, Aware

6. A within-person profile, expressed in the Big Five

Forced-choice data is inherently ipsative: it reveals which traits are strongest relative to each other within one person, rather than how that person ranks against others. We treat it as exactly what it is: a within-person portrait of disposition, not a cross-candidate ranking instrument.
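A small sketch makes the ipsative property concrete. Because every block hands out the same three weights (+2, +1, -1), every respondent ends with the same grand total of 40 × 2 = 80 points, however they rank. Only the distribution of points across facets differs. The helper names below are hypothetical and the respondents are simulated at random purely to demonstrate the arithmetic.

```python
import random

RANK_WEIGHTS = (2, 1, -1)  # weights stated for each forced-choice block
N_TRIADS = 40


def random_respondent(rng):
    """Each block hands out the same three weights, in a respondent-chosen order."""
    return [tuple(rng.sample(RANK_WEIGHTS, 3)) for _ in range(N_TRIADS)]


def total_points(rankings):
    """Sum of all points awarded across the whole assessment."""
    return sum(sum(weights) for weights in rankings)


rng = random.Random(0)
totals = {total_points(random_respondent(rng)) for _ in range(1000)}
print(totals)  # {80}
```

Since the grand total is fixed, a high raw score on one facet necessarily comes with lower scores elsewhere, which is why raw forced-choice totals describe a profile within one person rather than a standing relative to others.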

To express that disposition in a shared, well-validated language, each response is mapped onto the Big Five — the most extensively researched personality framework — and reported on its familiar, population-referenced scale, where 50 is average. This is shown to the candidate as self-insight. Critically, we do not use the questionnaire to rank candidates against one another; neither end of any dimension is “better,” and each is a strength in different roles and situations.

The questionnaire's real value is twofold: it gives a person a clear view of how they see themselves, and it provides the self-report side of Stori's cross-validation against what the interview actually demonstrates. Comparison between candidates is grounded in that demonstrated behavior — not in a self-report score.

7. Two lenses on the same taxonomy

Stori reads a person through two distinct instruments that share one trait language. The questionnaire is a self-report lens, expressed as a Big Five profile. The interview is an evidence lens: the four Stori meta-traits are observed from what a person actually did and built — read from the interview, not inferred from a questionnaire. A personality test cannot measure whether someone communicates well; the interview can.

Because both speak the same twelve-facet language, they can be cross-referenced facet by facet — so a self-reported tendency can be corroborated by a real moment from the interview, and the gap between how someone sees themselves and what they demonstrated becomes its own signal. Each Stori facet carries a primary Big Five loading (and, where the literature supports it, a secondary one); the mapping below shows how the two frameworks relate.

Big Five Dimension | Stori Meta-Trait Link | Contributing Facets
Openness | Thinking | Curious, Imaginative, Intuitive
Conscientiousness | Discipline + Execution | Driven, Principled, Consistent, Accountable
Extraversion | Communication | Articulate, Influential
Agreeableness | Communication | Perceptive (+ secondary loadings)
Emotional Stability | Execution | Courageous, Adaptable

The Stori meta-traits align with their Big Five anchors while preserving the distinctiveness of the Stori framework. In practice they are observed from interview evidence, while the Big Five is measured from self-report: two independent readings of the same facets. Formal construct-validity studies are planned as the volume of assessment data grows.
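The primary loadings in the mapping above can be written as a simple lookup. The sketch below groups facet scores by their primary Big Five dimension and averages them. The averaging rule and the name `big_five_raw` are assumptions for illustration; secondary loadings are omitted because their values are not given, and the conversion to a population-referenced scale is not attempted.

```python
# Primary Big Five loading for each facet, taken from the mapping table.
PRIMARY_LOADING = {
    "Curious": "Openness", "Imaginative": "Openness", "Intuitive": "Openness",
    "Driven": "Conscientiousness", "Principled": "Conscientiousness",
    "Consistent": "Conscientiousness", "Accountable": "Conscientiousness",
    "Articulate": "Extraversion", "Influential": "Extraversion",
    "Perceptive": "Agreeableness",
    "Courageous": "Emotional Stability", "Adaptable": "Emotional Stability",
}


def big_five_raw(facet_scores):
    """Average facet scores within each Big Five dimension.

    Uses primary loadings only; simple averaging is an assumed rule.
    """
    grouped = {}
    for facet, score in facet_scores.items():
        grouped.setdefault(PRIMARY_LOADING[facet], []).append(score)
    return {dim: sum(vals) / len(vals) for dim, vals in grouped.items()}


# Made-up raw facet scores for illustration.
example = {"Curious": 12, "Imaginative": 6, "Intuitive": 9, "Perceptive": 4}
profile = big_five_raw(example)
print(profile)
```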

8. Interview intelligence: the evidence layer

A trait score tells you what someone is like. An interview tells you what they have done. Neither is complete alone. The STA framework is designed so that both speak the same language.

When a candidate completes a Stori interview, the full transcript is analyzed by AI to extract structured intelligence across the same facet dimensions measured by the trait assessment. The system identifies timestamped moments where the candidate demonstrates specific facets — a moment of Courageous when they describe a high-stakes decision, evidence of Driven when they talk about pursuing an aggressive target, Accountable when they own a mistake.

These highlights are surfaced as tagged, seekable moments in the interview player. A hiring manager can click “Driven” and watch the 45 seconds where it showed up. This is not a summary or a score — it is the primary evidence, timestamped and accessible.

The unified report cross-references trait scores with interview evidence, showing alignment strength for every facet. Strong alignment means the candidate's self-assessed personality matches their demonstrated behavior. Weak alignment is a signal worth exploring — and now you can explore it with a click.
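One way to picture alignment strength is as the closeness between a normalized self-report score and a normalized evidence score for the same facet. The sketch below is purely hypothetical: the 0-to-1 `evidence_strength` input, the thresholds, and the "1 minus gap" formula are assumptions for illustration, not Stori's scoring. The raw range of -10 to +20 follows from ten adjectives per facet, each scored between -1 and +2, if raw scores are summed per facet.

```python
FACET_MIN, FACET_MAX = -10, 20  # ten adjectives per facet, each scored -1 to +2


def normalize_self_report(raw):
    """Rescale a raw facet total to the 0..1 range."""
    return (raw - FACET_MIN) / (FACET_MAX - FACET_MIN)


def alignment(raw_self_report, evidence_strength, strong=0.8, weak=0.5):
    """Return (score, label) for one facet.

    evidence_strength: hypothetical 0..1 summary of interview evidence.
    strong / weak:     hypothetical cut-offs for the label.
    """
    gap = abs(normalize_self_report(raw_self_report) - evidence_strength)
    score = round(1.0 - gap, 2)
    if score >= strong:
        label = "strong"
    elif score >= weak:
        label = "moderate"
    else:
        label = "weak"
    return score, label


print(alignment(14, 0.75))  # self-view and evidence agree
print(alignment(17, 0.20))  # high self-view, little demonstrated evidence
```

In the second call the gap itself is the finding: a facet the candidate rates highly but barely demonstrates is exactly the kind of moment a reviewer would want to open in the interview player.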

9. The Narrative Method: evidence over answers

The Narrative Method does not attempt to measure traits directly. Instead, it reconstructs a candidate's experiences through people, decisions, context, and outcomes — then identifies the patterns that emerge across the narrative. Traits are inferred from this evidence, not requested from the candidate.

Rather than asking behavioral questions that invite rehearsed answers, the interview asks candidates to recall real experiences through the relationships, comparisons, and consequences that surround them. This is how human memory works — and it is what makes the conversation difficult to manufacture. Authentic narratives become increasingly difficult to fabricate consistently when relationships, decisions, context, comparisons, and consequences must remain internally coherent throughout an interview.

People and Relationships

  • Tell me about the best manager you've worked for.
  • What made them effective?
  • Who else was on that team?
  • How did your peers respond to them?
  • What did you learn from them that you still use today?

People remember experiences through people. Managers, peers, mentors, and reports act as memory anchors that help candidates recall real events, decisions, and relationships. The objective is not to identify who someone worked for — it is to understand how they experienced, interpreted, and learned from those relationships.

Grounded in our principle that Human Memory Is Story-Based.

Comparative Reflection

  • Rank your last three roles by how much you learned.
  • Which manager had the biggest impact on your career?
  • Which role stretched you the most?

Most rehearsed answers describe what happened. Comparisons require candidates to evaluate trade-offs, make judgments, and explain why one experience mattered more than another. The explanation often reveals more than the ranking itself.

Grounded in our principle that Competencies Should Be Discovered, Not Requested.

Context and Consequence

  • What constraints were you working under?
  • What options were available to you?
  • What happened after the decision was made?
  • If you had chosen differently, what do you think would have happened?

Behavior in isolation is difficult to interpret. The same action can reflect good judgment, poor judgment, courage, or recklessness depending on the context in which it occurred. Understanding people requires understanding the environment in which their decisions were made.

Grounded in our principle that Truth Lives In Context.

Patterns Over Anecdotes

A candidate describes taking ownership of a failing project. Interesting on its own.

The same candidate later describes taking responsibility for a struggling team, leading a difficult initiative, stepping into a leadership vacuum, and rebuilding a broken process.

Now we are no longer evaluating a story. We are observing a pattern.

Anecdotes can be rehearsed. Patterns are far harder to manufacture. The objective is not to collect impressive examples — it is to identify behaviors that repeat across different environments, relationships, and stages of a career.

Grounded in our principle that Performance Is Defined By Patterns.

The AI interviewer follows a probe-or-proceed protocol: if an answer lacks specifics, it asks follow-up questions; if the candidate provides concrete detail, it moves on. The aim is for every transcript to contain genuine behavioral data, not rehearsed highlights.
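The probe-or-proceed decision can be sketched as a small rule. The version below uses a crude, rule-based stand-in for "specifics" (numbers and mid-sentence capitalized words) and a cap on follow-ups. Both the heuristic and the names `has_specifics`, `next_action`, and `max_probes` are assumptions for illustration; a production interviewer would presumably judge specificity with a language model rather than regular expressions.

```python
import re


def has_specifics(answer, min_signals=2):
    """Crude heuristic: count numbers and mid-sentence capitalized words."""
    numbers = re.findall(r"\d+(?:[.,]\d+)*%?", answer)
    # Capitalized words that do not start a sentence, as a rough stand-in for names.
    names = re.findall(r"(?<=[a-z,;] )[A-Z][a-z]+", answer)
    return len(numbers) + len(names) >= min_signals


def next_action(answer, probes_used, max_probes=2):
    """Proceed when the answer is concrete or the probe budget is spent."""
    if has_specifics(answer) or probes_used >= max_probes:
        return "proceed"
    return "probe"


vague = "I always take ownership and deliver results."
concrete = "In 2019 I rebuilt the billing pipeline with Priya and cut errors by 40%."
print(next_action(vague, probes_used=0))
print(next_action(concrete, probes_used=0))
```

The probe cap is a design choice in this sketch: without it, a candidate who cannot recall detail would be asked the same follow-up indefinitely.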

The result is an interview that is difficult to game — not because any single question is unanswerable by an AI coach, but because a coherent life narrative, consistent across people, context, comparisons, and consequences, is far harder to manufacture than a polished answer.

10. Psychometric reliability

The STA is designed to meet the following reliability benchmarks, consistent with established forced-choice personality instruments:

Internal Consistency Target: alpha .84 / omega .88
Test-Retest Target: r = .78 - .82
Standard Error Target: SEM = +/- 3.5 T
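The relationship between these targets can be checked with the classical test theory formula SEM = SD × sqrt(1 − reliability). The sketch assumes that "T" refers to T-score points with a standard deviation of 10, and that the SEM target was derived from a reliability coefficient in this way; neither assumption is stated in the targets themselves.

```python
import math

T_SCORE_SD = 10.0  # T-scores are conventionally scaled to mean 50, SD 10


def standard_error_of_measurement(reliability, sd=T_SCORE_SD):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


sem_alpha = round(standard_error_of_measurement(0.84), 2)
sem_omega = round(standard_error_of_measurement(0.88), 2)
print(sem_alpha, sem_omega)  # 4.0 3.46
```

Under these assumptions, the stated target of about 3.5 T-score points corresponds to the omega target of .88, while the alpha target of .84 would imply an SEM of about 4.0.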

These benchmarks will be refined as assessment data accumulates and norming populations are established. The AI narrative layer operates under strict constraints: it can polish phrasing but cannot modify any numeric score. All computations are deterministic, logged, and auditable.

11. Operational guardrails

The STA enforces a strict separation between data and narrative:

  • Scores are deterministic. The AI cannot change, round, or reinterpret any numeric output. T-scores, percentiles, and banding are computed by fixed algorithms.
  • Narrative generation follows templates. Each facet-band combination has a pre-written deterministic narrative. The AI polishes language but cannot alter meaning.
  • PII is masked. No personally identifiable information is sent to AI providers for narrative generation.
  • Every run is logged. Timestamps, model versions, and run IDs create a complete audit trail.
  • Fallback is always deterministic. If the AI fails validation twice, the system falls back to raw deterministic text with no AI involvement.
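The guardrails above can be sketched as a validate-then-fall-back loop. In the example, `polish` is a hypothetical stand-in for the AI call, and the validation checks only that numeric tokens survive unchanged and in order. A real validator would need further checks (for example, on meaning), so this is a minimal sketch under those assumptions rather than the production logic.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def numbers_preserved(template_text, polished_text):
    """True when the polished text has exactly the same numeric tokens, in order."""
    return NUMBER.findall(template_text) == NUMBER.findall(polished_text)


def generate_narrative(template_text, polish, max_attempts=2):
    """Try AI polish up to max_attempts times, else return the deterministic text.

    polish: hypothetical callable wrapping the AI provider.
    """
    for _ in range(max_attempts):
        try:
            candidate = polish(template_text)
        except Exception:
            continue  # a failed call counts as a failed attempt
        if numbers_preserved(template_text, candidate):
            return candidate, "ai_polished"
    return template_text, "deterministic_fallback"


template = "Conscientiousness T-score 62 (88th percentile): highly organized."


def good_polish(text):
    return text.replace("highly organized", "notably well organized")


def bad_polish(text):
    return text.replace("62", "65")  # illegally alters a score


print(generate_narrative(template, good_polish))
print(generate_narrative(template, bad_polish))
```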

The evidence engine: outcome over assertion

A narrative interview is only as valuable as what is done with it. Before any report is written, Stori converts the conversation into a structured evidence index: every concrete moment is extracted as a record carrying the verbatim quote, the context in which it was said, a specificity score, an outcome-versus-claim flag, and the facets and Big Five dimensions it evidences.

  • Specificity is scored. Names, numbers, dates and concrete results raise an item's weight; unanchored assertions (“I am results-driven”) carry almost none.
  • Outcomes outrank claims. A demonstrated artifact — a system built, a result delivered, a decision made — is weighted far above a self-description, because a body of specific, lived outcomes cannot be rehearsed the way an answer can.
  • Every statement carries a receipt. Each line in the final report traces to a tagged quote and its context, making the read auditable rather than asking the reader to trust a model.
  • The system abstains under thin evidence. Where the interview does not support a conclusion, Stori records low confidence and declines to assert one, rather than inventing a result.

The report is then composed only from this pre-vetted index, leading with the highest-specificity, outcome-backed evidence. Because the evidence is extracted and scored before anything is written, the same significant moments surface consistently, and demonstrated behavior — not interview polish — drives the result.
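The evidence index described above can be pictured as a list of records plus a weighting rule. In the sketch below, the field names follow the description (quote, context, specificity, outcome flag, facets, Big Five dimensions), but the 0-to-1 specificity scale, the outcome multiplier, and the abstention threshold are hypothetical values chosen for illustration. The quotes are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    quote: str                 # verbatim quote from the transcript
    context: str               # where in the conversation it was said
    specificity: float         # assumed 0.0 to 1.0 scale
    is_outcome: bool           # True for a demonstrated outcome, False for a claim
    facets: list = field(default_factory=list)
    big_five: list = field(default_factory=list)


OUTCOME_MULTIPLIER = 3.0  # hypothetical: outcomes outrank claims
MIN_TOTAL_WEIGHT = 1.0    # hypothetical: below this, abstain


def weight(item):
    """Specificity, boosted when the item is a demonstrated outcome."""
    return item.specificity * (OUTCOME_MULTIPLIER if item.is_outcome else 1.0)


def summarize_facet(index, facet):
    """Rank evidence for a facet, or abstain when the evidence is thin."""
    items = sorted((i for i in index if facet in i.facets), key=weight, reverse=True)
    if sum(weight(i) for i in items) < MIN_TOTAL_WEIGHT:
        return {"facet": facet, "confidence": "low", "evidence": []}
    return {"facet": facet, "confidence": "supported",
            "evidence": [i.quote for i in items]}


index = [
    EvidenceItem("I am results-driven", "opening self-description",
                 0.05, False, ["Driven"], ["Conscientiousness"]),
    EvidenceItem("We cut onboarding time from 14 days to 5 in Q3",
                 "describing a process rebuild",
                 0.9, True, ["Driven", "Accountable"], ["Conscientiousness"]),
    EvidenceItem("I like to think I'm brave", "closing remarks",
                 0.1, False, ["Courageous"], ["Emotional Stability"]),
]
driven = summarize_facet(index, "Driven")
courageous = summarize_facet(index, "Courageous")
print(driven)
print(courageous)
```

Because every reported line is drawn from a stored quote and its context, each statement in this sketch remains traceable to its source record, which mirrors the "receipt" idea in the text.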

12. A unified language for understanding people

The Stori Trait Assessment is not another hiring tool bolted onto the same broken process. It is a new measurement system designed from first principles:

  • A forced-choice assessment that reduces the biases of traditional self-report.
  • Two lenses on one shared taxonomy — a self-report Big Five profile from the questionnaire, and Stori meta-traits observed from the interview — that cross-validate each other.
  • A Narrative Method interview designed to surface authentic behavior, not rehearsed performance.
  • Interview highlights that tag behavioral evidence to the same facet language as the trait assessment.
  • An evidence engine that weights demonstrated outcomes over claims, with a receipt — a quote and its context — behind every statement.
  • A unified report that cross-references personality data with demonstrated behavior, so every score has evidence behind it.

The result is a candidate intelligence layer where traits, interviews, and evidence speak the same language — and where hiring decisions are based on measurement, not proxies.

Lexical base from the International Personality Item Pool (IPIP) — public domain. Scoring, dual-output mapping, interview intelligence, and visualization architecture are patent pending. © 2025 Stori Labs / Fraser Hill. All rights reserved.

This assessment is an interpretive tool grounded in established psychometric frameworks. It is not a clinical or diagnostic instrument.