TMM Founding Discussion: Bridging Theory, Modeling & Measurement

Participants

Sigert Ariens (KU Leuven)
Laura Bringmann (University of Groningen)
Markus Eronen (University of Groningen)
Andrew Heathcote (University of Amsterdam)
Craig Hedge (Aston University)
Maria Robinson (University of Warwick)
Adam Sanborn (University of Warwick)
Francis Tuerlinckx (KU Leuven)
Niels Vanhasbroeck (University of Amsterdam)
Wolf Vanpaemel (KU Leuven)
Kenny Yu (KU Leuven)

Key Themes

The Inescapability of Theory in Measurement

A central theme that emerged was that all measurement is inherently theory-laden—there is no such thing as a truly “theory-neutral” measure. Even researchers who claim to want assumption-free measurement are implicitly making theoretical commitments about cognitive processes; they simply aren’t being explicit about them.

This raises an important implication: rather than pretending we can escape theoretical assumptions, the field would benefit from making these assumptions explicit. A theory, in this sense, is simply a coherent articulation of the assumptions that are already present in any measurement approach.

The Spectrum from Thin to Thick Theories

The discussion surfaced an important distinction between what might be called “thin” and “thick” theories. Thin theories capture basic, robust phenomena—observations like “people respond more slowly when fatigued.” These are local, replicable patterns that provide a foundation for scientific work. Thick theories, by contrast, are elaborate conceptual frameworks that attempt to explain broader patterns.

Both have their place in psychological science. A thin theory can be a perfectly valid starting point, provided it identifies something robust and replicable rather than a one-off surprising finding. The field sometimes conflates these levels, expecting all theoretical work to resemble grand unified frameworks when more modest, well-grounded phenomena might be the appropriate unit of progress.

Questioning the Statistical vs. Cognitive Model Divide

A provocative thread throughout the discussion concerned the commonly drawn distinction between “statistical models” (regression, ANOVA, etc.) and “cognitive” or “theoretical” models. Does this distinction actually hold up under scrutiny? And if so, what does it buy us?

Several perspectives emerged. One view holds that the distinction has merit because cognitive models often contain non-linearities and structural constraints that can generate genuinely surprising predictions—outcomes the researcher couldn’t have anticipated before running the model. Linear statistical models, by contrast, are sufficiently flexible that they rarely surprise us.

A contrasting view emphasizes that even supposedly “atheoretical” statistical models require theoretical reasoning. Why assume independence between observations? Why assume normally distributed errors? These aren’t theory-free choices—they embed assumptions about the underlying psychological processes. The distinction may be less about the models themselves and more about how much effort has gone into articulating and testing the assumptions.

Perhaps the most useful way to draw the distinction is in terms of the stiffness of predictions. Cognitive models, when well-specified, make constrained predictions that can be clearly violated. They often impose ordinal constraints (for example, parameter A must exceed parameter B for the theory to make sense) that create real opportunities for falsification. Statistical models, by contrast, are flexible: given enough terms, they can accommodate almost any pattern.
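The contrast between stiff and flexible predictions can be made concrete with a small sketch. The example below is illustrative only and is not a model discussed by the participants: the condition means are made-up numbers, the "flexible" model is a polynomial with as many terms as conditions, and the "stiff" model is a hypothetical theory whose only prediction is that the means increase across conditions. It counts how many of the possible orderings of the same four means each model can accommodate.

```python
from itertools import permutations


def polynomial_through(xs, ys):
    """Return the unique polynomial of degree len(xs) - 1 through the points (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def satisfies_ordinal_constraint(means):
    """Hypothetical stiff prediction: means must strictly increase across conditions."""
    return all(a < b for a, b in zip(means, means[1:]))


# Illustrative values only (not data from any study).
conditions = [1, 2, 3, 4]
observed_means = [0.52, 0.61, 0.70, 0.78]

# Every way the same four means could have been arranged across conditions.
orderings = list(permutations(observed_means))

# The flexible model "fits" an ordering if it reproduces every mean exactly.
flexible_fits = sum(
    all(abs(polynomial_through(conditions, ys)(x) - y) < 1e-9
        for x, y in zip(conditions, ys))
    for ys in orderings
)

# The stiff model "fits" an ordering only if the ordinal constraint holds.
stiff_fits = sum(satisfies_ordinal_constraint(ys) for ys in orderings)

print(f"orderings the flexible model accommodates: {flexible_fits} of {len(orderings)}")
print(f"orderings the stiff model accommodates: {stiff_fits} of {len(orderings)}")
```

In this toy setup the flexible model accommodates all 24 orderings while the ordinal constraint permits exactly one, so only the stiff model could have been contradicted by the data. Real statistical models are rarely saturated in this way; the sketch exaggerates the contrast to show what "stiffness" means.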

The Hidden Life of Predictions

An important question arose: do cognitive models actually generate novel predictions in practice? The honest answer seems to be yes—but this process is largely invisible in published work.

In practice, the prediction-testing cycle often unfolds during peer review, where reviewers push authors to demonstrate that their model can account for phenomena beyond the original scope. Researchers also make informal predictions that guide where to look in data—whether to examine autocorrelations, particular parameter patterns, or specific experimental manipulations. When these predictions succeed, they validate the model’s structure; when they fail, they prompt revision.

The problem is that these prediction stories rarely make it into published papers. The field lacks conventions for documenting this process, creating a misleading impression that cognitive modeling is purely post-hoc curve fitting. Building a culture that values and documents genuine prediction would strengthen the field’s credibility.

The Gap Between Experimental and Individual-Difference Contexts

A recurring tension concerns what happens when models developed in experimental contexts are applied to individual differences. Parameters that have clear theoretical interpretations in controlled experiments may behave quite differently when used to characterize stable traits.

Consider drift rate in evidence accumulation models: in experimental contexts, it represents the rate of evidence accumulation—essentially processing speed. But when used to measure individual differences, drift rate correlates more strongly with accuracy than with reaction time, suggesting it may capture something closer to attention control or executive function. The parameter’s meaning shifts with context.
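To see why drift rate is tied to both accuracy and speed, it can help to look at one specific evidence accumulation model. The sketch below assumes the simplest unbiased Wiener diffusion model (two boundaries, starting point midway, fixed noise), for which accuracy and mean decision time have standard closed-form expressions. The choice of model, the function names, and the parameter values are assumptions made for illustration; the discussion did not single out a particular model.

```python
import math


def choice_accuracy(drift, boundary, noise=1.0):
    """Probability of reaching the correct boundary in an unbiased Wiener diffusion model."""
    return 1.0 / (1.0 + math.exp(-drift * boundary / noise**2))


def mean_decision_time(drift, boundary, noise=1.0):
    """Mean decision time (excluding non-decision time) in the same model."""
    if drift == 0:
        # Limit of the expression below as drift approaches zero.
        return boundary**2 / (4.0 * noise**2)
    k = drift * boundary / noise**2
    return (boundary / (2.0 * drift)) * math.tanh(k / 2.0)


# Hypothetical parameter values chosen only for illustration.
boundary = 1.5
drifts = [0.5, 1.0, 2.0, 3.0]

accuracies = [choice_accuracy(v, boundary) for v in drifts]
decision_times = [mean_decision_time(v, boundary) for v in drifts]

for v, acc, dt in zip(drifts, accuracies, decision_times):
    print(f"drift={v:.1f}  accuracy={acc:.3f}  mean decision time={dt:.3f}")
```

Within this model, a higher drift rate raises accuracy and shortens decision time at the same time, so reading drift rate purely as "processing speed" is already an interpretive choice. The sketch does not reproduce the individual-differences finding described above; it only shows that the parameter is linked to both observable quantities.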

This creates what might be called an “existential crisis” for individual-differences research using cognitive models. The theoretical grounding that makes these models appealing doesn’t automatically transfer when the goal shifts from explaining experimental effects to characterizing stable individual traits.

Toward Cumulative Science

How should science accumulate given all these challenges? One ideal would be a literature where researchers share not just findings but quantitative posteriors—probability distributions over parameters that could serve as informed priors for subsequent studies. This would create genuine cumulative knowledge, where each study builds quantitatively on previous work.
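The idea of a posterior serving as the next study's prior can be sketched with the simplest conjugate case. The example below assumes a single parameter with a normal prior, normally distributed observations with known noise variance, and two studies that measure exactly the same quantity. All numbers and names are hypothetical, and the "same quantity" assumption is a strong simplification.

```python
def update_normal(prior_mean, prior_var, data, noise_var):
    """Conjugate normal update for a mean with known observation noise variance."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Hypothetical values for illustration only.
noise_var = 0.25
vague_prior = (0.0, 100.0)          # (mean, variance) before any study
study_1 = [1.1, 0.9, 1.3, 1.0]
study_2 = [1.2, 0.8, 1.1]

# Study 1 starts from the vague prior and publishes its posterior.
posterior_1 = update_normal(*vague_prior, study_1, noise_var)

# Study 2 uses the published posterior from study 1 as its prior.
posterior_2 = update_normal(*posterior_1, study_2, noise_var)

# Reference: analysing both data sets together from the vague prior.
pooled = update_normal(*vague_prior, study_1 + study_2, noise_var)

print("after study 1:", posterior_1)
print("after study 2:", posterior_2)
print("pooled analysis:", pooled)
```

Under these assumptions the sequential route and the pooled analysis give the same answer, which is what makes shared posteriors a form of cumulative knowledge: the second study does not need the first study's raw data, only its posterior.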

This vision requires separating different sources of uncertainty: uncertainty about core theoretical parameters versus uncertainty about task-specific measurement properties. When researchers shift tasks or populations, they need explicit models of how task parameters relate to the underlying constructs of interest. Building this infrastructure is technically demanding but would represent a major advance over the current practice of treating each study as essentially independent.

The key insight is that the measurement properties of tasks deserve as much careful modeling as the psychological processes of interest. A generative model, one that can actually produce data resembling what we observe, abstracts away from task specifics to a level at which theoretical concepts can be stated, enabling genuine accumulation across contexts.

Looking Forward: Workshop Ideas

The discussion concluded with planning for an in-person workshop. Several compelling ideas emerged for what such a gathering might include:

Favorite and Least-Favorite Models: Participants would present a model they find compelling and one they find problematic, explaining what makes each succeed or fail. This “show and tell” format would help researchers from different subfields develop more nuanced appreciation—and appropriate skepticism—about approaches outside their expertise.

Assumption Archaeology: A structured exercise in identifying every assumption that goes into a measurement in a typical experiment, then reasoning about which assumptions are theoretically motivated versus merely auxiliary. This would make explicit the often-hidden theoretical commitments in empirical work.

Model Building Workshops: Training in how to actually construct cognitive models, potentially building on existing workshops in the field. The goal would be demystifying the modeling process for researchers who find it opaque.

Model Validation: Developing clearer frameworks for what it means to validate a computational model and what standards such validation should meet.

Key Takeaways

No measurement is assumption-free. Being explicit about theoretical assumptions is better than pretending they don’t exist.
The theory spectrum matters. Distinguishing between thin (phenomena-level) and thick (framework-level) theories helps clarify what we’re actually testing.
Model distinctions are fuzzy. The line between “statistical” and “cognitive” models is less clear than often assumed. Both require theoretical reasoning; the key difference may be in the stiffness of their predictions.
Prediction is happening, but invisibly. Cognitive models do generate and test predictions, but this process is poorly documented in the literature.
Context shapes meaning. Parameters that have clear interpretations in experimental settings may mean something quite different in individual-difference contexts.
Cumulative science requires infrastructure. Moving beyond isolated studies requires explicit models of how tasks relate to constructs, enabling posteriors to serve as priors.
We need diverse perspectives. Progress requires bringing together mathematical psychologists, empirical researchers, philosophers, and psychometricians: each group has its own blind spots, which the others can help illuminate.

This summary is based on meeting notes from the November 14, 2025 discussion. For questions or to get involved, contact the TMM network.