What is Fitness?
“The FitnessGram PACER Test is a multistage aerobic capacity test that progressively gets more difficult as it continues.”
If you went to school in the U.S. in the early 2000s, that sentence probably just triggered a mild stress response. The PACER test—along with mile runs, push-ups, and sit-ups—was part of FitnessGram, the school-based fitness assessment used in many American schools (Plowman & Meredith, 2013).
More broadly, fitness is often measured using a handful of standardized tests or metrics. These can include:
aerobic endurance (PACER or mile run)
muscular strength (push-ups, sit-ups)
flexibility
body composition
physiological estimates like VO₂ max from wearable devices
In theory, this appears to be a well-thought-out model of physical fitness. But researchers and educators have raised several concerns about whether these kinds of tests actually capture something as complex as athletic ability or health. Critics note that these assessments often rely on a small number of standardized drills, which may measure specific physical capacities while overlooking other important aspects of movement like coordination, agility, balance, or sport-specific skills (Young et al., 2021). Scholars have also argued that fitness testing can produce imprecise estimates of physical fitness and may not reflect broader physical abilities or long-term health outcomes (Harris & Cale, 2019; Krochmal et al., 2021; Lloyd et al., 2010).
Even more precise metrics, like VO₂ max, highlight a related issue: some tests estimate fitness from outcomes, while others directly measure underlying physiological processes, such as how the body uses oxygen during exercise. Different methods tell us different things—but none fully capture the whole picture.
In other words, these tests may provide a snapshot of certain components of fitness, but they do not fully capture how a body moves, performs, or improves over time.
Researchers studying morality face a similar challenge. How do you measure something as complex as moral thinking, empathy, and ethical decision-making? What counts as a good “test” of moral fitness—and how do we know it’s actually measuring what we think it is?
Those questions are exactly what leading moral psychology researchers will explore in an upcoming webinar on “Measuring Morality,” hosted by the Consortium on Moral Decision-Making on Monday, March 23rd!
Choosing Fitness with Dr. Daryl Cameron
One theme that emerges in discussions of assessing morality is the difference between capacity and motivation.
Research in moral psychology often asks whether people can empathize, recognize harm, or reason through ethical dilemmas. In other words, it asks questions about moral capacity. But another equally important question is whether people are actually motivated to engage those capacities in the situations where they matter.
Psychologist Daryl Cameron, who organizes the Consortium on Moral Decision-Making, often emphasizes this distinction. As he explains:
“There is a lot of focus on the ‘can you’ / capacity question… but another critical aspect is the metaphor of habit and practice, which is more about the ‘will you’ / motivation part.”
Cameron’s research on motivated empathy shows that empathizing with others can be cognitively and emotionally effortful (Cameron et al., 2019). People frequently have the ability to empathize, but they may choose to avoid situations where doing so would require sustained emotional engagement. In that sense, empathy is not just a capacity people possess—it is something people choose to engage or disengage depending on context and motivation.
Seen this way, moral engagement begins to look less like a fixed trait and more like a form of practice. Just as physical fitness depends not only on what our muscles are capable of but also on whether we consistently show up to train them, moral development may depend on whether people choose to engage empathy, reflection, and care when opportunities arise.
Measuring Fitness with Dr. Nick Byrd
A challenge in studying morality is making sure that researchers are actually measuring what they think they are measuring.
Philosopher-scientist Nick Byrd focuses on this foundational issue. His work examines how researchers design experiments to study reasoning and decision-making, and how the structure of those experiments shapes what we learn from them (Byrd et al., 2023). In many studies, researchers present participants with a stimulus—such as a moral dilemma—and record the final judgment participants report. But focusing only on those inputs and outputs can leave an important part of the process hidden.
As Byrd explains:
“Many social scientists pay attention only to inputs and outputs and ignore what happens in between: they show people stimuli and record people’s final decision. If we also observe processes that occur during decision-making (such as what people thought while thinking aloud or what they typed into a chat with another participant), then we can do better than wild speculation about what happened inside the black box of decision-making.”
Process-tracing approaches aim to open that “black box” by observing what happens while people are making decisions. Instead of only recording the final answer, researchers can examine how participants reason through a problem, what information they attend to, and how their thinking unfolds over time.
Seen through the lens of fitness, this distinction matters. Many training programs evaluate success based only on outcomes—weight lost, miles run, or strength gained—without observing how the training actually happened. But without seeing the process, it is difficult to know why a plan worked for some people and not others. Observing the process can reveal things like poor form, overtraining, or misaligned routines that would otherwise remain invisible.
Full Fitness with Me! (Jillian Lee Meyer)
One way I’ve been thinking about my own work in this space is through the idea of a holistic training plan.
If you walk into a gym and only train one muscle group—say arms—you might get stronger in that area, but you wouldn’t be building overall fitness. Good training programs make sure you’re hitting different muscle groups: strength, endurance, mobility, flexibility, recovery, and more. The goal is a balanced training of the whole system.
I’ve been approaching moral measurement in a similar way. Much of moral psychology focuses on one framework at a time, whether that’s cultural, survival, conceptual, or spiritual. My work tries to bring several of these traditions together into a single interactive task. Participants move through a series of everyday moral dilemmas and respond to tradeoffs across four interdisciplinary dimensions: autonomy vs. community, harm sensitivity vs. harm tolerance, utilitarian vs. deontological reasoning, and transcendent vs. situational meaning. Rather than isolating one “moral muscle,” the goal is to see how people navigate multiple moral considerations across situations.
The gamified task is designed to be exploratory and individualized. As participants move through the dilemmas, they begin to see patterns in how they approach different kinds of moral challenges. In that sense, the experience isn’t just about measurement; it’s also about reflection. Just as people experiment with different exercises and training styles to find what works best for their bodies, participants can explore how different moral perspectives resonate with them and challenge their assumptions.
Conclusion
Once, in a middle school gym class, we did a unit on Dance Dance Revolution, and I absolutely dominated. Suddenly I was the most athletic kid in the room. I remember wondering why that couldn’t be the official fitness test instead of things like running the mile or doing pull-ups.
Of course, the real issue is that no single test captures something as complex as physical fitness. Running measures endurance. Pull-ups measure upper body strength. DDR apparently measures rhythm and questionable early-2000s dance confidence. Each test reveals something different.
Measuring morality poses a similar challenge. Surveys, experiments, and behavioral tasks each capture different pieces of how people think about right and wrong. No single method tells the whole story, but together they can give us a more comprehensive picture.
If you’d like to hear more about measuring morality, join the Consortium on Moral Decision-Making webinar on “Measuring Morality” on Monday, March 23, from 1:30–5 PM ET!





