Yascha Mounk
The Good Fight
C. Thi Nguyen on Why Measuring Everything Ruins Everything

Yascha Mounk and C. Thi Nguyen examine what happens when life becomes a game—and the game stops being fun.

C. Thi Nguyen is a philosophy professor at the University of Utah. His latest book is The Score: How to Stop Playing Someone Else’s Game.

In this week’s conversation, Yascha Mounk and C. Thi Nguyen discuss why metrics both help and harm institutional decision-making, how game design principles can improve classroom learning, and whether some aspects of human life are inherently unmeasurable.

This transcript has been condensed and lightly edited for clarity.


Yascha Mounk: Tim Scanlon, the moral philosopher who’s been on this podcast in the past, really hates debate because he thinks it’s antithetical to philosophy. I wasn’t meaning to start there at all, but it occurs to me that perhaps the way genuine intellectual disagreement and argument relates to the codified form of debate is actually what your book is about.

You’ve been thinking a lot about the role of metrics in the world. There’s a simple argument for metrics and a simple argument against metrics. After having reflected on your work, I’ve come to perhaps the obvious conclusion that both of those are too simple.

We need some metrics in order to be able to have productive processes, to be relatively efficient, to make sure that we actually are achieving something. When you’re trying to bake bread for your neighborhood, whether at the end of this process you have two loaves of bread and most people starve or a hundred loaves of bread and most people can have breakfast makes an obvious difference.

On the other hand, there are all of these different areas in which the metrics we use end up either misdirecting our productive processes or robbing us of the joy that activity should actually entail. In the most straightforward form, in some forms of central planning, you might end up with 100 loaves of bread, but they’re actually inedible. That’s not well captured in the metrics. As long as the 100 loaves of bread come out of the oven, it doesn’t matter that nobody will actually be able to eat them.

In a broader sense, you talk a lot in your work about how you got into philosophy because you love ideas, and then you got caught up in the system of rating research journals and rating philosophy departments. You found yourself trying to do the kind of work that’s going to be prized by those journals and by those departments, even if that’s more narrow than the things you’re actually interested in.

Obviously there’s a controversy around teaching at the middle and high school level. We want to make sure that people learn to read and do math, but if you have overly narrow metrics for them, teachers might end up teaching to the test rather than to the real skill we’re trying to make sure students learn.

So how do we think about this paradox of metrics? We clearly need them, but the moment we adapt to them, there’s a huge danger that we get misdirected in our productive efforts or simply lose the joy of what made us take up philosophy or rock climbing or any number of activities in the first place.

Thi Nguyen: That’s a great place to start. I think you’re right. There are two fantasies that people have when they approach metrics. One fantasy is that the metric captures everything that’s important and we just need to optimize for it. The other fantasy is one that I had lived in in the past, which is that these things are just wholly terrible. They’re evil. They miss everything that’s important. We should just get rid of them and enter some kind of non-metricized utopia.

What I’ve come to think is that not only do metrics have a very powerful function and a very powerful cost, but those two are inextricably connected. Their good functioning is related to their cost. I started on this in part because I was the liaison officer for my philosophy department, a role in which I had to report learning outcomes. It’s very hard to report what you actually care about in philosophy. There aren’t a lot of good standardized tests that are accessible, understandable, and obviously legible.

Mounk: That’s true at two levels, by the way. What you might really care about is some understanding of deep issues in the world. That’s why philosophers are in the game. But even if you have a relatively utilitarian understanding of the purpose of studying philosophy as an undergrad, the purpose is that you become a really logical thinker who spots flaws in arguments and processes. Philosophers actually do end up doing very well if they go into all kinds of traditional areas of business, because they have that skill set. Measuring the first thing is impossible, but even measuring the second is really, really hard, right?

Nguyen: Yes. I remember one time I was running through this with a bunch of students in a tech ethics class, and we were talking about the metrics that were actually being used in the university. They were things like graduation speed and grade point average. Around the time I pointed out that my incentives were to get better student evaluations and to make my students’ grades go up, which I could do just by making the tests easier, one of my students raised their hand and said, “So I thought the original problem we were starting with was that university metrics didn’t capture subtler issues of virtue and community, but now it turns out they don’t even capture learning skills.”

I think when you start to worry about this stuff, there’s one standard response: these are bad metrics, this particular batch of metrics, but we should just find better ones. We need to do more refined work to find better metrics. I had this suspicion, and a lot of this book comes from trying to chase it down, that there were some areas of human life that do not metricize well or inherently cannot be measured at institutional scale.

I want to point out one thing here. There are two different questions. One is the question of whether some part of life is essentially unquantifiable or essentially untrackable by a very subtle and sensitive science. That’s not the question that I’m really interested in. I’m interested in what the typical measures are that will rise to the top of the attention of a large-scale institution.

Mounk: Explain to us, what is the specific thing that metrics should be doing and what is the universe of things that they shouldn’t be trying to do?

Nguyen: I think one of the clearest formulations I got was from the historian of quantification, Theodore Porter. This is the first moment where I felt like I was actually on the track of understanding what was going on.

He puts it this way—he was interested in why there’s this historical tendency of politicians and bureaucrats and administrators to reach for quantitative justification compulsively, even when the metrics were known to be bad. His explanation was something like this: there are two ways of knowing about the world, qualitative and quantitative. They’re good at different things. The problem comes from reaching compulsively for one, even when it’s inappropriate.

Qualitative reasoning is rich and subtle and context-sensitive, but it travels badly between people from different contexts because it requires a huge amount of shared context and shared background to understand. Standard example: I’m a professor, I write student evaluations. They’re long written paragraphs that address multiple dimensions and use the language of philosophy and the language of intellectual virtue or whatever to talk about what’s going on in the student essay.

These evaluations won’t be easily comprehensible to distant people, like people in the business school and the CS department, and their evaluations won’t be comprehensible to me. Crucially, they won’t aggregate well, specifically because they’re so multi-dimensional. Each person writing a qualitative evaluation can decide on the fly which dimensions matter. Do I want to talk about rigor or creativity or care or expressiveness? We don’t have to settle on the same one. So they don’t aggregate. All of this is useless to someone in HR who has to hire a student and is confronted with 5,000 resumes, each accompanied by written evaluations.

So what Porter says is that numbers in an institution are designed to combat these problems on multiple fronts. They combat them by pre-designing a shared, stable, context-invariant kernel: some chunk that is understood approximately the same way across the institution or across the world, and we stabilize it. To make it easily comprehensible across many contexts, we have to make it very thin.

Again, I think the obvious example is letter grades. Everyone agrees what an A means, a B means, a C means, approximately. So we can all collect information into the same bucket and it aggregates instantly. Porter’s insight here is that the very thing that makes metrics socially powerful is the design process that removes context and nuance—the thing that makes it travel. He calls this the portability theory.

Mounk: There’s a trade-off between the portability and the richness of detail and capturing. The more you capture, the less portable it is and the less legible it is. So you’re always going to be choosing between those two constraints.

Nguyen: Right, there are two dimensions under that. One is that the less shared context we need, the easier it is for information to travel. The other is that the more freedom people have to reinterpret an evaluation and decide in the moment what matters, the less it’ll be able to aggregate. So you eliminate both of those: we fix the dimensions of evaluation in advance, and we remove enough nuance to make it travel.

He has this beautiful moment, I think you in particular would like this, where he says something like, “information is a specific thing, a way, a kind of human understanding that’s been prepared to travel to distant strangers and be understood in different contexts”. I find that so profound, and I think that stabs right at the heart of this inescapable tension. The reason metrics have social power is that they’re de-nuanced and that’s not something that we can hope to get around.

Instead, we should expect that there’s this constant trade-off, and that the things that are most legible at scale will be the least nuanced, and we’re gonna have to use those to interact with each other. But we also have to be constantly aware that they’ve achieved that kind of social centrality precisely because they’re de-nuanced. Then it becomes hard, because those are exactly the terms that everyone can understand instantly. Again, as Porter points out, that was the design goal. We achieved the design goal.

Mounk: Then we understand what the designer is trying to achieve and also what it leaves out, and we can try to compensate for what is lost in other ways.

I have two thoughts that come out of the really rich things you’ve been saying. The first is that you sometimes see people who, by and large, hate the idea of metrics, and who have adopted systems of information sharing that explicitly don’t rely on them, fall back into the logic of metrics because those information flows inherently need it.

I’m thinking here of the fact that there is a kind of codified language in recommendation letters for graduate programs in the United States. Truth be told, I don’t know that codified language well enough, which is probably a problem for my students. But you write a rich letter of recommendation where you explain what’s so great about the student and why they could grow up to be an amazing scholar. And then in the last line, you’re supposed to say, “I recommend this student to you,” “I strongly recommend the student to you,” “I enthusiastically recommend this student to you.”

Basically, we’ve adopted this implicit set of norms where the modifier used there (whether you leave it out, or say strongly, or enthusiastically, or unreservedly, or adamantly; I don’t know what the damn adjectives are) has actually become a metric. That word is what carries the most important information. We’re pretending that we’re engaged in this deep act of qualitative description, but the thing that actually matters is basically a secret code for one, two, three, four, or five: a number expressing how exceptional this student is.

The other thing that I was thinking about is that there’s a basic problem in democratic theory, which is that you need rules and you need a constitution, but you also need norms. Some people say (I had a student who argued this once), “Well, if we need all of these norms for the system to survive, and everything breaks down unless people refrain from playing constitutional hardball of various kinds, why don’t we just formalize all of these rules?” But that’s never going to work. There is just no way to formalize all of the rules. Some norms are going to stay norms, unable to be hardened into rules in that kind of way.

There seems to be an analog here. You can probably improve metrics in various ways. There are definitely better metrics and worse metrics; some are terribly designed and some are pretty well designed. But we can never fully do without metrics, and metrics are always going to lose something about the world.

Nguyen: These are two incredibly different and incredibly important questions. Let’s talk first about your letter-of-recommendation example, which is so good, because I think that’s simpler. The second thing is something I really want to talk to you about, because to me it’s the most interesting thing about explicit rules that thinking about this stuff has thrown up.

But let’s go back to your letter. I think it’s such a good example. First, it’s a good example because it shows the twin needs. We have one need for richness and complexity and another need for clear, simple things that just cut through complexity. This is not an accident. This is the basic nature of communication between limited beings. We want richness, but we also have so little time.

I think this is also a feature. I’ve been talking just about the metric side, but I think so much about scoring systems and games and rankings, and one of the things we know is how incredibly motivating clear and simple numbers are as scores and goals. It is much easier to motivate yourself to run if you have a number that you’re trying to hit and you’re trying to make that number go up each time.

But the thing that I’m constantly worried about is—in the running case, I’m worried about just making that number go up forever without stepping back to wonder if that’s the right proxy. I think it’s very good for some people to do that for a while and then reflect: is this still the right target?

There’s something similar that happens with a letter of recommendation. I think there are two different ways to react to the case you describe. One way is to treat it as just the right thing: we have both clear, simple communication and a rich context, and both are accessible. When I do admissions, a lot of the time we use the simple language to do the first rough pass. Then, when we actually have to make decisions, we ignore that, look at the text, and say, okay, what we’re looking for is someone who has these qualities.

So the letter-of-recommendation system you’re talking about makes both kinds of information available. But you can also imagine institutions that are just going to isolate the simplified information and pull it out of context. I think a really easy way to see this happening is Rotten Tomatoes. A lot of times a movie review is rich and detailed and complicated. The publication puts the number rating at the bottom, because they don’t want you to look at the top and say, “It’s this.” They want you to read the text and then see the approximation.

Mounk: It aggregates all of the reviews, and it does so by taking a rich 1,500-word review saying this was good, but I was disappointed with that, et cetera, and just deciding: is it good or is it bad?

Nguyen: I think there are so many different ways of interacting with these systems. Some systems make the qualitative material really prominent, along with the balance between it and the numbers. Other systems are really good at extracting the quantitative and suppressing the rich information. That kind of informational ecosystem is what’s really important.

When you have both, you have a choice. But sometimes the system has done the pre-filtering for you, and the rich qualitative information is very inaccessible. Then the system encourages engaging only with the thin number.

Mounk: That’s very interesting. How should we think about both what makes for a good rather than bad metric and what makes for a system in which the metric has an appropriate role versus a system in which the metric is given too little role or too much role?

Nguyen: That’s a superb question. So it seems really doubtful that any metric is good or bad across the board. It’s highly contextual. Here’s one of my favorite examples. I think a lot of us know that BMI, or body mass index, is a terrible way to manage your health if you just try to make your BMI go down.

But that’s not what BMI was built for. BMI was built as a large-scale public health measure, a kind of litmus test. If BMI across an entire nation suddenly jumps five points, we know that something has happened.

Mounk: Or if it goes down a lot, which is not the problem today in the United States. If BMI suddenly craters, well, perhaps we’ve just discovered GLP-1 drugs, but most likely it’s because there’s a famine going on. So it’s really useful at that level.

Nguyen: Exactly. Used in a context where everyone knows exactly how rough a measure it is, just the start of an investigation, the first alarm bell that sends people looking to see whether anything is actually going on, there is no problem. Shifted to another context, where people treat it as the all-important goal that expresses an individual person’s health, it’s a terrible measure.

One way to think about what metrics are doing is that they give us objectivity in a very specific sense. This is where we get really close to thinking about the law. There are lots of senses of the term objective, but one sense is that something is repeatable across multiple actors. This is usually, I think, the standard in the law. Theodore Porter calls this legal objectivity. We want different people looking at the same case to apply the same rule the same way.

A standard example for me: what we actually care about in a lot of cases is intellectual and emotional maturity, but that doesn’t have legal objectivity. Eighteen years of age does have legal objectivity, so even though it’s imprecise, we use that.

I think a really common worry you hear right now is that metrics are deeply biased, and that’s true some of the time, but other times I think the deeper problem happens when the metric is perfectly objective but only tracks easily countable features.

A lot of the time, public health thinking concentrates on lifespan and mortality rate and doesn’t put into the equation things like happiness, community and tradition. I’m thinking about everything from decisions during COVID to recommendations about whether you should eat high-fat cheeses.

Again, it’s not to say that those numbers are false, or that they’re unimportant. But there’s this large-scale, systematic biasing where public attention gets shifted toward the things that are very easy to count in a legally objective manner.

Mounk: That’s a great distinction. I recently had Atul Gawande on the podcast, and he was talking about a very similar topic, which is that we are great at extending the lifespan by a few months at the end of people’s lives. But, first, most healthcare spending is on those few months of life. Second, the quality of life that people have in that time tends to be terrible. They are constantly in the hospital, and they are in significant pain. You are drawing out the process of dying, often much longer than patients would want it to be.

That is downstream from the fact that the metric of “is this patient alive?” or “is this patient dead?” is pretty easy to measure, despite some hard cases. The metric of “was it a good death or a bad death?” is much harder. On the whole, did life go better for living those extra eight weeks in pain, half-conscious in the hospital, or not? That is incredibly difficult to assess.

That is a very important consideration. Now, I want to think through a hierarchy of complaints about this. One quick thought about discrimination is that criticism often measures against a baseline of perfection rather than a baseline of available alternatives. I know that is what you had in mind, but there is a line of criticism of the SATs for admissions. I did not do standardized tests growing up. It is not part of European educational culture, so I find them slightly bewildering. All of the studies indicate that, yes, SATs are somewhat discriminatory against certain ethnic groups and certain sociocultural or socioeconomic backgrounds, but they are far less discriminatory than the alternatives.

Those alternatives include things like personal statements, for which you can now have an AI model do the work, or you can pay for the most expensive tutor, and which rely on having parents with the money to send you volunteering in Paraguay so you can generate a compelling story. The question is not whether they are biased in all kinds of ways. The question is whether they are more or less biased than the alternative we are going to get. To criticize them without that comparison is simplistic.

What I actually want to get at is a hierarchy of failure modes of metrics. The first failure mode is the stark one that is incredibly important in politics and economics: measuring how many screws come off the assembly line without measuring whether they are usable screws. That is how the Soviet economy went bust. That is important, but relatively obvious.

The second failure mode involves unintended consequences that are more subtle. One of my critiques of the American grading system, coupled with grade inflation, is that it makes students incredibly risk-averse. In a system where a really good grade pulls up your GPA and a really bad grade pulls it down, you can afford to take some classes where you might underperform because you can also take classes where you overperform. Your overall GPA is not affected that much. But in a system where good students get an A in every course and one C-minus completely wrecks your GPA, you cannot afford to take that course. Nobody intended grade inflation to make it riskier to take hard courses, but that is the impact. That is a more subtle failure mode, but still relatively obvious.

What is really interesting about your work is a third kind of failure mode. The moment there is a metric, it changes the nature of the activity. Liberating ourselves from the metric can make us rediscover the beauty of the activity itself. The example you give is your attitude toward rock climbing and how that changed over time. Perhaps you can talk us through that.

Nguyen: You have your finger right on, I think, the kind of philosophical tension that I find incredibly rich here. Let me first say that you asked a question a while back that I did not answer, which is what metrics are systematically good at and systematically bad at. A general characterization we can give is that large-scale metrics are systematically bad at two things. First, when they are required to be highly usable at scale and highly public, they systematically eliminate high-expertise and high-skill judgment. Second, because they have to be applied in a stable way, they are systematically bad at highly variable phenomena.

The philosopher Elizabeth Barnes has this great case, which I talk about in the book. One of the things she says is that health is the kind of thing that we are systematically not going to be able to measure. Her argument does not go via expertise. It goes via the claim that health is interest-relative. A healthy knee for an Olympian who needs four years of maximum performance and a healthy knee for someone who wants to walk pain-free for the rest of their life reflect different notions of health, because they are indexed to different interests. That is another whole category of things that are systematically hard for metrics to capture.

You asked about climbing, and this is really central for me. Part of the reason I wrote this book was that I ran into a puzzle that I did not know if anyone else was interested in, but I could not stop thinking about. I had written a bunch of stuff about games, and I had written that stuff because I had gotten frustrated with attempts to praise games as a kind of cinema, where all people talked about were the cutscenes, the dialogue, and the graphics. They did not talk about freedom or the quality of action. I really wanted to talk about how it felt, how rich the decisions were, and how interesting the decisions were.

One of the most interesting things I found was this moment from the great German board game designer Reiner Knizia, where he says, “The most important part of my game designer toolbox is the scoring system, because it sets the player’s desires.” I was a game player, and I thought that was exactly right. I was also a philosopher, and I thought, my God, that is so true and so deep, and so weird to put that so starkly.

What a game designer is doing is describing an alternate self with alternate desires and alternate abilities. You open up a rulebook, and it is like, okay, I am collecting sheep, or okay, I am killing my opponent, and you just want it.

Mounk: It is, in a way, a way of exploring a form of agency. What is it like to be somebody facing these constraints and having these goals? That is not what most people think when they start to play Settlers of Catan. But you are experiencing that modality of agency as you play Settlers of Catan. That is your argument, to be clear. I am echoing.

Nguyen: Yes. The way I ended up putting it was that the medium the game designer works in, as an artist, is the medium of agency itself. They are giving you different action abilities and different desires. This may be a too-geeky philosophy-of-art way to put it, but that is what I think is going on.

One of my favorite examples comes from the fact that I rock climb, but I also paint. I was thinking about how, when I am at the bottom of a cliff and I look up at it, my practical experience of the challenges of that cliff changes completely depending on whether my goal is to climb it or to paint it. Those goals completely change my relationship to all the parts of the cliff.

So half of what I had written was about how scoring systems are an essential part of this agency-bending, freeing, liberating practice of play. The other half was about exactly what you were talking about, and what we have been talking about so far: how metrics, as a kind of scoring system, narrow and sharpen our values down to a nub and make us unfree.

Mounk: There is a puzzle here. Why is it that, in one context, scores are what allow us to experience the joy of playing a game, and even deep forms of agency of different kinds, on your philosophical gloss, while in another context the KPIs your boss wants you to follow, even if you think they are stupid, are what make you feel so radically unfree in your job? Or why is it that the imperative to chase a high GPA to get a good job is what means that you do not take true pleasure in the process of your education?

Nguyen: What really pressed on me, thinking about it that way, was that for a long time I had the theory that scoring systems and institutions were bad because they were hyper-explicit: instead of allowing room for discretion, they wrote down exactly the application conditions. Then it took me about a year to realize that games are a precise counterexample. What enables the flexibility of plunging into alternate agencies in games is exactly that they have scoring systems that are hyper-explicit, that tell you exactly what you get points for. That is what makes it so easy to flow into them.

So I started thinking about my experience with rock climbing. Rock climbing is the thing that saved my soul during graduate school. I was burning out and depressed and working all the time when I started rock climbing. The interesting thing about rock climbing for me is that until I started, I was a very anti-physical person. I was against it. I thought it was dumb. I thought exercise was basically something you did to burn calories to lose weight. That was idiotic, but that is what I thought.

Rock climbing tuned me into something else, and it did so because of the scoring system. It told me to climb harder climbs inside a ranking system. Every rock climb has a community-established difficulty rating, and the internal scoring system is to climb harder routes. Interestingly, to climb harder routes I had to learn how to control and perceive my body. Over the course of that, I learned that movement was beautiful, and that subtle movement was incredibly beautiful, often in a way very similar to the beauty I found in philosophy, this kind of fine-grained subtlety, but also really different. I got that because of the rock-climbing scoring system.

After about five years, that scoring system stopped being useful to me, partially because I am just a mediocre climber with poor athletic ability. The scoring system was good as long as I could keep improving, but I plateaued. At a later stage of my life, as a parent and academic who is not that athletic, staying within the same scoring system started to destroy the joy of climbing. So I created an alternate goal set for myself. I started aiming to climb moderate routes as elegantly as I could, and I found again what I loved.

I think this is a microcosm of the complexity of scoring systems. They can profoundly help us by cluing us into an activity. One of the things that helped me understand this most was a book by Tal Brewer called The Retrieval of Ethics. He is a neo-Aristotelian ethicist. The Retrieval of Ethics is about how subtle the value of activities is, how easy it is to miss from the outside what matters, and how the only way to discover it is by doing the activity.

What you do, he argues, is go through a process of performing a rough and inept version of the activity, guided by a simple description. That helps you see more of what it is about. You then revise your sense of what is valuable and how to guide yourself, do it again, and arrive at something more refined. This happens in climbing, in philosophy, in cooking. Over time, you come to see the subtle value. But you need a blunt, simple version at first.

One of the upsides of clear scoring systems is that they are good starting conditions. They get us into the activity. But they are also inflexible and blunt. If you get stuck in a scoring system, you can lose sight of what matters. In that sense, they are similar to metrics. They are useful, clear, blunt, accessible systems. They get us into things quickly and are easy to communicate. But if you treat them inflexibly and get stuck in them, that is when the problems begin to accrue.

Mounk: Earlier we were talking about social problems, problems of incentivizing the right actions, of social allocation, and so on. Now it feels like the exact same tension is emerging from the perspective of engaging in an activity like rock climbing, cooking, knitting, or whatever it is that you have a passion for.

The poles of a complete absence of metrics and the tyranny of metrics end up being similarly undermining. If you have no metric at all and the bakery only produces two loaves of bread, that is not great. If you have the wrong metric and produce a hundred loaves of bread that are inedible, that is bad as well. In the same way, if you start weightlifting for the first time, as I did about nine or ten months ago, and you just lift a little bit every day, you do not get the joy of it. Part of the joy is seeing the progression you are making, pushing yourself. It is very hard to push yourself unless you keep some track of what your weight was ten days ago and whether you can add a few extra reps or jump up ten or twenty pounds for a session. You need the metric in order to experience joy.

But if you become obsessed with the metric when you are no longer progressing, and you are frustrated and no longer enjoying the activity, you also go wrong. You speak movingly about the beauty of climbing as being in tune with movement and elegance. If you forget that what you love is the elegance of movement and focus only on getting up the route, you lose something essential. So you need to cultivate a complex relationship to metrics.

I was also thinking about two archetypes. One is the striver student, whom you and I have taught many times. When I was a graduate student teaching for the first time, I met with my students at the beginning of term to get a sense of who they were. I asked one student how the term was going, and they said, “Not very well. I feel like I am falling behind on extracurriculars.” That was different from the attitude at my university when I was an undergraduate. We did a lot of theater because we enjoyed it. There was no sense of CV building. It may be that doing theater helped people later in their careers, but that was not the mindset.

When I got to Harvard for my PhD, that was very much the mindset of the students. Everything had to count. The sad thing is that they probably did love theater or whatever they were doing, but they were no longer doing it for that reason. They were doing it to fill a line on a CV, and that changed their relationship to the activity. I am sure it robbed them of some of the joy.

On the other hand, we all know the smart, talented person who has no goals. Maybe they come from a rich background, maybe they are just drifting. They play all the time, live in the moment, and then twenty years later they are no longer playing. They are stuck. A complete lack of any attempt to achieve in a measurable, regimented way does not usually produce a flourishing creative spirit in midlife. It often produces someone stuck with the psychology of a sixteen-year-old but the constraints of an adult.

So there is something interesting here. The more obvious points we were making about social systems also apply at the individual level. If I want to enjoy rock climbing, cooking, weightlifting, knitting, or whatever it is, I do want a metric, but I do not want the metric to override why I do the activity in the first place. It is a complicated thing to navigate. I do not say solve, because I do not think it is solvable.

Nguyen: I think we are surfing scoring systems. What you are really talking about, and you put it beautifully, is our complex relationship with clarity and with very intensely clarified expressions. It is another rendition of exactly what we were talking about with letters of recommendation. This is why games are so fascinating to me, not just as a player, but as a philosopher who thinks about human agency and reason.

A lot of this started because, in two different places, I saw the same distinction. When I was in graduate school, my advisor, Barbara Herman, a Kantian ethicist, once said in a graduate seminar, after I said something, “You’re failing to distinguish between a goal and a purpose.” I replied that those were just the same word. She said, “No. When you have your friends over for a night of cards, the goal is to win, and the purpose is to have fun.” You know, deeply, that unless you are a very strange person, if you lose but had a great time in that context, it was a good evening. It was not a wasted evening.

That distinction is extremely important, and games illuminate something fascinating about us. Our purpose is often to have fun, or to relax, but the only way to get there is by hyper-focusing on a goal. I climb to clear my head, and I cannot clear my head by trying to clear my head. I clear my head by climbing. The same is true of fishing. One reason I fish is that staring intensely at the surface of the water, searching for rising trout, creates a meditative stance that I cannot reach directly.

We need very clear goals to get us into mental states we cannot otherwise approach. Games are, in a sense, pre-packaged mental states. They say, “Do this,” and suddenly you are hyper-focused on balance. “Do this,” and suddenly you are focused on complex logical interplay, or geometric patterns in chess. They are pre-packaged.

That pre-packaged nature is both the strength and the weakness. The strength is that someone else has sculpted it, and we can just take it up. It is described so precisely. The weakness is that someone else made it.

This is the heart of the difference between games and metrics. I thought about this for years before arriving at an answer that finally satisfied me, and it is simple. You cannot easily change metrics. You can easily change games. If you do not like a particular game or scoring system, you can switch to another one, modify it, or house-rule it. Nothing in the social structure of games forces you to play Dungeons and Dragons or forces you to do CrossFit. There is an ecosystem of variation.

This is the most philosophical point for me. If I am playing Mario and I am not very good at it, and I play normally, while you are excellent and speed-run it, I might get 100 points and you get 10. If we ask how to compare those points, the answer is that we do not need to. There is no structural demand for cross-comparison in games.

That is the crucial difference with metrics. The power of metrics is that they enable cross-cutting, stabilized social interaction. To perform that function, they cannot be modified, detached, splintered, or house-ruled. If you do that, they lose their function.

Mounk: It’s interesting. That rings true to me. I was thinking of an example that explains why it rings true, or at least illustrates one way in which it rings true. Often, people start doing something as a hobby. They are good at it and they love it. It is one of the things that gives them the most joy in life. Then, because they are good at it, they realize they can monetize it in some way.

They can make a living off it, and at first they are excited. The thing they most love can become the thing they spend their waking hours doing. They can turn it into a job, and what could be better than that? You are an amateur musician and suddenly you realize you can live off doing music. Or you are really good at video games and suddenly it turns out that if you stream video games, you can make enough money to have a comfortable living. You are ecstatic.

In the process, though, you often encounter people who say that for the first two years they were happy, and then after five or ten years they start saying, “I’ve lost the joy in it.” One way of putting it is that it is no longer a hobby, it is a job. But I think the reason is that this distinction has been lost. Earlier, winning was the goal, but the purpose was to have fun. Now the purpose is to make an income.

So you have to win, because if you do not win, people stop watching your stream. You have shifted the nature of the activity in the process. When I think about some of the things I most enjoy doing, they are things that may have a professional character, but where it does not feel like that shift in metrics has happened. I need to finance running the podcast, but it is not my source of income. So for me, it is like this. I want people to listen to the conversation, and I am proud that we have built a good audience. Ultimately, though, what I care about is that I get an excuse to call you up and say, come have an interesting conversation with me for an hour and a half.

Nguyen: This is deeply true. I love the example of doing it for a living. I had been thinking of different examples, but that is a perfect one. There are two ways it hits. First, in a given game, we do not simply give ourselves over entirely to the local goal. We modulate it. When you are playing with friends, if you are not competitive enough, the game is not fun. If you are too hyper-competitive, it is not fun either. When you know what the purpose is, you can regulate your relationship to the goal rather than follow it all the way out. There are gaming contexts where, to have the most fun, you need to be brutal, because that is what the game is designed to demand. There are other games where, if you go all out, you wreck the experience. You can modulate your experience depending on the game.

The second point, which is the really interesting one in the context of jobs, is that we often modulate the goal itself. When we are playing games, because we are not locked into a particular one, we can change the goal as we go. This happens a lot in fishing. Sometimes I fish for easy fish. Sometimes I fish to catch a lot of fish. Sometimes I go fishing to try to catch one big fish. I can modulate the goal depending on my mood, the environment, or what feels fun that day. That is possible because the goal is not linked to something downstream.

Once it becomes a job, you are locked in. You cannot modify the goal or tailor it to yourself. There is a deep sense in which what makes games distinctive lies in the broader ecosystem of play. This is an eighteenth-century insight about aesthetics. Kant argues that aesthetics is the realm where we are free to think about things differently because we are not locked into a practical purpose. Play is the space where we can reconceive what we are doing.

There is a related distinction from R.G. Collingwood, the philosopher of art, between art and craft. Craft is when we know exactly what we are trying to do in advance. We know the specifications and the ingredients, and we are locked into a procedure. Art is when you discover what you are trying to do as you are doing it, and discover what you are thinking as you go. That is exactly what you are pointing to. When you can no longer play with the goal, you can no longer reconfigure the activity.

What you often see in people who manage to sustain an activity over a long period of time is that they change how they do it, so that it tracks their changing selves and their changing sense of what matters.

Mounk: One of the things that really struck me is a student who, I think, watched one of your videos. I do not think it was a student of yours. She realized, in a very typical story, that she was a high-achieving student who worked really hard and was successful at gaming all of the metrics that life threw at her, and that she was unhappy because that was not what she actually wanted to do.

I think she wrote on the background of her phone something like, “Is this the game you want to be playing?” It is a kind of self-admonition, a reminder to always ask that question. It is not a simplistic version of this idea, which is very popular, that says, “Stop playing the game. Reject all the metrics. Just go hang out with your friends on a beach.” That is not the question she is asking.

What she is saying is more interesting. Life is going to consist of playing games, not just fun games on a game night with friends, but games like wanting to achieve at some endeavor. I want to become a professional philosopher, and that will involve publishing in certain places and jumping through a certain number of hoops. Even if it is inevitable that you will have to play certain kinds of games in life, you need to make sure that you do not get so caught up in a particular game and its rules that you end up doing an activity you do not actually want to be doing. So the right move is not to stop playing games. It is to keep reflecting on whether you are still playing the game that you actually want to be playing.

Nguyen: I think one of the reasons the games framing has turned out to be powerful for a lot of people is that, in our hearts, we know that games are voluntary and involve some degree of choice. Framing activities as games highlights that feature. One response to a lot of this material is, “Look, I just can’t. There are metrics. They’re connected to money. I can’t help it. I’ve got to do the thing.” Sometimes that is true. Some incentive systems are unavoidable, at least for practical reasons.

But I think that, a lot of the time, we have more degrees of freedom than we realize, and we forget that. This framing is a reminder that even if you do not have total freedom in the practical world, you usually have some. And you should not give that up accidentally.

Those degrees of freedom can take different forms. Sometimes it is changing the way you do your work. Sometimes it is making a decision about which work you want to be involved in. Sometimes it is changing the degree to which you hold a goal in your heart.

I am an academic. I will always have to report my citation rates and publication rankings to administrators. Those numbers have to pass through me. But I still have a choice about whether I treat them as mere informational signals, or as something I hold in my soul, or even something I ironize, laugh at, or occasionally try to tank just to see. There are degrees of freedom we can exert.

Mounk: I have been trying to think through what some of my economist friends might say in response to this conversation. This applies more at the institutional level than at the individual level. When I go back to the simple example of a bakery, they would say, “We actually have a great system for that, and it is the price system. It is supply and demand.”

If your purpose is to make sure people are well fed, then as a bakery you can say, “We offer really cheap products. They are not amazing, but they are fine and healthy.” You can turn a profit that way. You set your price low and a lot of people buy it. If you want to create the best croissant in the Île-de-France, you price it much higher. Fewer people buy it and you produce fewer croissants, but you can still make a profit because of the higher price.

Economists would claim that this is actually the richest system of incentives we have, and the best way of balancing competing objectives. It is a very straightforward system that we use every day in social life without thinking much about it. You can get a dollar cup of coffee at Dunkin’ Donuts, or you can go to a third-wave or boutique coffee shop, maybe a fourth-wave coffee shop now, and pay eight dollars and fifty cents for a coffee. Those are two very different kinds of coffee, and they allow the people producing them to engage in different forms of agency and different ways of interacting with the world, while still making an income. So how do you think about the price system, and the system of supply and demand, in relation to all of this?

Nguyen: I have an answer to this that I really want to try out on you, because I am not an economist. You know so much more about economics than me, so let me see what you think about this. Actually, let me back up a little bit and take a running start, because I just made a connection I have never made before because of the way you framed this.

One of the first places I started thinking about this was with transparency metrics, and I was really struggling to understand them. I was reading Onora O’Neill’s lectures on trust, and she has this great line where she says that people think transparency increases trust, but it actually decreases trust, because transparency asks experts to explain themselves to non-experts, and they cannot. So they have to make things up, and that becomes deceptive.

I think even worse than that, public transparency works by setting targets that are comprehensible to everybody, which is great because you cannot hide bias and corruption. But it is also terrible insofar as the actually important things in some spaces require a lot of expertise to see. Transparency metrics often end up looking like judging arts funding by box office sales or judging philosophy education by how quickly someone gets a job, because those are very publicly accessible metrics.

What I ended up thinking was that transparency involves another one of these trade-offs. It eliminates bias and corruption, but it also eliminates sensitivity and expertise in the name of accessibility. I just now realized that the response to the economics and price system might be something very similar.

What prices emphasize are values that are highly accessible and cross-cutting. I was thinking about this in relation to a story I tell in the book, which is a fully true story about my wife’s grandparents’ house in San Diego. Her grandfather was a naval engineer, and he built hidden staircases, weird bookshelves that folded into the wall, and secret rooms. They had an enormous food garden, a carefully designed ecosystem of interlinked plants that produced incredibly rich vegetables.

When they died, no one in the family was living in San Diego, so the house was sold to flippers. The flippers removed the bookcases, removed the secret passageways, bulldozed the food forest, and replaced it with concrete and grass. The reason was that all that weirdness would not sell well on the market, because its appeal was peculiar.

That is a simplistic story, but the worry is that what prices emphasize are values that are highly accessible and legible to a very large number of people. They tend to miss very local, specific, and subtle values. In particular, the values of arts in small-scale communities with very developed aesthetic sensibilities are often radically underrepresented by price, because those values are hard to access outside the community.

I do not yet have a perfectly clear way to put this, but there seems to be something deeply similar between the worry about transparency and the worry about prices. Both depend on aggregating what is quickly available, and so they tend to emphasize values that are legible at scale while underrepresenting highly subtle values.

Mounk: First, the price mechanism and supply and demand are a form of metric. Everything we have said about metrics so far applies. They are useful and important in many ways, and we cannot do without them, but they also oversimplify things in important respects. Measuring what is the best movie by overall box office revenue, or even by the profit it made for investors, would be a misunderstanding of what an aesthetic judgment about a meaningful piece of art is. Not that I think the average Oscar winner in recent years is systematically very different from, or necessarily better than, the highest–box office movie. That is a different issue, and communities of experts can go badly wrong.

The second thing I would say is that there are features of the price mechanism that are probably superior to many other metrics we might care about, and this has to do with the croissant example. Because anyone can offer a product to anyone at any price, and it only takes a relatively small subsection of the public to purchase that good for it to be viable, the price system allows people to express different forms of value through a single mechanism.

When I go to Lidl and buy a cheap croissant, I am expressing that what I want is something that tastes fine, gives me a bit of pleasure, and fits within a constrained budget. When I go to a boutique bakery in Brooklyn and buy a croissant for ten times the price, I am expressing that this is a luxury indulgence for me. I am a foodie, and the slight difference in quality matters more to me than the monetary cost. I am willing to invest in that because I am motivated by fine distinctions of taste.

Through the same metric, the price system, you can have these two radically different expressions of value in a way that other metrics, including those created in academia partly out of mistrust of market mechanisms, often fail to do. Instead, you end up with weird quasi-markets. In philosophy, which is my field, you end up with certain kinds of work being valued because of a monolithic scheme of what has become fashionable or ossified in a small number of journals. That can be far more flattening.

Put another way, I regard you as one of the most creative and interesting philosophers writing today, and I think that in some respects you are more valued in the commercial marketplace than in academic philosophy. Your book is published by Penguin Press, which is also my publisher, a large mainstream press. In contrast, the work you are doing right now would probably struggle to be published in the top philosophy journals in the United States. I do not want to misstate things. You are respected in academic philosophy. But in some ways, the market works better here. You do not have to sell this book to everyone. If 0.1 percent of the U.S. population bought it, it would be a huge commercial success. That allows for a better mechanism of value expression than some of the alternative metrics.

Nguyen: I am not an economist, but I think I agree with two parts of what you said. One is that, as an aggregator, the price system is open to different expressions of value. The other is that it also suffers from the same information-compression problem as metrics, because it is one method of aggregating values that is lossy in all kinds of ways. It is easy to come up with a list of things that seem extremely overvalued on the market and incredibly important things that seem undervalued.

I think I need to think a lot more about exactly the form of that lossiness. I have been thinking more about metrics established inside institutions than about how prices work. On the ground, when you work with metrics, you can see lots of examples where the metric clearly misses what is important. Porter and others have given explanations of how rigid metrics fail in that way.

In the marketplace, I can also see plenty of examples of misses, especially with high-variance, subtle things. The croissant example is interesting. A croissant in Brooklyn may actually be appropriately valued. But then I look at other aesthetic terrains, like indie tabletop role-playing games. People pour their lives into creating completely lovely, astonishingly innovative things, and they might sell a handful of five-dollar PDFs to a tiny audience.

I am not an economist, so I do not have a clean explanation of the difference, but there is a pattern. It is the same high-subtlety, high-variance stuff. It requires a lot of background knowledge that is often concentrated in a very small community, and in those cases prices seem to miss the value.

A croissant in Brooklyn is different because a large portion of the market understands what makes that croissant valuable. When I point to other cases where prices miss, they tend to involve small subcommunities that care deeply about something idiosyncratic, where the value is not particularly public or accessible. I wonder whether, in the croissant case, what is really going on is that, through historical and cultural happenstance, that value is broadly appreciated and therefore priced appropriately, whereas in other cases it is not.

Mounk: That’s a great point. That stands to reason, right? If something is a highly sophisticated, aesthetically appealing thing but sufficiently niche, it may struggle to find a market. That is not something the price system is going to encourage. Then there are questions about whether that should appropriately remain the realm of hobbyists, or whether the state should support certain forms of activity like that.

I have in the background the difference between the art scene in the United States and the art scene in Germany. Germans are often horrified that in America everything is left to the market. Some Americans are horrified that in Europe taxpayers subsidize someone going to the opera at three hundred euros a ticket. The answer is probably somewhere in the middle. The more multiplicity of metrics you can have, the better.

I think there are things about various forms of the arts in the United States, including the performing arts, that market incentives misshape. I also think there is something strange, artificial, and stagnant about the arts in Europe, including theater, which I used to do in Germany, because they are selected by some mix of experts and political interests.

Nguyen: We are back at the same point again. I think it is super interesting that I keep finding the same issue with transparency metrics. In the background, one of the things I found really striking about James Scott’s Seeing Like a State is how it helped organize these thoughts for me about information compression in large-scale institutions. One of the central moves of the book is to alternate examples between global capital markets and centralized communist governments, and to show that the same kinds of problems arise in both cases.

The upside of transparency is that it can break through stuck, dogmatic systems. The downside is that it misses sensitive, local weirdness, and those two things go together. We see the same structure again and again. Even the strongest critics of quantification culture will usually admit that when you have a system that is deeply sexist, and no one wants to admit it or believes how bad it is, you sometimes need a blunt metric. You need something like, “Look, you employ five percent women, and women perform just as well on standardized measures.” That kind of brute measure can cut through corruption and bias and break a system out of a lock.

I think the same is true of markets. In some cases, when there is no market at all, institutions can get trapped in very narrow feedback loops and a single way of thinking. The market can break in and disrupt that. But on the other side, both transparency metrics and markets miss weird, subtle values. The same bluntness that lets them break into a space also causes them to flatten what matters inside it.

One of my persistent worries about transparency is that it is extremely hard, from the outside, to tell the difference between bias and subtle expertise. That problem shows up very clearly in aesthetic domains. One of the ways I started thinking about this years ago was through quantified scoring systems and wine culture, and more generally through art. It is very hard to tell the difference between a group of poseurs and a group of people who are genuinely responding to something you do not yet see.

For a long time, I thought hard bop jazz, Coltrane and that whole tradition, was just random fast noise, and that people who claimed to love it were hipster poseurs. Then I got it, and I realized I had been wrong the entire time. That experience stuck with me.

So yes, I think there is a fascinating structural similarity here. Large-scale, blunt systems like markets and transparency metrics can break open closed spaces and disrupt entrenched bias, but they do so in a way that simultaneously misses an enormous amount of subtlety. Both effects come together in the same move.

In the rest of this conversation, Yascha and Thi discuss how cooking combines both metrics and creativity, and recommendations for games. This part of the conversation is reserved for paying subscribers…
