I was thinking the other day about how strange linear programming duality is, and how great it would be if something like it applied in real life. This led me to thinking about how human beings optimize in practice.
I think a huge number of optimization problems at every level from public policy to personal decision-making can be framed as “Maximize A and B” where A and B are two values. Conflict arises when A and B compete and need to be traded off for each other.
The first key insight is:
People almost always implement “maximize A and B” as either “maximize A given B” or “maximize B given A,” and these are NOT the same strategy.
If someone is implementing “maximize A given B,” I’ll say they’re treating A as the objective and B as the constraint. It is important to note that even though the objective A may seem like the thing you’re working hardest on and care the most about because you’re trying to maximize it, the constraint B is actually the value you’re putting more weight on. That’s the second insight:
When you think you’re prioritizing A, you might actually be putting most of your energy into guaranteeing a different value B, and optimizing A with only the residual energy that remains.
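To make the asymmetry concrete, here is a minimal sketch with four hypothetical options, each given made-up scores on A and B. The option names, the scores, the `maximize_given` helper, and the constraint floors are all assumptions invented for illustration. The point is only that swapping which value is the objective and which is the constraint can change the answer.

```python
# Hypothetical options, each scored on two competing values A and B.
# Every number here is invented purely for illustration.
OPTIONS = {
    "P": {"A": 9, "B": 3},
    "Q": {"A": 7, "B": 6},
    "R": {"A": 4, "B": 8},
    "S": {"A": 1, "B": 10},
}

def maximize_given(options, objective, constraint, floor):
    """Return the option with the highest `objective` score among those
    whose `constraint` score is at least `floor` (None if none qualify)."""
    feasible = [name for name, s in options.items() if s[constraint] >= floor]
    if not feasible:
        return None
    return max(feasible, key=lambda name: options[name][objective])

# "Maximize A given B": B has to clear a high bar first, A gets the leftovers.
a_given_b = maximize_given(OPTIONS, objective="A", constraint="B", floor=8)

# "Maximize B given A": the roles are swapped.
b_given_a = maximize_given(OPTIONS, objective="B", constraint="A", floor=7)

print(a_given_b, b_given_a)
```

With these invented numbers, the first strategy settles on R and the second on Q. Note that "maximize A given B" ends up with a fairly low A, because the constraint on B ruled out every option where A was high.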
I have taken a good number of college math classes, and I would roughly divide the pedagogy into three categories, based on what the lectures seem to be optimizing for out of (A) student understanding and (B) material covered.
Classes in category 1 (common among large introductory courses like linear algebra or real analysis) feel as if they’re designed to make sure the median student understands all the material. Examples are copious, homework exercises are comprehensive, and each important argument or tool is practiced deliberately and with spaced repetition. The revealed preference of the lecturer is “Maximize material covered conditioned on student understanding.”
Classes in category 2 (common among upper-level graduate courses) feel as if they’re designed to cram as much of the instructor’s pet topic into a semester as humanly possible. Homework is sparse if it exists, while details, proofs, and entire months of intermediate background material are skipped or brushed under the rug. By the end of the course, the number of students not completely lost is between 0 and the number of instructor’s doctoral students taking the class, inclusive. Usually, the lecturer is both blissfully unaware that nobody is following and perfectly happy to slow down for questions and fill in details when prodded. So they clearly care about student understanding at some level. The problem is that they skip five steps for every one covered, and even generously filling in one or two of those steps helps almost nobody. The revealed preference of the lecturer is “Maximize student understanding conditioned on covering all the material.”
The third – and possibly largest – category is an uncanny middle ground between these two extremes.
Take a Data Structures class I sat through in some previous life. Before the first two midterms, we met all the usual suspects – BSTs, hashtables, suffix arrays – the stuff techbros memorize to pass Google interviews and never touch again. Once or twice the instructor gets a bit of color in his cheeks and does something a little risqué like put a BST inside a hashtable, but on the whole you can follow along by watching the lecture videos at 3x speed with Katy Perry playing in the background.
Well, I’m zooming along happily and then right after the second midterm, a switch flips. The instructor has covered all the “Data Structures 101” and has six lectures left to introduce us to the bountiful fruits of modern research. You can almost see him giddily preparing lecture notes the night before and bashfully remarking, “oops, this part needs a whole two lectures on circuit complexity to make sense, teehee.” The fraction of students who are nodding along excitedly in lecture drops off a cliff.
This kind of sharp phase transition has happened to me enough times that I’m kind of numb to the process. I almost know from day one that at some point lectures will suddenly stop making sense, even if I loved the lecturer’s style at the beginning. Classes in category 3 (which tend to be upper-level undergraduate courses or introductory graduate classes) start out “maximize material covered conditioned on student understanding”, and then BAM! experience a sharp transition around the two-thirds mark into “maximize student understanding conditioned on material covered.” In Algebra 1, the lecturer covers rings, fields, and a smattering of Galois theory, and then runs out of patience and suddenly starts preaching the mAgIcAl LaNgUaGe Of ScHeMeS. An exquisite course on Riemann surfaces runs adrift after the second midterm into the dynamics on moduli spaces of nonorientable genus something somethings.
And the sad thing is, I really understand where these lecturers are coming from. After all, a human being can only optimize for one thing at once.
The criminal justice system primarily cares about two things: (A) doing bad things to guilty people, and (B) not doing bad things to innocent people. For almost all of human history, the default optimization protocol was “maximize B given A,” in other words, “guilty until proven innocent.” This kind of thinking is built into us: we would rather wipe out villages of extra innocents than let dangerous criminals or enemies go free. Almost every culture has ancient concepts of original sin or guilt by association. In Chinese literature, the bad guys’ catchphrase is “斩草除根” (when cutting grass, pull out the roots), which is usually used to justify killing the good guy’s children to prevent them from retaliating when they grow up. Murder some ten-year-olds just to be safe. After all, it’s the humble thing to do.
At some momentous inflection point in history, the fundamental legal axiom flipped to “innocent until proven guilty.” The switch between these two optimization protocols, which are superficially doing the same thing, “maximize A and B,” was possibly the most important and unlikely step ever made in the advance of human civilization. “Innocent until proven guilty” affirms the principle that an individual human being has intrinsic value, and that we cannot murder someone just to be safe. What it means, unfortunately, is that we let scumbags and criminals go all the time, and this is by design. If you think this was an easy principle for human beings to agree upon, you have not met human beings.
A diagnostic cancer test primarily cares about two things: (A) telling cancer patients they have cancer, and (B) not telling healthy people they have cancer. In a world where technology is not perfect and we have to trade off some amount of A against some amount of B, the medical profession uses the protocol “maximize B conditioned on A.”
This is not as trivial a choice as it might seem – remember that one probability problem about false positive rates they ask on every standardized test? Even if the false positive rate is only 1%, most positive diagnoses will be false positives, because very few people have cancer and very many do not. But it’s still worth it – it’s much, much more important that every early cancer patient is diagnosed correctly than that healthy people don’t get scared and inconvenienced, even if we scare a huge number of such people.
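The false positive arithmetic can be sketched with Bayes' rule. The 1% false positive rate comes from the paragraph above. The 0.5% prevalence and the perfect sensitivity (every real case is caught, matching the "guarantee A first" protocol) are assumptions chosen only to make the numbers easy to follow, not real medical statistics.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that a person who tests positive actually has the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Assumed inputs: 0.5% of those tested have cancer, the test never misses
# a real case, and 1% of healthy people are wrongly flagged.
ppv = positive_predictive_value(
    prevalence=0.005,
    sensitivity=1.0,
    false_positive_rate=0.01,
)

print(f"Share of positive results that are real: {ppv:.1%}")
```

Under these assumed inputs, only about a third of positive results are real, so roughly two out of three people who get a scary result are healthy.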
An immigration policy cares about two things: (A) letting good people in, and (B) keeping bad people out. There was a point in the history of the North American continent where the immigration policy was entirely open, ignoring B altogether. This was an unmitigated disaster for the Americans of the time, as European immigrants came in with their guns, germs, and steel and wiped out 90% of the native population. In recent history, it seems like the opposite policy is the case, “maximize A conditioned on B,” but it is a huge source of controversy because we cannot agree on which of A or B should be the objective and which should be the constraint. Merely saying both sides care about A and B does nothing to solve the problem.
Here’s a parable about the kind of person I am. A psychologist once gave five-year-old me an infinite marshmallow test: “For each 15 minutes you wait, you get one more marshmallow at the end!” Legend says I’m still waiting in that room.
Of course, the marshmallow test is not mostly about impulse control or delayed gratification, as it’s usually sold. It’s about being willing to sacrifice (A) your own comfort to (B) pass other people’s tests and get their approval. I was always very much willing to play the game “maximize A conditioned on B” – when I could laze out and be comfortable I would, but only after guaranteeing I’d pass the test.
I spent a lot of time as a child being alternately confused about or contemptuous of other kids who didn’t do as well at tests, especially when they claimed to be “doing their best.” It seemed to me that “doing your best” means passing the test at all costs, and it was glaringly obvious to me that every single other student could do that, especially given how easy the tests were. It took a long time for me to realize that “do your best” actually meant “maximize B conditioned on A” – don’t mutilate yourself to get other people’s approval – and even longer to understand that this might actually be right.
I wanted to conclude this essay by making sweeping generalizations about human psychology, but then I realized that I’m still not confident the phenomenon I’m describing is real. Here are the claims I’d like to make:
- All hard decisions involve tradeoffs between (at least) two competing values.
- Instead of treating competing values as roughly equal in weight, usually human beings will weigh one WAY more than the other, so in practice “maximize A and B” rounds off to “maximize A conditioned on B.”
- Often this is the correct behavior, even if it is surprising. Usually one of A or B will actually be several orders of magnitude more impactful than the other.
- Sometimes this is the incorrect behavior but people still do it because human beings can only optimize one function at a time.
- Many interpersonal conflicts occur because one person is trying to solve “maximize A given B” and the other is trying to solve “maximize B given A” and each thinks they’re solving the same problem as the other person, just in a better way.
- We need to learn to maximize functions like A+B.
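As a rough sketch of what maximizing something like A+B could look like, here is a weighted sum over four hypothetical options. The option names, the scores, the `best_weighted` helper, and the weights are all invented for illustration. The idea is that a moderate weight picks a compromise, while a lopsided weight behaves much like treating one value as a hard constraint.

```python
# Hypothetical options scored on two competing values A and B.
# Every number here is invented purely for illustration.
OPTIONS = {
    "P": {"A": 9, "B": 3},
    "Q": {"A": 7, "B": 6},
    "R": {"A": 4, "B": 8},
    "S": {"A": 1, "B": 10},
}

def best_weighted(options, weight_a):
    """Return the option maximizing weight_a * A + (1 - weight_a) * B."""
    def score(name):
        s = options[name]
        return weight_a * s["A"] + (1 - weight_a) * s["B"]
    return max(options, key=score)

balanced = best_weighted(OPTIONS, weight_a=0.5)  # A and B count equally
mostly_a = best_weighted(OPTIONS, weight_a=0.9)  # A weighed WAY more than B
mostly_b = best_weighted(OPTIONS, weight_a=0.1)  # B weighed WAY more than A

print(balanced, mostly_a, mostly_b)
```

With these made-up scores, equal weights pick the compromise Q, while the lopsided weights pick the extremes P and S.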
Thoughts, examples, counterexamples?