"Everything can be made radically elementary." ~Steven Rudich

Where do your eyes go?

I. Prelude

When my wife first started playing the wonderful action roguelike Hades, she got stuck in Asphodel. Most Hades levels involve dodging arrows, lobbed bombs, melee monsters, and spike traps all whilst hacking and slashing as quickly as possible, but Asphodel adds an extra twist: in this particular charming suburb of the Greek underworld, you need to handle all of the above whilst also trying not to step in lava. Most of the islands in Asphodel are narrower than your dash is far, so it’s hard not to dash straight off solid ground into piping-hot doom.

I gave my wife some pointers about upgrade choices (*cough* Athena dash *cough*) and enemy attack patterns, but most of my advice was marginally helpful at best. She probably died in lava another half-dozen times. One quick trick, however, had an instant and visible effect.

“Stare at yourself.”

Watch your step.

By watching my wife play, I came to realize that she was making one fundamental mistake: her eyes were in the wrong place. Instead of watching her own character Zagreus, she spent most of her time staring at the enemies and trying to react to their movements and attacks.

Hades is almost a bullet hell game: avoiding damage is the name of the game. Eighty percent of the time your eyes need to be trained on Zagreus’s toned protagonist butt to make sure he dodges precisely away from, out of, or straight through enemy attacks. In the meantime, most of Zagreus’s own attacks hit large areas, so tracking enemies with peripheral vision is enough to aim your attacks in the right general direction. Once my wife learned to fix her eyes on Zagreus, she made it through Asphodel in only a few attempts.

This is a post about the general skill of focusing your eyes, and your attention, on the right place. Instead of the standard questions “How do you make good decisions based on what you see?” and “How do you get better at executing those decisions?”, this post focuses on a question further upstream: “Where should your eyes be placed to receive the right information in the first place?”

In Part II, I describe five archetypal video games, each distinguished in my memory by the different answer it taught me to “Where do your eyes go?” From these I derive five general lessons about attention-paying. Part II can be safely skipped by those allergic to video games.

In Part III, I apply these lessons to three specific minigames that folks struggle with in graduate school: research meetings, seminar talks, and paper-reading. In all three cases, there can be an overwhelming amount of information to attend to, and the name of the game is to focus your eyes properly to perceive the most valuable subset.

II. Lessons from Video Games

Me or You?

Hades and Dark Souls are similar games in many respects. Both live in the same general genre of action RPGs, both share the core gameplay loop “kill, die, learn, repeat,” and both are widely acknowledged to be among the best games of all time. Their visible differences are mostly aesthetic: for example, Hades’ storytelling is more lighthearted, Dark Souls’ more nonexistent.

But there is one striking difference between my experiences of these two games: in Hades I stared at myself, and in Dark Souls I stared at the enemy. Why?

One answer is obvious: in Dark Souls, the camera follows you around over your shoulder, so you’re forced to stare at the enemies, while in Hades the isometric camera is centered on your own character. This is good game design because the camera itself gently suggests the right place for your eyes to focus, but it doesn’t really explain why that place is right.

The more interesting answer is that your eyes go where you need the most precise information.

In both games, gameplay centers around reacting to information to avoid enemy attacks, but what precisely you need to react to is completely different. Briefly, you need spatial precision in Hades, and temporal precision in Dark Souls. 

In Hades, an enemy winds up and lobs a big sparkly bomb. The game marks where it’ll land three seconds later as a big red circle. You don’t need to know precisely when the bomb was lobbed and by whom – getting out of the red circle one second early is fine. But you do need to see precisely where it’ll land so you can dash out of the blast zone correctly. When there are dozens of bombs and projectiles flying across the screen, there might be only a tiny patch of safe ground for you to dash to, and being off by an inch in any direction spells disaster. So you center your vision on yourself and the ground around you, to get the highest level of spatial precision about incoming attacks.

In Dark Souls, a boss winds up and launches a three-hit combo: left swipe, right swipe, pause, lunge. As long as you know precisely when it’s coming, where you’re standing doesn’t matter all that much – the boss’s ultra greatsword hits such a huge area that you won’t be able to dash away in time regardless. Instead, the way to avoid damage is to press the roll button in the right three 0.2-second intervals and enjoy those sweet invincibility frames. The really fun part, though? The boss actually has five different attack patterns, and whether he’s doing this particular one depends on the direction his knees move in the wind-up animation. So you better be staring at the enemy in Dark Souls to react at precisely the right time.

Human eyes have a limited amount of high-precision central vision, so make it count. Don’t spend it where peripheral vision would do just as well.

Present or Future?

Rhythm games have been popular for a long time, so you’ve probably played one of the greats: Guitar Hero, Beat Saber, Osu!, piano. Let’s take Osu! as a prototypical example. The core gameplay is simple: circles appear on the screen to the beat of a song, and to earn the most points, you click them accurately and at the right rhythm. Harder beatmaps have smaller circles that are further apart and more numerous; a one-star beatmap might have you clicking every other second along gentle arcs, while a five-star map for the same song forces your cursor to fly back and forth across the screen eight times a second.

There’s one key indicator that I’m mastering a piece in a rhythm game: my eyes are looking farther ahead. When learning a piano piece for the first time, I start off just staring at and trying to hit the immediate next note. But as I get better at the piece, instead of looking at the very next note I have to play, I can look two or three notes ahead, or even prepare for an upcoming difficulty halfway down the page. My fingers lag way behind the part I’m thinking about.

Exercise: head on over to https://play.typeracer.com/ and play a few races, paying attention to how far ahead you can read compared to what you’re currently typing. I predict that with more practice, you’ll read further and further ahead of your fingers, and your typing will be smoother for it. It’s a quasi-dissociative experience to watch yourself queue up commands for your own body two seconds in advance.

Act on Information

As a weak Starcraft player (and I remain a weak Starcraft player), I went into every game with the same simple plan. Every game, I’d build up my economy to a certain size, and then switch over to producing military units. When my army hit the supply cap, I’d send it flooding towards the enemy base.

At some point, I heard that “Scouting is Good,” so at the beginning of each match I’d waste precious resources and mental energy sending workers to scout out what my enemy was doing. Unfortunately, acquiring information was as far as my understanding of scouting extended. Regardless of what I saw at the enemy base, I’d continue following my one cookie-cutter build order. At very best, if I saw a particularly dangerous army coming my way, I’d react by executing that build order extra urgently. This amounted to the veins in my forehead popping a bit more while nothing tangible changed about my gameplay.

To place your eyes in the right place is to gather the right information, and the point of information-gathering is to improve decision-making. Conversely, the best way to improve at information-gathering is to act on information. If you don’t act on information, not only do you not benefit from gathering it, you do not learn to gather it better. If I went back to learn scouting in Starcraft again, I’d start by building a flowchart of what choices I’d change depending on the information I received.

Filter Aggressively

I was introduced to Dota 2 and spent about four hundred hours on it in the summer of 2020 (of course, this makes me an absolute beginner, so take this part with a healthy pinch of salt). Dota is overwhelming because of the colossal amount of information and systems presented to you – hundreds of heroes and abilities, hundreds of items and item interactions, and multitudes of counterintuitive but fascinating mechanics that must have gone through the “it’s not a bug, it’s a feature” pipeline.

To play Dota is to be constantly buffeted by information. I watch the health bars of the enemy minions to last hit them properly, or I won’t make money. I watch my own minions to deny them from my opponent. I track my gold to buy the items I need as soon as I can afford them. I pay attention to the minimap to make sure nobody from the enemy team is coming around to gank. I watch my team’s health and mana bars, and the enemy team’s, to look for a weak link or opportunity to heal. I can click on the enemy heroes to look at their inventories, figure out who is strong and who is weak, and react accordingly. And this might all be extraneous information: maybe the only important information on the screen is the timer at the top of the screen which says the game started 1 minute and 50 seconds ago.

The clock might be the most important information on this screen.

To understand why the game timer might be the most decision-relevant information out of all of the above, you have to understand a particularly twisted game mechanic in Dota, the jungle monster respawn system. You see, the jungle monsters in Dota each spawn in a rectangular yellow box that you can make visible by holding ALT. The game is coded such that, every minute, a monster respawns if its spawn box is empty. You read that right – the monsters don’t have to die to respawn, they just have to leave the box. Exploiting this respawn mechanic to make many copies of the same monster is called “stacking,” and is a key job for support players: if you attack the jungle monsters about 7 seconds before the minute, they’ll chase you just far enough that a duplicate copy of the monster spawns. This means that near the beginning of the game, a good support player can “stack” two, three, or even four copies of a jungle monster for his teammates to kill later, even if nobody on the entire team is strong enough to fight a single one directly. Fifteen minutes later, leveled-up teammates can come back and kill the entire stack for massive amounts of gold and experience.

Stacking is further complicated by an endless litany of factors, but the most interesting is probably this: the enemy can easily disrupt your stacks. The game code only checks if the yellow spawn box is empty, not what’s inside the box. A discerning opponent can foil your whole stacking game-plan just by walking into the appropriate box at the 1:00 mark and standing there for a second. More deviously yet, he might buy an invisible item at the beginning of the game to drop in the box before you even reach the area.
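For the programmers in the audience, here is a toy sketch of the spawn rule as I've described it. It is not Dota's actual code: the Camp class, the chase_seconds parameter, and the ten-second chase are all made up for illustration, and the model pretends the monsters leave the box the instant they are attacked.

```python
from dataclasses import dataclass


@dataclass
class Camp:
    """Toy model of one jungle camp (hypothetical, not Dota's real logic)."""
    monsters: int = 1         # monsters that belong to this camp
    away_until: float = 0.0   # game time (seconds) until which they are outside the box

    def pull(self, now: float, chase_seconds: float) -> None:
        """Attack the camp: its monsters chase you and stay out of the box for a while."""
        self.away_until = now + chase_seconds

    def box_is_empty(self, now: float, blockers_in_box: int = 0) -> bool:
        """The rule only asks whether anything is in the box, not what it is."""
        monsters_inside = 0 if now < self.away_until else self.monsters
        return monsters_inside + blockers_in_box == 0

    def minute_tick(self, now: float, blockers_in_box: int = 0) -> bool:
        """Called at each minute mark: spawn a new monster only if the box is empty."""
        if self.box_is_empty(now, blockers_in_box):
            self.monsters += 1
            return True
        return False


# Pull about 7 seconds before the minute: the box is empty at 1:00, so the camp stacks.
camp = Camp()
camp.pull(now=53, chase_seconds=10)
stacked = camp.minute_tick(now=60)

# Same pull, but an enemy (or a dropped item) sits in the box at 1:00: no spawn.
blocked = Camp()
blocked.pull(now=53, chase_seconds=10)
was_blocked = not blocked.minute_tick(now=60, blockers_in_box=1)
```

Notice that the only input that changes the outcome is the clock reading at the moment you pull, which is the whole point of this section.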

Anyhow, some support players have a job to do every minute, which is either stacking or a closely related thing called “pulling.” When they’re gone doing this job, this leaves the hero they’re supporting vulnerable to a two-on-one. This is where the game timer comes in: the enemy support leaving the lane might be my best opportunity to be aggressive and land an early kill. And so, out of all the fancy information on the screen, I need to be checking the game timer frequently. Some early game strategies hinge on correctly launching all-out attacks at 1:50, and not, say, 1:30.

Treat the world as if it’s out to razzle dazzle you, and your job is to get the sequins out of your eyes. Filter aggressively for the decision-relevant information, which may not be obvious at all. 

Looking Outside the Game

There is a certain class of video games that are difficult, if not impossible, to play without a wiki or spreadsheet open on a second monitor. This can be due to poor game design, but just as often it’s the way the game is meant to be played for good reason, and it’s the mark of an inflexible mind to refuse to look outside the game when this is necessary.

Consider Kerbal Space Program. You can learn the basics by playing through the tutorials, and can have plenty of fun just exploring the systems. But unless you’re a literal rocket scientist you’ll miss many of the deep secrets to be learned through this game. There’s no way you’ll come up with the optimal gravity turn trajectory yourself. If you don’t do a little Googling or a lot of trial-and-error, your attempts at aerobraking will probably devolve into unintentional lithobraking. You’ll have a nightmarish time building a spaceplane without knowing the correct relationship between center of mass, center of lift, and center of thrust, and it’s highly unlikely you’ll figure out that a rocket is a machine that imparts a constant amount of momentum instead of a constant amount of energy, or that you can exploit this fact by accelerating at periapsis. And god forbid you try to eyeball the perfect transfer window in the next in-game decade to make four sequential gravity assists like Voyager 2.
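The periapsis trick is easier to believe with numbers in hand. Here is a minimal sketch, assuming a burn that adds a fixed delta_v along the direction of travel and ignoring everything else (gravity during the burn, changing mass). The speeds are made-up round numbers, not values from the game.

```python
def kinetic_energy_gain(speed: float, delta_v: float) -> float:
    """Kinetic energy gained per kilogram (J/kg) by speeding up from speed to speed + delta_v."""
    return 0.5 * (speed + delta_v) ** 2 - 0.5 * speed ** 2


# The same 100 m/s burn, performed at two different points of an orbit.
slow_burn = kinetic_energy_gain(speed=1_000.0, delta_v=100.0)  # far from the planet, moving slowly
fast_burn = kinetic_energy_gain(speed=8_000.0, delta_v=100.0)  # at periapsis, moving fast

print(slow_burn)  # 105000.0
print(fast_burn)  # 805000.0
```

The burn costs the same fuel either way, but the gain works out to speed * delta_v + delta_v**2 / 2, so it grows with how fast you are already going. Rocket scientists call this the Oberth effect.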

Some games are meant to be played on multiple monitors.

The right place to look can be outside the game entirely. Whether it’s looking up guides or wikis, plugging numbers into a spreadsheet to calculate optimal item builds, or using online tools to find the best transfer windows between planets, these can be the right place to put your eyes instead of on the game window itself.

III. Applications to Research

In this last part of the post, I apply the principles above to three core minigames in academic mathematics: talks, papers, and meetings. For each of these minigames, we’ll try to figure out the best places for our eyes to go, informed by the following questions.

  1. Should I focus on myself or the other person? 
  2. How far into the future should I be looking?
  3. How can I act on the information I receive?
  4. Out of all the information being thrown at me, what is decision-relevant?
  5. Might the best way to get better at the game be outside the game itself?


Talks

When giving a talk, self-consciousness is akin to keeping your eyes on yourself. Telling yourself not to be self-conscious is about as useful as trying not to think about the polar bear; negative effort rarely helps. Move your eyes elsewhere: to the future and to other people. Rehearse your presentation and anticipate the most difficult parts to explain. Pay attention to your audience and actually look at them. See if you can figure out who is engaged and who is daydreaming. Find one or two audience members with lively facial expressions, study them, and act on that information – their furrowed brows will tell you whether you’re going too fast.

When listening to a talk, realize that there will typically be more information than any one audience member can digest. Sometimes this is the fault of the speaker, but just as often, this information overload is by design and functions similarly to price segmentation. Noga Alon recently joked to me, “Going to a talk is difficult for everyone because nobody understands the whole thing, but it’s especially difficult for undergraduates because they still expect to.” Information at a variety of levels of abstraction is presented in the same talk, so that audience members with widely varying backgrounds can all get something from it. An undergraduate student might understand only the opening slide, and a graduate student the first ten minutes, while that one inquisitive faculty member will be the only person who understands the cryptic throwaway remarks about connections to class field theory at the very end. Filter aggressively for the parts of the talk aimed at you in particular.

Remember that the topic of the talk itself is rarely as interesting as the background material mentioned in passing in the first ten minutes. These classics – the core theorems and examples mentioned time and time again, the simplest proof techniques that appear over and over – are the real gems if you don’t know them already. Sometimes you can learn a whole new subfield by sitting in on a number of talks in the area and only listening to the first ten minutes of each. Bring something discreet to keep yourself occupied for the other fifty.

Remember also: information you never act on is useless information. A couple years back, I was taking a nap in a computer science lecture about a variation on an Important Old Theorem. As I was nodding off, I noticed to my surprise that my adviser, sitting nearby, was quite engaged with the talk. I was very curious what caught his attention, and he enlightened me on our walk back to the math department: instead of listening to the talk, he’d spent most of the hour looking for a better proof of the Important Old Theorem introduced in the first five minutes. From this, I learned that the most important information in a talk might be an unsolved problem, because it is certainly the easiest information to act on.

My PhD adviser after a seminar talk.

This conversation with my adviser had a great effect on me, and every so often I practice this perspective by going to a talk for the sole purpose of hearing new problems. As soon as I hear an interesting problem, I zone out and try to solve it immediately. Anecdotally, it worked a couple times.


Papers

Most of this section was already covered in Of Math and Memory (part I, part II, part III), but I’ll reiterate here the relevant bits. Mathematical proofs are rarely meant to be written or read linearly. Instead, they are ideally arranged as a collection of outlines of increasing detail: a five-word title, a paragraph-long abstract, two pages of introduction, a four-page technical outline, and only then the complete 20-page proof. Each outline is higher-resolution than the last, giving readers the chance to pick the level of understanding that suits their needs.

This organization is meant to solve one basic difficulty: it’s very hard to follow a proof without knowing where it’s going. Without reading the proof outline, you can’t tell which of the ten lemmas are boilerplate and which are critical innovations. Without running through a calculation at a high level, there’s no way to know which of alpha, n, epsilon, x, and y are important to track, and which are throwaway error terms. Reading through a paper line-by-line without knowing where you’re going, it’s easy to get lost in the weeds and dragged down endless rabbit holes – black boxes from previous papers to unpack, open problems to mull over, lightly explained computations which might contain typos – and while these rabbit holes might be worth exploring, you would do well to map them all out before picking one to dive into. 

When reading a paper, orient your eyes towards the future whenever possible, like reading several words ahead in a game of TypeRacer. Scan the paper at a high level to understand the big picture, then read all the theorem and lemma statements to see how they fit together, and only then decide which weeds to get into. Only check a difficult computation once you already know what its payoff will be.


Meetings

In research, the vast majority of your time is spent in one of two ways: bashing your head against a wall alone, or bashing your head against a wall with company. These two activities can be more or less traded freely for each other to suit your level of introversion, and I’ve found that I usually prefer meeting with others to work on research together over working alone.

One of the pitfalls of working with others, especially when you are young and underconfident, is that you can naturally slide into the role Richard Hamming calls a “sound absorber”:

For myself I find it desirable to talk to other people; but a session of brainstorming is seldom worthwhile. I do go in to strictly talk to somebody and say, “Look, I think there has to be something here. Here’s what I think I see …” and then begin talking back and forth. But you want to pick capable people. To use another analogy, you know the idea called the ‘critical mass.’ If you have enough stuff you have critical mass. There is also the idea I used to call ‘sound absorbers’. When you get too many sound absorbers, you give out an idea and they merely say, “Yes, yes, yes.” What you want to do is get that critical mass in action; “Yes, that reminds me of so and so,” or, “Have you thought about that or this?” When you talk to other people, you want to get rid of those sound absorbers who are nice people but merely say, “Oh yes,” and to find those who will stimulate you right back.

~ Richard Hamming, You and Your Research

On top of underconfidence, I suspect that the chief mistake “sound absorbers” make is that they have the wrong idea about where their eyes should be in a research meeting. I think a “sound absorber” is completely fixated on personally solving the problem. Not having generated interesting ideas for solving the problem, they contribute nothing at all. Again, this mistake is akin to being too self-conscious, and keeping your eyes on yourself when there is no useful information to be had there.

Personally solving the problem is certainly a great outcome for a research meeting, but it’s by no means the only goal. First of all, there’s a world of difference between personally solving the problem and getting the problem solved. If your collaborators are any good, they are just as likely to come up with the next crucial idea as you are, so truly optimizing for getting the problem solved involves spending a substantial fraction of your time supporting the thought processes of others. Repeat their thoughts back to them, write down and check their calculations, draw a nice picture or analogy for what they’re doing on the blackboard, project your enthusiasm for their insights. You can do all this without generating a single original thought, and still help with getting the problem solved.

Getting the problem solved is a higher value than personally solving the problem, but higher still is the value of improving at problem-solving in general, and this holds doubly if you’re still performing your Gravity Turn. Especially when meeting with your PhD adviser or another senior mentor, focus a substantial minority of your attention on modelling the thought processes of your mentor. Figure out and note down what examples and lemmas they pull out of their toolbox time and time again, what calculations and simplifications they do instinctively, and how they react when stuck on a problem. Learn the particularities of how they perform literature searches, who they ask for help about what, and how they decide if and when to give up. None of these decisions are arbitrary; they form an embodied model of the terrain in your field. Watching the other person can often be a better use of your time than staring blankly at the problem.

In hurried conclusion, research is like a simpler version of Dota: we are bombarded by information on all fronts, most of which we don’t even notice, and tasked with making complicated, heavy-tailed decisions. A fundamental skill in any such game is orienting your eyes – literally and figuratively – at the most valuable and decision-relevant information. Reacting to and executing on this information comes later, but you can never act properly if you don’t even see what you need to do.

Gravity Turn

[The first in a sequence of retrospective essays on my five years in math graduate school.]

My favorite analogy for graduate school is the gravity turn: the maneuver a rocket performs to get from the launch pad to orbit. I like to imagine a first-year graduate student as a Falcon 9 rocket, newly-constructed and tasked with delivering a six-ton payload into low Earth orbit.

Picture this: you begin graduate school, fresh as a rocket arriving at Cape Canaveral and bubbling with excitement for your maiden voyage. Your PhD adviser, on the other hand, is the Hubble Space Telescope. Let’s call her Dr. Hubble (not to be confused with the astronomer of the same name). Dr. Hubble is ostensibly the ideal guide for your first orbit insertion. After all, she is famously good at staying in orbit – she’s been up there since 1990. 

But problems quickly arise as you probe Dr. Hubble for advice on how to approach the launch. Namely:

  1. She left Earth more than thirty years ago, and space technology has since been completely revolutionized. 
  2. She states all advice at an extremely high level with bird’s-eye-view detachment, observing, as she is, from a vantage point more than three hundred miles overhead. 
  3. Most fatally, the Hubble Space Telescope vessel does not include the lower-stage rockets that brought her into space. In fact, she doesn’t include large engines of any kind. Her thirty years of experience free-falling in orbit will do you very little good until you break out of the atmosphere.

The problem is even worse than this, however. It is not that Dr. Hubble, despite her best intentions, gives outdated advice. It is not even that Dr. Hubble cannot consciously articulate all the illegible skills she’s reflexively performing to stay in orbit. The problem is that even if you could perfectly imitate what Dr. Hubble is doing right now, you would likely still crash and burn.

What I didn’t understand going into graduate school is that academic mathematicians are often working in a state akin to the free-fall of orbit. The Hubble Space Telescope remains in orbit around Earth because it travels horizontally so quickly that, even as it’s continuously accelerating towards the Earth, it continually misses. The laws of physics have arranged it so that it is not possible – barring deliberate sabotage – for her to fall back into a sub-orbital trajectory.
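To put a number on “so quickly,” here is a back-of-the-envelope sketch of the speed needed to keep missing the Earth on a circular orbit. The constants are standard textbook values, but the 540 km altitude is only my rough figure for Hubble, and the function names are mine.

```python
import math

MU_EARTH = 3.986004418e14   # Earth's gravitational parameter GM, in m^3/s^2
EARTH_RADIUS = 6.371e6      # mean radius of the Earth, in meters
HUBBLE_ALTITUDE = 540e3     # assumed altitude in meters (approximate; it drifts over time)


def circular_orbit_speed(altitude_m: float) -> float:
    """Speed (m/s) at which gravity bends the path into a circle at this altitude."""
    return math.sqrt(MU_EARTH / (EARTH_RADIUS + altitude_m))


def orbital_period_minutes(altitude_m: float) -> float:
    """Time for one lap of a circular orbit, in minutes."""
    circumference = 2 * math.pi * (EARTH_RADIUS + altitude_m)
    return circumference / circular_orbit_speed(altitude_m) / 60


hubble_speed = circular_orbit_speed(HUBBLE_ALTITUDE)       # roughly 7.6 km/s
hubble_period = orbital_period_minutes(HUBBLE_ALTITUDE)    # roughly an hour and a half
```

All of that speed is horizontal. None of it comes from going up, which is the paradox the gravity turn exists to solve.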

Similarly, a successful research professor is embedded in an intricate system that, as surely as Newton’s laws, keeps her in a state of steadily producing new research. Many of her ground-breaking papers are not one-off productions – they produce sequels, variants, and interdisciplinary applications year after year. She has cultivated dozens of long-time collaborators of the highest level who freely share ideas and research directions, and has the reputation to find more at will. She attends conferences every other month that keep her updated on the leading edge of the field. Every year her research group grows, as if by clockwork, adding a couple graduate students and postdocs to whom she can delegate projects with only the gentlest supervision. As a result, the careers of many other people depend on Dr. Hubble to continue producing research at a steady rate. Every incentive is aligned for objects in motion to stay in motion, and it would take deliberate sabotage to bring Dr. Hubble out of her successful research trajectory.

This is not to say that academic researchers all start cruising in free-fall after they leave graduate school or make tenure. It is perfectly normal for a spaceship that reaches orbit to proceed on to its next adventure after some rest, continuing on to visit another planet or leave the solar system altogether. The best researchers I know are similarly courageous, taking on more responsibilities and pushing past their comfort zones time and time again. I’m merely remarking that once one reaches a certain horizontal velocity in space, it is actively hard to fall back down from the sky.

Contrast this to the sorry state of Dr. Hubble’s new graduate student stranded on the launchpad under the blistering Florida sun. He has no prior publications producing continuous dividends, no access to brilliant and dependable collaborators, no knowledge or intuition about what problems are within reach, no students to farm ideas out to, and no reputation to trade off for any of the above. Above all, nobody else really depends on him, so his motivation to succeed is mainly shallow self-interest. This is particularly hard on him, as there are many things he would do in a heartbeat for someone else that he can’t work up the energy to do for himself. The singular advantage he has over his adviser is youth – a finite amount of extra fuel that he must burn quickly and judiciously like a first-stage booster rocket in order to reach her altitude.

There is a paradox inherent to orbit insertion: rockets launch straight up, while orbit is all horizontal. For some diabolical reason, a spaceship must spend its initial phase accelerating in a direction completely perpendicular to its desired velocity. That reason is called the atmosphere: in order to avoid continuously paying the toll of air resistance, a rocket spends a period of time flying straight up. But any additional vertical motion past the upper atmosphere is wasted motion, so at some point (and sooner is better than later), the rocket starts turning smoothly towards the horizon and accelerating towards orbit. Thus is birthed the smooth quasi-hyperbolic curve known as the gravity turn, the ideal orbit insertion trajectory.

How is graduate school like a gravity turn? For one, it is an enormous error in a gravity turn to try to directly imitate the velocity vector of a ship in space while still at sea level. Regardless of its power, a rocket launched horizontally will quickly nose-dive into the Atlantic. Similarly, a student can rarely succeed in graduate school by solely imitating the activities of established researchers. The student must engage instead in certain activities, such as studying fundamental background material and actively networking, that are mostly orthogonal to a research professor’s day-to-day.

For another, it is an equally enormous error to dip your nose cone towards the horizon too late, and spend too much fuel accelerating vertically. Once you break the atmosphere, all excess vertical velocity is wasted motion. At some point during graduate school, the student must transition away from activities that only grant temporary altitude. Becoming knowledgeable gets you to a great place to start doing research at a higher level. But spending too much time studying without attempting original research renders you a mere encyclopedia. Taking classes, networking, applying for fellowships, and going to student summer schools all follow the same principle – there is an appropriate amount to do, past which they increasingly approach wasted motion as far as getting into orbit is concerned. (Of course, if you enjoy any given activity intrinsically, then by all means continue to do it as much as you want.)

An additional consideration is that, while the gravity turn is the most technically fuel-efficient method of orbit insertion, not everyone who arrived in orbit took this most efficient path. In every department there are superstar students who were outfitted with nuclear reactors in place of conventional rocketry, and these folks get to space by pointing their nose cones in any old direction and blasting off. If you’re such a person, just blast off; calculating the optimum gravity turn curve might be the real wasted motion. Also, many of your professors will likely have fallen into this rarefied category in their own graduate school experience, so their advice on efficient gravity turns will be entirely theoretical in nature.

It is worth remarking, though, that even a nuclear rocket might learn something useful from practicing the gravity turn maneuver. Just because you have an easy time leaving Earth’s atmosphere and have no need of finesse doesn’t mean your travels won’t land you on Venus someday. And breaching that monstrous atmosphere will take every ounce of efficiency you can muster.

A natural question remains: if many graduate school activities only count for temporary vertical altitude, what constitutes horizontal motion that is useful for permanently entering orbit? Examples include:

  1. Producing good research, as every nice paper you write continues to pay dividends year after year. 
  2. Becoming an attractive collaborator, partly by acquiring enough reputation that people are willing to work with you, and partly by being productive and pleasant enough that they stick around. 
  3. Learning to support the research of others, as much of your potential impact lies not in personal contribution, but in the network effects accumulated from being a positive community member.

This last skill begins at the very start of graduate school, where the biggest immediate impact you can likely have is facilitating your adviser’s and other collaborators’ research.

I will close by reminding the reader that the gravity turn maneuver is not a truth delivered from on high that holds for all time across all circumstances, but an engineered solution to an inelegant and ever-varying practical problem. Launching from a moon base, for example, does not require a gravity turn at all because the moon has no atmosphere to fight against. There, you could comfortably reach orbit by blasting off almost horizontally from the lip of a crater. Only you know exactly where you’re launching from and the thrust-to-weight ratio of your vessel. Adjust your gravity turn accordingly.

I hope it is a comforting thought that free-fall is possible: that one day through all the striving of graduate school you may reach a position where the system propels you forward in your research and all you have to do is sit back and relax. I hope that on that day you continue to strive anyway.

Two Explorations

Much has been written about the fundamental opposition between explore and exploit, chaos and order, yin and yang. In this post I make two observations about the psychology of this opposition.

In the first part, I challenge the metaphor of the comfort zone: a slowly-changing region in activity-space where everything inside is comfortable and everything outside induces anxiety. The point is that anxiety depends not only on the spookiness of the activity itself, but on one’s proximity to safety. I am afraid because it is dark; I am terrified because the light switch is all the way down the hall. In particular, it is possible to reduce anxiety by bringing your comfort zone with you in the form of a safety behavior or a trusted companion. This serves as an alternative to Comfort Zone Expansion.

In the second part, I note that explore and exploit are often embodied in the human personality as two competing subagents. In almost everyone I’ve met, one of these subagents dominates the other. I tell four typical stories of this imbalance, and then suggest that something better is possible. This is perhaps the central example of integrating disagreeing subagents.

Part 1: Distance to Safety

1. Mother and Child

(The following is my retelling of an old story, dating back probably to at least Jean Piaget. I first encountered a variant of this story in The Monkey Wars, where identical behavior was observed in primates.)

A mother brings her child, a girl of perhaps three or four years of age, to an empty playground at the park. Autumn has progressed to the stage where leaves fall in twos and threes. The girl peeks out of the folds of her mother’s coat, staring now at the swing set, now at the neon tube slide. With a nudge, the mother pushes her daughter onto the mulch and gives her an encouraging nod. After glancing back to her mother several times, the girl slowly approaches the playground and begins to play.

Soon, she is clambering about, but periodically looks back at the bench where her mother is sitting. Each time their eyes meet, the girl waves, pig-tails dancing, and the mother waves back. The girl then returns to free play.

At some point, the mother briefly leaves her post to chat with an acquaintance. The girl, pepping herself up to go down the slide for the first time, looks back, finds her mother gone, and panics. She curls up into a ball at the top of the playground and fights back tears, trying to make herself as small as possible. The thought of going down the slide vanishes from her mind.

2. The Difficulty of Dark Souls

Dark Souls (III) is one of those difficult video games which spawns endless internet debates about whether every game needs an easy mode. (“I have enough challenge in my day job, I just want to relax when I play a game,” says the one. “Git gud scrub,” says the other.) Dark Souls is not in fact mechanically difficult, or at least not exceptionally so. Just off the top of my head, Celeste, XCOM 2, and FTL were all significantly more difficult mechanically for me than Dark Souls. And yet I do believe that Dark Souls was the hardest game I ever played.

Call me a coward, but the main difficulty of Dark Souls for me is the psychological difficulty of its dominant aesthetic: loneliness and nihilism. You, the protagonist, are born “Nameless, accursed Undead, unfit even to be cinders. And so it is… that ash seeketh embers.” All the friendlies in the world have been sitting in the same positions for eons before you died, and will be sitting there long after you return to ash. The brief encounters with friendly NPCs in the wild are ephemeral: each person passes in and out of your life, and you are as likely to meet them again as to stumble upon their ashes. The entire rest of the world is Out to Get You: treasure chests turn out to be mimics, halberdiers hide between crates, face-eaters hang from ceilings. Your job is to save a world that doesn’t care to be saved.

I could never play Dark Souls for more than a couple hours at a time, and found myself constantly teleporting back to base. I told myself I came home to level up, to upgrade gear, and to purchase items, but the truth of the matter is: I kept coming home just to hear a friendly human voice again.

Code Vein is one of many Dark Souls knockoffs, notorious mainly for the addition of anime waifus to the familiar formula. For all its abysmal enemy variety and boring level design, I loved playing Code Vein for one simple reason: I could bring a companion. Bringing a companion on my journey solved all my anxiety in-game. It didn’t matter so much that my digital companion got stuck in corners and committed suicide in boss fights and repeated the same half-dozen canned lines. The feeling that someone had my back let me enjoy a Souls-like world for entire afternoons at a time, and I almost never felt the need to teleport back home.


Human beings explore too little. One common solution to this problem is Comfort Zone Expansion, gently straying beyond the boundary of your comfort zone to notice things out there are not as scary as you thought. This can be a fine solution, but is inadequate for situations where you are required to immediately leave your comfort zone far behind for long periods of time.

Perhaps you travel internationally for the first time. Perhaps you attend a self-improvement workshop with cult-ish vibes in a remote location. Perhaps you try to prove the Riemann hypothesis, and are beset on all sides by the diabolical malice inherent in the primes.

If so, remember that exploration anxiety is a function of how far (you feel like) you are from safety.

The little girl comfortably clambers around the playground when her mother is nearby. When the mother disappears, the girl curls up in fear and is incapable of sliding down the exact same slide she went down a minute ago. The slide didn’t change; the girl’s distance to safety did. Playing Dark Souls, I teleport to base after finding each new bonfire (checkpoint). Playing Code Vein, a companion follows me around and the psychological need to return home disappears. Every time a married person gives an acceptance speech, they thank their spouse for being “their rock.” This is a rather unflattering term for “unwavering center of my comfort zone.”

What are concrete applications of this principle?

  1. One purpose of collaboration is for each collaborator to serve as a mobile comfort zone for the others. This might explain why most successful startups are founded by a small group and not an individual or a larger group. The comfort zone effect hits rapidly diminishing returns past one trusted collaborator. Through this lens, the purpose of open communication in collaboration is simply to feel psychologically close to the other person.
  2. The optimal solution to comfort zone expansion may be planting a small number of well-spaced “bases of operation.” Instead of continuously expanding one connected chunk of activity-space, plant comfort flags on the points of an ε-net. Comfort zones, like lighthouses and highway truck stops, cover more space if you place them far apart. If anxiety is a major limiting factor for you, consider focusing your energy on a small number of extremely different activities so that the comfort zones that radiate out of each one together cover as much space as possible.

Part 2: Explore Versus Exploit

The previous part upgrades the usual model for comfort zone expansion, taking for granted the value of exploration.

This part turns to a new topic altogether: the internal conflict between the drive to explore and the drive to exploit.

Hereafter I assume a multi-agent model of the mind, and refer to two common subagents: the “explore” subagent which tends towards freedom, creativity, contrarianism, and chaos, and the “exploit” subagent which tends towards structure, discipline, lawfulness, and order. Whether to interpret these subagents as full-blown independent subpersonalities or merely conflicting desires within a single mind is entirely up to you, and should not affect the meaning of this post.

I begin with four archetypal stories to illustrate varying levels of imbalance between “explore” and “exploit.” These are amalgams of real stories from my own life and others’.

1. The Unmoored

The Unmoored was stifled as a girl, surveilled at every quarter-hour by parents, teachers, tutors, and coaches. Every day from dawn to dusk was packed with activity that would gentle her mind and ennoble her condition. As her fingers marched from piano to textbook to tennis racket, her mind danced farther and farther away into Wonderland.

But even her dreams are turned against her. After she shows a passing aptitude for story-telling, her parents sign her up for creative writing classes and poetry jams. The beloved characters in her flights of fancy are clamped into straitjackets and paraded onto stage to be judged by panels of condescending curmudgeons.

When she finally escapes her shackles, there is no temperance to her wildness, no second-guessing, no backward glances. She drops out of pre-med, then out of college, then out of polite society altogether. The world is joy and light and one-way airfares.

Many years later, she snaps awake in the lap of a street urchin on the outskirts of Ulaanbaatar. She’d had a strange dream: she was back in front of the piano playing Mendelssohn, and she liked that feeling of rote and mindless obedience to the sheet music. She shakes off that absurd notion and takes another hit.

2. The Dreamer

Like the Unmoored, the Dreamer yearns to be free, to one day live fully unconstrained to pursue his creative vision. He has not quite decided whether he wants to write screenplays, musicals, or novels; his creative side finds even the idea of making such a decision fettering. In the meantime, he works a programming job that makes good money.

Unlike the Unmoored, the Dreamer knows the instrumental value of discipline and constraint. Under the flickering lamplight, he writes short stories with a tomato timer by his side, following a classic book of writing prompts. The more exotic the prompt, the more alive he feels bending around it.

At times, the Dreamer worries that his day job is changing him. He cannot help but take joy in turning on his monitors in the morning, in passing code review on the first try, in following procedures to the letter to build something that comes alive before his very eyes.

When he notices himself feeling this way, the Dreamer clamps down on this joy and reminds himself, “I hate this boring, technical job. I’m only working it to get my art off the ground. One day, I will finally be free of this drudgery.”

The tomato timer goes off and he goes back to writing. He never considers that part of his soul might not want to be free.

3. The Magpie

Unlike the Unmoored and the Dreamer, the Magpie is primarily motivated by the joy and comfort of the known. She delights in decorating and redecorating her cozy little apartment, in organizing her books and plants in tidy rows, and in folding origami animals that she can leave around all the rooms, each a permanent addition to her little family.

One day, she hopes to add a few extra bedrooms and a full bathtub to that apartment, and a partner and children to that family. She is the Dreamer’s coworker, but unlike him she hopes to stay at this programming job for the rest of her career. She imagines inviting direct reports into her well-lit corner office to admire her cactus collection and ask for her advice on the operating system she helped architect.

The Magpie understands the instrumental value of exploration and creativity, but fears it. Every year, she takes a vacation to a scary new place to challenge herself, but more importantly to bring back souvenirs, pictures, and memories, to better decorate her place. At work, she pushes herself to learn new technologies and programming languages, but she yearns for the day she won’t have to do this any more.

The Magpie believes that at the end of the day, one should only explore until one finds the best place to nest.

4. The Recluse

Like the Magpie, the Recluse is primarily motivated by comfort and familiarity. But his world was constantly in flux for too long, so he does everything in his power to hide away in his comfort zone and wall off the rest of the world.

His parents divorced before he entered middle school, and he ping-ponged between their two new families, never truly belonging to either. Traveling, learning new things, meeting strangers all terrify him, and yet he’s forced to do more and more of all of these to survive.

When he finally finds a place to settle down, he never leaves it again. In every relationship, he becomes deeply codependent. He is a consistent member of a few local clubs; new faces join and leave, but he is one of the few who remain through the years. If he had his way, the clubs would meet in his living room and he’d never have to go outside.

Occasionally, he reconnects with an old friend who is a Dreamer or an Unmoored, and becomes deeply fascinated with their alien way of being. It is hard for him to understand how comfort and familiarity can be claustrophobic, but he’s glad he has these friends. They bring him new knowledge and experiences in a safe and digested way, or if not, at least they send him postcards.


In each of the above four stories, there is an imbalance between the “explore” subagent and the “exploit” subagent.

The Unmoored lives to “explore.” The “exploit” subagent is greatly suppressed or externalized.

The Dreamer also lives to “explore,” but understands the instrumental value of “exploit.” He views “exploit” as an unsightly means to an end, and suppresses the needs of the personality (comfort, safety, order) associated with it.

The Magpie lives to “exploit,” but understands the instrumental value of “explore.” She views “explore” as a dangerous means to an end, and suppresses the needs of the personality (freedom, creativity, chaos) associated with it.

The Recluse also lives to “exploit.” The “explore” subagent is greatly suppressed or externalized.

In all four cases, one subagent or the other dominates the personality, and holds the other subagent and its needs in contempt. Internal alignment of the two subagents can only occur if the whole person not only recognizes the instrumental value of each subagent, but also respects its needs as ends in themselves. Here is my loose prescription for alignment, which might be attempted with an exercise like Internal Double Crux:

  1. If you are near the extremes (the Unmoored or the Recluse), learn to recognize at least the instrumental value of the suppressed subagent. If you lean heavily towards exploring, recognize that more systematic exploiting can often make you better at exploring in the long run. Similarly, if you lean heavily towards exploiting, recognize that more systematic exploring can often make you better at exploiting in the long run. Hopefully, you will level up into the Dreamer or the Magpie.
  2. If you are near the middle (the Dreamer or the Magpie), learn to respect the needs of the weaker subagent as ends in themselves. If you lean towards exploring, realize that it’s genuinely ok for you to enjoy checking boxes, following rules, and tidying things up. If you lean towards exploiting, realize that it’s genuinely ok for you to enjoy trying crazy things, breaking rules, and making a mess.

Pain is not the unit of Effort

(Content warning: self-harm, parts of this post may be actively counterproductive for readers with certain mental illnesses or idiosyncrasies.)

What doesn’t kill you makes you stronger. ~ Kelly Clarkson.

No pain, no gain. ~ Exercise motto.

The more bitterness you swallow, the higher you’ll go. ~ Chinese proverb.

I noticed recently that, at least in my social bubble, pain is the unit of effort. In other words, how hard you are trying is explicitly measured by how much suffering you put yourself through. In this post, I will share some anecdotes of how damaging and pervasive this belief is, and propose some counterbalancing ideas that might help rectify this problem.

I. Anecdotes

1. As a child, I spent most of my evenings studying mathematics under some amount of supervision from my mother. While studying, if I expressed discomfort or fatigue, my mother would bring me a snack or drink and tell me to stretch or take a break. I think she took it as a sign that I was trying my best. If on the other hand I was smiling or joyful for extended periods of time, she took that as a sign that I had effort to spare and increased the hours I was supposed to study each day. To this day there’s a gremlin on my shoulder that whispers, “If you’re happy, you’re not trying your best.”

2. A close friend who played sports in school reports that training can be harrowing. He told me that players who fell behind the pack during daily jogs would be singled out and publicly humiliated. One time the coach screamed at my friend for falling behind the asthmatic boy who was alternating between running and using his inhaler. Another time, my friend internalized “no pain, no gain” to the point of losing his toenails.

3. In high school and college, I was surrounded by overachievers constantly making (what seemed to me) incomprehensibly bad life choices. My classmates would sign up for eight classes per semester when the recommended number is five, jigsaw extracurricular activities into their calendar like a dynamic programming knapsack-solver, and then proceed to have loud public complaining contests about which libraries are most comfortable to study at past 2am and how many pages they have left to write for the essay due in three hours. Only later did I learn to ask: what incentives were they responding to?

4. A while ago I became a connoisseur of Chinese webnovels. Among those written for a male audience, there is a surprisingly diverse set of character traits represented among the main characters. Doubtless many are womanizing murderhobos with no redeeming qualities, but others are classical heroes with big hearts, or sarcastic antiheroes who actually grow up a little, or ambitious empire-builders with grand plans to pave the universe with Confucian order, or down-on-their-luck starving artists who just want to bring happiness to the world through song.

If there is a single common virtue shared by all these protagonists, it is their superhuman pain tolerance. Protagonists routinely and often voluntarily dunk themselves in vats of lava, have all their bones broken, shattered, and reforged, get trapped inside alternate dimensions of freezing cold for millennia (which conveniently only takes a day in the outside world), and overdose on level-up pills right up to the brink of death, all in the name of becoming stronger. Oftentimes the defining difference between the protagonist and the antagonist is that the antagonist did not have enough pain tolerance and allowed the (unbearable physical) suffering in his life to drive him mad.

5. I have a close friend who often asks for my perspective on personal problems. A pattern arose in a couple of our conversations:

alkjash: I feel like you’re not actually trying. [Meaning: using all the tools at your disposal, getting creative, throwing money at the problem to make it go away.]

alkjash’s friend: What do you mean I’m not trying? I think I’m trying my best, can’t you tell how hard I’m trying? [Meaning: piling on time, energy, and willpower to the point of burnout.]

After several of these conversations went nowhere, I learned that asking this friend to try harder directly translated in his mind to accusing him of low pain tolerance and asking him to hurt himself more.

II. Antidotes

I often hear on the internet laments like “Why is nobody actually trying?” Once upon a time, I was honestly and genuinely confused by this question. It seemed to me that “actually trying” – aiming the full force of your being at the solution of a problem you care about – is self-evidently motivating and requires zero extra justification if you care about the problem.

I think I finally understand why so few people are “actually trying.” The reason is this pervasive and damaging belief that pain is the unit of effort. With this belief, the injunction “actually try” means “put yourself in as much pain as you can handle.” Similarly, “she’s trying her best” translates to “she’s really hurting right now.” Even worse, people with this belief optimize for the appearance of suffering. Answering emails at midnight and appearing fatigued at meetings are somehow taken to be more credible signals of effort than actual results. And if you think that’s pathological, wait until you meet someone who is actively hurt by being told about opportunities, because you’ve just created another knife they feel pressured to cut themselves with.

I see a mob of people walking up to houses and throwing themselves bodily at the closed front doors. I walk up to block one man and ask, “Stop it! Why don’t you try the doorknob first? Have you rung the doorbell?” The man responds in tears, nursing his bloody right shoulder, “I’m trying as hard as I can!” With his one good arm, he shoves me aside and takes a running start to lunge at the door again. Finally, the timber shatters and the man breaks through. The surrounding mob cheers him on, “Look how hard he’s trying!”

Once you understand that pain is how people define effort, the answer to the question “why is nobody actually trying?” becomes astoundingly obvious. I’d like to propose two beliefs to counterbalance this awful state of affairs.

1. If it hurts, you’re probably doing it wrong.

If your wrists ache on the bench press, you’re probably using bad form and/or too much weight. If your feet ache from running, you might need sneakers with better arch support. If you’re consistently sore for days after exercising, you should learn to stretch properly and check your nutrition.

Such rules are well-established in the setting of physical exercise, but their analogs in intellectual work seem to be completely lost on people. If reading a math paper is actively unpleasant, you should find a better-written paper or learn some background material first (most likely both). If you study or work late into the night and it disrupts your circadian rhythm, you’re trading off long-term productivity and well-being for low-quality work. That’s just bad form.

If it hurts, you’re probably doing it wrong.

2. You’re not trying your best if you’re not happy.

Happiness is really, really instrumentally useful. Being happy gives you more energy, increases your physical health and lifespan, makes you more creative and risk-tolerant, and (even if all the previous effects are unreplicated pseudoscience) causes other people to like you more. Whether you are tackling the Riemann hypothesis, climate change, or your personal weight loss, one of the first steps should be to acquire as much happiness as you can get your hands on. And the good news is: at least anecdotally, it is possible to substantially raise your happiness set-point through jedi mind tricks.

Becoming happy is a fully general problem-solving strategy. And although one can in principle trade off happiness for short bursts of productivity, in practice this is never worth it.

Culturally, we’ve been led to believe that over-stressed and tired people are the ones trying their best. It is right and proper to be kind to such people, but let’s not go so far as to support the delusion that they are inputting as much effort as their joyful, boisterous peers bouncing off the walls.

You’re not trying your best if you’re not happy.

Is Success the Enemy of Freedom? (Full)

I. Parables

A. Anna is a graduate student studying p-adic quasicoherent topology. It’s a niche subfield of mathematics where Anna feels comfortable working on neat little problems with the small handful of researchers interested in this topic. Last year, Anna stumbled upon a connection between her pet problem and algebraic matroid theory, solving a big open conjecture in the matroid Langlands program. Initially, she was over the moon about the awards and the Quanta articles, but now that things have returned to normal, her advisor is pressuring her to continue working with the matroid theorists with their massive NSF grants and real-world applications. Anna hasn’t had time to think about p-adic quasicoherent topology in months.

B. Ben is one of the top Tetris players in the world, infamous for his signature move: the reverse double T-spin. Ben spent years perfecting this move, which requires lightning fast reflexes and nerves of steel, and has won dozens of tournaments on its back. Recently, Ben felt like his other Tetris skills needed work and tried to play online without using his signature move, but was greeted by a long string of losses: the Tetris servers kept matching him with the other top players in the world, who absolutely stomped him. Discouraged, Ben gave up on the endeavor and went back to practicing the reverse double T-spin.

C. Clara was just promoted to be the youngest Engineering Director at a mid-sized software startup. She quickly climbed the ranks, thanks to her amazing knowledge of all things object-oriented and her excellent communication skills. These days, she finds her schedule packed with what the company needs: back-to-back high-level strategy meetings preparing for the optics of the next product launch, instead of what she loves: rewriting whole codebases in Haskell++.

D. Deborah started her writing career as a small-time crime novelist, who split her time between a colorful cast of sleuthy protagonists. One day, her spunky children’s character Detective Dolly blew up in popularity due to a Froot Loops advertising campaign. At the beginning of every month, Deborah tells herself she’s going to finally kill off Dolly and get to work on that grand historical romance she’s been dreaming about. At the end of every month, Deborah’s husband comes home with the mortgage bills for their expensive bayside mansion, paid for with “Dolly money,” and Deborah starts yet another Elementary School Enigma.

E. While checking his email in the wee hours of the morning, Professor Evan Evanson notices an appealing seminar announcement: “A Gentle Introduction to P-adic Quasicoherent Topology (Part the First).” Ever since being exposed to the topic in his undergraduate matroid theory class, Evan has always wanted to learn more. He arrives bright and early on the day of the seminar and finds a prime seat, but as others file into the lecture hall, he’s greeted by a mortifying realization: it’s a graduate student learning seminar, and he’s the only faculty member present. Swallowing his embarrassment, Evan sits through the talk and learns quite a bit of fascinating new mathematics. For some reason, even though he enjoyed the experience, Evan never comes back for Part the Second.

F. Whenever Frank looks back to his college years, he remembers most fondly the day he was kicked out of the conservative school newspaper for penning a provocative piece about jailing all billionaires. Although he was a mediocre student with a medium-sized drinking problem, on that day Frank felt like a man with principles. A real American patriot in the ranks of Patrick Henry or Thomas Jefferson. After college, Frank met a girl who helped him sort himself out and get sober, and now he’s the proud owner of a small accounting firm and has two beautiful daughters, Jenny and Taylor. Yesterday, arsonists set fire to the Planned Parenthood clinic across the street, and his employees have been clamoring for Frank to make a political statement. Frank almost threw caution to the wind and Tweeted #bodilyautonomy from the company account right there, but then the picture on his desk caught his eye: his wife and daughters at Taylor’s elementary school graduation. It’s hard to be a man of principles when you have something to lose.

G. Garrett is a popular radio psychologist who has been pressured by his sponsors into being the face of the yearly Breast Cancer Bike-a-thon. Unfortunately, Garrett has a dark secret: he’s never ridden a bicycle. Too embarrassed to ask anyone for help or even be seen practicing – he is a respected public figure, for god’s sake – Garrett buys a bike and sneaks to an abandoned lot to practice by himself after sunset. He thinks to himself, “how hard can it be?” Garrett shatters his ankle ten minutes into his covert practice session and has to pull out of the event. Fortunately, Garrett’s sponsors find an actual celebrity to fill in for him and breast cancer donations reach record highs.

II. Motivation

What is personal success for?

We say success opens doors. Broadens horizons. Pushes the envelope. Shatters glass ceilings.

Success sets you free.

But what if it doesn’t?

Take a good hard look at the successful people around you. Doctors too busy to see their children on weekdays. Mathematicians too brilliant in one field to switch to another. Businessmen too wealthy to avoid nightly wining and dining. Professional gamers too specialized to learn a new hero. Public figures too popular to change their minds.

Remember that time Michael Jordan took a break from basketball and played professional baseball? They said he would have made an excellent professional player given time. Jordan said baseball was his childhood dream. Even so, in just over a year Jordan was back in basketball. It is hard not to imagine what a baseball player Michael Jordan could have been, had he been less successful going in.

I think it was in college that I first noticed something wasn’t right about this picture. I spent my first semester studying and playing Go for about eight hours a day. I remember setting out a goban on the carpet of my dorm room and studying patterns in the morning as my roommate left for classes; when he returned to the room in the evening, he was surprised to see me still sitting there contemplating the flow of the stones. Because this was not the first or tenth time this had happened, he commented something like, “You must be really smart to not need to study.”

I remember being dumbstruck by that statement. It suggested that my freedom to play board games for eight hours a day was gated by my personal success, and other Harvard students would be able to live like me if only they were smarter. But you know who else can play board games for eight hours a day? Basement-dwelling high school dropouts, who are – for all their unsung virtues – definitely not smarter than Harvard students.

When I entered college, they told me a Harvard education would empower me to do anything I want. The world would be my oyster. I took that message to heart in those four years – I fell in love, played every PC game that money could buy, studied programming languages and systems programming, and read more than one Russian novel. When I talked to my peers, however, I was constantly surprised at the overwhelming sameness of their ambitions. Four years later, twenty out of thirty-odd graduating seniors at our House planned to work in finance or consulting.

(Now, it could be that college really empowers these bright young scholars to realize their childhood dreams of arbitraging the yen against the kroner. But this is, as they say in the natural sciences, definitely not the null hypothesis.)

All of this would have made a teenager hate the idea of success altogether. I was not a teenager anymore, so I formulated a slightly more sophisticated answer: Regardless of how successful I become, I resolve to live like a failure.

This is a post about all the forces, real and imagined, that can make success the enemy of personal freedom. As long as these forces exist, and as long as the human heart yearns for liberty, few people will ever want wholeheartedly to succeed. Were it not already reality, that would be a state of affairs too depressing to contemplate.

(Just to be clear, people are plenty motivated to succeed when basic needs are at stake – to put food on the table, to get laid, to pay for the mortgage. But after those needs get met, success just doesn’t look all that great and only certain sorts of delightful weirdos keep striving. The rest of us mostly just lie back and enjoy the fruits of their labor.)

III. Factorization

I think all of the experiences in Section I can be summed up by the umbrella-term “Sunk Cost Fallacy,” but that theory is a little too low-resolution for my tastes. In this section I identify three main psychological factors of the phenomenon.

1. You rose to meet the challenge. Your peer group rose to meet you.

We are constantly sorted together with people of the same age group, at similar levels of competence, at similar stages in our careers. To keep up with the group, you have to run as fast as you can just to stay in place, as the saying goes. And if you run twice as fast as that, you just end up in a new, even harder-to-impress peer group. When your friends are all level 80, it’s dreadfully difficult to restart at level 1.

Your friends may even be sympathetic, but it rarely helps matters.

Maybe you want to try something totally new, and your friends are too invested in their pet genre to emigrate with you.

Maybe you’re excited to learn a new skill one of your hyper-competent friends is specialized in, and you ask them to coach you. Unfortunately, this turns out to be a massive mistake, because your friend only remembers how she got from level 75 to 80, and sort of assumes everything below is trivial. It’s technically possible to learn area formulas as a special case of integral calculus, but only technically.

Maybe you transition to a new role within the team, you struggle to learn a new set of tricks, and you start hating yourself for not pulling as much weight as you’re used to. You start to see a mix of pity and frustration in your teammates’ eyes as you drag the whole team down.

2. Yesterday, you were bad at everything, and that really sucked. Today you’re good at one thing, and you’re hanging on for dear life.

It’s hard to move out of your comfort zone when your comfort zone is one hundred square feet on top of Mount Olympus and every cardinal direction points straight off a cliff. Seems like just yesterday you stood at the base of this mountain among the rest of the mortals, craning your neck to get a peek at what it’s like up here.

Kindly god-uncle Zeus calls a special thunderstorm for your arrival. Dionysus pours you a frothy drink and shares a bawdy tale. Hephaestus personally fashions you a blade as a symbol of your newfound status. Aphrodite invites you to her parlor for a night of good old-fashioned philosophy. They all act so welcoming, so natural, so in their element, and you know you’re only up here by a stroke of pure luck.

When Hermes returns the next morning and invites you to fly with him on his winged boots to see the world, you decline graciously. Not because you don’t want to – they’re winged boots! – but because the moment you try anything out of the ordinary you’ll be found out for the impostor that you are and god-uncle Zeus will show you his not-so-kindly side and chain you to a liver-eating eagle or a boulder that only obeys the laws of gravity intermittently.

3. Success gave you something to lose.

They say beware the man with nothing to lose.

I say envy him, because he alone is free.

You fondly recall the good old days of two thousand and two when you could go online and post diatribes against religion as a “militant atheist.” In those days, you had nothing, and you were free. You were unattached. You were intellectually wealthy but financially insolvent. You could see one end of the place you call home from the other.

Now that you’ve made it big, you’d have to carefully position mirrors at the ends of three hallways to see that far. You’re attached to wonderful person(s) of amenable sexual orientation(s). You have a reputation to maintain in the ever-smaller circles that you walk. Children in your community look up to you, or so you tell yourself. And so, even though deep in your heart you still believe that only idiots believe in an old man in the sky, your Twitter profile identifies you as “spiritual, yearning, exploring.”

IV. Resolution?

It seems to me we have a problem.

We are not a species known for risk-taking, so human flourishing really depends on the explicit emphasis of exploration and openness to new experience. And yet it seems that the game is set up so that the most successful people are least incentivized to explore further. That all the trying new things and pushing boundaries and calling for revolution is likely to come from those with neither the power to get it done nor the competence to do it correctly.

But it’s not a hopeless case by any means. Many of the most successful people got there precisely by valuing freedom, creativity, and exploration, and still practice these values – so far as they can – within the confines of their walled gardens. We live in an information age where getting good at things is as easy as it’s ever been. And at the very least we pay lip service to healthy adages like “Stay hungry, stay foolish.”

But what does one do personally to maintain one’s freedom?

I don’t claim to have a fully general solution to this problem, but here is a rule that’s helped me in the past.

When learning something new, treat yourself like a five-year-old.

If you’ve never spoken a word of Korean in your life, it doesn’t matter if you’re a professor of English Literature. As far as learning Korean goes, you’re a five-year-old. Treat yourself like one. Make yourself a snack for memorizing the vertical vowels. Take a break after reading your first sentence and come back tomorrow. When you’re done for the day, suck your thumb while staring at the first Korean word you’ve ever learned and feel the honest pride well up in your heart.

If you’ve never washed a dish in your life, it doesn’t matter if you’re a professional chef. As far as washing dishes is concerned, you’re a five-year-old. Treat yourself like one. Make yourself a snack for figuring out how to dispense dish soap without getting it everywhere. Take a break after finishing the bowls and come back tomorrow. When you’re all done, take a moment to take in that beautiful empty sink and feel the honest pride well up in your toddler heart.

Do you see how profoundly counterproductive it would be for the Korean learner to beat herself up for not being able to converse fluently with her Asian friends after two weeks? Do you see how completely unkind it would be for the novice dishwasher to call himself a useless piece of shit for not being able to execute the most basic of adult tasks?

Be kind to yourself and adjust your expectations to reality. When learning something new, treat yourself like a five-year-old.

Of Math and Memory, Part 3 (Final)

In Part 1, we noted how extraordinarily taxing mathematics can be on short-term working memory. Being able to hold one extra Greek letter in your head can make the difference between following a lecture and getting completely lost. Having background and mathematical maturity means many standard techniques need not be forcibly remembered, freeing up space for the few genuinely novel ideas.

In Part 2, we gave a simple conceptual model for short-term memory, based on a fundamental principle of information theory: compression is equivalent to prediction. The more predictable data is (or the better we get at predicting it), the less new information you have to store.

What do these ideas mean concretely for mathematicians? In this concluding post, we give a practical algorithm for making the most of your short-term memory in mathematics, which I call dyadic scanning.

We will start with the concrete problem of how to read a paper, and later generalize to how to write papers, how to listen to and give talks, and how to have mathematical conversations, all while making the most of our short-term memory.

Dyadic Scanning

Consider the markings on a standard ‘Murican ruler:


The two longest vertical lines mark the beginning and end of an inch. It is then divided dyadically into half-inches, quarter-inches, eighth-inches, and sixteenth-inches by progressively smaller “teeth.”

A mathematical paper is organized much like the markings on a ruler: first it is divided into a few main theorems, each of which is divided into several major lemmas, which are then interspersed with minor or technical lemmas and definitions, themselves pieced together from many tedious details. The obvious and standard way of reading a paper is a sequential scan:


Sequential Scan (BAD)

Up until a couple years ago, this was how I attempted to read any given paper: read it through once from beginning to end, pausing on each detail and tracing it down to the lowest level until I could follow it line-by-line. The sequential scan is a fairly useful way to build foundations and mathematical maturity: one spends a lot of time piecing together details and developing a taste for rigor. It is, however, a misguided and inefficient approach to reading mathematics in general.

“I could follow line by line, but I have no idea what’s going on” is a common complaint that comes out of reading mathematics like this.

What are the downsides of the sequential scan?

You get easily lost in details, missing the forest for the trees.

You sacrifice agency, accepting the order in which ideas are presented as if from on high.

Most importantly, you don’t know where you’re going.

You can’t ask “Where was condition (a) of Lemma 2 used in the proof of Lemma 3? Can we weaken it?” if you don’t even know that Lemma 2 only exists to prove Lemma 3 later on. This is the kind of question that senior mathematicians are asking all the time to shore up their understanding.

Unless the paper is very cleverly and thoughtfully written, reading it sequentially is going in blind. You will have the hardest possible time predicting each next step, and therefore have to bear the heaviest possible burden remembering every detail. Without knowing what will be used where or how, you will have to default to remembering everything.

Here is a more enlightened way to read a paper, which I call dyadic scanning.


Dyadic Scan (GOOD)

Instead of reading the paper in a single pass, you split your reading into logarithmically many passes of progressively higher resolution. In the first pass, you figure out the overarching organization and the main results. In the second, you locate the main lemmas, how they fit together, and where the genuine innovations are in the paper. In the third, you piece together how all the minor technical lemmas are involved in the proof, and the locale where each one is relevant. Only in the fourth pass, or later, do you dig into the details of the rigorous proofs. More passes are added if the paper is especially dense, or if you’re especially unfamiliar with the field. The longer and more technical the paper is, the longer you should wait before diving into details.
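
To make the reading order concrete, here is a minimal sketch that treats the paper as the ruler above. The names `tick_pass` and `dyadic_scan`, and the numbering of ticks from 1 to 2**depth - 1, are illustrative assumptions; a real paper is never this evenly divided.

```python
def tick_pass(i, depth):
    """Pass (1 = coarsest) in which ruler tick i gets read.

    Ticks are numbered 1 .. 2**depth - 1, so tick 2**(depth - 1)
    is the half-inch mark, the longest tooth on the ruler.
    """
    # Trailing zero bits of i: more trailing zeros means a longer tooth.
    trailing_zeros = (i & -i).bit_length() - 1
    return depth - trailing_zeros


def dyadic_scan(depth):
    """Group ticks by pass, coarsest pass first."""
    passes = [[] for _ in range(depth)]
    for i in range(1, 2 ** depth):
        passes[tick_pass(i, depth) - 1].append(i)
    return passes


# A sequential scan reads ticks 1, 2, 3, ... in order.
# A dyadic scan of a sixteenth-inch ruler (depth 4) reads them in four passes.
for number, ticks in enumerate(dyadic_scan(4), start=1):
    print(f"pass {number}: {ticks}")
```

With depth 4, the first pass reads only tick 8, the second reads ticks 4 and 12, and the odd-numbered ticks (the finest details) wait until the last pass.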

Why make dyadic scans?

You know where you’re going. If you read mathematics this way, you will know what each line of mathematics is for before digging into why it’s true. Knowing the purpose of Lemma 2 lets you figure out which terms are important and which terms are negligible error terms. Knowing when you’re done using that functional equation means you can free up that memory for something new.

You’re forced to develop an eye for what matters. Not every paper is written so that the main lemmas stand out from the minor lemmas, which in turn stand out from the boring technical details. Doing dyadic scans forces you to develop a taste for what matters, and to discriminate between the innovative and the boilerplate.

You read in an order closer to how mathematics is actually done. Very few proofs are devised in the order they’re presented. There may be an important and difficult technical lemma that requires a seven-page calculation using the calculus of variations, but I can guarantee you that the author did not work out the details of this argument before fleshing out the main arc of the proof. Reading via dyadic scans reinforces the habit of keeping the big picture in your head at all times. The rigorous correctness of each detail matters less than you think – often any given technical argument can be done in several other ways.


For clarity, I’ve focused on dyadic scanning for reading papers. The method applies equally well to other settings. I will not bore you with the details, but here is a sketch of how.

On paper-writing

A well-written paper should make it easy for the reader to pick out its dyadic structure. To some extent, this is already standard practice: the abstract is an outline for the introduction, which is an outline for the entire paper. A great example where such a structure is pushed further is this paper of Tao which proves an “almost” version of the infamous Collatz conjecture. It is a fairly dense 49-page paper, so in addition to the standard abstract and introduction, there is a ten-page extended outline detailing the main arc of the proof and highlighting the important ideas.

When arguments get even longer than this, it is not uncommon to see a proof divided among multiple papers, with the first one serving as an extended introduction for the whole series.

This does not mean that a paper should necessarily be written in the explicit order of the dyadic scans: all the main theorems first, then the main lemmas, then the minor lemmas, and then the technical arguments should be bunched up at the very end. Often this will result in a very unnatural structure obfuscating the dependencies between ideas. It may be better to split the paper into subsections which are as functionally independent as possible, and carefully point out the dependencies and relative importance of various parts when they appear.

On giving and receiving talks

Attending talks is even more taxing on working memory than reading papers; the audience will generally have a wider variation in background, they will rarely have the luxury of pen and paper, and regardless of whether the talk is given via slides or a blackboard only a fraction of the total content is visible at any given time.

It is therefore even more essential when giving talks that the audience knows exactly where you’re going. Be sure to have multiple levels of signposting and continuous reminders on how each argument fits into the big picture. Most of the details in the lower levels should be omitted altogether unless they are absolutely essential.

On the receiving end, my general advice is that trying to follow every word in a talk is a mistake akin to making a single sequential pass in reading a paper. Instead, treat going to a talk as taking a single dyadic pass into a topic, out of potentially many. Which pass to treat it as depends on your current level of exposure. If you are a beginner, watch the talk as if taking the first dyadic pass, noting the key words and main ideas and how they fit together. If you have some exposure to the field, you can treat the talk like a second or third pass, paying attention to the major details and innovations. If you are an expert in the field, almost nothing in the talk will be new to you, and you can really dig into the key details or open problems. Be realistic about how much you expect to get out of the talk, and plan accordingly for what to focus on.

On mathematical conversations

Much of the advice in this post applies just as well to the more informal setting of a mathematical conversation, where often one person must convey a fairly complicated argument verbally to one or more others. The key difference in this setting is that the listener(s) can play a much more active role.

The simplest level of active listening is asking for clarifications and more details when things are not clear. A more sophisticated level of active listening is asking directly for the pieces of the dyadic structure that one is missing: if the speaker dives into the details of a lemma before its purpose is made clear, it is often correct for you to ask for that purpose instead.

In Conclusion

In all of these activities, the fundamental resource is your very limited working memory, and the more you can predict, the less you have to remember. By looking ahead, by asking for clarification, by making multiple passes, we can “cheat” and see the future, freeing up our memory for what really matters.

Of Math and Memory, Part 2

Last time, I wrote that having a good memory is essential in mathematics.

Today I will describe my model for working memory.

Compression and Prediction

Data compression is the science of storing information in as few bits as possible. I claim that optimizing your working memory is mainly a problem of data compression: there’s a bounded amount of data you can store over a short period of time, and the problem is to compress the information you need so that this storage is as efficient as possible.

One of the fundamental notions in data compression is that compression is equivalent to prediction. Another way of saying this is: the more you can predict, the less you have to remember.

Here are three examples.

I. Text compression

Cnsdr ths prgrph. ‘v rmvd ll th vwls nd t rmns bsclly rdbl, bcs wth jst th cnsnnts n cn prdct wht th mssng vwls wr. Th vwls wr rdndnt nd cld b cmprssd wy.

All text compression algorithms work basically the same way: they store a smaller amount of data from which the rest of the information can be predicted. The better you are at predicting the future, the less arbitrary data you have to carry around.
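
The link between prediction and storage can be sketched with the ideal code length from information theory, in which a symbol that the model assigns probability p costs -log2(p) bits. The snippet below is a toy illustration under that assumption, not a real compressor; the two models, the function name `bits_needed`, and the sample sentence are made up for the example.

```python
import math
from collections import Counter


def bits_needed(text, prob):
    """Ideal code length: each symbol costs -log2 of the probability the model gave it."""
    return sum(-math.log2(prob(ch)) for ch in text)


text = "the more you can predict, the less you have to remember"
alphabet = sorted(set(text))

# Model A: no predictive power, every symbol in the alphabet is equally likely.
def uniform(ch):
    return 1 / len(alphabet)

# Model B: knows how often each symbol occurs in this particular text.
# (A real compressor would also have to store or learn these counts.)
counts = Counter(text)

def unigram(ch):
    return counts[ch] / len(text)

uniform_bits = bits_needed(text, uniform)
unigram_bits = bits_needed(text, unigram)
print(f"uniform model: {uniform_bits:.1f} bits")
print(f"unigram model: {unigram_bits:.1f} bits")
```

The model that predicts the text better needs fewer bits to store it, and a model that also knew English spelling (as you did when reading the vowel-free paragraph) would need fewer still.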

II. Memory for Go

Every strong amateur Go player can, after a slow-paced game, reproduce the entire game from memory. An average game consists of between one and two hundred moves, each of which can be placed on any of the 19×19 grid points.


A typical amateur game, midway through.

Anyone who practices playing Go for a year or two will gain this amazing ability. It is not because their general memory improved either: if you showed them a sequence of nonsensical, randomly generated Go moves, they would have almost as hard a time remembering them as an absolute novice.

The reason it’s so easy to remember your own games is that your own moves are so predictable. Given a game state, you don’t have to actually remember the coordinates where the stone landed. You just have to think “what would I do in this position?” and reproduce the train of thought.

The only moves in the game you really need to explicitly store in memory are the “surprising” moves that you didn’t expect. Surprise, of course, is just another word for entropy. The better you are at prediction, the less surprise (entropy) you’ll meet, and the less you have to remember.
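
Here is a toy sketch of that idea: store only the moves a predictor gets wrong, and replay the predictor to fill in the rest. The predictor `toy_predict` is a deliberately silly stand-in (nobody plays Go this way), and every name in the snippet is hypothetical.

```python
def encode(moves, predict):
    """Keep only the surprising moves, keyed by move number."""
    surprises = {}
    history = []
    for i, move in enumerate(moves):
        if predict(history) != move:
            surprises[i] = move  # the predictor was wrong, so this must be stored
        history.append(move)
    return surprises


def decode(length, surprises, predict):
    """Rebuild the game by asking 'what would I play here?' at every move."""
    history = []
    for i in range(length):
        history.append(surprises.get(i, predict(history)))
    return history


def toy_predict(history):
    """Stand-in for a player's habits: open at (3, 3), then extend one point right."""
    if not history:
        return (3, 3)
    x, y = history[-1]
    return ((x + 1) % 19, y)


game = [(3, 3), (4, 3), (5, 3), (15, 15), (16, 15), (17, 15), (9, 9)]
surprises = encode(game, toy_predict)
print(surprises)  # {3: (15, 15), 6: (9, 9)}
print(decode(len(game), surprises, toy_predict) == game)  # True
```

Seven moves are recovered from two stored ones. A sequence of random moves would surprise the predictor almost every time, and nearly all of it would have to be stored.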

III. Mathematical theorems

A general feature of learning things well is that you get better at predicting. Fill in the blank:

If a and b are both the sum of two squares, then so is ___.

A beginning student looks at this statement and recalls the answer is ab, simply by retrieving this answer directly from memory.

A practiced number theorist doesn’t need to store this exact statement directly in memory; instead, they know that any of an infinite variety of such statements can be reconstructed from a small number of core insights. Here, the two core insights are that a sum of two squares is the norm of a Gaussian integer, and that norms are multiplicative.
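
A quick numerical check of these two insights, written as a sketch. Gaussian integers are represented here as (real, imaginary) pairs, a representation chosen only for illustration, and the specific numbers are arbitrary.

```python
def norm(z):
    """Norm of the Gaussian integer x + yi, namely x^2 + y^2."""
    x, y = z
    return x * x + y * y


def multiply(z, w):
    """(p + qi)(r + si) = (pr - qs) + (ps + qr)i."""
    (p, q), (r, s) = z, w
    return (p * r - q * s, p * s + q * r)


z = (2, 1)  # a = norm(z) = 2^2 + 1^2 = 5
w = (3, 2)  # b = norm(w) = 3^2 + 2^2 = 13
zw = multiply(z, w)

print(zw)                  # (4, 7)
print(norm(zw))            # 65 = 4^2 + 7^2
print(norm(z) * norm(w))   # 65 = 5 * 13
```

Because norms are multiplicative, the product ab is the norm of zw, which writes ab as a sum of two squares with nothing to memorize.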

Getting better at prediction in mathematics often follows the same general pattern: identifying the small number of core truths from which everything else follows.

We reduced the problem of improving your working memory to the problem of predicting the future. At face value, this reduction seems less than useless, because predicting the future is harder than memorizing flash cards. Thankfully, human beings are embodied agents who can interact with our world. In particular, we can cheat by instead making the world easier to predict.

More on this next time.

Of Math and Memory, Part 1

Memory is not sexy in mathematics.

“Rote memorization” is the most degrading slur you can fling at a math class. “Reciter of digits of pi” is the most awful caricature of mathematicians in the public eye. In grad school, the cardinal sin is to read a paper with a focus on memorizing names and results: we are bombarded with exhortations like “if you learned the Arzelà-Ascoli theorem deeply, it would be impossible to forget.” Apparently, if you really understand mathematics, everything (down to the accents on the names of 19th-century Italian mathematicians) would be so natural as to render rote memorization completely unnecessary.

All these attitudes can be quite detrimental to the young mathematician who, at the end of the day, needs to memorize an enormous amount of arbitrary data in order to get up to speed in their field. In this post, I will tell some archetypal stories about how memory, especially short-term working memory, is perhaps the scarcest resource in mathematical work.

In a future post, I will attempt to provide some solutions to address this scarcity.


Have you ever tried to copy a phone or bank account number from one place to another, without the benefit of Ctrl-C? You stare at the number for ten seconds, repeating it back to yourself in a rap-like rhythm. That sick beat, you hope, will help you remember an extra digit or two.

Conjure up that feeling of impending doom as you repeat those numbers back to yourself, knowing full well you can’t move 10 digits in one go. That’s the feeling of not having enough working memory. It’s the same feeling in each of the following scenarios.

These are not technically true stories, but they are all pieced together from literally true events.

A brilliant analytic number theorist is half-way through a riveting talk on the distribution of low-lying zeroes of L-functions. About three-quarters of the way through the blackboard space, the speaker finally switches gears from giving motivation and carefully treads into a long, technical calculation. Every Cauchy-Schwarz application and Fourier transform is clearly explained and surprisingly simple, until –

Uh oh!

The speaker reaches the bottom of the blackboard and begins erasing. You can almost hear the collective sigh of despair as most of the listeners think the same thought.

We’ve reached the end of the line.

After that half of the calculations are erased, only a handful of senior mathematicians who know the subject inside and out follow the rest of the talk.

Three mathematicians are throwing around ideas in a meeting. One is suddenly struck by inspiration, and starts explaining how to carry out a tricky change-of-variables. Another joins in with excitement, quickly catching on and offering a crude approximation which simplifies things significantly. All of this is happening in the air, so to speak. Writing things down would severely hamper their progress.

The third person, a younger graduate student, has a number of questions about the equations everyone’s keeping in their heads. The first time they ask for clarification, they are reminded gently that all calculations are in characteristic p. The second time, they are informed of a standard fact about eigenvalues of random matrices, and given a minute to catch up.

The third time, they can’t remember whether x was defined in the Fourier domain. They don’t ask.

In the next meeting, there are only two mathematicians.

I’m out for lunch, and need to attend a seminar talk afterwards. My weekly meeting with my PhD adviser is two hours away, and I haven’t made any progress this week. Away from pen and paper, I rack my brain and scrape the bottom of the proverbial barrel for any stray thought that might be worth presenting to him.

By some miracle, a casual remark during lunch sets off a series of revelations. I begin methodically working out the details in my head, getting more and more excited that I’m onto something. I completely ignore the seminar talk, running back and forth over the calculations in my mind. I get more and more confident that it works.

I walk into my adviser’s office and try to explain the idea to him, only to realize that I’d forgotten an essential intermediate step and mixed up two important variables. I get up to the board and attempt to work things out from the beginning, but I’m so flustered by this point that I keep forgetting what I’m doing.

We spend the hour going back and forth on minor technicalities, trying to see if there’s anything to my idea. In the end, my adviser becomes pessimistic that there’s anything at all and gently shoos me out for his next meeting.

When I get back to my office afterwards, I pull out pen and paper to try to salvage the idea.

I figure out all the details in fifteen minutes.

Problem Statement

It is difficult to collaborate with someone with significantly more or less short-term memory. Someone with more will appear to skip ahead three steps at a time, and you will continually feel in their debt for asking them to explain details. Conversely, someone with less will often ask you to rewind and write ideas down that you find inessential.

It’s difficult to read a mathematical paper without a good short-term memory. A reader who needs to keep referring back to the statement of Lemma 4.3(a) does not have the mental capacity to think about the big picture. If the paper is improperly structured, introduces clumsy notation, or is liberally sprinkled with abstruse citations, trying to follow it can feel like taking a forgetful random walk. How many times will I flip back to the conventions section before I remember the difference between S_k and \mathcal{S}_k?

It’s difficult to either follow or give a mathematical talk without a good short-term memory. An audience member can get lost by zoning out briefly and losing track of an important definition or theorem statement. A speaker who doesn’t remember the contents of their slides constantly reads off them and has no attention to pay to the audience. Next, an audience question about a previous slide breaks the artificial flow of the talk and causes a minor catastrophe.

People often worry that they cannot do mathematics because they are not clever enough. This is a very serious worry, because as far as we know everyone is born with a certain amount of clever and nobody really knows how to get more.

I think people should instead worry they cannot do mathematics because their memories are too poor. And I think this is very good news, because memory can be trained, and deficiencies in memory can be optimized around.

To be continued…




The Pit


If you brought a man with keen ears to the edge of the pit and dropped a quarter over exactly the right spot, you could count to eleven before he heard it hit the ground. If you next told the man that a sliver of sunlight was visible from the very bottom of said pit, he might have squinted at you skeptically. If you proceeded to say that the bowels of this same pit were inhabited by twenty-odd live human beings, he would certainly have slapped you across the side of the head and called you a shameless liar. But you wouldn’t have lied once.

The inhabitants of the pit – the pitfolk – were frail people, bone-pale from the perennial lack of sunlight, all taut skin wrapped about wan elbows. However they shifted their bodies to and fro, they were bound – as if cowed by that measly sliver of sunlight – to walk hunched over, keeping their faces downcast.

Through the decades, the pitfolk developed an extraordinary black and white vision, as all they could see were the meager shadows which shifted around their ankles. From these faded images the pitfolk deduced the whole of their reality. This made for a rather miserable experience, but it did not stop the pitfolk from building an entire way of life about the dance of dim shadows – black silhouettes against grey stone – that was their everything.

Each day at noon, when the dim light against their backs shone brightest, the pitfolk gathered in a large circle with one tribe member or another in the center, and that chosen member would play out a long and complex dance with the long and dextrous fingers attached to his long and sinewy limbs. The whole circle would watch intently as the shadows cast from the dance leaped across the ground, swaying to a silent rhythm. The full performance, which lasted nearly two hours, had been passed down from parent to child as long as any living tribe member could remember, and probably longer. And although each generation brought into it their own unique flicks of the finger and twirls of the elbow, the dance remained remarkably unchanged through the years. As everyone knew, the shadow dance was the story of their tribe.

Of course, each tribe member was free to form their own opinion about what exactly that story was. By some miracle of memory the pitfolk retained the faintest inkling of the great goings-on in the outside world, and through this hint of a memory they interpreted the shadow dance. 

Some saw in the flapping of hand-shadows the wings of the great Father Bird, while others saw the flapping capes of the first men. Some thought the writhing finger-shadows on the ground represented a plague of snakes that would bring about the end of the world, while others saw them as the tongues of a purifying fire that would bring its redemption. Some thought it strange that the great savior had five heads, one shorter and bulkier than the others, while others believed the five heads actually represented five spirits in one body, one of them a pudgy child. All such disagreements existed, and many more, but as the pitfolk had no means of communication other than the shadow dance, it appeared to each of them as if everyone else agreed with their understanding of things. At least on one thing they did agree: that the shadow dance was the story of their tribe, and that story must be passed down.

The youngest of the tribe was a boy, not yet seven, whose name was Two-Crossed-Fingers – that was the way they made his shadow sign. Unlike the older tribe members, Two-Crossed-Fingers was still learning the moves of the shadow dance. His elders had mostly calcified on their interpretations of the dance, and perhaps even begun to bore of it after so many years of dialectic, but the boy was still wide-eyed with excitement, playing through each piece of the story in delighted confusion. It was his greatest dream to complete the shadow dance and take his place within the tribe, so he studied very hard and very long.

There were many points when Two-Crossed-Fingers became stuck on a motion that seemed impossible. To draw different shapes simultaneously with his two hands challenged his mind. To stretch his arms wide apart and swing round and round challenged his body. To watch the tendrils of darkness consume each other in terrifying awful motions challenged his heart. And as Two-Crossed-Fingers was especially young and frail, these challenges were especially hard on him. But nevertheless he persisted, practicing deep into the dark of night, and to his amazement he discovered a certain way of interpreting the shadow dance that made what seemed to be impossible motions easy.

When he saw the two different shapes he had to draw as the two sexes, two sides of the same humanity, that motion merged into one unified story.

When he saw the swinging of his arms as the collapsing of the great bond between people, its frantic energy became natural and effortless.

When he accepted the throwing of one hand into the other as the final sacrifice that saved the last remnants of humanity, his terror subsided and was replaced by inner peace.

Two-Crossed-Fingers basked in the feeling of these revelations, and there was no doubt in his mind that this was the one true interpretation of the shadow dance. He was a great deal impressed by the genius who had woven together a dance so that the truth itself would shine through its execution, and thus carry forever the story of his people through the ages. Each step he took brought him closer to this story, and closer to his place in the tribe. He had no reason to suspect that he alone had felt the meaning of the dance. Two-Crossed-Fingers had no reason to suspect that, unlike him, all twenty-two other tribespeople were just going through the motions, and they each had their own clumsily patched-together notion of things.

The day finally came when Two-Crossed-Fingers was deemed ready to perform. He twirled into the center of the ring just before noon, flush with excitement yet oddly composed. Many years had already been spent on this journey, but it was just the beginning. Two-Crossed-Fingers planned to spend many more sharing the joy of the dance with his people. They had so few joys.

Just as the boy threw his hands out in opposite shapes to mime the courtship dance of man and woman, all of the other pitfolk heard a strange and shocking sound.

You must know that no sounds had been heard or uttered down here for a very long time.

It was a very weak scratching sound that seemed to carry down the pit from a great distance above, and it slowly grew more and more insistent. Bits of dirt and rubble began to tumble down into the pit, scaring what little color there was out of the downcast faces of the pitfolk. Nobody moved, nor blinked, nor paid any attention to anything except the growing noise.

Nobody except for the dancing boy in the center, who was so entranced by his moment that he didn’t notice the new stimuli.

The scratching sounds and falling rubble built into a crescendo, until even Two-Crossed-Fingers couldn’t ignore them, but the boy, despite his shock, continued to dance. Who could guess what went through his mind at that time? Whatever it was, he knew that the shadow dance, once initiated, must continue to completion.

A great Crack! was heard, reverberating around the cavern, and suddenly existence itself shattered – or so it seemed to the pitfolk. A great boulder had been dislodged from the opening of the pit, and sunlight poured in at an intensity they had never before experienced. The light was blinding.

Their eyes watered, their knees buckled, and they all knew that the end was nigh. But Two-Crossed-Fingers continued to dance the bond between good and evil unfurling into chaos. If his eyes began to bleed as he sped up his frantic motions, he did not seem to notice. The boy was possessed by a singular purpose – to retell the story of his tribe one final time.

Faster and faster he danced until the shadows made only a blur on the ground, telling no story at all except in the boy’s mind’s eye. One by one the muscles in his body gave way, but still he whirled. If anyone had stopped to ask him why he bothered dancing as the world fell apart, that single note of confusion might have broken his trance. But no one paid him any mind, so no stray thoughts entered the boy’s head, so Two-Crossed-Fingers danced to the very end. 

As the final sacrifice was thrown into the purifying fire to save humanity from its ultimate doom, he let out a long breath of relief. 

Then everything went white.

When we opened up the pit, we couldn’t believe our eyes. The records clearly showed that a mining accident had sealed the shaft nearly two hundred years ago, and yet when we dug down to its depths, we discovered twenty-three living human beings, cringing and frightened in a circle, thin as sticks. I’m proud to say that the team took action rapidly and without hesitation, climbing back up with the pit people strapped to their backs. We herded them like frightened sheep into our trucks, and soon had them in the local hospital.

What I remember most from that day was one little boy who latched onto me with a vice-like grip. He shook up and down, clearly frightened out of his wits, and a smear of blood ran down the side of his cheek. Yet still there was a line of defiance in his brow.

The mission itself had to continue, but a few of us volunteered to stay behind with the pit people and take care of their rehabilitation, as the hospital was understaffed for this kind of work. It took many months of intensive care to nurse the pit people back to health. None of them knew any spoken language, but they were surprisingly quick studies, and the programs worked as well as could be expected. Eventually they were able to tell us in simple words their unbelievable tale –  that all of them, down to the oldest man, had lived in the pit since birth, and never known any other life.

We tried our best to help the pit people, but it was difficult, for they had been down there so very long. What kept us going was how grateful they were, and they were very expressive of their gratitude with their long, bony limbs. The pit people all agreed that it had been a living nightmare down there in the pit, surviving in some demi-state between life and unlife. The plainest things – the green of grass, the ripples on pondwater, the crunch of tires against gravel – brought tears of ecstasy to their eyes.

There was one exception: the youngest among them, a boy who couldn’t have been more than eight years old. The same boy who had left such an impression on me that very first day.

When I was tasked to observe him, I found the boy was grateful and agreeable, but not extraordinarily so. He smiled a distant smile and made odd motions with his arms, often waving his crossed fingers at me as if he was about to lie. But he refused to learn to speak.

One day, he pulled me away from my lunch break and into his tent. The boy turned the lamp off so that only a thin sliver of light made it through the flap, and proceeded to do the strangest little dance. Even in the darkness, it was such a grotesque and unnatural series of motions to my eyes, joints bent in all the wrong angles, that I reflexively cast my eyes away. 

I think this offended him, and he stopped dead in his tracks. 

Ever since that day, he ignored me and all the other staff entirely, despite my best efforts to help him open up. The boy had no interest in the toys and games that fascinated other children.

Even so, I held out hope that something would change.

One chilly winter morning some weeks later, I was woken up by a nurse to learn that the boy had gone missing. There was no trace of him around the camp, and a long rope ladder had disappeared with him.

Propelled by a sinking feeling in my stomach, I jumped into my truck and quickly drove my way back to the opening of the pit where we first found them. Just as I suspected, a rope ladder fell into the darkness, its ends amateurishly looped around a tree stump nearby.

After refastening the rope properly, I descended the ladder, too anxious to go back for proper protective gear. I knew that the boy was at the bottom of the pit, and part of me was already rehearsing what I would do when I found him. Would I comfort him, or be cross? Would I have to take him back by force?

As I climbed down, the sounds of life retreated into the distance and the light above faded to a tiny point, but still the bottom was nowhere in sight. I felt as if I was descending into another plane, where time and space and smell and taste all faded into metaphor.

Finally, my boots hit solid ground. Using my smartphone as a makeshift flashlight, I discovered the same large cavern that we initially found the pit people in. It was about sixty feet across, and almost perfectly circular. The cold stone stretched out flat and empty, devoid of any sign that dozens of people had ever spent their lives here. 

To my surprise, there was not even a single trace of the boy.

I stood still for a moment, drinking in the emptiness. In the corner of my eye, the shadows on the ground seemed to dance and cavort gracefully, but when I turned to stare at them they steadied. 

It must have been the unsteadiness of my hands.

Another silhouette flitted across my peripheral vision, but when I turned, there was again nothing.

I was unsettled.

I am not a claustrophobic man, but the emptiness, the silence, and the chill in the air made it unbearable to stay too long. All thoughts of the boy disappeared from my mind.

As soon as my limbs recovered I clambered back up the rope ladder as fast as they would take me, out of this plane of demi-life. The trip upwards seemed to take twice as long as the trip down, and my limbs almost gave way before I made it.

After returning to the land of the living, I thought for a moment to pull up the rope ladder and take it with me.

In the end, I decided against it.

Objectives vs Constraints

I was thinking the other day about how strange linear programming duality is, and how great it would be if something like it applied in real life. This led me to thinking about how human beings optimize in practice. 

I think a huge number of optimization problems at every level from public policy to personal decision-making can be framed as “Maximize A and B” where A and B are two values. Conflict arises when A and B compete and need to be traded off against each other. 

The first key insight is:

People almost always implement “maximize A and B” as either “maximize A given B” or “maximize B given A,” and these are NOT the same strategy.

If someone is implementing “maximize A given B,” I’ll say they’re treating A as the objective and B as the constraint. It is important to note that even though the objective A may seem like the thing you’re working hardest on and care the most about because you’re trying to maximize it, the constraint B is actually the value you’re putting more weight on. That’s the second insight:

When you think you’re prioritizing A you might actually be putting most of your energy into guaranteeing a different value B, and optimizing A with only the residual energy that remains.
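
To see that the two strategies really do come apart, here is a minimal sketch on a toy problem. Everything in it is an assumption made up for illustration: a budget of 10 hours split between two values, linear payoff functions, and arbitrary "good enough" thresholds for whichever value is playing the constraint.

```python
# Toy illustration. All numbers are assumptions, not data.
HOURS = 10  # total hours to split between value A and value B


def score_a(hours_on_a, hours_on_b):
    """Hypothetical payoff for A: 3 points per hour spent on A."""
    return 3 * hours_on_a


def score_b(hours_on_a, hours_on_b):
    """Hypothetical payoff for B: 2 points per hour spent on B."""
    return 2 * hours_on_b


def maximize_given(objective, constraint, threshold):
    """Best (hours_on_a, hours_on_b) split for `objective`,
    considering only splits where `constraint` reaches `threshold`."""
    feasible = [
        (a, HOURS - a)
        for a in range(HOURS + 1)
        if constraint(a, HOURS - a) >= threshold
    ]
    return max(feasible, key=lambda split: objective(*split))


# "Maximize A given B": B must reach 16 points, A gets the leftovers.
a_given_b = maximize_given(score_a, score_b, threshold=16)

# "Maximize B given A": A must reach 24 points, B gets the leftovers.
b_given_a = maximize_given(score_b, score_a, threshold=24)

print(a_given_b)  # (2, 8)
print(b_given_a)  # (8, 2)
```

In both runs the nominal objective ends up with 2 of the 10 hours, and the constraint quietly takes the other 8.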


I have taken a good number of college math classes, and I would roughly divide the pedagogy into three categories, based on what the lectures seem to be optimizing for out of (A) student understanding and (B) material covered.

Classes in category 1 (common among large introductory courses like linear algebra or real analysis) feel as if they’re designed to make sure the median student understands all the material. Examples are copious, homework exercises are comprehensive, and each important argument or tool is practiced deliberately and with spaced repetition. The revealed preference of the lecturer is “Maximize material covered conditioned on student understanding.”

Classes in category 2 (common among upper-level graduate courses) feel as if they’re designed to cram as much of the instructor’s pet topic into a semester as humanly possible. Homework is sparse if it exists, while details, proofs, and entire months of intermediate background material are skipped or brushed under the rug. By the end of the course, the number of students not completely lost is between 0 and the number of instructor’s doctoral students taking the class, inclusive. Usually, the lecturer is both blissfully unaware that nobody is following and perfectly happy to slow down for questions and fill in details when prodded. So they clearly care about student understanding at some level. The problem is that they skip five steps for every one covered, and even generously filling in one or two of those steps helps almost nobody. The revealed preference of the lecturer is “Maximize student understanding conditioned on covering all the material.”

The third – and possibly largest – category is an uncanny middle ground between these two extremes.

Take a Data Structures class I sat through in some previous life. Before the first two midterms, we met all the usual suspects – BSTs, hashtables, suffix arrays – the stuff techbros memorize to pass Google interviews and never touch again. Once or twice the instructor gets a bit of color in his cheeks and does something a little risqué like put a BST inside a hashtable, but on the whole you can follow along by watching the lecture videos at 3x speed with Katy Perry playing in the background.

Well I’m zooming along happily and then right after the second midterm, a switch flips. The instructor has covered all the “Data Structures 101” and has six lectures left to introduce us to the bountiful fruits of modern research. You can almost see him giddily preparing lecture notes the night before and bashfully remarking, “oops, this part needs a whole two lectures on circuit complexity to make sense, teehee.” The fraction of students who are nodding along excitedly in lecture drops from 1-o(1) to o(1).

This kind of sharp phase transition has happened to me enough times that I’m kind of numb to the process. I almost know from day one that at some point lectures will suddenly stop making sense, even if I loved the lecturer’s style at the beginning. Classes in category 3 (which tend to be upper-level undergraduate courses or introductory graduate classes) start out “maximize material covered conditioned on student understanding”, and then BAM! experience a sharp transition around the two-thirds mark into “maximize student understanding conditioned on material covered.” In Algebra 1, the lecturer covers rings, fields, and a smattering of Galois theory, and then runs out of patience and suddenly starts preaching the mAgIcAl LaNgUaGe Of ScHeMeS. An exquisite course on Riemann surfaces runs adrift after the second midterm into the dynamics on moduli spaces of nonorientable genus something somethings.

And the sad thing is, I really understand where these lecturers are coming from. After all, a human being can only optimize for one thing at once.


The criminal justice system primarily cares about two things: (A) doing bad things to guilty people, and (B) not doing bad things to innocent people. For almost all of human history, the default optimization protocol was “maximize B given A,” in other words, “guilty until proven innocent.” This kind of thinking is built into us: we would rather wipe out villages of extra innocents than let dangerous criminals or enemies go free. Almost every culture has ancient concepts of original sin or guilt by association. In Chinese literature, the bad guys’ catch phrase is “斩草除根” (when cutting grass, pull out the roots), which is usually used to justify killing the good guy’s children to prevent them from retaliating when they grow up. Murder some ten-year-olds just to be safe. After all, it’s the humble thing to do.

At some momentous inflection point in history, the fundamental legal axiom flipped to “innocent until proven guilty.” The switch between these two optimization protocols, which are superficially doing the same thing, “maximize A and B,” was possibly the most important and unlikely step ever made in the advance of human civilization. “Innocent until proven guilty” affirms the principle that an individual human being has intrinsic value, and that we cannot murder someone just to be safe. What it means, unfortunately, is that we let scumbags and criminals go all the time and this is by design. If you think this was an easy principle for human beings to agree upon, you have not met human beings.

A diagnostic cancer test primarily cares about two things: (A) telling cancer patients they have cancer, and (B) not telling healthy people they have cancer. In a world where technology is not perfect and we have to trade off some amount of A against some amount of B, the medical profession uses the protocol “maximize B conditioned on A.”

This is not as trivial a choice as it might seem – remember that one probability problem about false positive rates they ask on every standardized test? Even if the false positive rate is only 1%, most positive results will be false positives, because very few of the people tested have cancer and very many do not. But it’s still worth it – it’s much, much more important that every early cancer patient is diagnosed correctly than that healthy people don’t get scared and inconvenienced, even if we scare a huge number of such people.
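
The arithmetic behind that standardized-test problem can be sketched with Bayes' rule. The 1% false positive rate is the figure from the paragraph above; the 0.5% prevalence and 90% sensitivity are assumptions picked purely for illustration, not real clinical numbers.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that someone who tests positive actually has the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(
    prevalence=0.005,          # assumption: 0.5% of people tested have cancer
    sensitivity=0.90,          # assumption: the test catches 90% of real cases
    false_positive_rate=0.01,  # from the text: 1% of healthy people test positive
)
print(f"{ppv:.0%} of positive results are true positives")
# prints "31% of positive results are true positives"
```

Under these assumed numbers, roughly two out of three positive results belong to healthy people, even though the test is wrong about any given healthy person only 1% of the time.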

An immigration policy cares about two things: (A) letting good people in, and (B) keeping bad people out. There was a point in the history of the North American continent where the immigration policy was entirely open, ignoring B altogether. This was an unmitigated disaster for the Americans of the time, as European immigrants came in with their guns, germs, and steel and wiped out 90% of the native population. In recent history, it seems like the opposite policy is the case, “maximize A conditioned on B,” but it is a huge source of controversy because we cannot agree on which of A or B should be the objective and which should be the constraint. Merely saying both sides care about A and B does nothing to solve the problem.


Here’s a parable about the kind of person I am. A psychologist once gave five-year-old me an infinite marshmallow test: “For each 15 minutes you wait, you get one more marshmallow at the end!” Legend says I’m still waiting in that room.

Of course, the marshmallow test is not mostly about impulse control or delayed gratification, as it’s usually sold. It’s about being willing to sacrifice (A) your own comfort to (B) pass other people’s tests and get their approval. I was always very much willing to play the game “maximize A conditioned on B” – when I could laze out and be comfortable I would, but only after guaranteeing I’d pass the test.

I spent a lot of time as a child being alternately confused about or contemptuous of other kids who didn’t do as well at tests, especially when they claimed to be “doing their best.” It seemed to me that “doing your best” means passing the test at all costs, and it was glaringly obvious to me that every single other student could do that, especially given how easy the tests were. It took a long time for me to realize that “do your best” actually meant “maximize B conditioned on A” – don’t mutilate yourself to get other people’s approval – and even longer to understand that this might actually be right.


I wanted to conclude this essay by making sweeping generalizations about human psychology, but then I realized that I’m still not confident the phenomenon I’m describing is real. Here are the claims I’d like to make:

  • All hard decisions involve tradeoffs between (at least) two competing values.
  • Instead of treating competing values as roughly equal in weight, usually human beings will weigh one WAY more than the other, so in practice “maximize A and B” rounds off to “maximize A conditioned on B.”
  • Often this is the correct behavior, even if it is surprising. Usually one of A or B will actually be several orders of magnitude more impactful than the other.
  • Sometimes this is the incorrect behavior but people still do it because human beings can only optimize one function at a time.
  • Many interpersonal conflicts occur because one person is trying to solve “maximize A given B” and the other is trying to solve “maximize B given A” and each thinks they’re solving the same problem as the other person, just in a better way.
  • We need to learn to maximize functions like A+B.

Thoughts, examples, counterexamples?