"Everything can be made radically elementary." ~Steven Rudich

Optimizing Looks Weird

[Calling it for this November, happy to have gotten a few short posts out.]

For a couple years in my childhood, my mom picked up an obsession that can only be described as extreme couponing. At the time, CVS and Rite Aid offered an enormous variety of discounts and rebates with strange, time-limited conditions and a glaring loophole: most of them stacked upon each other. If you rolled into the store on the date of the correct sale with a hundred dollars’ worth of coupons and rewards dollars, you could buy out their inventory of certain products without spending a cent, and end up with more ExtraBucks than you’d started with. And so it came to pass that at regular intervals, I’d be called out to the parking lot to help my mom haul in a twenty-year supply of Oral-B toothbrushes or a trunkful of sour cream and onion potato chips. At dinnertime, we’d inevitably be regaled with the story of yet another indignant cashier who called a manager after my mom pulled out her folder of coupons, only to be forced by said manager to apologize to the customer.

Genuinely optimizing looks weird and transgressive.

In StarCraft, there is a trick that every beginner Zerg player learns called the “extractor trick.” The game imposes a cap on the number of soldiers you can build, which is a very severe constraint. The extractor trick lets you surpass this cap by (roughly speaking) manually killing your own soldiers, building new ones, and then resurrecting the dead.

Feynman was famous for – among other things – his method of learning by teaching. I think what he noticed is that much of our collective brainpower is locked inside social cognition, and his method is a way of coopting this inaccessible processing power to learn math and physics. Nowadays, Feynman’s method is common practice: in graduate reading seminars everyone signs up to give a lecture on a topic they know nothing about; in many math classes, the professor learns as they go and stays one week ahead of the class.

Noticing the loopholes in the rules requires curiosity and confidence in one’s own faculties, but that’s only half the battle. Many people notice loopholes and exploit them a tiny bit, like walking out of the convenience store with a free bag of chips. The rest of the battle is how you turn this exploit into a method, a career, or a business: it requires the courage to go all in to exploit these loopholes as far as they’ll go.

Aggro is the Foundation

Today I want to review a concept present in many domains, but most clearly articulated by TCG players. In a game like Magic: The Gathering or Hearthstone, gameplay divides neatly into two phases: deckbuilding and execution. Deckbuilding involves all the choices and calculations that go into preparing your custom deck of cards before you even sit down (or log in) to draw your first hand. Execution is playing your deck, and your particular draws, as well as possible in the moment.

Execution is hard, but essentially learnable. Deckbuilding is the truly difficult and creative part: it requires not just extensive game knowledge and creativity, but a deep understanding of the metagame – what decks other players are likely to bring and how to counter them. One core philosophy that I learned from great deckbuilders is the understanding that aggro decks – fast, simple decks that try to kill the opponent as quickly as possible – are the foundation upon which the entire metagame is built. Of all the kinds of decks, building an aggro deck is the least difficult; usually your choices are limited to cheap, efficient early-game cards that end the game as quickly as possible, and there isn’t a huge amount of room to optimize for metagame.

In contrast, other decks (typically classified as “midrange” or “control” decks) need to be built contextually with aggro in mind – the dominant aggro deck in the metagame sets a pace you have to match. If the fastest aggro deck can end the game in three turns, then you need to have cards you can afford to play in the first three turns. If they play a lot of minions on the board, you need a lot of removal to kill those minions. If they play many damage spells, you need counterspells or healing. In a sense, aggro is the foundation upon which the other layers of the metagame are built in layer upon layer of abstraction.

Many other games have the feature that there are a few pure aggro strategies – not necessarily even good strategies – upon which any deep understanding of the game must be built. In StarCraft and other RTS games, the rush strategies that are possible dictate the pace of the game: if the earliest enemy rush can come at 3 minutes and 30 seconds into the game, then your strategy must make sure to start building defenses at 3 minutes (or at least leave the possibility open conditional on scouting information). In real-life geopolitics, war, especially nuclear war, is the foundation: even if no military conflict actually happens, military strength must factor into negotiations at every higher level of consideration. In contrast, the execution of nuclear war is straightforward and unfettered by metagame considerations.

I write all this to suggest that there is also a basic aggro strategy in mathematics research upon which all other metagame considerations must be built, and that strategy is the simplest one of working on a hard technical problem by yourself. You will be told to spend lots of time going to learning seminars and talks, you’ll be told to network and hobnob, and you’ll be told (by me, in the very last post) to attend to meta-considerations and learn how to play a supporting role in research. All of this, however, must rest upon a solid foundation of knowing how to play aggro, how to carry out productive research. Without this foundation, you’ll have no idea what to focus on in a seminar, not a clue what to learn from and ask of other researchers, and only a fuzzy model of what support you can provide to another mathematician trying to carry out their research.

Just as not every Magic player plays aggro decks, not every mathematician needs to do solo research to be effective. But every Magic player needs to know what aggro decks are in play, and every mathematician needs to know how to do solo research. That’s the foundation upon which the entire metagame is built.

Research in Tandem (Part 3)

Today I want to end this discussion about research collaboration with my most useful tip for grad students: build an explicit model of how collaborators work, especially your PhD advisor.

One of your primary goals in graduate school is to set aside 20% of your brain for simulating your advisor, who is typically the best mathematician you are in close contact with. Learn and imitate their reflexes, their tastes, their decision trees. Spend substantial chunks of time during research meetings being curious about minds and modelling how other mathematicians operate. How did they come up with this? What do they know that I don’t? Why did they try this approach first?

Even if this is the only thing you manage to do in grad school, you end up as a low-resolution clone of your advisor – which is not ideal but nevertheless a better-than-average outcome.

Here are seven points of inquiry to jumpstart your quest to model another mathematician.

Research Direction

  1. What problems do they work on?
  2. How do they choose these problems?
  3. How do they weigh the importance of a problem versus aesthetic interest in it, versus the actual likelihood of solving it?


  1. Who do they work with most frequently?
  2. What qualities do they praise about their closest collaborators? How is labor usually divided in their collaborations?
  3. By what criteria do they evaluate other mathematicians?

Your relationship

  1. What exactly do they want from you?
  2. Conversely, what exactly do you have to offer them?
  3. Most mathematicians are somewhat motivated by genuine care for young people, but there are pragmatic considerations beyond that. Can you help realize their mathematical vision? Do you carry out humble work that makes their life easier? Are you stimulating and enjoyable to be around?

Patterns of thought

  1. What patterns do you notice in their thinking over time?
  2. What are their common first refrains when working on a problem?
  3. Which pictures, techniques and lemmas do they rely on time and time again to orient themselves?


  1. What are their glaring weaknesses?
  2. From where you’re standing, are these weaknesses gaps that you can fill, or dump stats that you should deprioritize as well?
  3. Do they ever advise you “do as I say, not as I do”? How seriously should you take such advice?


  1. How did they get started in math?
  2. Getting into orbit requires different strategies from staying in space; what did they do at the start of their own career?
  3. What mathematicians did they themselves admire and learn the most from?

Work-life balance

  1. What is their working life like?
  2. How much time do they spend on teaching, traveling, and administrative nonsense?
  3. Would you actually want to work a day in their shoes? If not, what would you adjust to make it ideal for you?

Research in Tandem (Part 2)

Today we continue our discussion about research meetings with a few concrete strategies. When I was a new graduate student, I often had long meetings with my PhD advisor and other professors that went completely over my head. Such meetings are extremely demanding: they require a broad base of shared knowledge, they involve carrying out complex calculations and spatial manipulations entirely through verbal communication, and they proceed at a meandering conversational pace that often jumps back and forth between many different approaches and perspectives.

The meta-heuristic underlying all of the following tips is: learn to play a supporting role. Every mathematician wants to be the genius who single-handedly carries the team to the finish line with insight after insight. In contrast, nobody tries to play support. Therein lies an enormous well of untapped potential for you to contribute directly to mathematical inquiry without having the faintest clue what’s going on.

Take notes

Never be afraid to interrupt the flow of conversation to walk up to the blackboard (or pull out paper or a laptop) to draw pictures and note down what’s being said. Just copying down what others are saying might not seem like much of a contribution, but you’ll soon learn the many benefits this practice has.

You help catch mistakes and ambiguities that were skated by in conversation. Taking notes improves your long-term memory and learning. Having a visible log of the history of the meeting frees up precious working memory for you and your collaborators to forge rapidly ahead. Writing things down forces you to develop evocative notation, useful pictures, and modular lemma statements that compress amorphous heuristics into concrete, versatile building blocks. As collaborations extend over weeks, months, and years, everyone will thank you for keeping notes, however half-assed they may be.

If you have nothing to contribute, the first thing you can do is take notes.

Toss Bricks

The Thirty-Six Stratagems are a compilation of aphorisms for war and politics deeply ingrained in Chinese culture, of comparable influence to The Art of War. One of my favorites is 拋磚引玉, which roughly translates as “toss out a brick to lure out the jade.”

If you’re stuck – and you often will be – instead of silently waiting for others to present good ideas, present your own bad ideas. Throw this brick out as a way of baiting insights – which are the jade in this analogy – out from your peers. It’s common knowledge that the fastest way to get a question answered on the internet is to post a wrong answer. The same heuristic applies to research: if a conversation stalls, throw out a brick. Your collaborators will rush to point out all the reasons your approach is wrong and naïve, and how to improve it. Before you know it, beautiful pieces of jade will have appeared in its place.

There is an art to tossing the right bricks. I don’t suggest yelling out “Let’s try category theory” at every turn when there’s no connection whatsoever to the current problem. Best practice for throwing bricks is akin to semi-bluffing in poker: a brick is a hand that is currently useless, but there’s still a chance it might work out on the river. You probably have bad ideas and fuzzy intuitions you’re embarrassed to share that seem very slightly relevant to the problem. Just lower your filters and babble them out.

I can’t count the number of times I’ve opened my mouth to spew out a nonsense thought that didn’t even make syntactical – let alone logical – sense, only for one of my brilliant collaborators to charitably error-correct said sentence into a useful insight. “Ah yes, of course, that’s exactly what I meant,” is usually how I continue this conversation, “But just to be pedantic, could you explain that in more detail?”

If you’re not courageous enough to present a brick as if it’s a genuine insight, preface it with a disclaimer: “So here’s an idea that definitely doesn’t work, but I’d like to figure out why.”

To be continued…

Research in Tandem (Part 1)

I once heard the following story about Szemerédi: His daughter was in elementary school at the time, and her teacher asked everyone to share a bit about their parents’ occupations. The kids went around the room and each told a little story about what their parents did for a living: “My mommy is a doctor, she pulls teeth.” That sort of thing. When it came time at last for Szemerédi’s daughter to share, what she said shocked and worried the teacher: “My daddy just lies in bed and stares at the ceiling all day.”

There’s a stereotype that mathematicians spend all their productive time holed away from the world like this, staring at blank surfaces while intricate equations play out in their minds’ eyes. To the contrary, I find that a substantial fraction of my mathematical progress these days occurs during research meetings, which ideally go like this.

Two to four people sit in a room or Zoom call together, for a meeting slated to last an hour or two. We set our sights on a problem of common interest, and then start bouncing half-formed ideas off each other.

At times, we hit upon a tool or keyword that seems useful, and there’s a flurry of activity as everyone digs through their memories and Google Scholar for relevant literature. If this goes well, we find a relevant paper – invariably a paper about expander graphs – and the meeting devolves into a puzzle hunt where we collectively attempt to decipher the beautiful mathematics painstakingly hidden away in said paper. Eventually, we discover that fifty years ago a physicist solved a special case of our problem, formulated in an entirely different language, and the paper trail ends there.

Other times, one of us lets out a sigh and admits defeat, “This equation is way too difficult to solve, can we at least solve the toy problem where all of the functions are just constants?” This humble simplification draws a gasp of disbelief from the others, “That should be trivial, just apply lemma so-and-so and decomposition thus-and-thus.” We proceed to bully the most junior member of the team – typically a graduate student – into calculating decomposition thus-and-thus live on the blackboard. The idiosyncrasies of the problem turn out to be more intricate than we’d expected at first blush, and the nested summations soon get out of hand. It takes a good half hour before we give up on executing the calculation rigorously, all the time nodding to each other more convinced than ever, “Yes, the decomposition definitely should work. Although it looks a bit messy around here it really has to come down to iterating the Cauchy-Schwarz inequality.”

Finally, as we all agree to go to lunch and promise to check the calculation independently – trivial though it must certainly be – the guy in the corner who’s been silent all meeting finally pipes up, “I think I solved the original problem!” He goes up to the board, erases all the nested sums, and proves the theorem in two lines. Seeing the awe in our faces, he tries to comfort us sheepishly, “Well, I only got the key idea from watching the calculations you all were doing. When you wrote the letter ‘s’ in that curvy way, that’s what gave me the idea of using the Gauss integral.”

Next time, I’ll write about strategies for keeping research meetings productive…

The Fundamental Growth Curve (Part 3)

Last time we introduced the basic model of growth, which looks like this:

The thesis is that noticeable growth is typically punctuated by a long intermediate period of low return on investment, which we call “the Wall.” In the remaining posts on this topic, I plan to cover (a) the common failure modes that arise due to the existence of the Wall, and (b) prescriptions for how to minimize or completely skip over the dreaded Wall.

Failure Modes

Skill and complexity creep

In the year that League of Legends launched, you could become a top player in three months of unstructured training by focusing on one hero and drilling mechanics. The level of play is always this low at the beginning of things. Nowadays, it takes years of dedicated practice and encyclopedic game knowledge to reach the same relative status in the game.

Creep is an ever-present threat to the health of every community of skill: consider the research field where the low-hanging fruit has been picked barren, the video game where the barrier-to-entry is a hundred hero by hundred hero table of matchup knowledge, or the industry where the pool of interview questions grows ever more esoteric and adversarial. Left unchecked, the wall grows higher and higher, until new blood stops bridging the gap altogether and the entire community dies out.

Picking the wrong-sized pond

As a child, my parents told me the typical Asian advice that I should befriend older, smarter kids from whom I had much to learn. Thankfully, I mostly ignored this advice. I’ve seen friends try to follow this strategy, usually to their detriment.

The farther down you start in the status hierarchy, the further away those sweet positional gains become. And imagine, god forbid, that you really imbibe this backwards advice and continuously try to jump up hierarchies to where you don’t belong. You’ll spend your whole life being the odd one out, the real impostor, the weakest link, the lowest-status grunt who is passed up for every opportunity and promotion.

Conversely, it is also possible to be too big of a fish in too small a pond. They say that if you’re always the smartest person in the room, you’re in the wrong room. There’s a different kind of stagnation that happens when you reach the peak of your local hierarchy and don’t search for greener pastures.

Bringing down the Wall

Artificial divisions

It is common knowledge that only children can become chess grandmasters. Neuroplasticity and ability to learn probably play a role, but another important factor is that there exist long and delicate pipelines of positional gains for bootstrapping children through the Wall. For example, in tournaments and classes, young children are carefully subdivided by two-year age bracket and by locality; this artificial partitioning of the population allows for that many more first place trophies to win and local mini-ladders for kids to climb. A seven-year-old prodigy can start winning games in the county at the under-8 level with only a bit of talent and study, then the state level, then the next age bracket, and so on. The positional gains are thus paid to her in installments that keep her coming back. Adults who want to learn chess have no such luck.

Systems for training difficult skills can be optimized by placing people in granular divisions with comparable peers. Conversely, as an individual, one should judiciously hop between ponds to find places where fruitful positional gains are within reach. As a rule of thumb, the sweet spot seems to be rooms where you’re around the 75th percentile.


One way difficult disciplines can prosper is by subdividing in a different way, specializing into mutualistic subdisciplines. A software engineering team might be a hostile, zero-sum competitive environment if everyone is trying to be the best at everything. But suppose the team members each leverage their unique strengths, and you end up with one expert in frontend, one in backend, one who knows how to speak the voodoo language of customers and product managers, and that one machine learning guy. Suddenly everyone has the benefits of high status in their respective domain and access to mentors who can help patch up their weaknesses.

One problem math academia faces today, and part of why the wall called graduate school is so difficult to get over, is the insufficient specialization of labor. Sure, we specialize in subject matter, but whether you’re an algebraic topologist or a knot theorist, you still have to excel at research proper, paper-writing, mentorship, public speaking, etc. etc. Effectively, all these sub-dimensions of competence are projected onto a single massive meta-ladder that is impossibly tall (man am I mixing metaphors today) for the novice to climb.

The Fundamental Growth Curve (Part 2)

[Followup to the previous post. I will likely collect and reorganize these shortforms once the month is over.]

The narrative from the previous post outlines my basic model for how growth works. For a typical skill, be it running cross country or solving Rubik’s cubes or engaging in mathematical research, return on investment follows a curve like this:

In the first stage, growth is fast and cheap. The core fundamentals are the easiest to pick up and best documented. Adepts willing and able to mentor novices are plenty. As the weakest one of the group, you’re no threat to anybody’s status (excepting, perhaps, the second weakest one, but they’re in no position to do anything to you).

This is the regime where the 80/20 principle holds: 80% of the absolute value in the activity is picked up with only 20% of the effort, applied with discernment. Most members in any community are novices somewhere in this first stage, enjoying the immediate gratification of visible gains while lacking the stomach for serious investment.

The gains here are absolute in nature. You might not garner any attention for picking up jogging, but your sleep apnea clears up. You might not win any money playing a game, but you become competent enough to enjoy it and be able to appreciate professional play. You might not prove any new theorems taking undergraduate math classes, but you finally understand how to calculate expected value and stop getting duped by slot-machines.

In the second stage, there is a long plateau, a “wall” that most people hit when the newbie gains fall off and serious commitment or exceptional talent seems necessary to progress.

Past the fundamentals, there exist conflicting and ambiguous schools of thought on how to progress to experthood. The athlete is told to follow three different diets by three different nutritionists. Long debates are held about which chapters of Hartshorne are mandatory to truly learn algebraic geometry. The single-digit-kyu Go player is told to focus on opening, on fighting, on life-and-death problems, on endgame, and each is apparently vitally important and the one true path to greatness.

Months of physical time are spent to see small improvements that have no practical impact on your life: whether you run a 9 minute mile or a 6 minute mile, the only thing you’ll win is a participation ribbon. You blog for years only to see your readership jump from dozens to hundreds. You practice your craft doggedly and jump from the worst surgeon in the hospital to the second-best, but when it comes time for their heart transplants, the billionaires still pass you by to wait in line for that star surgeon.

In the third stage, absolute improvements get even slower, but you finally hit a level of mastery where positional, or relative, gains kick in. And my, do they kick in fast.

You become one of the top players in your cohort, and people start to notice you. Coaches give you special treatment, you win minor awards, get sent to training camps, participate in more rarefied cohorts.

You have a renewed and enormous motivation to improve: every tiny absolute improvement could move you up one giant discrete ranking. Jumping from the 11th best-selling author to the 10th doubles your book sales. Shaving a couple seconds off your personal best means a new state record and a full ride to college. Writing one additional paper in graduate school edges out the next candidate to land you the fancy fellowship that keeps your academia dreams alive.

And look, the divide between the second and third stages is not just a shallow artifact of human status regulation. It’s an essential feature of the collaborative optimization problems we face all the time. As long as people work in teams and specialize according to their relative advantage, an individual hardly contributes in a given dimension unless they are the best or nearly so.

Look at it this way: if eight college friends are taking a road trip through gravelly mountain roads, the best driver is going to drive the van most of the time, and the second best driver might help pick up some slack. How well these two drive is dreadfully important, but as for the other six – it makes not a whit of difference if they even have licenses. Everyone in that van is incentivized to pour their resources into helping that best driver become even more skilled.
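For the programmers in the audience, here is a toy version of the road trip. Everything in it is my own assumption rather than a measured fact: the skill numbers are made up, and the 80/20 split of driving time between the best and second-best driver is a hypothetical stand-in for “most of the time” and “some slack.” The point is only to show how a small absolute improvement is worth something to the team when it belongs to one of the top two, and nothing otherwise.

```python
# Toy model (all numbers hypothetical): the best driver takes most of the
# driving, the second best picks up the slack, everyone else never drives.
BEST_SHARE = 0.8      # assumed share of driving done by the best driver
SECOND_SHARE = 0.2    # assumed share done by the second-best driver


def trip_quality(skills):
    """Quality of the trip, weighted by who actually ends up driving."""
    best, second = sorted(skills, reverse=True)[:2]
    return BEST_SHARE * best + SECOND_SHARE * second


def marginal_gain(skills, i, delta):
    """How much the trip improves if person i gets `delta` better at driving."""
    improved = list(skills)
    improved[i] += delta
    return trip_quality(improved) - trip_quality(skills)


# Eight friends, listed from best driver to worst (made-up skill levels).
drivers = [9.0, 8.0, 5.0, 4.0, 3.0, 3.0, 2.0, 1.0]
gains = [marginal_gain(drivers, i, 0.5) for i in range(len(drivers))]

for name, gain in zip("ABCDEFGH", gains):
    print(f"friend {name}: trip improves by {gain:.2f}")
```

In this sketch the same half-point of practice only moves the trip when it goes to one of the two people who drive, and it moves it most for the best driver. The other six could double their effort without changing the outcome, unless they improve enough to break into the top two.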

To be continued…

The Fundamental Growth Curve (Part 1)

In honor of NaNoWriMo, I’m joining a friend to write low-effort shortform blog posts every day this month.

The first topic I’ll write about is what growth feels like, and where people hit walls, plateau, and stagnate. Learn to optimize around the plateau, and you’ll be unstoppable.

A Parable

You have no hand-eye coordination to speak of, but your parents want you to touch grass, so you sign up for cross country. In the beginning, you’re the slowest of the pack, but you improve rapidly. You train nearly every day and your mile time drops from nine minutes to six-twenty in three months. The numbers go down. Walking up the stairs doesn’t wind you anymore. Finally the seniors are waiting for someone else to catch up. Across the whole team, you may only rise from dead last to the 20th percentile, but to you it feels like winning the Olympics.

Then you hit a brick wall.

You need to shave off a minute twenty to make varsity, and every last second gives you a fight for its life. You slog through nearly two years of training. Seven hundred and thirty days drowning in an endless barrage of shin splints and intrusive thoughts. The fact that you rise from the 20th percentile to the 80th means fuck-all; every freshman who speeds past you to snag a trophy sends you spiraling back into that pit of self-doubt.

And then, by virtue of some minor miracle, you finally push through that wall. You paid your dues. The best runner graduates. Puberty finally kicks in. You have a good night’s rest before tryouts. Whatever the reason, you make varsity.

Coach knows your name – finally! – and starts giving you individual attention. Every second you gain is a triumph – you can see yourself climbing those rankings every single meet. Out of the blue, the gossipy neighbor asks your parents for advice on preparing her middle-school daughter for high-school sports, and mom finally starts taking this running thing seriously. They rearrange their schedules around your training, and cut you some slack on the academic side. Everything is coming together.

You spent three years pumping your exhausted legs through final laps listening to crowds cheering for someone else. Finally, one day, you cross the finish line and you know you won the race. You know because the crowd has just started cheering, and the one they’re cheering for is you.

To be continued…

Where do your eyes go?

I. Prelude

When my wife first started playing the wonderful action roguelike Hades, she got stuck in Asphodel. Most Hades levels involve dodging arrows, lobbed bombs, melee monsters, and spike traps all whilst hacking and slashing as quickly as possible, but Asphodel adds an extra twist: in this particular charming suburb of the Greek underworld, you need to handle all of the above whilst also trying not to step in lava. Most of the islands in Asphodel are narrower than your dash is far, so it’s hard not to dash straight off solid ground into piping-hot doom.

I gave my wife some pointers about upgrade choices (*cough* Athena dash *cough*) and enemy attack patterns, but most of my advice was marginally helpful at best. She probably died in lava another half-dozen times. One quick trick, however, had an instant and visible effect.

“Stare at yourself.”

Watch your step.

By watching my wife play, I came to realize that she was making one fundamental mistake: her eyes were in the wrong place. Instead of watching her own character Zagreus, she spent most of her time staring at the enemies and trying to react to their movements and attacks.

Hades is almost a bullet hell game: avoiding damage is the name of the game. Eighty percent of the time your eyes need to be trained on Zagreus’s toned protagonist butt to make sure he dodges precisely away from, out of, or straight through enemy attacks. In the meantime, most of Zagreus’s own attacks hit large areas, so tracking enemies with peripheral vision is enough to aim your attacks in the right general direction. Once my wife learned to fix her eyes on Zagreus, she made it through Asphodel in only a few attempts.

This is a post about the general skill of focusing your eyes, and your attention, on the right place. Instead of the standard questions “How do you make good decisions based on what you see?” and “How do you get better at executing those decisions?”, this post focuses on a question further upstream: “Where should your eyes be placed to receive the right information in the first place?”

In Part II, I describe five archetypal video games, distinguished in my memory by the different answers each taught me to the question “Where do your eyes go?” From them I derive five general lessons about attention-paying. Part II can be safely skipped by those allergic to video games.

In Part III, I apply these lessons to three specific minigames that folks struggle with in graduate school: research meetings, seminar talks, and paper-reading. In all three cases, there can be an overwhelming amount of information to attend to, and the name of the game is to focus your eyes properly to perceive the most valuable subset.

II. Lessons from Video Games

Me or You?

Hades and Dark Souls are similar games in many respects. Both live in the same general genre of action RPGs, both share the core gameplay loop “kill, die, learn, repeat,” and both are widely acknowledged to be among the best games of all time. Their visible differences are mostly aesthetic: for example, Hades’ storytelling is more lighthearted, Dark Souls’ more nonexistent.

But there is one striking difference between my experiences of these two games: in Hades I stared at myself, and in Dark Souls I stared at the enemy. Why?

One answer is obvious: in Dark Souls, the camera follows you around over your shoulder, so you’re forced to stare at the enemies, while in Hades the isometric camera is centered on your own character. This is good game design because the camera itself gently suggests the right place for your eyes to focus, but it doesn’t really explain why that place is right.

The more interesting answer is that your eyes go where you need the most precise information.

In both games, gameplay centers around reacting to information to avoid enemy attacks, but what precisely you need to react to is completely different. Briefly, you need spatial precision in Hades, and temporal precision in Dark Souls. 

In Hades, an enemy winds up and lobs a big sparkly bomb. The game marks where it’ll land three seconds later as a big red circle. You don’t need to know precisely when the bomb was lobbed and by whom – getting out of the red circle one second early is fine. But you do need to see precisely where it’ll land so you can dash out of the blast zone correctly. When there are dozens of bombs and projectiles flying across the screen, there might be only a tiny patch of safe ground for you to dash to, and being off by an inch in any direction spells disaster. So you center your vision on yourself and the ground around you, to get the highest level of spatial precision about incoming attacks.

In Dark Souls, a boss winds up and launches a three-hit combo: left swipe, right swipe, pause, lunge. As long as you know precisely when it’s coming, where you’re standing doesn’t matter all that much – the boss’s ultra greatsword hits such a huge area that you won’t be able to dash away in time regardless. Instead, the way to avoid damage is to press the roll button in the right three 0.2-second intervals and enjoy those sweet invincibility frames. The really fun part, though? The boss actually has five different attack patterns, and whether he’s doing this particular one depends on the direction his knees move in the wind-up animation. So you better be staring at the enemy in Dark Souls to react at precisely the right time.

Human eyes have a limited amount of high-precision central vision, so make it count. Don’t spend it where peripheral vision would do just as well.

Present or Future?

Rhythm games have been popular for a long time, so you’ve probably played one of the greats: Guitar Hero, Beat Saber, Osu!, piano. Let’s take Osu! as a prototypical example. The core gameplay is simple: circles appear on the screen to the beat of a song, and to earn the most points, you click them accurately and at the right rhythm. Harder beatmaps have smaller circles that are further apart and more numerous; a one-star beatmap might have you clicking every other second along gentle arcs, while a five-star map for the same song forces your cursor to fly back and forth across the screen eight times a second.

There’s one key indicator that I’m mastering a piece in a rhythm game: my eyes are looking farther ahead. When learning a piano piece for the first time, I start off just staring at and trying to hit the immediate next note. But as I get better at the piece, instead of looking at the very next note I have to play, I can look two or three notes ahead, or even prepare for an upcoming difficulty halfway down the page. My fingers lag way behind the part I’m thinking about.

Exercise: head on over to https://play.typeracer.com/ and play a few races, paying attention to how far ahead you can read compared to what you’re currently typing. I predict that with more practice, you’ll read further and further ahead of your fingers, and your typing will be smoother for it. It’s a quasi-dissociative experience to watch yourself queue up commands for your own body two seconds in advance.

Act on Information

As a weak StarCraft player (and I remain a weak StarCraft player), I went into every game with the same simple plan. Every game, I’d build up my economy to a certain size, and then switch over to producing military units. When my army hit the supply cap, I’d send it flooding towards the enemy base.

At some point, I heard that “Scouting is Good,” so at the beginning of each match I’d waste precious resources and mental energy sending workers to scout out what my enemy was doing. Unfortunately, acquiring information was as far as my understanding of scouting extended. Regardless of what I saw at the enemy base, I’d continue following my one cookie-cutter build order. At best, if I saw a particularly dangerous army coming my way, I’d react by executing that build order extra urgently. This amounted to the veins in my forehead popping a bit more while nothing tangible changed about my gameplay.

To place your eyes in the right place is to gather the right information, and the point of information-gathering is to improve decision-making. Conversely, the best way to improve at information-gathering is to act on information. If you don’t act on information, not only do you not benefit from gathering it, you do not learn to gather it better. If I went back to learn scouting in StarCraft again, I’d start by building a flowchart of what choices I’d change depending on the information I received.

Filter Aggressively

I was introduced to Dota 2 and spent about four hundred hours on it in the summer of 2020 (of course, this makes me an absolute beginner, so take this part with a healthy pinch of salt). Dota is overwhelming because of the colossal amount of information and systems presented to you – hundreds of heroes and abilities, hundreds of items and item interactions, and multitudes of counterintuitive but fascinating mechanics that must have gone through the “it’s not a bug, it’s a feature” pipeline.

To play Dota is to be constantly buffeted by information. I watch the health bars of the enemy minions to last hit them properly, or I won’t make money. I watch my own minions to deny them from my opponent. I track my gold to buy the items I need as soon as I can afford them. I pay attention to the minimap to make sure nobody from the enemy team is coming around to gank. I watch my team’s health and mana bars, and the enemy team’s, to look for a weak link or opportunity to heal. I can click on the enemy heroes to look at their inventories, figure out who is strong and who is weak, and react accordingly. And this might all be extraneous information: maybe the only important information on the screen is the timer at the top of the screen which says the game started 1 minute and 50 seconds ago.

[Image: The clock might be the most important information on this screen.]

To understand why the game timer might be the most decision-relevant information out of all of the above, you have to understand a particularly twisted game mechanic in Dota, the jungle monster respawn system. You see, the jungle monsters in Dota each spawn in a rectangular yellow box that you can make visible by holding ALT. The game is coded such that, every minute, a monster respawns if its spawn box is empty. You read that right – the monsters don’t have to die to respawn, they just have to leave the box. Exploiting this respawn mechanic to make many copies of the same monster is called “stacking,” and is a key job for support players: if you attack the jungle monsters about 7 seconds before the minute, they’ll chase you just far enough that a duplicate copy of the monster spawns. This means that near the beginning of the game, a good support player can “stack” two, three, or even four copies of a jungle monster for his teammates to kill later, even if nobody on the entire team is strong enough to fight a single one directly. Fifteen minutes later, leveled-up teammates can come back and kill the entire stack for massive amounts of gold and experience.

Stacking is further complicated by an endless litany of factors, but the most interesting is probably this: the enemy can easily disrupt your stacks. The game code only checks if the yellow spawn box is empty, not what’s inside the box. A discerning opponent can foil your whole stacking game-plan just by walking into the appropriate box at the 1:00 mark and standing there for a second. More deviously yet, he might buy an invisible item at the beginning of the game to drop in the box before you even reach the area.
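The two rules above (a monster spawns every minute if the box is empty; anything standing in the box blocks the spawn) fit in a few lines. This is a toy model, not Dota’s actual code: it assumes the camp starts with one monster, that every pull is perfectly timed, and that a blocker in the box always cancels the spawn. The names simulate_stacking, pulled_minutes, and blocked_minutes are made up for illustration.

```python
def simulate_stacking(total_minutes, pulled_minutes, blocked_minutes):
    """Return the number of monsters in the camp after each minute mark.

    pulled_minutes:  minute marks at which a support drags the monsters
                     out of the spawn box just in time (assumed to succeed).
    blocked_minutes: minute marks at which an enemy unit or item is
                     sitting inside the spawn box.
    """
    stack = 1  # assumption: the camp starts with a single monster
    history = []
    for minute in range(1, total_minutes + 1):
        monsters_in_box = minute not in pulled_minutes
        blocker_in_box = minute in blocked_minutes
        # The check only asks whether the box is empty, not what is inside it.
        if not monsters_in_box and not blocker_in_box:
            stack += 1
        history.append(stack)
    return history


if __name__ == "__main__":
    # Pull at minutes 2, 3, and 4 with nobody interfering.
    print(simulate_stacking(4, {2, 3, 4}, set()))
    # Same pulls, but an opponent stands in the box at minute 3.
    print(simulate_stacking(4, {2, 3, 4}, {3}))
```

In this model the only thing that matters is what is in the box at the instant of the check, which is why the clock ends up being the thing to watch.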

Anyhow, some support players have a job to do every minute, which is either stacking or a closely related thing called “pulling.” When they’re gone doing this job, this leaves the hero they’re supporting vulnerable to a two-on-one. This is where the game timer comes in: the enemy support leaving the lane might be my best opportunity to be aggressive and land an early kill. And so, out of all the fancy information on the screen, I need to be checking the game timer frequently. Some early game strategies hinge on correctly launching all-out attacks at 1:50, and not, say, 1:30.

Treat the world as if it’s out to razzle dazzle you, and your job is to get the sequins out of your eyes. Filter aggressively for the decision-relevant information, which may not be obvious at all. 

Looking Outside the Game

There is a certain class of video games that are difficult, if not impossible, to play without a wiki or spreadsheet open on a second monitor. This can be due to poor game design, but just as often it’s the way the game is meant to be played for good reason, and it’s the mark of an inflexible mind to refuse to look outside the game when this is necessary.

Consider Kerbal Space Program. You can learn the basics by playing through the tutorials, and can have plenty of fun just exploring the systems. But unless you’re a literal rocket scientist you’ll miss many of the deep secrets to be learned through this game. There’s no way you’ll come up with the optimal gravity turn trajectory yourself. If you don’t do a little Googling or a lot of trial-and-error, your attempts at aerobraking will probably devolve into unintentional lithobraking. You’ll have a nightmarish time building a spaceplane without knowing the correct relationship between center of mass, center of lift, and center of thrust, and it’s highly unlikely you’ll figure out that a rocket is a machine that imparts a constant amount of momentum instead of a constant amount of energy, or that you can exploit this fact by accelerating at periapsis. And god forbid you try to eyeball the perfect transfer window in the next in-game decade to make four sequential gravity assists like Voyager 2.
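A quick calculation makes the momentum-versus-energy point concrete. This is a sketch under simplifying assumptions that are mine, not the game’s: the burn is instantaneous and points along the direction of travel. The symbols $v$ (speed before the burn) and $\Delta v$ (the fixed speed change the engine delivers) are introduced only for illustration. Per unit mass, the kinetic energy gained is

$$
\Delta E = \tfrac{1}{2}(v + \Delta v)^2 - \tfrac{1}{2}v^2 = v\,\Delta v + \tfrac{1}{2}(\Delta v)^2 .
$$

The same $\Delta v$ buys more energy the faster you’re already going, and an orbiting ship is fastest at periapsis.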

[Image: Some games are meant to be played on multiple monitors.]

The right place to look can be outside the game entirely. Whether it’s looking up guides or wikis, plugging in numbers to a spreadsheet to calculate optimal item builds, or using online tools to find the best transfer windows between planets, these can be the right place to put your eyes instead of on the game window itself.

III. Applications to Research

In this last part of the post, I apply the principles above to three core minigames in academic mathematics: talks, papers, and meetings. For each of these minigames, we’ll try to figure out the best places for our eyes to go, informed by the following questions.

  1. Should I focus on myself or the other person? 
  2. How far into the future should I be looking?
  3. How can I act on the information I receive?
  4. Out of all the information being thrown at me, what is decision-relevant?
  5. Might the best way to get better at the game be outside the game itself?


When giving a talk, self-consciousness is akin to keeping your eyes on yourself. Telling yourself not to be self-conscious is about as useful as trying not to think about the polar bear; negative effort rarely helps. Move your eyes elsewhere: to the future and to other people. Rehearse your presentation and anticipate the most difficult parts to explain. Pay attention to your audience and actually look at them. See if you can figure out who is engaged and who is daydreaming. Find one or two audience members with lively facial expressions, study them, and act on that information – their furrowed brows will tell you whether you’re going too fast.

When listening to a talk, realize that there will typically be more information than any one audience member can digest. Sometimes this is the fault of the speaker, but just as often, this information overload is by design and functions similarly to price segmentation. Noga Alon recently joked to me, “Going to a talk is difficult for everyone because nobody understands the whole thing, but it’s especially difficult for undergraduates because they still expect to.” Information at a variety of levels of abstraction is presented in the same talk, so that audience members with widely varying backgrounds can all get something from it. An undergraduate student might understand only the opening slide, and a graduate student the first ten minutes, while that one inquisitive faculty member will be the only person who understands the cryptic throwaway remarks about connections to class field theory at the very end. Filter aggressively for the parts of the talk aimed at you in particular. 

Remember that the topic of the talk itself is rarely as interesting as the background material mentioned in passing in the first ten minutes. These classics – the core theorems and examples mentioned time and time again, the simplest proof techniques that appear over and over – are the real gems if you don’t know them already. Sometimes you can learn a whole new subfield by sitting in on a number of talks in the area and only listening to the first ten minutes of each. Bring something discreet to keep yourself occupied for the other fifty.

Remember also: information you never act on is useless information. A couple years back, I was taking a nap in a computer science lecture about a variation on an Important Old Theorem. As I was nodding off, I noticed to my surprise that my adviser, sitting nearby, was quite engaged with the talk. I was very curious what caught his attention, and he enlightened me on our walk back to the math department: instead of listening to the talk, he’d spent most of the hour looking for a better proof of the Important Old Theorem introduced in the first five minutes. From this, I learned that the most important information in a talk might be an unsolved problem, because it is certainly the easiest information to act on.

[Image: My PhD adviser after a seminar talk.]

This conversation with my adviser had a great effect on me, and every so often I practice this perspective by going to a talk for the sole purpose of hearing new problems. As soon as I hear an interesting problem, I zone out and try to solve it immediately. Anecdotally, it worked a couple times.


Most of this section was already covered in Of Math and Memory (part I, part II, part III), but I’ll reiterate here the relevant bits. Mathematical proofs are rarely meant to be written or read linearly. Instead, they are ideally arranged as a collection of outlines of increasing detail: a five-word title, a paragraph-long abstract, two pages of introduction, a four-page technical outline, and only then the complete 20-page proof. Each outline is higher-resolution than the last, giving readers the chance to pick the level of understanding that suits their needs.

This organization is meant to solve one basic difficulty: it’s very hard to follow a proof without knowing where it’s going. Without reading the proof outline, you can’t tell which of the ten lemmas are boilerplate and which are critical innovations. Without running through a calculation at a high level, there’s no way to know which of alpha, n, epsilon, x, and y are important to track, and which are throwaway error terms. Reading through a paper line-by-line without knowing where you’re going, it’s easy to get lost in the weeds and dragged down endless rabbit holes – black boxes from previous papers to unpack, open problems to mull over, lightly explained computations which might contain typos – and while these rabbit holes might be worth exploring, you would do well to map them all out before picking one to dive into. 

When reading a paper, orient your eyes towards the future whenever possible, like reading several words ahead in a game of TypeRacer. Scan the paper at a high level to understand the big picture, then read all the theorem and lemma statements to see how they fit together, and only then decide which weeds to get into. Only check a difficult computation once you already know what its payoff will be.


In research, the vast majority of your time is spent in one of two ways: bashing your head against a wall alone, or bashing your head against a wall with company. These two activities can be more or less traded freely for each other to suit your level of introversion, and I’ve found that I usually prefer meeting with others to work on research together over working alone.

One of the pitfalls of working with others, especially when you are young and underconfident, is that you can naturally slide into the role Richard Hamming calls a “sound absorber”:

For myself I find it desirable to talk to other people; but a session of brainstorming is seldom worthwhile. I do go in to strictly talk to somebody and say, “Look, I think there has to be something here. Here’s what I think I see …” and then begin talking back and forth. But you want to pick capable people. To use another analogy, you know the idea called the ‘critical mass.’ If you have enough stuff you have critical mass. There is also the idea I used to call ‘sound absorbers’. When you get too many sound absorbers, you give out an idea and they merely say, “Yes, yes, yes.” What you want to do is get that critical mass in action; “Yes, that reminds me of so and so,” or, “Have you thought about that or this?” When you talk to other people, you want to get rid of those sound absorbers who are nice people but merely say, “Oh yes,” and to find those who will stimulate you right back. 

~ Richard Hamming, You and Your Research

On top of underconfidence, I suspect that the chief mistake “sound absorbers” make is that they have the wrong idea about where their eyes should be in a research meeting. I think a “sound absorber” is completely fixated on personally solving the problem. Not having generated interesting ideas for solving the problem, they contribute nothing at all. Again, this mistake is akin to being too self-conscious, and keeping your eyes on yourself when there is no useful information to be had there.

Personally solving the problem is certainly a great outcome for a research meeting, but it’s by no means the only goal. First of all, there’s a world of difference between personally solving the problem and getting the problem solved. If your collaborators are any good, they are just as likely to come up with the next crucial idea as you are, so truly optimizing for getting the problem solved involves spending a substantial fraction of your time supporting the thought processes of others. Repeat their thoughts back to them, write down and check their calculations, draw a nice picture or analogy for what they’re doing on the blackboard, project your enthusiasm for their insights. You can do all this without generating a single original thought, and still help with getting the problem solved.

Getting the problem solved is a higher value than personally solving the problem, but higher still is the value of improving at problem-solving in general, and this holds doubly if you’re still performing your Gravity Turn. Especially when meeting with your PhD adviser or another senior mentor, focus a substantial minority of your attention on modelling the thought processes of your mentor. Figure out and note down what examples and lemmas they pull out of their toolbox time and time again, what calculations and simplifications they do instinctively, and how they react when stuck on a problem. Learn the particularities of how they perform literature searches, who they ask for help about what, and how they decide if and when to give up. None of these decisions are arbitrary; they form an embodied model of the terrain in your field. Watching the other person can often be a better use of your time than staring blankly at the problem.

In hurried conclusion, research is like a simpler version of Dota: we are bombarded by information on all fronts, most of which we don’t even notice, and tasked with making complicated, heavy-tailed decisions. A fundamental skill in any such game is orienting your eyes – literally and figuratively – at the most valuable and decision-relevant information. Reacting to and executing on this information comes later, but you can never act properly if you don’t even see what you need to do.

Gravity Turn

[The first in a sequence of retrospective essays on my five years in math graduate school.]

My favorite analogy for graduate school is the gravity turn: the maneuver a rocket performs to get from the launch pad to orbit. I like to imagine a first-year graduate student as a Falcon X rocket, newly-constructed and tasked with delivering a six-ton payload into low Earth orbit. 

Picture this: you begin graduate school, fresh as a rocket arriving at Cape Canaveral and bubbling with excitement for your maiden voyage. Your PhD adviser, on the other hand, is the Hubble Space Telescope. Let’s call her Dr. Hubble (not to be confused with the astronomer of the same name). Dr. Hubble is ostensibly the ideal guide for your first orbit insertion. After all, she is famously good at staying in orbit – she’s been up there since 1990. 

But problems quickly arise as you probe Dr. Hubble for advice on how to approach the launch. Namely:

  1. She left Earth more than thirty years ago, and space technology has since been completely revolutionized. 
  2. She states all advice at an extremely high level with bird’s-eye-view detachment, observing, as she is, from a vantage point hundreds of miles overhead. 
  3. Most fatally, the Hubble Space Telescope vessel does not include the lower-stage rockets that brought her into space. In fact, she doesn’t include large engines of any kind. Her thirty years of experience free-falling in orbit will do you very little good until you break out of the stratosphere.

The problem is even worse than this, however. It is not that Dr. Hubble, despite her best intentions, gives outdated advice. It is not even that Dr. Hubble cannot consciously articulate all the illegible skills she’s reflexively performing to stay in orbit. The problem is that even if you could perfectly imitate what Dr. Hubble is doing right now, you would likely still crash and burn.

What I didn’t understand going into graduate school is that academic mathematicians are often working in a state akin to the free-fall of orbit. The Hubble Space Telescope remains in orbit around Earth because it travels horizontally so quickly that, even as it’s continuously accelerating towards the Earth, it continually misses. The laws of physics have arranged it so that it is not possible – barring deliberate sabotage – for her to fall back into a sub-orbital trajectory.
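To put a number on “travels horizontally so quickly,” here is a back-of-the-envelope sketch using the standard circular-orbit formula v = sqrt(GM / r). The inputs are my assumptions: an altitude of roughly 540 km for Hubble (it drifts over time), Earth’s mean radius, and a perfectly circular orbit with no drag.

```python
import math

GM_EARTH = 3.986004418e14    # Earth's gravitational parameter, m^3/s^2
EARTH_RADIUS = 6.371e6       # mean radius of Earth, m
HUBBLE_ALTITUDE = 5.4e5      # assumed altitude of about 540 km, m


def circular_orbit_speed(altitude_m):
    """Speed needed to stay on a circular orbit at the given altitude."""
    return math.sqrt(GM_EARTH / (EARTH_RADIUS + altitude_m))


def orbital_period(altitude_m):
    """Time for one lap of that circular orbit, in seconds."""
    radius = EARTH_RADIUS + altitude_m
    return 2 * math.pi * radius / circular_orbit_speed(altitude_m)


if __name__ == "__main__":
    speed = circular_orbit_speed(HUBBLE_ALTITUDE)
    period = orbital_period(HUBBLE_ALTITUDE)
    print(f"speed:  {speed / 1000:.1f} km/s")
    print(f"period: {period / 60:.0f} minutes")
```

Under these assumptions the answer comes out to roughly 7.6 km/s sideways, or one trip around the planet about every hour and a half.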

Similarly, a successful research professor is embedded in an intricate system that, as surely as Newton’s laws, keeps her in a state of steadily producing new research. Many of her ground-breaking papers are not one-off productions – they produce sequels, variants, and interdisciplinary applications year after year. She has cultivated dozens of long-time collaborators of the highest level who freely share ideas and research directions, and has the reputation to find more at will. She attends conferences every other month that keep her updated on the leading edge of the field. Every year her research group grows, as if by clockwork, adding a couple graduate students and postdocs to whom she can delegate projects with only the gentlest supervision. As a result, the careers of many other people depend on Dr. Hubble to continue producing research at a steady rate. Every incentive is aligned for objects in motion to stay in motion, and it would take deliberate sabotage to bring Dr. Hubble out of her successful research trajectory.

This is not to say that academic researchers all start cruising in free-fall after they leave graduate school or make tenure. It is perfectly normal for a spaceship that reaches orbit to proceed on to its next adventure after some rest, continuing on to visit another planet or leave the solar system altogether. The best researchers I know are similarly courageous, taking on more responsibilities and pushing past their comfort zones time and time again. I’m merely remarking that once one reaches a certain horizontal velocity in space, it is actively hard to fall back down from the sky.

Contrast this to the sorry state of Dr. Hubble’s new graduate student stranded on the launchpad under the blistering Florida sun. He has no prior publications producing continuous dividends, no access to brilliant and dependable collaborators, no knowledge or intuition about what problems are within reach, no students to farm ideas out to, and no reputation to trade off for any of the above. Above all, nobody else really depends on him, so his motivation to succeed is mainly shallow self-interest. This is particularly hard on him, as there are many things he would do in a heartbeat for someone else that he can’t work up the energy to do for himself. The singular advantage he has over his adviser is youth – a finite amount of extra fuel that he must burn quickly and judiciously like a first-stage booster rocket in order to reach her altitude.

There is a paradox inherent to orbit insertion: rockets launch straight up, while orbit is all horizontal. For some diabolical reason, a spaceship must spend its initial phase accelerating in a direction completely perpendicular to its desired velocity. That reason is called the atmosphere: in order to avoid continuously paying the toll of air resistance, a rocket spends a period of time flying straight up. But any additional vertical motion past the upper atmosphere is wasted motion, so at some point (and sooner is better than later), the rocket starts turning smoothly towards the horizon and accelerating towards orbit. Thus is birthed the smooth quasi-hyperbolic curve known as the gravity turn, the ideal orbit insertion trajectory.

How is graduate school like a gravity turn? For one, it is an enormous error in a gravity turn to try to directly imitate the velocity vector of a ship in space while still at sea level. Regardless of its power, a rocket launched horizontally will quickly nose-dive into the Atlantic. Similarly, a student can rarely succeed in graduate school by solely imitating the activities of established researchers. The student must engage instead in certain activities, such as studying fundamental background material and actively networking, that are mostly orthogonal to a research professor’s day-to-day.

For another, it is an equally enormous error to dip your nose cone towards the horizon too late, and spend too much fuel accelerating vertically. Once you break the atmosphere, all excess vertical velocity is wasted motion. At some point during graduate school, the student must transition away from activities that only grant temporary altitude. Becoming knowledgeable gets you to a great place to start doing research at a higher level. But spending too much time studying without attempting original research renders you a mere encyclopedia. Taking classes, networking, applying for fellowships, and going to student summer schools all follow the same principle – there is an appropriate amount to do, past which they increasingly approach wasted motion as far as getting into orbit is concerned. (Of course, if you enjoy any given activity intrinsically, then by all means continue to do it as much as you want.)

An additional consideration is that, while the gravity turn is the most technically fuel-efficient method of orbit insertion, not everyone who arrived in orbit took this most efficient path. In every department there are superstar students who were outfitted with nuclear reactors in place of conventional rocketry, and these folks get to space by pointing their nose cones in any old direction and blasting off. If you’re such a person, just blast off; calculating the optimum gravity turn curve might be the real wasted motion. Also, many of your professors will likely have fallen into this rarefied category in their own graduate school experience, so their advice on efficient gravity turns will be entirely theoretical in nature.

It is worth remarking, though, that even a nuclear rocket might learn something useful from practicing the gravity turn maneuver. Just because you have an easy time leaving Earth’s atmosphere and have no need of finesse doesn’t mean your travels won’t land you on Venus someday. And breaching that monstrous atmosphere will take every ounce of efficiency you can muster.

A natural question remains: if many graduate school activities only count for temporary vertical altitude, what constitutes horizontal motion that is useful for permanently entering orbit? Examples include:

  1. Producing good research, as every nice paper you write continues to pay dividends year after year. 
  2. Becoming an attractive collaborator, partly by acquiring enough reputation that people are willing to work with you, and partly by being productive and pleasant enough that they stick around. 
  3. Learning to support the research of others, as much of your potential impact lies not in personal contribution, but in the network effects accumulated from being a positive community member.

This last skill begins at the very start of graduate school, where the biggest immediate impact you can likely have is facilitating your adviser’s and other collaborators’ research.

I will close by reminding the reader that the gravity turn maneuver is not a truth delivered from on high that holds for all time across all circumstances, but an engineered solution to an inelegant and ever-varying practical problem. Launching from a moon base, for example, does not require a gravity turn at all because the moon has no atmosphere to fight against. There, you could comfortably reach orbit by blasting off almost horizontally from the lip of a crater. Only you know exactly where you’re launching from and the thrust-to-weight ratio of your vessel. Adjust your gravity turn accordingly.

I hope it is a comforting thought that free-fall is possible: that one day through all the striving of graduate school you may reach a position where the system propels you forward in your research and all you have to do is sit back and relax. I hope that on that day you continue to strive anyway.