"Everything can be made radically elementary." ~Steven Rudich

Optimizing Looks Weird

[Calling it for this November, happy to have gotten a few short posts out.]

For a couple of years in my childhood, my mom picked up an obsession that can only be described as extreme couponing. At the time, CVS and Rite Aid offered an enormous variety of discounts and rebates with strange, time-limited conditions and a glaring loophole: most of them stacked upon each other. If you rolled into the store on the date of the correct sale with a hundred dollars’ worth of coupons and rewards dollars, you could buy out their inventory of certain products without spending a cent, and end up with more ExtraBucks than you’d started with. And so it came to pass that at regular intervals, I’d be called out to the parking lot to help my mom haul in a twenty-year supply of Oral-B toothbrushes or a trunkful of sour cream and onion potato chips. At dinnertime, we’d inevitably be regaled with the story of yet another indignant cashier who called a manager after my mom pulled out her folder of coupons, only to be forced by said manager to apologize to the customer.

Genuinely optimizing looks weird and transgressive.

In StarCraft, there is a trick that every beginner Zerg player learns called the “extractor trick.” The game imposes a cap on the number of soldiers you can build, which is a very severe constraint. The extractor trick lets you surpass this cap by (roughly speaking) manually killing your own soldiers, building new ones, and then resurrecting the dead.

Feynman was famous for – among other things – his method of learning by teaching. I think what he noticed is that much of our collective brainpower is locked inside social cognition, and his method is a way of co-opting this inaccessible processing power to learn math and physics. Nowadays, Feynman’s method is common practice: in graduate reading seminars everyone signs up to give a lecture on a topic they know nothing about; in many math classes, the professor learns as they go and stays one week ahead of the class.

Noticing the loopholes in the rules requires curiosity and confidence in one’s own faculties, but that’s only half the battle. Many people notice loopholes and exploit them a tiny bit, like walking out of the convenience store with a free bag of chips. The rest of the battle is how you turn this exploit into a method, a career, or a business: it requires the courage to go all in and exploit these loopholes as far as they’ll go.

Aggro is the Foundation

Today I want to review a concept present in many domains, but most clearly articulated by TCG players. In a game like Magic: The Gathering or Hearthstone, gameplay divides neatly into two phases: deckbuilding and execution. Deckbuilding involves all the choices and calculations that go into preparing your custom deck of cards before you even sit down (or log in) to draw your first hand. Execution is playing your deck, and your particular draws, as well as possible in the moment.

Execution is hard, but essentially learnable. Deckbuilding is the truly difficult and creative part: it requires not just extensive game knowledge and creativity, but a deep understanding of the metagame – what decks other players are likely to bring and how to counter them. One core philosophy that I learned from great deckbuilders is the understanding that aggro decks – fast, simple decks that try to kill the opponent as quickly as possible – are the foundation upon which the entire metagame is built. Of all the kinds of decks, an aggro deck is the least difficult to build; usually your choices are limited to cheap, efficient early-game cards that end the game as quickly as possible, and there isn’t a huge amount of room to optimize for the metagame.

In contrast, other decks (typically classified as “midrange” or “control” decks) need to be built contextually with aggro in mind – the dominant aggro deck in the metagame sets a pace you have to match. If the fastest aggro deck can end the game in three turns, then you need to have cards you can afford to play in the first three turns. If they play a lot of minions on the board, you need a lot of removal to kill those minions. If they play many damage spells, you need counterspells or healing. In a sense, aggro is the foundation upon which the other layers of the metagame are built in layer upon layer of abstraction.

Many other games have the feature that there are a few pure aggro strategies – which are not necessarily even good strategies – upon which any deep understanding of the game must be built. In StarCraft and other RTS games, the rush strategies that are possible dictate the pace of the game: if the earliest enemy rush can come at 3 minutes and 30 seconds into the game, then your strategy must make sure to start building defenses at 3 minutes (or at least leave the possibility open conditional on scouting information). In real-life geopolitics, war, especially nuclear war, is the foundation: even if no military conflict actually happens, military strength must factor into negotiations at every higher level of consideration. In contrast, the execution of nuclear war is straightforward and unfettered by metagame considerations.

I write all this to suggest that there is also a basic aggro strategy in mathematics research upon which all other metagame considerations must be built, and that strategy is the simplest one of working on a hard technical problem by yourself. You will be told to spend lots of time going to learning seminars and talks, you’ll be told to network and hobnob, and you’ll be told (by me, in the very last post) to attend to meta-considerations and learn how to play a supporting role in research. All of this, however, must rest upon a solid foundation of knowing how to play aggro, how to carry out productive research. Without this foundation, you’ll have no idea what to focus on in a seminar, not a clue what to learn from and ask of other researchers, and only a fuzzy model of what support you can provide to another mathematician trying to carry out their research.

Just as not every Magic player plays aggro decks, not every mathematician needs to do solo research to be effective. But every Magic player needs to know what aggro decks are in play, and every mathematician needs to know how to do solo research. That’s the foundation upon which the entire metagame is built.

Research in Tandem (Part 3)

Today I want to end this discussion about research collaboration with my most useful tip for grad students: build an explicit model of how collaborators work, especially your PhD advisor.

One of your primary goals in graduate school is to set aside 20% of your brain for simulating your advisor, who is typically the best mathematician you are in close contact with. Learn and imitate their reflexes, their tastes, their decision trees. Spend substantial chunks of time during research meetings being curious about minds and modelling how other mathematicians operate. How did they come up with this? What do they know that I don’t? Why did they try this approach first?

Even if this is the only thing you manage to do in grad school, you end up as a low-resolution clone of your advisor – which is not ideal but nevertheless a better-than-average outcome.

Here are seven points of inquiry to jumpstart your quest to model another mathematician.

Research Direction

  1. What problems do they work on?
  2. How do they choose these problems?
  3. How do they weigh the importance of a problem against their aesthetic interest in it and the likelihood of actually solving it?

Collaborators

  1. Who do they work with most frequently?
  2. What qualities do they praise about their closest collaborators? How is labor usually divided in their collaborations?
  3. By what criteria do they evaluate other mathematicians?

Your relationship

  1. What exactly do they want from you?
  2. Conversely, what exactly do you have to offer them?
  3. Most mathematicians are somewhat motivated by genuine care for young people, but there are pragmatic considerations beyond that. Can you help realize their mathematical vision? Do you carry out humble work that makes their life easier? Are you stimulating and enjoyable to be around?

Patterns of thought

  1. What patterns do you notice in their thinking over time?
  2. What are their common first refrains when working on a problem?
  3. Which pictures, techniques and lemmas do they rely on time and time again to orient themselves?

Weaknesses

  1. What are their glaring weaknesses?
  2. From where you’re standing, are these weaknesses gaps that you can fill, or dump stats that you should deprioritize as well?
  3. Do they ever advise you “do as I say, not as I do”? How seriously should you take such advice?

Origins

  1. How did they get started in math?
  2. Getting into orbit requires different strategies from staying in space; what did they do at the start of their own career?
  3. What mathematicians did they themselves admire and learn the most from?

Work-life balance

  1. What is their working life like?
  2. How much time do they spend on teaching, traveling, and administrative nonsense?
  3. Would you actually want to work a day in their shoes? If not, what would you adjust to make it ideal for you?

Research in Tandem (Part 2)

Today we continue our discussion about research meetings with a few concrete strategies. When I was a new graduate student, I often had long meetings with my PhD advisor and other professors that went completely over my head. Such meetings are extremely demanding: they require a broad base of shared knowledge, they involve carrying out complex calculations and spatial manipulations entirely through verbal communication, and they proceed at a meandering conversational pace that often jumps back and forth between many different approaches and perspectives.

The meta-heuristic underlying all of the following tips is: learn to play a supporting role. Every mathematician wants to be the genius who single-handedly carries the team to the finish line with insight after insight. In contrast, nobody tries to play support. Therein lies an enormous well of untapped potential for you to contribute directly to mathematical inquiry without having the faintest clue what’s going on.

Take notes

Never be afraid to interrupt the flow of conversation to walk up to the blackboard (or pull out paper or a laptop) to draw pictures and note down what’s being said. Just copying down what others are saying might not seem like much of a contribution, but you’ll soon learn the many benefits this practice has.

You help catch mistakes and ambiguities that were skated over in conversation. Taking notes improves your long-term memory and learning. Having a visible log of the history of the meeting frees up precious working memory for you and your collaborators to forge rapidly ahead. Writing things down forces you to develop evocative notation, useful pictures, and modular lemma statements that compress amorphous heuristics into concrete, versatile building blocks. As collaborations extend over weeks, months, and years, everyone will thank you for keeping notes, however half-assed they may be.

If you have nothing to contribute, the first thing you can do is take notes.

Toss Bricks

The Thirty-Six Stratagems are a compilation of aphorisms for war and politics deeply ingrained in Chinese culture, of comparable influence to The Art of War. One of my favorites is 拋磚引玉, which roughly translates as “toss out a brick to lure out the jade.”

If you’re stuck – and you often will be – instead of silently waiting for others to present good ideas, present your own bad ideas. Throw this brick out as a way of baiting insights – which are the jade in this analogy – out from your peers. It’s common knowledge that the fastest way to get a question answered on the internet is to post a wrong answer. The same heuristic applies to research: if a conversation stalls, throw out a brick. Your collaborators will rush to point out all the reasons your approach is wrong and naïve, and how to improve it. Before you know it, beautiful pieces of jade will have appeared in its place.

There is an art to tossing the right bricks. I don’t suggest yelling out “Let’s try category theory” at every turn when there’s no connection whatsoever to the current problem. Best practice for throwing bricks is akin to semi-bluffing in poker: a brick is a hand that is currently useless, but there’s still a chance it might work out on the river. You probably have bad ideas and fuzzy intuitions you’re embarrassed to share that seem very slightly relevant to the problem. Just lower your filters and babble them out.

I can’t count the number of times I’ve opened my mouth to spew out a nonsense thought that didn’t even make syntactical – let alone logical – sense, only for one of my brilliant collaborators to charitably error-correct said sentence into a useful insight. “Ah yes, of course, that’s exactly what I meant,” is usually how I continue this conversation, “But just to be pedantic, could you explain that in more detail?”

If you’re not courageous enough to present a brick as if it’s a genuine insight, preface it with a disclaimer: “So here’s an idea that definitely doesn’t work, but I’d like to figure out why.”

To be continued…

Research in Tandem (Part 1)

I once heard the following story about Szemerédi: His daughter was in elementary school at the time, and her teacher asked everyone to share a bit about their parents’ occupations. The kids went around the room and each told a little story about what their parents did for a living: “My mommy is a dentist, she pulls teeth.” That sort of thing. When it came time at last for Szemerédi’s daughter to share, what she said shocked and worried the teacher: “My daddy just lies in bed and stares at the ceiling all day.”

There’s a stereotype that mathematicians spend all their productive time holed away from the world like this, staring at blank surfaces while intricate equations play out in their minds’ eyes. To the contrary, I find that a substantial fraction of my mathematical progress these days occurs during research meetings, which ideally go like this.

Two to four people sit in a room or Zoom call together, for a meeting slated to last an hour or two. We set our sights on a problem of common interest, and then start bouncing half-formed ideas off each other.

At times, we hit upon a tool or keyword that seems useful, and there’s a flurry of activity as everyone digs through their memories and Google Scholar for relevant literature. If this goes well, we find a relevant paper – invariably a paper about expander graphs – and the meeting devolves into a puzzle hunt where we collectively attempt to decipher the beautiful mathematics painstakingly hidden away in said paper. Eventually, we discover that fifty years ago a physicist solved a special case of our problem, formulated in an entirely different language, and the paper trail ends there.

Other times, one of us lets out a sigh and admits defeat, “This equation is way too difficult to solve, can we at least solve the toy problem where all of the functions are just constants?” This humble simplification draws a gasp of disbelief from the others, “That should be trivial, just apply lemma so-and-so and decomposition thus-and-thus.” We proceed to bully the most junior member of the team – typically a graduate student – into calculating decomposition thus-and-thus live on the blackboard. The idiosyncrasies of the problem turn out to be more intricate than we’d expected at first blush, and the nested summations soon get out of hand. It takes a good half hour before we give up on executing the calculation rigorously, all the while nodding to each other, more convinced than ever, “Yes, the decomposition definitely should work. Although it looks a bit messy around here, it really has to come down to iterating the Cauchy-Schwarz inequality.”

Finally, as we all agree to go to lunch and promise to check the calculation independently – trivial though it must certainly be – the guy in the corner who’s been silent all meeting pipes up, “I think I solved the original problem!” He goes up to the board, erases all the nested sums, and proves the theorem in two lines. Seeing the awe in our faces, he sheepishly tries to comfort us, “Well, I only got the key idea from watching the calculations you all were doing. When you wrote the letter ‘s’ in that curvy way, that’s what gave me the idea of using the Gauss integral.”

Next time, I’ll write about strategies for keeping research meetings productive…

The Fundamental Growth Curve (Part 3)

Last time we introduced the basic model of growth, which looks like this:

The thesis is that noticeable growth is typically punctuated by a long intermediate period of low return on investment, which we call “the Wall.” In the remaining posts on this topic, I plan to cover (a) the common failure modes that arise due to the existence of the Wall, and (b) prescriptions for how to minimize or completely skip over the dreaded Wall.

Failure Modes

Skill and complexity creep

In the year that League of Legends launched, you could become a top player in three months of unstructured training by focusing on one hero and drilling mechanics. The level of play is always this low at the beginning of things. Nowadays, it takes years of dedicated practice and encyclopedic game knowledge to reach the same relative status in the game.

Creep is an ever-present threat to the health of every community of skill: consider the research field where the low-hanging fruit has been picked clean, the video game where the barrier to entry is a hundred-hero by hundred-hero table of matchup knowledge, or the industry where the pool of interview questions grows ever more esoteric and adversarial. Left unchecked, the Wall grows higher and higher, until new blood stops bridging the gap altogether and the entire community dies out.

Picking the wrong-sized pond

When I was a child, my parents gave me the typical Asian advice that I should befriend older, smarter kids from whom I had much to learn. Thankfully, I mostly ignored this advice. I’ve seen friends try to follow this strategy, usually to their detriment.

The farther down you start in the status hierarchy, the further away those sweet positional gains become. And imagine, god forbid, that you really imbibe this backwards advice and continuously try to jump up hierarchies to where you don’t belong. You’ll spend your whole life being the odd one out, the real impostor, the weakest link, the lowest-status grunt who is passed up for every opportunity and promotion.

Conversely, it is also possible to be too big of a fish in too small a pond. They say that if you’re always the smartest person in the room, you’re in the wrong room. There’s a different kind of stagnation that happens when you reach the peak of your local hierarchy and don’t search for greener pastures.

Bringing down the Wall

Artificial divisions

It is common knowledge that you have to start as a child to become a chess grandmaster. Neuroplasticity and ability to learn probably play a role, but another important factor is that there exist long and delicate pipelines of positional gains for bootstrapping children through the Wall. For example, in tournaments and classes, young children are carefully subdivided by locality and into two-year age brackets; this artificial partitioning of the population allows for that many more first place trophies to win and local mini-ladders for kids to climb. A seven-year-old prodigy can start winning games in the county at the under-8 level with only a bit of talent and study, then the state level, then the next age bracket, and so on. The positional gains are thus paid to her in installments that keep her coming back. Adults who want to learn chess have no such luck.
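For the programmers in the audience, here is a toy sketch of the bracket effect. Everything in it is made up for illustration: the player names, ages, and ratings are invented, and `bracket_winners` is a hypothetical helper, not how any real chess federation runs its events. All it does is count how many first places exist under each way of partitioning the same six kids.

```python
def bracket_winners(players, bracket_of):
    """Return {bracket: (name, rating)} for the top-rated player per bracket."""
    best = {}
    for name, age, rating in players:
        key = bracket_of(age)
        if key not in best or rating > best[key][1]:
            best[key] = (name, rating)
    return best


# Invented (name, age, rating) triples, purely for illustration.
players = [
    ("Ana", 7, 900),
    ("Ben", 8, 1100),
    ("Cal", 9, 1000),
    ("Dee", 10, 1300),
    ("Eli", 11, 1250),
    ("Fay", 12, 1500),
]

# One big ladder: everybody competes for a single first place.
open_ladder = bracket_winners(players, lambda age: "open")

# Two-year brackets (ages 6-7, 8-9, 10-11, 12-13): one first place each.
two_year = bracket_winners(players, lambda age: age // 2)

print(len(open_ladder), "winner on the open ladder")
print(len(two_year), "winners with two-year brackets")
```

On the open ladder the seven-year-old is dead last; with brackets she has a trophy of her own to win, which is the installment plan described above.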

Systems for training difficult skills can be optimized by placing people in granular divisions with comparable peers. Conversely, as an individual, one should judiciously hop between ponds to find places where fruitful positional gains are within reach. As a rule of thumb, the sweet spot seems to be rooms where you’re around the 75th percentile.

Specialization

One way difficult disciplines can prosper is by subdividing in a different way, specializing into mutualistic subdisciplines. A software engineering team might be a hostile, zero-sum competitive environment if everyone is trying to be the best at everything. But suppose the team members each leverage their unique strengths, and you end up with one expert in frontend, one in backend, one who knows how to speak the voodoo language of customers and product managers, and that one machine learning guy. Suddenly everyone has the benefits of high status in their respective domain and access to mentors who can help patch up their weaknesses.

One problem math academia faces today, and part of why the wall called graduate school is so difficult to get over, is the insufficient specialization of labor. Sure, we specialize in subject matter, but whether you’re an algebraic topologist or a knot theorist, you still have to excel at research proper, paper-writing, mentorship, public speaking, etc. etc. Effectively, all these sub-dimensions of competence are projected onto a single massive meta-ladder that is impossibly tall (man am I mixing metaphors today) for the novice to climb.

The Fundamental Growth Curve (Part 2)

[Followup to the previous post. I will likely collect and reorganize these shortforms once the month is over.]

The narrative from the previous post outlines my basic model for how growth works. For a typical skill, be it running cross country or solving Rubik’s cubes or engaging in mathematical research, return on investment follows a curve like this:

In the first stage, growth is fast and cheap. The core fundamentals are the easiest to pick up and best documented. Adepts willing and able to mentor novices are plentiful. As the weakest one of the group, you’re no threat to anybody’s status (excepting, perhaps, the second weakest one, but they’re in no position to do anything to you).

This is the regime where the 80/20 principle holds: 80% of the absolute value in the activity is picked up with only 20% of the effort, applied with discernment. Most members in any community are novices somewhere in this first stage, enjoying the immediate gratification of visible gains while lacking the stomach for serious investment.

The gains here are absolute in nature. You might not garner any attention for picking up jogging, but your sleep apnea clears up. You might not win any money playing a game, but you become competent enough to enjoy it and to appreciate professional play. You might not prove any new theorems taking undergraduate math classes, but you finally understand how to calculate expected value and stop getting duped by slot machines.

In the second stage, there is a long plateau, a “wall” that most people hit when the newbie gains fall off and serious commitment or exceptional talent seems necessary to progress.

Past the fundamentals, there exist conflicting and ambiguous schools of thought on how to progress to experthood. The athlete is told to follow three different diets by three different nutritionists. Long debates are held about which chapters of Hartshorne are mandatory to truly learn algebraic geometry. The single-digit-kyu Go player is told to focus on opening, on fighting, on life-and-death problems, on endgame, and each is apparently vitally important and the one true path to greatness.

Months of physical time are spent to see small improvements that have no practical impact on your life: whether you run a 9 minute mile or a 6 minute mile, the only thing you’ll win is a participation ribbon. You blog for years only to see your readership jump from dozens to hundreds. You practice your craft doggedly and jump from the worst surgeon in the hospital to the second-best, but when it comes time for their heart transplants, the billionaires still pass you by to wait in line for that star surgeon.

In the third stage, absolute improvements get even slower, but you finally hit a level of mastery where positional, or relative, gains kick in. And my, do they kick in fast.

You become one of the top players in your cohort, and people start to notice you. Coaches give you special treatment, you win minor awards, get sent to training camps, participate in more rarefied cohorts.

You have a renewed and enormous motivation to improve: every tiny absolute improvement could move you up one giant discrete ranking. Jumping from the 11th best-selling author to the 10th doubles your book sales. Shaving a couple seconds off your personal best means a new state record and a full ride to college. Writing one additional paper in graduate school edges out the next candidate to land you the fancy fellowship that keeps your academia dreams alive.
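Here is a toy illustration of how a small absolute gain turns into a large positional one. The mile times are invented, and `rank_among` is a hypothetical helper written just for this sketch; the only point is that a time less than 2% faster can move a runner from last place to first when the field is tightly packed.

```python
def rank_among(field, my_time):
    """Return the 1-based rank of my_time; lower times rank higher."""
    return 1 + sum(1 for t in field if t < my_time)


# Hypothetical mile times, in seconds, for four rivals in the same race.
field = [299.0, 300.5, 301.0, 302.5]

old_time, new_time = 303.0, 298.5
relative_gain = (old_time - new_time) / old_time  # absolute improvement, as a fraction

rank_before = rank_among(field, old_time)  # last of five
rank_after = rank_among(field, new_time)   # first of five

print(f"{relative_gain:.1%} faster: rank {rank_before} -> rank {rank_after}")
```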

And look, the divide between the second and third stages is not just a shallow artifact of human status regulation. It’s an essential feature of the collaborative optimization problems we face all the time. As long as people work in teams and specialize according to their relative advantage, an individual hardly contributes in a given dimension unless they are the best or nearly so.

Look at it this way: if eight college friends are taking a road trip through gravelly mountain roads, the best driver is going to drive the van most of the time, and the second best driver might help pick up some slack. How well these two drive is dreadfully important, but as for the other six – it makes not a whit of difference if they even have licenses. Everyone in that van is incentivized to pour their resources into helping that best driver become even more skilled.
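The road trip can be written down as a small model, under some loudly stated assumptions: the skill numbers are invented, and the 75/25 split of driving time between the best and second-best driver is a hypothetical weighting chosen only to make the arithmetic clean. With those caveats, the sketch shows why improving anyone outside the top two is worth nothing to the van.

```python
def team_driving(skills, weights=(0.75, 0.25)):
    """Team driving quality: a weighted mix of the top two drivers only.

    The weights are an assumption standing in for "the best driver drives
    most of the time, and the second best picks up some slack".
    """
    top_two = sorted(skills.values(), reverse=True)[:2]
    return sum(w * s for w, s in zip(weights, top_two))


def marginal_value(skills, member, gain):
    """How much the team improves if one member's skill rises by `gain`."""
    improved = dict(skills)
    improved[member] += gain
    return team_driving(improved) - team_driving(skills)


# Invented skill levels for three of the eight friends.
skills = {"best": 9.0, "second": 7.0, "third": 4.0}

value_best = marginal_value(skills, "best", 1.0)      # 0.75
value_second = marginal_value(skills, "second", 1.0)  # 0.25
value_third = marginal_value(skills, "third", 1.0)    # 0.0

print(value_best, value_second, value_third)
```

The same unit of improvement is worth three times as much in the best driver as in the second best, and nothing at all in the third, which is the incentive the paragraph above describes.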

To be continued…

The Fundamental Growth Curve (Part 1)

In honor of NaNoWriMo, I’m joining a friend to write low-effort shortform blog posts every day this month.

The first topic I’ll write about is what growth feels like, and where people hit walls, plateau, and stagnate. Learn to optimize around the plateau, and you’ll be unstoppable.

A Parable

You have no hand-eye coordination to speak of, but your parents want you to touch grass, so you sign up for cross country. In the beginning, you’re the slowest of the pack, but you improve rapidly. You train nearly every day and your mile time drops from nine minutes to six-twenty in three months. The numbers go down. Walking up the stairs doesn’t wind you anymore. Finally the seniors are waiting for someone else to catch up. Across the whole team, you may only rise from dead last to the 20th percentile, but to you it feels like winning the Olympics.

Then you hit a brick wall.

You need to shave off a minute twenty to make varsity, and every last second gives you a fight for its life. You slog through nearly two years of training. Seven hundred and thirty days drowning in an endless barrage of shin splints and intrusive thoughts. The fact that you rise from the 20th percentile to the 80th means fuck-all; every freshman who speeds past you to snag a trophy sends you spiraling back into that pit of self-doubt.

And then, by virtue of some minor miracle, you finally push through that wall. You paid your dues. The best runner graduates. Puberty finally kicks in. You have a good night’s rest before tryouts. Whatever the reason, you make varsity.

Coach knows your name – finally! – and starts giving you individual attention. Every second you gain is a triumph – you can see yourself climbing those rankings every single meet. Out of the blue, the gossipy neighbor asks your parents for advice on preparing her middle-school daughter for high-school sports, and mom finally starts taking this running thing seriously. They rearrange their schedules around your training, and cut you some slack on the academic side. Everything is coming together.

You spent three years pumping your exhausted legs through final laps listening to crowds cheering for someone else. Finally, one day, you cross the finish line and you know you won the race. You know because the crowd has just started cheering, and the one they’re cheering for is you.

To be continued…