### Of Math and Memory, Part 3 (Final)

#### by radimentary

In Part 1, we noted how extraordinarily taxing mathematics can be on short-term working memory. Being able to hold one extra Greek letter in your head can make the difference between following a lecture and getting completely lost. Having background and mathematical maturity means many standard techniques need not be forcibly remembered, freeing up space for the few genuinely novel ideas.

In Part 2, we gave a simple conceptual model for short-term memory, based on a fundamental principle of information theory: *compression is equivalent to prediction*. The more predictable the data is (or the better you get at predicting it), the less new information you have to store.
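To make the principle concrete, here is a minimal Python sketch. It assumes the standard idealized cost of storing a symbol that was predicted with probability p, namely -log2(p) bits. The function names, the two toy predictors, and the sample string are all made up for illustration.

```python
import math

def bits_needed(symbols, predict):
    """Ideal total storage cost, in bits, of `symbols` under a predictor.

    `predict(history)` returns a dict mapping each possible next symbol
    to the probability the predictor assigns to it.
    """
    total = 0.0
    for i, symbol in enumerate(symbols):
        p = predict(symbols[:i])[symbol]
        total += -math.log2(p)  # well-predicted symbols cost few bits
    return total

def no_idea(history):
    # Hypothetical predictor that treats every symbol as a coin flip.
    return {"a": 0.5, "b": 0.5}

def expects_repeats(history):
    # Hypothetical predictor that bets the previous symbol will repeat.
    if not history:
        return {"a": 0.5, "b": 0.5}
    last = history[-1]
    other = "b" if last == "a" else "a"
    return {last: 0.875, other: 0.125}

data = "aaaaaaaabbbbbbbb"
blind_cost = bits_needed(data, no_idea)             # 16.0 bits, one per symbol
informed_cost = bits_needed(data, expects_repeats)  # strictly fewer bits
```

The better predictor pays extra for the one surprise in the middle of the string, but it still comes out well ahead overall.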

What do these ideas mean concretely for mathematicians? In this concluding post, we give a practical algorithm for making the most of your short-term memory in mathematics, which I call **dyadic scanning**.

We will start with the concrete problem of how to read a paper, and later generalize to how to write papers, how to listen and give talks, and how to have mathematical conversations, all while making the most of our short-term memory.

# Dyadic Scanning

Consider the markings on a standard ‘Murican ruler:

The two longest vertical lines mark the beginning and end of an inch. It is then divided dyadically into half-inches, quarter-inches, eighth-inches, and sixteenth-inches by progressively smaller “teeth.”
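The pattern of teeth can be reproduced in a few lines of Python. This is only a sketch: `tick_level` and `draw_ruler` are hypothetical names, and the inch is assumed to be divided into 2**depth equal parts, with depth = 4 for sixteenths.

```python
def tick_level(k: int, depth: int = 4) -> int:
    """Dyadic level of tick k on an inch split into 2**depth parts.

    Level 0 is an inch boundary, level 1 the half-inch mark,
    level 2 a quarter-inch mark, and so on down to level `depth`.
    """
    if k % (2 ** depth) == 0:
        return 0
    level = depth
    while k % 2 == 0:  # each factor of 2 moves the tick up one level
        k //= 2
        level -= 1
    return level

def draw_ruler(depth: int = 4) -> list[str]:
    """One row of '#' per tick; coarser levels get longer teeth."""
    return [
        "#" * (depth + 1 - tick_level(k, depth))
        for k in range(2 ** depth + 1)
    ]

print("\n".join(draw_ruler()))
```

Printing the rows draws the ruler sideways: the two longest teeth are the inch boundaries, the next longest is the half-inch mark, and so on.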

A mathematical paper is organized much like the markings on a ruler: first it is divided into a few main theorems, each of which is divided into several major lemmas, which are then interspersed with minor or technical lemmas and definitions, themselves pieced together from many tedious details. The obvious and standard way of reading a paper is a sequential scan:

Up until a couple years ago, this was how I attempted to read any given paper: read it through once from beginning to end, pausing on each detail and tracing it down to the lowest level until I could follow it line-by-line. The sequential scan is a fairly useful way to build foundations and mathematical maturity: one spends a lot of time piecing together details and developing a taste for rigor. It is, however, a misguided and inefficient approach to reading mathematics in general.

“I could follow line by line, but I have no idea what’s going on” is a common complaint that comes out of reading mathematics like this.

What are the downsides of the sequential scan?

You get easily lost in details, missing the forest for the trees.

You sacrifice agency, accepting the order in which ideas are presented as if from on high.

Most importantly, **you don’t know where you’re going.**

You can’t ask “Where was condition (a) of Lemma 2 used in the proof of Lemma 3? Can we weaken it?” if you don’t even know that Lemma 2 only exists to prove Lemma 3 later on. This is the kind of question that senior mathematicians are asking all the time to shore up their understanding.

Unless the paper is very cleverly and thoughtfully written, reading it sequentially is going in blind. You will have the hardest possible time predicting each next step, and therefore have to bear the heaviest possible burden remembering every detail. Without knowing what will be used where or how, you will have to default to remembering *everything*.

Here is a more enlightened way to read a paper, which I call **dyadic scanning**.

Instead of reading the paper in a single pass, you split your reading into logarithmically many passes of progressively higher resolution. In the first pass, you figure out the overarching organization and the main results. In the second, you locate the main lemmas, how they fit together, and where the genuine innovations are in the paper. In the third, you piece together how all the minor technical lemmas are involved in the proof, and the locale where each one is relevant. Only in the fourth pass, or later, do you dig into the details of the rigorous proofs. More passes are added if the paper is especially dense, or if you’re especially unfamiliar with the field. The longer and more technical the paper is, the longer you should wait before diving into details.
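As a rough illustration of the difference between the two reading orders, here is a minimal Python sketch. It assumes, as a simplification, that a paper can be modeled as a tree in which each result points to the pieces that support it. The `Part` class, both function names, and the toy paper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    """A result in the paper together with the pieces that support it."""
    title: str
    children: list["Part"] = field(default_factory=list)

def sequential_scan(part: Part) -> list[str]:
    """Read depth-first: chase every detail to the bottom before moving on."""
    order = [part.title]
    for child in part.children:
        order.extend(sequential_scan(child))
    return order

def dyadic_scan(root: Part) -> list[list[str]]:
    """Read level by level: one pass per depth, coarsest first."""
    passes = []
    frontier = [root]
    while frontier:
        passes.append([p.title for p in frontier])
        frontier = [child for p in frontier for child in p.children]
    return passes

# A toy paper, invented purely for this example.
paper = Part("Main Theorem", [
    Part("Lemma A", [Part("Technical Lemma A.1"), Part("Technical Lemma A.2")]),
    Part("Lemma B", [Part("Technical Lemma B.1")]),
])

for number, items in enumerate(dyadic_scan(paper), start=1):
    print(f"Pass {number}: {', '.join(items)}")
```

On this toy paper the sequential scan reaches Lemma B only after working through both technical lemmas under Lemma A, whereas the dyadic scan sees both main lemmas in its second pass. Each list here shows only what is new at that resolution. In practice a later pass also revisits the coarser levels.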

Why make dyadic scans?

**You know where you’re going.** If you read mathematics this way, you will know what each line of mathematics is for before digging into why it’s true. Knowing the purpose of Lemma 2 lets you figure out which terms are important and which terms are negligible error terms. Knowing when you’re done using that functional equation means you can free up that memory for something new.

**You’re forced to develop an eye for what matters.** Not every paper is written so that the main lemmas stand out from the minor lemmas, which in turn stand out from the boring technical details. Doing dyadic scans forces you to develop a taste for what matters, and to discriminate between the innovative and the boilerplate.

**You read in an order closer to how mathematics is actually done.** Very few proofs are devised in the order they’re presented. There may be an important and difficult technical lemma that requires a seven-page calculation using the calculus of variations, but I can guarantee you that the author did not work out the details of this argument before fleshing out the main arc of the proof. Reading via dyadic scans reinforces the habit of keeping the big picture in your head at all times. The rigorous correctness of each detail matters less than you think – often any given technical argument can be done in several other ways.

# Generalizations

For clarity, I’ve focused on dyadic scanning for reading papers. The method applies equally well to other settings. I will not bore you with the details, but here is a sketch of how.

### On paper-writing

A well-written paper should make it easy for the reader to pick out its dyadic structure. To some extent, this is already standard practice: the abstract is an outline for the introduction, which is an outline for the entire paper. A great example where such a structure is pushed further is this paper of Tao, which proves an “almost all” version of the infamous Collatz conjecture. It is a fairly dense 49-page paper, so in addition to the standard abstract and introduction, there is a ten-page extended outline detailing the main arc of the proof and highlighting the important ideas.

When arguments get even longer than this, it is not uncommon to see a proof divided among multiple papers, with the first one serving as an extended introduction for the whole series.

This does not mean that a paper should necessarily be written in the explicit order of the dyadic scans: all the main theorems first, then the main lemmas, then the minor lemmas, and then the technical arguments should be bunched up at the very end. Often this will result in a very unnatural structure obfuscating the dependencies between ideas. It may be better to split the paper into subsections which are as functionally independent as possible, and carefully point out the dependencies and relative importance of various parts when they appear.

### On giving and receiving talks

Attending talks is even more taxing on working memory than reading papers; the audience will generally have a wider variation in background, they will rarely have the luxury of pen and paper, and regardless of whether the talk is given via slides or a blackboard only a fraction of the total content is visible at any given time.

It is therefore even more essential when giving talks that the audience knows exactly where you’re going. Be sure to have multiple levels of signposting and continuous reminders on how each argument fits into the big picture. Most of the details in the lower levels should be omitted altogether unless they are absolutely essential.

On the receiving end, my general advice is that trying to follow every word in a talk is a mistake akin to making a single sequential pass in reading a paper. Instead, **treat going to a talk as taking a single dyadic pass into a topic**, out of potentially many. Which pass to treat it as depends on your current level of exposure. If you are a beginner, watch the talk as if taking the first dyadic pass, noting the key words and main ideas and how they fit together. If you have some exposure to the field, you can treat the talk like a second or third pass, paying attention to the major details and innovations. If you are an expert in the field, almost nothing in the talk will be new to you, and you can really dig into the key details or open problems. Be realistic about how much you expect to get out of the talk, and plan accordingly for what to focus on.

### On mathematical conversations

Much of the advice in this post applies just as well to the more informal setting of a mathematical conversation, where often one person must convey a fairly complicated argument verbally to one or more others. The key difference in this setting is that the listener(s) can play a much more active role.

The simplest level of active listening is asking for clarifications and more details when things are not clear. A more sophisticated level of active listening is asking directly for the pieces of the dyadic structure that one is missing: if the speaker dives into the details of a lemma before its purpose is made clear, it is often correct for you to ask for that purpose instead.

## In Conclusion

In all of these activities, the fundamental resource is your very limited working memory, and the more you can predict, the less you have to remember. By looking ahead, by asking for clarification, by making multiple passes, we can “cheat” and see the future, freeing up our memory for what really matters.

Do you also read math textbooks dyadically? That is, do you first skim a section for the main definitions and theorems before diving into the details?

To be honest it’s been a long time since I’ve read a math textbook, other than popping into a section or two for the odd reference. I suspect that the better-written books are much safer to read linearly than papers are, as they pay more attention to the reader’s experience (as opposed to just paying attention to being correct).