# How To Teach Math

Joshua Mitchell / June 06, 2020


In this post, I’ll talk about what makes learning math difficult, what makes teaching math difficult, and how we could successfully teach graduate-level math to high school students.

## The Current Experience of Learning Math

Consider a specific and complicated idea. Maybe something crazy from Quantum Physics.

What's the quickest path to understanding this idea?

If you're in college, a professor might tell you there is no quick way.

"Take all the Physics classes, the Algebra, Geometry, Calculus, Differential Equations, etc. In a few years, you'll get there."

If you're like most people, you're not going to spend that much time and energy to learn one idea.

Let's say you go rogue and attack the concept directly.

You slap a Quantum Physics textbook on your desk, turn to the chapter you’re interested in, and start reading until you run into something you don't understand.

The problem is, if you're a layperson, you'll get lost in the weeds quickly and frequently. It’ll feel like trying to kill a hydra by cutting off its heads.

At this point, if you're like most people, you'll blow it off and turn on Netflix.

Let's say you're determined. You circle and look up everything you don't understand.

If a circled phrase leads you to more stuff you don't understand, you circle and look that stuff up as well.

If you keep doing this, eventually you’ll run into words you’re familiar with.

Diagrammed out, this process looks like a tree:

Note that

- the Quantum Physics concept is at the top (node 1), and
- the stuff you had to learn in order to learn the Quantum Physics concept is below it in the tree (nodes 2 through 12)

These chunks of nodes below the top ("branches") act as prerequisites to understanding the root concept (node 1).

Each layer of the tree represents the same idea at different levels of elaboration. The very bottom layer of the tree (5, 6, 7, 9, 10, 12) represents the idea in its most decompressed (and digestible) form.

We can roughly represent any concept like this: as a tree with the concept itself at the top (i.e. the root) and all the "less complicated" prerequisite concepts as branches below it.

We can call a concept "complicated" if it has lots of branches.

Although this exercise is helpful in learning the material, it is far from sufficient to claim mastery. However, the tree it produces will serve as a useful model of knowledge later.
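To make the tree model concrete, here is a minimal Python sketch. The edges below are an assumption: the original diagram is not reproduced here, so I invented a shape that merely has node 1 at the root and the same bottom layer (5, 6, 7, 9, 10, 12). The function names are hypothetical too.

```python
# Hypothetical concept tree: each key is a concept, each value lists its
# direct prerequisites. The edges are invented for illustration; only the
# root (1) and the bottom layer (5, 6, 7, 9, 10, 12) come from the text.
PREREQS = {
    1: [2, 3, 4],
    2: [5, 6],
    3: [7, 8],
    4: [11],
    8: [9, 10],
    11: [12],
}


def leaves(tree, root):
    """Return the fully decompressed concepts (nodes with no prerequisites)."""
    children = tree.get(root, [])
    if not children:
        return [root]
    result = []
    for child in children:
        result.extend(leaves(tree, child))
    return result


def count_prerequisites(tree, root):
    """Count every node below `root`: a rough measure of how complicated it is."""
    return sum(1 + count_prerequisites(tree, child) for child in tree.get(root, []))


print(leaves(PREREQS, 1))               # the bottom layer of the tree
print(count_prerequisites(PREREQS, 1))  # everything you had to look up
```

Under this model, "complicated" is just a count: the more nodes sit below a concept, the more you have to look up before it makes sense.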

## The Current Method of Teaching Math

Let’s look at our present vehicle for math education: universities.

If we wanted to model everything someone would learn in a math bachelor’s degree, we could invoke our earlier decompression example.

We would have a set of concepts and each concept would have its own concept tree.

Undoubtedly, there would be overlap between the trees of each concept, especially at the bottom (i.e. the fundamentals).

This results in a network (i.e. a graph). Let's call this a knowledge graph.

Acquiring math knowledge can be thought of as taking a subset of the math knowledge graph and adding it to your personal knowledge graph.

Learning math can be thought of as traversing the math knowledge graph while simultaneously adding to your own.

From Graph Theory, we know there are several ways of traversing a graph:

- Breadth-first traversal: a metaphor for getting a bachelor's degree. This is synonymous with
  - learning all the basics (MATH 101), then
  - learning all the moderate topics (MATH 201, 202..), then
  - learning the advanced stuff (MATH 301, 302..), etc.

- Depth-first traversal: a metaphor for doing a personal project. This is synonymous with learning only what gets the job done.
- Random walk: a metaphor for following your curiosity. This is synonymous with learning what you feel like in no particular order.
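Here is a rough Python sketch of the three traversal styles. The graph, its concept names, and its edges are all made up for illustration (each concept points to the concepts it unlocks); a real math knowledge graph would be far larger and messier.

```python
import random
from collections import deque

# Hypothetical knowledge graph: concept -> concepts it unlocks.
GRAPH = {
    "arithmetic": ["algebra", "geometry"],
    "algebra": ["calculus", "linear algebra"],
    "geometry": ["calculus"],
    "calculus": ["differential equations"],
    "linear algebra": ["differential equations"],
    "differential equations": [],
}


def breadth_first(graph, start):
    """Visit concepts level by level, like a degree plan."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


def depth_first(graph, start, seen=None):
    """Follow one path as deep as it goes before backing up, like a project."""
    if seen is None:
        seen = []
    seen.append(start)
    for nxt in graph[start]:
        if nxt not in seen:
            depth_first(graph, nxt, seen)
    return seen


def random_walk(graph, start, steps, rng):
    """Hop to a random neighbor at each step, like following your curiosity."""
    path = [start]
    node = start
    for _ in range(steps):
        if not graph[node]:
            break  # dead end: nothing further to unlock
        node = rng.choice(graph[node])
        path.append(node)
    return path


print(breadth_first(GRAPH, "arithmetic"))
print(depth_first(GRAPH, "arithmetic"))
print(random_walk(GRAPH, "arithmetic", steps=10, rng=random.Random(0)))
```

Notice that the depth-first order reaches "differential equations" before it ever touches "geometry", while the breadth-first order covers every lower-level topic first.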

Universities have gotten a lot of flak for a variety of reasons, but the one I want to highlight is rampant elementitis amongst students.

Elementitis has to do with learning the elements of a discipline without seeing the big picture. Elementitis prevents students from understanding the purpose of the topic being considered, increasing their levels of frustration, apathy, and boredom. This disease is widely spread across the world when it comes to learning some important subjects, such as math. Students spend literally years learning topics in algebra, calculus, or statistics without immediately seeing their application. Imagine a cooking school where cooks-to-be learn how to bake doughs for years without even mentioning the word “pie”. It would be nonsense.

Taking project-based courses helps alleviate this problem, but how does “project-based” learning apply to less tangible subjects like math?

Let’s take a step back. What advantages does a project-based approach bring to the table?

Speaking for myself, project-based learning

- Allows me to avoid overwhelm by deliberately picking an initially small domain and expanding from there
- Enables tight feedback loops, thereby reducing friction that degrades the learning experience
- Acts as a context for understanding theory
- Facilitates intrinsic motivation through purpose and ownership
- Gives me something to show for it at the end

The traditional method (sit in lecture → do homework problems → take test) offers none of these advantages. I think this is the main reason people hate math.

## Why Teaching Math Well Is Hard

My big hypothesis:

The way math is taught is what prevents high school kids from learning scary graduate-level math - not the difficulty or complexity of the math itself.

We need to figure out a way to create an enjoyable math learning experience at scale.

I don’t have all the answers, but I have a few ideas for bringing the first three advantages (avoiding overwhelm, reducing friction, and providing context) to the teaching of more theoretical subjects.

Let’s go back to the decompression exercise mentioned earlier.

What used to be notation and jargon we didn’t know (at the top of the tree) has been decompressed into words and phrases we do know (at the bottom). This reduces friction.

(You can think of it like downshifting to a lower gear on a car or bike.)

However, we run into a new problem: we've got a giant wall of text. We traded friction for overwhelm.

In this light, it’s easy to understand why the notation and jargon existed in the first place. For these advanced concepts to exist in our limited working memory, they have to be distilled and compressed.

Even if we understand each decompressed concept in isolation, it's only by having many simple concepts in working memory at the same time and viewed through a certain lens that the flash of insight will occur. The planets all have to be in orbit in the first place for them to align.

Once we have each concept sitting in front of us on our mind’s workbench, then we can frictionlessly examine and understand them in their proper context.

Much like how the difference between two similar colors becomes apparent when you put them side-by-side, complex ideas become simple when they all occupy the right space and shape in your mind.

We want a systematic way to deliver this experience to laypeople as quickly as possible.

## Steps toward a Solution

So, how can we make this a thing?

The fundamental bottleneck is our working memory. We can only hold so many chunks in our head at once. Hence, we start with the fundamentals and work our way up.

Eventually, after slowly assimilating basic concepts into long-term memory, the concepts we can manipulate in our working memory become more complex.

Finally, once we reach the top, we begin to actually enjoy math. We have creative ideas of our own, and, at last, we see the big picture.

Except most people don’t get there.

Most people start with the fundamentals, suffer from elementitis, and then decide they won’t be majoring in math.

How can we “get people to the top” as quickly as possible?

My idea:

Why not trade both friction and overwhelm for (an initial) lack of precision?

Let’s talk about universities again.

As a software engineer, I have the impulse to throw away old solutions and start from scratch, but past experience makes me keep in mind Gall's Law:

"A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system."

Universities, like any system at scale, have a preference for predictable and reliable components.

Meaning that, if you want to provide a consistent educational experience to every student, each class needs to have high fidelity to the curriculum with minimal variance.

For example, if

- student A takes MATH 1301 with professor X and
- student B takes MATH 1301 with professor Y,

the university needs to make sure A and B learn the same things so that they're both ready for MATH 1302.

This also means you can’t teach things that aren’t 100% precise.

You won’t ever be tested over a metaphor or an oversimplified concept. It’ll always be the real thing - anything else causes instability and variance in the rest of the system.

For example, what if A doesn't need any more math after MATH 1301? Then A will graduate having been taught a technically incorrect thing. That's bad for the university's reputation.

Or, what if a bunch of students all walk into MATH 1302 next semester being wrong about different things in intangible, unmeasurable ways? A logistical nightmare.

Hence, the preference for precision every step of the way.

What if we relaxed these requirements and allowed oversimplification (for scaffolding purposes)?

I can think of two ways this could work.

### Concept Tree Pruning

When we decompress that complex idea, we are assuming that a precise understanding is necessary at each stage.

We've often heard that the fundamentals should be drilled repeatedly since they form a foundation for everything else (e.g. Bruce Lee's "a thousand kicks" quote).

Hence, we rigidly treat these prerequisite ideas as obstacles to overcome. Learning becomes starting at the bottom of the tree and working your way up.

However, when we drop the requirement that the understanding has to be "complete" at any given point in time, then we don't have to be as rigid. We can skip branches.

Imagine assigning an "importance" score to each branch. Perhaps the Pareto principle applies here, and 80% of the understanding comes from 20% of the branches.

Perhaps we can chop off some percentage of the tree and still have a "pretty good" understanding:

A curriculum that uses this method would start with a tree with lots of missing branches and progressively add branches as the students' understanding matures.
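As a rough sketch of what pruning could look like in Python: the tree, the concept names, the importance scores, and the thresholds below are all invented for illustration. Nothing here says how you would actually measure a branch's importance.

```python
# Hypothetical tree: concept -> list of (prerequisite, importance) pairs,
# where importance is a made-up score between 0 and 1.
TREE = {
    "target idea": [("core notation", 0.9), ("edge cases", 0.2), ("history", 0.1)],
    "core notation": [("basic algebra", 0.8), ("formal proofs", 0.3)],
}


def prune(tree, root, threshold):
    """Keep only branches whose importance is at least `threshold`."""
    kept = {}

    def visit(node):
        children = [child for child, score in tree.get(node, []) if score >= threshold]
        if children:
            kept[node] = children
        for child in children:
            visit(child)

    visit(root)
    return kept


# A curriculum lowers the threshold over time, so branches grow back.
for threshold in (0.5, 0.25, 0.0):
    print(threshold, prune(TREE, "target idea", threshold))
```

The first pass leaves only the highest-scoring path ("target idea" → "core notation" → "basic algebra"), and each later pass restores more of the tree until nothing is missing.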

### Concept Tree Compression

If you've ever tried to upload a picture and been told it's too big, then you know that isn't necessarily a showstopper.

There are plenty of utilities out there that let you press a button and - poof - the image becomes smaller.

But it looks the same as before. Magic.

Unless, of course, you overdo it. Then the loss in quality becomes obvious.

The good news: you have to compress it a lot before the loss becomes noticeable.

What if we did the same thing with our complex idea?

Well, we already do (kind of)! If you think about it, that's the definition of a metaphor.

A metaphor is just an idea you *are* familiar with that looks like a compressed version of an idea you *aren't* familiar with.

That’s pleasantly efficient. Starting from scratch is hard.

Have you ever played Charades? Context is unbelievably convenient. If you pick the right metaphor, you get to start pretty close to the finish line.

A curriculum that uses this method would start with a "metaphor" concept tree that

- looks vaguely like the original tree, but
- is less complex, and
- manages to hook into ideas and concepts that all students already know (libraries, restaurants, etc.)

Slowly, the tree becomes less metaphorical over time. Like a badly compressed picture that gains its quality back little by little.
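One very simplified way to picture this "gaining quality back" process in Python: pair each precise concept with a stand-in metaphor, then swap the metaphors out one at a time. The concept/metaphor pairs and the `curriculum_stage` function are hypothetical examples of mine, not a tested curriculum.

```python
# Hypothetical pairs: (precise concept, everyday metaphor).
# Both columns are invented examples.
CONCEPTS = [
    ("function", "vending machine"),
    ("set", "bag of items"),
    ("limit", "walking halfway to a wall, again and again"),
]


def curriculum_stage(concepts, precise_count):
    """Return the labels students see once the first `precise_count`
    concepts have been swapped from metaphor to precise form."""
    return [
        precise if index < precise_count else metaphor
        for index, (precise, metaphor) in enumerate(concepts)
    ]


# Stage 0 is all metaphor; the last stage is the real thing.
for stage in range(len(CONCEPTS) + 1):
    print(stage, curriculum_stage(CONCEPTS, stage))
```

A real version would need the metaphors to fit together the way the precise concepts do, which is the hard part this sketch skips.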

## Final Thoughts

Above, I outlined a few ideas about acquiring knowledge through iterative de-simplification.

The main benefit behind this idea is that motivation, context, and results are present throughout the whole process (as opposed to just the end).

This is my core thesis: make learning useful, motivating, and without headaches every step of the way.

I don't want the light to be just at the end of the tunnel - I want it to be there from the start.