The World Knowledge In One Place

Joshua Mitchell / July 31, 2020

I discuss why science is slowing down and what a solution might look like: one big knowledge graph.

Debt

If you visit San Jose, Costa Rica, you'll find that there are no street signs. When you ask people for directions, they'll say something like, "left at the store, right at the church, ..."

As a result, learning to get around is a frustrating experience for newcomers.

At first glance, this might seem like a big problem, but the locals are used to it. It's already that way, and it works for people who are already familiar with it (which is most people).

However, this arrangement is fragile. Several scenarios would cause trouble, such as:

  • if the city grew to 10 times its current size
  • if 90% of its citizens moved away and were replaced with newcomers
  • if an event happened that demolished a big part of the city

In any of these cases, giving directions by landmark suddenly stops being effective.

In software engineering, we have a name for this phenomenon: technical debt.

If you write software improperly or hastily, it might work in the short term, but you'll run into problems if you try to build on top of it, make it do more, or explain it to people.

You could say that San Jose has "city" or "infrastructure" debt.

I propose that the same is true of science in general, in the form of research debt.

It's becoming harder for researchers, even within their own field, to 1) find what they're looking for and 2) understand it once they've found it. Check out this Atlantic article on diminishing returns in science or this survey on scientific progress.

So why is this happening, exactly? Well, a few reasons.

The first is the sheer volume of data and knowledge that's being discovered. The more information we have, the more important it is to structure and index it properly. If you ask most professors how they keep up with all the new ideas nowadays, they'll simply say that they don't.

The second is the fragmentation of knowledge. Specialization is a natural consequence of doing a PhD. Ideas build on each other. Domains split into subdomains, which split into sub-subdomains, and so on. As a result, even scientists in the same field can have immense trouble understanding each other's work.

The third is that communicating and explaining things well is hard. Finding the right abstractions, the right notation, the right media, and the right words all at once is extremely challenging (especially if you're not incentivized to write for clarity in the first place).

Why We Should Care

Pretty much every luxury and benefit we enjoy today is the result of science. Vaccines, cancer treatments, cars, planes, you name it.

Speeding up scientific progress by even 0.1% could mean thousands of lives saved and millions of lives improved.

What We Can Do About It

My idea is to coalesce all human knowledge into one big knowledge graph: a set of all known ideas that are all connected to each other.

This theoretical graph would have a few nice properties:

  • If you have an idea and want to know whether it's already known, checking is easy. You just start at ideas that you know are relevant and see if your idea is among their neighbors.
  • There is no siloing of fields. It becomes much easier to see if two scientists from different domains are doing the same (or adjacent) experiments.
  • For any particular idea you want to understand and any particular set of ideas you're already familiar with, it becomes easy to find the "shortest path" to understand that idea (given what you already know). Prerequisite ideas are built in at a granular level.
  • Serendipity becomes a first-class citizen: any time you have writer's block (or you're just curious), there's a set of ideas waiting for you that are specifically related to what you're thinking about.
  • There are no duplicates. There are different views and perspectives on the same idea, but those are different ideas themselves, and will be connected appropriately.
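
To make the "neighbors" and "shortest path" properties concrete, here is a minimal sketch. It assumes the graph is stored as a plain adjacency mapping where an edge from A to B means "A is a prerequisite for B". The idea names and the helpers `is_neighbor_of_any` and `learning_path` are hypothetical, chosen only for illustration. A breadth-first search that starts from everything you already know finds a shortest chain of ideas leading to the target.

```python
from collections import deque

# Hypothetical toy graph: an edge A -> B means "A is a prerequisite for B".
PREREQ_GRAPH = {
    "arithmetic": ["algebra"],
    "algebra": ["calculus", "linear algebra"],
    "calculus": ["gradient descent"],
    "linear algebra": ["gradient descent"],
    "gradient descent": ["neural networks"],
    "neural networks": [],
}


def is_neighbor_of_any(graph, relevant, candidate):
    """Check whether `candidate` is a direct neighbor of any relevant idea."""
    return any(candidate in graph.get(idea, []) for idea in relevant)


def learning_path(graph, known, target):
    """Breadth-first search that starts from every known idea at once.

    Returns a shortest list of ideas leading from something already known
    to `target`, or None if the target cannot be reached.
    """
    queue = deque(known)
    parent = {idea: None for idea in known}  # doubles as the visited set
    while queue:
        idea = queue.popleft()
        if idea == target:
            # Walk the parent links back to a known idea, then reverse.
            path = []
            while idea is not None:
                path.append(idea)
                idea = parent[idea]
            return path[::-1]
        for nxt in graph.get(idea, []):
            if nxt not in parent:
                parent[nxt] = idea
                queue.append(nxt)
    return None
```

For example, `learning_path(PREREQ_GRAPH, ["algebra"], "neural networks")` returns `["algebra", "calculus", "gradient descent", "neural networks"]`. Note that this follows a single chain of ideas; a fuller model of prerequisites would probably need to collect every prerequisite of the target, not just one route to it.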

Pitfalls

So wait, why can't Wikipedia do this? What's wrong with just Googling what you want to find?

Well, there are a few reasons:

  • Knowledge duplication is everywhere. Google can serve millions of results - most of them with overlapping content. This creates noise and makes searching for very specific ideas difficult.
  • The relationship between ideas is what is important. If you want to know how x and y are related, you have to Google "how x and y are related" and hope a page about it exists. It's insufficient to simply Google "x" followed by "y".
  • Google requires you to know, to a large extent, what you're looking for. Frequently, you have an idea that's hard to phrase as a search - you just know what ideas it relates to. Google doesn't support this well.
  • Google is optimized for a single errand - not an entire learning journey. Google can bring you to ideas, but not between them.
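
By contrast, a graph can answer "what relates to both x and y?" directly. The sketch below assumes an undirected "is related to" graph stored as sets of neighbors. The example ideas and the `ideas_related_to_all` helper are hypothetical. It finds an idea you can't name by intersecting the neighborhoods of the ideas you know it relates to.

```python
# Hypothetical undirected "is related to" graph, stored as sets of neighbors.
RELATED = {
    "entropy": {"information theory", "thermodynamics", "compression"},
    "information theory": {"entropy", "compression", "coding theory"},
    "thermodynamics": {"entropy", "statistical mechanics"},
    "compression": {"entropy", "information theory"},
    "coding theory": {"information theory"},
    "statistical mechanics": {"thermodynamics"},
}


def ideas_related_to_all(graph, anchors):
    """Return ideas adjacent to every anchor, excluding the anchors themselves."""
    neighbor_sets = [graph.get(anchor, set()) for anchor in anchors]
    if not neighbor_sets:
        return set()
    return set.intersection(*neighbor_sets) - set(anchors)
```

For example, `ideas_related_to_all(RELATED, ["information theory", "thermodynamics"])` returns `{"entropy"}`: the idea that connects the two, found without knowing its name in advance.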

Okay, great - so how do we build this knowledge graph?

This is where I run out of steam. I'm not sure what documenting all human knowledge as described above will look like, but it's surely possible. We just need to figure out the right ontology and set of abstractions.
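
Purely as an illustration of what an "ontology" could mean here, the sketch below models ideas as nodes and relationships as typed edges. Everything in it is an assumption made for the example, not a proposal for the real thing: the `Relation` types, the `KnowledgeGraph` class, and the naive rule that two ideas are duplicates when their names match after trimming whitespace and lowercasing.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    # Hypothetical edge types; a real ontology would need far more care.
    PREREQUISITE_OF = "prerequisite_of"
    PERSPECTIVE_ON = "perspective_on"
    RELATED_TO = "related_to"


def _key(name):
    """Naive normalization (an assumption): trim whitespace and lowercase."""
    return name.strip().lower()


@dataclass
class KnowledgeGraph:
    ideas: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, relation, target) triples

    def add_idea(self, name):
        """Add an idea; return False if it already exists (no duplicates)."""
        key = _key(name)
        if key in self.ideas:
            return False
        self.ideas.add(key)
        return True

    def relate(self, source, relation, target):
        """Connect two existing ideas with a typed, directed edge."""
        s, t = _key(source), _key(target)
        if s not in self.ideas or t not in self.ideas:
            raise KeyError("both ideas must be added before relating them")
        self.edges.add((s, relation, t))

    def neighbors(self, name, relation=None):
        """Ideas one outgoing edge away, optionally filtered by edge type."""
        key = _key(name)
        return {
            t for s, r, t in self.edges
            if s == key and (relation is None or r == relation)
        }
```

A different perspective on an idea would be stored as its own node, linked back with a `PERSPECTIVE_ON` edge. Deciding when two descriptions are really the same idea is the hard part, and this sketch does not attempt it.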

