Three Ways to Draw Data Tables
Intro
This post is about three types of diagram: directed graphs, two-column tables, and 2D plots.
These look different, but they’re really just different ways to convey the same information. We’ll learn how to convert back and forth between them, which is a handy skill if you spot them in the wild.
There’s a fancy name for these diagrams. If you want to impress people, you can call them all Binary Relation Diagrams. You could also just call them Relations. (I’ll be saying Relations a lot, because I want to save the word “Table” specifically for the middle diagram type.)
(Note: This post is the first in a series. The series will be about relations, and also about a special kind of relation called a Function. My overall goal is to convince you that functions look like this:
Later in the series, we’ll talk about connections between diagrams, math, programming, and databases. We’ll see how these different fields study similar subject matter in very different ways, and learn how to translate between them.
Don’t worry though. This post is standalone, and it won’t have any math in it. Only pictures :)
The Nitty Gritty
Let’s go through our three ways to draw relation diagrams.
Diagram Type One: Directed Graphs
Our first kind of diagram is my favorite. It’s called a “directed graph”, or digraph, and it looks like this:
A digraph is a collection of nodes – labeled circles – and a collection of edges – little arrows going between the nodes. We can model all sorts of situations with digraphs. For instance, a family tree can be represented as a digraph. We draw a node for each person, then draw edges from parents to children:
We can also draw digraphs of love triangles, by drawing edges from people to people they like:
Notice that nodes are allowed to have edges to themselves. Also, nodes don’t have to have any edges. In the “Romantic Comedy” example, C is a sidekick who’s not interested in A or B.
(At least, that’s one way to interpret the diagram. It doesn’t really say anything besides who likes who.)
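If you'd rather read code than pictures, here is a minimal Python sketch of the same idea: a digraph is just a set of node labels plus a set of (start, end) pairs. The labels and edges below are a made-up example, not one of the figures in this post, and `check_digraph` is a hypothetical helper name.

```python
# A digraph as two sets: node labels, and (start, end) pairs for the arrows.
# Hypothetical example: A points at B, B points at itself, C has no edges.
nodes = {"A", "B", "C"}
edges = {("A", "B"), ("B", "B")}

def check_digraph(nodes, edges):
    """Every edge must start and end at a known node."""
    return all(start in nodes and end in nodes for start, end in edges)

# Nodes with an edge to themselves.
self_loops = sorted(start for start, end in edges if start == end)

# Nodes that no edge touches at all.
touched = {node for edge in edges for node in edge}
isolated = sorted(nodes - touched)

print(check_digraph(nodes, edges))  # True
print(self_loops)                   # ['B']
print(isolated)                     # ['C']
```

Keeping the nodes in their own set is what lets an edgeless node like C exist at all; the edge set alone would never mention it.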
Relations don’t have to be about relationships between people. For example, we could draw a digraph of which classes use which books in a school:
Notice how this digraph is divided into two parts – the classes and the books. All the edges go from classes to books. Digraphs like this are called bipartite, which means “two parts”.
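As a rough sketch, the "two parts" property described here can be checked in Python by asking whether any node is both the start of one edge and the end of another. The function name `is_split_into_two_parts` is made up, and this only tests the simple condition from this post (every edge goes from the first part to the second), not the more general graph-theory definition of bipartite.

```python
# Edges of the class-book relation, as (class, book) pairs.
class_book_edges = {
    ("English Lit 101", "Hamlet"),
    ("English Lit 101", "Moby Dick"),
    ("Whaling 101", "Moby Dick"),
    ("Math 203", "The Calculus Reader"),
    ("Astronomy 101", "The Planetary System"),
}

# A love triangle, where the same people appear at both ends of edges.
love_triangle_edges = {("A", "B"), ("B", "A"), ("C", "B")}

def is_split_into_two_parts(edges):
    """True if no node is both the start of some edge and the end of some edge."""
    starts = {start for start, _ in edges}
    ends = {end for _, end in edges}
    return starts.isdisjoint(ends)

print(is_split_into_two_parts(class_book_edges))     # True
print(is_split_into_two_parts(love_triangle_edges))  # False
```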
We don’t have to use arrows to draw our digraph edges. We can use any shape we want, as long as the ends look different. For instance, we could use different colored circles:
Or even add text labels:
I like this because it’s very clear what the edges mean. It’s a lot of work to add all the labels though. We can compromise by adding a legend:
Now that we’ve added legends, it’s very easy to convert to our next way to draw relations: tables.
Diagram Type Two: Two-Column Tables
Here’s the class-book relation, redrawn as a Two-Column Table. We use the labels from the legend as column titles.
class | book |
---|---|
English Lit 101 | Hamlet |
English Lit 101 | Moby Dick |
Whaling 101 | Moby Dick |
Math 203 | The Calculus Reader |
Astronomy 101 | The Planetary System |
This is less exciting, but more orderly.
We can also convert our other graphs into tables. For instance, we can take the “Coveting” graph from before, add a legend, and convert it to a table:
person | likes |
---|---|
A | B |
B | A |
C | B |
Look at this for a minute to make sure you understand the translation. If it helps, you can compare the “Pining” graph:
person | likes |
---|---|
A | B |
B | A |
B | C |
Notice how the last row swaps direction, just like the last edge swaps direction.
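Here is a small Python sketch of that translation, assuming each digraph is stored as a set of (start, end) pairs. The helper names `to_rows` and `print_table` are invented for this example.

```python
coveting = {("A", "B"), ("B", "A"), ("C", "B")}
pining = {("A", "B"), ("B", "A"), ("B", "C")}

def to_rows(edges):
    """One table row per edge. Sorting only gives a stable print order."""
    return sorted(edges)

def print_table(edges, left, right):
    """Print the edges as a two-column table with the given column titles."""
    print(f"{left} | {right}")
    for start, end in to_rows(edges):
        print(f"{start} | {end}")

print_table(coveting, "person", "likes")

# The two relations differ only in the direction of one edge.
print(coveting - pining)  # {('C', 'B')}
print(pining - coveting)  # {('B', 'C')}
```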
There’s one more way to draw relations…
Diagram Type Three: 2D Plots
This diagram type makes better use of 2D space. Here’s the Coveting relation again, now drawn as a 2D Plot:
person | likes |
---|---|
A | B |
B | A |
C | B |

| likes | A |   | X |   |
|---|---|---|---|---|
|   | B | X |   | X |
|   | C |   |   |   |
|   |   | A | B | C |
|   |   |   | person |   |
You might notice that this diagram type flips the columns around. This is on purpose. It’s because positions in a 2D plot are traditionally written (horizontal, vertical) or (x, y). Since our relation is (person, likes), I’ve laid out the person column horizontal and left the likes column vertical.
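The (horizontal, vertical) convention can be sketched in Python like this. It assumes the relation is a set of (person, likes) pairs, marks each pair with an X, and uses a dot for empty cells; `plot_rows` is a hypothetical helper, and the text layout is just one way to print it.

```python
def plot_rows(edges, x_labels, y_labels):
    """Return the plot as text lines: one line per y label, 'X' where (x, y) is an edge."""
    lines = []
    for y in y_labels:
        # x runs along the horizontal axis, y picks the row.
        cells = ["X" if (x, y) in edges else "." for x in x_labels]
        lines.append(f"{y} | " + " ".join(cells))
    # The horizontal axis labels go on the bottom, as in a traditional plot.
    lines.append("    " + " ".join(x_labels))
    return lines

coveting = {("A", "B"), ("B", "A"), ("C", "B")}
labels = ["A", "B", "C"]

for line in plot_rows(coveting, labels, labels):
    print(line)
# A | . X .
# B | X . X
# C | . . .
#     A B C
```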
Compare Coveting with Pining:
person | likes |
---|---|
A | B |
B | A |
B | C |

| likes | A |   | X |   |
|---|---|---|---|---|
|   | B | X |   |   |
|   | C |   | X |   |
|   |   | A | B | C |
|   |   |   | person |   |
Let’s apply this to our classes-books relation:
| book | Hamlet | X |   |   |   |
|---|---|---|---|---|---|
|   | Moby Dick | X | X |   |   |
|   | The Calculus Reader |   |   | X |   |
|   | The Planetary System |   |   |   | X |
|   |   | English Lit 101 | Whaling 101 | Math 203 | Astronomy 101 |
|   |   | class |   |   |   |
(We’ve used a trick here to take up less space. Because the class-books relation is bipartite – split into classes and books – we can put only classes on the bottom, and only books on the side. For relations that aren’t split up like that, you have to put ALL the nodes on both sides.)
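To get a feel for how much space the trick saves, here is a quick Python sketch that counts plot cells both ways for the class-book relation. It derives the classes and books from the edges alone, which is an assumption that only works because every class and book here appears in at least one edge.

```python
class_book_edges = {
    ("English Lit 101", "Hamlet"),
    ("English Lit 101", "Moby Dick"),
    ("Whaling 101", "Moby Dick"),
    ("Math 203", "The Calculus Reader"),
    ("Astronomy 101", "The Planetary System"),
}

# Only classes on the bottom, only books on the side.
classes = sorted({c for c, _ in class_book_edges})
books = sorted({b for _, b in class_book_edges})
split_cells = len(classes) * len(books)

# ALL the nodes on both sides.
all_nodes = sorted(set(classes) | set(books))
full_cells = len(all_nodes) ** 2

print(split_cells, full_cells)  # 16 64

# Nothing is lost: every edge still has a cell in the smaller grid.
print(all(c in classes and b in books for c, b in class_book_edges))  # True
```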
Equivalent diagrams
We now have three ways to draw relations: Digraphs, Two-Column Tables, and 2D Plots.
I want to convince you that these types of diagrams are equivalent: you can convert between them as needed, without losing information.
If this post were about formal math, this is where I would write you a proof. See, to convince each other of things, mathematicians write “proofs”. A “proof” is a strategy in a board game. The basic moves in the board game are supposed to be really simple. So simple, that everybody agrees they should be allowed. Bigger strategies are built up out of simple moves. (In this case, the moves would be things like converting between diagrams.)
I don’t want to explain a bunch of formal game rules here – they can get pretty dry. But I still want to convince you like I’d convince a mathematician. See, mathematicians have a high bar to be “convinced” of things. If you tell them something “works”, they’ll ask how it works. They’ll also ask what conditions need to be met for it to work. And usually, they’ll want to try it out for themselves.
We’re not doing math here, but we are talking about diagrams. So far I’ve been drawing all of them. Why not try drawing some yourself? Here are some exercises you can do (if you like that sort of thing):
Drawing Exercises
Drawing Exercise 1
I want to convince you that Digraphs and 2D Plots contain the same information. This means that if we:
- Start with a Digraph
- Convert it to a 2D Plot
- Convert it back to a Digraph
We should end up back where we started. Let’s test this.
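The same round trip can be sketched in Python, if you want to let a computer do the drawing. The function names are invented, and the grid of True/False values stands in for the 2D Plot. One passing run is a test, not a proof, so treat it the same way as the paper exercise.

```python
def digraph_to_plot(nodes, edges):
    """Convert a digraph to (labels, grid).

    grid[row][col] is True when there is an edge from labels[col] to labels[row],
    so start nodes run horizontally and end nodes run vertically.
    """
    labels = sorted(nodes)
    grid = [[(x, y) in edges for x in labels] for y in labels]
    return labels, grid

def plot_to_digraph(labels, grid):
    """Convert (labels, grid) back to a (nodes, edges) pair."""
    nodes = set(labels)
    edges = {
        (x, y)
        for row, y in enumerate(labels)
        for col, x in enumerate(labels)
        if grid[row][col]
    }
    return nodes, edges

start = ({"A", "B", "C"}, {("A", "B"), ("B", "A"), ("B", "C")})
middle = digraph_to_plot(*start)
end = plot_to_digraph(*middle)

print(end == start)  # True
```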
Get a piece of paper. Divide it up into boxes like this:
We’ll fill in the top row first.
In the Start box, draw a Digraph. Any digraph you want! For example, you could draw a bit of your family tree, or a love triangle you came up with. For my technique to work, I’ll request that you meet these conditions:
- All your nodes have different labels.
- None of your nodes overlap, or cover each other entirely.
- You draw at most one arrow from any start node to any end node.
(That last condition means that 🅐⇉🅑 isn’t allowed, but 🅐⇋🅑 is.)
Once you’ve drawn your digraph, convert it to a 2D Plot in the Middle box, like this:
| likes | A |   | X |   |
|---|---|---|---|---|
|   | B | X |   |   |
|   | C |   | X |   |
|   |   | A | B | C |
|   |   |   | person |   |
Now, fold your paper over so that you CAN’T SEE the digraph you started from. (If you’re following along with a friend, trade papers with them at this point.)
Now, convert the 2D plot in your Middle box back to a digraph in your End box. Try drawing the nodes in different places :)
| likes | A |   | X |   |
|---|---|---|---|---|
|   | B | X |   |   |
|   | C |   | X |   |
|   |   | A | B | C |
|   |   |   | person |   |
Unfold your paper. Now, we can check if the starting digraph is the same as the ending digraph– oh, wait!
There’s an important rule I haven’t explained yet.
Digraphs are stretchy.
This means that no matter how we stretch and rearrange a digraph, as long as we don’t add or break any edges, it will still be considered “the same digraph”. So, for example, all of these would be considered “the same”:
Keeping in mind that this is what “the same” means for digraphs, check if your Start and End boxes are the same.
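In code, "stretchy" could be sketched as: when comparing two drawings, look at the labels and the edges, and ignore where the nodes sit on the page. The coordinates below are invented for illustration, and `same_digraph` is a hypothetical helper. It compares nodes by label, which relies on the condition that all your nodes have different labels.

```python
# Two drawings: the nodes sit in different places, but the edges are the same.
drawing_1 = {
    "positions": {"A": (0, 0), "B": (1, 0), "C": (2, 0)},
    "edges": {("A", "B"), ("B", "A"), ("B", "C")},
}
drawing_2 = {
    "positions": {"A": (5, 3), "B": (0, 7), "C": (2, 2)},
    "edges": {("A", "B"), ("B", "A"), ("B", "C")},
}
# A third drawing where one edge has been broken.
drawing_3 = {
    "positions": {"A": (0, 0), "B": (1, 0), "C": (2, 0)},
    "edges": {("A", "B"), ("B", "A")},
}

def same_digraph(d1, d2):
    """Same node labels and same edges; positions are deliberately ignored."""
    same_nodes = set(d1["positions"]) == set(d2["positions"])
    same_edges = d1["edges"] == d2["edges"]
    return same_nodes and same_edges

print(same_digraph(drawing_1, drawing_2))  # True
print(same_digraph(drawing_1, drawing_3))  # False
```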
If they are the same:
Great. You’ve completed the exercise.
This might feel a little anticlimactic. What, we’re just back where we started? What was the point of this?
That’s how you feel. Me, I’m wiping sweat off my brow. See, I made kinda a big promise back there. I told you that my procedure would work, no matter what digraph you chose. Lucky me, you picked one where it seems to have worked. If you could find one where it didn’t work, you would have officially PROVEN ME WRONG.
Specifically, you’d have found a COUNTEREXAMPLE to my claim that my procedure would work for any input. Counterexamples are a time-honored technique for proving over-broad claims wrong. For example:
Plato: Any featherless biped must be a man.
Diogenes (holding up a plucked chicken): Behold a man!
In modern times, finding a counterexample to somebody’s claim means you get to publish a paper making fun of them.
If they aren’t the same:
There are two possibilities:
- An error happened somewhere.
- I’m wrong, and you’ve found a counterexample.
First, we should check for errors. Go over each edge in the Start box, make sure it’s recorded as a filled-in square (an X) in the Middle box, and turned back into an edge in the End box. You might also want to check that none of the squares in the Middle box got filled in when they shouldn’t have been.
If you can’t find any errors, you may have found a genuine counterexample. Please cite me in your refutation paper.
Drawing Exercise 2
We’ve just converted a digraph to a 2D plot and back. Can we convert a digraph to a table and back?
Let’s try this in the next row. This time, let me pick our starting diagram. Draw this in the Start box:
Now, convert this to a Table in the Middle box.
…
Hmm. Wait a minute. What table should we draw here?
Come up with something and we can compare notes in a sec.
…
Here are some ideas I came up with:
Idea 1:

| person | likes |
|---|---|
| A | B |
| B | A |

Idea 2:

| person | likes |
|---|---|
| A | B |
| B | A |
| C |   |

Idea 3:

| person | likes |
|---|---|
| A | B |
| B | A |
| C |   |
|   | C |
The problem is that C doesn’t have any edges, so it’s not exactly clear how we should record it in the table.
We’ve come to a design decision. Let’s consider our options.
I think Idea 1 is definitely wrong. It’s missing information – if we picked it, our exercise could end up like this:
person | likes |
---|---|
A | B |
B | A |
Because the table is missing information, we don’t end up back where we started.
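A short Python sketch shows the same loss. It assumes the digraph from this exercise has nodes A, B, and C with edges between A and B only, which is what the tables here describe.

```python
nodes = {"A", "B", "C"}
edges = {("A", "B"), ("B", "A")}

# Idea 1: the table is just the list of edges.
table = sorted(edges)

# Converting back, the only nodes we can recover are the ones some row mentions.
recovered_nodes = {node for row in table for node in row}
recovered_edges = set(table)

print(recovered_edges == edges)         # True: the edges survive
print(sorted(nodes - recovered_nodes))  # ['C']: the edgeless node is gone
```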
Of the other two…
Idea 2:

| person | likes |
|---|---|
| A | B |
| B | A |
| C |   |

Idea 3:

| person | likes |
|---|---|
| A | B |
| B | A |
| C |   |
|   | C |
I think I like Idea 3 the most:
- Idea 2 says “C doesn’t like anybody”.
- Idea 3 says “C doesn’t like anybody, and nobody likes C”,
which is more information. Sorry C.
Are there other options? Well, yes. There is actually a Standard Math Answer to this question. It might seem a little odd at first. Here’s what a mathematician would draw:
person: People | likes: People |
---|---|
A | B |
B | A |

People |
---|
A |
B |
C |
We’ve split our information into TWO tables. One table lists the nodes (People), and the other lists the edges.
“person: People” means that every entry in the column “person” has to come from the table “People”. You can also say:
- “each person is in the set People”
or
- “each person is of type People”.
(This will sound familiar to programmers.)
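For programmers, the two-table idea might look something like this in Python. This is only a sketch: the `People` enum plays the role of the People table, and `Likes` is a hypothetical row type. Python does not enforce these annotations at runtime, so they document the rule rather than guarantee it.

```python
from enum import Enum
from typing import NamedTuple

class People(Enum):
    """The nodes table: every person who exists, with or without edges."""
    A = "A"
    B = "B"
    C = "C"

class Likes(NamedTuple):
    """One row of the edges table. Both columns are 'of type People'."""
    person: People
    likes: People

likes_table = [
    Likes(People.A, People.B),
    Likes(People.B, People.A),
]

nodes = set(People)
edges = {(row.person, row.likes) for row in likes_table}

# C is still a node, even though no row of the edges table mentions it.
mentioned = {person for edge in edges for person in edge}
print([person.name for person in nodes - mentioned])  # ['C']
```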
We can remix our other running example to show off the same idea:
class: Classes | book: Books |
---|---|
English Lit 101 | Hamlet |
English Lit 101 | Moby Dick |
Whaling 101 | Moby Dick |

Classes |
---|
English Lit 101 |
Whaling 101 |
Gym Class |

Books |
---|
Hamlet |
Moby Dick |
Okay, we still need to finish our exercise. What do we put in the Middle box? You actually have a choice here: Idea 3 and the Standard Math Answer will work equally well. Which one you pick comes down to personal preference and who you hang out with.
- Mathematicians usually pick the Standard Math Answer.
- Database programmers use Idea 3. They call it an OUTER JOIN.
(In capital letters. Database programming started in the 70s, when capital letters were still cool.)
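Here is a rough Python sketch of the Idea 3 layout, using `None` for a blank cell. It assumes a single set of nodes (like People) and reads Idea 3 literally: a node with no outgoing edges gets a row with a blank right cell, and a node with no incoming edges gets a row with a blank left cell. The function names are invented, and this illustrates the table shape only, not how any particular database implements OUTER JOIN.

```python
def to_idea_3(nodes, edges):
    """Edges as rows, plus padded rows for nodes missing from either column."""
    starts = {start for start, _ in edges}
    ends = {end for _, end in edges}
    rows = sorted(edges)
    rows += [(node, None) for node in sorted(nodes - starts)]  # likes nobody
    rows += [(None, node) for node in sorted(nodes - ends)]    # liked by nobody
    return rows

def from_idea_3(rows):
    """Recover (nodes, edges): blanks contribute nodes but not edges."""
    nodes = {node for row in rows for node in row if node is not None}
    edges = {(s, e) for s, e in rows if s is not None and e is not None}
    return nodes, edges

nodes = {"A", "B", "C"}
edges = {("A", "B"), ("B", "A")}

rows = to_idea_3(nodes, edges)
print(rows)
# [('A', 'B'), ('B', 'A'), ('C', None), (None, 'C')]

print(from_idea_3(rows) == (nodes, edges))  # True
```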
Once you’ve made your choice, finish the exercise like last time. Did you end up back where you started?
Conclusion
Now maybe you are starting to be convinced that these:
Are the same. Hooray?
Still, you might be wondering: why are they the same, really?
If you did the exercises, you know that digraphs are stretchy. This means that no matter how we stretch and rearrange a digraph, as long as we don’t add or break any edges, it will still be considered “the same digraph”. So, for example, all of these would be considered “the same”:
Once you accept that these are the same, it’s not too hard to see how to convert between all of our diagram types.
Here, let me show you:
The tildes (~) mean “is like”, which is pretty vague, so I hope you can see what I mean. The ellipsis (…) is because there’s some disagreement about how to put disconnected nodes in a table. (There’s more on that in the exercises.)
I think the strangest part of this diagram is the transformation on the left side – modifying a graph by duplicating nodes. Once you accept that part, the rest is pretty straightforward.
John von Neumann once said: “Young man, in mathematics you don’t understand things. You just get used to them.”
I’ll leave you with a parting
Exercise
Get a piece of paper. Draw a 2-column table. Rather than trying to come up with a situation, just name the columns “a” and “b”, and fill the table with random numbers and letters. Not too many of them, or you’ll be here all day.
Aren’t you curious what digraph you’ve created? Convert your table to a digraph – watch it unfold. Try rearranging the nodes to see other forms it might take.
Also, try drawing it as a 2D plot. Honestly, this will probably just look like TV static, but it’s good to practice.