18 January 2016

How the Internet Works - Part IV - DNS

[This is the fourth in a 4-part post that describes an activity to demonstrate how the internet works. This part describes how domain names (like "google.com") are translated into IP addresses.]

Some notes on presentation:
This document presents the activity as a rough script with "ACTION" breakouts when you need to perform some task. In this activity, you will be presenting information and directing the students as they act out the various parts of the internet.

You shouldn't just read from this script verbatim (since that will be rather dull), but you will probably want to have this printed out with you while you're presenting. Use a highlighter and mark key points so that you can find them easily while presenting.

Feel free to include additional information, accept questions, and ad lib to follow the class's interest. Throughout the script, there are a number of questions for the students that can be used to make the presentation more interactive.

The files for this activity (packets, routing tables and labels) can be downloaded from www.cse4k12.org/internet/how-internet-works.html.
[As part of the preparations for this part, remember to cut out holes for the "images" in packets 3-10 and 3-11. Cutting holes makes the lack of images more obvious to students on the other side of the room.]

Domain name lookup

Earlier we temporarily ignored the issue of how we convert a domain name (like google.com) into the IP (Internet Protocol) address (like 74.125.19.104). How does your computer find the address of the wikipedia.org servers?

Let's answer that question now.

Way back in the early days of the internet, every computer connected to the internet had a small "address book" in the form of a hosts.txt file. So you would have something like:

google.com : 74.125.19.104

with an entry for every website that you knew about.

In this example, the number 74.125.19.104 is one of the IP addresses assigned to Google (it has more than one). This is the equivalent of a telephone number for a computer (or any other device) connected to the internet. There's no need to worry about the details of this number, just that it is required to find a computer on the internet.
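
If you want to show the "address book" idea on screen, here is a minimal Python sketch. It assumes the simple "name : address" layout from the example line above (real hosts files use a different layout), and the names parse_hosts and lookup are made up for this illustration.

```python
# A toy "address book" in the style of the example above.
# The file layout and the function names are assumptions for illustration.
HOSTS_TXT = """
google.com : 74.125.19.104
"""

def parse_hosts(text):
    """Turn 'name : address' lines into a dictionary."""
    book = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, address = line.split(":", 1)
        book[name.strip()] = address.strip()
    return book

def lookup(book, name):
    """Return the address for a name, or None if it is not in the book."""
    return book.get(name)

book = parse_hosts(HOSTS_TXT)
print(lookup(book, "google.com"))     # 74.125.19.104
print(lookup(book, "wikipedia.org"))  # None (not in our book)
```

The second lookup fails because the book only contains what was written into it, which is exactly the weakness the next section explores.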

Having this "address book" file sounds easy enough. But remember, in order for the internet to be useful, you needed to have the address of *every* website in the world, not just a small sample (like you would have in your personal address book).

When the internet was young and small, one computer had the master copy of this address book and whenever there was an update (not very often in the early days), it would send out the updated address book to all of the computers on the internet.

Problem solved, right?

Scalability

At this point, we need to worry about something called "scalability". Scalability refers to how well a solution to a problem handles things "at scale" or when there are a large number of people using it. An approach that works well for 10 people may not be appropriate when there are 10 million people.
ACTION (optional): Do you have a telephone directory for your city? (Does anyone anymore?)
If you do, you can hold it up to compare its size with a small personal address book. Note that the directory only covers one city (or part of one). How big would this book need to be to hold all the telephone numbers in the world?
Our address book solution has a serious scalability problem. It works well enough when there are a small number of computers (a few thousand) on the internet but becomes unmanageable when there are billions of computers being updated all the time.

The internet currently has over 1 billion users and new computers (and smartphones and other devices) are constantly being added, while others are being moved or deleted.

If everyone needed to have a complete and up-to-date copy of this address book, then the majority of internet traffic would be spent sending updated copies of the address book.

So that's not going to work.

How can we have an internet address book that is huge and keep it accurate in the face of being constantly updated?

Domain Name Servers

Before getting into the details of domain name servers, let's start with a phone system analogy:

Imagine that you need to call someone, but you don't know their phone number. What would you do? You'd probably ask a mutual friend if they had the number. And if they didn't, you'd probably keep trying people until you found someone who knew the number.

Now imagine you have a friend that doesn't know anyone else's phone number, but always knows whom you should ask. You'd always get a response like: “I don't know, but ask Mary because she knows”.

That might not sound very useful at first (since they don't know the answer that you're looking for). But with this friend available, instead of guessing and trying friends at random, you could call this special friend first to find out exactly who you needed to call to get the number.

This is basically how domain name servers on the internet work: "What's the IP address for google.com?" "I don't know, but I know someone who does."

DNS Example

On the internet, these "special friends" that can help you find IP addresses are called "name resolvers" and a computer that is dedicated to resolving internet domain names is called a "name server". There are many name servers distributed throughout the internet.

If you look at the ISP node, you'll see a small triangular node attached - this is the name server provided by the ISP. If you want to know the IP address of "google.com" you ask this name server and it responds with the answer. How it gets the answer is rather interesting, so we're going to cover that now.
ACTION: Re-assign the students to play different roles. You'll be re-using the roles from Part III, but you'll be adding 4 more:
   10 : ISP name server — IP address = 65.32.236.240
   11 : root name server — IP address = 198.41.0.4
   12 : .com name server — IP address = 199.19.56.1
   13 : .org name server — IP address = 192.5.6.30
Note that nodes 9 (www.google.com) and 8 (neighbor) aren't used for this part of the activity.
 
You'll also be re-using the routing tables from Part III. Note that the google.com and wikipedia.org routing table pages also have name servers.
ACTION: As before, assign 1 more student as a “runner”. This person will propagate the packets through the network. 
Note: There are going to be a lot of packets flying around in this part of the activity. You may want to keep the stack of packets with you and hand them out individually just before they are needed. It's also useful to have the students read out the message part of the packet as they hand it to the runner to send it.
So, how do you ask the name server to look up an address? You send it a packet:
Packet: (2-1)
From: 65.32.200.101 (you)
To: 65.32.236.240 (ISP name server)
Message: Please lookup "www.google.com"
But wait. Where did we get the IP address of the ISP name server? This is usually configured for you by the ISP when internet access is set up on your computer. That's your starting point for the rest of the internet.
ACTION: Have the runner take packet 2-1 from "you" (7) to the ISP (4) and then to the ISP name server (10).  [Path: 7 > 4 > 10]
Now, when you ask the name server for the IP address of a domain name, it may not know the answer. Just like it is impractical for you to have an address book that covers the entire internet, it is also impractical for the name servers to have one. But the name server has one important piece of information: it knows whom to ask for the answer.

So, if you ask the name server for the address of www.google.com and it doesn't know the address, it knows it can fall back to use one of the "root name servers".
Extra info [optional]: There are 13 root name server clusters, named 'A' through 'M', that are hosted by various agencies: government, education, military and private corporations. Many of these name servers are not single machines but clusters spread across multiple locations, sometimes across continents.
Since we're starting from scratch, we assume the ISP name server doesn't know the address of www.google.com and will need to send a request to a root name server on your behalf:
ACTION: Have the runner take packet 2-2 from ISP name server (10) to the root name server (11).   [Path: 10 > 4 > 1 > 11]
Packet: (2-2)
From: 65.32.236.240 (ISP name server)
To: 198.41.0.4 (root name server)
Message: Please lookup "www.google.com"
When the root name server gets this request, it looks only at the right-most part of the domain name ("com" in this case). This part of the name is known as the TLD or "top-level domain".
Extra info [optional]: There are many different kinds of TLDs, including generic TLDs (gTLD) like com, edu, mil, gov, org, ... and country code TLDs (ccTLD) like us, uk, fr, jp, ca, ... All the different TLDs are handled the same way.
The root name server doesn't have the answer, but it does know what should be done with TLDs like com. It responds to the ISP name server's request by sending back a message "I don't know, go ask the .com name server":
ACTION: Have the runner take packet 2-3 from the root name server (11) back to the ISP name server (10).  [Path: 11 > 1 > 4 > 10]
Packet: (2-3)
From: 198.41.0.4 (root name server)
To: 65.32.236.240 (ISP name server)
Message: Go ask 199.19.56.1 (.com name server)
That's the only job of the root name server - to tell you where to find the TLD name servers. Once the root name server responds with the address of the appropriate TLD name server, its job is done.

The nice thing about this arrangement is that if any of the .com name servers (or any TLD name servers) need to move to a new location, only the root name servers need to be updated. Updating a small number of machines is much easier than needing to update every computer on the internet with the new addresses.
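
The root name server's job can be sketched in a few lines of Python. This is only an illustration using the addresses from this activity. The table TLD_SERVERS and the function names are invented here and are not how a real root server is implemented.

```python
# TLD name servers as used in this activity (nodes 12 and 13).
# The table and function names are assumptions for illustration.
TLD_SERVERS = {
    "com": "199.19.56.1",
    "org": "192.5.6.30",
}

def top_level_domain(name):
    """Return the right-most part of a domain name, e.g. 'com'."""
    return name.rstrip(".").split(".")[-1].lower()

def root_referral(name):
    """Answer the way the root name server does: point at a TLD server."""
    tld = top_level_domain(name)
    server = TLD_SERVERS.get(tld)
    if server is None:
        return "No such TLD: " + tld
    return "Go ask " + server + " (." + tld + " name server)"

print(root_referral("www.google.com"))    # Go ask 199.19.56.1 (.com name server)
print(root_referral("en.wikipedia.org"))  # Go ask 192.5.6.30 (.org name server)
```

Notice that the table has one entry per TLD rather than one per website, which is why it stays small enough to maintain.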

Back to our example, after the ISP name server gets the response from the root name server, it sends out another request, this time to the .com name server:
ACTION: Have the runner take packet 2-4 from the ISP name server (10) to the .com name server (12).  [Path: 10 > 4 > 2 > 3 > 12]
Packet: (2-4)
From: 65.32.236.240 (ISP name server)
To: 199.19.56.1 (.com name server)
Message: Please lookup "www.google.com"
And the .com name server sends back a response:
ACTION: Have the runner take packet 2-5 from the .com name server (12) back to the ISP name server (10).  [Path: 12 > 3 > 2 > 4 > 10]
Packet: (2-5)
From: 199.19.56.1 (.com name server)
To: 65.32.236.240 (ISP name server)
Message: Go ask 74.125.45.100 (google.com)
What? Again? This time, it's directing us to go ask "google.com". Since a website can have any number of subdomains (like "www.google.com", "maps.google.com", or "images.google.com"), you need to ask the domain (google.com) where they are located because they can have different IP addresses. The google.com site has its own name server built in to handle these requests.

So the ISP name server sends out another request, this time to google.com:
ACTION: Have the runner take packet 2-6 from the ISP name server (10) to google.com (5). [Path: 10 > 4 > 1 > 5]
Packet: (2-6)
From: 65.32.236.240 (ISP name server)
To: 74.125.45.100 (google.com)
Message: Please lookup "www.google.com"
And finally gets an answer:
ACTION: Have the runner take packet 2-7 from google.com (5) back to the ISP name server (10). [Path: 5 > 1 > 4 > 10]
Packet: (2-7)
From: 74.125.45.100 (google.com)
To: 65.32.236.240 (ISP name server)
Message: It's 74.125.19.104 (www.google.com)
And now the ISP name server can finally send the answer back to you.
ACTION: Have the runner take packet 2-8 from the ISP name server (10) back to you (7). [Path: 10 > 4 > 7]
Packet: (2-8)
From: 65.32.236.240 (ISP name server)
To: 65.32.200.101 (you)
Message: It's 74.125.19.104 (www.google.com)
And now that you know the IP address of www.google.com, you can finally send your request to Google as we did in Part III (packets 1-1 to 1-4).
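
For classes that are comfortable with a little code, the chain of requests that the ISP name server just made (packets 2-2 to 2-7) can be sketched as a loop in Python. This is a toy model under stated assumptions: the three "servers" are plain functions, their tables hold only the addresses from this activity, and all of the names (root_server, com_server, google_server, resolve) are made up for the illustration.

```python
ROOT = "198.41.0.4"

# Each toy server replies with ("referral", next_server_ip)
# or ("answer", ip_address). The tables hold only this activity's data.
def root_server(name):
    tld = name.split(".")[-1]
    return ("referral", {"com": "199.19.56.1", "org": "192.5.6.30"}[tld])

def com_server(name):
    domain = ".".join(name.split(".")[-2:])  # e.g. "google.com"
    return ("referral", {"google.com": "74.125.45.100"}[domain])

def google_server(name):
    return ("answer", {"www.google.com": "74.125.19.104"}[name])

SERVERS = {
    "198.41.0.4": root_server,
    "199.19.56.1": com_server,
    "74.125.45.100": google_server,
}

def resolve(name, log=None):
    """Play the ISP name server: keep asking until someone answers."""
    server = ROOT
    while True:
        kind, value = SERVERS[server](name)
        if log is not None:
            log.append((server, kind, value))
        if kind == "answer":
            return value
        server = value  # follow the referral ("go ask ...")

trail = []
print(resolve("www.google.com", trail))  # 74.125.19.104
for step in trail:
    print(step)
```

The printed trail has three steps, matching the three round trips the runner just made: root name server, .com name server, google.com.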

Caching

So, what about the next time you need to go to www.google.com? Does your computer need to look it up again each time? No, just like you record your friends' phone numbers in an address book to avoid the hassle of asking each time, your computer and each of the name servers remember the results of the domain name lookups.

This is known as a cache.

If your neighbor (“friend”) asks the ISP name server for "www.google.com" it will be able to answer directly because you already asked and the ISP name server has the answer in its cache.

The only problem with this cache is that, as the internet changes, the data in the cache will become outdated (or "stale"). Thus, the name servers only keep the data in the cache for a limited time (like 2 hours). After that time, the cache entry is erased, and the next time someone asks, the name server starts the lookup again from the root name server to get a “fresh” value.
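
A cache with a time limit can be sketched like this in Python. The class name DnsCache is hypothetical, the 2-hour lifetime is just the example figure from the text, and the replaceable clock is an assumption added so the expiry can be demonstrated without actually waiting.

```python
import time

class DnsCache:
    """Remember lookup results, but only for a limited time."""

    def __init__(self, ttl_seconds=2 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # function returning the current time in seconds
        self.entries = {}    # name -> (address, expiry_time)

    def put(self, name, address):
        self.entries[name] = (address, self.clock() + self.ttl)

    def get(self, name):
        entry = self.entries.get(name)
        if entry is None:
            return None      # never looked up
        address, expires = entry
        if self.clock() >= expires:
            del self.entries[name]  # stale: erase and force a fresh lookup
            return None
        return address

# Demonstration with a fake clock that we can move forward by hand.
now = [0.0]
cache = DnsCache(ttl_seconds=7200, clock=lambda: now[0])
cache.put("www.google.com", "74.125.19.104")
print(cache.get("www.google.com"))  # 74.125.19.104
now[0] = 7201                       # just over 2 hours later
print(cache.get("www.google.com"))  # None
```

A None result is the signal to do the full lookup again and store the new answer with put.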

Going to wikipedia.org

So, remember earlier (at the end of Part III) that we got our search results and we're now trying to get to "http://en.wikipedia.org/wiki/Doughnuts". What happens when we click on that link?
Note: Getting the address for wikipedia.org is similar to the process for google.com, so you will likely move through this part fairly quickly.
First, we ask our ISP name server for the address of "en.wikipedia.org".
ACTION: Have the runner take packet 3-1 from "you" (7) to the ISP name server (10).  [Path: 7 > 4 > 10]
Packet: (3-1)
From: 65.32.200.101 (you)
To: 65.32.236.240 (ISP name server)
Message: Please lookup "en.wikipedia.org"
Once again, it doesn't know, so it asks the root name server.
ACTION: Have the runner take packet 3-2 from ISP name server (10) to the root name server (11).  [Path: 10 > 4 > 1 > 11]
Packet: (3-2)
From: 65.32.236.240 (ISP name server)
To: 198.41.0.4 (root name server)
Message: Please lookup "en.wikipedia.org"
The root name server sends back the address of the .org name server.
ACTION: Have the runner take packet 3-3 from the root name server (11) back to the ISP name server (10).  [Path: 11 > 1 > 4 > 10]
Packet: (3-3)
From: 198.41.0.4 (root name server)
To: 65.32.236.240 (ISP name server)
Message: Go ask 192.5.6.30 (.org name server)
Now the ISP name server asks the .org name server:
ACTION: Have the runner take packet 3-4 from the ISP name server (10) to the .org name server (13).  [Path: 10 > 4 > 2 > 13]
Packet: (3-4)
From: 65.32.236.240 (ISP name server)
To: 192.5.6.30 (.org name server)
Message: Please lookup "en.wikipedia.org"
And the .org name server sends back the address of wikipedia.org (which has a name server).
ACTION: Have the runner take packet 3-5 from the .org name server (13) back to the ISP name server (10).  [Path: 13 > 2 > 4 > 10]
Packet: (3-5)
From: 192.5.6.30 (.org name server)
To: 65.32.236.240 (ISP name server)
Message: Go ask 208.80.152.2 (wikipedia.org)
So it asks wikipedia.org's name server.
ACTION: Have the runner take packet 3-6 from the ISP name server (10) to wikipedia.org (6).  [Path: 10 > 4 > 2 > 6]
Packet: (3-6)
From: 65.32.236.240 (ISP name server)
To: 208.80.152.2 (wikipedia.org)
Message: Please lookup "en.wikipedia.org"
And wikipedia.org's name server sends back the answer:
ACTION: Have the runner take packet 3-7 from wikipedia.org (6) to the ISP name server (10).  [Path: 6 > 2 > 4 > 10]
Packet: (3-7)
From: 208.80.152.2 (wikipedia.org)
To: 65.32.236.240 (ISP name server)
Message: It's 208.80.152.2 (en.wikipedia.org)
In this case, the IP address for en.wikipedia.org is the same address as for wikipedia.org. It's OK for the same IP address to have multiple names.

Now that the ISP name server has the IP address for wikipedia, it can send the address back to you.
ACTION: Have the runner take packet 3-8 from the ISP name server (10) to you (7).  [Path: 10 > 4 > 7]
Packet: (3-8)
From: 65.32.236.240 (ISP name server)
To: 65.32.200.101 (you)
Message: It's 208.80.152.2 (en.wikipedia.org)
And you can finally send your request to wikipedia.
ACTION: Have the runner take packet 3-9 from you (7) to wikipedia.org (6).  [Path: 7 > 4 > 2 > 6]
Packet: (3-9)
From: 65.32.200.101 (you)
To: 208.80.152.2 (en.wikipedia.org)
Message: Please show me "wiki/Doughnut"
And wikipedia can respond by sending the page you requested.  Of course Wikipedia has a lot of information about donuts (doughnuts), so it doesn't all fit in one packet. It sends back 3 packets:
ACTION: Have the runner take packet 3-10 from wikipedia.org (6) to you (7). [Path: 6 > 2 > 4 > 7]
Packet: (3-10)
From: 208.80.152.2 (en.wikipedia.org)
To: 65.32.200.101 (you)
Message: Website for "wiki/Doughnut" (part 1)
ACTION: Since there are 3 packets, you can add 2 more runners for this last part. The students on the .com and .org name server nodes have no more work to do and can easily be repurposed as runners.
ACTION: After the first packet goes out, start damaging the network so that each packet follows a different route. Also, you can stall the second packet so that packet 3-12 arrives before packet 3-11.
Packet: (3-11)
From: 208.80.152.2 (en.wikipedia.org)
To: 65.32.200.101 (you)
Message: Website for "wiki/Doughnut" (part 2)
Packet: (3-12)
From: 208.80.152.2 (en.wikipedia.org)
To: 65.32.200.101 (you)
Message: Website for "wiki/Doughnut" (part 3)
When your browser receives these packets, it is responsible for re-assembling the parts into a single webpage.
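
Re-assembly can be sketched in Python as sorting the parts by their number, no matter what order they arrived in. This is a simplified illustration: the function name reassemble and the sample text are made up, and real networks track position with sequence numbers rather than "part 1 of 3" labels.

```python
def reassemble(packets, total_parts):
    """Join (part_number, text) packets in order, whatever order they came in."""
    by_part = dict(packets)
    missing = [n for n in range(1, total_parts + 1) if n not in by_part]
    if missing:
        raise ValueError("still waiting for parts: %s" % missing)
    return "".join(by_part[n] for n in range(1, total_parts + 1))

# Part 3 overtook part 2 on the way, just like packets 3-12 and 3-11.
arrived = [(1, "Dough"), (3, "s!"), (2, "nut")]
print(reassemble(arrived, 3))  # Doughnuts!
```

If a part is still missing, the sketch raises an error instead of showing a page with a gap in it.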
As part of the preparation, packets 3-10 and 3-11 should have holes cut in them where the images would be displayed. If a student hasn't already asked about the holes in the document, point them out as you say the following.
See these holes on the page? Each one of them represents where an image is to be displayed in the webpage. Each of these images has a URL associated with it and each one needs to be requested separately. If these images are on the wikipedia.org site, then we can request them directly (since we know the IP address), but if they're on a different site then we might have to initiate another name server request to find where we can get the images.
There will likely be a large collective groan at this time as everyone imagines the amount of work needed to gather all the images. This is a sign that the lesson was successful and everyone appreciates the amount of work needed to get a webpage.
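
To make the image point concrete, here is a small Python sketch that decides which images would trigger another name server request. The image URLs and the host images.example.org are invented for this example, and hosts_needing_lookup is a hypothetical helper. Only the en.wikipedia.org address comes from the activity.

```python
from urllib.parse import urlsplit

def hosts_needing_lookup(image_urls, known_addresses):
    """Return the host names we have no IP address for yet (no repeats)."""
    needed = []
    for url in image_urls:
        host = urlsplit(url).hostname
        if host not in known_addresses and host not in needed:
            needed.append(host)
    return needed

# We already know this address from packet 3-8.
known = {"en.wikipedia.org": "208.80.152.2"}

# Made-up image URLs standing in for the holes in the page.
images = [
    "http://en.wikipedia.org/img/glazed.png",
    "http://images.example.org/sprinkles.png",
    "http://images.example.org/jelly.png",
]
print(hosts_needing_lookup(images, known))  # ['images.example.org']
```

Two of the three images live on the same unknown host, so only one extra lookup is needed, and caching covers the rest.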
Any questions?

[End of part IV]