08 August 2010

How the Internet Works - Part III - Routing Traffic

[This is the third in a 4-part post that describes an activity to demonstrate how the internet works. This part describes how the internet routes traffic using IP addresses.]

Some notes on presentation:
This document presents the activity as a rough script with "ACTION" breakouts when you need to perform some task. In this activity, you will be presenting information and directing the students as they act out the various parts of the internet.

You shouldn't just read from this script verbatim (since that will be rather dull), but you will probably want to have this printed out with you while you're presenting. Use a highlighter and mark key points so that you can find them easily while presenting.

Feel free to include additional information if you feel like it and accept questions and ad lib to follow the class interest. Throughout the script, there are a number of questions for the students that can be used to make the presentation more interactive.

The files for this activity (internet packets and labels) can be downloaded from www.cse4k12.org/internet/how-internet-works.html.

Internet

The phone system (even in the very simplified form described in the previous post) works reasonably well. But the internet differs from a phone network in a few key ways:
  1. Packet-based communication. On the internet, data is bundled into self-contained "packets" that are routed through the internet.
  2. Automatic address lookup. Every internet-connected device has an IP address, which is similar to a telephone number in that it uniquely identifies a computer on the internet. However, you don't need to know the IP address of the website you're visiting. To visit a website, you can simply type its domain name (like "www.google.com") and the correct IP address will be looked up automatically.
We'll look at these two differences in turn — starting with a discussion of packets. Address lookup will be covered in Part IV.

But first, a note on IP Addresses

Currently, an IP ("Internet Protocol") address consists of four 8-bit numbers (ranging from 0 to 255) written in decimal and separated by dots, for example: 74.125.19.104. This 32-bit (4 x 8-bit) number is known as an IPv4 address and it can provide unique IDs for up to 2^32 (about 4 billion) computers.

Because the number of devices connected to the internet will soon exceed the number of available addresses in IPv4, a new standard called IPv6 is being introduced. An IPv6 address is 128 bits long and is written as a set of eight 16-bit hexadecimal numbers separated by colons (e.g., 2001:A3D2:32C9:1F37:0000:0000:0000:0000). These new IPv6 addresses can support up to 3.4 x 10^38 unique devices.
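
If you want to show older students the arithmetic behind these numbers, here is a minimal sketch using Python's standard ipaddress module. It treats the example IPv4 address as a single 32-bit number and counts the addresses available in each scheme. The variable names are just for illustration.

```python
import ipaddress

# The example IPv4 address from the text, viewed as one 32-bit number.
addr = ipaddress.IPv4Address("74.125.19.104")
as_int = int(addr)

# Rebuild the same number by hand from the four 8-bit parts.
parts = [74, 125, 19, 104]
by_hand = (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]

ipv4_total = 2 ** 32    # number of distinct IPv4 addresses
ipv6_total = 2 ** 128   # number of distinct IPv6 addresses

# The example IPv6 address from the text; the module accepts the long form.
v6 = ipaddress.IPv6Address("2001:A3D2:32C9:1F37:0000:0000:0000:0000")

print(as_int == by_hand)      # True: both give the same 32-bit number
print(ipv4_total)             # 4294967296, about 4 billion
print(f"{ipv6_total:.1e}")    # 3.4e+38
print(v6.compressed)          # short form with the run of zeros collapsed
```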

In this activity, we'll be using IPv4 addresses exclusively because they are easier to work with, but the same concepts presented here also apply to IPv6 addresses.

Packet-based communication

Unlike the traditional landline phone system, the internet is packet based. This means that computers on the internet communicate with each other by sending packets of information back and forth. Think of it as sending tiny electronic letters to each other with the internet acting as the postal service. Large "letters" are not allowed — they must be broken into smaller pieces, sent separately and then reassembled on the receiving end.

As an example, imagine writing a long letter (email) to grandma. You'd have to cut it into (say) 3 pieces, put each piece in a separate envelope (packet), number them (1,2,3), and mail them separately. When your grandmother receives the letter she would wait for all 3 pieces (packets), tape them together and then read the letter. Note that the pieces (packets) might not be received in the correct 1,2,3 order, so she may need to sort them before taping them together.
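
The letter-to-grandma idea can be sketched in a few lines of Python. This is only an illustration of the numbering and sorting step, not how real network software is written. The function names, the "seq" and "total" fields, and the piece size of 20 characters are all made up for this example.

```python
import random

def split_into_packets(message, size):
    """Cut a message into numbered pieces of at most `size` characters."""
    pieces = [message[i:i + size] for i in range(0, len(message), size)]
    total = len(pieces)
    return [
        {"seq": n, "total": total, "data": piece}
        for n, piece in enumerate(pieces, start=1)
    ]

def reassemble(packets):
    """Sort the pieces by sequence number and join them back together."""
    if not packets or len(packets) != packets[0]["total"]:
        raise ValueError("still waiting for some pieces")
    ordered = sorted(packets, key=lambda p: p["seq"])
    return "".join(p["data"] for p in ordered)

letter = "Dear Grandma, thank you for the donuts. Love, me"
packets = split_into_packets(letter, 20)   # this letter becomes 3 pieces

random.shuffle(packets)                    # pieces may arrive in any order
print(reassemble(packets) == letter)       # True: grandma can still read it
```

Grandma refuses to tape anything together until every piece has arrived, which is why reassemble raises an error when the count is short.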

The advantage of the packet-based system is that it doesn't require a dedicated connection (remember the rope in the phone system example). This allows computers to support a very large number of simultaneous connections: receive a packet, respond to it, go to the next packet. Imagine how inefficient it would be if you needed to have a dedicated connection to access every website you wanted to visit. Popular websites like Google or Wikipedia would only be able to support a relatively small number of simultaneous users. In addition, most of the connection capacity would be wasted while the computers waited for the next command/communication. And once people got a connection, they would probably try to hold on to it for as long as possible.

So, how does this work? Basically the same way as the phone network, except that we use "routers" instead of "switches". A "switch" (like we saw in the phone network) takes an incoming connection and connects it to its destination. It also needs to maintain this connection for the duration of the call. A "router" accepts a packet of information and then sends it on its way, completely forgetting about it after sending it off. This allows the router to support a much larger number of users than it could otherwise handle.
ACTION: Collect the switching tables from nodes 1-6 and replace them with routing tables. The switching tables can be set aside since they are no longer needed.
Skipping over the issue (for the time being) of how we look up an IP address from a domain name like "www.google.com", let's follow an internet connection like we did with a phone connection.

We'll start with a very small "internet". In this network, we have the basic elements that we find on the internet:
  • Two websites: in this case, google.com and wikipedia.org. In this network, Google is represented by 2 nodes (#5 and #9). In general, large websites on the internet will be distributed across multiple machines.
  • Two homes: you and your neighbor
  • An ISP: this is your "Internet Service Provider". That's the company that you pay to keep you connected to the internet.
  • Three routers: "1", "2" and "3"

ACTION: Relabel the nodes on the graph as you mention them:

4 : ISP — IP address = 65.32.236.122
5 : google.com — IP address = 74.125.45.100
6 : wikipedia.org — IP address = 208.80.152.2
7 : you — IP address = 65.32.200.101
8 : neighbor — IP address = 65.32.200.102
9 : www.google.com — IP address = 74.125.19.104

Note that nodes 1, 2 and 3 don't have labels. These are the “routers”. No need for a label, but they should be referred to as routers during this part of the activity.
ACTION: Assign new students to the graph nodes, or re-assign the students to play different roles. Nodes 6 (wikipedia.org) and 8 (neighbor) aren't used for this part of the activity, so you will only need 7 students for the nodes on the graph + 1 additional student to act as a "runner". This runner will propagate the packets through the network.
As mentioned earlier, we'll start by pretending that we already know the IP address of the website we want to visit. So we already know (somehow) that "www.google.com" has an IP address of "74.125.19.104".

Once we have the IP address of a website, we can send a request to it. These requests are similar to letters in that they have a "TO" and a "FROM" and some content that is being sent.

Here is a sample packet that we're going to send to Google:
  • The "FROM" is your IP address.
  • The "TO" is the IP address of Google.
  • The default content is a generic "Hey, show me your website" request. This is what we commonly refer to as "visiting" a website.
When you want to visit a website, you send your request out so that it gets delivered to the correct website. Since your ISP is your connection to the internet, you send your request first to your ISP, which sends it along to its destination.
ACTION: Have the student at node 7 ("you") give the packet 1-1 to the "runner". The runner should then carry the packet, starting at “you”, to the ISP.

Packet: (1-1)
From: 65.32.200.101 (you)
To: 74.125.19.104 (www.google.com)
Message: Please show me website
The routers operate in much the same way as the phone switches do, except that they use IP addresses instead of telephone numbers to decide what to do.
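
For the curious, here is a rough sketch of how a router might pick where to send a packet by looking only at the "TO" address. The table below is invented for illustration and is not the routing table from the activity files. The sketch assumes the router picks the most specific matching address range, which is a common approach in real routers.

```python
import ipaddress

# Hypothetical routing table for one router. The address ranges and the
# next-hop names are made up; they are not the tables from the activity.
ROUTING_TABLE = [
    (ipaddress.ip_network("65.32.0.0/16"), "ISP"),
    (ipaddress.ip_network("74.125.19.0/24"), "node 9"),
    (ipaddress.ip_network("74.125.0.0/16"), "node 5"),
    (ipaddress.ip_network("0.0.0.0/0"), "next router"),  # default route
]

def next_hop(destination):
    """Return the next hop for the most specific range containing destination."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if dest in net]
    best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop

print(next_hop("74.125.19.104"))   # node 9
print(next_hop("74.125.45.100"))   # node 5
print(next_hop("65.32.200.101"))   # ISP
print(next_hop("208.80.152.2"))    # next router
```

Notice that next_hop remembers nothing between calls. That is the "completely forgetting about it" behavior that separates a router from a switch.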
ACTION: Follow this packet through the various routers, up to node #9: www.google.com.
When Google gets this request (packet), it sends back a response. This response is also a packet, but this time FROM Google TO you. Since you didn't ask for anything specific, the contents of the packet are the main search page (the default for this website).
ACTION: Have the runner give the 1-1 packet to the "www.google.com" student at node #9. The website will now process the request and create a new packet to send back. Introduce packet 1-2 at node #9 and have the runner route it back to the user.

Packet: (1-2)
From: 74.125.19.104 (www.google.com)
To: 65.32.200.101 (you)
Message: Website contents (main search page with empty search box)
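
Packets 1-1 and 1-2 can be modeled as simple records, as in this minimal sketch. The Packet class and the reply_to helper are hypothetical names chosen for this example. Real packets carry more fields than these. The point is only that a response is a new packet with the FROM and TO addresses swapped.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    label: str
    source: str        # the "FROM" address
    destination: str   # the "TO" address
    message: str

def reply_to(request, label, message):
    """Build a response by swapping the FROM and TO addresses of the request."""
    return Packet(
        label=label,
        source=request.destination,
        destination=request.source,
        message=message,
    )

request = Packet("1-1", "65.32.200.101", "74.125.19.104", "Please show me website")
response = reply_to(
    request, "1-2", "Website contents (main search page with empty search box)"
)

print(response.source)        # 74.125.19.104 (www.google.com)
print(response.destination)   # 65.32.200.101 (you)
```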

When you get the response packet, your web browser displays it on your computer screen. You can now interact with the webpage and enter a search term, like "donut", and send another request to Google.
ACTION: Have "you" give packet 1-3 to the runner and send it off.

Packet: (1-3)
From: 65.32.200.101 (you)
To: 74.125.19.104 (www.google.com)
Message: Search for "donut"
But this time, let's see what happens when the network is damaged. Let's pretend that link "Ii" is no longer working.
ACTION: Walk over to the "Ii" link and stand there as a reminder that this link is no longer functioning. Alternately, you can have a student stand there to block network traffic.
As with the phone example, the network routes around the damaged parts. Note that with the packet network, we didn't lose the connection and need to re-start it. The packet was already on its way and simply continued around the damage to reach the destination.

When "www.google.com" finally gets the request, it runs the requested search and creates a new packet with the results to send back.
ACTION: Have "www.google.com" give packet 1-4 to the runner to send back.

Packet: (1-4)
From: 74.125.19.104 (www.google.com)
To: 65.32.200.101 (you)
Message: Search results

If we have more damage to the network...
ACTION: Walk over to link "Dd" to take it out of commission. Make sure to keep "Ii" blocked as well.
...the network will route the packet around the new damage as well.

And eventually the search results come back to you.
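
The two damage scenarios above can be imitated with a small path search. The network below is an invented topology, not the graph from the activity, and the broken links merely stand in for "Ii" and "Dd". Real routers also decide one hop at a time rather than planning the whole trip in advance, so treat this as a simplified sketch of the idea that a route still exists as long as some chain of working links remains.

```python
from collections import deque

# Invented topology for illustration only; the real activity graph may differ.
LINKS = {
    ("you", "ISP"),
    ("ISP", "router 1"),
    ("ISP", "router 2"),
    ("router 1", "router 3"),
    ("router 2", "router 3"),
    ("router 1", "www.google.com"),
    ("router 3", "www.google.com"),
}

def find_route(start, goal, broken=frozenset()):
    """Breadth-first search for a shortest path that avoids broken links."""
    neighbors = {}
    for a, b in LINKS:
        if (a, b) in broken or (b, a) in broken:
            continue  # skip links that are out of commission
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbors.get(path[-1], [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no working route is left

first_break = {("router 1", "www.google.com")}
second_break = first_break | {("router 1", "router 3")}

print(find_route("you", "www.google.com"))
print(find_route("you", "www.google.com", first_break))
print(find_route("you", "www.google.com", second_break))
```

Each added break forces a different route, but the packet still arrives. Only when every path is cut does find_route return None.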

So, a question: How long does all this take to run end-to-end? Well, how long do you wait for a webpage to return results before you start to get annoyed that the "internet is slow"? In general, anything that takes longer than 1 second is irritatingly slow.

Setup for part IV

So now we have our search results for "donut". Let's say there's a wikipedia link (http://en.wikipedia.org/wiki/Doughnuts) in the results. What happens when we click on that? And how do we find the wikipedia.org website if we don't know its IP address?

[to be continued in Part IV]

2 comments:

VRaptorX said...

So, if I click a link on Google, and it takes me to Wikipedia. Then, on Wikipedia, I click a link to go to... whatever link they used for a reference... what step process is that taking? I mean:

Option 1) Is it going from My ISP to Google. Then Google asks out a request to Wiki? Then Wiki asks out a request to ___, etc.

or

Option 2) My ISP requests from Google. Google gives me stuff back. My ISP then requests to the wiki. Wiki gives me stuff back. My ISP requests ___, etc.

Basically, how involved is the "Middle man" between my ISP and the other end? Technically I'm requesting info, but I'm using another site, with an existing presence, to do so. Does that site also request, or request for me, or is it all me?

Unknown said...

Don't think of it as "clicking a link on Google". And Google doesn't "take" you to Wikipedia, it just provides a link that you can choose to follow.

Google has sent you a set of results. You have them on your machine (in your browser). When you click on a link, you are always initiating it from your browser.

Now, this can be different for dynamic web sites (where they use JavaScript to perform actions on the server for you), but for web sites that return static content (like Wikipedia or Google's search results), all your requests (and link navigations) start from your machine and go through your ISP.

So, it's closer to Option 2, except that your ISP isn't requesting anything for you. *You* are requesting it and it goes through your ISP to get out onto the internet.