Networking 101: Understanding the Internet Protocol
September 12, 2008
Internet Protocol (IP) sits directly on top of layer 2, and is responsible for getting datagrams to their destination. Originally defined in RFC 791, IP has changed and been clarified a few times since, but the fundamental design remains the same. The IP layer does not provide any type of flow-control or sequencing capabilities; that's left to the upper layers. We'll be using "datagram" to refer to an entire IP message, and "packet" to identify an individual unit sent on the wire, which may be a whole datagram or a single fragment of one.
IP sends and receives packets to and from IP addresses, but doesn't promise reliable delivery. There is no concept of "retries" in the IP layer. For various reasons, packets may be lost, corrupted, duplicated, delivered out of order or otherwise delayed. IP is also responsible for dealing with IP options and giving feedback in the form of ICMP error and control messages, explained last week in our look at ICMP.
The IP header, normally 20 bytes long, comes immediately after the layer 2 header (because IP is layer 3). The IP data portion holds everything else, including a TCP segment or UDP datagram in its entirety, as shown in the table below. Also note that the IP header can exceed 20 bytes, up to a maximum of 60, if IP options are used.
IP is pretty straightforward, in that IP's goal is simple: get a datagram to the destination, and don't worry about anything but sending it to the next hop router. In reality, IP is more complex; else the header wouldn't have so many fields. Sorry for this, but scrutinizing the IP header is important. The fields, starting at the top (first bit) are:
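As a rough illustration of how those fields sit in the first 20 bytes, here is a minimal Python sketch that unpacks the fixed part of an IPv4 header following the RFC 791 layout. The function name `parse_ipv4_header`, the dictionary keys, and the sample addresses are assumptions made for this example, and IP options are ignored.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,          # IHL counts 32-bit words
        "tos": tos,
        "total_length": total_length,                   # header + data, in bytes
        "id": ident,
        "df": bool(flags_frag & 0x4000),                # Don't Fragment flag
        "mf": bool(flags_frag & 0x2000),                # More Fragments flag
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # field is in 8-byte units
        "ttl": ttl,
        "protocol": protocol,                           # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hand-built sample header: 1500 bytes total, MF set, offset 0.
# The checksum is left at zero as a placeholder; addresses are documentation ranges.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1500, 0x1234, 0x2000, 64, 6, 0,
    socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"),
)

header = parse_ipv4_header(sample)
print(header)
```

The flags and the fragment offset share a single 16-bit word, which is why the sketch masks them apart instead of unpacking them as separate fields.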
IP fragmentation is really the key to IP functionality, and exploring fragmentation is also educational: It gives real meaning to those header fields. Not every network passing an IP packet around is capable of sending the same sizes of packets. Various layer 2 frame formats allow different amounts of data to be sent at once. The largest MTU allowed is 65,535 bytes (the most the 16-bit total length field can express), and the smallest is 68 bytes. RFC 1122 says that all hosts must be able to reassemble datagrams that are up to 576 bytes, but should in fact be able to reassemble datagrams that are the size of the interface's MTU.
When shipping an IP datagram over the Internet, you have no idea what the MTU will happen to be along every layer 2 link. Your ISP might be connected via Ethernet to a tier 1 ISP, but the remote site you're trying to access could be on an ISDN link. In that case, your larger IP packets will have to be fragmented before the last hop. Fragmentation can happen many times, too. If we wanted to send a 2000 byte packet to a remote site connected via ISDN, we would first fragment the packet to fit on our 1500 byte link. The first fragment is still larger than 576 bytes (ISDN's MTU), so the last router before the ISDN link will have to fragment it again.
Also recall that IP is not a reliable protocol, so if any IP fragment gets lost along the way, the entire datagram must be resent. IP has no way to request the missing portion, so when something bad happens the result can be a large increase in traffic because of the retransmissions that will likely follow. Sometimes congested routers will have to drop a packet, and if that packet happens to be part of a 65,535-byte datagram, then the entire thing must be resent. The upper protocol, TCP or other, will normally know if an entire datagram is missing, and can arrange a retransmission. However, TCP cannot tell that only a single fragment is missing, since the incomplete IP datagram is never passed up the stack to TCP. From TCP's point of view the whole datagram simply never arrived, so the whole datagram will eventually be resent. Clearly, losing a small portion of a 65,535-byte datagram does nothing to relieve a congested link; it adds more traffic instead.
UDP applications commonly keep their datagrams at or below 576 bytes, and this helps in two ways. First, there aren't many links with MTUs smaller than 576, so it is likely that the IP datagram will not be fragmented. Second, remember that 576 is the magic number for all end systems speaking IP: they all must be able to reassemble datagrams up to this size. Devices with limited memory may have trouble with anything larger, so this is actually worth considering.
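To put a rough number on the cost of losing one fragment of a large datagram, the Python sketch below estimates how often a fragmented datagram arrives intact. It assumes every fragment is dropped independently with the same probability, which is a simplification (real loss tends to be bursty), and the function names and the 1% loss rate are made up for this illustration.

```python
import math

IP_HEADER_LEN = 20  # assumes no IP options


def fragment_count(total_len: int, mtu: int) -> int:
    """Number of fragments needed for a datagram of total_len bytes."""
    data_len = total_len - IP_HEADER_LEN
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    return math.ceil(data_len / max_data)


def delivery_probability(fragments: int, loss_rate: float) -> float:
    """Chance that all fragments arrive, assuming independent loss."""
    return (1.0 - loss_rate) ** fragments


n = fragment_count(65535, 1500)            # the largest datagram over Ethernet
p = delivery_probability(n, loss_rate=0.01)
print(f"{n} fragments, {p:.1%} chance the datagram arrives intact")
```

Under these assumptions the largest possible datagram needs 45 fragments on a 1500-byte MTU, and with 1% loss per packet only about 64% of such datagrams get through on the first try, compared with 99% for a datagram that fits in one packet.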
Let's pretend we're a host, and we'd like to send an IP datagram of 1550 bytes (1530 data + 20 header), but our MTU is 1500 bytes. We'll have to send two fragments, and the relevant IP headers will look like this:
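• fragment 0, offset = 0, size = 1480, MF bit set.
• fragment 1, offset = 185 (byte 1480, since the offset field counts in 8-byte units), size = 50, MF bit not set.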
The IP ID and IP addresses in the fragments are always the same as in the original IP datagram, but the header checksum, flags, offset, and length fields will differ from fragment to fragment. When the other end gets the first packet and sees that it is a fragment, it will wait to get the rest, reassemble them, and then pass the complete datagram up the stack to the next protocol.
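The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the offset and size calculation, assuming a 20-byte header with no options; the `Fragment` tuple and the `fragment` function are names invented here.

```python
from typing import List, NamedTuple

IP_HEADER_LEN = 20  # assumes no IP options


class Fragment(NamedTuple):
    offset_units: int      # value carried in the offset field (8-byte units)
    data_len: int          # bytes of data carried by this fragment
    more_fragments: bool   # the MF bit


def fragment(data_len: int, mtu: int) -> List[Fragment]:
    """Split data_len bytes of IP data into fragments that fit the MTU."""
    # All fragments but the last must carry a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < data_len:
        size = min(max_data, data_len - sent)
        more = sent + size < data_len
        fragments.append(Fragment(sent // 8, size, more))
        sent += size
    return fragments


# The 1550-byte datagram from the text: 1530 bytes of data, MTU 1500.
plan = fragment(1530, 1500)
for frag in plan:
    print(frag)
```

For the datagram above this yields a 1480-byte fragment at offset 0 with MF set, followed by a 50-byte fragment at offset 185 (byte 1480) with MF clear.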
After this is sent, we won't hear anything more about it, assuming the DF bit isn't set in the IP flags. But what happens if somewhere along the path the MTU is 400 bytes? Before the 1480 byte packet can be sent, the router on this link will fragment it too. Path MTU discovery, discussed last week, is used to get around the problem of intermediate routers having to fragment packets. Fragmentation takes time and precious resources on routers. The main reason we want to avoid excessive fragmentation is simply because of the extra delay that's inevitably introduced.
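Here is a hedged Python sketch of what that second router has to do with the first fragment on a 400-byte link. The function `refragment` and its parameters are hypothetical, and the sketch again assumes a 20-byte header. The point to notice is that the new pieces keep offsets relative to the original datagram, and that MF stays set on every piece when the incoming fragment already had it set.

```python
from typing import List, Tuple


def refragment(offset_units: int, data_len: int, more_fragments: bool,
               mtu: int, dont_fragment: bool = False,
               header_len: int = 20) -> List[Tuple[int, int, bool]]:
    """Split one packet (possibly already a fragment) to fit a smaller MTU.

    Returns (offset_units, data_len, more_fragments) for each piece.
    """
    if data_len + header_len <= mtu:
        return [(offset_units, data_len, more_fragments)]  # fits as-is
    if dont_fragment:
        # A real router would drop the packet and send back an ICMP
        # "fragmentation needed" error; the sketch just raises.
        raise ValueError("packet exceeds MTU and DF is set")
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    pieces = []
    sent = 0
    while sent < data_len:
        size = min(max_data, data_len - sent)
        is_last_piece = sent + size == data_len
        # MF is cleared only on the final piece of the final fragment.
        mf = more_fragments or not is_last_piece
        pieces.append((offset_units + sent // 8, size, mf))
        sent += size
    return pieces


# Fragment 0 from the earlier example (offset 0, 1480 bytes, MF set), MTU 400.
pieces = refragment(0, 1480, True, mtu=400)
for piece in pieces:
    print(piece)
```

With a 400-byte MTU each piece can carry at most 376 bytes of data, so the single 1480-byte fragment becomes four packets, each with its own 20-byte header.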
Reassembly is always done at the final destination, so intermediate routers don't need to store IP datagrams. This also means that IP packets can be routed independently, over different paths, without cause for concern. This is an important concept to understand, because it makes IP very versatile. No matter what order the receiver gets the packets in, it will be able to reassemble them based on the offset field in the IP header.
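A minimal sketch of offset-based reassembly, in Python, might look like the following. The `Reassembler` class is invented for this illustration and handles a single datagram; a real stack keeps one such buffer per source address, destination address, protocol, and IP ID, times out incomplete datagrams, and deals with overlapping fragments, none of which is attempted here.

```python
from typing import Dict, Optional


class Reassembler:
    """Collects the fragments of ONE datagram, in any arrival order."""

    def __init__(self) -> None:
        self.pieces: Dict[int, bytes] = {}     # byte offset -> fragment data
        self.total_len: Optional[int] = None   # known once the last fragment arrives

    def add(self, offset_units: int, data: bytes,
            more_fragments: bool) -> Optional[bytes]:
        """Store a fragment; return the full data once it is complete."""
        offset = offset_units * 8              # offset field is in 8-byte units
        self.pieces[offset] = data
        if not more_fragments:
            # Only the last fragment reveals the total data length.
            self.total_len = offset + len(data)
        return self._try_assemble()

    def _try_assemble(self) -> Optional[bytes]:
        if self.total_len is None:
            return None
        out = bytearray()
        for offset in sorted(self.pieces):
            if offset != len(out):
                return None                    # a hole: still waiting
            out += self.pieces[offset]
        return bytes(out) if len(out) == self.total_len else None


# 1530 bytes of data split as in the earlier example, delivered out of order.
payload = bytes(i % 256 for i in range(1530))
buffer = Reassembler()
first_result = buffer.add(185, payload[1480:], more_fragments=False)
second_result = buffer.add(0, payload[:1480], more_fragments=True)
print(first_result is None, second_result == payload)
```

Because each fragment carries its own offset, the receiver only needs the MF-clear fragment to learn the total length and the remaining fragments to fill the holes; arrival order does not matter.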
And now that we understand fragmentation, we find that it begs the question "does IP really hide the link-layer?"
When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting is the Associate Director of Computing Infrastructure at Portland State University. Charlie also operates OmniTraining.net, and recently finished Network Ninja, a must-read for every network engineer.
Article courtesy of Enterprise Networking Planet