Networking 101: Understanding Spanning Tree
No Exploding Switches
August 27, 2008
The much anticipated spanning tree edition of Networking 101 has finally arrived. Yes, you too can have a network that survives multiple exploding switches. Read on.
The spanning tree protocol was invented by Dr. Radia Perlman, distinguished engineer at Sun Microsystems. Dr. Perlman devised a method by which bridges can obtain layer 2 routing utopia: redundant and loop-free operation. Think of spanning tree as a tree that the bridge keeps in memory for the purposes of optimized and fault-tolerant data forwarding.
The issue we're dealing with is depicted in figure 1. If the switches were connected in this manner without spanning tree, each switch would keep re-forwarding copies of the first broadcast packet it heard, until they all ran out of memory and keeled over from exhaustion. There's nothing at layer 2 to stop a loop from happening. In figure 1, the admin would have to disable the red links manually for this Ethernet network to operate. Spanning tree will block one or more of the links until the active one becomes unusable, and then fail over to the other available link. How spanning tree decides which link to use depends entirely on the topology that it can see.
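To see why a loop is so deadly, here is a minimal sketch of switches that flood a broadcast out of every port except the one it arrived on. The four-switch full mesh, the switch names, and the `flood` helper are all assumptions made up for illustration; they are not taken from figure 1.

```python
def flood(links, start, hops):
    """Return how many copies of one broadcast are in flight after each hop.

    links is a list of (switch, switch) pairs; start is the switch where a
    host injects the broadcast. Both are hypothetical example inputs.
    """
    # Build an adjacency map: switch -> set of directly connected switches.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Each in-flight copy is (sender, receiver). The host's frame has no
    # switch as sender, so we use None.
    in_flight = [(None, start)]
    counts = []
    for _ in range(hops):
        next_flight = []
        for sender, receiver in in_flight:
            # Flood out of every port except the one the frame came in on.
            for neighbour in neighbours.get(receiver, ()):
                if neighbour != sender:
                    next_flight.append((receiver, neighbour))
        in_flight = next_flight
        counts.append(len(in_flight))
    return counts


# Four switches, every pair connected: plenty of loops.
full_mesh = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D")]
# The same switches with only a loop-free subset of links left active.
tree_only = [("A", "B"), ("A", "C"), ("A", "D")]

looped = flood(full_mesh, "A", 5)
pruned = flood(tree_only, "A", 5)
print(looped)  # [3, 6, 12, 24, 48]
print(pruned)  # [3, 0, 0, 0, 0]
```

In the looped case the number of copies doubles every hop and never stops; with the redundant links blocked, the broadcast is delivered once and dies out.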
The idea behind a spanning tree topology is that bridges can discover a subset of the topology that is loop-free, i.e. a tree. It also makes certain there is enough connectivity to reach every portion of the network; it's spanning the entire LAN. Bridges will perform the spanning tree algorithm when they are first connected, as well as any time there is a topology change.
When a bridge hears a certain type of "configuration message," a special type of BPDU (bridge protocol data unit), it will begin its disruptive spanning tree algorithm. This starts with an election, whereby the "root bridge" is elected. The root bridge is the heart of the topology, and all data will essentially flow through it. On a side note, do take care to configure the root bridge manually, because the default election poses some problems on Cisco hardware. Oversimplified: the bridge with the lowest bridge ID wins, and when every switch is left at its default priority that comes down to the lowest MAC address, which all too often belongs to the oldest, slowest switch on the network. The next step is for each bridge to determine the shortest path to the root bridge, so that it knows how to get to the "center." The second election happens on each LAN, and it elects the designated bridge, or the bridge that's closest to the root bridge. The designated bridge will forward packets from the LAN toward the root bridge. The final step for an individual bridge is to select a root port. This simply means "the port that I use to send data toward the root bridge." Note that every single port on a bridge, even ones connected to end systems (computers), will participate in the spanning tree unless a port is configured as "ignore."
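The election and the shortest-path step can be sketched in a few lines. This is a simplified model, not the real protocol: it assumes one central view of the whole network instead of bridges exchanging BPDUs, it treats a bridge ID as a `(priority, MAC)` pair where the lowest pair wins, and it ignores the port-ID tie-breakers. The switch names, MAC addresses, and link costs are invented for the example.

```python
import heapq


def elect_root(bridges):
    """bridges maps name -> (priority, mac); the lowest bridge ID wins.

    MACs are compared as same-format lowercase strings, which sorts the
    same way as their numeric values.
    """
    return min(bridges, key=lambda name: bridges[name])


def paths_to_root(links, root):
    """Dijkstra from the root over links {(a, b): cost}.

    Returns (cost, next_hop): each bridge's path cost to the root and the
    neighbour its root port faces.
    """
    adjacency = {}
    for (a, b), link_cost in links.items():
        adjacency.setdefault(a, []).append((b, link_cost))
        adjacency.setdefault(b, []).append((a, link_cost))

    cost = {root: 0}
    next_hop = {}
    heap = [(0, root)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > cost[node]:
            continue  # stale heap entry
        for neighbour, link_cost in adjacency.get(node, []):
            candidate = dist + link_cost
            if candidate < cost.get(neighbour, float("inf")):
                cost[neighbour] = candidate
                next_hop[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    return cost, next_hop


def blocked_links(links, next_hop):
    """On point-to-point links, a link stays active only if one end uses
    it as its root port; everything else is blocked."""
    return [(a, b) for (a, b) in links
            if next_hop.get(a) != b and next_hop.get(b) != a]


# Hypothetical switches, all at the same priority.
bridges = {
    "old-closet": (32768, "00:00:0c:aa:aa:aa"),
    "core": (32768, "00:1d:a1:bb:bb:bb"),
    "access": (32768, "00:1e:49:cc:cc:cc"),
}
default_root = elect_root(bridges)  # the lowest MAC wins by default

# The admin lowers the priority on the switch that should be root.
bridges["core"] = (4096, "00:1d:a1:bb:bb:bb")
root = elect_root(bridges)

# Illustrative link costs: lower means faster.
links = {
    ("core", "old-closet"): 4,
    ("core", "access"): 4,
    ("old-closet", "access"): 19,
}
cost, next_hop = paths_to_root(links, root)
blocked = blocked_links(links, next_hop)

print(default_root)  # old-closet
print(root)          # core
print(next_hop)      # {'old-closet': 'core', 'access': 'core'}
print(blocked)       # [('old-closet', 'access')]
```

Left alone, the old closet switch becomes the root; one priority change puts the core switch at the center, and the slow redundant link ends up as the blocked one.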
That's how the election happens, but this doesn't really explain what spanning tree does in the real world. We said that this calculation was disruptive; have no doubts, it really is. To perform this calculation, bridges must stop all traffic. They go through a series of listening and learning stages, and only start to forward traffic once the topology is established. Not so bad, right? It only happens when the topology changes or when a bridge gets a "configuration" BPDU, right? Right, but that happens more often than you'd think.
The whole idea behind spanning tree is that a link can fail without taking the network down, because a pair of bridges is connected via two paths. Spanning tree will leave a port blocked until it is needed. So we should be able to unplug redundant links and connect new ones without interruptions? Sorry, but no, it doesn't work that way.
When the physical media comes "up," the newly connected bridge will send the reconfiguration BPDU, and the other connected device will comply. All traffic is stopped for roughly 50 seconds while a spanning tree calculation takes place. It's wonderful in practice, since a short downtime beats the permanent downtime you get if a switch explodes and you lack redundant paths, but the 50-second penalty is far from ideal.
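Where do the 50 seconds come from? A rough sketch, assuming the classic default timers (a 20-second max age and a 15-second forward delay, applied once for listening and once for learning). These values are not stated in this article, and they are tunable, so treat the numbers as the usual defaults rather than a guarantee. The function name is made up.

```python
MAX_AGE = 20        # seconds before stored BPDU information expires (assumed default)
FORWARD_DELAY = 15  # seconds spent in each of listening and learning (assumed default)


def convergence_timeline(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Return (stage, seconds elapsed when the stage ends) for a blocked
    port that has to take over after the active path goes quiet."""
    stages = [
        ("blocking, waiting for old BPDU info to age out", max_age),
        ("listening", forward_delay),
        ("learning", forward_delay),
    ]
    timeline = []
    elapsed = 0
    for name, duration in stages:
        elapsed += duration
        timeline.append((name, elapsed))
    return timeline


timeline = convergence_timeline()
for stage, done_at in timeline:
    print(f"{done_at:>3}s  end of {stage}")
total = timeline[-1][1]
print(total)  # 50
```

Only after the last stage does the port start forwarding, which is why users notice.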
Quite recently certain vendors have implemented rapid spanning tree: a modified version of the spanning tree algorithm that is more careful to avoid creating outages. Everyone should be using rapid spanning tree (RSTP) at this point. It's fully compatible with older devices that only know the old spanning tree algorithm, and it reduces the 50-second outage time to less than three seconds in most cases.
Hopefully the picture is clear enough at this point. We understand that enabling spanning tree will allow us to connect two bridges together via multiple links without creating a loop. If a bridge in between dies, we can just fail over and use the other link. This works because although the active switch has been blocking its alternative link, it has been listening silently to BPDU updates and still knows which links lead to the root. That is, if you've configured it properly. Remember VLAN trunks? What would happen if one of the physical links happened to be a VLAN trunk? If we only had one spanning tree instance running, it would probably find that one of the networks on the trunk shouldn't use this link. It has no choice but to turn off the entire link.
Enter per-VLAN spanning trees (PVST). When enabled, a bridge will run one spanning tree instance per VLAN on the bridge. If a trunk link contains VLANs 1, 2, and 3, it can decide that VLANs 1 and 2 should not take that path, but still allow VLAN 3 to use it. In complex networks, there will exist many situations where VLAN 3 only has one way out, probably because the admin wanted to limit where VLAN 3 reached. If we weren't using PVST and the trunk port was blocked by spanning tree, VLAN 3 on this bridge would have no connectivity with the rest of its LAN. Everyone should use PVST.
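The difference is easy to model. The sketch below is a toy, not how a switch stores its state: `forwarding_vlans` and its arguments are hypothetical, and the single-instance case is simplified to "if the link must be blocked for anyone, it is blocked for everyone," as described above.

```python
def forwarding_vlans(trunk_vlans, blocked, per_vlan):
    """Return the set of VLANs that can still use a trunk link.

    trunk_vlans: VLANs carried on the trunk.
    blocked:     VLANs whose spanning tree wants this link blocked.
    per_vlan:    True for one spanning tree instance per VLAN (PVST),
                 False for a single instance covering the whole port.
    """
    if per_vlan:
        # Each VLAN gets its own decision.
        return set(trunk_vlans) - set(blocked)
    # One decision for the entire physical link.
    return set() if blocked else set(trunk_vlans)


trunk = {1, 2, 3}
should_block = {1, 2}  # VLANs 1 and 2 have a better path elsewhere

with_pvst = forwarding_vlans(trunk, should_block, per_vlan=True)
single_instance = forwarding_vlans(trunk, should_block, per_vlan=False)
print(with_pvst)        # {3}
print(single_instance)  # set()
```

With PVST, VLAN 3 keeps its only way out; with a single instance, it is cut off along with the others.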
Finally, don't forget that any port that sends a BPDU can cause a network outage. This includes a computer running ettercap or other nefarious applications. Be sure to enable something akin to Cisco's BPDU Guard on all host ports to block BPDU packets. Not only can they cause spanning tree recalculations, but a computer can also stuff the ballot and win an election. You really don't want to discover that your spanning tree root is JoeBob's computer. It's really easy to pull off man-in-the-middle attacks when all traffic is flowing through you already!
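Here is a small sketch of the ballot stuffing and of what a BPDU-Guard-style feature does about it. The `Port` and `Switch` classes are invented for illustration and model only the idea: a guarded host port that hears any BPDU is shut down instead of being believed. Bridge IDs are again assumed to be `(priority, MAC)` pairs with the lowest winning.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    bpdu_guard: bool = False
    disabled: bool = False


class Switch:
    def __init__(self, root_id):
        # Bridge ID of the root this switch currently believes in.
        self.root_id = root_id

    def receive_bpdu(self, port, advertised_root):
        if port.disabled:
            return  # a shut-down port hears nothing
        if port.bpdu_guard:
            # Host ports should never send BPDUs: disable the port and
            # ignore what it claimed.
            port.disabled = True
            return
        if advertised_root < self.root_id:
            # A "better" (lower) bridge ID wins the election.
            self.root_id = advertised_root


legit_root = (4096, "00:1d:a1:bb:bb:bb")
joebob = (0, "00:00:00:00:00:01")  # rogue host claiming the best possible ID

unprotected = Switch(legit_root)
open_port = Port("host-port-1")
unprotected.receive_bpdu(open_port, joebob)

protected = Switch(legit_root)
guarded_port = Port("host-port-1", bpdu_guard=True)
protected.receive_bpdu(guarded_port, joebob)

print(unprotected.root_id == joebob)    # True: JoeBob is now the root
print(protected.root_id == legit_root)  # True: the election is untouched
print(guarded_port.disabled)            # True: the offending port is shut down
```

The unprotected switch hands the root role to JoeBob's computer; the protected one loses a single host port and nothing else.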
There are a few unmentioned types of BPDU messages and other details about the spanning tree protocol to learn. The details get a little bit complex, but they should be easy to understand now that you know the big picture. If you want a redundant, yet loop-free layer 2, time spent learning these details will pay off in the long run.
When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting is the Associate Director of Computing Infrastructure at Portland State University. Charlie also operates OmniTraining.net, and recently finished Network Ninja, a must-read for every network engineer.
Article courtesy of Enterprise Networking Planet