Large local internetworks often contain redundant connections. For example, several bridges could run in parallel to connect two LANs, for load-distribution and failure-safety reasons. Figure 12-5 shows an example with redundantly connected networks. In this example, if station A in LAN 2 sends a packet to station B in LAN 5, then bridge 1 floods this packet to LAN 1, and bridge 3 floods it to LANs 3 and 5. Bridge 3 learns that it can reach station A over LAN 2. In the meantime, bridge 2 receives the packet in LAN 1 and floods it to LANs 3 and 4. This means that bridge 3 receives the same packet again, only this time over a different network adapter. Using its learning function, bridge 3 changes the entry for station A in its forwarding table and floods the packet to the other networks. We could continue this example endlessly to see that, with this network topology, the forwarding tables of all bridges would change continuously, and packets would be duplicated and travel around in the network. The bridges have no way to recognize and destroy duplicate packets.
Figure 12-5. The effect of cycles.
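The table flapping described above can be illustrated with a minimal sketch of the learning function. The class `LearningBridge`, its `relearned` counter, and the port names `lan2` and `lan3` are assumptions made for this illustration; they are not taken from the figure or from any real bridge implementation.

```python
# Sketch of the learning function: a bridge remembers the port on which it
# last saw each source address. All names are illustrative assumptions.
class LearningBridge:
    def __init__(self, name):
        self.name = name
        self.table = {}       # source address -> port on which it was last seen
        self.relearned = 0    # how often an existing entry moved to another port

    def receive(self, src, port):
        """Learn the source address and count entries that change their port."""
        old = self.table.get(src)
        if old is not None and old != port:
            self.relearned += 1
        self.table[src] = port


bridge3 = LearningBridge("bridge 3")
# The same packet from station A reaches bridge 3 directly over LAN 2 and
# again as a flooded copy over LAN 3; in a cycle this repeats indefinitely.
for port in ["lan2", "lan3", "lan2", "lan3"]:
    bridge3.receive("A", port)

print(bridge3.table["A"], bridge3.relearned)
```

Every arrival on the "other" adapter overwrites the entry for station A, so the table never settles as long as copies keep circulating.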
Transparent bridges use the so-called spanning-tree protocol to solve this problem. This protocol detects redundant connections in a cyclic topology and builds a tree structure that no longer includes any cycles. Redundant connections are made inactive and can be reactivated when needed. This means that the LAN internetwork maintains its redundancy. The bridges in the internetwork exchange special messages to work out the tree structure and to build it in a decentralized way.
The spanning-tree method is known from graph theory [OTWi96]. Normally, a spanning tree with minimum total cost can be constructed for an undirected connected graph in which weights on the edges represent costs. Several algorithms have been introduced as minimum spanning tree (MST) methods to handle this task. The spanning-tree method described here and the MST method have in common that a connected graph is used to form a tree structure. However, the spanning tree in a LAN internetwork is not always the minimum spanning tree produced by the MST method. This is shown by the example in Figure 12-6.
Figure 12-6. Spanning-tree protocol versus the MST method.
Under the spanning-tree protocol, the root of the tree topology is not determined by the least total cost; instead, the bridge with the smallest bridge identifier is selected. The reason is that the spanning-tree algorithm operates in a decentralized way: it is not calculated centrally in one station. This means that, first of all, all bridges have to agree on the bridge to be selected as the root of the tree. Subsequently, working from the root, the branches of the tree with "minimum" path cost are calculated. These minimum-cost paths do not necessarily have to correspond to the tree structure with the least total cost.
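The difference between the two trees can be checked on a small graph. The following sketch assumes a hypothetical three-node graph (`EDGES`) that is not the topology of Figure 12-6; it compares the total cost of a minimum spanning tree (computed with Kruskal's algorithm) with the total cost of the tree formed by the least-cost paths from a fixed root (computed with Dijkstra's algorithm).

```python
import heapq

# Hypothetical weighted graph: nodes stand for bridges, weights for link costs.
EDGES = [("R", "B", 4), ("R", "C", 4), ("B", "C", 1)]


def mst_cost(edges):
    """Total cost of a minimum spanning tree (Kruskal with union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total = 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two components
            parent[ru] = rv
            total += w
    return total


def root_tree_cost(edges, root):
    """Total cost of the tree of least-cost paths from root (Dijkstra)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {root: 0}
    tree_edge = {}                          # node -> weight of edge to its parent
    heap, done = [(0, root)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                tree_edge[v] = w
                heapq.heappush(heap, (d + w, v))
    return sum(tree_edge.values())


print(mst_cost(EDGES), root_tree_cost(EDGES, "R"))
```

With root R fixed in advance, both B and C attach directly to R at cost 4 each, although the cheap link between B and C would give a tree with a smaller total cost.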
Prerequisites and Terminology
Bridges need certain parameters and values to be able to run the spanning-tree algorithm. These values are then used to manipulate the resulting spanning tree. The following parameters are required by the spanning-tree algorithm:
• Each bridge requires a unique identifier, the bridge ID. In IEEE 802.1D, it consists of a 2-byte priority followed by a 6-byte MAC address.
• Each network adapter (port) of a bridge obtains a unique identifier, the port ID.
• A port cost is assigned to each network adapter. This cost influences the structure of the tree topology, because the spanning-tree algorithm minimizes the path cost to the root bridge. For example, the port cost can reflect the load on or speed of a local area network.
• When two LANs can be reached over several paths, then a priority for each network adapter (port priority) can be considered when selecting a path. If several adapters have equal path cost, the spanning-tree algorithm selects the adapter with the higher priority.
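The parameters listed above could be held in data structures like the following. This is only a sketch: the class and field names are hypothetical, and it assumes the IEEE 802.1D conventions that a bridge ID is a priority followed by a 6-byte address, that the defaults are 32768 for the bridge priority and 128 for the port priority, and that a numerically smaller value means a higher priority.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class BridgeId:
    # Compared field by field: first the priority, then the address.
    priority: int = 32768      # assumed default; smaller value wins
    mac: bytes = bytes(6)      # 6-byte address of the bridge


@dataclass
class Port:
    port_id: int               # unique within one bridge
    cost: int                  # port cost, e.g. derived from the LAN speed
    priority: int = 128        # assumed default; smaller value = higher priority


@dataclass
class Bridge:
    bridge_id: BridgeId
    ports: list = field(default_factory=list)


a = BridgeId(mac=bytes.fromhex("00a0c9000002"))
b = BridgeId(mac=bytes.fromhex("00a0c9000010"))
c = BridgeId(priority=4096, mac=bytes.fromhex("00a0c90000ff"))

print(min(a, b, c) == c)       # the lowered priority value outranks any address
```

Making the priority the leading field is what lets an administrator influence which bridge has the smallest identifier without changing hardware addresses.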
The following are other important terms:
• Root bridge: This is the bridge representing the root of the tree topology.
• Root port: This is the port of a bridge with the least transmission cost to the root bridge.
• Root-path cost: This is the sum of the cost of all root ports on the path from a LAN within the internetwork to the root bridge. The objective is to find the path with the least root-path cost.
Figure 12-7 shows these terms in an example of the topology described above, where bridge 1 is the root of the tree structure shown in Figure 12-8.
Figure 12-7. Topology after running the spanning-tree protocol.
Figure 12-8. Tree topology of the LAN internetwork.
Special packets in the form of so-called Bridge Protocol Data Units (BPDUs) are exchanged to determine the root bridge and distribute path or port cost. There are two types of BPDUs:
• Configuration BPDUs are also called hello packets or configuration messages. They are used to announce the root-bridge identifier, the cost currently accumulated, and certain timer values. Section 12.4.5 will describe the format of this configuration BPDU.
• Topology change notification BPDUs (TCN BPDUs) are exchanged when changes occur in the topology. This can happen when a component has failed or when the execution of the spanning-tree method causes certain network adapters of bridges to move into the blocking state.
Bridge PDUs are sent with a special group MAC address. This means that each bridge that receives such a packet can identify a bridge PDU.
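Recognizing a bridge PDU by its destination address can be sketched as follows. The text does not name the group address; the value used here, 01:80:C2:00:00:00, is the bridge group address defined by IEEE 802.1D. The function name `is_bridge_pdu` is a hypothetical helper, and the check looks only at the first six bytes of an Ethernet frame.

```python
# Bridge group address from IEEE 802.1D (not stated in the text above).
BRIDGE_GROUP_ADDRESS = bytes.fromhex("0180c2000000")


def is_bridge_pdu(frame: bytes) -> bool:
    """True if the frame's destination MAC is the bridge group address."""
    return len(frame) >= 6 and frame[:6] == BRIDGE_GROUP_ADDRESS


# Destination address, source address, then an arbitrary placeholder payload.
bpdu_frame = BRIDGE_GROUP_ADDRESS + bytes.fromhex("00a0c9000002") + b"\x00" * 8
data_frame = bytes.fromhex("ffffffffffff") + bytes.fromhex("00a0c9000002") + b"\x00" * 8

print(is_bridge_pdu(bpdu_frame), is_bridge_pdu(data_frame))
```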
Running the Spanning-Tree Algorithm
The spanning-tree algorithm is defined in IEEE standard 802.1D. It specifies the principle used to build a noncyclic topology from a partly meshed or cyclic LAN internetwork. This method operates in a fully decentralized way.
The spanning-tree algorithm runs in three steps:
1. Select the root bridge: The root bridge is the root of the tree topology we want to build. The problem is now to select one of the bridges as the root bridge. To this end, we use a principle similar to the one used in token-ring networks: The bridge with the smallest identifier (bridge ID) is selected as the root bridge.
At the beginning, the bridges in the LAN internetwork periodically send configuration BPDUs with their own identifiers as root ID to all other bridges. When a bridge receives a BPDU, it immediately compares the root ID in the BPDU with its own bridge ID. If the received root ID is smaller, the bridge forwards the BPDU. If its own bridge ID is smaller, the bridge registers its own ID as the root ID and distributes it to the other bridges. Eventually, the bridge with the smallest identifier becomes the root bridge.
One major benefit of this principle is its decentralized property. This means that no central management unit is required. However, the path cost in a LAN internetwork does not play any role in determining the root bridge. This means that the resulting topology is not necessarily the best one, as it would be with the minimum spanning tree method.
2. Determine the root port of each bridge: Each bridge selects as its root port the network adapter with the smallest path cost to the root bridge (the root-path cost, RPC). If several paths have the same cost, then the port with the highest priority or (if no priorities are set) the port with the smallest port ID is selected as the root port.
3. Select the designated bridge for a LAN: When one subnetwork within the LAN internetwork is connected to several bridges, so that at least one route over each of these bridges leads to the root bridge, then one of these bridges has to be selected for traffic forwarding to the root bridge. This is the only way to create a tree topology. In a local area network, the bridge with the smallest path cost to the root bridge (the so-called root-path cost) is normally selected. The network adapter used to connect this designated bridge to the local area network is called the designated port. Consequently, there is only one single designated port for each LAN. All adapters of the root bridge are designated ports.
All adapters that were not selected as root ports or designated ports are blocked (i.e., they enter the blocking state). Though no payload packets will be transported over these ports, they can continue receiving BPDUs. This means that a deactivated adapter can detect a component failure and reactivate itself when needed.
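The three steps can be traced with a small, centralized sketch. It is not the distributed BPDU exchange itself: it computes in one place the result the bridges would agree on. The topology `TOPOLOGY`, the uniform port cost of 10, and the priority value 128 are invented for illustration and only loosely follow the figures; ties are broken as described in the steps above (port priority, then port ID; smaller bridge ID for the designated bridge), and each bridge is assumed to have at most one port per LAN.

```python
# bridge ID -> list of (port ID, attached LAN, port cost, port priority)
TOPOLOGY = {
    1: [(1, "LAN1", 10, 128), (2, "LAN2", 10, 128)],
    2: [(1, "LAN1", 10, 128), (2, "LAN3", 10, 128), (3, "LAN4", 10, 128)],
    3: [(1, "LAN2", 10, 128), (2, "LAN3", 10, 128), (3, "LAN5", 10, 128)],
    4: [(1, "LAN4", 10, 128), (2, "LAN5", 10, 128)],
}
INF = float("inf")


def spanning_tree(topology):
    # Step 1: the bridge with the smallest identifier becomes the root bridge.
    root = min(topology)
    rpc = {b: (0 if b == root else INF) for b in topology}

    def offer(lan, exclude):
        """Smallest (root-path cost, bridge ID) among bridges on lan."""
        offers = [(rpc[b], b) for b, ports in topology.items()
                  if b != exclude and any(l == lan for _, l, _, _ in ports)]
        return min(offers, default=(INF, None))

    # Step 2a: relax root-path costs until no bridge finds a cheaper path.
    changed = True
    while changed:
        changed = False
        for b, ports in topology.items():
            if b == root:
                continue
            best = min(offer(lan, b)[0] + cost for _, lan, cost, _ in ports)
            if best < rpc[b]:
                rpc[b], changed = best, True

    # Step 2b: root port = cheapest port; ties go to priority, then port ID.
    root_port = {}
    for b, ports in topology.items():
        if b != root:
            root_port[b] = min((offer(lan, b)[0] + cost, prio, pid)
                               for pid, lan, cost, prio in ports)[2]

    # Step 3: per LAN, the bridge with the smallest root-path cost is designated.
    lans = {lan for ports in topology.values() for _, lan, _, _ in ports}
    designated = {lan: offer(lan, None)[1] for lan in lans}

    # Every port that is neither root port nor designated port is blocked.
    roles = {}
    for b, ports in topology.items():
        for pid, lan, _, _ in ports:
            if root_port.get(b) == pid:
                roles[(b, lan)] = "root"
            elif designated[lan] == b:
                roles[(b, lan)] = "designated"
            else:
                roles[(b, lan)] = "blocking"
    return root, rpc, roles


root, rpc, roles = spanning_tree(TOPOLOGY)
for key in sorted(roles):
    print(key, roles[key])
```

In this invented topology, bridge 3 blocks its port to LAN 3 and bridge 4 blocks its port to LAN 5. The remaining eight active bridge-to-LAN attachments connect four bridges and five LANs without a cycle.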
Behavior When a Component Fails
When an active component (i.e., the root bridge, a designated bridge, or an active port) fails, this can be discovered by a message-age mechanism. To this end, each bridge manages a max age value. If the message age value of a BPDU (see Section 12.4.5) exceeds this value, then the spanning-tree algorithm is reactivated to determine which bridges should be active in the new topology. More specifically, bridges whose network adapters change state send the topology-change notification BPDUs described above over the path to the root bridge. This means that all other bridges are informed about a change in topology, so that they can respond accordingly.
The message-age value of a bridge PDU is incremented after each forwarding action. If a failure or the addition of a new bridge causes a cycle, then the message-age value increases continually as the packet cycles, eventually reaching the threshold that triggers the spanning-tree algorithm (to reconfigure the LAN internetwork).
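The message-age mechanism can be sketched as follows. The class `ConfigBpdu` holds only a few illustrative fields and is not the packet format of Section 12.4.5; the max age of 20 seconds and the increment of 1 second per forwarding action are assumed values that match the IEEE 802.1D defaults, and the helper names `relay` and `is_expired` are hypothetical.

```python
from dataclasses import dataclass

MAX_AGE = 20.0          # seconds; assumed default value


@dataclass
class ConfigBpdu:
    root_id: int
    root_path_cost: int
    message_age: float  # seconds since the root bridge generated the message


def relay(bpdu, increment=1.0):
    """Return the copy a bridge passes on, with the message age increased."""
    return ConfigBpdu(bpdu.root_id, bpdu.root_path_cost,
                      bpdu.message_age + increment)


def is_expired(bpdu, max_age=MAX_AGE):
    """True once the message age exceeds the max age value."""
    return bpdu.message_age > max_age


# A BPDU caught in a cycle is relayed again and again until it expires.
bpdu = ConfigBpdu(root_id=1, root_path_cost=0, message_age=0.0)
hops = 0
while not is_expired(bpdu):
    bpdu = relay(bpdu)
    hops += 1

print(hops, bpdu.message_age)
```

Because the age only ever grows, a circulating message cannot stay valid forever, which bounds how long stale information can survive in a cycle.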
Figure 12-9 shows the topology from Figure 12-7, but with a change: Bridge 3 has failed. This means that the connection from LAN 3 to the root bridge over bridge 2 has to be restored, and LAN 5 is reached over bridge 4. The blocked ports of bridge 4, eth0 and eth1, are activated in this situation, allowing proper communication, even though bridge 3 failed.
Figure 12-9. Topology of Figure 12-7 after bridge 3 has failed.
Avoiding Temporary Loops
Because the spanning-tree algorithm operates in a decentralized way, some bridges may not yet have stored the globally correct information (i.e., they have only local knowledge). For this reason, interfaces could be in a "wrong" state, causing temporary loops that are removed only as the algorithm proceeds. For example, if an interface is the designated port and no configuration message from a higher-order bridge has arrived at this bridge yet, then data packets would be forwarded on the basis of local information. Globally, this would cause a loop and the faulty behavior described earlier. To solve this problem, the standard includes two intermediate states between the blocking and the forwarding states. The transition from one state to the next occurs when the so-called forward delay timer expires. In the listening state, a bridge must neither learn addresses nor forward packets. It processes only configuration messages, which can reset the interface into the blocking state. The next state, the learning state, allows the bridge to enter addresses in the forwarding table (learning function). In the forwarding state, which is reached after the forward delay timer expires once more, data packets can be forwarded. Figure 12-10 shows the state transitions of a network adapter.
Figure 12-10. State automaton of a bridge port.
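The state transitions described above can be written down as a small state machine. This is a sketch under assumptions: the figure is not reproduced here, so the event names `selected` and `deselected` are hypothetical, and the sketch assumes that a blocked port starts listening as soon as it is selected as root or designated port, while the two later transitions wait for the forward delay timer.

```python
from enum import Enum


class PortState(Enum):
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


# Transitions taken when the forward delay timer expires.
_ON_TIMER = {
    PortState.LISTENING: PortState.LEARNING,
    PortState.LEARNING: PortState.FORWARDING,
}


class PortFsm:
    def __init__(self):
        self.state = PortState.BLOCKING

    def selected(self):
        """The port became root or designated port (assumed trigger)."""
        if self.state is PortState.BLOCKING:
            self.state = PortState.LISTENING

    def deselected(self):
        """A configuration message showed that the port must not forward."""
        self.state = PortState.BLOCKING

    def forward_delay_expired(self):
        self.state = _ON_TIMER.get(self.state, self.state)

    @property
    def may_learn(self):
        return self.state in (PortState.LEARNING, PortState.FORWARDING)

    @property
    def may_forward(self):
        return self.state is PortState.FORWARDING


port = PortFsm()
port.selected()
print(port.state.value, port.may_learn, port.may_forward)
port.forward_delay_expired()
print(port.state.value, port.may_learn, port.may_forward)
port.forward_delay_expired()
print(port.state.value, port.may_learn, port.may_forward)
```

A port therefore spends two forward-delay periods between leaving the blocking state and forwarding its first data packet, which gives configuration messages from other bridges time to arrive.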