30 November 2012

Spanning-Tree Loop Troubleshooting and Safeguards

Problem Description:

Spanning tree loop caused network outage
Action Plan:
Implement Layer 2 safeguards that protect against STP loops and mitigate the impact if one does occur.
1) First, verify that the intended switch is currently the STP root for all VLANs. Then enable root guard on the root/core switch, on every port that connects to the distribution layer switches.
Cisco's root guard documentation covers this in detail. See the section titled “What Is the Difference Between STP BPDU Guard and STP Root Guard?” for clarification on the difference. You want root guard on the root and BPDU guard on the access layer. You do not want root guard on the port channel between core switches running HSRP; apply it only on the links to other switches that you do NOT want to become spanning tree root.
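As a minimal sketch of step 1, assuming a Catalyst switch running Cisco IOS: the interface name below is a placeholder, not taken from this network. The first command confirms which bridge is root for each VLAN, the interface command enables root guard on one port facing a distribution switch, and the last command lists any ports that root guard has placed in a root-inconsistent state.

```
! Confirm the root bridge for every VLAN before changing anything
ITLABSW#show spanning-tree root

! Enable root guard on a port facing a distribution switch
! (GigabitEthernet1/1 is a placeholder interface name)
ITLABSW#configure terminal
ITLABSW(config)#interface GigabitEthernet1/1
ITLABSW(config-if)#spanning-tree guard root
ITLABSW(config-if)#end

! Ports blocked by root guard are listed as root-inconsistent
ITLABSW#show spanning-tree inconsistentports
```

A root-inconsistent port recovers on its own once it stops receiving superior BPDUs, so no manual re-enable is needed.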
2) Enable loop guard on all distribution/access layer switches
3) Enable BPDU guard on all distribution/access layer switches
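Steps 2 and 3 might look like the following on a Cisco IOS access or distribution switch. This is a sketch rather than a complete configuration, and the interface name is a placeholder. The global form of BPDU guard only applies to ports that have PortFast enabled.

```
ITLABSW#configure terminal

! Step 2: loop guard as the global default
ITLABSW(config)#spanning-tree loopguard default

! Step 3: BPDU guard on every PortFast-enabled port (global default)
ITLABSW(config)#spanning-tree portfast bpduguard default

! Alternative for step 3: BPDU guard on one access port
! (GigabitEthernet1/0/10 is a placeholder interface name)
ITLABSW(config)#interface GigabitEthernet1/0/10
ITLABSW(config-if)#spanning-tree bpduguard enable
ITLABSW(config-if)#end
```

Root guard and loop guard are mutually exclusive on a port, so a port that carries root guard from step 1 should not also be given loop guard.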
4) Enable UDLD aggressive on all fiber uplinks
Unidirectional links can cause spanning tree loops. UDLD in aggressive mode guards against this by shutting down (err-disabling) a port when it detects a unidirectional link.
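A sketch of step 4 on Cisco IOS, with a placeholder interface name. The global command enables aggressive mode on fiber-optic interfaces; the interface-level command is the per-port equivalent.

```
ITLABSW#configure terminal

! Enable UDLD aggressive mode on all fiber-optic interfaces
ITLABSW(config)#udld aggressive

! Or enable it on a single uplink
! (TenGigabitEthernet1/1 is a placeholder interface name)
ITLABSW(config)#interface TenGigabitEthernet1/1
ITLABSW(config-if)#udld port aggressive
ITLABSW(config-if)#end

! Check the UDLD state of that uplink
ITLABSW#show udld TenGigabitEthernet1/1
```

UDLD has to be enabled on both ends of a link before it can detect anything, so the neighbor switch needs the same configuration.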
5) Prune unnecessary VLANs off your trunks
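For step 5, one way to prune a trunk manually on Cisco IOS is sketched below. The interface name and VLAN numbers are placeholders; substitute the VLANs that the downstream switch actually needs.

```
! See which VLANs each trunk currently allows and forwards
ITLABSW#show interfaces trunk

! Restrict a trunk to the VLANs that are needed downstream
! (interface name and VLAN list are placeholders)
ITLABSW#configure terminal
ITLABSW(config)#interface GigabitEthernet1/1
ITLABSW(config-if)#switchport trunk allowed vlan 10,20,30
ITLABSW(config-if)#end
```

Without the `add` or `remove` keyword, `switchport trunk allowed vlan` replaces the entire allowed list, so confirm the list before applying it to a production trunk.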
After implementing root guard, loop guard, UDLD aggressive, and BPDU guard, bring the link back up and see whether the loop re-forms. If it does, troubleshoot as follows:
1) Have a TAC engineer online to troubleshoot
2) Enable mac-address move notification (if applicable – this is disabled by default on the 6500/7600 platform and enabled by default on others)
 ITLABSW(config)#mac-address-table notification mac-move
Check the switch log for MAC addresses flapping between interfaces. These are the ports that are participating in the loop. Trace the MAC back to its source. Look for:
- A link flapping on an upstream switch, causing spanning tree TCNs and spanning tree reconvergence. Use this in conjunction with step 3 below.
- A unidirectional link on an upstream switch causing the loop.
- A hub or switch connected to a PortFast-enabled access port where this MAC is learned. Shut this port down and see if this breaks the loop.
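Tracing a flapping MAC back to its source could be sketched like this. The MAC address and interface name are placeholders. The hyphenated `mac-address-table` form matches the command used above; newer IOS releases spell it `show mac address-table`.

```
! Find the port on which the flapping MAC is currently learned
! (0011.2233.4455 is a placeholder MAC address)
ITLABSW#show mac-address-table address 0011.2233.4455

! If that port is a link to another switch, identify the neighbor
! (GigabitEthernet1/1 is a placeholder interface name)
ITLABSW#show cdp neighbors GigabitEthernet1/1 detail
```

Repeat both commands on each neighbor in turn until the MAC is learned on an access port or the path leads back to a switch you have already visited.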
3) Check for TCNs
If you see excessive TCNs (topology change notifications) while the loop is occurring, trace them to their source. To do this, start from the core and run the following command:
 ITLABSW#show spanning-tree detail | inc ieee|occurr|from|is exec
The output shows the port on which the last TCN was received and how long ago it arrived. Look for a port that received a TCN within the last few seconds.
 ITLABSW#sh spanning-tree detail | i ieee|occur|from|is exec
   VLAN0001 is executing the rstp compatible Spanning Tree protocol
     Number of topology changes 187927 last change occurred 00:01 ago   <-- time received
         from Port-Channel12   <-- interface that received the TCN
Follow that port to the neighboring switch and repeat the command there. Continue until the port receiving the TCN is an access port, or until you reach a switch that is generating TCNs but not receiving them. If you find an access port receiving TCNs, shut it down.
If you find a switch generating TCNs, look for two ports in a spanning tree forwarding state for the same VLAN. If you find two ports in a forwarding state, shut one port down and see if this breaks the loop. Check for a unidirectional link or excessive link flaps.
4) Look for packets hitting the CPU. Sniff the CPU and see whether the packets share a common source, then track that source down. If they are STP or CDP packets (or packets destined to the 0100.0CCC.CCCX reserved multicast addresses), trace where the source MAC is learned and see whether it leads you in a loop.
If you see two ports in a forwarding state for the same VLAN on the same switch, check the following:
a) Does this switch think it is the root for this VLAN (or VLANs)?
b) Should it be?
c) Is it receiving BPDUs from its neighbor on the ports in a forwarding state? (Sniff both forwarding ports to look for BPDUs.)
d) Is there a unidirectional link on one of the ports in a forwarding state?
e) Shut one of the ports in a forwarding state and see if the loop stops.
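Checks a) through d) can be started from the CLI before any sniffing. The sketch below assumes Cisco IOS; the VLAN number and interface name are placeholders.

```
! a), b): shows the root bridge ID and says whether this switch is the root,
! along with the role and state of every port in the VLAN
! (VLAN 10 is a placeholder)
ITLABSW#show spanning-tree vlan 10

! c): per-port BPDU sent/received counters
! (GigabitEthernet1/1 is a placeholder interface name)
ITLABSW#show spanning-tree interface GigabitEthernet1/1 detail

! d): UDLD state on the same port
ITLABSW#show udld GigabitEthernet1/1
```

Running the second command twice a few seconds apart shows whether the received BPDU counter is still incrementing on each forwarding port.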
5) Look for an interface with a very high input rate and low output rate.
 ITLABSW#sh int | i is up|rate
When a bridging loop occurs you will usually see multiple interfaces with a high output rate and low input rate and a single interface with a high input rate and low output rate.
- Trace the port with the high input rate downstream until you reach an access port, then shut that port down.
- If the port with the high input rate leads you around a loop, check spanning tree states along the path until you find a switch with a port that is forwarding when it should be blocking, or some other cause of the looping packets. TAC will need to assist here.
