On the afternoon of Wednesday, November 13, 2002, BIDMC experienced a network slowdown. Over the next 3 days, network connectivity was restored but service quality was irregular. Late Saturday night, the network was restabilized and by Sunday all access to applications was restored.
Technical explanation:
When Cisco TAC was first able to access and assess the network, they found its Layer 2 structure to be unstable and out of specification with the IEEE 802.1D standard. In some locations the management VLAN (VLAN 1) was 10 Layer 2 hops from the root bridge.
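To make the hop-count finding concrete, the following minimal sketch counts Layer 2 hops from the root over a made-up topology. The switch names, the chain of links, and the `hops_from_root` helper are all hypothetical and are not taken from the BIDMC network. It also assumes the links listed are those of the active spanning tree, and it uses the seven-hop diameter as the threshold for flagging a bridge.

```python
from collections import deque

MAX_DIAMETER = 7  # default STP network diameter; used here as the threshold


def hops_from_root(links, root):
    """Breadth-first search returning the hop count from root to each bridge."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical chain of 11 switches: sw0 is the root, sw10 is 10 hops away.
links = [(f"sw{i}", f"sw{i + 1}") for i in range(10)]
depth = hops_from_root(links, "sw0")
too_far = {name for name, hops in depth.items() if hops > MAX_DIAMETER}
print(too_far)
```

In this invented chain, the three switches beyond the seventh hop are flagged, which mirrors the situation described for VLAN 1.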
The conservative default values for the Spanning Tree Protocol (STP) timers impose a maximum network diameter of seven. This means that no two bridges in the network should be more than seven hops apart.
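The link between the default timers and a diameter of seven can be sketched with the timer formulas commonly given in vendor STP tuning guidance. The formulas, the `stp_timers` helper, and the rounding up of the forward delay are assumptions for illustration and are not quoted from this report.

```python
import math


def stp_timers(hello=2, diameter=7):
    """Derive max age and forward delay (seconds) from hello time and diameter.

    Assumed formulas (from common STP timer tuning guidance):
      max_age       = 4 * hello + 2 * diameter - 2
      forward_delay = (4 * hello + 3 * diameter) / 2, rounded up here
    """
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return max_age, forward_delay


print(stp_timers())             # default hello time and diameter
print(stp_timers(diameter=10))  # timers a 10-hop diameter would call for
```

With a 2-second hello time and a diameter of seven, these formulas give the familiar defaults of a 20-second max age and a 15-second forward delay. A 10-hop diameter would call for larger values, which is one way to see why a network left on the default timers is out of specification at that depth.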
Part of this restriction comes from the age field that each Bridge Protocol Data Unit (BPDU) carries: when a BPDU is propagated from the root bridge toward the leaves of the tree, the age field is incremented each time it goes through a bridge. Once the age field of a BPDU goes beyond max age, the BPDU is discarded. Typically, this occurs when the root is too far away from some bridges in the network, and it impairs convergence of the spanning tree.
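The aging behavior described above can be sketched as a small simulation. The `bridges_reached` helper, the per-hop increment, and the exact discard comparison are assumptions for illustration. Real implementations differ in how much age each bridge adds and in how the limit is compared.

```python
def bridges_reached(hops, max_age=20, age_increment=1):
    """Count how many bridges along a path accept a BPDU sent by the root.

    Assumes each bridge adds age_increment to the age field on receipt and
    discards the BPDU once the age reaches max_age.
    """
    age = 0  # the root bridge originates BPDUs with an age of zero
    accepted = 0
    for _ in range(hops):
        age += age_increment  # the BPDU ages as it crosses each bridge
        if age >= max_age:
            break  # too old: this bridge and those beyond it never use it
        accepted += 1
    return accepted


print(bridges_reached(10))                   # default max age
print(bridges_reached(10, max_age=6))        # hypothetical tuned-down max age
print(bridges_reached(10, age_increment=3))  # hypothetical larger per-hop aging
```

Under these assumptions, a 10-hop path is fully covered with the default max age, while a lower max age or faster aging per hop leaves the farthest bridges without valid root information.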
A major contributor to this STP issue was the PACS network and its connection to the CareGroup network. To eliminate its influence on the CareGroup network, we isolated it behind a Layer 3 boundary. All redundancy in the network was removed to ensure that no STP loops were possible.
Full connectivity was restored to remote devices and networks that had been disconnected during troubleshooting efforts prior to TAC's involvement. Redundancy was returned between the core campus devices. Spanning Tree was stabilized, and localized issues were pursued.