● Primary root switch failure and recovery
● Secondary root switch failure and recovery
These tests revealed the intricacies of fast convergence in the data center and the necessity for a
holistic approach to high availability. Test cases that did not involve the failure of the active HSRP
aggregation switch resulted in an average failover time of about 1 second. Failing the active HSRP
device requires convergence at Layer 3, so the recovery time reflected the settings of the HSRP
timers.
It is possible to tune the HSRP timers for subsecond convergence. However, when multiple HSRP
devices are involved, the recovery time is typically in the 5-second range.
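As an illustration of such tuning, the following is a minimal HSRP sketch for an aggregation-switch
VLAN interface; the VLAN number, addresses, and timer values are assumptions chosen for this
example rather than the values used in testing:

interface Vlan10
 ! Example server VLAN gateway; addresses are illustrative
 ip address 10.10.10.2 255.255.255.0
 ! HSRP virtual IP address used as the servers' default gateway
 standby 10 ip 10.10.10.1
 standby 10 priority 110
 standby 10 preempt
 ! Subsecond hello and hold timers (250 ms / 750 ms) for faster convergence
 standby 10 timers msec 250 msec 750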
In this topology, 2 to 4 Gigabit Ethernet links compose the port-channel uplinks between the access
and aggregation layers. This configuration allows a single link to fail without triggering Spanning
Tree Protocol convergence.
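For example, a port-channel configuration along the following lines on the blade switch bundles the
uplinks so that a single member link can fail without a topology change visible to spanning tree; the
port-channel number and external port numbers are assumptions for illustration:

interface Port-channel1
 description Uplink to the aggregation layer (example)
 switchport mode trunk
!
interface range GigabitEthernet0/21 - 24
 description Port-channel member links (example external ports)
 switchport mode trunk
 ! LACP shown as an example; the channel mode used in the tested design may differ
 channel-group 1 mode active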
Note: The default gateway for the servers is the HSRP address of the Layer 3 aggregation
switches. Failover times may be affected if the default gateway of the server is located on another
device, such as a load balancer or firewall.
The recommended topology provides a high level of availability to the blade servers except in one
failure scenario. If all of the uplinks from a single Cisco Catalyst Blade Switch 3020 to the
aggregation switches are unavailable, the server NICs homed to that Cisco Catalyst Blade Switch
3020 are not notified by default. The blade servers are unaware of the disconnection between the
access layer switches (the Cisco Catalyst Blade Switch 3020s) and the aggregation layer switches,
so they continue to forward traffic. To address this breakdown in network connectivity, use one of
the following methods:
● Use the NIC teaming features of the ProLiant blade servers
● Deploy the Layer 2 trunk failover feature in the Cisco Catalyst Blade Switch 3020s (see the
configuration sketch after this list)
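As a sketch of the second option, Layer 2 trunk failover on the Cisco Catalyst Blade Switch 3020
uses link-state tracking: when every tracked upstream (uplink) interface in a group goes down, the
switch shuts down the downstream (server-facing) interfaces in that group so that the NIC teaming
software can fail over to the other blade switch. The group number and interface numbers below
are assumptions for illustration:

link state track 1
!
interface Port-channel1
 description Uplinks to the aggregation layer (example)
 link state group 1 upstream
!
interface range GigabitEthernet0/1 - 16
 description Internal server-facing ports (example)
 link state group 1 downstream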
In addition, the NIC teaming features of the blade servers provide redundancy at the network-
adapter level. Stagger the preferred primary NICs between the two Cisco switches in the enclosure
to increase server availability. Assigning the primary NIC is a straightforward process. The NIC
teaming software provides a GUI or a small configuration file, depending on the operating system,
to construct the team. HP also offers network-aware teaming software that can verify and detect
network routes. For more information about these features, see the ProLiant Essential Intelligent
Network Pack page at
http://h18004.www1.hp.com/products/servers/proliantessentials/inp/index.html
By monitoring the health of a server farm, a load balancer can bypass a network failure by
redirecting traffic to available servers, helping ensure that end-user requests are still fulfilled
despite the failure.
The recommended network topology allows for traffic monitoring either locally or remotely using
SPAN. Local SPAN supports monitoring of network traffic within one switch, whereas RSPAN
allows the destination of mirrored traffic to be another switch within the data center. The source of
mirrored traffic for a SPAN or RSPAN session can be one or more ports or VLANs.
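For example, an RSPAN session might be configured along the following lines; the VLAN and
interface numbers are assumptions, and the RSPAN VLAN must be carried on the trunks between
the source and destination switches:

! On both switches: define the RSPAN VLAN
vlan 900
 remote-span
!
! On the source switch: copy the monitored VLAN into the RSPAN VLAN
monitor session 1 source vlan 10
monitor session 1 destination remote vlan 900
!
! On the destination switch: deliver the mirrored traffic to a local analysis port
monitor session 1 source remote vlan 900
monitor session 1 destination interface GigabitEthernet0/2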
Local SPAN is readily supported by the Cisco Catalyst Blade Switch 3020 over any of the external
Gigabit Ethernet ports. This connection is an ideal location to attach an IDS or other network-
analysis device.
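A minimal local SPAN sketch on the blade switch might look like the following, with an internal
server-facing port as the source and an external Gigabit Ethernet port connected to the analysis
device as the destination (the session and interface numbers are assumptions):

monitor session 1 source interface GigabitEthernet0/1 both
monitor session 1 destination interface GigabitEthernet0/20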