Plenty of people run into slow storage array performance that is eventually traced to packet drops on switches.

What’s going on, and how can you protect yourself against it?

Usually, it is due to the wrong switch being used for the role, a configuration issue that reduces available buffers, or a network design that makes the situation more likely to occur (e.g. high oversubscription rates such as 10Gb hosts pointing to a 1Gb storage array, or vice versa).
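To make the oversubscription example concrete, here is a minimal sketch of the arithmetic. The function names are illustrative rather than taken from any tool, and the model assumes senders transmit at full line rate for the duration of a burst, which is a simplification.

```python
def oversubscription_ratio(ingress_gbps: float, egress_gbps: float) -> float:
    """Ratio of traffic offered to a port versus what the port can drain."""
    return ingress_gbps / egress_gbps


def excess_rate_gbps(ingress_gbps: float, egress_gbps: float) -> float:
    """Rate at which the switch must buffer (or drop) traffic, in Gb/s."""
    return max(0.0, ingress_gbps - egress_gbps)


# A 10Gb host bursting towards a 1Gb storage array port (assumed line-rate burst).
ratio = oversubscription_ratio(10, 1)
excess = excess_rate_gbps(10, 1)
print(f"oversubscription {ratio:.0f}:1, {excess:.0f} Gb/s must be buffered or dropped")
```

Whatever the egress port cannot drain has to sit in the switch buffer, so the excess rate is what determines how quickly a given buffer fills.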

In an ideal world, every switch would have sufficient buffer capacity, along with the means to manage, configure and monitor those buffers.

Older, low-end switches have limited buffer capacity, because it mattered less when storage arrays were slower and there was no prospect of mixing 1Gb and 10Gb connections.

For example, the Cisco Catalyst 3750-X is sometimes used for iSCSI (a role it was not designed for), but with only 2MB of buffer per 24 ports, it doesn’t take much to make it drop packets.
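As a rough illustration of how little 2MB is, the sketch below estimates how long such a buffer could absorb a burst before drops begin. It assumes the entire 2MB is available to a single congested port (optimistic, since it is shared across 24 ports), that 1MB means 1,000,000 bytes, and that a 10Gb sender bursts at line rate towards a 1Gb port. The function name is hypothetical.

```python
def buffer_fill_time_ms(buffer_bytes: int, ingress_gbps: float, egress_gbps: float) -> float:
    """Milliseconds until the buffer is full, given sustained ingress and egress rates."""
    excess_bps = (ingress_gbps - egress_gbps) * 1e9
    if excess_bps <= 0:
        # The egress port keeps up, so the buffer never fills.
        return float("inf")
    return buffer_bytes * 8 / excess_bps * 1000


# Assumption: all 2MB is usable by one port, with 1MB = 1,000,000 bytes.
BUFFER_BYTES = 2 * 1_000_000

fill_ms = buffer_fill_time_ms(BUFFER_BYTES, ingress_gbps=10, egress_gbps=1)
print(f"buffer full after about {fill_ms:.2f} ms")  # under 2 ms
```

Under these assumptions a sustained burst of less than two milliseconds is enough to exhaust the buffer, and in practice the share available to any one port is smaller still.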

Surprisingly, even older-generation datacenter switches such as the Nexus 5548 and the Nexus 7009 can have this issue.

Thankfully, the latest generation of Cisco switches has significantly enhanced buffer capabilities. A non-exhaustive list includes the Nexus 9300 series (from 37MB of buffer), the Nexus 5600 series (from 150MB of buffer) and the Nexus 2248TP-E FEX (32MB of buffer).

References:

Cisco Nexus 9300 Platform Buffer and Queuing Architecture

Nexus 2248TP-E FEX for iSCSI/Backup

Nexus 5600 use case - iSCSI and over shared network