A bottleneck, in a communications context, is a point in the enterprise where the flow of data is impaired or stopped entirely. Effectively, there isn't enough data-handling capacity at that point to carry the current volume of traffic.
A bottleneck can occur in the user network, in the storage fabric, or within servers where there is excessive contention for internal resources such as CPU processing power, memory, or I/O (input/output). As a result, data flow slows to the speed of the slowest point in the data path. This slowdown degrades application performance, especially for databases and other heavy transactional applications, and can even cause some applications to crash.
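The "slowest point in the data path" idea can be illustrated with a minimal sketch. The segment names and capacities below are hypothetical, chosen only for illustration, and the model assumes that end-to-end throughput is limited purely by the lowest-capacity segment.

```python
# Hypothetical data path: (segment name, capacity in Mbit/s).
# All names and values are illustrative assumptions, not measurements.
path = [
    ("server NIC", 1000),
    ("access switch port", 100),
    ("core uplink", 10000),
]


def find_bottleneck(segments):
    """Return the (name, capacity) pair with the lowest capacity on the path."""
    if not segments:
        raise ValueError("the data path must contain at least one segment")
    return min(segments, key=lambda seg: seg[1])


name, capacity = find_bottleneck(path)
print(f"End-to-end throughput is capped at {capacity} Mbit/s by: {name}")
```

In this simplified model, raising the capacity of any other segment changes nothing until the slowest one is addressed.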
A bottleneck frequently arises from poor network or storage fabric design. Mismatched hardware is a common cause. For example, if a workgroup server is fitted with a Gigabit Ethernet port but the switch port it connects to supports only legacy 10/100 Ethernet, the slower switch port becomes a bottleneck for the server. Another design flaw common to storage networks is excess fan-in, where multiple storage devices are connected to the same switch port in order to maximize the use of that port's bandwidth. For example, connecting multiple four-gigabit (Gb) Fibre Channel storage devices to the same switch port can easily overwhelm the port and cause performance problems if several of the devices are active simultaneously. In many cases, bottlenecks develop over time because administrators fail to track the increasing demands of network and storage traffic.
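One rough way to reason about fan-in is to compare the combined peak rate of the attached devices with the rate of the shared port. The sketch below assumes three 4 Gb storage devices behind a single 4 Gb port; the device count, the port rate, and the function name `fan_in_ratio` are illustrative assumptions, not recommended values.

```python
def fan_in_ratio(device_rates_gb, port_rate_gb):
    """Ratio of combined peak device bandwidth to the shared port's bandwidth.

    A ratio above 1.0 means the port is oversubscribed whenever every
    attached device transmits at full rate at the same time.
    """
    if port_rate_gb <= 0:
        raise ValueError("port rate must be positive")
    return sum(device_rates_gb) / port_rate_gb


# Assumed example: three 4 Gb Fibre Channel devices sharing one 4 Gb port.
devices = [4, 4, 4]
port = 4

ratio = fan_in_ratio(devices, port)
print(f"Fan-in ratio: {ratio:.1f}:1")
if ratio > 1.0:
    print("The port can be overwhelmed if all devices are active at once.")
```

A ratio above 1.0 is not automatically a problem, since devices are rarely all busy together, but the higher the ratio, the less simultaneous activity it takes to saturate the port.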
Bottlenecks can also develop from poor or suboptimal configuration of switches or host bus adapters (HBAs). For example, using multiple Fibre Channel ports to connect devices within the storage switching fabric can improve storage availability and performance, but if the interconnected devices are not configured for load balancing, much of the benefit is lost. Similarly, bottlenecks can result from hardware failures. Continuing the previous example, suppose that one of two Fibre Channel links fails. Although failover should keep the storage device accessible, all the traffic that was carried by two links now moves to one, which can create a bottleneck if that traffic exceeds the bandwidth of a single link.
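The failover scenario comes down to simple arithmetic, sketched below. The 5 Gb offered load and 4 Gb link rate are assumed values for illustration, and the model assumes that traffic is spread evenly across the active links.

```python
def link_utilization(offered_gb, link_rate_gb, active_links):
    """Fraction of available bandwidth in use, assuming even load balancing."""
    if active_links < 1:
        raise ValueError("at least one link must be active")
    return offered_gb / (link_rate_gb * active_links)


# Assumed values: 5 Gb of offered traffic over 4 Gb Fibre Channel links.
offered = 5.0
link_rate = 4.0

before = link_utilization(offered, link_rate, active_links=2)
after = link_utilization(offered, link_rate, active_links=1)

print(f"Two links: {before:.0%} utilized")
print(f"One link after failover: {after:.0%} utilized")
if after > 1.0:
    print("The surviving link cannot carry the full load and becomes a bottleneck.")
```

Under these assumptions the pair of links runs comfortably below capacity, while the single surviving link is asked to carry more than it can deliver.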
Bottlenecks are typically located by systematically testing performance at each device along a data path and isolating the devices that perform noticeably slower than the others. Once identified, a bottleneck can usually be resolved by reconfiguring, upgrading or replacing the offending device. At the network level, this may involve upgrading a switch or HBA. For servers, a CPU or memory upgrade may help, or the server may need to be replaced entirely (for example, replacing an aging single-CPU server with a newer dual- or quad-CPU server). Bottlenecks can often be avoided by proactively monitoring traffic load trends over time and implementing improvements before serious problems develop.
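Proactive trend monitoring can be sketched as a straight-line projection of recorded utilization. The example below is a minimal illustration, not a capacity-planning method: it assumes equally spaced samples (for instance, one per week), a roughly linear growth pattern, and an 80 percent alert threshold, all of which are hypothetical choices.

```python
def intervals_until_threshold(samples, threshold):
    """Fit a least-squares line to equally spaced utilization samples and
    return how many sample intervals after the last sample the fitted line
    reaches the threshold. Returns None if the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope
    # Clamp to zero if the fitted line has already passed the threshold.
    return max(0.0, crossing - (n - 1))


# Assumed weekly link utilization readings, in percent.
weekly_utilization = [40, 45, 50, 55, 60]

remaining = intervals_until_threshold(weekly_utilization, threshold=80)
if remaining is None:
    print("No upward trend detected.")
else:
    print(f"Projected to reach 80% in about {remaining:.1f} weeks.")
```

Real traffic is rarely this regular, so a projection like this is best treated as an early warning to investigate rather than a precise forecast.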