"Big data" is touted as a tool that can generate business-changing insights and fuel new revenues. As a result, some organizations are using big data to incorporate a broader variety of data sources and types in their analytics; in other cases, companies just want faster analysis.
Regardless of the strategic objectives, however, organizations dealing with huge volumes of data must have a big data infrastructure in place that can accommodate the load of storing, analyzing and transporting the data. The IT department is often drawn into discussions about the compute and storage resources needed for big data, yet IT staffers are less often asked to make sure the networks are ready. IT teams must be proactive about network and WAN capacity planning when big data is on the roadmap. Three areas to study:
In the data center
Depending on the kind of data management software in use and the kinds of data analyzed, big data can affect the size and frequency of data movements among servers and between servers and storage. Of most concern are data sets involving large objects, such as video clips or high-resolution medical images. Moving such objects from long-term storage to analytical nodes will place a bigger burden on storage systems and on storage and data networks than, say, moving myriad small text files.
To meet that kind of demand, IT needs to make sure that storage systems and the storage area network (SAN) have sufficient throughput to handle big data traffic in addition to normal operational traffic. Where they don't, it's time to either beef things up or spread things out: increase I/O capacity on storage controllers or the bandwidth supporting the SAN, or sidestep the bottleneck by distributing data across more controllers.
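The "sufficient throughput" question comes down to back-of-the-envelope arithmetic: convert the planned object movements into a sustained bit rate and compare it against the headroom the controllers have left. The sketch below illustrates that calculation; all of the figures (controller speeds, object sizes, rates) are hypothetical, not measurements from any real SAN.

```python
# Back-of-the-envelope check: can the SAN absorb the analytics load on top of
# normal traffic? All figures below are illustrative assumptions.

def san_headroom_gbps(controller_gbps, controllers, normal_load_gbps,
                      objects_per_hour, object_size_gb):
    """Return spare SAN throughput (Gbit/s) after adding the analytics load."""
    total_capacity = controller_gbps * controllers
    # Convert object moves per hour into a sustained bit rate:
    # objects/hour * GB/object * 8 bits/byte, spread over 3600 seconds.
    analytics_gbps = objects_per_hour * object_size_gb * 8 / 3600
    return total_capacity - normal_load_gbps - analytics_gbps

# Example: two 16 Gbit/s controllers, 12 Gbit/s of normal traffic,
# and 900 half-gigabyte medical images pulled to analytics nodes per hour.
spare = san_headroom_gbps(16, 2, 12, 900, 0.5)
print(f"Spare throughput: {spare:.2f} Gbit/s")  # prints "Spare throughput: 19.00 Gbit/s"
```

If the spare figure goes negative, that is the signal to add controller I/O, add SAN bandwidth, or spread the data across more controllers as described above.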
On the WAN
If the data to be analyzed is being gathered from locations across the WAN -- retail outlets, factories, regional offices -- and it represents brand-new traffic, IT must make sure the WAN can handle the new load. This will be more of a challenge if the data objects are large, such as video content. Prioritization and traffic shaping can be remedies if problems crop up. WAN optimizers can help as well; many kinds of data that are fundamental in big data analyses are highly compressible, such as security logs or office documents.
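The compressibility point is easy to demonstrate: repetitive, text-heavy data such as security logs shrinks dramatically under even generic compression, which is much of what a WAN optimizer exploits. The snippet below uses Python's standard zlib module on a fabricated sample log line; the exact ratio will vary with real data.

```python
import zlib

# A fabricated, repetitive firewall-style log -- the kind of text-heavy
# traffic that compresses extremely well in transit.
log = ("2014-06-01T12:00:00 10.0.0.5 ALLOW TCP 10.0.0.9:443 "
       "session established\n") * 500

raw = log.encode()
compressed = zlib.compress(raw, level=6)
ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")
```

Video and other already-compressed formats, by contrast, gain almost nothing from a second pass, which is why large media objects remain the harder WAN problem.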
On the Internet
Sometimes, the data may be coming from an Internet source (a social media site, for example). Alternatively, the internal site from which data is being pulled may be one without a dedicated WAN link, so data has to move over a VPN across the Internet. In either case, the Internet link on the data center end has to be sized to handle the new load in addition to existing normal traffic. IT may need to use prioritization on the connection to keep the big data transfers from interfering with higher-priority traffic streams, such as those for telephony, conferencing or other mission-critical applications. Compression can be an effective way to minimize the impact on bandwidth; it can be implemented either via an appliance (virtual or physical) or via a soft client. When the data comes from a third-party source, though, compression is usually not an option.
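Sizing the Internet link is the same kind of arithmetic as sizing the SAN: how long does a bulk pull take given the share of the link you can safely dedicate to it, and how much does compression buy back? The sketch below models a nightly transfer; every parameter (data volume, link speed, share, compression ratio) is a hypothetical assumption for illustration.

```python
# Rough estimate of a bulk data pull over a shared Internet link.
# All parameters are hypothetical placeholders, not recommendations.

def transfer_hours(data_gb, link_mbps, link_share, compression_ratio):
    """Hours to move data_gb over a link, given the fraction of the link
    the transfer may use and the compression ratio achieved in transit."""
    megabits = data_gb * 8000 / compression_ratio  # GB -> Mbit (decimal)
    return megabits / (link_mbps * link_share) / 3600

# Example: a 200 GB nightly pull over a 100 Mbit/s link, with prioritization
# capping the transfer at half the link.
uncompressed = transfer_hours(200, 100, 0.5, 1.0)
with_3to1 = transfer_hours(200, 100, 0.5, 3.0)
print(f"Uncompressed: {uncompressed:.1f} h, with 3:1 compression: {with_3to1:.1f} h")
```

Running the numbers this way shows quickly whether a transfer window fits inside off-peak hours, and how much a compression appliance or soft client would shorten it.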
To be ready for big data, any organization needs to assess its infrastructure's readiness -- and not just the storage and compute portions. By scoping the types and sizes of data objects moving across the various network tiers, IT can properly prepare the network where needed, applying traffic shaping and other optimization tools to keep the data flowing without making any other services suffer.