"Data reduction" is a technique that can cut the volume of information traveling over the WAN by a large factor, on the order of 10x to 100x. Appliances on both ends of the WAN link inspect all incoming and outgoing WAN traffic and store a local instance of the information in an application-independent data store.
Outbound WAN packets are examined to see if a match exists in the local instance at the destination location. If a match exists, then the duplicate information is not sent across the WAN and instructions are sent to reconstruct and deliver the data locally from the data store. If the data has been modified, only the delta is transmitted across the WAN, maximizing bandwidth utilization and application performance.
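The exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor's implementation: the `Appliance` class, the fixed 256-byte `CHUNK_SIZE`, the `wire_bytes` helper, and the use of SHA-256 fingerprints are all assumptions made for the example, since the article does not say how matching is performed.

```python
import hashlib

CHUNK_SIZE = 256  # hypothetical chunk size chosen for the example


class Appliance:
    """One end of the WAN link, holding a local application-independent store."""

    def __init__(self):
        self.store = {}  # fingerprint -> chunk bytes

    def encode(self, payload: bytes):
        """Turn outbound data into tokens: raw chunks or short references."""
        tokens = []
        for i in range(0, len(payload), CHUNK_SIZE):
            chunk = payload[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.store:
                # Already seen: send an instruction instead of the data.
                tokens.append(("ref", digest))
            else:
                self.store[digest] = chunk
                tokens.append(("raw", chunk))
        return tokens

    def decode(self, tokens) -> bytes:
        """Rebuild the original data from tokens and the local store."""
        out = bytearray()
        for kind, value in tokens:
            if kind == "raw":
                # Remember new chunks so later references can be resolved.
                self.store[hashlib.sha256(value).digest()] = value
                out += value
            else:
                out += self.store[value]
        return bytes(out)


def wire_bytes(tokens) -> int:
    """Bytes that would cross the WAN (token framing overhead ignored)."""
    return sum(len(value) for _, value in tokens)
```

In this sketch, sending the same payload a second time produces only references, and a payload with one changed byte produces one raw chunk plus references for the rest. Fixed-size chunks are the simplest choice but lose alignment when bytes are inserted; that limitation belongs to this sketch, not to the technique.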
Data reduction dramatically improves WAN utilization and application response time by enabling information to be delivered locally whenever possible.
On the surface, data reduction sounds like caching because it monitors traffic and stores information locally for future delivery. However, digging deeper, there are significant differences that have a big impact on how applications are delivered. These include:
Application breadth: Data reduction techniques work at the network layer of the OSI stack. In contrast, caches operate at the application level. As a result, data reduction works on many applications, whereas caching solutions are application-specific.
Matching traffic patterns vs. application objects: In a caching solution, devices match objects, like files or Web pages. A cache makes its matching decision based on a label like a file name or URL. If the application and label are the same, the object is deemed to be the same. If the labels are different, then the data is assumed to be different. In a data reduction solution, appliances recognize traffic byte streams independent of any application-specific object formats or labels. This means that data reduction can recognize duplication across applications and between objects with different names. Caching requires an exact hit, with identical content labels, to be effective. The same file or page with a different name will result in a cache miss.
Byte-granular deltas: By operating at the byte level, data reduction solutions are able to detect similar content and identify just the deltas that need to be sent across the WAN. Caching solutions operate on entire objects, which are either a complete match or a complete miss. Because many business processes involve multiple revisions of documents or slightly varying versions of the same document, caching solutions miss much of the data duplication found by data reduction techniques.
Application transparency and data coherency: Unlike caches, which often reply on behalf of an application without checking with the authoritative server, data reduction solutions do not alter the application semantics. All requests are sent to and responded to by the native application server. This ensures data coherency and eliminates the possibility that a proxy device might serve inaccurate or outdated information.
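The difference between matching on labels and matching on bytes can be illustrated with a small Python sketch. Everything here is hypothetical and chosen for illustration: the `LabelCache` and `ByteStore` classes, the 128-byte `CHUNK` size, and the SHA-256 fingerprints are assumptions, not a description of how any particular cache or data reduction appliance works.

```python
import hashlib

CHUNK = 128  # hypothetical chunk size for the example


class LabelCache:
    """Object cache: a hit requires the same application and the same label."""

    def __init__(self):
        self.objects = {}

    def store(self, app: str, label: str, body: bytes):
        self.objects[(app, label)] = body

    def lookup(self, app: str, label: str):
        # Returns None on a miss, even if identical bytes exist under another label.
        return self.objects.get((app, label))


def fingerprints(body: bytes):
    """Fingerprint each fixed-size chunk of a byte stream."""
    return [hashlib.sha256(body[i:i + CHUNK]).digest()
            for i in range(0, len(body), CHUNK)]


class ByteStore:
    """Data reduction store: matches on content, ignoring application and label."""

    def __init__(self):
        self.seen = set()

    def observe(self, body: bytes):
        self.seen.update(fingerprints(body))

    def matched_fraction(self, body: bytes) -> float:
        """Share of chunks already seen, i.e. data that need not cross the WAN."""
        prints = fingerprints(body)
        if not prints:
            return 0.0
        return sum(p in self.seen for p in prints) / len(prints)
```

Under these assumptions, a document first fetched over one application under one name and later requested over another application under a different name is a miss for `LabelCache`, while `ByteStore` reports every chunk as already seen. A lightly revised copy is likewise a complete miss for the cache but only a small delta for the byte store.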
There is enormous value associated with storing commonly used data locally for future application delivery. This enables application servers to be located centrally while maintaining LAN-like performance and keeping WAN bandwidth costs to a minimum.
While caching pioneered this concept, data reduction has emerged as a way of providing much greater performance benefits across a broader application base. For enterprises that require several different types of applications to be accelerated across the WAN, data reduction is an extremely valuable and cost-effective tool that merits serious consideration.
Dr. Hughes founded Silver Peak Systems in 2004 and previously held senior architect positions with Cisco Systems, Stratacom, Blueleaf and Nortel. Dr. Hughes has a PhD in packet network optimization.
This was first published in April 2006