A WAN manager's role in storage
The first reason for this was stated earlier, in part 1 of this article, Data deduplication technology in enterprise wide area networks: When a WAN optimization device is already in place, it has probably seen the data segments before any back-end data protection process runs. For example, a file is created in the primary office and stored on the primary site's storage. The file is then emailed to others in the organization at remote offices. The WAN deduplication device at the remote site stores the file's segments in cache, since this is the first time the unit has seen the file. A remote user then saves the file to a home directory at the primary site. Because the caches on both sides already hold the file, the save appears almost instant to the user. If the user then modifies a few sections of the file and saves it under a different file name to the home directory, only those changes, not the whole file, cross the WAN back to the primary office.
When the time comes to replicate primary storage to this remote site for disaster recovery (DR), much of the data is already in the cache of the WAN optimization devices because of prior activity, so a much smaller set of data needs to be moved across the WAN. A final step may be the execution of the backup process. Even if backup deduplication is in place, it will see an additional performance benefit because the cache is preloaded with much of the data.
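The email, save and modify sequence above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular WAN optimization product works: the `SegmentCache` class, the fixed 64-byte segment size and the use of SHA-256 fingerprints are all assumptions made for the example, and the small cost of sending a reference in place of a cached segment is ignored.

```python
import hashlib

SEGMENT_SIZE = 64  # illustrative fixed segment size in bytes (assumption)


class SegmentCache:
    """Hypothetical model of the segment store shared by a pair of WAN devices."""

    def __init__(self):
        # Fingerprints of segments that both sides of the link already hold.
        self.known = set()

    def transfer(self, data: bytes) -> int:
        """Send data across the WAN and return the payload bytes that cross it."""
        sent = 0
        for start in range(0, len(data), SEGMENT_SIZE):
            segment = data[start:start + SEGMENT_SIZE]
            fingerprint = hashlib.sha256(segment).digest()
            if fingerprint not in self.known:
                # First sighting: the segment itself must cross the link.
                self.known.add(fingerprint)
                sent += len(segment)
            # Otherwise only a reference would be sent (not counted here).
        return sent


# Build a 4,096-byte "file" whose 64-byte segments are all distinct.
original = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(128))

cache = SegmentCache()
emailed = cache.transfer(original)      # file emailed to the remote office
saved = cache.transfer(original)        # remote user saves it back unchanged

modified = bytearray(original)
modified[100:110] = b"X" * 10           # user edits a few bytes in place
edited = cache.transfer(bytes(modified))  # saved under a new file name

print("emailed:", emailed, "saved:", saved, "edited:", edited)
```

In this model the first transfer carries the whole file, the unchanged save carries nothing, and the edited copy carries only the one segment that changed. A later DR replication or backup of the same file would find every segment already cached. Fixed-size segments are used here only for brevity; an edit that inserts or deletes bytes would shift every later segment boundary, which is one reason real devices use finer-grained matching.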
The second reason is that, unlike other deduplication processes, WAN deduplication examines data at a much more granular level. Backup deduplication, for example, typically examines data in 4 KB chunks, whereas a WAN deduplication device does so at the sub-byte level. Although both WAN and other deduplication products are able to go this granular, the WAN device has the luxury of a little more time to take that step. The reason relates to the available bandwidth. The smaller the granularity, the more time must be spent examining each segment, and that time can't exceed the latency of the network segment carrying the data. In backup deduplication, the bandwidth of that segment is often many times that of the WAN segment, so smaller chunks of data would actually slow down the process. In the WAN, since bandwidth is more constrained, more time can be spent identifying redundant data segments at a finer granularity.
What a WAN manager should know about storage
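For a WAN manager, the granularity trade-off described above can be made concrete with a little arithmetic. The sketch below takes one reading of the time limit: the inspection time per segment should not exceed the time the link itself needs to carry that segment. The link speeds (1 Gbit/s for a local backup network, 10 Mbit/s for a WAN circuit) and the function name `segment_time_budget` are assumptions chosen for illustration, not figures from any product.

```python
def segment_time_budget(segment_bytes: int, link_bits_per_second: float) -> float:
    """Seconds the link needs to carry one segment.

    If inspecting a segment takes longer than this, the inspection,
    not the link, becomes the bottleneck.
    """
    return segment_bytes * 8 / link_bits_per_second


SEGMENT_BYTES = 4 * 1024           # the 4 KB chunk size cited for backup deduplication
BACKUP_LINK_BPS = 1_000_000_000    # assumed 1 Gbit/s backup network
WAN_LINK_BPS = 10_000_000          # assumed 10 Mbit/s WAN circuit

backup_budget = segment_time_budget(SEGMENT_BYTES, BACKUP_LINK_BPS)
wan_budget = segment_time_budget(SEGMENT_BYTES, WAN_LINK_BPS)

print(f"backup network: {backup_budget * 1e6:.1f} microseconds per 4 KB segment")
print(f"WAN circuit:    {wan_budget * 1e6:.1f} microseconds per 4 KB segment")
print(f"WAN budget is {wan_budget / backup_budget:.0f}x larger")
```

Under these assumed speeds, the slower link leaves about 100 times more time to examine the same amount of data, which is the headroom that lets a WAN device afford finer-grained comparison.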
The WAN manager should be prepared to answer the storage team's requests for additional bandwidth with WAN deduplication optimization. Because it has more opportunities to see data throughout the data lifecycle, the WAN device can provide broader value to the organization. The storage team, though, will still benefit from the ability to replicate more data more often and to keep the remote sites in closer sync with the primary site in the event of a disaster. The result is an investment that pays off for the entire organization while still satisfying the original storage request.
About the author:
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the U.S., he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
This was first published in April 2010