In the first part of this tip series, I discussed why it's important to take network resiliency into account for remote office/branch office (ROBO) environments. I also covered which
network and diagnostic tools are necessary to enable business continuity for WAN services. In part 2, I discuss wide area network resiliency best practices, along with training and testing tips.
Wide area network resiliency techniques and best practices
Keep in mind that the performance of networks, servers and storage is contingent upon having availability, and availability requires performance to exist. When investigating a network outage, it's important to ask the following question: Is the lack of availability due to a performance issue or vice versa?
A network manager must design for fault isolation and containment to prevent a component or service failure from expanding into a disaster. This means having a redundant active or standby network service or circuit that can be leveraged when needed. If you are not aware of your options, talk with your current provider and with other service providers to find out what is available to meet your specific needs.
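As a rough illustration of the active/standby idea above, the sketch below walks a priority-ordered list of circuits and returns the first one whose health probe succeeds. The names `tcp_probe` and `select_circuit`, and the idea of probing an endpoint reachable only over a given circuit, are assumptions made for this example, not part of any particular product or provider API.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical: a (host, port) that is reachable only over one specific circuit.
Endpoint = Tuple[str, int]


def tcp_probe(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def select_circuit(
    circuits: Sequence[Tuple[str, Endpoint]],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> Optional[str]:
    """Return the name of the first healthy circuit in priority order, or None."""
    for name, endpoint in circuits:
        if probe(endpoint):
            return name
    return None
```

Passing the probe in as a parameter keeps the selection logic testable without a live network, which also makes it easier to rehearse a failover before one is needed.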
Another tip is to reduce the amount of data that consumes your network bandwidth for file transfers, synchronized copies, downloads, uploads and backups by implementing data footprint reduction (DFR). DFR includes archiving, compression, deduplication (dedupe), bandwidth optimization, buffering and caching, among other techniques.
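To get a feel for what two of these DFR techniques can save, here is a minimal sketch that applies fixed-size chunk dedupe and then compression to a payload and estimates the transfer time over a link. The function names, the 4 KB chunk size and the 32-byte reference cost are assumptions chosen for illustration; commercial WAN optimization products use their own, more sophisticated methods.

```python
import hashlib
import zlib


def reduced_size(data: bytes, chunk_size: int = 4096) -> int:
    """Estimate bytes left to send after chunk-level dedupe and zlib compression."""
    seen = set()
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            # Duplicate chunk: assume only a 32-byte reference is sent.
            total += len(digest)
            continue
        seen.add(digest)
        # First occurrence: send the chunk compressed.
        total += len(zlib.compress(chunk))
    return total


def transfer_seconds(num_bytes: int, link_mbps: float) -> float:
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return num_bytes * 8 / (link_mbps * 1_000_000)
```

Comparing `transfer_seconds(len(data), link_mbps)` with `transfer_seconds(reduced_size(data), link_mbps)` for a sample of your own backup or file-sync data gives a first approximation of how much link time DFR might free up.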
It can be tempting to go with lower-cost services. However, you will likely need more of them to meet your performance, bandwidth, latency and reliability objectives. If you pay more for a primary or standby service, ask what you are getting for it, both during normal periods and when standby services are needed. Determine how quickly problems will be diagnosed, troubleshot and repaired. Tell your WAN provider whether your traffic is primarily upload (transmitting) or download (receiving), and ask what fees and restrictions apply to primary or standby services.
Testing and training for network resiliency
These may seem like the most intuitive, commonsense or obvious items, and perhaps they are being addressed to some degree. However, you must ask yourself to what extent your network is being tested. Are you testing the hardware, software, server operating system, hypervisors (if applicable) and the entire network stack or solution, from networking services to load balancers to routers to switches? Likewise, is training, along with testing, compartmentalized or focused on a specific area? Does your team know how its area relates to the broader solution stack?
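One way to avoid compartmentalized testing is to run a check for every layer in a single pass and record the result for each. The sketch below assumes each layer's test can be wrapped in a function that returns True or False; the layer names and the helpers `run_stack_checks` and `first_failure` are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: a layer name paired with a function that tests that layer.
Check = Tuple[str, Callable[[], bool]]


def run_stack_checks(checks: List[Check]) -> Dict[str, str]:
    """Run every layer's check in order; a check that raises counts as an error."""
    results: Dict[str, str] = {}
    for layer, check in checks:
        try:
            results[layer] = "pass" if check() else "fail"
        except Exception as exc:
            results[layer] = f"error: {exc}"
    return results


def first_failure(results: Dict[str, str]) -> Optional[str]:
    """Return the first layer (in run order) that did not pass, or None."""
    for layer, status in results.items():
        if status != "pass":
            return layer
    return None
```

Because every check runs even after an earlier one fails, the report shows the state of the whole stack rather than stopping at the first problem, which helps the team see how its own area relates to the rest.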
Enabling your resilient WAN
There are many options for enabling a resilient WAN for ROBO (and other) environments. Some involve new technology or tools (hardware, software and network services). Others involve taking time for training and testing, and rethinking how existing tools and technologies are used. Finally, consider which WAN failure scenarios you want or need to protect your environment from.
Keep in mind that users do not know the difference between the LAN, MAN or WAN, or between hardware, software and network services. To them, it is simply that the network, system, computer, Internet or cloud is down. It's up to IT to ensure network resiliency.
This was first published in July 2013