Most enterprises already understand the complexities of managing multiple service providers. Most also realize that when cloud services are added to the mix, particularly where there are multiple
Cloud services and WAN services have a number of common characteristics. Both have performance, security and management issues; both involve service-level agreements (SLAs); and both introduce risks of finger-pointing when they're integrated with each other and with other IT elements. It's difficult not to address the two in parallel, so it's smart to plan on constructing a matrix to relate cloud and WAN provider features and capabilities.
Questions to construct your WAN/cloud provider matrix
The first question for a given WAN/cloud provider pair in the matrix is whether the cloud and WAN providers offer VPN connectivity, and if so, which of the two is the prime contractor. This should be verified for each combination of cloud and WAN provider, and it's also helpful to note how long the relationship between the two has existed. The longer the relationship, the more likely it is that the pair has worked out its support and diagnostic processes.
The second question is geographic scope. Look at each service geography in your map of locations and establish whether this particular WAN/cloud pairing supports that location at all, and if so, whether there are both cloud and WAN points of presence in the actual service area or whether backhaul will be required. If a WAN provider supports cloud access for an area from a point of presence located elsewhere, the longer connection can hurt performance and reliability. This is typically the most complex analysis because it has to be done for each major geography where your company expects to use cloud services.
The third question covers monitoring and management. Does the WAN/cloud combination provide for service monitoring at the point where the two connect? Lack of monitoring at that point makes problem determination and fault isolation more difficult. Performance and status monitoring are especially important in cloud networking because it's difficult to assess cloud performance at all without understanding how network performance is affecting it.
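The three questions above can be captured in a small data structure. What follows is a minimal Python sketch, not a prescribed tool: the names `PairEntry`, `build_matrix` and `shortlist` are hypothetical, and the shortlisting rule (require VPN connectivity, coverage of every needed geography and monitoring at the interconnect, then prefer fewer backhauled geographies and longer relationships) is an assumption made for illustration. Your own weighting may differ.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class PairEntry:
    """One matrix cell: answers to the three questions for a WAN/cloud pair."""
    vpn_offered: bool             # Q1: does the pair offer VPN connectivity?
    prime_contractor: str         # Q1: "wan" or "cloud" (hypothetical labels)
    relationship_years: float     # Q1: how long the two have worked together
    local_pops: dict              # Q2: geography -> True (local PoPs) or False (backhaul);
                                  #     a missing geography means "not supported"
    monitored_interconnect: bool  # Q3: monitoring at the point of connection?


def build_matrix(wan_providers, cloud_providers):
    """Create an empty cell for every WAN/cloud combination."""
    return {pair: None for pair in product(wan_providers, cloud_providers)}


def shortlist(matrix, required_geographies):
    """Return (pair, backhauled_geographies) for pairs passing the assumed filter."""
    keep = []
    for pair, entry in matrix.items():
        if entry is None:
            continue  # pair not yet assessed
        covers_all = all(geo in entry.local_pops for geo in required_geographies)
        if entry.vpn_offered and covers_all and entry.monitored_interconnect:
            backhauled = [g for g in required_geographies if not entry.local_pops[g]]
            keep.append((pair, backhauled))
    # Assumed ranking: fewer backhauled geographies first, then longer relationships.
    keep.sort(key=lambda item: (len(item[1]), -matrix[item[0]].relationship_years))
    return keep
```

Filling in one `PairEntry` per combination and calling `shortlist(matrix, ["EMEA", "APAC"])` would list the candidate pairs along with the geographies where each still depends on backhaul.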
Solidify SLAs for good WAN/cloud service management
When suitable candidates are picked from the basic matrix, the next step is to review the SLA options. Where the network provider is also the cloud provider, it's possible to get a combined SLA and unified problem management. This means that the SLA can focus on traditional issues like uptime and performance.
If you have separate network and cloud providers in any geography, you'll need to focus early SLA discussions on the specific mechanisms the parties agree on for performance standards and availability measurement. Where both parties look for management data and how they interpret the data is absolutely critical, and SLAs have to be based not on subjective user-level performance and availability, but on the conditions at these agreed monitoring points. Otherwise you'll have no chance of pinning down the problems and associating them with a provider.
You'll also want to have a very specific problem escalation procedure to define how something that can't be resolved through first-line support contacts will be passed upstream to higher levels of management, both in the providers' organizations and in your own. At the top of this escalation process is the arbitration process you've defined for final dispute resolution.
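To make the idea of basing SLAs on conditions at agreed monitoring points more concrete, here is a minimal Python sketch. It assumes that each agreed monitoring point produces simple up/down probe samples and that the SLA states an availability target per point. The point names, the sample format and the function names are all hypothetical.

```python
from collections import defaultdict


def availability_by_point(samples):
    """samples: iterable of (monitoring_point, is_up) tuples from agreed probes."""
    up = defaultdict(int)
    total = defaultdict(int)
    for point, is_up in samples:
        total[point] += 1
        if is_up:
            up[point] += 1
    # Fraction of probes that found each monitoring point available.
    return {point: up[point] / total[point] for point in total}


def sla_breaches(samples, targets):
    """Return {point: (measured, target)} for points measured below their target."""
    measured = availability_by_point(samples)
    return {
        point: (measured[point], target)
        for point, target in targets.items()
        if point in measured and measured[point] < target
    }
```

Because every figure is tied to a named monitoring point rather than to user-level impressions, a breach can be associated with whichever provider the SLA makes responsible for that point.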
When a provider set has been selected and SLAs and contracts are signed, the relationships transition into their operational period. Most users who successfully manage clouds containing multiple network and cloud service providers report that their support process starts with an internal help desk responsible for supporting the applications and their users. This help desk then filters problems for disposition. In hybrid applications where there is an enterprise component to both network and cloud, it's common to hand off suspected problems first to the internal network support or IT support group and have that group perform first-level problem isolation, contacting the providers involved as needed and as provided for in the SLA.
Where there is no internal IT or network component to the cloud, your own help desk support processes will have to interface with the providers. This will require some special training of your personnel and a carefully crafted operations manual to describe how problems are to be isolated and reported. One question to address in these procedures is how to "spread the alarm" in case of a problem. In many multiprovider clouds, issues created in one place may become visible on management interfaces elsewhere, both for the providers and for your own organization. Where that is possible, it's important that notification of the problem reaches every place it might become visible. If it doesn't, there's a high risk of the same problem launching expensive and disruptive parallel resolution processes.
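One way to picture the "spread the alarm" procedure is a shared incident board that notifies every party who might see a problem and folds later reports of the same problem into the existing incident. The Python sketch below is illustrative only: the `IncidentBoard` class, the party names and the idea of keying incidents by their point of origin are assumptions, and a real procedure would need a more careful way to decide that two reports describe the same problem.

```python
class IncidentBoard:
    """Tracks open incidents and who has been told about each one."""

    def __init__(self, visibility):
        # visibility: problem origin -> parties whose consoles may show the problem
        self.visibility = visibility
        self.open_incidents = {}  # origin -> incident record
        self.sent = []            # (party, origin) notifications, kept for illustration

    def report(self, origin, reporter):
        """Record a report; return (incident, is_new)."""
        incident = self.open_incidents.get(origin)
        if incident is not None:
            # Same problem seen elsewhere: attach the report, do not start a new process.
            incident["reporters"].append(reporter)
            return incident, False
        incident = {"origin": origin, "reporters": [reporter]}
        self.open_incidents[origin] = incident
        # Spread the alarm to every place the problem might become visible.
        for party in self.visibility.get(origin, []):
            self.sent.append((party, origin))
        return incident, True
```

With this arrangement, a second report of the same interconnect problem from the cloud provider's operations center would be attached to the open incident instead of launching a parallel resolution effort.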
The final point is that the old adage "there's safety in numbers" doesn't hold true in multiprovider network/cloud configurations. The complexity of these relationships tends to grow as the square of the number of providers involved. The bigger that number, the more critical it may be for you to be able to assign one provider as the integrator for the entire process. Doing this will save a lot of problem-determination grief later and improve overall uptime and user satisfaction.
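One way to see why complexity grows roughly with the square of the provider count is to count the provider-to-provider relationships that could need coordination. For n providers there are n(n-1)/2 possible pairs. The short Python sketch below illustrates that arithmetic. Treating every pair as a relationship to manage is a simplifying assumption, and the provider names are placeholders.

```python
from itertools import combinations


def provider_relationships(providers):
    """List every distinct pair of providers that might need to coordinate."""
    return list(combinations(providers, 2))


for count in (2, 4, 8):
    names = [f"provider-{i}" for i in range(1, count + 1)]
    print(count, "providers:", len(provider_relationships(names)), "pairwise relationships")
```

Doubling the number of providers roughly quadruples the number of pairs, which is the burden a single designated integrator is meant to absorb.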
This was first published in December 2012