Even though wide area network (WAN) managers can't control the Internet, their CIOs will blame them when cloud-based applications or infrastructure suffer significant performance lags. Like it or not, WAN managers
Before headlining two panels at Cloud Connect in Silicon Valley, Riverbed Technology's new cloud guru and former Amazon cloud pro, Steve Riley, spoke with SearchEnterpriseWAN.com about what IT managers should expect and demand from cloud providers in terms of visibility, service-level agreements (SLAs) and cloud performance.
Riley, technical leader in the office of the CTO, joined Riverbed in January after spending the past year and a half as senior technical program manager in Amazon Web Services. There, he focused on cloud security internally and externally—working with enterprises to secure their cloud infrastructure and building new methods to make Amazon's cloud more secure. Prior to Amazon, he worked at Microsoft for 11 years, specializing in security and telecom, and co-authored a book on securing Windows networks.
Network-layer visibility into cloud resources is obviously a big concern for enterprises, especially regarding cloud performance. What can WAN managers do? Will cloud providers be willing to provide a window into their networks?
Steve Riley: A lot of cloud providers have some basic visibility capabilities right now. Amazon has an optional service you can subscribe to called CloudWatch that allows you to monitor I/O, CPU … those kinds of things.
CloudWatch, though, is limited to what's happening in your virtual machine (VM) and your application. What I would like to see the cloud providers do is provide more visibility at the network level. I don't know of any cloud provider that's doing this now, but if I were an enterprise customer, I would like to be able to issue an API call to my cloud provider and say, "OK, how much network throttling is happening into the host where my VM lives? I'm not interested in all of the other customers' traffic. Don't tell me what other customers' traffic is, because that's a violation of privacy. I just want to know how the provider's network is shaping the traffic that's ultimately arriving at my VMs."
This is, again, something I'd like to see cloud providers do. So if this is something I can place into the minds of [enterprise IT managers] and they can start asking their cloud providers to provide something like this, [the cloud providers] will be a little more inclined to do that. Cloud providers tend to put forth products and services, sit back, watch what happens, wait for the customers to ask for things. If a large enough number of customers say, "Give me some logs, show me my own data coming into your environment," then a customer would be armed with greater information on how they might be able to tune their applications or tune the branch office side of the communications so that anything the cloud provider is doing doesn't interfere with what the branch office or enterprise wants to get done.
You plan to tell IT pros at Cloud Connect to advocate for better SLAs. What do you mean by that? How can you write an SLA for the Internet?
Riley: The SLA would necessarily exclude the Internet portion of that, unless you somehow manage to be a customer at least big enough to peer directly with the ISP of the cloud provider, but even then, you've still got the ISP in the middle. Two things concern me more about SLAs. For one, with many providers they're very, very difficult to understand. The language used to write an SLA can be challenging to parse. That's not true of all of them, but it is true of some. You have to read them very, very carefully. Secondly, the SLAs tend to mostly say, "If we make enough mistakes and you catch us, then we'll give you a little bit of refund on your next bill." Customers aren't interested in refunds. They're interested in some guarantees of availability, so I'd almost like to see providers start competing against each other with the SLAs.
One thing I've heard a number of [enterprises] ask for is some form of indemnity. Here's a customer talking: "OK, I've done everything you've told me. I've got multiple servers running in multiple geographic locations. I've got failover out the wazoo. Yet something happens and I have lost customers because my services were unavailable for X amount of time. That has resulted in a business loss of X millions of dollars. Cloud provider, what are you going to do about that?"
There are a number of up-and-coming firms that are actually offering to outsource indemnity—CloudInsure is one of these…. So, a cloud provider could, if they wanted to, actually offer insurance on the environment the customer deploys. If something goes really haywire and there's a monetary loss, the customer can make a claim. The cloud provider working through an [insurance firm] can transfer that risk over to insurance companies who are willing to underwrite this. This is very fascinating. I've only seen this in the last four or five months, so it's a relatively new idea. I do think it's going to take off though, and my suspicion is cloud providers who are willing to work with indemnity outsourcers are going to find there is a competitive advantage to having done so.
What did your experience at Amazon teach you about the challenges and possibilities for improving cloud performance?
Riley: It's interesting to me, having come from a cloud provider and seeing how there's a bit of a disconnect between the SLAs cloud providers offer versus the expectations the customers come in with. I believe there's a place for appropriately-deployed technology with good processes where higher visibility and cloud-aware monitoring tools can achieve [reliability and visibility].
When you move to the cloud, you're quite often reliant on the speed of the ISP that you're using. There's also a trend I've observed over the last couple of years that I call distributed recentralization. People are creating and accessing massive amounts of data. A lot of that data is being recentralized, whether it's in a public cloud, a private cloud or even a hybrid cloud. The data is moving to a smaller number of locations, but the number of connections coming in, the number of people wanting to get to that data and the sheer volume of the data itself are just growing exponentially. So, what we're seeing is that [enterprises] are experiencing unpredictable [cloud performance] when they move some workloads from [their] premises onto a cloud because they don't have control over the network links anymore.
Sometimes, the cloud provider itself might have oversubscription issues—depending on who the cloud provider is. It's really difficult to get a handle on how that application is going to behave, and for mission-critical applications, [it] can be a problem if the business can't predict in advance how something's going to behave.
By deploying a Cloud Steelhead in front of all of your cloud instances and using a Steelhead at the branch or even a mobile Steelhead on the client, we can do a lot to lower the unpredictability of the latency and accelerate those links, especially with all of the protocol awareness that is in Steelhead and the ability to reduce the amount of chatter that is in file servers and Web servers and other kinds of protocols. Those are the deployment issues we're looking to solve.
Service providers have begun to offer WAN optimization services from their clouds, and some WAN managers are interested in buying WAN optimization-as-a-service from cloud providers as a checkbox item. What is Riverbed's take on this?
Riley: There are two ways you can approach WAN optimization: symmetric and asymmetric. Symmetric WAN optimization needs some kind of optimizing device on each end of the connection. Asymmetric uses an optimizer only on the end closest to the data. Riverbed firmly believes in a symmetric approach. That way you can get the absolute best optimization that's possible. So satisfying the optimization-as-a-service offering does mean that whatever the cloud provider offers, there will need to be an equivalent match the customer can deploy in their branch office, in their data center if they're bursting into the cloud, or on mobile devices.
In Cloud Steelhead … it's not as easy as going to Amazon and saying, "Start up a Cloud Steelhead instance." You actually log into a portal at Riverbed, indicate that you want a Cloud Steelhead and from there you are directed to Amazon's website. There's some orchestration in the back end that sets that up and configures your cloud servers to interact with that Cloud Steelhead, so it's almost a checkbox except there are two management panes where you do that—one to initiate and the second one to do the ongoing maintenance of that. We're looking at ways that we can simplify that a little bit and make that an easier user experience. We haven't quite determined what that's going to be.
[With an approach that says], "Give me WAN optimization in the cloud and I don't have to do anything on my side: my branch, my data center or my mobile devices," I don't think you're going to get as good an optimization that way. If the branch can't communicate with the cloud and set up an inner channel ahead of time so that both sides can optimize the traffic in both directions, you're only optimizing in one direction, and optimizing in both directions is what's critical. That's going to be a poor user experience. Think for a moment about what I said earlier about distributed recentralization. Yes, that content is often being stored in the cloud, so if you're retrieving it, that's fine. But how is it initially getting there? If there's no optimization happening on the origination side, then you're not getting the best return on your investment.
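Editor's illustration: Riley's point about optimizing in both directions can be sketched with a toy model. The Python below assumes one common WAN optimization technique, chunk-level deduplication, in which each endpoint remembers data it has already seen and sends a short reference instead of repeating it. This is not a description of how Steelhead works internally. The `Optimizer` class, the `CHUNK` size, the 8-byte digest and the `wire_bytes` helper are all hypothetical names and values chosen for the example.

```python
import hashlib

CHUNK = 64       # hypothetical fixed chunk size, in bytes
DIGEST_LEN = 8   # hypothetical reference size; truncated for brevity


def _digest(chunk: bytes) -> bytes:
    """Short fingerprint used as the on-the-wire reference to a chunk."""
    return hashlib.sha256(chunk).digest()[:DIGEST_LEN]


class Optimizer:
    """Toy optimizer endpoint that remembers every chunk it has handled."""

    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}  # fingerprint -> chunk

    def encode(self, data: bytes) -> list[tuple[str, bytes]]:
        """Sender side: replace chunks already in the store with references."""
        tokens = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            key = _digest(chunk)
            if key in self.store:
                tokens.append(("ref", key))
            else:
                self.store[key] = chunk
                tokens.append(("raw", chunk))
        return tokens

    def decode(self, tokens: list[tuple[str, bytes]]) -> bytes:
        """Receiver side: rebuild the data and learn any new chunks."""
        parts = []
        for kind, value in tokens:
            if kind == "raw":
                self.store[_digest(value)] = value
                parts.append(value)
            else:
                parts.append(self.store[value])
        return b"".join(parts)


def wire_bytes(tokens: list[tuple[str, bytes]]) -> int:
    """Bytes that would cross the WAN for a list of tokens."""
    return sum(len(value) for _, value in tokens)


# Symmetric setup: one optimizer at the branch, one in front of the cloud.
branch, cloud = Optimizer(), Optimizer()
payload = bytes(range(256)) * 4

upload = branch.encode(payload)        # branch -> cloud
stored = cloud.decode(upload)
download = cloud.encode(stored)        # cloud -> branch, same content
restored = branch.decode(download)

print(len(payload), wire_bytes(upload), wire_bytes(download))
```

In this toy model the download can be sent almost entirely as references only because the cloud-side optimizer learned the chunks during the upload and the branch-side optimizer kept them too. Remove either endpoint and one direction has to carry the raw data again, which is the asymmetry Riley describes.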
Let us know what you think about the story; email: Jessica Scarpati, News Writer.