To date, the industry has not directly approached the question of an ROI for QoS. It is not even obvious that such an analysis is needed. In the face of hype and political spin (witness the ongoing confusion over "net neutrality"), it is worth stepping back and asking a more basic question.
What is Quality of Service?
To start, let us answer with what it is not:
- It is not a means of increasing throughput or capacity.
- It is not a means of improving overall performance.
- It is not a way of fixing broken or under-performing networks.
- It is not a switch that can be thrown to make certain applications work better.
- It is not identical with Quality of Experience (QoE) of the end user.
In contrast, let us consider what it is:
- It is a means of handling packets non-uniformly.
- At worst, it reduces overall capacity of a network path.
- At best, it allows a preferred application to see the network as if it were empty.
- Typically, it is an end-to-end mechanism that must be supported by all devices on the path.
- It can have a significant effect on the QoE of the end user only if the network is the dominant influence on QoE (i.e., the network is the weakest link in the chain).
- It is the best solution when there is limited capacity and no cost-effective alternative.
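The "non-uniform handling" in the first point above can be sketched as a strict-priority queue. This is a toy model, not any particular vendor's scheduler: the preferred class always transmits first, so it sees the network as if it were empty, while best-effort traffic absorbs all the waiting.

```python
# Toy strict-priority scheduler: lower priority number = transmitted first.
# Illustrative only; real QoS schedulers are far more elaborate (policing,
# shaping, weighted fairness), but the core idea of non-uniform handling
# is the same.
import heapq

class PriorityScheduler:
    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker preserves FIFO order within a class

    def enqueue(self, priority, packet):
        heapq.heappush(self._queue, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        # The highest-priority packet always goes out first; under load,
        # best-effort traffic may wait indefinitely.
        return heapq.heappop(self._queue)[2] if self._queue else None

sched = PriorityScheduler()
sched.enqueue(1, "bulk-1")   # best effort
sched.enqueue(0, "voip-1")   # preferred class (e.g., VoIP)
sched.enqueue(1, "bulk-2")
sched.enqueue(0, "voip-2")

order = []
while (p := sched.dequeue()) is not None:
    order.append(p)
print(order)  # preferred packets drain first; bulk traffic waits
```

Note what the model does not do: it adds no capacity. The preferred class wins only because the best-effort class loses, which is exactly the "at worst, it reduces overall capacity" trade-off above.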
Oversimplifying somewhat, QoS can be pictured with a highway metaphor, where a preferred service is like the reserved high-occupancy vehicle (HOV) lane. The lucky car that travels in the HOV lane does not experience the congestion in the other lanes, but it can go only as fast as an empty highway would allow. That is, if there are problems with the highway itself that would have slowed a lone car (e.g., icy surfaces, fog, potholes), then the HOV lane won't improve the experience or increase top speed.
The primary alternative to QoS is over-provisioning -- providing an excess of capacity relative to average or peak usage. In the highway metaphor, this looks like adding more lanes until each vehicle travels as if it were effectively alone. Again, the experience of each vehicle will be no better than if the highway were completely empty. Over-provisioning is typical for many familiar resources, including computer CPU and disk space, vehicle volume and engine power, seating in an auditorium, highway and street size, parking capacity and so on. They are all assigned in terms of peak capacity, and almost any substantially less-than-peak use tends to work well without need for additional management.
The strike against over-provisioning is that it carries a well-defined price tag. Implementing additional capacity involves specific levels of capital expenditure (capex) and ongoing service charges (opex). These costs, particularly the opex, enter into the decision making and tend to bias organizations away from this approach when the alternative appears to be simply to "switch on QoS" to achieve the desired results.
The choice to "switch on QoS" is not always well considered, however. For example, there are myriad implicit capex and opex costs associated with the deployment of QoS. The requirements for a successful deployment are not usually laid out clearly. Instead, obstacles encountered in the process of deployment steadily eat into the ROI.
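To see how implicit costs can change the picture, consider a toy five-year comparison. All figures below are invented purely for illustration; the point is the shape of the comparison, not the numbers.

```python
# Illustrative only -- every figure here is made up to show how implicit
# opex can erode the apparent savings of "switching on QoS".
def five_year_cost(capex, annual_opex, years=5):
    """Total cost of ownership over a fixed horizon."""
    return capex + annual_opex * years

# Over-provisioning: an explicit, well-defined price tag (extra bandwidth).
overprovision = five_year_cost(capex=50_000, annual_opex=24_000)

# QoS: small visible capex, but implicit opex (design effort, tuning,
# troubleshooting, specialist staff) accumulates every year.
qos = five_year_cost(capex=10_000, annual_opex=35_000)

print(f"over-provisioning: ${overprovision:,}")  # $170,000
print(f"QoS:               ${qos:,}")            # $185,000
```

The explicit price tag looks worse up front, but once the hidden recurring costs are counted, the comparison can flip. Which way it flips depends entirely on the context, which is the article's point.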
To successfully deploy QoS, the implementation must be:
- end-to-end (apart from special cases such as satellite links or wireless)
- robust and reliable
- converged, serving the differing performance requirements of various application types
- supported by agreement across service provider domains
- simple and scalable
In special cases where the use is limited, QoS can be set up and managed effectively. But once it grows beyond the work that a single engineer can handle, it becomes fragile. It takes only one device that does not honor the TOS/DSCP bits to break the end-to-end chain of service. When multiple service provider domains are involved, a high degree of coordination is required, both in terms of business agreements and technical implementation.
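Marking itself is the trivial part. As a sketch (assuming Linux, where `socket.IP_TOS` is available; other platforms may need different socket options), an application can set the DSCP bits on its own traffic in a few lines:

```python
# Sketch: marking outgoing packets with a DSCP value. Sending the bits is
# easy; whether every device on the path honors them is the hard part.
import socket

EF = 46                 # DSCP "Expedited Forwarding", commonly used for VoIP
tos_byte = EF << 2      # DSCP occupies the upper 6 bits of the ToS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte)

# Outgoing datagrams from this socket now carry DSCP 46 -- but a single
# router that ignores or rewrites the field breaks the end-to-end chain
# of service described above.
sock.close()
```

Contrast the three lines above with the coordination required to have that marking respected hop by hop across multiple administrative domains; that asymmetry is where the fragility comes from.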
Getting QoS right is not straightforward. QoS is not a "switch" that is simply thrown. It needs to be tailored to the network, the application(s) in use, and the requirements of the users. There is no single turnkey solution. QoS tuning is a high art that requires smart engineers with a good feel for their network.
So where do the implicit costs appear in the ROI analysis for QoS?
- Cost of implementation
  - Capital investments
  - Design effort to address increased complexity
  - Discovering the idiosyncrasies of different vendor implementations
- Cost of experimentation/tuning
  - Customer churn and/or user productivity impact
  - "Black art" requires specialists that are not scalable as resources
  - Lab and/or simulation infrastructure for testing
- Limited means to validate implementation
  - Difficult to duplicate real-world conditions in the lab
- Cost of maintenance
  - Difficult to validate intended performance: leads to user complaints
  - Support center has limited diagnostic capability
  - Troubleshooting QoS is time consuming
  - Truck rolls are expensive
- Service provider overhead
  - Maintaining multiple mechanisms for customers
  - Developing complex inter-domain contracts
  - Reliance on SLAs
  - Responding to support calls
Framed like this, the apparent cost of QoS can be quite high, depending on the context where it is deployed. As the complexity of networks increases, the simplicity and reliability of over-provisioning begin to look more attractive, despite the explicit cost of additional bandwidth. Further, the cost of bandwidth is in near freefall, with orders of magnitude being cut from service costs. Optical networks offer enormous potential capacity and make good sense even for residential consumers -- providers like Verizon are investing heavily in fiber to the premises. And with the costs of copper GigE desktop NICs and per-port switch capacity dropping, it is harder to see the imperative for QoS.
Some might suspect that there is more at issue than just the technological considerations...
Needless to say, there certainly are instances where QoS is absolutely essential and the very best design decision. A simple rule of thumb: QoS is appropriately deployed
- On otherwise well-performing networks;
- When applications have specific performance requirements;
- Where capacity is required but scarce;
- Where it is prohibitively expensive to increase capacity.
Wherever any of these four conditions fails to hold, you should be considering an alternative, such as over-provisioning: quick, simple, scalable, reliable and priced right.
Chief Scientist for Apparent Networks, Loki Jorgenson, PhD, has been active in computation, physics and mathematics, scientific visualization, and simulation for over 18 years. Trained in computational physics at Queen's and McGill universities, he has published in areas as diverse as philosophy, graphics, educational technologies, statistical mechanics, logic and number theory. Also, he acts as Adjunct Professor of Mathematics at Simon Fraser University where he co-founded the Center for Experimental and Constructive Mathematics (CECM). He has headed research in numerous academic projects from high-performance computing to digital publishing, working closely with private sector partners and government. At Apparent Networks Inc., Jorgenson leads network research in high performance, wireless, VoIP and other application performance, typically through practical collaboration with academic organizations and other thought leaders such as BCnet, Texas A&M, CANARIE, and Internet2. www.apparentnetworks.com
This was first published in November 2006