A good deal of attention is being paid these days to the analysis of NetFlow information exported from routers and switches. And rightly so: by revealing details of the network's application "conversations," NetFlow provides a powerful information base for congestion monitoring, troubleshooting, capacity planning and even security analysis.
But despite its rich data, NetFlow analysis today is limited because it looks at the network in discrete pieces, presenting separate views of each link's data. This helps in understanding issues closely related to a particular link, but sheds no light on traffic dynamics across multiple links – such as changes that affect a flow's end-to-end path. Analyzing flow data across all network links, meanwhile, remains impractical.
Network-wide traffic analysis is difficult for two reasons. First is the collection challenge: since NetFlow records are collected on a per-interface basis, getting a network-wide view would require turning on NetFlow at most, if not all, network interfaces. Large networks have hundreds, even thousands, of interfaces. Even with ever-cheaper processors and storage, this means collecting, backing up and analyzing massive amounts of data.
The second reason is that the dynamic nature of routing, which continually changes the set of links comprising a flow's path, has historically made it impossible to maintain an up-to-the-minute Layer 3 routing map. Without such a map, per-link flow records cannot be tied to the path each flow actually took.
The implications can be illustrated by an automotive traffic analogy. A traffic cop at a street corner notices that vehicles have slowed to a crawl. Looking down the two intersecting streets, he might see that a stalled car is causing the congestion; clearing the car from the road solves the problem. But if the backup is caused by something he can't see from his limited vantage point – an accident two exits down the freeway, or a ball game letting out across town – he can only guess at the cause. The policeman could stop each car, ask where it came from and where it's going, and have other officers at other street corners do the same. He might then be able to make a more educated guess at what caused the congestion. But even if he could approximate a map of the road system in his head and try to mentally process the clues gathered from continuous driver interviews, the answer would likely still be elusive. Furthermore, long before any useful analysis could be completed, the traffic jam would probably have cleared up on its own.
Network engineers find themselves in just such a predicament when troubleshooting many application slowdowns, even with rich NetFlow data to tell them traffic volume, source and destination, etc. Current NetFlow analysis only reaches the low-hanging fruit: link congestion resulting from obvious causes like violations of company policies, best practices or common sense (e.g., the nightly backup mistakenly blasting away at 9:00 a.m. or the compromised computer mounting a DoS attack). But it does nothing to address the thornier problems: intermittent slowdowns, hidden router misconfigurations, trouble tickets closed with "cause not determined," and "illogical" outages that cause hours of unproductive scrambling while downtime burns up the bottom line. In many such cases, network engineers must examine data set after data set in tedious sequence, with only a fuzzy "guesstimate" of the network's current topology to guide the search for clues.
A new technology called "route-flow fusion" – which combines NetFlow analysis with route analytics – may bring much-needed relief. Route analytics solutions record all routing protocol updates and create a "live model" of the routing topology that is an accurate representation of the actual network. This "live model" allows engineers to analyze network-wide routing and topology, including a complete historical record of all routing changes over time. They gain the global visibility they previously lacked in their efforts to understand the network's behavior. In the automotive traffic analogy, route analytics would be a satellite in the sky that sees the condition of every road and bridge (including which bridges are out) and knows the exact route each motorist takes from starting point to destination.
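To make the "live model" idea concrete, here is a minimal sketch in Python, assuming routing updates have already been reduced to simple timestamped link-up and link-down events. The event tuple layout and the `topology_at` function are hypothetical names chosen for illustration; a real route analytics product parses actual routing protocol messages and is far more involved.

```python
def topology_at(events, when):
    """Rebuild the routing topology as it stood at time `when`.

    `events` is an iterable of (timestamp, action, node_a, node_b, cost)
    tuples, where action is "up" or "down". This layout is an assumption
    made for the sketch, not a real routing protocol format.
    Returns {node: {neighbor: cost}} with links treated as bidirectional.
    """
    graph = {}
    for ts, action, a, b, cost in sorted(events, key=lambda e: e[0]):
        if ts > when:
            break  # ignore everything recorded after the requested time
        if action == "up":
            graph.setdefault(a, {})[b] = cost
            graph.setdefault(b, {})[a] = cost
        elif action == "down":
            graph.get(a, {}).pop(b, None)
            graph.get(b, {}).pop(a, None)
    return graph


# Hypothetical recorded history: two links come up, one later fails.
events = [
    (0, "up", "r1", "r2", 10),
    (5, "up", "r2", "r3", 10),
    (20, "down", "r1", "r2", None),
]

before_failure = topology_at(events, 10)
after_failure = topology_at(events, 30)
```

Because every event is kept, the same history can answer both "what does the network look like now?" and "what did it look like when the trouble ticket was opened?"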
Using route analytics as its foundation, route-flow fusion advances NetFlow analysis in two ways. It reduces the number of required NetFlow-monitoring points to a small set of routers that represent the major sources of network traffic (e.g., data centers, Internet peerings), and it accurately maps every traffic flow across all links in its path from source to destination. Because NetFlow data and routing events are continuously recorded and correlated, route-flow fusion provides real-time and historically accurate views of network-wide routing and traffic conditions – including the impact of any routing change on traffic loads across all links.
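The mapping step can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not any vendor's implementation: the topology is a plain dictionary of link costs, each flow follows a single lowest-cost path (no equal-cost multipath or policy routing), and the names `shortest_path` and `link_loads` are hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: cost}}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None  # destination unreachable in this topology
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def link_loads(graph, flows):
    """Project flows seen at a few collection points onto every link they cross.

    `flows` is a list of (source, destination, volume) tuples, standing in
    for aggregated NetFlow records. Returns {(from_node, to_node): volume}.
    """
    loads = defaultdict(int)
    for src, dst, volume in flows:
        path = shortest_path(graph, src, dst)
        if path is None:
            continue
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += volume
    return dict(loads)


# Hypothetical topology: a data center reaches a branch via r1 (cheap) or r2.
graph = {
    "dc": {"r1": 1, "r2": 1},
    "r1": {"dc": 1, "branch": 1},
    "r2": {"dc": 1, "branch": 5},
    "branch": {"r1": 1, "r2": 5},
}
# Flows measured only at the data center, the major traffic source.
flows = [("dc", "branch", 400), ("dc", "r2", 100)]

loads = link_loads(graph, flows)
```

Flow records are collected at just one router here, yet the routing model yields a load figure for every link the traffic crosses. Rerunning `link_loads` after a routing change shows how that change shifts traffic across the network.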
The implications for network management best practices are profound. Since only a few collection points are required, network-wide traffic visibility with accurate flow mapping becomes a practical reality. The ability to analyze a complete forensic history of all routing and traffic flows eliminates the troubleshooting "guessing game" commonly played in large networks today. An accurate understanding of the network's actual routing and traffic state gives engineers a powerful tool for modeling planned changes (e.g., adding or dropping routers, LAN/WAN links, Internet peerings; deploying new applications), analyzing failure scenarios, planning network-wide capacity requirements or optimizing traffic routes. When combined with sound change-control processes, this real-time modeling capability can drastically reduce costly and time-consuming network configuration errors.
Alex Henthorn-Iwane is Senior Director of Marketing at Packet Design, based in Palo Alto, California.