What do router networks and a preschool have in common? A lot more than you think. Read on for the answer.
To the average enterprise, “network” means “router network”. It’s not that there aren’t other things in the network, but that the whole of enterprise networking is about building IP connectivity. We’ve invented a bunch of terms to describe the elements of our IP networks, and it seems like we’re adding new ones every day. As we do, a growing number of enterprises are finding that they don’t know as much about their networks’ operation as they need to; they don’t have “observability”.
That’s a term that’s been defined by so many sources that the definitions are meaningless. Let’s cut through the hype and focus on a term that’s found within many of those definitions, the concept of trace. Trace implies a path, a relationship, and that’s what networks should be all about.
A network is made up of boxes, and network management and monitoring have tended to focus on the behavior of these boxes as an indicator of the state of the network. All boxes A-OK? Network OK. This same view is pervasive in application management; the sum of the state of the pieces equals the state of the whole. What IT ops people found was that this seemingly obvious approach missed the critical point of message flow. You have to trace how work moves through a series of components to understand how an application is working. The same, it turns out, goes for a network, because a network isn’t a box or even just a collection of boxes, it’s a cooperative.
Now for that opening question. Your network is a bit like a room full of preschoolers because it’s barely controlled disorder. You can tell preschoolers what to do, organize group activities, and so forth, but inside each kid is a little self-gratifying gremlin that can run off and do something unexpected. And guess what? Almost all IP networks are collections of willful gremlins.
Individual routers discover routes to move traffic using adaptive behavior. Every router typically advertises the network destinations it can reach, and it also receives advertisements from other routers and forwards them to every adjacent router. From this, the routers pick the “best” route, and if something breaks or gets congested, they work out a new topology through a process often called convergence. Is that new topology optimum? Think of preschoolers working out their own lesson plan.
Since routes are created from reachability data exchanged with partner devices, it takes time for changes to percolate through their partners and their partners’ partners, and so forth, and for everyone to pick out what’s best. While this is going on, it’s possible to have packets take erratic routes, even to hit a dead end. Then when the process is finished, whether what’s happened yields truly optimum routes is an open question.
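To make the percolation concrete, here is a minimal sketch of a distance-vector style exchange, assuming a made-up four-router topology and link costs. The function name `converge` and the topology are illustrative only; real routing protocols add timers, loop prevention, and much more.

```python
def converge(links):
    """Simulate synchronous rounds of reachability advertisements.

    links: {(router_a, router_b): cost} for bidirectional links (hypothetical data).
    Returns (tables, rounds) where tables[router][dest] = (cost, next_hop).
    """
    nodes = sorted({n for pair in links for n in pair})
    neighbors = {n: {} for n in nodes}
    for (a, b), cost in links.items():
        neighbors[a][b] = cost
        neighbors[b][a] = cost

    # Each router starts out knowing only how to reach itself.
    tables = {n: {n: (0, n)} for n in nodes}
    rounds = 0
    changed = True
    while changed:
        changed = False
        # Snapshot the tables so every router advertises what it knew
        # at the start of the round, not what it learned during it.
        snapshot = {n: dict(t) for n, t in tables.items()}
        for router in nodes:
            for nbr, link_cost in neighbors[router].items():
                for dest, (cost, _) in snapshot[nbr].items():
                    candidate = link_cost + cost
                    known = tables[router].get(dest)
                    if known is None or candidate < known[0]:
                        tables[router][dest] = (candidate, nbr)
                        changed = True
        if changed:
            rounds += 1
    return tables, rounds


# Toy topology (assumed for illustration): a cheap chain A-B-C-D plus a costly A-D link.
links = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1, ("A", "D"): 5}
tables, rounds = converge(links)
print(rounds, tables["A"]["D"])
```

On this toy topology, router A first reaches D over the direct cost-5 link and only settles on the cheaper path through B after three rounds of exchanges. That interval is when packets can take erratic routes.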
How do you know what routes your packets take? There’s a standard utility, traceroute (tracert on Windows), that can tell you, and some router vendors build packet-tracing tools into their management systems to help visualize routes within your network. There are also third-party tools from network-monitoring companies that will do the same thing. They’re particularly helpful in multi-vendor networks where a particular vendor’s tool might not work.
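If you want to work with traceroute results yourself, a small parser is a reasonable starting point. The sketch below assumes output shaped like the common Linux traceroute format; the exact layout varies by platform and options, and the sample text uses documentation-reserved addresses rather than a real capture.

```python
import re

# Assumed line shape: "<ttl>  <host> (<ip>)  <rtt> ms ..." or "<ttl>  * * *".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
ADDRESS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)|\b(\d{1,3}(?:\.\d{1,3}){3})\b")
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_traceroute(text):
    """Turn traceroute text into a list of hop dictionaries."""
    hops = []
    for line in text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skips the header line, which does not start with a hop number
        ttl, rest = int(match.group(1)), match.group(2)
        addr = ADDRESS.search(rest)
        address = (addr.group(1) or addr.group(2)) if addr else None
        rtts = [float(value) for value in RTT.findall(rest)]
        hops.append({"ttl": ttl, "address": address, "rtts_ms": rtts})
    return hops


# Illustrative sample, not output from a real network.
SAMPLE = """traceroute to 203.0.113.9 (203.0.113.9), 30 hops max, 60 byte packets
 1  192.0.2.1 (192.0.2.1)  0.412 ms  0.398 ms  0.377 ms
 2  * * *
 3  198.51.100.7 (198.51.100.7)  9.120 ms  9.004 ms  8.951 ms
"""

for hop in parse_traceroute(SAMPLE):
    print(hop["ttl"], hop["address"], hop["rtts_ms"])
```

A hop that prints `None` for its address is one that did not answer the probes, which is common and not necessarily a fault.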
The thing to look for in a packet trace is a route that doesn’t seem to have any logic behind it, or one that keeps changing when there’s no visible device or network failure. Either of these conditions may be due to congestion, which can cause packet loss or delay. To figure out what’s happening, you start with the packet trace end-to-end and follow it along, looking for devices or connections that are overloaded or subject to a high error rate.
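Both symptoms can be checked mechanically once you have a series of traces. The following sketch assumes each trace has already been reduced to a list of hop addresses, plus a typical round-trip time per hop; the function names and sample values are hypothetical.

```python
def path_changes(traces):
    """Count how often the path differs between consecutive traces."""
    return sum(1 for prev, cur in zip(traces, traces[1:]) if prev != cur)


def worst_hop(hops):
    """Find the hop with the largest round-trip-time jump over the previous hop.

    hops: list of (address, rtt_ms) in path order.
    Returns (address, increase_ms), or None for an empty path.
    """
    worst = None
    previous_rtt = 0.0
    for address, rtt in hops:
        increase = rtt - previous_rtt
        if worst is None or increase > worst[1]:
            worst = (address, increase)
        previous_rtt = rtt
    return worst


# Hypothetical traces collected a few minutes apart between the same endpoints.
history = [
    ["r1", "r2", "r4"],
    ["r1", "r3", "r4"],
    ["r1", "r2", "r4"],
    ["r1", "r2", "r4"],
]
print(path_changes(history))

# Hypothetical per-hop timings for one trace.
print(worst_hop([("192.0.2.1", 0.4), ("198.51.100.7", 9.0), ("203.0.113.9", 9.5)]))
```

Treat the latency jump as a hint about where to look rather than proof. Per-hop times in a trace reflect how quickly each router answers probes, which is not always how quickly it forwards traffic.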
Don’t expect to get a solid answer from the packet trace alone. It should show where your route seems to be going awry, but remember that every router gets reachability data from neighbors so the fault may lie elsewhere. A complete route map, the output of those specialized tools for packet-trace visualization, is helpful here if you can get trace data from multiple network endpoints at the same time.
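A rough version of that route map can be assembled by merging traces from several endpoints and counting how many paths share each link. This is a sketch under simple assumptions: the site names and router labels are invented, and an unanswered hop is recorded as `None`.

```python
from collections import Counter


def build_route_map(traces):
    """Count how many traced paths cross each router-to-router link.

    traces: {endpoint_name: [hop, hop, ...]} with None for hops that did not reply.
    """
    links = Counter()
    for path in traces.values():
        for a, b in zip(path, path[1:]):
            # Only count a link when both ends are known; bridging over a
            # silent hop would invent an adjacency that may not exist.
            if a is not None and b is not None:
                links[(a, b)] += 1
    return links


# Hypothetical traces taken from three sites toward the same data center.
traces = {
    "site_a": ["r1", "r2", "r4"],
    "site_b": ["r3", "r2", "r4"],
    "site_c": ["r5", None, "r4"],
}
route_map = build_route_map(traces)
for link, count in route_map.most_common():
    print(link, count)
```

Links that carry many of the traced paths are the first places to check for overload, since a problem there shows up in every flow that crosses them.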
In this case, knowledge isn’t power, though, no matter what the old saw says. There’s a difference between just watching a network and running it, just like there’s a difference between watching a football game and calling the plays. Netops is about controlling and not just knowing. The starting point in traffic management is to examine your router policies to see whether you’re picking routes correctly, but sometimes even controlling routing policies won’t get your flows going along the routes you want. If that’s the case, you have a traffic-management issue to address. The best tools to add traffic management capability are MPLS and SDN.
MPLS lets you build routes by threading an explicit label-switched path through the routers. SDN eliminates the whole concept of adaptive routing and convergence by having a central controller maintain a global route map that it gives to each SDN switch, and that it updates in response to failures or congestion. If your network consists of a VPN service and a complicated LAN, SDN is likely the better option. If you actually have a complex router network, MPLS is likely the right choice. With either MPLS or SDN, you know where your flows are because you put them there.
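The central-controller idea can be sketched in a few lines: compute a path over the controller's global view, hand each switch its next hop, and recompute when a link fails. The switch names, costs, and helper functions below are assumptions for illustration and do not correspond to any particular SDN controller's API.

```python
import heapq


def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: cost}}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def flow_entries(graph, src, dst):
    """Per-switch forwarding entries for one flow: {switch: next_hop}."""
    path = shortest_path(graph, src, dst)
    return dict(zip(path, path[1:])) if path else {}


def without_link(graph, a, b):
    """Copy of the graph with the a-b link removed in both directions."""
    return {n: {m: c for m, c in nbrs.items() if {n, m} != {a, b}}
            for n, nbrs in graph.items()}


# Hypothetical four-switch topology known to the controller.
graph = {
    "s1": {"s2": 1, "s3": 2},
    "s2": {"s1": 1, "s4": 1},
    "s3": {"s1": 2, "s4": 2},
    "s4": {"s2": 1, "s3": 2},
}
print(flow_entries(graph, "s1", "s4"))
print(flow_entries(without_link(graph, "s2", "s4"), "s1", "s4"))
```

The point of the design is that no switch negotiates anything. The controller decides once, so the path a flow takes is whatever the controller's map says it is.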
There’s also the option of virtual networking, if neither MPLS nor SDN seems to fit your needs. Almost all the major network vendors offer virtual networks that use a second routing layer, and by putting virtual-network routers at critical places you can create explicit routes for your traffic. Some SD-WAN products will also support this. It may also be possible to use policy management to control how routes and route changes are calculated. Virtual networks are especially valuable if you have multiple paths between remote sites or the cloud and data centers. You can use a virtual network to pick the best path, or to divide traffic across multiple options, like a VPN and the internet.
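Dividing traffic across options usually means pinning each flow to one path so its packets stay in order, while spreading flows in some chosen proportion. Here is a minimal sketch of that idea using a hash of a flow identifier; the path names, the 3-to-1 weighting, and the flow-identifier format are all assumptions, and real products use their own selection logic.

```python
import hashlib
from collections import Counter


def pick_path(flow_id, weighted_paths):
    """Deterministically map a flow to one path, in proportion to the weights.

    flow_id: any string identifying the flow (hypothetical format).
    weighted_paths: list of (path_name, positive_integer_weight).
    """
    total = sum(weight for _, weight in weighted_paths)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % total
    for name, weight in weighted_paths:
        if slot < weight:
            return name
        slot -= weight


# Assumed policy: send roughly three flows over the VPN for every one over the internet.
paths = [("vpn", 3), ("internet", 1)]
flows = [f"10.0.0.{i % 250}:{40000 + i}->198.51.100.10:443" for i in range(1000)]
usage = Counter(pick_path(flow, paths) for flow in flows)
print(usage)
```

Because the choice depends only on the flow identifier, the same flow always lands on the same path, which keeps its packets from being reordered across links with different delays.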
Don’t forget the control dimension of observability. A teen watching siblings play in the mud might be surprised when parents protest, “I thought I told you to watch them!” Well, the teen was doing just that! And that’s the weakness of observability. Make sure you can do something with your new-found flow and route knowledge or your network may still end up behaving like a room full of preschoolers.