Experiences Monitoring Backbone IP Networks
Chuck Fraleigh*, Christophe Diot+, Sue Moon+, Philippe Owezarski+, Dina Papagiannaki#, Fouad Tobagi*
* Department of Electrical Engineering, Stanford University
+ Advanced Technology Labs, Sprint
# Department of Computer Science, University College London
Network traffic measurements provide essential data for networking research and network operation. Obtaining such data from an operational IP network, however, is not an easy task. The traffic volume on commercial backbone networks ranges from tens of Mb/s on OC-3 access links to several Gb/s on backbone OC-48 and OC-192 links. Furthermore, the network contains hundreds of links, making exhaustive monitoring of the network impractical. Finally, the traffic carried on the network can exhibit anomalous phenomena resulting from routing loops, incorrect protocol implementations, and even malicious attacks.
We present our experiences with monitoring traffic on the Sprint IP backbone network. We have installed passive packet monitors on selected OC-3 and OC-12 links at several locations in Sprint's network. Using the DAG OC-3/OC-12 card developed at the University of Waikato, these systems capture the first 44 bytes of each packet transmitted on the links. Once trace collection is complete, the data is transferred back to our lab for offline analysis.
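Since, as noted in the results below, the systems record 64 bytes of trace data per packet, each packet effectively occupies a fixed-size record combining a GPS-derived timestamp with the 44 captured header bytes. The sketch below shows one plausible layout for such a record; the field names, types, and offsets are our own illustration, not the actual DAG record format.

    /*
     * Illustrative sketch only: one plausible layout for a 64-byte trace
     * record holding a GPS-derived timestamp and the first 44 bytes of a
     * packet.  Field names and offsets are assumptions, not the DAG format.
     */
    #include <stdint.h>

    #define SNAP_LEN 44   /* bytes of each packet that are captured */

    struct trace_record {
        uint64_t timestamp_ns;     /* GPS-synchronized capture time       */
        uint16_t wire_len;         /* original packet length on the link  */
        uint16_t record_len;       /* total record length on disk         */
        uint32_t link_id;          /* which monitored link saw the packet */
        uint8_t  header[SNAP_LEN]; /* first 44 bytes: IP/transport headers*/
    };
    /* 8 + 2 + 2 + 4 + 44 = 60 bytes of fields; 8-byte alignment pads the
     * structure to 64 bytes, matching the per-packet record size. */
    _Static_assert(sizeof(struct trace_record) == 64,
                   "record is expected to occupy 64 bytes");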
First, we describe the system we have developed. It differs from other packet monitoring systems in three basic respects. First, it is deployed in a commercial backbone ISP; most other traffic monitors are installed in research networks or access networks. Second, our monitoring systems are deployed at a much larger scale than other passive measurement efforts: we currently have 11 monitoring systems installed at one location in the Sprint network, and we are in the process of installing monitors at two additional locations, each with another 10 systems, which we hope will be operational by the time of publication. Third, all of the systems are synchronized to a GPS clock, so we are able to correlate the traces and measure one-way delays.
Managing a monitoring system of this scale is a challenging task. Each monitored link can generate up to 100 GB of data each day. We discuss techniques for transferring the data to the lab, storing it, and processing it efficiently on a dedicated 16-node computing cluster used for data analysis. We also present the techniques we use to synchronize the measurements conducted at different points in the network so that we can identify individual packets as they flow over multiple links in the network.
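To make this packet identification concrete, the sketch below shows one way it could be done, assuming packets are matched by hashing the invariant bytes of the captured header and that GPS-synchronized timestamps can then be subtracted directly. The specific fields masked out (IPv4 TTL and header checksum) and the hash function are our own assumptions, not necessarily the procedure used in the paper.

    /*
     * Minimal sketch: match a packet observed at two monitoring points and
     * compute its one-way delay from GPS-synchronized timestamps.  Fields
     * skipped by the hash (assumed: IPv4 TTL at offset 8, checksum at
     * offsets 10-11) are those routers rewrite hop by hop.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define SNAP_LEN 44

    struct trace_record {
        uint64_t timestamp_ns;     /* GPS-synchronized capture time (ns) */
        uint8_t  header[SNAP_LEN]; /* first 44 bytes of the packet       */
    };

    /* FNV-1a hash over the captured header, skipping mutable fields. */
    static uint64_t packet_id(const struct trace_record *r)
    {
        uint64_t h = 1469598103934665603ULL;
        for (int i = 0; i < SNAP_LEN; i++) {
            if (i == 8 || i == 10 || i == 11)
                continue;
            h ^= r->header[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    int main(void)
    {
        /* Two records of the same packet captured on an ingress and an
         * egress link; timestamps are made up for illustration. */
        struct trace_record in  = { .timestamp_ns = 1000000000ULL };
        struct trace_record out = { .timestamp_ns = 1002750000ULL };
        memcpy(out.header, in.header, SNAP_LEN);
        out.header[8] -= 3;  /* TTL decremented along the path */

        if (packet_id(&in) == packet_id(&out))
            printf("one-way delay: %.3f ms\n",
                   (out.timestamp_ns - in.timestamp_ns) / 1e6);
        return 0;
    }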
Finally, we present results that demonstrate the capabilities of our system and provide information useful in designing future measurement systems. Two traffic parameters greatly influenced our system design. The first was the packet size distribution. Earlier measurement work indicated that the average packet size in the network was around 400 bytes. Our systems record 64 bytes for every packet, so they were designed to handle an average data rate of approximately 16% of the total traffic volume. For example, if an OC-3 link were carrying 100 Mb/s of traffic, we would expect to record about 2 MB/s (7.2 GB/hour) of trace data. However, we find that the packet size distribution varies slightly from link to link due to asymmetric traffic patterns: some links carry many small ACK packets, reducing the average packet size, while others carry large data packets, increasing it. The second traffic parameter that influenced our system design was bursts of minimum-size packets. These bursts represent the worst-case traffic our system must handle; we present information on how long these bursts may last and what impact this has on our system design. We also present results on the number of flows per second observed in the network, link utilizations, and the delays packets incur as they travel through the backbone.
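The 16% figure and the OC-3 example above follow from a short back-of-the-envelope calculation; the snippet below reproduces the 2 MB/s and 7.2 GB/hour numbers from the assumed 400-byte average packet size and 64-byte per-packet records.

    /*
     * Back-of-the-envelope check of the trace-volume estimate: 64-byte
     * records and a 400-byte average packet give a trace rate of roughly
     * 16% of the carried traffic, i.e. about 2 MB/s (7.2 GB/hour) on a
     * link loaded at 100 Mb/s.
     */
    #include <stdio.h>

    int main(void)
    {
        const double link_load_bps = 100e6;  /* carried traffic, bits/s   */
        const double avg_pkt_bytes = 400.0;  /* measured average pkt size */
        const double record_bytes  = 64.0;   /* bytes stored per packet   */

        double fraction   = record_bytes / avg_pkt_bytes;     /* ~0.16 */
        double trace_Bps  = (link_load_bps / 8.0) * fraction; /* bytes/s */
        double trace_GBph = trace_Bps * 3600.0 / 1e9;         /* GB/hour */

        printf("trace rate: %.1f MB/s, %.1f GB/hour\n",
               trace_Bps / 1e6, trace_GBph);
        return 0;
    }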