RIPE Routing-WG Recommendations on Route Aggregation

Philip Smith
Rob Evans
Mike Hughes

Document ID: TBA
Date Published: TBA

Abstract

This document discusses the need for aggregation of prefixes on the Internet today, and recommends good working practices for Internet Service Providers and other Autonomous Networks connected to the Internet.

Table of Contents

1. Introduction
2. Background
   2.1 The Early Internet
   2.2 Today's Internet
3. What is Aggregation?
4. The Internet Routing Table
   4.1 What is Deaggregation?
   4.2 General Deaggregation
   4.3 iBGP and eBGP
   4.4 Deaggregation to aid Multi-homing
   4.5 Legacy Assignments
5. Impacts of the Routing Table size
   5.1 Router Memory
   5.2 Router Processing Power
   5.3 Routing Convergence
   5.4 Network Performance
6. Solutions
   6.1 The CIDR Report
   6.2 Filtering
   6.3 The "CIDR Police"
   6.4 BGP Features
      6.4.1 The NO_EXPORT BGP Community
      6.4.2 The NOPEER BGP Community
      6.4.3 The AS_PATHLIMIT attribute
7. Recommendations
   7.1 Initial Allocations
   7.2 Subsequent Allocations
   7.3 Multi-homing
   7.4 BGP Enhancements
   7.5 IP Version 6
8. Conclusion
9. Acknowledgments
Y. References
Z. Authors

1. Introduction

The Internet is made up of autonomous networks (usually called Autonomous Systems, or ASes) interconnected with each other. Over time, these autonomous networks will have been assigned or allocated address space for use within their own networks, or within the networks of their customers. This address space is announced to neighbouring autonomous networks. Depending on the business or contractual arrangements between these neighbouring autonomous networks, this address space may or may not be announced onwards to their own neighbours, and so on, across the entire Internet. The collection of this address space, as announced by the organisational entities making up the Internet, is known as the Internet Routing Table.
With each AS announcing its address space, and each AS hearing these announcements either directly or indirectly, each end system in the Internet is able to communicate with every other end system, giving rise to the global communications system known as the Internet.

2. Background

As documented in the CIDR Report [1] and other similar activities, the size of the Internet Routing Table has been of considerable interest to Internet Service Providers and to the vendors of Internet routing equipment since the rapid adoption of the Internet as a communications medium in the early 90s.

2.1 The Early Internet

In the early Internet, address space assigned to Autonomous Systems and end-sites fitted into three categories: class A, class B, and class C. The Internet Routing Table contained only these three types of addressing: small organisations received a class C, medium-sized organisations a class B, and large organisations a class A. This was known as classful address assignment, and the routing system associated with it understood the different classes.

A major effort in 1994 saw the Internet start the conversion from this classful prefix system to a classless system ([2] and [3]). The motivation was the rapid depletion of the classful address space, with the biggest pressure being on the range used for class B networks (128.0.0.0 to 191.255.0.0).

With the commercialisation of the Internet, and prior to the migration to the classless addressing system, organisations which had outgrown a single class C network would receive further class C networks, rather than being "upgraded" to a class B network. This was designed to reduce the pressure on the class B address space. As a result, many organisations held a large number of class C address blocks (or /24 prefixes, in today's terminology).
As part of the migration to the classless routing system, the CIDR Report's original motivation was to encourage network operators to merge their contiguous /24 prefixes into a single larger announcement. This activity is called aggregation, and is explained in detail in subsequent sections.

The CIDR Report was quite successful in encouraging aggregation. The weekly public e-mail to operations mailing lists helped to show which ISPs were making efforts to aggregate their announcements to the Internet. The results of this peer pressure are visible in the early stages of the graphs available on the CIDR Report web-site ([1] and [4]).

2.2 Today's Internet

In the classless Internet today, network operators who participate in the Regional Internet Registry (RIR) system receive allocations from the RIRs (AfriNIC, APNIC, ARIN, LACNIC and RIPE NCC). These allocations are sized according to the requirements the network operator presents to the RIR, and are generally no smaller than a minimum size, for example a /21.

Following the introduction of the classless allocation system and classless routing in 1994, network operators would receive an address block from their RIR and would generally announce this address block to the Internet, in the same way that operators in the early Internet announced only the class A, class B, or class C they had been assigned.

While there is a widely held, unwritten expectation that the network operator announces to the Internet the address block allocated by the RIR, the more common practice today seems to be to announce /24s (the equivalent of the legacy class Cs) instead. The result is that around 60% of the Internet Routing Table consists of /24 prefixes.
While some of these are undoubtedly caused by traffic engineering efforts for multi-homing (an activity recognised as a requirement in the industry by the authors of RFC1519 [3]), the majority can be attributed to ISPs "receiving 32 Class Cs from the RIR" and announcing them as such. (The expected behaviour would be to combine these into a single announcement with a /19 network mask.)

3. What is Aggregation?

Aggregation is the practice of introducing several contiguous IP addresses into the IP routing system as a single address block. For example, if an enterprise has received 32 IP addresses from its ISP for numbering the systems on its internal LAN, it would announce these 32 IP addresses to its ISP as a single entity. The devices on the LAN are then represented collectively by the address block, rather than each device's presence having to be indicated individually to the rest of the world.

For example, if the enterprise receives the contiguous addresses 192.0.2.0 to 192.0.2.31, it would announce them to its ISP as 192.0.2.0/27. This notation says that the first 27 bits of the IP address are the network portion, and the final 5 bits are the host portion (2^5 = 32 addresses).

Likewise, on a larger scale, if an ISP has received 4096 IP addresses from the RIR, for example 10.201.48.0 to 10.201.63.255, it would announce these IP addresses to its neighbouring networks as a single address block, namely 10.201.48.0/20.

Both these examples describe aggregation: the network has combined contiguous addresses into a single entity, and this single entity is announced to neighbouring Autonomous Systems. While these two examples are relatively small in scale, they illustrate what is expected of ISPs participating in the Internet: individual IP addresses are combined into the largest feasible block before they are announced to the Internet.

4. The Internet Routing Table

There are a few main factors contributing to the size of the Internet Routing Table. These are analysed in turn.
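The aggregation described in Section 3 can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module, whose collapse_addresses() and summarize_address_range() functions merge contiguous addresses into the fewest covering prefixes. The helper name aggregate() is hypothetical, chosen for illustration; the prefixes are those used in the examples above, plus the "32 Class Cs" case and a pair of /21s that are contiguous but do not share a /20 boundary.

```python
import ipaddress

def aggregate(prefixes):
    """Collapse prefix strings into the smallest set of covering prefixes,
    without covering any address that was not in the input."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Enterprise LAN: 192.0.2.0 to 192.0.2.31 summarised as one block.
lan = ipaddress.summarize_address_range(
    ipaddress.ip_address("192.0.2.0"), ipaddress.ip_address("192.0.2.31"))
print([str(n) for n in lan])  # ['192.0.2.0/27']

# ISP allocation 10.201.48.0 to 10.201.63.255, held as sixteen /24s.
print(aggregate([f"10.201.{i}.0/24" for i in range(48, 64)]))
# ['10.201.48.0/20']

# "32 Class Cs from the RIR" collapse to a single /19.
print(aggregate([f"10.201.{i}.0/24" for i in range(32, 64)]))
# ['10.201.32.0/19']

# Two neighbouring /21s on the correct bit boundary form a /20 ...
print(aggregate(["10.201.48.0/21", "10.201.56.0/21"]))
# ['10.201.48.0/20']

# ... but contiguous /21s straddling a /20 boundary cannot be combined.
print(aggregate(["10.201.56.0/21", "10.201.64.0/21"]))
# ['10.201.56.0/21', '10.201.64.0/21']
```

The last two cases show why "contiguous" alone is not enough: the combined block must itself be a valid prefix, starting on a multiple of its own size.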
4.1 What is Deaggregation?

The RIRs allocate address space to ISPs in blocks, with the expectation that these blocks are announced to the Internet unaltered. It should be noted, however, that the RIRs have no rules about how this address space should be announced to the Internet. The industry considers it improper for the RIRs to tell ISPs how to announce address space, in the same way that a library does not tell its readers how to read the books it lends.

However, many ISPs do not announce their allocations as single blocks as expected, preferring to announce their address space in smaller pieces, even as small as /24s. This activity is known as deaggregation.

4.2 General Deaggregation

There seem to be several reasons for deaggregation. Some providers claim that they have commercial reasons for doing so; some cite routing system security concerns; others claim it reduces bandwidth-wasting virus and miscreant activity directed at their networks.

Routing system security is a general concern for many providers around the Internet. There is no universally accepted or used system for ensuring that a provider is entitled to originate a given address block. (While the Internet Routing Registry was designed to assist with this, its use has never been made mandatory, and the routing system still works well without it.) As a result, some providers work around their concerns about the relative lack of routing system security by simply announcing their address space as the smallest generally accepted prefix. Because routers forward on the longest matching prefix, and prefixes longer than this are widely filtered, no other autonomous system can then announce a more specific prefix covering the same addresses and thereby cause a denial of service to the legitimate user of the address space. However, the authors are aware of very few such incidents having been recorded, so deaggregation for security reasons seems a disproportionately unfriendly response to the potential risk.
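The security argument above rests on longest-prefix match combined with widespread filtering of anything more specific than a /24. The sketch below is a deliberately simplified, hypothetical model of that: the function best_route(), the prefixes and the AS numbers (taken from the documentation range) are illustrative only, and real BGP route selection involves many more steps than prefix length.

```python
import ipaddress

def best_route(table, dest, max_len=24):
    """Return (prefix, origin) of the longest prefix in `table` covering
    `dest`, ignoring prefixes longer than `max_len` (modelling the common
    practice of filtering anything more specific than a /24)."""
    addr = ipaddress.ip_address(dest)
    candidates = [(net, origin) for net, origin in table
                  if net.prefixlen <= max_len and addr in net]
    if not candidates:
        return None
    net, origin = max(candidates, key=lambda c: c[0].prefixlen)
    return str(net), origin

def routes(*pairs):
    """Build a toy routing table from (prefix string, origin AS) pairs."""
    return [(ipaddress.ip_network(p), origin) for p, origin in pairs]

# Case 1: the legitimate holder (AS64500) announces only its aggregate /19;
# a hijacker (AS64511) announces a more specific /24 inside it and wins.
t1 = routes(("10.201.32.0/19", "AS64500"), ("10.201.40.0/24", "AS64511"))
print(best_route(t1, "10.201.40.10"))  # ('10.201.40.0/24', 'AS64511')

# Case 2: the holder announces the /24 itself; the hijacker's /25 is
# filtered, so the legitimate /24 remains the best match.
t2 = routes(("10.201.40.0/24", "AS64500"), ("10.201.40.0/25", "AS64511"))
print(best_route(t2, "10.201.40.10"))  # ('10.201.40.0/24', 'AS64500')
```

The trade-off the authors describe is visible here: the /24 announcements protect against a more specific hijack, but only by adding many more entries to every other network's routing table.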
Another reason given for deaggregation is that it reduces denial of service attacks and other miscreant activity aimed at a service provider's network. It is well known that there are many virus- and worm-infected systems around the Internet that simply scan contiguous address blocks (whether routed or not) looking for other systems to infect. This creates a "background noise" of traffic aimed at an address block. In the authors' experience, announcing a /16 address block can attract up to 2Mbps of this noise. In developing parts of the Internet such bandwidth is expensive for ISPs, so they announce only what they are actually using. As each /24 is consumed by their infrastructure and their customers, they announce that /24 to the Internet (and quite often this /24 is not contiguous with any previous internal assignment). Even when the original /19 allocation is entirely used, the ISP makes no effort to aggregate it (in their eyes nothing is broken, so nothing needs fixing), with the resulting impact on the size of the global Routing Table.

It is likely that the blanket claim of "commercial reasons for deaggregation" made by some providers includes concerns about both of the previous scenarios. It is also quite likely that there are other commercial reasons; one example heard in previous years was that appearing in the top 10 of the CIDR Report was considered a positive reflection on the size and quality of the ISP's business.

4.3 iBGP and eBGP

A further reason for deaggregation seems to be a failure to appreciate the difference between BGP as used inside the service provider (SP) network (iBGP) and BGP as used for inter-domain routing (eBGP). iBGP is intended to carry all of the ISP's customer prefixes and local infrastructure prefixes (hosting LANs, etc.), as well as prefixes learned from other SP networks.
eBGP is intended to announce reachability between domains, and this can be achieved simply by each ISP announcing the address blocks it has been allocated by the RIRs. Quite often ISPs happily leak their iBGP routing information into eBGP, with the resulting impact on the size of the Internet Routing Table.

Perhaps the most interesting place to examine the differences between an ISP's iBGP and eBGP is the University of Oregon Route Views project [5]. This project collects views of the Internet Routing Table as seen by many different ISPs around the world. Some ISPs choose to send their eBGP view, others their iBGP view; the Route Views project makes no demands on what should or should not be sent. This provides an interesting insight into the aggregation efforts made by ISPs, the extent of iBGP for some of the larger ISPs, and the filtering efforts made by other ISPs to remove iBGP views received from their peers across the Internet routing infrastructure.

4.4 Deaggregation to aid Multi-homing

The need to deaggregate as part of traffic engineering for networks that multi-home is an oft-quoted reason, or even an absolute justification, for the size of the Internet Routing Table. It seems that standard multi-homing practice these days is to take any address allocation or assignment, chop it into individual /24s, and announce these /24s out of all external network links. A /24 is chosen because of a belief that most ISPs filter IP prefixes on a /24 boundary (being the size of the legacy class C address), even though there is little evidence to back this belief up, as a cursory glance at the CIDR Report [1] will show. Furthermore, the theory behind announcing an address block only as /24s is that this will somehow make multi-homing work.
In the authors' experience this is not the case: successful traffic engineering and load balancing are achieved only by announcing appropriate sub-prefixes of an allocated address block, chosen according to the traffic levels generated by the devices occupying those sub-prefixes. The result of this scattergun approach is a further increase in the size of the Internet Routing Table.

4.5 Legacy Assignments

Legacy assignments, that is, assignments made by the IANA prior to the establishment of the RIR system, are often blamed for the size of the Internet Routing Table. However, the main contributor to the Internet Routing Table from legacy assignments was the 192/8 block, the first /8 block in the former class C space. After the clean-up following the migration to classless routing, the 192/8 address block has continued to contribute around 5500 prefixes to the Internet Routing Table. This compares more than favourably with the other /8 blocks which the RIRs use for allocations to ISPs, where fully allocated blocks often contribute 8000 or 9000 prefixes each.

Looking at the former class B space (128/8 up to 191/8), there is significant deaggregation of the legacy class B assignments, but this is more than likely caused by the issues discussed in the previous two sections. The same is true for legacy assignments in the former class A space.

5. Impacts of the Routing Table size

Why should the size of the Internet Routing Table matter? In the discussion so far, little more than prudence has been offered as a reason to limit what is announced to the Internet. But the size of the table raises many issues for the Autonomous Systems participating in the Internet today.

5.1 Router Memory

Throughout the history of the Internet, routing equipment vendors have specified routing equipment to be sufficient for the networks of the day. In the rapidly growing Internet, this has caused anguish for operators at various stages.
A rapidly growing Internet has seen routers with sufficient memory to carry the full table one year become obsolete the following year: even with the router upgraded to its maximum memory, it still does not have sufficient capacity to store the table as it stands. Newer router models with larger memory are the natural replacements, which gives a router a very short shelf life compared with other components of the Internet, along with the upheaval that equipment upgrades cause in service provider networks.

5.2 Router Processing Power

Another resource under pressure is the router CPU (control plane). The larger the Internet Routing Table, the longer the router CPU takes to process it when the BGP session with a neighbouring Autonomous System is first established, and the more time the CPU requires to process changes to the table caused by topology changes. Faster CPUs reduce this time, so the network operator is faced with having to upgrade router CPUs, often by fork-lift upgrades (swapping entire chassis), just to maintain the same routing performance within their network.

A typical scenario was presented in [6], following a request for a prediction of the size of routers needed in five years' time: the shelf life of existing router hardware is decreasing with the increasing size of the table and the increasing number of routing updates being seen on the Internet.

5.3 Routing Convergence

Closely related to router processing power is routing convergence: the point at which the network has finally settled on the best path to a particular destination. The slower the router CPU, the longer the network takes to converge. The larger the Internet Routing Table, the more prefixes and paths the router has to process, and the longer the network takes to converge. Slow convergence means slow recovery in the event of a network failure, which results in a much more customer-visible network issue.
To speed up routing convergence, the processing power of router CPUs can be increased, again through control plane upgrades or even entire chassis upgrades. Alternatively, the size of the Internet Routing Table can be better controlled; slower growth delays the need to upgrade router CPUs.

5.4 Network Performance

Network operators rarely consider that the performance of their network has anything to do with their prefix announcements to the Internet. However, it is relatively easy to demonstrate that failing to announce the aggregate makes the overall Internet experience of the end users of the local network somewhat poorer than if the aggregate were announced.

A typical and common situation that the authors have encountered is one where a network operator announces the prefixes in its internal BGP out to the Internet (via external BGP). Customer prefixes are injected into the internal BGP when the link to the customer is active, and withdrawn when the link to the customer is inactive. The problem with leaking the internal BGP out to the Internet is that these customer prefixes then have to be withdrawn from every router carrying the full Internet Routing Table across the entire Internet. This withdrawal does not happen instantly [7] or uniformly, with the attendant problems this causes. When the customer link returns, the customer's prefix is re-injected into the provider's internal BGP, and then announced onwards to the Internet. This outbound announcement again does not happen instantly. The result is that the end user finds the Internet is not immediately reachable once their link returns. Support calls to the service provider are then not resolved satisfactorily because, as far as the service provider is concerned, there is nothing wrong.
If the service provider had not leaked its internal BGP to the Internet, but had instead announced its aggregate and any necessary traffic engineering sub-prefixes, the customer would not have seen delays in the restoration of their Internet connectivity.

6. Solutions

Various solutions to the growth of the Internet Routing Table have been proposed and attempted over recent years.

6.1 The CIDR Report

The CIDR Report was originally one technique employed to hold the growth of the Internet Routing Table in check. The idea behind the CIDR Report was to encourage ISPs to aggregate, and its effectiveness came primarily through peer pressure: naming and shaming. It was effective in the early years of the migration from the classful to the classless Internet, but in recent years there is some evidence of ISPs using their prominent position in the CIDR Report as positive marketing regarding their status and influence in the Internet!

The CIDR Report has since been greatly enhanced over the early reporting tool, with the associated web-site [1] providing a user interface that allows network operators to check their aggregation efforts. In that sense, there is little reason for any network operator to be unaware of the impact of their announcements on the Internet Routing Table.

6.2 Filtering

Another technique employed is filtering based on the minimum allocation size each RIR uses within each address block allocated to it by the IANA. For example, if an RIR's smallest allocation from a particular /8 block was a /21, network operators would filter routing announcements received from external networks so that prefixes within that /8 more specific than a /21 (that is, /22 or longer) would be rejected. The effects of general prefix filtering can be seen on the CIDR Report web-site [4], which has views of the full Internet Routing Table as seen from many different Autonomous Systems.
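A minimal sketch of such a filter follows, written in Python purely for illustration; in practice it would be expressed as a prefix-list in the router configuration. The table MIN_ALLOCATION and its values are hypothetical, not taken from any RIR's published policy, and the function name accept() is likewise illustrative.

```python
import ipaddress

# Hypothetical filter table: for each /8 allocated by the IANA to an RIR,
# the longest prefix length the RIR allocates from it. Illustrative values.
MIN_ALLOCATION = {
    ipaddress.ip_network("10.0.0.0/8"): 21,
    ipaddress.ip_network("192.0.0.0/8"): 24,
}

def accept(prefix: str) -> bool:
    """Accept an eBGP-learned prefix unless it is more specific than the
    minimum allocation for the /8 it falls in. Prefixes outside the blocks
    listed in MIN_ALLOCATION are accepted unchanged."""
    net = ipaddress.ip_network(prefix)
    for block, max_len in MIN_ALLOCATION.items():
        if net.version == block.version and net.subnet_of(block):
            return net.prefixlen <= max_len
    return True

for p in ["10.201.48.0/20", "10.201.48.0/21", "10.201.48.0/22",
          "10.201.48.0/24", "192.0.2.0/24"]:
    print(p, "accept" if accept(p) else "reject")
```

With the illustrative values above, the /20 and /21 from 10/8 are accepted while the /22 and /24 are rejected, which is the restriction on traffic engineering discussed in the following paragraph.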
It is not clear how many network operators employ filtering based on RIR minimum allocation sizes, nor whether such filtering is fully effective in limiting the size of the routing table. Clearly, if a network operator has received the minimum allocation from its RIR, its ability to traffic engineer for multi-homing is restricted: it cannot subdivide its address block for traffic engineering purposes if its upstream provider filters on the RIR minimum allocation boundaries.

6.3 The "CIDR Police"

In the late 90s and early 00s, a small group of volunteers analysed the various routing table reports and gave their time freely to work with ISPs that were announcing more prefixes than they apparently needed to. They would look for ISPs announcing contiguous /24 prefixes, and suggest that merging these into a single larger announcement would benefit the Internet Routing Table. This met with varying levels of success, with responses ranging from cooperation and appreciation to hostility and even abuse from the network operators concerned. With the Internet "bust" in 2001, this activity was deprioritised, as the organisations employing these volunteers had less time and patience for them to carry out their community work. While recent years have seen some discussion about restarting the "CIDR Police" effort, nothing had materialised at the time of writing.

6.4 BGP Features

The Internet community (mainly the ISPs and the equipment vendors) has also worked to add features to BGP to assist with aggregation efforts. These are discussed in turn.

6.4.1 The NO_EXPORT BGP Community

The first aid for multi-homing and prefix aggregation came in the form of the NO_EXPORT BGP Community, described as part of the BGP Communities Attribute specification [8]. A prefix tagged with this community is not advertised beyond the AS that receives it (strictly, beyond its BGP confederation boundary), so it is never passed from one eBGP neighbour on to another.
The idea is that a service provider leaks sub-prefixes to its upstream or peer provider to aid traffic engineering, but tags these sub-prefixes with the NO_EXPORT community to indicate that they should not be announced to any other autonomous system. Many providers use this community for traffic engineering purposes, but its use is perhaps not as widespread as it could be.

6.4.2 The NOPEER BGP Community

The next aid for the traffic engineering and aggregation quandary was the NOPEER BGP Community. This was introduced relatively recently [9], but has had no support from the equipment vendors (there are no known implementations), and apparently little demand from Internet Service Providers.

The idea here is that a service provider who wishes to deaggregate to support traffic engineering for multi-homing would tag such "traffic engineering" sub-prefixes with the NOPEER community. Upstream ISPs would then propagate or discard these prefixes depending on whether each eBGP relationship is deemed to be a bilateral peering or not.

Internet Service Providers generally have three types of relationship with other providers in the Internet: upstream, bilateral peer, or customer. Support for the NOPEER community would be provided by providers indicating in their router configurations whether each BGP session is with an upstream, a bilateral peer, or a customer. NOPEER-tagged prefixes would be propagated on upstream and customer sessions, but discarded on bilateral peer sessions. This allows the edge provider carrying out traffic engineering to do so all the way to the "Internet core", without the "Internet core" having to carry the sub-prefixes required for this traffic engineering.
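The export decisions described for NO_EXPORT and NOPEER can be sketched as a single policy function. This is a minimal illustration under stated assumptions, not a router implementation: the Session labels and the function name may_advertise() are hypothetical, BGP confederations are ignored, and the two constants are the well-known community values registered for NO_EXPORT (RFC1997) and NOPEER (RFC3765).

```python
from enum import Enum

# Well-known community values, as 32-bit integers.
NO_EXPORT = 0xFFFFFF01  # 65535:65281, RFC1997
NOPEER = 0xFFFFFF04     # 65535:65284, RFC3765

class Session(Enum):
    """Hypothetical per-session relationship an operator would configure."""
    IBGP = "ibgp"
    CUSTOMER = "customer"
    UPSTREAM = "upstream"
    PEER = "bilateral peer"

def may_advertise(communities: set, session: Session) -> bool:
    """Decide whether a route carrying `communities` may be sent on `session`.

    NO_EXPORT: never send to any eBGP neighbour (confederations ignored).
    NOPEER: do not send across a bilateral peering; customers and
    upstreams still receive the route.
    """
    if session is Session.IBGP:
        return True
    if NO_EXPORT in communities:
        return False
    if NOPEER in communities and session is Session.PEER:
        return False
    return True

# A traffic engineering sub-prefix tagged NOPEER by an edge network.
te_subprefix = {NOPEER}
for s in Session:
    print(s.value, may_advertise(te_subprefix, s))
```

Only the bilateral peer session prints False, which is the behaviour that would keep such sub-prefixes out of the settlement-free core while still letting them reach upstream transit providers.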
There are estimated to be around 10 service providers at the core of the Internet that have no-fee (settlement-free) peering relationships with each other [10]. With the bulk of the Internet's ASNs appearing at the edge rather than in the transit core, the use of the NOPEER community by edge providers, with attendant support in the transit core, could have quite a significant impact on the size of the routing table as seen at the "Internet core".

6.4.3 The AS_PATHLIMIT attribute

The latest contribution to assist with the traffic engineering needs of multi-homing ISPs is the proposal to introduce an AS_PATHLIMIT attribute [11]. The idea here is to restrict the propagation of a prefix to a particular AS radius, as determined by the value of the AS_PATHLIMIT attribute. The attribute contains the maximum number of ASNs that can appear in the AS path (as well as the ASN which introduced the attribute). Each AS compares the value of the attribute with the number of ASNs in the AS_PATH. If the number of ASNs in the AS_PATH is greater than the value of the AS_PATHLIMIT, the prefix is neither used internally in the network nor propagated to eBGP peers. This allows service providers to carry out localised traffic engineering without other providers at more distant points in the Internet having to see those specific traffic engineering prefixes.

7. Recommendations

The remainder of this document describes the RIPE Routing Working Group recommendations for making routing announcements to the Internet. It is hoped and expected that all network operators will follow these recommendations, so that the growth of the Internet Routing Table is kept in check and is only as much as is absolutely essential.
7.1 Initial Allocations

When the network operator receives an IP address allocation from its RIR, or an assignment from its upstream ISP, the expectation of the entire Internet community is that these IP addresses are combined into the largest feasible block and announced as such to the rest of the Internet. For example, if a network operator receives a /21 from its RIR, it should configure BGP to announce only this /21 to neighbouring Autonomous Systems.

7.2 Subsequent Allocations

The RIRs will, whenever possible, attempt to make subsequent allocations to their LIR members contiguous with previous allocations. When the network operator receives a new allocation or assignment which is the same size as the original allocation and contiguous with it, it should combine the two address blocks and announce them as an aggregate. For example, if a network operator was originally allocated a /21 and now receives the neighbouring /21, and the two /21s fall on the correct bit boundary (that is, together they form a valid /20), they can be combined into a /20. The operator should then announce this /20 to its neighbouring ASNs and withdraw the announcement of the original /21.

If the subsequent allocation is not contiguous, or is contiguous but does not fall on the correct bit boundary, or is of a different size from the previous allocation, then no aggregation can be carried out, and the network operator should announce the two address blocks separately.

7.3 Multi-homing

If the network operator has a multi-homed network, it will usually need to subdivide its address block (or blocks) to aid traffic engineering. If the operator does this, it must still announce the whole address block as well, as without this announcement it will have no backup should the alternative link or links fail.
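The need to keep announcing the aggregate can be illustrated with a hypothetical example: an operator holding 10.201.48.0/21 splits it into two /22s, one announced per upstream (labelled A and B). The sketch models only longest-prefix match as seen by a distant network; the helper names reachable_via() and withdraw() are illustrative, and real routing also involves BGP path selection and the propagation delays discussed in Section 5.4.

```python
import ipaddress

def reachable_via(announcements, dest):
    """Return the upstream link whose announced prefix is the longest match
    for `dest`, or None if nothing covers it. `announcements` is a list of
    (prefix, link) pairs currently visible to the rest of the Internet."""
    addr = ipaddress.ip_address(dest)
    matches = [(ipaddress.ip_network(p).prefixlen, link)
               for p, link in announcements
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches)[1]  # longest prefix wins

def withdraw(announcements, failed_link):
    """Remove everything announced over a link that has failed."""
    return [(p, link) for p, link in announcements if link != failed_link]

# Aggregate announced on both links, plus one /22 per link for traffic
# engineering.
with_aggregate = [("10.201.48.0/21", "A"), ("10.201.48.0/21", "B"),
                  ("10.201.48.0/22", "A"), ("10.201.52.0/22", "B")]
# The same sub-prefixes without the covering aggregate.
without_aggregate = [("10.201.48.0/22", "A"), ("10.201.52.0/22", "B")]

dest = "10.201.49.1"
print(reachable_via(with_aggregate, dest))                     # A
print(reachable_via(withdraw(with_aggregate, "A"), dest))      # B
print(reachable_via(withdraw(without_aggregate, "A"), dest))   # None
```

When link A fails, the /21 still announced via B catches traffic for the first /22; without it, that half of the address block simply becomes unreachable from the rest of the Internet.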
The subdivision of this address block should be done prudently. Tutorials have been presented at various network operations fora over the last few years explaining how this can be done, achieving maximum traffic engineering effect without unduly impacting the Internet Routing Table [12].

7.4 BGP Enhancements

The various BGP enhancements described in Section 6.4 should also be considered where appropriate, and where supported by the router vendors. Most ISPs outside the transit core will find a use for the NO_EXPORT and NOPEER BGP Communities, as well as the new AS_PATHLIMIT BGP Attribute, allowing them to limit the number of traffic engineering sub-prefixes propagated across the Internet.

7.5 IP Version 6

While these recommendations have focused entirely on the IPv4 Internet, they are equally applicable to IPv6. Participation in the IPv6 Internet is no different from participation in the IPv4 Internet, and the expectations placed on networks or Autonomous Systems are exactly the same in both cases.

8. Conclusion

Aggregation is a necessary activity for network operators participating in today's Internet. What was taken for granted following the migration to the classless Internet in the early 90s no longer seems to be standard practice for most network operators. The result is rampant growth of the Internet Routing Table, causing issues which impact every participant in the Internet. BGP analysis activities such as those at [13] show the potential savings in the size of the Internet Routing Table, and these savings are potentially significant: anything from 30% to 50%, depending on the measurements made and the view of the Internet Routing Table being used.

9. Acknowledgments

Thanks to the LINX membership for the original idea (LINX's experiment with a Routing Aggregation Policy [14]) and to Mike Hughes for the initial text which formed the basis of this document.

Y. References

[1] The CIDR Report
    Original Idea: Tony Bates
    Maintained by: Geoff Huston
    http://www.cidr-report.org

[2] RFC1518 - An Architecture for IP Address Allocation with CIDR
    Tony Li and Yakov Rekhter
    http://www.rfc-editor.org/rfcs/rfc1518.txt

[3] RFC1519 - Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy
    Vince Fuller, Tony Li, Jessica Yu and Kannan Varadhan
    http://www.rfc-editor.org/rfcs/rfc1519.txt

[4] Routing Table Status Report
    Geoff Huston
    APRICOT 2005/APNIC 19
    http://www.apnic.net/meetings/19/docs/sigs/routing/routing-pres-huston-routing-table.pdf

[5] The Route Views Project
    University of Oregon
    http://www.routeviews.org

[6] Routing Update
    Geoff Huston
    APRICOT 2006/APNIC 21
    http://www.apnic.net/meetings/21/docs/sigs/routing/routing-pres-huston-routing-update.pdf

[7] Delayed Internet Routing Convergence
    Craig Labovitz, Abha Ahuja, Abhijit Bose and Farnam Jahanian
    SIGCOMM 2000
    http://www.acm.org/sigs/sigcomm/sigcomm2000/conf/paper/sigcomm2000-5-2.pdf

[8] RFC1997 - BGP Communities Attribute
    Ravi Chandra and Paul Traina
    http://www.rfc-editor.org/rfcs/rfc1997.txt

[9] RFC3765 - NOPEER Community for Border Gateway Protocol (BGP) Route Scope Control
    Geoff Huston
    http://www.rfc-editor.org/rfcs/rfc3765.txt

[10] AS Path Analysis
     Vijay Kumar Adhikari, Gaurab Raj Upadhaya and Bill Woodcock
     SANOG 8
     http://www.sanog.org/sanog8/presentations/sanog8-aspath-analysis-vijay.pdf

[11] AS-PATHLIMIT Attribute
     Joe Abley, Tony Li and Rex Fernando
     http://www.ietf.org/internet-drafts/draft-ietf-idr-as-pathlimit-02.txt

[12] BGP Multi-homing Techniques
     Philip Smith
     NANOG 35
     http://www.nanog.org/mtg-0510/pdf/smith.pdf

[13] Aggregation Potential
     http://bgp.potaroo.net/as4637/

[14] The Routing Aggregation Policy - A Failed Social Experiment at the LINX
     Nigel Titley
     APNIC 21
     http://www.apnic.net/meetings/21/docs/sigs/ix/ix-pres-titley-aggregation.pdf

Z. Authors

The authors can be contacted as follows:

Philip Smith
Rob Evans
Mike Hughes