PI vs PA Address Space
Sean Doran smd at cesium.clock.org
Thu May 18 18:08:19 CEST 1995
| Stronger hierarchy leads to:
| - strong regulation of ISPs
| - hinders competition
| - no incentive to solve difficult routing problems
| - leads to governmental regulation and control

Let's revisit the economics of the global Internet.

You pay for three things, two of which are real products and one of which is an elasticity factor:

1/ delivery of packets into the global Internet
2/ receipt of packets from the global Internet (reachability)
3/ warm fuzzies ("they know what they're doing; they are responsive to my needs")

Item (1) is what you get when your immediate service provider turns up your circuit and you say

    ip route 0.0.0.0 0.0.0.0 Serial0

on your router. The rate at which you can deliver packets into the Internet is the minimum of the sum of egress bandwidths from your local small-i internet, any choke points in the path to egress points, or the width of your circuit. For example, in the simple case, if you have an E1 and your service provider has a 512kbps circuit to AlterNet, your maximum delivery rate of traffic into the global Internet is 512kbps plus any local connectivity.

The pricing for item (1) is typically the cost of the physical connection to you plus some value which reflects the effect your bandwidth utilization is likely to have on choke points plus a percentage.

Item (2) is what you get when your immediate service provider has arrangements in place to have their customers' prefixes carried and made reachable nearly ubiquitously. ("Nearly" covers firewalls and networks with policy constraints which are enforced via routing mechanisms.)

Until fairly recently, the guarantee of even nearly ubiquitous reachability was impossible to make, thanks to the way the AUP was enforced. However, once you had the NSFNET backbone service carrying your routing information, you generally had nearly ubiquitous routing, thanks to the fact that practically everyone defaulted to AS 690.

Then along comes Change.
The first two huge changes were the CIX and MAE-EAST, two enormous steps away from the model of AS 690 as the network to which you simply defaulted. Suddenly, rather than having PSI aggregated behind AS 690, AlterNet started hearing all their routes directly, and preferring those. Generally speaking, the MAE-EAST participants started on a path wherein they preferred any announcement over anything heard from AS 690, which often enough was left as a default.

Over time, some of the MAE-EAST participants stopped defaulting to ANS, partly because the amount of routing information reachable only from ANS grew smaller, and partly because in several ways it's easier to manage full routing for recovery and optimization than it is to manage partial routing plus a default.

Eventually routers stopped being able to handle full routing in 16Mb of memory, and suddenly the very real cost of carrying routing information around became clear to a number of providers: how much did replacing a bunch of mostly-AGS+ routers with 64Mb Cisco 7000-series routers cost?

This was one of the big pushes behind serious deployment of CIDR. CIDR's principal goal was to keep routing tables small by hiding detail, that is, by aggregating into bigger blocks. (Its secondary goal, full classlessness, is being played with as folks start experimenting with interdomain routing of subnets of classful networks.)

Originally the need to keep routing tables small was to prevent routers which had not been converted to 64Mb boxes, and which could not get by without knowing large amounts of routing information, from running out of memory and crashing.

Recently we have started noticing that, while memory consumption is still a real issue for a number of people in the world, those people with 64Mb boxes are starting to notice that the amount of CPU used by carrying full routing is increasing, especially as interdomain convergence time is decreasing to the point where an update is seen by most Ciscos in the U.S. in a matter of a few seconds.

In normal operation, with the normal background noise of a few flaps per second (largely attributable to flaky network connections and people doing dynamic routing updates for dialup users, and some level of longer-term transitions), most routers talking BGP hardly notice any CPU hit at all. Even those routers doing significant amounts of as-path and prefix-based filtering for various reasons (mostly involving backup arrangements and making sure bad things don't happen, such as giving or receiving accidental transit, or accepting or propagating certain bad prefixes like an announcement for one's own backbone network from external peers, and so forth) are borderline. A couple of such boxes spend a constant 30-45% of their CPU handling BGP; others run at a constant 20% handling BGP.

When a big transition happens, such as when someone at MCI or Sprint types "clear ip bgp *" at MAE-EAST+, several routers all over the world jump from less than 10% to 100% CPU utilization for on the order of ten minutes.

As the number of prefixes increases, and routing flap with it, both the amount of CPU spent on normal everyday processing and the amount of real time necessary to handle a major transition increase.

One observation that has been made is that smaller prefixes are likelier to flap than larger prefixes. An analysis of what prefixes were flapping that I did for the last NANOG seemed to indicate (after much discussion with the folks originating the prefixes) that the majority of flaps were caused by /24s used by dialup customers that got introduced into the global routing system upon connection, and removed when the dialup customer hung up. Multiply this by lots of simultaneous dialup customers and you have a problem.

The problem is fixable by aggregation.
If you aggregate all these /24s (or /28s or whatever) into something bigger, that something bigger is much less likely to flap, and moreover can easily be set up so that it never flaps at all.

Nailing down these problems helps considerably, but the amount of CPU used by BGP in increasing numbers of routers is getting scary.

Following the line of reasoning -- which seems to hold up in practice -- that on average, smaller prefixes are likelier to flap over time than larger prefixes, one really wants to see a large reduction in the number of smaller prefixes carried globally.

That's not to say that local delegations should be big; a dialup user should get as small a chunk of address space as necessary, a dedicated line customer likewise, in an effort to avoid wasting address space, and also in an effort to assist in aggregating lots of individual connections behind a largeish (/18 or shorter) prefix.

So, on the theory that pretty much every prefix that's /18 or shorter aggregates enough links and flap-prone things within it, and with the observation that very few prefixes shorter than 18 bits flap in normal circumstances (pace one international connection that was so completely saturated that BGP kept falling over due to keepalive timeouts, which caused traffic to fall off, which allowed BGP to re-establish itself, causing the cycle to repeat -- this got fixed), several NSPs started talking about how to go about reducing the number of prefixes longer than /24 with global scope to essentially zero.

That is, while you can have a /24, /28 or /32 now or in the future, and while it can have local scope within a small-i internet (even one that's a big chunk of the big-I global Internet), right now nothing longer than /24 will have global scope at all, and ***in future blocks***, by default, nothing longer than /18 or /19 (it's /18 now, but it's not entirely inflexible, and dialogues continue) will have global scope.
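
To make the aggregation and scope rules above concrete, here is a minimal Python sketch using the standard ipaddress module. It is only an illustration, not anyone's actual router policy: the 10.1.0.0/18 block, the function names, and the two constants are assumptions made for the example, with the /24 and /18 limits taken from the figures quoted above.

```python
import ipaddress

# Assumed policy limits, taken from the figures in the text above.
MAX_GLOBAL_PREFIXLEN_NOW = 24     # nothing longer than /24 has global scope today
MAX_GLOBAL_PREFIXLEN_FUTURE = 18  # default for future blocks (still under discussion)


def aggregate(prefixes):
    """Collapse contiguous prefixes into the fewest covering blocks."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(nets))


def has_global_scope(prefix, max_prefixlen):
    """True if the prefix is no longer than the given length limit."""
    return ipaddress.ip_network(prefix).prefixlen <= max_prefixlen


# Hypothetical provider block: 64 dialup /24s carved out of one /18.
provider_block = ipaddress.ip_network("10.1.0.0/18")
dialup_24s = [str(n) for n in provider_block.subnets(new_prefix=24)]

# 64 flap-prone announcements become a single stable one.
announced = aggregate(dialup_24s)
print(len(dialup_24s), "prefixes aggregate to", announced)

# A /28 stays local under today's rule; a /24 passes today's rule
# but would not pass the default rule for future blocks.
print(has_global_scope("10.1.5.0/28", MAX_GLOBAL_PREFIXLEN_NOW))
print(has_global_scope("10.1.5.0/24", MAX_GLOBAL_PREFIXLEN_NOW))
print(has_global_scope("10.1.5.0/24", MAX_GLOBAL_PREFIXLEN_FUTURE))
```

The point of the sketch is that the individual /24s can still come and go inside the provider, while the rest of the world only ever sees the one /18.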
(I note *** in future blocks *** because people get really terrified that their current /24 will become useless Real Soon Now. That is not the plan, and likely won't be necessary any time soon, _especially_ if future allocations can be done right. Things are trending in the right direction.)

"Local scope" could be as small as your immediate provider, or that provider's provider, or even a largeish NSP. However, if it's not aggregatable into a larger block, it won't work for interdomain routing among several size-large NSPs.

Again, the general idea is to keep interdomain routing working in such a way that it doesn't make moving packets impossible.

Which returns us to point #2. Arranging global reachability for a prefix is nontrivial; lots of things happen in the background at all levels in order to make global routing work. You pay your provider to pay their provider to pay their provider etc. to work out the hard problems so that a single piece of email, or an RADB object update, or an addition to a configuration in a router, or a phone call is all that's necessary for you to announce a new network out to the world.

There's a problem though, and that is that the cost of making some prefixes reachable is much greater than others. In fact, the cost of making everyone's nonaggregatable /28, /29, ... /32 reachable globally is so great that it is easier to say it simply cannot happen, in large part because the cost includes designing, building and deploying new router technology in several NSPs and ISPs, so that the routers of the world can actually handle enormous numbers of prefixes, especially when someone types "clear ip bgp *" at a large exchange point.

Finally, (3). It's clear that people have different needs and wants and requirements from their service providers.
Generally speaking, the bulk of Sprint's customers want the global Internet to work, because their users want sex-on-demand with people in Finland and to go poking around Brandy's Babes' home pages or www.plaything.com, or whatever it is that users do.

The bulk of Sprint's customers are pretty clever and realize that while there are a lot of things that look really really ugly, even or especially from their perspective, they really are necessary in order to keep the global Internet working.

Among the things we do realize is that yes, there are side-effects to proxy aggregating a size-large service provider's non-aggregated CIDR blocks, and yes there are side-effects involved in pushing for renumbering into large aggregatable blocks, and yes there are side-effects to putting up filters that block prefixes longer than 24 bits, and yes there are side-effects to rewriting our old policy of "we talk BGP with you if you're a reseller, period" to "we prefer not to talk BGP at all, unless there is a strong technical reason to do so".

However, in all these cases the position we take is that these ugly things (and yes, a whole bunch of much less ugly things) are necessary in order for the global Internet to work, and in order for us to offer you a level of service such that your customers or corporation or whatever don't scream bloody murder at you because things Just Don't Work because some router somewhere just keeled over because it was asked to do too much.

Moreover, it's not just Sprint taking this line with their customers -- others do too, and give their customers the warm fuzziness that their customers are willing to pay for.

So, in the final analysis, what we're pushing for does not reduce competition in an economic sense, although it does have side-effects. There is plenty of room in the current marketplace for all sorts of competition, and even more room for specialization and cooperative deals, which is normal for a growth market of this magnitude.
Lastly, the people most affected by the side-effects of keeping Sprint's part of the Internet up and running and connecting more than sixty countries and four hundred IP resellers are Sprint's customers and their customers. Given how little we directly compete with our customers anyway, while they are right to wish there were some other way (so does Sprint!), I think they also realize that the last thing we are trying to do is put them out of business or make it difficult for them to compete.

Healthy customers make for healthy revenues. And a healthy Internet makes for healthy customers.

That's all.

	Sean.