The central
machine at the RIPE-NCC controls the test-boxes at the ISP's. The data
is collected by the test-boxes and then transferred to the central machine.
It is planned that the data collected at the local machines will be
kept there for a few days, so that it can be viewed and analyzed
locally. The main processing of the data will be done on the
central machine.
Of the order of 1000 ISP's are active in the geographical area serviced
by the RIPE-NCC. Although we certainly do not foresee that all 1000 ISP's
will participate in the project or that test traffic will be sent for
all 1000^2 possible connections, we do plan to design the software
such that these numbers can be handled.
Figure 3 shows a first version of the Data Flow Diagram (DFD)
for this project.
This diagram will be the starting point for the software design for
the project.
The central machine (or machines) performs a number of tasks, including:
It controls the configuration, which determines which connections are being tested.
It collects the data from all test-boxes.
It acts as the software repository.
It will be the platform on which the software is developed.
Finally, this machine will be used for data analysis.
The main problem for the central machine is the amount of data to be
stored.
The results of each delay measurement will consist of the packet
sent by one test-box to another, plus information added by the
receiving test-box such as the arrival time.
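As a rough illustration of such a record, the sketch below pairs the contents of the sent packet with the arrival time added by the receiver. The field names (`src`, `dst`, `seq`, `sent_at`, `received_at`) and the host names are assumptions made for this example only; the actual packet format is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayRecord:
    """One delay measurement (hypothetical layout, for illustration only)."""
    src: str            # sending test-box
    dst: str            # receiving test-box
    seq: int            # sequence number carried in the test packet
    sent_at: float      # send time stamped by the sender, in seconds
    received_at: float  # arrival time added by the receiver, in seconds

    @property
    def delay_ms(self) -> float:
        """One-way delay in ms; meaningful only if both clocks are synchronized."""
        return (self.received_at - self.sent_at) * 1000.0


record = DelayRecord("box-a", "box-b", seq=1, sent_at=100.000, received_at=100.012)
print(f"{record.delay_ms:.1f} ms")
```

The delay is only as good as the two time stamps, which is why the clock in each test-box matters so much.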
If we assume that this results
in (at most) 100 bytes
of data for each delay measurement, then
measuring at a rate of once a minute will produce
150 kbyte/day or 55 Mbyte/year
of data for each connection.
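The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch that assumes decimal units (1 kbyte = 1000 bytes) and a 365-day year; the text rounds the results up to 150 kbyte/day and 55 Mbyte/year.

```python
BYTES_PER_MEASUREMENT = 100      # upper bound assumed in the text
MEASUREMENTS_PER_DAY = 24 * 60   # one measurement per minute

bytes_per_day = BYTES_PER_MEASUREMENT * MEASUREMENTS_PER_DAY
bytes_per_year = bytes_per_day * 365

print(f"{bytes_per_day / 1e3:.0f} kbyte/day")    # 144 kbyte/day
print(f"{bytes_per_year / 1e6:.1f} Mbyte/year")  # 52.6 Mbyte/year
```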
In addition to that, one also has to store the routing-vector information.
For a stable network, the amount of data for this will be less, as one
can store a routing-vector with a validity range, rather than the
routing vector for each measurement.
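The idea of storing a routing-vector with a validity range can be sketched as follows. The representation of a route as a tuple of router names and the function name `compress_routes` are assumptions for illustration; the sketch only shows how consecutive identical routes collapse into one stored entry.

```python
from typing import List, Sequence, Tuple

Route = Tuple[str, ...]  # hops along the path (hypothetical representation)


def compress_routes(samples: Sequence[Tuple[int, Route]]) -> List[Tuple[int, int, Route]]:
    """Collapse time-ordered (timestamp, route) samples into (first, last, route) ranges."""
    ranges: List[Tuple[int, int, Route]] = []
    for timestamp, route in samples:
        if ranges and ranges[-1][2] == route:
            first, _, _ = ranges[-1]
            ranges[-1] = (first, timestamp, route)        # extend the current range
        else:
            ranges.append((timestamp, timestamp, route))  # route changed: new range
    return ranges


samples = [
    (0, ("r1", "r2", "r3")),
    (60, ("r1", "r2", "r3")),
    (120, ("r1", "r4", "r3")),
    (180, ("r1", "r4", "r3")),
    (240, ("r1", "r2", "r3")),
]
for first, last, route in compress_routes(samples):
    print(first, last, "->".join(route))
```

Five samples reduce to three stored ranges here; on a stable network the saving would be much larger, since a single range could cover many measurements.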
So, allowing for short intervals with high rates and a significant
amount of routing-vector information, a safe upper limit of the
amount of data to be stored is 250 kbyte/day or 90 Mbyte/year
of data per connection.
This means that in a setup where
O(100) connections are
being tested, one needs several Gbytes/year of disk space to
store the raw data. If the number of connections that is being
tested goes up by an order of magnitude, one will need a tape-unit
or similar mass storage device.
In the extreme case, where 1000 providers participate in this
project and we test all
1000^2
possible connections between them,
the data volume will be of the order of several
Tbyte/year.
While this volume is not unmanageable, it will require a tape robot for
storage, a major investment, or a significant
compression of the raw data before it is stored.
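How the volume scales with the number of connections can be estimated with the sketch below. It assumes the 90 Mbyte/year upper limit per connection and decimal units; the function name is hypothetical. With that upper limit taken at face value, the full mesh comes out at roughly 90 Tbyte/year of uncompressed data, so the figures quoted in the text are best read as order-of-magnitude estimates.

```python
MBYTE_PER_CONNECTION_YEAR = 90  # safe upper limit quoted in the text


def yearly_volume_gbyte(connections: int) -> float:
    """Raw data volume per year in Gbyte for a given number of tested connections."""
    return connections * MBYTE_PER_CONNECTION_YEAR / 1000


for label, connections in [("O(100) connections", 100),
                           ("O(1000) connections", 1000),
                           ("full mesh of 1000 ISPs", 1000 ** 2)]:
    print(f"{label}: {yearly_volume_gbyte(connections):,.0f} Gbyte/year")
```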
A database program will be used to store the data. This database should
be able to handle data volumes up to several Tbyte,
distributed over
several physical volumes, and read the data in a reasonable amount of time.
As the data might be used for several different analyses, it should be
possible to generate Data Summary Tapes (DST's) containing a subset of the data.
For presenting the data, we need a graphical analysis tool. This tool
should be able to plot and histogram data, and, of course, have an
interface to the database.
The test-boxes consist of an industrial PC, which can be mounted in a
19 inch rack. Inside the box, there will be a CPU, a disk and a high
precision clock. The disk should be large enough to store several
days' worth of data; assuming 250 kbyte/day per connection,
1 Gbyte should be sufficient.
The box should be hooked up to the border router of the ISP.
The machine will be ``plug and play'', in order to make installation
easy and to avoid tampering with the machine by a local ISP. After
the box has been installed, all maintenance will be done remotely
from the NCC. Only in the case of major hardware failures that cannot
be solved remotely will we need some support from the local ISP's.
The design of the box will be such that the ISP cannot affect the
performance of the computer or reconfigure its network such that the
results will be affected (``Design the network for the benchmark'').
This will include installing machines at the customers of this
ISP and doing consistency checks.
The software on the machine will read a configuration file that specifies
which measurements it has to do, perform the measurements and collect
the data.
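A minimal sketch of reading such a configuration file is shown below. The file format (one `target interval` pair per line, with `#` comments), the host names and the names `Measurement` and `parse_config` are all assumptions made for illustration; the real format has not been defined yet.

```python
from typing import List, NamedTuple


class Measurement(NamedTuple):
    target: str      # test-box to send test traffic to
    interval_s: int  # seconds between measurements


def parse_config(text: str) -> List[Measurement]:
    """Parse 'target interval' lines; blank lines and '#' comments are ignored."""
    measurements = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        target, interval = line.split()
        measurements.append(Measurement(target, int(interval)))
    return measurements


EXAMPLE_CONFIG = """
# target            interval (s)
box-b.example.net   60
box-c.example.net   60   # same rate, different peer
"""

for m in parse_config(EXAMPLE_CONFIG):
    print(f"measure delay to {m.target} every {m.interval_s} s")
```

Keeping the configuration in a plain text file would let the central machine change the set of tested connections simply by distributing a new file.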
The machine load should be small, so that the measurements are not
affected by processes competing for CPU time. This will also put
a limit on the number of connections that can be tested
simultaneously.
As the number of machines will be large, the software should be easy
to maintain:
One script should restart all processes after a reboot or power failure.
The software should survive common errors and restart itself without
operator intervention.
An automated procedure, like rdist, will be used
to keep the software up to date.
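The requirement that the software restarts itself after common errors could be met by a small supervising wrapper. The sketch below is one possible approach, not the project's design; `run_with_restart` and `flaky_measurement` are hypothetical names, and the failing task only simulates an error.

```python
import time
from typing import Callable


def run_with_restart(task: Callable[[], None], max_restarts: int = 5,
                     delay_s: float = 1.0) -> int:
    """Run task, restarting it after a failure; return the number of restarts used."""
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception as exc:  # deliberately broad: any error triggers a restart
            if restarts >= max_restarts:
                raise
            restarts += 1
            print(f"task failed ({exc}); restart {restarts} of {max_restarts}")
            time.sleep(delay_s)


attempts = {"count": 0}


def flaky_measurement() -> None:
    """Stand-in for a measurement process that fails on its first two runs."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated network error")


restarts_used = run_with_restart(flaky_measurement, delay_s=0.0)
print(f"finished after {restarts_used} restarts")
```

The cap on restarts means that a persistent fault is eventually raised rather than retried forever, so that it can be noticed and handled remotely.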
As the machines will be located in the heart of the networks of the
ISP's, they should be hacker-proof. In order to accomplish this,
the number of accounts on the machines will be limited to the
absolute minimum, all software will run in secure shells and no
unencrypted passwords will be sent over the net. Finally, all the
software running on the machines will be made available to the
participating ISP's, so that they can convince themselves that the
test-boxes do not introduce any security holes.
All the software written for this project as well as the design of the
test-boxes will be made available to the participating ISP's. They
can use it for measurements on the internal part of their networks.
The clock at the remote machines is the most critical part of the
whole project. Before we start to work on any other part of the
project, we should prove that a clock with sufficient accuracy
can be built.
We aim for an accuracy that is at least one order of magnitude
better than the smallest delays that will be measured by the box,
including drifts. In a typical network environment, the
delays will be of the order of 10 ms, which translates
to a required accuracy of 1 ms or less. As each delay is the difference
between the readings of two clocks, whose errors add in quadrature,
the overall error in a 10 ms delay measurement will then be of the
order of 1.4 ms.
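The 1.4 ms figure presumably follows from combining the errors of the sending and the receiving clock. The sketch below assumes that the two clock errors are independent and of equal size, so that they add in quadrature; the constant names are chosen for this example.

```python
import math

CLOCK_ERROR_MS = 1.0   # accuracy assumed for each test-box clock
DELAY_MS = 10.0        # typical delay quoted in the text

# A one-way delay is the difference of two time stamps, one per clock.
# Assuming the two clock errors are independent, they add in quadrature.
delay_error_ms = math.hypot(CLOCK_ERROR_MS, CLOCK_ERROR_MS)

print(f"error on the delay: {delay_error_ms:.1f} ms")      # 1.4 ms
print(f"relative error: {delay_error_ms / DELAY_MS:.0%}")  # 14%
```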
The first approach is NTP, the Network Time Protocol [4]. This is a
software protocol that uses timing servers all over
the world. With atomic clocks as references and suitable hardware, accuracies
down to a few hundred ps
can be obtained. With off-the-shelf
products, the accuracy will be lower.
The current list of servers [5] lists primary servers in:
France, Germany, Holland, Italy, Norway, Switzerland, Sweden and the UK,
and secondary servers in: Portugal, Poland and Slovenia.
This approach uses free software, so the costs for this solution are
small, assuming that the internal PC clock in a standard environment
is stable enough for our purposes.
A potential problem is that there are large areas without a nearby
server. Also, the accuracy might be affected by unstable networks.
A test-setup will be used to determine if this solution can provide
a stable clock under these circumstances. If it cannot, an
external reference clock has to be used to obtain the necessary
resolution.
The second approach uses a system of long-wave radio stations broadcasting the current time.
A receiver can collect these timing signals and provide the local time with
an accuracy of the order of a few ms.
Typical receivers cost of the
order of $2500 (1992 prices) [6].
There are, however, several problems with this approach:
First of all, timing signals are only available in the UK, Germany and
surrounding areas. This means that this approach can only provide a
clock signal for a small part of the global area covered by the
RIPE-NCC. For the remaining part, one needs another approach.
Then, reference [6] mentions seasonal effects and other
corrections which are needed in order to get the accuracy of a few ms.
This implies that the clocks need constant attention in
order to keep the high accuracy.
Finally, the cost of each clock is much higher than for a
solution that uses GPS receivers (discussed below).
For all these reasons, we will not consider this solution any further.
The third approach uses the Global Positioning System (GPS), a
satellite navigation system developed by
the US Department of Defence (DoD). Receivers collect a timing and
position signal from up to 24 satellites. With this system, the time
can be obtained with accuracies down to
200 µs.
There are several receivers on the market that can be mounted as
an extension card inside a PC. These cards have to be connected
to an antenna and can be read out by the PC. The NTP package [4]
can then be used to synchronize the internal clock to the global time.
The GPS receivers will therefore provide an easy-to-maintain and reliable clock.
A test showed that an antenna mounted in the window of our offices
was able to pick up the signals from several satellites, which
is sufficient to provide a timing signal. This
antenna has to be located, depending on the type of card and antenna,
within 10 to 100 m
of the receiver. This puts some
constraints on the local infrastructure at the ISP's.
A side-effect of this solution is that we create a network of high-precision
clocks (so-called stratum-1 NTP servers) through the area where our
test-boxes are located. These clocks can be used for other purposes.
Also, the GPS receiver will provide its global position with an
accuracy of about 100 m.
We believe that GPS receivers combined with the NTP software
will provide a suitable clock signal for all test-boxes. However,
this is the most critical problem in the project and we
should focus on this first.
The funding for the prototypes and initial deployment of the test-boxes
is discussed in [1]. Large-scale deployment and maintenance
may require additional funding, which will be sought separately.