This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/mat-wg@ripe.net/
[mat-wg] RIPE NCC measurement data retention
- Previous message (by thread): [mat-wg] RIPE NCC measurement data retention
- Next message (by thread): [mat-wg] RIPE NCC measurement data retention
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Paul de Weerd
pdeweerd at ripe.net
Thu Dec 21 08:58:38 CET 2023
Thank you Ben, Randy and Joshua for your feedback.

The 50 TB (and growing) of storage space needed to hold all of the
(compressed) RIS dump and update files for the entire history of the
project is (currently) not a big concern. As Ben points out, 50 TB can
relatively easily be held in a single machine: these days, three 20 TB
disks are sufficient. Obviously, reality is a bit more complex, with
redundancy and availability added into the mix. However, since this is
less problematic, we're not planning any changes in this area at present.

The 800 TB that was mentioned is a different matter. We store RIS data
in a variety of ways for fast access by the RIPEstat front-end servers.
We use Apache HBase for this, which uses Apache HDFS as a storage
backend, giving us redundant storage. This redundancy comes at a price:
by default, HDFS stores its data in triplicate, so the 800 TB of storage
used contains just over 250 TB of actual data. Higher replication is
possible but would cost even more, and lower replication is strongly
discouraged.

Then, for the various widgets, infocards and data endpoints in RIPEstat,
the data is transformed and stored in different HBase tables. This does,
unfortunately, mean further duplication, because the data is indexed by
different aspects in different tables to suit each specific access
pattern.

These various ways of storing the same data were not a big problem in
the past. However, with the growth of RIS and the Internet, the volume
of incoming data has steadily grown too, and it has now reached the
point where we need to start thinking of different ways to make this
data available to the community. This is where Robert's RIPE Labs post
and his presentation at the meeting in Rome come in. We want to review
how we make this data available to you as end users (be it as
researchers, as operators, or in whatever other form applies) so that we
can remain cost-effective while giving you the most useful data in a
fast and easy-to-access way.

So, in summary, we're looking at doing exactly what Ben suggests: keep
offering the historic dataset as we currently do, through the dump and
update files, while reviewing how we can reduce the cost of the other
ways in which we store this data without losing value for our end users.

Paul de Weerd
RIPE NCC

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: </ripe/mail/archives/mat-wg/attachments/20231221/ab0ac86e/attachment.sig>
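[Editorial note] To make the storage arithmetic in the message concrete,
here is a minimal Python sketch. It is an editorial illustration, not
RIPE NCC code; all figures come from the message itself, and the only
external fact assumed is that HDFS's default replication factor is 3
(the dfs.replication setting).

def logical_size_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Logical (single-copy) data volume behind raw HDFS disk usage."""
    return raw_tb / replication_factor

# Three 20 TB disks comfortably hold the ~50 TB RIS file archive:
disks, disk_tb = 3, 20
print(f"single-machine raw capacity: {disks * disk_tb} TB")  # 60 TB >= 50 TB

# HDFS keeps three copies of every block by default (dfs.replication = 3),
# so 800 TB of raw usage holds roughly 800 / 3, i.e. just over 250 TB,
# of actual data.
print(f"logical data: {logical_size_tb(800):.0f} TB")  # prints 267 TB

# Higher replication buys more redundancy at even higher raw cost;
# anything below 3 saves space but is strongly discouraged.
for rf in (2, 3, 4):
    print(f"replication {rf}: 800 TB raw -> {logical_size_tb(800, rf):.0f} TB logical")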
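[Editorial note] The per-access-pattern duplication described in the
message can be sketched with the happybase HBase client. The host, table
names, column family and row-key layouts below are hypothetical
stand-ins (the actual RIPEstat schema is not described in this thread);
the point is only that serving lookups by prefix and by origin AS from
HBase means writing the same record into two tables under different row
keys.

import happybase

# Hypothetical host and table names, purely for illustration.
connection = happybase.Connection('hbase.example.net')
by_prefix = connection.table('ris_by_prefix')  # row key: prefix | timestamp
by_origin = connection.table('ris_by_origin')  # row key: origin AS | timestamp

# One BGP announcement, stored as HBase cells (column family 'bgp' assumed).
announcement = {
    b'bgp:prefix': b'193.0.0.0/21',
    b'bgp:origin': b'AS3333',
    b'bgp:as_path': b'1103 3333',
}

# The same record goes into both tables, keyed to match each access
# pattern; every additional index table multiplies the stored volume,
# on top of HDFS's own 3x block replication.
by_prefix.put(b'193.0.0.0/21|2023-12-21T08:58:38Z', announcement)
by_origin.put(b'AS3333|2023-12-21T08:58:38Z', announcement)

Because HBase row keys determine scan locality, keying a dedicated table
per query shape (rather than maintaining secondary indexes) is the usual
design choice, which is why this kind of duplication arises at all.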