<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr">Hi,<div><br></div><div>I'm another researcher who makes heavy use of the historical data held in these services, and I appreciate the commitment to keeping this data available where possible.</div><div><br></div><div>In <a href="https://labs.ripe.net/author/kistel/ripe-ncc-measurement-data-retention-principles/" target="_blank">the Labs article</a>, there's a statement that: "For the RIPEstat use-case, we make the data available in a variety of ways which takes up about 800 TB of storage space."</div><div>This reads to me as if there's a lot of (potentially unnecessary?) data duplication, so proposal 2 sounds sensible to me. I would imagine that it's possible to reconstruct some or all of the formats served, so for older data, would producing some of them on the fly or converting between formats be feasible? Is there a way to get a breakdown of which data formats are the most storage-intensive, or which parts of services like RIPEstat use the most storage?</div><div><br></div><div>I imagine there aren't many use cases that need instant access to historic data, so making access to older data slower/tiered (and hence cheaper) doesn't seem like a problem. That said, I'm looking at this very much from a research perspective, so I could be way off the mark.</div></div><br clear="all"><div><div dir="ltr" class="gmail_signature"><div dir="ltr">Kind regards,<div>Josh</div></div></div></div></div></div></div>