<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi,</p>
<p>I assume you're referring to the daily dumps that we release
here:<br>
<a class="moz-txt-link-freetext" href="https://data-store.ripe.net/datasets/atlas-daily-dumps/">https://data-store.ripe.net/datasets/atlas-daily-dumps/</a><br>
</p>
<p>There are a couple of steps that I find relatively slow on the
command line: decompressing with the standard bzip2 tools, and
parsing JSON with jq. So I lean on a couple of other tools to speed
things up:</p>
<p>- the lbzip2 suite parallelises parts of the compress/decompress
pipeline<br>
- GNU parallel can split data arriving on a pipe into chunks and
hand each chunk to a separate process, running roughly one process
per CPU core by default<br>
</p>
<p>So, for example, on my laptop I can reasonably quickly pull out
all of the traceroutes my own probe ran:<br>
lbzcat traceroute-2018-07-23T0700.bz2 | parallel -q --pipe jq '. |
select(.prb_id == 14277)'<br>
<br>
Stéphane has also written about using jq to parse Atlas results on
labs.ripe.net:
<a class="moz-txt-link-freetext" href="https://labs.ripe.net/Members/stephane_bortzmeyer/processing-ripe-atlas-results-with-jq">https://labs.ripe.net/Members/stephane_bortzmeyer/processing-ripe-atlas-results-with-jq</a></p>
<p>Happy to hear from others what tools they use for data
processing!</p>
<p>Cheers,</p>
<p>S.</p>
<div class="moz-cite-prefix">On 21/07/2018 19:09, BELLAFKIH hayat
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAL6eEe2q78O9LQWUoHOvvKMZ3L=X23Nqqg5bO8jseZMhmjP37g@mail.gmail.com">
<div dir="ltr">
<div><font size="2"><span
style="font-family:arial,helvetica,sans-serif">Dear RIPE
Atlas users,</span></font></div>
<div><font size="2"><span
style="font-family:arial,helvetica,sans-serif"><br>
</span></font></div>
<font size="2"><span
style="font-family:arial,helvetica,sans-serif">I am studying
the processing of the data collected by the probes as a Big
Data problem. For instance, one hour of <span
class="gmail-un">traceroute</span> data amounts to 500 MB
(bzip2-compressed), i.e. 7 GB of data in text format. Can you
share with me how you deal with this data in practice?<br>
</span></font>
<div><font size="2"><span
style="font-family:arial,helvetica,sans-serif">Are you
using a very powerful machine, or Big Data tools?</span></font></div>
<div><font size="2"><span
style="font-family:arial,helvetica,sans-serif"><br>
</span></font></div>
<div>
<div style="color:rgb(0,0,0)" class="gmail_default"><font
size="2"><span
style="font-family:arial,helvetica,sans-serif">Best
regards,</span></font></div>
<div style="color:rgb(0,0,0)" class="gmail_default"><font
size="2"><span
style="font-family:arial,helvetica,sans-serif">Hayat</span></font></div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>