[OpenIPMap] Bringing CAIDA's geoloc efforts into the fold
Emile Aben
emile.aben at ripe.net
Tue Jun 10 11:29:08 CEST 2014
Hi group,
As I mentioned during my RIPE68 presentation, there is active
development on infrastructure geolocation going on at CAIDA, and I'd
like to see if and how we can align both efforts. My personal take is
we can and we should.
CAIDA have been developing 3 things:
- Automatically detecting naming schemes for networks. This is called
  DROP, paper is here:
  http://www.caida.org/publications/papers/2014/drop/drop.pdf
- A means of documenting structure in naming schemes (hostpat).
- A repository for naming schemes (DDec), which includes DROP and the
  undns datasets. Beta version available here: http://ddec.caida.org/
Hostpat is documented as part of DDec: http://ddec.caida.org/help.pl ,
and I think it is a good attempt at lowering the complexities.
What I'm currently working on is an interface in OpenIPMap that runs
hostnames (that are already in the OpenIPMap system) through DDec, and
imports any resulting hostname->location mapping into OpenIPMap, with
a lower confidence than the user inputs.
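To make the "lower confidence" idea concrete, here is a minimal sketch of
how such an import step could merge DDec results without overriding what
users have entered. The confidence values, the tuple layout of the store and
the function name are all assumptions for illustration; they do not describe
the actual OpenIPMap schema or the DDec interface.

```python
# Hypothetical confidence levels; the real OpenIPMap scale is not given here.
USER_CONFIDENCE = 1.0
DDEC_CONFIDENCE = 0.5


def import_ddec_mappings(store, ddec_results, confidence=DDEC_CONFIDENCE):
    """Merge hostname->location results into `store`.

    store:        dict mapping hostname -> (location, confidence, source)
    ddec_results: dict mapping hostname -> location (None or "" if no match)

    Existing entries that have a non-empty location and an equal or higher
    confidence are left alone. Returns the hostnames added or updated.
    """
    imported = []
    for hostname, location in ddec_results.items():
        if not location:
            continue  # no answer for this hostname, nothing to import
        existing = store.get(hostname)
        if existing is not None and existing[0] and existing[1] >= confidence:
            continue  # keep the more trusted (e.g. user-supplied) entry
        store[hostname] = (location, confidence, "ddec")
        imported.append(hostname)
    return imported
```

With this shape, a user entry that holds an empty string is still filled in
by a DDec result, while a user entry with an actual location is never
replaced.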
I'll first test this with all the hostnames for which users entered an
empty string, meaning they overrode the guessing system in OpenIPMap
but didn't know the correct geolocation either. I currently have 3k of
these (out of a total of 16k crowdsourced entries for hostnames
(go team!)).
This could easily be extended with batch runs of far more hostnames
(all we can find?) and/or doing an 'online' lookup into DDec whenever
information for a hostname is being called for, and storing the
resulting hostname->location mapping locally in OpenIPMap.
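A rough sketch of what the 'online' variant could look like: check the
local store first, and only go out to the remote lookup on a miss. The class
name and the remote_lookup callable are hypothetical; the callable merely
stands in for a DDec query, whose real interface is not specified here.

```python
class CachingLocator:
    """Answer from the local cache first, fall back to a remote lookup."""

    def __init__(self, remote_lookup):
        # remote_lookup: callable taking a hostname and returning a location
        # string or None. Assumed stand-in for a DDec query.
        self.remote_lookup = remote_lookup
        self.cache = {}

    def locate(self, hostname):
        # Normalise so "Foo.example.net." and "foo.example.net" share an entry.
        key = hostname.lower().rstrip(".")
        if key in self.cache:
            return self.cache[key]
        location = self.remote_lookup(key)
        self.cache[key] = location  # negative results are cached as well
        return location
```

Caching negative results keeps repeated misses from hitting the remote
service again; whether those should expire after some time is a design
choice left open here.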
This would allow people to document naming schemes in DDec and
have OpenIPMap use the mapping resulting from naming schemes. I think
this is the fastest path forward towards better infrastructure
geolocation.
I'd like us to evaluate if this works for the group. Specifically for
the people who have expressed interest in describing structure in
hostnames using regular expressions (Martin, Daniel, Robert):
Does DDec/hostpat work for you? If not, why not?
I've invited kc and Brad from CAIDA to join this list, so hopefully we
can get some conversation going on what works and what won't.
What I'd personally like to avoid is unnecessary complexity and
duplicate work, so doing things collaboratively would have my *strong*
preference.
cheers,
Emile
PS: Some more stats on user-contributed hostname->loc mappings:
 user_id | count
---------+-------
      22 |     9
      20 |    95
      17 |   138
      16 |   208
      11 |   208
      10 |   265
      15 |   381
      12 |   409
       9 |  1836
       3 |  1910
      18 |  4159
       5 |  6758
Top contributors: Please don't get addicted :) :)