Through A Network, Darkly | A Geographic Look At I2P
Category: thoughts
A 4 Minute Read
05 Jan 2015
General Disclaimer: I’m a guy with a blog. Take everything I say with a grain of salt, even the grains of salt.
I2P is a network that doesn’t get a lot of attention, usually living in the shadow of Tor. Also, without any central directories, getting an image of what the network looks like is somewhat difficult. Two services called Stats.i2p and Bigbrother.i2p already give some network statistics, but I wanted to attack the problem myself with full access to the data. Thus, in November I decided to set up five extra I2P routers on VPSes to try and see what I could find.
The routers, which I’ll now refer to as probes to differentiate them from other routers on the network, were set up in London (Probe 1), New York (Probe 2), Singapore (Probe 3), San Francisco (Probe 4), and Amsterdam (Probe 5). They were set up on November 28th so that they could integrate into the network before I started harvesting data on December 1st. The bandwidth graphs and the NetDB size (the NetDB being the internal database that each router keeps of other known routers on the network) both indicate that this window was sufficient. On December 1st, I began running an hourly script that pulled IP addresses from the NetDB and put them into timestamped lists.
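To make the "timestamped lists" step concrete, here is a minimal sketch of what the writing half of such an hourly job could look like. This is not the script I actually ran. The function name, the file naming scheme, and the probe label are all made up for illustration, and the part that extracts addresses from the NetDB is deliberately left out (the function just takes whatever IP strings it is handed).

```python
import ipaddress
from datetime import datetime, timezone
from pathlib import Path


def write_snapshot(ips, out_dir, probe, now=None):
    """Write one deduplicated, sorted IP list for the current hour.

    `ips` is any iterable of address strings; how they are pulled out of
    the NetDB is out of scope here. The file name layout is hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    unique = set()
    for raw in ips:
        try:
            unique.add(ipaddress.ip_address(raw.strip()))
        except ValueError:
            continue  # skip anything that is not a valid IPv4/IPv6 address
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # One file per probe per hour, e.g. probe1_20141201T0500Z.txt
    path = out_dir / f"{probe}_{now:%Y%m%dT%H}00Z.txt"
    # Sort by (version, integer value) so IPv4 and IPv6 can be mixed safely
    ordered = sorted(unique, key=lambda a: (a.version, int(a)))
    path.write_text("".join(f"{a}\n" for a in ordered))
    return path
```

Run once an hour from something like cron, this leaves one small file per probe per hour, which is all the later analysis needs.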
Throughout the month, each of the probes knew approximately 3,000 routers at any given time. Given the size of the I2P network, however, any data gathered from only five probes should be taken with a grain of salt. Nevertheless, during December the probes discovered 142,454 unique IP addresses, and each sent out about 652GB of traffic on average (3.3TB overall). Stats.i2p estimates that the overall network consists of approximately 25,000 routers at any given time, so the sample should be representative for cumulative analysis. That said, approach any figures that relate to a specific time with caution: the probes only saw approximately 15,000 routers per day on average, or about 4,000 each hour, as they connected to many of the same nodes.
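The gap between the monthly total and the daily or hourly figures comes down to how the unique counts are taken. As a rough sketch (the tuple layout and function name are my own assumptions here, not the actual schema), pooling every sighting from every probe and deduplicating at three different granularities looks like this:

```python
from collections import defaultdict


def summarize(observations):
    """observations: iterable of (day, hour, ip) tuples pooled from all probes.

    Returns the cumulative unique count plus the average number of unique
    IPs per day and per hour. The field layout is hypothetical.
    """
    all_ips = set()
    per_day = defaultdict(set)
    per_hour = defaultdict(set)
    for day, hour, ip in observations:
        all_ips.add(ip)
        per_day[day].add(ip)
        per_hour[(day, hour)].add(ip)

    def average(groups):
        # Mean size of the per-period sets; 0.0 if nothing was observed
        return sum(len(s) for s in groups.values()) / len(groups) if groups else 0.0

    return {
        "unique_total": len(all_ips),
        "avg_unique_per_day": average(per_day),
        "avg_unique_per_hour": average(per_hour),
    }
```

Because the same address counts once in the total but can show up in many days and hours, the cumulative number keeps growing all month while the per-day and per-hour averages stay comparatively flat.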
With these IP lists, MaxMind’s Geolite library was used to add geographic information. Autonomous System (the block of IP space controlled by a single administrator) information was also added. Finally, it was all put into a PostgreSQL database where it was queried, exported, and then mapped or charted with QGIS and LibreOffice.
The first set of maps is just the overall network, plotted as points with the size of the point corresponding to the number of hours that it was known by one of the probes. In other words, the bigger the point, the more time it appeared to be online. These maps also exclude routers in any country with fewer than 25 unique routers overall. The first map combines the five probes’ IP lists, while the rest are broken down by probe. The differences between probes are minor, as they largely consist of the same points despite the wide geographic distribution of the probes themselves. In other words, don’t worry too much about where you put your router.
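For the curious, the point sizing and the 25-router cutoff boil down to two small aggregations. The sketch below is an illustration in Python rather than the queries I ran in PostgreSQL. The names are invented, and `country_of` is assumed to be a plain dict from IP to country code produced by an earlier geolocation step.

```python
from collections import Counter, defaultdict

MIN_ROUTERS = 25  # country cutoff described in the text


def hours_known(sightings):
    """sightings: iterable of (hour, ip). Count the distinct hours each IP was seen."""
    hours = defaultdict(set)
    for hour, ip in sightings:
        hours[ip].add(hour)  # a set, so repeat sightings in one hour count once
    return {ip: len(seen) for ip, seen in hours.items()}


def mappable_points(sightings, country_of, min_routers=MIN_ROUTERS):
    """Return {ip: hours_known} for IPs in countries with enough unique routers.

    `country_of` is a hypothetical {ip: country_code} lookup.
    """
    sizes = hours_known(sightings)
    per_country = Counter(country_of[ip] for ip in sizes)
    return {
        ip: n
        for ip, n in sizes.items()
        if per_country[country_of[ip]] >= min_routers
    }
```

The resulting hour count per IP is what drives the point size, and anything in a country below the cutoff simply never makes it onto the map.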
The next map shows the raw count of unique IP addresses detected in each country, categorized by natural breaks. Interestingly, Russia leads the way by approximately 40,000 IPs compared to the second-place country, the US. Many of the IPs detected in Russia (29,870 out of 54,139) were only online for four hours or less. This is, to be frank, strange, and it would be interesting to find out why. So far the theories thrown out on Twitter have been laptops or botnets, but even those seem suspicious. If you have any theories, let me know.
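To put the Russian numbers in proportion: 29870 out of 54139 is a bit over half. Here is a small sketch of how that share can be computed from a per-IP count of hours seen. The function name and input shape are assumptions for illustration, and only the two totals come from the data above.

```python
def short_lived_share(hours_by_ip, max_hours=4):
    """Fraction of IPs seen for at most `max_hours` distinct hours.

    `hours_by_ip` is a hypothetical {ip: hours_seen} mapping.
    """
    if not hours_by_ip:
        return 0.0
    short = sum(1 for hours in hours_by_ip.values() if hours <= max_hours)
    return short / len(hours_by_ip)


# The two totals quoted in the text for Russia
russia_short, russia_total = 29870, 54139
print(f"{russia_short / russia_total:.1%}")  # prints 55.2%
```

So roughly 55% of the Russian addresses were around for four hours or less over the whole month.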
Another interesting aspect of this is just how little I2P is used in Scandinavian countries. While Sweden has some nodes, Norway and Finland have very few. One potential reason for this is that the GetI2P.net website offers a Swedish translation, but not a Norwegian or Finnish one. The same is true for Iceland. Considering these countries are generally well regarded when it comes to privacy and technology, it seems that if any region is a priority for I2P adoption, this should be it.
This map takes the data from the last one (counts of unique IP addresses) and filters it down to just countries that censor the internet, using data from the OpenNet Initiative. Countries that experienced censorship in any category were highlighted to evaluate how widely I2P is being used there. Do be aware, however, that the OpenNet Initiative’s censorship data may be out of date in some areas, though it still seems to be of value. I2P tends to be used in most censored countries, but adoption could be higher in regions like Southeast Asia.
The last two figures are simply charts showing the concentration of IPs in the top countries and autonomous systems. Again we see how dominant Russia is, as not only is it the top country, but the top AS, AS8402, is also Russian. What do we learn from this? Don’t be Putin your routers in Russia, it’s got enough as is.
A note on ethics: I’m aware of the ethical problem of tracking what is supposed to be an anonymous network. IP logs were all encrypted when not in use, no IP addresses were made public, and no point data was provided for countries with fewer than 25 routers (though point data is highly inaccurate anyway). I2P also clearly states in its threat model that it does not attempt to hide the fact you’re using I2P. Lastly, the data was collected through standard means.