In the last couple of months, we, the IPFire development team, have launched a small side project: a new location database for the Internet. In this article, I would like to give you a brief background story on why and how it came to this...
What is this?
I am sure that you all have used a location database - often called GeoIP database after a brand name from a company called Maxmind. Most likely that was in an online shop that showed you shipping cost based on your location, or you were shown a cookie warning when you visited a website where the EU's cookie guidelines applied.
Other applications would be threat prevention like we use it in IPFire. Connection attempts from certain countries can simply be blocked, or port forwardings can be limited to certain countries only.
That is, however, not an exact science. The Internet changes constantly. IP address ranges are re-assigned from one party to another, and it can often take some time until those location databases are all updated. Until then, you will see wrong information, like the Google front page being shown in the wrong language. This might only be a bit of an inconvenience, but for a firewall, we need more recent and reliable data.
Maxmind is the biggest player on the market, and the previous source for GeoIP data in IPFire. They are a Massachusetts-based company and recently changed the terms of their database, which was previously available under a Creative Commons license. Now, users are required to register before they are permitted to use the database. Although the company claims their database is still free, it is at least a very grey area from our point of view, and we have since decided to no longer use it. We did not accept the new end-user license agreement, so IPFire currently ships the last version of the database that was released before registration was required.
Accuracy Issues
Development of our own successor started long before that, because we had already become more and more unhappy with the accuracy of Maxmind's free data. Potentially it is deliberately made inaccurate to promote paid services. Unfortunately, neither we nor any of their customers have any insight into where the data is coming from and how the database is composed.
Since this is security-relevant for us, we needed these problems fixed.
Most importantly, the data needed to be accurate. We do not care about geo coordinates, or a county or city, but only a country. It isn't really possible to divide the Internet into countries, but what is possible is to have an idea from what jurisdiction someone is accessing a website. For most people, that is enough accuracy.
We also wanted to know which Autonomous System a user is coming from, because that is the only thing the Internet can be divided into. It is an inter-connected network of autonomous systems, and that carries valuable information for us, for example to identify cloud providers.
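To illustrate what such a lookup amounts to, here is a minimal sketch in Python that maps an address to a country and an Autonomous System by picking the most specific matching network. It does not use our library's API: the `NETWORKS` table, the `lookup` and `is_allowed` functions and all sample values are hypothetical, and the prefixes and AS numbers are taken from the ranges reserved for documentation.

```python
import ipaddress

# Hypothetical sample data: documentation prefixes and documentation
# AS numbers, not real assignments.
NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): {"country": "DE", "asn": 64496},
    ipaddress.ip_network("198.51.100.0/24"): {"country": "US", "asn": 64497},
    ipaddress.ip_network("198.51.100.128/25"): {"country": "GB", "asn": 64498},
}


def lookup(address):
    """Return the record of the most specific network containing address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in NETWORKS if ip in net]
    if not matches:
        return None
    # The longest prefix is the most specific match.
    best = max(matches, key=lambda net: net.prefixlen)
    return NETWORKS[best]


def is_allowed(address, blocked_countries):
    """Return False if the address belongs to one of the blocked countries."""
    record = lookup(address)
    # Assumption: unknown addresses are allowed; a real policy might differ.
    return record is None or record["country"] not in blocked_countries
```

A linear scan like this is only meant to show the idea; a real database would use a tree structure so that lookups stay fast with hundreds of thousands of networks.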
On top of that, it is often interesting to have other attributes. There are plenty of anonymous proxies out there that users hide behind. Maxmind is using special country codes (in this case A1) to mark those, but unfortunately that loses the actual country this system is located in.
We added an extra set of attributes that can be used to flag networks for various reasons. This allows us to gather more information without trading in accuracy. We use them to mark satellite providers, anonymous proxies and anycast networks.
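The difference between a special country code and separate attributes can be sketched in a few lines of Python. This is not the actual database format: the `NetworkFlag` and `NetworkRecord` names and the bit values are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from enum import IntFlag


class NetworkFlag(IntFlag):
    # Hypothetical bit values, chosen only for this example.
    ANONYMOUS_PROXY = 1
    SATELLITE_PROVIDER = 2
    ANYCAST = 4


@dataclass
class NetworkRecord:
    country: str                          # the real country code is always kept
    flags: NetworkFlag = NetworkFlag(0)   # zero or more flags on top of it


# A proxy that is also anycast, and still known to be located in NL.
record = NetworkRecord(
    country="NL",
    flags=NetworkFlag.ANONYMOUS_PROXY | NetworkFlag.ANYCAST,
)
```

Because the flags live next to the country instead of replacing it, a rule can block all anonymous proxies and still treat the rest of a country normally.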
Finally, we needed to be sure that the database is recent and not modified by a third party. That is something that our competitors do not do. We have instead built a cryptographic signature into the database, so that when it is being downloaded to your local IPFire system, you can be sure that it is coming from us and has not been tampered with before loading it into your firewall.
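The "verify first, load second" pattern looks roughly like the following Python sketch. Note that it swaps in a different technique: it uses an HMAC with a shared key from the standard library as a stand-in, whereas a real signature would use a public/private key pair so that the client holds no secret. The `sign` and `load_if_authentic` functions are hypothetical and not part of our software.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the payload (stand-in for a signature)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def load_if_authentic(payload: bytes, signature: bytes, key: bytes) -> bytes:
    """Return the payload only if its tag matches, otherwise refuse to load it."""
    expected = sign(payload, key)
    # compare_digest compares in constant time to avoid timing side channels.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch, refusing to load database")
    return payload
```

The important part is the order of operations: the data is checked before any of it is handed to the firewall.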
I will blog more on the technical solutions and challenges in a later post.
Problems solved
So, we are now close to releasing version 1.0 of what we have built: an always up-to-date location database that brings you more, and more accurate, data.
We see it as an independent project within the IPFire Project, because we are not the only ones who can greatly benefit from this piece of software: DNS load-balancers that steer users to their closest data center, online shops that need to comply with different legal requirements, and many more...
We have implemented it as a C library with a very small footprint and OpenSSL as its only dependency. We then added Python and Perl modules. That way, it can be easily integrated into other software and of course we expect other people to contribute bindings for other scripting languages, etc.
There are download scripts that regularly update the database and use some smart ways to avoid transferring any unnecessary data.
All this is now in its final stages of testing, and you can use it in the latest testing release of IPFire. If you are interested in contributing by reporting bugs, adding language bindings or helping to make the database more accurate, please join our location mailing list.
More information and a live demo can be found on location.ipfire.org.