Geolocation

The concept of matching an IP to a country is known as IP geolocation, often just “IPGeo” or “GeoIP.” There are lots of reasons for using IP geolocation, ranging from the mundane (identifying countries in your webserver logfiles) to the questionable (banning countries from your server to cut down on spam) to the neat (doing it at firewall/router level and redirecting a user to the closest data center).

Most of the work is just done on a country level. You take an IP (72.36.178.234, my server) and look it up in a database, and get “UNITED STATES” as an answer. There do exist databases on finer levels, down to the city, but they’re expensive and often wrong. (I keep getting ads to find hot singles in Mashpee, more than 100 miles away and in a different state… Or maybe it’s Mattapan. Whatever the case, they’re not even close.)

It turns out that you can download a free database of IP-country mappings. It’s not infallible, but they say it’s 98% accurate. On its own, the download won’t do you much good: it’s a compressed CSV (comma-separated values) file.

In the comments section here, there’s a snippet of PHP code to take the CSV and convert it to a huge series of SQL inserts, which you import into a database. (Hint: for whatever reason, his preg_match is imperfect and leaves a few instances of the word “error” in the middle of the file. It’s probably a bad idea, but I just commented out the “echo error” line.) I end up with a 5.7 MB SQL file. You can also just download the thing directly here (warning: 5.7 MB SQL file). Note that, per the license terms, I disclose in the comments that it’s a derivative work of their CSV file.

The other important catch is that IPs are stored as long integers, not ‘normal’ dotted IPs. You’ll presumably want to use PHP + MySQL to get the country associated with an IP, so I’ll provide pseudocode in a minute. PHP provides an ip2long() function, but it only takes you halfway: its result is a signed integer, so on 32-bit systems many addresses come back negative. (Argh!) It’s an easy fix, though, and you want something like the following:

$long = sprintf("%u", ip2long($ip));
$query = "SELECT a2,a3,country FROM ip2c WHERE start <= $long AND end >= $long";

You then, of course, run $query and read the matching row. You get 2- and 3-letter country codes, as well as the full country name. I use it, with good results, to see what country comment spam is coming from. (Most of it comes from the US.)
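For anyone who wants something closer to runnable, here is a minimal sketch of the same lookup. It assumes the table and column names from the query above (ip2c, start, end, a2, a3, country) and an already-open PDO connection to MySQL. The function names ipToUnsignedLong() and lookupCountry() are made up for this example.

```php
<?php
// Convert a dotted IPv4 address to an unsigned decimal string.
// Returns null if the input is not a valid IPv4 address.
function ipToUnsignedLong(string $ip): ?string
{
    $long = ip2long($ip);
    if ($long === false) {
        return null;
    }
    // %u prints the value as unsigned, which matters on 32-bit PHP builds
    // where ip2long() can return a negative number.
    return sprintf('%u', $long);
}

// Look up the country row for an IP (hypothetical helper).
// Assumes $pdo is connected to the database holding the ip2c table.
function lookupCountry(PDO $pdo, string $ip): ?array
{
    $long = ipToUnsignedLong($ip);
    if ($long === null) {
        return null;
    }
    // Positional placeholders, so the value is never pasted into the SQL.
    $stmt = $pdo->prepare(
        'SELECT a2, a3, country FROM ip2c WHERE start <= ? AND `end` >= ? LIMIT 1'
    );
    $stmt->execute([$long, $long]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}
```

Calling lookupCountry($pdo, '72.36.178.234') should hand back an array with the a2, a3 and country keys, or null when no range matches or the address is malformed.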

A MySQL query isn’t the proper way to do this: there exist binary files with the same data that result in faster lookups. But this is the simplest way to start doing IP geolocation in ten minutes’ time, and, with the query cache enabled, there’s not a ton of overhead.
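To give a feel for why a flat, sorted file can beat a database round trip, here is a rough sketch of a binary search over IP ranges held in memory. This is not the format of any particular binary database, just the general idea. The function name findRange() is hypothetical, and the ranges and two-letter codes below are invented placeholders rather than real allocations.

```php
<?php
// Each row is [start, end, code]. Rows must be sorted by start and must
// not overlap. These numbers and codes are made up for illustration.
$ranges = [
    [100, 199, 'AA'],
    [200, 299, 'BB'],
    [400, 499, 'CC'],
];

// Binary search: halve the candidate list on each step until the range
// containing $ip is found, or nothing is left to check.
function findRange(array $ranges, int $ip): ?string
{
    $lo = 0;
    $hi = count($ranges) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        [$start, $end, $code] = $ranges[$mid];
        if ($ip < $start) {
            $hi = $mid - 1;      // target lies in the lower half
        } elseif ($ip > $end) {
            $lo = $mid + 1;      // target lies in the upper half
        } else {
            return $code;        // $start <= $ip <= $end
        }
    }
    return null;                 // $ip falls in a gap between ranges
}
```

With tens of thousands of netblocks, a search like this needs well under twenty comparisons per lookup, with no SQL parsing or network hop involved.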

I’m tempted to write some scripts to allow people to ‘browse’ the database, either looking up an IP, or to view it by country.

Update: Weird Silence has a binary implementation of this same database that’s supposedly much faster. The main page is here, the PHP one is here, and the C one is (t)here. (I’m wondering if it makes sense to write a PHP script to call the C version, and what the performance implications would be?)

Update 2: Get your country flags here.

5 thoughts on “Geolocation”

  1. http://ttwagner.com/ipgeo/ for your viewing pleasure.

    The US has 9,466 netblocks associated with it.

    The UK tops us with 9,833.

    Note that I’m currently just doing a ton of MySQL queries, letting the query cache do its thing.

    What I don’t currently do (but would like to!) is calculate the *size* of each netblock: some amount to 16 IPs, some amount to tens of thousands.

  2. Ooh, spiffy. I’ve KIND of got it working, but with odd problems…

    I added some functions to the library that all of my blog ‘tools’ use.

    The following code works flawlessly:

    include('/[redacted]/includes/library.php');

    $ip = '72.36.178.234';

    $x = getLocation($ip);
    $country = $x['country'];
    $region = $x['region'];
    $city = $x['city'];
    echo " – $country / $region / $city ";

    However, when I try to insert the *EXACT SAME CODE* (starting with the $x = getLocation($ip) line) into another script, the page loading aborts. The exact same include() is found at the top of the file, and I’ve quadruple-checked that I’m not double-including it somehow.

    The exact error is:
    [Sat Jan 12 14:35:07 2008] [apc-error] Cannot redeclare class geoiprecord

    Given that nothing else on the page calls getLocation(), and that $x isn’t already assigned, can you figure out what would be causing this error? It essentially dies when I call a function in one script, but not in the other one.

  3. Oh! I made the silly mistake of assuming that the line numbers actually corresponded to where the real problem lay.

    The problem ACTUALLY came from trying to call the function repeatedly. My ‘test’ page only did it once, and thus worked fine, but the ‘live’ code called it repeatedly.

    Silly me wrote a function and put the include() of the requisite libraries it wanted INSIDE the function, thus trying to include them every time the function is called — obviously, calling the function more than once would throw errors about not being able to redeclare things.

    I really should have learned by now that the answer almost NEVER lies directly with the referenced line of code. This one was even further removed than I’d imagined, though.
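The include-inside-a-function trap described above can be reproduced and avoided in a few lines. This is only a sketch: the temp file geoip_demo_lib.php, the class GeoIpDemoRecord and the function getLocationDemo() are hypothetical stand-ins, not the real GeoIP library.

```php
<?php
// Write a tiny stand-in "library" file that declares a class. Including a
// file like this twice with plain include() is what triggers the fatal
// "Cannot redeclare class" error.
$libFile = sys_get_temp_dir() . '/geoip_demo_lib.php';
file_put_contents($libFile, '<?php class GeoIpDemoRecord { public $country = "??"; }');

function getLocationDemo(string $libFile): GeoIpDemoRecord
{
    // require_once loads the file on the first call and does nothing on
    // later calls. A plain include() here would die on the second call.
    require_once $libFile;
    return new GeoIpDemoRecord();
}

// Calling the function repeatedly is now safe.
$first = getLocationDemo($libFile);
$second = getLocationDemo($libFile);
```

Moving the include to the top of the library file, outside any function, gets the same result and is arguably tidier.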
