Archive

Posts Tagged ‘Infoviz’

unemployment statistics in the UK

October 14, 2009 2 comments

Visualizing recent trends in benefit claimant counts in the UK.

Unemployment data from the Guardian Data Blog.

Constituency coordinates courtesy of the TheyWorkForYou API.

The three heatmaps show, from left to right:

(1) the percentage change in the number of people claiming benefits (hotspot in the Thames Valley)

(2) the percentage of the workforce out of work and claiming benefits (hotspots in the Midlands, Hull, London, Liverpool, Glasgow)

(3) the gender ratio of claimant percentages. Red = a higher ratio of male to female claimants; blue = a lower ratio.
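As a rough sketch, the three metrics boil down to this per constituency (all the counts below are made up for illustration):

```python
# Hypothetical claimant counts for one constituency
claimants_now, claimants_before = 1200, 1000
workforce = 40000
male_claimants, female_claimants = 700, 500

# (1) percentage change in claimants
pct_change = 100.0 * (claimants_now - claimants_before) / claimants_before
# (2) percentage of the workforce claiming
pct_claiming = 100.0 * claimants_now / workforce
# (3) male-to-female claimant ratio (>1 shades red, <1 shades blue)
gender_ratio = float(male_claimants) / female_claimants
```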


homicide rates

October 13, 2009 Leave a comment



homicide rates

Originally uploaded by stevefaeembra

A visualisation of the homicide rates across the world.

Scatter plots with Basemap and Matplotlib

October 12, 2009 Leave a comment

A while back I used the Flickr API to map 24 hours' worth of geotagged photos.

My previous attempts needed some manual Photoshop work to superimpose the plots on a map. The next logical step is to do the whole process – from start to finish – in code, and remove the manual steps.

To do this, I tried the awesome Basemap toolkit. This library allows all sorts of cartographic projections…

Installing Basemap

Basemap is an extension toolkit for Matplotlib. You can download it here (under matplotlib-toolkits)

I installed the version for Python 2.5 on Windows; this omitted a dependency on httplib2, which I needed to install separately from here.

Getting started

Let’s assume you have 3 arrays – x, y and z. These contain the longitudes, latitudes, and data values at each point. In this case, I binned the geotagged photos into a grid of degree points (360×180), so that each degree square contained the number of photos tagged in that degree square.

Setting up

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import string
import matplotlib.cm as cm

x=[]
y=[]
z=[]

Now you need to populate the x, y and z arrays with values. I'll leave that as an exercise for you 🙂 All three arrays need to be the same length.
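For example, given a list of (lon, lat) pairs, the 1-degree binning described above could be sketched like this (bin_points is a hypothetical helper, not part of Basemap):

```python
from collections import defaultdict

def bin_points(points):
    # Count photos per 1-degree square. Note int() truncates towards
    # zero, which is close enough for a sketch.
    counts = defaultdict(int)
    for lon, lat in points:
        counts[(int(lon), int(lat))] += 1
    x, y, z = [], [], []
    for (lon, lat), n in counts.items():
        x.append(lon)
        y.append(lat)
        z.append(n)
    return x, y, z

x, y, z = bin_points([(0.5, 51.5), (0.7, 51.2), (-3.2, 55.9)])
```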

Now, you need to decide which projection to use. Here, I’ve used the Orthographic projection.

m = Basemap(projection='ortho',lon_0=-50,lat_0=60,resolution='l')

Here is the secret sauce that took me a while to work out. That'll teach me not to R the FM. Calling the map object transforms all the lon/lat coordinates into the chosen projection's coordinates.

x1,y1=m(x,y)

Next, you can decide which features you want to plot – land masses, country boundaries and so on.

m.drawmapboundary(fill_color='black') # fill to edge
m.drawcountries()
m.fillcontinents(color='white',lake_color='black',zorder=0)

Finally, the scatter plot. The marker sizes and colours both come from the binned counts in z:

sizes=[2*v for v in z] # marker area grows with the photo count
cols=z                 # colour each point by its count via the colormap
m.scatter(x1,y1,s=sizes,c=cols,marker="o",cmap=cm.cool,alpha=0.7)
plt.title("Flickr Geotagging Counts with Basemap")
plt.show()

visualising the nationality of Nobel Peace Prize Winners

October 9, 2009 Leave a comment

Visualizing the nationality of Nobel Peace Prize winners over time

map of the flags of the world

September 26, 2009 4 comments

A map of the flags of the world.

Using Flickr API to get the views, faves and comments of your most popular images

September 25, 2009 Leave a comment

One of the first things I wanted to do with the Flickr API was to get some stats on my most popular images.

You can get this info through the web front end, but there’s no option to download the stats in delimited format (such as CSV) so it can be analysed in a spreadsheet.

I wanted to work out if there was a pattern emerging in the key stats for my 200 most popular images…

  1. Number of Views
  2. Number of Favourites
  3. Number of Groups posted to
  4. Number of sets an image is in

Using a Python script (v2.5) and Beej's flickrapi library, this is fairly straightforward. It doesn't require authentication.

The script runs slowly because it 'plays nice', leaving a one-second pause between calls via time.sleep(). I don't want to thrash the server.

# -*- coding: UTF8 -*-

import flickrapi
import datetime
import time
import string

# enter your api key below
api_key = 'PUT_YOUR_API_KEY_HERE' 

# enter the user id below (you can use flickr.people.findByUsername to get this for any user)
# it'll look something like 99999999@N99
userid='USER_ID_TO_SEARCH'

# delimiter. Use comma if you want, I tend to use ~
DELIMITER="~"

# dump number of views in delimited format

if __name__ == '__main__':
    #output format : "photoid,title,views,faves,groups,sets"
    flickr = flickrapi.FlickrAPI(api_key)
    photos = flickr.photos_search(user_id=userid, extras='views', per_page='200', page='1', sort='interestingness-desc')
    for photo in photos.find('photos'):

        title = string.replace(photo.get('title'),",","") # strip commas, in case you want to use a comma as the delimiter ;0)
        
        # number of views
        id = photo.get('id')
        views=photo.get('views')
        
        # fave count (up to 50)
        faves = flickr.photos_getFavorites(photo_id=id,per_page=50)
        countfaves=faves.find('photo').get('total')
        time.sleep(1)
        
        # pools and sets posted to
        contexts=flickr.photos_getAllContexts(photo_id=id)
        posted_groups=len(contexts.findall('.//pool'))
        posted_sets=len(contexts.findall('.//set'))
        time.sleep(1)
        
        # comments
        comments=flickr.photos_comments_getList(photo_id=id)
        countcomments=len(comments.findall('.//comment'))
        
        # output as delimited text
        tokens=(id,title,str(views),str(countcomments),str(countfaves),str(posted_groups),str(posted_sets))
        converted=DELIMITER.join(tokens)
        
        print converted

This script dumps to the console rather than a file, but it's easily modified to write to a file. It should also work with a comma as the separator (for CSV use), since the title is stripped of commas…
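For instance, a minimal change would be to collect each delimited line and write it out instead of printing (the filename here is made up, and rows stands in for the lines the script builds in its loop):

```python
import os
import tempfile

# Hypothetical stand-in for the delimited rows the script builds
rows = ["2113237108~north-berwick-old-pier~259~25~21~6~3"]

path = os.path.join(tempfile.gettempdir(), "flickr_stats.txt")
out = open(path, "w")
for row in rows:
    out.write(row + "\n")
out.close()
```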

Here’s some sample output from my photostream…

The format is:
photo id~title~views~comments~favourites~groups~sets


2113237108~north-berwick-old-pier~259~25~21~6~3
3598511429~paris photo heatmap~736~6~12~5~3
3609118442~heart texture~861~10~26~0~2
2304836447~persistence de-motivator~4964~1~4~1~1
2347673075~bergen-ole-bull-plass-lensbaby~184~12~7~6~4
1621047086~banners-down-princes-street~325~11~8~4~2
3688253826~St Anthony's Chapel Edinburgh~124~15~15~9~1
2717978614~st marys~88~9~7~3~2

Once you have the output saved to a text file, you can import it into a spreadsheet (like OpenOffice or Excel) and play around with the figures 🙂
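If you'd rather stay in Python, the ~-delimited output also parses easily with the csv module's delimiter option (the sample line below is taken from the output above):

```python
import csv
import io

sample = "2113237108~north-berwick-old-pier~259~25~21~6~3\n"
reader = csv.reader(io.StringIO(sample), delimiter="~")
rows = list(reader)
views = int(rows[0][2])  # third field is the view count
```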

Mapping 24 hours of Flickr geotagging in Python

September 6, 2009 Leave a comment

The aim of this project was to find out where in the world people were geotagging their photos on Flickr, using the Flickr API.


The approach taken was to poll one of the Flickr 'pandas', 'Wang Wang'. This is a service which keeps track of geotagged photos as they come in.

The following Python script runs in the background, polls the service once a minute, and appends the locations of newly tagged photos to a CSV file. It asks for up to 100 photos from the previous minute; in practice, up to 120 can come back in a single minute! The average was around 80/minute when I last ran this.

The Flickr API is accessed using Beej's flickrapi library.

# -*- coding: UTF8 -*-
'''
Created on 28 Apr 2009
Ask WangWang for recently geotagged photos
@author: Steven
'''

import flickrapi
from time import time as t
import time

api_key = 'YOUR_API_KEY_HERE'
flickr = flickrapi.FlickrAPI(api_key)

if __name__ == '__main__':
    ct=0
    lastct=0
    print "Timestamp  Total   This"
    while True:
        tstamp=int(t())-60
        wangwang = flickr.panda_getPhotos(panda_name='wang wang', interval=60000, per_page=100, last_update=tstamp, extras='geo')
        fo=open("c:\\wangwang24hours.csv","a")
        for x in wangwang.find('photos'):
            s= "%d,%s,%s,%s\n" % (tstamp, x.get('longitude'),x.get('latitude'),x.get('id'))
            ct=ct+1
            fo.write(s)
        fo.close()   # close before sleeping, so each minute's rows are flushed
        print "%10s %07d %04d" % (tstamp,ct,ct-lastct)
        lastct=ct
        time.sleep(60)

Once we have the data, it’s time to visualise it. A heatmap seemed a good choice; the chart uses the Matplotlib ‘hexbin’ style. This takes two arrays of the same size (here, the longitudes are in X and the latitudes in Y) and maps the values onto a hexagonal grid (here, of size 180×180), counting the number of photos which fall into each hexagonal bin.

Each bin is coloured according to the number of points that fall into it: red bins contain the most, green fewer, and blue the fewest.

The following script takes the output from the previous script, and plots it.

import matplotlib.cm as cm
import matplotlib.pyplot as plt

X=[]
Y=[]
fi=open(r"c:\wangwang24hours.csv")
for line in fi:
    ignore,x,y,ignore2=line.strip().split(",")
    if x!='None' and y!='None':
        X.append(float(x))
        Y.append(float(y))
fi.close()
plt.hexbin(X,Y,gridsize=180,bins='log',cmap=cm.jet,linewidths=0,edgecolors=None)
plt.show()