Archive for the ‘infovis, infoviz’ Category

Using R and ggplot2 to create a timeline visualization of UK government politics

creating an abstract timeline chart with R and ggplot

This was an attempt to visualise the ebb and flow of various political parties in the UK, over time.

The data was made available on the Guardian DataBlog, so I thought I’d have a go at producing a timeline image, using vertical strips of colour to show which party was in power in each year.

I’m not an expert in R, so there may well be a better way of doing this!

The steps taken were:

  • download the data in OpenOffice format from the Google Spreadsheet linked on the Guardian DataBlog.
  • remove the footer rows (copyright notices at the foot of the spreadsheet).
  • tidy up the headers in row 1, simplifying them to single words like “Party”.
  • save as a CSV, using the pipe (|) character as the delimiter. I put this in c:\infoviz\data.csv
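A quick way to sanity-check the delimiter choice before touching the real file is to read a few lines of text in the same layout. This is only a sketch: the three rows are made up for illustration, and only the Year and Party headers are taken from the code used later in this post; the name inp_sample is hypothetical.

```r
# Illustrative sample in the same pipe-delimited layout.
# The rows are made up; the real spreadsheet has more columns.
sample_csv <- "Year|Party
1994|Conservative
1995|Conservative
1998|Labour"

# text= lets read.table parse a string instead of a file on disk
inp_sample <- read.table(text = sample_csv, header = TRUE, sep = "|",
                         stringsAsFactors = FALSE)

# Inspect the column names and types that came out of the parse
str(inp_sample)
```

If the columns come out with the expected names and types here, the same header and sep arguments should behave the same way on the saved file.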

First of all, we need to load the CSV file into a data frame:

inp <- read.table("c:\\infoviz\\data.csv",header=T, sep="|")

This works fine, but the column ‘Party’ contains names like “Conservative”, “Labour” and “Liberal”.

How to convert this into something a bit more abstract, like a colour? The answer was to define a palette of colours…

palette <- c("blue", "red", "orange1", "rosybrown", "orange3", "red4", "blue4", "grey") 

…and use a function to map a party name to a colour in that vector…

getcol <- function(party) {
  # Return the palette colour for a single party name
  if (party == "Conservative" || party == "Tory") {
    r <- 1
  } else if (party == "Labour") {
    r <- 2
  } else if (party == "Liberal") {
    r <- 3
  } else if (party == "Whig") {
    r <- 4
  } else if (party == "National Labour National Government") {
    r <- 6
  } else if (party == "Conservative National Government") {
    r <- 7
  } else {
    r <- 8  # anything else is drawn in grey
  }
  return(palette[r])
}
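As an alternative sketch (not the approach used in this post), the same mapping can be written as a lookup in a named vector, which handles a whole column in one call instead of one value at a time. The names party_colours and party_to_colour are made up for this illustration; the party/colour pairs are the ones from getcol above.

```r
# Named lookup vector: party name -> colour (same pairs as getcol)
party_colours <- c(
  "Conservative" = "blue",
  "Tory" = "blue",
  "Labour" = "red",
  "Liberal" = "orange1",
  "Whig" = "rosybrown",
  "National Labour National Government" = "red4",
  "Conservative National Government" = "blue4"
)

# Hypothetical helper: vectorised lookup with a fallback colour
party_to_colour <- function(party, default = "grey") {
  # as.character() guards against the column being a factor
  cols <- unname(party_colours[as.character(party)])
  # Names missing from the lookup come back as NA; give them the default
  cols[is.na(cols)] <- default
  cols
}

party_to_colour(c("Tory", "Labour", "Peelite"))
```

With this version, adding a party is a one-line change to the vector rather than another else-if branch.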

Now we can use the sapply function to map each value in the “Party” column of the data frame to a colour:

inp$PartyCol <- sapply(inp$Party,getcol)

Now, the data frame has a column called ‘PartyCol’ which maps “Conservative” to “blue”, “Labour” to “red” and so on.

So now, ggplot…

qplot(factor(Year), main="history", data=inp, geom="bar",
      fill=factor(PartyCol), color=factor(PartyCol)) +
  scale_fill_identity() +
  scale_colour_identity()

This shows a whole lot of labels and grid lines that I didn’t want…

…so I added these options to give a clear, blank theme…

+ opts(panel.background = theme_rect(size = 1, colour = "lightgray"),
       panel.grid.major = theme_blank(),
       panel.grid.minor = theme_blank(),
       axis.line = theme_blank(),
       axis.text.x = theme_blank(),
       axis.text.y = theme_blank(),
       axis.title.x = theme_blank(),
       axis.title.y = theme_blank(),
       axis.ticks = theme_blank(),
       strip.background = theme_blank(),
       strip.text.y = theme_blank())
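The opts(), theme_rect() and theme_blank() calls above belong to the ggplot2 API of the time; later releases replaced them with theme(), element_rect() and element_blank(), and qplot() has since been deprecated. The following is a rough sketch of how the same strip chart might look on a recent ggplot2. It rests on assumptions: the rows are made up, the names strips, blank_theme and p are hypothetical, and the border width from the original (size = 1) is left at its default.

```r
library(ggplot2)

# Made-up rows standing in for the real data frame
strips <- data.frame(
  Year = c(1994, 1995, 1996, 1998, 1999, 2000),
  PartyCol = c("blue", "blue", "blue", "red", "red", "red"),
  stringsAsFactors = FALSE
)

# Same settings as the opts() call, in the newer theme() vocabulary
blank_theme <- theme(
  panel.background = element_rect(fill = NA, colour = "lightgray"),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  axis.line = element_blank(),
  axis.text.x = element_blank(),
  axis.text.y = element_blank(),
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  axis.ticks = element_blank(),
  strip.background = element_blank(),
  strip.text.y = element_blank()
)

# One bar per year; the identity scales use the colour names as they are
p <- ggplot(strips, aes(x = factor(Year), fill = PartyCol, colour = PartyCol)) +
  geom_bar() +
  scale_fill_identity() +
  scale_colour_identity() +
  ggtitle("history") +
  blank_theme
```

Typing p (or print(p)) at the console draws the chart. Keeping the theme in its own object makes it easy to reuse on other plots.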

Here’s the whole source, if you’re brave enough to want to try this yourself!

library(ggplot2)
palette <- c("blue", "red", "orange1", "rosybrown", "orange3", "red4", "blue4", "grey")
getcol <- function(party) {
  # Return the palette colour for a single party name
  if (party == "Conservative" || party == "Tory") {
    r <- 1
  } else if (party == "Labour") {
    r <- 2
  } else if (party == "Liberal") {
    r <- 3
  } else if (party == "Whig") {
    r <- 4
  } else if (party == "National Labour National Government") {
    r <- 6
  } else if (party == "Conservative National Government") {
    r <- 7
  } else {
    r <- 8  # anything else is drawn in grey
  }
  return(palette[r])
}
inp <- read.table("c:\\infoviz\\data.csv",header=T, sep="|")
inp$PartyCol <- sapply(inp$Party,getcol)
qplot(factor(Year), main="history", data=inp, geom="bar",
      fill=factor(PartyCol), color=factor(PartyCol)) +
  scale_fill_identity() +
  scale_colour_identity() +
  opts(panel.background = theme_rect(size = 1, colour = "lightgray"),
       panel.grid.major = theme_blank(),
       panel.grid.minor = theme_blank(),
       axis.line = theme_blank(),
       axis.text.x = theme_blank(),
       axis.text.y = theme_blank(),
       axis.title.x = theme_blank(),
       axis.title.y = theme_blank(),
       axis.ticks = theme_blank(),
       strip.background = theme_blank(),
       strip.text.y = theme_blank())

Plotting postcode density heatmaps in R

April 21, 2010

Postcode density heatmap for Edinburgh

Here in the UK, postcode geodata was recently released as part of the OS OpenData initiative. Although not the full Postcode Address File (PAF), it’s enough to be useful for visualisation purposes!

I thought it’d be interesting to plot the density of postcodes in a heatmap, as a way of getting a visualisation of population density.

I downloaded a copy from MySociety. The raw data uses the Ordnance Survey coordinate system (eastings and northings), but MySociety also provide a version with coordinates converted to the WGS84 datum (latitude and longitude). It gets better: there is one file per postcode area (so all the EH postcodes go in one file, for example).

So how to plot this?

I’ve been using R with the ggplot2 library for visualisations; it’s an amazing toolset, and you’d be surprised how few lines of code it takes to get results.

First of all, download the data.

Now, extract the data for the EH postcode area.

The first step is to load the extracted CSV file into a data frame. There are no column headers (header=F) and the file is comma separated (sep=",").

inp <- read.table("c:\\infoviz\\ehpostcode.csv",header=F, sep=",")

Next, a quick hack to remove certain postcodes which seem way out of the region of Edinburgh. I’ve yet to work out what these stray postcodes are.

inp <- subset(inp,inp$V10 != "")
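To see what that filter does, here is a small sketch on a made-up data frame. The column names follow the V3, V4 and V10 convention that read.table assigns when there is no header, but the values are invented and say nothing about what the real tenth column holds; toy and kept are hypothetical names.

```r
# Made-up rows: V3/V4 stand in for the coordinates,
# V10 for the column tested by the filter
toy <- data.frame(
  V3 = c(-3.19, -3.20, -1.50),
  V4 = c(55.95, 55.94, 53.80),
  V10 = c("a", "b", ""),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame,
# so the column can be named directly without the inp$ prefix
kept <- subset(toy, V10 != "")

# How many rows the filter dropped
nrow(toy) - nrow(kept)
```

Note that subset() also drops rows where the condition evaluates to NA, which may or may not be wanted on the real file.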

Now we bring in the ggplot2 library:

library(ggplot2)

Now the plot itself.

m <- qplot(xlab="Longitude",ylab="Latitude",main="EH Postcode heatmap",geom="blank",x=inp$V3,y=inp$V4,data=inp)  + stat_bin2d(bins =200,aes(fill = log1p(..count..))) 

The stat_bin2d does the binned heatmap plot. It splits the area into a grid of 200×200 squares (‘bins’) and counts the number of postcodes in each grid square. The fill colour is then taken from log1p of each count, that is log(1 + count), which gives a logarithmic scale (fill = log1p(..count..)).
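The binning idea can be sketched by hand in base R, which may help to show what the heatmap is counting. This is an illustration under assumptions, not what stat_bin2d does internally: the five points are made up, a 4×4 grid stands in for 200×200, and the bin edges chosen by cut() need not coincide with the ones ggplot2 picks.

```r
# Made-up points; three of them fall in the same grid square
lon <- c(-3.20, -3.19, -3.19, -3.11, -3.00)
lat <- c(55.95, 55.95, 55.95, 55.90, 55.99)

bins <- 4  # the plot in this post uses 200

# cut() with a single number splits the data range into equal-width intervals
x_bin <- cut(lon, breaks = bins)
y_bin <- cut(lat, breaks = bins)

# Cross-tabulate to get the number of points in each grid square
counts <- table(x_bin, y_bin)

# log1p(n) is log(1 + n): empty squares stay at 0,
# and crowded squares are compressed
fill <- log1p(counts)

max(counts)
max(fill)
```

The fill values play the role of the colour scale in the heatmap: a square with three postcodes is not three times as bright as a square with one.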

Finally, we plot this puppy.

m 

That’s it! Five lines of code.

Here’s the whole chunk:

inp <- read.table("c:\\infoviz\\ehpostcode.csv",header=F, sep=",")
inp <- subset(inp,inp$V10 != "")
library(ggplot2)
m <- qplot(xlab="Longitude",ylab="Latitude",main="EH Postcode heatmap",geom="blank",x=inp$V3,y=inp$V4,data=inp)  + stat_bin2d(bins =200,aes(fill = log1p(..count..))) 
m 

If you set the boundaries of the x and y axes, you can then use the resulting image as an overlay in Google Earth.
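Fixing the axis boundaries might look something like the sketch below. It assumes ggplot2 3.3.0 or later (for after_stat(), the newer spelling of ..count..), uses stat_bin_2d (the newer name for stat_bin2d), and uses made-up points and a made-up bounding box; pts, west, east, south, north and overlay are hypothetical names.

```r
library(ggplot2)

# Made-up points and bounding box, for illustration only
pts <- data.frame(
  lon = c(-3.30, -3.20, -3.19, -3.10),
  lat = c(55.90, 55.95, 55.95, 55.98)
)
west <- -3.4
east <- -3.0
south <- 55.85
north <- 56.05

overlay <- ggplot(pts, aes(x = lon, y = lat)) +
  stat_bin_2d(bins = 50, aes(fill = log1p(after_stat(count)))) +
  # Pin the axes to the box and switch off the default padding
  scale_x_continuous(limits = c(west, east), expand = c(0, 0)) +
  scale_y_continuous(limits = c(south, north), expand = c(0, 0)) +
  # Remove axes and legend so only the heatmap itself remains
  theme_void() +
  theme(legend.position = "none")
```

The four limits are then the north, south, east and west values that an image overlay in Google Earth asks for. Any margin left around the saved image would still need trimming for the overlay to line up.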

I found that the heatmap is a reasonably good proxy for population density, but with exceptions. There are hotspots that turn out to be postal sorting offices, where PO boxes and poste restante mail are handled.