Using R and ggplot2 to create a timeline visualization of UK government politics

creating an abstract timeline chart with R and ggplot

This was an attempt to visualise the ebb and flow of various political parties in the UK, over time.

The data was made available on the Guardian DataBlog, so I thought I’d have a go at producing a timeline image, using vertical strips of colour to show which party was in power in each year.

I’m not an expert in R, so there may well be a better way of doing this!

The steps taken were:-

  • download the OpenOffice spreadsheet from the Google Spreadsheet linked to on the Guardian DataBlog.
  • remove footer rows (copyright notices at the foot of the spreadsheet).
  • tidy up headers in row 1 – simplify these to single words like “Party”
  • save as a CSV, using the pipe (|) character as a delimiter. I put this in c:\infoviz\data.csv

First of all, we need to load the CSV file into a data frame:

inp <- read.table("c:\\infoviz\\data.csv", header=TRUE, sep="|")

This works fine, but the column ‘Party’ contains names like “Conservative”, “Labour” and “Liberal”.

How to convert this into something a bit more abstract, like a colour? The answer was to define a palette of colours…

palette <- c("blue", "red", "orange1", "rosybrown", "orange3", "red4", "blue4", "grey") 

…and use a function to map a party name to one of the colours in that vector…

getcol <- function(party) {
    # map a party name to an index into 'palette'; anything unrecognised becomes grey
    if (party == "Conservative" || party == "Tory")
        r <- 1
    else if (party == "Labour")
        r <- 2
    else if (party == "Liberal")
        r <- 3
    else if (party == "Whig")
        r <- 4
    else if (party == "National Labour National Government")
        r <- 6
    else if (party == "Conservative National Government")
        r <- 7
    else
        r <- 8
    return(palette[r])
}

Now we can use the sapply function to map each value in the ‘Party’ column of the data frame to a colour:

inp$PartyCol <- sapply(inp$Party,getcol)

Now, the data frame has a column called ‘PartyCol’ which maps “Conservative” to “blue”, “Labour” to “red” and so on.

So now, ggplot…

qplot(factor(Year), main="history", data=inp, geom="bar", fill=PartyCol, colour=PartyCol) + scale_fill_identity() + scale_colour_identity()

This shows a whole lot of labels and grids that I didn’t want…

…so I added these options to give a clear, blank theme…

+ theme(panel.background = element_rect(fill = NA, linewidth = 1, colour = "lightgray"),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.line = element_blank(),axis.text.x = element_blank(),axis.text.y = element_blank(),axis.title.x = element_blank(),axis.title.y = element_blank(), axis.ticks = element_blank(),strip.background = element_blank(),strip.text.y = element_blank())

Here’s the whole source, if you’re brave enough to want to try this yourself!

library(ggplot2)
palette <- c("blue", "red", "orange1", "rosybrown", "orange3", "red4", "blue4", "grey")
getcol <- function(party) {
    # map a party name to an index into 'palette'; anything unrecognised becomes grey
    if (party == "Conservative" || party == "Tory")
        r <- 1
    else if (party == "Labour")
        r <- 2
    else if (party == "Liberal")
        r <- 3
    else if (party == "Whig")
        r <- 4
    else if (party == "National Labour National Government")
        r <- 6
    else if (party == "Conservative National Government")
        r <- 7
    else
        r <- 8
    return(palette[r])
}
inp <- read.table("c:\\infoviz\\data.csv", header=TRUE, sep="|")
inp$PartyCol <- sapply(inp$Party, getcol)
qplot(factor(Year), main="history", data=inp, geom="bar", fill=PartyCol, colour=PartyCol) + scale_fill_identity() + scale_colour_identity() + theme(panel.background = element_rect(fill = NA, linewidth = 1, colour = "lightgray"),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.line = element_blank(),axis.text.x = element_blank(),axis.text.y = element_blank(),axis.title.x = element_blank(),axis.title.y = element_blank(), axis.ticks = element_blank(),strip.background = element_blank(),strip.text.y = element_blank())

Scatter plots with Basemap and Matplotlib

October 12, 2009

A while back I used the Flickr API to map 24 hours’ worth of geotagged photos.

My previous attempts needed some manual Photoshop work to superimpose the plots on a map. The next logical step is to do the whole process – from start to finish – in code, and remove the manual steps.

To do this, I tried the awesome Basemap toolkit. This library allows all sorts of cartographic projections…

Installing Basemap

Basemap is an extension for Matplotlib. You can download it here (under matplotlib-toolkits)

I installed the version for Python 2.5 on Windows; this missed out a dependency on httplib2, which I needed to install separately from here.

Getting started

Let’s assume you have 3 arrays – x, y and z. These contain the longitudes, latitudes, and data values at each point. In this case, I binned the geotagged photos into a grid of degree points (360×180), so that each degree square contained the number of photos tagged in that degree square.

Setting up

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cm as cm

# longitudes, latitudes and data values, one entry per point
x=[]
y=[]
z=[]

Now, you need to populate the x, y and z arrays with values. I’ll leave that as an exercise for you 🙂 All three arrays need to be the same length.
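If you want a starting point for that exercise, here is a rough sketch of the degree-square binning described earlier. It is not the code I used for the Flickr data: the function name bin_by_degree and the sample photos list are made up for illustration, and it assumes each photo is simply a (longitude, latitude) pair.

```python
import math
from collections import Counter

def bin_by_degree(points):
    """Count (lon, lat) points per one-degree square.

    Returns three parallel lists: x (longitudes), y (latitudes) and
    z (number of points in each occupied square).
    """
    counts = Counter()
    for lon, lat in points:
        # floor() assigns each point to the square whose lower-left
        # corner is at whole degrees, e.g. -0.13 goes into square -1
        counts[(math.floor(lon), math.floor(lat))] += 1

    x, y, z = [], [], []
    for (lon, lat), n in sorted(counts.items()):
        x.append(lon)
        y.append(lat)
        z.append(n)
    return x, y, z

# hypothetical sample data: two photos in one square, one in another
photos = [(-0.13, 51.51), (-0.98, 51.02), (2.35, 48.86)]
x, y, z = bin_by_degree(photos)
```

Only occupied squares end up in the lists, so the three arrays stay the same length and there is nothing to plot for empty parts of the grid.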

Now, you need to decide which projection to use. Here, I’ve used the Orthographic projection.

m = Basemap(projection='ortho',lon_0=-50,lat_0=60,resolution='l')

Here is the secret sauce I took a while to work out. That’ll teach me not to R the FM. Calling the Basemap object with the longitudes and latitudes (in that order) transforms them into x, y coordinates in the chosen projection.

x1,y1=m(x,y)
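To give a feel for what that call is doing, here is a sketch of the textbook orthographic projection formulas on a unit sphere. This is not Basemap’s implementation (Basemap works in metres and shifts the origin of the map), and the function name ortho_project and its radius parameter are my own inventions for illustration.

```python
import math

def ortho_project(lon, lat, lon_0=-50.0, lat_0=60.0, radius=1.0):
    """Project one lon/lat point (degrees) with an orthographic projection.

    Returns (x, y, visible): x and y are measured from the centre of the
    globe as drawn, and visible is False for points on the far side.
    """
    lam, phi = math.radians(lon), math.radians(lat)
    lam0, phi0 = math.radians(lon_0), math.radians(lat_0)

    x = radius * math.cos(phi) * math.sin(lam - lam0)
    y = radius * (math.cos(phi0) * math.sin(phi)
                  - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))

    # cosine of the angular distance from the projection centre;
    # a negative value means the point is hidden behind the globe
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y, cos_c >= 0

centre = ortho_project(-50, 60)      # the projection centre itself
far_side = ortho_project(130, -60)   # the point opposite the centre
```

The projection centre lands at the middle of the picture, and anything more than 90 degrees away from it is hidden, which is why only half the world shows up in an 'ortho' map.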

Next, you can decide which features you want to draw – land masses, country boundaries etc.

m.drawmapboundary(fill_color='black') # fill to edge
m.drawcountries()
m.fillcontinents(color='white',lake_color='black',zorder=0)

Finally, the scatter plot.

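# sizes and cols are not defined anywhere above; as an assumption, both are taken from the counts in z
sizes=z
cols=z
m.scatter(x1,y1,s=sizes,c=cols,marker="o",cmap=cm.cool,alpha=0.7)
plt.title("Flickr Geotagging Counts with Basemap")
plt.show()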

Image Steganography with PIL

October 7, 2009

Steganography comes from the Greek for ‘hidden writing‘; it is the act of hiding a message inside another message.

In this case, we hide an image inside another image, without it being obvious to the viewer. The example I’ll give here is only a ‘toy’ implementation, for two reasons:-

  • easily cracked: it wouldn’t take the authorities long to spot the hidden message, not least because the algorithm is described on Wikipedia 😉
  • fragile: the hidden image-within-an-image can easily be broken, if the image has its colours changed afterwards.

But it does illustrate how to do bitwise-manipulation of images in PIL using the ImageMath module, which is the purpose of the post.

How it works

The watermark – the image we wish to hide – is a bitonal image, with black and white pixels only. It’s then resized to be the same size as the original image.

We ‘smuggle’ the watermark inside the original by replacing the LSB (least significant bit) of each colour channel (R,G and B) in the original with the corresponding pixel in the watermark – either 1 for white, or 0 for black.

This image shows the binary arithmetic…least significant bit on the right.

watermarking
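The same arithmetic can be tried on plain integers before touching any images. This is just a sketch: the helper names embed_bit and extract_bit are made up for illustration, and the pixel values are arbitrary.

```python
def embed_bit(channel_value, watermark_value):
    """Replace the least significant bit of an 8-bit channel value
    with the least significant bit of the watermark pixel value."""
    return (channel_value & 0xFE) | (watermark_value & 0x01)

def extract_bit(channel_value):
    """Recover the hidden pixel: 255 (white) if the LSB is set, else 0 (black)."""
    return (channel_value & 0x01) * 255

# arbitrary channel values paired with white (255) or black (0) watermark pixels
for original, mark in [(0b10110111, 255), (0b10110111, 0), (200, 255)]:
    hidden = embed_bit(original, mark)
    print(f"{original:08b} with watermark {mark:3d} -> {hidden:08b}, "
          f"extracted {extract_bit(hidden)}")
```

Each channel value moves by at most 1 out of 255, which is why the change is so hard to see.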

Hiding the watermark image inside our image

For this, we’ll need these imports…

from PIL import Image, ImageMath

and open the two files. The watermark is scaled to match the size of the original image.

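# convert() guarantees three channels for split(), even if a file is bitonal or greyscale
watermark=Image.open(r"c:\watermark.png").convert("RGB")
original=Image.open(r"c:\original.jpg").convert("RGB")
# NEAREST keeps the resized watermark purely black and white
watermark=watermark.resize(original.size, Image.NEAREST)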

ImageMath only works with single channel (greyscale) images, so we need to split the two images into their three channels (Red, Green and Blue) using the split() method.

red, green, blue = original.split()
wred, wgreen, wblue = watermark.split()

Now, using ImageMath. ImageMath lets you write simple expressions using values from one or more images. Here, ‘a’ and ‘b’ are bound to the values in the original and watermark images, respectively. ImageMath does its integer arithmetic in 32-bit mode (‘I’), so the convert() call casts the result back to an 8-bit greyscale image (mode ‘L’), which is what Image.merge() needs later.

red2 = ImageMath.eval("convert(a&0xFE|b&0x1,'L')", a=red, b=wred)
green2 = ImageMath.eval("convert(a&0xFE|b&0x1,'L')", a=green, b=wgreen)
blue2 = ImageMath.eval("convert(a&0xFE|b&0x1,'L')", a=blue, b=wblue)

Okay, so now we have three channels whose LSBs have been replaced with the LSB of the watermark.

But we need to combine the 3 channels back to get an RGB image ready for saving.

out = Image.merge("RGB", (red2, green2, blue2))
out.save(r"c:\merged.png")

Open the original and the processed images; can you see any difference?

Extracting the hidden image

All this is for nought if you can’t extract the hidden image afterwards.

This is simpler, as we only need to produce a black/white image from the LSB of the image. Here, I’ve only bothered with the Red channel.

stegged=Image.open(r"c:\merged.png")
red, green, blue = stegged.split()
watermark=ImageMath.eval("(a&0x1)*255",a=red) # convert to 0 or 255
watermark=watermark.convert("L")
watermark.save(r"c:\extracted-watermark.png")
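As a footnote to the ‘fragile’ caveat at the top, here is a small sketch of how easily the hidden bits get scrambled. It works on a plain list of numbers rather than a real image, and the helper names (embed, extract, brighten) and the 10% brightening factor are made up for illustration.

```python
def embed(values, bits):
    """Hide one bit in the LSB of each 8-bit value."""
    return [(v & 0xFE) | (b & 0x01) for v, b in zip(values, bits)]

def extract(values):
    """Read the LSB back out of each value."""
    return [v & 0x01 for v in values]

def brighten(values, factor=1.1):
    """Crude brightness change: scale each value and clamp to 255."""
    return [min(255, round(v * factor)) for v in values]

channel = list(range(50, 60))   # ten made-up pixel values from one channel
bits = [1, 0] * 5               # ten made-up watermark bits

stegged = embed(channel, bits)
recovered = extract(stegged)             # untouched image: bits survive
damaged = extract(brighten(stegged))     # after a colour change: bits do not

errors = sum(1 for a, b in zip(damaged, bits) if a != b)
print(f"{errors} of {len(bits)} hidden bits changed after brightening")
```

Anything that rewrites the pixel values, such as a brightness tweak or saving in a lossy format, disturbs the least significant bits in much the same way.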