
Posts Tagged ‘Matplotlib’

Parsing USGS BIL Digital Elevation Models in Python

A false colour elevation map of the Hawaiian island of Kahoolawe

The USGS have a digital elevation model covering most of the US, Hawaii and Puerto Rico. If you use the National Map Seamless Server it’s possible to select a region and download a digital elevation model in a number of formats. One of these, BIL, is a simple binary raster file, which can be read quite easily in Python; I’ll base this on code from a previous post.

Getting the data

  • Go to National Map Seamless Server
  • The map is fairly intuitive, but if you get stuck check out their tutorial
  • When you use the download rectangle tool, make sure to click on “Modify data request” and choose the BIL data format (BIL_INT16) rather than the default “Arc_GIS”
  • Choose zip or TGZ
  • You’ll then download a compressed archive – may take a while
  • The archive contains lots of files; look at the contents of output_parameters.txt. This tells you the dimensions of the file. Make a note of the sizes.
  • Extract the .bil file – this is what has the raw data.

Extracting the binary elevation data from the BIL file

# USGS Seamless Server BIL Format Parser
#

import struct

# get these from the output_parameters.txt file
# included in the download
height=437
width=663

# where you put the extracted BIL file
fi=open(r"my-extracted.bil","rb")
contents=fi.read()
fi.close()

# unpack binary data into a flat tuple z
# (use "h" instead of "H" if your download is the signed BIL_INT16 format)
s="<%dH" % (int(width*height),)
z=struct.unpack(s, contents)

After running this, z is a tuple of width*height elements, with the data stored in raster order (entries in row 1 from left-to-right, then row 2 from left-to-right, and so on).
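To make the raster-order indexing concrete: the cell at row r, column c lives at index r*width + c. A tiny sketch with a 2×3 grid standing in for real elevation data (`cell` is an illustrative helper, not part of the post's code):

```python
# Raster order is row-major: cell (r, c) sits at index r*width + c.
width, height = 3, 2
z = (10, 11, 12, 20, 21, 22)  # two rows of three samples

def cell(z, r, c, width):
    """Return the sample at row r, column c of a raster-order tuple."""
    return z[r * width + c]

print(cell(z, 0, 2, width))  # last sample of the first row -> 12
print(cell(z, 1, 0, width))  # first sample of the second row -> 20
```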

The struct module for reading binary data

struct is a fantastic Python module; it makes reading binary files pretty easy. Say we have a raster-layout binary file of 50 rows by 100 values per row, so we’d expect 5000 values. Each value is an unsigned short (H), stored little-endian (<).

In this case, we build up a format specification like this: "<5000H"
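To make that concrete, here’s a tiny self-contained example that packs and unpacks a few values using the same format conventions (the payload here is synthetic, not real elevation data):

```python
import struct

# "<" = little-endian, "H" = unsigned 16-bit; "<3H" reads three values.
raw = struct.pack("<3H", 5, 500, 50000)   # fake 6-byte payload
values = struct.unpack("<3H", raw)
print(values)  # (5, 500, 50000)

# The same idea scales up to a whole raster:
rows, cols = 50, 100
fmt = "<%dH" % (rows * cols)              # "<5000H"
print(struct.calcsize(fmt))               # 10000 bytes expected on disk
```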

Plotting the end result

And optionally, if you have matplotlib and numpy libraries, you can get a quick preview of the resulting array.

from pylab import *
import numpy as np
import matplotlib.cm as cm

heights = np.zeros((height,width))
for r in range(0,height):
    for c in range(0,width):
        elevation=z[(width*r)+c]
        if (elevation==65535 or elevation<0 or elevation>20000):
            # may not be needed depending on format, and the "magic number"
            # value used for 'void' or missing data
            elevation=0.0
        heights[r][c]=float(elevation)
imshow(heights, interpolation='bilinear',cmap=cm.prism,alpha=1.0)
grid(False)
show()
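As an aside: since the plotting step pulls in NumPy anyway, the struct unpacking and the nested loop can be replaced by a single NumPy read. A minimal sketch on a tiny synthetic payload, assuming the same little-endian unsigned 16-bit layout; a real file would be read the same way with np.fromfile("my-extracted.bil", dtype="<u2"):

```python
import struct
import numpy as np

# A tiny synthetic BIL payload: 2 rows x 3 cols, little-endian uint16,
# with one 65535 'void' cell.
height, width = 2, 3
raw = struct.pack("<6H", 10, 11, 12, 20, 65535, 22)

# "<u2" = little-endian unsigned 16-bit, matching the "<...H" struct format.
heights = np.frombuffer(raw, dtype="<u2").reshape(height, width).astype(float)
heights[heights == 65535] = 0.0  # blank out the 'void' magic number
print(heights)
```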

Creating a pinboard map of geotagged photos in a flickr pool

January 14, 2010

In this post I’ll show how to produce a simple pinboard map of geotagged photos in a flickr group pool, using Python and Basemap/Matplotlib. You’ll need Python with the flickrapi library, plus Basemap and Matplotlib installed.

There are two short scripts here:-

  • A script to find the longitude and latitude of geotagged photos in the group pool
  • A script to generate the plot

The first script produces a CSV file; the second uses this CSV file to produce the plot.

Here’s the script to produce the CSV file with photo locations:-

# -*- coding: UTF8 -*-
'''
Created on 12 May 2009
Based on beejs flickr API
Produce a list of photo locations for a given
group's pool on flickr
@author: Steven Kay
'''

import flickrapi
import time

# Enter your API key below
# You can apply for an API key at 
# http://www.flickr.com/services/apps/create/apply
api_key = '' 

# paste group NSID below
group = '1124494@N22'

# sample... fetch your latest images with 
# a count of the views, faves and comments

if __name__ == '__main__':
    flickr = flickrapi.FlickrAPI(api_key)
    response_photos = flickr.groups_pools_getPhotos(group_id=group,per_page=500,extras='geo')
    root=response_photos.findall('.//photos')
    pages=int(root[0].get('pages'))
    if pages>8:
        # stop after 8 pages of 500 images
        # not sure if groups.pools.getPhotos has the same
        # 4000 image limit as photos.search..?
        pages=8
    
    fo=open(r"C:\infoviz\scotland_photos.csv","w")
    print "Longitude,Latitude"
    fo.write("Longitude,Latitude\n")
    for page in range(1,pages+1):
        # flickr pages are numbered from 1
        response_photos = flickr.groups_pools_getPhotos(group_id=group,per_page=500,page=str(page),extras='geo')
        for photo in response_photos.findall(".//photos/photo"):
            try:
                lat=photo.get('latitude')
                lon=photo.get('longitude')
                st="%s,%s" %(lon,lat)
                if not st=="0,0":
                    # ignore the odd buggy 0,0 coords
                    print "%s,%s" %(lon,lat)
                    fo.write("%s,%s\n" %(lon,lat))
            except:
                pass
        time.sleep(1)
    fo.close()

You’ll need to find the NSID of the group as an input; you can find this with the flickr API call flickr.groups.search.

Now, you have a simple CSV file with the latitude and longitude of each geotagged image in the pool.

Longitude,Latitude
-5.167792,58.352519
-4.024359,57.675544
-4.230251,57.497356
-4.2348,57.501045
-4.84703,56.646034
-4.306168,55.873986
-3.586263,56.564732
...

This demo uses the Photography Guide to Scotland pool.

The next step is to plot the map.

'''
Simple Matplotlib/Basemap pinboard map for
Flickr Groups.

Need to provide a CSV file in following format

Longitude,Latitude
20.1,-3.25
20.225,-3.125
.. etc..

Created on 10 Oct 2009

@author: Steven Kay
'''

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import string
import matplotlib.cm as cm

x=[] #longitudes
y=[] #latitudes

fi=open(r'C:\infoviz\scotland_photos.csv','r')

linenum=0
for line in fi:
    if linenum>0:
        line=string.replace(line, "\n","")
        try:
            fields=string.split(line,",")
            lon,lat=fields[0:2]
            x.append(float(lon))
            y.append(float(lat))
        except:
            pass
    linenum+=1
fi.close()

# cass projection centred on scotland
# will need to replace with a projection more suited
# to the group you're plotting

m = Basemap(llcrnrlon=-8.0,llcrnrlat=54.5,urcrnrlon=1.5,urcrnrlat=59.5,
            resolution='h',projection='cass',lon_0=-4.36,lat_0=54.5)
x1,y1=m(x,y)
m.drawmapboundary(fill_color='cyan') # fill to edge
m.drawcountries()
m.drawrivers() # you may want to turn this off for larger areas like continents
m.fillcontinents(color='white',lake_color='cyan',zorder=0)
m.scatter(x1,y1,s=5,c='r',marker="o",cmap=cm.jet,alpha=1.0)

plt.title("Photography Guide to Scotland in FlickR") # might want to change this!
plt.show()

This script uses a projection centred on Scotland; you’ll need to change the following line…

m = Basemap(llcrnrlon=-8.0,llcrnrlat=54.5,urcrnrlon=1.5,urcrnrlat=59.5,
            resolution='h',projection='cass',lon_0=-4.36,lat_0=54.5)

…to something more suitable for your own group. Basemap provides an intimidating list of projections, so there should be one that fits.
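If you’d rather derive the corner coordinates from your data than hand-pick them, you can pad the bounding box of the points. A sketch in plain Python (`bounding_box` is an illustrative helper, not part of Basemap; the sample coordinates are from the CSV above):

```python
def bounding_box(lons, lats, pad=0.5):
    """Return (llcrnrlon, llcrnrlat, urcrnrlon, urcrnrlat),
    padded by `pad` degrees on every side."""
    return (min(lons) - pad, min(lats) - pad,
            max(lons) + pad, max(lats) + pad)

lons = [-5.167792, -4.024359, -4.230251, -3.586263]
lats = [58.352519, 57.675544, 57.497356, 56.564732]
print(bounding_box(lons, lats))
```

The four values can then be fed straight into the llcrnr/urcrnr arguments of the Basemap constructor.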

homicide rates

October 13, 2009




Originally uploaded by stevefaeembra

A visualisation of the homicide rates across the world.

Scatter plots with Basemap and Matplotlib

October 12, 2009

A while back I used the flickr api to map 24 hours worth of geotagged photos.

My previous attempts needed some manual Photoshop work to superimpose the plots on a map. The next logical step is to do the whole process – from start to finish – in code, and remove the manual steps.

To do this, I tried the awesome Basemap toolkit. This library allows all sorts of cartographic projections…

Installing Basemap

Basemap is an extension available with Matplotlib. You can download it here (under matplotlib-toolkits)

I installed the version for Python 2.5 on Windows; this missed out a dependency to httplib2 which I needed to install separately from here.

Getting started

Let’s assume you have 3 arrays – x, y and z. These contain the longitudes, latitudes, and data values at each point. In this case, I binned the geotagged photos into a grid of degree points (360×180), so that each degree square contained the number of photos tagged in that degree square.
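The degree-square binning described above can be sketched with a Counter (the names here are illustrative; the post doesn’t show the original binning code):

```python
from collections import Counter
import math

def bin_to_degree_grid(points):
    """Count photos per 1-degree square.
    Keys are (floor(lon), floor(lat)) tuples."""
    return Counter((math.floor(lon), math.floor(lat)) for lon, lat in points)

# Fake geotags: three near Edinburgh, one near London
photos = [(-3.2, 55.9), (-3.9, 55.1), (-3.5, 55.95), (0.1, 51.5)]
grid = bin_to_degree_grid(photos)
print(grid[(-4, 55)])  # three photos fall in the (-4, 55) degree square
```

Iterating over grid.items() then gives you the x, y and z arrays: one longitude, latitude and count per occupied square.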

Setting up

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import string
import matplotlib.cm as cm

x=[]
y=[]
z=[]

Now, you need to populate the x, y and z arrays with values. I’ll leave that as an exercise for you 🙂 All three arrays need to be the same length.

Now, you need to decide which projection to use. Here, I’ve used the Orthographic projection.

m = Basemap(projection='ortho',lon_0=-50,lat_0=60,resolution='l')

Here is the secret sauce I took a while to work out. That’ll teach me not to R the FM. This line transforms all the lat, lon coordinates into the appropriate projection.

x1,y1=m(x,y)

Next, you can decide which features you want to plot – land masses, country boundaries and so on.

m.drawmapboundary(fill_color='black') # fill to edge
m.drawcountries()
m.fillcontinents(color='white',lake_color='black',zorder=0)

Finally, the scatter plot.

# sizes and cols come from your z values (e.g. marker sizes scaled by count,
# and a colour value per point)
m.scatter(x1,y1,s=sizes,c=cols,marker="o",cmap=cm.cool,alpha=0.7)
plt.title("Flickr Geotagging Counts with Basemap")
plt.show()

mapping flickr group activity

September 20, 2009

flickr group activity visualization

Mapping the activity levels of approx 1,500 Flickr groups against the number of members of each group, using the Flickr API.

The x axis is the group size, the y axis is the number of seconds an image can expect to stay on the ‘front page’. This was measured as the timestamp difference between the 1st and 13th images in the pool (the landing page for a group shows 12 images; the user has to follow a link to show more).

Note that the image is a log-log plot, as the groups follow a power-law distribution.

Groups in the bottom right are the busiest – this includes the B&W and FlickrCentral groups. Groups in the top right have lots of members but are more vigorously moderated, so images are added to the pool more slowly (or new submissions are deleted). The group touching the bottom of the graph is the “10 million images” group, where users are encouraged to “dump and run”.

This is a hexbin plot – the colour represents the number of groups falling within a certain range of values. Red=Lots, Green=Fewer, Blue=Few.

As you’d expect, larger groups tend to have a higher turnaround of images, but there’s a lot of variation.

The most common group size seems to be around 2000-3000 members; with a group this size, you can expect an image to stay on the front page for around 2-3 hours. With the largest groups, this drops to around 5 minutes.
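The measurement itself can be sketched like this (a toy illustration of the timestamp-difference idea, not the original script; `front_page_seconds` is a made-up name):

```python
def front_page_seconds(upload_timestamps):
    """Estimate how long a new image stays on the 12-image front page:
    the gap between the newest (1st) and the 13th most recent uploads."""
    ts = sorted(upload_timestamps, reverse=True)  # newest first
    if len(ts) < 13:
        return None  # not enough pool activity to measure
    return ts[0] - ts[12]

# Fake pool: one upload every 10 minutes
stamps = [1_000_000 - 600 * i for i in range(20)]
print(front_page_seconds(stamps))  # 12 gaps of 600 s = 7200 s (2 hours)
```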

mapping water drainage with SRTM elevation data in Python

September 18, 2009

srtm enhanced relief map

With this mini-project, I wanted to try to model the effect of rainfall on the landscape, to see if I could map water drainage – finding watersheds, catchment areas and so on. After lots of tries, I stumbled across a potentially interesting mapping technique by accident, which does a good job of providing a relief map that emphasizes water courses.

The algorithm starts with a square grid of SRTM digital elevation data. (You can read more about this in a previous post)…

For each cell in the grid, a virtual raindrop falls. This drop then repeatedly moves to the lowest neighbouring cell, each time keeping a running total of how many meters it has descended. Eventually, it falls into a local minimum: a ‘well’, a cell with no neighbours lower than itself. This can be the coast, or it might be a lake or a low-lying piece of land in a valley. When it can’t fall any further, the map is coloured according to the total vertical distance travelled.

The height of the cell is relative to the nearest patch of flat land, rather than being relative to sea level. This makes it easier to follow the lie of the land.
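The descent loop can be sketched on a toy grid; this is an illustrative reconstruction of the algorithm as described, not the author’s code:

```python
def descend(grid, r, c):
    """Follow a virtual raindrop downhill from cell (r, c),
    returning the total vertical drop in the grid's height units."""
    rows, cols = len(grid), len(grid[0])
    total_drop = 0
    while True:
        # Find the lowest of the (up to 8) neighbouring cells
        best = (grid[r][c], r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] < best[0]:
                    best = (grid[nr][nc], nr, nc)
        if (best[1], best[2]) == (r, c):
            return total_drop  # local minimum: a 'well'
        total_drop += grid[r][c] - best[0]
        r, c = best[1], best[2]

# Toy elevation grid: a slope running down to a well in one corner
grid = [[9, 8, 7],
        [8, 5, 4],
        [7, 4, 1]]
print(descend(grid, 0, 0))  # drops 9 -> 5 -> 1, total 8
```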

The image here is a map of the degree square covering Edinburgh and the Scottish Borders (Dumfries is in the south, Edinburgh in the top-right corner).

Mapping 24 hours of Flickr geotagging in Python

September 6, 2009

The aim of this project was to find out where in the world people were geotagging their photos on flickR, using the flickR API.


The approach taken was to poll one of the flickR ‘pandas’, ‘Wang Wang’. This is a service which keeps track of geotagged photos as they come in.

The following Python script runs in the background, polls the service once a minute, and appends the locations of newly tagged photos to a CSV file. It asks for up to 100 photos from the previous minute, though in reality up to 120 can come back in any one minute! The average was around 80/minute when I last ran this.

The flickr API is being accessed using beej’s flickr api.

# -*- coding: UTF8 -*-
'''
Created on 28 Apr 2009
Ask WangWang for recently geotagged photos
@author: Steven
'''

import flickrapi
from time import time as t
import time

api_key = 'YOUR_API_KEY_HERE'
flickr = flickrapi.FlickrAPI(api_key)
if __name__ == '__main__':
    ct=0
    lastct=0
    print "Timestamp  Total   This"
    while True:
        tstamp=int(t())-60
        wangwang = flickr.panda_getPhotos(panda_name='wang wang', interval=60000, per_page=100, last_update=tstamp, extras='geo')
        fo=open("c:\\wangwang24hours.csv","a")
        for x in wangwang.find('photos'):
            s= "%d,%s,%s,%s\n" % (tstamp, x.get('longitude'),x.get('latitude'),x.get('id'))
            ct=ct+1
            fo.write(s)
        time.sleep(60)
        fo.close()
        print "%10s %07d %04d" %(tstamp,ct,ct-lastct)
        lastct=ct
    print 'done'

Once we have the data, it’s time to visualise it. A heatmap seemed a good choice; the chart uses the Matplotlib ‘hexbin’ style. This takes two arrays of the same size (here, the longitudes are in X and the latitudes in Y) and maps the values onto a hexagonal grid (here, of size 180×180), counting the number of photos which fall into each hexagonal bin.

Each bin is coloured according to the number of points that fall into it; red have most, green have less, blue have the least.

The following script takes the output from the previous script, and plots it.

import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import string

X=[]
Y=[]
fi=open(r"c:\wangwang24hours.csv")
for line in fi:
    ignore,x,y,ignore2=string.split(line, ",")
    if x!='None' and y!='None':
        X.append(float(x))
        Y.append(float(y))
fi.close()
plt.hexbin(X,Y,gridsize=180,bins='log',cmap=cm.jet,linewidths=0,edgecolors=None)
plt.show()