Creating a Unicode character map in Python

When I were a lad, you had ASCII, and that was it. Or EBCDIC... but the less said about that the better.

Unfortunately ASCII is useless when you want to start adding characters from non-Latin scripts, such as… well, pretty much every language which isn’t English. That’s where Unicode comes in.

The following Python script creates a web page showing a table of Unicode characters, 16 to a row. Code points that the unicodedata module has no name for (unassigned code points and control characters) appear as grey squares; hover over a character to get more info (a description of the character and its hex code).

# -*- coding: utf-8 -*-
'''
Created on 15 Sep 2009

@author: Steven Kay
'''
import html
import unicodedata

# generate table of Unicode characters in Python
# using the unicodedata module
# (Python 3: chr() returns the character for a code point)

# write the page as UTF-8 so non-ASCII characters can be saved
fo=open(r"c:\dumpUnicode.htm","w",encoding="utf-8")
fo.write("<html><head><meta charset=\"utf-8\"></head><body>")
fo.write("<table border=1 style=\"text-align:center\">")
fo.write("<tr>"+"".join("<td>%s</td>"%x for x in " .0.1.2.3.4.5.6.7.8.9.A.B.C.D.E.F".split("."))+"</tr>") #header
for page in range(0,1024):
    s="<td>%s</td>" % hex(page) # row label
    for cell in range(0,16):
        x=(page*16)+cell
        try:
            # named Unicode char; escape it so <, > and & don't break the HTML
            name=unicodedata.name(chr(x))
            s=s+"<td>"+"<span title=\"0x%0X\n%s\">" % (x,name) +html.escape(chr(x))+"</span></td>"
        except ValueError:
            # unicodedata.name() raises ValueError when the char has no name
            s=s+"<td style=\"background:#A99\">&nbsp;</td>"
    fo.write("<tr>"+s+"</tr>")
fo.write("</table>")
fo.write("</body></html>")
fo.close()
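
The hover text comes down to a single lookup per code point. Here is a minimal sketch of that lookup on its own, assuming Python 3; describe is a name made up for this example and is not part of the script above. It passes a default to unicodedata.name() instead of catching the exception, which is one alternative way to spot the characters that end up as grey squares.

```python
import unicodedata

def describe(code_point):
    """Return (hex label, name) for a code point; name is None if it has no name."""
    ch = chr(code_point)
    # The second argument is returned instead of raising ValueError
    name = unicodedata.name(ch, None)
    return "0x%X" % code_point, name

print(describe(0x41))    # ('0x41', 'LATIN CAPITAL LETTER A')
print(describe(0x20AC))  # ('0x20AC', 'EURO SIGN')
print(describe(0x0A))    # ('0xA', None) - control characters have no name
```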

You probably won’t be able to see every character; many fonts only support the Latin characters (A-Z, a-z, 0-9 and punctuation).

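The script also stops after 1024 rows of 16, i.e. the first 16,384 code points (U+0000 to U+3FFF), so it won’t pick up all the characters in some eastern languages; the main block of CJK ideographs, for instance, only starts at U+4E00. Easy enough to change though: just increase the upper bound of the page loop!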
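
As a rough sketch of that change, assuming Python 3: the helper below (rows_for is a hypothetical name, not something from the script) works out how many 16-cell rows the page loop needs in order to reach a given code point. The result would replace the 1024 in the loop's range().

```python
import unicodedata

def rows_for(last_code_point, cells_per_row=16):
    """Number of table rows needed so that last_code_point is included."""
    return last_code_point // cells_per_row + 1

print(rows_for(0x3FFF))    # 1024, what the script uses now
print(rows_for(0xFFFF))    # 4096, the whole Basic Multilingual Plane
print(rows_for(0x10FFFF))  # 69632, every Unicode code point

# Beyond the current limit the lookup works the same way
print(unicodedata.name(chr(0x4E00)))  # CJK UNIFIED IDEOGRAPH-4E00
```

Bear in mind that the full range makes for a very large page, and that most of the higher rows will be grey, since large stretches of the code space are unassigned.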

If you’re using Windows XP, you’ll need to install Eastern fonts (using Regional and Language Options/Languages on the Control Panel) to see characters in languages such as Japanese, Thai, Chinese etc. I normally have this turned off as it takes up a fair amount of memory…
