Creating a Unicode character map in Python

September 17, 2009

When I were a lad, you had ASCII, and that was it. Or EBCDIC... but the less said about that the better.

Unfortunately ASCII, with only 128 characters, is useless when you want to start adding accented letters or characters from non-Latin scripts, such as… well, pretty much every language which isn't English. That's where Unicode comes in.
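
To see the difference in practice, here's a quick sketch (Python 3) that tries to encode a few words as ASCII and as UTF-8, one of the standard Unicode encodings. The sample words are just arbitrary illustrations, written as escapes so the byte counts don't depend on how your editor saves the file.

```python
# Compare ASCII and UTF-8 encodings of a few sample words (Python 3).
# The samples are arbitrary illustrations, not from the original post:
# "hello", "café" and "日本" (Japan), written as escapes.
samples = ["hello", "caf\u00e9", "\u65e5\u672c"]

results = {}
for word in samples:
    try:
        word.encode("ascii")
        ascii_ok = True
    except UnicodeEncodeError:
        # ASCII only covers code points 0-127, so anything else fails
        ascii_ok = False
    # UTF-8 can encode every Unicode code point, using 1 to 4 bytes each
    results[word] = (ascii_ok, len(word.encode("utf-8")))

for word, (ascii_ok, nbytes) in results.items():
    # ascii() keeps the output printable on any console
    print("%s: ASCII ok=%s, UTF-8 bytes=%d" % (ascii(word), ascii_ok, nbytes))
```

Plain English survives ASCII unchanged, but as soon as an accented letter or a Japanese character turns up, the ASCII encoder gives up, while UTF-8 just uses more bytes per character.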

The following Python script creates a web page showing a table of Unicode characters, 16 per row. Code points that have no name in the unicodedata database (unassigned code points, control characters and surrogates) appear as grey squares; hover over a character to see its hex code and name.

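```python
'''
Created on 15 Sep 2009

@author: Steven Kay
'''
import html
import unicodedata

# generate table of Unicode characters in Python 3
# using the unicodedata module (chr() replaces Python 2's unichr())

ROWS = 1024  # 1024 rows x 16 columns covers U+0000 to U+3FFF

# write as UTF-8 explicitly; the platform default encoding may not
# be able to represent most of these characters
with open(r"c:\dumpUnicode.htm", "w", encoding="utf-8") as fo:
    # tell the browser the page is UTF-8 too
    fo.write("<html><head><meta charset=\"utf-8\"></head><body>")
    fo.write("<table border=1 style=\"text-align:center\">")
    # header row: blank corner cell, then the 16 hex column digits
    fo.write("<tr>" + "".join("<th>%s</th>" % x for x in " 0123456789ABCDEF") + "</tr>")
    for page in range(ROWS):
        s = "<td>%s</td>" % hex(page)
        for cell in range(16):
            x = (page * 16) + cell
            try:
                # valid (named) Unicode char
                name = unicodedata.name(chr(x))
            except ValueError:
                # no name: unassigned code point, control char or surrogate
                s += "<td style=\"background:#A99\">&nbsp;</td>"
                continue
            # escape <, > and & so they don't break the HTML markup
            s += "<td><span title=\"0x%04X\n%s\">%s</span></td>" % (x, name, html.escape(chr(x)))
        fo.write("<tr>" + s + "</tr>")
    fo.write("</table>")
    fo.write("</body></html>")
```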
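
The heart of the script is the lookup in each cell: unicodedata.name() returns the official character name and raises ValueError when a code point has none, which is what produces a grey square. Here's a small standalone sketch of that logic in Python 3 (where chr() takes over from Python 2's unichr()); describe_code_point() is a hypothetical helper for illustration, not part of the script.

```python
import unicodedata

def describe_code_point(cp):
    """Hypothetical helper: return (hex label, name) for a code point,
    or None if unicodedata has no name for it (a grey cell in the table)."""
    try:
        name = unicodedata.name(chr(cp))
    except ValueError:
        # unassigned, control character or surrogate: no name available
        return None
    return ("0x%04X" % cp, name)

print(describe_code_point(0x41))    # ('0x0041', 'LATIN CAPITAL LETTER A')
print(describe_code_point(0x3B1))   # ('0x03B1', 'GREEK SMALL LETTER ALPHA')
print(describe_code_point(0x0A))    # None (control character)
```

Reading the generated table works the same way: the row label gives all but the last hex digit and the column header supplies the last one, so row 0x4, column 1 is U+0041, "A".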

You won't be able to see every character; many fonts only cover the basic Latin characters (A-Z, a-z, 0-9 and punctuation), so characters your installed fonts don't support will typically show up as empty boxes.

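The script also stops after the first 16,384 code points (1024 rows of 16, i.e. U+0000 to U+3FFF), so it misses big chunks of some East Asian scripts; the CJK Unified Ideographs block, for example, starts at U+4E00. Easy enough to change though: just widen the range of rows in the outer loop!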
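
One way to do that is to make the start and end of the range parameters. Below is a sketch of a hypothetical generator, table_rows(), that yields one row of 16 code points at a time for any range, rounding out to whole rows. It assumes Python 3.3 or later, where sys.maxunicode is always 0x10FFFF (the top of the Unicode range); the CJK block at U+4E00 is just used as an example.

```python
import sys
import unicodedata

def table_rows(start, end):
    """Hypothetical generator: yield (row_label, cells) for every row of 16
    code points touching start..end (inclusive), rounded out to whole rows.
    Each cell is (char, name), or None when unicodedata has no name for it."""
    for row in range(start // 16, end // 16 + 1):
        cells = []
        for cp in range(row * 16, row * 16 + 16):
            try:
                cells.append((chr(cp), unicodedata.name(chr(cp))))
            except ValueError:
                # unassigned, control character or surrogate
                cells.append(None)
        yield hex(row), cells

# The original script covers rows 0x0 to 0x3FF; the whole of Unicode
# needs sys.maxunicode // 16 + 1 rows.
full_rows = sys.maxunicode // 16 + 1
print(full_rows)         # 69632

# Example: the first row of the CJK Unified Ideographs block
label, cells = next(table_rows(0x4E00, 0x4E0F))
print(label, cells[0][1])   # 0x4e0 CJK UNIFIED IDEOGRAPH-4E00
```

Using a generator means you can feed the rows straight into the HTML-writing loop without building all 69,632 rows in memory first.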

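If you're using Windows XP, you'll need to install the extra language files (Control Panel > Regional and Language Options > Languages, then tick "Install files for East Asian languages" and, for Thai, "Install files for complex script and right-to-left languages") to see characters in languages such as Japanese, Thai and Chinese. I normally have these turned off as they take up a fair amount of disk space…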