Typeface Clustering Based on Typewar

Data from typewar can be used for some interesting experiments in hierarchical classification of typefaces.

Our typeface recognition game, typewar, maintains a lot of statistical information, including the percentage of the time that people identify the correct typeface for every pair of typefaces. For example, we know that players are able to correctly distinguish Futura and Times New Roman 98.1% of the time, but correctly distinguish Baskerville and Times New Roman only 64.6% of the time. You can see a wealth of stats at http://typewar.com/stats/

It occurred to me shortly after we implemented that bit of stats-keeping that the percentage could be treated as a distance metric between typefaces: the higher the percentage, the further apart the typefaces are. In theory, if the percentage correct is 50%, the two typefaces are indistinguishable, since players are doing no better than guessing. Once you have a distance metric, you can cluster the objects and present the clustering in a tree form known as a dendrogram.
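To make the idea concrete, here is a minimal sketch of agglomerative clustering with percent correct used directly as the distance. It is not the code behind the results in this post. The linkage rule (average linkage) is an assumption, as is the Futura/Baskerville figure; only the two Times New Roman percentages come from the stats quoted above.

```python
from itertools import combinations

# Percent-correct figures keyed by unordered typeface pair.
# Only the two values marked "from the post" are real; the third is a placeholder.
percent_correct = {
    frozenset(["Futura", "Times New Roman"]): 98.1,       # from the post
    frozenset(["Baskerville", "Times New Roman"]): 64.6,  # from the post
    frozenset(["Futura", "Baskerville"]): 97.0,           # hypothetical value
}

def distance(a, b):
    """Use percent correct directly as the distance between two typefaces."""
    return percent_correct[frozenset([a, b])]

def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters (assumed rule)."""
    total = sum(distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def cluster(names):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(*pair))
        merges.append((c1, c2, average_linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = cluster(["Futura", "Times New Roman", "Baskerville"])
for left, right, height in merges:
    print(left, "+", right, "joined at", round(height, 2))
```

Each merge records the height at which two clusters join, which is the information a dendrogram draws. With this toy data, Times New Roman and Baskerville join first, and Futura joins them much higher up.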

So with a bit of Python and R, here are the preliminary results:

What immediately struck me was that it separated the serif and sans serif typefaces very early on. Of course, this is to be expected, but it was nice to see it visually. The more distinctive typefaces such as Optima, Didot and Skia break off quickly. Georgia, a little surprisingly to me, also breaks off near the top. Helvetica Neue and Arial, of course, cluster very closely together near 50%, as do Verdana and Tahoma. In the 60-70% range we have the clustering of Futura with Gill Sans, Trebuchet MS with Lucida Grande, Monaco with Andale Mono, Garamond with Hoefler Text, Caslon with Palatino and Times New Roman with Baskerville.

Note that this is based on overall typeface pairings. It's possible (and fairly easy) to generate a similar dendrogram just for a particular letter. For example, here is one based on the letter R:

Now, as is to be expected, we find Arial and Helvetica Neue a long way apart. Similarly, Futura and Gill Sans are now much further apart than they were in general. A lot of the familiar clusterings are still there, but it does strike me as very odd that Helvetica Neue would cluster with Optima. The glyphs seem too different to be confused 20% of the time (see the pair on typewar).
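The per-letter variant only changes how the percentages are computed: restrict the tally to guesses involving one letter before working out percent correct for each pair. Here is a rough sketch. The record layout and every row of data are invented for illustration and are not how typewar actually stores its stats.

```python
from collections import defaultdict

# Hypothetical guess log: (letter, typeface shown, decoy typeface, guessed correctly?).
# All rows are made up for this example.
guesses = [
    ("R", "Arial", "Helvetica Neue", True),
    ("R", "Helvetica Neue", "Arial", True),
    ("R", "Arial", "Helvetica Neue", False),
    ("R", "Helvetica Neue", "Arial", True),
    ("a", "Arial", "Helvetica Neue", False),
    ("a", "Helvetica Neue", "Arial", True),
]

def percent_correct_for_letter(records, letter):
    """Percent correct per unordered typeface pair, restricted to one letter."""
    tally = defaultdict(lambda: [0, 0])  # pair -> [correct, total]
    for rec_letter, shown, decoy, correct in records:
        if rec_letter != letter:
            continue
        pair = frozenset([shown, decoy])
        tally[pair][0] += int(correct)
        tally[pair][1] += 1
    return {pair: 100.0 * c / n for pair, (c, n) in tally.items()}

by_r = percent_correct_for_letter(guesses, "R")
by_a = percent_correct_for_letter(guesses, "a")
print(by_r, by_a)
```

The resulting dictionary has the same shape as an overall table of pair percentages, so it can be fed to whatever clustering step produced the overall dendrogram.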

One obvious limitation in the stats is that certain typefaces only become unlocked at higher levels, so there is a selection bias in who is doing the distinguishing. Only players who have reached a high level are asked to tell Caslon from Bembo, whereas everyone is asked to tell Helvetica Neue from Optima from the beginning.

Still, it seems an interesting technique for typeface classification and one that, at some point, I'd like to incorporate into the typewar site itself. It will also help us greatly in tuning the gameplay to focus more within some of the clusters. A new feature we have planned called quests will definitely take advantage of this clustering information.

I would very much like more typefaces to be featured both in the game and in this analysis. If you're a foundry interested in promoting your work on typewar, please feel free to contact Eldarion.
