Typeface Clustering Based on Typewar

Data from typewar can be used for some interesting experiments in hierarchical classification of typefaces.

Our typeface recognition game, [[http://typewar.com|typewar]], maintains a lot of statistical information, including, for every pair of typefaces, the percentage of the time that players identify the correct one. For example, we know that players correctly distinguish Futura and Times New Roman 98.1% of the time, but correctly distinguish Baskerville and Times New Roman only 64.6% of the time. You can see a wealth of stats at [[http://typewar.com/stats/]].

It occurred to me shortly after we implemented that bit of stats-keeping that it could be treated as a distance metric between typefaces: the higher the percentage, the further apart the typefaces are. In theory, if the percentage correct is 50%, players are doing no better than guessing and the two typefaces are indistinguishable. Once you have a distance metric, you can cluster the objects and present the clustering in a tree form known as a dendrogram.
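To make the idea concrete, here is a minimal sketch in plain Python of turning percent-correct figures into distances and merging the closest typefaces first. It is not the script behind the results in this post. The rescaling (50% maps to 0, 100% maps to 1) and the choice of average linkage are assumptions made for illustration, and the Futura/Baskerville figure is a placeholder rather than a real typewar statistic.

```python
from itertools import combinations

# Percent of the time players pick the correct typeface for each pair.
pct_correct = {
    frozenset(["Futura", "Times New Roman"]): 98.1,       # quoted in this post
    frozenset(["Baskerville", "Times New Roman"]): 64.6,  # quoted in this post
    frozenset(["Futura", "Baskerville"]): 97.0,           # PLACEHOLDER, not real typewar data
}


def distance(a, b):
    """Rescale percent correct to [0, 1]: 50% (a coin flip) -> 0, 100% -> 1."""
    pct = pct_correct[frozenset([a, b])]
    return max(0.0, (pct - 50.0) / 50.0)


def average_linkage(names):
    """Agglomerative clustering; returns the merges in the order they happen."""
    clusters = [(name,) for name in names]
    merges = []

    def avg(pair):
        # Mean distance over every cross-cluster pair of typefaces.
        left, right = pair
        total = sum(distance(a, b) for a in left for b in right)
        return total / (len(left) * len(right))

    while len(clusters) > 1:
        # Merge the two clusters that are closest on average.
        left, right = min(combinations(clusters, 2), key=avg)
        merges.append((left, right, round(avg((left, right)), 3)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


merges = average_linkage(["Futura", "Times New Roman", "Baskerville"])
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

With these three typefaces, Times New Roman and Baskerville merge first and Futura joins last. The merge heights are what a dendrogram would draw on its vertical axis.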

So with a bit of Python and R, here are the preliminary results:

What struck me immediately was that it separated the serif and sans serif typefaces very early on. Of course, this is to be expected, but it was nice to see it visually. The more distinctive typefaces such as Optima, Didot and Skia break off quickly. Georgia, a little surprisingly to me, also breaks off near the top. Helvetica Neue and Arial, of course, cluster very closely together near 50%, as do Verdana and Tahoma. In the 60-70% range we have the clustering of Futura with Gill Sans, Trebuchet MS with Lucida Grande, Monaco with Andale Mono, Garamond with Hoefler Text, Caslon with Palatino and Times New Roman with Baskerville.

Note that this is based on overall typeface pairings. It's possible (and fairly easy) to generate a similar dendrogram just for a particular letter. For example, here is one based on the letter R:

Now, as is to be expected, we find Arial and Helvetica Neue a long way apart. Similarly, Futura and Gill Sans are now much further apart than they were in general. A lot of the familiar clusterings are still there, but it does strike me as very odd that Helvetica Neue would cluster with Optima. The glyphs seem too different to be confused 20% of the time (see the [[http://typewar.com/stats/glyph_pair/R/Helvetica_Neue/Optima/|pair on typewar]]).

One obvious limitation in the stats is that certain typefaces only become unlocked at higher levels, so there is a selection bias in who is doing the distinguishing. Caslon versus Bembo, say, is only asked of players who have reached a high level, whereas Helvetica Neue versus Optima is asked of everyone from the beginning.

Still, it seems an interesting technique for typeface classification and one that, at some point, I'd like to incorporate into the typewar site itself. It will also help us greatly in tuning the gameplay to focus more within some of the clusters. A new feature we have planned called quests will definitely take advantage of this clustering information.

I would very much like more typefaces to be featured both in the game and in this analysis. If you're a foundry interested in promoting your work on typewar, please feel free to contact Eldarion.