
How Do YOU Type “wolfram”? Analyzing Your Typing Style Using Mathematica

Wouldn’t it be cool if you never had to remember another password again?

I read an article in The New York Times recently about using individual typing styles to identify people. A computer could authenticate you based on how you type your user name without ever requiring you to type a password.

To continue our series of posts about personal analytics, I want to show you how you can do a detailed analysis of your own typing style just by using Mathematica!

Here’s a fun little application that analyzes the way you type the word “wolfram.” It’s an embedded Computable Document Format (CDF) file, so you can try it out right here in your browser. Type “wolfram” into the input field and click the “save” button (or just press “Enter” on your keyboard). A bunch of charts will appear showing the time interval between each successive pair of characters you typed: w–o, o–l, l–f, f–r, r–a, and a–m. Do several trials: type “wolfram,” click “save,” rinse, and repeat (if you make a typo, that trial will just be ignored).

Keystrokes
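The interface does all of the bookkeeping for you, but if you’re curious how per-keystroke timing can be captured at all, here’s a minimal sketch of one way to do it in a notebook. This is not the actual source of the CDF above (that’s included in the download at the end of the post); it simply traps "KeyDown" events once the framed area has been clicked and records a {key, timestamp} pair for each keystroke:

(* A minimal sketch of keystroke capture, not the CDF's actual source. *)
(* Click the framed area, then type: every keystroke appends a {key, time} pair, *)
(* and Differences of the times gives the letter-pair intervals in seconds. *)
DynamicModule[{trial = {}},
 Column[{
   EventHandler[
    Framed[Dynamic[StringJoin[First /@ trial]], ImageSize → {200, 30}],
    {"KeyDown" :> AppendTo[trial, {CurrentValue["EventKey"], SessionTime[]}]}],
   Dynamic[Differences[Last /@ trial]]}]]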

After a few trials the bar chart showing the average intervals should stop changing much between trials. Here’s what mine looked like after five trials:

Results after typing "wolfram" five times

You can see in the history panel on the top right that each trial has a fairly consistent profile. The time series plots in the middle show how the interval for each letter pair changed from trial to trial. The statistics above each time series plot show that the standard deviation is hovering around 10 milliseconds (ms). That’s pretty consistent! Maybe that’s because I typed “wolfram” about a thousand times while preparing this blog post. :)

Here’s what it looked like after 50 trials:

Results after typing "wolfram" 50 times

Notice in the plot on the bottom left how the interval distribution is basically bimodal, with o–l (green) and f–r (red) hovering around 150 ms, while all the other key pairs (w–o, l–f, r–a, and a–m) are clustered around 75 ms. That’s interesting because o–l and f–r are both key pairs that are typed with the same finger on the same hand.

So what about letters involving the same hand but different fingers? Well, the r–a interval requires me to hit “r” with my left index finger and then “a” using my left pinky, and it’s one of the fastest intervals for me at an average of 70 ms.

So it seems like I can transition between letters quickly using the same hand, as long as it’s using different fingers. But as soon as I’m using the same finger for a transition, it takes about twice as long.

Now I’m curious if faster transitions have more or less fluctuation. Do you pay a price for speed with more inconsistency? I’d like to see a plot of interval fluctuation versus interval average. And the measure of fluctuation that I think makes the most sense here is to divide the standard deviation by the mean (which is called relative standard deviation), giving fluctuation as a fraction of the mean.

To do this analysis I’ll need the raw data for all 50 trials, so I click the “data” tab and then click “copy to clipboard”:

Copying data to the clipboard

After pasting the data into a notebook and assigning it to a variable called keydata (a list of trials, where each trial is a list of keystroke records with the timestamp in the second slot), I take the differences of the timestamps within each trial and transpose the result to get, for each letter-pair interval, its values across all the trials:

intervals = Transpose[Differences /@ keydata[[All, All, 2]]]

Length@intervals

6

For example, here’s the w–o interval for all 50 trials (in seconds):

intervals[[1]]

{0.038694, 0.049795, 0.060173, 0.062232, 0.061813, 0.050397, 0.047450, 0.049871, 0.049927, 0.061964, 0.049777, 0.086927, 0.074264, 0.074802, 0.076007, 0.038097, 0.078292, 0.086846, 0.086584, 0.075184, 0.049636, 0.074912, 0.057256, 0.087949, 0.087619, 0.061979, 0.046079, 0.049256, 0.062087, 0.061795, 0.157831, 0.073180, 0.061975, 0.049956, 0.042222, 0.110283, 0.050346, 0.038629, 0.074833, 0.050545, 0.062748, 0.087184, 0.050334, 0.039608, 0.040772, 0.078140, 0.127844, 0.065453, 0.141108, 0.093185}
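Before computing this for all six intervals at once, it helps to see the measure on a single list: the relative standard deviation of the w–o data above is just its standard deviation divided by its mean, which comes out around 0.38, i.e. the w–o interval fluctuates by nearly 40% of its typical value:

StandardDeviation[intervals[[1]]]/Mean[intervals[[1]]]

0.37605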

Now I create pairs of the form {mean, sd/mean} for each interval:

meanrelativesd = {Mean@#, StandardDeviation@#/Mean@#} & /@ intervals

{{0.067877, 0.37605}, {0.140637, 0.13239}, {0.081019, 0.18848}, {0.148784, 0.10601}, {0.065879, 0.17654}, {0.082176, 0.22633}}

And plot them together with a linear fit:

fit = FindFit[meanrelativesd, a*x + b, {a, b}, x]

{a → -1.8074, b → 0.3776}

slope = a /. fit; intercept = b /. fit; y[x_] := slope*x + intercept

Labeled[
 ListPlot[List /@ meanrelativesd,
  Frame → True, FrameStyle → Gray, FrameTicksStyle → Black,
  AxesOrigin → {0, 0}, PlotRange → {{0.04, 0.17}, {0, 0.45}},
  PlotMarkers → (MapThread[{Graphics[{#2, Circle[{0, 0}], Text[#1]}], 0.12} &, {pchars, colors}]),
  FrameTicks → {{Range[0.1, 0.4, 0.1], None}, {{#, Round[1000*#]} & /@ Range[0.05, 0.2, 0.05], None}},
  Epilog → {Red, Opacity[0.8], Thickness[0.004], Line[{{0.05, y[0.05]}, {0.16, y[0.16]}}]}],
 {Style["relative standard deviation", FontFamily → "Helvetica"],
  Row[{Style["mean", FontFamily → "Helvetica"], Style["(ms)", Gray, FontFamily → "Helvetica"]}, " "]},
 {Left, Bottom}, RotateLabel → True]

Linear graph for 50 trials

There does appear to be lower relative fluctuation for the slower intervals o–l and f–r, at least for my typing style.

The question we really want to answer is whether people can be identified just by the way they type. How could we test that?

Well it just so happens that Wolfram is a company full of data nerds just like me, so I sent out an email asking people to do a bunch of trials with this interface and send me their data. A total of 42 people responded:

wolframdata = ReadList[FileNameJoin[{NotebookDirectory[], "wolfram-data.txt"}], Expression];

Length@wolframdata

42

People did 12 trials on average; some did as many as 33, and others only a few:

Length /@ wolframdata

{29, 33, 33, 10, 4, 9, 8, 7, 17, 11, 9, 27, 4, 7, 3, 11, 12, 11, 12, 13, 9, 7, 8, 8, 10, 20, 6, 6, 6, 5, 7, 14, 15, 6, 9, 28, 7, 4, 6, 4, 14, 20}

Round@N@Mean[Length /@ wolframdata]

12

Here are the average interval charts for each person. You can see there’s quite a range of profiles:

meanswolfram = Mean /@ Transpose[Differences@# & /@ #[[All, All, 2]]] & /@ wolframdata;

Grid@Partition[keychart[#, ImageSize → 50, ChartStyle → (Directive[Opacity[0.6], #] & /@ colors)] & /@ meanswolfram, 7, 7, 1, {}]

Average interval charts for each person's results

And here’s the combined distribution for everyone:

intervalswolfram = Flatten /@Transpose[Transpose[Differences /@ #[[All, All, 2]]] & /@ wolframdata];

Histogram[Flatten[intervalswolfram], 300, PlotRange → {{0, 0.4}, All}, PerformanceGoal → "Speed", Frame → True, FrameStyle → Gray, FrameTicksStyle → Black, FrameTicks → {{None, None}, {{#, Round[1000*#]} & /@ FindDivisions[{#1, #2, 0.1}, 4] &, None}}, PlotRangePadding → {{Automatic, Automatic}, {None, Scaled[0.1]}}]

Combined distribution for all results

There’s a hint of the same bimodal structure that showed up in my own 50 trials. But which letter intervals are contributing to each peak? We can use ChartLayout → "Stacked" to find out:

legend = Grid[{MapThread[
     Row[{Graphics[{#, EdgeForm[Darker[#]], Opacity[0.6], Rectangle[]}, ImageSize → 15], nicelabel@#2},
      " ", Alignment → {Left, Baseline}] &,
     {colors, pchars}]},
   Alignment → {Left, Baseline}, Spacings → {1, Automatic}];

Column[{
  Labeled[
   Histogram[intervalswolfram, 300,
    PlotRange → {{0, 0.4}, All}, ChartStyle → colors, ChartLayout → "Stacked",
    PerformanceGoal → "Speed", Frame → True, FrameStyle → Gray, FrameTicksStyle → Black,
    FrameTicks → {{None, None}, {{#, Round[1000*#]} & /@ FindDivisions[{#1, #2, 0.1}, 4] &, None}},
    PlotRangePadding → {{Automatic, Automatic}, {None, Scaled[0.1]}},
    ImageSize → 400, ImagePadding → {{10, 10}, {20, 5}}],
   Row[{Style["interval", FontFamily → "Helvetica"], Style["(ms)", Gray, FontFamily → "Helvetica"]}, " "],
   Bottom],
  legend},
 Alignment → Center, Spacings → {Automatic, 1}]

Using ChartLayout → "Stacked" to determine which letter intervals are contributing to each peak

It looks like the o–l (green) and f–r (red) transitions are centered around 150 ms, with the others centered around 75 ms, just like they were for me.

What about the relative fluctuations? Now that we have data for the Wolfram population, we can see if that trend still holds:

meanrelativesdwolfram = Mean /@ Transpose[{Mean@#, StandardDeviation@#/Mean@#} & /@ Transpose[Differences /@ #[[All, All, 2]]] & /@ wolframdata];

wfit = FindFit[meanrelativesdwolfram, a*x + b, {a, b}, x]

{a → -3.44815, b → 0.725336}

wslope = a /. wfit; wintercept = b /. wfit; wy[x_] := wslope*x + wintercept

Labeled[
 ListPlot[List /@ meanrelativesdwolfram,
  Frame → True, FrameStyle → Gray, FrameTicksStyle → Black,
  AxesOrigin → {0, 0}, PlotRange → {{0.09, 0.18}, {0, 0.45}},
  PlotMarkers → (MapThread[{Graphics[{#2, Circle[{0, 0}], Text[#1]}], 0.12} &, {pchars, colors}]),
  FrameTicks → {{Range[0.1, 0.4, 0.1], None}, {{#, Round[1000*#]} & /@ Range[0.05, 0.2, 0.05], None}},
  Epilog → {Red, Opacity[0.8], Thickness[0.004], Line[{{0.05, wy[0.05]}, {0.18, wy[0.18]}}]}],
 {Style["relative standard deviation", FontFamily → "Helvetica"],
  Row[{Style["mean", FontFamily → "Helvetica"], Style["(ms)", Gray, FontFamily → "Helvetica"]}, " "]},
 {Left, Bottom}, RotateLabel → True]

Linear graph for data from Wolfram trials

Yes, indeed! There does seem to be a trend of lower relative fluctuations for longer intervals. So it seems like in general the slower you type, the more consistent you’ll be.

Now let’s test how well these individuals can be identified by their typing styles.

The simplest way to test this is to take the trials from two people and see if a clustering algorithm can separate the trials cleanly into two distinct groups.

For example, here are the trials from two of the individuals (person 1 and person 6):

indexed = MapIndexed[With[{i = #2[[1]]}, MapIndexed[With[{j = #2[[1]]}, Differences@# → {i, j}] &, #[[All, All, 2]]]] &, wolframdata];
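Here indexed[[i]] holds person i’s trials, each stored as a rule of the form intervalVector → {i, j}, where j is the trial number. FindClusters accepts rules like these: it clusters by the left-hand-side vectors but returns the right-hand-side labels, which is what lets us see below which person each clustered trial came from.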

Grid[{#[[1, 2, 1]], Framed@Grid[Partition[Map[keychart[#, ImageSize → 30] &, #[[All, 1]]], 12, 12, 1, {}], Spacings → {-0.05, 0.1}]} & /@ indexed[[{1, 6}]],  Alignment → Left]

Trials from two individuals separated into two groups

Each trial (represented here by a bar chart) consists of a vector of six numbers, one number for each letter interval: (w–o, o–l, l–f, f–r, r–a, a–m). So each trial can be thought of as a point in six dimensions, and two trials are considered similar when the Euclidean distance between their normalized vectors is small (normalizing each vector to unit length first factors out overall typing speed, so we compare the shape of the profile rather than its absolute scale):

FindClusters[Join[indexed[[1]], indexed[[6]]], 2, DistanceFunction → (EuclideanDistance[Normalize@#1, Normalize@#2] &)][[All, All, 1]]

{{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, {6, 6, 6, 6, 6, 6, 6, 6, 6}}

FindClusters is able to partition these trials perfectly, so that all trials from person 1 are in the first cluster and all trials from person 6 are in the second cluster.

But if we try the same thing with person 1 and person 7, the majority of trials for both 1 and 7 end up in the second cluster, so they actually cluster quite poorly:

Grid[{#[[1, 2, 1]], Framed@Grid[Partition[Map[keychart[#, ImageSize → 30] &, #[[All, 1]]], 12, 12, 1, {}], Spacings → {-0.05, 0.1}]} & /@ indexed[[{1, 7}]],  Alignment → Left]

Trials from person 1 and 7 separated into two groups

FindClusters[Join[indexed[[1]], indexed[[7]]], 2, DistanceFunction → (EuclideanDistance[Normalize@#1, Normalize@#2] &)][[All, All, 1]]

{{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 7}, {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7}}

And here’s an intermediate case, where the clustering algorithm is able to mostly distinguish between person 1 and person 9, but not perfectly:

Grid[{#[[1, 2, 1]], Framed@Grid[Partition[Map[keychart[#, ImageSize → 30] &, #[[All, 1]]], 12, 12, 1, {}], Spacings → {-0.05, 0.1}]} & /@ indexed[[{1, 9}]],  Alignment → Left]

Trials from person 1 and 9 separated into two groups

FindClusters[Join[indexed[[1]], indexed[[9]]], 2, DistanceFunction → (EuclideanDistance[Normalize@#1, Normalize@#2] &)][[All, All, 1]]

{{1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9}, {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 9, 9}}

To quantify the clustering quality, we’ll use the Rand index (see the attached notebook for the implementation). With this measure, a perfect partitioning gets a score of 1:

randclusterquality[{{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}, {6, 6, 6, 6, 6, 6, 6, 6, 6}}]

1.

The 1 versus 9 clustering gets a score of about 79%:

randclusterquality[{{1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9}, {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 9, 9}}]

0.789047

And the 1 versus 7 cluster gets a score of 0, because the clustering doesn’t help distinguish between the two people at all:

randclusterquality[{{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 7}, {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7}}]

0.
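The randclusterquality function itself lives in the attached notebook. If you want something self-contained to experiment with, here is a rough sketch in the same spirit, using the adjusted Rand index (the Hubert–Arabie variant), which also gives 1 for a perfect partition and roughly 0 for an uninformative one. It is not the notebook’s implementation, so don’t expect it to reproduce the exact scores above:

(* A self-contained sketch of an adjusted-Rand-style cluster score. This is NOT *)
(* the randclusterquality from the attached notebook; its exact values may differ. *)
(* Input: a list of clusters, each cluster being the list of true person labels *)
(* of the trials it contains, as in the examples above. *)
pairs[n_] := n (n - 1)/2;

adjustedRandIndex[clusters_] :=
 Module[{labels, contingency, rowtotals, coltotals, n, sumpairs, rowpairs, colpairs, expected, maxindex},
  labels = Union[Flatten[clusters]];
  (* contingency[[i, j]]: how many trials of person labels[[j]] landed in cluster i *)
  contingency = Outer[Count, clusters, labels, 1];
  rowtotals = Total /@ contingency;
  coltotals = Total[contingency];
  n = Total[rowtotals];
  sumpairs = Total[pairs /@ Flatten[contingency]];
  rowpairs = Total[pairs /@ rowtotals];
  colpairs = Total[pairs /@ coltotals];
  expected = rowpairs*colpairs/pairs[n];
  maxindex = (rowpairs + colpairs)/2;
  N[(sumpairs - expected)/(maxindex - expected)]]

Applied to the perfect 1-versus-6 partition above it returns 1; for partitions that mix two people as thoroughly as the 1-versus-7 case, it comes out at zero or even slightly below.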

Now, since there are only 42 people, let’s just use brute force and measure the quality for all 42 × 42 = 1,764 ordered pairs of people:

clusterquality[data1_, data2_] := randclusterquality@ FindClusters[Join[data1, data2], 2, DistanceFunction → (EuclideanDistance[Normalize@#1, Normalize@#2] &)][[All, All, 1]]

clusterquality[data1_, data1_] := 0.
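(The repeated pattern name data1_ in the second definition makes that rule match only when both arguments are identical, i.e. the diagonal pairings of a person with themselves; those are simply defined to have a score of 0.)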

pairclusters = DeleteCases[Outer[{#1[[1, 2, 1]], #2[[1, 2, 1]]} → clusterquality[#1, #2] &, indexed, indexed, 1], HoldPattern[{i_, i_} → _], 2];

pairclusters = Chop[pairclusters];

The average cluster quality is about 67%:

Mean@Flatten[pairclusters][[All, -1]]

0.67398

Here’s the distribution of Rand quality scores:

Histogram[Flatten[pairclusters][[All, -1]], 30, Frame → True, FrameStyle → Gray, FrameTicksStyle → Black, FrameTicks → {{Automatic, None}, {Automatic, None}}, PerformanceGoal → "Speed"]

Distribution of Rand quality scores

And here is the matrix of all pairwise cluster scores, where matrix element {i, j} gives the Rand score for the clustering of trials for person i and person j. Darker is a higher score, and white is a score of 0 (all elements along the diagonal are by definition 0 because it’s impossible to distinguish between two identical sets of trials):

indices2clusterquality = Dispatch@Join[Flatten[pairclusters], {_ → 0.}];

pairwiseclusterscores = Replace[Table[{i, j}, {i, 1, 42}, {j, 1, 42}], indices2clusterquality, {2}];

MatrixPlot[pairwiseclusterscores]

Matrix of all pairwise cluster scores

(Note that the matrix isn’t perfectly symmetric—that’s because the ordering of data points given to FindClusters can change the results.)

You can see there are rows with lots of white cells. Those are individuals whose trials tend to cluster poorly against all the other people’s. If we take the average score for each person, we can see the poorly clustering ones as low points:

meanscoreperperson = Mean /@ pairwiseclusterscores

{0.778765, 0.338672, 0.561213, 0.872599, 0.709796, 0.739428, 0.503638, 0.920413, 0.582432, 0.635402, 0.728354, 0.662812, 0.657463, 0.785851, 0.662576, 0.708751, 0.786356, 0.603808, 0.736254, 0.614572, 0.845568, 0.720183, 0.583103, 0.457772, 0.657543, 0.873992, 0.748698, 0.433935, 0.558306, 0.593791, 0.625238, 0.569416, 0.818976, 0.598492, 0.851795, 0.339936, 0.532003, 0.547207, 0.715792, 0.758778, 0.551265, 0.662232}

ListPlot[meanscoreperperson, Filling → Axis, PlotRange → {All, {0, 1}}, Frame → True, FrameTicks → {{Automatic, None}, {Automatic, None}}, FrameStyle → Gray, FrameTicksStyle → Black]

Average score for each person

I suspect the poorly clustering people tend to be less consistent in their trials, which makes it harder for the clustering algorithm to find a clean partition. Let’s test that by quantifying a person’s typing consistency using a measure of the scatter of their trials in 6D space:

ballradius[points_] :=  With[{centroid = Mean@points}, Max[Norm[# - centroid] & /@ points]]
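In words, ballradius is the radius of the smallest ball, centered at the centroid of a person’s trial vectors, that contains all of those vectors; the larger the radius, the more scattered the trials.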

pairscatter = Outer[{#1[[1, 2, 1]], #2[[1, 2, 1]]} → {ballradius[#1[[All, 1]]], ballradius[#2[[All, 1]]]} &, indexed, indexed, 1];

pair2scatter = Dispatch@Flatten[pairscatter];

Now that we have a measure of scatter, we plot cluster score versus the average scatter for each pair:

qualscatter = DeleteCases[Flatten[Map[# → {Mean[# /. pair2scatter], # /. indices2clusterquality} &, Table[{i, j}, {i, 1, 42}, {j, 1, 42}], {2}], 1], HoldPattern[{i_, i_} → _]];

Labeled[
 ListPlot[qualscatter[[All, 2]],
  Epilog → {Red, Line[{{0, 0}, {1, 1}}]},
  Frame → True, FrameTicks → {{Automatic, None}, {Automatic, None}},
  PlotRange → {{0, 1}, {0, 1}},
  PlotRangePadding → {{Scaled[0.05], Scaled[0.05]}, {Scaled[0.05], Scaled[0.05]}},
  FrameStyle → Gray, FrameTicksStyle → Black],
 {Style["cluster quality", FontFamily → "Helvetica"], Style["scatter", FontFamily → "Helvetica"]},
 {Left, Bottom}, RotateLabel → True]

Cluster score versus the average scatter for each pair

There’s a clear trend here: pairs of people with lower scatter tend to have a higher cluster score (points lying above the red y = x line). So for this method to identify you based on your typing style, you’d need to type with a certain amount of consistency.

Using this fun little typing interface, I feel like I actually learned something about the way my colleagues and I type. Typing two letters with the same finger on the same hand takes about twice as long as typing them with different fingers. The faster a transition is typed, the more its timing fluctuates relative to its mean. And the more your timing fluctuates, the harder it is to distinguish you from another person based on your typing style. Of course we’ve really just scratched the surface of what’s possible and what would actually be necessary in order to build a keystroke-based authentication system. But we’ve uncovered some trends in typing behavior that would help in building such a system.

And hey, we just used the interactive features of Mathematica for real-time data acquisition! Being able to combine that with sophisticated functions like FindClusters for the subsequent data analysis allowed us to quickly find patterns and extract some meaningful conclusions about typing behavior from this dataset. Now I’m curious what other kinds of biometrics we can measure using Mathematica….

Download this ZIP file that includes the post as a CDF file, associated data files, and the code used above.

Comments

  1. That was really cool! Thanks for the post!

  2. Interesting study. However I think to conclude that “the time to type two letters with the same finger on the same hand takes twice as long as with different fingers” you’d need to assume that the sample typists all type using both hands and type particular keys with particular fingers?

  3. It is still better to take the human out of the equation. The right circumstances to do that are arriving.

  4. While I most frequently touch-type, I also tend to log into my computer at the beginning of lunch with a sandwich in one hand – so even under normal circumstances I have two different typing styles. But, more importantly what about emergencies? I had an accident and for a few days after I had to tell other people my password. And for several days beyond that I was dopy when I was trying to log-in and had a hard-enough time getting my name and password entered correctly, so I’m sure my typing was drastically different.

  5. You need to take into account the fact that people may be typing slightly slower or faster at given times.

    For example, I may type “wolfram” at my fastest speed during one session, and type it slower during another session.

    What doesn’t (perhaps?) change however, is the distance (measured in time) between my keystrokes relative to the speed at which I’m typing.

    I may type “wolfram” at a high speed, and “wolfram” at a low speed, the time measured between keystrokes is relative to speed, so all I would have to do is scale the average times relatively from slow to fast and the data should then begin to look the same again, regardless of how fast I’m typing.

  6. A colleague of mine also noted that perhaps the user may change keyboard, then they will be typing differently for a while until they get used to it.

    Physical factors will affect the data such as a hand injury etc.

  7. Very interesting. This immediately made me wonder what could be accomplished by doing the same thing with mouse movements. When someone picks a target for their mouse on the screen, the speed at and path through which they reach that target will vary based on how quickly they respond to visual feedback and how well trained they are with the mouse. You might be able to model that as a control system and characterize individuals according to their mouse movements.
