N-gram
An n-gram is a sequence of consecutive keystrokes in a particular order.
In other areas of computational linguistics and natural language processing, n-grams are more commonly defined as ordered sequences of words or characters. In the context of keyboard layout analysis, it is common to define n-grams over individual characters—or more precisely, graphemes.
Defining n-grams over characters or graphemes works well for common English keyboard layouts, but breaks down when analyzing magic keys. For example, on Magic Sturdy, pressing b followed by the magic key next to it produces the characters “before”.
On such layouts, a single press of a magic key may produce more than one character (in this case, five: efore). But when analyzing the motion of hands and fingers, there are only two distinct motions, corresponding to the two keystrokes, so we treat b★ as a bigram of two keystrokes rather than a six-gram of six characters. This allows us to calculate metrics such as same-finger bigrams using the same keystroke-based definition for both magic and non-magic layouts.
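The difference between counting keystrokes and counting characters can be illustrated with a minimal Python sketch. The names `MAGIC`, `MAGIC_RULES`, and `expand` are hypothetical, and the rule table contains only the single b★ example described above; it is not the actual Magic Sturdy rule set.

```python
# Hypothetical, simplified model of a magic key (illustration only).
MAGIC = "★"
# Assumed rule table: after "b", the magic key emits "efore".
MAGIC_RULES = {"b": "efore"}


def expand(keystrokes):
    """Return the text produced by a sequence of keystrokes."""
    output = []
    previous = None
    for key in keystrokes:
        if key == MAGIC:
            # The magic key's output depends on the preceding keystroke.
            output.append(MAGIC_RULES.get(previous, ""))
        else:
            output.append(key)
        previous = key
    return "".join(output)


keystrokes = ["b", MAGIC]
text = expand(keystrokes)
print(text)             # before
print(len(keystrokes))  # 2 keystrokes: analyzed as a bigram
print(len(text))        # 6 characters produced
```

Under this model, n-gram metrics would be computed on the `keystrokes` list rather than on the expanded `text`.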
Orders
Layouts Wiki uses the first three orders of n-grams with Latin prefixes:
n | Name | Also Known As | Examples |
---|---|---|---|
1 | Unigram | Monogram, Unigraph | e , t , a |
2 | Bigram | Digram, Digraph | th , he , in |
3 | Trigram | Trigraph | the , ing , and |
The alternate names unigraph, digraph, and trigraph are used in the documentation of some older layouts from the 2000s, and can ultimately be traced back to August Dvorak’s influential work, such as Typewriting Behavior (1936). Layouts Wiki uses unigram, bigram, and trigram because it is the standard terminology used in computational linguistics and natural language processing in the modern era, and because the “gram” terminology does not.
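As a rough illustration of the three orders, the following Python sketch extracts unigrams, bigrams, and trigrams from a keystroke sequence. The helper name `ngrams` and the sample word are assumptions made for this example, and each character is treated as one keystroke.

```python
from collections import Counter


def ngrams(keystrokes, n):
    """Return every run of n consecutive keystrokes, in order."""
    return [tuple(keystrokes[i:i + n]) for i in range(len(keystrokes) - n + 1)]


keys = list("there")  # one keystroke per character in this simple case

for n, name in [(1, "unigrams"), (2, "bigrams"), (3, "trigrams")]:
    counts = Counter(ngrams(keys, n))
    print(name, {"".join(gram): count for gram, count in counts.items()})
# unigrams {'t': 1, 'h': 1, 'e': 2, 'r': 1}
# bigrams {'th': 1, 'he': 1, 'er': 1, 're': 1}
# trigrams {'the': 1, 'her': 1, 'ere': 1}
```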
Skipgrams
A skipgram is a sequence of two keystrokes in a particular order separated by one keystroke. For example, the word “mouse” contains the skipgrams m_u, o_s, and u_e, with _ representing the (omitted) middle keystroke that separates the two defined keystrokes in the skipgram. In some earlier literature, it is called a disjoint bigram, for example in the stat “disjoint same-finger bigram.”
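The definition above can be sketched in a few lines of Python. The function name `skipgrams` is hypothetical, and the sketch assumes one keystroke per character.

```python
def skipgrams(keystrokes):
    """Return pairs of keystrokes separated by exactly one keystroke."""
    return [(keystrokes[i], keystrokes[i + 2]) for i in range(len(keystrokes) - 2)]


pairs = skipgrams(list("mouse"))
print(["_".join(pair) for pair in pairs])  # ['m_u', 'o_s', 'u_e']
```

A sequence of fewer than three keystrokes yields no skipgrams, since there is no middle keystroke to skip.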