Wolfram Computation Meets Knowledge

The Longest Word Ladder Puzzle Ever

UPDATE: The solution to the puzzle and more comments from Jon have been added at the bottom of the post.

On the long flight to the recent Wolfram Technology Conference, I ended up on the puzzle page of a newspaper. My attention was drawn to a word ladder puzzle, where you must fill in a sequence of words from clues, but each word differs from the previous by only a single letter. Here, for example, is a simple puzzle already solved:

best from a position of superiority or authority
bast strong woody fibers obtained especially from the phloem of
from various plants
bash a vigorous blow
bath a vessel containing liquid in which something is immersed
(as to process it or to maintain it at a constant temperature or to lubricate it)
math a science (or group of related sciences) dealing with the logic
of quantity and shape and arrangement

I wasn’t going to do a blog entry on this, as it is a very similar task to my “Exploring Synonym Chains” post that I wrote some time ago, but that changed with a chance conversation at the (excellent) Technology Conference. Proving that one never stops learning, Charles Pooh, one of our graph theory developers, pointed out to me that my synonyms item could have been done much better. I had broken one of the very rules that I wrote about in my “10 Tips for Fast Mathematica Code” entry—”Use built-in functions.” I had effectively re-implemented the built-in Mathematica commands GraphPeriphery and GraphDiameter.

So, armed with these two new functions, let’s find the longest word ladder puzzle that can be made using Mathematica‘s English dictionary.

Some word ladder puzzles allow you to add or remove letters, but I am going to look only at the version where all words are the same length. Knowing that there can be no connections between different lengths means that we can be more efficient by considering each word length separately. So I start by generating dictionaries of words of specific lengths. Because I intend to use the words’ definitions as my clues, I will only look at words where I know at least one definition for the word, and I will also exclude words that contain non-letters or capitals (e.g. hyphenated words and proper names). The function caches its result, as we only want to do this once:

nWords[n_] := nWords[n] = DeleteCases[Cases[WordData[], word_String /; (StringLength[word] === n && StringMatchQ[word, RegularExpression["[a-z]+"]] && Length[WordData[word, "Definitions"]] > 0)], "-Redacted word-"];

Next we construct the graph of words that have an edit distance of one, that is, only a single letter difference. The conceptually simple way is…

wordLadderGraph[n_] := wordLadderGraph[n] = AdjacencyGraph[nWords[n], Outer[EditDistance, nWords[n], nWords[n]] /. p_Integer /; p =!= 1 → 0]

…but we are unnecessarily checking the edit distance in both directions, when we know it is symmetric. And we are also including “aloof” words (completely unconnected—so named by Donald Knuth because “aloof” is one of them). So this version, while more complicated, is faster and yields simpler graphs:

wordLadderGraph[n_] := wordLadderGraph[n] = Graph@Flatten@ Last@Reap@ Do[If[EditDistance[nWords[n][[a]], nWords[n][[b]]] == 1, Sow[nWords[n][[a]] ↔ nWords[n][[b]]]], {a, 2, Length[nWords[n]]}, {b, 1, a - 1}];

Knuth studied word ladders with five letters and observed that most five-letter words can be connected. We can easily replicate this observation:

wordLadderGraph[5]

Graph of words with five letters

We can count the five-letter aloof words:

Length[Complement[nWords[5], VertexList[wordLadderGraph[5]]]]

817

Apparently, Knuth couldn’t tackle the six-letter words at the time, as it was too difficult. But we will analyze all the way up to 23-letter words. It turns out that the disconnectedness increases with word length. Here, for example, is the graph for six letters:

wordLadderGraph[6]

Graph for six-letter words

The number of aloof words also goes up to 2,756 in the graph for six letters.

You can make incredibly long ladders, but ones that behave like bat → cat → fat → hat are rather annoying, because you can get directly from “bat” to “hat” without going through the other steps. So I am going to assert that the only good word ladders follow the shortest path of valid words. This view is supported, I think, by Charles Dodgson (Lewis Carroll), who claimed to have invented word ladders. He did work on the shortest paths in word ladders, including the result that the shortest evolution from “ape” to “man” was six steps. Either because I have a more modern set of words, or because Dodgson needed Mathematica, I make it five steps:

FindShortestPath[wordLadderGraph[3], ape, man]

{ape, apt, opt, oat, mat, man}

Using the looser definition of “no two consecutive steps may change the same letter,” Games Magazine held a competition in 1993 to find the longest word ladders, and its readers managed 26 steps. But we can do better than that.

We need to find which connected subgraph of which word length has the longest shortest distance between any two of its words. That’s a tough sentence to parse, but the concept is summed up in one of the two commands that Charles Pooh told me about: GraphDiameter.

diameters = ParallelTable[GraphDiameter[Subgraph[wordLadderGraph[i], #]] & /@ ConnectedComponents[wordLadderGraph[i]], {i, 23}];

There is quite a lot of searching to do, but parallelizing it makes it take under 15 minutes on my laptop. We discover that the longest shortest word ladder is 49 words long:

Max[diameters]

49

Because I kept the structure of the data as I generated it, the position of that value will tell me the word length and subgraph number that contains our prize.

Position[diameters, 49]

{{6, 23}}

This turns out to be the main cluster of connected six-letter words.

bestGraph = Subgraph[wordLadderGraph[6], ConnectedComponents[wordLadderGraph[6]][[23]]]

Graph of the main cluster of connected six-letter words

Now we use the other command that Charles introduced me to. GraphPeriphery finds those vertices that are at the maximal distance from each other.

GraphPeriphery[bestGraph]

{charge, comedo}

Luckily, we get two results. This means that they must be at either end of the word ladder. If we had more, we would have to figure out which element paired with which to produce the maximal path lengths. The next step is to generate the path:

puzzle = FindShortestPath[bestGraph, "charge", "comedo"]

I am not going to show you the result as that would ruin the puzzle, but here it is highlighted on the graph:

HighlightGraph[bestGraph, PathGraph[puzzle], GraphHighlightStyle → "Thick"]

Graph highlighting the shortest path to solve the puzzle

We now have to make the clues. I will use the built-in WordData definitions, picking one of them at random. Since lots of words have multiple meanings, you get a slightly different puzzle each time.

Before the result, an admission of cheating and a warning: unfortunately, the solution that this code finds turns out to contain a rather adult-themed word and clue. In an effort to keep the Wolfram Blog family-friendly, I manually removed the word from the dictionary (the redacted word in the first line of code). Run the code for yourself if you want the adult version. Of course, the whole result is dependent on the dictionary you use anyway, though larger dictionaries do not necessarily lead to longer word ladders, since they also increase connectedness.

Here is the family-friendly version; have fun:

Style[Grid[Transpose[{Join[{"charge"}, Table["", {48}], {"comedo"}], Map[RandomChoice[Last /@ WordData[#, "Definitions"]] &, puzzle]}], Frame → All, Alignment → Left, ItemSize → {{6, 35}, Automatic}, FrameStyle → Gray], FontFamily → "Georgia"]

charge a quantity of explosive to be set off at one time
exchange or replace with another, usually of the same kind or category
occurring or appearing or singled out by chance
of uncertain outcome; especially fraught with risk
a rhythmical work song originally sung by sailors
small crude shelter used as a dwelling
European scaleless blenny
a simple version of hockey played by children on the streets (or on ice or on a field) using a ball or can as the puck
the characteristic sounds made by a horse
habitually complaining
a person given to excessive complaints and crying and whining
any of numerous small silvery North American cyprinid fishes especially of the genus Notropis
shake, as from cold
a razor powered by an electric motor
someone who has or gives or receives a part or a share
an effigy in the shape of a man to frighten birds away from seeds
an electronic pulse counter used to count pulses that occur too rapidly to be recorded individually
an official who affixes a seal to a document
a person skilled in a particular type of therapy
a machine that cuts the heads off grain and moves them into a wagon
someone who reads proof in order to find errors and mark corrections
give an interpretation or rendition of
an owner of property who receives payment for its use by another person
someone who rants and raves; speaks in a violent or loud manner
an enlisted soldier who serves in the ranks of the armed forces
desire strongly or persistently
a programmer for whom computing is its own reward; may enjoy the challenge of breaking into other computers but does no harm
small striped semiterrestrial eastern American squirrel with cheek pouches
long slender feather on the necks of e.g. turkeys and pheasants
challenge aggressively
(paper making) a frame used to form paper pulp into sheets
(statistics) any of nine points that divided a distribution of ranked scores into equal intervals where each interval contains one-tenth of the scores
make dirty or spotty, as by exposure to air; also used metaphorically
give a definition for the meaning of a word
make more complex, intricate, or richer
express discontent
the act of despoiling a country in warfare
a deep narrow steep-sided valley (especially one formed by running water)
in a raving manner
migratory
capturing cattle or horses with a lasso
brick that is laid sideways at the top of a wall
of the relatively near future
orienting or directing homeward or to a destination
hulled corn with the bran and germ removed
a sermon on a moral or religious topic
having a feeling of home; cozy and comfortable
according with custom or propriety
light and humorous drama with a happy ending
comedo a black-tipped plug clogging a pore of the skin

If you want printable versions, here are CDF and PDF versions of the table.

I will post the solution to the puzzle in the comments section of the Wolfram Blog in a couple of weeks.

Download this post as a Computable Document Format (CDF) file.


Update:

The solution to the generated puzzle is:

{“charge”, “change”, “chance”, “chancy”, “chanty”, “shanty”, “shanny”, “shinny”, “whinny”, “whiney”, “whiner”, “shiner”, “shiver”, “shaver”, “sharer”, “scarer”, “scaler”, “sealer”, “healer”, “header”, “reader”, “render”, “renter”, “ranter”, “ranker”, “hanker”, “hacker”, “hackee”, “hackle”, “heckle”, “deckle”, “decile”, “defile”, “define”, “refine”, “repine”, “rapine”, “ravine”, “raving”, “roving”, “roping”, “coping”, “coming”, “homing”, “hominy”, “homily”, “homely”, “comely”, “comedy”, “comedo”}

However, as some of the comments pointed out, the dictionary used for this lacked some obvious modified words such as plurals and verb conjugations. Dropping the requirement that we know definitions for the generated puzzle, and using a much larger dictionary, I have a revised result.

nWords[n_] := nWords[n] = Cases[Union[WordData[], DictionaryLookup[]], word_String /; (StringLength[word] === n && StringMatchQ[word, RegularExpression["[a-z]+"]])]

The larger dictionary provides greater connectivity, so the largest minimal word ladder is a little shorter at 46 words, and interestingly occurs in the seven-letter words.

{“gimlets”, “giblets”, “gibbets”, “gibbers”, “libbers”, “limbers”, “lumbers”, “cumbers”, “cambers”, “campers”, “carpers”, “carters”, “barters”, “batters”, “butters”, “putters”, “puttees”, “putties”, “patties”, “parties”, “parries”, “carries”, “carrier”, “currier”, “curlier”, “burlier”, “bullier”, “bullies”, “bellies”, “jellies”, “jollies”, “collies”, “collins”, “colling”, “coaling”, “coaming”, “foaming”, “flaming”, “flaking”, “fluking”, “fluxing”, “flexing”, “fleeing”, “freeing”, “treeing”, “theeing”}

Comments

Join the discussion

!Please enter your comment (at least 5 characters).

!Please enter your name.

!Please enter a valid email address.

27 comments

  1. I believe that the family friendly puzzle is missing an entry after “a person skilled in a particular type of therapy”.

    Reply
  2. The puzzle site Sporcle.com has a daily “Word Ladder”. When the solution is released, can I get your permission to reproduce this list in puzzle form (with credit to you & a link to this article)? If it becomes a featured quiz, it could be exposed to tens of thousands of players. (See the music-related quiz I created in the link.)

    Reply
  3. That was fun, though the definitions were a bit revealing…

    Reply
  4. Two further notes: 1) I’ve solved it. As RJK infers, the clues made several steps pretty easy, although I went off the rails for a while when I put “TACKLE” for “challenge aggressively”. 2) You should correct the headline. This is not the “world’s longest word ladder”, but rather the world’s longest SHORTEST word ladder (or longest minimal word ladder).

    Reply
  5. I’m a bit puzzled as there seems to be a shorter path between “capturing cattle or horses with a lasso” and “orienting or directing homeward or to a destination”: just change the R to an H, then the P to an M. I’m having a tough time seeing how it’s even possible to use two words in between them without changing the same letter twice in a row. Anyone with some insight?

    Reply
  6. @Alan: Possibly (maybe even probably) the dictionary word list used didn’t include verb forms such as “hoping” that didn’t have a separate definition. Note that the other -ING words are not defined relative to their roots, and remarkably for a ladder of this length there are no words ending in either -ED or -S.

    Reply
  7. @Alan: I had to change the same letter twice.

    Reply
  8. Patrick: did the title change before I came here? This isn’t claimed to be the longest word ladder, but the longest word ladder PUZZLE. That means the idea is to find the shortest ladder between two words, and this pair is purported to have the longest solution. That said…
    Alan: I suppose that word is missing from the list used. But in this version, the intermediaries start with C.

    Reply
  9. I think the previous attempt at the ladder was in a version of mathematica without GraphPeriphery and GraphDiameter. With that in mind, it may be a good idea to include the mathematica version, that a blog has been written about. That will also help posterity keep things in perspective. The following should do:
    Title
    Date
    Author
    Mathematica #

    Good blog. Thanks.

    Reply
  10. @Jon: Thanks for the ok to put this on Sporcle. I’ve revised the clues to make it more of a puzzle challenge but the words remain the same. You can try it out at http://www.sporcle.com/games/rockgolf/the-longest-shortest-word-ladder

    Reply
  11. Patrick Kelly’s comments are correct. The dictionary does not contain many verb cojugations and word variations. In particular “hoping”. Had it been present in the source dictionary, the solution would have short-cut through it and yielded a ladder with one fewer steps.

    @Isaac, I wish I could say that my previous items pre-date the functionality, but unfortunately not. I just didn’t know enough about them then.

    Reply
  12. Great post, as usual from Jon McLoone.

    Now, can we please have a few more dictionaries into M here? I’d like to suggest important languages such as German and French. Also Russian and Hindi would be interesting, as they are related to the original Indo-Germanic languages of past millenia. Also Latin and (ancient! not modern!) Greek as well as Hebrew would be great to make statistical as well as graph-theoretic analyses of diction/vocabulary/etymology/morphology, and their migrations. How much Russian came from German? How much of English came from German and how much came from French, and how much French came from Latin? And what influence did Hebrew have on Greek? And what were the drivers of these influences? War, money, peace, love, food, domination, religion, crime, etc. Please, more languages! From a conceptual viewpoint in M, another dic should just be data!

    Reply
  13. @Jon – Perhaps a better way to obfuscate the ‘adult’ content would be to use the Overlay function, which is new in version 8. I’ve also wondered if Overlay could be used to ‘skin’ a Dynamic interface—maybe that would be a good blog post example?? (If so, feel free to contact me via email and I can point you toward some graphics to use!)

    Reply
  14. @ telefunkenvf14
    One nice way to obscure text is to blur it…
    Blur[Rasterize[“hidden text”], 5]
    And as you suggest, Overlay can be used…
    Overlay[{“XXXXXXXXXXX”, “hidden text”}]

    Reply
  15. @Mooniac – Mathematica 8 has dictionaries for 27 languages with the following number of entries:

    Table[{l, Length[DictionaryLookup[{l, All}]]}, {l,
    DictionaryLookup[All]}]

    {{“Arabic”, 43768}, {“BrazilianPortuguese”, 264713}, {“Breton”,
    32733}, {“BritishEnglish”, 86135}, {“Catalan”, 602208}, {“Croatian”,
    260752}, {“Danish”, 376469}, {“Dutch”, 229368}, {“English”,
    92518}, {“Esperanto”, 14780}, {“Faroese”, 108508}, {“Finnish”,
    728498}, {“French”, 139657}, {“Galician”, 515385}, {“German”,
    76155}, {“Hebrew”, 455264}, {“Hindi”, 15983}, {“Hungarian”,
    230206}, {“IrishGaelic”, 65203}, {“Italian”, 116854}, {“Latin”,
    8881}, {“Polish”, 234907}, {“Portuguese”, 459996}, {“Russian”,
    31801}, {“ScottishGaelic”, 15670}, {“Spanish”, 86016}, {“Swedish”,
    121430}}

    German, French, Russian, and Hindi are available.

    Check also the help for DictionaryLookup (Neat Examples etc.)

    Reply
  16. The solution to the puzzle and some further comments have been added to the bottom of the article.

    Reply
  17. Allowing Letter Additions and Deletions as well the Longest Shortest Ladder I got is 48 words:
    EMBARRED
    EMBARKED
    IMBARKED
    IMPARKED
    IMPARTED
    IMPORTED
    IMPOSTED
    IMPOSED
    IMPOSE
    IMPOST
    IMPORT
    IMPART
    IMPARK
    IMBARK
    EMBARK
    EMBAR
    EMBER
    AMBER
    CAMBER
    CABER
    CANER
    CANE
    VANE
    VINE
    VINA
    VINAL
    FINAL
    FINALE
    FINABLE
    FINEABLE
    FILEABLE
    FILLABLE
    FELLABLE
    SELLABLE
    SEALABLE
    HEALABLE
    HEATABLE
    EATABLE
    RATABLE
    RETABLE
    RESTABLE
    TESTABLE
    TASTABLE
    CASTABLE
    CARTABLE
    CHARTABLE
    CHARITABLE
    CHARITABLY

    Reply
  18. I did a similar project a few years ago, finding a 55-word shortest word ladder from comedo to haslet, using only lowercase, all-alphabetic words from http://www.gutenberg.org/ebooks/3201. If you want to explore word ladders, you can check out http://www.sprighty.com/cat-dog-solver/index.html.

    Reply
  19. Using a slightly different dictionary, I was able to come up with a longer solution (25 words).

    Here is converting ORANGE to PURPLE

    ORANGE → ORANGS → PRANGS → PRANKS → CRANKS → CRANES → CRANED → CRATED → COATED → COALED → COOLED → COOEED → COOEES → COOERS → COWERS → BOWERS → BOWELS → BOTELS → MOTELS → MORELS → SORELS → SORELY → SURELY → PURELY → PURPLY → PURPLE

    Reply
  20. You. Guys. Are. Weirdly. GREAT!

    Reply
  21. According to ceptimus.co.uk/wordladder.php, using words valid in Scrabble (which I assume is CSW), you can get from ‘gimlets’ to ‘theeing’ in just 13 steps: GIMLETS, GILLETS, GILLERS, TILLERS, TELLERS, TELLENS, TELLINS, TELLING, SELLING, SEELING, SEEMING, TEEMING, THEMING, THEEING.

    Reply