Letter Adjacency Word Generation


Introduction

This is a cross-platform, language-independent technique for creating words that sound as though they belong to a natural language. It is based on research carried out as part of an investigation for the Game Programming Gems book, to which the author contributed two articles.

History & Technical Backgrounder

The technique first came to the author's attention in the game Elite, back in the early 1980s. Analysis of the game made it clear that, with only 16KB of memory, it could not have stored enough six-letter words to name the stars in a virtual universe of several galaxies, each containing about 255 stars.

Clearly, some form of name-generation algorithm was in use. Moreover, every time a galaxy was regenerated, the names were identical to those produced the previous time. The algorithm therefore had to be deterministic: something more sophisticated than a purely random choice of letters.

The author mistakenly assumed that a pseudorandom number generator was being used to produce a repeatable stream of apparently patternless numbers. However, it has since come to his attention that the input to the generation algorithm was in fact derived from a Fibonacci sequence.

The algorithm itself selects letters according to the probabilities of the various letter pairs, which are determined by analysing a set of words. In other words, it relies on letter adjacency.

Analysis

Before the generation algorithm can be discussed, it is necessary to look at how the analysis takes place. Essentially, all that is needed is a two-dimensional 26x26 array, with one row and one column per letter of the alphabet, populated according to the following pseudocode:


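for each letter (this_letter) in the word, except the last one
    next_letter = the letter immediately after this_letter
    LetterTable[this_letter][next_letter] = LetterTable[this_letter][next_letter] + 1
next letter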

This assumes that every entry in the array starts at zero. The loop then simply increments the entry for each pair of letters that sit next to each other in the word.
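A minimal Python sketch of the analysis step follows. The names `empty_table`, `letter_index` and `analyse_word` are hypothetical, and the sketch assumes words made only of the 26 letters a-z: input is lower-cased and any other character is dropped (so, for example, "don't" is analysed as "dont").

```python
from typing import List

ALPHABET_SIZE = 26


def empty_table() -> List[List[int]]:
    """Create a 26x26 table of zeros, one row and one column per letter a-z."""
    return [[0] * ALPHABET_SIZE for _ in range(ALPHABET_SIZE)]


def letter_index(ch: str) -> int:
    """Map 'a'..'z' to 0..25."""
    return ord(ch) - ord("a")


def analyse_word(table: List[List[int]], word: str) -> None:
    """Increment the count for every adjacent letter pair in `word`.

    Assumption: the word is lower-cased and characters outside a-z are dropped.
    """
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    # Pair every letter except the last with the letter that follows it.
    for this_letter, next_letter in zip(letters, letters[1:]):
        table[letter_index(this_letter)][letter_index(next_letter)] += 1
```

Analysing "banana" records the pairs b-a, a-n, n-a, a-n and n-a, so the entries for a-n and n-a each end up at 2 and b-a at 1.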

The result is a table of counts, each recording how many times its two letters have appeared side by side, in that order, in the words that have been analysed.
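The article describes the table as raw counts. One natural way to turn a row into probabilities (an interpretation offered here, not a formula given by the author) is to divide each count by the row total, where T_ij denotes the count stored in LetterTable[i][j]:

```latex
P(\text{next} = j \mid \text{current} = i) = \frac{T_{ij}}{\sum_{k=1}^{26} T_{ik}},
\qquad \text{defined whenever } \sum_{k=1}^{26} T_{ik} > 0
```

For instance, after analysing only the word "banana", the row for a holds a single non-zero count (a followed by n, twice), so P(n | a) = 2/2 = 1.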

Once the analysis routine has been written, all that remains is to feed it a stream of words (as many as possible) and store the resulting table.
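Feeding in a stream of words and storing the result might look like the following sketch. It accepts any iterable of words. JSON is used as the storage format purely for illustration, because the article does not specify one. `build_table`, `save_table` and `load_table` are hypothetical names.

```python
import json
from typing import Iterable, List

ALPHABET_SIZE = 26


def build_table(words: Iterable[str]) -> List[List[int]]:
    """Accumulate letter-pair counts over every word in the stream."""
    table = [[0] * ALPHABET_SIZE for _ in range(ALPHABET_SIZE)]
    for word in words:
        # Assumption: lower-case the word and drop anything outside a-z.
        letters = [c for c in word.lower() if "a" <= c <= "z"]
        for a, b in zip(letters, letters[1:]):
            table[ord(a) - ord("a")][ord(b) - ord("a")] += 1
    return table


def save_table(table: List[List[int]], path: str) -> None:
    """Store the table as JSON (a format chosen only for this sketch)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f)


def load_table(path: str) -> List[List[int]]:
    """Read back a table written by save_table."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Because `build_table` takes any iterable, a large word list can be streamed in line by line (after stripping newlines) without holding it all in memory.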

Generation

The analysis phase produces a table that a program can read in and use to create sequences of letters that have previously appeared side by side. Chained together, these sequences have a good chance of forming words that resemble others in the vocabulary of the language in question.
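The generation step can be sketched as a weighted random walk over the table: from the current letter, the next letter is chosen with probability proportional to the counts in that letter's row. This is a hedged illustration, not the author's implementation. Choosing the first letter uniformly among letters that have any recorded successor, using a fixed word length, and giving up at a dead end are all assumptions, and `build_table` and `generate_word` are hypothetical names.

```python
import random
from typing import List, Optional

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def build_table(words: List[str]) -> List[List[int]]:
    """Count adjacent letter pairs (lower-cased, non a-z characters dropped)."""
    table = [[0] * 26 for _ in range(26)]
    for word in words:
        letters = [c for c in word.lower() if c in ALPHABET]
        for a, b in zip(letters, letters[1:]):
            table[ALPHABET.index(a)][ALPHABET.index(b)] += 1
    return table


def generate_word(table: List[List[int]], length: int,
                  rng: random.Random) -> Optional[str]:
    """Chain letters by picking each successor weighted by its pair count.

    Returns None if the walk reaches a letter with no recorded successor.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    # Assumption: start uniformly from any letter that has a successor.
    starts = [i for i in range(26) if sum(table[i]) > 0]
    if not starts:
        return None
    current = rng.choice(starts)
    word = [ALPHABET[current]]
    while len(word) < length:
        row = table[current]
        if sum(row) == 0:
            return None  # dead end: this letter was never followed by another
        current = rng.choices(range(26), weights=row, k=1)[0]
        word.append(ALPHABET[current])
    return "".join(word)
```

Passing in a seeded `random.Random` means the same seed always yields the same word, which mirrors the repeatability observed in Elite, although Elite itself reportedly drew on a Fibonacci sequence rather than a generator like Python's.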

The copyright of the article Letter Adjacency Word Generation in Video Games is owned by Guy Lecky-Thompson. Permission to republish Letter Adjacency Word Generation in print or online must be granted by the author in writing.

