Markov Chain Text Generator

How this text generator works

This text generator works by building a Markov chain from a given corpus, using the following algorithm:
  1. Create an empty hash
  2. For each word in the provided corpus:
    1. Make that word a key in the hash
    2. Collect every word that follows it anywhere in the corpus into an array, and map that array to that key

Consider the sentence 'John had many cats and dogs and fishes'. After this program runs, it returns a hash that looks as follows:
{John => [had], had => [many], many => [cats], cats => [and], and => [dogs, fishes], dogs => [and], fishes => []}
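
Here is a minimal sketch of the chain-building step in Ruby. The method name build_chain and the use of plain strings as keys are assumptions for illustration, not necessarily how this project organizes its code:

def build_chain(corpus)
  words = corpus.split
  # every new key defaults to an empty array of succeeding words
  chain = Hash.new { |hash, key| hash[key] = [] }

  words.each_with_index do |word, i|
    chain[word]                              # ensure even the final word becomes a key
    next_word = words[i + 1]
    chain[word] << next_word unless next_word.nil?
  end

  chain
end

build_chain('John had many cats and dogs and fishes')
# => {"John"=>["had"], "had"=>["many"], "many"=>["cats"], "cats"=>["and"],
#     "and"=>["dogs", "fishes"], "dogs"=>["and"], "fishes"=>[]}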

Using this hash, we can generate random text with the following algorithm:
  1. Pick a random key from the hash
  2. Append this key to our sentence
  3. Choose a random word from the array mapped to this key
  4. Make this word our new key and return to step 2, stopping once the text reaches the desired length or the key has no succeeding words
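
A minimal sketch of this generation loop, reusing the hash built above (the method name generate_text and the stopping conditions, a maximum word count or a key with no successors, are assumptions for illustration):

def generate_text(chain, max_words)
  key = chain.keys.sample                    # 1. pick a random key from the hash
  sentence = []

  max_words.times do
    sentence << key                          # 2. append the key to our sentence
    successors = chain[key]
    break if successors.nil? || successors.empty?
    key = successors.sample                  # 3. choose a random succeeding word...
  end                                        # 4. ...and loop with it as the new key

  sentence.join(' ')
end

generate_text(build_chain('John had many cats and dogs and fishes'), 8)
# => e.g. "many cats and dogs and fishes"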

Note that we are not limited to using single words as keys. This generator can also use bigrams (two words) and trigrams (three words) as keys.
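
For example, with bigram keys, the sentence 'John had many cats and dogs and fishes' would produce a hash along these lines:
{"John had" => ["many"], "had many" => ["cats"], "many cats" => ["and"], "cats and" => ["dogs"], "and dogs" => ["and"], "dogs and" => ["fishes"], "and fishes" => []}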

About Markov Chains

Markov chains are mathematical systems consisting of a set of states, a set of transitions between those states, and a probability for each transition occurring.

As an example, consider this simple Markov chain modeling the weather:

[Figure: a three-state Markov chain over the weather states rainy, cloudy, and sunny]

This Markov chain has three states: rainy, cloudy, and sunny. From each state, there is some probability of transitioning to each of the other states or of remaining in the current state. When the weather is rainy, it tends to remain rainy. When it's sunny, it tends to remain sunny. And when it's cloudy, it doesn't stay that way for very long.
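
In code, a chain like this boils down to a table of transition probabilities. The numbers below are illustrative placeholders standing in for the figure's actual values, chosen only to match the behavior described above:

weather_chain = {
  'rainy'  => { 'rainy' => 0.6, 'cloudy' => 0.3, 'sunny' => 0.1 },   # rain tends to persist
  'cloudy' => { 'rainy' => 0.3, 'cloudy' => 0.2, 'sunny' => 0.5 },   # cloudy days move on quickly
  'sunny'  => { 'rainy' => 0.1, 'cloudy' => 0.2, 'sunny' => 0.7 }    # sun tends to persist
}
# each row sums to 1.0: from any state, the chain always transitions somewhere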

Markov chains let us model events more realistically than independent probabilities do. If we modeled the weather using only the overall probability that a given day is rainy, cloudy, or sunny, rainy days would be scattered randomly throughout our output. That doesn't match real-life weather patterns: when it rains one day, it's likely to rain the next day as well. Because each state's transition probabilities depend on the current state, a Markov chain captures exactly this kind of tendency.