Call the events in the MC "events" and write about intersection.
Chris Pressey
1 year, 3 months ago
asked and answers that were obtained in that project (which was eventually christened
"Chainscape").

Events, probability distributions and Markov chains
---------------------------------------------------

We say an _event_ is a finite set of distinct, discrete outcomes. We think
of each outcome as being uniquely labelled, so that we can distinguish
it from the other outcomes. Each outcome also has a fixed, known probability
of occurring. The event is then said to have a discrete
_probability distribution_.

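As a concrete sketch (illustrative only -- this is not Chainscape's actual
representation), we might model an event in Python as a mapping from outcome
labels to probabilities:

```python
import random

# A hypothetical event: three uniquely-labelled outcomes, each with a
# fixed, known probability of occurring. (Labels and numbers made up.)
event = {"gables": 0.5, "garbles": 0.3, "cables": 0.2}

# The probabilities of the outcomes add up to 1.
assert abs(sum(event.values()) - 1.0) < 1e-9

def select_outcome(event):
    """Select one outcome label according to its probability."""
    labels = list(event)
    weights = [event[label] for label in labels]
    return random.choices(labels, weights=weights)[0]
```
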
We regard a Markov chain as a set of labelled events. When we _walk_ a Markov chain,
we start at some event. We select one of the outcomes, based on its probability.
We use the label of the selected outcome to select a new event from the Markov chain,
and from this new event we repeat this process for as long as we like.

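A sketch of such a walk, reusing the toy representation above (a chain as a
dict mapping each event's label to its event; the names are ours, not
Chainscape's):

```python
import random

def select_outcome(event):
    """Select one outcome label according to its probability."""
    labels = list(event)
    return random.choices(labels, weights=[event[l] for l in labels])[0]

def walk(chain, start_label, steps):
    """Start at some event, select an outcome by its probability, and use
    the selected label to pick the next event; repeat `steps` times."""
    label = start_label
    path = [label]
    for _ in range(steps):
        label = select_outcome(chain[label])
        path.append(label)
    return path

# A hypothetical chain: each event's label maps to the event itself.
chain = {
    "anne": {"of": 1.0},
    "of": {"green": 1.0},
    "green": {"gables": 0.6, "garbles": 0.4},
    "gables": {"anne": 1.0},
    "garbles": {"anne": 1.0},
}
print(" ".join(walk(chain, "anne", 7)))
```
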
In the context of text generation, each event is associated
with a word in a corpus (the label is the word). The outcomes of that event
are also associated with words -- the words that can possibly follow
the first word in the corpus. Walking such a Markov chain produces streams of
text that resemble the corpus because each successive word is as likely to
follow the previous word as it is to follow that same previous word in the corpus.

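For instance, a chain might be derived from a corpus along these lines (a
sketch; the exact counting and normalization Chainscape performs are
assumptions on our part):

```python
from collections import Counter, defaultdict

corpus = "anne of green gables is a novel and anne is a classic".split()

# Count how often each word is followed by each other word.
counts = defaultdict(Counter)
for word, follower in zip(corpus, corpus[1:]):
    counts[word][follower] += 1

# Normalize the counts into probabilities, so that the next word is as
# likely in the chain as it is in the corpus.
chain = {}
for word, followers in counts.items():
    total = sum(followers.values())
    chain[word] = {f: n / total for f, n in followers.items()}

# "classic" appears only once, at the very end of the corpus, so it is
# followed by nothing: an event with an empty set of outcomes (see below).
chain.setdefault("classic", {})

print(chain["a"])  # {'novel': 0.5, 'classic': 0.5}
```
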
Now, each of the outcomes in an event has a non-zero probability.
Also, the probabilities of all the outcomes in a probability
distribution must add up to 1; this is Kolmogorov's 2nd axiom.
However, to conform to practice, we must engage in some special pleading here,
because the application does not fit the mathematical model exactly.

In particular, we have events with an empty set of outcomes. Their probabilities
cannot add up to one, because there are no probabilities to add up!

This can happen because the events are, generally speaking, generated
from a corpus by recording the frequency with which some word is followed by
other words. If a word appears only once, at the very end of the corpus, it
is not followed by any words. We will also see in the sequel that the result
of an intersection operation can quite validly be an empty set.

So we need a way to deal with such a "probability distribution" even though it
is not a probability distribution in Kolmogorov's sense. We will deal with it
this way: when walking a Markov chain and coming across an event with zero
possible outcomes, we will treat it as having an equal probability of every
possible outcome, i.e. of transitioning uniformly at random to any other event.

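In the toy representation from the earlier sketches, one step of a walk with
this rule might look like:

```python
import random

def next_label(chain, label):
    """One step of a walk, with the ad hoc rule for empty events."""
    event = chain[label]
    if not event:
        # Zero possible outcomes: transition uniformly at random to
        # any other event in the chain.
        return random.choice(list(chain))
    labels = list(event)
    return random.choices(labels, weights=[event[l] for l in labels])[0]
```
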
This is a tiny bit ad hoc, but allows us to continue to use these structures
even though they do not conform 100% to the expectations of probability theory.

Intersection
------------

In the Anne of Green Garbles work, "intersection" of a Markov chain and a
finite automaton was such that a transition from some word _a_ to some word
_b_ could only occur if that transition had a non-zero probability in the
Markov chain **and** was present in the finite automaton.

We use the "intersection" concept in the same "conjunctive" way here. If
_A_ and _B_ are two Markov chains, and _C_ is a Markov chain formed by
taking the intersection of _A_ and _B_, then the transition from word _a_
to word _b_ has, in _C_, the probability of it happening in _A_ **and**
in _B_ -- that is, treating the two as independent, the product of its
probabilities in _A_ and _B_.

The concept can be extended to entire events.

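A sketch of this, event by event, in our toy representation (again, the
names and the product reading are ours; note that the result can validly be
an empty event, as anticipated above):

```python
def intersect_events(event_a, event_b):
    """Conjunctive intersection of two events: an outcome survives only
    if it occurs in both, and its probability is the product of its
    probabilities in each. The result may be an empty event, and its
    probabilities need not add up to 1."""
    return {outcome: event_a[outcome] * event_b[outcome]
            for outcome in set(event_a) & set(event_b)}

def intersect_chains(a, b):
    """Intersect two chains event-by-event, keeping only the labels
    that occur in both."""
    return {label: intersect_events(a[label], b[label])
            for label in set(a) & set(b)}
```
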
_To be continued_