git @ Cat's Eye Technologies Chainscape / 8ac1854
Call the events in the MC "events" and write about intersection. Chris Pressey 1 year, 3 months ago
1 changed file(s) with 55 addition(s) and 20 deletion(s).
asked and answers that were obtained in that project (which was eventually christened
"Chainscape").

Events, probability distributions and Markov chains
---------------------------------------------------

We say an _event_ is a finite set of distinct, discrete outcomes. We think
of each outcome as being uniquely labelled, so that we can distinguish
it from the other outcomes. Each outcome also has a fixed, known probability
of occurring. The event is then said to have a discrete
_probability distribution_.

We regard a Markov chain as a set of labelled events. When we _walk_ a Markov chain,
we start at some event. We select one of the outcomes, based on its probability.
We use the label of the selected outcome to select a new event from the Markov chain,
and from this new event we repeat this process for as long as we like.

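As a concrete sketch of the walk just described, one might represent a Markov chain as a dictionary mapping labels to events, and each event as a dictionary mapping outcome labels to probabilities. This representation, the `walk` function, and the tiny example chain are all illustrative assumptions, not part of Chainscape itself:

```python
import random

# Hypothetical representation: each event maps outcome labels to
# probabilities; the chain maps each label to its event.
chain = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"the": 1.0},
}

def walk(chain, start, steps):
    """Walk the chain: repeatedly pick an outcome according to its
    probability, then move to the event labelled by that outcome."""
    label = start
    result = [label]
    for _ in range(steps):
        event = chain[label]
        labels = list(event)
        weights = [event[l] for l in labels]
        label = random.choices(labels, weights=weights)[0]
        result.append(label)
    return result

print(walk(chain, "the", 6))
```

Note that this sketch assumes every event has at least one outcome; the empty-event case discussed below would need extra handling.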
In the context of text generation, each event is associated
with a word in a corpus (the label is the word). The outcomes of that event
are also associated with words -- the words that can possibly follow
the first word in the corpus. Walking such a Markov chain produces streams of
text that resemble the corpus because each successive word is as likely to
follow the previous word as it is to follow that same previous word in the corpus.

Now, each of the outcomes in an event has a non-zero probability.
Also, the probabilities of all the outcomes in a probability
distribution must add up to 1; this is Kolmogorov's 2nd axiom.
However, to conform to practice, we must engage in some special pleading here,
because the application does not fit the mathematical model exactly.

In particular, we have events with an empty set of outcomes. Their probabilities
cannot add up to one, because there are no probabilities to add up!

This can happen because the events are, generally speaking, generated
from a corpus by recording the frequency with which some word is followed by
other words. If a word only appears once, at the very end of the corpus, it
is not followed by any words. We will also see in the sequel that the result
of an intersection operation can quite validly be an empty set.

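The frequency-counting construction just described might be sketched as follows; the `build_chain` function and the toy corpus are illustrative assumptions:

```python
from collections import defaultdict

def build_chain(corpus):
    """Build a Markov chain from a corpus: for each word, count how
    often each other word follows it, then normalize the counts into
    probabilities.  A word never followed by anything (e.g. the last
    word of the corpus, if it appears nowhere else) gets an empty event."""
    words = corpus.split()
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        counts[w]  # ensure every word has an event, possibly empty
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    chain = {}
    for w, followers in counts.items():
        total = sum(followers.values())
        chain[w] = {b: n / total for b, n in followers.items()} if total else {}
    return chain

chain = build_chain("the cat sat on the mat")
print(chain["the"])  # {'cat': 0.5, 'mat': 0.5}
print(chain["mat"])  # {} -- "mat" only appears at the very end
```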
So we need a way to deal with such a "probability distribution" even though it
is not a probability distribution in Kolmogorov's sense. We will deal with it
this way: when walking a Markov chain and coming across an event with zero
possible outcomes, we will treat it as having an equal probability of every
possible outcome, i.e. of transitioning uniformly at random to any other event.

This is a tiny bit ad hoc, but allows us to continue to use these structures
even though they do not conform 100% to the expectations of probability theory.

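The empty-event rule above could be folded into a single walk step like this; the `step` function and the dict-of-dicts chain representation are illustrative assumptions:

```python
import random

def step(chain, label):
    """One step of a walk that tolerates empty events: if the current
    event has no outcomes, transition uniformly at random to any
    event in the chain instead."""
    event = chain[label]
    if not event:
        return random.choice(list(chain))
    labels = list(event)
    weights = [event[l] for l in labels]
    return random.choices(labels, weights=weights)[0]

chain = {"a": {"b": 1.0}, "b": {}}
# From "b" (an empty event) we may go anywhere, including back to "a".
print(step(chain, "b"))
```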
Intersection
------------

In the Anne of Green Garbles work, "intersection" of a Markov chain and a
finite automaton was such that a transition from some word _a_ to some word
_b_ could only occur if that transition had a non-zero probability in the
Markov chain **and** was present in the finite automaton.

We use the "intersection" concept in the same "conjunctive" way here. If
_A_ and _B_ are two Markov chains, and _C_ is a Markov chain formed by
taking the intersection of _A_ and _B_, then the transition from word _a_
to word _b_ has the probability in _C_ that is the probability of it
happening in _A_ **and** the probability of it happening in _B_.

The concept can be extended to entire events.
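One way event-level intersection might be sketched: reading "and" as a product of probabilities (which assumes the two chains behave independently) and renormalizing the surviving outcomes. The `intersect_events` function, the independence reading, and the choice to renormalize are all assumptions for illustration, not necessarily Chainscape's definition:

```python
def intersect_events(e1, e2):
    """Intersect two events: an outcome survives only if it has
    non-zero probability in both.  Its weight is the product of the
    two probabilities (happening in one AND the other), renormalized
    so the result sums to 1.  The result may be empty -- a valid
    event under the convention adopted above."""
    product = {o: e1[o] * e2[o] for o in e1 if o in e2}
    total = sum(product.values())
    return {o: p / total for o, p in product.items()} if total else {}

a = {"x": 0.5, "y": 0.5}
b = {"y": 0.25, "z": 0.75}
print(intersect_events(a, b))  # {'y': 1.0} -- only 'y' occurs in both
print(intersect_events({"x": 1.0}, {"z": 1.0}))  # {} -- empty intersection
```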

_To be continued_