Call the events in the MC "events" and write about intersection.
Chris Pressey
1 year, 3 months ago
asked and answers that were obtained in that project (which was eventually christened
"Chainscape").

Events, probability distributions and Markov chains
---------------------------------------------------

We say an _event_ is a finite set of distinct, discrete outcomes. We think
of each outcome as being uniquely labelled, so that we can distinguish
it from the other outcomes. Each outcome also has a fixed, known probability
of occurring. The event is then said to have a discrete
_probability distribution_.

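As a concrete sketch (illustrative only -- this is not Chainscape's actual
representation), we might model an event in Python as a mapping from outcome
labels to probabilities:

```python
import random

# A hypothetical event: three uniquely-labelled outcomes, each with a
# fixed, known probability of occurring. (Labels and numbers made up.)
event = {"gables": 0.5, "garbles": 0.3, "cables": 0.2}

# The probabilities of the outcomes add up to 1.
assert abs(sum(event.values()) - 1.0) < 1e-9

def select_outcome(event):
    """Select one outcome label according to its probability."""
    labels = list(event)
    weights = [event[label] for label in labels]
    return random.choices(labels, weights=weights)[0]
```
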
We regard a Markov chain as a set of labelled events. When we _walk_ a Markov chain,
we start at some event. We select one of the outcomes, based on its probability.
We use the label of the selected outcome to select a new event from the Markov chain,
and from this new event we repeat this process for as long as we like.

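A sketch of such a walk, reusing the toy representation above (a chain as a
dict mapping each event's label to its event; the names are ours, not
Chainscape's):

```python
import random

def select_outcome(event):
    """Select one outcome label according to its probability."""
    labels = list(event)
    return random.choices(labels, weights=[event[l] for l in labels])[0]

def walk(chain, start_label, steps):
    """Start at some event, select an outcome by its probability, and use
    the selected label to pick the next event; repeat `steps` times."""
    label = start_label
    path = [label]
    for _ in range(steps):
        label = select_outcome(chain[label])
        path.append(label)
    return path

# A hypothetical chain: each event's label maps to the event itself.
chain = {
    "anne": {"of": 1.0},
    "of": {"green": 1.0},
    "green": {"gables": 0.6, "garbles": 0.4},
    "gables": {"anne": 1.0},
    "garbles": {"anne": 1.0},
}
print(" ".join(walk(chain, "anne", 7)))
```
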
In the context of text generation, each event is associated
with a word in a corpus (the label is the word). The outcomes of that event
are also associated with words -- the words that can possibly follow
the first word in the corpus. Walking such a Markov chain produces streams of
text that resemble the corpus because each successive word is as likely to
follow the previous word as it is to follow that same previous word in the corpus.

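For instance, a chain might be derived from a corpus along these lines (a
sketch; the exact counting and normalization Chainscape performs are
assumptions on our part):

```python
from collections import Counter, defaultdict

corpus = "anne of green gables is a novel and anne is a classic".split()

# Count how often each word is followed by each other word.
counts = defaultdict(Counter)
for word, follower in zip(corpus, corpus[1:]):
    counts[word][follower] += 1

# Normalize the counts into probabilities, so that the next word is as
# likely in the chain as it is in the corpus.
chain = {}
for word, followers in counts.items():
    total = sum(followers.values())
    chain[word] = {f: n / total for f, n in followers.items()}

# "classic" appears only once, at the very end of the corpus, so it is
# followed by nothing: an event with an empty set of outcomes (see below).
chain.setdefault("classic", {})

print(chain["a"])  # {'novel': 0.5, 'classic': 0.5}
```
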
Now, each of the outcomes in an event has a non-zero probability.
Also, the probabilities of all the outcomes in a probability
distribution must add up to 1; this is Kolmogorov's 2nd axiom.
However, to conform to practice, we must engage in some special pleading here,
because the application does not fit the mathematical model exactly.

In particular, we have events with an empty set of outcomes. Their probabilities
cannot add up to one, because there are no probabilities to add up!

This can happen because the events are, generally speaking, generated
from a corpus by recording the frequency with which some word is followed by
other words. If a word appears only once, at the very end of the corpus, it
is not followed by any words. We will also see in the sequel that the result
of an intersection operation can quite validly be an empty set.

So we need a way to deal with such a "probability distribution" even though it
is not a probability distribution in Kolmogorov's sense. We will deal with it
this way: when walking a Markov chain and coming across an event with zero
possible outcomes, we will treat it as having an equal probability of every
possible outcome, i.e. of transitioning uniformly at random to any other event.

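In the toy representation from the earlier sketches, one step of a walk with
this rule might look like:

```python
import random

def next_label(chain, label):
    """One step of a walk, with the ad hoc rule for empty events."""
    event = chain[label]
    if not event:
        # Zero possible outcomes: transition uniformly at random to
        # any other event in the chain.
        return random.choice(list(chain))
    labels = list(event)
    return random.choices(labels, weights=[event[l] for l in labels])[0]
```
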
This is a tiny bit ad hoc, but allows us to continue to use these structures
even though they do not conform 100% to the expectations of probability theory.

Intersection
------------

In the Anne of Green Garbles work, "intersection" of a Markov chain and a
finite automaton was such that a transition from some word _a_ to some word
_b_ could only occur if that transition had a non-zero probability in the
Markov chain **and** was present in the finite automaton.

We use the "intersection" concept in the same "conjunctive" way here. If
_A_ and _B_ are two Markov chains, and _C_ is a Markov chain formed by
taking the intersection of _A_ and _B_, then the transition from word _a_
to word _b_ has, in _C_, the probability of it happening in _A_ **and**
in _B_ -- that is, treating the two as independent, the product of its
probabilities in _A_ and _B_.

The concept can be extended to entire events.

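A sketch of this, event by event, in our toy representation (again, the
names and the product reading are ours; note that the result can validly be
an empty event, as anticipated above):

```python
def intersect_events(event_a, event_b):
    """Conjunctive intersection of two events: an outcome survives only
    if it occurs in both, and its probability is the product of its
    probabilities in each. The result may be an empty event, and its
    probabilities need not add up to 1."""
    return {outcome: event_a[outcome] * event_b[outcome]
            for outcome in set(event_a) & set(event_b)}

def intersect_chains(a, b):
    """Intersect two chains event-by-event, keeping only the labels
    that occur in both."""
    return {label: intersect_events(a[label], b[label])
            for label in set(a) & set(b)}
```
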
_To be continued_