git @ Cat's Eye Technologies The-Dossier / bd36055
Last few edits of first version of Machine State Combinators. Chris Pressey a month ago
1 changed file(s) with 29 addition(s) and 22 deletion(s).
00 Machine State Combinators
11 =========================
2
3 _DRAFT_
42
53 Alternate title: "Machine Instructions as (Defunctionalized) Combinators"
64
8684 Programs, as we've defined them, are also functions that take machine states to machine states.
8785 Therefore they also have this type. We will also call this type `P`.
8886
89 Some machine instructions are parameterized; in machine language nomenclature, we might say
90 they have an _immediate operand_. In this case, we model the instruction as a higher-order function
91 that takes the immediate value and yields a function of type `P`. Say the immediate value is
92 a byte; its type would be `Byte → P`. Note that this is equivalent to `Byte → S → S`, which
93 suggests that we can think of ourself as working with curried functions: we curry this function with a `Byte`
94 value to obtain the specific instruction, which is of type `P`.
87 Some machine instructions are parameterized; in machine language nomenclature,
88 we might say they have an _operand_. In this case, we model the instruction
89 as a higher-order function that takes the operand value and yields a function
90 of type `P`. Say the operand is an immediate operand, one byte in size; its
91 type would be `Byte → P`. Note that this is equivalent to `Byte → S → S`,
92 which suggests that we can think of ourselves as working with
93 _curried functions_: we apply this function to a `Byte` value to obtain the
94 specific instruction, which is of type `P`.
9595
9696 Example: the immediate mode of 6502 instruction `LDA`:
9797
103103 ### A Model for Machine Locations
104104
105105 While an instruction such as `LDA` immediate refers only to a fixed register and some
106 immediate data, many other instructions, such as `STA`, require a machine address.
106 immediate data, many other instructions, such as `STA`, require a machine address
107 as their operand.
107108
108109 The idea of machine state combinators does not explicitly provide concepts for
109110 working with machine addresses. We might assume they are raw machine words
111112 maintaining labels for every possible machine location in use.
112113
113114 The latter is clearly more convenient for the programmer. It also meshes better
114 with our ideas about the type system you would probably want to use. That is, each
115 label would have a type,
115 with our ideas about the type system that one would probably like to use with this.
116 That is, each label would have a type,
116117 which would capture some properties of the memory location that it refers to.
117118 We can assume, as well, that a machine location refers to some part of the
118119 machine state, whether it be an address in RAM, a register or flag in the CPU,
120121 location. We will not concretely define the entire label system, but we will assume one is
121122 in use.
122123
123 One further thing we will note is that, due to its nature as the
124 One further thing we will note is that, due to its nature as a
124125 composition of pure functions, a program in our system does not naturally
125126 have any location (it's just a mathematical object) and,
126127 short of taking additional measures, it cannot be referred to by labels.
127128
128129 On one hand, this is advantageous; as in a Harvard architecture, it is not possible
129130 to overwrite the program with data or create self-modifying code with obscure semantics.
130
131131 On the other hand, it will sometimes be useful to treat machine addresses of
132132 (sub)programs as data, which can be passed around the program and called when needed.
133133 For now we will forgo that, although a future revision of this article may discuss
158158 Later on we'll see how these control structure "macros" get converted to concrete machine
159159 instructions.
160160
161 For now we can observe that any control idiom that can be mechanically implemented with the
161 For now we can observe that any reasonable control idiom that can be mechanically implemented with the
162162 underlying machine instructions of our model can be encapsulated in a combinator. For example,
163163 a basic "if zero flag is not set then ... else ..." combinator would take a subprogram
164164 for the "then" part, and a subprogram for the "else" part, and yield a new program:
289289 eval :: Prog -> S -> S
290290
291291 Given a program, and a state, it returns another state
292 (the state we end up with we run the program on the first state).
292 (the state we end up with when we run the program on the first state).
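
As a rough illustration of what such an `eval` might look like, here is a minimal sketch. The constructors `LdaImm`, `Seq` and `IfNotZero`, and the two-field state `S`, are hypothetical simplifications chosen for this example; they are not the article's actual definitions.

```haskell
import Data.Word (Word8)

-- Hypothetical, heavily simplified machine state: only the accumulator
-- and the zero flag are modelled (the 6502's other flags are omitted).
data S = S { regA :: Word8, flagZ :: Bool }
  deriving (Eq, Show)

-- Hypothetical defunctionalized programs: each constructor stands for
-- an instruction or combinator that would otherwise be a function.
data Prog
  = LdaImm Word8         -- LDA immediate: load a byte into the accumulator
  | Seq Prog Prog        -- run the first program, then the second
  | IfNotZero Prog Prog  -- "if zero flag is not set then ... else ..."
  deriving Show

-- Given a program and a state, return the state we end up with when
-- we run the program on that state.
eval :: Prog -> S -> S
eval (LdaImm b)      s = s { regA = b, flagZ = b == 0 }
eval (Seq p q)       s = eval q (eval p s)
eval (IfNotZero t e) s = if not (flagZ s) then eval t s else eval e s
```

Under these assumptions, `eval (Seq (LdaImm 0) (IfNotZero (LdaImm 1) (LdaImm 2))) (S 5 False)` loads zero (setting the zero flag), takes the "else" branch, and ends with 2 in the accumulator and the zero flag clear.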
293293
294294 III. Type-checking
295295 -------------------
320320 So the type `S` is really more like `S[M]`, where
321321 `M` is the set of meaningful machine locations, where
322322 `M` is a subset of all the machine locations in `S`.
323 And the type `P` is really more like `S[M.in] -> S[M.out]`.
324323
325324 But wait, there's more.
326325
333332 * a set of locations **M.out** ("output", a subset of **C**) that it warrants to be meaningful after its
334333 execution has finished.
335334
336 All location that are not in **C**, after the execution, are guaranteed to not have
337 changed (something like a frame axioms). The locations in **C** have no such guarantee,
335 All locations that are not in **C**, after the execution, are guaranteed to not have
336 changed. (This invariant is something like a _frame axiom_ from planning systems like
337 the Situation Calculus.) The locations in **C** have no such guarantee,
338338 but those in the subset **M.out** are guaranteed to be meaningful. The set
339339 **C** - **M.out** can be referred to as **T.out** ("trashed"). By the same
340340 token, **C** = **M.out** ∪ **T.out**.
341341
342342 If there are meaningful locations in **M.in** that are outside **C**,
343 they will remain meaningful in the result, as indeed, they remain
343 they will remain meaningful in the result; indeed, they remain
344344 unchanged in the result.
345345
346346 So the type `P` is really more like
362362 meaningful.
363363
364364 Where does **C** fit in? Well, our state type expresses meaningfulness;
365 we need to extend it to to express unmeaningfulness too. Something
366 like this:
365 but we need it to express _unmeaningfulness_ too. So we need to extend it again,
366 to something like this:
367367
368368 S[M.in, T.in] -> S[M.out, T.out]
369369
370 Noting that **C** = **M.out** ∪ **T.out**, it is derivable from this type expression.
370 We note that **C** is derivable from this type expression, as desired,
371 because **C** = **M.out** ∪ **T.out**.
371372
372373 The second (**T**) parameter of our type here is the set of machine locations
373374 in the machine state which we deem unmeaningful.
388389 `S[M.in, T.in] -> S[M.out, T.out]` to a state `S[M.x, T.x]` are:
389390 * **M.in** ⊆ **M.x**. (at least as meaningful)
390391 * **T.in** ⊆ **M.out** ∪ **T.out**. (no leakage of unmeaningfulness)
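
The two conditions above can be sketched as a check over sets of labels. This is only an illustration: the names `Loc`, `StateTy`, `ProgTy`, `changed` and `applicable` are hypothetical, labels are modelled as plain strings, and the check implements exactly the two conditions as stated and nothing more.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical label type; a real label system would carry more structure.
type Loc = String

-- S[M, T]: the meaningful and the unmeaningful ("trashed") locations.
data StateTy = StateTy { meaningful :: Set Loc, trashed :: Set Loc }

-- S[M.in, T.in] -> S[M.out, T.out]
data ProgTy = ProgTy { tyIn :: StateTy, tyOut :: StateTy }

-- C is derivable from the program type: C = M.out ∪ T.out.
changed :: ProgTy -> Set Loc
changed p = meaningful (tyOut p) `Set.union` trashed (tyOut p)

-- The two stated conditions for applying a program type to a state type.
applicable :: ProgTy -> StateTy -> Bool
applicable p x =
  meaningful (tyIn p) `Set.isSubsetOf` meaningful x  -- M.in ⊆ M.x
    && trashed (tyIn p) `Set.isSubsetOf` changed p   -- T.in ⊆ M.out ∪ T.out
```

For instance, a program type that requires `"a"` to be meaningful on input is not applicable to a state type whose meaningful set is empty.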
392
393 There may be some rough edges in the above exposition of the type system,
394 but in the main it matches what SixtyPical does in its (ham-fisted and
395 practical) way. We thus consider the approach of typed combinators, with
396 its formalizable and generalizable structure, to be a valuable one for
397 applying toward the kinds of problems that SixtyPical was designed to solve.