
Machine State Combinators

Alternate title: "Machine Instructions as (Defunctionalized) Combinators"

Background

This article is a product of explorations into Future Directions for SixtyPical. After a couple of failed attempts to generalize the MOS 6502-based idea in SixtyPical to arbitrary machine architectures, I "gave up" and concluded that it would make just as much sense to take your favourite machine architecture and write it as a deep embedding in Coq, because then you could prove any property you wanted about your low-level program.

But that approach turned out to have some complications, too (at least on the face of it). So I decided to step back and write about where the approach has gotten to, without reference to a particular language or theorem prover.

Overview

All a computer can ever really execute is its own native machine language.

We are interested here, not in designing a high-level language with good code-structuring facilities that compiles to a given machine language, but in designing a low-level language, hardly distinguishable from a given machine language, that nevertheless has good code-structuring facilities.

We observe a few things.

The CPU executes machine instructions. Executing a machine instruction takes the machine from one state to another state. Therefore, a machine instruction can be thought of as a pure function from machine states to machine states.

Furthermore, a sequence of straight-line instructions can be thought of as the composition of a sequence of pure functions from machine states to machine states. Such a composition is also a pure function from machine states to machine states.

These are the first two steps along a path that leads us to viewing the machine instructions of a particular architecture as machine state combinators — pure functions that, given appropriate inputs, yield pure functions from machine states to machine states, i.e. programs for that machine.

Now, this path is only the first leg of the journey. There are two further complexities.

One is that, once we have assembled our desired program from these combinators, we want to extract from them native machine code for that program, so that we can execute it directly. To that end it behooves us to represent the combinators in a form that is examinable and traversable: they will be defunctionalized. We'll expand on this later.

The other is that we would like our program to be, insofar as possible, correct by construction, and to that end, each of our combinators will be typed. This will prevent us from combining these combinators in incorrect ways. We should, however, be careful about what we mean by "incorrect". We can surely exclude combinations which are grossly nonsensical. But just as surely we cannot prevent every conceivable infelicitous situation.

So we must work out which properties we would like to see hold, and work out how to model them, so that we can confirm that they do hold, by a mere act of type-checking.

Note that type-checking is not a core part of this idea of machine state combinators. One could treat these combinators as untyped, or apply a very basic type system to them, and one would still be dealing with "machine state combinators" as we have presented the concept. But the correctness guarantee that they can support is a significant application of them, perhaps even their main purpose. So we will explore it as part of this write-up.

I. Combinators

A Model for Straight-line Code

Let S be the set of machine states. We will treat this as a type (without worrying too much, just yet, about what the values of this type look like).

X → Y is the type of functions that take X's to Y's.

Instructions, as we've defined them, are functions that take machine states to machine states, and therefore have type S → S.

Example: the 6502 instruction PLA takes any machine state with a byte n on the stack, to the machine state where the stack has one less item on it, and the accumulator contains n.

pla :: S -> S

Programs, as we've defined them, are also functions that take machine states to machine states. Therefore they also have this type. We will also call this type P.
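To make these types concrete, here is a minimal sketch in Haskell, assuming a toy machine state with just an accumulator and a stack of bytes; the field names and the use of Int for Byte are assumptions made purely for illustration, since the article deliberately leaves the shape of S open:

type Byte = Int

data S = S { acc :: Byte, stack :: [Byte] }

type P = S -> S

-- One possible meaning for the pla example above: pop the top of
-- the stack into the accumulator. (Partial on an empty stack; a
-- real model would have to decide what an empty stack means.)
pla :: P
pla s = s { acc = head (stack s), stack = tail (stack s) }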

Some machine instructions are parameterized; in machine language nomenclature, we might say they have an operand. In this case, we model the instruction as a higher-order function that takes the operand value and yields a function of type P. Say the operand is an immediate operand, one byte in size; its type would be Byte → P. Note that this is equivalent to Byte → S → S, which suggests that we can think of ourselves as working with curried functions: we curry this function with a Byte value to obtain the specific instruction, which is of type P.

Example: the immediate mode of 6502 instruction LDA:

lda_imm :: Byte -> S -> S
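As a sketch under the toy state above (again an assumption, not the article's definition), currying lda_imm with a byte yields a specific instruction of type P:

-- Load the accumulator with an immediate byte.
lda_imm :: Byte -> S -> S
lda_imm b s = s { acc = b }

-- Partially applying the operand gives a program of type P.
lda_42 :: P
lda_42 = lda_imm 42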

Sequencing straight-line instructions is essentially function composition. As such, the instruction composition operation has type P → P → P.
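Under the toy definitions above, a sketch of this operation is just function composition, flipped so that the first argument runs first (named seq_ here to avoid clashing with Haskell's Prelude seq):

seq_ :: P -> P -> P
seq_ p q = q . p    -- run p first, then q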

A Model for Machine Locations

While an instruction such as LDA immediate refers only to a fixed register and some immediate data, many other instructions, such as STA, require a machine address as their operand.

The idea of machine state combinators does not explicitly provide concepts for working with machine addresses. We might assume they are raw machine words just as the machine would natively use, or we might assume there is a system for maintaining labels for every possible machine location in use.

The latter is clearly more convenient for the programmer. It also meshes better with our ideas about the type system that one would probably like to use with this. That is, each label would have a type, which would capture some properties of the memory location that it refers to. We can assume, as well, that a machine location refers to some part of the machine state, whether it be an address in RAM, a register or flag in the CPU, an I/O port, or some other location. We will not concretely define the entire label system, but we will assume one is in use.

One further thing we will note is that, due to its nature as a composition of pure functions, a program in our system does not naturally have any location (it's just a mathematical object) and, short of taking additional measures, it cannot be referred to by labels.

On one hand, this is advantageous; as in a Harvard architecture, it is not possible to overwrite the program with data or to create self-modifying code with obscure semantics. On the other hand, it will sometimes be useful to treat machine addresses of (sub)programs as data, which can be passed around the program and called when needed. For now we will forgo that, although a future revision of this article may discuss how it can be modelled, employing additional measures like we referred to above.

Back to straight-line instructions. Along with machine locations, they cover all straight-line code. But there is more to life than straight-line code, so we have come to the first turn in the path.

A Model for Control Structures

We have so far glossed over one piece of machine state: the program counter (or instruction pointer).

While we could say each instruction combinator maps the machine state to a new state where the instruction pointer has advanced, that doesn't match our intuition, explained in the previous section, that a program doesn't naturally reside at any machine location.

And because a program doesn't have a machine location, we must put some consideration into how we could make a combinator for a "jump" or "branch" instruction.

Our solution is to think of these combinators as mapping to something like assembler macros, instead of to machine instructions themselves. (This aligns with our goal of providing good code-structuring facilities.) We do give up being able to write ultra-optimized code with obscure applications of jumping and branching instructions. But if we're more concerned with being able to reason about our programs, that is actually a good thing.

Later on we'll see how these control structure "macros" get converted to concrete machine instructions.

For now we can observe that any reasonable control idiom that can be mechanically implemented with the underlying machine instructions of our model, can be encapsulated in a combinator. For example, a basic "if zero flag is not set then ... else ..." combinator would take a subprogram for the "then" part, and a subprogram for the "else" part, and yield a new program:

if_nz :: P -> P -> P

A similarly basic "while zero flag is not set, loop" combinator would take a subprogram for the loop body. We would assume that an instruction somewhere in the body of the loop may or may not cause the zero flag to be set; when the zero flag is set after the execution of the body, the loop terminates:

while_nz :: P -> P

If there are multiple processor conditions, these could be modelled as an enumeration type, and the above combinators generalized to curry in such a condition value first:

data Condition = Zero
               | NonZero
               | Carry
               | NonCarry
               | ...

if :: Condition -> P -> P -> P
while :: Condition -> P -> P
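Since if and while are reserved words in Haskell, a sketch of the intended meaning needs slightly different names; holds is a hypothetical helper that tests a Condition against the flags in a machine state, left abstract here because our toy S has no flags:

holds :: Condition -> S -> Bool
holds = error "left abstract in this sketch"

if_ :: Condition -> P -> P -> P
if_ c t e s = if holds c s then t s else e s

while_ :: Condition -> P -> P
while_ c body s = if holds c s then while_ c body (body s) else s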

A Model for Subroutines

When a subroutine is called, the machine is in one state, and when it returns, the machine is in another (probably different) state. A subroutine, therefore, is a function that maps machine states to machine states, just like any other instruction or program.

We enumerate some implications of this way of conceptualizing subroutines.

First, while the machine may have a native "return from subroutine" instruction that returns control to the caller, it will (much like branch instructions, noted above) not be explicitly exposed in our model; instead there will be a subroutine combinator which takes a function of type P and packages it up as a callable subroutine which captures this behaviour.

Second, we would often like to label a subroutine, and refer to it by that label when calling it. That is permitted and expected here, but outside of our scope, and not material to our exposition, so we will not go into it.

Third, we often think of a subroutine as taking arguments (or parameters) as its input, and sometimes producing a return value as its output as well.

But when you get right down to it, these values are just parts of the machine state. So it is useful to generalize this, and say that all subroutines expect the machine to be in a certain state when they start executing, and make some warrants about the state the machine will be in when they finish executing. These can in fact be seen as preconditions and postconditions that involve particular machine locations.

So for example, a subroutine might expect an integer to be present in the accumulator when it is called, and it would treat this integer as its input; and it might leave its output in another register, say X. Additionally, it might use the accumulator as a work area for its own purposes, but the contents of the accumulator would not be intended to be meaningful to the caller of this subroutine.

This means (among other things) that there is no "caller-callee convention" in our system; if a program wishes to preserve any machine state across a call, it must include instructions to save it in some machine location that the subroutine does guarantee will be preserved.

We would expect these expectations and warrants to be encoded in the type annotations in use; in particular, in both the type of machine states S, and the types of the subroutines (or rather, programs), P. This implies that these types are not simply sets of values. We will explore this in more detail in section III.

II. Defunctionalization

As mentioned above, once we have composed a program (of type P) from all these combinators available to us, we will want to generate a machine language executable from it.

Since the language is extremely low-level, this looks a lot more like "assembling" the executable code than "compiling" to it.

But in any case the main obstacle to this is the fact that, traditionally in functional programming, function values are opaque. If we have a value of type P, that is to say S → S, all we can do with it is apply it; we can't look inside it.

Our solution to this is to defunctionalize all the basic combinators.

(Here I should refer the reader to the substantial body of work on defunctionalization. And perhaps someday I will. But for now, just remember these names: Reynolds, Danvy, and Gibbons.)

In short, what this means is, instead of being functions, the instruction composition operator, and all the other combinators like if and while, are essentially constructors of an algebraic data type, which we can pattern-match on and destruct.

This is also why we haven't actually shown any implementation of the example functions we've given, so far. We're not writing an interpreter. We have little interest in applying these functions. We do want to know what they mean, but that is mainly so we can confirm they match up with what the actual machine instructions mean, so that we can reason about the programs that we're building with them. But their types and their names are, for now, more important, because that's all we need to start building programs and generating machine code.

So the type of (defunctionalized) machine state combinators, representing instructions, programs, and subroutines, might look like:

data Prog = PLA
          | LDAimm Byte
          | ...
          | Seq Prog Prog
          | ...
          | If Condition Prog Prog
          | While Condition Prog
          | ...
          | Subroutine Prog
          | ...

(Here we use the longer name Prog to distinguish it as the defunctionalized version of P.)

Looks strangely like an AST, doesn't it? But that innocent façade hides that these are all thought of and treated as combinators, that is, as higher-order functions intended to be combined to produce, as an end product, a pure function that maps machine states to machine states — that is, a program.
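To preview how the "assembling" mentioned above becomes possible once we can pattern-match, here is a fragment of a code generator, sketched in the same elided style as the Prog type itself. The opcodes shown are genuine 6502 encodings; the cases for If and While, which require computing branch offsets, are elided:

codegen :: Prog -> [Byte]
codegen PLA        = [0x68]        -- 6502 opcode for PLA
codegen (LDAimm b) = [0xA9, b]     -- 6502 opcode for LDA immediate
codegen (Seq p q)  = codegen p ++ codegen q
...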

Our evaluator — which, again, we are not interested in actually constructing, but are interested in as a mathematical object which defines the semantics of our language — looks like so:

eval :: Prog -> S -> S

Given a program, and a state, it returns another state (the state we end up with when we run the program on the first state).
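Sketched out for the constructors shown above (reusing pla, lda_imm, and the hypothetical holds from our earlier sketches; again, this is a mathematical object we use to define meaning, not something we intend to run), a few clauses might read:

eval :: Prog -> S -> S
eval PLA s            = pla s
eval (LDAimm b) s     = lda_imm b s
eval (Seq p q) s      = eval q (eval p s)
eval (If c t e) s     = if holds c s then eval t s else eval e s
eval (While c b) s    = if holds c s then eval (While c b) (eval b s) else s
eval (Subroutine p) s = eval p s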

III. Type-checking

In this section we aim to explain the approach to typing that our idea of machine state combinators was in fact developed to support.

The approach was initially worked out in a ham-fisted and largely ad-hoc fashion in SixtyPical. SixtyPical does not treat programs as typed combinators. The major contribution of machine state combinators to this approach is that considering programs as typed combinators greatly regularizes the problem.

We make some refinements on our statements so far about types.

A machine state isn't just a machine state, it is an aggregation of machine locations, and its type is a product (in some sense) of the types of those locations.

In any given machine state, there is some subset of the machine locations that are considered meaningful.

When the machine first starts up, the set of meaningful machine locations is empty. Instructions such as LDA #$00 could be executed after startup to produce a machine state with some meaningful machine locations.

So the type S is really more like S[M], where M, the set of meaningful machine locations, is a subset of all the machine locations in S.

But wait, there's more.

Whenever we have a function that takes machine states to machine states, whether it is a subroutine or a single straight-line instruction or a subprogram, associated with it are:

  • a set of locations M.in ("input") that the function expects to be meaningful when it starts executing,
  • a set of locations C ("changed") that it does not guarantee to preserve during its execution, and
  • a set of locations M.out ("output", a subset of C) that it warrants to be meaningful after its execution has finished.

All locations that are not in C are guaranteed not to have changed after the execution. (This invariant is something like a frame axiom from planning systems such as the Situation Calculus.) The locations in C have no such guarantee, but their subset M.out is guaranteed to be meaningful. The set C - M.out can be referred to as T.out ("trashed"). By the same token, C = M.out ∪ T.out.

If there are meaningful locations in M.in that are outside C, they will remain meaningful in the result; indeed, they remain unchanged in the result.
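As a worked example (assuming standard 6502 semantics, where LDA immediate loads the accumulator and updates the Z and N flags): for lda_imm, M.in is empty, C = {A, Z, N}, M.out = {A, Z, N}, and T.out is empty; every machine location outside C is guaranteed unchanged.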

So the type P is really more like

S[M.in] -> S[M.out]

where M.in is the set of machine locations which we require to be meaningful in the input state, and M.out is the set of machine locations which we warrant to be meaningful in the output state.

Machine state types have a notion of compatibility here. If f is a function of type S[M.in] -> S[M.out], then f can be applied to any machine state that is at least as meaningful as S[M.in]; that is, any machine state S[X] where M.in ⊆ X. We might informally say f will "throw away" any extra meaningfulness in X.

If we try to apply f to S[Y] where Y ⊉ M.in, that is a type error, because S[Y] does not have, in its set of meaningful locations, the machine locations that f requires to be meaningful.

Where does C fit in? Well, our state type expresses meaningfulness; but we need it to express unmeaningfulness too. So we need to extend it again, to something like this:

S[M.in, T.in] -> S[M.out, T.out]

We note that C is derivable from this type expression, as desired, because C = M.out ∪ T.out.

The second (T) parameter of our type here is the set of machine locations in the machine state which we deem unmeaningful.

Any location which is in T.in, i.e. unmeaningful in the input, must either appear in T.out, i.e. be unmeaningful in the output, or in M.out, i.e. it was set by the function as a meaningful output.

So to recap:

  • Our S type is actually S[M, T], parameterized by two sets of machine locations: the machine locations that are known to be meaningful (M) and the machine locations that are known to be unmeaningful (T) in that machine state. All other machine locations may or may not be meaningful (it is not known).
  • The compatibility constraints on applying the function S[M.in, T.in] -> S[M.out, T.out] to a state S[M.x, T.x] are (see the sketch following this list):
    • M.in ⊆ M.x (at least as meaningful), and
    • T.in ⊆ M.out ∪ T.out (no leakage of unmeaningfulness).
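As a value-level sketch of these two checks (purely for illustration; the names are assumptions, and a real system would enforce the checks during type-checking rather than with a runtime function):

import qualified Data.Set as Set

type Loc = String    -- hypothetical: labels naming machine locations

-- The [M, T] pair that parameterizes a machine state type.
data StateType = StateType { m :: Set.Set Loc, t :: Set.Set Loc }

-- May a function of type S[M.in, T.in] -> S[M.out, T.out] be applied
-- to a state of type S[M.x, T.x]?
applicable :: StateType -> StateType -> StateType -> Bool
applicable tin tout sx =
     m tin `Set.isSubsetOf` m sx                         -- at least as meaningful
  && t tin `Set.isSubsetOf` (m tout `Set.union` t tout)  -- no leakage of unmeaningfulness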

There may be some rough edges in the above exposition of the type system, but in the main it matches what SixtyPical does in its (ham-fisted and practical) way. We thus consider the approach of typed combinators, with its formalizable and generalizable structure, to be a valuable one to apply to the kinds of problems that SixtyPical was designed to solve.