diff --git a/doc/emmental.html b/doc/emmental.html new file mode 100755 index 0000000..2061417 --- /dev/null +++ b/doc/emmental.html @@ -0,0 +1,525 @@ +<html><head><title>The Emmental Programming Language</title></head> +<body> + +<h1>The Emmental Programming Language</h1> + +<p>November 2007, Chris Pressey, Cat's Eye Technologies</p> + +<h2>Introduction</h2> + +<p>Emmental is a self-modifying programming language. That is not to say +that it is a language in which programs are self-modifying; rather, it is +the language itself, as defined by a meta-circular interpreter, that can be +modified during the course of a running program. Indeed, this is how +Emmental, without conventional conditional and repetition/recursion constructs, +achieves Turing-completeness.</p> + +<h2>Meta-circular Interpreters</h2> + +<p>One way to attempt to define a language is by giving what's called +a <em>meta-circular interpreter</em> (often shortened to "MCI" in this document.) +This is an interpreter for some language which is written in that same language +(or at least, a language which is very close to it.)</p> + +<p>Meta-circular interpreters were a popular way to describe the +semantics of programming languages, especially +LISP-like languages, and especially before the advent of denotational +semantics. The term "meta-circular" was apparently coined by John C. Reynolds +in his paper "Definitional Interpreters for Higher-Order Programming +Languages" (1972 Proceedings ACM National Conference.)</p> + +<p>Of course, in the real world, MCIs are not often used. +They certainly <em>can</em> be used: if you have a working Scheme interpreter +that came with your computer system, there +is nothing stopping you from writing another Scheme interpreter in Scheme, +and running your programs on your interpreter (which is itself running on your +system's interpreter.) However, this is quite a bit +less efficient due to the duplication of effort. 
A somewhat more realistic +case might be if your system came with, say, a Scheme compiler. You might then feed your +Scheme interpreter (written in Scheme) through that to make a native Scheme +interpreter, and use that to interpret your programs. +(In this setup, the interpreter is usually described as "self-hosting" +rather than "meta-circular".)</p> + +<p>But, as should be obvious, you already need an implementation of Scheme +for your Scheme interpreter written in Scheme to be of much practical +use to you. If your meta-circular interpreter is all you have, you won't +be able to use it to run or understand Scheme programs. Because the MCI is +defined in terms of itself, you'll need some other source of "understanding +how it works" to make it complete. This understanding might come +from an implementation in some other programming language, or a specification +in some formal language, or a description in some natural language, or simply +from intuition — but it has to come from somewhere.</p> + +<p>Assuming that we do have that external source of understanding, the +meta-circular interpreter can come in quite handy in codifying the semantics +of the language. And, in Emmental's case, <em>altering</em> those semantics: +Emmental's MCI supports operations which instruct Emmental's MCI to modify +its behaviour.</p> + +<h2>Interpreter Structure</h2> + +<p>To describe the structure of Emmental's MCI, we first examine the +general structure of interpreters. If you've ever written a virtual machine +executor in, say, C, you've noticed that it has the general form</p> + +<pre> + pc = start; + while (!done) { + switch (instruction[pc]) { + case INSTR_ONE: + /* implement semantics of INSTR_ONE */ + pc += advance(INSTR_ONE); + break; + case INSTR_TWO: + /* implement semantics of INSTR_TWO */ + pc += advance(INSTR_TWO); + break; + /* ... 
*/ + case INSTR_HALT: + done = 1; + break; + default: + perror("Invalid opcode"); + } + } +</pre> + +<p>Note that <code>advance()</code> is some function that computes how far the +program counter is advanced on that instruction. This value is typically ++1 for most instructions, but more or less than 1 (and dependent on the +state of the program) for a handful of "branch" instructions. +Note also that <code>advance()</code> would not typically be implemented in C +as a function; I'm just showing it like this to emphasize the +regular structure.</p> + +<p>From this we infer that the basic structure of an interpreter is +a <em>dictionary</em> or <em>map</em> that associates program symbols with operations. +There is some extra housekeeping like the fetch-execute cycle that surrounds +this dictionary, but this can (hopefully) be handled mostly automatically, +freeing us to concentrate on <em>symbols</em> and <em>operations</em>.</p> + +<p>The symbols could be taken from any finite alphabet, but in Emmental, +to keep things relatively simple, we'll just use the ASCII character set. +(Actually, to be precise, this is the full 8-bit ASCII character set. +Non-printable control characters are allowed, as are characters between +128 and 255, and each has a distinct meaning. But their representations +are not defined.)</p> + +<p>The operations can be thought of, abstractly, as functions which +transform program states. Or they can be thought of, concretely, as +segments of code — mini-programs which implement these functions. +In the case of a meta-circular interpreter, these mini-programs would be written +<em>in the language being interpreted</em>.</p> + +<p>To extend this idea to a <em>self-modifying</em> meta-circular interpreter, +the operations can be thought of as functions which transform +both program states <em>and</em> interpreter definitions. 
+(Alternatively, the interpreter definition could be thought of as +part of the program state, although I feel that's a bit gorier a way +to look at it, and I prefer the other view, at least for Emmental.)</p> + +<p>In Emmental, most operations leave the interpreter definition unchanged. +However, there is one operation which alters the interpreter +definition, and it is this altered definition that is used to interpret +the remainder of the program.</p> + +<h2>Emmental Semantics (in Emmental)</h2> + +<p>Emmental is essentially a stack-based language. (There's also a queue, +but it's not as central.) All operations implicitly +get data from, and implicitly deposit results back onto, a single +stack. For orthogonality's sake, this stack may contain only ASCII symbols. +(And note that trying to pop an empty stack, or dequeue an empty queue, +is an error that aborts the program.)</p> + +<p>Note that because we've established that an interpreter (at least, +insofar as Emmental ever needs to know) is simply a map that takes +symbols to operations, and that operations in Emmental are defined +(meta-circularly) as Emmental programs, we can use the following notation +to describe interpreters:</p> + +<pre> +% → XYZ+*! +& → 'ap'ag'ag +</pre> + +<p>That is, the symbol <code>%</code>, when encountered in an Emmental +program, indicates an operation that +is defined by the Emmental program <code>XYZ+*!</code>, and so forth.</p> + +<p>When a main Emmental program begins execution for the first time, +it starts with what's called the <em>initial Emmental interpreter</em>. +(This fact, of course, doesn't apply to any further point of execution inside +an Emmental program, or execution of operations defined in +Emmental's MCI, since these would be considered subprograms of a sort. +These cases use whichever interpreter happens to be in force at that point in time.)</p> + +<p>The initial Emmental interpreter is defined as follows:</p> + +<pre> +a → a +b → b +c → c +... 
+</pre> + +<p>That is, for every symbol <var>x</var> in the ASCII set, +<var>x</var> <code>→</code> <var>x</var>.</p> + +<p>Doesn't tell us a lot about Emmental's semantics, does it? No. +Nothing at all, really. But remember what I said about needing an external +source of understanding, in order to actually get full value out of an MCI. +And remember the purpose of Emmental's MCI: it is +not there so much to help us understand Emmental, but to allow us to +<em>change</em> Emmental, from inside an Emmental program.</p> + +<p>And, for all that our description of the initial Emmental interpreter +is unhelpfully tautological, it is not incorrect: the semantics of +<code>a</code> can in fact be thought of as being defined by an Emmental +program that consists of only one instruction, <code>a</code>. This +happy state of affairs comes about because Emmental is stack-based; +the "signature" (the requirements for the "before" and "after" stacks) +of the symbol <code>a</code> is the same as the signature of the program +containing the single symbol <code>a</code>. No extra syntax to specify +arity and the like is necessary.</p> + +<p>Above all, don't panic: we <em>will</em> describe what symbols like +<code>a</code> actually mean in Emmental, we'll just need to do it in something +besides Emmental. In fact, let's do that right now.</p> + +<h2>Emmental Semantics (in English)</h2> + +<h3>Primitive Arithmetic</h3> + +<p><code>#</code> pushes the symbol NUL (ASCII 0) onto the stack.</p> + +<p>The symbols <code>0</code>, <code>1</code>, ... <code>9</code> +pop a symbol from the stack, multiply its ASCII value by 10 modulo 256, add +the value 0, 1, ... 9 (respectively) to that value modulo 256, and +push the resulting symbol back onto the stack.</p> + +<p>The upshot of these 11 operations is that you can push arbitrary +symbols onto the stack by spelling out their ASCII values in decimal. 
+For example, <code>#64</code> pushes a <code>@</code> onto the stack.</p> + +<p><code>+</code> pops two symbols off the stack, adds together their +ASCII values modulo 256, and pushes the symbol with the resultant ASCII value +back onto the stack.</p> + +<p><code>-</code> pops two symbols off the stack, subtracts the ASCII value +of the first popped from the ASCII value of the second popped modulo 256, +and pushes the symbol with the resultant ASCII value back onto the stack.</p> + +<p><code>~</code> ("log") pops a symbol from the stack, computes the discrete +base-2 logarithm of the ASCII value of that symbol, +and pushes the symbol with the resultant ASCII value back onto the stack. +The discrete base-2 logarithm of a number is the floor or integer part +of the base-2 logarithm of that number. Alternately, it is the number of +the highest bit position (starting with the LSB = bit position 0) +with a bit set when the number is viewed as binary. +Because the base-2 logarithm of the number 0 itself is undefined, +the number 0 is treated as 256 for this operation; its discrete base-2 +logarithm is 8.</p> + +<h3>Stack and Queue Manipulation</h3> + +<p><code>^</code> ("enqueue a copy") pops a symbol off the stack, makes a copy of it, pushes +it back onto the stack, and enqueues the copy onto the queue.</p> + +<p><code>v</code> ("dequeue") dequeues a symbol from the queue and pushes it onto the +stack.</p> + +<p>Using these operations in combination, one can form "discard", "duplicate", +"swap", and other more advanced stack manipulation operations. For example, +assuming an empty queue and more than two elements on the stack, "swap" can +be accomplished with the code <code>^v^-+^^v^v^v-+^v-+^v-+vv</code>.</p> + +<p>Despite this fact, the operation <code>:</code> duplicates the top value +of the stack. 
(Emmental is not an absolutely minimal language; note, for +instance, that it has all ten decimal digits as operations when these +could surely have been defined in terms of only one or two operations. +The reasons for a separate <code>:</code> operation are given below in the +section on Computational Class.)</p> + +<h3>I/O</h3> + +<p><code>.</code> pops a symbol off the stack and sends it to the +standard output as an ASCII symbol.</p> + +<p><code>,</code> waits for an ASCII symbol to arrive on standard input, +and pushes it onto the stack.</p> + +<h3>Interpreter Modification and Reflection</h3> + +<p>First let's define what it means to <em>pop a string</em> off the stack. +Symbols are popped off the stack until a <code>;</code> symbol is +found on the stack. The symbols popped off are considered a string +in the reverse order they were popped; i.e. the last symbol popped +is the first symbol of the string. The <code>;</code> symbol is +popped off the stack, but is not made part of the string; it is simply discarded.</p> + +<p>Since an Emmental program is a string, popping a program is the same +as popping a string, just that the string is interpreted as a program.</p> + +<p><code>!</code> (sometimes called "supplant") +pops a symbol, which we call <var>s</var>, off the stack. +Then it pops a program <var>t</var>. +It then inserts the association <var>s</var> <code>→</code> <var>t</var> into the +interpreter definition. This overwrites whatever mapping of <var>s</var> might have +been in the interpreter definition previously. This new interpreter definition +is used for all subsequent execution (until it is changed again, of course.)</p> + +<p>Note that <code>!</code> does <em>early binding</em>. That is, +the meaning of each symbol in this program <var>t</var> is +the meaning of that symbol <em>at the time <code>!</code> is executed</em>. 
+If some subsequent <code>!</code> operation later changes the meaning of one +of the symbols that occurs in <var>t</var>, the meaning of <var>t</var> +is not changed. The semantics of <var>t</var> are "captured" or "frozen". +This implies, among other things, that supplanting some +symbol <var>z</var> with itself (a program consisting only of the symbol <var>z</var>) +is a no-op, because <var>z</var>'s meaning, at the time that <code>!</code> +is executed, is invariably <var>z</var>.</p> + +<p><code>?</code> (sometimes called "eval") pops a symbol, which we call <var>s</var>, off the stack. +It then executes that symbol (interprets it as an operation) +with the interpreter currently in effect.</p> + +<p>Note that <code>?</code> does <em>late binding</em>. That is, +in contrast with <code>!</code>, <code>?</code> never "freezes" the semantic +definition of the thing that it is executing. This is true even when +<code>?</code> occurs in an operation redefinition (i.e. the program +that supplanted some symbol's semantics when an <code>!</code> was executed.) +This implies, among other things, that supplanting some symbol <var>z</var> +with the program that consists of instructions that push the ASCII value of +<var>z</var> onto the stack, followed by a <code>?</code> instruction, +creates a <em>cyclic meaning</em> for <var>z</var>. +This is because the <var>z</var> that will be executed by the <code>?</code> +will always be the present <var>z</var>, that is, the <var>z</var> +that is executing the <code>?</code>.</p> + +<p>For convenience's sake, <code>;</code> pushes the symbol <code>;</code> +onto the stack.</p> + +<p>All other symbols are no-ops.</p> + +<h2>Computational Class</h2> + +<p>I believe Emmental is Turing-complete with only the operations that +have been given so far, but I haven't proved it yet. 
+All the elements are there, and although some of them are somewhat "cramped", +they look viable.</p> + +<p>If you want to try thinking about how you could write real programs +(like a universal Turing-machine simulator) in Emmental, you might want +to skip this section, since it contains "spoilers".</p> + +<p>Repetition can be accomplished by assigning a symbol a cyclic semantics, +by use of a <code>?</code> within a <code>!</code>. For example, we can +redefine the semantics of <code>0</code> to be <code>#48?</code>. This is +simply a program that pushes the symbol <code>0</code> onto the stack and +executes it with the current interpreter, and, since <code>0</code> has been +redefined to mean <code>#48?</code> in the current interpreter, this will +loop forever. The entire program to do this to <code>0</code> and run the +infinite loop is:</p> + +<pre> +;#35#52#56#63#48!0 +</pre> + +<p>This technique can also be used to "jump" from one definition to +another, by using <code>?</code> to execute some <em>other</em> symbol +at the end of a definition (that is, some symbol other than the symbol +being defined.)</p> + +<p>Conditionals are a little more restrictive. The trick to them is, +strangely, the discrete log operator <code>~</code> in combination +with the eval operator <code>?</code>. Since <code>~</code> maps +all symbols into a set of nine symbols, and <code>?</code> executes +the symbol on the stack, <code>~?</code> will execute one of the +symbols from ASCII 0 (NUL) to ASCII 8 (BS). We can then, for instance, +define NUL to do one thing, define SOH through BEL to do the same +as NUL, and define BS to do some other thing; this essentially +distinguishes between 0 (which executes BS) and every other value +(which executes NUL). Further, we can use this in conjunction +with <code>-</code> to compare two values for equality. 
So, for +example, a program which inputs a character, and outputs Y if +the character is M and N otherwise, would be:</p> + +<pre> +#59#35#55#56#46#!;##1!;##2!;##3!;##4!;##5!;##6!;##7!#59#35#56#57#46#8!,#77-~? +</pre> + +<p>In case NUL through BS are in use for some reason, we can always +add 9 to the result of the log (<code>~#9+?</code>) +to map the answer onto HT through DC1. Or, of course, any of +a great number of other arithmetical mappings of our choosing. +The most severe constraint is that there be 9 available symbols +to act as "destinations" for our "branch". Even if we never +overwrite one "branch" with another (and we can do that in Emmental!) +and even if we allocate one extra symbol to be the "launch point" +of the branch, we still have room for 25 branches in the ASCII +character set.</p> + +<p>So these parts look good. If there's a problem, it's +with the queue. Specifically, the problem seems to be +the need to know the present size of the queue in order to do stack housework +like "duplicate" and the subsequent need for "duplicate" to achieve +"discard." (Duplicate can be defined as <code>^v</code>, but this only +works when the queue is empty. Discard can be defined as duplicate plus +<code>-+</code>, but this only works when there are other elements below +the element being discarded. [This last point is not generally a problem +since we can push arbitrary values onto the stack before any +program.])</p> + +<p>However, if it turns out that we need "duplicate" or "discard" in order +to write routines that can handle a variable-sized queue — and that +strikes me as likely — then it looks like we have a severe problem.</p> + +<p>Here's one way I could try to deal with it. I could say that the queue +is <em>local</em> to the operation being defined (or the main program.) 
+Then you could define <code>:</code> to be <code>^v</code>, and inside +<code>:</code>'s definition, the queue would always initially be empty, +so the definition would work.</p> + +<p>But... we need the queue to store our global data. For example, if +we are going to simulate a Turing machine, we'd need to use the queue +to store the tape (perhaps "doubled up", with one symbol of each +pair telling us "the next symbol is a simulated tape symbol" or +"the next symbol is some housekeeping value.") We can't store the +tape on just one stack. And, once you are looping in Emmental, you've +left the "main program" forever; you're jumping from definition to +definition, and each has their own queue. At best, you'd need to +"dump" the queue onto the stack each time you switched definitions, +and even then you'd need a loop to do that, and to loop you need to +switch definitions. It's a royal mess.</p> + +<p>So here's how I will deal with it. I will add a primitive duplicate +operation, <code>:</code>, to Emmental. Proving that Emmental is Turing-complete +is still, then, a challenge, although a doable-seeming challenge. +I will then propose a more formidable challenge: prove that +the language formed by removing the <code>:</code> operation from +Emmental is Turing-complete. (Equivalently, prove that the set of +Emmental programs that begin with <code>;#0#58!</code> is Turing-complete. +The nice thing about Emmental is that you can always shoot yourself in +the foot — until you erase your pistol, that is.)</p> + +<p>And if you <em>really</em> like a challenge, try proving that Emmental +without <code>~</code> is Turing-complete. I don't think that it is, +although it's possible for it to compute parity, at least (input a +symbol, output E if its ASCII value is even, and O if it's odd. +To accomplish this, multiply the input's ASCII value by 128 by adding 127 +copies of it to it; this is modulo 256, so the only results can be +0 or 128. 
Define those operations to print out E and O respectively. +But that's as far as I've gotten.)</p> + +<h2>Discussion</h2> + +<h3>Design Decisions</h3> + +<p>I would've liked to have given Emmental a <code>'</code> or <code>"</code> +instruction similar to Funge's "quote" and "quote-mode" instructions; +instructions that treat one or more of the following symbols in the program +literally, pushing them, as symbols, onto the stack, instead of executing them. +However, such instructions are somewhat problematic, both theoretically and +(for the approach I took implementing Emmental) practically. There are +two ways of thinking about the problems that arise.</p> + +<p>One is that the function which implements <code>'</code> is given +access to the program text itself, and possibly the position within the program, +and it uses these to extract the "immediate mode" symbol past the <code>'</code>. This information +could be available because these pieces of information are considered extra arguments +to the function, or because they are (gorily) considered part of the overall +program state. Either way, this operation is given a lot of information to work with, +and for consistency (since we want to be all nice and neat and say that all operations +have the same signature so that our dictionary has a nice uniform type,) <em>all</em> operations +have access to this information. This is almost too much information; that is, it +is so much that operations don't really <em>need</em> the dictionary. We could just say +there is <em>one</em> operation, defined by a function, and that function is given the +current symbol and has to decide what it means through whatever means it likes.</p> + +<p>This approach is very powerful, of course, but it's just not the style that +Emmental embodies. 
(In fact, the idea to view interpreters as dictionaries +was one of the foundational design choices for Emmental, to the point where I +started constructing a "little theory of interpreters as maps." It really +wasn't exploited as much as I think it could have been. If an interpreter is +a map of symbols to strings of symbols, it's much more tractable than an +opaque function would be; you can define all sorts of operations on them, +for example concatenating two interpreters (for all symbols <var>s</var> +in interpreter <var>a</var> and interpreter <var>b</var>, +<var>c</var>[<var>s</var>] <code>→</code> <var>a</var>[<var>s</var>]<var>b</var>[<var>s</var>] — +that sort of thing,) computing union or intersection of interpreters, +Cartesian product, etc.)</p> + +<p>The other way of looking at it is to say that there are in fact +<em>multiple</em> meta-circular interpreters available inside Emmental, and symbols +like <code>'</code> switch temporarily to an alternate MCI. This alternate MCI +interprets every symbol as "push this symbol", then reinstates the previous MCI. +I like this explication better than the one above — MCIs begin +to look a bit like continuations! — but to do it justice would take some +work. I envision a language where the program has fine control +over which MCI is in effect, possibly by keeping a map from symbols to MCIs, +or maybe even being able to push MCIs onto the stack. +This is a wee bit much for Emmental too though, and +perhaps I'll explore these possibilities in a future language.</p> + +<h3>Turing-completeness</h3> + +<p>You can make the argument that Emmental's way of being Turing-complete +is really nothing new: when you redefine some symbol, you're really just +defining a new function, and when you use <code>?</code> to execute that +symbol from within its own definition, you're just making a recursive +function call.</p> + +<p>Well, yes, you can make that argument. 
But it has to do with how you think +about "what is a language", I think. Does a Pascal program fragment which +defines a procedure called <code>PrintFibonacci</code> represent another +programming language, one different from Pascal? You could certainly +say that it does — it's +the language Pascal where the token <code>PrintFibonacci</code> has +some meaning that it doesn't have in Pascal.</p> + +<p>In that view, any language where you can define procedures, or +functions, or standard libraries, or the like, is an extensible language. +But even a language where you <em>can't</em> define new procedures or +functions is arguably an extensible language. Take some initial Brainfuck +program fragment, for instance. After it executes, it leaves the Brainfuck +tape and while-stack in some state that depends on the input. +Any Brainfuck fragment that executes after that, will execute in that +environment, and that environment is arguably a version of the language +Brainfuck, suitably extended.</p> + +<p>You don't normally think of it that way, I bet, but you +<em>could</em> — and you would need to, to some degree, +to claim that Emmental is "just" defining new functions. +The reason you don't typically look at languages like this +(unless you are very strange) is because it's much more useful to +divide the world into "languages" and "programs." And Emmental +<em>does</em> make this division, it just makes it in a slightly +different place than usual.</p> + +<p>As far as I'm concerned, if I describe what Emmental does as +modifying the Emmental language via its MCI, and what Emmental actually does is +consistent with the idea of modifying the Emmental language via its MCI, then +what Emmental effectively does is modify the Emmental language via its +MCI. 
And if it needs to do this in a certain way in order to simulate +a universal Turing machine, then that difference (however slight) +sets it apart from systems where this simulation needs to be done by defining +recursive functions.</p> + +<h2>Implementation</h2> + +<p><code>emmental.hs</code> is a reference interpreter for Emmental +written in Haskell. Run the function <code>emmental</code> on a +string; you can also run <code>debug</code> on a string to view +the state of the program (stack & queue) during execution. +(Note that <code>debug</code> is <em>not</em> able to show +program states that occur internal to an operation.)</p> + +<p>Happy interpreter-redefining! +<br>Chris Pressey +<br>Chicago, IL +<br>November 11, 2007</p> + +</body> +</html> diff --git a/src/emmental.hs b/src/emmental.hs new file mode 100755 index 0000000..25537ed --- /dev/null +++ b/src/emmental.hs @@ -0,0 +1,447 @@ +-- +-- emmental.hs +-- Interpreter for the Emmental Programming Language +-- Chris Pressey, Cat's Eye Technologies +-- +-- $Id: emmental.hs 5 2007-11-12 04:36:43Z catseye $ +-- + +-- +-- Copyright (c)2007 Cat's Eye Technologies. All rights reserved. +-- +-- Redistribution and use in source and binary forms, with or without +-- modification, are permitted provided that the following conditions +-- are met: +-- +-- 1. Redistributions of source code must retain the above copyright +-- notices, this list of conditions and the following disclaimer. +-- 2. Redistributions in binary form must reproduce the above copyright +-- notices, this list of conditions, and the following disclaimer in +-- the documentation and/or other materials provided with the +-- distribution. +-- 3. Neither the names of the copyright holders nor the names of their +-- contributors may be used to endorse or promote products derived +-- from this software without specific prior written permission. 
+-- +-- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +-- ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +-- LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS +-- FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE +-- COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, +-- INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +-- BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +-- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +-- CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +-- LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN +-- ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +-- POSSIBILITY OF SUCH DAMAGE. +-- + + +import qualified Data.Map as Map +import qualified Data.Char as Char + +----------------------------------------------------------------------- +-- ============================ Symbols ============================ -- +----------------------------------------------------------------------- + +type Symbol = Char + + +----------------------------------------------------------------------- +-- ======================== Program States ========================= -- +----------------------------------------------------------------------- + +data State = State [Symbol] [Symbol] + deriving (Ord, Eq, Show) + +pop (State (head:tail) queue) = (head, State tail queue) +push (State list queue) sym = (State (sym:list) queue) + +popString (State (';':tail) queue) = ([], State tail queue) +popString (State (head:tail) queue) = + let + (string, state') = popString (State tail queue) + in + (string ++ [head], state') + +enqueue (State stack queue) symbol = State stack (symbol:queue) +dequeue (State stack queue) = + let + symbol = last queue + queue' = init queue + in + (symbol, State stack queue') + + +----------------------------------------------------------------------- +-- 
========================= Interpreters ========================== -- +----------------------------------------------------------------------- + +data Interpreter = Interp (Map.Map Symbol Operation) + +fetch (Interp map) sym = Map.findWithDefault (fitRegOp opNop) sym map +supplant (Interp map) sym op = (Interp (Map.insert sym op map)) + + +----------------------------------------------------------------------- +-- ========================== Operations =========================== -- +----------------------------------------------------------------------- + +type Operation = State -> Interpreter -> IO (State, Interpreter) + +composeOps :: Operation -> Operation -> Operation + +composeOps op1 op2 = f where + f state interpreter = do + (state', interpreter') <- op1 state interpreter + op2 state' interpreter' + +createOp :: Interpreter -> [Symbol] -> Operation + +createOp interpreter [] = + (fitRegOp opNop) +createOp interpreter (head:tail) = + composeOps (fetch interpreter head) (createOp interpreter tail) + +-- +-- It's useful for us to express a lot of our operators as non-monadic +-- functions that don't affect the interpreter. This is a little "adapter" +-- function that lets us create monadic functions with the right signature +-- from them. +-- + +fitRegOp :: (State -> State) -> Operation + +fitRegOp regop = f where + f state interpreter = + let + state' = regop state + in + do return (state', interpreter) + + +------------------------------------------------------------ +--------------- The operations themselves. ----------------- +------------------------------------------------------------ + +-- +-- Redefine the meaning of the symbol on the stack with +-- a mini-program also popped off the stack. 
+-- + +opSupplant state interpreter = + let + (opSym, state') = pop state + (newOpDefn, state'') = popString state' + newOp = createOp interpreter newOpDefn + in + do return (state'', supplant interpreter opSym newOp) + +-- +-- Execute the symbol on the stack with the current interpreter. +-- + +opEval state interpreter = + let + (opSym, state') = pop state + newOp = createOp interpreter [opSym] + in + newOp state' interpreter + +-- +-- I/O. +-- + +opInput state interpreter = do + symbol <- getChar + do return (push state symbol, interpreter) + +opOutput state interpreter = + let + (symbol, state') = pop state + in do + putChar symbol + return (state', interpreter) + +-- +-- Primitive arithmetic. +-- + +opAdd state = + let + (symA, state') = pop state + (symB, state'') = pop state' + in + push state'' (Char.chr (((Char.ord symB) + (Char.ord symA)) `mod` 256)) + +opSubtract state = + let + (symA, state') = pop state + (symB, state'') = pop state' + in + push state'' (Char.chr (((Char.ord symB) - (Char.ord symA)) `mod` 256)) + +discreteLog 0 = 8 +discreteLog 1 = 0 +discreteLog 2 = 1 +discreteLog n = (discreteLog (n `div` 2)) + 1 + +opDiscreteLog state = + let + (symbol, state') = pop state + in + push state' (Char.chr (discreteLog (Char.ord symbol))) + +-- +-- Stack manipulation. +-- + +-- +-- Pop the top symbol of the stack, make a copy of it, push it back onto the +-- stack, and enqueue the copy onto the queue. +-- + +opEnqueueCopy state = + let + (sym, _) = pop state + in + enqueue state sym + +-- +-- Dequeue a symbol from the queue and push it onto the stack. +-- + +opDequeue state = + let + (sym, state') = dequeue state + in + push state' sym + +-- +-- Duplicate the top symbol of the stack. +-- + +opDuplicate state = + let + (symbol, _) = pop state + in + push state symbol + +-- +-- Miscellaneous operations. +-- + +opNop state = + state + +-- +-- Parameterizable operations. 
+-- + +opPushValue value state = + push state (Char.chr value) + +opAccumValue value state = + let + (sym, state') = pop state + value' = ((Char.ord sym) * 10) + value + in + push state' (Char.chr (value' `mod` 256)) + + +----------------------------------------------------------------------- +-- ===================== Debugging Functions ======================= -- +----------------------------------------------------------------------- + +type Debugger = State -> Interpreter -> IO () + +debugNop s i = do + return () + +debugPrintState s i = do + putStr ((show s) ++ "\n") + return () + + +----------------------------------------------------------------------- +-- ============================ Executor =========================== -- +----------------------------------------------------------------------- + +execute :: [Symbol] -> State -> Interpreter -> Debugger -> IO (State, Interpreter) + +execute [] state interpreter debugger = + return (state, interpreter) +execute (opSym:program') state interpreter debugger = + let + operation = fetch interpreter opSym + in do + (state', interpreter') <- operation state interpreter + debugger state' interpreter' + execute program' state' interpreter' debugger + + +----------------------------------------------------------------------- +-- ====================== Top-Level Function ======================= -- +----------------------------------------------------------------------- + +initialInterpreter = Interp + (Map.fromList + [ + ('.', opOutput), + (',', opInput), + + ('#', fitRegOp (opPushValue 0)), + ('0', fitRegOp (opAccumValue 0)), + ('1', fitRegOp (opAccumValue 1)), + ('2', fitRegOp (opAccumValue 2)), + ('3', fitRegOp (opAccumValue 3)), + ('4', fitRegOp (opAccumValue 4)), + ('5', fitRegOp (opAccumValue 5)), + ('6', fitRegOp (opAccumValue 6)), + ('7', fitRegOp (opAccumValue 7)), + ('8', fitRegOp (opAccumValue 8)), + ('9', fitRegOp (opAccumValue 9)), + + ('+', fitRegOp opAdd), + ('-', fitRegOp opSubtract), + ('~', fitRegOp 
opDiscreteLog), + + ('^', fitRegOp opEnqueueCopy), + ('v', fitRegOp opDequeue), + (':', fitRegOp opDuplicate), + + ('!', opSupplant), + ('?', opEval), + + (';', fitRegOp (opPushValue (Char.ord ';'))) + ] + ) + +initialState = State [] [] + +emmental string = do + (state, interpreter) <- execute string initialState initialInterpreter debugNop + return state + +debug string = do + (state, interpreter) <- execute string initialState initialInterpreter debugPrintState + return state + + +----------------------------------------------------------------------- +-- ========================== Test Cases =========================== -- +----------------------------------------------------------------------- + +-- +-- Drivers for test cases. 'demo' runs them straight, whereas 'test' +-- uses the debugger. +-- + +demo n = emmental (testProg n) + +test n = debug (testProg n) + +-- +-- Here we introduce a bit of a cheat, in order to make writing +-- complex Emmental programs tolerable. You can still see the +-- programs in their full glory by executing "show (testProg n)". +-- + +quote [] = [] +quote (symbol:rest) = "#" ++ (show (Char.ord symbol)) ++ (quote rest) + +-- +-- Add one and one. +-- + +testProg 1 = "#1#1+" + +-- +-- Redefine & as "+". +-- + +testProg 2 = ";#43#38!#1#1&" -- 59,43,38 ==> ";+&" + +-- +-- Redefine 0 as "9". +-- + +testProg 3 = ";#57#48!#0" -- 59,57,48 ==> ";90" + +-- +-- Redefine 0 as "#48?". This results in an infinite loop when 0 is executed. +-- + +testProg 4 = ";#35#52#56#63#48!0" -- 59,35,52,56,63,48 ==> ";#48?0" + +-- +-- Redefine $ as ".#36?". This results in a loop that pops symbols and +-- prints them, until the stack underflows, when $ is executed. +-- + +testProg 5 = ";#46#35#51#54#63#36! #65#66#67#68#69$" + +-- +-- Duplicate the top stack element (assuming an empty queue.) +-- This shows that the : operation is not strictly necessary +-- (when you know the size of the queue.) 
+-- + +testProg 6 = "#65^v" + +-- +-- Discard the top stack element (assuming more than one element +-- on the stack, and an empty queue.) +-- + +testProg 7 = "#33#123^v-+" + +-- +-- Swap the top two elements of the stack (assuming an empty queue.) +-- + +testProg 8 = "#67#66#65^v^-+^^v^v^v-+^v-+^v-+vv" + +-- +-- Input a symbol. Report whether its ASCII value is even or odd. +-- + +testProg 9 = (quote ";^v:") ++ "!" ++ -- : = dup + (quote ";#69.") ++ "#!" ++ -- NUL = print "E" + (quote ";#79.") ++ "#128!" ++ -- \128 = print "O" + (quote (";" ++ (take 127 [':',':'..]) ++ -- m = mul by 128 + (take 127 ['+','+'..]) ++ "m")) ++ "!" ++ + ",m?" + +-- +-- Input a symbol. Report whether it is M or not. +-- + +testProg 10 = (quote ";#78.") ++ "#!" ++ -- NUL = print "N" + ";##1!" ++ -- SOH = same as NUL + ";##2!" ++ -- STX = same as NUL + ";##3!" ++ -- ETX = same as NUL + ";##4!" ++ -- EOT = same as NUL + ";##5!" ++ -- ENQ = same as NUL + ";##6!" ++ -- ACK = same as NUL + ";##7!" ++ -- BEL = same as NUL + (quote ";#89.") ++ "#8!" ++ -- BS = print "Y" + ",#77-~?" + +-- +-- Same as testProg 5, except stop printing when a NUL is +-- encountered, instead of just underflowing the stack. +-- + +testProg 11 = ";" ++ (quote ":~?$") ++ "!" ++ -- $ = dup & test + ";" ++ (quote ".$") ++ "#!" ++ -- NUL = print & repeat + ";#0#1!" ++ -- SOH = same as NUL + ";#0#2!" ++ -- STX = same as NUL + ";#0#3!" ++ -- ETX = same as NUL + ";#0#4!" ++ -- EOT = same as NUL + ";#0#5!" ++ -- ENQ = same as NUL + ";#0#6!" ++ -- ACK = same as NUL + ";#0#7!" ++ -- BEL = same as NUL + -- BS = stop (nop) + "#0" ++ (quote (reverse "Hello!")) ++ "$"