git @ Cat's Eye Technologies Emmental / rel_1_0_2007_1111
Initial import of Emmental version 1.0 revision 2007.1111 sources. Cat's Eye Technologies 12 years ago
2 changed file(s) with 972 addition(s) and 0 deletion(s). Raw diff Collapse all Expand all
0 <html><head><title>The Emmental Programming Language</title></head>
1 <body>
2
3 <h1>The Emmental Programming Language</h1>
4
5 <p>November 2007, Chris Pressey, Cat's Eye Technologies</p>
6
7 <h2>Introduction</h2>
8
9 <p>Emmental is a self-modifying programming language. That is not to say
10 that is a language in which programs are self-modifying; rather, it is
11 the language itself, as defined by a meta-circular interpreter, that can be
12 modified during the course of a running program. Indeed, this is how
13 Emmental, without conventional conditional and repetition/recursion constructs,
14 achieves Turing-completeness.</p>
15
16 <h2>Meta-circular Interpreters</h2>
17
18 <p>One way to attempt to define a language is by giving what's called
19 a <em>meta-circular interpreter</em> (often shortened to "MCI" in this document.)
20 This is an interpreter for some language which is written in that same language
21 (or at least, a language which is very close to it.)</p>
22
23 <p>Meta-circular interpreters were a popular way to the describe the
24 semantics of programming languages, especially
25 LISP-like languages, and especially before the advent of denotational
26 semantics. The term "meta-circular" was apparently coined by John C. Reynolds
27 in his paper "Definitional Interpreters for Higher-Order Programming
28 Languages" (1972 Proceedings ACM National Conference.)</p>
29
30 <p>Of course, in the real world, MCI's are not often used.
31 They certainly <em>can</em> be used: if you have a working Scheme interpreter
32 that came with your computer system, there
33 is nothing stopping you from writing another Scheme interpreter in Scheme,
34 and running your programs on your interpreter (which is itself running on your
35 system's interpreter.) However, this is quite a bit
36 less efficient due to the duplication of effort. A somewhat more realistic
37 case might be if your system came with, say, a Scheme compiler. You might then feed your
38 Scheme interpreter (written in Scheme) through that to make a native Scheme
39 interpreter, and use that to interpret your programs.
40 (In this setup, the interpreter is usually described as "self-hosting"
41 rather than "meta-circular".)</p>
42
43 <p>But, as should be obvious, you already need an implementation of Scheme
44 for your Scheme interpreter written in Scheme to be of much practical
45 use to you. If your meta-circular interpreter is all you have, you won't
46 be able to use it to run or understand Scheme programs. Because the MCI is
47 defined in terms of itself, you'll need some other source of "understanding
48 how it works" to make it complete. This understanding might come
49 from an implementation in some other programming language, or a specification
50 in some formal language, or a description in some natural language, or simply
51 from intuition &mdash; but it has to come from somewhere.</p>
52
53 <p>Assuming that we do have that external source of understanding, the
54 meta-circular interpreter can come in quite handy in codifying the semantics
55 of the language. And, in Emmental's case, <em>altering</em> those semantics:
56 Emmental's MCI supports operations which instruct Emmental's MCI to modify
57 its behaviour.</p>
58
59 <h2>Interpreter Structure</h2>
60
61 <p>To describe the structure of Emmental's MCI, we first examine the
62 general structure of interpreters. If you've ever written a virtual machine
63 executor in, say, C, you've noticed that it has the general form</p>
64
65 <pre>
66 pc = start;
67 while (!done) {
68 switch (instruction[pc]) {
69 case INSTR_ONE:
70 /* implement semantics of INSTR_ONE */
71 pc += advance(INSTR_ONE);
72 break;
73 case INSTR_TWO:
74 /* implement semantics of INSTR_TWO */
75 pc += advance(INSTR_TWO);
76 break;
77 /* ... */
78 case INSTR_HALT:
79 done = 1;
80 break;
81 default:
82 perror("Invalid opcode");
83 }
84 }
85 </pre>
86
87 <p>Note that <code>advance()</code> is some function that computes how far the
88 program counter is advanced on that instruction. This value is typically
89 +1 for most instructions, but more or less than 1 (and dependent on the
90 state of the program) for a handful of "branch" instructions.
91 Note also that <code>advance()</code> would not typically be implemented in C
92 as a function; I'm just showing it like this to emphasize the
93 regular structure.</p>
94
95 <p>From this we infer that the basic structure of an interpreter is
96 a <em>dictionary</em> or <em>map</em> that associates program symbols with operations.
97 There is some extra housekeeping like the fetch-execute cycle that surrounds
98 this dictionary, but this can (hopefully) be handled mostly automatically,
99 freeing us to concentrate on <em>symbols</em> and <em>operations</em>.</p>
100
101 <p>The symbols could be taken from any finite alphabet, but in Emmental,
102 to keep things relatively simple, we'll just use the ASCII character set.
103 (Actually, to be precise, this is the full 8-bit ASCII character set.
104 Non-printable control characters are allowed, as are characters between
105 128 and 255, and each has a distinct meaning. But their representations
106 are not defined.)</p>
107
108 <p>The operations can be thought of, abstractly, as functions which
109 transform program states. Or they can be thought of, concretely, as
110 segments of code &mdash; mini-programs which implement these functions.
111 In the case of a meta-circular interpreter, these mini-programs would be written
112 <em>in the language being interpreted</em>.</p>
113
114 <p>To extend this idea to a <em>self-modifying</em> meta-circular interpreter,
115 the operations can be thought of as functions which transform
116 both program states <em>and</em> interpreter definitions.
117 (Alternatively, the interpreter definition could be thought of as
118 part of the program state, although I feel that's a bit gorier a way
119 to look at it, and I prefer the other view, at least for Emmental.)</p>
120
121 <p>In Emmental, most operations leave the interpreter definition unchanged.
122 However, there is one operation which alters the interpreter
123 definition, and it is this altered definition that is used to interpret
124 the remainder of the program.</p>
125
126 <h2>Emmental Semantics (in Emmental)</h2>
127
128 <p>Emmental is essentially a stack-based language. (There's also a queue,
129 but it's not as central.) All operations implicitly
130 get data from, and implicitly deposit results back onto, a single
131 stack. For orthogonality's sake, this stack may contain only ASCII symbols.
132 (And note that trying to pop an empty stack, or dequeue an empty queue,
133 is an error that aborts the program.)</p>
134
135 <p>Note that because we've established that an interpreter (at least,
136 insofar as Emmental ever needs to know) is simply a map that takes
137 symbols to operations, and that operations in Emmental are defined
138 (meta-circularly) as Emmental programs, we can use the following notation
139 to describe interpreters:</p>
140
141 <pre>
142 % &rarr; XYZ+*!
143 &amp; &rarr; 'ap'ag'ag
144 </pre>
145
146 <p>That is, the symbol <code>%</code>, when encountered in an Emmental
147 program, indicates an operation that
148 is defined by the Emmental program <code>XYZ+*!</code>, and so forth.</p>
149
150 <p>When a main Emmental program begins execution for the first time,
151 it starts with what's called the <em>initial Emmental interpreter</em>.
152 (This fact, of course, doesn't apply to any further point of execution inside
153 an Emmental program, or execution of operations defined in
154 Emmental's MCI, since these would be considered subprograms of a sort.
155 These cases use whichever interpreter happens to be in force in that point in time.)</p>
156
157 <p>The inital Emmental interpreter is defined as follows:</p>
158
159 <pre>
160 a &rarr; a
161 b &rarr; b
162 c &rarr; c
163 ...
164 </pre>
165
166 <p>That is, for every symbol <var>x</var> in the ASCII set,
167 <var>x</var> <code>&rarr;</code> <var>x</var>.</p>
168
169 <p>Doesn't tell us a lot about Emmental's semantics, does it? No.
170 Nothing at all, really. But remember what I said about needing an external
171 source of understanding, in order to actually get full value out of an MCI.
172 And remember the purpose of Emmental's MCI: it is
173 not there so much to help us understand Emmental, but to allow us to
174 <em>change</em> Emmental, from inside an Emmental program.</p>
175
176 <p>And, for all that our description of the initial Emmental interpreter
177 is unhelpfully tautological, it is not incorrect: the semantics of
178 <code>a</code> can in fact be thought of as being defined by an Emmental
179 program that consists of only one instruction, <code>a</code>. This
180 happy state of affairs comes about because Emmental is stack-based;
181 the "signature" (the requirements for the "before" and "after" stacks)
182 of the symbol <code>a</code> is the same as the signature of the program
183 containing the single symbol <code>a</code>. No extra syntax to specify
184 arity and the like is necessary.</p>
185
186 <p>Above all, don't panic: we <em>will</em> describe what symbols like
187 <code>a</code> actually mean in Emmental, we'll just need to do it in something
188 besides Emmental. In fact, let's do that right now.</p>
189
190 <h2>Emmental Semantics (in English)</h2>
191
192 <h3>Primitive Arithmetic</h3>
193
194 <p><code>#</code> pushes the symbol NUL (ASCII 0) onto the stack.</p>
195
196 <p>The symbols <code>0</code>, <code>1</code>, ... <code>9</code>
197 pop a symbol from the stack, multiply its ASCII value by 10 modulo 256, add
198 the value 0, 1, ... 9 (respectively) to that value modulo 256, and
199 push the resulting symbol back onto the stack.</p>
200
201 <p>The upshot of these 11 operations is that you can push arbitrary
202 symbols onto the stack by spelling out their ASCII values in decimal.
203 For example, <code>#64</code> pushes a <code>@</code> onto the stack.</p>
204
205 <p><code>+</code> pops two symbols off the stack, adds together their
206 ASCII values modulo 256, and pushes the symbol with the resultant ASCII value
207 back onto the stack.</p>
208
209 <p><code>-</code> pops two symbols off the stack, subtracts the ASCII value
210 of the first popped from the ASCII value of the second popped modulo 256,
211 and pushes the symbol with the resultant ASCII value back onto the stack.</p>
212
213 <p><code>~</code> ("log") pops a symbol from the stack, computes the discrete
214 base-2 logarithm of the ASCII value of that symbol,
215 and pushes the symbol with the resultant ASCII value back onto the stack.
216 The discrete base-2 logarithm of a number is the floor or integer part
217 of the base-2 logarithm of that number. Alternately, it is the number of
218 the highest bit position (starting with the LSB = bit position 0)
219 with a bit set when the number is viewed as binary.
220 Because the base-2 logarithm of the number 0 itself is undefined,
221 the number 0 is treated as 256 for this operation; its discrete base-2
222 logarithm is 8.</p>
223
224 <h3>Stack and Queue Manipulation</h3>
225
226 <p><code>^</code> ("enqueue a copy") pops a symbol off the stack, makes a copy of it, pushes
227 it back onto the stack, and enqueues the copy onto the queue.</p>
228
229 <p><code>v</code> ("dequeue") dequeues a symbol from the queue and pushes it onto the
230 stack.</p>
231
232 <p>Using these operations in combination, one can form "discard", "duplicate",
233 "swap", and other more advanced stack manipulation operations. For example,
234 assuming an empty queue and more than two elements on the stack, "swap" can
235 be accomplished with the code <code>^v^-+^^v^v^v-+^v-+^v-+vv</code>.</p>
236
237 <p>Despite this fact, the operation <code>:</code> duplicates the top value
238 of the stack. (Emmental is not an absolutely minimal language; note, for
239 instance, that it has all ten decimal digits as operations when these
240 could surely have been defined in terms of only one or two operations.
241 The reasons for a seperate <code>:</code> operation are given below in the
242 section on Computational Class.)</p>
243
244 <h3>I/O</h3>
245
246 <p><code>.</code> pops a symbol off the stack and sends it to the
247 standard output as an ASCII symbol.</p>
248
249 <p><code>,</code> waits for an ASCII symbol to arrive on standard input,
250 and pushes it onto the stack.</p>
251
252 <h3>Interpreter Modification and Reflection</h3>
253
254 <p>First let's define what it means to <em>pop a string</em> off the stack.
255 Symbols are popped off the stack until a <code>;</code> symbol is
256 found on the stack. The symbols popped off are considered a string
257 in the reverse order they were popped; i.e. the last symbol popped
258 is the first symbol of the string. The <code>;</code> symbol is
259 popped off the stack, but is not made part of the string; it is simply discarded.</p>
260
261 <p>Since an Emmental program is a string, popping a program is the same
262 as popping a string, just that the string is interpreted as a program.</p>
263
264 <p><code>!</code> (sometimes called "supplant")
265 pops a symbol, which we call <var>s</var>, off the stack.
266 Then it pops a program <var>t</var>.
267 It then inserts the association <var>s</var> <code>&rarr;</code> <var>t</var> into the
268 interpreter definition. This overwrites whatever mapping of <var>s</var> might have
269 been in the interpreter definition previously. This new interpreter definition
270 is used for all subsequent execution (until it is changed again, of course.)</p>
271
272 <p>Note that <code>!</code> does <em>early binding</em>. That is,
273 the meaning of each symbol in this program <var>t</var> is
274 the meaning of that symbol <em>at the time <code>!</code> is executed</em>.
275 If some subsequent <code>!</code> operation later changes the meaning of one
276 of the symbols that occurs in <var>t</var>, the meaning of <var>t</var>
277 is not changed. The semantics of <var>t</var> are "captured" or "frozen".
278 This implies, among other things, that supplanting some
279 symbol <var>z</var> with itself (a program consisting only of the symbol <var>z</var>)
280 is a no-op, because <var>z</var>'s meaning, at the time that <code>!</code>
281 is executed, is invariably <var>z</var>.</p>
282
283 <p><code>?</code> (sometimes called "eval") pops a symbol, which we call <var>s</var>, off the stack.
284 It then executes that symbol (interprets it as an operation)
285 with the interpreter currently in effect.</p>
286
287 <p>Note that <code>?</code> does <em>late binding</em>. That is,
288 in contrast with <code>!</code>, <code>?</code> never "freezes" the semantic
289 definition of the thing that it is executing. This is true even when
290 <code>?</code> occurs in a operation redefinition (i.e. the program
291 that supplanted some symbol's semantics when an <code>!</code> was executed.)
292 This implies, among other things, that supplanting some symbol <var>z</var>
293 with the program that consists of instructions that push the ASCII value of
294 <var>z</var> onto the stack, followed by a <code>?</code> instruction,
295 creates a <em>cyclic meaning</em> for <var>z</var>.
296 This is because the <var>z</var> that will be executed by the <code>?</code>
297 will always be the present <var>z</var>, that is, the <var>z</var>
298 that is executing the <code>?</code>.</p>
299
300 <p>For convenience's sake, <code>;</code> pushes the symbol <code>;</code>
301 onto the stack.</p>
302
303 <p>All other symbols are no-ops.</p>
304
305 <h2>Computational Class</h2>
306
307 <p>I believe Emmental is Turing-complete with only the operations that
308 have been given so far, but I haven't proved it yet.
309 All the elements are there, and although some of them are somewhat "cramped",
310 they look viable.</p>
311
312 <p>If you want to try thinking about how you could write real programs
313 (like a universal Turing-machine simulator) in Emmental, you might want
314 to skip this section, since it contains "spoilers".</p>
315
316 <p>Repetition can be accomplished by assigning a symbol a cyclic semantics,
317 by use of a <code>?</code> within a <code>!</code>. For example, we can
318 redefine the semantics of <code>0</code> to be <code>#48?</code>. This is
319 simply a program that pushes the symbol <code>0</code> onto the stack and
320 executes it with the current interpreter, and, since <code>0</code> has been
321 redefined to mean <code>#48?</code> in the current interpreter, this will
322 loop forever. The entire program to do this to <code>0</code> and run the
323 infinite loop is:</p>
324
325 <pre>
326 ;#35#52#56#63#48!0
327 </pre>
328
329 <p>This technique can also be used to "jump" from one definition to
330 another, by using <code>?</code> to execute some <em>other</em> symbol
331 at the end of a definition (that is, some symbol other than the symbol
332 being defined.)</p>
333
334 <p>Conditionals are a little more restrictive. The trick to them is,
335 strangely, the discrete log operator <code>~</code> in combination
336 with the eval operator <code>?</code>. Since <code>~</code> maps
337 all symbols into a set of nine symbols, and <code>?</code> executes
338 the symbol on the stack, <code>~?</code> will execute one of the
339 symbols from ASCII 0 (NUL) to ASCII 8 (BS). We can then, for instance,
340 define NUL to do one thing, define SOH through BEL to do the same
341 as NUL, and define BS to do some other thing; this essentially
342 distinguishes between 0 (which executes BS) and every other value
343 (which executes NUL). Further, we can use this in conjunction
344 with <code>-</code> to compare two values for equality. So, for
345 example, a program which inputs a character, and outputs Y if
346 the character is M and N otherwise, would be:</p>
347
348 <pre>
349 #59#35#55#56#46#!;##1!;##2!;##3!;##4!;##5!;##6!;##7!#59#35#56#57#46#8!,#77-~?
350 </pre>
351
352 <p>In case NUL through BS are in use for some reason, we can always
353 add 9 to the result of the log (<code>~#9+?</code>)
354 to map the answer onto HT through DC1. Or, of course, any of
355 a great number of other arithmetical mappings of our choosing.
356 The most severe constraint is that there be 9 available symbols
357 to act as "destinations" for our "branch". Even if we never
358 overwrite one "branch" with another (and we can do that in Emmental!)
359 and even if we allocate one extra symbol to be the "launch point"
360 of the branch, we still have room for 25 branches in the ASCII
361 character set.</p>
362
363 <p>So these parts look good. If there's a problem, it's
364 with the queue. Specifically, the problem seems to be
365 the need to know the present size of the queue in order to do stack housework
366 like "duplicate" and the subsequent need for "duplicate" to achieve
367 "discard." (Duplicate can be defined as <code>^v</code>, but this only
368 works when the queue is empty. Discard can be defined as duplicate plus
369 <code>-+</code>, but this only works when there are other elements below
370 the element being discarded. [This last point is not generally a problem
371 since we can push arbitrary values onto the stack before any
372 program.])</p>
373
374 <p>However, if it turns out that we need "duplicate" or "discard" in order
375 to write routines that can handle a variable-sized queue &mdash; and that
376 strikes me as likely &mdash; then it looks like we have a severe problem.</p>
377
378 <p>Here's one way I could try to deal with it. I could say that the queue
379 is <em>local</em> to the operation being defined (or the main program.)
380 Then you could define <code>:</code> to be <code>^v</code>, and inside
381 <code>:</code>'s definition, the queue would always initially be empty,
382 so the definition would work.</p>
383
384 <p>But... we need the queue to store our global data. For example, if
385 we are going to simulate a Turing machine, we'd need to use the queue
386 to store the tape (perhaps "doubled up", with one symbol of each
387 pair telling us "the next symbol is a simulated tape symbol" or
388 "the next symbol is some housekeeping value.") We can't store the
389 tape on just one stack. And, once you are looping in Emmental, you've
390 left the "main program" forever; you're jumping from definition to
391 definition, and each has their own queue. At best, you'd need to
392 "dump" the queue onto the stack each time you switched definitions,
393 and even then you'd need a loop to do that, and to loop you need to
394 switch definitions. It's a royal mess.</p>
395
396 <p>So here's how I will deal with it. I will add a primitive duplicate
397 operation, <code>:</code>, to Emmental. Proving that Emmental is Turing-complete
398 is still, then, a challenge, although a doable-seeming challenge.
399 I will then propose a more formidable challenge: prove that
400 the language formed by removing the <code>:</code> operation from
401 Emmental is Turing-complete. (Equivalently, prove that the set of
402 Emmental programs that begin with <code>;#0#58!</code> is Turing-complete.
403 The nice thing about Emmental is that you can always shoot yourself in
404 the foot &mdash; until you erase your pistol, that is.)</p>
405
406 <p>And if you <em>really</em> like a challenge, try proving that Emmental
407 without <code>~</code> is Turing-complete. I don't think that it is,
408 although it's possible for it to compute parity, at least (input a
409 symbol, output E if its ASCII value is even, and O if it's odd.
410 To accomplish this, multiply the input's ASCII value by 128 by adding 127
411 copies of it to it; this is modulo 256, so the only results can be
412 0 or 128. Define those operations to print out E and O respectively.
413 But that's as far as I've gotten.)</p>
414
415 <h2>Discussion</h2>
416
417 <h3>Design Decisions</h3>
418
419 <p>I would've liked to have given Emmental a <code>'</code> or <code>"</code>
420 instruction similar to Funge's "quote" and "quote-mode" instructions;
421 instructions that treat one or more of the following symbols in the program
422 literally, pushing them, as symbols, onto the stack, instead of executing them.
423 However, such instructions are somewhat problematic, both theoretically and
424 (for the approach I took implementing Emmental) practically. There are
425 two ways of thinking about the problems that arise.</p>
426
427 <p>One is that the function which implements <code>'</code> is given
428 access to the program text itself, and possibly the position within the program,
429 and it uses these to extract the "immediate mode" symbol past the <code>'</code>. This information
430 could be available because these pieces of information are considered extra arguments
431 to the function, or because they are (gorily) considered part of the overall
432 program state. Either way, this operation is given a lot of information to work with,
433 and for consistency (since we want to be all nice and neat and say that all operations
434 have the same signature so that our dictionary has a nice uniform type,) <em>all</em> operations
435 have access to this information. This is almost too much information; that is, it
436 is so much that operations don't really <em>need</em> the dictionary. We could just say
437 there is <em>one</em> operation, defined by a function, and that function is given the
438 current symbol and has to decide what it means through whatever means it likes.</p>
439
440 <p>This approach is very powerful, of course, but it's just not the style that
441 Emmental embodies. (In fact, the idea to view interpreters as dictionaries
442 was one of the foundational design choices for Emmental, to the point where I
443 started constructing a "little theory of interpreters as maps." It really
444 wasn't exploited as much as I think it could have been. If an interpreter is
445 a map of symbols to strings of symbols, it's much more tractable than an
446 opaque function would be; you can define all sorts of operations on them,
447 for example concatenating two interpreters (for all symbols <var>s</var>
448 in interpreter <var>a</var> and interpreter <var>b</var>,
449 <var>c</var>[<var>s</var>] <code>&rarr;</code> <var>a</var>[<var>s</var>]<var>b</var>[<var>s</var>] &mdash;
450 that sort of thing,) computing union or intersection of interpreters,
451 Cartesian product, etc.)</p>
452
453 <p>The other way of looking at it is to say that there are in fact
454 <em>multiple</em> meta-circular interpreters available inside Emmental, and symbols
455 like <code>'</code> switch temporarily to an alternate MCI. This alternate MCI
456 interprets every symbol as "push this symbol", then reinstates the previous MCI.
457 I like this explication better than the one above &mdash; MCIs begin
458 to look a bit like continuations! &mdash; but to do it justice would take some
459 work. I envision a language where the program has fine control
460 over which MCI is in effect, possibly by keeping a map from symbols to MCIs,
461 or maybe even being able to push MCIs onto the stack.
462 This is a wee bit much for Emmental too though, and
463 perhaps I'll explore these possibilities in a future language.</p>
464
465 <h3>Turing-completeness</h3>
466
467 <p>You can make the argument that Emmental's way of being Turing-complete
468 is really nothing new: when you redefine some symbol, you're really just
469 defining a new function, and when you use <code>?</code> to execute that
470 symbol from within its own definition, you're just making a recursive
471 function call.</p>
472
473 <p>Well, yes, you can make that argument. But it has to do with how you think
474 about "what is a language", I think. Does a Pascal program fragment which
475 defines a procedure called <code>PrintFibonacci</code> represent another
476 programming language, one different from Pascal? You could certainly
477 say that it does &mdash; it's
478 the language Pascal where the token <code>PrintFibonacci</code> has
479 some meaning that it doesn't have in Pascal.</p>
480
481 <p>In that view, any language where you can define procedures, or
482 functions, or standard libraries, or the like, is an extensible language.
483 But even languages where you <em>can't</em> define new procedures or
484 functions is arguably an extensible language. Take some initial Brainfuck
485 program fragment, for instance. After it executes, it leaves the Brainfuck
486 tape and while-stack in some state that depends on the input.
487 Any Brainfuck fragment that executes after that, will execute in that
488 environment, and that environment is arguably a version of the language
489 Brainfuck, suitably extended.</p>
490
491 <p>You don't normally think of it that way, I bet, but you
492 <em>could</em> &mdash; and you would need to, to some degree,
493 to claim that Emmental is "just" defining new functions.
494 The reason you don't typically look at languages like this
495 (unless you are very strange) is because it's much more useful to
496 divide the world into "languages" and "programs." And Emmental
497 <em>does</em> make this division, it just makes it in a slightly
498 different place than usual.</p>
499
500 <p>As far as I'm concerned, if I describe what Emmental does as
501 modifying the Emmental language via its MCI, and what Emmental actually does is
502 consistent with the idea of modifying the Emmental language via its MCI, then
503 what Emmental effectively does is modify the Emmental language via its
504 MCI. And if it needs to do this in a certain way in order to simulate
505 a universal Turing machine, then that difference (however slight)
506 sets it apart from systems where this simulation needs to be done by defining
507 recursive functions.</p>
508
509 <h2>Implementation</h2>
510
511 <p><code>emmental.hs</code> is a reference interpreter for Emmental
512 written in Haskell. Run the function <code>emmental</code> on a
513 string; you can also run <code>debug</code> on a string to view
514 the state of the program (stack &amp; queue) during execution.
515 (Note that <code>debug</code> is <em>not</em> able to show
516 program states that occur internal to an operation.)</p>
517
518 <p>Happy interpreter-redefining!
519 <br>Chris Pressey
520 <br>Chicago, IL
521 <br>November 11, 2007</p>
522
523 </body>
524 </html>
0 --
1 -- emmental.hs
2 -- Interpreter for the Emmental Programming Language
3 -- Chris Pressey, Cat's Eye Technologies
4 --
5 -- $Id: emmental.hs 5 2007-11-12 04:36:43Z catseye $
6 --
7
8 --
9 -- Copyright (c)2007 Cat's Eye Technologies. All rights reserved.
10 --
11 -- Redistribution and use in source and binary forms, with or without
12 -- modification, are permitted provided that the following conditions
13 -- are met:
14 --
15 -- 1. Redistributions of source code must retain the above copyright
16 -- notices, this list of conditions and the following disclaimer.
17 -- 2. Redistributions in binary form must reproduce the above copyright
18 -- notices, this list of conditions, and the following disclaimer in
19 -- the documentation and/or other materials provided with the
20 -- distribution.
21 -- 3. Neither the names of the copyright holders nor the names of their
22 -- contributors may be used to endorse or promote products derived
23 -- from this software without specific prior written permission.
24 --
25 -- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
26 -- ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
27 -- LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
28 -- FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
29 -- COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
30 -- INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
31 -- BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
32 -- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
33 -- CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
34 -- LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
35 -- ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
36 -- POSSIBILITY OF SUCH DAMAGE.
37 --
38
39
40 import qualified Data.Map as Map
41 import qualified Data.Char as Char
42
43 -----------------------------------------------------------------------
44 -- ============================ Symbols ============================ --
45 -----------------------------------------------------------------------
46
47 type Symbol = Char
48
49
50 -----------------------------------------------------------------------
51 -- ======================== Program States ========================= --
52 -----------------------------------------------------------------------
53
54 data State = State [Symbol] [Symbol]
55 deriving (Ord, Eq, Show)
56
57 pop (State (head:tail) queue) = (head, State tail queue)
58 push (State list queue) sym = (State (sym:list) queue)
59
60 popString (State (';':tail) queue) = ([], State tail queue)
61 popString (State (head:tail) queue) =
62 let
63 (string, state') = popString (State tail queue)
64 in
65 (string ++ [head], state')
66
67 enqueue (State stack queue) symbol = State stack (symbol:queue)
68 dequeue (State stack queue) =
69 let
70 symbol = last queue
71 queue' = init queue
72 in
73 (symbol, State stack queue')
74
75
76 -----------------------------------------------------------------------
77 -- ========================= Interpreters ========================== --
78 -----------------------------------------------------------------------
79
80 data Interpreter = Interp (Map.Map Symbol Operation)
81
82 fetch (Interp map) sym = Map.findWithDefault (fitRegOp opNop) sym map
83 supplant (Interp map) sym op = (Interp (Map.insert sym op map))
84
85
86 -----------------------------------------------------------------------
87 -- ========================== Operations =========================== --
88 -----------------------------------------------------------------------
89
90 type Operation = State -> Interpreter -> IO (State, Interpreter)
91
92 composeOps :: Operation -> Operation -> Operation
93
94 composeOps op1 op2 = f where
95 f state interpreter = do
96 (state', interpreter') <- op1 state interpreter
97 op2 state' interpreter'
98
99 createOp :: Interpreter -> [Symbol] -> Operation
100
101 createOp interpreter [] =
102 (fitRegOp opNop)
103 createOp interpreter (head:tail) =
104 composeOps (fetch interpreter head) (createOp interpreter tail)
105
106 --
107 -- It's useful for us to express a lot of our operators as non-monadic
108 -- functions that don't affect the interpreter. This is a little "adapter"
109 -- function that lets us create monadic functions with the right signature
110 -- from them.
111 --
112
113 fitRegOp :: (State -> State) -> Operation
114
115 fitRegOp regop = f where
116 f state interpreter =
117 let
118 state' = regop state
119 in
120 do return (state', interpreter)
121
122
123 ------------------------------------------------------------
124 --------------- The operations themselves. -----------------
125 ------------------------------------------------------------
126
127 --
128 -- Redefine the meaning of the symbol on the stack with
129 -- a mini-program also popped off the stack.
130 --
131
132 opSupplant state interpreter =
133 let
134 (opSym, state') = pop state
135 (newOpDefn, state'') = popString state'
136 newOp = createOp interpreter newOpDefn
137 in
138 do return (state'', supplant interpreter opSym newOp)
139
140 --
141 -- Execute the symbol on the stack with the current interpreter.
142 --
143
144 opEval state interpreter =
145 let
146 (opSym, state') = pop state
147 newOp = createOp interpreter [opSym]
148 in
149 newOp state' interpreter
150
151 --
152 -- I/O.
153 --
154
155 opInput state interpreter = do
156 symbol <- getChar
157 do return (push state symbol, interpreter)
158
159 opOutput state interpreter =
160 let
161 (symbol, state') = pop state
162 in do
163 putChar symbol
164 return (state', interpreter)
165
166 --
167 -- Primitive arithmetic.
168 --
169
170 opAdd state =
171 let
172 (symA, state') = pop state
173 (symB, state'') = pop state'
174 in
175 push state'' (Char.chr (((Char.ord symB) + (Char.ord symA)) `mod` 256))
176
177 opSubtract state =
178 let
179 (symA, state') = pop state
180 (symB, state'') = pop state'
181 in
182 push state'' (Char.chr (((Char.ord symB) - (Char.ord symA)) `mod` 256))
183
184 discreteLog 0 = 8
185 discreteLog 1 = 0
186 discreteLog 2 = 1
187 discreteLog n = (discreteLog (n `div` 2)) + 1
188
189 opDiscreteLog state =
190 let
191 (symbol, state') = pop state
192 in
193 push state' (Char.chr (discreteLog (Char.ord symbol)))
194
195 --
196 -- Stack manipulation.
197 --
198
199 --
200 -- Pop the top symbol of the stack, make a copy of it, push it back onto the
201 -- stack, and enqueue the copy onto the queue.
202 --
203
204 opEnqueueCopy state =
205 let
206 (sym, _) = pop state
207 in
208 enqueue state sym
209
210 --
211 -- Dequeue a symbol from the queue and push it onto the stack.
212 --
213
214 opDequeue state =
215 let
216 (sym, state') = dequeue state
217 in
218 push state' sym
219
220 --
221 -- Duplicate the top symbol of the stack.
222 --
223
224 opDuplicate state =
225 let
226 (symbol, _) = pop state
227 in
228 push state symbol
229
230 --
231 -- Miscellaneous operations.
232 --
233
234 opNop state =
235 state
236
237 --
238 -- Parameterizable operations.
239 --
240
241 opPushValue value state =
242 push state (Char.chr value)
243
244 opAccumValue value state =
245 let
246 (sym, state') = pop state
247 value' = ((Char.ord sym) * 10) + value
248 in
249 push state' (Char.chr (value' `mod` 256))
250
251
252 -----------------------------------------------------------------------
253 -- ===================== Debugging Functions ======================= --
254 -----------------------------------------------------------------------
255
256 type Debugger = State -> Interpreter -> IO ()
257
258 debugNop s i = do
259 return ()
260
261 debugPrintState s i = do
262 putStr ((show s) ++ "\n")
263 return ()
264
265
266 -----------------------------------------------------------------------
267 -- ============================ Executor =========================== --
268 -----------------------------------------------------------------------
269
270 execute :: [Symbol] -> State -> Interpreter -> Debugger -> IO (State, Interpreter)
271
272 execute [] state interpreter debugger =
273 return (state, interpreter)
274 execute (opSym:program') state interpreter debugger =
275 let
276 operation = fetch interpreter opSym
277 in do
278 (state', interpreter') <- operation state interpreter
279 debugger state' interpreter'
280 execute program' state' interpreter' debugger
281
282
283 -----------------------------------------------------------------------
284 -- ====================== Top-Level Function ======================= --
285 -----------------------------------------------------------------------
286
287 initialInterpreter = Interp
288 (Map.fromList
289 [
290 ('.', opOutput),
291 (',', opInput),
292
293 ('#', fitRegOp (opPushValue 0)),
294 ('0', fitRegOp (opAccumValue 0)),
295 ('1', fitRegOp (opAccumValue 1)),
296 ('2', fitRegOp (opAccumValue 2)),
297 ('3', fitRegOp (opAccumValue 3)),
298 ('4', fitRegOp (opAccumValue 4)),
299 ('5', fitRegOp (opAccumValue 5)),
300 ('6', fitRegOp (opAccumValue 6)),
301 ('7', fitRegOp (opAccumValue 7)),
302 ('8', fitRegOp (opAccumValue 8)),
303 ('9', fitRegOp (opAccumValue 9)),
304
305 ('+', fitRegOp opAdd),
306 ('-', fitRegOp opSubtract),
307 ('~', fitRegOp opDiscreteLog),
308
309 ('^', fitRegOp opEnqueueCopy),
310 ('v', fitRegOp opDequeue),
311 (':', fitRegOp opDuplicate),
312
313 ('!', opSupplant),
314 ('?', opEval),
315
316 (';', fitRegOp (opPushValue (Char.ord ';')))
317 ]
318 )
319
320 initialState = State [] []
321
322 emmental string = do
323 (state, interpreter) <- execute string initialState initialInterpreter debugNop
324 return state
325
326 debug string = do
327 (state, interpreter) <- execute string initialState initialInterpreter debugPrintState
328 return state
329
330
331 -----------------------------------------------------------------------
332 -- ========================== Test Cases =========================== --
333 -----------------------------------------------------------------------
334
335 --
336 -- Drivers for test cases. 'demo' runs them straight, whereas 'test'
337 -- uses the debugger.
338 --
339
340 demo n = emmental (testProg n)
341
342 test n = debug (testProg n)
343
344 --
345 -- Here we introduce a bit of a cheat, in order to make writing
346 -- complex Emmental programs tolerable. You can still see the
347 -- programs in their fully glory by executing "show (testProg n)".
348 --
349
350 quote [] = []
351 quote (symbol:rest) = "#" ++ (show (Char.ord symbol)) ++ (quote rest)
352
353 --
354 -- Add one and one.
355 --
356
357 testProg 1 = "#1#1+"
358
359 --
360 -- Redefine & as "+".
361 --
362
363 testProg 2 = ";#43#38!#1#1&" -- 59,43,38 ==> ";+&"
364
365 --
366 -- Redefine 0 as "9".
367 --
368
369 testProg 3 = ";#57#48!#0" -- 59,57,48 ==> ";90"
370
371 --
372 -- Redefine 0 as "#48?". This results in an infinite loop when 0 is executed.
373 --
374
375 testProg 4 = ";#35#52#56#63#48!0" -- 59,35,52,56,63,48 ==> ";#48?0"
376
377 --
378 -- Redefine $ as ".#36?". This results in a loop that pops symbols and
379 -- and prints them, until the stack underflows, when $ is executed.
380 --
381
382 testProg 5 = ";#46#35#51#54#63#36! #65#66#67#68#69$"
383
384 --
385 -- Duplicate the top stack element (assuming an empty queue.)
386 -- This shows that the : operation is not strictly necessary
387 -- (when you know the size of the queue.)
388 --
389
390 testProg 6 = "#65^v"
391
392 --
393 -- Discard the top stack element (assuming more than one element
394 -- on the stack, and an empty queue.)
395 --
396
397 testProg 7 = "#33#123^v-+"
398
399 --
400 -- Swap the top two elements of the stack (assuming an empty queue.)
401 --
402
403 testProg 8 = "#67#66#65^v^-+^^v^v^v-+^v-+^v-+vv"
404
405 --
406 -- Input a symbol. Report whether its ASCII value is even or odd.
407 --
408
409 testProg 9 = (quote ";^v:") ++ "!" ++ -- : = dup
410 (quote ";#69.") ++ "#!" ++ -- NUL = print "E"
411 (quote ";#79.") ++ "#128!" ++ -- \128 = print "O"
412 (quote (";" ++ (take 127 [':',':'..]) ++ -- m = mul by 128
413 (take 127 ['+','+'..]) ++ "m")) ++ "!" ++
414 ",m?"
415
416 --
417 -- Input a symbol. Report whether it is M or not.
418 --
419
420 testProg 10 = (quote ";#78.") ++ "#!" ++ -- NUL = print "N"
421 ";##1!" ++ -- SOH = same as NUL
422 ";##2!" ++ -- STX = same as NUL
423 ";##3!" ++ -- ETX = same as NUL
424 ";##4!" ++ -- EOT = same as NUL
425 ";##5!" ++ -- ENQ = same as NUL
426 ";##6!" ++ -- ACK = same as NUL
427 ";##7!" ++ -- BEL = same as NUL
428 (quote ";#89.") ++ "#8!" ++ -- BS = print "Y"
429 ",#77-~?"
430
431 --
432 -- Same as testProg 5, except stop printing when a NUL is
433 -- encountered, instead of just underflowing the stack.
434 --
435
436 testProg 11 = ";" ++ (quote ":~?$") ++ "!" ++ -- $ = dup & test
437 ";" ++ (quote ".$") ++ "#!" ++ -- NUL = print & repeat
438 ";#0#1!" ++ -- SOH = same as NUL
439 ";#0#2!" ++ -- STX = same as NUL
440 ";#0#3!" ++ -- ETX = same as NUL
441 ";#0#4!" ++ -- EOT = same as NUL
442 ";#0#5!" ++ -- ENQ = same as NUL
443 ";#0#6!" ++ -- ACK = same as NUL
444 ";#0#7!" ++ -- BEL = same as NUL
445 -- BS = stop (nop)
446 "#0" ++ (quote (reverse "Hello!")) ++ "$"