|
0 |
<html><head><title>The Emmental Programming Language</title></head>
|
|
1 |
<body>
|
|
2 |
|
|
3 |
<h1>The Emmental Programming Language</h1>
|
|
4 |
|
|
5 |
<p>November 2007, Chris Pressey, Cat's Eye Technologies</p>
|
|
6 |
|
|
7 |
<h2>Introduction</h2>
|
|
8 |
|
|
9 |
<p>Emmental is a self-modifying programming language. That is not to say
|
|
10 |
that is a language in which programs are self-modifying; rather, it is
|
|
11 |
the language itself, as defined by a meta-circular interpreter, that can be
|
|
12 |
modified during the course of a running program. Indeed, this is how
|
|
13 |
Emmental, without conventional conditional and repetition/recursion constructs,
|
|
14 |
achieves Turing-completeness.</p>
|
|
15 |
|
|
16 |
<h2>Meta-circular Interpreters</h2>
|
|
17 |
|
|
18 |
<p>One way to attempt to define a language is by giving what's called
|
|
19 |
a <em>meta-circular interpreter</em> (often shortened to "MCI" in this document.)
|
|
20 |
This is an interpreter for some language which is written in that same language
|
|
21 |
(or at least, a language which is very close to it.)</p>
|
|
22 |
|
|
23 |
<p>Meta-circular interpreters were a popular way to the describe the
|
|
24 |
semantics of programming languages, especially
|
|
25 |
LISP-like languages, and especially before the advent of denotational
|
|
26 |
semantics. The term "meta-circular" was apparently coined by John C. Reynolds
|
|
27 |
in his paper "Definitional Interpreters for Higher-Order Programming
|
|
28 |
Languages" (1972 Proceedings ACM National Conference.)</p>
|
|
29 |
|
|
30 |
<p>Of course, in the real world, MCI's are not often used.
|
|
31 |
They certainly <em>can</em> be used: if you have a working Scheme interpreter
|
|
32 |
that came with your computer system, there
|
|
33 |
is nothing stopping you from writing another Scheme interpreter in Scheme,
|
|
34 |
and running your programs on your interpreter (which is itself running on your
|
|
35 |
system's interpreter.) However, this is quite a bit
|
|
36 |
less efficient due to the duplication of effort. A somewhat more realistic
|
|
37 |
case might be if your system came with, say, a Scheme compiler. You might then feed your
|
|
38 |
Scheme interpreter (written in Scheme) through that to make a native Scheme
|
|
39 |
interpreter, and use that to interpret your programs.
|
|
40 |
(In this setup, the interpreter is usually described as "self-hosting"
|
|
41 |
rather than "meta-circular".)</p>
|
|
42 |
|
|
43 |
<p>But, as should be obvious, you already need an implementation of Scheme
|
|
44 |
for your Scheme interpreter written in Scheme to be of much practical
|
|
45 |
use to you. If your meta-circular interpreter is all you have, you won't
|
|
46 |
be able to use it to run or understand Scheme programs. Because the MCI is
|
|
47 |
defined in terms of itself, you'll need some other source of "understanding
|
|
48 |
how it works" to make it complete. This understanding might come
|
|
49 |
from an implementation in some other programming language, or a specification
|
|
50 |
in some formal language, or a description in some natural language, or simply
|
|
51 |
from intuition — but it has to come from somewhere.</p>
|
|
52 |
|
|
53 |
<p>Assuming that we do have that external source of understanding, the
|
|
54 |
meta-circular interpreter can come in quite handy in codifying the semantics
|
|
55 |
of the language. And, in Emmental's case, <em>altering</em> those semantics:
|
|
56 |
Emmental's MCI supports operations which instruct Emmental's MCI to modify
|
|
57 |
its behaviour.</p>
|
|
58 |
|
|
59 |
<h2>Interpreter Structure</h2>
|
|
60 |
|
|
61 |
<p>To describe the structure of Emmental's MCI, we first examine the
|
|
62 |
general structure of interpreters. If you've ever written a virtual machine
|
|
63 |
executor in, say, C, you've noticed that it has the general form</p>
|
|
64 |
|
|
65 |
<pre>
|
|
66 |
pc = start;
|
|
67 |
while (!done) {
|
|
68 |
switch (instruction[pc]) {
|
|
69 |
case INSTR_ONE:
|
|
70 |
/* implement semantics of INSTR_ONE */
|
|
71 |
pc += advance(INSTR_ONE);
|
|
72 |
break;
|
|
73 |
case INSTR_TWO:
|
|
74 |
/* implement semantics of INSTR_TWO */
|
|
75 |
pc += advance(INSTR_TWO);
|
|
76 |
break;
|
|
77 |
/* ... */
|
|
78 |
case INSTR_HALT:
|
|
79 |
done = 1;
|
|
80 |
break;
|
|
81 |
default:
|
|
82 |
perror("Invalid opcode");
|
|
83 |
}
|
|
84 |
}
|
|
85 |
</pre>
|
|
86 |
|
|
87 |
<p>Note that <code>advance()</code> is some function that computes how far the
|
|
88 |
program counter is advanced on that instruction. This value is typically
|
|
89 |
+1 for most instructions, but more or less than 1 (and dependent on the
|
|
90 |
state of the program) for a handful of "branch" instructions.
|
|
91 |
Note also that <code>advance()</code> would not typically be implemented in C
|
|
92 |
as a function; I'm just showing it like this to emphasize the
|
|
93 |
regular structure.</p>
|
|
94 |
|
|
95 |
<p>From this we infer that the basic structure of an interpreter is
|
|
96 |
a <em>dictionary</em> or <em>map</em> that associates program symbols with operations.
|
|
97 |
There is some extra housekeeping like the fetch-execute cycle that surrounds
|
|
98 |
this dictionary, but this can (hopefully) be handled mostly automatically,
|
|
99 |
freeing us to concentrate on <em>symbols</em> and <em>operations</em>.</p>
|
|
100 |
|
|
101 |
<p>The symbols could be taken from any finite alphabet, but in Emmental,
|
|
102 |
to keep things relatively simple, we'll just use the ASCII character set.
|
|
103 |
(Actually, to be precise, this is the full 8-bit ASCII character set.
|
|
104 |
Non-printable control characters are allowed, as are characters between
|
|
105 |
128 and 255, and each has a distinct meaning. But their representations
|
|
106 |
are not defined.)</p>
|
|
107 |
|
|
108 |
<p>The operations can be thought of, abstractly, as functions which
|
|
109 |
transform program states. Or they can be thought of, concretely, as
|
|
110 |
segments of code — mini-programs which implement these functions.
|
|
111 |
In the case of a meta-circular interpreter, these mini-programs would be written
|
|
112 |
<em>in the language being interpreted</em>.</p>
|
|
113 |
|
|
114 |
<p>To extend this idea to a <em>self-modifying</em> meta-circular interpreter,
|
|
115 |
the operations can be thought of as functions which transform
|
|
116 |
both program states <em>and</em> interpreter definitions.
|
|
117 |
(Alternatively, the interpreter definition could be thought of as
|
|
118 |
part of the program state, although I feel that's a bit gorier a way
|
|
119 |
to look at it, and I prefer the other view, at least for Emmental.)</p>
|
|
120 |
|
|
121 |
<p>In Emmental, most operations leave the interpreter definition unchanged.
|
|
122 |
However, there is one operation which alters the interpreter
|
|
123 |
definition, and it is this altered definition that is used to interpret
|
|
124 |
the remainder of the program.</p>
|
|
125 |
|
|
126 |
<h2>Emmental Semantics (in Emmental)</h2>
|
|
127 |
|
|
128 |
<p>Emmental is essentially a stack-based language. (There's also a queue,
|
|
129 |
but it's not as central.) All operations implicitly
|
|
130 |
get data from, and implicitly deposit results back onto, a single
|
|
131 |
stack. For orthogonality's sake, this stack may contain only ASCII symbols.
|
|
132 |
(And note that trying to pop an empty stack, or dequeue an empty queue,
|
|
133 |
is an error that aborts the program.)</p>
|
|
134 |
|
|
135 |
<p>Note that because we've established that an interpreter (at least,
|
|
136 |
insofar as Emmental ever needs to know) is simply a map that takes
|
|
137 |
symbols to operations, and that operations in Emmental are defined
|
|
138 |
(meta-circularly) as Emmental programs, we can use the following notation
|
|
139 |
to describe interpreters:</p>
|
|
140 |
|
|
141 |
<pre>
|
|
142 |
% → XYZ+*!
|
|
143 |
& → 'ap'ag'ag
|
|
144 |
</pre>
|
|
145 |
|
|
146 |
<p>That is, the symbol <code>%</code>, when encountered in an Emmental
|
|
147 |
program, indicates an operation that
|
|
148 |
is defined by the Emmental program <code>XYZ+*!</code>, and so forth.</p>
|
|
149 |
|
|
150 |
<p>When a main Emmental program begins execution for the first time,
|
|
151 |
it starts with what's called the <em>initial Emmental interpreter</em>.
|
|
152 |
(This fact, of course, doesn't apply to any further point of execution inside
|
|
153 |
an Emmental program, or execution of operations defined in
|
|
154 |
Emmental's MCI, since these would be considered subprograms of a sort.
|
|
155 |
These cases use whichever interpreter happens to be in force in that point in time.)</p>
|
|
156 |
|
|
157 |
<p>The inital Emmental interpreter is defined as follows:</p>
|
|
158 |
|
|
159 |
<pre>
|
|
160 |
a → a
|
|
161 |
b → b
|
|
162 |
c → c
|
|
163 |
...
|
|
164 |
</pre>
|
|
165 |
|
|
166 |
<p>That is, for every symbol <var>x</var> in the ASCII set,
|
|
167 |
<var>x</var> <code>→</code> <var>x</var>.</p>
|
|
168 |
|
|
169 |
<p>Doesn't tell us a lot about Emmental's semantics, does it? No.
|
|
170 |
Nothing at all, really. But remember what I said about needing an external
|
|
171 |
source of understanding, in order to actually get full value out of an MCI.
|
|
172 |
And remember the purpose of Emmental's MCI: it is
|
|
173 |
not there so much to help us understand Emmental, but to allow us to
|
|
174 |
<em>change</em> Emmental, from inside an Emmental program.</p>
|
|
175 |
|
|
176 |
<p>And, for all that our description of the initial Emmental interpreter
|
|
177 |
is unhelpfully tautological, it is not incorrect: the semantics of
|
|
178 |
<code>a</code> can in fact be thought of as being defined by an Emmental
|
|
179 |
program that consists of only one instruction, <code>a</code>. This
|
|
180 |
happy state of affairs comes about because Emmental is stack-based;
|
|
181 |
the "signature" (the requirements for the "before" and "after" stacks)
|
|
182 |
of the symbol <code>a</code> is the same as the signature of the program
|
|
183 |
containing the single symbol <code>a</code>. No extra syntax to specify
|
|
184 |
arity and the like is necessary.</p>
|
|
185 |
|
|
186 |
<p>Above all, don't panic: we <em>will</em> describe what symbols like
|
|
187 |
<code>a</code> actually mean in Emmental, we'll just need to do it in something
|
|
188 |
besides Emmental. In fact, let's do that right now.</p>
|
|
189 |
|
|
190 |
<h2>Emmental Semantics (in English)</h2>
|
|
191 |
|
|
192 |
<h3>Primitive Arithmetic</h3>
|
|
193 |
|
|
194 |
<p><code>#</code> pushes the symbol NUL (ASCII 0) onto the stack.</p>
|
|
195 |
|
|
196 |
<p>The symbols <code>0</code>, <code>1</code>, ... <code>9</code>
|
|
197 |
pop a symbol from the stack, multiply its ASCII value by 10 modulo 256, add
|
|
198 |
the value 0, 1, ... 9 (respectively) to that value modulo 256, and
|
|
199 |
push the resulting symbol back onto the stack.</p>
|
|
200 |
|
|
201 |
<p>The upshot of these 11 operations is that you can push arbitrary
|
|
202 |
symbols onto the stack by spelling out their ASCII values in decimal.
|
|
203 |
For example, <code>#64</code> pushes a <code>@</code> onto the stack.</p>
|
|
204 |
|
|
205 |
<p><code>+</code> pops two symbols off the stack, adds together their
|
|
206 |
ASCII values modulo 256, and pushes the symbol with the resultant ASCII value
|
|
207 |
back onto the stack.</p>
|
|
208 |
|
|
209 |
<p><code>-</code> pops two symbols off the stack, subtracts the ASCII value
|
|
210 |
of the first popped from the ASCII value of the second popped modulo 256,
|
|
211 |
and pushes the symbol with the resultant ASCII value back onto the stack.</p>
|
|
212 |
|
|
213 |
<p><code>~</code> ("log") pops a symbol from the stack, computes the discrete
|
|
214 |
base-2 logarithm of the ASCII value of that symbol,
|
|
215 |
and pushes the symbol with the resultant ASCII value back onto the stack.
|
|
216 |
The discrete base-2 logarithm of a number is the floor or integer part
|
|
217 |
of the base-2 logarithm of that number. Alternately, it is the number of
|
|
218 |
the highest bit position (starting with the LSB = bit position 0)
|
|
219 |
with a bit set when the number is viewed as binary.
|
|
220 |
Because the base-2 logarithm of the number 0 itself is undefined,
|
|
221 |
the number 0 is treated as 256 for this operation; its discrete base-2
|
|
222 |
logarithm is 8.</p>
|
|
223 |
|
|
224 |
<h3>Stack and Queue Manipulation</h3>
|
|
225 |
|
|
226 |
<p><code>^</code> ("enqueue a copy") pops a symbol off the stack, makes a copy of it, pushes
|
|
227 |
it back onto the stack, and enqueues the copy onto the queue.</p>
|
|
228 |
|
|
229 |
<p><code>v</code> ("dequeue") dequeues a symbol from the queue and pushes it onto the
|
|
230 |
stack.</p>
|
|
231 |
|
|
232 |
<p>Using these operations in combination, one can form "discard", "duplicate",
|
|
233 |
"swap", and other more advanced stack manipulation operations. For example,
|
|
234 |
assuming an empty queue and more than two elements on the stack, "swap" can
|
|
235 |
be accomplished with the code <code>^v^-+^^v^v^v-+^v-+^v-+vv</code>.</p>
|
|
236 |
|
|
237 |
<p>Despite this fact, the operation <code>:</code> duplicates the top value
|
|
238 |
of the stack. (Emmental is not an absolutely minimal language; note, for
|
|
239 |
instance, that it has all ten decimal digits as operations when these
|
|
240 |
could surely have been defined in terms of only one or two operations.
|
|
241 |
The reasons for a seperate <code>:</code> operation are given below in the
|
|
242 |
section on Computational Class.)</p>
|
|
243 |
|
|
244 |
<h3>I/O</h3>
|
|
245 |
|
|
246 |
<p><code>.</code> pops a symbol off the stack and sends it to the
|
|
247 |
standard output as an ASCII symbol.</p>
|
|
248 |
|
|
249 |
<p><code>,</code> waits for an ASCII symbol to arrive on standard input,
|
|
250 |
and pushes it onto the stack.</p>
|
|
251 |
|
|
252 |
<h3>Interpreter Modification and Reflection</h3>
|
|
253 |
|
|
254 |
<p>First let's define what it means to <em>pop a string</em> off the stack.
|
|
255 |
Symbols are popped off the stack until a <code>;</code> symbol is
|
|
256 |
found on the stack. The symbols popped off are considered a string
|
|
257 |
in the reverse order they were popped; i.e. the last symbol popped
|
|
258 |
is the first symbol of the string. The <code>;</code> symbol is
|
|
259 |
popped off the stack, but is not made part of the string; it is simply discarded.</p>
|
|
260 |
|
|
261 |
<p>Since an Emmental program is a string, popping a program is the same
|
|
262 |
as popping a string, just that the string is interpreted as a program.</p>
|
|
263 |
|
|
264 |
<p><code>!</code> (sometimes called "supplant")
|
|
265 |
pops a symbol, which we call <var>s</var>, off the stack.
|
|
266 |
Then it pops a program <var>t</var>.
|
|
267 |
It then inserts the association <var>s</var> <code>→</code> <var>t</var> into the
|
|
268 |
interpreter definition. This overwrites whatever mapping of <var>s</var> might have
|
|
269 |
been in the interpreter definition previously. This new interpreter definition
|
|
270 |
is used for all subsequent execution (until it is changed again, of course.)</p>
|
|
271 |
|
|
272 |
<p>Note that <code>!</code> does <em>early binding</em>. That is,
|
|
273 |
the meaning of each symbol in this program <var>t</var> is
|
|
274 |
the meaning of that symbol <em>at the time <code>!</code> is executed</em>.
|
|
275 |
If some subsequent <code>!</code> operation later changes the meaning of one
|
|
276 |
of the symbols that occurs in <var>t</var>, the meaning of <var>t</var>
|
|
277 |
is not changed. The semantics of <var>t</var> are "captured" or "frozen".
|
|
278 |
This implies, among other things, that supplanting some
|
|
279 |
symbol <var>z</var> with itself (a program consisting only of the symbol <var>z</var>)
|
|
280 |
is a no-op, because <var>z</var>'s meaning, at the time that <code>!</code>
|
|
281 |
is executed, is invariably <var>z</var>.</p>
|
|
282 |
|
|
283 |
<p><code>?</code> (sometimes called "eval") pops a symbol, which we call <var>s</var>, off the stack.
|
|
284 |
It then executes that symbol (interprets it as an operation)
|
|
285 |
with the interpreter currently in effect.</p>
|
|
286 |
|
|
287 |
<p>Note that <code>?</code> does <em>late binding</em>. That is,
|
|
288 |
in contrast with <code>!</code>, <code>?</code> never "freezes" the semantic
|
|
289 |
definition of the thing that it is executing. This is true even when
|
|
290 |
<code>?</code> occurs in a operation redefinition (i.e. the program
|
|
291 |
that supplanted some symbol's semantics when an <code>!</code> was executed.)
|
|
292 |
This implies, among other things, that supplanting some symbol <var>z</var>
|
|
293 |
with the program that consists of instructions that push the ASCII value of
|
|
294 |
<var>z</var> onto the stack, followed by a <code>?</code> instruction,
|
|
295 |
creates a <em>cyclic meaning</em> for <var>z</var>.
|
|
296 |
This is because the <var>z</var> that will be executed by the <code>?</code>
|
|
297 |
will always be the present <var>z</var>, that is, the <var>z</var>
|
|
298 |
that is executing the <code>?</code>.</p>
|
|
299 |
|
|
300 |
<p>For convenience's sake, <code>;</code> pushes the symbol <code>;</code>
|
|
301 |
onto the stack.</p>
|
|
302 |
|
|
303 |
<p>All other symbols are no-ops.</p>
|
|
304 |
|
|
305 |
<h2>Computational Class</h2>
|
|
306 |
|
|
307 |
<p>I believe Emmental is Turing-complete with only the operations that
|
|
308 |
have been given so far, but I haven't proved it yet.
|
|
309 |
All the elements are there, and although some of them are somewhat "cramped",
|
|
310 |
they look viable.</p>
|
|
311 |
|
|
312 |
<p>If you want to try thinking about how you could write real programs
|
|
313 |
(like a universal Turing-machine simulator) in Emmental, you might want
|
|
314 |
to skip this section, since it contains "spoilers".</p>
|
|
315 |
|
|
316 |
<p>Repetition can be accomplished by assigning a symbol a cyclic semantics,
|
|
317 |
by use of a <code>?</code> within a <code>!</code>. For example, we can
|
|
318 |
redefine the semantics of <code>0</code> to be <code>#48?</code>. This is
|
|
319 |
simply a program that pushes the symbol <code>0</code> onto the stack and
|
|
320 |
executes it with the current interpreter, and, since <code>0</code> has been
|
|
321 |
redefined to mean <code>#48?</code> in the current interpreter, this will
|
|
322 |
loop forever. The entire program to do this to <code>0</code> and run the
|
|
323 |
infinite loop is:</p>
|
|
324 |
|
|
325 |
<pre>
|
|
326 |
;#35#52#56#63#48!0
|
|
327 |
</pre>
|
|
328 |
|
|
329 |
<p>This technique can also be used to "jump" from one definition to
|
|
330 |
another, by using <code>?</code> to execute some <em>other</em> symbol
|
|
331 |
at the end of a definition (that is, some symbol other than the symbol
|
|
332 |
being defined.)</p>
|
|
333 |
|
|
334 |
<p>Conditionals are a little more restrictive. The trick to them is,
|
|
335 |
strangely, the discrete log operator <code>~</code> in combination
|
|
336 |
with the eval operator <code>?</code>. Since <code>~</code> maps
|
|
337 |
all symbols into a set of nine symbols, and <code>?</code> executes
|
|
338 |
the symbol on the stack, <code>~?</code> will execute one of the
|
|
339 |
symbols from ASCII 0 (NUL) to ASCII 8 (BS). We can then, for instance,
|
|
340 |
define NUL to do one thing, define SOH through BEL to do the same
|
|
341 |
as NUL, and define BS to do some other thing; this essentially
|
|
342 |
distinguishes between 0 (which executes BS) and every other value
|
|
343 |
(which executes NUL). Further, we can use this in conjunction
|
|
344 |
with <code>-</code> to compare two values for equality. So, for
|
|
345 |
example, a program which inputs a character, and outputs Y if
|
|
346 |
the character is M and N otherwise, would be:</p>
|
|
347 |
|
|
348 |
<pre>
|
|
349 |
#59#35#55#56#46#!;##1!;##2!;##3!;##4!;##5!;##6!;##7!#59#35#56#57#46#8!,#77-~?
|
|
350 |
</pre>
|
|
351 |
|
|
352 |
<p>In case NUL through BS are in use for some reason, we can always
|
|
353 |
add 9 to the result of the log (<code>~#9+?</code>)
|
|
354 |
to map the answer onto HT through DC1. Or, of course, any of
|
|
355 |
a great number of other arithmetical mappings of our choosing.
|
|
356 |
The most severe constraint is that there be 9 available symbols
|
|
357 |
to act as "destinations" for our "branch". Even if we never
|
|
358 |
overwrite one "branch" with another (and we can do that in Emmental!)
|
|
359 |
and even if we allocate one extra symbol to be the "launch point"
|
|
360 |
of the branch, we still have room for 25 branches in the ASCII
|
|
361 |
character set.</p>
|
|
362 |
|
|
363 |
<p>So these parts look good. If there's a problem, it's
|
|
364 |
with the queue. Specifically, the problem seems to be
|
|
365 |
the need to know the present size of the queue in order to do stack housework
|
|
366 |
like "duplicate" and the subsequent need for "duplicate" to achieve
|
|
367 |
"discard." (Duplicate can be defined as <code>^v</code>, but this only
|
|
368 |
works when the queue is empty. Discard can be defined as duplicate plus
|
|
369 |
<code>-+</code>, but this only works when there are other elements below
|
|
370 |
the element being discarded. [This last point is not generally a problem
|
|
371 |
since we can push arbitrary values onto the stack before any
|
|
372 |
program.])</p>
|
|
373 |
|
|
374 |
<p>However, if it turns out that we need "duplicate" or "discard" in order
|
|
375 |
to write routines that can handle a variable-sized queue — and that
|
|
376 |
strikes me as likely — then it looks like we have a severe problem.</p>
|
|
377 |
|
|
378 |
<p>Here's one way I could try to deal with it. I could say that the queue
|
|
379 |
is <em>local</em> to the operation being defined (or the main program.)
|
|
380 |
Then you could define <code>:</code> to be <code>^v</code>, and inside
|
|
381 |
<code>:</code>'s definition, the queue would always initially be empty,
|
|
382 |
so the definition would work.</p>
|
|
383 |
|
|
384 |
<p>But... we need the queue to store our global data. For example, if
|
|
385 |
we are going to simulate a Turing machine, we'd need to use the queue
|
|
386 |
to store the tape (perhaps "doubled up", with one symbol of each
|
|
387 |
pair telling us "the next symbol is a simulated tape symbol" or
|
|
388 |
"the next symbol is some housekeeping value.") We can't store the
|
|
389 |
tape on just one stack. And, once you are looping in Emmental, you've
|
|
390 |
left the "main program" forever; you're jumping from definition to
|
|
391 |
definition, and each has their own queue. At best, you'd need to
|
|
392 |
"dump" the queue onto the stack each time you switched definitions,
|
|
393 |
and even then you'd need a loop to do that, and to loop you need to
|
|
394 |
switch definitions. It's a royal mess.</p>
|
|
395 |
|
|
396 |
<p>So here's how I will deal with it. I will add a primitive duplicate
|
|
397 |
operation, <code>:</code>, to Emmental. Proving that Emmental is Turing-complete
|
|
398 |
is still, then, a challenge, although a doable-seeming challenge.
|
|
399 |
I will then propose a more formidable challenge: prove that
|
|
400 |
the language formed by removing the <code>:</code> operation from
|
|
401 |
Emmental is Turing-complete. (Equivalently, prove that the set of
|
|
402 |
Emmental programs that begin with <code>;#0#58!</code> is Turing-complete.
|
|
403 |
The nice thing about Emmental is that you can always shoot yourself in
|
|
404 |
the foot — until you erase your pistol, that is.)</p>
|
|
405 |
|
|
406 |
<p>And if you <em>really</em> like a challenge, try proving that Emmental
|
|
407 |
without <code>~</code> is Turing-complete. I don't think that it is,
|
|
408 |
although it's possible for it to compute parity, at least (input a
|
|
409 |
symbol, output E if its ASCII value is even, and O if it's odd.
|
|
410 |
To accomplish this, multiply the input's ASCII value by 128 by adding 127
|
|
411 |
copies of it to it; this is modulo 256, so the only results can be
|
|
412 |
0 or 128. Define those operations to print out E and O respectively.
|
|
413 |
But that's as far as I've gotten.)</p>
|
|
414 |
|
|
415 |
<h2>Discussion</h2>
|
|
416 |
|
|
417 |
<h3>Design Decisions</h3>
|
|
418 |
|
|
419 |
<p>I would've liked to have given Emmental a <code>'</code> or <code>"</code>
|
|
420 |
instruction similar to Funge's "quote" and "quote-mode" instructions;
|
|
421 |
instructions that treat one or more of the following symbols in the program
|
|
422 |
literally, pushing them, as symbols, onto the stack, instead of executing them.
|
|
423 |
However, such instructions are somewhat problematic, both theoretically and
|
|
424 |
(for the approach I took implementing Emmental) practically. There are
|
|
425 |
two ways of thinking about the problems that arise.</p>
|
|
426 |
|
|
427 |
<p>One is that the function which implements <code>'</code> is given
|
|
428 |
access to the program text itself, and possibly the position within the program,
|
|
429 |
and it uses these to extract the "immediate mode" symbol past the <code>'</code>. This information
|
|
430 |
could be available because these pieces of information are considered extra arguments
|
|
431 |
to the function, or because they are (gorily) considered part of the overall
|
|
432 |
program state. Either way, this operation is given a lot of information to work with,
|
|
433 |
and for consistency (since we want to be all nice and neat and say that all operations
|
|
434 |
have the same signature so that our dictionary has a nice uniform type,) <em>all</em> operations
|
|
435 |
have access to this information. This is almost too much information; that is, it
|
|
436 |
is so much that operations don't really <em>need</em> the dictionary. We could just say
|
|
437 |
there is <em>one</em> operation, defined by a function, and that function is given the
|
|
438 |
current symbol and has to decide what it means through whatever means it likes.</p>
|
|
439 |
|
|
440 |
<p>This approach is very powerful, of course, but it's just not the style that
|
|
441 |
Emmental embodies. (In fact, the idea to view interpreters as dictionaries
|
|
442 |
was one of the foundational design choices for Emmental, to the point where I
|
|
443 |
started constructing a "little theory of interpreters as maps." It really
|
|
444 |
wasn't exploited as much as I think it could have been. If an interpreter is
|
|
445 |
a map of symbols to strings of symbols, it's much more tractable than an
|
|
446 |
opaque function would be; you can define all sorts of operations on them,
|
|
447 |
for example concatenating two interpreters (for all symbols <var>s</var>
|
|
448 |
in interpreter <var>a</var> and interpreter <var>b</var>,
|
|
449 |
<var>c</var>[<var>s</var>] <code>→</code> <var>a</var>[<var>s</var>]<var>b</var>[<var>s</var>] —
|
|
450 |
that sort of thing,) computing union or intersection of interpreters,
|
|
451 |
Cartesian product, etc.)</p>
|
|
452 |
|
|
453 |
<p>The other way of looking at it is to say that there are in fact
|
|
454 |
<em>multiple</em> meta-circular interpreters available inside Emmental, and symbols
|
|
455 |
like <code>'</code> switch temporarily to an alternate MCI. This alternate MCI
|
|
456 |
interprets every symbol as "push this symbol", then reinstates the previous MCI.
|
|
457 |
I like this explication better than the one above — MCIs begin
|
|
458 |
to look a bit like continuations! — but to do it justice would take some
|
|
459 |
work. I envision a language where the program has fine control
|
|
460 |
over which MCI is in effect, possibly by keeping a map from symbols to MCIs,
|
|
461 |
or maybe even being able to push MCIs onto the stack.
|
|
462 |
This is a wee bit much for Emmental too though, and
|
|
463 |
perhaps I'll explore these possibilities in a future language.</p>
|
|
464 |
|
|
465 |
<h3>Turing-completeness</h3>
|
|
466 |
|
|
467 |
<p>You can make the argument that Emmental's way of being Turing-complete
|
|
468 |
is really nothing new: when you redefine some symbol, you're really just
|
|
469 |
defining a new function, and when you use <code>?</code> to execute that
|
|
470 |
symbol from within its own definition, you're just making a recursive
|
|
471 |
function call.</p>
|
|
472 |
|
|
473 |
<p>Well, yes, you can make that argument. But it has to do with how you think
|
|
474 |
about "what is a language", I think. Does a Pascal program fragment which
|
|
475 |
defines a procedure called <code>PrintFibonacci</code> represent another
|
|
476 |
programming language, one different from Pascal? You could certainly
|
|
477 |
say that it does — it's
|
|
478 |
the language Pascal where the token <code>PrintFibonacci</code> has
|
|
479 |
some meaning that it doesn't have in Pascal.</p>
|
|
480 |
|
|
481 |
<p>In that view, any language where you can define procedures, or
|
|
482 |
functions, or standard libraries, or the like, is an extensible language.
|
|
483 |
But even languages where you <em>can't</em> define new procedures or
|
|
484 |
functions is arguably an extensible language. Take some initial Brainfuck
|
|
485 |
program fragment, for instance. After it executes, it leaves the Brainfuck
|
|
486 |
tape and while-stack in some state that depends on the input.
|
|
487 |
Any Brainfuck fragment that executes after that, will execute in that
|
|
488 |
environment, and that environment is arguably a version of the language
|
|
489 |
Brainfuck, suitably extended.</p>
|
|
490 |
|
|
491 |
<p>You don't normally think of it that way, I bet, but you
|
|
492 |
<em>could</em> — and you would need to, to some degree,
|
|
493 |
to claim that Emmental is "just" defining new functions.
|
|
494 |
The reason you don't typically look at languages like this
|
|
495 |
(unless you are very strange) is because it's much more useful to
|
|
496 |
divide the world into "languages" and "programs." And Emmental
|
|
497 |
<em>does</em> make this division, it just makes it in a slightly
|
|
498 |
different place than usual.</p>
|
|
499 |
|
|
500 |
<p>As far as I'm concerned, if I describe what Emmental does as
|
|
501 |
modifying the Emmental language via its MCI, and what Emmental actually does is
|
|
502 |
consistent with the idea of modifying the Emmental language via its MCI, then
|
|
503 |
what Emmental effectively does is modify the Emmental language via its
|
|
504 |
MCI. And if it needs to do this in a certain way in order to simulate
|
|
505 |
a universal Turing machine, then that difference (however slight)
|
|
506 |
sets it apart from systems where this simulation needs to be done by defining
|
|
507 |
recursive functions.</p>
|
|
508 |
|
|
509 |
<h2>Implementation</h2>
|
|
510 |
|
|
511 |
<p><code>emmental.hs</code> is a reference interpreter for Emmental
|
|
512 |
written in Haskell. Run the function <code>emmental</code> on a
|
|
513 |
string; you can also run <code>debug</code> on a string to view
|
|
514 |
the state of the program (stack & queue) during execution.
|
|
515 |
(Note that <code>debug</code> is <em>not</em> able to show
|
|
516 |
program states that occur internal to an operation.)</p>
|
|
517 |
|
|
518 |
<p>Happy interpreter-redefining!
|
|
519 |
<br>Chris Pressey
|
|
520 |
<br>Chicago, IL
|
|
521 |
<br>November 11, 2007</p>
|
|
522 |
|
|
523 |
</body>
|
|
524 |
</html>
|