Commit rel_1_0_2007_1111 - Emmental - git @ Cat's Eye Technologies

Initial import of Emmental version 1.0 revision 2007.1111 sources. Cat's Eye Technologies 12 years ago

2 changed file(s) with 972 addition(s) and 0 deletion(s). Raw diff Collapse all Expand all

+525

-0

doc/emmental.html less more

	0	<html><head><title>The Emmental Programming Language</title></head>
	1	<body>
	2
	3	<h1>The Emmental Programming Language</h1>
	4
	5	<p>November 2007, Chris Pressey, Cat's Eye Technologies</p>
	6
	7	<h2>Introduction</h2>
	8
	9	<p>Emmental is a self-modifying programming language. That is not to say
	10	that is a language in which programs are self-modifying; rather, it is
	11	the language itself, as defined by a meta-circular interpreter, that can be
	12	modified during the course of a running program. Indeed, this is how
	13	Emmental, without conventional conditional and repetition/recursion constructs,
	14	achieves Turing-completeness.</p>
	15
	16	<h2>Meta-circular Interpreters</h2>
	17
	18	<p>One way to attempt to define a language is by giving what's called
	19	a <em>meta-circular interpreter</em> (often shortened to "MCI" in this document.)
	20	This is an interpreter for some language which is written in that same language
	21	(or at least, a language which is very close to it.)</p>
	22
	23	<p>Meta-circular interpreters were a popular way to the describe the
	24	semantics of programming languages, especially
	25	LISP-like languages, and especially before the advent of denotational
	26	semantics. The term "meta-circular" was apparently coined by John C. Reynolds
	27	in his paper "Definitional Interpreters for Higher-Order Programming
	28	Languages" (1972 Proceedings ACM National Conference.)</p>
	29
	30	<p>Of course, in the real world, MCI's are not often used.
	31	They certainly <em>can</em> be used: if you have a working Scheme interpreter
	32	that came with your computer system, there
	33	is nothing stopping you from writing another Scheme interpreter in Scheme,
	34	and running your programs on your interpreter (which is itself running on your
	35	system's interpreter.) However, this is quite a bit
	36	less efficient due to the duplication of effort. A somewhat more realistic
	37	case might be if your system came with, say, a Scheme compiler. You might then feed your
	38	Scheme interpreter (written in Scheme) through that to make a native Scheme
	39	interpreter, and use that to interpret your programs.
	40	(In this setup, the interpreter is usually described as "self-hosting"
	41	rather than "meta-circular".)</p>
	42
	43	<p>But, as should be obvious, you already need an implementation of Scheme
	44	for your Scheme interpreter written in Scheme to be of much practical
	45	use to you. If your meta-circular interpreter is all you have, you won't
	46	be able to use it to run or understand Scheme programs. Because the MCI is
	47	defined in terms of itself, you'll need some other source of "understanding
	48	how it works" to make it complete. This understanding might come
	49	from an implementation in some other programming language, or a specification
	50	in some formal language, or a description in some natural language, or simply
	51	from intuition — but it has to come from somewhere.</p>
	52
	53	<p>Assuming that we do have that external source of understanding, the
	54	meta-circular interpreter can come in quite handy in codifying the semantics
	55	of the language. And, in Emmental's case, <em>altering</em> those semantics:
	56	Emmental's MCI supports operations which instruct Emmental's MCI to modify
	57	its behaviour.</p>
	58
	59	<h2>Interpreter Structure</h2>
	60
	61	<p>To describe the structure of Emmental's MCI, we first examine the
	62	general structure of interpreters. If you've ever written a virtual machine
	63	executor in, say, C, you've noticed that it has the general form</p>
	64
	65	<pre>
	66	pc = start;
	67	while (!done) {
	68	switch (instruction[pc]) {
	69	case INSTR_ONE:
	70	/* implement semantics of INSTR_ONE */
	71	pc += advance(INSTR_ONE);
	72	break;
	73	case INSTR_TWO:
	74	/* implement semantics of INSTR_TWO */
	75	pc += advance(INSTR_TWO);
	76	break;
	77	/* ... */
	78	case INSTR_HALT:
	79	done = 1;
	80	break;
	81	default:
	82	perror("Invalid opcode");
	83	}
	84	}
	85	</pre>
	86
	87	<p>Note that <code>advance()</code> is some function that computes how far the
	88	program counter is advanced on that instruction. This value is typically
	89	+1 for most instructions, but more or less than 1 (and dependent on the
	90	state of the program) for a handful of "branch" instructions.
	91	Note also that <code>advance()</code> would not typically be implemented in C
	92	as a function; I'm just showing it like this to emphasize the
	93	regular structure.</p>
	94
	95	<p>From this we infer that the basic structure of an interpreter is
	96	a <em>dictionary</em> or <em>map</em> that associates program symbols with operations.
	97	There is some extra housekeeping like the fetch-execute cycle that surrounds
	98	this dictionary, but this can (hopefully) be handled mostly automatically,
	99	freeing us to concentrate on <em>symbols</em> and <em>operations</em>.</p>
	100
	101	<p>The symbols could be taken from any finite alphabet, but in Emmental,
	102	to keep things relatively simple, we'll just use the ASCII character set.
	103	(Actually, to be precise, this is the full 8-bit ASCII character set.
	104	Non-printable control characters are allowed, as are characters between
	105	128 and 255, and each has a distinct meaning. But their representations
	106	are not defined.)</p>
	107
	108	<p>The operations can be thought of, abstractly, as functions which
	109	transform program states. Or they can be thought of, concretely, as
	110	segments of code — mini-programs which implement these functions.
	111	In the case of a meta-circular interpreter, these mini-programs would be written
	112	<em>in the language being interpreted</em>.</p>
	113
	114	<p>To extend this idea to a <em>self-modifying</em> meta-circular interpreter,
	115	the operations can be thought of as functions which transform
	116	both program states <em>and</em> interpreter definitions.
	117	(Alternatively, the interpreter definition could be thought of as
	118	part of the program state, although I feel that's a bit gorier a way
	119	to look at it, and I prefer the other view, at least for Emmental.)</p>
	120
	121	<p>In Emmental, most operations leave the interpreter definition unchanged.
	122	However, there is one operation which alters the interpreter
	123	definition, and it is this altered definition that is used to interpret
	124	the remainder of the program.</p>
	125
	126	<h2>Emmental Semantics (in Emmental)</h2>
	127
	128	<p>Emmental is essentially a stack-based language. (There's also a queue,
	129	but it's not as central.) All operations implicitly
	130	get data from, and implicitly deposit results back onto, a single
	131	stack. For orthogonality's sake, this stack may contain only ASCII symbols.
	132	(And note that trying to pop an empty stack, or dequeue an empty queue,
	133	is an error that aborts the program.)</p>
	134
	135	<p>Note that because we've established that an interpreter (at least,
	136	insofar as Emmental ever needs to know) is simply a map that takes
	137	symbols to operations, and that operations in Emmental are defined
	138	(meta-circularly) as Emmental programs, we can use the following notation
	139	to describe interpreters:</p>
	140
	141	<pre>
	142	% → XYZ+*!
	143	& → 'ap'ag'ag
	144	</pre>
	145
	146	<p>That is, the symbol <code>%</code>, when encountered in an Emmental
	147	program, indicates an operation that
	148	is defined by the Emmental program <code>XYZ+*!</code>, and so forth.</p>
	149
	150	<p>When a main Emmental program begins execution for the first time,
	151	it starts with what's called the <em>initial Emmental interpreter</em>.
	152	(This fact, of course, doesn't apply to any further point of execution inside
	153	an Emmental program, or execution of operations defined in
	154	Emmental's MCI, since these would be considered subprograms of a sort.
	155	These cases use whichever interpreter happens to be in force in that point in time.)</p>
	156
	157	<p>The inital Emmental interpreter is defined as follows:</p>
	158
	159	<pre>
	160	a → a
	161	b → b
	162	c → c
	163	...
	164	</pre>
	165
	166	<p>That is, for every symbol <var>x</var> in the ASCII set,
	167	<var>x</var> <code>→</code> <var>x</var>.</p>
	168
	169	<p>Doesn't tell us a lot about Emmental's semantics, does it? No.
	170	Nothing at all, really. But remember what I said about needing an external
	171	source of understanding, in order to actually get full value out of an MCI.
	172	And remember the purpose of Emmental's MCI: it is
	173	not there so much to help us understand Emmental, but to allow us to
	174	<em>change</em> Emmental, from inside an Emmental program.</p>
	175
	176	<p>And, for all that our description of the initial Emmental interpreter
	177	is unhelpfully tautological, it is not incorrect: the semantics of
	178	<code>a</code> can in fact be thought of as being defined by an Emmental
	179	program that consists of only one instruction, <code>a</code>. This
	180	happy state of affairs comes about because Emmental is stack-based;
	181	the "signature" (the requirements for the "before" and "after" stacks)
	182	of the symbol <code>a</code> is the same as the signature of the program
	183	containing the single symbol <code>a</code>. No extra syntax to specify
	184	arity and the like is necessary.</p>
	185
	186	<p>Above all, don't panic: we <em>will</em> describe what symbols like
	187	<code>a</code> actually mean in Emmental, we'll just need to do it in something
	188	besides Emmental. In fact, let's do that right now.</p>
	189
	190	<h2>Emmental Semantics (in English)</h2>
	191
	192	<h3>Primitive Arithmetic</h3>
	193
	194	<p><code>#</code> pushes the symbol NUL (ASCII 0) onto the stack.</p>
	195
	196	<p>The symbols <code>0</code>, <code>1</code>, ... <code>9</code>
	197	pop a symbol from the stack, multiply its ASCII value by 10 modulo 256, add
	198	the value 0, 1, ... 9 (respectively) to that value modulo 256, and
	199	push the resulting symbol back onto the stack.</p>
	200
	201	<p>The upshot of these 11 operations is that you can push arbitrary
	202	symbols onto the stack by spelling out their ASCII values in decimal.
	203	For example, <code>#64</code> pushes a <code>@</code> onto the stack.</p>
	204
	205	<p><code>+</code> pops two symbols off the stack, adds together their
	206	ASCII values modulo 256, and pushes the symbol with the resultant ASCII value
	207	back onto the stack.</p>
	208
	209	<p><code>-</code> pops two symbols off the stack, subtracts the ASCII value
	210	of the first popped from the ASCII value of the second popped modulo 256,
	211	and pushes the symbol with the resultant ASCII value back onto the stack.</p>
	212
	213	<p><code>~</code> ("log") pops a symbol from the stack, computes the discrete
	214	base-2 logarithm of the ASCII value of that symbol,
	215	and pushes the symbol with the resultant ASCII value back onto the stack.
	216	The discrete base-2 logarithm of a number is the floor or integer part
	217	of the base-2 logarithm of that number. Alternately, it is the number of
	218	the highest bit position (starting with the LSB = bit position 0)
	219	with a bit set when the number is viewed as binary.
	220	Because the base-2 logarithm of the number 0 itself is undefined,
	221	the number 0 is treated as 256 for this operation; its discrete base-2
	222	logarithm is 8.</p>
	223
	224	<h3>Stack and Queue Manipulation</h3>
	225
	226	<p><code>^</code> ("enqueue a copy") pops a symbol off the stack, makes a copy of it, pushes
	227	it back onto the stack, and enqueues the copy onto the queue.</p>
	228
	229	<p><code>v</code> ("dequeue") dequeues a symbol from the queue and pushes it onto the
	230	stack.</p>
	231
	232	<p>Using these operations in combination, one can form "discard", "duplicate",
	233	"swap", and other more advanced stack manipulation operations. For example,
	234	assuming an empty queue and more than two elements on the stack, "swap" can
	235	be accomplished with the code <code>^v^-+^^v^v^v-+^v-+^v-+vv</code>.</p>
	236
	237	<p>Despite this fact, the operation <code>:</code> duplicates the top value
	238	of the stack. (Emmental is not an absolutely minimal language; note, for
	239	instance, that it has all ten decimal digits as operations when these
	240	could surely have been defined in terms of only one or two operations.
	241	The reasons for a seperate <code>:</code> operation are given below in the
	242	section on Computational Class.)</p>
	243
	244	<h3>I/O</h3>
	245
	246	<p><code>.</code> pops a symbol off the stack and sends it to the
	247	standard output as an ASCII symbol.</p>
	248
	249	<p><code>,</code> waits for an ASCII symbol to arrive on standard input,
	250	and pushes it onto the stack.</p>
	251
	252	<h3>Interpreter Modification and Reflection</h3>
	253
	254	<p>First let's define what it means to <em>pop a string</em> off the stack.
	255	Symbols are popped off the stack until a <code>;</code> symbol is
	256	found on the stack. The symbols popped off are considered a string
	257	in the reverse order they were popped; i.e. the last symbol popped
	258	is the first symbol of the string. The <code>;</code> symbol is
	259	popped off the stack, but is not made part of the string; it is simply discarded.</p>
	260
	261	<p>Since an Emmental program is a string, popping a program is the same
	262	as popping a string, just that the string is interpreted as a program.</p>
	263
	264	<p><code>!</code> (sometimes called "supplant")
	265	pops a symbol, which we call <var>s</var>, off the stack.
	266	Then it pops a program <var>t</var>.
	267	It then inserts the association <var>s</var> <code>→</code> <var>t</var> into the
	268	interpreter definition. This overwrites whatever mapping of <var>s</var> might have
	269	been in the interpreter definition previously. This new interpreter definition
	270	is used for all subsequent execution (until it is changed again, of course.)</p>
	271
	272	<p>Note that <code>!</code> does <em>early binding</em>. That is,
	273	the meaning of each symbol in this program <var>t</var> is
	274	the meaning of that symbol <em>at the time <code>!</code> is executed</em>.
	275	If some subsequent <code>!</code> operation later changes the meaning of one
	276	of the symbols that occurs in <var>t</var>, the meaning of <var>t</var>
	277	is not changed. The semantics of <var>t</var> are "captured" or "frozen".
	278	This implies, among other things, that supplanting some
	279	symbol <var>z</var> with itself (a program consisting only of the symbol <var>z</var>)
	280	is a no-op, because <var>z</var>'s meaning, at the time that <code>!</code>
	281	is executed, is invariably <var>z</var>.</p>
	282
	283	<p><code>?</code> (sometimes called "eval") pops a symbol, which we call <var>s</var>, off the stack.
	284	It then executes that symbol (interprets it as an operation)
	285	with the interpreter currently in effect.</p>
	286
	287	<p>Note that <code>?</code> does <em>late binding</em>. That is,
	288	in contrast with <code>!</code>, <code>?</code> never "freezes" the semantic
	289	definition of the thing that it is executing. This is true even when
	290	<code>?</code> occurs in a operation redefinition (i.e. the program
	291	that supplanted some symbol's semantics when an <code>!</code> was executed.)
	292	This implies, among other things, that supplanting some symbol <var>z</var>
	293	with the program that consists of instructions that push the ASCII value of
	294	<var>z</var> onto the stack, followed by a <code>?</code> instruction,
	295	creates a <em>cyclic meaning</em> for <var>z</var>.
	296	This is because the <var>z</var> that will be executed by the <code>?</code>
	297	will always be the present <var>z</var>, that is, the <var>z</var>
	298	that is executing the <code>?</code>.</p>
	299
	300	<p>For convenience's sake, <code>;</code> pushes the symbol <code>;</code>
	301	onto the stack.</p>
	302
	303	<p>All other symbols are no-ops.</p>
	304
	305	<h2>Computational Class</h2>
	306
	307	<p>I believe Emmental is Turing-complete with only the operations that
	308	have been given so far, but I haven't proved it yet.
	309	All the elements are there, and although some of them are somewhat "cramped",
	310	they look viable.</p>
	311
	312	<p>If you want to try thinking about how you could write real programs
	313	(like a universal Turing-machine simulator) in Emmental, you might want
	314	to skip this section, since it contains "spoilers".</p>
	315
	316	<p>Repetition can be accomplished by assigning a symbol a cyclic semantics,
	317	by use of a <code>?</code> within a <code>!</code>. For example, we can
	318	redefine the semantics of <code>0</code> to be <code>#48?</code>. This is
	319	simply a program that pushes the symbol <code>0</code> onto the stack and
	320	executes it with the current interpreter, and, since <code>0</code> has been
	321	redefined to mean <code>#48?</code> in the current interpreter, this will
	322	loop forever. The entire program to do this to <code>0</code> and run the
	323	infinite loop is:</p>
	324
	325	<pre>
	326	;#35#52#56#63#48!0
	327	</pre>
	328
	329	<p>This technique can also be used to "jump" from one definition to
	330	another, by using <code>?</code> to execute some <em>other</em> symbol
	331	at the end of a definition (that is, some symbol other than the symbol
	332	being defined.)</p>
	333
	334	<p>Conditionals are a little more restrictive. The trick to them is,
	335	strangely, the discrete log operator <code>~</code> in combination
	336	with the eval operator <code>?</code>. Since <code>~</code> maps
	337	all symbols into a set of nine symbols, and <code>?</code> executes
	338	the symbol on the stack, <code>~?</code> will execute one of the
	339	symbols from ASCII 0 (NUL) to ASCII 8 (BS). We can then, for instance,
	340	define NUL to do one thing, define SOH through BEL to do the same
	341	as NUL, and define BS to do some other thing; this essentially
	342	distinguishes between 0 (which executes BS) and every other value
	343	(which executes NUL). Further, we can use this in conjunction
	344	with <code>-</code> to compare two values for equality. So, for
	345	example, a program which inputs a character, and outputs Y if
	346	the character is M and N otherwise, would be:</p>
	347
	348	<pre>
	349	#59#35#55#56#46#!;##1!;##2!;##3!;##4!;##5!;##6!;##7!#59#35#56#57#46#8!,#77-~?
	350	</pre>
	351
	352	<p>In case NUL through BS are in use for some reason, we can always
	353	add 9 to the result of the log (<code>~#9+?</code>)
	354	to map the answer onto HT through DC1. Or, of course, any of
	355	a great number of other arithmetical mappings of our choosing.
	356	The most severe constraint is that there be 9 available symbols
	357	to act as "destinations" for our "branch". Even if we never
	358	overwrite one "branch" with another (and we can do that in Emmental!)
	359	and even if we allocate one extra symbol to be the "launch point"
	360	of the branch, we still have room for 25 branches in the ASCII
	361	character set.</p>
	362
	363	<p>So these parts look good. If there's a problem, it's
	364	with the queue. Specifically, the problem seems to be
	365	the need to know the present size of the queue in order to do stack housework
	366	like "duplicate" and the subsequent need for "duplicate" to achieve
	367	"discard." (Duplicate can be defined as <code>^v</code>, but this only
	368	works when the queue is empty. Discard can be defined as duplicate plus
	369	<code>-+</code>, but this only works when there are other elements below
	370	the element being discarded. [This last point is not generally a problem
	371	since we can push arbitrary values onto the stack before any
	372	program.])</p>
	373
	374	<p>However, if it turns out that we need "duplicate" or "discard" in order
	375	to write routines that can handle a variable-sized queue — and that
	376	strikes me as likely — then it looks like we have a severe problem.</p>
	377
	378	<p>Here's one way I could try to deal with it. I could say that the queue
	379	is <em>local</em> to the operation being defined (or the main program.)
	380	Then you could define <code>:</code> to be <code>^v</code>, and inside
	381	<code>:</code>'s definition, the queue would always initially be empty,
	382	so the definition would work.</p>
	383
	384	<p>But... we need the queue to store our global data. For example, if
	385	we are going to simulate a Turing machine, we'd need to use the queue
	386	to store the tape (perhaps "doubled up", with one symbol of each
	387	pair telling us "the next symbol is a simulated tape symbol" or
	388	"the next symbol is some housekeeping value.") We can't store the
	389	tape on just one stack. And, once you are looping in Emmental, you've
	390	left the "main program" forever; you're jumping from definition to
	391	definition, and each has their own queue. At best, you'd need to
	392	"dump" the queue onto the stack each time you switched definitions,
	393	and even then you'd need a loop to do that, and to loop you need to
	394	switch definitions. It's a royal mess.</p>
	395
	396	<p>So here's how I will deal with it. I will add a primitive duplicate
	397	operation, <code>:</code>, to Emmental. Proving that Emmental is Turing-complete
	398	is still, then, a challenge, although a doable-seeming challenge.
	399	I will then propose a more formidable challenge: prove that
	400	the language formed by removing the <code>:</code> operation from
	401	Emmental is Turing-complete. (Equivalently, prove that the set of
	402	Emmental programs that begin with <code>;#0#58!</code> is Turing-complete.
	403	The nice thing about Emmental is that you can always shoot yourself in
	404	the foot — until you erase your pistol, that is.)</p>
	405
	406	<p>And if you <em>really</em> like a challenge, try proving that Emmental
	407	without <code>~</code> is Turing-complete. I don't think that it is,
	408	although it's possible for it to compute parity, at least (input a
	409	symbol, output E if its ASCII value is even, and O if it's odd.
	410	To accomplish this, multiply the input's ASCII value by 128 by adding 127
	411	copies of it to it; this is modulo 256, so the only results can be
	412	0 or 128. Define those operations to print out E and O respectively.
	413	But that's as far as I've gotten.)</p>
	414
	415	<h2>Discussion</h2>
	416
	417	<h3>Design Decisions</h3>
	418
	419	<p>I would've liked to have given Emmental a <code>'</code> or <code>"</code>
	420	instruction similar to Funge's "quote" and "quote-mode" instructions;
	421	instructions that treat one or more of the following symbols in the program
	422	literally, pushing them, as symbols, onto the stack, instead of executing them.
	423	However, such instructions are somewhat problematic, both theoretically and
	424	(for the approach I took implementing Emmental) practically. There are
	425	two ways of thinking about the problems that arise.</p>
	426
	427	<p>One is that the function which implements <code>'</code> is given
	428	access to the program text itself, and possibly the position within the program,
	429	and it uses these to extract the "immediate mode" symbol past the <code>'</code>. This information
	430	could be available because these pieces of information are considered extra arguments
	431	to the function, or because they are (gorily) considered part of the overall
	432	program state. Either way, this operation is given a lot of information to work with,
	433	and for consistency (since we want to be all nice and neat and say that all operations
	434	have the same signature so that our dictionary has a nice uniform type,) <em>all</em> operations
	435	have access to this information. This is almost too much information; that is, it
	436	is so much that operations don't really <em>need</em> the dictionary. We could just say
	437	there is <em>one</em> operation, defined by a function, and that function is given the
	438	current symbol and has to decide what it means through whatever means it likes.</p>
	439
	440	<p>This approach is very powerful, of course, but it's just not the style that
	441	Emmental embodies. (In fact, the idea to view interpreters as dictionaries
	442	was one of the foundational design choices for Emmental, to the point where I
	443	started constructing a "little theory of interpreters as maps." It really
	444	wasn't exploited as much as I think it could have been. If an interpreter is
	445	a map of symbols to strings of symbols, it's much more tractable than an
	446	opaque function would be; you can define all sorts of operations on them,
	447	for example concatenating two interpreters (for all symbols <var>s</var>
	448	in interpreter <var>a</var> and interpreter <var>b</var>,
	449	<var>c</var>[<var>s</var>] <code>→</code> <var>a</var>[<var>s</var>]<var>b</var>[<var>s</var>] —
	450	that sort of thing,) computing union or intersection of interpreters,
	451	Cartesian product, etc.)</p>
	452
	453	<p>The other way of looking at it is to say that there are in fact
	454	<em>multiple</em> meta-circular interpreters available inside Emmental, and symbols
	455	like <code>'</code> switch temporarily to an alternate MCI. This alternate MCI
	456	interprets every symbol as "push this symbol", then reinstates the previous MCI.
	457	I like this explication better than the one above — MCIs begin
	458	to look a bit like continuations! — but to do it justice would take some
	459	work. I envision a language where the program has fine control
	460	over which MCI is in effect, possibly by keeping a map from symbols to MCIs,
	461	or maybe even being able to push MCIs onto the stack.
	462	This is a wee bit much for Emmental too though, and
	463	perhaps I'll explore these possibilities in a future language.</p>
	464
	465	<h3>Turing-completeness</h3>
	466
	467	<p>You can make the argument that Emmental's way of being Turing-complete
	468	is really nothing new: when you redefine some symbol, you're really just
	469	defining a new function, and when you use <code>?</code> to execute that
	470	symbol from within its own definition, you're just making a recursive
	471	function call.</p>
	472
	473	<p>Well, yes, you can make that argument. But it has to do with how you think
	474	about "what is a language", I think. Does a Pascal program fragment which
	475	defines a procedure called <code>PrintFibonacci</code> represent another
	476	programming language, one different from Pascal? You could certainly
	477	say that it does — it's
	478	the language Pascal where the token <code>PrintFibonacci</code> has
	479	some meaning that it doesn't have in Pascal.</p>
	480
	481	<p>In that view, any language where you can define procedures, or
	482	functions, or standard libraries, or the like, is an extensible language.
	483	But even languages where you <em>can't</em> define new procedures or
	484	functions is arguably an extensible language. Take some initial Brainfuck
	485	program fragment, for instance. After it executes, it leaves the Brainfuck
	486	tape and while-stack in some state that depends on the input.
	487	Any Brainfuck fragment that executes after that, will execute in that
	488	environment, and that environment is arguably a version of the language
	489	Brainfuck, suitably extended.</p>
	490
	491	<p>You don't normally think of it that way, I bet, but you
	492	<em>could</em> — and you would need to, to some degree,
	493	to claim that Emmental is "just" defining new functions.
	494	The reason you don't typically look at languages like this
	495	(unless you are very strange) is because it's much more useful to
	496	divide the world into "languages" and "programs." And Emmental
	497	<em>does</em> make this division, it just makes it in a slightly
	498	different place than usual.</p>
	499
	500	<p>As far as I'm concerned, if I describe what Emmental does as
	501	modifying the Emmental language via its MCI, and what Emmental actually does is
	502	consistent with the idea of modifying the Emmental language via its MCI, then
	503	what Emmental effectively does is modify the Emmental language via its
	504	MCI. And if it needs to do this in a certain way in order to simulate
	505	a universal Turing machine, then that difference (however slight)
	506	sets it apart from systems where this simulation needs to be done by defining
	507	recursive functions.</p>
	508
	509	<h2>Implementation</h2>
	510
	511	<p><code>emmental.hs</code> is a reference interpreter for Emmental
	512	written in Haskell. Run the function <code>emmental</code> on a
	513	string; you can also run <code>debug</code> on a string to view
	514	the state of the program (stack & queue) during execution.
	515	(Note that <code>debug</code> is <em>not</em> able to show
	516	program states that occur internal to an operation.)</p>
	517
	518	<p>Happy interpreter-redefining!
	519	<br>Chris Pressey
	520	<br>Chicago, IL
	521	<br>November 11, 2007</p>
	522
	523	</body>
	524	</html>

+447

-0

src/emmental.hs less more

	0	--
	1	-- emmental.hs
	2	-- Interpreter for the Emmental Programming Language
	3	-- Chris Pressey, Cat's Eye Technologies
	4	--
	5	-- $Id: emmental.hs 5 2007-11-12 04:36:43Z catseye $
	6	--
	7
	8	--
	9	-- Copyright (c)2007 Cat's Eye Technologies. All rights reserved.
	10	--
	11	-- Redistribution and use in source and binary forms, with or without
	12	-- modification, are permitted provided that the following conditions
	13	-- are met:
	14	--
	15	-- 1. Redistributions of source code must retain the above copyright
	16	-- notices, this list of conditions and the following disclaimer.
	17	-- 2. Redistributions in binary form must reproduce the above copyright
	18	-- notices, this list of conditions, and the following disclaimer in
	19	-- the documentation and/or other materials provided with the
	20	-- distribution.
	21	-- 3. Neither the names of the copyright holders nor the names of their
	22	-- contributors may be used to endorse or promote products derived
	23	-- from this software without specific prior written permission.
	24	--
	25	-- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
	26	-- ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
	27	-- LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
	28	-- FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
	29	-- COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
	30	-- INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
	31	-- BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
	32	-- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
	33	-- CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
	34	-- LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
	35	-- ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
	36	-- POSSIBILITY OF SUCH DAMAGE.
	37	--
	38
	39
	40	import qualified Data.Map as Map
	41	import qualified Data.Char as Char
	42
	43	-----------------------------------------------------------------------
	44	-- ============================ Symbols ============================ --
	45	-----------------------------------------------------------------------
	46
	47	type Symbol = Char
	48
	49
	50	-----------------------------------------------------------------------
	51	-- ======================== Program States ========================= --
	52	-----------------------------------------------------------------------
	53
	54	data State = State [Symbol] [Symbol]
	55	deriving (Ord, Eq, Show)
	56
	57	pop (State (head:tail) queue) = (head, State tail queue)
	58	push (State list queue) sym = (State (sym:list) queue)
	59
	60	popString (State (';':tail) queue) = ([], State tail queue)
	61	popString (State (head:tail) queue) =
	62	let
	63	(string, state') = popString (State tail queue)
	64	in
	65	(string ++ [head], state')
	66
	67	enqueue (State stack queue) symbol = State stack (symbol:queue)
	68	dequeue (State stack queue) =
	69	let
	70	symbol = last queue
	71	queue' = init queue
	72	in
	73	(symbol, State stack queue')
	74
	75
	76	-----------------------------------------------------------------------
	77	-- ========================= Interpreters ========================== --
	78	-----------------------------------------------------------------------
	79
	80	data Interpreter = Interp (Map.Map Symbol Operation)
	81
	82	fetch (Interp map) sym = Map.findWithDefault (fitRegOp opNop) sym map
	83	supplant (Interp map) sym op = (Interp (Map.insert sym op map))
	84
	85
	86	-----------------------------------------------------------------------
	87	-- ========================== Operations =========================== --
	88	-----------------------------------------------------------------------
	89
	90	type Operation = State -> Interpreter -> IO (State, Interpreter)
	91
	92	composeOps :: Operation -> Operation -> Operation
	93
	94	composeOps op1 op2 = f where
	95	f state interpreter = do
	96	(state', interpreter') <- op1 state interpreter
	97	op2 state' interpreter'
	98
	99	createOp :: Interpreter -> [Symbol] -> Operation
	100
	101	createOp interpreter [] =
	102	(fitRegOp opNop)
	103	createOp interpreter (head:tail) =
	104	composeOps (fetch interpreter head) (createOp interpreter tail)
	105
	106	--
	107	-- It's useful for us to express a lot of our operators as non-monadic
	108	-- functions that don't affect the interpreter. This is a little "adapter"
	109	-- function that lets us create monadic functions with the right signature
	110	-- from them.
	111	--
	112
	113	fitRegOp :: (State -> State) -> Operation
	114
	115	fitRegOp regop = f where
	116	f state interpreter =
	117	let
	118	state' = regop state
	119	in
	120	do return (state', interpreter)
	121
	122
	123	------------------------------------------------------------
	124	--------------- The operations themselves. -----------------
	125	------------------------------------------------------------
	126
	127	--
	128	-- Redefine the meaning of the symbol on the stack with
	129	-- a mini-program also popped off the stack.
	130	--
	131
	132	opSupplant state interpreter =
	133	let
	134	(opSym, state') = pop state
	135	(newOpDefn, state'') = popString state'
	136	newOp = createOp interpreter newOpDefn
	137	in
	138	do return (state'', supplant interpreter opSym newOp)
	139
	140	--
	141	-- Execute the symbol on the stack with the current interpreter.
	142	--
	143
	144	opEval state interpreter =
	145	let
	146	(opSym, state') = pop state
	147	newOp = createOp interpreter [opSym]
	148	in
	149	newOp state' interpreter
	150
	151	--
	152	-- I/O.
	153	--
	154
	155	opInput state interpreter = do
	156	symbol <- getChar
	157	do return (push state symbol, interpreter)
	158
	159	opOutput state interpreter =
	160	let
	161	(symbol, state') = pop state
	162	in do
	163	putChar symbol
	164	return (state', interpreter)
	165
	166	--
	167	-- Primitive arithmetic.
	168	--
	169
	170	opAdd state =
	171	let
	172	(symA, state') = pop state
	173	(symB, state'') = pop state'
	174	in
	175	push state'' (Char.chr (((Char.ord symB) + (Char.ord symA)) `mod` 256))
	176
	177	opSubtract state =
	178	let
	179	(symA, state') = pop state
	180	(symB, state'') = pop state'
	181	in
	182	push state'' (Char.chr (((Char.ord symB) - (Char.ord symA)) `mod` 256))
	183
	184	discreteLog 0 = 8
	185	discreteLog 1 = 0
	186	discreteLog 2 = 1
	187	discreteLog n = (discreteLog (n `div` 2)) + 1
	188
	189	opDiscreteLog state =
	190	let
	191	(symbol, state') = pop state
	192	in
	193	push state' (Char.chr (discreteLog (Char.ord symbol)))
	194
	195	--
	196	-- Stack manipulation.
	197	--
	198
	199	--
	200	-- Pop the top symbol of the stack, make a copy of it, push it back onto the
	201	-- stack, and enqueue the copy onto the queue.
	202	--
	203
	204	opEnqueueCopy state =
	205	let
	206	(sym, _) = pop state
	207	in
	208	enqueue state sym
	209
	210	--
	211	-- Dequeue a symbol from the queue and push it onto the stack.
	212	--
	213
	214	opDequeue state =
	215	let
	216	(sym, state') = dequeue state
	217	in
	218	push state' sym
	219
	220	--
	221	-- Duplicate the top symbol of the stack.
	222	--
	223
	224	opDuplicate state =
	225	let
	226	(symbol, _) = pop state
	227	in
	228	push state symbol
	229
	230	--
	231	-- Miscellaneous operations.
	232	--
	233
	234	opNop state =
	235	state
	236
	237	--
	238	-- Parameterizable operations.
	239	--
	240
	241	opPushValue value state =
	242	push state (Char.chr value)
	243
	244	opAccumValue value state =
	245	let
	246	(sym, state') = pop state
	247	value' = ((Char.ord sym) * 10) + value
	248	in
	249	push state' (Char.chr (value' `mod` 256))
	250
	251
	252	-----------------------------------------------------------------------
	253	-- ===================== Debugging Functions ======================= --
	254	-----------------------------------------------------------------------
	255
	256	type Debugger = State -> Interpreter -> IO ()
	257
	258	debugNop s i = do
	259	return ()
	260
	261	debugPrintState s i = do
	262	putStr ((show s) ++ "\n")
	263	return ()
	264
	265
	266	-----------------------------------------------------------------------
	267	-- ============================ Executor =========================== --
	268	-----------------------------------------------------------------------
	269
	270	execute :: [Symbol] -> State -> Interpreter -> Debugger -> IO (State, Interpreter)
	271
	272	execute [] state interpreter debugger =
	273	return (state, interpreter)
	274	execute (opSym:program') state interpreter debugger =
	275	let
	276	operation = fetch interpreter opSym
	277	in do
	278	(state', interpreter') <- operation state interpreter
	279	debugger state' interpreter'
	280	execute program' state' interpreter' debugger
	281
	282
	283	-----------------------------------------------------------------------
	284	-- ====================== Top-Level Function ======================= --
	285	-----------------------------------------------------------------------
	286
	287	initialInterpreter = Interp
	288	(Map.fromList
	289	[
	290	('.', opOutput),
	291	(',', opInput),
	292
	293	('#', fitRegOp (opPushValue 0)),
	294	('0', fitRegOp (opAccumValue 0)),
	295	('1', fitRegOp (opAccumValue 1)),
	296	('2', fitRegOp (opAccumValue 2)),
	297	('3', fitRegOp (opAccumValue 3)),
	298	('4', fitRegOp (opAccumValue 4)),
	299	('5', fitRegOp (opAccumValue 5)),
	300	('6', fitRegOp (opAccumValue 6)),
	301	('7', fitRegOp (opAccumValue 7)),
	302	('8', fitRegOp (opAccumValue 8)),
	303	('9', fitRegOp (opAccumValue 9)),
	304
	305	('+', fitRegOp opAdd),
	306	('-', fitRegOp opSubtract),
	307	('~', fitRegOp opDiscreteLog),
	308
	309	('^', fitRegOp opEnqueueCopy),
	310	('v', fitRegOp opDequeue),
	311	(':', fitRegOp opDuplicate),
	312
	313	('!', opSupplant),
	314	('?', opEval),
	315
	316	(';', fitRegOp (opPushValue (Char.ord ';')))
	317	]
	318	)
	319
	320	initialState = State [] []
	321
	322	emmental string = do
	323	(state, interpreter) <- execute string initialState initialInterpreter debugNop
	324	return state
	325
	326	debug string = do
	327	(state, interpreter) <- execute string initialState initialInterpreter debugPrintState
	328	return state
	329
	330
	331	-----------------------------------------------------------------------
	332	-- ========================== Test Cases =========================== --
	333	-----------------------------------------------------------------------
	334
	335	--
	336	-- Drivers for test cases. 'demo' runs them straight, whereas 'test'
	337	-- uses the debugger.
	338	--
	339
	340	demo n = emmental (testProg n)
	341
	342	test n = debug (testProg n)
	343
	344	--
	345	-- Here we introduce a bit of a cheat, in order to make writing
	346	-- complex Emmental programs tolerable. You can still see the
	347	-- programs in their fully glory by executing "show (testProg n)".
	348	--
	349
	350	quote [] = []
	351	quote (symbol:rest) = "#" ++ (show (Char.ord symbol)) ++ (quote rest)
	352
	353	--
	354	-- Add one and one.
	355	--
	356
	357	testProg 1 = "#1#1+"
	358
	359	--
	360	-- Redefine & as "+".
	361	--
	362
	363	testProg 2 = ";#43#38!#1#1&" -- 59,43,38 ==> ";+&"
	364
	365	--
	366	-- Redefine 0 as "9".
	367	--
	368
	369	testProg 3 = ";#57#48!#0" -- 59,57,48 ==> ";90"
	370
	371	--
	372	-- Redefine 0 as "#48?". This results in an infinite loop when 0 is executed.
	373	--
	374
	375	testProg 4 = ";#35#52#56#63#48!0" -- 59,35,52,56,63,48 ==> ";#48?0"
	376
	377	--
	378	-- Redefine $ as ".#36?". This results in a loop that pops symbols and
	379	-- and prints them, until the stack underflows, when $ is executed.
	380	--
	381
	382	testProg 5 = ";#46#35#51#54#63#36! #65#66#67#68#69$"
	383
	384	--
	385	-- Duplicate the top stack element (assuming an empty queue.)
	386	-- This shows that the : operation is not strictly necessary
	387	-- (when you know the size of the queue.)
	388	--
	389
	390	testProg 6 = "#65^v"
	391
	392	--
	393	-- Discard the top stack element (assuming more than one element
	394	-- on the stack, and an empty queue.)
	395	--
	396
	397	testProg 7 = "#33#123^v-+"
	398
	399	--
	400	-- Swap the top two elements of the stack (assuming an empty queue.)
	401	--
	402
	403	testProg 8 = "#67#66#65^v^-+^^v^v^v-+^v-+^v-+vv"
	404
	405	--
	406	-- Input a symbol. Report whether its ASCII value is even or odd.
	407	--
	408
	409	testProg 9 = (quote ";^v:") ++ "!" ++ -- : = dup
	410	(quote ";#69.") ++ "#!" ++ -- NUL = print "E"
	411	(quote ";#79.") ++ "#128!" ++ -- \128 = print "O"
	412	(quote (";" ++ (take 127 [':',':'..]) ++ -- m = mul by 128
	413	(take 127 ['+','+'..]) ++ "m")) ++ "!" ++
	414	",m?"
	415
	416	--
	417	-- Input a symbol. Report whether it is M or not.
	418	--
	419
	420	testProg 10 = (quote ";#78.") ++ "#!" ++ -- NUL = print "N"
	421	";##1!" ++ -- SOH = same as NUL
	422	";##2!" ++ -- STX = same as NUL
	423	";##3!" ++ -- ETX = same as NUL
	424	";##4!" ++ -- EOT = same as NUL
	425	";##5!" ++ -- ENQ = same as NUL
	426	";##6!" ++ -- ACK = same as NUL
	427	";##7!" ++ -- BEL = same as NUL
	428	(quote ";#89.") ++ "#8!" ++ -- BS = print "Y"
	429	",#77-~?"
	430
	431	--
	432	-- Same as testProg 5, except stop printing when a NUL is
	433	-- encountered, instead of just underflowing the stack.
	434	--
	435
	436	testProg 11 = ";" ++ (quote ":~?$") ++ "!" ++ -- $ = dup & test
	437	";" ++ (quote ".$") ++ "#!" ++ -- NUL = print & repeat
	438	";#0#1!" ++ -- SOH = same as NUL
	439	";#0#2!" ++ -- STX = same as NUL
	440	";#0#3!" ++ -- ETX = same as NUL
	441	";#0#4!" ++ -- EOT = same as NUL
	442	";#0#5!" ++ -- ENQ = same as NUL
	443	";#0#6!" ++ -- ACK = same as NUL
	444	";#0#7!" ++ -- BEL = same as NUL
	445	-- BS = stop (nop)
	446	"#0" ++ (quote (reverse "Hello!")) ++ "$"