git @ Cat's Eye Technologies SMITH / rel_2_1_2011_0922 doc / smith.html
rel_2_1_2011_0922

Tree @rel_2_1_2011_0922 (Download .tar.gz)

smith.html @rel_2_1_2011_0922raw · history · blame

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- encoding: UTF-8 -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta name="Description" content="Cat's Eye Technologies: The SMITH Programming Language" />
  <meta name="Keywords" content=" Cat's Eye Technologies Language
 Programming computational computation self-modification self-modifying
 code self-copying self-propogation self-perpetuation infinite tape
 half-infinite Turing tape Turing-Complete experimental indecent hack " />
  <title>The SMITH Programming Language</title>
  <!-- begin html doc dynamic markup -->
  <script type="text/javascript" src="/contrib/jquery-1.6.4.min.js"></script>
  <script type="text/javascript" src="/scripts/documentation.js"></script>
  <!-- end html doc dynamic markup -->
</head>
<body>

<h1>SMITH</h1>

<p><b>S</b>elf-<b>M</b>odifying <b>I</b>ndecent <b>T</b>uring <b>H</b>ack</p>

<p>Version 2.1-2011.0922.  ©2000-2011 Cat's Eye Technologies.  All rights reserved.</p>
    
<hr />

<blockquote>
<p style="font-style: italic; margin-bottom: 0">"This space intentionally left blank."</p>
<p style="font-style: normal; margin-left: 2em; margin-top: 0.1em">— MIT's <i>Dungeon</i> (later Infocom's <i>Zork I</i>)</p>
</blockquote>

<blockquote>
<p style="font-style: italic; margin-bottom: 0">"The only thing that really worried me was the ether.  There is
 nothing in the world more helpless and irresponsible and
 depraved than a man in the depths of an ether binge.  And I knew
 we'd get into that rotten stuff pretty soon."</p>
<p style="font-style: normal; margin-left: 2em; margin-top: 0.1em">— Hunter S. Thompson, <i>Fear and Loathing in Las Vegas</i></p>
</blockquote>

<h3>What is SMITH?</h3>

<p>SMITH, the successor to <a href="/projects/smetana/">SMETANA</a>,
is a programming language that goes one better than <a class="external"
 href="http://members.tripod.com/rkusnery/bullfrog.html">Bullfrog</a>.

Not only are there no conditional jumps,
there are no jumps <strong>whatsoever</strong>.  The program counter
can <i>only</i> be incremented and <i>only</i> instruction by
instruction.</p>

<h3>What the...???</h3>

<p><i>But</i>, you ask, <i>how can such an ungodly beast be Turing-Complete?
Surely this goes against all that is decent and good and sane in the world we know.</i>
(OK, maybe you won't put it exactly that way - I forgive you.)</p>

<p>Well, there's actually a relatively straightforward answer.
Instead of jumping back into code to repeat and/or recheck
something, you must instead copy blocks of code forward, where
they will be checked in the near future.</p>

<p>And if the code copied forward includes the instruction
that <i>caused</i> the copy in the first place, that will happen
again (and again and again)
when the copy executes.  A loop, but not by means of iteration or
recursion but instead by <i>self-propogation</i>.</p>

<p>Otherwise, SMITH is basically a small generic imperative 
CISC-ish
virtual machine with only the most basic arithmetic operations,
loosely modelled on the "[imaginary] register machine that is
representative of several minicomputers" described in section 9.2
of the Dragon Book.</p>

<p>And when I say loosely modelled, I mean it.  Differences between
this and the 'Dragon' machine include the fact that this has no <code>ADD</code>
instruction, and that here, the destination is specified <i>before</i>
the source (not after;) also, the <code>#</code> before an immediate
(literal) is allowed but not required here, and the indirection
syntax is completely different (there's only indirect references to
registers, not memory, anyway.)</p>

<h3>Overview</h3>

<p>Consider the SMITH machine to have an unlimited number of
registers.  These are called <code>R0</code> to <code>R<i>n</i></code> where
no arbitrary limit is imposed on <code><i>n</i></code>.  In some instructions,
they may be referenced indirectly, and in that case they are called
<code>R[R0]</code> to <code>R[R<i>n</i>]</code>.</p>

<p>An indirect reference like <code>R[R0]</code> refers to 'the register
whose number is the number stored in register zero.'  If <code>R0</code>
contained 7, then <code>R[R0]</code> would be synonymous with <code>R7</code>.</p>

<p>Each instruction has the general form:</p>

<pre>  OPCODE [destination[, source[, length]]]</pre>

<p>(Here, the square breackets denote "may be omitted", not indirect references.)</p>

<p><code>destination</code> is either the name of a register or an immediate
offset from the current program counter.  <code>source</code> is either a
register, an immediate offset from the current program counter, or an
immediate integer.  <code>length</code>, if present, is the name of a
register.  All integers are notated in decimal.  <code>#</code> may precede
immediate integers if the programmer desires.  However, it may not
precede offsets (they should be preceded by either <code>+</code> or <code>-</code>.)</p>

<h3>Instructions</h3>

<p>The basic instructions (available in the original version) are:</p>

<pre>  MOV register, immediate       e.g.  MOV R0, 0
  MOV register, register              MOV R1, R0

  SUB register, immediate             SUB R1, 1
  SUB register, register              SUB R0, R1

  MUL register, immediate             MUL R0, 2

  NOT register                        NOT R0

  COR offset, offset, register        COR +1, -5, R0

  STOP                                STOP
</pre>

<p>The instructions in the following table
were made available in SMITH v2 (June 25 2000).</p>

<pre>  MOV [register], register            MOV R[R1], R0
  MOV register, [register]            MOV R1, R[R0]

  MOV [register], "string"            MOV R[R0], "nights and weird mornings"

  MOV register, PC                    MOV R303, PC
  MOV register, *                     MOV R304, *

  MOV TTY, register                   MOV TTY, R1
  MOV TTY, [register]                 MOV TTY, R[R65535]
  MOV register, TTY                   MOV R0, TTY
  MOV [register], TTY                 MOV R[R0], TTY

  MUL register, register              MUL R999999999, R123456789

  COR offset, register, register      COR +1, R22, R23

  BLA offset, NOP, register           BLA +2, NOP, R10

  NOP                                 NOP
</pre>

<p>SMITH v2 also added two directives:</p>

<pre>  ; arbitrary text composing a source comment               ; Kilroy was here
  REP int OPCODE [destination[, source[, length]]]          REP 50 STOP
</pre>

<h3>Explanation</h3>

<p><code>MOV</code> will assign a register a value without modifying it.</p>

<p>The <code>TTY</code> forms of <code>MOV</code> are just a lazy way to refer to some
(potentially complex) system routine for the 'usual' input and output character streams,
so the program can communicate (in some primitive fashion) with the user.</p>

<p>The <code>PC</code> and <code>*</code> forms of <code>MOV</code> facilitate calculating
absolute addresses of subroutines.  Actually, <code>*</code> is more like a macro
which expands to the current line number in the source file.</p>

<p>The <code>"string"</code> form of <code>MOV</code> moves each character of the immediate
string into a successive register.  If <code>R0</code> contained 7, then <code>MOV R[R0], "cat"</code>
would result in <code>R7 == 'c', R8 == 'a', R9 == 't'</code>.</p>

<p><code>SUB</code> and
<code>MUL</code> will subtract or multiply a register by a value, respectively.
<code>NOT</code> will boolean-negate a register (false = 0/zero, true = 1/nonzero.)</p>

<p><code>COR</code> is the truly interesting opcode; it stands for
<code>CO</code>py by <code>R</code>egister.
The value of the length operand (which is always in a register) is a count of instructions.
This many instructions is copied from the source offset (which may be immediate or in a register)
to the destination offset (which is always immediate),
modifying the program (thus, self-modifying code.)</p>

<blockquote style="font-style: italic; margin-left: 2em;"><p>Note: prior to version 2007.0722 of the SMITH language, the result of actually
<b>overwriting</b> existing instructions with other instructions was not defined.
The original idea was that SMITH programs would merely <b>extend</b> themselves
by copying bits of themselves to just past the end of the program, like Sylvester
chasing Tweety around the Christmas tree on a toy train --
and I still think that's all that's needed for SMITH to be Turing-complete.
However, since the language does reasonably claim to be fully "self-modifying" and not just
"self-extending" (SEITH?!?), it's official: as of version 2007.0722, you can overwrite
instructions with other instructions using <code>COR</code>.  Even when the source and
destination ranges of such a copy overlap, the resulting series of instructions at the
destination should be an exact copy of the series at the source.</p>

  <blockquote>
    <p>Note note: It seems some of you are having trouble understanding the version
    system being used for SMITH, which is eminently understandable given that it is
    completely inconsistent.  The first version had no version.  The second version, released
    a scant few days later, containing many new instructions, was "v2".  That version was re-christened "version 2.0L" when we
    added the BSD license to it (because "L" is for "license", y'see?)  The third version, released a scant few years later, was
    "version 2007.0722".  This was back when we thought date-based version numbers could
    substitute entirely for sequence-based version numbers.  This version was also known
    as "version 2.1 revision 2007.0722" when we dropped that idea.  The current version,
    which differs mainly in that its documentation contains this paragraph, is
    version <b>2.1-2011.0922</b>.  Your knowledge is now complete.  Go home.</p>
  </blockquote>
</blockquote>

<p><code>BLA</code> is kind of a special version of <code>COR</code> that makes some
programming a lot easier.  It stands for <code>BLA</code>nk and as part of it's very
special format it takes an <i>immediate opcode</i> as its second argument.
It copies that
opcode (which must be either <code>NOP</code> or <code>STOP</code>) to the offset, and
uses the number found in the length register as a multiplier.</p>

<p>Execution starts at the beginning of the program (well, naturally,)
and stops when the program counter tries to execute beyond the last
instruction specified in the program.  A <code>STOP</code> instruction
will also serve this purpose, if you choose to use it.</p>

<h3>Huh.</h3>

<p>So there you go.  Myth debunked.  You don't need <code>if</code>
or <code>goto</code> or <code>while</code> or anything so fancy to be
Turing-Complete.  All you need is to be able to
<code>memcpy</code> yourself!  Don't that just take the cake?</p>

</body></html>