Tree @master (Download .tar.gz)
Micro-Tamsin.markdown @master — view markup · raw · history · blame
Micro-Tamsin
This is just the "fundaments" part of the spec, and a few other bits, that the Micro-Tamsin interpreter (written in Tamsin!) can handle.
-> Tests for functionality "Intepret Tamsin program"
Fundaments
A Tamsin program consists of one or more productions. A production consists of a name and a parsing rule (or just "rule" for short). Among other things, a rule may be a non-terminal, which is the name of a production, or a terminal, which is a literal string in double quotes. (A full grammar for Tamsin can be found in Appendix A.)
When run, a Tamsin program processes its input. It starts at the production
named main
, and evaluates its rule. A non-terminal in a rule "calls" the
production of that name in the program. A terminal in a a rule expects a token
identical to it to be on the input. If that expectation is met, it evaluates
to that token. If not, it raises an error. The final result of evaluating a
Tamsin program is sent to its output.
(If it makes it easier to think about, consider "its input" to mean "stdin",
and "token" to mean "character"; so the terminal "x"
is a command that either
reads the character x
from stdin and returns it (whence it is printed to
stdout by the main program), or errors out if it read something else.
Or, thinking about it from the other angle, we have here the rudiments for
defining a grammar for parsing a trivial language.)
| main = blerf.
| blerf = "p".
+ p
= p
| main = blerf.
| blerf = "p".
+ k
? expected 'p' found 'k'
Productions can be written that don't look at the input. A rule may also
consist of the keyword return
, followed a term; this expression simply
evaluates to that term and returns it. (More on terms later; for now,
think of them as strings.)
So, the following program always outputs blerp
, no matter what the input is.
| main = return blerp.
+ fadda wadda badda kadda nadda sadda hey
= blerp
Note that in the following, blerp
refers to the production named "blerp"
in one place, and in the other place, it refers to the term blerp
. Tamsin
sees the difference because of the context; return
must be followed by a
term, while a parsing rule cannot be part of a term.
| main = blerp.
| blerp = return blerp.
+ foo
+ foo
+ foo 0 0 0 0 0
= blerp
A rule may also consist of the keyword print
followed by a term, which,
when evaluated, sends the term to the output, and evaluates to the term.
(Mostly this is useful for debugging. In the following, world
is
repeated because it is both printed, and the result of the evaluation.)
| main = print hello & print world.
+ ahoshoshohspohdphs
= hello
= world
= world
A rule may also consist of two subrules joined by the &
operator.
The &
operator processes the left-hand side rule. If the LHS fails, then
the &
expression fails; otherwise, it continues and processes the
right-hand side rule. If the RHS fails, the &
expression fails; otherwise
it evaluates to what the RHS evaluated to.
| main = "a" & "p".
+ ap
= p
| main = "a" & "p".
+ ak
? expected 'p' found 'k'
| main = "a" & "p".
+ ep
? expected 'a' found 'e'
If you are too used to C or Javascript or the shell, you may use &&
instead of &
.
| main = "a" && "p".
+ ap
= p
A rule may also consist of two subrules joined by the |
operator.
The &
operator processes the left-hand side rule. If the LHS succeeds,
then the |
expression evaluates to what the LHS evaluted to, and the
RHS is ignored. But if the LHS fails, it processes the RHS; if the RHS
fails, the |
expression fails, but otherwise it evaluates to what the
RHS evaluated to.
For example, this program accepts 0
or 1
but nothing else.
| main = "0" | "1".
+ 0
= 0
| main = "0" | "1".
+ 1
= 1
| main = "0" | "1".
+ 2
? expected '1' found '2'
If you are too used to C or Javascript or the shell, you may use ||
instead of |
.
| main = "0" || "1".
+ 1
= 1
Using return
described above, this program accepts 0 or 1 and evaluates
to the opposite. (Note here also that &
has a higher precedence than |
.)
| main = "0" & return 1 | "1" & return 0.
+ 0
= 1
| main = "0" & return 1 | "1" & return 0.
+ 1
= 0
| main = "0" & return 1 | "1" & return 0.
+ 2
? expected '1' found '2'
Evaluation order can be altered by using parentheses, as per usual.
| main = "0" & ("0" | "1") & "1" & return ok.
+ 011
= ok
Note that if the LHS of |
fails, the RHS is tried at the position of
the stream that the LHS started on. This property is called "backtracking".
| ohone = "0" & "1".
| ohtwo = "0" & "2".
| main = ohone | ohtwo.
+ 02
= 2
Note that print
and return
never fail. Thus, code like the following
is "useless":
| main = foo & print hi | return useless.
| foo = return bar | print useless.
= hi
= hi
Note that return
does not exit the production immediately — although
this behaviour may be re-considered...
| main = return hello & print not_useless.
= not_useless
= not_useless
Alternatives can select code to be executed, based on the input.
| main = aorb & print aorb | cord & print cord & return ok.
| aorb = "a" & print ay | "b" & print bee.
| cord = "c" & print see | eorf & print eorf.
| eorf = "e" & print ee | "f" & print eff.
+ e
= ee
= eorf
= cord
= ok
And that's the basics. With these tools, you can write simple recursive-descent parsers. For example, to consume nested parentheses containing a zero:
| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ 0.
= ok
| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ (((0))).
= ok
(the error message on this test case is a little weird; it's because of
the backtracking. It tries to match (((0)))
against the beginning of
input, and fails, because the last )
is not present. So it tries to
match 0
at the beginning instead, and fails that too.)
| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ (((0)).
? expected '0' found '('
(the error message on this one is much more reasonable...)
| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ ((0))).
? expected '.' found ')'
To consume a comma-seperated list of one or more bits:
| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 1.
= .
| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 0,1,1,0,1,1,1,1,0,0,0,0,1.
= .
(again, backtracking makes the error a little odd)
| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 0,,1,0.
? expected '.' found ','
| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 0,10,0.
? expected '.' found '0'
Comments
A Tamsin comment is introduced with #
and continues until the end of the
line.
| # welcome to my Tamsin program!
| main = # comments may appear anywhere in the syntax
| # and a comment may be followed by a comment
| "z".
+ z
= z