git @ Cat's Eye Technologies Tamsin / master doc / Micro-Tamsin.markdown
master

Tree @master (Download .tar.gz)

Micro-Tamsin.markdown @masterview markup · raw · history · blame

Micro-Tamsin

This is just the "fundaments" part of the spec, and a few other bits, that the Micro-Tamsin interpreter (written in Tamsin!) can handle.

-> Tests for functionality "Intepret Tamsin program"

Fundaments

A Tamsin program consists of one or more productions. A production consists of a name and a parsing rule (or just "rule" for short). Among other things, a rule may be a non-terminal, which is the name of a production, or a terminal, which is a literal string in double quotes. (A full grammar for Tamsin can be found in Appendix A.)

When run, a Tamsin program processes its input. It starts at the production named main, and evaluates its rule. A non-terminal in a rule "calls" the production of that name in the program. A terminal in a a rule expects a token identical to it to be on the input. If that expectation is met, it evaluates to that token. If not, it raises an error. The final result of evaluating a Tamsin program is sent to its output.

(If it makes it easier to think about, consider "its input" to mean "stdin", and "token" to mean "character"; so the terminal "x" is a command that either reads the character x from stdin and returns it (whence it is printed to stdout by the main program), or errors out if it read something else. Or, thinking about it from the other angle, we have here the rudiments for defining a grammar for parsing a trivial language.)

| main = blerf.
| blerf = "p".
+ p
= p

| main = blerf.
| blerf = "p".
+ k
? expected 'p' found 'k'

Productions can be written that don't look at the input. A rule may also consist of the keyword return, followed a term; this expression simply evaluates to that term and returns it. (More on terms later; for now, think of them as strings.)

So, the following program always outputs blerp, no matter what the input is.

| main = return blerp.
+ fadda wadda badda kadda nadda sadda hey
= blerp

Note that in the following, blerp refers to the production named "blerp" in one place, and in the other place, it refers to the term blerp. Tamsin sees the difference because of the context; return must be followed by a term, while a parsing rule cannot be part of a term.

| main = blerp.
| blerp = return blerp.
+ foo
+ foo
+ foo 0 0 0 0 0
= blerp

A rule may also consist of the keyword print followed by a term, which, when evaluated, sends the term to the output, and evaluates to the term. (Mostly this is useful for debugging. In the following, world is repeated because it is both printed, and the result of the evaluation.)

| main = print hello & print world.
+ ahoshoshohspohdphs
= hello
= world
= world

A rule may also consist of two subrules joined by the & operator. The & operator processes the left-hand side rule. If the LHS fails, then the & expression fails; otherwise, it continues and processes the right-hand side rule. If the RHS fails, the & expression fails; otherwise it evaluates to what the RHS evaluated to.

| main = "a" & "p".
+ ap
= p

| main = "a" & "p".
+ ak
? expected 'p' found 'k'

| main = "a" & "p".
+ ep
? expected 'a' found 'e'

If you are too used to C or Javascript or the shell, you may use && instead of &.

| main = "a" && "p".
+ ap
= p

A rule may also consist of two subrules joined by the | operator. The & operator processes the left-hand side rule. If the LHS succeeds, then the | expression evaluates to what the LHS evaluted to, and the RHS is ignored. But if the LHS fails, it processes the RHS; if the RHS fails, the | expression fails, but otherwise it evaluates to what the RHS evaluated to.

For example, this program accepts 0 or 1 but nothing else.

| main = "0" | "1".
+ 0
= 0

| main = "0" | "1".
+ 1
= 1

| main = "0" | "1".
+ 2
? expected '1' found '2'

If you are too used to C or Javascript or the shell, you may use || instead of |.

| main = "0" || "1".
+ 1
= 1

Using return described above, this program accepts 0 or 1 and evaluates to the opposite. (Note here also that & has a higher precedence than |.)

| main = "0" & return 1 | "1" & return 0.
+ 0
= 1

| main = "0" & return 1 | "1" & return 0.
+ 1
= 0

| main = "0" & return 1 | "1" & return 0.
+ 2
? expected '1' found '2'

Evaluation order can be altered by using parentheses, as per usual.

| main = "0" & ("0" | "1") & "1" & return ok.
+ 011
= ok

Note that if the LHS of | fails, the RHS is tried at the position of the stream that the LHS started on. This property is called "backtracking".

| ohone = "0" & "1".
| ohtwo = "0" & "2".
| main = ohone | ohtwo.
+ 02
= 2

Note that print and return never fail. Thus, code like the following is "useless":

| main = foo & print hi | return useless.
| foo = return bar | print useless.
= hi
= hi

Note that return does not exit the production immediately — although this behaviour may be re-considered...

| main = return hello & print not_useless.
= not_useless
= not_useless

Alternatives can select code to be executed, based on the input.

| main = aorb & print aorb | cord & print cord & return ok.
| aorb = "a" & print ay | "b" & print bee.
| cord = "c" & print see | eorf & print eorf.
| eorf = "e" & print ee | "f" & print eff.
+ e
= ee
= eorf
= cord
= ok

And that's the basics. With these tools, you can write simple recursive-descent parsers. For example, to consume nested parentheses containing a zero:

| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ 0.
= ok

| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ (((0))).
= ok

(the error message on this test case is a little weird; it's because of the backtracking. It tries to match (((0))) against the beginning of input, and fails, because the last ) is not present. So it tries to match 0 at the beginning instead, and fails that too.)

| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ (((0)).
? expected '0' found '('

(the error message on this one is much more reasonable...)

| main = parens & "." & return ok.
| parens = "(" & parens & ")" | "0".
+ ((0))).
? expected '.' found ')'

To consume a comma-seperated list of one or more bits:

| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 1.
= .

| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 0,1,1,0,1,1,1,1,0,0,0,0,1.
= .

(again, backtracking makes the error a little odd)

| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 0,,1,0.
? expected '.' found ','

| main = bit & {"," & bit} & ".".
| bit = "0" | "1".
+ 0,10,0.
? expected '.' found '0'

Comments

A Tamsin comment is introduced with # and continues until the end of the line.

| # welcome to my Tamsin program!
| main = # comments may appear anywhere in the syntax
|        # and a comment may be followed by a comment
|   "z".
+ z
= z