git @ Cat's Eye Technologies Tamsin / 9aacfda
Default aliases for eof, any, print, and fail. Cat's Eye Technologies 11 years ago
3 changed file(s) with 46 addition(s) and 50 deletion(s).
1313 And what does a recursive-descent parser do? It consumes input. But
1414 don't *all* algorithms consume input? So why not have a language which
1515 makes it easy to write recursive-descent parsers, and force all programs
16 to be written as recursive-descent parsers? Then all code will be pretty!
16 to be written as recursive-descent parsers? Then *all* code will be pretty!
1717 (Yeah, sure, OK.)
1818
1919 Where I'm going with this, I don't quite know yet. It is a
2929
3030 Parse an algebraic expression for correctness.
3131
32 | main = (expr0 & $.eof & return ok).
32 | main = (expr0 & eof & return ok).
3333 | expr0 = expr1 & {"+" & expr1}.
3434 | expr1 = term & {"*" & term}.
3535 | term = "x" | "y" | "z" | "(" & expr0 & ")".
3636 + x+y*(z+x+y)
3737 = ok
3838
39 | main = (expr0 & $.eof & return ok).
39 | main = (expr0 & eof & return ok).
4040 | expr0 = expr1 & {"+" & expr1}.
4141 | expr1 = term & {"*" & term}.
4242 | term = "x" | "y" | "z" | "(" & expr0 & ")".
6767 Make a story more exciting!
6868
6969 | main = set S = '' & {translate → C & set S = S + C} & return S.
70 | translate = "." & return '!' | "?" & return '?!' | $.any.
70 | translate = "." & return '!' | "?" & return '?!' | any.
7171 + Chapter 1
7272 + ---------
7373 + It was raining. She knocked on the door. She heard
108108
109109 Parse and evaluate a little S-expression-based language.
110110
111 * See [Case Study](doc/Case_Study.markdown)
111 * See [Case Study: Evaluating S-expressions](doc/Case_Study.markdown)
112112
113113 For more information
114114 --------------------
150150
151151 * comments
152152 * `$.return`
153 * default aliases: return, fail, any, eof, print
154153 * arbitrary non-printable characters in terms and such
155154 * make `return` optional when token is unambiguously the start of a term
156155 * make `set` optional
6868 (Mostly this is useful for debugging. In the following, `world` is
6969 repeated because it is both printed, and the result of the evaluation.)
7070
71 | main = $.print(hello) & $.print(world).
71 | main = print hello & print world.
7272 + ahoshoshohspohdphs
7373 = hello
7474 = world
146146 Note that `print` and `return` never fail. Thus, code like the following
147147 is "useless":
148148
149 | main = foo & $.print(hi) | return useless.
150 | foo = return bar | $.print(useless).
149 | main = foo & print hi | return useless.
150 | foo = return bar | print useless.
151151 = hi
152152 = hi
153153
154154 Note that `return` does not exit the production immediately — although
155155 this behaviour may be re-considered...
156156
157 | main = return hello & $.print(not_useless).
157 | main = return hello & print not_useless.
158158 = not_useless
159159 = not_useless
160160
161161 Alternatives can select code to be executed, based on the input.
162162
163 | print(X) = $.print(X).
164 | main = aorb & print(aorb) | cord & print(cord) & return ok.
165 | aorb = "a" & print(ay) | "b" & print(bee).
166 | cord = "c" & print(see) | eorf & print(eorf).
167 | eorf = "e" & print(ee) | f & print(eff).
163 | main = aorb & print aorb | cord & print cord & return ok.
164 | aorb = "a" & print ay | "b" & print bee.
165 | cord = "c" & print see | eorf & print eorf.
166 | eorf = "e" & print ee | f & print eff.
168167 + e
169168 = ee
170169 = eorf
442441 and it is always in scope.
443442
444443 The module `$` contains a number of built-in productions which would not
445 be possible or practical to implement in Tamsin. Among them:
446
447 * `$.any`, which matches any token
448 * `$.eof`, which matches the end of the input
449 * `$.char`, the character scanner (more on scanner productions below)
450 * `$.tamsin`, the tamsin scanner (more on scanner productions below)
444 be possible or practical to implement in Tamsin. See Appendix B for a list.
451445
452446 Advanced Parsing
453447 ----------------
454448
455 ### $.EOF ###
449 ### eof ###
456450
457451 If there is more input available than what we wrote the program to consume,
458452 the program still succeeds.
461455 + apparently
462456 = p
463457
464 The built-in production `$.eof` may be used to match against the end of the
458 The built-in production `eof` may be used to match against the end of the
465459 input (colloquially called "EOF".)
466460
467 | main = "a" & "p" & $.eof.
461 | main = "a" & "p" & eof.
468462 + ap
469463 = EOF
470464
471465 This is how you can make it error out if there is extra input remaining.
472466
473 | main = "a" & "p" & $.eof.
467 | main = "a" & "p" & eof.
474468 + apt
475469 ? expected EOF found 't'
476470
477471 The end of the input is a virtual infinite stream of EOF's. You can match
478472 against them until the cows come home. The cows never come home.
479473
480 | main = "a" & "p" & $.eof & $.eof & $.eof.
474 | main = "a" & "p" & eof & eof & eof.
481475 + ap
482476 = EOF
483477
484 ### $.any ###
485
486 The built-in production `$.any` matches any token defined by the scanner
478 ### any ###
479
480 The built-in production `any` matches any token defined by the scanner
487481 except for EOF. (Remember that for now "token defined by the scanner"
488482 means "character", but that that can be changed, as you'll see below.)
489483
490 | main = $.any & $.any & $.any.
484 | main = any & any & any.
491485 + (@)
492486 = )
493487
494 | main = $.any & $.any.
488 | main = any & any.
495489 + a
496490 ? expected any token, found EOF
497491
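Reviewer note: the `eof` and `any` behaviour described in this hunk can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `CharInput` and its `(ok, result)` return convention are hypothetical and are not Tamsin's actual `Scanner` class. The error strings are copied from the test expectations above.

```python
class CharInput:
    """Hypothetical character-level input; not Tamsin's real Scanner."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def eof(self):
        # Succeeds only at end of input and consumes nothing, so it can be
        # matched any number of times in a row.
        if self.pos >= len(self.text):
            return (True, 'EOF')
        return (False, "expected EOF found '%s'" % self.text[self.pos])

    def any(self):
        # Succeeds on any character except end of input, and consumes it.
        if self.pos >= len(self.text):
            return (False, 'expected any token, found EOF')
        ch = self.text[self.pos]
        self.pos += 1
        return (True, ch)


# Mirrors `main = "a" & "p" & eof & eof & eof.` on input `ap`.
inp = CharInput('ap')
inp.any()
inp.any()
results = [inp.eof(), inp.eof(), inp.eof()]
```

Because `eof` never advances the position, matching it repeatedly keeps succeeding, which is the "virtual infinite stream of EOF's" the text describes.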
545539 + 0000
546540 = zero(zero(zero(zero(nil))))
547541
548 ### $.fail ###
549
550 The built-in production `$.fail` always fails. This lets you establish
542 ### fail ###
543
544 The built-in production `fail` always fails. This lets you establish
551545 global flags, of a sort. It takes a term, which it uses as the failure message.
552 Note that the term must be in parentheses — more on that in a minute.
553546
554547 | debug = return ok.
555548 | main = (debug & return walla | "0").
556549 + 0
557550 = walla
558551
559 | debug = $.fail(notdebugging).
552 | debug = fail notdebugging.
560553 | main = (debug & return walla | "0").
561554 + 0
562555 = 0
563556
564 | main = set E = 'Goodbye, world!' & $.fail(E).
557 | main = set E = 'Goodbye, world!' & fail E.
565558 + hsihdsihdsih
566559 ? Goodbye, world!
567560
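Reviewer note: the "global flag" idiom with `fail` relies on `|` trying its right-hand side when the left-hand side fails. A minimal combinator sketch of that idea follows. All names here (`ret`, `fail`, `lit`, `seq`, `alt`) and the `(ok, value, position)` tuples are assumptions made for illustration; they are not how the Tamsin interpreter is implemented.

```python
def ret(value):
    # Like `return value`: always succeeds and consumes nothing.
    return lambda text, pos: (True, value, pos)


def fail(message):
    # Like `fail message`: always fails, carrying the message.
    return lambda text, pos: (False, message, pos)


def lit(ch):
    # Like a "..." literal: matches one specific character.
    def parse(text, pos):
        if text[pos:pos + 1] == ch:
            return (True, ch, pos + 1)
        return (False, "expected '%s'" % ch, pos)
    return parse


def seq(a, b):
    # Like `a & b`: run a, then b; the result is b's result.
    def parse(text, pos):
        ok, value, after_a = a(text, pos)
        if not ok:
            return (False, value, pos)
        ok, value, after_b = b(text, after_a)
        if not ok:
            return (False, value, pos)
        return (True, value, after_b)
    return parse


def alt(a, b):
    # Like `a | b`: try a; if it fails, try b from the same position.
    def parse(text, pos):
        ok, value, after_a = a(text, pos)
        if ok:
            return (True, value, after_a)
        return b(text, pos)
    return parse


def main(debug):
    # Mirrors `main = (debug & return walla | "0").`
    return alt(seq(debug, ret('walla')), lit('0'))
```

With `debug` defined as `ret('ok')` the first alternative wins; with `debug` defined as `fail('notdebugging')` the parser falls through to the `"0"` literal, matching the two test cases above.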
678671 returns an atom, which the client will treat as a token.
679672
680673 | main = scanner using $.char.
681 | scanner = {scan → A & $.print(A)} & ".".
674 | scanner = {scan → A & print A} & ".".
682675 | scan = {" "} & ("-" & ">" & return '->' | "(" | ")" | "," | ";" | word).
683676 | word = letter → L & {letter → M & set L = L + M}.
684677 | letter = "a" | "b" | "c" | "d" | "e" | "f" | "g".
800793
801794 | main = program using scanner.
802795 | scanner = scan using $.char.
803 | print(X) = $.print(X).
804796 | scan = (
805797 | "c" & "a" & "t" & return cat | "d" & "o" & "g" & return dog
806798 | ).
807 | program = "cat" & print(1) &
808 | ("cat" & print(2) | "dog" & print(3)) &
809 | "dog" & print(4) & return ok.
799 | program = "cat" & print 1 &
800 | ("cat" & print 2 | "dog" & print 3) &
801 | "dog" & print 4 & return ok.
810802 + catcatdog
811803 = 1
812804 = 2
815807
816808 | main = program using scanner.
817809 | scanner = scan using $.char.
818 | print(X) = $.print(X).
819810 | scan = (
820811 | "c" & "a" & "t" & return cat | "d" & "o" & "g" & return dog
821812 | ).
822 | program = "cat" & print(1) &
823 | ("cat" & print(2) | "dog" & print(3)) &
824 | "dog" & print(4) & return ok.
813 | program = "cat" & print 1 &
814 | ("cat" & print 2 | "dog" & print 3) &
815 | "dog" & print 4 & return ok.
825816 + catdogdog
826817 = 1
827818 = 3
10701061 the production's rule.
10711062
10721063 | main = donkey(world).
1073 | donkey[$.any → E] = return hello(E).
1064 | donkey[any → E] = return hello(E).
10741065 = hello(w)
10751066
10761067 | main = donkey(world).
1077 | donkey[$.any → E using $.tamsin] = return hello(E).
1068 | donkey[any → E using $.tamsin] = return hello(E).
10781069 = hello(world)
10791070
10801071 No variables from the caller leak into the called production.
10811072
10821073 | main = set F = whatever & donkey(world).
1083 | donkey[$.any → E] = return hello(F).
1074 | donkey[any → E] = return hello(F).
10841075 ? KeyError
10851076
10861077 Terms are stringified before being matched.
11781169 * `$.eof` -- succeeds on eof and returns eof, otherwise fails
11791170 * `$.any` -- fails on eof, succeeds and returns token on any other token
11801171 * `$.print(X)` -- prints X to output as a side-effect, returns X
1172 * `$.fail(X)` -- always fails, giving X as the error message
11811173
11821174 Appendix C. Notes
11831175 -----------------
391391 self.listeners = listeners
392392 self.scanner = Scanner(buffer, listeners=self.listeners)
393393 self.scanner.push_engine(TamsinScannerEngine())
394 self.aliases = {}
394 self.aliases = {
395 'eof': (0, ('PRODREF', '$', 'eof')),
396 'any': (0, ('PRODREF', '$', 'any')),
397 'print': (1, ('PRODREF', '$', 'print')),
398 'fail': (1, ('PRODREF', '$', 'fail')),
399 }
395400
396401 def eof(self):
397402 return self.scanner.eof()
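Reviewer note: each entry in the new `self.aliases` table pairs an arity with a `PRODREF` tuple. The sketch below shows one way such a table could be consulted when a bare name like `print hello` is parsed. The table contents are copied from the diff; the helper `expand_alias` and the `('CALL', prodref, args)` node shape are hypothetical and may not match what the parser actually builds.

```python
DEFAULT_ALIASES = {
    'eof': (0, ('PRODREF', '$', 'eof')),
    'any': (0, ('PRODREF', '$', 'any')),
    'print': (1, ('PRODREF', '$', 'print')),
    'fail': (1, ('PRODREF', '$', 'fail')),
}


def expand_alias(name, args, aliases=DEFAULT_ALIASES):
    """Hypothetical helper: turn an aliased name plus its arguments into a
    call on the production the alias points at."""
    if name not in aliases:
        return None  # not an alias; the caller handles it as an ordinary name
    arity, prodref = aliases[name]
    if len(args) != arity:
        raise ValueError('%s expects %d argument(s), got %d'
                         % (name, arity, len(args)))
    # Assumed node shape, for illustration only.
    return ('CALL', prodref, list(args))


print_call = expand_alias('print', ['hello'])
eof_call = expand_alias('eof', [])
```

Storing the arity alongside the target is what lets `print hello` and `fail E` be written without parentheses: the parser knows how many terms to read after the alias name.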