git @ Cat's Eye Technologies Treacle / 18f4081
Convert documentation to Markdown. catseye 10 years ago
2 changed file(s) with 228 addition(s) and 217 deletion(s). Raw diff Collapse all Expand all
0 The Treacle Programming Language
1 ================================
2
3 Language version 1.0.
4 Chris Pressey, Cat's Eye Technologies
5
6 Introduction
7 ------------
8
9 Treacle is a programming language based on an extended form of
10 term-rewriting which we shall call, somewhat innacurately (or at least
11 arbitrarily,) *context rewriting*.
12
13 Like Arboretuum, its successor built around *forest-rewriting*, Treacle
14 was intended as a language for specifying compilers. Treacle is somewhat
15 more successful at pulling it off, however; context rewriting
16 encompasses, and is more expressive than, forest-rewriting.
17
18 Context rewriting is meant to refer to the fact that Treacle's rewriting
19 patterns may contain holes – designated "containers" for subpatterns
20 which may match not just the *immediate* child of the term which the
21 parent pattern matched (as in conventional term-rewriting) but also *any
22 one of that child's descendents*, no matter how deeply nested.
23
24 When a hole is matched to some term, that term is searched for the
25 subpattern given inside the hole. The search may be performed in either
26 leftmost-innermost or leftmost-outermost order; this is specified by a
27 qualifier associated with the hole. Because of this, Treacle need not
28 specify a language-wide reduction order; the hole construct acts as a
29 kind of search operator which explicitly encodes search order into each
30 pattern.
31
32 Context rewriting also deconstructs the conventional concept of the
33 variable, splitting it into a name and a wildcard. Any pattern or
34 subpattern may be named, not just wildcards. Even holes may be named. At
35 the same time, wildcards, which match arbitrary terms, may occur
36 unnamed. Upon a successful match, only those terms which matched named
37 patterns are recorded in the unifier.
38
39 Further, each rule in Treacle may contain multiple terms (replacements)
40 on the right-hand side of a rewriting rule, and each of these may have
41 its own name. When the term undergoing rewriting (called the subject) is
42 rewritten, each named replacement is substituted into the subject at the
43 position matched by the part of the pattern that is labelled by that
44 same name.
45
46 Lastly, replacements may contain special atomic terms called newrefs.
47 When a newref is written into the subject, it takes the form of a new,
48 unique symbol, guaranteed (or at least reasonably assumed) to be
49 different from all other symbols that are in, or could be in, the
50 subject. When multiple newrefs (possibly in multiple replacements) in
51 the same rule are written into the subject at the same time (i.e., on
52 the same rewriting step,) they all take the same form (and so are equal
53 to each other, and only to each other – nothing else.) In Treacle's
54 capacity as a compiler-definition language, newrefs are useful for
55 generating internal labels for, e.g., translating control structures to
56 machine code jumps.
57
58 It is important to remember that, while subpatterns may be nested in
59 holes, and these may in turn contain more holes, there is no
60 corresponding hierarchical nature to the *bindings* which occur in
61 Treacle patterns: all variables of the same name must unify to
62 equivalent terms, regardless of where they occur in the pattern (inside
63 or outside a hole.)
64
65 Syntax
66 ------
67
68 We're almost ready to give some examples to elucidate all this, but
69 first we need a syntax to give them in. Here it is:
70
71 - atoms are denoted by strings of lower-case letters;
72 - terms are denoted by lists of subterms inside parentheses;
73 - named terms are denoted by `(? name subterm)`;
74 - holes are denoted by `(:i subterm)` or `(:o subterm)`, corresponding
75 to innermost and outermost search order, respectively;
76 - wildcards are denoted by `*`;
77 - newrefs are denoted by `@`; and
78 - named replacements are denoted `X : term`.
79
80 Examples
81 --------
82
83 Now we are ready to give some examples.
84
85 ### Patterns
86
87 - The pattern `(a b (? X *))` matches `(a b (c (d b)))`, with the
88 unifier `X=(c (d b))`. Also, `(a (? Y *) (c (d (? Y *))))` matches
89 the same subject with `Y=b`. This is all quite conventional.
90 - We can also match `(a (? X b) *)` to this subject. The unifier will
91 *always* be `X=b` when this pattern matches, regardless of the
92 subject. This tells us nothing we did not already know. But it
93 demonstrates the decoupling of names and wildcards in Treacle. (It
94 will also become useful when we get to replacements, since that
95 atomic `b` term named by `X` can be supplanted by something: we have
96 named not just a subterm, but a location in the subject.)
97 - The pattern `(a b (:i (d b)))` matches the subject as well. Observe
98 how the hole allowed `(d b)` to be sought inside the subterm at the
99 location where the hole matched. Note also that the pattern would
100 just as easily match the subject `(a b (w x (w y (w z (d b)))))`,
101 because it doesn't matter how deep `(d b)` is embedded in the
102 subterm.
103 - If the pattern included a name, like `(a b (? X (:i (d b))))`, the
104 match with the subject would result in the unifier `X=(c (d b))`.
105 Likewise, the pattern `(a b (:i (? X (d b))))` would match the
106 subject with the unifier `X=(d b)`.
107 - The pattern `(a (? X *) (:i (d (? X *))))` also matches the subject,
108 with the unifier `X=b`. This is a good example of the expressive
109 power of pattern-matching in Treacle: we are basically asking to
110 search the contents of the 3rd subterm, for whatever the 2nd subterm
111 is.
112
113 ### Rules
114
115 - Say we have a rule where the pattern is `(a b (:i (? X (d b))))`,
116 and the lone replacement is `X : a`. This rule would match the
117 original subject `(a b (c (d b)))`, unifying with `X=(d b)`, and
118 would rewrite the subject to `(a b (c a))`.
119 - Or, say our rule's pattern is `(a (? Y *) (:i (? X (d *))))`, and
120 the set of replacements is {`X : (? Y)`, `Y : (? X)`}. This rule
121 would also match the subject, with a unifier of {`X=(d b)`, `Y=b`},
122 and would rewrite the subject to `(a (d b) (c b))`. Again, notice
123 the expressivity of this rule: we're basically asking Treacle to
124 swap whatever occurs next to the `a`, with whatever occurs alongside
125 a `d` somewhere inside the term that occurs next to that.
126
127 Mechanism
128 ---------
129
130 We can think of the mechanism by which context rewriting is undertaken,
131 as follows.
132
133 We pattern-match "as usual": recursively traverse the pattern and the
134 subject. Where there are literals in the pattern, we make sure those
135 same values appear in the subject, in the same place. Where there are
136 named subpatterns in the pattern, we bind the name to the position in
137 the subject, and insert that binding into a unifier, before trying to
138 match the subpattern to that position. (We do an occurs check first, to
139 make sure that the name isn't already bound to something else.)
140
141 Note that we bind the name, not to a subterm in the subject, but to a
142 *position* in the subject. If you like, you can think of context
143 rewriting building a "unifier by reference" rather than the rather more
144 conventional "unifier by value". This is useful, because the presence of
145 holes means that we will have more of a need to know where we want to
146 install a replacement.
147
148 When we encounter a hole in the pattern, we take the subpattern that
149 appears in the hole and begin searching for that subpattern in the
150 subterm of the subject whose position corresponds to the hole. We pass
151 this subsearch our unifier (so that it can use the variable bindings
152 already established for occurs checks.) If the subsearch fails to match,
153 then we also fail to match. If the subsearch succeeds, we continue the
154 pattern-matching process with the unifier it produced.
155
156 If everything succeeds, we have a unifier. We go through the
157 replacements, look up the name of each replacement in the unifier to
158 find the location in the subject where it matched, expand all the
159 variable names in the replacement with the contents of the unifier, and
160 "poke" the expanded replacement into the subject at the location.
161
162 Implementation
163 --------------
164
165 Like Arboretuum, there is a reference implementation of Treacle in
166 relatively pure Scheme, meant to normatively fill in any gaps in the
167 description of the language given in this document.
168
169 Discussion
170 ----------
171
172 You may wonder, why forest-rewriting, or context rewriting? To be sure,
173 it does not add any computational power to term-rewriting, which is
174 already Turing-complete. But it does add a significant amount of
175 expressiveness. While this expressiveness seems to come at a signficant
176 cost (at least, as imagined in a naïve implementation,) there are two
177 advantages it might provide, one practical and one theoretical, which
178 I'll get to in a second.
179
180 The idea latent in forest-rewriting, which I didn't explain too well in
181 the Arboretuum documentation, is to *partition the subject*. Context
182 rewriting continues and generalizes this idea; while in forest-rewriting
183 it is obvious what the partitions are (named trees in a forest,) in
184 context-rewriting, the partitions would be subterms of some given term
185 (for example, the top-level term.) An engine implementing
186 context-rewriting might need some supplementary information or deductive
187 ability in order to "see" and exploit the partitions, but they could
188 nonetheless be identified.
189
190 One major effect of partitioning is to ease the locality constraint. If
191 you've ever tried programming in pure term-rewriting, you notice that
192 you have to "keep all your state together": if there are multiple pieces
193 of information in the tree of terms that relate to the reduction you
194 want to accomplish, they have to be in a bounded distance from, and in a
195 fixed relationship with, each other. If some piece is far away, it will
196 have to be brought – *literally* brought! – to bear on the situation, by
197 moving it through the tree through successive "bubbling" rewrites.
198
199 Forest-rewriting eases this by having multiple independent trees: some
200 piece of information can be anywhere in some other tree. Context
201 rewriting eases it by having holes in which the piece of information can
202 be found anywhere.
203
204 Partitioning the subject could have the practical benefit of improving
205 locality of reference in the rewriting engine. Each partition can reside
206 in its own memory buffer which is fixed in some way, for example in one
207 or more cache lines. Since we don't need to "bubble" information through
208 the term, each partition can stay in its own cached area, and we should
209 see fewer cache misses.
210
211 Partitioning the subject could also have the theoretical benefit of
212 making it easier to prove that the rewriting terminates. If you look
213 through some of the unit tests in `tests.scm`, you might notice that
214 some of them go to some lengths to avoid rewriting certain trees to
215 anything larger than they were. The size of each partition is then
216 monotonically decreasing, and so it will eventually "run out", at which
217 point the rewriting process must of course terminate. We might not be
218 able to achieve the ideal case where, on each rewrite, at least one of
219 the partitions shrinks and the rest stay the same size. The closer we
220 can come to it, however, the less burdensome should be the task of
221 proving that the entire system terminates, because many of the cases
222 should be trivial.
223
224 Happy whacky rewriting all sorts of fun ways!
225 Chris Pressey
226 Chicago, Illinois
227 April 12, 2008
+0
-217
doc/treacle.html less more
0 <?xml version="1.0"?>
1 <!-- encoding: UTF-8 -->
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4 <head>
5 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6 <title>The Treacle Programming Language</title>
7 <!-- begin html doc dynamic markup -->
8 <script type="text/javascript" src="/contrib/jquery-1.6.4.min.js"></script>
9 <script type="text/javascript" src="/scripts/documentation.js"></script>
10 <!-- end html doc dynamic markup -->
11 </head>
12 <body>
13
14 <h1>The Treacle Programming Language</h1>
15
16 <p>Language version 1.0.<br/>
17 Chris Pressey, Cat's Eye Technologies</p>
18
19 <h2>Introduction</h2>
20
21 <p><dfn>Treacle</dfn> is a programming language based on an extended form of term-rewriting
22 which we shall call, somewhat innacurately (or at least arbitrarily,) <em>context rewriting</em>.</p>
23
24 <p>Like Arboretuum, its successor built around <em>forest-rewriting</em>, Treacle was intended as
25 a language for specifying compilers. Treacle is somewhat more successful at pulling it off, however;
26 context rewriting encompasses, and is more expressive than, forest-rewriting.</p>
27
28 <p><dfn>Context rewriting</dfn> is meant to refer to the fact that Treacle's rewriting patterns
29 may contain <dfn>holes</dfn> – designated "containers" for subpatterns which may match not just
30 the <em>immediate</em> child of the term which the parent pattern matched (as in conventional term-rewriting)
31 but also <em>any one of that child's descendents</em>, no matter how deeply nested.</p>
32
33 <p>When a hole is matched to some term, that term is searched for the
34 subpattern given inside the hole. The search may be performed in either
35 leftmost-innermost or leftmost-outermost order; this is specified
36 by a qualifier associated with the hole. Because of this, Treacle need not
37 specify a language-wide reduction order; the hole construct acts as a kind of
38 search operator which explicitly encodes search order into each pattern.</p>
39
40 <p>Context rewriting also deconstructs the conventional concept of the variable,
41 splitting it into a <dfn>name</dfn> and a <dfn>wildcard</dfn>. Any pattern or
42 subpattern may be named, not just wildcards. Even holes may be named. At the same time,
43 wildcards, which match arbitrary terms, may occur unnamed. Upon a successful match,
44 only those terms which matched named patterns are recorded in the unifier.</p>
45
46 <p>Further, each rule in Treacle may contain multiple terms (<dfn>replacements</dfn>) on the right-hand side
47 of a rewriting rule, and each of these may have its own name. When the term undergoing rewriting
48 (called the <dfn>subject</dfn>) is rewritten, each named replacement is substituted into the subject at the position
49 matched by the part of the pattern that is labelled by that same name.</p>
50
51 <p>Lastly, replacements may contain special atomic terms called <dfn>newrefs</dfn>. When a
52 newref is written into the subject, it takes the form of a new, unique symbol, guaranteed (or at least
53 reasonably assumed) to be different from all other symbols that are in, or could be in, the subject.
54 When multiple newrefs (possibly in multiple replacements) in the same rule are written into the subject
55 at the same time (i.e., on the same rewriting step,)
56 they all take the same form (and so are equal to each other, and only to each other – nothing else.) In Treacle's
57 capacity as a compiler-definition language, newrefs are useful for generating internal labels for,
58 e.g., translating control structures to machine code jumps.</p>
59
60 <p>It is important to remember that, while subpatterns may be nested in holes, and these
61 may in turn contain more holes, there is no corresponding hierarchical nature to the <em>bindings</em>
62 which occur in Treacle patterns: all variables of the same name must unify to equivalent terms, regardless of where they
63 occur in the pattern (inside or outside a hole.)</p>
64
65 <h2>Syntax</h2>
66
67 <p>We're almost ready to give some examples to elucidate all this, but first we need a
68 syntax to give them in. Here it is:</p>
69
70 <ul>
71 <li>atoms are denoted by strings of lower-case letters;</li>
72 <li>terms are denoted by lists of subterms inside parentheses;</li>
73 <li>named terms are denoted by <code>(? <var>name</var> <var>subterm</var>)</code>;</li>
74 <li>holes are denoted by <code>(:i <var>subterm</var>)</code> or <code>(:o <var>subterm</var>)</code>,
75 corresponding to innermost and outermost search order, respectively;</li>
76 <li>wildcards are denoted by <code>*</code>;</li>
77 <li>newrefs are denoted by <code>@</code>; and</li>
78 <li>named replacements are denoted <code>X : <var>term</var></code>.</li>
79 </ul>
80
81 <h2>Examples</h2>
82
83 <p>Now we are ready to give some examples.</p>
84
85 <h3>Patterns</h3>
86
87 <ul>
88 <li>The pattern <code>(a b (? X *))</code> matches <code>(a b (c (d b)))</code>, with the unifier
89 <code>X=(c (d b))</code>. Also, <code>(a (? Y *) (c (d (? Y *))))</code> matches the same subject
90 with <code>Y=b</code>. This is all quite conventional.</li>
91
92 <li>We can also match <code>(a (? X b) *)</code> to this subject. The unifier will <em>always</em> be
93 <code>X=b</code> when this pattern matches, regardless of the subject. This tells us nothing we
94 did not already know. But it demonstrates the decoupling of names and wildcards in Treacle. (It will also become useful when we get to replacements, since that atomic <code>b</code> term named by <code>X</code>
95 can be supplanted by something: we have named not just a subterm, but a location in the subject.)</li>
96
97 <li>The pattern <code>(a b (:i (d b)))</code> matches the subject as well.
98 Observe how the hole allowed <code>(d b)</code> to be sought
99 inside the subterm at the location where the hole matched.
100 Note also that the pattern would just as easily match the subject
101 <code>(a b (w x (w y (w z (d b)))))</code>, because it doesn't matter how
102 deep <code>(d b)</code> is embedded in the subterm.</li>
103
104 <li>If the pattern included a name, like <code>(a b (? X (:i (d b))))</code>,
105 the match with the subject would result in the unifier <code>X=(c (d b))</code>.
106 Likewise, the pattern <code>(a b (:i (? X (d b))))</code> would match the subject
107 with the unifier <code>X=(d b)</code>.</li>
108
109 <li>The pattern <code>(a (? X *) (:i (d (? X *))))</code> also matches the
110 subject, with the unifier <code>X=b</code>. This is a good example of the
111 expressive power of pattern-matching in Treacle: we are basically asking
112 to search the contents of the 3rd subterm, for whatever the 2nd subterm is.</li>
113
114 </ul>
115
116 <h3>Rules</h3>
117
118 <ul>
119
120 <li>Say we have a rule where the pattern is <code>(a b (:i (? X (d b))))</code>,
121 and the lone replacement is <code>X : a</code>. This rule would match the
122 original subject <code>(a b (c (d b)))</code>, unifying with <code>X=(d b)</code>,
123 and would rewrite the subject to <code>(a b (c a))</code>.</li>
124
125 <li>Or, say our rule's pattern is <code>(a (? Y *) (:i (? X (d *))))</code>,
126 and the set of replacements is {<code>X : (? Y)</code>, <code>Y : (? X)</code>}.
127 This rule would also match the subject, with a unifier of {<code>X=(d b)</code>,
128 <code>Y=b</code>}, and would rewrite the subject to <code>(a (d b) (c b))</code>.
129 Again, notice the expressivity of this rule: we're basically asking Treacle to swap whatever
130 occurs next to the <code>a</code>, with whatever occurs alongside a <code>d</code>
131 somewhere inside the term that occurs next to that.</li>
132
133 </ul>
134
135 <h2>Mechanism</h2>
136
137 <p>We can think of the mechanism by which context rewriting is undertaken, as follows.</p>
138
139 <p>We pattern-match "as usual": recursively traverse the pattern and the subject. Where there are
140 literals in the pattern, we make sure those same values appear in the subject, in the same place.
141 Where there are named subpatterns in the pattern, we bind the name to the position in the subject, and
142 insert that binding into a unifier, before trying to match the subpattern to that position.
143 (We do an occurs check first, to make sure that the name isn't already bound to something else.)</p>
144
145 <p>Note that we bind the name, not to a subterm in the subject, but to a <em>position</em> in the subject.
146 If you like, you can think of context rewriting building a "unifier by reference" rather than the rather
147 more conventional "unifier by value". This is useful, because the presence of holes means that we will
148 have more of a need to know where we want to install a replacement.</p>
149
150 <p>When we encounter a hole in the pattern, we take the subpattern that appears in the hole
151 and begin searching for that subpattern in the subterm of the subject whose position corresponds
152 to the hole. We pass this subsearch our unifier (so that it can use the variable bindings already
153 established for occurs checks.) If the subsearch fails to match, then we also fail to match.
154 If the subsearch succeeds, we continue the pattern-matching process with the unifier it produced.</p>
155
156 <p>If everything succeeds, we have a unifier. We go through the replacements, look up the name
157 of each replacement in the unifier to find the location in the subject where it matched, expand all the
158 variable names in the replacement with the contents of the unifier, and "poke" the expanded replacement
159 into the subject at the location.</p>
160
161 <h2>Implementation</h2>
162
163 <p>Like Arboretuum, there is a reference implementation of Treacle in relatively pure Scheme, meant to
164 normatively fill in any gaps in the description of the language given in this document.</p>
165
166 <h2>Discussion</h2>
167
168 <p>You may wonder, why forest-rewriting, or context rewriting? To be sure, it does not add any
169 computational power to term-rewriting, which is already Turing-complete. But it does add a significant
170 amount of expressiveness. While this expressiveness seems to come at a signficant cost (at least,
171 as imagined in a naïve implementation,) there are two advantages it might provide, one practical
172 and one theoretical, which I'll get to in a second.</p>
173
174 <p>The idea latent in forest-rewriting, which I didn't explain too well in the Arboretuum documentation,
175 is to <em>partition the subject</em>. Context rewriting continues and generalizes this idea;
176 while in forest-rewriting it is obvious what the partitions are (named trees in a forest,) in
177 context-rewriting, the partitions would be subterms of some given term (for example, the top-level
178 term.) An engine implementing context-rewriting might need some supplementary information
179 or deductive ability in order to "see" and exploit the partitions, but they could nonetheless be
180 identified.</p>
181
182 <p>One major effect of partitioning is to ease the locality constraint. If you've ever tried programming
183 in pure term-rewriting, you notice that you have to "keep all your state together": if there are
184 multiple pieces of information in the tree of terms that relate to the reduction you want to accomplish,
185 they have to be in a bounded distance from, and in a fixed relationship with, each other.
186 If some piece is far away, it will have to be brought – <em>literally</em> brought! –
187 to bear on the situation, by moving it through the tree through successive "bubbling" rewrites.</p>
188
189 <p>Forest-rewriting eases this by having multiple independent trees: some piece of information
190 can be anywhere in some other tree. Context rewriting eases it by having holes in which the
191 piece of information can be found anywhere.</p>
192
193 <p>Partitioning the subject could have the practical benefit of improving locality of reference
194 in the rewriting engine. Each partition can reside in its own memory buffer which is fixed in
195 some way, for example in one or more cache lines. Since we don't need to "bubble" information
196 through the term, each partition can stay in its own cached area, and we should see fewer
197 cache misses.</p>
198
199 <p>Partitioning the subject could also have the theoretical benefit of making it easier
200 to prove that the rewriting terminates, If you look through some of the unit tests
201 in <code>tests.scm</code>, you might notice that some of them go to some lengths to avoid
202 rewriting certain trees to anything larger than they were.
203 The size of each partition is then monotonically decreasing, and so it will eventually "run out",
204 at which point the rewriting process must of course terminate. We might not be able to
205 achieve the ideal case where, on each rewrite, at least one of the partitions shrinks and the
206 rest stay the same size. The closer we can come to it, however, the less burdensome should be
207 the task of proving that the entire system terminates, because many of the cases should
208 be trivial.</p>
209
210 <p>Happy whacky rewriting all sorts of fun ways!
211 <br/>Chris Pressey
212 <br/>Chicago, Illinois
213 <br/>April 12, 2008</p>
214
215 </body>
216 </html>