The Hev Programming Language
============================

Introduction
------------

Hey, does this thing look at all familiar?

    () [] -> . ++ --                    Left to right
    ++ -- + - ! ~ * & (type) sizeof     Right to left
    * / %                               Left to right
    + -                                 Left to right
    << >>                               Left to right
    < <= > >=                           Left to right
    == !=                               Left to right
    &                                   Left to right
    ^                                   Left to right
    |                                   Left to right
    &&                                  Left to right
    ||                                  Left to right
    ?:                                  Right to left
    = += -= *= /= %= &= ^= |= <<= >>=   Right to left
    ,                                   Left to right

That's right: it's the precedence table for operators in the C language.
OK, some of them (like `()`) are pretty easy to remember, I guess, but
the logic behind most of these choices escapes me. Really, can you give
me a *good* reason for why `&` should have a higher precedence than `|`?
And why is `^` in-between? And why in the world are `->` and `++` on the
same level? What I'm getting at is, how many times did you have to go
and reference this chart before these arbitrary rules got burned into
your nervous system somewhere between your brain and your fingers?

And hey, you think that's bad? Perl 5 has like 129 operators at like 24
levels of precedence.

You may well ask: is there something that can save us from this
insanity?

Yes, there is. In this document I will describe Hev, a novel, nay,
innovative, nay, radical, nay, revolutionary, nay, *totally gnarly* new
programming language which provides **the infix you love** with
**an unlimited number of operator precedence levels** and **absolutely
no need for parentheses or memorization!**

Sound too good to be true...? Read on!

Syntax
------

Hev's breathtaking syntactic sleight-of-hand is accomplished by a
synergistic combination of two features:

- Have an unbounded number of infix binary operators.
- Make precedence explicit.

To fit this bill, all we need is a single syntactic construct that can
explicitly express an unbounded number of discrete operators, and at the
same time, their precedence.

Well, I chose integers.

Positive integers, to be precise. So `3` is an infix operator. So is
`15`, and it has a higher precedence than `3`. So is `514229`, and it
has an even higher precedence than `15`, but lower than
`25852016738884976640000`. *See* how easy it is? I can just name two
operators at random, and you can tell me which one has the higher
precedence without a second thought!

Oh, but what good are operators if they don't have anything to operate
on? We need values, too. And since we have an unbounded number of
operators, there's a certain sense to having only a bounded number of
values.

Well, why not the logical extreme: *no values at all*? Well, OK, for the
sake of syntax we need to have one value, but since there's nothing it
can be differentiated against, it's effectively no values.
Syntactically, this value, or lack thereof, is denoted `,`. (Yeah,
that's a comma.)

And, we'll probably need variables at some point, too, I'm guessing. We
should probably have a nice big supply of those, just so we don't run
into some artificial bound at some point that arbitrarily prevents Hev
from being Turing-complete. So, let's say that any string of consecutive
symbols drawn from `+`, `-`, `*` and `/` is an identifier for a
variable. That should do nicely.
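
To make the three token classes concrete, here is a minimal lexer sketch
in Python. It is not taken from `hev.hs`; the name `tokenize` and the
`(kind, value)` pairs are made up for illustration. It assumes whitespace
can simply be discarded before scanning (see Miscellaneous Notes) and
that `0` is rejected because operators are *positive* integers.

```python
import re

# One alternative per token class; the final "." catches anything else.
TOKEN = re.compile(r"[0-9]+|,|[-+*/]+|.")

def tokenize(source):
    """Split Hev source text into (kind, value) pairs (illustrative only)."""
    # Whitespace may appear anywhere, even inside a token, so drop it first.
    text = re.sub(r"\s+", "", source)
    tokens = []
    for lexeme in TOKEN.findall(text):
        if lexeme[0] in "0123456789":
            value = int(lexeme)          # Python ints are unbounded
            if value == 0:
                raise ValueError("operators are positive integers")
            tokens.append(("op", value))
        elif lexeme == ",":
            tokens.append(("leaf", lexeme))
        elif lexeme[0] in "+-*/":
            tokens.append(("var", lexeme))
        else:
            raise ValueError(f"not a Hev character: {lexeme!r}")
    return tokens
```

Comparing two operators is then ordinary integer comparison on the
`value` field, however many digits they have.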

There's still a bit of a problem, though -- those pesky parentheses. You
might need to nest a `5`-expression into the LHS or the RHS of a
`3`-expression, and that would seemingly require parentheses. How do we
avoid this? Well -- if we're flexible on what `3` and `5` actually
*mean*, maybe we can just avoid this dilemma entirely! This brings us
to...

Semantics
---------

So we have all these infix binary operators, and this one value which I
insist is essentially a non-value, and we need to be able to make
something sensible out of this mess -- *without* using parentheses to do
nesting.

Well, what can we build?

Trees.

Yep, binary trees. They're a bit unlike the "normal" trees of Computer
Science, which almost universally have some sort of values stored at
their leaves. These ones don't. They're just... you know, trees. But we
can definitely build them. And we don't need any parentheses. If you
want to nest one expression inside another, you just pick an operator
for the enclosing expression that's bigger than every operator in the
nested one.

So `,5,10,5,` is a tree - a complete binary tree with 3 levels - a root
node (`10`), two intermediate nodes (both `5`) and four leaves (`,,,,`)
with no values in them (or a single, meaningless value, repeated four
times, if you like.) And please realize that this is the *same* tree as
`,1,3,2,` -- it's just that different operators were used to construct
it. Those operators aren't "in" the tree in any sense, and their
magnitude is used only to determine their precedence.
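
The tree-building rule can be sketched in a few lines of Python: split
the token list at the largest operator, then do the same to each side.
This is an illustration, not the algorithm used by `hev.hs`. The names
`tokenize` and `parse` and the nested-tuple representation are made up
here, and this document doesn't say what happens when the largest
operator appears more than once, so the sketch assumes the leftmost one
wins.

```python
import re

LEAF = ","

def tokenize(source):
    # Drop whitespace, then split into operators, commas and variable names.
    # Characters outside the Hev alphabet are silently skipped in this sketch.
    return re.findall(r"[0-9]+|,|[-+*/]+", re.sub(r"\s+", "", source))

def parse(tokens):
    """Turn alternating operand/operator tokens into a nested-tuple tree."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected: operand (operator operand)*")
    if len(tokens) == 1:
        if tokens[0][0].isdigit():
            raise ValueError("an operator cannot stand where a leaf belongs")
        return tokens[0]                  # "," or a variable name
    # Operators sit at the odd positions; the largest one becomes the root.
    # ASSUMPTION: max() returns the first maximum, so ties go leftmost.
    split = max(range(1, len(tokens), 2), key=lambda i: int(tokens[i]))
    return (parse(tokens[:split]), parse(tokens[split + 1:]))
```

Note that the operators are thrown away once the split is made: the
result holds only tuples, `","` leaves and variable names, which is why
`,5,10,5,` and `,1,3,2,` come out equal.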

But now for the splendid part. We can put *variables* in these trees!
Which means, we can think of them as *patterns* that can match other
trees. Which means, we can specify *rules* as pairs of patterns and
substitutions, to be substituted in when the pattern matches. Which
means, we can construct a rule-based language! A rewriting language, in
fact. I think I'll call this approach valueless tree rewriting.

So, for example, the pattern `+10*` matches the tree `,5,10,5,` given
above. The variables `+` and `*` both unify with `,5,`. But note that
this pattern matches `,41,76,` too, where `+` unifies with `,41,` and
`*` unifies with `,`. And in fact it matches countless other possible
valueless trees.
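
Here is a small Python sketch of that unification step, matching a
pattern against one whole tree. It uses an assumed representation (not
the one in `hev.hs`): a leaf is the string `","`, a variable is any
other string, and a node is a `(left, right)` tuple. The function name
`match` is likewise made up.

```python
LEAF = ","

def match(pattern, tree, bindings=None):
    """Return a dict of variable bindings, or None if there is no match."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, tuple):
        # A node in the pattern needs a node in the tree.
        if not isinstance(tree, tuple):
            return None
        left = match(pattern[0], tree[0], bindings)
        return None if left is None else match(pattern[1], tree[1], left)
    if pattern == LEAF:
        return bindings if tree == LEAF else None
    # Otherwise the pattern is a variable: it matches any structure,
    # but a repeated variable must see the same structure every time.
    if pattern in bindings:
        return bindings if bindings[pattern] == tree else None
    return {**bindings, pattern: tree}

# The two examples from the text, with `+10*` written as ("+", "*").
five = (",", ",")                                  # the tree `,5,`
print(match(("+", "*"), (five, five)))             # both bind to `,5,`
print(match(("+", "*"), ((",", ","), ",")))        # `*` binds to a bare `,`
```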

Execution Model
---------------

A Hev program consists of a valueless binary tree. The left branch of
the root leads to a ruleset; the right branch leads to a valueless
binary tree which represents the data of the program: it is the state of
the program, the thing that is being rewritten. This data tree may not
contain any variables: the leaves must be entirely `,`'s.

A ruleset consists of a node where the left branch leads to either a
ruleset or to a `,` and the right branch leads to a rule. A rule is a
node where the left branch is a pattern and the right branch is a
substitution. The pattern is a valueless binary tree which may contain
not only `,`'s but also any variables at its leaves. The substitution
may contain both `,`'s and variables, but it may not contain any
variables which do not appear in the corresponding pattern of the rule.
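
The layout just described can be unpacked mechanically. The Python
sketch below walks down the left spine of the ruleset and checks the two
restrictions on variables. It assumes trees are nested `(left, right)`
tuples with `","` for a leaf and other strings for variables; the names
`variables` and `split_program` are hypothetical, and treating a bare
`,` as an empty ruleset is a leniency of the sketch, not something this
document promises.

```python
LEAF = ","

def variables(tree):
    """Collect the variable names found at the leaves of a tree."""
    if isinstance(tree, tuple):
        return variables(tree[0]) | variables(tree[1])
    return set() if tree == LEAF else {tree}

def split_program(program):
    """Return (rules, data); rules are listed nearest-the-root first."""
    if not isinstance(program, tuple):
        raise ValueError("a program needs a ruleset branch and a data branch")
    ruleset, data = program
    if variables(data):
        raise ValueError("the data tree may not contain variables")
    rules = []
    while ruleset != LEAF:
        if not isinstance(ruleset, tuple) or not isinstance(ruleset[1], tuple):
            raise ValueError("malformed ruleset")
        # Left branch: the rest of the ruleset. Right branch: one rule.
        ruleset, (pattern, substitution) = ruleset
        unbound = variables(substitution) - variables(pattern)
        if unbound:
            raise ValueError(f"substitution uses unbound variables: {unbound}")
        rules.append((pattern, substitution))
    return rules, data
```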

Each rule in the ruleset is considered in turn, starting with the rule
nearest the root. The pattern of the rule is matched against the data
tree. The structure of the pattern must match some subtree of the data
tree; a variable can match any structure of the data tree, but no
variable can match two different structures. (The same variable
identifier may appear multiple times in a pattern; all instances of that
variable must match the same structure.) If there are multiple subtrees
of the data tree that match, only the **topmost** one is considered.
This is usually called "top-down rewriting".
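
One way to read "topmost" is "shallowest", which suggests a
breadth-first walk over the data tree. The Python sketch below takes
that reading and breaks ties left to right. Both choices are assumptions
of the sketch: this document doesn't say which of two equally deep
matches wins, and `hev.hs` may well traverse in a different order. The
names `match` and `find_topmost` and the tuple representation are also
made up for illustration.

```python
from collections import deque

LEAF = ","

def match(pattern, tree, bindings):
    """Match pattern against the whole of tree; return bindings or None."""
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple):
            return None
        left = match(pattern[0], tree[0], bindings)
        return None if left is None else match(pattern[1], tree[1], left)
    if pattern == LEAF:
        return bindings if tree == LEAF else None
    if pattern in bindings:                   # variable seen before
        return bindings if bindings[pattern] == tree else None
    return {**bindings, pattern: tree}

def find_topmost(pattern, data):
    """Return (path, bindings) for the shallowest match, or None."""
    # A path is a tuple of 0s (go left) and 1s (go right) from the root.
    queue = deque([((), data)])
    while queue:
        path, subtree = queue.popleft()
        bindings = match(pattern, subtree, {})
        if bindings is not None:
            return path, bindings
        if isinstance(subtree, tuple):
            queue.append((path + (0,), subtree[0]))
            queue.append((path + (1,), subtree[1]))
    return None
```

With the data tree `((",", (",", ",")), (",", ","))` and the pattern
`(",", ",")`, this picks the right child of the root (path `(1,)`)
rather than the deeper match under the left child.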

When a match occurs, the substitution of the rule is instantiated. Any
variables occurring in the substitution are replaced with the structures
that those variables matched in the pattern. (This is why all the
variables appearing in the substitution must also appear in the
pattern.) The data tree is then modified: the subtree that was matched
is removed and in its place the instantiated substitution is grafted.
The process then repeats (starting over with the topmost rule.)
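
Instantiating and grafting are both short recursive walks. The Python
sketch below assumes trees are nested `(left, right)` tuples with `","`
for a leaf, that bindings arrive as a dict from variable name to
subtree, and that the matched position is given as a path of 0s (left)
and 1s (right) from the root. The names `instantiate` and `graft` are
invented for this example.

```python
LEAF = ","

def instantiate(substitution, bindings):
    """Replace every variable in the substitution with what it matched."""
    if isinstance(substitution, tuple):
        return (instantiate(substitution[0], bindings),
                instantiate(substitution[1], bindings))
    if substitution == LEAF:
        return LEAF
    # A KeyError here would mean the substitution used a variable that
    # never appeared in the pattern, which the rules above forbid.
    return bindings[substitution]

def graft(data, path, replacement):
    """Return a copy of data with the subtree at path replaced."""
    if not path:
        return replacement
    left, right = data
    if path[0] == 0:
        return (graft(left, path[1:], replacement), right)
    return (left, graft(right, path[1:], replacement))
```

Because tuples are immutable, `graft` builds a new tree and leaves the
old one untouched, which keeps each rewriting step easy to inspect.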

When a rule fails to match, the data tree is left alone and the next
rule (one node lower down in the ruleset) is tried. When there are no
more rules to try in the ruleset, the program ends.
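
Putting the whole execution model together, a rough Python sketch of the
main loop might look like this. It is an illustration under several
assumptions, not a port of `hev.hs`: trees are nested tuples with `","`
leaves, "topmost" is taken to mean the shallowest match found
breadth-first, and `max_steps` is a hypothetical safety limit that Hev
itself does not have. All function names are invented.

```python
from collections import deque

LEAF = ","

def match(pattern, tree, bindings):
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple):
            return None
        left = match(pattern[0], tree[0], bindings)
        return None if left is None else match(pattern[1], tree[1], left)
    if pattern == LEAF:
        return bindings if tree == LEAF else None
    if pattern in bindings:
        return bindings if bindings[pattern] == tree else None
    return {**bindings, pattern: tree}

def instantiate(sub, bindings):
    if isinstance(sub, tuple):
        return (instantiate(sub[0], bindings), instantiate(sub[1], bindings))
    return sub if sub == LEAF else bindings[sub]

def graft(tree, path, replacement):
    if not path:
        return replacement
    left, right = tree
    if path[0] == 0:
        return (graft(left, path[1:], replacement), right)
    return (left, graft(right, path[1:], replacement))

def rewrite_once(pattern, substitution, data):
    """Apply one rule at its shallowest match; None if it matches nowhere."""
    queue = deque([((), data)])
    while queue:
        path, subtree = queue.popleft()
        bindings = match(pattern, subtree, {})
        if bindings is not None:
            return graft(data, path, instantiate(substitution, bindings))
        if isinstance(subtree, tuple):
            queue.append((path + (0,), subtree[0]))
            queue.append((path + (1,), subtree[1]))
    return None

def run(program, max_steps=1000):
    ruleset, data = program
    rules = []
    while ruleset != LEAF:                    # nearest-the-root rule first
        ruleset, (pattern, substitution) = ruleset
        rules.append((pattern, substitution))
    for _ in range(max_steps):
        for pattern, substitution in rules:
            result = rewrite_once(pattern, substitution, data)
            if result is not None:
                data = result
                break                         # start over with the first rule
        else:
            return data                       # no rule matched: program ends
    raise RuntimeError("step limit reached (the program may not halt)")
```

For instance, a single rule with pattern `("+", ",")` and substitution
`"+"` peels one right-hand leaf off the root per step, so the data tree
`(((",", ","), ","), ",")` shrinks to a lone `","` before the rule stops
matching.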

Miscellaneous Notes
-------------------

You can leave out the `,` at the very beginning and very end of a Hev
program. It's implied. Also, whitespace is allowed, even between the
digits of an operator or the symbols of a variable... for whatever good
it'll do you.
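
Both conveniences can be handled in a pre-pass over the program text. In
the Python sketch below, the function name `normalize` is made up, and
it reads "implied" to mean that a `,` is supplied whenever the text
begins or ends with an operator digit. That reading is an assumption;
`hev.hs` may handle it differently.

```python
import re

DIGITS = "0123456789"

def normalize(source):
    """Strip whitespace and restore the implied leading/trailing commas."""
    # Whitespace is allowed anywhere, so `1 0` is just the operator `10`.
    text = re.sub(r"\s+", "", source)
    # ASSUMPTION: a comma is implied only next to a dangling operator.
    if text and text[0] in DIGITS:
        text = "," + text
    if text and text[-1] in DIGITS:
        text = text + ","
    return text

print(normalize("5 , 1 0 , 5"))   # ,5,10,5,
```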

Implementation
--------------

`hev.hs` is a reference implementation of Hev in Haskell. It can be used
as something to check this language description against - any
discrepancy is either a bug in the implementation, or an error in this
document. `hev.hs` shouldn't be used as an official reference for Hev
behaviour that's not described in this document, but heck, it's better
than nothing, right?

History
-------

It was sometime in November of 2005 when I came up with the idea to try
to "break the precedence barrier" and started writing Hev. I continued
to refine the idea and worked on it, on and off, after that. In October
of 2006 I got a stubborn notion in my head that the parser should only
make one pass over the program text, so I wasted a day trying to figure
out how to code that in Haskell. In June of 2007 I finally got down to
writing test cases and debugging it.

Happy `,`!

-Chris Pressey
Cat's Eye Technologies
June 17, 2007
Vancouver, BC