git @ Cat's Eye Technologies Hev / f1129ef
Convert CRLF EOLs to LFs. catseye 13 years ago
1 changed file(s) with 208 addition(s) and 208 deletion(s).
0 The Hev Programming Language
1 ============================
2
3 Introduction
4 ------------
5
6 Hey, does this thing look at all familiar?
7
8     () [] -> . ++ --                    Left to right
9     ++ -- + - ! ~ * & (type) sizeof     Right to left
10     * / %                               Left to right
11     + -                                 Left to right
12     << >>                               Left to right
13     < <= > >=                           Left to right
14     == !=                               Left to right
15     &                                   Left to right
16     ^                                   Left to right
17     |                                   Left to right
18     &&                                  Left to right
19     ||                                  Left to right
20     ?:                                  Right to left
21     = += -= *= /= %= &= ^= |= <<= >>=   Right to left
22     ,                                   Left to right
23
24 That's right: it's the precedence table for operators in the C language.
25 OK, some of them (like `()`) are pretty easy to remember, I guess, but
26 the logic behind most of these choices escapes me. Really, can you give
27 me a *good* reason for why `&` should have a higher precedence than `|`?
28 And why is `^` in-between? And why in the world are `->` and `++` on the
29 same level? What I'm getting at is, how many times did you have to go
30 and reference this chart before these arbitrary rules got burned into
31 your nervous system somewhere between your brain and your fingers?
32
33 And hey, you think that's bad? Perl 5 has like 129 operators at like 24
34 levels of precedence.
35
36 You may well ask: is there something that can save us from this
37 insanity?
38
39 Yes, there is. In this document I will describe Hev, a novel, nay,
40 innovative, nay, radical, nay, revolutionary, nay, *totally gnarly* new
41 programming language which provides **the infix you love** with
42 **an unlimited number of operator precedence levels** and **absolutely
43 no need for parentheses or memorization!**
44
45 Sound too good to be true...? Read on!
46
47 Syntax
48 ------
49
50 Hev's breathtaking syntactic sleight-of-hand is accomplished by a
51 synergistic combination of two features:
52
53 - Have an unbounded number of infix binary operators.
54 - Make precedence explicit.
55
56 To fit this bill, all we need is a single syntactic construct that can
57 explicitly express an unbounded number of discrete operators, and at the
58 same time, their precedence.
59
60 Well, I chose integers.
61
62 Positive integers, to be precise. So `3` is an infix operator. So is
63 `15`, and it has a higher precedence than `3`. So is `514229`, and it
64 has an even higher precedence than `15`, but lower than
65 `25852016738884976640000`. *See* how easy it is? I can just name two
66 operators at random, and you can tell me which one has the higher
67 precedence without a second thought!
68
69 Oh, but what good are operators if they don't have anything to operate
70 on? We need values, too. And since we have an unbounded number of
71 operators, there's a certain sense to having only a bounded number of
72 values.
73
74 Well, why not the logical extreme: *no values at all*? Well, OK, for the
75 sake of syntax we need to have one value, but since there's nothing it
76 can be differentiated against, it's effectively no values.
77 Syntactically, this value, or lack thereof, is denoted `,`. (Yeah,
78 that's a comma.)
79
80 And, we'll probably need variables at some point, too, I'm guessing. We
81 should probably have a nice big supply of those, just so we don't run
82 into some artificial bound at some point that arbitrarily prevents Hev
83 from being Turing-complete. So, let's say that any string of consecutive
84 symbols drawn from `+`, `-`, `*` and `/` is an identifier for a
85 variable. That should do nicely.
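
To make the three kinds of token concrete, here is a minimal Python sketch of a tokenizer. It is not taken from `hev.hs`: the name `tokenize`, the `(kind, value)` tuple layout, and the choice to raise `ValueError` on anything unexpected are all assumptions made for illustration, and whitespace is not handled here.

```python
import re

# One alternative per token kind: integer operator, the `,` value, variable.
TOKEN = re.compile(r"(?P<op>[0-9]+)|(?P<leaf>,)|(?P<var>[-+*/]+)")

def tokenize(text):
    """Split Hev text into (kind, value) pairs; kind is 'op', 'leaf' or 'var'."""
    tokens, pos = [], 0
    while pos < len(text):
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        kind = found.lastgroup
        value = int(found.group()) if kind == "op" else found.group()
        if kind == "op" and value == 0:
            # Operators are positive integers, so zero is rejected.
            raise ValueError("operators must be positive integers")
        tokens.append((kind, value))
        pos = found.end()
    return tokens

print(tokenize("+10*"))
```

Because the operator values are kept as Python integers, comparing two precedence levels is just `<` or `>`, however many digits they have.
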
86
87 There's still a bit of a problem, though -- those pesky parentheses. You
88 might need to nest a `5`-expression into the LHS or the RHS of a
89 `3`-expression, and that would seemingly require parentheses. How do we
90 avoid this? Well -- if we're flexible on what `3` and `5` actually
91 *mean*, maybe we can just avoid this dilemma entirely! This brings us
92 to...
93
94 Semantics
95 ---------
96
97 So we have all these infix binary operators, and this one value which I
98 insist is essentially a non-value, and we need to be able to make
99 something sensible out of this mess -- *without* using parentheses to do
100 nesting.
101
102 Well, what can we build?
103
104 Trees.
105
106 Yep, binary trees. They're a bit unlike the "normal" trees of Computer
107 Science, which almost universally have some sort of values stored at
108 their leaves. These ones don't. They're just... you know, trees. But we
109 can definitely build them. And we don't need any parentheses. If you
110 want to nest some expression inside another, you just give the enclosing
111 expression an operator with a higher precedence level than the nested one.
112
113 So `,5,10,5,` is a tree - a complete binary tree with 3 levels - a root
114 node (`10`), two intermediate nodes (both `5`) and four leaves (`,,,,`)
115 with no values in them (or a single, meaningless value, repeated four
116 times, if you like.) And please realize that this is the *same* tree as
117 `,1,3,2,` -- it's just that different operators were used to construct
118 it. Those operators aren't "in" the tree in any sense, and their
119 magnitude is used only to determine their precedence.
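
The tree-building idea can be sketched in a few lines of Python. This is an illustration, not the algorithm used by `hev.hs`. It assumes, following the `,5,10,5,` example above, that the largest operator in a span becomes the root of that span. The name `parse`, the tuple representation, and the handling of two equal operators (the leftmost one wins) are choices of this sketch that the text does not specify.

```python
import re

def parse(source):
    """Turn Hev text into a valueless tree.

    Representation (an assumption of this sketch): None is the `,` leaf,
    a str is a variable leaf, and a 2-tuple (left, right) is a node.
    Characters outside the Hev alphabet are silently skipped.
    """
    tokens = re.findall(r"[0-9]+|,|[-+*/]+", source)
    operands, operators = tokens[0::2], tokens[1::2]
    if (len(tokens) % 2 == 0
            or any(t.isdigit() for t in operands)
            or not all(t.isdigit() for t in operators)):
        raise ValueError("expected leaves separated by integer operators")
    levels = [int(t) for t in operators]

    def build(lo, hi):
        # Covers operands[lo..hi]; levels[i] sits between operands i and i+1.
        if lo == hi:
            return None if operands[lo] == "," else operands[lo]
        # The largest operator in the span becomes the root of this subtree.
        # max() keeps the leftmost on ties, a tie-break this sketch assumes.
        split = max(range(lo, hi), key=levels.__getitem__)
        return (build(lo, split), build(split + 1, hi))

    return build(0, len(operands) - 1)

print(parse(",5,10,5,") == parse(",1,3,2,"))
```

Note that the integers never appear in the result: two programs that differ only in which operators they used produce equal tuples.
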
120
121 But now for the splendid part. We can put *variables* in these trees!
122 Which means, we can think of them as *patterns* that can match other
123 trees. Which means, we can specify *rules* as pairs of patterns and
124 substitutions, to be substituted in when the pattern matches. Which
125 means, we can construct a rule-based language! A rewriting language, in
126 fact. I think I'll call this approach valueless tree rewriting.
127
128 So, for example, the tree `+10*` matches that tree `,5,10,5,` given
129 above. The variables `+` and `*` both unify with `,5,`. But note that
130 this pattern matches `,41,76,` too, where `+` unifies with `,41,` and
131 `*` unifies with `,`. And in fact it matches countless other possible
132 valueless trees.
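
Matching a pattern against a tree can be sketched as a small recursive function. As before this is an illustrative Python sketch under assumed names (`match`, and the convention that `None` is `,`, a string is a variable, and a 2-tuple is a node); it only matches at the root and assumes the tree being matched contains no variables.

```python
def match(pattern, tree, bindings=None):
    """Match a pattern against a whole tree (root to root).

    Returns a dict of variable bindings on success and None on failure.
    An empty dict is a successful match, so test the result with `is None`.
    """
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str):  # a variable leaf
        if pattern in bindings:
            # A repeated variable must match the same structure every time.
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    if pattern is None or tree is None:  # `,` only matches `,`
        return bindings if pattern is tree else None
    left = match(pattern[0], tree[0], bindings)
    return None if left is None else match(pattern[1], tree[1], left)

five = (None, None)                      # the tree written `,5,`
print(match(("+", "*"), (five, five)))   # the pattern `+10*` against `,5,10,5,`
print(match(("+", "*"), (five, None)))   # the same pattern against `,41,76,`
```
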
133
134 Execution Model
135 ---------------
136
137 A Hev program consists of a valueless binary tree. The left branch of
138 the root leads to a ruleset; the right branch leads to a valueless
139 binary tree which represents the data of the program: it is the state of
140 the program, the thing that is being rewritten. This data tree may not
141 contain any variables: the leaves must be entirely `,`'s.
142
143 A ruleset consists of a node where the left branch leads to either a
144 ruleset or to a `,` and the right branch leads to a rule. A rule is a
145 node where the left branch is a pattern and the right branch is a
146 substitution. The pattern is a valueless binary tree which may contain
147 not only `,`'s but also any variables at its leaves. The substitution
148 may contain both `,`'s and variables, but it may not contain any
149 variables which do not appear in the corresponding pattern of the rule.
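
The layout of a program tree can be unpacked mechanically. The following Python sketch assumes the same tuple representation as elsewhere (`None` for `,`, a string for a variable, a 2-tuple for a node); the names `variables` and `split_program`, and the decision to raise `ValueError` for a malformed program, are hypothetical, since the text does not say what an implementation should do in that case.

```python
def variables(tree):
    """Collect the variable names appearing at the leaves of a tree."""
    if isinstance(tree, str):
        return {tree}
    if tree is None:
        return set()
    left, right = tree
    return variables(left) | variables(right)

def split_program(program):
    """Return (rules, data), with rules ordered from the root downwards."""
    if not isinstance(program, tuple):
        raise ValueError("a program needs a ruleset branch and a data branch")
    ruleset, data = program
    if variables(data):
        raise ValueError("the data tree may not contain variables")
    rules = []
    while ruleset is not None:  # a `,` on the left ends the ruleset
        if not (isinstance(ruleset, tuple) and isinstance(ruleset[1], tuple)):
            raise ValueError("malformed ruleset")
        # Right branch: this node's rule. Left branch: the rest of the ruleset.
        ruleset, (pattern, substitution) = ruleset
        if not variables(substitution) <= variables(pattern):
            raise ValueError("substitution uses a variable missing from its pattern")
        rules.append((pattern, substitution))
    return rules, data

rule_a = (("+", None), "+")
rule_b = (None, (None, None))
print(split_program((((None, rule_b), rule_a), ((None, None), None))))
```
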
150
151 Each rule in the ruleset is considered in turn, starting with the rule
152 nearest the root. The pattern of the rule is matched against the data
153 tree. The structure of the pattern must match some subtree of the data
154 tree; a variable can match any structure of the data tree, but no
155 variable can match two different structures. (The same variable
156 identifier may appear multiple times in a pattern; all instances of that
157 variable must match the same structure.) If there are multiple subtrees
158 of the data tree that match, only the **topmost** one is considered.
159 This is usually called "top-down rewriting".
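
One way to find the topmost matching subtree is a breadth-first walk, which visits shallower subtrees before deeper ones. In the Python sketch below, `find_topmost` and its `accepts` callback are hypothetical names; visiting left before right among subtrees at the same depth is an assumption, as the text only says "topmost".

```python
from collections import deque

def find_topmost(tree, accepts):
    """Return the path to the shallowest subtree for which accepts(subtree) is true.

    A path is a tuple of 0s (go left) and 1s (go right) from the root.
    The empty path () means the root itself; None means nothing was accepted.
    """
    queue = deque([((), tree)])
    while queue:
        path, subtree = queue.popleft()
        if accepts(subtree):
            return path
        if isinstance(subtree, tuple):  # only nodes have branches to explore
            queue.append((path + (0,), subtree[0]))
            queue.append((path + (1,), subtree[1]))
    return None

tree = ((None, None), ((None, None), None))
print(find_topmost(tree, lambda t: t == (None, None)))
```

In a full interpreter, `accepts` would be a test of whether the rule's pattern matches the given subtree.
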
160
161 When a match occurs, the substitution of the rule is instantiated. Any
162 variables occurring in the substitution are replaced with the structures
163 that those variables matched in the pattern. (This is why all the
164 variables appearing in the substitution must also appear in the
165 pattern.) The data tree is then modified: the subtree that was matched
166 is removed and in its place the instantiated substitution is grafted.
167 The process then repeats (starting over with the topmost rule.)
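
Instantiating a substitution and grafting it into the data tree might look like this in Python. Both function names are hypothetical, trees use the tuple representation assumed in this document's other sketches, and a location in the data tree is assumed to be given as a path of 0s (left) and 1s (right) from the root.

```python
def instantiate(substitution, bindings):
    """Replace every variable in the substitution with the structure it matched."""
    if isinstance(substitution, str):
        return bindings[substitution]
    if substitution is None:
        return None
    left, right = substitution
    return (instantiate(left, bindings), instantiate(right, bindings))

def graft(tree, path, replacement):
    """Return a copy of tree with the subtree at path swapped for replacement."""
    if not path:
        return replacement
    left, right = tree
    if path[0] == 0:
        return (graft(left, path[1:], replacement), right)
    return (left, graft(right, path[1:], replacement))

bindings = {"+": (None, None), "*": None}
new_subtree = instantiate(("*", ("+", "+")), bindings)
print(graft(((None, None), None), (0,), new_subtree))
```

Tuples are immutable, so `graft` builds a new tree rather than modifying the old one; that is a convenience of the sketch, not something the language requires.
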
168
169 When a rule fails to match, the data tree is left alone and the next
170 rule (one node lower down in the ruleset) is tried. When there are no
171 more rules to try in the ruleset, the program ends.
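
The overall control flow can be sketched as a loop. In this Python sketch, `run`, `apply_rule`, `NO_MATCH` and `max_steps` are hypothetical names: `apply_rule(rule, data)` stands for "try one rule and return the rewritten data tree, or `NO_MATCH`". The step limit is purely a safety net added here, and returning the final data tree is a choice of the sketch, since the text does not say what a finished program yields.

```python
NO_MATCH = object()  # a distinct marker, because None may itself be a valid tree

def run(rules, data, apply_rule, max_steps=10_000):
    """Apply rules until none of them matches, then return the final data tree."""
    for _ in range(max_steps):
        for rule in rules:  # rules are ordered from the root downwards
            rewritten = apply_rule(rule, data)
            if rewritten is not NO_MATCH:
                data = rewritten
                break  # a rule fired: start over with the topmost rule
        else:
            return data  # every rule failed to match: the program ends
    raise RuntimeError("step limit reached; the program may not terminate")

def root_only(rule, data):
    """Toy stand-in for real matching: a rule fires only on an exact root match."""
    pattern, substitution = rule
    return substitution if data == pattern else NO_MATCH

rules = [((None, None), None), (((None, None), None), (None, None))]
print(run(rules, ((None, None), None), root_only))
```
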
172
173 Miscellaneous Notes
174 -------------------
175
176 You can leave out the `,` at the very beginning and very end of a Hev
177 program. It's implied. Also, whitespace is allowed, even between the
178 digits of an operator or the symbols of a variable... for whatever good
179 it'll do you.
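
Both conveniences can be handled before parsing. The Python sketch below is one reading of the rule, not the behaviour of `hev.hs`: it assumes a `,` is implied exactly when the program text begins or ends with an operator, and `normalize` is a hypothetical name.

```python
DIGITS = set("0123456789")

def normalize(source):
    """Drop all whitespace and restore the implied leading and trailing `,`."""
    # str.split() with no arguments splits on any run of whitespace, so
    # joining the pieces also closes gaps inside operators and variables.
    text = "".join(source.split())
    if text and text[0] in DIGITS:
        text = "," + text
    if text and text[-1] in DIGITS:
        text = text + ","
    return text

print(normalize("5, 1 0 ,5"))
```
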
180
181 Implementation
182 --------------
183
184 `hev.hs` is a reference implementation of Hev in Haskell. It can be used
185 as something to check this language description against - any
186 discrepancy is either a bug in the implementation, or an error in this
187 document. `hev.hs` shouldn't be used as an official reference for Hev
188 behaviour that's not described in this document, but heck, it's better
189 than nothing, right?
190
191 History
192 -------
193
194 It was sometime in November of 2005 when I came up with the idea to try
195 to "break the precedence barrier" and started writing Hev. I continued
196 to refine the idea and worked on it, on and off, after that. In October
197 of 2006 I got a stubborn notion in my head that the parser should only
198 make one pass over the program text, so I wasted a day trying to figure
199 out how to code that in Haskell. In June of 2007 I finally got down to
200 writing test cases and debugging it.
201
202 Happy `,`!
203
204 -Chris Pressey
205 Cat's Eye Technologies
206 June 17, 2007
207 Vancouver, BC