<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>The Hev Programming Language</title>
<!-- begin html doc dynamic markup -->
<script type="text/javascript" src="/contrib/jquery-1.6.4.min.js"></script>
<script type="text/javascript" src="/scripts/documentation.js"></script>
<!-- end html doc dynamic markup -->
</head>
<body>

<h1>The Hev Programming Language</h1>

<h2>Introduction</h2>

<p>Hey, does this thing look at all familiar?</p>

<pre>() [] -&gt; . ++ --                    Left to right
++ -- + - ! ~ * &amp; (type) sizeof     Right to left
* / %                               Left to right
+ -                                 Left to right
&lt;&lt; &gt;&gt;                               Left to right
&lt; &lt;= &gt; &gt;=                           Left to right
== !=                               Left to right
&amp;                                   Left to right
^                                   Left to right
|                                   Left to right
&amp;&amp;                                  Left to right
||                                  Left to right
?:                                  Right to left
= += -= *= /= %= &amp;= ^= |= &lt;&lt;= &gt;&gt;=   Right to left
,                                   Left to right</pre>

<p>That's right: it's the precedence table for operators in the C language.
OK, some of them (like <code>()</code>) are pretty easy to remember, I guess,
but the logic behind most of these choices escapes me.
Really, can you give me a <em>good</em> reason why <code>&amp;</code>
should have a higher precedence than <code>|</code>? And why is <code>^</code> in between?
And why in the world are <code>-&gt;</code> and <code>++</code> on the same level?
What I'm getting at is, how many times did you have to go and look up this
chart before these arbitrary rules got burned into your nervous system, somewhere
between your brain and your fingers?</p>

<p>And hey, you think that's bad? Perl 5 has like 129 operators at like 24 levels of
precedence.</p>

<p>You may well ask: is there something that can save us from this insanity?</p>

<p>Yes, there is. In this document I will describe Hev, a <del>novel</del> <del>innovative</del>
<del>radical</del> <del>revolutionary</del> totally gnarly new programming
language which provides <strong>the infix you love</strong>
with <strong>an unlimited number of operator precedence levels</strong>
and <strong>absolutely no need for parentheses or memorization!</strong></p>

<p>Sound too good to be true...? Read on!</p>

<h2>Syntax</h2>

<p>Hev's breathtaking syntactic sleight-of-hand is accomplished by a
synergistic combination of two features:</p>

<ul>
<li>Have an unbounded number of infix binary operators.</li>
<li>Make precedence explicit.</li>
</ul>

<p>To fit this bill, all we need is a single syntactic construct that can
explicitly express an unbounded number of discrete operators and, at the
same time, their precedence.</p>

<p>Well, I chose integers.</p>

<p>Positive integers, to be precise. So <code>3</code> is an infix operator.
So is <code>15</code>, and it has a higher precedence than <code>3</code>.
So is <code>514229</code>, and it has an even higher precedence
than <code>15</code>, but lower than <code>25852016738884976640000</code>.
<em>See</em> how easy it is? I can just name two operators at random,
and you can tell me which one has the higher precedence without a second thought!</p>

<p>Oh, but what good are operators if they don't have anything to operate on?
We need values, too. And since we have an unbounded number of operators, there's
a certain sense to having only a bounded number of values.</p>

<p>So why not go to the logical extreme: <em>no values at all</em>? Well, OK, for the sake of syntax we need to have
one value, but since there's nothing it can be differentiated against,
it's effectively no values. Syntactically, this value, or lack thereof, is
denoted <code>,</code>. (Yeah, that's a comma.)</p>

<p>And we'll probably need variables at some point, too, I'm guessing.
We should probably have a nice big supply of those, just so we don't
run into some artificial bound at some point that arbitrarily prevents Hev from
being Turing-complete. So, let's say that any string of consecutive symbols
drawn from <code>+</code>, <code>-</code>, <code>*</code> and <code>/</code>
is an identifier for a variable. That should do nicely.</p>
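
<p>To make the three kinds of token concrete, here is a rough tokenizer sketch in Python.
It is only an illustration, not the reference implementation: the name <code>tokenize</code>
and the <code>("leaf", ...)</code>, <code>("op", ...)</code>, <code>("var", ...)</code> tuples
are made up for this example. It assumes whitespace can simply be thrown away before scanning,
which is what the Miscellaneous Notes section suggests.</p>

```python
import re

# One alternative per kind of token; the final "." catches anything illegal.
TOKEN = re.compile(r"[0-9]+|[-+*/]+|,|.")


def tokenize(source):
    """Split Hev source text into (kind, value) pairs."""
    # Assumption: whitespace is insignificant everywhere, even inside an
    # operator or a variable name, so it is removed before scanning.
    text = "".join(source.split())
    tokens = []
    for found in TOKEN.finditer(text):
        lexeme = found.group()
        if lexeme == ",":
            tokens.append(("leaf", lexeme))
        elif lexeme[0] in "0123456789":
            if int(lexeme) == 0:
                raise ValueError("operators are positive integers")
            # Python integers are unbounded, so any operator fits.
            tokens.append(("op", int(lexeme)))
        elif lexeme[0] in "+-*/":
            tokens.append(("var", lexeme))
        else:
            raise ValueError("unexpected character: " + repr(lexeme))
    return tokens
```

<p>With this sketch, <code>tokenize("+ 1 0 *")</code> yields a variable, the operator
<code>10</code>, and another variable, since the space between the digits is ignored.</p>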

<p>There's still a bit of a problem, though -- those pesky parentheses.
You might need to nest a <code>5</code>-expression into the LHS or the RHS
of a <code>3</code>-expression, and that would seemingly require parentheses.
How do we avoid this? Well -- if we're
flexible on what <code>3</code> and <code>5</code> actually <em>mean</em>,
maybe we can just avoid this dilemma entirely! This brings us to...</p>

<h2>Semantics</h2>

<p>So we have all these infix binary operators, and this one value which I insist is
essentially a non-value, and we need to be able to make something sensible out of this mess --
<em>without</em> using parentheses to do nesting.</p>

<p>Well, what can we build?</p>

<p>Trees.</p>

<p>Yep, binary trees. They're a bit unlike the "normal" trees of Computer Science,
which almost universally have some sort of values stored at their leaves.
These ones don't. They're just... you know, trees. But we can definitely build them.
And we don't need any parentheses. If you want to nest some expression inside
another, you just pick operators with higher precedence levels for that
expression.</p>

<p>So <code>,5,10,5,</code> is a tree -- a complete binary tree with 3 levels:
a root node (<code>10</code>), two intermediate nodes (both <code>5</code>)
and four leaves (<code>,,,,</code>) with no values in them (or a single,
meaningless value, repeated four times, if you like). And please realize that
this is the <em>same</em> tree as <code>,1,3,2,</code> -- it's just that
different operators were used to construct it. Those operators aren't "in"
the tree in any sense, and their magnitude is used only to determine their
precedence.</p>
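
<p>Here is one way that tree-building could be sketched in Python. This is an illustration
under stated assumptions, not a transcription of <code>hev.hs</code>. Judging by the
<code>,5,10,5,</code> example, where <code>10</code> ends up at the root, the sketch splits
each expression at its largest operator. What happens when the largest operator occurs more
than once is not covered above, so picking the leftmost one is purely a guess. The names
<code>parse</code> and <code>build</code>, and the nested <code>(left, right)</code> tuples,
are invented for this example.</p>

```python
import re

LEAF = ","


def build(items):
    """Turn an alternating operand/operator list into nested (left, right) tuples."""
    if len(items) == 1:
        return items[0]
    operators = items[1::2]
    # Assumption: split at the largest operator; on a tie, take the leftmost.
    split = 1 + 2 * operators.index(max(operators))
    return (build(items[:split]), build(items[split + 1:]))


def parse(source):
    """Parse Hev text; leaves are ',' or variable names, operators vanish."""
    text = "".join(source.split())  # whitespace is insignificant
    tokens = re.findall(r"[0-9]+|[-+*/]+|,", text)
    if "".join(tokens) != text:
        raise ValueError("unexpected character in Hev source")
    items = [int(t) if t[0].isdigit() else t for t in tokens]
    if 0 in items:
        raise ValueError("operators are positive integers")
    # The ',' at the very beginning and very end may be left out.
    if not items or isinstance(items[0], int):
        items.insert(0, LEAF)
    if isinstance(items[-1], int):
        items.append(LEAF)
    # Operands must sit at even positions and operators at odd ones.
    for position, item in enumerate(items):
        if isinstance(item, int) != (position % 2 == 1):
            raise ValueError("operands and operators must alternate")
    return build(items)
```

<p>Under these assumptions <code>parse(",5,10,5,")</code> and <code>parse(",1,3,2,")</code>
return equal values, because the operators themselves are discarded once the shape is known.</p>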

<p>But now for the splendid part.
We can put <em>variables</em> in these trees! Which means, we can think of them
as <em>patterns</em> that can match other trees. Which means, we can specify <em>rules</em>
as pairs of patterns and substitutions, to be substituted in when the pattern matches.
Which means, we can construct a rule-based language! A rewriting language, in fact.
I think I'll call this approach <dfn>valueless tree rewriting</dfn>.</p>

<p>So, for example, the tree <code>+10*</code> matches that tree
<code>,5,10,5,</code> given above. The variables <code>+</code>
and <code>*</code> both unify with <code>,5,</code>.
But note that this pattern matches <code>,41,76,</code> too,
where <code>+</code> unifies with <code>,41,</code> and
<code>*</code> unifies with <code>,</code>.
And in fact it matches countless other possible valueless trees.</p>
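
<p>The matching just described can be sketched in a few lines of Python. As before, this is
a hedged illustration: trees are assumed to be nested <code>(left, right)</code> tuples with
<code>","</code> for a leaf and any other string for a variable, and the function names
<code>unify</code> and <code>match</code> are made up here.</p>

```python
LEAF = ","


def unify(pattern, tree, bindings):
    """Match pattern against tree at its root, recording variables in bindings."""
    if isinstance(pattern, tuple):
        # A node in the pattern only matches a node in the tree.
        return (isinstance(tree, tuple)
                and unify(pattern[0], tree[0], bindings)
                and unify(pattern[1], tree[1], bindings))
    if pattern == LEAF:
        return tree == LEAF
    # A variable matches anything, but always the same thing.
    return bindings.setdefault(pattern, tree) == tree


def match(pattern, tree):
    """Return a dict of variable bindings, or None if there is no match."""
    bindings = {}
    return bindings if unify(pattern, tree, bindings) else None


# The two examples from the text, written out as tuples by hand.
both_fives = match(("+", "*"), ((",", ","), (",", ",")))   # +10* vs ,5,10,5,
lopsided = match(("+", "*"), ((",", ","), ","))            # +10* vs ,41,76,
```

<p>In the first case both variables bind to the shape of <code>,5,</code>; in the second,
<code>*</code> binds to a bare <code>,</code>.</p>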

<h2>Execution Model</h2>

<p>A Hev program consists of a valueless binary tree. The left branch
of the root leads to a ruleset; the right branch leads to a valueless binary
tree which represents the data of the program: it is the state of the program,
the thing that is being rewritten. This data tree may not
contain any variables: the leaves must be entirely <code>,</code>'s.</p>

<p>A ruleset consists of a node where the left branch leads to either a ruleset
or to a <code>,</code> and the right branch leads to a rule. A rule is a node
where the left branch is a pattern and the right branch is a substitution.
The pattern is a valueless binary tree which may contain not only <code>,</code>'s
but also variables at its leaves. The substitution may contain both
<code>,</code>'s and variables, but it may not contain any variables which do
not appear in the corresponding pattern of the rule.</p>
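
<p>As a sketch of how a program tree might be taken apart along these lines, the Python below
walks down the left spine of the ruleset and collects the rules, nearest the root first.
The nested-tuple representation and the names <code>variables</code> and
<code>split_program</code> are assumptions made for this example, as is the choice to report
every violation as a <code>ValueError</code>.</p>

```python
LEAF = ","


def variables(tree):
    """Collect the variable names appearing at the leaves of a tree."""
    if isinstance(tree, tuple):
        return variables(tree[0]) | variables(tree[1])
    return set() if tree == LEAF else {tree}


def split_program(program):
    """Split a program tree into (rules, data), rules nearest the root first."""
    if not isinstance(program, tuple):
        raise ValueError("a program needs a ruleset branch and a data branch")
    ruleset, data = program
    if variables(data):
        raise ValueError("the data tree may not contain variables")
    rules = []
    # Leniency of this sketch: a bare ',' is treated as an empty ruleset.
    while ruleset != LEAF:
        if not isinstance(ruleset, tuple) or not isinstance(ruleset[1], tuple):
            raise ValueError("malformed ruleset")
        # Left branch: the rest of the ruleset. Right branch: one rule.
        ruleset, (pattern, substitution) = ruleset
        if not variables(substitution) <= variables(pattern):
            raise ValueError("substitution uses a variable missing from its pattern")
        rules.append((pattern, substitution))
    return rules, data
```

<p>The result is an ordinary list of <code>(pattern, substitution)</code> pairs, which is a
convenient shape for trying rules in order.</p>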

<p>Each rule in the ruleset is considered in turn, starting with the rule
nearest the root. The pattern of the rule is matched against the data tree.
The structure of the pattern must match some subtree of the data tree;
a variable can match any structure of the data tree, but no variable can
match two different structures. (The same variable identifier may appear
multiple times in a pattern; all instances of that variable must match the
same structure.) If there are multiple subtrees of the data tree that match,
only the <strong>topmost</strong> one is considered. This is usually called
"top-down rewriting".</p>

<p>When a match occurs, the substitution of the rule is instantiated.
Any variables occurring in the substitution are replaced with the structures
that those variables matched in the pattern. (This is why all the variables
appearing in the substitution must also appear in the pattern.)
The data tree is then modified: the subtree that was matched is removed and
in its place the instantiated substitution is grafted. The process
then repeats (starting over with the topmost rule).</p>

<p>When a rule fails to match, the data tree is left alone and
the next rule (one node lower down in the ruleset) is tried.
When there are no more rules to try in the ruleset, the program ends.</p>
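
<p>Putting the execution model together, a rewriting loop might look roughly like the Python
sketch below. Several things here are assumptions and not part of the description above:
trees are nested <code>(left, right)</code> tuples, "topmost" is read as "shallowest, found by
breadth-first search, left before right at equal depth", and <code>max_steps</code> is a
hypothetical safety guard added only so that a non-terminating program raises an error
instead of hanging.</p>

```python
from collections import deque

LEAF = ","


def unify(pattern, tree, bindings):
    """Match pattern against tree at its root, recording variables in bindings."""
    if isinstance(pattern, tuple):
        return (isinstance(tree, tuple)
                and unify(pattern[0], tree[0], bindings)
                and unify(pattern[1], tree[1], bindings))
    if pattern == LEAF:
        return tree == LEAF
    return bindings.setdefault(pattern, tree) == tree


def find_topmost(pattern, tree):
    """Return (path, bindings) for the shallowest matching subtree, or None."""
    queue = deque([((), tree)])
    while queue:
        path, node = queue.popleft()
        bindings = {}
        if unify(pattern, node, bindings):
            return path, bindings
        if isinstance(node, tuple):
            queue.append((path + (0,), node[0]))
            queue.append((path + (1,), node[1]))
    return None


def instantiate(substitution, bindings):
    """Replace each variable in the substitution with what it matched."""
    if isinstance(substitution, tuple):
        return (instantiate(substitution[0], bindings),
                instantiate(substitution[1], bindings))
    return bindings.get(substitution, substitution)  # ',' is never a key


def graft(tree, path, replacement):
    """Return a copy of tree with the subtree at path replaced."""
    if not path:
        return replacement
    left, right = tree
    if path[0] == 0:
        return (graft(left, path[1:], replacement), right)
    return (left, graft(right, path[1:], replacement))


def run(rules, data, max_steps=1000):
    """Rewrite data until no rule matches; rules[0] is the rule nearest the root."""
    index = 0
    steps = 0
    while index < len(rules):
        pattern, substitution = rules[index]
        found = find_topmost(pattern, data)
        if found is None:
            index += 1          # try the next rule down
            continue
        path, bindings = found
        data = graft(data, path, instantiate(substitution, bindings))
        index = 0               # start over with the topmost rule
        steps += 1
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
    return data
```

<p>For instance, a single rule whose pattern is a <code>,</code>-<code>,</code> node on the
left and a variable on the right, and whose substitution is just that variable, keeps peeling
such nodes off the root until the pattern no longer matches anywhere.</p>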

<h2>Miscellaneous Notes</h2>

<p>You can leave out the <code>,</code> at the very beginning and very end
of a Hev program. It's implied. Also, whitespace is allowed, even between
the digits of an operator or the symbols of a variable... for whatever
good it'll do you.</p>

<h2>Implementation</h2>

<p><code>hev.hs</code> is a reference implementation of Hev in Haskell.
It can be used as something to check this language description against:
any discrepancy is either a bug in the implementation or an error in this
document. <code>hev.hs</code> shouldn't be used as an official reference
for Hev behaviour that's not described in this document, but heck, it's
better than nothing, right?</p>

<h2>History</h2>

<p>It was sometime in November of 2005 when I came up with the idea to try to
"break the precedence barrier" and started writing Hev. I continued to refine
the idea and worked on it, on and off, after that.
In October of 2006 I got a stubborn notion in my head that the parser should
only make one pass over the program text, so I wasted a day trying to
figure out how to code that in Haskell. In June of 2007 I finally got down
to writing test cases and debugging it.</p>

<p>Happy <code>,</code>!</p>

<p>-Chris Pressey
<br />Cat's Eye Technologies
<br />June 17, 2007
<br />Vancouver, BC</p>

</body>
</html>