git @ Cat's Eye Technologies Eightebed / 7f764c9
Convert documentation to Markdown. --HG-- rename : doc/eightebed.html => README.markdown catseye 10 years ago
2 changed file(s) with 229 addition(s) and 248 deletion(s).
0 The Eightebed Programming Language
1 ==================================
2
3 Language version 1.1
4
5 Abstract
6 --------
7
8 While discussing [Cyclone](http://cyclone.thelanguage.org/), Gregor
9 Richards stated that for a language to support explicit `malloc()`ing
10 and `free()`ing of allocated memory while also being safe (in the
11 sense of not being able to execute or dereference incorrectly-populated
12 memory), that language must either support garbage collection or not
13 implement `free()`. In his words:
14
15 > A C-like language which provides a true explicit free() cannot be
16 > safe. (By "true" I mean that you can get that memory back in a later
17 > malloc().) To be safe a language must either never free (which is bad)
18 > or be GC'd. [C-like languages being] imperative languages with
19 > pointers at arbitrary data, where safety is defined as not seeing that
20 > data as a different type.
21
22 Eightebed was designed as a counterexample to that claim. Eightebed is a
23 small, C-like language with explicit `malloc()` and `free()`. Memory is
24 actually freed by `free()` and might be re-allocated by a future
25 `malloc()`. Yet Eightebed is a safe language, requiring only a modicum
26 of static analysis and runtime support, and in particular, it neither
27 specifies nor requires garbage collection:
28
29 - Garbage, reasonably defined as "any unreachable block of memory", is
30 disregarded and considered a memory leak, as is good and proper (or
31 at least accepted) in a language with explicit memory management;
32 and
33 - Nothing is collected in any way.
34
35 Without Loss of Generality
36 --------------------------
37
38 We place some restrictions on Eightebed in order that our implementation
39 of a compiler and analyzer for it may be simplified. These restrictions
40 do not, we assert, prevent the language from being "C-like", as it would
41 be possible to extend the language to lift them; the only thing we
42 would be adding if we were to do so would be additional complexity in
43 implementation. These restrictions are:
44
45 - There are no functions in Eightebed. Common functionality can be
46 repeated verbatim inline, and recursion can be replaced with `while`
47 loops.
48 - Pointers may only point to named types, not integers or other
49 pointers, and only structures may be named. The effect of a pointer
50 to an integer or pointer may be easily achieved by pointing to a
51 named structure which consists of only an integer or pointer itself.
52 - Structures may not contain structures. Again, this can be easily
53 simulated by "flattening" the structure into a single structure with
54 perhaps differentiated names.
55
56 Syntax
57 ------
58
59 ### EBNF Grammar
60
61 Note that where this grammar is a little weird, it is only to support
62 being fully LL(1) to ease parser construction. Notably, the syntax to
63 access a member of a structure uses both square brackets around the
64 structure and a dot between structure and member. Unlike C, there is no
65 syntax like `->` to dereference and access a member in one go; you need
66 to dereference with `@`, then access the member with `[].`.
67
68 Eightebed ::= {TypeDecl} {VarDecl} Block.
69 Block ::= "{" {Stmt} "}".
70 TypeDecl ::= "type" NameType Type ";".
71 Type ::= "int"
72 | "struct" "{" {Decl} "}"
73 | "ptr" "to" Type
74 | NameType.
75 Decl ::= Type Name ";".
76 VarDecl ::= "var" Decl.
77 Stmt ::= "while" Expr Block
78 | "if" Expr Block ["else" Block]
79 | "free" Ref ";"
80 | "print" Expr ";"
81 | Ref "=" Expr ";".
82 Ref ::= "[" Ref "]" "." Name
83 | "@" Ref
84 | Name.
85 Expr ::= "(" Expr ("+"|"-"|"*"|"/"|"="|">"|"&"|"|") Expr ")"
86 | "malloc" NameType
87 | "valid" Expr
88 | IntLit
89 | Ref.
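
To illustrate why the bracketed member syntax is easy to parse, here is a minimal recursive-descent sketch of the `Ref` production in Python. It is not taken from `8ebed2c.py`: the tokenizer, the tuple-shaped syntax tree, and the names `tokenize`, `expect`, `parse_name` and `parse_ref` are all assumptions made for this example. Each alternative of `Ref` starts with a different token (`[`, `@`, or a name), so one token of lookahead is enough to pick a branch.

```python
import re

# Hypothetical token shapes: identifiers plus the punctuation that Ref uses.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[\[\].@])")

def tokenize(text):
    """Split source text into a flat list of token strings."""
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def expect(tokens, wanted):
    """Consume one token, insisting that it is `wanted`."""
    if not tokens or tokens.pop(0) != wanted:
        raise SyntaxError("expected %r" % wanted)

def parse_name(tokens):
    """Consume one identifier token and return it."""
    if not tokens or not (tokens[0][0].isalpha() or tokens[0][0] == "_"):
        raise SyntaxError("expected a name")
    return tokens.pop(0)

def parse_ref(tokens):
    """Ref ::= "[" Ref "]" "." Name | "@" Ref | Name."""
    if tokens and tokens[0] == "[":
        tokens.pop(0)
        inner = parse_ref(tokens)
        expect(tokens, "]")
        expect(tokens, ".")
        return ("member", inner, parse_name(tokens))
    if tokens and tokens[0] == "@":
        tokens.pop(0)
        return ("deref", parse_ref(tokens))
    return ("name", parse_name(tokens))
```

With these definitions, `parse_ref(tokenize("[@jim].value"))` yields `("member", ("deref", ("name", "jim")), "value")`. A C-style `a.b` rule of the form `Ref ::= Ref "." Name` would be left-recursive; the leading `[` avoids that.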
90
91 ### Example Program
92
93 type node struct {
94 int value;
95 ptr to node next;
96 };
97 var ptr to node jim;
98 var ptr to node george;
99 {
100 jim = malloc node;
101 if valid jim {
102 [@jim].value = (1 + 4);
103 george = jim;
104 }
105 if valid george {
106 print [@george].value;
107 }
108 free george;
109 free jim;
110 }
111
112 How it Works
113 ------------
114
115 ### Static Analysis
116
117 Dereferencing a pointer x must only occur at the safe start of the
118 "then" part of an `if` statement whose test condition consists only of
119 the expression `valid x`. The safe start of a block is the set of
120 statements preceding and including the first assignment statement or
121 `free`. (This is on the [admittedly somewhat pessimistic] assumption
122 that any assignment could invalidate x.) (*New in 1.1*: the safe start
123 must precede the first `free` statement, to prevent creation of dangling
124 aliased pointers. Thanks Gregor!) To simplify implementation, we limit x
125 to a simple variable name rather than a full expression. (This too is
126 without loss of generality, as it is a simple matter to use a temporary
127 variable to store the result of a pointer expression.) Any attempt to
128 dereference a pointer which does not follow these rules is caught by the
129 static checker and disallowed.
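
The rule above can be sketched as a small checker in Python. This is an illustration under stated assumptions, not the analysis performed by `8ebed2c.py`: the tuple-shaped syntax tree and the names `derefs`, `check_simple` and `check_block` are hypothetical. The sketch also takes the stricter reading of the rule, in which a `free` ends the safe start without being part of it, and it treats a nested `if` or `while` as ending the safe start as well.

```python
def derefs(node):
    """Yield the operand of every ("deref", ...) inside a statement or expression."""
    if isinstance(node, tuple):
        if node[0] == "deref":
            yield node[1]
        for child in node[1:]:
            for found in derefs(child):
                yield found

def check_simple(node, guarded):
    """Report every dereference in `node` that is not of the guarded variable."""
    return ["unsafe dereference of %r" % (operand,)
            for operand in derefs(node)
            if guarded is None or operand != ("name", guarded)]

def check_block(block, guarded=None):
    """Return a list of error strings for one block of statements.

    `guarded` is the variable tested by an enclosing `if valid x`,
    or None when the block has no such guard.
    """
    errors = []
    in_safe_start = guarded is not None
    for stmt in block:
        kind = stmt[0]
        if kind in ("if", "while"):
            in_safe_start = False      # assumption: compound statements end the safe start
            cond = stmt[1]
            errors += check_simple(cond, None)
            then_guard = None
            if kind == "if" and cond[0] == "valid" and cond[1][0] == "name":
                then_guard = cond[1][1]
            errors += check_block(stmt[2], then_guard)
            if kind == "if" and len(stmt) > 3:
                errors += check_block(stmt[3], None)
            continue
        if kind == "free":
            in_safe_start = False      # assumption: a free is outside the safe start
        errors += check_simple(stmt, guarded if in_safe_start else None)
        if kind == "assign":
            in_safe_start = False      # the first assignment is the last safe statement
    return errors
```

Under this encoding, a block such as `if valid jim { [@jim].value = (1 + 4); george = jim; }` passes, because the dereference occurs in the first assignment. Moving a `print [@jim].value;` after `george = jim;` produces one error, since that assignment has already closed the safe start.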
130
131 ### Runtime Support
132
133 Every pointer in the Eightebed language is implemented internally as a
134 structure of a machine pointer (obtained, for instance, by C's
135 `malloc()`) coupled with a boolean flag called `valid`. When a chunk of
136 memory is initially successfully allocated, `valid` is set to true.
137 Freeing a pointer first checks this flag; freeing the machine pointer is
138 only attempted if `valid` is true. In addition, just before freeing the
139 machine pointer, we invalidate all aliases to that pointer. (Starting
140 with the "root set" of the program's global variables, we traverse all
141 memory blocks reachable by following valid pointers from them, looking
142 for pointers which match the pointer about to be freed; any we find, we
143 set their `valid` flags to false.) After freeing a pointer, we set its
144 `valid` to false.
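
The behaviour described above can be modelled in a few lines of Python. This is only a sketch: the reference compiler emits C, and its generated runtime may be organised quite differently. The names `Ptr`, `Runtime`, `assign`, `roots` and `free_list`, and the use of small integers as stand-ins for machine addresses, are assumptions made for illustration.

```python
class Ptr(object):
    """An Eightebed pointer: a machine address paired with a `valid` flag."""
    def __init__(self):
        self.addr = None
        self.valid = False

def assign(dst, src):
    """Model `dst = src` for pointers: copy both the address and the flag."""
    dst.addr, dst.valid = src.addr, src.valid

class Runtime(object):
    def __init__(self):
        self.heap = {}        # address -> dict of fields (ints or Ptr objects)
        self.free_list = []   # addresses released by free(), reused by malloc()
        self.next_addr = 1
        self.roots = {}       # global variable name -> Ptr (the "root set")

    def malloc(self, int_fields=(), ptr_fields=()):
        """Allocate a block and return a fresh, valid Ptr to it."""
        if self.free_list:
            addr = self.free_list.pop()   # freed memory really is handed out again
        else:
            addr = self.next_addr
            self.next_addr += 1
        block = dict((name, 0) for name in int_fields)
        block.update((name, Ptr()) for name in ptr_fields)
        self.heap[addr] = block
        ptr = Ptr()
        ptr.addr, ptr.valid = addr, True
        return ptr

    def free(self, ptr):
        """Invalidate every reachable alias of `ptr`, then release its block."""
        if not ptr.valid:
            return                        # freeing an invalid pointer does nothing
        target, aliases, seen = ptr.addr, [], set()
        pending = list(self.roots.values())
        while pending:
            p = pending.pop()
            if not p.valid:
                continue                  # only valid pointers are followed
            if p.addr == target:
                aliases.append(p)
            if p.addr not in seen:
                seen.add(p.addr)
                pending.extend(f for f in self.heap[p.addr].values()
                               if isinstance(f, Ptr))
        for alias in aliases:
            alias.valid = False
        del self.heap[target]
        self.free_list.append(target)
        ptr.valid = False
```

Aliases are collected first and invalidated afterwards, so the traversal can still pass through the block being freed and find aliases stored inside it or beyond it. Replaying the example program against this model, `free george` leaves both `george` and `jim` invalid, and the later `free jim` does nothing.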
145
146 ### Why this Works
147
148 Because of the static analysis, it is not possible to dereference a
149 pointer at a point in the program where we do not know for certain that
150 it is valid (i.e., it is not possible to dereference an invalid
151 pointer). Because of the runtime support, as soon as a pointer becomes
152 invalid, all aliases of it become invalid as well. (All reachable
153 aliases, that is – but if an alias isn't reachable, it can't be
154 dereferenced anyway.) Add both of these together, and you get memory
155 that can leak without any risk of being reused.
156
157 And no, this isn't garbage collection, because (as stated already) we
158 don't care about garbage and we don't collect anything. Yes, the runtime
159 support looks a bit like the mark phase of a mark-and-sweep garbage
160 collector, but even it has a different job: not marking everything that
161 is reachable, but rather invalidating all aliases of a given pointer.
162
163 And finally, yes, I realize how little this proves. Long live loopholes.
164
165 16:19:38 <Gregor> We implement this without a GC by stuffing most of a GC into the free function, thereby making it just as slow as a GC'd language with none of the advantages!
166 16:25:29 <Gregor> So yes, although you have managed to fit my requirements, I am wildly underwhelmed :P
167
168 Reference Implementation
169 ------------------------
170
171 Cat's Eye Technologies provides a cockamamie reference implementation of
172 Eightebed called `8ebed2c.py`. Written in Python 2.6, it compiles
173 Eightebed code to C, and for convenience will optionally compile that C
174 with the C compiler of your choice and run the resulting executable.
175
176 `8ebed2c.py` ships with a fairly extensive (for a language like this!)
177 suite of test programs, which can of course double as example sources;
178 these can be found in the `eightebed.tests` module.
179
180 For an appreciation of just how cockamamie `8ebed2c.py` is, run
181 `8ebed2c.py --help` and read through the command-line options it
182 provides.
183
184 Legal Issues
185 ------------
186
187 The name Eightebed started life as a typo for the word "enlightened"
188 made on an iPhone by a mysterious individual known only as Alise. (Well,
189 perhaps not *only*.) Alise has aggressively asserted her intellectual
190 property rights by copyrighting [*sic*] the name Eightebed. Cat's Eye
191 Technologies has pursued permission to use the name for this language,
192 only to be told that the procedure for obtaining such permission
193 "involves five yaks, a Golden toad that hasn't eaten for five days, five
194 boxes of antique confetti (not stripped of uranium), dye number 90
195 (blood green), a very confused weasel, and three pieces of A4.15 paper."
196
197 Cat's Eye Technologies' legal-and-yak-husbandry team is currently
198 investigating the feasibility of this arrangement, and as of this
199 writing, official permission is still pending. If complications persist,
200 another, less contentious name (such as "Microsoft Windows 7") may need
201 to be chosen for this language.
202
203 17:52:08 <alise> cpressey: I request that all harm is done to animals in the making of this production.
204
205 Future Work
206 -----------
207
208 *In which we reveal the outline of a grand plan for a blockbuster sequel
209 to Eightebed which will never materialize*
210
211 - To be titled *Eightebed: Ascension* or *Eightebed: Generations*. At
212 least, title should have one of those bad-ass colons in it. Possibly
213 *Eightebed: Eightebed*.
214 - To support functions, analysis of arbitrary expressions as the
215 condition in an `if valid`, pointers to unnamed types, structures
216 which contain other structures, and all that other boring stuff that
217 we just said doesn't matter.
218 - To have a literate specification written in SUPER ITALIAN, thus
219 giving all programs the power of UNMATCHED PROPHETIC SNEEZING.
220 - To be co-authored with Frank Zappa (note: turns out Mr. Zappa is
221 dead. Maybe Tipper Gore instead? Yes, that should work.)
222 - ~~To include a garbage collector.~~
223 - Puppets???
224
225 Happy leaking!
226 Chris Pressey
227 September 1, 2010
228 Evanston, IL
+0
-248
doc/eightebed.html
0 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
1 <!-- encoding: UTF-8 -->
2 <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
3 <head>
4 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
5 <title>The Eightebed Programming Language</title>
6 <!-- begin html doc dynamic markup -->
7 <script type="text/javascript" src="/contrib/jquery-1.6.4.min.js"></script>
8 <script type="text/javascript" src="/scripts/documentation.js"></script>
9 <!-- end html doc dynamic markup -->
10 </head>
11 <body>
12
13 <h1>The Eightebed Programming Language</h1>
14
15 <p>Language version 1.1</p>
16
17 <h2>Abstract</h2>
18
19 <p>While discussing <a class="external" href="http://cyclone.thelanguage.org/">Cyclone</a>,
20 Gregor Richards stated that for a language
21 to support explicit <code>malloc()</code>ing and <code>free()</code>ing of
22 allocated memory while also being safe (in the sense of not being able to execute
23 or dereference incorrectly-populated memory), that language must
24 either support garbage collection or not implement <code>free()</code>.
25 In his words:</p>
26
27 <blockquote><p>A C-like language which provides a true explicit free() cannot be safe.
28 (By "true" I mean that you can get that memory back in a later malloc().)
29 To be safe a language must either never free (which is bad) or be GC'd.
30 [C-like languages being] imperative languages with pointers at arbitrary data, where safety is
31 defined as not seeing that data as a different type.</p></blockquote>
32
33 <p><dfn>Eightebed</dfn> was designed as a counterexample to that claim.
34 Eightebed is a small, C-like language with explicit <code>malloc()</code>
35 and <code>free()</code>. Memory is actually freed by <code>free()</code>
36 and might be re-allocated by a future <code>malloc()</code>. Yet Eightebed
37 is a safe language, requiring only a modicum of static analysis and runtime support,
38 and in particular, it neither specifies nor requires garbage collection:</p>
39
40 <ul>
41 <li><dfn>Garbage</dfn>, reasonably defined as "any unreachable block of memory",
42 is disregarded and considered a memory leak, as is good and proper (or at least
43 accepted) in a language with explicit memory management; and</li>
44 <li>Nothing is collected in any way.</li>
45 </ul>
46
47 <h2>Without Loss of Generality</h2>
48
49 <p>We place some restrictions on Eightebed in order that our implementation of
50 a compiler and analyzer for it may be simplified. These restrictions do not, we assert,
51 prevent the language from being "C-like", as it would be possible to extend the
52 language to lift them; the only thing we would be adding if we were to do so
53 would be additional complexity in implementation. These restrictions are:</p>
54
55 <ul>
56 <li>There are no functions in Eightebed. Common functionality can be repeated
57 verbatim inline, and recursion can be replaced with <code>while</code> loops.</li>
58
59 <li>Pointers may only point to named types, not integers or other pointers, and
60 only structures may be named. The effect of a pointer to an integer or pointer
61 may be easily achieved by pointing to a named structure which consists of only
62 an integer or pointer itself.</li>
63
64 <li>Structures may not contain structures. Again, this can be easily simulated
65 by "flattening" the structure into a single structure with perhaps differentiated names.</li>
66 </ul>
67
68 <h2>Syntax</h2>
69
70 <h3>EBNF Grammar</h3>
71
72 <p>Note that where this grammar is a little weird, it is only to support
73 being fully LL(1) to ease parser construction. Notably, the syntax to access
74 a member of a structure uses both square brackets around the structure
75 and a dot between structure and member. Unlike C, there is no syntax like
76 <code>-&gt;</code> to dereference and access a member in one go; you need
77 to dereference with <code>@</code>, then access the member with <code>[].</code>.</p>
78
79 <pre>
80 Eightebed ::= {TypeDecl} {VarDecl} Block.
81 Block ::= "{" {Stmt} "}".
82 TypeDecl ::= "type" Name<sub>Type</sub> Type ";".
83 Type ::= "int"
84 | "struct" "{" {Decl} "}"
85 | "ptr" "to" Type
86 | Name<sub>Type</sub>.
87 Decl ::= Type Name ";".
88 VarDecl ::= "var" Decl.
89 Stmt ::= "while" Expr Block
90 | "if" Expr Block ["else" Block]
91 | "free" Ref ";"
92 | "print" Expr ";"
93 | Ref "=" Expr ";".
94 Ref ::= "[" Ref "]" "." Name
95 | "@" Ref
96 | Name.
97 Expr ::= "(" Expr ("+"|"-"|"*"|"/"|"="|"&gt;"|"&amp;"|"|") Expr ")"
98 | "malloc" Name<sub>Type</sub>
99 | "valid" Expr
100 | IntLit
101 | Ref.
102 </pre>
103
104 <h3>Example Program</h3>
105
106 <pre>
107 type node struct {
108 int value;
109 ptr to node next;
110 };
111 var ptr to node jim;
112 var ptr to node george;
113 {
114 jim = malloc node;
115 if valid jim {
116 [@jim].value = (1 + 4);
117 george = jim;
118 }
119 if valid george {
120 print [@george].value;
121 }
122 free george;
123 free jim;
124 }
125 </pre>
126
127 <h2>How it Works</h2>
128
129 <h3>Static Analysis</h3>
130
131 <p>Dereferencing a pointer <var>x</var> must only occur at the
132 safe start of the "then" part of an <code>if</code> statement whose
133 test condition consists only of the expression <code>valid <var>x</var></code>.
134 The <dfn>safe start</dfn> of a block is the set of statements preceding and
135 including the first assignment statement or <code>free</code>. (This is on the
136 [admittedly somewhat pessimistic]
137 assumption that any assignment could invalidate <var>x</var>.)
138 (<em>New in 1.1</em>: the safe start must precede the first <code>free</code>
139 statement, to prevent creation of dangling aliased pointers. Thanks Gregor!)
140 To simplify implementation, we limit <var>x</var> to a simple variable name
141 rather than a full expression. (This too is without loss of generality, as it is a
142 simple matter to use a temporary variable to store the result of a pointer expression.)
143 Any attempt to dereference a pointer which
144 does not follow these rules is caught by the static checker and disallowed.</p>
145
146 <h3>Runtime Support</h3>
147
148 <p>Every pointer in the Eightebed language is implemented internally
149 as a structure of a <dfn>machine pointer</dfn> (obtained, for instance, by C's <code>malloc()</code>)
150 coupled with a boolean flag called <code>valid</code>.
151 When a chunk of memory is initially successfully allocated, <code>valid</code> is
152 set to true. Freeing a pointer first checks this flag; freeing the
153 machine pointer is only attempted if <code>valid</code> is true.
154 In addition, just before freeing the machine pointer, we invalidate
155 all aliases to that pointer. (Starting with the "root set" of the
156 program's global variables, we traverse all memory blocks reachable
157 by following valid pointers from them, looking for pointers which
158 match the pointer about to be freed; any we find, we set their <code>valid</code>
159 flags to false.) After freeing a pointer, we set its <code>valid</code>
160 to false.</p>
161
162 <h3>Why this Works</h3>
163
164 <p>Because of the static analysis, it is not possible to dereference
165 a pointer at a point in the program where we do not know for certain
166 that it is valid (i.e., it is not possible to dereference an invalid
167 pointer). Because of the runtime support, as soon as a pointer becomes
168 invalid, all aliases of it become invalid as well. (All reachable
169 aliases, that is – but if an alias isn't reachable, it can't be dereferenced
170 anyway.) Add both of these together, and you get memory that can
171 leak without any risk of being reused.</p>
172
173 <p>And no, this isn't garbage collection, because (as stated already)
174 we don't care about garbage and we don't collect anything. Yes, the
175 runtime support looks a bit like the mark phase of a mark-and-sweep
176 garbage collector, but even it has a different job: not marking everything
177 that is reachable, but rather invalidating all aliases of a given pointer.</p>
178
179 <p>And finally, yes, I realize how little this proves. Long live loopholes.</p>
180
181 <pre>
182 16:19:38 &lt;Gregor&gt; We implement this without a GC by stuffing most of a GC into the free function, thereby making it just as slow as a GC'd language with none of the advantages!
183 16:25:29 &lt;Gregor&gt; So yes, although you have managed to fit my requirements, I am wildly underwhelmed :P
184 </pre>
185
186 <h2>Reference Implementation</h2>
187
188 <p>Cat's Eye Technologies provides a cockamamie reference implementation of
189 Eightebed called <code>8ebed2c.py</code>. Written in Python 2.6,
190 it compiles Eightebed code to C, and for convenience will optionally
191 compile that C with the C compiler of your choice and run the resulting
192 executable.</p>
193
194 <p><code>8ebed2c.py</code> ships with a fairly extensive (for a language
195 like this!) suite of test programs, which can of course double as example
196 sources; these can be found in the <code>eightebed.tests</code> module.</p>
197
198 <p>For an appreciation of just how cockamamie <code>8ebed2c.py</code> is,
199 run <code>8ebed2c.py --help</code> and read through the command-line options
200 it provides.</p>
201
202 <h2>Legal Issues</h2>
203
204 <p>The name Eightebed started life as a typo for the word "enlightened" made on an
205 iPhone by a mysterious individual known only as Alise. (Well, perhaps not <em>only</em>.)
206 Alise has aggressively asserted her intellectual property rights by copyrighting
207 [<i>sic</i>] the name Eightebed. Cat's Eye Technologies has pursued permission to use
208 the name for this language, only to be told that the procedure for obtaining such
209 permission "involves five yaks, a Golden toad that hasn't eaten for five days,
210 five boxes of antique confetti (not stripped of uranium), dye number 90 (blood green),
211 a very confused weasel, and three pieces of A4.15 paper."</p>
212
213 <p>Cat's Eye Technologies' legal-and-yak-husbandry team is currently investigating
214 the feasibility of this arrangement, and as of this writing, official permission is
215 still pending. If complications persist, another, less contentious name
216 (such as "Microsoft Windows 7") may need to be chosen for this language.</p>
217
218 <pre>
219 17:52:08 &lt;alise&gt; cpressey: I request that all harm is done to animals in the making of this production.
220 </pre>
221
222 <h2>Future Work</h2>
223
224 <p><i>In which we reveal the outline of a grand plan for a blockbuster sequel to Eightebed which will never materialize</i></p>
225
226 <ul>
227 <li>To be titled <em>Eightebed: Ascension</em> or <em>Eightebed: Generations</em>. At least, title should have
228 one of those bad-ass colons in it. Possibly <em>Eightebed: Eightebed</em>.</li>
229 <li>To support functions, analysis of arbitrary expressions as the condition in an <code>if valid</code>,
230 pointers to unnamed types, structures which contain other structures, and all that other boring stuff that
231 we just said doesn't matter.</li>
232 <li>To have a literate specification written in SUPER ITALIAN, thus giving all programs the power of
233 UNMATCHED PROPHETIC SNEEZING.</li>
234 <li>To be co-authored with Frank Zappa (note: turns out Mr. Zappa is dead. Maybe Tipper Gore instead? Yes,
235 that should work.)</li>
236 <li><del>To include a garbage collector.</del></li>
237 <li>Puppets???</li>
238 </ul>
239
240 <p>Happy leaking!
241 <br />Chris Pressey
242 <br />September 1, 2010
243 <br />Evanston, IL
244 </p>
245
246 </body>
247 </html>