git @ Cat's Eye Technologies Eightebed / fc3c2fa
Convert CRLF EOLs to LFs. catseye 10 years ago
1 changed file(s) with 233 addition(s) and 233 deletion(s).
The Eightebed Programming Language
==================================

Language version 1.1

Abstract
--------

While discussing [Cyclone](http://cyclone.thelanguage.org/), Gregor
Richards stated that for a language to support explicit `malloc()`ing
and `free()`ing of allocated memory while also being safe (in the sense
of not being able to execute or dereference incorrectly-populated
memory), it would have to either support garbage collection or not
implement `free()`. In his words:

> A C-like language which provides a true explicit free() cannot be
> safe. (By "true" I mean that you can get that memory back in a later
> malloc().) To be safe a language must either never free (which is bad)
> or be GC'd. [C-like languages being] imperative languages with
> pointers at arbitrary data, where safety is defined as not seeing that
> data as a different type.

Eightebed was designed as a counterexample to that claim. Eightebed is a
small, C-like language with explicit `malloc()` and `free()`. Memory is
actually freed by `free()` and might be re-allocated by a future
`malloc()`. Yet Eightebed is a safe language, requiring only a modicum
of static analysis and runtime support, and in particular, it neither
specifies nor requires garbage collection:

- Garbage, reasonably defined as "any unreachable block of memory", is
  disregarded and considered a memory leak, as is good and proper (or
  at least accepted) in a language with explicit memory management;
  and
- Nothing is collected in any way.

Without Loss of Generality
--------------------------

We place some restrictions on Eightebed in order that our implementation
of a compiler and analyzer for it may be simplified. These restrictions
do not, we assert, prevent the language from being "C-like", as it would
be possible to extend the language to lift them; the only thing we
would be adding if we were to do so would be additional complexity in
implementation. These restrictions are:

- There are no functions in Eightebed. Common functionality can be
  repeated verbatim inline, and recursion can be replaced with `while`
  loops.
- Pointers may only point to named types, not integers or other
  pointers, and only structures may be named. The effect of a pointer
  to an integer or pointer may be easily achieved by pointing to a
  named structure which consists of only an integer or pointer itself.
- Structures may not contain structures. Again, this can be easily
  simulated by "flattening" the nested structures into a single
  structure, with member names differentiated where necessary.

Syntax
------

### EBNF Grammar

Note that where this grammar is a little weird, it is only to support
being fully LL(1) to ease parser construction. Notably, the syntax to
access a member of a structure uses both square brackets around the
structure and a dot between structure and member. Unlike C, there is no
syntax like `->` to dereference and access a member in one go; you need
to dereference with `@`, then access the member with `[].`.

    Eightebed ::= {TypeDecl} {VarDecl} Block.
    Block     ::= "{" {Stmt} "}".
    TypeDecl  ::= "type" NameType Type ";".
    Type      ::= "int"
                | "struct" "{" {Decl} "}"
                | "ptr" "to" Type
                | NameType.
    Decl      ::= Type Name ";".
    VarDecl   ::= "var" Decl.
    Stmt      ::= "while" Expr Block
                | "if" Expr Block ["else" Block]
                | "free" Ref ";"
                | "print" Expr ";"
                | Ref "=" Expr ";".
    Ref       ::= "[" Ref "]" "." Name
                | "@" Ref
                | Name.
    Expr      ::= "(" Expr ("+"|"-"|"*"|"/"|"="|">"|"&"|"|") Expr ")"
                | "malloc" NameType
                | "valid" Expr
                | IntLit
                | Ref.

### Example Program

    type node struct {
        int value;
        ptr to node next;
    };
    var ptr to node jim;
    var ptr to node george;
    {
        jim = malloc node;
        if valid jim {
            [@jim].value = (1 + 4);
            george = jim;
        }
        if valid george {
            print [@george].value;
        }
        free george;
        free jim;
    }

How it Works
------------

### Static Analysis

Dereferencing a pointer x must only occur at the _safe start_ of the
"then" part of an `if` statement whose test condition consists only of
the expression `valid x`. The safe start of a block is the set of
statements up to and including the first assignment statement, and
strictly preceding the first `free`. (The cutoff at the first assignment
rests on the [admittedly somewhat pessimistic] assumption that any
assignment could invalidate x.) (_New in 1.1_: the safe start
must precede the first `free` statement, to prevent creation of dangling
aliased pointers. Thanks Gregor!) To simplify implementation, we limit x
to a simple variable name rather than a full expression. (This too is
without loss of generality, as it is a simple matter to use a temporary
variable to store the result of a pointer expression.) Any attempt to
dereference a pointer which does not follow these rules is caught by the
static checker and disallowed.
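
The safe-start rule can be sketched in C as follows. This is an illustration only, not the reference implementation's checker. The names `StmtKind`, `safe_start_len`, and `deref_allowed` are hypothetical. The sketch models a block as a flat list of statements and leaves out nested `while` and `if` blocks, whose treatment the text above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds. Nested `while`/`if` blocks are omitted
   from this sketch (an assumption, not a statement about the language). */
typedef enum { STMT_PRINT, STMT_ASSIGN, STMT_FREE } StmtKind;

/* Returns how many leading statements of a block lie in its safe start:
   everything up to and including the first assignment, but stopping
   before the first `free` (the 1.1 rule). */
size_t safe_start_len(const StmtKind *stmts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (stmts[i] == STMT_ASSIGN) return i + 1;
        if (stmts[i] == STMT_FREE) return i;
    }
    return n;
}

/* A dereference of x in statement `index` of the "then" block of
   `if valid x` is permitted only if that statement is in the safe start. */
int deref_allowed(const StmtKind *stmts, size_t n, size_t index) {
    assert(index < n);
    return index < safe_start_len(stmts, n);
}
```

For a block of two assignments, such as the `if valid jim` block in the example program, only the first statement lies in the safe start. That is sufficient there, because the second statement (`george = jim;`) does not dereference `jim`.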

### Runtime Support

Every pointer in the Eightebed language is implemented internally as a
structure of a machine pointer (obtained, for instance, by C's
`malloc()`) coupled with a boolean flag called `valid`. When a chunk of
memory is initially successfully allocated, `valid` is set to true.
Freeing a pointer first checks this flag; freeing the machine pointer is
only attempted if `valid` is true. In addition, just before freeing the
machine pointer, we invalidate all aliases to that pointer. (Starting
with the "root set" of the program's global variables, we traverse all
memory blocks reachable by following valid pointers from them, looking
for pointers which match the pointer about to be freed; we set the
`valid` flag of each one we find to false.) After freeing a pointer, we
set its `valid` to false.
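
As a rough illustration of this scheme, here is a minimal sketch in C for the `node` type of the example program. It is not the code that `8ebed2c.py` emits. Every name in it (`node_ptr`, `eb_malloc`, `eb_free`, `roots`, the `seen` field, `epoch`) is hypothetical. The `seen` stamp is an assumption added so that the traversal terminates on cyclic structures. The sketch also clears the `valid` flag just before calling `free()` rather than after, in case the pointer being freed is stored inside the block it points to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node node;

/* An Eightebed pointer: a machine pointer plus a validity flag. */
typedef struct { node *p; bool valid; } node_ptr;

struct node {
    int value;
    node_ptr next;
    unsigned seen;   /* assumed traversal stamp, guards against cycles */
};

/* The program's global variables form the root set. */
node_ptr jim, george;
static node_ptr *const roots[] = { &jim, &george };
static unsigned epoch = 0;

node_ptr eb_malloc(void) {
    node_ptr r;
    r.p = malloc(sizeof *r.p);
    r.valid = (r.p != NULL);          /* valid only if allocation worked */
    if (r.valid) {
        r.p->value = 0;
        r.p->next.p = NULL;
        r.p->next.valid = false;
        r.p->seen = 0;
    }
    return r;
}

/* Follow valid pointers starting at fp, invalidating aliases of target. */
static void invalidate_from(node_ptr *fp, node *target) {
    while (fp->valid) {
        if (fp->p == target) { fp->valid = false; return; }
        if (fp->p->seen == epoch) return;   /* block already scanned */
        fp->p->seen = epoch;
        fp = &fp->p->next;
    }
}

void eb_free(node_ptr *x) {
    assert(x != NULL);
    if (!x->valid) return;                  /* never free twice */
    node *target = x->p;
    epoch++;                                /* wrap-around ignored here */
    for (size_t i = 0; i < sizeof roots / sizeof roots[0]; i++)
        invalidate_from(roots[i], target);
    x->valid = false;                       /* cleared before free(), see above */
    free(target);
}
```

With these definitions, the example program's sequence `jim = eb_malloc(); george = jim; eb_free(&george); eb_free(&jim);` frees the block once: the first call invalidates both `george` and its alias `jim`, so the second call does nothing.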

### Why this Works

Because of the static analysis, it is not possible to dereference a
pointer at a point in the program where we do not know for certain that
it is valid (i.e., it is not possible to dereference an invalid
pointer). Because of the runtime support, as soon as a pointer becomes
invalid, all aliases of it become invalid as well. (All reachable
aliases, that is – but if an alias isn't reachable, it can't be
dereferenced anyway.) Add both of these together, and you get memory
that can be freed and reused without any risk of being accessed through
a stale pointer; the worst it can do is leak.

And no, this isn't garbage collection, because (as stated already) we
don't care about garbage and we don't collect anything. Yes, the runtime
support looks a bit like the mark phase of a mark-and-sweep garbage
collector, but even it has a different job: not marking everything that
is reachable, but rather invalidating all aliases of a given pointer.

And finally, yes, I realize how little this proves. Long live loopholes.

    16:19:38 <Gregor> We implement this without a GC by stuffing most of a
                      GC into the free function, thereby making it just as
                      slow as a GC'd language with none of the advantages!
    16:25:29 <Gregor> So yes, although you have managed to fit my
                      requirements, I am wildly underwhelmed :P

Reference Implementation
------------------------

Cat's Eye Technologies provides a cockamamie reference implementation of
Eightebed called `8ebed2c.py`. Written in Python 2.6, it compiles
Eightebed code to C, and for convenience will optionally compile that C
with the C compiler of your choice and run the resulting executable.

`8ebed2c.py` ships with a fairly extensive (for a language like this!)
suite of test programs, which can of course double as example sources;
these can be found in the `eightebed.tests` module.

For an appreciation of just how cockamamie `8ebed2c.py` is, run
`8ebed2c.py --help` and read through the command-line options it
provides.

Legal Issues
------------

The name Eightebed started life as a typo for the word "enlightened"
made on an iPhone by a mysterious individual known only as Alise. (Well,
perhaps not *only*.) Alise has aggressively asserted her intellectual
property rights by copyrighting [*sic*] the name Eightebed. Cat's Eye
Technologies has pursued permission to use the name for this language,
only to be told that the procedure for obtaining such permission
"involves five yaks, a Golden toad that hasn't eaten for five days, five
boxes of antique confetti (not stripped of uranium), dye number 90
(blood green), a very confused weasel, and three pieces of A4.15 paper."

Cat's Eye Technologies' legal-and-yak-husbandry team is currently
investigating the feasibility of this arrangement, and as of this
writing, official permission is still pending. If complications persist,
another, less contentious name (such as "Microsoft Windows 7") may need
to be chosen for this language.

    17:52:08 <alise> cpressey: I request that all harm is done to animals
                     in the making of this production.

Future Work
-----------

*In which we reveal the outline of a grand plan for a blockbuster sequel
to Eightebed which will never materialize*

- To be titled _Eightebed: Ascension_ or _Eightebed: Generations_. At
  least, title should have one of those bad-ass colons in it. Possibly
  _Eightebed: Eightebed_.
- To support functions, analysis of arbitrary expressions as the
  condition in an `if valid`, pointers to unnamed types, structures
  which contain other structures, and all that other boring stuff that
  we just said doesn't matter.
- To have a literate specification written in SUPER ITALIAN, thus
  giving all programs the power of UNMATCHED PROPHETIC SNEEZING.
- To be co-authored with Frank Zappa (note: turns out Mr. Zappa is
  dead. Maybe Tipper Gore instead? Yes, that should work.)
- ~~To include a garbage collector.~~
- Puppets???

Happy leaking!
Chris Pressey
September 1, 2010
Evanston, IL