This document is not Robin-specific. It's in "Specs on Spec" now.
Cat's Eye Technologies
11 years ago
0 | Practical Matters | |
1 | ================= | |
2 | ||
3 | This document is a collection of notes I've made over the years about the | |
4 | practical matters of production programming languages -- usually stemming | |
5 | from being irked by some existing programming language's lack of adequate | |
6 | (in my opinion) support for them. As such, these thoughts may be overblown | |
7 | and sophistry-laden, and not even things I necessarily want to see in Robin. | |
8 | But it is nice to have a central place to put them. | |
9 | ||
10 | Fundamental Abstractions | |
11 | ------------------------ | |
12 | ||
13 | The following facilities should be either built into the language, or part | |
14 | of the standard (highly standardized) libraries: | |
15 | ||
16 | * Tracing. Ideally, the programmer should be able to easily browse all the | |
17 | relevant reduction steps, and the relevant data being manipulated therein, | |
18 | in the part of the program's execution that interests them. In addition, | |
19 | this should be something that can be enabled without polluting the source | |
20 | code (overmuch). | |
21 | ||
22 | This could be done, and fairly well, with techniques from aspect-oriented | |
23 | programming. The rules to describe what to trace (or to highlight in a | |
24 | full trace) could be specified in what amounts to a configuration file, | |
25 | and thus be an implementation issue rather than a language issue. | |
26 | ||
27 | Unfortunately, this ideal is hard to achieve, so the system should also | |
28 | support... | |
29 | ||
30 | * Logging. Logging is basically an ad-hoc way to explicitly achieve | |
31 | selective tracing: the programmer knows what points in the program, and | |
32 | what data, are of interest to them, and outputs that data to the log at | |
33 | those points. | |
34 | ||
35 | Whether this is "debug logging" during development, or to support post- | |
36 | mortem analysis of issues in production, it amounts to the same thing: | |
37 | debugging, just on different time scales. | |
38 | ||
39 | The use of a "log level" is mostly just a way to filter the trace built | |
40 | up in the log files. This is not necessarily a bad idea, but it should | |
41 | probably not be linear; information should be logged based on the reason | |
42 | that it is being logged, probably in the form of some sort of "tag", and | |
43 | filterable on that (whether at the time the log is being recorded, or | |
44 | being read.) | |
45 | ||
46 | In Robin, logging should not count as a side-effect. | |
47 | ||
48 | The logging function itself should have some properties: | |
49 | ||
50 | - Should not have side-effects (for example from evaluating its arguments), | |
51 | so that if it is not executed (because we are not interested in that | |
52 | part of the execution trace) the behaviour of the program is not changed. | |
53 | ||
54 | - In fact, should ensure that its arguments have no side-effects, and | |
55 | ideally, be total, with no chance of hanging or crashing. | |
56 | ||
57 | - Should pretty-print the relevant values, include the type and other | |
58 | metadata of the values, and put clearly visible delimiters around the | |
59 | values so printed. | |
60 | ||
61 | - Should include the source filename and line number. | |
62 | ||
63 | - Should not be overridable (shadowed? not sure what I meant here.) | |
64 | ||
65 | * History. This is more relevant in a language with mutable values, but | |
66 | as part of tracing, it is useful to know the history of mutations of a | |
67 | value. With immutable values, it would be useful to be able to view | |
68 | all the reductions which fed into the computation of the value at a | |
69 | point. Either way, however, this is expensive, so should be specified | |
70 | selectively. Again, an external, aspect-like configuration language | |
71 | for specifying which values to watch makes this an implementation issue. | |
72 | ||
73 | * Command-line option parsing. This should not rely on the Unix or DOS | |
74 | idea of a command line, and it should be unified with parameter passing | |
75 | in the language itself; calling an executable built in the language with | |
76 | arguments `a b c` should be no different from calling a function from | |
77 | within the language with the arguments `a b c` (probably as string values.) | |
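
As a rough sketch of the last bullet above (command-line arguments unified
with parameter passing), here is how it might look in Python, which is used
purely for illustration. The names `greet` and `run` are hypothetical; the
point is only that the launcher hands the command-line words to the entry
function exactly as a caller inside the language would.

```python
import sys

def greet(greeting, *names):
    """An ordinary function; every argument is a string."""
    return f"{greeting}, {' and '.join(names)}!"

def run(entry, argv):
    """Hypothetical launcher: call `entry` with the command-line words as-is."""
    return entry(*argv)

if __name__ == "__main__":
    # `prog a b c` then amounts to the in-language call greet("a", "b", "c").
    if len(sys.argv) > 1:
        print(run(greet, sys.argv[1:]))
```

Nothing here parses flags or knows about Unix conventions; that would be a
layer on top, if it were wanted at all.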
78 | ||
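The tag-based (rather than linear) filtering suggested under Logging above
might look something like the following Python sketch. `TagLog` and its
methods are hypothetical names, not an existing library. Messages are passed
as zero-argument callables so that nothing is evaluated when a record is
filtered out; Python cannot enforce that the callable is free of side-effects,
so that part remains an assumption.

```python
class TagLog:
    """Hypothetical logger that filters on tags instead of a linear level."""

    def __init__(self, enabled_tags):
        self.enabled = set(enabled_tags)
        self.records = []

    def log(self, tags, make_message):
        # `make_message` is only called if at least one tag is enabled,
        # so a filtered-out record costs nothing and changes nothing.
        if self.enabled.isdisjoint(tags):
            return
        self.records.append((sorted(tags), make_message()))

    def select(self, tag):
        """Filter a second time, when the log is being read."""
        return [msg for tags, msg in self.records if tag in tags]


log = TagLog(enabled_tags={"billing", "security"})
log.log({"billing"}, lambda: "invoice 17 totalled")
log.log({"cache"}, lambda: 1 / 0)  # never evaluated: the tag is disabled
log.log({"security", "billing"}, lambda: "refund approved by admin")
```

Here `log.select("security")` returns only the refund message, while
`log.select("billing")` returns both recorded messages.
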
79 | Reflection | |
80 | ---------- | |
81 | ||
82 | * First-class tracebacks. When a program, for example, encounters an error | |
83 | parsing an external file such as a configuration file, it should be able to | |
84 | report the position in that file that caused the error as part of the | |
85 | traceback, for consistency. Java has some limited facilities for this, and | |
86 | some Python libraries do this (Jinja2? werkzeug?) using frame hacks, but | |
87 | a less clumsy solution would be nice. | |
88 | ||
89 | Tracebacks are *not* a special case of logging, or an artefact of throwing | |
90 | exceptions. Since the traceback is basically a formatted version of the | |
91 | current continuation, this suggests the two facilities should be unified, | |
92 | perhaps not totally, but to a high degree. | |
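
For comparison, here is one existing mechanism that gets part of the way
there, sketched in Python. The `key = value` file format and the
`parse_config` function are made up for illustration. Python's `SyntaxError`
accepts a `(filename, lineno, offset, text)` tuple, and the standard
`traceback` module prints it like a source location. This covers only the
innermost position in the external file; it is not the unified
traceback/continuation facility described above.

```python
import traceback

def parse_config(text, filename):
    """Parse `key = value` lines (a hypothetical format)."""
    settings = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        key, sep, value = line.partition("=")
        if not sep:
            # Attach the position in the *external* file to the exception.
            raise SyntaxError("expected 'key = value'",
                              (filename, lineno, 1, line))
        settings[key.strip()] = value.strip()
    return settings

try:
    parse_config("name = robin\nthis line is wrong\n", "app.conf")
except SyntaxError as err:
    report = "".join(traceback.format_exception_only(type(err), err))
    print(report)
```

The printed report names `app.conf`, line 2, in the same style as a frame in
the program's own source.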
93 | ||
94 | Abstractions, not Wrappers | |
95 | -------------------------- | |
96 | ||
97 | The basic principle here is that the existing APIs of most libraries are | |
98 | (let's be polite) less than ideal, especially when they were designed for | |
99 | some other language (such as C), and instead of blindly wrapping them in a | |
100 | new language, the designer should at least *try* to make something nicer. | |
101 | ||
102 | The abstractions should also recognize that modern computer systems are | |
103 | generally not resource-starved (or at least that truly high-level | |
104 | programming languages should not treat them that way.) | |
105 | ||
106 | This applies to very basic facilities as well as what are usually thought | |
107 | of as external libraries. Specifically, | |
108 | ||
109 | * Date and time: We can do better than simply copycatting interfaces like | |
110 | `strftime`. All time data should be stored consistently, in GMT, always | |
111 | with a time zone. | |
112 | ||
113 | * String formatting: We can do better than simply copycatting interfaces | |
114 | like `printf`. We can use visual formatting strings, where fixed-size | |
115 | slots appear as fixed-sized placeholders (of the same size) in the | |
116 | formatting string. (See also the scathing prog21 criticism of the | |
117 | vertical tab character.) | |
118 | ||
119 | * Line-oriented communication: We can look at line-oriented communication | |
120 | more generally, as a form of record-oriented communication where the | |
121 | "delimiter set" for each record is {LF, CR, CRLF}. | |
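
To illustrate the date and time bullet above, here is a small Python sketch
in which every stored time carries a zone and is normalized to one reference
zone. `to_stored` is a hypothetical name, and UTC stands in for GMT here.

```python
from datetime import datetime, timedelta, timezone

def to_stored(moment):
    """Normalize an aware datetime to UTC; refuse values with no zone."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("time value has no time zone")
    return moment.astimezone(timezone.utc)

# 09:30 in a zone six hours behind UTC...
local = datetime(2012, 3, 4, 9, 30, tzinfo=timezone(timedelta(hours=-6)))
# ...is stored as 15:30 UTC, with the zone still attached.
stored = to_stored(local)
```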
122 | ||
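The visual formatting strings mentioned above could work roughly as in this
Python sketch. The syntax is invented for the example: each run of `#` is a
slot, and the value placed in it is padded or cut to exactly the width of the
run, so the template has the same shape as the output.

```python
import re

def visual_format(template, *values):
    """Fill each run of '#' with the next value, at the run's exact width."""
    remaining = iter(values)

    def fill(match):
        width = len(match.group())
        return str(next(remaining))[:width].ljust(width)

    return re.sub(r"#+", fill, template)

template = "Item: ########  Qty: ###"
line = visual_format(template, "widget", 42)
```

The result always has the same length as the template, which is the property
that `printf`-style directives make hard to see at a glance.
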
123 | The programmer who really wants atavistic interfaces like those mentioned | |
124 | above can always implement them as "compatibility modules" if they wish. | |
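
The record-oriented view of line-oriented communication described above can
be sketched in Python as follows. `split_records` is a hypothetical helper;
lines are simply the case where the delimiter set is {LF, CR, CRLF}.

```python
import re

def split_records(data, delimiters):
    """Split `data` into records on any delimiter in the given set."""
    # Try longer delimiters first so CRLF is not read as CR followed by LF.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, data)

LINE_DELIMITERS = {"\n", "\r", "\r\n"}
lines = split_records("one\r\ntwo\rthree\nfour", LINE_DELIMITERS)
fields = split_records("a;b;c", {";"})
```

The same function handles both cases; only the delimiter set differs.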
125 | ||
126 | Separation from the Implementation | |
127 | ---------------------------------- | |
128 | ||
129 | This is just a repeat of the above section in slightly different terms. | |
130 | ||
131 | A language should avoid tying any language construct (e.g. imports, | |
132 | include files) to the file system or the operating system. Instead, | |
133 | have mappings between e.g. module names and where they live in the file | |
134 | system, and between our model of a running computer and a real OS. | |
135 | These mappings could be specified in configuration files which are | |
136 | in the domain of the implementation and outside the domain of the | |
137 | language, i.e. they never appear in programs. | |
138 | ||
139 | Standard modules supplied with the language should expose *models* of | |
140 | commonplace artefacts out in the world, for example operating systems. | |
141 | The models are similar to the artefacts, in order that the burden of | |
142 | implementing an interface from the model to any given artefact is not | |
143 | too great. However, the models are *not* the artefacts. Programs | |
144 | should be written to the model, not to the artefact. | |
145 | ||
146 | People who construct bindings to the language should be encouraged | |
147 | (only because they can't effectively be required) to create models | |
148 | more abstract than the libraries that they are binding. | |
149 | ||
150 | Insofar as possible, we can have a compiler optimize things so that they | |
151 | match the underlying architecture. The language should allow and even | |
152 | encourage definitions in the most general sense; special cases are to be | |
153 | detected and optimized when they occur, instead of instituting those | |
154 | special cases into the language itself. | |
155 | ||
156 | Another aspect of this point of philosophy is that it should be possible | |
157 | to specify and change the performance characteristics of the program | |
158 | (but ideally not its behaviour) from outside the program, using | |
159 | configuration files. | |
160 | ||
161 | This counts as a practical matter because maintaining code which is | |
162 | cluttered with implementation-specific artefacts is burdensome. | |
163 | ||
164 | Serialization | |
165 | ------------- | |
166 | ||
167 | (This section needs to be rewritten) | |
168 | ||
169 | - All primitive values must be serializable | |
170 | - All primitive values must be round-trippable | |
171 | - All primitive values must thus have an order to them (like Ruby 1.9's | |
172 | hashes) because in this world of representations, orderless things don't | |
173 | really exist | |
174 | - When building user-defined values from primitive values it must be | |
175 | easy to retain these serialization properties in the composite value | |
176 | - This is actually fairly agnostic of the particular serialization format | |
177 | (yaml, xml, binary, etc) | |
178 | - S-expressions are trivially serializable, except for functions | |
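
A minimal Python sketch of the round-trip and ordering properties listed
above, using JSON only as a stand-in format (the points above are meant to be
format-agnostic). `serialize` and `deserialize` are hypothetical names. Keys
are emitted in a fixed order so that equal values always produce identical
text.

```python
import json

def serialize(value):
    """Serialize with a fixed key order so equal values give equal text."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))

def deserialize(text):
    """Inverse of `serialize` for values built from JSON primitives."""
    return json.loads(text)

point = {"y": 2, "x": 1, "tags": ["a", "b"], "label": None}
text = serialize(point)
restored = deserialize(text)
```

As with S-expressions, this breaks down for functions, and in Python also for
values such as tuples, which come back as lists.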
179 | ||
180 | Formatting | |
181 | ---------- | |
182 | ||
183 | Closely related to serialization. | |
184 | ||
185 | Many languages support a "standard" operation to convert an arbitrary value to | |
186 | a string. Some even have two (e.g. Python's `str` and `repr`). | |
187 | ||
188 | But in reality, there are any number of ways to convert a value to a string. | |
189 | Why should the string representation of 16 necessarily be `"16"` -- why not | |
190 | `"0x10"` or `"XVI"`? `"16"` is fine, but it should be explicitly noted to be | |
191 | the default for the reason that it's the most convenient for the audience of | |
192 | humans who use the decimal Arabic notation when dealing with numbers. | |
193 | ||
194 | How can we support both a reasonable (and possibly configurable) default | |
195 | formatting, as well as any number of other ways to format values which would | |
196 | be more appropriate in different contexts? | |
197 | ||
198 | Can we pass a "style" argument to the string-conversion function? | |
199 | ||
200 | Should we establish a "design pattern" for writing formatting functions, and | |
201 | provide support for implementing such patterns? | |
202 | ||
203 | (Also, `format` is probably a better name for this function than `str`.) | |
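
One possible answer to the "style" question, sketched in Python. The function
is called `format_value` here only to avoid shadowing Python's built-in
`format`; the `STYLES` table and `to_roman` helper are likewise hypothetical.
Decimal is the explicit default, and other styles are looked up by name.

```python
def to_roman(n):
    """Roman numeral for 1 <= n <= 3999."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

# Each style is just a function from a value to a string.
STYLES = {"decimal": str, "hex": hex, "roman": to_roman}

def format_value(value, style="decimal"):
    """Convert `value` to a string in the named style."""
    return STYLES[style](value)
```

Because a style is only a named function, adding a new one does not touch
`format_value` itself, which is roughly the "design pattern" idea above.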
204 | ||
205 | Multiple Environments | |
206 | --------------------- | |
207 | ||
208 | (This section needs to be rewritten) | |
209 | ||
210 | - Lots of software runs in multiple environments - "development", "qa", | |
211 | "production" | |
212 | - Inherently support that idea | |
213 | ||
214 | Assertions | |
215 | ---------- | |
216 | ||
217 | (This section needs to be rewritten) | |
218 | ||
219 | - Software engineering is more about defining invariants than writing code. | |
220 | - An "assert" command which produces detailed errors in development, but only | |
221 | logs warnings in production environments | |
222 | - Very lightweight so that programmers use it without thinking | |
223 | (Python's `self.assertEqual()` is *not* lightweight) | |
224 | (Erlang's `A = {foo,B}` IS lightweight) | |
225 | - So a conditional, by itself, is an assertion. (?) | |
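
A sketch, in Python, of an assertion whose consequences depend on the
environment, as described above. The `check` function and the environment
names are assumptions made for the example; how the current environment
would actually be determined is left open.

```python
import logging

logger = logging.getLogger("invariants")

def check(condition, message, environment="development"):
    """Raise on a violated invariant in development; elsewhere, only warn."""
    if condition:
        return True
    if environment == "development":
        raise AssertionError(message)
    logger.warning("invariant violated: %s", message)
    return False
```

In use, `check(total >= 0, "total is negative")` stops a development run
immediately, while the same call with `environment="production"` logs a
warning and lets the program continue.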
226 | ||
227 | Interfaces | |
228 | ---------- | |
229 | ||
230 | (This section needs to be rewritten) | |
231 | ||
232 | One way or another, it should be possible to discover (programmatically, | |
233 | through reflection of some sort) the set of operations that a value supports -- | |
234 | its interface. Each operation has a name and a signature of some sort. | |
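
As an illustration of discovering an interface through reflection, here is
how it can be approximated in Python with the standard `inspect` module.
`interface_of` and the `Stack` class are hypothetical; treating a leading
underscore as "private" is a Python convention, not something proposed here.

```python
import inspect

def interface_of(value):
    """Map each public operation of `value` to its signature, as text."""
    operations = {}
    for name, member in inspect.getmembers(value, callable):
        if name.startswith("_"):
            continue
        try:
            operations[name] = str(inspect.signature(member))
        except (TypeError, ValueError):
            operations[name] = "(...)"  # signature not introspectable
    return operations

class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

ops = interface_of(Stack())
```

For the `Stack` above, `ops` contains exactly two operations, `push` taking
`(item)` and `pop` taking `()`.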
235 | ||
236 | Collections are interfaces. | |
237 | ||
238 | Some parts of an interface might be "private". This -- information hiding -- | |
239 | is obviously a somewhat complex topic. The obvious bit is that information | |
240 | hiding is useful to prevent unintended changes to program state, but it also | |
241 | hinders debugging and testing. | |
242 | ||
243 | Usability | |
244 | --------- | |
245 | ||
246 | Memorization is not a good thing to make programmers do. This can be | |
247 | addressed by either copying things from an existing language that the | |
248 | programmer base can be expected to already have memorized, or by providing | |
249 | a more orthogonal set of things which maps to the culture which programmers, | |
250 | as people, already live in. (For example, few people in the Western world | |
251 | do not know that `&` means "and".) | |
252 | ||
253 | Non-alphabetic symbols should, ideally, have the same meaning regardless of | |
254 | the context they're used in -- in other words, the language should avoid | |
255 | using the same symbol for different purposes in different contexts. | |
256 | ||
257 | (Lots of languages are lacking here. In C, `*` is both multiplication and | |
258 | dereferencing. In Python, `.` is both object attribute access and package | |
259 | hierarchy -- although packages are, at least, kind of like objects. In Lua, | |
260 | `=` is both assignment and key-value association.) | |
261 | ||
262 | Programming Languages vs. Operating Systems | |
263 | ------------------------------------------- | |
264 | ||
265 | (this section needs to be cleaned up -- not sure where to put it, and it | |
266 | arguably doesn't belong here) | |
267 | ||
268 | What you see before you in this distribution can be described as a | |
269 | programming language, but many of the ideas took root while thinking about | |
270 | operating systems. | |
271 | ||
272 | What's the difference between a programming language and an operating system? | |
273 | ||
274 | Well, maybe less than you think. | |
275 | ||
276 | Programming languages do need to define the environment in which they can | |
277 | express programs. Sometimes this is a specific OS (like early C on Unix) -- | |
278 | or they claim to be "portable", but then they're really just defining an | |
279 | abstraction against all the possible OS'es they think they'll run on. Often | |
280 | this abstraction is clumsy, but some languages put a lot of thought into it, | |
281 | like Smalltalk. | |
282 | ||
283 | Operating systems, on the other hand, don't tell you what programming | |
284 | language to use -- or do they? A modern OS insists everything is, at some | |
285 | point, in native machine language, and a running instance will almost always | |
286 | be limited to a single machine language of a single architecture. Somewhat | |
287 | more alternative OS'es define a virtual machine language to abstract away | |
288 | from the concrete machine language. Usually this virtual machine language | |
289 | looks like a machine language, but sometimes it's a tad more high-level, | |
290 | like Lisp. Any way you slice it, the OS does sanction a particular, albeit | |
291 | usually low-level, programming language. | |
292 | ||
293 | Where PL's and OS's seem to meet more-or-less neatly is in the idea of the | |
294 | VM, so let's examine that. | |
295 | ||
296 | Most modern virtual machines are designed to implement high-level languages | |
297 | in a modern operating system environment. The JVM was specifically designed | |
298 | for running Java, and while .NET was ostensibly designed for multiple | |
299 | languages, the bytecode is pretty closely tuned to C\#. | |
300 | ||
301 | What these VMs were not designed to do, but what a VM "should really" be | |
302 | designed to do (if it, at least, wants to live up to the name "virtual | |
303 | machine") is to abstract the *hardware* and provide virtualizations | |
304 | (abstractions) of the available devices. | |
305 | ||
306 | An environment contains zero or more devices. A device exposes zero | |
307 | or more services. Each service conforms to one or more interfaces. | |
308 | Each service may additionally require one or more services be available | |
309 | (by interface). | |
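
The environment/device/service/interface relationships just described can be
written down as a small data model. This Python sketch is one possible
reading, with all class, device, and interface names invented for the
example. It also shows the check the last sentence implies: finding services
whose required interfaces nobody in the environment provides.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    interfaces: frozenset              # interfaces this service conforms to
    requires: frozenset = frozenset()  # interfaces it needs from others

@dataclass
class Device:
    name: str
    services: list = field(default_factory=list)

@dataclass
class Environment:
    devices: list = field(default_factory=list)

    def provided(self):
        """All interfaces offered by some service in the environment."""
        return {i for d in self.devices for s in d.services
                for i in s.interfaces}

    def unmet(self):
        """Map each service name to required interfaces that nobody offers."""
        offered = self.provided()
        return {s.name: s.requires - offered
                for d in self.devices for s in d.services
                if s.requires - offered}

screen = Device("screen", [
    Service("text-grid", frozenset({"CharacterDisplay"})),
])
terminal = Device("terminal", [
    Service("line-editor", frozenset({"Editor"}),
            requires=frozenset({"CharacterDisplay", "Keyboard"})),
])
env = Environment([screen, terminal])
```

Here `env.unmet()` reports that `line-editor` still needs a `Keyboard`,
since no device in this environment offers one.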
310 | ||
311 | At one point I was calling this place where programming language and | |
312 | operating system meet a "CE" (Computational Environment) because | |
313 | "operating system" is far too generic-sounding and "programming language" | |
314 | doesn't address the important environmental aspect here. I am unsure whether | |
315 | to keep using the term CE, though, as it could just add to the confusion. | |
316 | ||
317 | How do most programming languages deal with the abstraction of available | |
318 | (or virtual) devices? Terribly, I would say. Take, as a simple example, | |
319 | an addressable character screen device. Someone writes a library, in C, | |
320 | to access it (e.g. `ncurses`), providing an API comprising C functions | |
321 | and C structs. Someone then writes a binding or a wrapper (e.g. using | |
322 | `swig`) or otherwise foreign-function interfaces it to the language, usually | |
323 | exposing the exact same C-level API naively adapted to the programming | |
324 | language. Then you, the programmer in this language, wrestle with working | |
325 | with the device almost exactly as a C programmer would, initializing and | |
326 | releasing it as a C programmer would, with limitations on how you may or | |
327 | may not use it from multithreaded code like a C programmer would (which | |
328 | might be brutally different from how the runtime for your programming | |
329 | language implementation assumes that its world works.) All this, with the | |
330 | added hassle of having to make sure you have all these bindings for the | |
331 | device for your chosen implementation of your language built and installed | |
332 | correctly. |