git @ Cat's Eye Technologies Robin / 78a0401
This document is not Robin-specific. It's in "Specs on Spec" now. Cat's Eye Technologies 11 years ago
1 changed file(s) with 0 addition(s) and 333 deletion(s).
doc/Practical_Matters.markdown
0 Practical Matters
1 =================
2
3 This document is a collection of notes I've made over the years about the
4 practical matters of production programming languages -- usually stemming
5 from being irked by some existing programming language's lack of adequate
6 (in my opinion) support for them. As such, these thoughts may be overblown
7 and sophistry-laden, and not even things I necessarily want to see in Robin.
8 But it is nice to have a central place to put them.
9
10 Fundamental Abstractions
11 ------------------------
12
13 The following facilities should be either built-in to the language, or part
14 of the standard (highly standardized) libraries:
15
16 * Tracing. Ideally, the programmer should be able to easily browse all the
17 relevant reduction steps, and the relevant data being manipulated therein,
18 in the part of the program's execution that interests them. In addition,
19 this should be something that can be enabled without polluting the source
20 code (overmuch).
21
22 This could be done, and fairly well, with techniques from aspect-oriented
23 programming. The rules to describe what to trace (or to highlight in a
24 full trace) could be specified in what amounts to a configuration file,
25 and thus be an implementation issue rather than a language issue.
26
27 Unfortunately, this ideal is hard to achieve, so the system should also
28 support...
29
30 * Logging. Logging is basically an ad-hoc way to explicitly achieve
31 selective tracing: the programmer knows what points in the program, and
32 what data, are of interest to them, and outputs that data to the log at
33 those points.
34
35 Whether this is "debug logging" during development, or to support post-
36 mortem analysis of issues in production, it amounts to the same thing:
37 debugging, just on different time scales.
38
39 The use of a "log level" is mostly just a way to filter the trace built
40 up in the log files. This is not necessarily a bad idea, but it should
41 probably not be linear; information should be logged based on the reason
42 that it is being logged, probably in the form of some sort of "tag", and
43 filterable on that (whether at the time the log is being recorded, or
44 being read.)
45
46 In Robin, logging should not count as a side-effect.
47
48 The logging function itself should have some properties:
49
50 - Should not have side-effects (for example from evaluating its arguments),
51 so that if it is not executed (because we are not interested in that
52 part of the execution trace) the behaviour of the program is not changed.
53
54 - In fact, should ensure that its arguments have no side-effects, and
55 ideally, be total, with no chance of hanging or crashing.
56
57 - Should pretty-print the relevant values, include the type and other
58 metadata of the values, and put clearly visible delimiters around the
59 values so printed.
60
61 - Should include the source filename and line number.
62
63 - Should not be overridable (shadowed? not sure what I meant here.)
64
65 * History. This is more relevant in a language with mutable values, but
66 as part of tracing, it is useful to know the history of mutations of a
67 value. With immutable values, it would be useful to be able to view
68 all the reductions which fed into the computation of the value at a
69 point. Either way, however, this is expensive, so should be specified
70 selectively. Again, an external, aspect-like configuration language
71 for specifying which values to watch makes this an implementation issue.
72
73 * Command-line option parsing. This should not rely on the Unix or DOS
74 idea of a command line, and it should be unified with parameter passing
75 in the language itself; calling an executable built in the language with
76 arguments `a b c` should be no different from calling a function from
77 within the language with the arguments `a b c` (probably as string values.)
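
As a rough sketch of the tag-based logging described above (in Python, purely for illustration): `TagLog` and `describe` are hypothetical names, not part of Robin or any library. The message is passed as a thunk so that a filtered-out call evaluates nothing. Python cannot enforce that the thunk itself is free of side-effects, so that part remains a convention here.

```python
import io
import sys
from typing import Callable, Iterable, Set, TextIO


class TagLog:
    """Hypothetical tag-filtered log; not an API from Robin or any library."""

    def __init__(self, enabled: Iterable[str], out: TextIO = sys.stderr) -> None:
        self.enabled: Set[str] = set(enabled)
        self.out = out

    def log(self, tags: Iterable[str], make_message: Callable[[], str]) -> None:
        tag_set = set(tags)
        # The message is a thunk, so nothing is evaluated when no tag matches;
        # a filtered-out log call does not change what the program computes.
        if self.enabled.isdisjoint(tag_set):
            return
        tag_list = ",".join(sorted(tag_set))
        # Clearly visible delimiters around the logged value.
        self.out.write(f"[{tag_list}] <<{make_message()}>>\n")


def describe(value: object) -> str:
    # Include the type alongside the printed value.
    return f"{type(value).__name__}: {value!r}"


buffer = io.StringIO()
log = TagLog(enabled={"parser"}, out=buffer)
log.log({"parser", "config"}, lambda: describe(42))
log.log({"network"}, lambda: describe("never evaluated"))
```

Only the first call writes anything, because only it carries an enabled tag. Filtering at read time instead would mean writing every entry with its tags and selecting on them later.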
78
79 Reflection
80 ----------
81
82 * First-class tracebacks. When a program, for example, encounters an error
83 parsing an external file such as a configuration file, it should be able to
84 report the position in that file that caused the error as part of the
85 traceback, for consistency. Java has some limited facilities for this, and
86 some Python libraries do this (Jinja2? werkzeug?) using frame hacks, but
87 a less clumsy solution would be nice.
88
89 Tracebacks are *not* a special case of logging, or an artefact of throwing
90 exceptions. Since the traceback is basically a formatted version of the
91 current continuation, this suggests the two facilities should be unified,
92 perhaps not totally, but to a high degree.
93
94 Abstractions, not Wrappers
95 --------------------------
96
97 The basic principle here is that the existing APIs of most libraries are
98 (let's be polite) less than ideal, especially when they were designed for
99 some other language (such as C), and instead of blindly wrapping them in a
100 new language, the designer should at least *try* to make something nicer.
101
102 The abstractions should also recognize that modern computer systems are
103 generally not resource-starved (or at least that truly high-level
104 programming languages should not treat them that way.)
105
106 This applies to very basic facilities as well as what are usually thought
107 of as external libraries. Specifically,
108
109 * Date and time: We can do better than simply copycatting interfaces like
110 `strftime`. All time data should be stored consistently, in GMT, always
111 with a time zone.
112
113 * String formatting: We can do better than simply copycatting interfaces
114 like `printf`. We can use visual formatting strings, where fixed-size
115 slots appear as fixed-sized placeholders (of the same size) in the
116 formatting string. (See also the scathing prog21 criticism of the
117 vertical tab character.)
118
119 * Line-oriented communication: We can look at line-oriented communication
120 more generally, as a form of record-oriented communication where the
121 "delimiter set" for each record is {LF, CR, CRLF}.
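
The record-oriented view of lines can be sketched as follows. This is an illustration in Python, with `split_records` and `DELIMITERS` being names invented for the example; the only idea taken from the text is that a "line" is a record whose delimiter set happens to be {LF, CR, CRLF}.

```python
import re
from typing import Iterable, List

# The delimiter set for line-oriented records.
DELIMITERS = ("\r\n", "\n", "\r")


def split_records(text: str, delimiters: Iterable[str] = DELIMITERS) -> List[str]:
    """Split text into records on any delimiter in the given set."""
    # Try longer delimiters first, so CRLF is not read as CR followed by LF.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    records = re.split(pattern, text)
    # A trailing delimiter ends the last record; it does not start an empty one.
    if records and records[-1] == "":
        records.pop()
    return records


lines = split_records("alpha\r\nbeta\ngamma\rdelta\n")
fields = split_records("a;b;c", delimiters=(";",))
```

The same function handles mixed line endings and, with a different delimiter set, any other record format.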
122
123 The programmer who really wants atavistic interfaces like those mentioned
124 above can always implement them as "compatibility modules" if they wish.
125
126 Separation from the Implementation
127 ----------------------------------
128
129 This is just a repeat of the above section in slightly different terms.
130
131 A language should avoid tying any language construct (e.g. imports,
132 include files) to the file system or the operating system. Instead,
133 have mappings between e.g. module names and where they live in the file
134 system, and between our model of a running computer and a real OS.
135 These mappings could be specified in configuration files which are
136 in the domain of the implementation and outside the domain of the
137 language, i.e. they never appear in programs.
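
A minimal sketch of such a mapping, in Python for illustration only. `MODULE_MAP` stands in for a configuration file owned by the implementation, and the module names and paths in it are made up; `locate` is a hypothetical resolver that the implementation, not the program, would call.

```python
from pathlib import PurePosixPath
from typing import Dict

# Stands in for a configuration file owned by the implementation.
# The module names and paths are invented for this example.
MODULE_MAP: Dict[str, str] = {
    "list": "/usr/share/lang/stdlib/list.src",
    "net.http": "/opt/vendor/http/main.src",
}


def locate(module_name: str, mapping: Dict[str, str] = MODULE_MAP) -> PurePosixPath:
    """Resolve a module name to a location; programs only ever see the name."""
    try:
        return PurePosixPath(mapping[module_name])
    except KeyError:
        raise LookupError(f"no mapping configured for module {module_name!r}") from None


path = locate("net.http")
```

Moving a module to a different place on disk then means editing the mapping, and no program text changes.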
138
139 Standard modules supplied with the language should expose *models* of
140 commonplace artefacts out in the world, for example operating systems.
141 The models are similar to the artefacts, in order that the burden of
142 implementing an interface from the model to any given artefact is not
143 too great. However, the models are *not* the artefacts. Programs
144 should be written to the model, not to the artefact.
145
146 People who construct bindings to the language should be encouraged
147 (only because they can't effectively be required) to create models
148 more abstract than the libraries that they are binding.
149
150 Insofar as possible, we can have a compiler optimize things so that they
151 match the underlying architecture. The language should allow and even
152 encourage definitions in the most general sense; special cases are to be
153 detected and optimized when they occur, instead of instituting those
154 special cases into the language itself.
155
156 Another aspect of this point of philosophy is that it should be possible
157 to specify and change the performance characteristics of the program
158 (but ideally not its behaviour) from outside the program, using
159 configuration files.
160
161 This counts as a practical matter because maintaining code which is
162 cluttered with implementation-specific artefacts is burdensome.
163
164 Serialization
165 -------------
166
167 (This section needs to be rewritten)
168
169 - All primitive values must be serializable
170 - All primitive values must be round-trippable
171 - All primitive values must thus have an order to them (like Ruby 1.9's
172 hashes) because in this world of representations, orderless things don't
173 really exist
174 - When building user-defined values from primitive values it must be
175 easy to retain these serialization properties in the composite value
176 - This is actually fairly agnostic of the particular serialization format
177 (yaml, xml, binary, etc)
178 - S-expressions are trivially serializable, except for functions
179
180 Formatting
181 ----------
182
183 Closely related to serialization.
184
185 Many languages support a "standard" operation to convert an arbitrary value to
186 a string. Some even have two (e.g. Python's `str` and `repr`).
187
188 But in reality, there are any number of ways to convert a value to a string.
189 Why should the string representation of 16 necessarily be `"16"` -- why not
190 `"0x10"` or `"XVI"`? `"16"` is fine, but it should be explicitly noted to be
191 the default for the reason that it's the most convenient for the audience of
192 humans who use the decimal Arabic notation when dealing with numbers.
193
194 How can we support both a reasonable (and possibly configurable) default
195 formatting, as well as any number of other ways to format values which would
196 be more appropriate in different contexts?
197
198 Can we pass a "style" argument to the string-conversion function?
199
200 Should we establish a "design pattern" for writing formatting functions, and
201 provide support for implementing such patterns?
202
203 (Also, `format` is probably a better name for this function than `str`.)
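
One possible shape for the "style" argument, sketched in Python. `format_int`, `STYLES` and `to_roman` are hypothetical names; the point is only that decimal is an explicit default among several registered styles rather than the one true conversion.

```python
from typing import Callable, Dict


def to_roman(n: int) -> str:
    """Render n in Roman numerals (standard subtractive notation)."""
    if not 0 < n < 4000:
        raise ValueError("Roman numerals here cover 1..3999")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


# Registry of styles; new styles can be added without touching format_int.
STYLES: Dict[str, Callable[[int], str]] = {
    "decimal": str,
    "hex": hex,
    "roman": to_roman,
}


def format_int(n: int, style: str = "decimal") -> str:
    # "decimal" is the default only because it suits most human readers.
    return STYLES[style](n)


renderings = [format_int(16), format_int(16, "hex"), format_int(16, "roman")]
```

A registry like this is one candidate for the "design pattern": each formatter is an ordinary function, and the default is a configurable entry rather than something built into the conversion.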
204
205 Multiple Environments
206 ---------------------
207
208 (This section needs to be rewritten)
209
210 - Lots of software runs in multiple environments - "development", "qa",
211 "production"
212 - Inherently support that idea
213
214 Assertions
215 ----------
216
217 (This section needs to be rewritten)
218
219 - Software engineering is more about defining invariants than writing code.
220 - An "assert" command which produces detailed errors in development, but only
221 logs warnings in production environments
222 - Very lightweight so that programmers use it without thinking
223 (Python's `self.assertEqual()` is *not* lightweight)
224 (Erlang's `A = {foo,B}` IS lightweight)
225 - So a conditional, by itself, is an assertion. (?)
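
A sketch of an environment-sensitive assertion, in Python. `check` and `ENVIRONMENT` are hypothetical; in keeping with the rest of this document the environment would be set from outside the program, and it is a module-level constant here only to keep the example self-contained.

```python
import logging
from typing import Optional

ENVIRONMENT = "development"  # hypothetical setting; would come from configuration
logger = logging.getLogger("invariants")


def check(condition: bool, detail: str = "invariant violated",
          env: Optional[str] = None) -> bool:
    """Fail loudly in development; only log a warning in production."""
    env = ENVIRONMENT if env is None else env
    if condition:
        return True
    if env == "production":
        logger.warning("assertion failed: %s", detail)
        return False
    raise AssertionError(detail)


items = [3, 1, 2]
ok = check(len(items) == 3, "expected three items")
degraded = check(items == sorted(items), "items not sorted", env="production")
```

This is still heavier than a pattern match in the Erlang style, but a one-word call with an optional message is closer to something programmers would use without thinking.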
226
227 Interfaces
228 ----------
229
230 (This section needs to be rewritten)
231
232 One way or another, it should be possible to discover (programmatically,
233 through reflection of some sort) the set of operations that a value supports --
234 its interface. Each operation has a name and a signature of some sort.
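
For illustration, Python's standard `inspect` module already permits a crude version of this kind of discovery. `interface_of` and `Stack` are hypothetical names, and treating a leading underscore as "private" is a Python convention assumed for the sketch.

```python
import inspect
from typing import Dict


def interface_of(value: object) -> Dict[str, str]:
    """Map each public operation name on value to its signature, as text."""
    operations = {}
    for name, member in inspect.getmembers(value, callable):
        if name.startswith("_"):
            continue  # treat underscore-prefixed names as private
        try:
            operations[name] = str(inspect.signature(member))
        except (TypeError, ValueError):
            # Some built-in callables expose no introspectable signature.
            operations[name] = "(...)"
    return operations


class Stack:
    def __init__(self) -> None:
        self._items = []

    def push(self, item: object) -> None:
        self._items.append(item)

    def pop(self) -> object:
        return self._items.pop()


ops = interface_of(Stack())
```

Here `ops` has one entry per public operation, keyed by name, with the signature as its value.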
235
236 Collections are interfaces.
237
238 Some parts of an interface might be "private". This -- information hiding --
239 is obviously a somewhat complex topic. The obvious bit is that information
240 hiding is useful to prevent unintended changes to program state, but it also
241 hinders debugging and testing.
242
243 Usability
244 ---------
245
246 Memorization is not a good thing to make programmers do. This can be
247 addressed by either copying things from an existing language that the
248 programmer base can be expected to already have memorized, or by providing
249 a more orthogonal set of things which maps to the culture which programmers,
250 as people, already live in. (For example, few people in the Western world
251 do not know that `&` means "and".)
252
253 Non-alphabetic symbols should, ideally, have the same meaning regardless of
254 the context they're used in -- in other words, the language should avoid
255 using the same symbol for different purposes in different contexts.
256
257 (Lots of languages are lacking here. In C, `*` is both multiplication and
258 dereferencing. In Python, `.` is both object attribute access and package
259 hierarchy -- although packages are, at least, kind of like objects. In Lua,
260 `=` is both assignment and key-value association.)
261
262 Programming Languages vs. Operating Systems
263 -------------------------------------------
264
265 (this section needs to be cleaned up -- not sure where to put it, and it
266 arguably doesn't belong here)
267
268 What you see before you in this distribution can be described as a
269 programming language, but many of the ideas took root while thinking about
270 operating systems.
271
272 What's the difference between a programming language and an operating system?
273
274 Well, maybe less than you think.
275
276 Programming languages do need to define the environment in which they can
277 express programs. Sometimes this is a specific OS (like early C on Unix) --
278 or they claim to be "portable", but then they're really just defining an
279 abstraction against all the possible OS'es they think they'll run on. Often
280 this abstraction is clumsy, but some languages put a lot of thought into it,
281 like Smalltalk.
282
283 Operating systems, on the other hand, don't tell you what programming
284 language to use -- or do they? A modern OS insists everything is, at some
285 point, in native machine language, and a running instance will almost always
286 be limited to a single machine language of a single architecture. Somewhat
287 more alternative OS'es define a virtual machine language to abstract away
288 from the concrete machine language. Usually this virtual machine language
289 looks like a machine language, but sometimes it's a tad more high-level,
290 like Lisp. Any way you slice it, the OS does sanction a particular, albeit
291 usually low-level, programming language.
292
293 Where PL's and OS's seem to meet more-or-less neatly is in the idea of the
294 VM, so let's examine that.
295
296 Most modern virtual machines are designed to implement high-level languages
297 in a modern operating system environment. The JVM was specifically designed
298 for running Java, and while .NET was ostensibly designed for multiple
299 languages, the bytecode is pretty closely tuned to C\#.
300
301 What these VMs were not designed to do, but what a VM "should really" be
302 designed to do (if it, at least, wants to live up to the name "virtual
303 machine") is to abstract the *hardware* and provide virtualizations
304 (abstractions) of the available devices.
305
306 An environment contains zero or more devices. A device exposes zero
307 or more services. Each service conforms to one or more interfaces.
308 Each service may additionally require one or more services be available
309 (by interface).
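
The environment/device/service relationships above can be written down as a small data model. This is a sketch in Python under my own naming (`Environment`, `Device`, `Service`, `unmet`); the device and interface names in the usage are invented.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Service:
    name: str
    interfaces: FrozenSet[str]              # one or more interfaces it conforms to
    requires: FrozenSet[str] = frozenset()  # interfaces it needs from other services

    def __post_init__(self) -> None:
        if not self.interfaces:
            raise ValueError("a service must conform to at least one interface")


@dataclass
class Device:
    name: str
    services: List[Service] = field(default_factory=list)  # zero or more


@dataclass
class Environment:
    devices: List[Device] = field(default_factory=list)    # zero or more

    def provided(self) -> FrozenSet[str]:
        """All interfaces offered by any service on any device."""
        return frozenset(
            i for d in self.devices for s in d.services for i in s.interfaces
        )

    def unmet(self) -> List[str]:
        """Requirements that no service in this environment satisfies."""
        available = self.provided()
        return sorted(
            f"{s.name} needs {r}"
            for d in self.devices for s in d.services
            for r in s.requires if r not in available
        )


screen = Device("screen", [
    Service("text-grid", frozenset({"CharacterDisplay"}), frozenset({"Clock"})),
])
env = Environment([screen])
missing_before = env.unmet()
env.devices.append(Device("timer", [Service("ticks", frozenset({"Clock"}))]))
missing_after = env.unmet()
```

Requirements are matched by interface rather than by service name, so any device offering `Clock` satisfies the screen's dependency.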
310
311 At one point I was calling this place where programming language and
312 operating system meet a "CE" (Computational Environment) because
313 "operating system" is far too generic-sounding and "programming language"
314 doesn't address the important environmental aspect here. I am not sure
315 whether I would continue to use the term CE, as it could just add to the confusion.
316
317 How do most programming languages deal with the abstraction of available
318 (or virtual) devices? Terribly, I would say. Take, as a simple example,
319 an addressable character screen device. Someone writes a library, in C,
320 to access it (e.g. `ncurses`,) providing an API comprising C functions
321 and C structs. Someone then writes a binding or a wrapper (e.g. using
322 `swig`) or otherwise foreign-function interfaces it to the language, usually
323 exposing the exact same C-level API naively adapted to the programming
324 language. Then you, the programmer in this language, wrestle with working
325 with the device almost exactly as a C programmer would, initializing and
326 releasing it as a C programmer would, with limitations on how you may or
327 may not use it from multithreaded code like a C programmer would (which
328 might be brutally different from how the runtime for your programming
329 language implementation assumes that its world works.) All this, with the
330 added hassle of having to make sure you have all these bindings for the
331 device for your chosen implementation of your language built and installed
332 correctly.