git @ Cat's Eye Technologies Dieter / master README.md
master

Tree @master (Download .tar.gz)

README.md @masterview rendered · raw · history · blame

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
The Dieter Programming Language
===============================

_Entry_ [@ catseye.tc](https://catseye.tc/node/Dieter)
| _See also:_ [Castile](https://codeberg.org/catseye/Castile#castile)

Description
-----------

Dieter (that's Dieter as in the German masculine given name Dieter, not
dieter as in "one who diets") is a little experimental programming
language which conflates *type qualifiers* with *modules*. In this
article, we'll first describe these mechanisms. Then we'll show how
their interaction produces something that resembles object-oriented
programming.

### Type Qualifiers ###

A type qualifier is simply a keyword which may be placed before any type
expression, further specializing it. Type qualifiers should be familiar
to programmers of C, where *storage classes*, such as `const` and
`static`, are examples of type qualifiers. The cardinal rule of type
qualifiers in Dieter is this: **during assignment (or
parameter-passing), a variable (or parameter) whose type possesses some
type qualifier *only* accepts values of the same type that possess *at
least* that same qualifier.**

Some basic properties of type qualifiers follow (but first, for
concreteness, let's stipulate that Dieter has an assortment of primitive
types: `bool`, `int`, `rat`, and `string`. And `void` denoting the type
of the absence of any value.) The order that type qualifiers are given
in a type expression doesn't matter: if `beefy` and `gnarly` are type
qualifiers, then `beefy gnarly int` and `gnarly beefy int` are wholly
equivalent types. Neither does the number of occurrences matter:
`beefy gnarly beefy int` is equivalent to `beefy gnarly int`.
Algebraically speaking, type qualifiers are commmutative and idempotent.

You can't assign an `int` to a `beefy int` (or pass it to a procedure
expecting a `beefy int`,) but you can assign a `beefy int` to an `int`.
This might sound counter-intuitive initially, but after you think about
it a bit I'm sure you'll conclude it's appropriate: it's just like how
you can assign a `const int` to an `int` in C, but not an `int` to a
`const int`. `int` is more general, and will tolerate information being
lost in the copy, whereas `beefy int` is more specific, and will not
accept a value with insufficient information associated with it. Or, if
you prefer, while all `beefy int`s are `int`s, there are `int`s that
aren't `beefy` and thus can't be assigned to a variable which is
restricted to `beefy int`s.

### Type Operators ###

In order to make use of type qualifiers in user programs, Dieter
provides type operators, similar to C's type-casting facilities, to
manipulate type qualifiers.

In fact, Dieter has only one explicit type operator, called `bestow`. It
can be used in an expression to endow its type with further type
qualifiers. `bestow` takes a type qualifier q and a value of some type t
and returns a new value of type q·t. (You can think of q·t as a kind of
pseudo-product type that obeys the algebraic laws given in the previous
section rather than those of conventional products.)

The complementary operation — stripping q·t back to t — is not
explicitly denoted; it happens automatically as needed. (This is the
underlying rule that was in effect, when we stated that a `beefy int`
can be assigned to an `int`.)

Note that type operators in no way alter the *value* of the expression
they are used in, only its *type*.

### Type Variables ###

Dieter supports uniform polymorphic types. Type expressions can contain
type variables (which I'll denote with ♥ for no good reason) which can
unify with concrete types or other type variables during instantiation
of a language construct. In practice, this means that procedures can
have type variables in their parameter and return types; these variables
are replaced by types specified at the call site whenever a procedure
call is type-checked.

The scope of each type variable ranges over a single instance of a
single procedure. So `♥x` in the parameter list of `grunt` is the same
type as every occurrence of `♥x` anywhere in the definition of `grunt`
during that same invokation of `grunt`, but may be different from `♥x`
in other invokations of `grunt` (calls to `grunt` from different call
sites), and different from `♥x` appearing in other procedures entirely.

A complication for polymorphism in Dieter is that type variables range
not only over core type expressions, but *also* over type qualifiers.
That is, the types `beefy gnarly int` and `beefy ♥t` unify, and their
most general unifier is {`♥t``gnarly int`}. This has a few
implications for the unification algorithm; see [Appendix
A](#appendix_a) for a detailed discussion of these.

### Modules ###

Dieter is a modular language, which has its usual meaning — a Dieter
program consists of a set of modules, each of which exposes only its
interface, not its implementation, to all other modules. In Dieter,
however, there is a notable twist: every type qualifier is defined by
some module which bears the same name. The key idea of Dieter is this:
**a type operator that affects a given qualifier may *only* be used in
the module which defines that qualifier.** The reasoning for this is
similar to the argument against using casts in the C language: the goal
of typing is to specify constraints on the program's structure and to
automatically check that they are met. If programmers are allowed to
muck with types willy-nilly, it defeats the purpose of having these
constraints in the first place. So the power to `bestow` a particular
type qualifier is only given out to the module "in charge" of that type
qualifier. This is a form of encapsulation: the internal details of some
thing (in this case, a type modifier) can only be manipulated by the
implementation of that thing.

### Procedures ###

Each Dieter module consists of a set of procedure declarations. Any
procedure can be called from any procedure in any module; they are
comparable to `public static` methods in Java in this respect.
Procedures are polymorphic, as mentioned; each one's parameter and
return types can contain type variables. At each call site, the type
variables are unified with the types of the values being passed to them,
and resolved ultimately to concrete types.

A bit of administrivia that we should also mention is that Dieter
requires procedures to be declared, though not necessarily defined,
before they are called. Much like Pascal, it offers a `forward`
declaration construct for this purpose. It can also be used to declare
procedures intrinsic to the implementation, procedures external to some
set of modules, or completely ficticious procedures for the purpose of
demonstrating and testing the type system.

### Example ###

We're now ready to give a very simple example of a Dieter module.

    module beefy

      procedure beef_up(x: ♥t): beefy ♥t
      begin
        return (bestow beefy x)
      end

    end

The intent of this example is to get the general gist of the language
across, not to demonstrate good programming practice. The procedure
`beef_up` essentially behaves as `bestow`, except, being a procedure, it
may be called from any procedure, including those in other modules. So,
it breaks the abstraction we just presented; it implements something
akin to a "generic setter method," providing direct access to a data
member that is ordinarily private.

### Object-oriented Programming ###

We'll now try to show that this interaction between type qualifiers and
modules produces something resembling object-oriented programming by
adding three language features to Dieter: a declaration form and two new
types.

-   Module-level variables are variables that are instantiated on, well,
    the level of the module. (This is in contrast to local variables in
    procedures, which I haven't said anything about, but which I've
    nevertheless assumed exist in Dieter...) Module-level variables
    can't be accessed directly from outside the module; they're
    comparable to `private static` fields in Java.
-   `ref` is a built-in type. These aren't references *to* anything;
    each is just a unique value, much like values of `ref` type in the
    language Erlang. A program can obtain a new `ref` value by calling
    `new_ref()`; the value thus obtained is guaranteed to be different
    from all other `ref` values in the program.
-   `map` is a built-in type constructor, over range (key) and domain
    (value) types, that defines a data structure that acts like a
    dictionary. The usual square-bracket syntax `x[y]` is used to
    reference a location within a map.

We can now construct a class of objects like so:

    module person

      var name_map: map from person ref to string
      var age_map: map from person ref to int

      procedure person_new(name: string, age: int): person ref
        var p: person ref
      begin
        p := bestow person new_ref()
        name_map[p] := name
        age_map[p] := age
        return p
      end

      procedure person_get_name(p: person ref): string
      begin
        return name_map[p]
      end

      procedure person_attend_birthday_party(p: person ref): void
      begin
        age_map[p] := succ(age_map[p])
      end

    end

Because the type qualifier `person` is defined by the module `person`,
which only uses `bestow` in one place, we know exactly what kind of
expressions can result in a type qualified by `person`. More to the
point, we know that no other procedure in any other module can create a
`person`-qualified type without calling `person_new`.

### Mixins ###

If we loosen the constraints on `map`s, and say that the range (key)
type can be left unspecified and that values of all types can be used as
keys in any given `map`, then we have have a way to make type qualifiers
work like mixins:

    module tagged

      var tag_map : map to string

      procedure make_tagged(x: ♥t): tagged ♥t
      begin
        return (bestow tagged x)
      end

      procedure tag(x: tagged ♥t, y: string): void
      begin
        tag_map[x] := y
      end

      procedure get_tag(x: tagged ♥t): string
      begin
        return tag_map[x]
      end

    end

I think this demonstrates an underrated principle of OO, namely
*identity*. If I call `tag(make_tagged(4), "mine")`, do *all*
occurrences of 4 have that tag, or just the ephemeral instance of 4
passed to `tag`? If there can't be seperate instances of 4, that might
be a reason for OO languages to not treat it as an object. And if you
believe seperate instances of 4 are silly, that's an argument against
"everything is an object".

### Inheritance ###

Since module-level variables are private to each module, they give
Dieter something akin to traditional encapsulation. But what about
inheritance?

Well, we can get something like inheritance if we give Dieter another
feature. We haven't said much about scope of names so far; in
particular, we haven't said what happens when one declares two different
procedures with the same name and number of parameters, and what, if
anything, should happen when client code tries to call such ambiguously
declared procedures.

Hey, in the spirit of compromise, let's just say that when this happens,
*all* those procedures are called! Specifically, all the procedures
whose formal parameter types are compatible with the types of the actual
parameters are called — procedures with parameters of completely
different types, or of more highly qualified types, are not called.

And, to keep this interesting, let's say that the order in which these
procedures are called depends on the generality of their parameter
types. If `grind(beefy ♥t):void` and `grind(beefy gnarly ♥t):void` are
both defined, and `grind` is called with a variable whose type is
qualified with both `beefy` and `gnarly`, then `grind(beefy ♥t):void`
(the more general) should be called first, then
`grind(beefy gnarly ♥t):void` (the more specific) called second.

There are a few issues now that need to be dealt with.

-   We've discussed compatibility of parameter types of a procedure, but
    not its return type. For simplicity, let's just disallow differing
    return types on different procedures with the same name — it's a
    compile-time error.
-   It would be nice if procedures had a way to capture the results of a
    procedure that was called before them in this calling chain. So,
    let's say that every procedure has access to an implicit read-only
    variable that holds the result of the previous procedure (if any)
    that was executed along this chain. The type of this variable will
    always be the same as the return type of the current procedure
    (because by the previous bullet point, the previous procedure must
    have had the same return type as the current one.) If no previous
    procedure with this signature was executed, the value will be `nil`.
    For syntax, let's call this implicit variable `super`.
-   It would also be nice if a procedure could prevent any further
    procedures from being called in this chain. Let's give Dieter a
    special form of `return` for this purpose — say, `return final`.
-   Finally, what about incomparable signatures? Say that, in the above
    example, procedures `grind(gnarly ♥t):void` and `grind(♥t):void` are
    defined too. Then, when `grind` is called with a `beefy gnarly`
    variable, `grind(♥t):void` gets called first (it's the most
    general), but which gets called next: `grind(gnarly ♥t):void` or
    `grind(beefy ♥t):void`? We can let the programmer give some sort of
    disambiguating assertion, like `order beefy < gnarly`, to enforce a
    partial ordering between type qualifiers. Then we know that the more
    general procedure — in this case `grind(gnarly ♥t):void` — will be
    called before `grind(beefy ♥t):void`, because we just noted that,
    all other things being equal, we want `gnarly` to be treated as more
    general than `beefy`.

Now: do you think it's a coincidence that the last problem looks similar
to the problems that come from multiple inheritance, where we don't
quite know what we should be inheriting when two of our superclasses are
descendants of the same class? Well, I can tell you this much: it's
definately *not* a coincidence that I chose `super` and `final` for the
names of the other two features!

This whole thing looks to me very much as if we were approaching
object-oriented programming from another direction. Which might not be
such a surprise, if one thinks of subclassing as a special form of type
qualification.

Of course, it's not *quite* the same in Dieter as it is in most OO
languages. For example, procedures in Dieter do not actually override
those with more general (less qualified) signatures, they simply add to
them. But, those more specific procedures could be written to ignore the
results of, and undo the state changes of, the more general procedures
in the chain that were called first, which would accomplish essentially
the same thing.

Background
----------

Why did I design this language, anyway?

### Hungarian Notation ###

The origins of the central idea in Dieter — **encapsulate type
qualifiers in modules that define them** — was an indirect result of
reading [Joel Spolsky's explanation of the value of Hungarian
notation](http://www.joelonsoftware.com/articles/Wrong.html). He
contrasts the original notion of Hungarian notation with the
cargo-cultist convention that it somehow degenerated into. He explains
how it helps make incorrect programs look incorrect.

I thought, why stop there? If the purpose of Hungarian notation is to
make evident assignment errors between variables that have the same type
but different *roles*, then why can't you have the compiler type-check
the roles, too? Well, you can, and that's basically what Dieter is doing
with type qualifiers — using them as a computer-checkable form of
Hungarian notation.

### Aliasing Prevention ###

What further spurred development of the idea was the problem of
aliasing. Aliasing is where some part of a program is dealing with two
references which might or might not point to the same thing —
importantly, you can't make guarantees for the sake of safety (or
optimization) that they *don't* point to the same thing. So you have to
assume that they might.

The contribution of type qualifiers to this is that in some situations
you might be able to give the two references two different type
qualifiers, even though they are basically the same type, thus
guaranteeing that they don't in fact refer to the same value.

There's a significant problem with this, though: it still doesn't give
you a hard-and-fast guarantee that no aliasing is occurring, because the
module that defines a modifier can still do whatever it likes with it
internally. It gives you only a sort of "module-level guarantee"; only
if you trust the individual modules involved to not be aliasing values
of these types, can you be sure the values won't be aliased "in the
large".

In addition, it's far from certain that there are lots of cases where
this would *in practice* support some genuine non-trivial beneficial
code pattern while preventing aliasing. It could be that all examples
where this works are quite contrived. I suppose that if I were to do
further work on Dieter, it would be to try to discover whether this is
really the case or not.

### Related work ###

Type qualifiers have been around informally for a long time, probably
almost as long as there have been types. Here's [Dennis Ritchie ranting
against certain type qualifiers proposed for ANSI
C](http://www.lysator.liu.se/c/dmr-on-noalias.html). (One of which,
coincidentally, was intended to deal with aliasing, albiet quite
myopically.)

Mark P Jones has written [a
paper](http://web.cecs.pdx.edu/~mpj/pubs/esop92.html) and [a
thesis-turned-book](http://web.cecs.pdx.edu/~mpj/pubs/thesis.html)
describing a theory of qualified types. Dieter can be seen as a concrete
instance of Jones's theory (as can many other type systems — it's a very
general theory), although I have not explicated it as such here, as the
resulting article would likely have been much less accessible.

Haskell's type classes (another type system easily seen as a concrete
instance of qualified types) are very similar to Dieter's type
qualifiers. However, in Haskell, every type either belongs to or does
not belong to some class: every `Int` is an `Eq Ord`, due to the
mathematical properties of `Int`s. In Dieter, every type can potentially
be modified with every qualifier: there can be `int`s, `eq int`s,
`ord int`s, and `eq ord int`s, all slightly different.

On the other end of the spectrum, solidly in the domain of application
rather than theory, [CQUAL](http://www.cs.umd.edu/~jfoster/cqual/) is an
extension of C which adds user-defined type qualifiers. CQUAL is
primarily concerned with inferring type qualifiers where there are none,
and is not concerned with encapsulating qualifiers within modules.

Conclusion
----------

I hope you found Dieter to be an entertaining little diversion through
type-qualifier-land (and that you did not expect too much more, or I
imagine you'll have been somewhat disappointed.)

Although there is no reference implementation of Dieter (and it's
unlikely that there could be without some more specified semantics,)
there is a reference parser and type-checker (not at all guaranteed to
be bug-free) written in Python of all things.

Appendix A
----------

### Polymorphic Typing with Type Qualifiers ###

While Dieter does not support full-blown type inference — all variables
must be explicitly notated with their type — it does support uniform
polymorphic typing using type variables, and it uses the core mechanism
of type inference, namely unification, to give those variables concrete
types at each usage instance.

In place of a concrete type, a type variable may be given. Like a
concrete type, this type variable can possess any number of type
qualifiers. During each instance of the thing being typed (for example,
each call to a procedure,) the type variables are resolved into concrete
types by unification.

The presence of type qualifiers makes this process more complicated.

The first thing to note is that unification during inference is no
longer commutative — it is no longer the case that, if A unifies with B,
then B necessarily unifies with A. As mentioned, a procedure with a
formal parameter of type `gnarly int` will not accept (i.e. is not
compatible with) an actual parameter with type simply `int`. But the
reverse situation is acceptable — we think of the `gnarly` modifier
being 'dropped'. The convention we will use when describing this
non-commutative unification is to call the formal parameter (or variable
being assigned to) the receptor and write it on the left, and call the
actual parameter (or expression being assigned) the provider and write
it on the right.

The cardinal rule, which applies to every primitive type expression,
including those with variables, is that **the qualifiers on the provider
must be a superset of the qualifiers on the receptor**.

As during the conventional algorithm, an unbound type variable in either
type expression will unify with (take on the binding of) the type
subexpression that corresponds with it in the other type expression. In
addition, here it will also take on the type qualifiers of that
subexpresion, if it can. This leaves us with the question of which ones,
and when.

If the variable is in the receptor position, it might as well unify with
*all* of the type qualifiers on the provider, because those must be a
superset of the receptor's in order to unify at all, and because down
the road, they can always be 'dropped' from the variable, should it be
used as a provider.

If the variable is in the provider position, the type in receptor
position can't possibly have any more qualifiers than the variable — or
it wouldn't be able to unify in the first place. So the variable might
as well unify with the whole expression in the receptor position.

If both positions contain variables, the provider's variable should be
bound to the receptor's type expression, because it will be the one that
is more general.

The other thing to note that differs from conventional type inference is
that **a type variable, once bound to a qualified type, may be
*re-bound* to a less qualified type**.

### Examples ###

Module names aren't really important for this, so they're omitted. We'll
assume the following intrinsics are available:

    forward and(bool, bool): bool
    forward equal(♥t, ♥t): bool
    forward print(string): void

#### Example 1 ####

    procedure thing(): void
      var i, j: int
      var s, t: string
    begin
      if and(equal(i, j), equal(s, t)) then print("yes") else print("no")
    end

Should typecheck successfully, because the two calls to `equal` are two
seperate instances of the type ♥t, but the two parameters inside the
type signature of `equal` are the same instance.

#### Example 2 ####

    forward glunt(beefy gnarly ♥t): gnarly ♥t
    ...
    procedure thing(): void
      var i: beefy gnarly int
    begin
      if equal(glunt(i), 4) then print("yes") else print("no")
    end

Should typecheck successfully. The call to `glunt` returns a
`gnarly int`. Midway through typechecking the call to `equal`, we obtain
{♥t → `gnarly int`}. But when we typecheck the second argument we see
that it's simply an `int`. We *re-bind* the variable to obtain {♥t →
`int`}. This is OK with respect to the first argument — we just consider
the `gnarly` to be dropped.

#### Example 3 ####

    forward traub(beefy gnarly ♥t): bool
    ...
    procedure thing(p: beefy ♥s): ♥s
    begin
      if traub(p) then print("yes") else print("no")
      return p
    end

Should *not* typecheck. The call to `traub` needs something qualified
with both `beefy` and `gnarly`. `beefy ♥s` will fail to unify with it.

Grammar
-------

    Dieter    ::= {Module | Ordering | Forward} ".".
    Ordering  ::= "order" Name/qual "<" Name/qual.
    Module    ::= "module" Name/qual {"var" VarDecl} {ProcDecl} "end".
    Forward   ::= "forward" Name/proc "(" [Type {"," Type}] ")" ":" Type.
    VarDecl   ::= Name/var ":" Type.
    ProcDecl  ::= "procedure" Name/proc "(" [VarDecl {"," VarDecl}] ")" ":" Type {"var" VarDecl} Statement.
    Statement ::= "begin" {Statement} "end"
                | "if" Expr "then" Statement ["else" Statement]
                | "while" Expr "do" Statement
                | Name/var ["[" Expr "]"] ":=" Expr
                | Name/proc "(" [Expr {"," Expr}] ")"
                | "return" ["final"] Expr
                .
    Expr      ::= Name/var ["[" Expr "]"]
                | Name/proc "(" [Expr {"," Expr}] ")"
                | "(" Expr ")"
                | "bestow" Qualifier Expr
                | "super"
                .
    Type      ::= {Name/qual} BareType.
    BareType  ::= "map" ["from" Type] "to" Type
                | "♥" Name/tvar
                | "bool" | "int" | "rat" | "string" | "ref"
                .