git @ Cat's Eye Technologies Sally / master doc / Sally.markdown
master

Tree @master (Download .tar.gz)

Sally.markdown @masterview rendered · raw · history · blame

Sally, a Strongly-Typed Language
================================

Version 1.0, Revision 2011.1231.

What is Sally?
--------------

Sally is basically FORTH, except:

- All functions are declared with fixed range and domain types;
- Strong type checking is applied to arguments and results;
- User-defined types can be introduced into the checking scheme; and
- Forward, instead of reverse, Polish notation is used.

Sally is also an exercise in the reduction of _excise_ (redundant,
extraneous, or sugary syntax).

For example, there is, unlike FORTH, no special symbol to indicate
the end of a function definition.  (A new definition starts whenever
the previous one ends.)

The ANSI C source code for the compiler is very small, less than
16K in total.  The compiler translates Sally programs to ANSI C.

Just to be silly I've scattered exercises for the reader around the
documentation.  I don't actually expect anyone to try any of them,
but they kind of indicate what I'd like to see in a "Sally++" if
there ever was such a beast.

Syntax
------

A Sally program consists of a series of function declarations.
Each declaration consists of the name and type of a new function
followed by a series of function applications and primitives.

The domain types are listed before the function name; the range
types, after.  For example, a function called `foo` which maps an
integer and a character to an integer, would be notated as

    int char foo int ...

The '...' represents the function applications which comprise the
body of the function.

The syntax of each application is dictated by the types given in
the function's definition.  For example, a correct syntax for
applying the `foo` function above might be

    foo 42 'X

(Note the lack of a need for parentheses.)  This application of
`foo` would be appropriate anywhere a previous application requires
an integer argument, or where a definition requires an integer
result.

Functions can accept multiple arguments and return multiple results.
A function which returns multiple results can pass those results as
a 'chunk' to the previous function application.  For example, the
following would return the double of the given argument `$1`:

    add dup $1

And this would probably appear in a function definition like so:

    int double int add dup $1

The type `void` indicates lack of any arguments or results.  When
it indicates the lack of arguments, it must be specified explicitly
(to indicate to the parser that the previous definition has ended.)
When used to indicate the lack of results, it can be implicitly
inferred by the fact that there are no types following the function
name.

EXERCISE:  See if you can find a way to change the parser so that
`void` is always implicit, even when it's used to indicate the
lack of arguments to a function.  Be prepared to deal with a token
which is an undeclared symbol in a mature way.

Functions which Accept and Return Functions
-------------------------------------------

A function can be passed to another function.  In order for a
function to be defined which accepts a function passed to it, a
_prototype_ of the type of function to be passed must be defined.
This prototype is then preceded in the type declaration by the
`like` operator.

For example, say we want a function which accepts an integer
and a function which maps an integer to another integer, and
returns an integer.  To do this, first we establish the prototype
function which is included in the arguments to the other function:

    int map int proto

(This kind of definition, a "proto", also functions like a "forward"
declaration in Pascal or an "extern" declaration in C.)

We then use the `like` operator in specifying the definition of the
other function:

    int like map other int ...

The function `other` now takes an integer and a function 'like map'
(that is, a function which has the same domain and range types as
the function called `map`, even if the body of `map` is not actually
defined) and returns an integer.

Even so, how does an application pass a function to another function?
You can't just name the function, because that indicates that you want
to apply it and use the return value.  You must instead "reference" or
"escape" the function by preceding it with `the`.  So to apply the
`other` function above to a function such as `negate`, one would write

    other 5 the negate

Assuming `negate` is declared with the same range and domain types as
`map`, this should pass the function `negate` without trying to apply
it (presumably leaving that up to the `other` function).

Speaking of which, there is one last issue to cover.  Once a function
like `other` has a reference to a function like `map` passed to it,
how does `other` use the passed function?

The answer is the `do` keyword.  `do` is followed by the number of
an argument.  It is not unlike $1, but it does not simply evaluate to
the argument, it actually calls it.  When this is the case, the
argument better be a function (all that strong typing stuff applies.)

EXERCISE:  See if you can extend the `the` operator to provide
lambda functions.  Use the following as a syntax guideline for how
one would specify a lambda function:

    other 5 the int -> int + $1 7

Remember, there's no special "End Function Definition" symbol!

EXERCISE:  Extend `do` to handle `the` functions directly.  Example:

    do the monkey

EXERCISE:  Finally, extend `like` to likewise handle `the` functions
directly.  Example:

    int like the int -> int proto other int ...

Type Checking and Casting
-------------------------

A named type can defined in Sally by defining it like you would define
a function, but with no range type, and with no definition, instead the
token `type`, as in:

    int semaphore type

In Sally, two types are equivalent if:

- Both are named and their names are the same; or
- Neither is named and they are structurally equivalent.

(By named, I of course mean, named by the user, not intrinsic to the
language.  Even though `int` is technically a name, it's unnamed for
our purposes here.)

Values can be cast from one type to another.  This is invaluable for
any structured data access in Sally.

No conversions are done during type casting.  Types can only be
cast to other types of the same size (number of elements -- you can
cast `char` to `int` but not to `int int` or `void`.)

Typecasts are performed with the `as` operator, as in

    as char 7

to represent a bell.

EXERCISE:  Implement type variables, so that one could define
functions like

    `1 `2 swap `2 `1 $2 $1

...which would be applicable no matter what types were passed to and
received from `swap`, as long as the types were consistent (all
`` `1 ``'s are the same type and all `` `2 ``'s are the same type, for any
given application of `swap`.)

There are several ways one may want to attempt this.  One is by
using the process of _unification_ to make sure all type variables
are used consistently.  Adding this to Sally may be excessively
tricky because of the way it does type checking.  With a stack of
types, there may be a more efficient algorithm for replacing type
variables with instances, and subsequently checking them for
consistency.

Libraries, Side Effects, and `main`
-----------------------------------

Libraries can be included before the first function definition in a
program.  There is no special "include" token, the library is simply
named, and if a file named that (plus the extension `.sal`) is located,
it is (virtually) inserted into that point in the program.

For example, many of the sample programs begin with the token
`stdlib`.  This loads the file `stdlib.sal` into that point in the
program, introducing a handful of function prototypes.

Libraries are also how Sally deals with side effects.  The philosophy
is that if the programmer wants to disturb the purity of the functional
paradigm by introducing side effects, they may do so, but they will be
made clearly aware of the fact.

Having said that, Sally can only perform side-effect communications
by including the library called `sidefxio` -- thus reminding the
programmer that there are side effect communications at work in the
following program.

Without including this library Sally is limited to batch
communications.

In both schemes, the function called `main` is the function which is
applied when the entire program is run.  `main` may have any number
of arguments and any number of return values, although they may only
be of `int` type.  For each argument, a value is required as a
command-line argument when the program is run; for each result, the
resultant value is displayed at the end of the program.

This little communications scheme is minimal, and limited, but it
does not introduce any side effects in any way, and is capable of
communicating any Turing-solvable problem, so it has some things
going for it.  To get around the limitations, however, you have to
resort to side effect I/O, where you can use the `print` and
`input` functions to report and obtain information from the user
while the program is running.

EXERCISE:  Loosen the constraints on the type of the `main`
function - allow arguments and return values of type `char`.

EBNF Grammar
------------

    Program     ::= {Definition}.
    Definition  ::= {Library} {Type} Name {Type}
                    ("type" | "proto" | {Application}).
    Type        ::= "int" | "char" | "like" Name | Name.
    Application ::= Primitive
                  | "$" Number
                  | (Name | "do" Number) {Instr}
                  | "as" {Type} Instr
                  | "if" Instr Instr Instr
                  | "the" Name.
    Primitive   ::= <<an integer notated in decimal like 123>>
                  | <<a character preceded by a single quote like 'A>>.

Change Log
----------

2000.0226

- Original release.
  
2003.1104

- Added GNU Makefile.
- Added `bin/` and `lib/` subdirectories.
- Split `sally.c` into `sally.c` and `runtime.c`.
- Added `libsally` library.

2010.0428

- Made compilable with `-ansi -pedantic`.
- Made documentation conform to XHTML 1.0 Strict.

2011.1231

- Converted language document into Markdown format.
- Merged change log into main document.