Tree @brython-experiment (Download .tar.gz)
Vinegar
Version 0.1 | See also: Tandem ∘ Tamsin
Overview
Vinegar is a semi-concatenative language where every operation can fail.
Whence "semi-concatenative"?
Well, a concatenative language has a single binary operator with which one can compose larger operations out of smaller operations, and that operator is represented by no explicit symbol but rather just the juxtaposition (concatenation) of operations.
The operations themselves are often functions that take stacks to stacks, and the binary operator usually works rather a lot like function composition.
And since it's the only binary operator, and it's associative, there are no parentheses, because there is no need for them.
Vinegar, in comparison, also has operations that work on stacks, and
an implicit binary operator that works a lot like function composition.
But it has another binary operator too, called alternation. This one
is not implicit -- it's notated |
and has a lower precedence than
concatenation -- and parentheses are available to group expressions.
This is not strictly speaking a concatenative language anymore, so we call it "semi-concatenative".
(But later on we'll consider a variation on how things are arranged here that can make it all look "a bit more concatenative" again.)
Whence "every operation can fail"?
This should be fairly intuitive -- when it is not possible for an operation to produce a sensible, expected result, it has failed.
Division is the classic example, where division by zero is undefined, and thus has failed. But even addition can fail, if you consider overflow of the machine integer size to be a failure.
And in a dynamically-typed stack-based language, such as concatenative languages often are, any operation could potentially underflow the stack. Which would be, y'know. A failure.
In Vinegar, every operation can fail. When faced with operations that seem like they always succeed, we sometimes contrive them so that there are ways that they can fail. And in cases where we cannot be bothered to do that, we just pretend there is a theoretical possibility that they could fail.
The relevant question is, what happens when an operation fails?
Well, that's what |
is for.
The next section attempts to describe the semantics of concatenation and alternation in more detail.
Concatenation and alternation
The rules of the concatenation operator are:
If, in a b
, a
fails, then b
is not performed, and a b
fails with the error result produced by a
failing.
If, in a b
, a
succeeds but b
fails, then a b
fails with the error result produced by b
failing.
If, in a b
, a
succeeds, then b
succeeds, then a b
succeeds with the succesful result of b
(which built
on the successful result of a
).
The rules for the alternation operator are:
If, in a | b
, a
succeeds, then b
is not performed, and
a | b
succeeds with the result successfully produced by a
.
If, in a | b
, a
fails, then b
is performed, and the
result of a | b
is the result of b
.
If, in a | b
, a
fails, then b
fails, then a | b
fails with the error result produced by b
failing.
It's worth noting that both concatenation and alternation are associative:
(a b) c = a (b c) = a b c
and
(a | b) | c = a | (b | c) = a | b | c
Similar things
It's all a bit like MonadPlus
in Haskell, with concatenation
being a lot like >=>
, although I really had Either
more
in mind than Maybe
, but Either
isn't an instance of
MonadPlus
for technical reasons, and I don't understand
monads anyway I'm sure. I'd explain further, but it's strictly
taboo -- I might start babbling about burritos, you see.
Probably a more familiar thing that it's similar to is the
Bourne shell. If a
and b
are executables, then a && b
executes a
and checks the error code. If the error code is
non-zero, it exits with that exit code, otherwise it executes
b
and exits with its exit code. Alternately, a || b
executes a
and checks the error code. If the error code is
zero, it exits with that exit code, otherwise it executes
b
and exits with its exit code.
The practical upshot of all this
With the alternation operator, we can implement exception handling.
In something like a b c | d
, the d
will be executed if a
fails, or if b
fails, or if c
fails. Failing is like "throw"
and |
is like "catch".
But here is another twist. We can also use alternation to
implement plain old conditional execution. Instead of checking if
an expression equals some value, we assert that it does
equal some value. If we happen to be wrong, then that is a failure.
We handle it on the RHS of a |
just like we would handle any
other failure. Rather than putting it in an else
clause.
As a bonus, a chain like a | b | c | d
is a terse
substitute for a chain of elsif
s.
So what is the actual difference between these two language features?
Well, failure happens for a reason. Sometimes you have enough
information to predict exactly what that reason would be, so you
don't really care about it, and your language construct doesn't
provide it (if
, Maybe
). Other times, you don't have enough
information to predict it, so you do want your language construct
to provide it, so that you can work with it (catch
, Either
).
Here we observe that, if we can pick only one of these alternatives, it's better to be provided with the reason for failure than to be not provided with it, because we can't obtain it otherwise, and if we really don't want it, we can always throw it out.
So in Vinegar, the RHS of an alternation always begins executing with the failure value that caused the RHS to execute pushed onto the top of the stack.
Example: Factorial in Vinegar
As an example, let's try to write a factorial function in Vinegar.
Well, first, let's get some preliminaries out of the way. Up 'til now we've been fairly vague about the actual language. Let's pin down some concrete syntax.
-> Tests for functionality "Execute Vinegar Program"
-> Functionality "Execute Vinegar Program" is implemented by
-> shell command
-> "python3 bin/vinegar <%(test-body-file)"
Each definition is on its own line, which is terminated by
a semicolon. The result of executing a program, is the result
of executing main
. The form int[n]
where n is a literal
integer in decimal notation, pushes n onto the stack.
main = other;
other = int[3];
==> OK([3])
Why, you may ask, is there all this square brackets and stuff
around simple literal values? Ah! That's so that literals can
fail! If it's not possible to parse the contents of the
[...]
part into a valid constant value, it has failed!
main = int[lEEt];
==> Failure(invalid literal for int() with base 10: 'lEEt')
There is a built-in operation to swap the top two values on the stack.
main = int[100] int[200] swap;
==> OK([200, 100])
There is a built-in operation to pop the topmost value off the stack and discard it.
main = int[40] int[50] pop int[60];
==> OK([40, 60])
There are some usual arithmetic operations too.
main = int[4] int[5] mul int[6] sub;
==> OK([14])
If there are not enough values on the stack for an operation, it fails with underflow.
main = swap;
==> Failure(underflow)
There is a built-in operation to pop the topmost two values and assert that they are equal.
main = int[5] int[5] eq!;
==> OK([])
main = int[5] int[8] eq!;
==> Failure(unequal)
There is a built-in operation to pop the topmost two values and assert that the second is greater than the first.
main = int[5] int[5] gt!;
==> Failure(not greater than)
main = int[5] int[8] gt!;
==> Failure(not greater than)
main = int[8] int[5] gt!;
==> OK([])
OK, now let's try to write a factorial function in Vinegar.
That factorial function in full
fact = dup int[1] gt! dup int[1] sub fact mul | pop;
main = int[5] fact;
==> OK([120])
What we have here is:
Take the first "argument" to the fact
function (with dup
)
and assert (with gt!
) that it is greater than 1. Then take that
argument again (dup
it again) and subtract one from it,
get the fact
of that value, and multiply the argument
(itself - no dup
this time) by that result. And leave that
on the stack, as the result value.
If any operation there failed, we just do nothing (we take
the failure value off the stack with pop
and discard it.)
So for example, if our assertion that the argument was
greater than 1 failed, it will just leave the argument on
the stack, as the result value.
That's not actually fantastic. What if mul
failed? In
that case we probably don't want to return whatever working
garbage happens to be on the stack as our result. Rather,
we want to fail too. So maybe we can write this more
pointifically.
fact = dup int[1] eq! | pop dup int[1] sub fact mul;
main = int[5] fact;
==> OK([120])
Now, we take the argument and assert that it is 1. If
it is, we just return it. If not, we compute factorial
on it, and return that. If anything in our factorial
computation fails, fact
itself fails, which is desirable.
Of course, this still breaks down if we're passed an argument that is zero or negative. What's the factorial of such a number? Let's say, for the sake of argument, it's considered an error. We can add that as an assertion.
fact = dup int[0] gt! (dup int[1] eq! | pop dup int[1] sub fact mul);
main = int[5] fact;
==> OK([120])
If it's 0 or negative, the gt!
assertion fails, and
the whole thing fails. If it's 1, the eq!
assertion
doesn't fail, and 1 is returned. If it's greater than
1, the eq!
assertion fails and factorial is computed.
Note that we do need parentheses here so that when the
gt!
fails, all of fact
fails, instead of it falling
back to the operation on the RHS of the |
.
A bit more concatenative
Fans of concatenative languages (concatenativophiles? concatenativesters? concatenativeniks?) will no doubt be disappointed at the appearance of an explicit binary operator, and even parentheses (ugh), in this language.
But if we take some frightful measures, we may put some distance between us and these pedestrian things.
If we want our binary operator to be notated by mere juxtaposition, we obviously cannot have more than one binary operator. Yet here, in this language, we have two.
We surmount this obstacle by noting that, although we have two binary operators in the language, we can restrict each definition to having only one kind of operator, and have multiple kinds of definitions. In some definitions, concatenation is implicit; in other definitions, alternation is implicit.
To be precise: A definition in which concatenation is implicit
is introduced with =&
. A definition in which alternation is
implicit is introduced with =|
.
Armed with these facts, we can rewrite the above definition of
fact
without infix operators and without parentheses
using multiple definitions, in this way:
fact =& fac1 fac2;
fac1 =& dup int[0] gt!;
fac2 =| fac3 fac4;
fac3 =& dup int[1] eq!;
fac4 =& pop dup int[1] sub fact mul;
main =& int[5] fact;
==> OK([120])
There! Are you happy now? Don't that just beat all?
In light of this stellar feature it is expected that serious
programmers would treat the plain =
form of definition as
a kind of wimpmode and shun it.
Further work
What we have here is probably sufficient for a version 0.1 release of the language -- it demonstrates most of the things I wanted to demonstrate with this idea. Still, it leaves a lot to be desired.
In particular, what is a Failure object? In the current implementation it contains a string, which is supposed to be the reason that the failure happened. Certainly, one could compare two Failures for equality, based on this failure message. But how useful or interesting is that, actually? So it is not implemented.
I think what a Failure should be is, not just a message, but a representation of the program state when the failure occured. Should it be the entire program state? I'm not sure.
Certainly, real-life exceptions usually include a backtrace, which lists all the calls that were in effect when the failure occurred.
If a Failure were that, then a Failure would be a kind of list. And there hasn't been much thought into what kinds of values can actually exist on the Vinegar stack. Can a list exist there? There is a certain urge to disallow that in the name of simplifying memory management -- the Forth school, sort of. But, having lists on the stack, and being able to manipulate them, would be very powerful; and having them unavoidably be Failures as well, would be rather esoteric.
Can you raise
a Failure once you have one on the stack? You ought
to, but it raises a vaguely philosophical question: since raise
can
fail (all operations can fail in Vinegar), how do you know when raise
succeeded? How do you know that it actually re-raised a
Failure(underflow)
it had on the stack, versus that there was nothing
on the stack and it actually underflowed? Maybe you don't, maybe you
give up on that epistemological question.
If a Failure represents the state of the program when it failed, is it a continuation? Can you continue it where it left off, with something changed, to try to make it avoid the failure this time? Why, wouldn't this be a neat way to implement backtracking search? If you could work out how to make it work neatly, I mean.
Commit History
@brython-experiment
git clone https://git.catseye.tc/Vinegar/
- A not-very-successful experiment with Brython. Chris Pressey 1 year, 3 months ago
- Add 3-clause BSD license. Chris Pressey 3 years ago
- Christen this "version 0.1". Chris Pressey (commit: GitHub) 3 years ago
- Brain dump on where this could go, if it were to keep going. Chris Pressey 3 years ago
- Support "a bit more concatenative" syntax. Chris Pressey 3 years ago
- Update example programs to conform to current syntax and semantics. Chris Pressey 3 years ago
- Push failure onto stack in `|`. Explain the justification for it. Chris Pressey 3 years ago
- Allow `!` at end of word, fix parser, consistentify examples/tests. Chris Pressey 3 years ago
- Put the literate tests inline in the language description. Chris Pressey 3 years ago
- Refactor into modules. Add test suite, test driver script. Chris Pressey 3 years ago