Tree @master (Download .tar.gz)

Fountain

Version 0.7

Cat's Eye Technologies' Fountain is a work-in-progress grammar formalism capable of expressing context-sensitive languages (CSLs), and supporting both efficient parsing and efficient generation of strings conforming to those languages.

It does this by allowing semantic constraints to be inserted between the elements of a production rule. To support efficient generation, these interspersed semantic constraints can be analyzed to determine a usable deterministic strategy for generation.

Here is an example Fountain grammar which expresses the classic CSL a^n b^n c^n:

Goal<n> ::=
    <. a = 0 .> { "a" <. a += 1 .> } <. a = n .>
    <. b = 0 .> { "b" <. b += 1 .> } <. b = n .>
    <. c = 0 .> { "c" <. c += 1 .> } <. c = n .>
    ;

During parsing based on this grammar, the variable n will be, like the others, initially undefined. The first time a = n is encountered, a will be unified with n, and will take on its value. When b = n is later encountered, unification of b with n will take place; if b is some value other than n, the parse will fail.

% echo -n "aaabbbccc" | ./bin/fountain parse eg/anbncn.fountain --
Success

In comparison, during generation, we assume the variable n has already been assigned a value, as part of the (externally supplied) input to the generation process. In addition, the repetition construct { "a" <. a += 1 .> } can "see" the a = n constraint that follows it; it will be checked on each iteration, and the repetition will terminate when it is true.

% ./bin/fountain generate eg/anbncn.fountain n=5
aaaaabbbbbccccc

Neither of the above processes involve backtracking; the string is parsed and generated in linear time. Note, however, that while Fountain supports deterministic operation, it does not enforce it. It is possible to write Fountain grammars that lead to backtracking search, or even infinite loops during generation. How best to handle these cases remains an open line of inquiry.

For a more definitive description of the Fountain language, see doc/Definition-of-Fountain.md.

For insight into the design choices underlying Fountain, see doc/Design-of-Fountain.md. This includes several interesting questions that the design of Fountain raises, such as:

Can't a Definite Clause Grammar (DCG) do what Fountain does?
Doesn't a Context-Sensitive Grammar (CSG) do what Fountain does?
Isn't Fountain really a programming language in disguise?
How can it be ensured that Fountain can express only the CSLs?
Should we think of constraints as relational operators?
Why would we want to support local variables?
How can parameter passing be implemented?
How can we apply randomness during generation?
How should parameters with different data types be handled?

TODO

Semantics

Implement getting data structures (strings (terminals) and dictionaries (feature structures), at a minimum) into the context and referenceable in constraints. Handle dictionaries during unification in the way that feature structures are conventionally handled in unification.
Empty sequences are not currently handled well during parsing. This is due to having to compute the lookahead on them; we would need to look at the context that they occur in (to compute the traditional FOLLOWS set of a non-terminal) and we don't currently do that.
Unsaturated variables are currently not handled well during parsing. When a production takes a parameter, it ought to be the case that, if that parameter is unsaturated (not yet unified with a concrete value), it will not prevent parsing from proceeding, eventually to a point where a concrete value can be unified with the parameter.

Implementation

Improved pretty-printing (coalesced terminals, no unnecessary parens, etc.)

Aspirational

TLA+-like Syntax for Constraints

For constraints, we could be inspired by the syntactic convention of TLA+, where variables are "before" if they have no prime and "after" if they do have a prime. So <. a += 1 .> would be instead <: a' = a + 1 :>. This is much nicer conceptually. It could maybe even be taken further, with superscripts or subscripts on both sides, like <: aᴿ = aᴸ + 1 :>. Well, maybe.

However, there are various equivalent ways to say the same thing in this system, for example <: 1 = a' - a :> and I'm not sure if we want to support translating them all to a canonical internal syntax.

We do need something to take the edge off here, as a sort of instance of the frame problem arises when you have something like <: x' > 0 :>; this suggests that on the right side, any positive value of x -- and any values at all for all the other variables in scope -- will satisfy the relation on the RHS. This isn't useful. We probably need a convention where we say that every variable z that isn't mentioned in the constraint is implicitly constrained by z' == z. And for ones that are mentioned in the constraint, we have to say this explicitly if we want it. So we would need to say <: x' > 0 && x' == x :> for the above.

But we also have a situation where this general syntax allows us to state very general, very powerful constraints. Is this a problem? Yes, but to some extent no; constraints like <: x' > 0 && x' < 9 :> simply introduce non-determinism, and would be disallowed by default but allowed in productions where non-determinism (backtracking) is enabled through use of an annotation. Unbounded non-determinism like <: x' > 0 :> would probably still be disallowed in all circumstances though.

If we follow through with this idea, things get a little interesting, because terminals can be thought of as relations on a list of tokens; a terminal like "*" can be thought of as <: ("*":t') == t) :> when parsing and <: t' == ("*":t) :> when generating. And then, isn't a production simply made up of a sequence of relations, and isn't it itself a relation, the composition of that sequence of relations? Couldn't we have multiple lists of tokens, and parse from (or generate to) them at the same time? Does this make a mockery of the idea of a "grammar"? These are interesting questions.

Other

Once there are data structures in the context, one neat idea would be for some of those data structures to be feature structures (Wikipedia), and the unification which currently occurs (in a truncated and efficient way) when a constraint is processed would be extended to include unification of feature structures. How exactly this would work, and whether it would help for any practical purpose, remain to be seen, but it sounds like it would be an interesting experiment.
Once there are data types other than integers in the context, have two of the data types be terminals (that is, strings) and non-terminals (that is, names of productions), and allow these to be "reflectively" applied, so that the grammar can expect (or generate) a string non-terminal that it has computed at some earlier point. Even more interestingly, because non-terminals can take parameters, this leads to the possiblity of writing grammars in continuation-passing style, whatever that may be worth.
Use Fountain's own parsing facilities to parse the Fountain grammar description! It's not entirely clear to me how much of it it could handle. But it would be close to "writing Fountain in Fountain".
Report error diagnostics (i.e. what caused a failure). My concern is that this will make the structure of the implementation more cloudy.

History

See HISTORY file.

Commit History @master `git clone https://git.catseye.tc/Fountain/`

»»