Tree @master (Download .tar.gz)
Fountain
Version 0.6
Cat's Eye Technologies' Fountain is a work-in-progress grammar formalism capable of expressing context-sensitive languages (CSLs), and supporting both efficient parsing and efficient generation of strings conforming to those languages.
It does this by allowing semantic constraints to be inserted between the elements of a production rule. To support efficient generation, these interspersed semantic constraints can be analyzed to determine a usable deterministic strategy for generation.
Here is an example Fountain grammar which expresses the classic CSL
a
^n b
^n c
^n:
Goal<n> ::=
<. a = 0 .> { "a" <. a += 1 .> } <. a = n .>
<. b = 0 .> { "b" <. b += 1 .> } <. b = n .>
<. c = 0 .> { "c" <. c += 1 .> } <. c = n .>
;
During parsing based on this grammar, the variable n
will be,
like the others, initially undefined. The first time a = n
is
encountered, a
will be unified with n
, and will take on its
value. When b = n
is later encountered, unification of b
with n
will take place; if b
is some value other than n
,
the parse will fail.
% echo -n "aaabbbccc" | ./bin/fountain parse eg/anbncn.fountain --
Success
In comparison, during generation, we assume the variable n
has
already been assigned a value, as part of the (externally supplied)
input to the generation process.
In addition, the repetition construct { "a" <. a += 1 .> }
can "see"
the a = n
constraint that follows it; it will be checked on each
iteration, and the repetition will terminate when it is true.
% ./bin/fountain generate eg/anbncn.fountain n=5
aaaaabbbbbccccc
Neither of the above processes involve backtracking; the string is parsed and generated in linear time. Note, however, that while Fountain supports deterministic operation, it does not enforce it. It is possible to write Fountain grammars that lead to backtracking search, or even infinite loops during generation. How best to handle these cases remains an open line of inquiry.
For a more definitive description of the Fountain language, see doc/Definition-of-Fountain.md.
For insight into the design choices underlying Fountain, see doc/Design-of-Fountain.md. This includes several interesting questions that the design of Fountain raises, such as:
- Can't a Definite Clause Grammar (DCG) do what Fountain does?
- Doesn't a Context-Sensitive Grammar (CSG) do what Fountain does?
- Isn't Fountain really a programming language in disguise?
- How can it be ensured that Fountain can express only the CSLs?
- Should we think of constraints as relational operators?
- Why would we want to support local variables?
- How can parameter passing be implemented?
- How can we apply randomness during generation?
- How should parameters with different data types be handled?
TODO
Semantics
- Implement getting data structures (strings (terminals) and dictionaries (feature structures), at a minimum) into the context and referenceable in constraints. Handle dictionaries during unification in the way that feature structures are conventionally handled in unification.
Implementation
- Improved pretty-printing (coalesced terminals, no unnecessary parens, etc.)
Aspirational
TLA+-like Syntax for Constraints
For constraints, we could be inspired by the syntactic convention of TLA+,
where variables are "before" if they have no prime and "after" if
they do have a prime. So <. a += 1 .>
would be instead
<: a' = a + 1 :>
. This is much nicer conceptually. It could
maybe even be taken further, with superscripts or subscripts on
both sides, like <: aᴿ = aᴸ + 1 :>
. Well, maybe.
However, there are various equivalent ways to say the same thing in this
system, for example <: 1 = a' - a :>
and I'm not sure if we want
to support translating them all to a canonical internal syntax.
We do need something to take the edge off here, as a sort of
instance of the frame problem arises when you have something like
<: x' > 0 :>
; this suggests that on the right side, any
positive value of x
-- and any values at all for all the other
variables in scope -- will satisfy the relation on the RHS. This
isn't useful. We probably need a convention where we say that
every variable z
that isn't mentioned in the constraint is implicitly
constrained by z' == z
. And for ones that are mentioned in the
constraint, we have to say this explicitly if we want it. So we would
need to say <: x' > 0 && x' == x :>
for the above.
But we also have a situation where this general syntax allows us to
state very general, very powerful constraints. Is this a problem?
Yes, but to some extent no; constraints like <: x' > 0 && x' < 9 :>
simply introduce non-determinism, and would be disallowed by default
but allowed in productions where non-determinism (backtracking) is
enabled through use of an annotation. Unbounded non-determinism
like <: x' > 0 :>
would probably still be disallowed in all
circumstances though.
If we follow through with this idea, things get a little interesting,
because terminals can be thought of as relations on a list of tokens;
a terminal like "*"
can be thought of as <: ("*":t') == t) :>
when
parsing and <: t' == ("*":t) :>
when generating. And then, isn't
a production simply made up of a sequence of relations, and isn't it
itself a relation, the composition of that sequence of relations?
Couldn't we have multiple lists of tokens, and parse from (or generate to)
them at the same time? Does this make a mockery of the idea of a
"grammar"? These are interesting questions.
Other
- Once there are data structures in the context, one neat idea would be for some of those data structures to be feature structures (Wikipedia), and the unification which currently occurs (in a truncated and efficient way) when a constraint is processed would be extended to include unification of feature structures. How exactly this would work, and whether it would help for any practical purpose, remain to be seen, but it sounds like it would be an interesting experiment.
- Once there are data types other than integers in the context, have two of the data types be terminals (that is, strings) and non-terminals (that is, names of productions), and allow these to be "reflectively" applied, so that the grammar can expect (or generate) a string non-terminal that it has computed at some earlier point. Even more interestingly, because non-terminals can take parameters, this leads to the possiblity of writing grammars in continuation-passing style, whatever that may be worth.
- Allow "fuel" to be tracked, to ensure the storage is bounded linearly to the amount of input. This aims to ensure that the grammar is contained within PSPACE.
- Use Fountain's own parsing facilities to parse the Fountain grammar description! It's not entirely clear to me how much of it it could handle. But it would be close to "writing Fountain in Fountain".
- Report error diagnostics (i.e. what caused a failure). My concern is that this will make the structure of the implementation more cloudy.
History
0.6
Parameters supplied externally to the parsing or generation process must be declared on the start symbol in order for them to be visible to the grammar.
Fixed up syntax for &&
so it is now infix again, on parsing and
pretty-printing. It is now called Conj
internally.
Added support for the following constraints:
- negation (prefix
~
) of constraints; - disjunction (infix
||
) of constraints; defined
, which succeeds only when a variable is defined;- lookahead assertions;
generating
, which succeeds only when generating.
(although these should all be considered somewhat experimental, presently).
The check (during deterministic operation) that every branch of an alternation has preconditions has been dropped. Instead the user will only get the error that multiple branches of the alternation are allowable, and the parsing or generation will not proceed.
Refactored internals to attempt to make parser and generator more similar to each other.
Added speculative writings on how we might desugar loops into recursive productions.
(Version 0.6 is perhaps best viewed as the fallout of trying to desugar loops into recursive functions, realizing afresh that loops have different semantics under parsing and generation (including an implicit ordered choice) and trying to capture those different semantics in constraints so that they could be applied during recursive applications just as well.)
0.5
Backtracking may be either nondeterministic (the default, with
the (*)
qualifier on a production) or deterministic (with the
(!)
qualifier). For parsing this doesn't matter, but for
generation, deterministic backtracking follows the source code
order. The ordering of nondeterministic backtracking is undefined
by the Fountain language, but the fountain
implementation uses
uniform pseudo-random shuffling of the alternatives based on a
user-supplied seed which defaults to 0.
Variables may take on values of types other than integer. This is a work in progress. Currently, the only other type of value available is the string type. The plan is that string type values can be used as (variable) terminals. But that plan will not come to fruition in the scope of version 0.5.
0.4
Language
Fountain now distinguishes between productions that may backtrack and ones that may not, in both parsing and generation.
By default, backtracking is not permitted. Productions may be marked
with (*)
to indicate that backtracking is permitted when processing
alternations in that production.
When backtracking is not permitted, every choice in an alternation must start with a constraint. Exactly zero or one of those constraints must be satisfied in a given state. If none are, that is a failure (and if this failure happens in an enclosing context where backtracking is permitted, then backtracking will occur in that context). If more than one are, the process will abort with an error message.
It should be noted that the implementation of backtracking currently has limitations. The scope of a "choice point" that can be backtracked to is limited to the alternation expression in which it occurs; a failure that occurs after (that is, outside of) this expression will not cause a backtrack to occur. In particular, backtracking does not work at all as you might expect inside a loop. There is a way to write recursive productions in a "tail recursive" manner such that they do backtrack, but this style of writing the grammar may not be entirely natural.
Also:
Greater than or equal and less than or equal constraints.
"Both" combinator on constraints (which really needs reworking).
Implementation
The --start-symbol
option may now be passed to fountain
to
cause it to start parsing or generating at the named non-terminal.
Implementation improvements such as better flattening of the AST representing the grammar during its parsing, and pretty-printing fragments of the AST (albeit crudely) when displaying them.
0.3
Comments and characters terminals given by Unicode code point were added to the Fountain syntax.
Inc, dec, greater than and less than constraints can take a simple expression (an integer literal or a variable name) on the right-hand side.
When parsing, parameters can also be supplied from external sources.
Example Fountain grammar describing a heartwarming and context-sensitive novel about a wayward animal companion has been included in the distribution.
Distribution placed under BSD license.
0.2
0.2 refined some of the core ideas of Fountain. The Design of Fountain document (consisting primarily of design questions rather than design answers) was written. Parameter passing was added to productions. Many small improvements were made to the reference implementation.
0.1 had an arb
construct, which was intended to signal that
a variable could be computed during parsing, but was needed to
be defined ("arbitrarily") beforehand during generation. Essentially
it asserted that the variable was defined, but only during generation.
During the design work for 0.2 it was determined that it was not
necessary (this sort of signal overlaps with parameters to a
production, which signal some kind of arguments need to be supplied;
and in another sense, shouldn't need to be stated inside the grammar
because it is the concern of the client of the grammar rather than
the grammar itself), and it was removed.
0.1
0.1 was the original release of Fountain, to show proof of concept. Only global variables were supported. Efficient choice was not supported.
Commit History
@master
git clone https://git.catseye.tc/Fountain/
- Arrange licensing info in repo according to REUSE 3.2 convention. Chris Pressey 2 months ago
- Apply BSD license to the newly added notes in the documentation. Chris Pressey 6 months ago
- Fix example to use the implemented lookahead assertions. Chris Pressey 6 months ago
- Diction and typos. Chris Pressey 6 months ago
- Tests for constraints during lookahead assertions. Chris Pressey 6 months ago
- Add `generating` constraint. Chris Pressey 6 months ago
- Test cases for lookahead constraint. Chris Pressey 6 months ago
- Implement applyParseConstraint with Lookahead constraint. Chris Pressey 6 months ago
- Handle lookahead constraints (structurally; still to fill in.) Chris Pressey 6 months ago
- Add syntax for lookahead assertion. Chris Pressey 6 months ago