git @ Cat's Eye Technologies Fountain / master doc / Desugaring-Repetition.md
master

Tree @master (Download .tar.gz)

Desugaring-Repetition.md @masterview markup · raw · history · blame

Desugaring Repetition in Fountain

The repetition construct { ... } defines a loop in Fountain. We would like to regard it as syntactic sugar for recursion. However, due to the dual nature of Fountain, the loop semantics differ slightly between parsing and generation, while those of recursion do not. This article attempts to grapple with the problem of a general method by which loops can be replaced by recursion in Fountain.

If the production looks like this:

Foo<n> ::= "a" <. i = 0 .> { "b" <. i += 1 .> } <. i = n .> "c";

As a first step, we can extract the loop into its own production. Note that we need to thread all the relevant parameters down through it:

Foo<n> ::= "a" <. i = 0 .> LoopFoo1<n, i> "c";
LoopFoo1<n, i> ::= { "b" <. i += 1 .> } <. i = n .>;

As the second step, we replace the loop with an alternation with two branches, one of which ends with recursing to the loop's production, the other of which contains the termination condition of the loop but consumes/generates no tokens:

Foo<n> ::= "a" <. i = 0 .> LoopFoo1<n, i> "c";
LoopFoo1<n, i> ::=
      "b" <. i += 1 .> LoopFoo1<n, i>
    | <. i = n .>
    ;

The problem is that, while the second branch will always have a condition associated with it for the purposes of efficient selection (it's the termination condition of the loop), the first branch doesn't necessarily.

Well, to give it one, we could formulate the negation of the 2nd branch's condition, or we could permit an else-type condition in this situation. Like:

LoopFoo1<n, i> ::=
      <. i = n .>
    | <. else .> "b" <. i += 1 .> LoopFoo1<n, i>
    ;

The else however suggests that loops are in fact hiding a sort of ordered choice operation, like the / operator in PEG grammars. We would like to avoid exposing such an operator in Fountain, if possible.

Notes

We lose something if the termination condition is just "what follows the loop". anbncn example no longer works to parse an unknown n, because it unifies a with n, and how do you negate this?

Here is an expanded version of the anbncn example to illustrate. This works for parsing and generation, when n is specified. Generation is not currently expected to be possible when n is not specified, but parsing is. But this cannot parse with unspecified n. It unifies n with 0 and parses nothing.

Goal<n> ::= <. a = 0 .> Loop0<a, n>
            <. b = 0 .> Loop1<b, n>
            <. c = 0 .> Loop2<c, n>;
Loop0<a, n> ::= <. ~a = n .> "a" <. a += 1 .> Loop0<a, n>
              | <. a = n .>;
Loop1<b, n> ::= <. ~b = n .> "b" <. b += 1 .> Loop1<b, n>
              | <. b = n .>;
Loop2<c, n> ::= <. ~c = n .> "c" <. c += 1 .> Loop2<c, n>
              | <. c = n .>;

From the code, here is the actual algorithm for parsing a loop:

(a) Parse the loop body. If it was successful, adopt the new state and go back to (a). If it failed, return the current state unchanged (successfully).

In grammar format:

Foo ::= <. preconditions .> LoopBody Foo
      | <. ~preconditions .>
      ;

From the code, here is the actual algorithm for generating from a loop:

(a) Generate from the loop body. If it failed, fail too. If it was successful, adopt the new state and check if all the postconditions (constraints following the loop) are true (satisfiable). If so, return successfully. If not, go back to (a).

In grammar format:

Foo      ::= LoopBody LoopTail.
LoopTail ::= <. postconditions .>
           | <. ~postconditions .> Foo
           ;

How can we merge these?

Foo      ::= <. preconditions .> LoopBody LoopTail
           | <. ~preconditions .>
           ;
LoopTail ::= <. postconditions .>
           | <. ~postconditions .> Foo
           ;

How can we formulate a recursive branching pattern that captures the same behaviour, for both parsing and generation? Keeping in mind that branching should be deterministic (there is the possibility of extending this to nondeterminism in future but this is too complex to think about right now).

What we have right now looks like this, for parsing:

(a) Select the applicable choice based on the preconditions: either a does not unify with n or it does. Parse that branch. If the first branch, go back to (a)

What we have right now looks like this, for generation:

(a) Select the applicable choice based on the preconditions: either a does not unify with n or it does. Generate from that branch. If the first branch, go back to (a)

One thing to try to fix this, is to leverage the lookahead constraints that are implicit in a loop body which begins with a terminal (which many of them will). Termination condition (that is, the preconditions for the loop-terminating branch) will then be the negation of the pre- conditions of the loop body. What is at beginning of loop might be a terminal. As a constraint, this would be a lookahead. We would negate that.

However, during generation, a lookahead constraint is meaningless and always succeeds. So it might not help us much. In the following

Loop2<c, n> ::= <. look "c" .> "c" <. c += 1 .> Loop2<c, n>
              | <. ~look "c" .> <. c = n .>;

we can choose between the two branches when parsing, but not when generating, ~look "c" will never be true. So what do we do?

We might need to ignore lookahead more strongly in generation?

Being pessimistic here but I don't think there is any way to get around this, i.e. to support one of the postconditions being a unification (rather than a rigid test); because this relies on the current implementation of loop as incorporating a sort of ordered alternation (like PEG) where one side is always tried first, and the other (containing the unification) is only tried if the first side failed.