Tree @master (Download .tar.gz)
Parser combinator libraries are easy to write. That's why GitHub is full of them.
I wanted to write a particularly simple one, both to expose its structure, and to have something to experiment with.
Parc.hs is so simple that you can only build a recognizer with it.
Generally, you want to do more with the input string than say whether it
is or is not in the language, yes? So, there are more extended
versions here too.
(Before getting to the extended versions, I want to highlight one thing, which is that the choice operator in Parc is ordered choice, as used in e.g. PEG parsers. This is arguably less mathematically nice than the non-deterministic choice used in e.g. context-free grammars, but it is unarguably simpler to implement.)
This lets you write an actual parser, with an actual output value.
But even so, the facility it provides for state management is rather crude. The type of the state variable is polymorphic, so the programmer can choose any type they like for it. But every parser combinator assumes the state it will take as input is the same type as the state it will produce on output. In a strictly typed language, that can be a bit limiting. Ideally, we want to write parsers that parse different types of things, then tie them together into bigger, composite parsers.
(This isn't a problem in a dynamically typed language, but in such languages you have a different problem -- namely, the possibility of a type mismatch somewhere in the composite parser which will only become apparent at runtime.)
So we have a very small modification called
In this, the
Parser type is polymorphic on two types, one for input
and one for output.
Some code that uses this can be found in
In fact, it is virtually a drop-in replacement for
ParcSt.hs, as the type
signature is simply more general.
It does illuminate the types of the combinators somewhat though;
seq has a
type that corresponds to transitivity, while
many requires that the output
type is the same as the input type, which makes sense if you think of it as
a loop that may occur any number of times -- only an output that can be fed
back into the input would make sense there.
To allow finer manipulation of the parsing state, we extend
with an extra combinator. We can't build this combinator out of existing
combinators; it needs to access the parse state directly. Also, its
definition turns out to be somewhat more complicated. So, this is where
we start leaving "fits on a page" territory.
ParcMerge.hs module extends
ParcSt2St.hs with a
merge takes a parser P and a function F which
takes two state values and returns a third state value as input; it
produces a new parser as its output. This new parser applies P, then
produces a new state by combining the state before P, and the state produced
by P, using F.
ParcMergeDemo.hs demonstrates how this
combinator can be used to apply arithmetic operators like
when parsing an arithmetic expression.
With what we have so far, if we merely accumulate state as we parse, we can parse only context-free languages.
But if we enable the parser to succeed or fail based on some predicate applied to the parsing state, we can parse context-sensitive languages.
ParcSt2St and adds a
assert that takes such a predicate on the parsing
state, and produces a parser which succeeds only when that predicate
is true on the current parsing state, failing otherwise.
The demo module gives an example of a parser for a classic
context-free language: strings of the form
This is where it starts getting markedly experimental. Feel free to stop reading now.
ParcAssert, we can parse context-sensitive languages. But
if we don't restrict ourselves to passing sufficiently simple predicates
assert, we can parse languages quite beyond the context-sensitive.
For example, we could use a predicate that checks if the
string passed to it is a valid sentence in Presburger arithmetic, or
even a valid computation trace of a Turing machine.
Formulating a set of parser combinators which is provably capable of parsing only the context-sensitive languages seems like a hard problem. One could probably create combinators that can describe any context-sensitive grammar (CSG), or an equivalent formalism such as a noncontracting grammar. But these formalisms are actually extremely inefficient ways to parse most context-sensitive languages.
We can take some small steps though.
First, we can establish a proviso that we only ever expect the functions that are passed to our higher-order combinators to do a small amount of computation. For concreteness, say polynomial time. (We could in fact assemble a library of "state manipulation combinators" and ensure those are the only ones that can be used, to enforce this.)
But that by itself isn't enough. If we allow combinators that recurse or loop without consuming input, we can still perform arbitrarily large amounts of computation.
So we get rid of
update, and instead of
use require the use of
pred, which always consumes a character before it
checks if it should succeed (possibly modifying the state) or fail.
And the result is
ParcConsume.hs, which is a lot
ParcSt2St, just with some things ripped out of it.
ParcConsumeDemo.hs shows that it can recognize
a context-sensitive language, and that the means by which it does so
relies on it consuming its input; and the lack of other means of manipulating
the state should be persuasive that it is limited in its ability to
make these calculations. Exactly how limited, though? I'm not
quite prepared to make a claim on that yet.
git clone https://git.catseye.tc/Parc/
- Add MIT License. Chris Pressey 5 months ago
- Add "see-also bar" to README. Chris Pressey 5 months ago
- Fix typo. Chris Pressey 5 months ago
- Small edits to the ParcConsume section. Chris Pressey 5 months ago
- Mention ordered vs. non-deterministic choice. Chris Pressey 5 months ago
- Add some documentation for `ParcConsume`. Chris Pressey 8 months ago
- Add `ParcConsume` module and a demo for it. Chris Pressey 8 months ago
- Add documentation for `ParcAssertDemo.hs`. Chris Pressey 8 months ago
- Add demo of an "assert" combinator, parsing CSLs. Needs docs. Chris Pressey 8 months ago
- Show how there are two approaches to backtracking with this. Chris Pressey 8 months ago