git @ Cat's Eye Technologies relwrite / master
Turn `complete` into a strategy that must be explicitly selected. Chris Pressey 2 months ago
3 changed file(s) with 54 addition(s) and 35 deletion(s). Raw diff Collapse all Expand all
00 `relwrite`
11 ==========
22
3 `relwrite` relates strings to string via a grammar in the Chomsky hierarchy.
3 `relwrite` relates strings to strings via a grammar in the Chomsky hierarchy.
44
55 What does "relate" mean in this context?
66
1919 best on small inputs.
2020
2121 There are, however, features intended to improve performance in the case of very
22 long derivations. Specifying a search strategy enables a **beam search** algorithm
22 long derivations. Search strategies can be used to enable a **beam search** algorithm
2323 which aggressively focuses on derivations with a desired propery, e.g. a particular
2424 minimum length. This does sacrifice completeness however -- only a handful of all
2525 the possible results will be returned.
2929
3030 ### Example usage
3131
32 Generate a string from a non-terminal in a grammar:
32 Generate a string from a starting non-terminal in a grammar:
3333
3434 ```
35 ./bin/relwrite eg/recursive-grammar.json --start "<Sentence>" --max-derivations=1
35 ./bin/relwrite complete eg/recursive-grammar.json \
36 --start "<Sentence>" --max-derivations=1
3637 ```
3738
3839 Parse a string w.r.t. a grammar:
3940
4041 ```
41 ./bin/relwrite eg/recursive-grammar.json --parse --start "a penguin sees a penguin then sees a dog"
42 ./bin/relwrite complete eg/recursive-grammar.json \
43 --parse --start "a penguin sees a penguin then sees a dog" \
44 --goal "<Sentence>"
4245 ```
4346
44 Generate a really long string from a non-terminal in a grammar, without running out
45 of memory and only taking a few hours of processor time:
47 Use the `complete` strategy to generate all possible strings from a
48 starting non-terminal in a grammar. NOTE that this can use unreasonable
49 amounts of resources, with possibly adverse effects on your system.
4650
4751 ```
48 ./bin/relwrite eg/recursive-grammar.json --start "<Sentence>" \
49 --max-derivations=1 --strategy=expand --expand-until=3000 \
52 ./bin/relwrite complete eg/sample-grammar.json --start "<Sentence>"
53 ```
54
55 Use the `expand` strategy to generate a really long string from a non-terminal
56 in a grammar, without running out of memory and only taking a few hours of
57 processor time:
58
59 ```
60 ./bin/relwrite expand eg/recursive-grammar.json \
61 --start "<Sentence>" --max-derivations=1 --expand-until=3000 \
5062 --output-file=out.json
5163 ```
5264
53 Parse a really long string from a non-terminal in a grammar, without running out
54 of memory and only taking a few hours of processor time. This assumes the string
55 to be parsed is in JSON format in the file `xyz.json`.
65 Use the `contract` strategy to parse a really long string from a non-terminal
66 in a grammar, without running out of memory and only taking a few hours of
67 processor time. This assumes the string to be parsed is in JSON format in
68 the file `out.json` -- the generation example above would produce this.
5669
5770 ```
58 ./bin/relwrite eg/recursive-grammar.json --parse --start-set-file=xyz.json --max-derivations=1 --strategy=contract
71 ./bin/relwrite contract eg/recursive-grammar.json \
72 --parse --start-set-file=out.json
5973 ```
6074
6175 ### Detailed usage
6882 `relwrite` uses the term "derivation" as a generic term meaning "a parse or a generated utterance".
6983 It also uses the term "utterance" to mean "any string of terminals and non-terminals".
7084
71 ### TODO (immediate)
72
73 * Turn `complete` into a strategy that must be explicitly selected.
74
75 ### TODO (aspirational)
85 ### TODO
7686
7787 Analyze the input grammar and classify it in the Chomsky hierarchy.
7888
4040 def derive(
4141 rules,
4242 working_utterances,
43 strategy,
4344 max_derivations=None,
4445 max_matches=None,
4546 verbose=False,
4647 save_snapshots_every=None,
47 strategy=None,
4848 expand_until=None,
4949 beam_width=10
5050 ):
5454 iter = 0
5555
5656 scoring_functions = {
57 'complete': None,
5758 'expand': lambda u: 0 - len(u),
5859 'contract': lambda u: len(u),
5960 'minimize-nonterminals': lambda u: sum(map(lambda s: s.startswith('<'), u)),
8485 working_utterances, final_utterances = generate(rules, working_utterances, max_matches=max_matches)
8586
8687 # beam search: sort by score and trim before continuing
87 if strategy:
88 working_utterances = sorted(working_utterances, key=scoring_functions[strategy])[:beam_width]
88 scoring_function = scoring_functions[strategy]
89 if scoring_function:
90 working_utterances = sorted(working_utterances, key=scoring_function)[:beam_width]
8991
9092 for utterance in final_utterances:
9193 collected_utterances.append(utterance)
77 def main(args):
88 argparser = ArgumentParser()
99
10 # NOTE: these options are provisional and will change
10 # NOTE: these options are somewhat provisional and may change
11
12 # Strategy
13
14 argparser.add_argument(
15 "strategy", metavar='STRATEGY', type=str,
16 help="apply this strategy to the search; must be one of "
17 "`complete`, `expand`, or `contract`. NOTE: while "
18 "`complete` will find all parses, it will also use "
19 "the most resources, with possibly adverse effects "
20 "on your system; the other strategies use "
21 "beam search to avoid this"
22 )
1123
1224 # Input/output specifying parameters
1325
2537
2638 argparser.add_argument(
2739 "--parse", action="store_true", default=False,
28 help="Process rules from right to left"
40 help="process rules from right to left"
2941 )
3042
3143 argparser.add_argument(
3244 "--start", metavar='UTTERANCE', type=str, default=None,
33 help="A single utterance to use as "
45 help="a single utterance to use as "
3446 "the starting point of the derivation"
3547 )
3648 argparser.add_argument(
3749 "--start-set-file", metavar='FILENAME', type=str, default=None,
38 help="Use the set of utterances in this JSON file as "
50 help="use the set of utterances in this JSON file as "
3951 "the starting point of the derivation"
4052 )
4153 argparser.add_argument(
4254 "--goal", metavar='UTTERANCE', type=str, default=None,
43 help="A single utterance; if given, the processor expects it "
55 help="a single utterance; if given, the processor expects it "
4456 "to be the final result of the derivation; if it is not, "
4557 "exits with a non-zero error code"
4658 )
4759
4860 argparser.add_argument(
4961 "--max-derivations", metavar='COUNT', type=int, default=None,
50 help="The maximum number of derivations to produce "
62 help="the maximum number of derivations to produce "
5163 "(default: no limit)"
5264 )
5365 argparser.add_argument(
5466 "--max-rewrites-per-utterance", metavar='COUNT', type=int, default=None,
55 help="If given, limits the number of times a pattern can rewrite "
67 help="if given, limits the number of times a pattern can rewrite "
5668 "any particular utterance during a single sweep "
5769 "(default: no limit, unless beam search is applied, in which case 10)"
5870 )
5971
60 argparser.add_argument(
61 "--strategy", metavar='STRATEGY', type=str, default=None,
62 help="Will apply a particular strategy (`expand` or `contract`) "
63 "under beam search"
64 )
6572 argparser.add_argument(
6673 "--beam-width", metavar='SIZE', type=int, default=10,
6774 help="When traversing with a strategy, specify the beam width "
107114 working_utterances = []
108115
109116 max_matches = options.max_rewrites_per_utterance
110 if options.strategy:
117 if options.strategy != 'complete':
111118 max_matches = max_matches or 10
112119
113120 result = derive(
114121 rules,
115122 working_utterances,
123 options.strategy,
116124 max_derivations=options.max_derivations,
117125 max_matches=max_matches,
118126 verbose=options.verbose,
119127 save_snapshots_every=options.save_snapshots_every,
120 strategy=options.strategy,
121128 expand_until=options.expand_until,
122129 beam_width=options.beam_width,
123130 )