Turn `complete` into a strategy that must be explicitly selected.
Chris Pressey
2 months ago
0 | 0 | `relwrite` |
1 | 1 | ========== |
2 | 2 | |
3 | `relwrite` relates strings to string via a grammar in the Chomsky hierarchy. | |
3 | `relwrite` relates strings to strings via a grammar in the Chomsky hierarchy. | |
4 | 4 | |
5 | 5 | What does "relate" mean in this context? |
6 | 6 | |
19 | 19 | best on small inputs. |
20 | 20 | |
21 | 21 | There are, however, features intended to improve performance in the case of very |
22 | long derivations. Specifying a search strategy enables a **beam search** algorithm | |
22 | long derivations. Search strategies can be used to enable a **beam search** algorithm | |
23 | 23 | which aggressively focuses on derivations with a desired propery, e.g. a particular |
24 | 24 | minimum length. This does sacrifice completeness however -- only a handful of all |
25 | 25 | the possible results will be returned. |
29 | 29 | |
30 | 30 | ### Example usage |
31 | 31 | |
32 | Generate a string from a non-terminal in a grammar: | |
32 | Generate a string from a starting non-terminal in a grammar: | |
33 | 33 | |
34 | 34 | ``` |
35 | ./bin/relwrite eg/recursive-grammar.json --start "<Sentence>" --max-derivations=1 | |
35 | ./bin/relwrite complete eg/recursive-grammar.json \ | |
36 | --start "<Sentence>" --max-derivations=1 | |
36 | 37 | ``` |
37 | 38 | |
38 | 39 | Parse a string w.r.t. a grammar: |
39 | 40 | |
40 | 41 | ``` |
41 | ./bin/relwrite eg/recursive-grammar.json --parse --start "a penguin sees a penguin then sees a dog" | |
42 | ./bin/relwrite complete eg/recursive-grammar.json \ | |
43 | --parse --start "a penguin sees a penguin then sees a dog" \ | |
44 | --goal "<Sentence>" | |
42 | 45 | ``` |
43 | 46 | |
44 | Generate a really long string from a non-terminal in a grammar, without running out | |
45 | of memory and only taking a few hours of processor time: | |
47 | Use the `complete` strategy to generate all possible strings from a | |
48 | starting non-terminal in a grammar. NOTE that this can use unreasonable | |
49 | amounts of resources, with possibly adverse effects on your system. | |
46 | 50 | |
47 | 51 | ``` |
48 | ./bin/relwrite eg/recursive-grammar.json --start "<Sentence>" \ | |
49 | --max-derivations=1 --strategy=expand --expand-until=3000 \ | |
52 | ./bin/relwrite complete eg/sample-grammar.json --start "<Sentence>" | |
53 | ``` | |
54 | ||
55 | Use the `expand` strategy to generate a really long string from a non-terminal | |
56 | in a grammar, without running out of memory and only taking a few hours of | |
57 | processor time: | |
58 | ||
59 | ``` | |
60 | ./bin/relwrite expand eg/recursive-grammar.json \ | |
61 | --start "<Sentence>" --max-derivations=1 --expand-until=3000 \ | |
50 | 62 | --output-file=out.json |
51 | 63 | ``` |
52 | 64 | |
53 | Parse a really long string from a non-terminal in a grammar, without running out | |
54 | of memory and only taking a few hours of processor time. This assumes the string | |
55 | to be parsed is in JSON format in the file `xyz.json`. | |
65 | Use the `contract` strategy to parse a really long string from a non-terminal | |
66 | in a grammar, without running out of memory and only taking a few hours of | |
67 | processor time. This assumes the string to be parsed is in JSON format in | |
68 | the file `out.json` -- the generation example above would produce this. | |
56 | 69 | |
57 | 70 | ``` |
58 | ./bin/relwrite eg/recursive-grammar.json --parse --start-set-file=xyz.json --max-derivations=1 --strategy=contract | |
71 | ./bin/relwrite contract eg/recursive-grammar.json \ | |
72 | --parse --start-set-file=out.json | |
59 | 73 | ``` |
60 | 74 | |
61 | 75 | ### Detailed usage |
68 | 82 | `relwrite` uses the term "derivation" as a generic term meaning "a parse or a generated utterance". |
69 | 83 | It also uses the term "utterance" to mean "any string of terminals and non-terminals". |
70 | 84 | |
71 | ### TODO (immediate) | |
72 | ||
73 | * Turn `complete` into a strategy that must be explicitly selected. | |
74 | ||
75 | ### TODO (aspirational) | |
85 | ### TODO | |
76 | 86 | |
77 | 87 | Analyze the input grammar and classify it in the Chomsky hierarchy. |
78 | 88 |
40 | 40 | def derive( |
41 | 41 | rules, |
42 | 42 | working_utterances, |
43 | strategy, | |
43 | 44 | max_derivations=None, |
44 | 45 | max_matches=None, |
45 | 46 | verbose=False, |
46 | 47 | save_snapshots_every=None, |
47 | strategy=None, | |
48 | 48 | expand_until=None, |
49 | 49 | beam_width=10 |
50 | 50 | ): |
54 | 54 | iter = 0 |
55 | 55 | |
56 | 56 | scoring_functions = { |
57 | 'complete': None, | |
57 | 58 | 'expand': lambda u: 0 - len(u), |
58 | 59 | 'contract': lambda u: len(u), |
59 | 60 | 'minimize-nonterminals': lambda u: sum(map(lambda s: s.startswith('<'), u)), |
84 | 85 | working_utterances, final_utterances = generate(rules, working_utterances, max_matches=max_matches) |
85 | 86 | |
86 | 87 | # beam search: sort by score and trim before continuing |
87 | if strategy: | |
88 | working_utterances = sorted(working_utterances, key=scoring_functions[strategy])[:beam_width] | |
88 | scoring_function = scoring_functions[strategy] | |
89 | if scoring_function: | |
90 | working_utterances = sorted(working_utterances, key=scoring_function)[:beam_width] | |
89 | 91 | |
90 | 92 | for utterance in final_utterances: |
91 | 93 | collected_utterances.append(utterance) |
7 | 7 | def main(args): |
8 | 8 | argparser = ArgumentParser() |
9 | 9 | |
10 | # NOTE: these options are provisional and will change | |
10 | # NOTE: these options are somewhat provisional and may change | |
11 | ||
12 | # Strategy | |
13 | ||
14 | argparser.add_argument( | |
15 | "strategy", metavar='STRATEGY', type=str, | |
16 | help="apply this strategy to the search; must be one of " | |
17 | "`complete`, `expand`, or `contract`. NOTE: while " | |
18 | "`complete` will find all parses, it will also use " | |
19 | "the most resources, with possibly adverse effects " | |
20 | "on your system; the other strategies use " | |
21 | "beam search to avoid this" | |
22 | ) | |
11 | 23 | |
12 | 24 | # Input/output specifying parameters |
13 | 25 | |
25 | 37 | |
26 | 38 | argparser.add_argument( |
27 | 39 | "--parse", action="store_true", default=False, |
28 | help="Process rules from right to left" | |
40 | help="process rules from right to left" | |
29 | 41 | ) |
30 | 42 | |
31 | 43 | argparser.add_argument( |
32 | 44 | "--start", metavar='UTTERANCE', type=str, default=None, |
33 | help="A single utterance to use as " | |
45 | help="a single utterance to use as " | |
34 | 46 | "the starting point of the derivation" |
35 | 47 | ) |
36 | 48 | argparser.add_argument( |
37 | 49 | "--start-set-file", metavar='FILENAME', type=str, default=None, |
38 | help="Use the set of utterances in this JSON file as " | |
50 | help="use the set of utterances in this JSON file as " | |
39 | 51 | "the starting point of the derivation" |
40 | 52 | ) |
41 | 53 | argparser.add_argument( |
42 | 54 | "--goal", metavar='UTTERANCE', type=str, default=None, |
43 | help="A single utterance; if given, the processor expects it " | |
55 | help="a single utterance; if given, the processor expects it " | |
44 | 56 | "to be the final result of the derivation; if it is not, " |
45 | 57 | "exits with a non-zero error code" |
46 | 58 | ) |
47 | 59 | |
48 | 60 | argparser.add_argument( |
49 | 61 | "--max-derivations", metavar='COUNT', type=int, default=None, |
50 | help="The maximum number of derivations to produce " | |
62 | help="the maximum number of derivations to produce " | |
51 | 63 | "(default: no limit)" |
52 | 64 | ) |
53 | 65 | argparser.add_argument( |
54 | 66 | "--max-rewrites-per-utterance", metavar='COUNT', type=int, default=None, |
55 | help="If given, limits the number of times a pattern can rewrite " | |
67 | help="if given, limits the number of times a pattern can rewrite " | |
56 | 68 | "any particular utterance during a single sweep " |
57 | 69 | "(default: no limit, unless beam search is applied, in which case 10)" |
58 | 70 | ) |
59 | 71 | |
60 | argparser.add_argument( | |
61 | "--strategy", metavar='STRATEGY', type=str, default=None, | |
62 | help="Will apply a particular strategy (`expand` or `contract`) " | |
63 | "under beam search" | |
64 | ) | |
65 | 72 | argparser.add_argument( |
66 | 73 | "--beam-width", metavar='SIZE', type=int, default=10, |
67 | 74 | help="When traversing with a strategy, specify the beam width " |
107 | 114 | working_utterances = [] |
108 | 115 | |
109 | 116 | max_matches = options.max_rewrites_per_utterance |
110 | if options.strategy: | |
117 | if options.strategy != 'complete': | |
111 | 118 | max_matches = max_matches or 10 |
112 | 119 | |
113 | 120 | result = derive( |
114 | 121 | rules, |
115 | 122 | working_utterances, |
123 | options.strategy, | |
116 | 124 | max_derivations=options.max_derivations, |
117 | 125 | max_matches=max_matches, |
118 | 126 | verbose=options.verbose, |
119 | 127 | save_snapshots_every=options.save_snapshots_every, |
120 | strategy=options.strategy, | |
121 | 128 | expand_until=options.expand_until, |
122 | 129 | beam_width=options.beam_width, |
123 | 130 | ) |