Merge branch 'master' into develop-0.4
Chris Pressey
1 year, 6 months ago
210 | 210 | |
211 | 211 | The fact that every CSL can be captured by an LBA suggests maybe we |
212 | 212 | could just linear-bound the amount of storage used by a Fountain |
213 | grammar in the worst case. Maybe; I haven't thought much about | |
214 | this yet. My initial impression is that it seems a bit artificial. | |
215 | ||
216 | Thinking about it a bit more, to make this work for both parsing | |
217 | and generation, what we really need to do is to show the context | |
218 | and the string (whether it be the input for parsing or the result | |
219 | of generation) are related in size linearly -- that one is never | |
220 | more than _k_ times bigger than the other, where _k_ is a constant | |
221 | for the grammar). It might be feasible to do that in simple cases. | |
222 | My intuition at the moment is that it surely breaks down at some | |
223 | point, but it's not immediately clear where that point begins. | |
213 | grammar in the worst case. | |
214 | ||
215 | Having thought about it, this is probably the way to go. When | |
216 | processing (parsing or generating) a Fountain grammar, the user ought | |
217 | to be able to specify a "fuel efficiency" _E_, which is the linear bound. | |
218 | ||
219 | (Whether this is specified in the source file, or through some other | |
220 | means like a command-line option, is immaterial for the present | |
221 | purposes. Presumably, though, its omission doesn't stop us from | |
222 | processing the grammar; we simply don't make the check in this case.) | |
223 | ||
224 | Each time a character is consumed from the input (resp. generated to | |
225 | the output), _E_ units of "fuel" are gained. Each time a new unit of | |
226 | storage is allocated for storing the context used by the grammar, | |
227 | one unit of "fuel" is expended. Expending more fuel than has been | |
228 | accumulated so far results in some kind of warning or error condition | |
229 | (the salient thing being that the user is made aware that this grammar | |
230 | exceeds the linear bound). | |
231 | ||
232 | It should be noted that the integers are unbounded, so an | |
233 | operation like `a += 1` may or may not allocate a new unit of | |
234 | storage (a machine word, say), so the usage needs to be recalculated | |
235 | afterwards. | |
236 | ||
237 | Freeing up storage does not allow the grammar to reclaim "fuel". | |
238 | ||
239 | This check could probably be done statically, using some kind of | |
240 | abstract interpretation; but it would also be possible (and probably | |
241 | a lot easier) to add it as a dynamic check while processing the | |
242 | grammar. | |
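As a rough illustration of the dynamic check described above, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the Fountain implementation: the names `FuelMeter`, `FuelExceeded`, `storage_units`, and the choice of a 64-bit machine word as the unit of storage are all assumptions made for the example. It gains _E_ units of fuel per character, recalculates an integer's storage after each update, charges only for growth, and gives no refund when storage shrinks.

```python
WORD_BITS = 64  # assumption: one "unit of storage" is a 64-bit machine word


class FuelExceeded(Exception):
    """Signals that the grammar has exceeded its linear bound."""


def storage_units(value):
    """Words needed to hold an unbounded integer (at least one)."""
    bits = abs(value).bit_length()
    return max(1, -(-bits // WORD_BITS))  # ceiling division


class FuelMeter:
    """Hypothetical fuel accounting for context storage."""

    def __init__(self, efficiency):
        self.efficiency = efficiency  # the linear bound E
        self.fuel = 0                 # fuel gained minus fuel expended
        self.usage = {}               # variable name -> units currently held

    def on_char(self):
        """Call once per character consumed (resp. generated)."""
        self.fuel += self.efficiency

    def on_store(self, name, value):
        """Call after each update to a context variable."""
        new_units = storage_units(value)
        old_units = self.usage.get(name, 0)
        self.usage[name] = new_units
        grown = new_units - old_units
        if grown > 0:  # shrinking storage gives no fuel back
            self.fuel -= grown
            if self.fuel < 0:
                raise FuelExceeded(
                    f"storing {name!r} needs more fuel than has been gained"
                )
```

With _E_ = 2, one character of input pays for two units of storage. A variable that grows to two words, shrinks back to one, and then grows again is charged for the second growth too, so the last store in that sequence exceeds the bound. Whether an implementation raises an error or only warns is left open here, as it is in the text.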
224 | 243 | |
225 | 244 | #### Does all this talk of complexity classes even mean anything? |
226 | 245 | |
315 | 334 | When generating from a grammar, we often want to take a "random sample" |
316 | 335 | of the space of utterances that the grammar defines. There are methods |
317 | 336 | that have been developed to do this; not just for grammars, but any |
318 | recursive description of a structure; for example [Boltzmann Samplers][]. | |
337 | recursive description of a structure; for example [Boltzmann Samplers][] | |
338 | (PDF). | |
319 | 339 | |
320 | 340 | We should probably go in this direction. |
321 | 341 | |
364 | 384 | [Exanoke]: https://catseye.tc/node/Exanoke |
365 | 385 | [Tamsin]: https://catseye.tc/node/Tamsin |
366 | 386 | [Tandem]: https://catseye.tc/node/Tandem |
367 | [Boltzmann Samplers]: https://github.com/cpressey/Some-Papers-I-Really-Liked#boltzmann-samplers-for-the-random-generation-of-combinatorial-structures | |
368 | [ambinate.py]: https://gist.github.com/cpressey/dd3f63eda91b33e429fa | |
387 | [Boltzmann Samplers]: https://algo.inria.fr/flajolet/Publications/DuFlLoSc04.pdf | |
388 | [ambinate.py]: https://codeberg.org/catseye/Dipple/src/branch/master/python/ambinate.py |