Update some documentation.
Chris Pressey
10 years ago
@@ -26,10 +26,20 @@
 
 $ ./guten-gutter pg18613.txt > The_Golden_Scorpion.txt
 
+History
+-------
+
+Originally, many of the experiments in this repository were importing
+gutenizer's `gutenberg` module directly. Most have been updated to assume
+that the input is plain text that has been, at your option, pre-cleaned by
+a tool of your choice. The only exception is [quick-and-dirty-markov](../quick-and-dirty-markov),
+which was a "race against the clock" and it doesn't feel right to clean it
+up after the fact.
+
 Future work
 -----------
 
-Some texts on which this currently fails are:
+Some texts on which the guten-gutter currently fails are:
 
 * Princess of Mars (no "produced" line)
 * Around the world in 80 days
@@ -47,7 +57,3 @@
 I'm sure there are other Gutenberg texts for which this fails. Whence these
 are found, this script's regular expressions should be adapted to match those
 lines.
-
-Really, the experiments in this repository should *not* be relying themselves
-on the `gutenberg.py` module, or any cleaner; they should take a pre-cleaned
-text file as input.
@@ -46,12 +46,17 @@
 
 Indeed.
 
+Related work
+------------
+
+This Python script has been translated to Javascript and has been made
+available online here: [Text Uniquifier](http://catseye.tc/installation/Text_Uniquifier).
+The Javascript version supports more options than this version, including
+retaining paragraph or line breaks in the output, and treating words
+case- and punctuation-insensitively.
+
 Future work
 -----------
 
-Maybe clean the words of punctuation too.
-
-Make it work backwards -- only output a word if it does not occur further
+Allow it to work backwards -- only output a word if it does not occur further
 on in the text. (Reverse words, uniquify, reverse again.)
-
-Write a version in Javascript so that it can be used in someone's web browser.
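
The "work backwards" item in the Future work section describes a small algorithm: reverse the words, uniquify, reverse again. A minimal Python sketch of that idea follows. The function names `uniquify` and `uniquify_backwards` are hypothetical and are not taken from the repository's script; the sketch also assumes words are compared exactly, with no case or punctuation folding.

```python
def uniquify(words):
    """Keep only the first occurrence of each word, preserving order."""
    seen = set()
    result = []
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


def uniquify_backwards(words):
    """Keep a word only if it does not occur further on in the text.

    Implemented as described in the README: reverse the words,
    uniquify, then reverse again.
    """
    return uniquify(words[::-1])[::-1]


if __name__ == "__main__":
    words = "the cat saw the dog and the dog saw the cat".split()
    print(" ".join(uniquify(words)))            # the cat saw dog and
    print(" ".join(uniquify_backwards(words)))  # and dog saw the cat
```

Reusing the forward pass on the reversed list keeps the two modes consistent: any change to how words are compared in `uniquify` applies to both directions.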