git @ Cat's Eye Technologies NaNoGenLab / cc4c6b4
Update some documentation. Chris Pressey 10 years ago
2 changed file(s) with 21 addition(s) and 10 deletion(s).
@@ -26,10 +26,20 @@
 
 $ ./guten-gutter pg18613.txt > The_Golden_Scorpion.txt
 
+History
+-------
+
+Originally, many of the experiments in this repository were importing
+gutenizer's `gutenberg` module directly. Most have been updated to assume
+that the input is plain text that has been, at your option, pre-cleaned by
+a tool of your choice. The only exception is [quick-and-dirty-markov](../quick-and-dirty-markov),
+which was a "race against the clock" and it doesn't feel right to clean it
+up after the fact.
+
 Future work
 -----------
 
-Some texts on which this currently fails are:
+Some texts on which the guten-gutter currently fails are:
 
 * Princess of Mars (no "produced" line)
 * Around the world in 80 days
@@ -47,7 +57,3 @@
 I'm sure there are other Gutenberg texts for which this fails. Whence these
 are found, this script's regular expressions should be adapted to match those
 lines.
-
-Really, the experiments in this repository should *not* be relying themselves
-on the `gutenberg.py` module, or any cleaner; they should take a pre-cleaned
-text file as input.
@@ -46,12 +46,17 @@
 
 Indeed.
 
+Related work
+------------
+
+This Python script has been translated to Javascript and has been made
+available online here: [Text Uniquifier](http://catseye.tc/installation/Text_Uniquifier).
+The Javascript version supports more options than this version, including
+retaining paragraph or line breaks in the output, and treating words
+case- and punctuation-insensitively.
+
 Future work
 -----------
 
-Maybe clean the words of punctuation too.
-
-Make it work backwards -- only output a word if it does not occur further
+Allow it to work backwards -- only output a word if it does not occur further
 on in the text. (Reverse words, uniquify, reverse again.)
-
-Write a version in Javascript so that it can be used in someone's web browser.
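
The "work backwards" idea in the second file's Future work section (reverse words, uniquify, reverse again) could be sketched roughly as below. This is only an illustration, not code from the repository: the function names `uniquify` and `uniquify_backwards` are hypothetical, and splitting on whitespace with no case or punctuation handling is an assumption.

```python
def uniquify(words):
    """Keep only the first occurrence of each word, preserving order."""
    seen = set()
    kept = []
    for word in words:
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


def uniquify_backwards(words):
    """Keep a word only if it does not occur further on in the text.

    Implemented as described in the notes: reverse the words,
    uniquify, then reverse the result again.
    """
    return uniquify(words[::-1])[::-1]


# Assumption: words are simply whitespace-separated tokens.
words = "the cat saw the dog and the cat ran".split()
print(" ".join(uniquify(words)))            # the cat saw dog and ran
print(" ".join(uniquify_backwards(words)))  # saw dog and the cat ran
```

The forward pass keeps the first occurrence of each word, while the backward variant keeps the last one, so the two outputs contain the same words in different positions.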