Needs more science, I said!
Chris Pressey
10 years ago
0 | 0 |
ending-concordance
|
1 | 1 |
==================
|
2 | 2 |
|
3 | |
Requirements
|
4 | |
------------
|
|
3 |
Hypothesis
|
|
4 |
----------
|
|
5 |
|
|
6 |
We hypothesize that words in English can be roughly categorized using a
|
|
7 |
characteristic as simple as the pair of letters that they end with, and
|
|
8 |
that this can be exploited to form sentences which look almost plausible.
|
|
9 |
|
|
10 |
Apparatus
|
|
11 |
---------
|
5 | 12 |
|
6 | 13 |
* Python 2.7.6 (probably works with older versions too)
|
7 | 14 |
* The `gutenberg.py` module from [gutenizer](https://github.com/okfn/gutenizer/)
|
8 | 15 |
* A bunch of Project Gutenberg texts in plain text format
|
9 | 16 |
|
10 | |
Basic Strategy
|
11 | |
--------------
|
|
17 |
Method
|
|
18 |
------
|
12 | 19 |
|
13 | 20 |
* Read in all the words.
|
14 | 21 |
* Index the words based on the final two letters in each word that is four or
|
15 | 22 |
more letters long.
|
16 | 23 |
* Write out words randomly chosen from two alternating end-two-letters groups.
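
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names `index_by_ending` and `alternate_endings` are hypothetical, the regular-expression tokenizer is an assumption, and "two alternating groups" is read here as "pick two ending groups at random and alternate between them". It is written to run on Python 2.7 or 3.

```python
import random
import re
from collections import defaultdict


def index_by_ending(text, min_length=4):
    """Group words of at least min_length letters by their final two letters."""
    groups = defaultdict(list)
    # Assumption: a "word" is a run of ASCII letters, compared in lowercase.
    for word in re.findall(r"[a-z]+", text.lower()):
        if len(word) >= min_length:
            groups[word[-2:]].append(word)
    return groups


def alternate_endings(groups, count, rng=random):
    """Return count words, alternating between two randomly chosen groups."""
    # Needs at least two distinct endings in the index.
    first, second = rng.sample(sorted(groups), 2)
    return [
        rng.choice(groups[first if i % 2 == 0 else second])
        for i in range(count)
    ]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing output across input texts.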
|
17 | 24 |
|
18 | |
Sample Output
|
19 | |
-------------
|
|
25 |
Observations
|
|
26 |
------------
|
20 | 27 |
|
21 | 28 |
When run on _Principles of Scientific Management_, I got:
|
22 | 29 |
|
0 | 0 |
evaporating-text
|
1 | 1 |
================
|
2 | 2 |
|
3 | |
Requirements
|
4 | |
------------
|
|
3 |
Hypothesis
|
|
4 |
----------
|
|
5 |
|
|
6 |
We hypothesize that a novel, under the right circumstances, can evaporate.
|
|
7 |
|
|
8 |
Apparatus
|
|
9 |
---------
|
5 | 10 |
|
6 | 11 |
* Python 2.7.6 (probably works with older versions too)
|
7 | 12 |
* The `gutenberg.py` module from [gutenizer](https://github.com/okfn/gutenizer/)
|
8 | 13 |
* An input text (possibly from Project Gutenberg)
|
9 | 14 |
|
10 | |
Basic Strategy
|
11 | |
--------------
|
|
15 |
Method
|
|
16 |
------
|
12 | 17 |
|
13 | 18 |
* Collect all the sentences and count them: _s_ is the number of sentences.
|
14 | 19 |
* In each sentence, erase words. The probability of a word being erased
|
15 | 20 |
is _n_/_s_ where _n_ is the sentence number; the first sentence is
|
16 | 21 |
numbered 0.
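
A minimal sketch of this erasure rule, under some assumptions: the function name `evaporate` is hypothetical, sentences are split naively on terminal punctuation rather than with `gutenberg.py`, and an "erased" word is simply dropped (the real script may well leave a gap instead). It is written to run on Python 2.7 or 3.

```python
import random
import re


def evaporate(text, rng=random):
    """Erase each word of sentence n (0-based) with probability n/s."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [x for x in re.split(r"(?<=[.!?])\s+", text.strip()) if x]
    s = len(sentences)
    result = []
    for n, sentence in enumerate(sentences):
        # float() keeps the division exact on Python 2 as well.
        chance = float(n) / s
        kept = [w for w in sentence.split() if rng.random() >= chance]
        result.append(" ".join(kept))
    return result
```

Because the first sentence is numbered 0, it always survives intact; the last sentence loses each word with probability (_s_ - 1)/_s_.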
|
17 | 22 |
|
18 | |
Sample Output
|
19 | |
-------------
|
|
23 |
Observations
|
|
24 |
------------
|
20 | 25 |
|
21 | 26 |
When run on Voltaire's "Candide": at the beginning...
|
22 | 27 |
|
0 | 0 |
infix-neologisms
|
1 | 1 |
================
|
2 | 2 |
|
3 | |
Requirements
|
4 | |
------------
|
|
3 |
Hypothesis
|
|
4 |
----------
|
|
5 |
|
|
6 |
We hypothesize that new words can be formed from existing words by splitting
|
|
7 |
them open and sticking a word inside.
|
|
8 |
|
|
9 |
Apparatus
|
|
10 |
---------
|
5 | 11 |
|
6 | 12 |
* Python 2.7.6 (probably works with older versions too)
|
7 | 13 |
* A set of input words
|
8 | 14 |
|
9 | |
Basic Strategy
|
10 | |
--------------
|
|
15 |
Method
|
|
16 |
------
|
11 | 17 |
|
12 | 18 |
* Pick an input word.
|
13 | 19 |
* Split it into two parts. Pick another input word and insert it in
|
|
15 | 21 |
* Possibly repeat step #2.
|
16 | 22 |
* Output the word and repeat from step #1.
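
A minimal sketch of one pass through these steps. The names `infix` and `neologism` and the `repeat_chance` parameter are hypothetical, and the split point is assumed to be a random interior position; every input word is assumed to have at least two letters. It is written to run on Python 2.7 or 3.

```python
import random


def infix(word, insert, rng=random):
    """Split word at a random interior point and put insert in the gap."""
    # Assumption: len(word) >= 2, so there is an interior point to cut at.
    cut = rng.randint(1, len(word) - 1)
    return word[:cut] + insert + word[cut:]


def neologism(words, repeat_chance=0.25, rng=random):
    """Build one new word, possibly repeating the insertion step."""
    word = rng.choice(words)
    while True:
        word = infix(word, rng.choice(words), rng)
        # "Possibly repeat step #2": stop unless the dice say otherwise.
        if rng.random() >= repeat_chance:
            return word
```

Calling `neologism` in a loop corresponds to the final step, "output the word and repeat from step #1".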
|
17 | 23 |
|
18 | |
Sample Output
|
19 | |
-------------
|
|
24 |
Observations
|
|
25 |
------------
|
20 | 26 |
|
21 | 27 |
Running it on `../generic-corpora/containers.txt` which I just threw together
|
22 | 28 |
after a few internet searches, you might get
|
0 | 0 |
naive-cut-up
|
1 | 1 |
============
|
2 | 2 |
|
3 | |
Requirements
|
4 | |
------------
|
|
3 |
Hypothesis
|
|
4 |
----------
|
|
5 |
|
|
6 |
We hypothesize that if we cut up a newspaper. We also hypothesize that
|
|
7 |
we cut up a newspaper.
|
|
8 |
|
|
9 |
Apparatus
|
|
10 |
---------
|
5 | 11 |
|
6 | 12 |
* Python 2.7.6 (probably works with older versions too)
|
7 | 13 |
* [Pillow](http://python-pillow.github.io/) (it might work with PIL too)
|
8 | 14 |
* Some scanned images of newspapers, books, etc., in PNG format, for example
|
9 | 15 |
obtained by [fetch-chronam](../fetch-chronam/)
|
10 | 16 |
|
11 | |
Basic Strategy
|
12 | |
--------------
|
|
17 |
Method
|
|
18 |
------
|
13 | 19 |
|
14 | 20 |
* Start with "blank" canvas. For simplicity, we actually use one of the
|
15 | 21 |
input images as the "canvas".
|
|
17 | 23 |
* Copy the image within the rectangle to a random location on the canvas.
|
18 | 24 |
* Repeat from step 2 until we guess we've covered the canvas.
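
The loop can be illustrated without any imaging library. The sketch below is a toy stand-in, not the Pillow-based script: "images" are equal-sized grids of characters, the function name `cut_up` is hypothetical, choosing a random rectangle from a random source is an assumption about the step not shown here, and a fixed `steps` count stands in for guessing when the canvas is covered. It is written to run on Python 2.7 or 3.

```python
import random


def cut_up(sources, steps, rng=random):
    """Paste random rectangles from the source grids onto a canvas."""
    # As in the method, one of the inputs doubles as the "blank" canvas.
    canvas = [list(row) for row in sources[0]]
    height, width = len(canvas), len(canvas[0])
    for _ in range(steps):
        src = rng.choice(sources)
        # Assumed step: pick a rectangle size and a position in the source.
        h = rng.randint(1, height)
        w = rng.randint(1, width)
        sy = rng.randint(0, height - h)
        sx = rng.randint(0, width - w)
        # Pick a random destination where the rectangle still fits.
        dy = rng.randint(0, height - h)
        dx = rng.randint(0, width - w)
        for r in range(h):
            canvas[dy + r][dx:dx + w] = src[sy + r][sx:sx + w]
    return ["".join(row) for row in canvas]
```

With real scans, the same structure would presumably use image crop and paste operations in place of the slice assignment.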
|
19 | 25 |
|
20 | |
Usage
|
21 | |
-----
|
|
26 |
### Detailed procedure ###
|
22 | 27 |
|
23 | 28 |
First, we assume some PNGs of scanned newspaper pages involving some topic
|
24 | 29 |
(in this example, cheese) have been obtained. (PNG format is probably not
|
|
59 | 64 |
|
60 | 65 |
(You may wish to use a less clumsy image viewer than Ristretto, yourself.)
|
61 | 66 |
|
62 | |
Sample Output
|
63 | |
-------------
|
|
67 |
Observations
|
|
68 |
------------
|
64 | 69 |
|
65 | 70 |
It may be difficult to tell in this scaled-down sample, but the result was
|
66 | 71 |
surprisingly thematic in its reference to cheese:
|