README.md @master — view markup · raw · history · blame
wikimedia-illustrations
Hypothesis
We hypothesize that if we download some random public-domain images from Wikimedia Commons and inject them randomly into a text, it'll make just about any text look more interesting.
Apparatus
- Python 2.7.6 (probably works with older versions too)
- requests
- BeautifulSoup
- Pillow (it might work with PIL too)
- ImageMagick
- some kind of input text (uses lorem ipsum for now)
Method
- Get URLs for all images from all pages of a Wikimedia Commons category,
such as
PD_Gutenberg
orPD-Art_(PD-Japan)
, and write that list of URLs to an index file. - Select n images randomly from that index and download them.
- Convert them to PNGs and resize any that are wider than 400 pixels downward
- Inject those images as illustrations in a given text.
Observations
NOTE 1: to stay (IMO) well within Wikimedia's Terms of use, this script sleeps for 8 seconds after making any major HTTP request.
NOTE 2: just because an image is categorized as public domain on Wikimedia Commons does not mean it is necessarily in the public domain. It's always a good idea to double-check.
$ ./wikimedia-illustrations.py mkindex "PD-Art_(PD-Japan)"
http://commons.wikimedia.org/wiki/Category:PD-Art_(PD-Japan)
http://commons.wikimedia.org//w/index.php?title=Category:PD-Art_(PD-Japan)&filefrom=KitawakiI+Rioanji.jpg#mw-category-media
grabbed 2 category index pages
$ mkdir art
$ ./wikimedia-illustrations.py random 4 art/ Wikimedia-Commons-Category-index-PD-Art_\(PD-Japan\).txt
http://commons.wikimedia.org/wiki/File:Kawanabe_Kyosai_Renshishi2.jpg
http://upload.wikimedia.org/wikipedia/commons/1/10/Kawanabe_Kyosai_Renshishi2.jpg --> art/Kawanabe_Kyosai_Renshishi2.jpg
[...]
$ ristretto art/
This is all pretty crazy and a piece of lab equipment should really be broken off of it.
But anyway, this is what the end result looked like (for me), using illustrations taken from the PD-Gutenberg category. It should give you an idea of what to expect.