Some more notes.
Chris Pressey, 6 years ago
@@ -4,8 +4,8 @@
 
 Yet another script to archive stuff off teh internets.
 
-It's not a spider that automatically crawls previously undiscovered pages — it's intended
-to be run by a human to make backups of pages they have already read and recorded.
+It's not a spider that automatically crawls previously undiscovered webpages — it's intended
+to be run by a human to make backups of resources they have already seen and recorded the URLs of.
 
 It was split off from [Feedmark][], which doesn't itself need to support this function.
 
@@ -79,5 +79,6 @@
 * Archive youtube links with youtube-dl.
 * Handle failures (redirects, etc) better (detect 503 / "connection refused" better.)
 * Allow use of an external tool like `wget` or `curl` to do fetching.
+* Multiple `--article-roots`.
 
 [Feedmark]: http://catseye.tc/node/Feedmark
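
A few hedged sketches of what these TODO items might look like in practice follow; none of this is yastasoti's actual code. For the youtube-dl item, the simplest route may be youtube-dl's embedded Python API. How the script would decide that a link is a YouTube link, and where the files would go, are assumptions here.

```python
# Sketch: archiving a video link with youtube-dl's embedded API.
# The output template and function name are assumptions, not yastasoti code.
import youtube_dl


def archive_youtube_link(url, dest_dir):
    options = {
        # Save as "<title>.<ext>" under the destination directory.
        "outtmpl": dest_dir + "/%(title)s.%(ext)s",
    }
    with youtube_dl.YoutubeDL(options) as ydl:
        ydl.download([url])
```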
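On handling failures better: a 503 shows up as a status code on the response, while "connection refused" surfaces as a raised exception, so the two have to be caught in different places. A minimal sketch, assuming fetching is done with the `requests` library; the function name and retry policy are illustrative only.

```python
# Sketch: retrying on 503 and refused connections. Assumes requests is
# the fetch mechanism; fetch_with_retries is a hypothetical name.
import time

import requests


def fetch_with_retries(url, tries=3, pause=5.0):
    """Fetch url, retrying on 503 responses and refused connections."""
    for attempt in range(tries):
        try:
            response = requests.get(url, timeout=30)
        except requests.exceptions.ConnectionError:
            # "Connection refused", resets, DNS failures all land here.
            response = None
        if response is not None and response.status_code != 503:
            # Covers successes, followed redirects, and hard errors
            # (404 and the like) that retrying will not fix.
            return response
        if attempt < tries - 1:
            time.sleep(pause)
    return None
```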
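For fetching with an external tool, shelling out with `subprocess` is probably enough. Only standard wget and curl options appear below; the function itself and how it would hook into yastasoti are assumptions.

```python
# Sketch: delegating the fetch to wget or curl via subprocess.
import subprocess


def fetch_with_external_tool(url, dest, tool="wget"):
    """Download url to the local file dest using wget or curl."""
    if tool == "wget":
        command = ["wget", "-q", "-O", dest, url]
    elif tool == "curl":
        # -f: fail on HTTP errors, -s: silent, -L: follow redirects.
        command = ["curl", "-fsL", "-o", dest, url]
    else:
        raise ValueError("unknown fetch tool: %s" % tool)
    # A nonzero exit status means the fetch failed.
    return subprocess.run(command).returncode == 0
```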
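Finally, multiple `--article-roots` would presumably just mean letting the option repeat on the command line; with argparse that is `action="append"`. The option name below is copied from the TODO item, and the rest of the interface is assumed.

```python
# Sketch: accepting a repeatable --article-roots option with argparse.
import argparse

parser = argparse.ArgumentParser()
# action="append" collects every occurrence into a list, so
# --article-roots A --article-roots B yields ["A", "B"].
parser.add_argument("--article-roots", action="append", default=[])
```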