yastasoti
Yet another script to archive stuff off teh internets.
It was split off from Feedmark, which doesn't itself need to support this function.
Features
- input is a JSON list of objects containing links (such as those produced by Feedmark)
- output is a JSON list of objects that could not be retrieved, which can be fed back into the script as input
- checks links with HEAD requests by default; if --archive-to is given, fetches a copy of each resource with GET and saves it to disk
- tries to be idempotent and not create a new local file if the remote file hasn't changed
- handles links that are local files; checks if the file exists locally
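
The checking behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `check_links`, the injected `head` callable, and the added `status` field are all hypothetical, and treating any status below 400 as success is an assumption.

```python
import os


def check_links(links, head, article_root="."):
    """Return the link objects that could not be verified.

    `links` is a list of dicts with a "url" key. `head` is any callable
    that takes a URL and returns an HTTP status code (a hypothetical hook,
    e.g. a thin wrapper around a HEAD request).
    """
    failures = []
    for link in links:
        url = link["url"]
        if url.startswith(("http://", "https://")):
            try:
                status = head(url)
                ok = 200 <= status < 400  # assumption: below 400 is success
            except Exception as exc:  # network errors count as failures
                ok = False
                status = str(exc)
        else:
            # Anything else is treated as a local path under article_root.
            ok = os.path.exists(os.path.join(article_root, url))
            status = "ok" if ok else "missing"
        if not ok:
            # Keep the original fields so the list can be fed back in as input.
            failures.append(dict(link, status=status))
    return failures
```

With the requests library installed, `head` could be something like `lambda url: requests.head(url, allow_redirects=True).status_code`. Passing the callable in keeps the loop testable without touching the network.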
Planned features
- archive YouTube links with youtube-dl
- logging
- Handle failures (redirects, etc.) better. Fall back to an external tool like wget or curl.
Examples
Check that the links in a set of Feedmark documents all resolve:
feedmark --output-links article/*.md | yastasoti --article-root=article/ - | tee results.json
Since no --archive-to option was given, this will make only HEAD requests to check that the resources exist. It will not fetch them.
Archive stuff off teh internets:
cat >links.json << EOF
[
    {
        "url": "http://catseye.tc/"
    }
]
EOF
yastasoti --archive-to=downloads links.json
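
The "tries to be idempotent" behaviour when archiving might look something like the sketch below. It assumes a simple content-hash comparison and overwrites the existing file when the content differs; the real script may detect changes or name files differently. The function name `save_if_changed` is hypothetical, and the sketch targets Python 3.

```python
import hashlib
import os


def save_if_changed(content, dest_path):
    """Write `content` (bytes) to `dest_path` unless an identical file exists.

    Returns True if the file was written, False if it was left untouched.
    """
    new_digest = hashlib.sha256(content).hexdigest()
    if os.path.exists(dest_path):
        with open(dest_path, "rb") as f:
            old_digest = hashlib.sha256(f.read()).hexdigest()
        if old_digest == new_digest:
            return False  # unchanged: leave the existing file alone
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)
    with open(dest_path, "wb") as f:
        f.write(content)
    return True
```

Running the archive step twice against an unchanged resource would then write the file only the first time.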
Requirements
Works with Python 2.7 for sure. Python 3.x is unverified; some tests still need to be run.
Requires the requests Python library and/or the wget external utility to make network requests.
If the tqdm Python library is installed, will display a nice progress bar.
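
An optional dependency like tqdm is commonly handled with an import fallback. The sketch below shows that general pattern and is not necessarily how yastasoti does it; `process_all` and `handle` are hypothetical names.

```python
try:
    from tqdm import tqdm  # optional: gives a progress bar when installed
except ImportError:
    def tqdm(iterable, **kwargs):
        """Fallback: return the iterable unchanged, ignoring tqdm options."""
        return iterable


def process_all(links, handle):
    """Apply `handle` (a hypothetical per-link callable) to every link."""
    return [handle(link) for link in tqdm(links, total=len(links))]
```

Either way the calling code stays the same; only the presence of the progress bar differs.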
Commit History
@8ba12b038f1caba062d06b6a7655ad66483ee4fc
git clone https://git.catseye.tc/yastasoti/
- Accumulate all results, dump only the failures at the end. Chris Pressey 6 years ago
- Python 3 compatibility. Chris Pressey 6 years ago
- Call the option --archive-to, as "links" is a bit redundant here Chris Pressey 6 years ago
- The return value of handle_link() is not a Response object. Chris Pressey 6 years ago
- Add --fragile command-line argument. Chris Pressey 6 years ago
- Checkpoint an experimental thing, badly. Chris Pressey 6 years ago
- --delay-between-requests argument. Chris Pressey 6 years ago
- Continue to clean up. Chris Pressey 6 years ago
- Continue to clean up. Chris Pressey 6 years ago
- Refactor again. Chris Pressey 6 years ago