git @ Cat's Eye Technologies yastasoti / 06635fe
Log when checking and archiving links. Chris Pressey 1 year, 10 months ago
2 changed file(s) with 5 addition(s) and 4 deletion(s).
@@ -17,7 +17,7 @@
 #### Planned features ####
 
 * Archive youtube links with youtube-dl.
-* Handle failures (redirects, etc) better.
+* Handle failures (redirects, etc) better (detect 503 / "connection refused" better).
 * Allow use of an external tool like `wget` or `curl` to do fetching.
 * Allow categorization of downloaded stuff.
 
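The "handle failures better" bullet could be approached roughly as follows. This is a sketch only, assuming the `requests` library the project already uses; `classify_status` and `check_link` are hypothetical helper names, not part of yastasoti.

```python
import requests


def classify_status(status_code):
    """Map an HTTP status code to a coarse category (hypothetical helper)."""
    if status_code == 503:
        return 'service unavailable'
    if status_code >= 400:
        return 'broken'
    return 'ok'


def check_link(url):
    """HEAD a URL and classify the outcome, treating refused connections
    and timeouts as distinct failure modes rather than generic errors."""
    try:
        response = requests.head(url, allow_redirects=True, timeout=10)
    except requests.exceptions.ConnectionError:
        return 'connection refused'
    except requests.exceptions.Timeout:
        return 'timeout'
    return classify_status(response.status_code)
```

Separating the pure status classification from the network call keeps the failure taxonomy easy to test without hitting the network.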
@@ -27,7 +27,7 @@
 
     feedmark --output-links article/*.md | yastasoti --article-root=article/ - | tee results.json
 
-Since no `--archive-links` options were given, this will make only `HEAD`
+Since `--archive-to` was not specified, this will make only `HEAD`
 requests to check that the resources exist. It will not fetch them.
 
 Archive stuff off teh internets:
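Archiving, by contrast with the HEAD-only check above, actually fetches the body. A minimal sketch of that fetch, assuming the `requests` library; `fetch_to_file` is a hypothetical name and the real archiver also maps URLs onto a local directory tree:

```python
import requests


def fetch_to_file(url, filename):
    """Stream a GET response to a local file in chunks, so large
    resources are archived without being held entirely in memory.
    Sketch only -- not yastasoti's actual fetch code."""
    response = requests.get(url, stream=True, timeout=10)
    response.raise_for_status()
    with open(filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
```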
@@ -45,7 +45,6 @@
 
 Python 2.7 for sure, Python 3.x not sure, will need to run some tests.
 
-Requires `requests` Python library and/or `wget` external utility to make
-network requests.
+Requires `requests` Python library to make network requests.
 
 If `tqdm` Python library is installed, will display a nice progress bar.
@@ -139,6 +139,7 @@
 
 class LinkChecker(LinkTraverser):
     def handle_link(self, url):
+        logger.info("checking {}".format(url))
         response = requests.head(url, allow_redirects=True, headers={
             'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0',
         })
@@ -155,6 +156,7 @@
         self.missing_only = missing_only
 
     def handle_link(self, url):
+        logger.info("archiving {}".format(url))
         dirname, filename = url_to_dirname_and_filename(url)
         dirname = os.path.join(self.dest_dir, dirname)
         if not os.path.exists(dirname):
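The archiver hunk above calls a `url_to_dirname_and_filename` helper whose body is not shown in this diff. A hypothetical reconstruction of what such a helper might do, splitting a URL into a host-based directory and a filename (the actual yastasoti implementation may map URLs differently):

```python
import os

try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2.7


def url_to_dirname_and_filename(url):
    """Split a URL into (dirname, filename) for a local archive tree.
    Hypothetical sketch -- not the helper's real implementation."""
    parsed = urlparse(url)
    path = parsed.path.lstrip('/')
    dirname, _, filename = path.rpartition('/')
    if not filename:
        # A URL ending in '/' has no filename component; pick a default.
        filename = 'index.html'
    return os.path.join(parsed.netloc, dirname), filename
```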