Downloading
Pull whole items or selected files with concurrency, md5 verification, resume, and an optional flat layout.
archive download fetches files from an item into a per-item directory. It
downloads several files at once, skips files already present with a matching
md5, and can verify each file after fetching and resume interrupted transfers.
Whole item or selected files
archive download nasa # every file in the item
archive download nasa globe_west_540.jpg # one named file
archive download nasa a.jpg b.jpg # several named files
Files land under <out-dir>/<identifier>/ so multiple items never collide.
Point --out-dir (-d) wherever you like:
archive download nasa -d ./downloads
The default destination is ~/data/archive/download (override the whole data
root with --data-dir or the ARCHIVE_DATA_DIR environment variable).
Filtering what to fetch
The same --glob and --format filters as the files command decide what to download:
archive download nasa --glob '*.jpg' -d .
archive download principleofrelat00eins --format PDF -d books
Verification and resume
Files already on disk with a matching md5 are skipped, so re-running a download
only fetches what is missing or changed. Add --verify to re-check the md5 of
each file after it is fetched and fail loudly on a mismatch:
archive download nasa --format JPEG -d . --verify
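The skip decision described above can be approximated outside the tool. This is a minimal sketch, not the tool's actual implementation: needs_fetch is a hypothetical helper name, and it assumes GNU md5sum is on the PATH (on macOS, md5 -q is the rough equivalent). It succeeds when a local file is missing or its md5 differs from the expected value.

```shell
# Hypothetical helper: exit 0 when the file must be fetched,
# i.e. it is missing or its md5 differs from the expected value.
# Assumes GNU coreutils md5sum is available.
needs_fetch() {
    path=$1
    expected=$2
    # A missing file always needs fetching.
    [ -f "$path" ] || return 0
    # md5sum prints "<hash>  <name>"; keep only the hash.
    actual=$(md5sum "$path" | cut -d ' ' -f 1)
    [ "$actual" != "$expected" ]
}
```

A caller would run the transfer only when needs_fetch succeeds, which is why a second run over an unchanged directory has nothing left to do.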
An interrupted transfer leaves a .part file and resumes from where it stopped
on the next run, using an HTTP range request.
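A range request asks the server for the bytes from a given offset onward, so the offset to resume from is simply the size of the partial file. The sketch below illustrates that calculation only. resume_range is a hypothetical helper name, and how the tool itself builds the request is an assumption.

```shell
# Hypothetical helper: print the Range header that would continue a
# partial download, based on how many bytes the .part file already holds.
resume_range() {
    part=$1
    if [ -f "$part" ]; then
        # wc -c counts bytes; strip the padding some platforms add.
        have=$(wc -c < "$part" | tr -d ' ')
    else
        # Nothing on disk yet: start from the first byte.
        have=0
    fi
    printf 'Range: bytes=%s-\n' "$have"
}
```

When resuming by hand, curl -C - -o file.part URL does the same thing: curl reads the size of the existing output file and requests the remainder.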
Layout
Some items store files in sub-directories. --flat drops those path
components so everything lands directly in the item directory:
archive download some-item --flat -d .
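To make the effect of --flat concrete, the sketch below maps a remote file name to a local path with and without flattening. dest_path is a hypothetical helper written for illustration, the file name scans/page1.jpg is made up, and the tool's real path handling may differ.

```shell
# Hypothetical helper: print where a remote file name would be written.
# $1 = out dir, $2 = item identifier, $3 = remote name,
# $4 = "flat" to drop sub-directory components (optional)
dest_path() {
    out_dir=$1
    identifier=$2
    name=$3
    if [ "${4:-}" = "flat" ]; then
        # Keep only the final path component.
        name=${name##*/}
    fi
    printf '%s/%s/%s\n' "$out_dir" "$identifier" "$name"
}
```

Under these assumptions, dest_path . some-item scans/page1.jpg prints ./some-item/scans/page1.jpg, and adding flat as a fourth argument prints ./some-item/page1.jpg. In this sketch, two files with the same base name in different sub-directories would map to the same path. How the tool treats that case is not specified here.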
Concurrency and politeness
-j (default 8) sets how many files download at once, --rate sets the
minimum delay between requests, and --retries sets how many times a request is
retried with backoff after a 429 or 5xx response. The defaults are tuned to be
fast without hammering the Archive.
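Backoff retries can be illustrated with a small wrapper. This is a sketch under stated assumptions rather than the tool's own logic: retry_with_backoff is a hypothetical name, the doubling schedule is assumed, and it retries on any non-zero exit status, whereas the tool retries on 429 and 5xx responses.

```shell
# Hypothetical helper: run a command, retrying on failure with a delay
# that doubles after each failed attempt (the doubling is an assumption).
# $1 = max retries, $2 = initial delay in whole seconds, rest = command
retry_with_backoff() {
    retries=$1
    delay=$2
    shift 2
    attempt=0
    until "$@"; do
        attempt=$((attempt + 1))
        # Give up once the retry budget is spent.
        if [ "$attempt" -gt "$retries" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
    done
}
```

For example, retry_with_backoff 3 1 curl -sf -o file.jpg URL would make up to four attempts in total, waiting 1, 2, and 4 seconds between them.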
Streaming a single file to stdout
Piping one file straight into another tool instead of saving it is not a job
for -o raw. For a single file the simplest route is files ... -o url piped to
curl; alternatively, name the file and download it with a destination of . to
keep a local copy. The curl route:
archive files nasa --glob '*.jpg' -o url | head -n 1 | xargs curl -sL | wc -c