Historical UDL Download CLI
The swxsoc_reach package ships a command-line tool for downloading
REACH data from the Unified Data Library (UDL) over arbitrary historical
date ranges. Unlike the scheduled Lambda path
(download_UDL_reach_to_file()), the CLI takes
absolute UTC dates and writes one artifact per day, with append-only
telemetry that supports safe restart and resume.
The CLI is accessed via Python’s -m module flag:
python -m swxsoc_reach download --help
Quick start
Download two days of data for a single sensor:
BASICAUTH="Basic <token>" python -m swxsoc_reach download \
--start-date 2026-01-01 \
--end-date 2026-01-02 \
--sensor-id REACH-1 \
--output-dir ./out
Resolving UDL credentials
The CLI obtains the UDL HTTP Basic auth credential at startup via
swxsoc_reach.net.auth.resolve_udl_auth(). Resolution order:
1. The BASICAUTH environment variable, if set, is used directly (local-dev / pre-exported credential).
2. Otherwise, if SECRET_ARN_UDL is set, the secret is fetched from AWS Secrets Manager (using boto3's standard credential and region resolution chain) and its basicauth JSON field is used. The resolved value is also written back to os.environ['BASICAUTH'].
3. Otherwise the CLI exits with code 2 and a clear error message.
boto3 is an optional dependency under the net extra:
pip install 'swxsoc_reach[net]'
If you only ever use BASICAUTH (path 1), boto3 is not required.
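The resolution order can be sketched as follows. This is a minimal illustration, not the body of resolve_udl_auth(): the names resolve_auth_sketch, fetch_secret, and fetch_with_boto3 are hypothetical, and the environment is passed in as a plain mapping so the logic can be exercised without AWS access.

```python
import json
import os


def fetch_with_boto3(arn):
    """Hypothetical fetcher: read a secret string from AWS Secrets Manager."""
    import boto3  # imported lazily, so path 1 works without boto3 installed

    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=arn)["SecretString"]


def resolve_auth_sketch(environ, fetch_secret=fetch_with_boto3):
    """Return the Basic auth value following the three-step order above."""
    # Step 1: a pre-exported credential wins.
    if environ.get("BASICAUTH"):
        return environ["BASICAUTH"]
    # Step 2: fall back to the secret named by SECRET_ARN_UDL.
    arn = environ.get("SECRET_ARN_UDL")
    if arn:
        value = json.loads(fetch_secret(arn))["basicauth"]
        environ["BASICAUTH"] = value  # written back, as described above
        return value
    # Step 3: nothing is configured, so exit with the usage error code.
    raise SystemExit(2)
```

Calling resolve_auth_sketch(os.environ) would apply the same order to the real process environment; passing a stub fetch_secret keeps the sketch testable offline.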
Required arguments
--start-date YYYY-MM-DD
    Inclusive UTC start date.
--end-date YYYY-MM-DD
    Inclusive UTC end date.
--output-dir PATH
    Directory where per-day artifacts are written. Created if missing.
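Because both dates are inclusive, a range of 2026-01-01 to 2026-01-02 covers two days. A minimal sketch of that enumeration, assuming a hypothetical helper days_in_range (the CLI's own planner may differ, and it reports an inverted range as a usage error rather than raising ValueError):

```python
from datetime import date, timedelta


def days_in_range(start, end):
    """List every UTC calendar day from start to end, both inclusive."""
    if end < start:
        raise ValueError("end date precedes start date")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


days = days_in_range(date.fromisoformat("2026-01-01"), date.fromisoformat("2026-01-02"))
print([d.isoformat() for d in days])  # ['2026-01-01', '2026-01-02']
```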
Common options
--telemetry-file PATH
    Path to the append-only telemetry CSV. Defaults to <output-dir>/download_telemetry.csv.
--sensor-id ID
    REACH sensor identifier or ALL (default ALL). Drives chunk size in get_reach_datetimelist():
    - ALL → ~288 UDL requests/day (5-minute chunks).
    - A specific sensor (e.g. REACH-1) → ~4 UDL requests/day (6-hour chunks).
--descriptor NAME
    UDL descriptor query value (default QUICKLOOK).
--output-format {csv,json}
    Output serialization format (default csv).
--retry-failed
    Re-attempt days whose latest telemetry status is FAILED. Without this flag, FAILED days are skipped on restart.
--limit-days N
    Cap the number of days actually attempted, counted from the first day in the range that is not already DOWNLOADED with its CSV on disk. Composes naturally with resume.
--dry-run
    Plan only: log per-day actions, write no telemetry, no network calls. Does not require auth.
--aws-region REGION
    Optional AWS region override for the Secrets Manager lookup.
-v / -vv
    Increase logging verbosity.
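The request counts quoted for --sensor-id follow directly from the chunk size: 24 hours split into 5-minute windows gives 288 requests, and into 6-hour windows gives 4. The sketch below reproduces that arithmetic with a hypothetical chunk_day helper; it is not the implementation of get_reach_datetimelist(), and the half-open windows are an assumption.

```python
from datetime import datetime, timedelta, timezone


def chunk_day(day_start, chunk):
    """Split the 24 hours after day_start into half-open [start, end) windows."""
    day_end = day_start + timedelta(days=1)
    windows = []
    cursor = day_start
    while cursor < day_end:
        nxt = min(cursor + chunk, day_end)  # never run past midnight
        windows.append((cursor, nxt))
        cursor = nxt
    return windows


day = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(len(chunk_day(day, timedelta(minutes=5))))  # 288
print(len(chunk_day(day, timedelta(hours=6))))    # 4
```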
AIMD rate-controller flags
These knobs are forwarded to
download_UDL_reach_window() and tune the
adaptive (Additive Increase / Multiplicative Decrease) request rate
used by the per-day downloader. The defaults are the same as the
Lambda path; an ALL historical backfill (≈288 req/day) typically
benefits from raising --max-concurrent-requests and
--initial-rate.
| Flag | Default |
|---|---|
| | 4 |
| | 5.0 req/s |
| | 1.0 |
| | 0.5 |
| | 5.0 req/s |
| | 25.0 req/s |
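For readers unfamiliar with AIMD, the following toy controller shows the general technique: the rate climbs by a fixed step after each success and is scaled down after a throttling signal, clamped between a floor and a ceiling. The class name, parameter names, and values are all hypothetical; this is not the controller inside download_UDL_reach_window(), and the values are not claimed to match the CLI defaults.

```python
class AimdRate:
    """Toy AIMD controller for a request rate in requests per second."""

    def __init__(self, initial, increase, decrease_factor, floor, ceiling):
        self.rate = initial
        self.increase = increase
        self.decrease_factor = decrease_factor
        self.floor = floor
        self.ceiling = ceiling

    def on_success(self):
        # Additive increase: step the rate up by a fixed amount, capped.
        self.rate = min(self.rate + self.increase, self.ceiling)

    def on_throttle(self):
        # Multiplicative decrease: scale the rate down, never below the floor.
        self.rate = max(self.rate * self.decrease_factor, self.floor)


ctl = AimdRate(initial=10.0, increase=1.0, decrease_factor=0.5, floor=5.0, ceiling=25.0)
ctl.on_success()
print(ctl.rate)  # 11.0
ctl.on_throttle()
print(ctl.rate)  # 5.5
```

The asymmetry is the point of the design: slow recovery and fast back-off keep a long backfill from repeatedly overshooting whatever limit the server enforces.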
Restart and resume semantics
The CLI is idempotent. On every run it loads the telemetry CSV and, for each day in the requested range, picks an action:
| Latest prior status | Artifact present? | Action |
|---|---|---|
| (no row) | n/a | run |
| DOWNLOADED | yes | skip (idempotent) |
| DOWNLOADED | no | re-download |
| | n/a | skip (terminal) |
| FAILED | n/a | skip, unless --retry-failed |
| DOWNLOAD_PENDING | n/a | re-run (interrupted) |
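The decision table can be read as a small pure function. The sketch below uses the status names that appear elsewhere on this page (DOWNLOADED, FAILED, DOWNLOAD_PENDING); the function name decide_action and the returned labels are hypothetical, and treating every other recorded status as terminal is an assumption.

```python
def decide_action(status, artifact_present, retry_failed=False):
    """Pick the per-day action from the latest telemetry status."""
    if status is None:
        return "run"  # no prior row for this day
    if status == "DOWNLOADED":
        # Only trust a DOWNLOADED row if the artifact is still on disk.
        return "skip" if artifact_present else "re-download"
    if status == "FAILED":
        return "re-run" if retry_failed else "skip"
    if status == "DOWNLOAD_PENDING":
        return "re-run"  # the previous attempt was interrupted
    # Assumption: any other recorded status is terminal.
    return "skip"


print(decide_action("DOWNLOADED", artifact_present=False))  # re-download
print(decide_action("FAILED", artifact_present=False, retry_failed=True))  # re-run
```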
Telemetry CSV schema
One row is appended per attempt. Older rows for the same date are
preserved; HistoricalTelemetry.load_state returns the most-recent
row per chunk_date_utc (file order breaks ties).
| Column | Description |
|---|---|
| | UUID4 stamped on every row written by a single CLI invocation. |
| | |
| | ISO 8601 UTC, inclusive ( |
| | ISO 8601 UTC, exclusive ( |
| | |
| | Number of records in the written artifact. |
| | Per-sensor upper-bound baseline ( |
| | |
| | Wall-clock seconds for the per-day attempt. |
| | Size of the written artifact in MiB. |
| | Absolute path of the written artifact ( |
| | Echo of the run inputs. |
| | Populated on |
| | ISO 8601 UTC timestamps for the attempt. |
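The "latest row per chunk_date_utc" rule described for load_state can be approximated in a few lines. This sketch is not HistoricalTelemetry.load_state itself: it assumes the file is strictly append-only, so that file order alone identifies the most recent row, and the status column name is hypothetical.

```python
import csv
import io


def latest_rows(csv_file):
    """Map each chunk_date_utc to its last row in file order."""
    latest = {}
    for row in csv.DictReader(csv_file):
        latest[row["chunk_date_utc"]] = row  # later rows overwrite earlier ones
    return latest


sample = io.StringIO(
    "chunk_date_utc,status\n"
    "2026-01-01,DOWNLOAD_PENDING\n"
    "2026-01-01,DOWNLOADED\n"
    "2026-01-02,FAILED\n"
)
state = latest_rows(sample)
print(state["2026-01-01"]["status"])  # DOWNLOADED
```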
Exit codes
- 0: every planned day succeeded or was a known skip.
- 1: at least one day ended in FAILED.
- 2: usage / configuration error (auth resolution failed, inverted date range, etc.).
Running via Docker
The team’s existing Docker image ships swxsoc_reach pre-installed.
Override the entrypoint to invoke the CLI:
docker run --rm -it \
-e SECRET_ARN_UDL=$SECRET_ARN_UDL \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN \
-v /local/output:/output_dir \
--entrypoint python <team-image>:latest \
-m swxsoc_reach download \
--start-date 2026-01-01 \
--end-date 2026-01-31 \
--sensor-id ALL \
--output-dir /output_dir
Operator runbook: interrupted runs
If a multi-day run is interrupted (Ctrl-C, container killed, network loss):
- The telemetry CSV in --output-dir is consistent on disk (fsync after every row).
- Days that were mid-flight at interrupt time will have a DOWNLOAD_PENDING row as their latest entry; these will be re-run on the next invocation.
- Days that completed normally have a DOWNLOADED row plus their artifact on disk and will be skipped on the next invocation.
- To pick up where you left off, simply re-run the same command. Add --retry-failed if you also want to re-attempt days that ended in FAILED.
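The "fsync after every row" guarantee in the first point is what makes an interrupted run safe to resume. A minimal sketch of that append pattern, using a hypothetical append_row helper rather than the package's own telemetry writer:

```python
import csv
import os


def append_row(path, row):
    """Append one CSV row and force it to disk before returning."""
    with open(path, "a", newline="") as fh:
        csv.writer(fh).writerow(row)
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to commit the data to storage
```

Flushing alone only hands the bytes to the operating system; the os.fsync call is what narrows the window in which a killed container could lose the last row.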
To inspect progress without running anything, use --dry-run:
python -m swxsoc_reach download \
--start-date 2026-01-01 --end-date 2026-01-31 \
--output-dir ./out --dry-run -v