Acquisition Manager Examples
Externalized from the agent definition per the few-shot-examples rule (#1587).
Acquisition Manager — Worked Examples
Externalized from the agent definition per the few-shot-examples rule (#1587).
These are the concrete shell, JSON, and report samples that back the
Acquisition Manager's process. The agent definition keeps the prose, tables
(tool-selection matrix, retry matrix, extraction format selection), rules, and a
single compact inline anchor; the full samples live here.
Tool Installation Verification
Before orchestrating downloads, verify required tools are available:
command -v yt-dlp >/dev/null 2>&1 || echo "MISSING: yt-dlp"
command -v wget >/dev/null 2>&1 || echo "MISSING: wget"
command -v curl >/dev/null 2>&1 || echo "MISSING: curl"
command -v ffmpeg >/dev/null 2>&1 || echo "MISSING: ffmpeg"
If critical tools are missing, escalate to human with installation instructions.
Format Selection Strategy (full)
Video Content (Concerts, Performances)
Primary Strategy: Best quality up to 1080p with separate audio
yt-dlp -f 'bestvideo[height<=1080]+bestaudio/best[height<=1080]' \
--merge-output-format mkv \
<URL>
Rationale:
- 1080p balances quality and file size for archival
- Separate streams allow future audio extraction
- MKV container preserves both streams losslessly
Fallback Chain:
1. `bestvideo[height<=1080]+bestaudio` (best available up to 1080p)
2. `bestvideo[height<=720]+bestaudio` (720p fallback)
3. `best[height<=1080]` (combined stream)
4. `best` (accept any available quality)
Audio-Only Content (Music, Podcasts)
Primary Strategy: Best audio quality, extract to Opus
yt-dlp -f 'bestaudio' --extract-audio --audio-format opus --audio-quality 128K \
<URL>
Rationale:
- Opus provides excellent quality at lower bitrates than MP3
- 128K Opus ≈ 192K MP3 in perceived quality
- Native format for modern archival
Fallback Chain:
1. `bestaudio` → extract to Opus 128K
2. `bestaudio` → extract to MP3 320K (if Opus conversion fails)
3. `bestaudio` → keep original format (m4a, webm)
4. `best` → extract audio from video
Format Selection Logic
# Determine content type and select format
select_format() {
local url="$1"
local content_type="$2" # video | audio | auto
if [[ "$content_type" == "auto" ]]; then
# Query available formats to determine type
local formats=$(yt-dlp -F "$url" 2>/dev/null)
if echo "$formats" | grep -q "video only"; then
content_type="video"
else
content_type="audio"
fi
fi
case "$content_type" in
video)
echo "bestvideo[height<=1080]+bestaudio/best[height<=1080]"
;;
audio)
echo "bestaudio"
;;
esac
}
Directory Structure
Organize downloads in a consistent, navigable structure:
<base_path>/
├── <artist_name>/
│ ├── <era_or_album>/
│ │ ├── audio/
│ │ │ ├── 01-track.opus
│ │ │ ├── 02-track.opus
│ │ │ └── .curator/
│ │ │ ├── metadata.json
│ │ │ ├── sources.txt
│ │ │ └── checksums.sha256
│ │ └── video/
│ │ ├── concert-1.mkv
│ │ ├── concert-2.mkv
│ │ └── .curator/
│ │ ├── metadata.json
│ │ ├── sources.txt
│ │ └── thumbnails/
│ └── .curator/
│ └── artist-info.json
Directory Creation
create_acquisition_structure() {
local base_path="$1"
local artist="$2"
local era="$3"
# Sanitize names (remove special chars, spaces to underscores)
artist=$(echo "$artist" | tr ' ' '_' | tr -cd '[:alnum:]_-')
era=$(echo "$era" | tr ' ' '_' | tr -cd '[:alnum:]_-')
local audio_dir="$base_path/$artist/$era/audio"
local video_dir="$base_path/$artist/$era/video"
mkdir -p "$audio_dir/.curator"
mkdir -p "$video_dir/.curator"
mkdir -p "$base_path/$artist/.curator"
echo "$audio_dir"
echo "$video_dir"
}
Metadata Storage
Store acquisition metadata in `.curator/` directories:
metadata.json - Download session info
{
"session_id": "20260214-143022",
"timestamp": "2026-02-14T14:30:22Z",
"source_plan": "sources/plan-001.yaml",
"downloads": [
{
"url": "https://youtube.com/watch?v=...",
"filename": "concert-1.mkv",
"format": "bestvideo[height<=1080]+bestaudio",
"status": "completed",
"filesize_bytes": 1073741824,
"duration_seconds": 3600,
"checksum_sha256": "abc123..."
}
]
}
sources.txt - Original URLs for reference
# Downloaded 2026-02-14
https://youtube.com/watch?v=...
https://archive.org/details/...
Parallel Download Management
Concurrency Control
Rule: Maximum 3 concurrent downloads to prevent network saturation
Implementation:
# Track active downloads
ACTIVE_DOWNLOADS=()
MAX_CONCURRENT=3
start_download() {
local url="$1"
local output_dir="$2"
local format="$3"
# Wait if at capacity
while [[ ${#ACTIVE_DOWNLOADS[@]} -ge $MAX_CONCURRENT ]]; do
check_and_cleanup_completed
sleep 5
done
# Launch download in background
local download_id="dl-$$-$(date +%s)"
(
download_file "$url" "$output_dir" "$format"
echo "COMPLETED:$download_id" >> /tmp/curator-downloads.status
) &
ACTIVE_DOWNLOADS+=("$download_id:$!")
}
check_and_cleanup_completed() {
local new_active=()
for entry in "${ACTIVE_DOWNLOADS[@]}"; do
local pid="${entry#*:}"
if kill -0 "$pid" 2>/dev/null; then
new_active+=("$entry")
fi
done
ACTIVE_DOWNLOADS=("${new_active[@]}")
}
Separate Working Directories
Critical: Each download agent MUST write to a separate directory to prevent conflicts
# Good: Separate directories
download_session_1/audio/track-01.opus
download_session_2/audio/track-02.opus
# Bad: Shared directory (causes .part file conflicts)
shared_audio/track-01.opus.part # Multiple writers collide
Background Execution Pattern
# Launch download in background with proper logging
{
yt-dlp -f "$format" \
--output "$output_dir/%(title)s.%(ext)s" \
--write-info-json \
--write-thumbnail \
"$url" \
2>&1 | tee "$output_dir/.curator/download.log"
echo $? > "$output_dir/.curator/exit-code"
} &
# Capture PID for tracking
DOWNLOAD_PID=$!
echo "$DOWNLOAD_PID" > "$output_dir/.curator/pid"
Progress Tracking
State File Format
Location: `<session_dir>/.curator/session-state.json`
{
"session_id": "20260214-143022",
"started_at": "2026-02-14T14:30:22Z",
"updated_at": "2026-02-14T14:35:10Z",
"status": "in_progress",
"total_downloads": 12,
"completed": 5,
"in_progress": 3,
"failed": 1,
"pending": 3,
"downloads": [
{
"id": "dl-001",
"url": "https://youtube.com/watch?v=abc123",
"status": "completed",
"format": "bestvideo[height<=1080]+bestaudio",
"output_file": "concert-1.mkv",
"filesize_bytes": 1073741824,
"duration_seconds": 3600,
"progress_percent": 100,
"speed_mbps": 12.5,
"eta_seconds": 0,
"started_at": "2026-02-14T14:30:25Z",
"completed_at": "2026-02-14T14:42:10Z",
"error": null
},
{
"id": "dl-002",
"url": "https://archive.org/details/xyz789",
"status": "in_progress",
"format": "FLAC",
"output_file": "album.flac",
"filesize_bytes": 524288000,
"progress_percent": 65,
"speed_mbps": 8.2,
"eta_seconds": 245,
"started_at": "2026-02-14T14:35:00Z",
"completed_at": null,
"error": null
}
]
}
Progress Monitoring
# Parse yt-dlp progress output
monitor_download_progress() {
local log_file="$1"
local state_file="$2"
tail -f "$log_file" | while read -r line; do
if [[ "$line" =~ \[download\].*([0-9]+\.[0-9]+)%.*at.*([0-9]+\.[0-9]+)(K|M)iB/s ]]; then
local progress="${BASH_REMATCH[1]}"
local speed="${BASH_REMATCH[2]}"
local unit="${BASH_REMATCH[3]}"
# Update state file
jq ".downloads[] |= if .id == \"$download_id\" then .progress_percent = $progress else . end" \
"$state_file" > "$state_file.tmp" && mv "$state_file.tmp" "$state_file"
fi
done
}
Error Handling
Error Detection Patterns
# Detect error types from yt-dlp output
classify_error() {
local error_message="$1"
case "$error_message" in
*"timed out"*|*"Connection reset"*)
echo "network_timeout"
;;
*"429"*|*"Too Many Requests"*)
echo "rate_limited"
;;
*"Video unavailable"*|*"Private video"*|*"removed"*)
echo "unavailable"
;;
*"Requested format is not available"*)
echo "format_unavailable"
;;
*"No space left"*|*"Disk full"*)
echo "disk_full"
;;
*"Permission denied"*)
echo "permission_denied"
;;
*"corrupted"*|*"checksum"*)
echo "corrupted"
;;
*)
echo "unknown"
;;
esac
}
Retry Logic
retry_download() {
local url="$1"
local output_dir="$2"
local format="$3"
local max_retries="$4"
local backoff_strategy="$5" # exponential | fixed
local attempt=0
local wait_seconds=5
while [[ $attempt -lt $max_retries ]]; do
attempt=$((attempt + 1))
echo "Attempt $attempt of $max_retries for $url"
if download_file "$url" "$output_dir" "$format"; then
echo "SUCCESS on attempt $attempt"
return 0
fi
# Classify error and determine if retry is appropriate
local error_type=$(classify_error "$(cat "$output_dir/.curator/download.log")")
if [[ "$error_type" == "disk_full" || "$error_type" == "permission_denied" ]]; then
echo "FATAL ERROR: $error_type - aborting retries"
return 2
fi
if [[ $attempt -lt $max_retries ]]; then
echo "Waiting ${wait_seconds}s before retry..."
sleep "$wait_seconds"
if [[ "$backoff_strategy" == "exponential" ]]; then
wait_seconds=$((wait_seconds * 3))
fi
fi
done
echo "FAILED after $max_retries attempts"
return 1
}
Fallback Chain
Quality Fallback Strategy
When requested format is unavailable, gracefully degrade through quality levels:
# Fallback chain for video
VIDEO_FORMATS=(
"bestvideo[height<=1080]+bestaudio/best[height<=1080]" # Best 1080p
"bestvideo[height<=720]+bestaudio/best[height<=720]" # 720p fallback
"bestvideo[height<=480]+bestaudio/best[height<=480]" # 480p fallback
"best" # Any available
)
# Fallback chain for audio
AUDIO_FORMATS=(
"bestaudio[ext=opus]/bestaudio[ext=m4a]/bestaudio" # Best audio
"worstaudio" # Low quality
)
try_format_fallback() {
local url="$1"
local output_dir="$2"
local format_array=("${@:3}")
for format in "${format_array[@]}"; do
echo "Trying format: $format"
if yt-dlp -f "$format" --output "$output_dir/%(title)s.%(ext)s" "$url"; then
echo "SUCCESS with format: $format"
return 0
fi
done
echo "FAILED: No fallback formats succeeded"
return 1
}
Audio Extraction
Batch Extraction from Video
After video downloads complete, extract audio tracks for archival:
extract_audio_from_videos() {
local video_dir="$1"
local audio_dir="$2"
local format="opus" # or mp3, flac
local quality="128K"
find "$video_dir" -type f \( -name "*.mkv" -o -name "*.mp4" -o -name "*.webm" \) | while read -r video_file; do
local basename=$(basename "$video_file" | sed 's/\.[^.]*$//')
local audio_file="$audio_dir/${basename}.${format}"
echo "Extracting audio: $video_file -> $audio_file"
ffmpeg -i "$video_file" \
-vn \
-acodec libopus \
-b:a "$quality" \
"$audio_file" \
2>&1 | tee "$audio_dir/.curator/extraction.log"
if [[ ${PIPESTATUS[0]} -eq 0 ]]; then
echo "SUCCESS: $audio_file"
else
echo "FAILED: $video_file"
fi
done
}
Network Mount Rules (Field-Tested) — full code samples
CRITICAL LESSONS from real-world testing:
Rule 1: NEVER Bulk Operations on Network Mounts
Problem: Network mounts (NFS, SMB, sshfs) have high latency for metadata operations. Operations like `ls -la` on 1000+ files can take minutes.
Solution: Always download to local disk first, then batch-copy to network mount.
# Bad: Download directly to network mount
yt-dlp -o "/mnt/network-storage/artist/%(title)s.%(ext)s" "$url"
# Result: 10x slower, frequent timeouts, partial writes
# Good: Download locally, then move
LOCAL_WORK="/tmp/curator-$$"
mkdir -p "$LOCAL_WORK"
yt-dlp -o "$LOCAL_WORK/%(title)s.%(ext)s" "$url"
# Then batch copy when complete
rsync -av --progress "$LOCAL_WORK/" "/mnt/network-storage/artist/"
Rule 2: Batch Metadata Updates
Problem: Writing individual .curator/ metadata files to network mount causes thrashing.
Solution: Accumulate metadata locally, write in single batch.
# Bad: Write metadata per download
for file in *.opus; do
echo "$file metadata" > "/mnt/network/.curator/$file.json"
done
# Good: Build metadata locally, copy once
mkdir -p "$LOCAL_WORK/.curator"
for file in *.opus; do
echo "$file metadata" > "$LOCAL_WORK/.curator/$file.json"
done
rsync -av "$LOCAL_WORK/.curator/" "/mnt/network/.curator/"
Rule 3: Verify Before Cleanup
Problem: Network mount disconnections can cause data loss if local copy deleted prematurely.
Solution: Verify remote copy with checksums before deleting local.
copy_to_network_mount() {
local local_dir="$1"
local remote_dir="$2"
# Copy files
rsync -av --checksum --progress "$local_dir/" "$remote_dir/"
# Verify checksums match
(cd "$local_dir" && find . -type f -exec sha256sum {} \; | sort) > /tmp/local-checksums
(cd "$remote_dir" && find . -type f -exec sha256sum {} \; | sort) > /tmp/remote-checksums
if diff /tmp/local-checksums /tmp/remote-checksums; then
echo "Verification SUCCESS - safe to delete local copy"
rm -rf "$local_dir"
else
echo "Verification FAILED - keeping local copy"
return 1
fi
}
Rule 4: Graceful Degradation on Mount Failure
Problem: Network mount failures should not crash the entire acquisition session.
Solution: Detect mount issues early, fall back to local-only mode.
check_mount_health() {
local mount_point="$1"
# Test write performance
local test_file="$mount_point/.curator-health-test"
local start_time=$(date +%s)
if ! timeout 5 dd if=/dev/zero of="$test_file" bs=1M count=10 2>/dev/null; then
echo "Mount UNHEALTHY or SLOW - falling back to local storage"
return 1
fi
local end_time=$(date +%s)
local duration=$((end_time - start_time))
rm -f "$test_file"
if [[ $duration -gt 3 ]]; then
echo "Mount performance degraded (${duration}s for 10MB write)"
return 1
fi
return 0
}
Quality Verification
After download completion, verify file integrity:
verify_download() {
local file_path="$1"
local expected_size="$2" # Optional
# Check file exists and is readable
if [[ ! -r "$file_path" ]]; then
echo "FAILED: File not readable: $file_path"
return 1
fi
# Check file size (must be > 0)
local actual_size=$(stat -f%z "$file_path" 2>/dev/null || stat -c%s "$file_path")
if [[ $actual_size -eq 0 ]]; then
echo "FAILED: Zero-byte file: $file_path"
return 1
fi
# Check expected size if provided
if [[ -n "$expected_size" ]]; then
local tolerance=$((expected_size / 100)) # 1% tolerance
local diff=$((actual_size - expected_size))
diff=${diff#-} # Absolute value
if [[ $diff -gt $tolerance ]]; then
echo "WARNING: Size mismatch (expected: $expected_size, actual: $actual_size)"
fi
fi
# Verify media file integrity with ffmpeg
if command -v ffmpeg >/dev/null; then
if ffmpeg -v error -i "$file_path" -f null - 2>&1 | grep -q "error"; then
echo "FAILED: Media file corrupted: $file_path"
return 1
fi
fi
echo "VERIFIED: $file_path ($actual_size bytes)"
return 0
}
Final Report
Generate comprehensive session report:
generate_session_report() {
local session_state="$1"
local total=$(jq '.total_downloads' "$session_state")
local completed=$(jq '.completed' "$session_state")
local failed=$(jq '.failed' "$session_state")
cat <<EOF
ACQUISITION SESSION REPORT
=========================
Session ID: $(jq -r '.session_id' "$session_state")
Started: $(jq -r '.started_at' "$session_state")
Completed: $(jq -r '.updated_at' "$session_state")
Downloads: $total total ($completed completed, $failed failed)
Successful Downloads:
$(jq -r '.downloads[] | select(.status == "completed") | " - \(.output_file) (\(.filesize_bytes | tonumber / 1048576 | floor)MB)"' "$session_state")
Failed Downloads:
$(jq -r '.downloads[] | select(.status == "failed") | " - \(.url)\
Error: \(.error)"' "$session_state")
Total Size: $(jq '[.downloads[] | select(.status == "completed") | .filesize_bytes] | add | . / 1073741824' "$session_state") GB
EOF
}