What happened
Dragonfly added native Hugging Face and ModelScope support to dfget via hf:// and modelscope://.
For platform teams, this removes much of the custom model-sync and cache glue they otherwise have to build and maintain.
You can pull models directly with built-in auth, revision pinning, and recursive fetches.
Why it matters
On inference clusters, model pull is usually the cold-start bottleneck and one of the biggest avoidable egress costs.
The traffic math is straightforward: a 130 GB model pulled by 200 nodes is 26 TB of origin egress if every node fetches independently. With Dragonfly in front, origin should see roughly one full pull (~130 GB), a ~99.5% reduction. That is where the headline claim comes from.
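The arithmetic above can be sanity-checked in a few lines of shell (the 130 GB and 200-node figures are the example numbers from the text, not measurements):

```shell
# Sanity-check the origin-traffic math from the paragraph above.
model_gb=130    # example model size from the text, in GB
nodes=200       # example number of inference nodes

# Naive case: every node pulls the full model from origin.
naive_tb=$(( model_gb * nodes / 1000 ))
echo "naive origin egress: ${naive_tb} TB"

# With a P2P cache in front, origin serves roughly one full pull,
# so the reduction is 1 - 1/nodes. Integer math, one decimal place.
reduction=$(( (nodes - 1) * 1000 / nodes ))   # per-mille
echo "origin egress reduction: $(( reduction / 10 )).$(( reduction % 10 ))%"
```

With these inputs it prints 26 TB and 99.5%, matching the claim; plug in your own model size and node count before quoting numbers internally.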
It also reduces lock-in to internal mirror pipelines and custom fetch services that become permanent maintenance debt.
How it looks
dfget hf://deepseek-ai/DeepSeek-R1/model.safetensors -O /models/DeepSeek-R1/model.safetensors

dfget hf://owner/repo -O ./repo/ -r

ModelScope follows the same pattern with --ms-token and --ms-revision.
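For the ModelScope side, a sketch of what a pinned, authenticated pull could look like, using only the flags named above (`--ms-token`, `--ms-revision`, `-O`, `-r`). The repo name, revision, and secrets path are hypothetical placeholders:

```shell
# Hypothetical repo and revision. Reading the token from a file keeps it
# out of your shell history (it can still show up in `ps` output on
# shared hosts, which is part of the open token-handling question).
MS_TOKEN="$(cat /run/secrets/ms_token)"   # assumed secrets path

dfget "modelscope://owner/repo" -O ./repo/ -r \
  --ms-token "$MS_TOKEN" \
  --ms-revision "v1.0.0"
```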
The key behavior is piece-level sharing. Seed peers can upload chunks before the full model download completes, so pulls become parallel instead of serialized.
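A back-of-envelope sketch of why piece-level sharing matters, under assumed numbers (10 Gbps links, the 130 GB model from earlier, a 4-hop peer chain — none of these are published Dragonfly figures):

```shell
# One full-file hop: 130 GB over a 10 Gbps link, in seconds.
size_gb=130
link_gbps=10
hop_s=$(( size_gb * 8 / link_gbps ))   # ~104 s per full transfer

# Store-and-forward: a peer can only re-upload after finishing its own
# download, so a chain of hops serializes the transfers.
hops=4
serialized_s=$(( hop_s * hops ))

# Piece-level sharing: pieces flow through the chain as they arrive,
# so the chain pipelines to roughly one full transfer time plus a
# small per-hop piece delay (ignored here).
pipelined_s=$hop_s

echo "store-and-forward: ${serialized_s}s, pipelined: ${pipelined_s}s"
```

The gap grows linearly with chain depth, which is why chunk-before-complete uploads turn a serialized fan-out into a roughly constant-time one.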
Proven vs unproven
Proven:
- Native hf:// and modelscope:// support in dfget
- Cleaner integration path than custom mirror jobs
- Clear origin-traffic reduction math
Unproven (in public data so far):
- p50/p95 model-ready startup time at scale
- cross-AZ and cross-region behavior
- token handling guidance that avoids secrets in shell history
- failure behavior when origin is slow or rate-limited
- end-to-end security posture for private model pulls
What to do next
If you currently mirror Hugging Face or ModelScope into internal storage, run a controlled pilot and compare:
- cold-start time per pod
- origin egress
- east-west network load
- operational complexity (how much code/config you can delete)
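For the cold-start-time comparison, a minimal per-node harness sketch; `PULL_CMD` stands in for whichever fetch path you are testing (a dfget pull, your current mirror sync) and defaults to a no-op so the harness itself runs anywhere:

```shell
# Minimal pilot harness: time one cold pull and append it to a CSV.
# PULL_CMD is the fetch under test; the default is a placeholder.
PULL_CMD=${PULL_CMD:-"sleep 1"}
node=$(hostname 2>/dev/null || echo unknown-node)

start=$(date +%s)
sh -c "$PULL_CMD"
end=$(date +%s)

echo "${node},cold_pull_seconds,$(( end - start ))" >> pull_times.csv
```

Run it once per node per variant, then compare the distributions (p50/p95) rather than single runs, since that is exactly the data missing from the public claims.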
If the numbers hold, delete the mirror pipeline.