Field Guide

Fluid is a Kubernetes-native orchestrator for data-intensive workloads (AI training, big data, analytics) whose bottleneck is fetching data from remote storage such as S3, OSS, or HDFS. Rather than having every job read directly from remote storage on every access, Fluid introduces a Dataset custom resource (CRD) backed by a distributed cache engine that runs alongside the compute workloads.
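As a rough illustration of that pairing, the sketch below builds a Dataset and a matching AlluxioRuntime as plain dictionaries and prints them as a JSON `List` that `kubectl apply -f` can consume. It assumes the `data.fluid.io/v1alpha1` API group and that a Runtime is bound to a Dataset by sharing its name and namespace; the dataset name, bucket URL, replica count, and cache quota are hypothetical placeholders, and the credential options a real S3 mount needs are omitted.

```python
import json

# Hypothetical placeholders: not values from the Fluid documentation.
DATASET_NAME = "imagenet"
NAMESPACE = "default"


def dataset_manifest(name: str, namespace: str, remote_url: str) -> dict:
    """Describe *what* data is wanted: a Dataset pointing at remote storage."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "Dataset",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Each mount maps a remote path into the dataset's file tree.
            # Credentials for private buckets are intentionally omitted here.
            "mounts": [{"mountPoint": remote_url, "name": name}],
        },
    }


def alluxio_runtime_manifest(name: str, namespace: str, replicas: int, quota: str) -> dict:
    """Describe *how* it is cached: an AlluxioRuntime paired with the Dataset by name."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "AlluxioRuntime",
        # Same name and namespace as the Dataset; this is what binds the two.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Number of cache worker replicas.
            "replicas": replicas,
            "tieredstore": {
                # A single in-memory cache tier with a fixed capacity per worker.
                "levels": [{"mediumtype": "MEM", "path": "/dev/shm", "quota": quota}]
            },
        },
    }


if __name__ == "__main__":
    docs = [
        dataset_manifest(DATASET_NAME, NAMESPACE, "s3://example-bucket/imagenet/"),
        alluxio_runtime_manifest(DATASET_NAME, NAMESPACE, replicas=2, quota="8Gi"),
    ]
    # kubectl accepts a v1 List object, so one file can carry both resources.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": docs}, indent=2))
```

Generating the two objects from one function call per resource keeps the shared name in a single place, which is the easiest binding detail to get wrong by hand.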

The architecture decouples “what data you want” (Dataset) from “how it’s cached” (Runtime). Fluid plugs into multiple cache engines (Alluxio, JuiceFS, JindoFS, GooseFS, Vineyard), each exposed through its own Runtime type, and provisions them as Kubernetes resources. When a pod mounts a Dataset (in practice, the PersistentVolumeClaim Fluid creates under the Dataset’s name), Fluid’s mutating webhook adds a scheduling preference for nodes where the relevant cache blocks already live, so training jobs read from a co-located cache over loopback or the local network instead of pulling terabytes across WAN links on every epoch. Cache warmup (preloading) is triggered through a DataLoad CRD, while cache tiers, quotas, and eviction behavior are declared in the Runtime spec.
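The consumer side can be sketched the same way: a DataLoad that warms the cache before training, and a plain pod that mounts the Dataset through its PVC. This assumes the DataLoad fields `dataset`, `loadMetadata`, and `target[].path/replicas`, and that the PVC shares the Dataset’s name; the `imagenet-warmup` and `imagenet-trainer` names and the container image are hypothetical. Note that the pod carries no explicit node affinity, since steering it toward cache-holding nodes is left to Fluid.

```python
import json

# Hypothetical placeholders: must match the Dataset created beforehand.
DATASET_NAME = "imagenet"
NAMESPACE = "default"


def dataload_manifest(dataset: str, namespace: str, path: str = "/") -> dict:
    """Ask Fluid to copy data under `path` into the cache before jobs start."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "DataLoad",
        "metadata": {"name": f"{dataset}-warmup", "namespace": namespace},
        "spec": {
            "dataset": {"name": dataset, "namespace": namespace},
            # Sync file metadata first, then load the target path's data.
            "loadMetadata": True,
            "target": [{"path": path, "replicas": 1}],
        },
    }


def training_pod_manifest(dataset: str, namespace: str, image: str) -> dict:
    """A pod that reads the Dataset through the PVC named after it."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{dataset}-trainer", "namespace": namespace},
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": image,
                # The cached dataset appears as an ordinary directory at /data.
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            # No nodeSelector or affinity here: cache-aware placement is Fluid's job.
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": dataset},
            }],
        },
    }


if __name__ == "__main__":
    docs = [
        dataload_manifest(DATASET_NAME, NAMESPACE),
        training_pod_manifest(DATASET_NAME, NAMESPACE, "example.com/trainer:latest"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": docs}, indent=2))
```

Because the training code only sees a mounted directory, switching the underlying cache engine means changing the Runtime, not the pod.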

Fluid came out of Alibaba and Nanjing University and is a CNCF Incubating project. It’s most compelling for repeated, read-heavy access to the same large dataset (ML training on a shared corpus is the canonical example), especially when object-store latency or throughput has become what limits GPU utilization, or when egress charges for re-reading the same data have grown significant.