
Technology Guide

Vineyard

License: Apache-2.0

Vineyard (v6d) is an in-memory immutable data manager for distributed analytics and ML pipelines. Its job is to let two stages of a pipeline pass large objects such as dataframes, tensors, and graphs to each other without serializing them to disk or sending them over a socket.
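As a rough standard-library analogue of that handoff (not Vineyard's API), the sketch below uses Python's `multiprocessing.shared_memory`: a producer writes a payload into a named segment once, and a consumer attaches by name and reads it through a `memoryview` with no serialization step. The `produce`/`consume` helpers and the `meta` dict are hypothetical names chosen for illustration, and a real pipeline would run the two stages in separate processes rather than one.

```python
from multiprocessing import shared_memory


def produce(payload: bytes):
    """Producer stage (hypothetical helper): copy a non-empty payload into a new
    named shared-memory segment, once.

    Returns the owning handle (keep it open and unlink it when every consumer is
    done) plus the small metadata a consumer needs to find the bytes. The length
    is recorded explicitly because the OS may round the segment size up to a page.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    meta = {"segment": shm.name, "nbytes": len(payload)}
    return shm, meta


def consume(meta: dict):
    """Consumer stage (hypothetical helper): attach by name and view without copying.

    In a real pipeline this would run in another process on the same node;
    attaching by name maps the same physical pages, so nothing is deserialized.
    """
    shm = shared_memory.SharedMemory(name=meta["segment"])
    view = shm.buf[: meta["nbytes"]]  # memoryview slice over the shared pages
    return shm, view


if __name__ == "__main__":
    owner, meta = produce(b"column chunk passed between stages")
    reader, view = consume(meta)
    print(bytes(view))  # bytes() copies, but only to print the result
    view.release()      # drop the exported slice before closing the mapping
    reader.close()
    owner.close()
    owner.unlink()      # free the segment once all consumers have detached
```

The only metadata the consumer needs is the segment name and length; the payload itself never passes through a serializer. Vineyard layers typed objects, immutability, and cluster-wide metadata on top of this basic shared-memory idea.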

Each node runs a vineyardd daemon that owns a shared memory segment. An object is stored once, and consumer processes on the same node map it into their own address space without copying. A separate metadata service, typically etcd, tracks where each object lives in the cluster, so a dataframe partitioned across several nodes can be addressed as a single logical object. Because Vineyard keeps metadata separate from payload, new data structures can be plugged in by registering builders, resolvers, and I/O drivers.
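The split between cluster-wide metadata and per-node payloads, and the idea of plugging in types through builders and resolvers, can be illustrated with a toy in-process model. Every name here (`put`, `get`, `put_global`, `BUILDERS`, `RESOLVERS`, the `Text` and `Global` typenames) is hypothetical and is not Vineyard's client API; plain dicts stand in for etcd and for each node's shared memory.

```python
import itertools
from typing import Any, Callable

# Hypothetical stand-ins, not Vineyard's API: one dict plays the cluster-wide
# metadata service (etcd in the text), and one dict per node plays that node's
# shared-memory payload store.
METADATA: dict[str, dict] = {}
PAYLOADS: dict[str, dict[str, bytes]] = {"node-a": {}, "node-b": {}}
BUILDERS: dict[type, Callable[[Any, str], str]] = {}
RESOLVERS: dict[str, Callable[[str, dict], Any]] = {}
_counter = itertools.count()


def new_id() -> str:
    return f"o{next(_counter):04d}"


def put(obj: Any, node: str) -> str:
    """Store obj on a node using the builder registered for its Python type."""
    return BUILDERS[type(obj)](obj, node)


def get(object_id: str) -> Any:
    """Rebuild an object by reading its metadata and dispatching on typename."""
    meta = METADATA[object_id]
    return RESOLVERS[meta["typename"]](object_id, meta)


# Plugging in a leaf type: a builder writes the payload, a resolver reads it back.
def build_text(value: str, node: str) -> str:
    oid = new_id()
    PAYLOADS[node][oid] = value.encode("utf-8")         # payload stays on its node
    METADATA[oid] = {"typename": "Text", "node": node}  # metadata records where
    return oid


def resolve_text(oid: str, meta: dict) -> str:
    return PAYLOADS[meta["node"]][oid].decode("utf-8")


BUILDERS[str] = build_text
RESOLVERS["Text"] = resolve_text


# A logical object spanning nodes: metadata that lists members, no payload of its own.
def put_global(member_ids: list[str]) -> str:
    oid = new_id()
    METADATA[oid] = {"typename": "Global", "members": list(member_ids)}
    return oid


# Toy resolver: gathers every member into one list (everything is in one process).
RESOLVERS["Global"] = lambda oid, meta: [get(m) for m in meta["members"]]


if __name__ == "__main__":
    a = put("partition-0", "node-a")
    b = put("partition-1", "node-b")
    whole = put_global([a, b])
    print(METADATA[whole])
    print(get(whole))
```

Because `get` dispatches only on the `typename` stored in metadata, supporting a new structure in this model means registering one more builder/resolver pair without touching `put` or `get`. The toy `Global` resolver simply gathers all members because everything lives in one process; it is not a description of how Vineyard moves data or schedules work across nodes.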

It was created at Alibaba and is a CNCF sandbox project. Typical use cases involve handing off intermediate data between Spark, Dask, Ray, and PyTorch stages in Kubernetes-scheduled ML workflows, where the alternative is writing Parquet files to object storage after every step.

CNCF Project

Cloud Native Computing Foundation

Accepted: 2021-04-28