KServe is a Kubernetes-native model inference platform. It provides an InferenceService CRD that takes a model artifact (from S3, GCS, PVC, or an OCI registry) and a framework name, and stands up a serving deployment with autoscaling, canary rollouts, and a standard prediction HTTP/gRPC interface.
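As a sketch of what that CRD looks like (the service name, bucket path, and framework value here are placeholders, not a real deployment — KServe resolves the `modelFormat` name to a matching serving runtime):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris            # placeholder service name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn           # framework name; KServe selects a runtime for it
      storageUri: gs://example-bucket/models/iris   # placeholder artifact location
```

Applying a manifest like this is the whole deployment story: the controller creates the underlying Knative service, wires up routing, and exposes the prediction endpoint.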
Under the hood it sits on top of Knative Serving for request-driven autoscaling (including scale-to-zero) and Istio or the Kubernetes Gateway API for traffic routing. A model pod runs a framework-specific runtime — TensorFlow Serving, TorchServe, Triton, SKLearn, XGBoost, HuggingFace — wrapped in the KServe predictor contract, and can optionally chain to transformer and explainer pods for pre/post-processing and feature attribution. KServe implements the Open Inference Protocol (v2), so clients speak the same REST/gRPC API regardless of the backing runtime. ModelMesh, now merged into the project, adds multi-model serving in which hundreds of small models share a pool of runtime pods — the cost-effective way to serve a long tail of models.
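To make the "same API regardless of runtime" point concrete, here is a minimal sketch of building an Open Inference Protocol v2 REST call in Python. The model name, input values, and the tensor name `input-0` are illustrative placeholders — a real model declares its own input names and dtypes:

```python
import json


def v2_infer_request(model_name: str, data: list, shape: list) -> tuple:
    """Build the path and JSON body for a v2 predict call.

    The v2 REST endpoint is POST /v2/models/{name}/infer, and the body
    carries a list of named, typed tensors. "input-0" is a placeholder
    tensor name for illustration.
    """
    path = f"/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": "input-0",   # placeholder; models define their own input names
                "shape": shape,
                "datatype": "FP32",  # v2 dtype string, e.g. FP32, INT64, BYTES
                "data": data,
            }
        ]
    }
    return path, json.dumps(body)


# Example: a single 4-feature row for a hypothetical "sklearn-iris" model.
path, body = v2_infer_request("sklearn-iris", [6.8, 2.8, 4.8, 1.4], [1, 4])
```

The same request shape works whether the predictor behind the service is Triton, TorchServe, or an SKLearn runtime — that uniformity is what the protocol buys you.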
It started life as KFServing inside Kubeflow and was spun out as its own project. Comparable tools include Seldon Core, BentoML, NVIDIA Triton standalone, and Ray Serve — KServe’s differentiator is the tight Kubernetes/Knative integration and the standardized inference protocol.