Prometheus is a pull-based metrics monitoring system and time-series database originally built at SoundCloud in 2012, inspired by Google’s Borgmon. It joined the CNCF in 2016 as the second hosted project, after Kubernetes, and graduated in 2018. It is the de facto standard for metrics collection in Kubernetes environments.
A Prometheus server scrapes HTTP endpoints at a configurable interval, parses the text-based exposition format (or OpenMetrics), and stores samples in a local TSDB. Recent samples are held in an in-memory head block protected by a write-ahead log for durability, and the head is periodically persisted to disk as two-hour blocks that are later compacted into larger ones. Each series is uniquely identified by a metric name and a set of key-value labels, which gives the data model its high dimensionality. Targets are discovered dynamically through service discovery for Kubernetes, Consul, EC2, GCE, Azure, and DNS, or through file-based configuration. Instrumented applications expose metrics through client libraries (Go, Java, Python, Rust, etc.), and systems that cannot be modified are covered by exporters such as node_exporter, blackbox_exporter, and kube-state-metrics.
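To make the data model concrete, the following is a minimal sketch of how a series identity and a single line of the text exposition format could be represented. It is illustrative only: the helper names (series_key, escape_label_value, format_sample) are hypothetical and are not part of any Prometheus client library, and real client libraries also emit HELP and TYPE lines, timestamps, and other details omitted here.

```python
from typing import Dict, Tuple

# Hypothetical type: a series identity is the metric name plus its label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable series identity; label order does not matter."""
    return name, tuple(sorted(labels.items()))


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: Dict[str, str], value: float) -> str:
    """Render one sample as a line of the text exposition format."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


if __name__ == "__main__":
    labels = {"method": "GET", "status": "200"}
    print(format_sample("http_requests_total", labels, 1027))
```

Changing any label value (for example method="POST") yields a different key and therefore a different series, which is why label sets with many distinct values multiply the number of series a server has to store.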
Queries are written in PromQL, a functional query language built around instant vectors and range vectors, with operators and functions for aggregation, rate calculation, histogram quantiles, and vector matching (joins) across label sets. Alerting rules evaluate PromQL expressions and send firing alerts to Alertmanager, which handles grouping, inhibition, silencing, and routing to receivers such as PagerDuty, OpsGenie, Slack, email, and webhooks. For long-term storage and horizontal scaling, Prometheus can forward samples over its remote write protocol (configured under remote_write) to systems such as Thanos (through its Receive component), Cortex, Mimir, VictoriaMetrics, and M3DB.
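The idea behind rate calculation on a counter can be sketched in a few lines. This is a deliberately simplified illustration, not PromQL's actual rate() implementation: it handles counter resets but divides by the time between the first and last sample, and it does not extrapolate to the boundaries of the range window as Prometheus does. The function name simple_rate and the Sample type are hypothetical.

```python
from typing import List, Tuple

# Hypothetical type: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the given samples.

    Simplification: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        # Choice made for this sketch: fail loudly on too little data.
        raise ValueError("need at least two samples")

    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # A drop is treated as a counter reset: the counter restarted
            # from zero, so the whole current value counts as increase.
            increase += cur

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    return increase / elapsed


if __name__ == "__main__":
    # Counter scraped every 15 s; it resets between t=30 and t=45.
    window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
    print(simple_rate(window))
```

In the example window the increases are 30, 30, 10 (after the reset), and 30, so the sketch reports 100 units over 60 seconds. Treating a drop as a reset is what lets rate-style functions survive process restarts without producing negative rates.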