Cloud Native and AI Infrastructure Digest: June 1, 2026

Miasma turns npm trusted publishing against Red Hat cloud packages

SafeDep and Wiz disclosed Miasma on June 1: an npm supply-chain attack against packages in the @redhat-cloud-services namespace.

The attacker did not bypass provenance; they made provenance point at the wrong thing. According to SafeDep, short-lived oidc-* branches were added to three RedHatInsights repositories, trusted-publishing workflows were changed, and modified packages were pushed with valid npm provenance. The missing control was branch authorization: the registry could prove which repository and workflow produced the artifact, but not that the publish came from a protected branch that operators trusted.

The malicious packages executed during preinstall. They downloaded Bun when it was not already present, then scanned for cloud, Kubernetes, npm, GitHub, Vault, and password-manager credentials. Wiz says most affected versions had been revoked by its 13:00 UTC update, with two still remaining at publication time.

Platform teams should treat trusted-publishing configuration as production infrastructure. Bind publish workflows to protected branches or environments wherever the registry supports it, and block known-bad package versions before CI can run lifecycle scripts.
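
One way to block known-bad versions before lifecycle scripts run is to check the lockfile against a denylist as the first CI step. The sketch below is a minimal illustration, not a tool from either advisory: the denylist entry is a placeholder rather than a real affected version, and the function name is hypothetical. It assumes an npm lockfile in the v2/v3 layout, where installed packages sit under the "packages" key.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical denylist: populate it from the SafeDep and Wiz advisories.
# The entry below is a placeholder, not a real affected version.
DENYLIST: Set[Tuple[str, str]] = {
    ("@redhat-cloud-services/example-package", "0.0.0"),
}


def find_blocked(lockfile: dict, denylist: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs from the lockfile that are denylisted."""
    blocked = set(denylist)
    hits = set()
    # Lockfile v2/v3 keys look like "node_modules/@scope/name"; nested
    # installs repeat the "node_modules/" segment.
    for path, meta in lockfile.get("packages", {}).items():
        if not path:  # the "" key describes the root project itself
            continue
        name = meta.get("name") or path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version", "")
        if (name, version) in blocked:
            hits.add((name, version))
    return sorted(hits)
```

A CI job could load package-lock.json with json.load, call find_blocked, and fail the build on any hit before an install runs. Installing with npm ci --ignore-scripts is a complementary control, since it stops preinstall hooks from executing at all.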

Sources: SafeDep, Wiz - June 1, 2026

vLLM publishes DGX Spark serving guidance for Nemotron-3-Super

The vLLM project published a June 1 deployment guide for running nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 through vLLM on NVIDIA DGX Spark.

The guide is useful because DGX Spark is not a normal data-center GPU target. Its GB10 Grace Blackwell SoC exposes a unified 128 GB CPU/GPU memory pool, so vLLM settings such as --gpu-memory-utilization, --max-model-len, and --max-num-seqs affect the operating system, container runtime, model weights, and KV cache together.

The published example uses the OpenAI-compatible vLLM server image, --max-model-len 131072, --gpu-memory-utilization 0.85, --max-num-seqs 4, and model-specific parser flags. vLLM reports a five-scenario single-Spark evaluation with median decode throughput in the 22.7 to 23.7 tokens/sec range after warm-up, and explicitly frames the result as recipe-specific rather than a general DGX Spark ceiling.
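
The interaction between the unified pool and --gpu-memory-utilization can be made concrete with a little arithmetic. The sketch below assumes the flag is applied as a fraction of the full 128 GB pool; that is an assumption about how memory is reported on DGX Spark, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedMemoryBudget:
    total_gb: float      # size of the shared CPU/GPU pool
    vllm_gb: float       # share vLLM may claim for weights, activations, KV cache
    remainder_gb: float  # what is left for the OS, container runtime, and the rest


def unified_memory_budget(total_gb: float, gpu_memory_utilization: float) -> UnifiedMemoryBudget:
    """Split a unified memory pool according to a utilization fraction."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    vllm_gb = total_gb * gpu_memory_utilization
    return UnifiedMemoryBudget(total_gb, vllm_gb, total_gb - vllm_gb)
```

With the published values, unified_memory_budget(128, 0.85) gives vLLM roughly 108.8 GB and leaves roughly 19.2 GB for everything else on the machine. On a discrete GPU that remainder would be spare device memory; on a unified pool it is the host's entire working set.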

This matters for local and small-batch inference work because the post turns “will this fit?” into a runbook: pin an image, pre-stage weights, warm JIT paths before first use, and watch KV-cache and TTFT metrics from /metrics.
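
The last step of that runbook, watching KV-cache and TTFT metrics, can be sketched as a small filter over the Prometheus text format that /metrics returns. The metric name fragments used for matching ("cache_usage", "time_to_first_token") are assumptions; exact metric names vary between vLLM versions, so check them against the server's actual output. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def select_metrics(exposition_text: str, name_fragments: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (metric_name, value) samples whose name contains any fragment."""
    fragments = tuple(name_fragments)
    samples = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        if "{" in line:
            # Form: name{label="v",...} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            # Form: name value [timestamp]
            name, *rest = line.split()
        if not rest:
            continue
        if any(fragment in name for fragment in fragments):
            samples.append((name, float(rest[0])))
    return samples
```

Feeding it the body of a GET to the server's /metrics endpoint with ("cache_usage", "time_to_first_token") returns only the samples relevant to the two signals named above, which is enough for a simple pre-flight check after warm-up.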

Source: vLLM - June 1, 2026

vLLM-Omni documents GGUF quantization for diffusion serving

The vLLM-Omni documentation added June 1 guidance for GGUF quantization in diffusion-serving paths.

The implementation loads pre-quantized diffusion transformer weights from GGUF while keeping the rest of the pipeline on the base Hugging Face checkpoint for tokenizer, text encoder, scheduler, and VAE state. The docs list validated adapter paths for Qwen-Image, Z-Image, and FLUX.2-klein, while multi-stage omni, TTS, BAGEL, and GLM-Image paths remain unvalidated or require model-specific adapters.

For serving, the documented online path passes --diffusion-quantization-config '{"method":"gguf","gguf_model":"..."}' to vllm serve --omni. Hardware support is documented for NVIDIA Ampere, Ada/Hopper, and Blackwell GPUs, with ROCm marked unverified and Ascend marked unsupported.
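
Hand-quoting a JSON flag value in a shell is easy to get wrong, so a launcher script can build it instead. The sketch below assembles the documented flag and its JSON value. The positional model argument and its placement are assumptions, the function name is hypothetical, and both path arguments are left to the caller because the documentation's example elides the GGUF location.

```python
import json
import shlex


def omni_gguf_serve_command(base_model: str, gguf_model: str) -> str:
    """Build a shell-safe `vllm serve --omni` command line with a GGUF config."""
    # Compact separators reproduce the documented shape of the JSON value.
    config = json.dumps(
        {"method": "gguf", "gguf_model": gguf_model},
        separators=(",", ":"),
    )
    args = [
        "vllm", "serve", base_model,  # assumed: model passed positionally
        "--omni",
        "--diffusion-quantization-config", config,
    ]
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(args)
```

Because the JSON contains double quotes, shlex.join wraps it in single quotes, which matches the quoting style shown in the documented flag.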

The practical value is clearer failure behavior: unsupported models should fail with an adapter error instead of silently using a generic tensor-name mapper.

Source: vLLM-Omni - June 1, 2026

CNCF covers dynamic configuration patterns for Swift services on Kubernetes

CNCF published a June 1 technical post on Swift Configuration for cloud native Swift services.

The post focuses on operational configuration mechanics rather than language marketing: ordered configuration providers, dot-notation keys that map across files and environment variables, ConfigMap-style file reloading, and immutable snapshots so a request does not observe half of one configuration version and half of another.
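
The precedence idea can be illustrated outside Swift. The Python sketch below is not Swift Configuration's API; it only shows the pattern of ordered providers resolving one dot-notation key. The mapping from http.timeout to HTTP_TIMEOUT for environment variables is an assumed convention, and all names are hypothetical.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

Provider = Callable[[str], Optional[Any]]


def env_provider(env: Mapping[str, str]) -> Provider:
    """Look up 'http.timeout' as 'HTTP_TIMEOUT' (assumed naming convention)."""
    def lookup(key: str) -> Optional[Any]:
        return env.get(key.replace(".", "_").upper())
    return lookup


def tree_provider(tree: Mapping[str, Any]) -> Provider:
    """Look up 'http.timeout' by walking nested mappings from a parsed file."""
    def lookup(key: str) -> Optional[Any]:
        node: Any = tree
        for part in key.split("."):
            if not isinstance(node, Mapping) or part not in node:
                return None
            node = node[part]
        return node
    return lookup


def resolve(key: str, providers: Sequence[Provider]) -> Optional[Any]:
    """Return the value from the first provider that has the key."""
    for provider in providers:
        value = provider(key)
        if value is not None:
            return value
    return None
```

Listing the environment provider before the file provider lets an operator override a single ConfigMap value per pod without editing the file, while every other key still comes from the file.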

The Kubernetes-specific detail is the torn-read problem during hot reload. Swift Configuration’s provider model replaces a complete snapshot atomically and keeps the previous valid snapshot when a reload is malformed, which is the behavior platform teams generally want from ConfigMap-mounted runtime configuration.
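
The snapshot behavior described above can be sketched in a few lines. This is a Python illustration of the pattern, not Swift Configuration's implementation: parse and validate the new file completely, then swap a single reference, and leave the old snapshot in place when parsing fails. JSON as the file format and the class name are assumptions made for the example.

```python
import json
import threading
from types import MappingProxyType
from typing import Any, Mapping


class SnapshotStore:
    """Holds one complete, read-only configuration snapshot at a time."""

    def __init__(self, initial: dict) -> None:
        self._lock = threading.Lock()
        self._snapshot: Mapping[str, Any] = MappingProxyType(dict(initial))

    def snapshot(self) -> Mapping[str, Any]:
        # A request takes this reference once and reads only from it,
        # so it never mixes values from two configuration versions.
        return self._snapshot

    def reload(self, raw_text: str) -> bool:
        """Replace the snapshot if raw_text parses; otherwise keep the old one."""
        try:
            parsed = json.loads(raw_text)
            if not isinstance(parsed, dict):
                raise ValueError("top-level configuration must be an object")
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            return False
        with self._lock:  # serialize writers; the swap is one reference assignment
            self._snapshot = MappingProxyType(parsed)
        return True
```

A handler that calls store.snapshot() at the start of a request keeps a consistent view even if reload succeeds midway through. The read-only wrapper is shallow, so nested dictionaries remain mutable and callers are expected not to modify them.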

This is niche, but it is technically useful for teams running Swift services on Kubernetes because it gives the runtime a first-class model for precedence, reloads, and consistent reads instead of ad hoc ProcessInfo.environment plus hand-rolled file parsing.

Source: CNCF - June 1, 2026

