Announcement

StoatFlow: a Kafka Streams-compatible engine built to scale up, not out

The first alpha is here. It runs the Kafka Streams DSL on a single-replica runtime with virtual-thread parallelism, using measurably less CPU and memory and delivering lower latency on the same hardware.

TL;DR

  • What: Single-replica JVM stream processor with the Kafka Streams DSL — JDK 25, Project Loom virtual threads.
  • Kafka Streams DSL compatibility: Drop-in. Existing topology code ports with a dependency swap and a config cleanup.
  • Why: up to 3.4× less CPU and 7.8× less container memory vs Kafka Streams on the same hardware; up to 13.6× lower P99 latency on stateful workloads.
  • Ceiling: A single 8-vCPU machine saturates around 200–300 MB/s uncompressed network throughput — well above typical Kafka stream processing workloads.
  • Access: Private alpha — reach out for early access.

Most stream-processing workloads fit on a single machine. Kafka Streams and Flink scale them out anyway — and you pay the architectural cost of distribution whether your workload needs it or not.

StoatFlow is the alternative for workloads that don't need distribution. It offers the same Kafka Streams DSL with a single replica per app, built on JDK 25 virtual threads: your existing topology code compiles against StoatFlow, and your operators stop paying the distribution tax.

For the how, head to Getting Started.

What we set out to fix

Stream processing on the JVM today gives you two well-known choices: Kafka Streams or Apache Flink. Both are remarkable. Both also scale horizontally by default — which is where most of their architectural and operational complexity comes from.

Three problems compound:

  • Hard to build, harder to run. Stateful joins, exactly-once, watermarks on out-of-order streams — each a deep practice. Production then layers on rebalance storms, restart loops, checkpoint failures, and state migrations that miss SLAs.
  • Every layer is a decision. Which Kafka client knobs to tune? What about RocksDB? StatefulSets, persistent volumes, static group membership, standby replicas? Deploy on Kubernetes without downtime? You answer all of it before your first event flows.
  • Most workloads don't need to scale out. Workloads that comfortably fit on a single modern machine pay the distribution tax for capacity they'll never use.

A different approach

StoatFlow runs as exactly one instance per application. No consumer-group rebalancing — there's no group. No state migration — state lives on the instance that owns it. No repartition topics — key-changing operations route through in-memory queues to other lanes inside the same process.

The single-replica bet is enabled by two recent shifts:

  • Modern JVM concurrency. Virtual threads (GA in JDK 21) give you thousands of concurrent lanes without platform-thread overhead. StoatFlow targets JDK 25 and goes all in on virtual threads, structured concurrency, and the Foreign Function & Memory (FFM) API.
  • Modern hardware. 16-vCPU compute-optimized instances, 10+ Gbps networking, NVMe storage — off-the-shelf on every major cloud.

Records dispatch to key-affinity lanes by consistent hashing — same key, same lane, ordered. Lane count is decoupled from Kafka partition count, so parallelism scales with cores, not partitions.
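
To make the lane idea concrete, here is a minimal sketch of key-affinity dispatch on virtual threads. It is not StoatFlow's API: `LaneSketch`, `laneFor`, and `processInOrder` are hypothetical names, and plain modulo hashing stands in for the consistent hashing described above. It assumes JDK 21 or later.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from StoatFlow.
final class LaneSketch {
    private final ExecutorService[] lanes;

    LaneSketch(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            // One single-threaded executor per lane, backed by a virtual thread,
            // so work submitted to a lane runs one task at a time, in order.
            lanes[i] = Executors.newSingleThreadExecutor(Thread.ofVirtual().factory());
        }
    }

    // Assumption: modulo hashing as a simplified stand-in for consistent hashing.
    // The same key always yields the same lane index.
    static int laneFor(String key, int laneCount) {
        return Math.floorMod(key.hashCode(), laneCount);
    }

    void dispatch(String key, Runnable work) {
        lanes[laneFor(key, lanes.length)].execute(work);
    }

    // Stops accepting work and waits for everything already queued to finish.
    void drain() {
        for (ExecutorService lane : lanes) lane.shutdown();
        try {
            for (ExecutorService lane : lanes) lane.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Dispatches n records for a single key and returns the order they ran in.
    static List<Integer> processInOrder(String key, int n, int laneCount) {
        LaneSketch sketch = new LaneSketch(laneCount);
        List<Integer> seen = new CopyOnWriteArrayList<>();
        for (int i = 0; i < n; i++) {
            int seq = i;
            sketch.dispatch(key, () -> seen.add(seq));
        }
        sketch.drain();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(processInOrder("user-42", 5, 4)); // prints [0, 1, 2, 3, 4]
    }
}
```

The lane count here is just a constructor argument, unrelated to any partition count, which is the decoupling the paragraph describes. A real engine would also need backpressure and a hashing scheme that tolerates lane-count changes, both of which this sketch leaves out.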

Exactly-once flows through commit barriers that sweep every lane and align with one Kafka transaction.
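
One way to picture a commit barrier is a marker queued behind the records of every lane, with a single commit once all lanes have reached it. The sketch below models one such epoch under stated assumptions: `CommitBarrierSketch` and `runEpoch` are hypothetical names, the "commit" is only a comment where a Kafka transaction commit would go, and nothing here reflects how StoatFlow actually implements its barriers. It assumes JDK 21 or later.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not StoatFlow's implementation.
final class CommitBarrierSketch {
    // Sentinel object queued after the data records of every lane.
    private static final Object BARRIER = new Object();

    // Runs one epoch and returns how many records had been processed
    // at the moment the single commit would happen.
    static int runEpoch(List<List<String>> recordsPerLane) {
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch lanesAtBarrier = new CountDownLatch(recordsPerLane.size());

        for (List<String> records : recordsPerLane) {
            BlockingQueue<Object> lane = new LinkedBlockingQueue<>(records);
            lane.add(BARRIER); // the barrier sweeps this lane behind its records
            Thread.ofVirtual().start(() -> {
                Object item;
                while ((item = lane.poll()) != null) {
                    if (item == BARRIER) {
                        lanesAtBarrier.countDown(); // this lane is drained
                        return;
                    }
                    processed.incrementAndGet(); // stand-in for real processing
                }
            });
        }

        try {
            lanesAtBarrier.await(); // wait until every lane reached the barrier
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted before commit", e);
        }
        // Assumption: this is the point where one transaction commit
        // would cover the output of all lanes for the epoch.
        return processed.get();
    }

    public static void main(String[] args) {
        int atCommit = runEpoch(List.of(List.of("a", "b"), List.of("c"), List.<String>of()));
        System.out.println(atCommit); // prints 3
    }
}
```

Because each lane counts down only after processing everything queued ahead of its barrier, the commit point sees the work of all lanes, including lanes that had nothing to do in this epoch.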

State (RocksDB or in-memory) is globally accessible to any virtual thread, by any key — layered with epoch buffers for read-your-writes consistency.
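
As a rough illustration of read-your-writes layering, the sketch below keeps uncommitted writes in a per-epoch buffer in front of a committed map. This is one plausible reading of "epoch buffers", not a description of StoatFlow's store. `EpochBufferedStore` and its methods are hypothetical, a `HashMap` stands in for RocksDB, and the class is single-threaded and ignores deletes for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not StoatFlow's store implementation.
final class EpochBufferedStore {
    // Stand-in for the durable store (RocksDB or in-memory in the post).
    private final Map<String, Long> committed = new HashMap<>();
    // Writes made since the last commit.
    private final Map<String, Long> epochBuffer = new HashMap<>();

    void put(String key, long value) {
        epochBuffer.put(key, value);
    }

    // Read-your-writes: a pending write shadows the committed value.
    Long get(String key) {
        Long pending = epochBuffer.get(key);
        return pending != null ? pending : committed.get(key);
    }

    // What a reader outside the current epoch would see.
    Long getCommitted(String key) {
        return committed.get(key);
    }

    // On commit, the buffered writes become the committed state.
    void commitEpoch() {
        committed.putAll(epochBuffer);
        epochBuffer.clear();
    }

    // On abort, the buffered writes are simply dropped.
    void abortEpoch() {
        epochBuffer.clear();
    }

    public static void main(String[] args) {
        EpochBufferedStore store = new EpochBufferedStore();
        store.put("clicks", 5L);
        System.out.println(store.get("clicks"));          // prints 5
        System.out.println(store.getCommitted("clicks")); // prints null
        store.commitEpoch();
        System.out.println(store.getCommitted("clicks")); // prints 5
    }
}
```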

Existing topologies port unchanged

The DSL is the one Kafka Streams users already know. KStream, KTable, joins (primary-key, foreign-key, windowed), count / reduce / aggregate / cogroup, tumbling / hopping / session windows with grace and suppression, versioned state stores, the Processor API, interactive queries — all there.

Existing topologies port with a dependency swap and a config cleanup. Your StreamsBuilder, your operations, your Materialized definitions all compile against StoatFlow. What you remove is the multi-instance scaffolding: standby replicas, stream-thread counts, partition-aware tuning.
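
For the config cleanup, a small helper like the one below can strip multi-instance settings from an existing configuration before reuse. The two property names are real Kafka Streams settings for stream-thread count and standby replicas. Everything else is an assumption: `ConfigCleanupSketch` is a hypothetical helper, and the post does not say whether StoatFlow ignores, rejects, or requires removal of these keys.

```java
import java.util.Properties;
import java.util.Set;

// Hypothetical helper; not part of StoatFlow or Kafka Streams.
final class ConfigCleanupSketch {
    // Kafka Streams settings that only matter for multi-thread or
    // multi-instance deployments. Assumption: these are safe to drop.
    static final Set<String> MULTI_INSTANCE_KEYS = Set.of(
            "num.stream.threads",
            "num.standby.replicas");

    // Returns a copy of the configuration without the multi-instance keys.
    static Properties withoutMultiInstanceKeys(Properties original) {
        Properties cleaned = new Properties();
        for (String name : original.stringPropertyNames()) {
            if (!MULTI_INSTANCE_KEYS.contains(name)) {
                cleaned.setProperty(name, original.getProperty(name));
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("application.id", "orders-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("num.stream.threads", "4");
        props.setProperty("num.standby.replicas", "1");
        System.out.println(withoutMultiInstanceKeys(props).stringPropertyNames().size()); // prints 2
    }
}
```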

State migration is a different story. The recommended path is to reprocess your input topics — direct restoration from existing Kafka Streams changelog topics isn't supported. If your input retention rules that out, get in touch.

And on top of the DSL, StoatFlow ships primitives Kafka Streams doesn't:

  • Flink-style event-time and processing-time timers from any Processor
  • Flink-style watermarks with idleness alignment
  • Scheduled sources — topology-level emitters on an interval or cron
  • Atomic store operations (compute, merge)
  • KeyLockManager — atomic sections across multiple keys and stores

For the full surface, see Features.

The numbers

Benchmarked against Kafka Streams 4.1.1 on a Hetzner 8-vCPU machine, with identical topologies at throughput parity. The headline:

  • P99 latency — up to 13.6× lower
  • CPU — up to 3.4× less
  • Container memory — up to 7.8× less
  • Restoration — 1.45 to 1.65× faster

Curious what's behind each scenario? The full benchmark report walks every benchmark — topology, infrastructure stack, serdes (String / Avro / Protobuf), load rates, and event size distributions — so you can compare against the workloads you actually run.

There's a ceiling, and we name it. On that Hetzner 8-vCPU machine, benchmarks hit a network-bandwidth ceiling of 200–300 MB/s uncompressed. In events, that is ~124K/sec on a 1 KB stateless transform and up to ~2.1M/sec output on word-count-style aggregation. High-end hardware (96+ cores, faster NICs) hasn't been benchmarked yet.
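
A quick back-of-the-envelope check relates the event rate to the bandwidth figure. It rests on two assumptions the post does not state: that 1 KB means 1,000 bytes, and that the ceiling counts traffic in both directions, since a stateless transform reads each event and writes one back. `CeilingArithmetic` is a hypothetical helper.

```java
// Hypothetical helper for a rough sanity check; not a benchmark tool.
final class CeilingArithmetic {
    // Megabytes per second moved in one direction (1 MB = 1,000,000 bytes).
    static double megabytesPerSecond(long eventsPerSecond, long bytesPerEvent) {
        return eventsPerSecond * (double) bytesPerEvent / 1_000_000.0;
    }

    public static void main(String[] args) {
        double oneWay = megabytesPerSecond(124_000, 1_000);
        // Assumption: the transform reads and writes every event,
        // so combined traffic is roughly twice the one-way figure.
        double combined = 2 * oneWay;
        System.out.println(oneWay);   // prints 124.0
        System.out.println(combined); // prints 248.0
    }
}
```

Under those assumptions, ~124K events/sec at 1 KB comes to about 124 MB/s each way and about 248 MB/s combined, which sits inside the quoted 200–300 MB/s range.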

Get early access

StoatFlow is in private alpha — distribution is invite-only while we work directly with each early-access team.

This post leaves plenty of questions unanswered: failover behaviour, cold-start times, how representative the scenarios are, and the detail behind the full benchmark report. The docs are still being written.

Reach out to request alpha or beta access — especially if you're running stateful Kafka Streams in production today. For release news and updates, follow on LinkedIn.