Stateful stream processing for Kafka, built to scale up — not out.

Cut the distribution tax: fewer moving parts, simpler operations, more throughput.

StoatFlow is a JVM stream processing library with the Kafka Streams DSL, deployed as a single replica. Built on modern Java 25 — virtual threads, structured concurrency, FFM. Lane parallelism no longer bounded by partition count. Exactly-once semantics, no rebalancing, less serialization, no state migration. Trades horizontal scaling for functional and operational simplicity — better performance at lower cost.

What we set out to fix

Today's choices for stream processing on the JVM are Kafka Streams and Apache Flink — innovative projects that shaped a decade of real-time systems. Both are built to scale out, distributing work across a cluster of instances — an architecture that carries a heavy, permanent complexity.

We call this the distribution tax: the cost for scale-out you don't need — paid in infrastructure, staff time, and performance.

The problem

Hard to build. Harder to run.

Stateful joins, exactly-once, watermarks on out-of-order streams, sub-second latency — each a deep practice. Production adds rebalance storms, restart loops, checkpoint failures, state migrations that miss SLAs.

Every layer is a decision.

Plain Java wiring or a micro-framework? Which Kafka client knobs to tune — and what about RocksDB? Deploy on Kubernetes without downtime? Avoid or mitigate rebalances? StatefulSets, PVs, static group membership and/or standby replica?

Most workloads don't need to scale out.

Kafka Streams and Flink scale horizontally — that's where their architectural and operational complexity comes from. Workloads that fit on one modern machine pay that tax for capacity they'll never use.

Our response

Exceptional DX, modern stack, rich tooling

Kotlin and Java APIs on JDK 25 — virtual threads, structured concurrency, FFM. Gradle/Maven plugin, TopologyTestDriver, batteries-included runtime.

Fast & efficient stream processing

Single-replica model, virtual-thread parallelism, in-memory repartitioning — measurably lower CPU, memory, and latency on the same hardware.

Simple & reliable operations

No rebalancing, no state migration. Health checks, metrics, debug endpoints out of the box. 400 ms / 800–1200 ms restarts on benchmarked hardware.

New capabilities

Flink-style watermarks and timers, scheduled sources, side outputs — with more on the roadmap.

Single-instance architecture

A new approach to stream processing

StoatFlow is a single-instance stream-processing runtime. Parallel processing on modern JVM primitives, designed to scale vertically.

StoatFlow single-instance architecture: Kafka consumer (all partitions) → Lane dispatcher → N virtual-thread lanes → Global state (RocksDB) → Transactional producer.

Architectural mechanics

Distinctive capabilities under the hood — what makes the single-replica model work in production.

Key-Affinity Lanes

Records with the same key always route to the same virtual-thread lane via consistent hashing. Per-key ordering is guaranteed. Lane count is decoupled from Kafka partition count — parallelism scales with cores, not partitions.

Commit Barriers

Periodic barriers (lightweight markers) sweep through every lane. When all lanes align on a barrier, changelogs and sink records commit atomically with consumer offsets in one Kafka transaction for exactly-once semantics.

Global State

Backed by RocksDB, or in-memory stores, all state is accessible from any virtual thread, by any key. Layered with epoch buffers acting as caches, it allows thread-safe access throughout.

Virtual-Thread Parallelism

Project Loom virtual threads on JDK 25. Hundreds to thousands of concurrent lanes on a single JVM. External I/O (REST calls, database queries, AI inference) runs naturally on virtual threads, no async-framework ceremony.

In-Memory Repartitioning

Key-changing operations (groupBy, fk joins, ...) route records through in-memory queues to new lanes. No repartition topics, no broker round-trips, no extra serialization, no network IO.

Thread-safe by design

Atomic read-modify-write on every store, mostly lock-free inter-thread message passing, plus a KeyLockManager utility for multi-key / multi-store atomic operations in custom Processors.

The full stream processing surface

Compute on continuous event streams — typically Kafka topics — split between stateless transformations (map, filter, route), stateful operations (joins, aggregations, sessions) over time, and async IO (enrichment, RPC). StoatFlow is built on these primitives.

Primitives

Streams / records

Continuous sequences of events flowing through the application.

Sources & sinks

Connectors that bring events in from and push results out to external systems (Kafka topics, etc.).

Operators / processors

Functions that transform, filter, enrich, aggregate, join, or route events.

Keys / partitioning

Groups related events together, ensures key-based ordering, determines where state lives.

State stores

Durable local or managed state used for counts, joins, aggregations, deduplication, session tracking.

Time / windows / timers

Primitives for reasoning about event time, late data, periodic actions, and bounded state.

What it solves

Stateful processing

Keeping, mutating, and reasoning about state across an unbounded stream — counts, joins, sessions, materialised views.

Fault tolerance & recovery

Surviving crashes, broker restarts, and node failures without data loss. State rebuilds correctly from durable logs.

Consistency & exactly-once semantics

Producing each output exactly once across crashes and retries. Atomic commit of state, sinks, and offsets.

Time handling & out-of-order data

Reasoning about event time vs. processing time. Watermarks, windows, and policies for late-arriving records.

Scaling & parallel execution

Spreading work across many threads, cores, or machines while preserving per-key ordering and correctness guarantees.

Operational model & runtime coordination

Deployment, restarts, rebalancing, health, metrics — the moving parts required to run a stream processing app in production.

Faster, leaner, on the same machine.

StoatFlow vs. Kafka Streams across CPU, memory, latency, and state restoration —
single 8-vCPU machine, EOS¹ and ALO² modes.

¹ EOS — Exactly-once semantics
² ALO — At-least-once semantics

Kafka Streams
StoatFlow

Hetzner ccx33 · 8 vCPU · 32 GB · 2026-04-22

CPU utilisation

Stateless (simple)

parity

KS
SF

Word count

27% less

KS
SF

Stateless (advanced)

2.3× less

KS
SF

Stateful joins

1.5× less

KS
SF

Container memory

Stateless (simple)

17% more

KS
SF

Word count

31% less

KS
SF

Stateless (advanced)

2.4× less

KS
SF

Stateful joins

7.8× less

KS
SF

True E2E P95 latency

Stateless (simple)

17% higher

KS
SF

Word count

3% lower

KS
SF

Stateless (advanced)

1.9× lower

KS
SF

Stateful joins

13.6× lower

KS
SF

State restoration

Restoration time

1.45× faster

KS
SF

On-disk state

35% less

KS
SF

Throughput parity throughout. Lower is better.

Curious what's behind each scenario? The full report walks every benchmark — topology, infrastructure stack, serdes (String / Avro / Protobuf), load rates, and event size distributions — so you can compare against the workloads you actually run.

100% Kafka Streams DSL and beyond

Drop into the DSL you already use — with the primitives Kafka Streams doesn't ship.

Same DSL, drop in

  • KStream, KTable, joins (PK, FK, window)
  • count, reduce, aggregate, cogroup
  • Tumbling, hopping, session windows + grace + suppression
  • Versioned state stores
  • Processor API
  • Interactive queries

Existing topologies port with a dependency swap and a config cleanup.

Beyond Kafka Streams

  • Timers (event-time + processing-time)
  • Watermarks + idleness alignment
  • Scheduled sources (interval, cron)
  • Async I/O for external calls (REST, DB, AI)
  • Atomic store operations (compute, merge)
  • KeyLockManager — multi-key atomic sections

Side outputs + hot-standby HA on the roadmap.

Features beyond Kafka Streams

On top of the DSL — Flink-inspired primitives and StoatFlow-native utilities.

Event-time & wall-clock timers in any Processor

Per-key event-time or processing-time timers from any Processor. Each timer fires in the same lane as the key's records, serialised with process() — so onTimer has full read/write state access without locks.
Common use: retry a failed enrichment after a delay, expire idle keys, time out incomplete sessions.

class DelayedEnrich implements Processor<String, Order, String, EnrichedOrder> {
    private static final long RETRY_MS = 10_000;
    private ProcessorContext<String, EnrichedOrder> ctx;
    private TimerService<String> timers;
    private KeyValueStore<String, Order> pending;
    private ReadOnlyKeyValueStore<String, Customer> customers;

    @Override public void init(ProcessorContext<String, EnrichedOrder> ctx) {
        this.ctx = ctx;
        this.timers = ctx.timerService();
        this.pending = ctx.getStateStore("pending");
        this.customers = ctx.getStateStore("customers");
    }

    @Override public void process(Record<String, Order> record) {
        Customer c = customers.get(record.value().customerId());
        if (c != null) {
            ctx.forward(new Record<>(record.key(), enrich(record.value(), c), record.timestamp()));
            return;
        }
        // Cache miss — stash the order and retry 10s later
        pending.put(record.key(), record.value());
        timers.registerProcessingTimeTimer(record.key(), timers.currentProcessingTime() + RETRY_MS);
    }

    @Override public void onTimer(long ts, String key, TimerContext<String, EnrichedOrder> tctx) {
        Order order = pending.delete(key);
        Customer c = customers.get(order.customerId());
        tctx.forward(key, c != null ? enrich(order, c) : fallback(order));
    }
}

// Wire into the topology — pass the store names the Processor declares
orders.process(DelayedEnrich::new, "pending", "customers");

Runtime, plugins & testing

Most stream-processing libraries hand you an engine and stop. StoatFlow goes further — the scaffolding around it ships as part of the product, not left as homework for every team to re-do.

Batteries-included micro-runtime

Health checks, metrics, debug endpoints, and config defaults distilled from years of running Kafka apps in production — already wired in.

Gradle & Maven plugins

The Gradle convention plugin ships application + shadow + JDK 25 toolchain. Optional Docker build via Jib; optional GraalVM native image — opt in with a flag.

TopologyTestDriver

In-memory, broker-free test driver — familiar to Kafka Streams users — extended for timers, scheduled sources, and watermark control.

Why StoatFlow?

Engineering choices grounded in years of running stream processing applications in production — and the practitioner background behind every architectural decision.

Better performance at lower costs

Benchmarked stateful workloads vs Kafka Streams — same hardware, throughput parity:

  • 4.4–7.8× less container memory
  • 1.5–3.4× less CPU

One single-instance deployment per app — no cluster, no requested by unterutilised k8s resources, no standby replicas. The cloud bill follows the resource curve.

Ship without specialist expertise

Production defaults baked in. Health checks, metrics, debug endpoints, Kubernetes probes — out of the box.

Consumer-group sizing, rebalance tuning, standby-replica strategy, checkpoint configuration — decisions you don't have to make.

Existing JVM teams productionize stream processing without first becoming Kafka, Kafka Streams, or Flink specialists.

Hartmut Armbruster — founder of StoatFlow

Built by Kafka practitioners

— Hartmut Armbruster

Speaker

3× Current — the leading Apache Kafka conference (Austin 2024 · London 2025 + 2026). Also Berlin Buzzwords (2026), WeAreDevelopers World Congress (Berlin 2024 + 2026), Big Data Conference Europe (Vilnius 2024).

Author

Kafka Streams Topology Design — an open standard for visualising and documenting Kafka Streams application architectures.

Industry

18 years engineering, with Kafka in production at HSBC, NEX (CME Group), Raiffeisen Switzerland, and Deutsche Bahn.

Frequently asked questions

Stateful stream processing for Kafka, built to scale up — not out.

Start your 30-day free trial after a quick intro call. Full features, no credit card, no commitment. Migrate your first topology in an afternoon.