Stateful stream processing for Kafka, built to scale up — not out.

Cut the distribution tax: fewer moving parts, simpler operations, more throughput.

StoatFlow is a JVM stream processing library with the Kafka Streams DSL, deployed as a single replica. Built on modern Java 25 — virtual threads, structured concurrency, FFM. Lane parallelism no longer bounded by partition count. Exactly-once semantics, no rebalancing, less serialization, no state migration. Trades horizontal scaling for functional and operational simplicity — better performance at lower cost.

Start free trial See how it works

What we set out to fix

Today's choices for stream processing on the JVM are Kafka Streams and Apache Flink — innovative projects that shaped a decade of real-time systems. Both are built to scale out, distributing work across a cluster of instances — an architecture that carries a heavy, permanent complexity.

We call this the distribution tax: the cost for scale-out you don't need — paid in infrastructure, staff time, and performance.

The problem

Hard to build. Harder to run.

Stateful joins, exactly-once, watermarks on out-of-order streams, sub-second latency — each a deep practice. Production adds rebalance storms, restart loops, checkpoint failures, state migrations that miss SLAs.

Every layer is a decision.

Plain Java wiring or a micro-framework? Which Kafka client knobs to tune — and what about RocksDB? Deploy on Kubernetes without downtime? Avoid or mitigate rebalances? StatefulSets, PVs, static group membership and/or standby replica?

Most workloads don't need to scale out.

Kafka Streams and Flink scale horizontally — that's where their architectural and operational complexity comes from. Workloads that fit on one modern machine pay that tax for capacity they'll never use.

Our response

Exceptional DX, modern stack, rich tooling

Kotlin and Java APIs on JDK 25 — virtual threads, structured concurrency, FFM. Gradle/Maven plugin, TopologyTestDriver, batteries-included runtime.

Fast & efficient stream processing

Single-replica model, virtual-thread parallelism, in-memory repartitioning — measurably lower CPU, memory, and latency on the same hardware.

Simple & reliable operations

No rebalancing, no state migration. Health checks, metrics, debug endpoints out of the box. 400 ms / 800–1200 ms restarts on benchmarked hardware.

New capabilities

Flink-style watermarks and timers, scheduled sources, side outputs — with more on the roadmap.

Read the full motivation

Single-instance architecture

A new approach to stream processing

StoatFlow is a single-instance stream-processing runtime. Parallel processing on modern JVM primitives, designed to scale vertically.

StoatFlow single-instance architecture: Kafka consumer (all partitions) → Lane dispatcher → N virtual-thread lanes → Global state (RocksDB) → Transactional producer.

Architectural mechanics

Distinctive capabilities under the hood — what makes the single-replica model work in production.

Key-Affinity Lanes

Records with the same key always route to the same virtual-thread lane via consistent hashing. Per-key ordering is guaranteed. Lane count is decoupled from Kafka partition count — parallelism scales with cores, not partitions.

Commit Barriers

Periodic barriers (lightweight markers) sweep through every lane. When all lanes align on a barrier, changelogs and sink records commit atomically with consumer offsets in one Kafka transaction for exactly-once semantics.

Global State

Backed by RocksDB, or in-memory stores, all state is accessible from any virtual thread, by any key. Layered with epoch buffers acting as caches, it allows thread-safe access throughout.

Virtual-Thread Parallelism

Project Loom virtual threads on JDK 25. Hundreds to thousands of concurrent lanes on a single JVM. External I/O (REST calls, database queries, AI inference) runs naturally on virtual threads, no async-framework ceremony.

In-Memory Repartitioning

Key-changing operations (groupBy, fk joins, ...) route records through in-memory queues to new lanes. No repartition topics, no broker round-trips, no extra serialization, no network IO.

Thread-safe by design

Atomic read-modify-write on every store, mostly lock-free inter-thread message passing, plus a KeyLockManager utility for multi-key / multi-store atomic operations in custom Processors.

How it works — architecture deep-dive

The full stream processing surface

Compute on continuous event streams — typically Kafka topics — split between stateless transformations (map, filter, route), stateful operations (joins, aggregations, sessions) over time, and async IO (enrichment, RPC). StoatFlow is built on these primitives.

Primitives

Streams / records

Continuous sequences of events flowing through the application.

Sources & sinks

Connectors that bring events in from and push results out to external systems (Kafka topics, etc.).

Operators / processors

Functions that transform, filter, enrich, aggregate, join, or route events.

Keys / partitioning

Groups related events together, ensures key-based ordering, determines where state lives.

State stores

Durable local or managed state used for counts, joins, aggregations, deduplication, session tracking.

Time / windows / timers

Primitives for reasoning about event time, late data, periodic actions, and bounded state.

What it solves

Stateful processing

Keeping, mutating, and reasoning about state across an unbounded stream — counts, joins, sessions, materialised views.

Fault tolerance & recovery

Surviving crashes, broker restarts, and node failures without data loss. State rebuilds correctly from durable logs.

Consistency & exactly-once semantics

Producing each output exactly once across crashes and retries. Atomic commit of state, sinks, and offsets.

Time handling & out-of-order data

Reasoning about event time vs. processing time. Watermarks, windows, and policies for late-arriving records.

Scaling & parallel execution

Spreading work across many threads, cores, or machines while preserving per-key ordering and correctness guarantees.

Operational model & runtime coordination

Deployment, restarts, rebalancing, health, metrics — the moving parts required to run a stream processing app in production.

Faster, leaner, on the same machine.

StoatFlow vs. Kafka Streams across CPU, memory, latency, and state restoration —
single 8-vCPU machine, EOS¹ and ALO² modes.

¹ EOS — Exactly-once semantics·
² ALO — At-least-once semantics

Kafka Streams

StoatFlow

Hetzner ccx33 · 8 vCPU · 32 GB · 2026-04-22

CPU utilisation

Stateless (simple)

parity

Word count

27% less

Stateless (advanced)

2.3× less

Stateful joins

1.5× less

Container memory

Stateless (simple)

17% more

Word count

31% less

Stateless (advanced)

2.4× less

Stateful joins

7.8× less

True E2E P95 latency

Stateless (simple)

17% higher

Word count

3% lower

Stateless (advanced)

1.9× lower

Stateful joins

13.6× lower

State restoration

Restoration time

1.45× faster

On-disk state

35% less

Throughput parity throughout. Lower is better.

Curious what's behind each scenario? The full report walks every benchmark — topology, infrastructure stack, serdes (String / Avro / Protobuf), load rates, and event size distributions — so you can compare against the workloads you actually run.

See the full benchmark report

100% Kafka Streams DSL and beyond

Drop into the DSL you already use — with the primitives Kafka Streams doesn't ship.

Same DSL, drop in

KStream, KTable, joins (PK, FK, window)
count, reduce, aggregate, cogroup
Tumbling, hopping, session windows + grace + suppression
Versioned state stores
Processor API
Interactive queries

Existing topologies port with a dependency swap and a config cleanup.

Beyond Kafka Streams

Timers (event-time + processing-time)
Watermarks + idleness alignment
Scheduled sources (interval, cron)
Async I/O for external calls (REST, DB, AI)
Atomic store operations (compute, merge)
KeyLockManager — multi-key atomic sections

Hot-standby HA now shipped; side outputs on the roadmap.

See the full comparison matrix

Features beyond Kafka Streams

On top of the DSL — Flink-inspired primitives and StoatFlow-native utilities.

Event-time & wall-clock timers in any Processor

Per-key event-time or processing-time timers from any Processor. Each timer fires in the same lane as the key's records, serialised with process() — so onTimer has full read/write state access without locks.
Common use: retry a failed enrichment after a delay, expire idle keys, time out incomplete sessions.

class DelayedEnrich implements Processor<String, Order, String, EnrichedOrder> {
    private static final long RETRY_MS = 10_000;
    private ProcessorContext<String, EnrichedOrder> ctx;
    private TimerService<String> timers;
    private KeyValueStore<String, Order> pending;
    private ReadOnlyKeyValueStore<String, Customer> customers;

    @Override public void init(ProcessorContext<String, EnrichedOrder> ctx) {
        this.ctx = ctx;
        this.timers = ctx.timerService();
        this.pending = ctx.getStateStore("pending");
        this.customers = ctx.getStateStore("customers");
    }

    @Override public void process(Record<String, Order> record) {
        Customer c = customers.get(record.value().customerId());
        if (c != null) {
            ctx.forward(new Record<>(record.key(), enrich(record.value(), c), record.timestamp()));
            return;
        }
        // Cache miss — stash the order and retry 10s later
        pending.put(record.key(), record.value());
        timers.registerProcessingTimeTimer(record.key(), timers.currentProcessingTime() + RETRY_MS);
    }

    @Override public void onTimer(long ts, String key, TimerContext<String, EnrichedOrder> tctx) {
        Order order = pending.delete(key);
        Customer c = customers.get(order.customerId());
        tctx.forward(key, c != null ? enrich(order, c) : fallback(order));
    }
}

// Wire into the topology — pass the store names the Processor declares
orders.process(DelayedEnrich::new, "pending", "customers");

Runtime, plugins & testing

Most stream-processing libraries hand you an engine and stop. StoatFlow goes further — the scaffolding around it ships as part of the product, not left as homework for every team to re-do.

Batteries-included micro-runtime

Health checks, metrics, debug endpoints, and config defaults distilled from years of running Kafka apps in production — already wired in.

Gradle & Maven plugins

The Gradle convention plugin ships application + shadow + JDK 25 toolchain. Optional Docker build via Jib; optional GraalVM native image — opt in with a flag.

TopologyTestDriver

In-memory, broker-free test driver — familiar to Kafka Streams users — extended for timers, scheduled sources, and watermark control.

Get started

Why StoatFlow?

Engineering choices grounded in years of running stream processing applications in production — and the practitioner background behind every architectural decision.

Better performance at lower costs

Benchmarked stateful workloads vs Kafka Streams — same hardware, throughput parity:

4.4–7.8× less container memory
1.5–3.4× less CPU

One single-instance deployment per app — no cluster, no over-provisioned, under-utilised k8s resources, no mandatory standby replicas. The cloud bill follows the resource curve.

Ship without specialist expertise

Production defaults baked in. Health checks, metrics, debug endpoints, Kubernetes probes — out of the box.

Consumer-group sizing, rebalance tuning, standby-replica strategy, checkpoint configuration — decisions you don't have to make.

Existing JVM teams productionize stream processing without first becoming Kafka, Kafka Streams, or Flink specialists.

Built by Kafka practitioners

— Hartmut Armbruster

Speaker

3× Current — the leading Apache Kafka conference (Austin 2024 · London 2025 + 2026). Also Berlin Buzzwords (2026), WeAreDevelopers World Congress (Berlin 2024 + 2026), Big Data Conference Europe (Vilnius 2024).

Author

Kafka Streams Topology Design — an open standard for visualising and documenting Kafka Streams application architectures.

Industry

18 years engineering, with Kafka in production at HSBC, NEX (CME Group), Raiffeisen Switzerland, and Deutsche Bahn.

LinkedIn thriving.dev

Frequently asked questions