Motivation
Why StoatFlow exists, and the wide middle of Kafka-native stream processing we target.
Stateful stream processing for Kafka, built to scale up, not out. Cut the distribution tax: fewer moving parts, simpler operations, more throughput.
StoatFlow is a different bet on how stream processing should work. This page explains the problem we set out to solve, the design choice at the heart of the system, and which workloads it fits — and which it doesn't.
Stream processing is hard
Building a stream processor well is one challenge; operating it reliably in production is another. Kafka Streams and Apache Flink are both excellent answers, and both demand serious investment.
Kafka Streams is a lightweight library embedded in your service. Apache Flink is a cluster-deployed engine. The DSLs differ, the runtimes differ, the operational profiles differ, but they share a starting assumption: stream processing should scale horizontally, whether through multiple instances coordinating in a consumer group or through a cluster of TaskManagers coordinating checkpoints.
For workloads that genuinely need that scale — terabytes of state, trillions of events per day — the architecture pays for itself. For the much larger middle band, it doesn't. Deploys trigger rebalancing storms. State migrates between instances. Repartition topics shuffle records through the brokers and back. Standby replicas duplicate state at extra resource cost. Engineering hours go into operational complexity — partition sizing, rebalance tuning, checkpoint configuration — rather than business logic.
We call this the distribution tax: the cost of scale-out you don't need, paid in infrastructure, staff time, and performance.
Until now, there's been no alternative. Stream processing meant choosing between a multi-instance library (Kafka Streams) and a multi-node cluster (Flink) — both with scale-out baked in. The middle path — single-machine, production-grade — didn't have a serious contender for stateful workloads.
The bet — single replica, modern JVM
StoatFlow runs your stream processing topology as a single replica.
That one architectural choice removes whole categories of distributed overhead. No consumer-group rebalancing — there's no "group". No state migration — state lives on the instance that owns it. No repartition topics — key changes route through in-memory queues to other lanes within the same process. Exactly-once semantics flows through a single commit barrier covering the whole topology, not per-task transactions scattered across instances.
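As a rough illustration of what routing a key change "through in-memory queues to other lanes" could look like, here is a minimal Java sketch. It is not StoatFlow's implementation: the class name `InProcessRekeySketch`, the `Event` record, the hash-based lane selection, and the bounded `ArrayBlockingQueue` per lane are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: re-keyed records are handed to another lane in the
// same process instead of being written to a repartition topic.
final class InProcessRekeySketch {

    // Assumed record shape for the example.
    record Event(String key, String value) {}

    // One bounded queue per lane; the lane count is chosen freely and is
    // not tied to the number of Kafka partitions.
    private final List<BlockingQueue<Event>> laneQueues = new ArrayList<>();

    InProcessRekeySketch(int laneCount, int capacityPerLane) {
        for (int i = 0; i < laneCount; i++) {
            laneQueues.add(new ArrayBlockingQueue<>(capacityPerLane));
        }
    }

    // Assumption: lanes are picked by key hash, so equal keys share a lane.
    int laneFor(String key) {
        return Math.floorMod(key.hashCode(), laneQueues.size());
    }

    // Enqueues the event on its target lane and returns the lane index.
    // put() blocks when the queue is full, which gives simple back-pressure.
    int route(Event event) throws InterruptedException {
        int lane = laneFor(event.key());
        laneQueues.get(lane).put(event);
        return lane;
    }

    // Number of events currently waiting on a lane.
    int queued(int lane) {
        return laneQueues.get(lane).size();
    }
}
```

Because the hand-off stays inside one JVM, a re-keyed record never makes the broker round trip that a repartition topic requires; the cost is that everything in the queues is bounded by a single machine's memory.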
What you give up is explicit: open-ended horizontal scaling. If your workload genuinely needs to grow beyond a single modern machine, StoatFlow isn't the right tool. What you gain is a model where lane parallelism is decoupled from Kafka partition count, where state is global (accessible from any lane), and where the cost of a deploy is a single-replica restart rather than a cluster-wide rebalance.
The trade is deliberate: simpler operations and better resource efficiency on a single box, at the price of giving up the option to scale out across many.
Why this is possible now
The single-replica bet works now because two things changed: the JVM concurrency model, and the hardware underneath it.
Modern JVM concurrency. Virtual threads (GA in JDK 21) let a single JVM run thousands of concurrent processing lanes without dedicating an OS thread to each one. In the platform-thread era, parallelism on a single box was bounded by core count and thread-pool sizing; blocking I/O — a REST call, a database query, an AI-inference request — would tie up a thread for the duration. With virtual threads, a blocked lane parks at near-zero cost, and the JVM continues to make progress on the rest. This is what makes external enrichment natural on a single instance rather than an async-framework exercise.
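The effect described above can be seen with plain JDK APIs, independent of StoatFlow. The sketch below is illustrative only: `VirtualLaneSketch`, `enrich`, and the 50 ms sleep standing in for a blocking REST or database call are assumptions for the example, and a "lane" here is simply a task on its own virtual thread.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: not StoatFlow's lane implementation.
final class VirtualLaneSketch {

    // Stand-in for a blocking enrichment call (REST, database, inference).
    static String enrich(int laneId) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50)); // parks the virtual thread, not an OS thread
        return "lane-" + laneId + ":enriched";
    }

    // Runs `lanes` blocking calls concurrently, one virtual thread per call,
    // and returns how many of them completed.
    static int runLanes(int lanes) throws Exception {
        // close() at the end of the try block waits for all submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < lanes; i++) {
                final int laneId = i;
                results.add(executor.submit(() -> enrich(laneId)));
            }
            int completed = 0;
            for (Future<String> result : results) {
                if (result.get().endsWith(":enriched")) {
                    completed++;
                }
            }
            return completed;
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int completed = runLanes(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(completed + " lanes finished in " + elapsedMs + " ms");
    }
}
```

With a fixed pool of platform threads, the same blocking calls would queue up behind the pool size; with one virtual thread per task, the blocking code stays straightforward and sequential inside each lane.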
Modern hardware. On the major clouds, instances with 96+ cores, 25+ Gbps networking, and NVMe storage are now available off the shelf. Single-instance throughput, state size, and network bandwidth all scale with the largest single VM you can deploy — and that ceiling has grown substantially over the last few years.
The wide middle — who StoatFlow is for
StoatFlow is the third path.
Below the band, simple ETL pipes — a single transformation, no state, no joins — don't need a stream-processing framework at all. A plain Kafka consumer / producer does the job.
Above the band, workloads that genuinely require massive scale — think Netflix, Uber, Alibaba — use Flink for what it's best at: distributed stateful processing across many nodes, geographic redundancy, unified streaming and batch.
The middle is wide. It covers Kafka-native applications that fit on one modern machine, that need exactly-once semantics, real joins, windowed aggregations, and often external enrichment, and that benefit more from operational simplicity than from cluster elasticity. Until now, no library targeted that middle directly. StoatFlow does.
What we don't do
Single-replica means no horizontal scaling beyond one machine. If your sustained throughput requires more than one modern box, or your data needs to live in multiple geographic regions simultaneously, StoatFlow isn't the right fit.
Specifically out of scope:
- Open-ended horizontal scale. Throughput, state size, and memory are bounded by a single machine. On a Hetzner 8-core VM, benchmarks measure a network-bandwidth ceiling of 200–300 MB/s (uncompressed). In event terms, that is roughly 124K events/sec on a 1 KB stateless transform and up to roughly 2.1M events/sec of output on a word-count-style aggregation. High-end infrastructure (96+ cores, faster NICs) hasn't been benchmarked yet, so actual headroom there is open. The ceiling moves with hardware, but it's still a ceiling.
- Multi-region or geo-distributed deployments. StoatFlow runs as one replica in one place. Active-active across regions requires a different architecture.
- Analytics, OLAP, or batch-style reporting. StoatFlow is not a query engine; it doesn't compete with ksqlDB, Druid, Pinot, or ClickHouse.
- Streaming SQL as a primary surface. StoatFlow is a Kotlin / Java DSL library for application developers. Teams expressing pipelines mainly in SQL are better served by Flink SQL.
- Teams that can't commit to JDK 25+. Virtual threads and the Foreign Function & Memory (FFM) stack aren't backward-portable; this is a hard requirement.
Stating these upfront isn't a defensive move. Designs that try to cover every case usually cover none well; saying what we don't do is part of saying what we do.
Where to go next
Three sibling pages turn the reasoning above into specifics:
- Features — what the DSL covers, the StoatFlow extensions on top, the state-store options.
- Benchmarks — the actual numbers behind the bet (latency, CPU, memory) versus Kafka Streams on identical hardware.
- Comparison matrix: a row-by-row, head-to-head comparison against Kafka Streams, self-hosted Flink, and managed Flink platforms.
For the deeper architectural walk-through — lane dispatcher, commit-barrier protocol, state-store internals — see Architecture.