Platform · Part 1 · April 17, 2026
KubeTrader: Data Platform
Introduction to KubeTrader and the data platform.
Introduction
Motivation
Experimenting with trading strategies is a spare-time habit of mine, and for a while before this project I had been sitting on two strategy families I wanted to put into production but never found the time for:
- Funding-rate strategies
- Basis-spread trading
Spend long enough in this space and you build an instinct for what will and won't work. I knew these two wouldn't, at least not as standard bots. Two constraints kill them at that scale:
- Latency: Too slow, and I become the optimal target for adverse selection.
- Breadth: Too small a trading universe, and the opportunity is too narrow to be meaningful.
If I wanted them to actually work, this could not be a weekend script. They needed a serious platform behind them, months of work or more, which is what I set out to build.
High-level architecture
At a high level, I needed something that looked like this:
Setting the trading plane aside for now, one thing became clear early: this was a data problem.
Requirements
The functional requirements were clear enough. I needed a data platform that could ingest data from multiple sources, normalize it, distribute it, compute on it, store it, and serve it back for real-time monitoring and analytics, with a sandbox for experimenting on top.
As for the non-functional requirements, I prioritized:
- Low latency: The strategies live or die on the hot path, so the platform cannot be the bottleneck.
- Scalability: Along three axes, namely the trading universe (business), the processing (compute), and the data volume (storage).
- Extensibility: Adding a new exchange or a new strategy class should be cheap, ideally just a mapping.
- Data integrity: No loss, no duplicates.
- Simplicity: In both operations and delivery, because I run and optimize this alone.
Since I was bootstrapping this on my own funds, cost was, and still is, the major constraint. It shaped how I went about the whole project.
The first decision it forced was to split the work into milestones. v1 was about putting everything together as fast as I could, making sure the integrations work end to end, and keeping latency in a reasonable range, enough to answer the only question that mattered first: do these strategies deserve further investment? With v1 answered, the current iteration, v2, is about optimizing performance and going deeper.
The second was compute. Given the need for elasticity, low operational overhead, and aggressive cost control, Kubernetes was the logical choice over bare metal. I was confident I could still drive latency into the microsecond range when I needed to. I also wanted to run the platform as cheaply as possible, so roughly 80% of it runs on spot nodes, which had an unexpected payoff too, as we will see later in the series.
Data platform
1. Data sources
Dealing with multiple exchanges and APIs is tedious work I would rather skip. So a reasonable first question: can I outsource it? The answer is yes, but it costs money and latency, two things I am not flexible on.
So, back to the drawing board. A method I use to visualize the big picture across data groups is to classify the data along the axes below. It removes ambiguity early and simplifies the design and implementation of the ingestion layer.
| Axis | Sub-type |
|---|---|
| Domain | Market / Trading / Reference |
| Access control | Anonymous / Authenticated |
| Connection | HTTPS / WSS / FIX |
| Session | Heartbeat / Renewal (listen keys) / Disconnect expected |
| Serialization | JSON / SBE |
| Interaction | Request-reply / Subscribe-push |
| Freshness | Real-time / Near real-time / Occasional / On demand |
| State model | Stateless / Snapshot / Delta |
1.1 Public data
A useful exercise is to give every endpoint a source profile card:
| Trade · Binance spot (wss://stream.binance.com:9443/ws/btcusdt@trade) | |
|---|---|
| Domain | Market |
| Access control | Anonymous |
| Connection | WSS |
| Session | Heartbeat |
| Serialization | JSON |
| Interaction | Subscribe-push |
| Freshness | Real-time |
| State model | Stateless |
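A profile card also translates naturally into a typed record. The sketch below is only an illustration: the `SourceProfile` class, the enum names, and the two helper functions are hypothetical and not taken from the actual codebase. It mirrors the axes of the classification table and fills in the Binance spot trade card.

```python
from dataclasses import dataclass
from enum import Enum

# One enum per axis of the classification table (names are illustrative).
Domain = Enum("Domain", "MARKET TRADING REFERENCE")
Access = Enum("Access", "ANONYMOUS AUTHENTICATED")
Connection = Enum("Connection", "HTTPS WSS FIX")
Session = Enum("Session", "HEARTBEAT RENEWAL DISCONNECT_EXPECTED")
Serialization = Enum("Serialization", "JSON SBE")
Interaction = Enum("Interaction", "REQUEST_REPLY SUBSCRIBE_PUSH")
Freshness = Enum("Freshness", "REAL_TIME NEAR_REAL_TIME OCCASIONAL ON_DEMAND")
StateModel = Enum("StateModel", "STATELESS SNAPSHOT DELTA")


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical typed version of a source profile card."""
    name: str
    endpoint: str
    domain: Domain
    access: Access
    connection: Connection
    session: Session
    serialization: Serialization
    interaction: Interaction
    freshness: Freshness
    state_model: StateModel


BINANCE_SPOT_TRADE = SourceProfile(
    name="Trade · Binance spot",
    endpoint="wss://stream.binance.com:9443/ws/btcusdt@trade",
    domain=Domain.MARKET,
    access=Access.ANONYMOUS,
    connection=Connection.WSS,
    session=Session.HEARTBEAT,
    serialization=Serialization.JSON,
    interaction=Interaction.SUBSCRIBE_PUSH,
    freshness=Freshness.REAL_TIME,
    state_model=StateModel.STATELESS,
)


def needs_credentials(profile: SourceProfile) -> bool:
    # Authenticated sources need key management before they can connect.
    return profile.access is Access.AUTHENTICATED


def needs_state_rebuild(profile: SourceProfile) -> bool:
    # Snapshot and delta feeds need local state; stateless events do not.
    return profile.state_model is not StateModel.STATELESS
```

With cards in this form, questions like "which sources need credentials?" or "which feeds need a local book?" become simple filters over a list of profiles.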
The same data can wear more than one card when it is exposed through different methods. The same feed might be available over both WSS and HTTP, or public over WSS with JSON but requiring auth for WSS with SBE. Sometimes the same data lives behind two different endpoints, each bundling it with other data you may or may not want. So choosing a source is not automatic; it takes some thought about what you actually need.
Once the sources are mapped, unification and normalization become mine to own, and two challenges show up.
The first is heterogeneity, and it is the bounded one. Data has to map into internal unified models: symbols, timestamps, fields. Sometimes fields do not line up cleanly, or exchanges expose different depths of the same information. It is tedious and takes care, but it is a static mapping problem. You solve it once per venue and it stays solved.
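To make the mapping problem concrete, here is a minimal sketch of normalizing one venue's trade event into a unified model. The `Trade` model, the `SYMBOLS` lookup, and the `BTC-USDT` symbol convention are assumptions made for illustration. The single-letter keys follow the Binance trade stream payload as I understand it (`s` symbol, `t` trade ID, `p` price, `q` quantity, `T` trade time in milliseconds, `m` buyer-is-maker flag), so verify them against the exchange documentation before relying on them.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    """Hypothetical internal trade model shared by all venues."""
    venue: str
    symbol: str          # unified symbol, e.g. "BTC-USDT"
    trade_id: int
    price: Decimal
    quantity: Decimal
    exchange_ts_ns: int  # exchange timestamp, normalized to nanoseconds
    aggressor: str       # "buy" or "sell"


# Assumed venue-symbol to unified-symbol table; in practice this would be
# built from reference data rather than hard-coded.
SYMBOLS = {("binance-spot", "BTCUSDT"): "BTC-USDT"}


def normalize_binance_trade(raw: dict, venue: str = "binance-spot") -> Trade:
    return Trade(
        venue=venue,
        symbol=SYMBOLS[(venue, raw["s"])],
        trade_id=raw["t"],
        # Prices and quantities arrive as strings; Decimal avoids float drift.
        price=Decimal(raw["p"]),
        quantity=Decimal(raw["q"]),
        # Milliseconds on the wire, nanoseconds internally.
        exchange_ts_ns=raw["T"] * 1_000_000,
        # If the buyer was the maker, the seller crossed the spread.
        aggressor="sell" if raw["m"] else "buy",
    )
```

Each new venue would add its own function of this shape, plus rows in the symbol table, while everything downstream keeps consuming `Trade`.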
The second is dynamism, and this is the harder one, because it never stays solved. Exchanges list and delist symbols almost daily. What is available on exchange A may not exist on exchange B, so the trading universe is constantly shifting under you. And the part that bothers me most: two exchanges can offer the same data type at completely different cadences. One pushes orderbook updates every 100ms, another every 10ms or in real time. That is not a mapping you write once. It is a moving target the system has to absorb continuously, and it shapes most of the decisions in the layers that follow.
1.2 Private account data
Private account data (trades, fills, positions) is one of the trickier things to get right, starting with where it should live: the data plane or the trading plane. The answer is both, in a cooperative model.
The trading plane holds its own canonical state and is responsible for its own recovery, locally or by re-fetching from the exchange. It has to be low latency and self-sufficient, so it keeps its own copy.
But the same account data also flows independently into the data plane, for two reasons:
- Change rate: the engine, its strategies, and its execution logic are a never-ending optimization game, so by design they have a high change rate. Every change is at best a small interruption, and at worst a rollback that takes the engine down. During that window, account visibility through the engine is gone.
- Independent consumers: risk management, kill switches, and monitoring must always see account data, and they cannot depend on the trading engine to deliver it, especially when the engine is the thing that has failed.
So account state lives in both places. The engine owns its canonical copy, and an independent path keeps the data plane and its consumers from ever going blind. A fill landing on a limit order, for example, arrives over the user stream onto the data plane, and the engine can also poll the exchange directly when it needs to confirm its own state.
2. Feed handler
2.1 The architecture
With the requirements, system qualities, and source structures defined, I could design the feed handler to deliver on them.
Extensibility
The workload fits naturally into a standard data-processing pipeline. The architecture relies on three components:
- an input mapped to a source,
- a processor that translates the raw payload into the internal model (using a DSL like JSONata),
- an output routing to the distribution layer.
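The three components above can be sketched as plain callables. This is an illustration of the shape only, not the feed handler's code: `run_pipeline` and `field_mapper` are hypothetical names, and the dictionary-based mapper merely stands in for what a DSL such as JSONata would express.

```python
from typing import Callable, Dict, Iterable, Optional

Raw = Dict[str, object]
Record = Dict[str, object]


def field_mapper(mapping: Dict[str, str]) -> Callable[[Raw], Optional[Record]]:
    """Build a processor from a {internal_field: source_field} mapping."""
    def process(raw: Raw) -> Optional[Record]:
        # Skip payloads that do not carry every mapped field
        # (for example, control or acknowledgement messages).
        if not all(src in raw for src in mapping.values()):
            return None
        return {dst: raw[src] for dst, src in mapping.items()}
    return process


def run_pipeline(
    source: Iterable[Raw],
    process: Callable[[Raw], Optional[Record]],
    sink: Callable[[Record], None],
) -> int:
    """Input -> processor -> output. Returns the number of records delivered."""
    delivered = 0
    for raw in source:
        record = process(raw)
        if record is None:
            continue
        sink(record)
        delivered += 1
    return delivered
```

Under this shape, supporting a new source is mostly a new mapping passed to `field_mapper`, which is the extensibility property the requirements asked for.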
Backpressure
Before writing code, I had to define what happens when a consumer falls behind or vanishes. The rule: nothing on the hot path should ever act on stale data.
The backpressure behavior splits into two distinct policies depending on data type:
- Irreplaceable data (orderbook snapshots + deltas): orderbook state is not available from Binance's historical API. Once the moment passes, that exact state cannot be refetched. So these are never dropped under pressure. They route to a dead-letter queue (SQS) to be reprocessed later.
- Recoverable data (trades, fills, liquidations): these are available via the historical API. Dropping them under pressure is safe, because a backfill job can pick them up from the exchange afterwards.
In all failure modes, observability has to surface the issue immediately.
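The two policies can be sketched as a bounded buffer that decides what to do with a message only when it is full. Everything here is hypothetical scaffolding: `BoundedOutput`, the message kind names, and the `dead_letter` callable, which stands in for whatever actually sends to SQS. The counters are the kind of signal observability would alert on.

```python
from collections import deque
from enum import Enum
from typing import Callable


class Policy(Enum):
    DEAD_LETTER = "dead_letter"  # irreplaceable: never drop
    DROP = "drop"                # recoverable: backfill later


# Assumed message kinds, following the two groups described above.
POLICY_BY_KIND = {
    "book_snapshot": Policy.DEAD_LETTER,
    "book_delta": Policy.DEAD_LETTER,
    "trade": Policy.DROP,
    "fill": Policy.DROP,
    "liquidation": Policy.DROP,
}


class BoundedOutput:
    def __init__(self, capacity: int, dead_letter: Callable[[str, object], None]):
        self.queue: deque = deque()
        self.capacity = capacity
        self.dead_letter = dead_letter
        self.dropped = 0        # exported as metrics in a real system
        self.dead_lettered = 0

    def offer(self, kind: str, message: object) -> bool:
        """Return True if the message entered the hot path."""
        if len(self.queue) < self.capacity:
            self.queue.append((kind, message))
            return True
        # Consumer is behind. Unknown kinds default to the safe policy.
        policy = POLICY_BY_KIND.get(kind, Policy.DEAD_LETTER)
        if policy is Policy.DEAD_LETTER:
            self.dead_letter(kind, message)
            self.dead_lettered += 1
        else:
            self.dropped += 1
        return False
```

Either way, the overflowing message leaves the hot path, so a slow consumer never ends up acting on a backlog of stale data.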
Scalability
Scalability is driven by a shared-nothing model. Every pod owns its own partition. One pod handles the orderbook for a specific set of symbols, another handles trades for a different set. This scales horizontally until capped by external factors: exchange limits, compute, or consumer capacity. It also gives strong fault isolation, since a failing pod takes down its own partition and nothing else.
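One way to get disjoint ownership is a deterministic hash from (feed, symbol) to a partition index. The sketch below assumes hash-based assignment purely for illustration; the post does not say how symbols are actually placed, and the function names are hypothetical. It uses `zlib.crc32` rather than the built-in `hash()`, because the latter is salted per process and would give different answers on different pods.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_of(feed: str, symbol: str, partitions: int) -> int:
    """Stable partition index for a (feed, symbol) pair."""
    key = f"{feed}:{symbol}".encode()
    return zlib.crc32(key) % partitions


def build_partition_map(
    feed: str, symbols: Iterable[str], partitions: int
) -> Dict[int, List[str]]:
    """Group symbols so that each one is owned by exactly one partition."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for symbol in sorted(symbols):
        shards[partition_of(feed, symbol, partitions)].append(symbol)
    return dict(shards)
```

A plain hash ignores per-symbol load, so a busy symbol can still make one partition hot. Balancing by observed message rate is a separate problem.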
Latency
v1 had a tolerant latency budget, mostly consumed by the exchange round-trip itself. The latency that is actually mine, the in-process ingestion path, sits in the microsecond range. Knowing latency optimization was inevitable, I made a structural choice upfront: each partition runs on a single thread that owns its data end to end. Avoiding thread multiplexing keeps the door open for the v2 optimizations: cache-friendly code, NIC queues pinned to specific CPUs, instances with one thread per core like Graviton. The wire latency, which dominates the end-to-end budget, is covered in depth in the network architecture post.
One thread owns one shard end to end: a single OS thread, no multiplexing. One pod per market partition. Pods spread across spot nodes of mixed capacity, and a shard is the unit the operator moves when it rebalances.
2.2 v1 implementation
The goal for v1 was to build as fast as possible, targeting Binance futures (USDM perps specifically) and Binance spot.
Depending on payload size, I knew Python would choke at a few hundred to a couple of thousand messages per second, degrading p99 latency. The ideal answer was Go or Rust, but a pragmatic alternative presented itself: Redpanda Connect (formerly Benthos).
Benthos handled public market data over WSS with throughput that held under the v1 load. This cleanly solved the public feed. For the private user stream, which requires authenticated connections, 24-hour disconnect handling, and listen-key renewals, the message volume was low enough that I stuck with Python rather than building a custom Benthos component.
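The session handling for the private stream boils down to two timers. The sketch below is a pure decision function, not the actual Python service: `UserStreamSession` and `next_action` are hypothetical, and both intervals are assumptions chosen to leave headroom before the listen key expires and before the 24-hour disconnect. Check the real limits in the exchange documentation.

```python
from dataclasses import dataclass

# Assumed intervals, in seconds. Tune against the exchange's documented limits.
KEEPALIVE_EVERY_S = 30 * 60        # renew the listen key well before expiry
RECONNECT_AFTER_S = 23 * 60 * 60   # rotate the connection before the 24h cut


@dataclass
class UserStreamSession:
    connected_at: float       # when the current WSS connection was opened
    last_keepalive_at: float  # when the listen key was last renewed


def next_action(session: UserStreamSession, now: float) -> str:
    """Decide what the session loop should do at time `now`."""
    # A planned reconnect wins: it replaces the connection on our own
    # schedule instead of waiting for the exchange to drop it.
    if now - session.connected_at >= RECONNECT_AFTER_S:
        return "reconnect"
    if now - session.last_keepalive_at >= KEEPALIVE_EVERY_S:
        return "keepalive"
    return "wait"
```

Keeping the decision free of I/O makes it easy to test with synthetic clocks; the surrounding loop would call it with `time.monotonic()` and perform the matching request.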
The dynamism problem
The trading universe shifts daily. To avoid hand-editing YAML templates every time a symbol listed or delisted, I wrote a Python/Jinja templating tool. It pulls current symbols from the exchange, shards them, and generates ConfigMaps. I push, ArgoCD syncs, and the pods spin up with the correct partitions.
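The shard-and-render step can be sketched in a few lines. To stay dependency-free, this version uses the standard library's `string.Template` instead of Jinja, and it takes the symbol list as an argument rather than fetching it from the exchange. The ConfigMap name pattern, the `streams` key, and the fixed shard size are all assumptions for illustration.

```python
from string import Template
from typing import Iterable, List

# Hypothetical manifest layout; the real templates are not shown in the post.
CONFIGMAP = Template("""\
apiVersion: v1
kind: ConfigMap
metadata:
  name: feed-$feed-$index
data:
  streams: "$streams"
""")


def chunk(symbols: Iterable[str], size: int) -> List[List[str]]:
    """Split the sorted universe into shards of at most `size` symbols."""
    ordered = sorted(symbols)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def render_configmaps(feed: str, symbols: Iterable[str], shard_size: int) -> List[str]:
    manifests = []
    for index, shard in enumerate(chunk(symbols, shard_size)):
        # Stream names follow the lowercase "<symbol>@<feed>" form used
        # in the endpoint shown earlier (btcusdt@trade).
        streams = "/".join(f"{symbol.lower()}@{feed}" for symbol in shard)
        manifests.append(
            CONFIGMAP.substitute(feed=feed, index=index, streams=streams)
        )
    return manifests
```

Sorting before chunking keeps the output stable between runs, so a listing or delisting produces a small, reviewable diff in Git rather than a reshuffle of every shard.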
Automating the tool itself hit a wall. When I moved the script to GitHub Actions, its requests came back with a 503: Binance blocks GitHub IPs to prevent misuse. For v1, I ran the tool manually. For v2, these workers will move to AWS CodeBuild.
2.3 v1 limitations
By the end of v1, the feed handler was running comfortably, ingesting both Binance feeds and sharding cleanly. But the limitations were obvious:
- Manual sharding. Partition counts were based on manual tuning rather than dynamic load. There was no mechanism to rebalance hot pods.
- Heavy rollouts. Because of the templating model, adding a symbol required regenerating configs, committing, and rolling out pods, even though Binance supports live WSS subscribe and unsubscribe.
- Thin failure recovery. If a pod died, its partition stayed offline until Kubernetes provisioned a replacement. There was no in-flight handoff.
The core issue is that v1 treated pods as static infrastructure, whereas the problem is dynamic and stateful. Symbols change, load shifts, and pods fail. The response to all three is the same: continuously reconcile the partition map against the live universe and the live pods. That is exactly what a Kubernetes operator is for.
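Stripped of Kubernetes machinery, that reconciliation is a function from (live universe, live pods, current map) to (new map, actions). The sketch below is a hypothetical illustration of the idea, not operator code: the `reconcile` name, the action tuples, and the least-loaded placement rule are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, str]  # (verb, pod, symbol)


def reconcile(
    universe: Set[str],
    healthy_pods: List[str],
    assignment: Dict[str, str],  # symbol -> pod
) -> Tuple[Dict[str, str], List[Action]]:
    if not healthy_pods:
        raise ValueError("no healthy pods to assign symbols to")

    desired: Dict[str, str] = {}
    actions: List[Action] = []

    for symbol, pod in sorted(assignment.items()):
        if symbol not in universe:
            # Delisted: release it, if its pod is still around to be told.
            if pod in healthy_pods:
                actions.append(("unsubscribe", pod, symbol))
        elif pod in healthy_pods:
            desired[symbol] = pod  # still valid, leave it alone
        # Otherwise the owner died; the symbol is reassigned below.

    load = {pod: 0 for pod in healthy_pods}
    for pod in desired.values():
        load[pod] += 1

    # New listings and orphaned symbols go to the least-loaded pod.
    for symbol in sorted(universe - set(desired)):
        pod = min(healthy_pods, key=lambda p: (load[p], p))
        desired[symbol] = pod
        load[pod] += 1
        actions.append(("subscribe", pod, symbol))

    return desired, actions
```

Running this on every change to the universe or the pod list covers all three cases with one code path: a listing, a delisting, and a pod failure differ only in which inputs changed.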
2.4 KubeStream
The v2 plan is to turn the feed handler into a managed application. A custom operator, KubeStream, owns the partition map. It watches the symbol universe and the pods, and reconciles state continuously:
- Live updates. When a symbol lists or delists, the operator assigns or releases it on the right pod and triggers a live WSS subscribe or unsubscribe. No rollouts.
- Failover. If a pod crashes, KubeStream redistributes its symbols across the healthy pods until a replacement boots, then rebalances.
- Atomic rebalancing. If load drifts, the operator can move a symbol from a hot pod to a cooler one: subscribe on the new pod, confirm, unsubscribe on the old.
These cases push it beyond a plain Kubernetes operator. A controller reconciles desired and actual state declaratively, but moving a symbol cleanly is a multi-step workflow with steps that can fail partway: subscribe on B, confirm B is receiving, then unsubscribe on A, and handle the case where any step times out. That is orchestration, not reconciliation, which is why the longer-term direction is to back the operator with a workflow engine. At that point it is less a Kubernetes operator and more a data controller, owning the live distribution of the feed across the fleet.
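The move itself can be sketched as an ordered sequence with one compensation step. This is a simplified illustration with hypothetical names: `move_symbol` receives the three operations as callables, and `confirm` is assumed to return `False` when it times out. A real workflow engine would additionally persist each step so the sequence can resume after a crash.

```python
from typing import Callable


class MoveFailed(Exception):
    """Raised when the new pod never confirms it is receiving the symbol."""


def move_symbol(
    symbol: str,
    old_pod: str,
    new_pod: str,
    subscribe: Callable[[str, str], None],
    confirm: Callable[[str, str], bool],
    unsubscribe: Callable[[str, str], None],
) -> None:
    # Step 1: start the feed on the destination first, so there is no gap.
    subscribe(new_pod, symbol)

    # Step 2: wait for proof that data is flowing on the destination.
    if not confirm(new_pod, symbol):
        # Compensate: undo step 1 and leave the old pod untouched.
        unsubscribe(new_pod, symbol)
        raise MoveFailed(f"{new_pod} did not confirm {symbol}")

    # Step 3: only now release the symbol on the source pod.
    unsubscribe(old_pod, symbol)
```

Between steps 1 and 3 both pods receive the symbol, so this ordering trades a brief overlap for zero gap. Downstream consumers would presumably need to deduplicate during that window to keep the no-duplicates requirement.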
This is the future plan, probably another multi-month project.
The latency blind spot
The first version of the feed handler ran on a dedicated Karpenter nodepool, a single compute family spanning a few instance generations (5th, 6th, and 7th). For the first few days everything ran smoothly. Nodes came and went as spot availability shifted, which is expected in this phase, and the latency profile stayed consistent.
Then orderbook latency degraded sharply. Exchange-to-app p99 went from around 5ms to roughly 20ms. My own processing latency was unchanged. The pods were pinned to the same AZ, and I had pushed nothing recently, since I was working on a different part of the system entirely. Pod metrics, node metrics, nothing looked out of the ordinary.
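For reference, an exchange-to-app figure like this is the gap between the exchange's event timestamp and the local receive time, aggregated into a percentile. The sketch below shows one way to compute it, assuming event times in milliseconds, receive times from `time.time_ns()`, and a nearest-rank percentile; the function names are hypothetical. The result includes any clock offset between the exchange and the host, so it is only as good as the clock synchronization.

```python
import math
from typing import List, Sequence


def exchange_to_app_ms(event_time_ms: int, received_ns: int) -> float:
    """Latency from exchange event time to local receipt, in milliseconds."""
    return received_ns / 1_000_000 - event_time_ms


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered: List[float] = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

Tracking this per node and per instance type, rather than only per pod, is what would make a regression like the one described here stand out sooner.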
The only thing that had changed was that the workload had moved to a new node, which by itself looked routine. After some digging, the difference surfaced: the workload had been scheduled on 7th-generation nodes and had moved onto a 5th-generation one. Processing was unchanged and the AZ was the same, so I attributed the regression to the hardware generation, most likely a different Nitro card. Testing supported it: 6th-generation nodes matched 7th-generation performance, and they share the same Nitro family. I accepted a temporary fix, scheduling only on newer generations (6th, 7th, 8th).
But "most likely" is the problem. To give a conclusive answer I would need visibility I do not have: the kernel network path. From Binance to the application I am effectively blind. I know when Binance emitted an event and when I first received it, but everything in between is unknown.
This is what motivates the observability work in v2. The plan is to enable hardware timestamping, which these instances support, so every packet is timestamped at the NIC, and to hook eBPF probes along the path to see exactly where time is being spent. That lines up directly with the v2 goal of optimizing latency: you cannot optimize a path you cannot measure.
