DatumLabs

25

Real-time data streams flowing into ClickHouse with seconds-level freshness

5 min

Transformation models refreshing and serving dashboards and APIs

Zero

Rows lost during full migration from Docker Compose to Kubernetes

"A fast pipeline that breaks silently is not an asset. It is a liability. The goal for Venon was not speed alone. It was a foundation where every failure is caught by the system before a user notices it, every change is reviewed before it ships, and every component can be handed off without anxiety."

At a glance

25 real-time data streams flowing from production into ClickHouse with seconds-level freshness
Every infrastructure change reviewed in version control and deployed automatically
Transformation models refreshing every 5 minutes, serving analytics APIs and dashboards directly

Venon is an ad analytics and e-commerce platform. At its core, the product tracks how advertising spend translates into shop revenue — impressions, clicks, visits, orders, refunds — across multiple ad accounts and storefronts simultaneously. For a platform where the core value is performance visibility, the data layer is not a back-office concern. It is the product.

The production database handled the application workload well. It was not designed to also answer analytical queries. Running complex analysis directly against a live transactional system creates read pressure that competes with application traffic and makes query performance unpredictable at scale. The engineering team knew this. What they needed was an analytical layer that matched the seriousness of the product they were building.

‍

The Problem Was Not Just Speed

Venon had already tried a simpler path. The original analytics stack ran on Docker Compose — a reasonable starting point that quickly revealed its limits in production. Every deployment was manual. Every schema change carried risk. When something broke, finding out why meant digging through logs on a single machine with no structured observability.

The specific issues that made the setup unsustainable:

No redundancy — a single hardware failure meant downtime with no automatic recovery
Manual deployments with no version history, no rollback path, no review process
Pipeline failures that were discovered by users rather than caught by the system
No way to safely onboard new data sources without risking what already existed
Stale data that was hours old by the time it reached dashboards

The team was not looking for a faster pipeline. They were looking for a foundation they could trust, extend, and hand off without anxiety.

‍

Building for Production From Day One

The architecture Datum Labs designed and now operates for Venon starts from the production database and ends at the analytics surface, with every layer observable and every change traceable.

Change data capture reads the database write-ahead log continuously. Every insert, update, and delete in the production system becomes a structured event that travels through Kafka — a durable message stream that acts as a buffer between the source and the analytical layer. If ClickHouse is temporarily unavailable, events queue in Kafka and drain when it recovers. Nothing is lost.

ClickHouse receives the event stream and maintains 27 raw tables that mirror the production database in real time. On top of those, a structured transformation layer processes raw change data into business-ready models — cleaning column names, applying revenue logic, resolving refund attribution, and computing funnel steps. Mart tables serve as the stable interface that dashboards and APIs query. They refresh every five minutes.

What makes this different from a fast data pipeline is the discipline applied to every layer around it. Schema contracts enforced at the stream level prevent upstream changes from silently corrupting downstream tables. Every ClickHouse schema change goes through a reviewed workflow before it is applied. Slack alerts fire on any pipeline failure with enough context to diagnose the issue without opening a terminal.

‍

From Docker Compose to Kubernetes, Without Dropping a Row

Migrating a production pipeline without disruption is where most data projects fail quietly. The Venon migration ran in parallel — the new Kubernetes-based stack processed live data alongside the existing setup, and parity was validated before any traffic was cut over.

The migration surfaced real challenges:

The old connector held an active database replication slot that had to be cleanly handed off before the new one could take over
Kafka topic naming rules on Kubernetes conflicted with the existing naming conventions, requiring a normalisation layer
The initial database snapshot placed more pressure on the connection pool than expected, requiring careful tuning before the pipeline stabilised

Each of these was resolved before the cutover. Consumer lag reached zero within minutes of the final switch. Row counts and timestamps matched exactly across source and destination.

‍

What the Team Operates Today

The Venon stack runs entirely on Kubernetes. Every component — the change capture connector, Kafka, ClickHouse, the transformation layer, the orchestration engine — is defined in code and lives in version control. Deployments are automatic on merge. Rollbacks are reverts.

The result for the analytics surface:

Ad performance — impressions, clicks, spend, and revenue available within seconds of events landing in the production database
Shop revenue — GMV, refunds, and net revenue tracked in near real time, with attribution logic handled in the transformation layer rather than the API
Funnel visibility — from first page visit to conversion, every step updated within minutes, not hours
Historical backfills — the ability to reprocess past data through updated logic without touching the live pipeline

The observability layer monitors every layer continuously. Kafka consumer lag, ClickHouse insert throughput, transformation job success rates, and infrastructure health are all visible in a single monitoring surface. If a table has not updated within its expected window, an alert fires before any user notices.

Adding a new data source today means a new change capture topic, a new staging model, and a pull request. No re-architecture. No downtime. No surprises.

The pipeline Venon runs is not remarkable because it is fast. It is remarkable because it was built to be trustworthy — and every decision in the architecture reflects that.

Company Overview

An ad analytics and e-commerce platform where the data layer is the product. Venon tracks how advertising spend converts into shop revenue across multiple ad accounts and storefronts simultaneously, serving performance visibility to clients who operate at scale.

Venon runs a production-grade real-time analytics pipeline — built for the day it breaks, not just the day it launches