From 40-Minute to Sub-Minute Segmentation Queries
How Bitespeed rebuilt its customer segmentation engine using OLake and Apache Iceberg, without breaking its budget

Company Overview
Bitespeed is a customer engagement and messaging platform built for modern commerce brands. Launched in early 2023, the platform powers high-volume WhatsApp, SMS, and multi-channel messaging workflows, along with a sophisticated segmentation engine that enables brands to target users based on complex behavioral and transactional criteria.
At its core, Bitespeed operates as a customer data platform (CDP), stitching together Shopify orders, browser events, messaging data, and customer metadata to enable real-time campaigns and journeys. As adoption grew, so did the complexity and scale of its data.
The Early Architecture: One Database, Many Responsibilities
Like many early-stage startups, Bitespeed began with a simple architecture: a single Postgres database that handled everything—transactions, operational queries, and analytics.
For a while, this worked.
But as customers began creating increasingly complex segments (queries spanning geography, order history, price thresholds, and time windows), the system started to strain.
"We have a segmentation engine where users can create arbitrarily complex queries on the fly. That's where things started breaking down."

Nitish Gupta
Software Engineer
Analytical workloads began competing with live application traffic. Some segments would take 40–50 minutes to evaluate, while others never completed at all. Despite continuous query optimizations and application-level tuning, the system had reached a fundamental limit.
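To make the shape of such a segment concrete, here is a minimal sketch of one query that combines geography, order history, a spend threshold, and a time window. The schema, column names, sample rows, and the `evaluate_segment` helper are all hypothetical and chosen for illustration; they are not Bitespeed's actual tables or code, and an in-memory SQLite database stands in for Postgres.

```python
import sqlite3

# Hypothetical schema and data: illustrative only, not Bitespeed's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL,
    ordered_at TEXT
);
INSERT INTO customers VALUES (1, 'IN'), (2, 'IN'), (3, 'US');
INSERT INTO orders VALUES
    (1, 1, 120.0, '2024-05-10'),
    (2, 1,  80.0, '2024-05-20'),
    (3, 2,  40.0, '2024-05-15'),
    (4, 3, 500.0, '2024-05-18'),
    (5, 2, 300.0, '2023-01-01');
""")

# One segment = geography filter + time window + order-count and spend thresholds.
SEGMENT_SQL = """
SELECT c.id
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.country = :country
  AND o.ordered_at >= :since
GROUP BY c.id
HAVING COUNT(*) >= :min_orders
   AND SUM(o.total) >= :min_spend
ORDER BY c.id
"""


def evaluate_segment(conn, country, since, min_orders, min_spend):
    """Return the ids of customers matching every criterion of the segment."""
    params = {
        "country": country,
        "since": since,  # ISO-8601 date string, compared lexicographically
        "min_orders": min_orders,
        "min_spend": min_spend,
    }
    return [row[0] for row in conn.execute(SEGMENT_SQL, params)]


# Customers in India with at least 2 orders worth 150+ in total since 1 May 2024.
segment = evaluate_segment(conn, "IN", "2024-05-01", 2, 150.0)
print(segment)
```

Every criterion a user adds becomes another join, filter, or aggregate. Queries of this kind scan and aggregate large ranges of rows, which is the analytical access pattern that ended up competing with the application's transactional traffic.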
The Hidden Cost of Scaling Postgres
To keep the platform responsive, Bitespeed repeatedly scaled its Postgres instance. Eventually, the company was spending $300 per day, nearly $10,000 per month, on a single database.
Ironically, CPU and memory utilization stayed below 25%. The real cost driver was IOPS (Input/Output Operations Per Second).
"We were paying purely for IOPS. We had thrown money at the problem, and it still didn't really solve it."

Nitish Gupta
Software Engineer
This created a hard constraint:
Bitespeed needed both its application database and a new analytics platform to run within the same $10K monthly budget.

Exploring the Obvious Options (and Why They Didn't Work)
Bitespeed explored several paths:
- Agency-led Redshift migration via AWS
- AWS DMS for database replication
- Databricks, which delivered strong performance in POCs
- Fivetran, evaluated for managed ingestion simplicity
Each option failed for a different reason. Agency-led pipelines were expensive and unpredictable. AWS DMS failed to migrate critical high-volume tables reliably. Databricks worked technically but introduced cost uncertainty at production scale. Managed tools came with pricing models that felt irreversible.
"Once you commit to these tools, there's no going back. You just keep paying whatever they ask."

Nitish Gupta
Software Engineer
The common thread: lack of cost control, operational clarity, and ownership.

Discovering OLake Through Apache Iceberg
While researching alternatives, Bitespeed repeatedly encountered Apache Iceberg, initially through ChatGPT and technical searches. That curiosity led them to OLake.
"I was literally searching 'move Postgres data to Apache Iceberg' and OLake kept showing up."

Nitish Gupta
Software Engineer
What stood out immediately:
- Open-source and self-hosted
- Clear documentation
- A narrow, focused mission: reliable ingestion into lakehouse formats
- No opaque pricing or lock-in
"I didn't want something fancy. I wanted something fast, cheap, and in my control."

Nitish Gupta
Software Engineer
Getting to Production Faster Than Expected
Despite having no prior experience building data pipelines, the Bitespeed team was able to set up OLake within days.
"I set it up over a weekend. The documentation was clear, and things just worked."

Nitish Gupta
Software Engineer
OLake was deployed to sync Postgres data into Apache Iceberg tables on S3, forming the foundation of a new lakehouse architecture. Early hurdles around partitioning, compaction, and scale were resolved collaboratively with the OLake team.
"People were available on Slack, even late at night. That open-source spirit made all the difference."

Nitish Gupta
Software Engineer
What Changed After OLake
1. Segmentation Queries Became Interactive
Queries that once took nearly an hour now complete in under a minute.
"One minute is our new benchmark. If it takes longer, we assume the query is wrong."

Nitish Gupta
Software Engineer
This unlocked the ability to power the entire segmentation engine on Iceberg—handling thousands of queries per day.
2. Postgres Load Dropped Immediately
With analytical queries moved off Postgres, Bitespeed can now downscale its primary database by multiple tiers.
"We expect to cut our Postgres cost from $300 a day to around $150 almost immediately."

Nitish Gupta
Software Engineer
3. The Platform Scales with the Business
Bitespeed processes massive messaging volumes:
- 1+ billion WhatsApp messages already ingested
- Rapid growth across SMS, push, and RCS channels
- Five to six thousand segmentation queries per day, with headroom for more
"Iceberg makes 50,000 daily queries feel trivial. We're re-enabling features we had to disable before."

Nitish Gupta
Software Engineer
Operational Cost Snapshot
- OLake ingestion: ~$30/day
- Iceberg maintenance and storage: ~$20/day
- Net impact: significant reduction in database spend while enabling new capabilities
The result: a scalable analytics platform that fits comfortably within Bitespeed's original budget constraints.
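As a rough check on that claim, the figures quoted in this case study can be added up against the $10K monthly budget. This is a back-of-the-envelope sketch: it assumes a flat 30-day month, uses the expected (not yet realized) post-downscale Postgres cost of about $150 per day, and covers only the line items listed above. Any cost not listed here, such as query engine compute, is not included.

```python
DAYS_PER_MONTH = 30      # assumption: flat 30-day month
MONTHLY_BUDGET = 10_000  # USD, the budget constraint stated earlier

# Daily figures (USD) as quoted in the case study.
postgres_before = 300
daily_costs_after = {
    "postgres_after_downscale": 150,  # expected figure, per the quote
    "olake_ingestion": 30,
    "iceberg_maintenance_and_storage": 20,
}


def monthly(daily_usd, days=DAYS_PER_MONTH):
    """Convert a daily cost to an approximate monthly cost."""
    return daily_usd * days


before_monthly = monthly(postgres_before)                 # 9000
after_monthly = monthly(sum(daily_costs_after.values()))  # 6000
headroom = MONTHLY_BUDGET - after_monthly                 # 4000

print(f"Before: ${before_monthly:,}/month on Postgres alone")
print(f"After:  ${after_monthly:,}/month for Postgres + OLake + Iceberg")
print(f"Headroom against budget: ${headroom:,}/month")
```

Under these assumptions, the listed items come to roughly $200 per day, or about $6,000 per month, compared with about $9,000 per month for Postgres alone before the change.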
What's Next
Bitespeed sees OLake and Iceberg as the long-term foundation for more than analytics.
Planned next steps include:
- Built-in monitoring and metrics for ingestion pipelines
- Automated compaction to manage file growth
- Making Iceberg data AI-ready, including semantic layers and RAG-friendly layouts
"We have all this data now. The next question is: how do we make it truly useful for AI?"

Nitish Gupta
Software Engineer
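On the automated compaction item in the list above: continuous ingestion tends to produce many small data files, and compaction rewrites them into fewer, larger ones so that queries open fewer files. The sketch below shows only the planning idea, greedily grouping small files into batches up to a target size. The `plan_compaction` function, the file sizes, and the 512 MB default target are hypothetical; this is not OLake's or Iceberg's API, and in practice a table-maintenance job provided by the query engine would typically handle this.

```python
def plan_compaction(file_sizes_mb, target_mb=512):
    """Greedily group small files into batches of at most target_mb.

    Files already at or above the target are left alone. Batches that
    would contain a single file are dropped, since rewriting one file
    on its own gains nothing.
    """
    groups = []
    current, current_size = [], 0
    for size in sorted(s for s in file_sizes_mb if s < target_mb):
        # Close the current batch if adding this file would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]


# Hypothetical file sizes in MB; the 600 MB file is skipped as already large.
plan = plan_compaction([8, 16, 32, 600, 64], target_mb=100)
print(plan)
```

A real planner would also take partition boundaries and delete files into account; this sketch ignores both.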
Summary
With OLake, Bitespeed:
- Moved analytical workloads off Postgres without disruption
- Built a production-grade Iceberg lakehouse on a strict budget
- Reduced query latency from tens of minutes to under a minute
- Regained cost predictability and architectural control
- Laid the groundwork for future AI-driven use cases
What started as a weekend experiment is now a core part of Bitespeed's data platform, proof that open-source, when done right, can outperform far more expensive alternatives.
