Apache Iceberg vs. Hive: Comprehensive Comparison for Data Lakehouses
Apache Hive and Apache Iceberg represent two different generations of the data lake ecosystem. Hive was born in the Hadoop era as a SQL abstraction over HDFS, excelling in batch ETL workloads and still valuable for organizations with large Hadoop/ORC footprints. Iceberg, by contrast, emerged in the cloud-native era as an open table format designed for multi-engine interoperability, schema evolution, and features like time travel. If you are running a legacy Hadoop stack with minimal need for engine diversity, Hive remains a practical choice. If you want a flexible, future-proof data lakehouse that supports diverse engines, reliable transactions, and governance at scale, Iceberg is the more strategic investment.
Data Ingestion From MySQL to Apache Iceberg: Optimizing Data Replication for Modern Analytics
MySQL powers countless production applications as a reliable operational database. But when it comes to analytics at scale, running heavy queries directly on MySQL can quickly become expensive, slow, and disruptive to transactional workloads.
Creating and Managing OLake Jobs with Docker CLI: A Practical Guide
A friendly, step-by-step walkthrough for configuring replication from Postgres to Apache Iceberg (Glue catalog) using the OLake UI or the Docker CLI.
Comparing Delete Methods in Iceberg and Delta Lake: A Performance Review
In recent years, terms such as deletion vectors and position deletes have become increasingly common in discussions of modern data lakehouse technologies. Despite their growing importance, the nuances of these deletion mechanisms are not always well understood.
Building an Open Data Lakehouse: Integrating OLake, PrestoDB, MinIO, and Apache Iceberg
Learn how to build a complete open data lakehouse from scratch using MySQL, OLake, PrestoDB, and MinIO. Get it running on your local machine in just a few steps, with real-time CDC and analytics.
Building Modern Lakehouse with Iceberg, OLake, Lakekeeper & Trino
Iceberg is the storage "brain," OLake is the real-time "pipeline," and Trino is the fast "question-answering" engine. Together they turn raw object-storage files into a governed, low-latency analytics platform.