OLake + Iceberg + Presto Playground

A self-contained developer playground to explore lakehouse architecture using Apache Iceberg with:

  • MySQL as the source database
  • OLake for schema discovery and CDC ingestion via a UI
  • MinIO as the object store
  • Temporal to orchestrate ingestion workflows
  • Presto to query Iceberg tables

🎯 Goal: Let developers experience an Iceberg-native lakehouse end-to-end, without needing to stitch services together or write config files, using a single docker-compose up.

✨ Features​

  • CDC from MySQL to Iceberg
  • Schema discovery and ingestion via OLake UI
  • Tables automatically created in Iceberg format
  • Presto queries with no manual registration
  • Visual orchestration with Temporal UI
  • 2 setup process using Docker Compose

🧱 Architecture​

MySQL --> init_cdc_user
  ↘              ↘
data_loader   binlog_checker
  ↓
OLake (UI) --> Iceberg (MinIO)
                  ↘
                Presto
OLake Services:
- Temporal
- PostgreSQL (Temporal DB)
- Temporal Worker
- Temporal UI
- Elasticsearch (Temporal search)

⚙️ Prerequisites

  • Docker + Docker Compose
  • ~8 GB RAM
  • Git

🏁 Getting Started​

1. Clone the Repo​

git clone https://github.com/datazip-inc/olake/iceberg-presto-playground.git    
cd examples

2. Setup​

  • Edit persistence/config paths (required):
    The docker-compose.yml uses /your/chosen/host/path/olake-data as a placeholder for the host directory where OLake's persistent data and configuration are stored. You must replace it with an actual path on your system before starting the services, by editing the x-app-defaults section at the top of docker-compose.yml:
x-app-defaults:
  host_persistence_path: &hostPersistencePath /your/host/path

Make sure the directory exists and is writable by the user running Docker (see how to change file permissions for Linux/macOS).
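
One way to prepare that directory before starting the stack is sketched below. The helper name ensure_writable_dir and the default path /tmp/olake-data are illustrative assumptions, not part of the playground; substitute the path you set for host_persistence_path.

```shell
#!/bin/sh
# Hypothetical helper: create a directory if needed and check it is writable.
ensure_writable_dir() {
  dir="$1"
  mkdir -p "$dir" || return 1   # create the directory and any missing parents
  [ -w "$dir" ]                 # succeed only if the current user can write to it
}

# Assumed example path; use the value of host_persistence_path instead.
OLAKE_DATA_DIR="${OLAKE_DATA_DIR:-/tmp/olake-data}"

if ensure_writable_dir "$OLAKE_DATA_DIR"; then
  echo "ready: $OLAKE_DATA_DIR"
else
  echo "cannot write to: $OLAKE_DATA_DIR" >&2
fi
```

If the check fails, fix the ownership or permissions of the directory before running docker-compose up.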

  • Customizing Admin User (optional):
    The stack automatically creates an initial admin user on first startup. The default credentials are:
    Username: "admin"
    Password: "password"
    Email: "test@example.com"
    To change these defaults, edit the x-signup-defaults section in your docker-compose.yml:

x-signup-defaults:
  username: &defaultUsername "your-custom-username"
  password: &defaultPassword "your-secure-password"
  email: &defaultEmail "your-email@example.com"

3. Launch the Playground​

docker-compose up --build    

This will spin up:

  • MySQL + "init-mysql-tasks" helpers:
    "init-mysql-tasks" does the following:
    • CDC setup: grants replication privileges
    • Data load: inserts sample data
    • Health checks: verifies the setup
  • OLake backend services (UI + ingestion pipeline)
  • Temporal (Orchestration if required)
  • MinIO: Iceberg object store

Once this is set up, we can start Presto to check and query the Iceberg tables by running the following commands:

cd presto
docker run -d --name my-olake-presto-coordinator \
  --network app-network \
  -p 80:8080 \
  -v "$(pwd)/etc:/opt/presto-server/etc" \
  prestodb/presto:latest

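The first command changes to the presto directory, where the necessary configuration is already in place. The second starts the Presto container on the app-network network, mapping host port 80 to the container's port 8080 and mounting the local etc directory as the server configuration.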
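
The coordinator can take a little while to start. A minimal sketch of a readiness check follows, assuming curl is installed and that the coordinator is reachable on host port 80 as mapped above. It polls Presto's /v1/info endpoint and looks for "starting": false in the response. The helper names presto_ready and wait_for_presto are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads a /v1/info JSON document on stdin and succeeds
# when the server reports that it has finished starting.
presto_ready() {
  grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Hypothetical helper: poll the coordinator until it is ready or attempts run out.
# $1 = info URL (assumed default below), $2 = number of attempts.
wait_for_presto() {
  url="${1:-http://localhost:80/v1/info}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failed curl produces no output, so presto_ready fails and we retry.
    if curl -fsS "$url" 2>/dev/null | presto_ready; then
      echo "Presto is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Presto did not become ready" >&2
  return 1
}
```

Calling wait_for_presto with no arguments polls the default URL up to 30 times, two seconds apart.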

💻 OLake UI Walkthrough

  1. Access OLake UI: http://localhost:8000
    You will be prompted for a username and password. Enter the credentials you set in the docker-compose file; the defaults are:
    username: admin
    password: password
  2. Choose MySQL as the source and give the source a name; something as simple as mysql_temp works.
    To use the MySQL instance from this Docker setup (the primary_mysql database) as the source, use the following configuration:
  • Host: host.docker.internal
  • Port: 3306
  • Database: employees
  • User: root
  • Password: password
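
CDC reads changes from the MySQL binary log, and CDC tools typically expect row-based logging. The playground's init tasks already verify the setup. The sketch below only shows how one might check it by hand, using the root credentials listed above. The container name primary_mysql is an assumption (check docker ps for the actual name), and both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>value" rows (mysql -N -B output) on stdin
# and succeeds only if binlog_format is ROW.
binlog_is_row() {
  awk '$1 == "binlog_format" && toupper($2) == "ROW" { ok = 1 } END { exit !ok }'
}

# Hypothetical helper: ask the running MySQL container for its binlog settings.
# $1 = container name (assumed default; verify with `docker ps`).
show_binlog_settings() {
  container="${1:-primary_mysql}"
  docker exec "$container" \
    mysql -uroot -ppassword -N -B \
    -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')"
}
```

For example, show_binlog_settings | binlog_is_row && echo "row-based binlog enabled" prints the message only when the server reports binlog_format as ROW.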

  3. Add the destination details: choose the Iceberg connector with REST as the catalog.
    To set up a Destination connection for Apache Iceberg within OLake, use the following values:
  • Iceberg REST Catalog URL: http://host.docker.internal:8181
  • Iceberg S3 Path (example): s3://warehouse/employees/
  • Iceberg Database (example): employees
  • S3 Endpoint (for Iceberg data files written by OLake workers): http://host.docker.internal:9090
  • AWS Region: us-east-1
  • S3 Access Key: minio
  • S3 Secret Key: minio123

  4. Use the UI to select tables and fields.

  5. Add the job name and frequency. The frequency is optional if you only want a one-time sync to try out the Iceberg ecosystem.

  6. Create the job and run the sync manually to start seeing data in Iceberg.

  7. (Optional) Check the logs in the job history section.

You don't need to register tables manually: OLake writes directly in Iceberg format, and Presto picks them up automatically.

🔍 Query in Presto
To use the web UI, open the following URL in your browser:

http://localhost:80/ui/   

You can run SQL queries in the SQL client within this UI, or from the Presto CLI after running docker exec into the Presto container.

In the UI, where "iceberg" is the catalog and "olake_iceberg_test" is your schema, run SELECT * FROM iceberg.olake_iceberg_test.sample_table LIMIT 10; to check the sample table loaded in the previous step:

You’ll be querying live Iceberg tables stored in MinIO and created automatically by OLake.

Or

You can run the presto-cli inside the coordinator container with the below command:

docker exec -it my-olake-presto-coordinator presto-cli
presto>

After you run the command, the prompt should change from the shell prompt $ to the presto> CLI prompt. Run the SQL statement show catalogs to see a list of currently configured catalogs:

presto> show catalogs;   
Catalog
---------
iceberg
system

We'll be working almost exclusively with the "iceberg" catalog and a single schema (employees in this walkthrough, matching the Iceberg Database set in the destination), so we can issue a USE statement to make every query run against tables in this catalog/schema combination unless specified otherwise. Without it, we would have to use the fully qualified table name in every statement (i.e., iceberg.employees.<table_name>).

presto> USE iceberg.employees;   
USE
presto:employees>

After you run the command, the prompt should change from presto> to presto:employees>.
You can now run SQL commands such as show tables;

presto:employees> show tables;   
Table
------------------
dept_emp
dept_manager
employees
olake_test_table
salaries
titles

We can query some of the Iceberg metadata information from Presto. Let's look at the hidden "history" table from Presto. Note that the quotation marks are required here.

presto:olake_iceberg_test> select * from "dept_emp$history";   
made_current_at | snapshot_id | parent_id | is_current_ancestor
-----------------------------+---------------------+---------------------+---------------------
2025-05-29 17:11:10.005 UTC | 4569288960464380470 | NULL | true
2025-05-29 17:11:25.881 UTC | 3854975833623379842 | 4569288960464380470 | true
2025-05-29 17:11:42.358 UTC | 679776884045929590 | 3854975833623379842 | true
2025-05-29 17:11:58.331 UTC | 989408625431662762 | 679776884045929590 | true
2025-05-29 17:12:18.400 UTC | 5268185526158132892 | 989408625431662762 | true
2025-05-29 17:12:43.364 UTC | 8146414278478815017 | 5268185526158132892 | true
2025-05-29 17:13:12.189 UTC | 3766241920740189413 | 8146414278478815017 | true
2025-05-29 17:13:31.398 UTC | 8298173997261493168 | 3766241920740189413 | true
2025-05-29 17:13:47.191 UTC | 159357971009808809 | 8298173997261493168 | true
(1 row)

Query 20250527_084640_00006_pziar, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
[Latency: client-side: 434ms, server-side: 405ms] [1 rows, 17B] [2 rows/s, 41B/s]

Each row is a snapshot, created at the moment data was written to the table. We can get more details about the snapshots with the query below:

presto:olake_iceberg_test> SELECT * FROM "dept_emp$snapshots";   
committed_at | snapshot_id | parent_id | operation | man>
-----------------------------+---------------------+---------------------+-----------+----------------------------------------------------------->
2025-05-27 09:57:09.749 UTC | 3459967128286778433 | NULL | overwrite | s3://warehouse/olake_iceberg_test/sample_table/metadata/sn>
2025-05-27 09:57:10.482 UTC | 4648278466592688978 | 3459967128286778433 | overwrite | s3://warehouse/olake_iceberg_test/sample_table/metadata/sn>

This gets us a little more information, such as the type of operation, the manifest list file that this snapshot refers to, as well as a summary of the changes that were made as a result of this operation.

Let's go one level deeper and look at the current manifest list metadata:

presto:olake_iceberg_test> SELECT * FROM "dept_emp$manifests";   
path | length | partition_spec_id | added_snaps>
------------------------------------------------------------------------------------------------------+--------+-------------------+------------->
s3://warehouse/olake_iceberg_test/sample_table/metadata/69090e34-d2b5-40ef-88cb-9935ee628352-m0.avro | 7063 | 0 | 464827846659>
s3://warehouse/olake_iceberg_test/sample_table/metadata/4bc9994e-a8f2-46a3-85c3-9b57a972afda-m0.avro | 7063 | 0 | 345996712828>
s3://warehouse/olake_iceberg_test/sample_table/metadata/69090e34-d2b5-40ef-88cb-9935ee628352-m1.avro | 6988 | 0 | 464827846659>
s3://warehouse/olake_iceberg_test/sample_table/metadata/4bc9994e-a8f2-46a3-85c3-9b57a972afda-m1.avro | 6987 | 0 | 345996712828>

There are other hidden tables as well that you can interrogate. Here is a summary of all hidden tables that Presto can provide:

  • $properties: General properties of the given table
  • $history: History of table state changes
  • $snapshots: Details about the table snapshots
  • $manifests: Details about the manifest lists of different table snapshots
  • $partitions: Detailed partition information for the table
  • $files: Overview of data files in the current snapshot of the table
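
To browse all of these for one table without typing each query, a small wrapper around presto-cli can help. The sketch below assumes the container name used in the docker run command earlier and the iceberg catalog with the employees schema. The helper names metadata_query and show_metadata are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a query against one of the hidden metadata tables.
# $1 = base table name, $2 = suffix such as history or snapshots.
metadata_query() {
  printf 'SELECT * FROM "%s$%s" LIMIT 5' "$1" "$2"
}

# Hypothetical helper: run every metadata query for a table through presto-cli.
# $1 = table name, $2 = container name (assumed default from this walkthrough).
show_metadata() {
  table="$1"
  container="${2:-my-olake-presto-coordinator}"
  for suffix in properties history snapshots manifests partitions files; do
    echo "== $table\$$suffix =="
    docker exec "$container" presto-cli \
      --catalog iceberg --schema employees \
      --execute "$(metadata_query "$table" "$suffix")"
  done
}
```

For example, show_metadata dept_emp prints up to five rows from each hidden table of dept_emp. The double quotes around the table name are produced by metadata_query, so the $ suffix is passed to Presto intact.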

🧪 Things to Try

**📁 Key Files**
├── docker-compose.yaml   # Spins up source db, olake, iceberg rest and minio
├── olake-data/           # to save all the olake data like configs
├── presto
    ├── README.md
    ├── etc
        ├── config.properties
        ├── jvm.config
        ├── catalog
            ├── iceberg.properties
├── README.md

No manual configuration or table registration needed. Just OLake UI + Presto + Iceberg.

📚 References

💬 Feedback & Support

We'd love to hear what you build with it.


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: we discuss the roadmap and bugs there, and help people debug the issues they are facing.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!