
OLake Playground

OLake Playground is a self-contained environment for exploring lakehouse architecture using Apache Iceberg. It comes preconfigured with all the required components, allowing you to experience the complete workflow without manual setup.

Included Components

  • MySQL – Source database
  • OLake – Schema discovery and CDC ingestion via an intuitive UI
  • MinIO – Object store for data storage
  • Temporal – Workflow orchestration for ingestion processes
  • Presto – Query engine for Iceberg tables

Objective

Enable developers to experiment with an end-to-end, Iceberg-native lakehouse in minutes. Run a single docker-compose up command to launch the full stack: no service stitching, no configuration files required.

βš™οΈ Prerequisites​

  • Docker: Latest version installed and running
  • Docker Compose: Latest version installed (usually included with Docker Desktop)
  • Resources: Allocate sufficient memory and CPU to Docker (e.g., 8GB+ RAM recommended)

Key Highlights

  • CDC from MySQL to Iceberg – Seamless change data capture for near real-time data updates
  • Schema Discovery & Ingestion via OLake UI – Automatically detect and ingest schemas with an intuitive interface
  • Iceberg Table Creation – Tables are automatically created in Iceberg format, ready for use
  • Presto Query Ready – Query Iceberg tables instantly, with no manual registration required
  • Visual Workflow Orchestration – Manage ingestion workflows through the Temporal UI
  • Simple 2-Step Setup – Get started quickly using Docker Compose

Configuration & Setup

1. Clone the Repository

git clone https://github.com/datazip-inc/olake.git
cd olake/examples

2. Set Up

Edit Persistence/Config Paths (Optional)

In docker-compose.yml, the path:

/your/chosen/host/path/olake-data

is a placeholder for the host directory where OLake will store its persistent data and configuration. Before starting the services, replace this with an actual path on your system by updating the x-app-defaults section at the top of docker-compose.yml:

x-app-defaults:
  host_persistence_path: &hostPersistencePath /your/host/path

Make sure the directory exists and is writable by the user running Docker; on Linux and macOS, check its file permissions.
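If you would rather script this change, here is a minimal shell sketch. It assumes the anchor and its value sit on one line, as in the snippet above; set_persistence_path and the example directory $HOME/olake-data are illustrative choices, not part of OLake.

```shell
# Hypothetical helper: set_persistence_path COMPOSE_FILE HOST_DIR
# Creates HOST_DIR, checks it is writable, then rewrites the value after the
# &hostPersistencePath anchor in COMPOSE_FILE (a .bak backup is kept).
# HOST_DIR must not contain "|", "&" or "\", which sed would interpret.
set_persistence_path() {
  local compose_file="$1" host_dir="$2"

  mkdir -p "$host_dir" || return 1
  if [ ! -w "$host_dir" ]; then
    echo "not writable by $(id -un): $host_dir" >&2
    return 1
  fi

  if ! grep -q '&hostPersistencePath ' "$compose_file"; then
    echo "&hostPersistencePath anchor not found in $compose_file" >&2
    return 1
  fi
  # Keep the anchor itself and replace everything after it on that line.
  sed -i.bak "s|\(&hostPersistencePath \).*|\1$host_dir|" "$compose_file"
}
# Usage (run from the directory that holds docker-compose.yml):
#   set_persistence_path docker-compose.yml "$HOME/olake-data"
```

Using an absolute path avoids surprises, since Docker resolves relative host paths against the compose file's directory.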

Customizing the Admin User (Optional)

The stack automatically creates an initial admin user on first startup. The default credentials are:

Username: "admin"
Password: "password"
Email: "test@example.com"

To change these defaults, edit the x-signup-defaults section in your docker-compose.yml:

x-signup-defaults:
  username: &defaultUsername "your-custom-username"
  password: &defaultPassword "your-secure-password"
  email: &defaultEmail "your-email@example.com"
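Since the initial admin user is created on first startup, it is safest to make these edits before the first docker-compose up. To script them, here is a minimal sketch that assumes each anchor and its quoted value sit on one line, as shown above; set_signup_default is a hypothetical helper, not an OLake tool.

```shell
# Hypothetical helper: set_signup_default ANCHOR VALUE FILE
# Replaces the double-quoted value after &ANCHOR (e.g. &defaultUsername) in FILE,
# keeping a .bak backup. VALUE must not contain |, &, \ or double quotes.
set_signup_default() {
  local anchor="$1" value="$2" compose_file="$3"
  if ! grep -q "&$anchor " "$compose_file"; then
    echo "anchor &$anchor not found in $compose_file" >&2
    return 1
  fi
  # Keep '&ANCHOR ' and swap only the quoted string that follows it.
  sed -i.bak "s|\(&$anchor \)\"[^\"]*\"|\1\"$value\"|" "$compose_file"
}
# Usage:
#   set_signup_default defaultUsername "your-custom-username" docker-compose.yml
#   set_signup_default defaultPassword "your-secure-password" docker-compose.yml
#   set_signup_default defaultEmail "your-email@example.com" docker-compose.yml
```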

3. Launch the Playground

docker-compose up -d

On the first run, Docker downloads all the required images, and the init-mysql-tasks service clones the "weather" CSV and loads it into MySQL. This initial setup, mostly the image downloads, can take 5 to 10 minutes or more depending on your internet speed and machine performance.
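Rather than re-checking by hand during this first start, you can poll until a service responds. A minimal sketch, assuming bash (or another shell that supports local) and curl on the host; wait_for and WAIT_INTERVAL are hypothetical names:

```shell
# Hypothetical helper: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0, sleeping WAIT_INTERVAL seconds (default 5)
# between attempts, and gives up once TIMEOUT_SECONDS have been spent sleeping.
wait_for() {
  local timeout="$1" interval="${WAIT_INTERVAL:-5}" waited=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "ready after ~${waited}s: $*"
}
# Usage: wait up to 15 minutes for the OLake UI to answer.
#   wait_for 900 curl -fsS http://localhost:8000
```

curl's -f flag makes HTTP error responses count as failures, so the loop keeps waiting while the UI is still starting.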

This will spin up:

  • MySQL + "init-mysql-tasks" helpers: init-mysql-tasks handles the following:

    • CDC setup: sets up replication privileges

    • Data load: inserts the sample data

    • Health checks: verifies the setup

  • OLake backend services (UI + ingestion pipeline)

  • Temporal: workflow orchestration

  • MinIO: Iceberg object store
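To double-check the CDC setup from the host, you can query MySQL's binary-log settings without opening a session. This sketch assumes OLake's MySQL CDC reads a row-format binary log, which is the usual requirement for binlog-based CDC but is not stated on this page; the container name primary_mysql and the root credentials are the playground defaults used in step 4, and check_mysql_cdc is a hypothetical helper.

```shell
# Minimal sketch: verify the MySQL settings that binlog-based CDC usually needs.
# Assumption: OLake's MySQL CDC reads a row-format binlog (not stated in this guide).
check_mysql_cdc() {
  local settings
  # -N: no column header, -B: tab-separated batch output, e.g. "1<TAB>ROW".
  # 2>/dev/null hides mysql's warning about passwords on the command line.
  settings=$(docker exec primary_mysql \
    mysql -u root -ppassword -N -B \
    -e "SELECT @@log_bin, @@binlog_format" 2>/dev/null) || return 1

  # Split the two whitespace-separated values into $1 and $2.
  set -- $settings
  if [ "$1" = "1" ] && [ "$2" = "ROW" ]; then
    echo "binlog OK: log_bin=$1, binlog_format=$2"
  else
    echo "unexpected binlog settings: log_bin=${1:-?}, binlog_format=${2:-?}" >&2
    return 1
  fi
}
# Usage, once init-mysql-tasks has finished:
#   check_mysql_cdc
```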

4. Accessing the Services

Once the stack is up and running (in particular, once init-mysql-tasks and olake-app have started and are healthy):

  • OLake Application UI: http://localhost:8000

    Default credentials:

    Username: admin

    Password: password

  • MySQL (primary_mysql):

    • Verify Source Data: Access the MySQL CLI
    docker exec -it primary_mysql mysql -u root -ppassword
    • Select the weather database and query the table
    USE weather;
    SELECT * FROM weather LIMIT 10;

    This will display the first 10 rows of the weather table.
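For scripts, the same check works without an interactive session: the mysql client's -e flag runs a single statement, -N drops the column header, and -B prints plain batch output. The helper name weather_row_count is hypothetical; the container name and credentials are the playground defaults above.

```shell
# Hypothetical helper: print the number of rows in weather.weather.
weather_row_count() {
  # -N drops the header and -B gives plain output, so only the number is printed.
  # 2>/dev/null hides mysql's warning about passwords on the command line.
  docker exec primary_mysql \
    mysql -u root -ppassword -N -B \
    -e "SELECT COUNT(*) FROM weather.weather" 2>/dev/null
}
# Usage:
#   weather_row_count
```

Noting this count now gives you a baseline to compare against the Iceberg table after the first sync.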

5. Interacting with OLake

  1. Log in to the OLake UI at http://localhost:8000 using the default credentials.

  2. Create and Configure a Job: a Job defines and runs the data pipeline. On the main page, click the "Create your first Job" button.

Set up the Source:

  • Connector: MySQL

  • Version: choose the latest available version

  • Name of your source: olake_mysql

  • Host: host.docker.internal

  • Port: 3306

  • Database: weather

  • Username: root

  • Password: password

OLake's job creation screen for setting up a new MySQL source, including connection details and sample source.json configuration

Set up the Destination:

  • Connector: Apache Iceberg

  • Catalog: REST catalog

  • Name of your destination: olake_iceberg

  • Version: choose the latest available version

  • Iceberg REST Catalog URL: http://host.docker.internal:8181

  • Iceberg S3 Path (example): s3://warehouse/weather/

  • Iceberg Database (example): weather

  • S3 Endpoint (for Iceberg data files written by OLake workers): http://host.docker.internal:9090

  • AWS Region: us-east-1

  • S3 Access Key: minio

  • S3 Secret Key: minio123

OLake Apache Iceberg destination setup with REST catalog and S3 settings
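Before saving the destination, you can confirm from the host that both endpoints respond. This sketch assumes the REST catalog and MinIO are published on the host as localhost:8181 and localhost:9090 (the ports the OLake workers reach through host.docker.internal); GET /v1/config is the configuration endpoint defined by the Iceberg REST catalog specification, and /minio/health/live is MinIO's liveness probe. check_endpoint is a hypothetical helper.

```shell
# Hypothetical helper: check_endpoint NAME URL
# Succeeds if URL answers with a non-error HTTP status within 5 seconds.
check_endpoint() {
  local name="$1" url="$2"
  # -f: treat HTTP >= 400 as failure, -sS: quiet but still show errors.
  if curl -fsS -o /dev/null --max-time 5 "$url"; then
    echo "$name reachable: $url"
  else
    echo "$name NOT reachable: $url" >&2
    return 1
  fi
}
# Usage (host ports are assumptions; match them to your docker-compose.yml):
#   check_endpoint "Iceberg REST catalog" http://localhost:8181/v1/config
#   check_endpoint "MinIO" http://localhost:9090/minio/health/live
```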

Select Streams to sync:

  • Select the checkbox next to the weather table to sync it from the source to the destination.

  • Click the weather table and set Normalization to true using the toggle.

Configure Job:

  • Set job name and replication frequency.

Save and Run the Job:

  • Save the job configuration.

  • Run the job manually from the UI by selecting Sync now; this starts the data pipeline from MySQL to Iceberg.

6. Query in Presto

To start the Presto coordinator and open its web UI, run the following commands:

cd presto
docker run -d --name olake-presto-coordinator \
--network app-network \
-p 80:8080 \
-v "$(pwd)/etc:/opt/presto-server/etc" \
prestodb/presto:latest
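The docker run command attaches Presto to the app-network Docker network, so that network must already exist when the command runs. A hedged pre-check, where require_network is a hypothetical helper and the network name comes from the command above. If the check fails, docker network ls shows the actual names; Docker Compose prefixes network names with the project name unless the compose file sets an explicit name.

```shell
# Hypothetical pre-check: confirm a Docker network exists before attaching to it.
require_network() {
  local network="$1"
  if docker network inspect "$network" >/dev/null 2>&1; then
    echo "network found: $network"
  else
    echo "network '$network' not found; is the playground stack running?" >&2
    echo "available networks:" >&2
    docker network ls --format '{{.Name}}' >&2
    return 1
  fi
}
# Usage, before the docker run command:
#   require_network app-network
```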

Presto's web UI is then available at:

http://localhost:80/ui/

Presto cluster overview with no queries running and one active worker

You can run SQL queries either in the SQL client built into this UI or with the Presto CLI inside the Presto container (via docker exec).

In the UI, where "iceberg" is the catalog and "weather" is your schema, run:

SELECT * FROM iceberg.weather.weather LIMIT 10;

Weather data SQL query results showing date, station, and temperature columns

You’ll be querying live Iceberg tables that OLake created automatically and stored in MinIO. Alternatively, you can run presto-cli inside the olake-presto-coordinator container with the following command:

docker exec -it olake-presto-coordinator presto-cli 
presto>

After you run the command, the prompt should change from the shell prompt $ to the presto> CLI prompt. Run the SQL statement show catalogs to see a list of currently configured catalogs:

presto> show catalogs;   
Catalog
---------
iceberg
system

We'll work almost exclusively with the "iceberg" catalog and "weather" schema, so a USE statement can make this catalog/schema combination the default for all subsequent queries unless another is specified. Otherwise, every statement would need the fully qualified table name (iceberg.weather.table_name).

presto> use iceberg.weather;
USE
presto:weather>

After you run the command, the prompt changes from presto> to presto:weather>. You can now run SQL commands here, for example:

show tables;

presto:weather> show tables;
Table
------------------
olake_test_table
weather
(2 rows)
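To run a query without entering the interactive prompt (for example, to compare the Iceberg row count with the MySQL row count after a sync), presto-cli accepts --catalog, --schema, and --execute. In batch mode the CLI's default output is CSV, typically with values wrapped in double quotes, so the sketch strips quotes and whitespace; iceberg_row_count is a hypothetical helper name.

```shell
# Hypothetical helper: print the row count of iceberg.weather.weather via presto-cli.
iceberg_row_count() {
  # --execute runs one statement and exits; the batch output is CSV, so strip
  # any surrounding double quotes and the trailing newline.
  docker exec olake-presto-coordinator presto-cli \
    --catalog iceberg --schema weather \
    --execute "SELECT COUNT(*) FROM weather" | tr -d '"[:space:]'
}
# Usage, after a sync has completed:
#   iceberg_row_count
```

Because the result passes through tr, the pipeline's exit status is tr's; an empty result is the sign that the query itself failed.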


Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!