OLake Playground

OLake Playground is a self-contained environment for exploring lakehouse architecture using Apache Iceberg. It comes preconfigured with all the required components, allowing you to experience the complete workflow without manual setup.

Included Components​

  • MySQL – Source database
  • OLake – Schema discovery and CDC ingestion via an intuitive UI
  • MinIO – Object store for data storage
  • Temporal – Workflow orchestration for ingestion processes
  • Presto – Query engine for Iceberg tables

Objective​

Enable developers to experiment with an end-to-end, Iceberg-native lakehouse in minutes. Run a single docker-compose up command to launch the full stack, with no service stitching and no configuration files required.

Prerequisites

  • Docker: Latest version installed and running
  • Docker Compose: Latest version installed (usually included with Docker Desktop)
  • Resources: Allocate sufficient memory and CPU to Docker (e.g., 8GB+ RAM recommended)

Key Highlights​

  • CDC from MySQL to Iceberg – Seamless change data capture for near real-time data updates
  • Schema Discovery & Ingestion via OLake UI – Automatically detect and ingest schemas with an intuitive interface
  • Iceberg Table Creation – Tables are automatically created in Iceberg format, ready for use
  • Presto Query Ready – Query Iceberg tables instantly, with no manual registration required
  • Visual Workflow Orchestration – Manage ingestion workflows through the Temporal UI
  • Simple 2-Step Setup – Get started quickly using Docker Compose

Configuration & Set Up​

1. Clone the Repository​

git clone https://github.com/datazip-inc/olake.git
cd olake/examples

2. Set Up​

Edit Persistence/Config Paths (Optional)​

In docker-compose.yml, the path:

/your/chosen/host/path/olake-data

is a placeholder for the host directory where OLake will store its persistent data and configuration. Before starting the services, replace this with an actual path on your system by updating the x-app-defaults section at the top of docker-compose.yml:

x-app-defaults:
  host_persistence_path: &hostPersistencePath /your/host/path

Make sure the directory exists and is writable by the user running Docker (check the file permissions on Linux/macOS).
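As a minimal sketch of that check, the helper below creates the directory if it is missing and reports whether the current user can write to it. The function name prepare_persistence_dir and the example path in the usage comment are hypothetical and not part of OLake. It tests the user running the script, which may differ from the user the Docker daemon runs as.

```shell
# Hypothetical helper (not part of OLake): make sure the persistence
# directory exists and is writable by the current user.
prepare_persistence_dir() {
  local dir
  dir="$1"

  # Create the directory (and any missing parents); fail if that is not possible.
  mkdir -p "$dir" || return 1

  # -w checks write permission for the user running this script.
  if [ -w "$dir" ]; then
    echo "ok: $dir is writable"
  else
    echo "error: $dir is not writable by $(id -un)" >&2
    return 1
  fi
}

# Usage (the path is an assumption; use the one set in docker-compose.yml):
# prepare_persistence_dir "$HOME/olake-data"
```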

Customizing Admin User (optional):​

The stack automatically creates an initial admin user on first startup. The default credentials are:

Username: "admin"
Password: "password"
Email: "test@example.com"

To change these defaults, edit the x-signup-defaults section in your docker-compose.yml:

x-signup-defaults:
  username: &defaultUsername "your-custom-username"
  password: &defaultPassword "your-secure-password"
  email: &defaultEmail "your-email@example.com"

3. Launch the Playground​

docker-compose up -d   

On the first run, Docker downloads all the required images, and the init-mysql-tasks service fetches the "weather" CSV and loads it into MySQL. This initial setup, mainly the image download, can take 5-10 minutes or more depending on internet speed and machine performance.

This will spin up:

  • MySQL + "init-mysql-tasks" helper: init-mysql-tasks does the following:

    • Set up CDC: grants replication privileges

    • Load data: inserts the sample data

    • Health checks: verifies the setup

  • OLake backend services (UI + ingestion pipeline)

  • Temporal: workflow orchestration for ingestion

  • MinIO: Iceberg object store
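Because the first start can take several minutes, a script that depends on the stack may want to wait until a service answers before continuing. The sketch below is a generic retry helper; wait_for is a hypothetical name, and the commented curl call assumes curl is installed on the host and that the OLake UI listens on http://localhost:8000.

```shell
# Hypothetical helper: retry a command until it succeeds or the
# attempts run out. Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  local attempts
  local delay
  local i
  attempts="$1"
  delay="$2"
  shift 2

  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  # Every attempt failed.
  return 1
}

# Usage (assumes curl is available): poll the UI up to 60 times, 5 seconds apart.
# wait_for 60 5 curl -fsS http://localhost:8000 && echo "OLake UI is reachable"
```

The helper only reports whether the command eventually succeeded; it does not inspect container health, so docker-compose ps remains the place to see per-service status.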

4. Accessing the Services​

Once the stack is up and running (in particular, once init-mysql-tasks and olake-app report healthy or started), the following services are available:

  • OLake Application UI: http://localhost:8000

    Default credentials:

    Username: admin

    Password: password

  • MySQL (primary_mysql):

    • Verify Source Data: Access the MySQL CLI
    docker exec -it primary_mysql mysql -u root -ppassword
    • Select the weather database and query the table
    USE weather;
    SELECT * FROM weather LIMIT 10;

    This will display the first 10 rows of the weather table.
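For scripted checks, the same verification can be run without an interactive session. This is a sketch that assumes the container name primary_mysql and the root credentials shown above; count_source_rows is a hypothetical helper name. It relies on the mysql client's -N option (skip column names) and -e option (execute a statement and exit).

```shell
# Hypothetical helper: print the number of rows in the sample table.
# Assumes the primary_mysql container from this playground is running.
count_source_rows() {
  # No -it here: the command is non-interactive, so no TTY is needed.
  docker exec primary_mysql \
    mysql -u root -ppassword -N -e 'SELECT COUNT(*) FROM weather.weather;'
}

# Usage:
# count_source_rows
```

Recording this count before a sync gives a figure to compare against the Iceberg table once the job has run.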

5. Interacting with OLake

  1. Log in to the OLake UI at http://localhost:8000 using the default credentials.

  2. Create and configure a Job to define and run the data pipeline: on the main page, click the "Create your first Job" button.

Set up the Source:​

  • Connector: MySQL

  • Version: choose the latest available version

  • Name of your source: olake_mysql

  • Host: host.docker.internal

  • Port: 3306

  • Database: weather

  • Username: root

  • Password: password


Set up the Destination:​

  • Connector: Apache Iceberg

  • Catalog: REST catalog

  • Name of your destination: olake_iceberg

  • Version: choose the latest available version

  • Iceberg REST Catalog URL: http://host.docker.internal:8181

  • Iceberg S3 Path (example): s3://warehouse/weather/

  • Iceberg Database (example): weather

  • S3 Endpoint (for Iceberg data files written by OLake workers): http://host.docker.internal:9090

  • AWS Region: us-east-1

  • S3 Access Key: minio

  • S3 Secret Key: minio123


Select Streams to sync:​

  • Select the weather table using its checkbox to sync it from Source to Destination.

  • Click on the weather table and set Normalization to true using the toggle button.

Configure Job:​

  • Set job name and replication frequency.

Save and Run the Job:​

  • Save the job configuration.

  • Run the job manually from the UI to initiate the data pipeline from MySQL to Iceberg by selecting Sync now.

6. Query in Presto​

To access the Presto web UI, run the following commands:

cd presto 
docker run -d --name olake-presto-coordinator \
--network app-network \
-p 80:8080 \
-v "$(pwd)/etc:/opt/presto-server/etc" \
prestodb/presto:latest

The Presto UI is then available at:

http://localhost:80/ui/ 


You can run SQL queries either in the SQL client within this UI or in the Presto CLI, which you reach through docker exec in the Presto container.

In the UI, where "iceberg" is the catalog and "weather" is the schema, you can run:

SELECT * FROM iceberg.weather.weather LIMIT 10;

You'll be querying live Iceberg tables stored in MinIO and created automatically by OLake. Alternatively, you can run presto-cli inside the olake-presto-coordinator container with the command below:

docker exec -it olake-presto-coordinator presto-cli 
presto>

After you run the command, the prompt should change from the shell prompt $ to the presto> CLI prompt. Run the SQL statement show catalogs to see a list of currently configured catalogs:

presto> show catalogs;   
Catalog
---------
iceberg
system

We'll be working almost exclusively with the "iceberg" catalog and "weather" schema, so we can use a USE statement to indicate that all the queries we run will be against tables in this catalog/schema combination unless specified otherwise. Without it, we would have to use the fully qualified table name in every statement (iceberg.weather.table_name).

presto> use iceberg.weather;
USE
presto:weather>

After you run the command, the prompt should change from presto> to presto:weather>. You can now run SQL commands such as show tables:

presto:weather> show tables;
Table
------------------
olake_test_table
weather
(2 rows)
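The same queries can also be run non-interactively, which is handy for scripted checks after a sync. The sketch below assumes the coordinator container is named olake-presto-coordinator, as started above, and relies on presto-cli's --catalog, --schema, and --execute options; presto_query is a hypothetical helper name.

```shell
# Hypothetical helper: run one SQL statement against the iceberg.weather
# schema and print the result. Assumes the olake-presto-coordinator
# container from this guide is running.
presto_query() {
  docker exec olake-presto-coordinator presto-cli \
    --catalog iceberg --schema weather --execute "$1"
}

# Usage:
# presto_query 'SHOW TABLES'
# presto_query 'SELECT COUNT(*) FROM weather'
```

Setting the catalog and schema on the command line plays the same role as the USE statement in the interactive session, so table names do not need to be fully qualified.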

