OLake Playground
OLake Playground is a self-contained environment for exploring lakehouse architecture using Apache Iceberg. It comes preconfigured with all the required components, allowing you to experience the complete workflow without manual setup.
Included Components
- MySQL: Source database
- OLake: Schema discovery and CDC ingestion via an intuitive UI
- MinIO: Object store for data storage
- Temporal: Workflow orchestration for ingestion processes
- Presto: Query engine for Iceberg tables
Objective
Enable developers to experiment with an end-to-end, Iceberg-native lakehouse in minutes. A single docker-compose up command launches the full stack, with no service stitching and no configuration files to write.
Prerequisites
- Docker: Latest version installed and running
- Docker Compose: Latest version installed (usually included with Docker Desktop)
- Resources: Allocate sufficient memory and CPU to Docker (e.g., 8GB+ RAM recommended)
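The prerequisites above can be checked from a terminal before starting. The following is a minimal sketch, assuming a POSIX shell; the helper name have_cmd is illustrative and not part of OLake. The memory figure relies on docker info, which reports the total memory available to the Docker daemon in bytes.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

for tool in docker docker-compose; do
  if have_cmd "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: NOT found"
  fi
done

# If Docker is installed and its daemon is running, print the memory it can use.
if have_cmd docker; then
  if mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null); then
    echo "Docker memory (bytes): $mem_bytes"
  else
    echo "Docker is installed but the daemon does not appear to be running"
  fi
fi
```

A value of roughly 8000000000 bytes or more corresponds to the 8GB+ recommendation.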
Key Highlights
- CDC from MySQL to Iceberg: Seamless change data capture for near real-time data updates
- Schema Discovery & Ingestion via OLake UI: Automatically detect and ingest schemas with an intuitive interface
- Iceberg Table Creation: Tables are automatically created in Iceberg format, ready for use
- Presto Query Ready: Query Iceberg tables instantly, with no manual registration required
- Visual Workflow Orchestration: Manage ingestion workflows through the Temporal UI
- Simple 2-Step Setup: Get started quickly using Docker Compose
Configuration & Set Up
1. Clone the Repository
git clone https://github.com/datazip-inc/olake.git
cd olake/examples
2. Set Up
Edit Persistence/Config Paths (Optional)
In docker-compose.yml, the path:
/your/chosen/host/path/olake-data
is a placeholder for the host directory where OLake will store its persistent data and configuration. Before starting the services, replace this with an actual path on your system by updating the x-app-defaults section at the top of docker-compose.yml:
x-app-defaults:
  host_persistence_path: &hostPersistencePath /your/host/path
Make sure the directory exists and is writable by the user running Docker (on Linux and macOS, check the directory's file permissions).
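Creating the directory and confirming it is writable can be done in one step. This is a sketch only: prepare_dir and the OLAKE_DATA_DIR variable are hypothetical names, and the default of $HOME/olake-data is an assumption; use whatever path you put in docker-compose.yml.

```shell
#!/bin/sh
# prepare_dir is a hypothetical helper: creates the directory if needed and
# succeeds only if it exists and is writable by the current user.
prepare_dir() {
  mkdir -p "$1" 2>/dev/null
  [ -d "$1" ] && [ -w "$1" ]
}

# OLAKE_DATA_DIR is an assumed variable name; set it to the same path used for
# host_persistence_path in docker-compose.yml.
OLAKE_DATA_DIR="${OLAKE_DATA_DIR:-$HOME/olake-data}"

if prepare_dir "$OLAKE_DATA_DIR"; then
  echo "ready: $OLAKE_DATA_DIR"
else
  echo "not writable: $OLAKE_DATA_DIR" >&2
fi
```

Run it as the same user that will run Docker, since the writability test applies to the current user.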
Customizing Admin User (Optional)
The stack automatically creates an initial admin user on first startup. The default credentials are:
Username: "admin"
Password: "password"
Email: "test@example.com"
To change these defaults, edit the x-signup-defaults section in your docker-compose.yml:
x-signup-defaults:
  username: &defaultUsername "your-custom-username"
  password: &defaultPassword "your-secure-password"
  email: &defaultEmail "your-email@example.com"
3. Launch the Playground
docker-compose up -d
On the first run, Docker downloads all the necessary images, and the init-mysql-tasks service fetches the "weather" CSV and loads it into MySQL. This initial setup, especially the image download, can take 5-10 minutes or more depending on internet speed and machine performance.
This will spin up:
- MySQL and the init-mysql-tasks helper, which does the following:
  - Sets up CDC: grants replication privileges
  - Loads data: inserts the sample data
  - Runs health checks: verifies the setup
- OLake backend services (UI + ingestion pipeline)
- Temporal (orchestration, if required)
- MinIO: Iceberg object store
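Because the first start can take several minutes, it can help to poll until the OLake UI answers rather than guessing when the stack is ready. The sketch below assumes a POSIX shell with curl installed; retry and wait_for_olake_ui are hypothetical helper names, and the attempt count and delay are arbitrary choices. The URL is the UI address used in this guide.

```shell
#!/bin/sh
# retry is a hypothetical helper: runs a command up to <attempts> times,
# sleeping <delay> seconds between failures. It succeeds as soon as the
# command succeeds and fails if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Polls the OLake UI: 30 attempts, 10 seconds apart (about five minutes).
wait_for_olake_ui() {
  retry 30 10 curl -fsS -o /dev/null --max-time 5 http://localhost:8000
}
```

Calling wait_for_olake_ui after docker-compose up -d returns success once the UI responds, or failure if it never does within the allotted attempts.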
4. Accessing the Services
Once the stack is up and running (especially after init-mysql-tasks and olake-app are healthy/started):
- OLake Application UI: http://localhost:8000
  Default credentials:
  Username: admin
  Password: password
- MySQL (primary_mysql):
  - Verify Source Data: Access the MySQL CLI
    docker exec -it primary_mysql mysql -u root -ppassword
  - Select the weather database and query the table
    USE weather;
    SELECT * FROM weather LIMIT 10;
  This will display the first 10 rows of the weather table.
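The same check can be run without an interactive session, which is convenient in scripts. This is a minimal sketch using the container name and credentials given above; source_row_count is a hypothetical helper name. The mysql client's -N flag suppresses the column header and -e executes a single statement and exits.

```shell
#!/bin/sh
# source_row_count is a hypothetical helper: prints the number of rows in
# weather.weather. stderr is discarded to hide the client's warning about
# passing a password on the command line.
source_row_count() {
  docker exec primary_mysql mysql -u root -ppassword -N -e \
    "SELECT COUNT(*) FROM weather.weather;" 2>/dev/null
}

# Only attempt the query when the Docker CLI is available.
if command -v docker >/dev/null 2>&1; then
  if rows=$(source_row_count); then
    echo "weather.weather rows in MySQL: $rows"
  else
    echo "could not query primary_mysql (is the stack running?)" >&2
  fi
fi
```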
5. Interacting with OLake
- Log in to the OLake UI at http://localhost:8000 using the default credentials.
- Create and configure a Job to define and run the data pipeline: on the main page, click the "Create your first Job" button.
Set up the Source:
- Connector: MySQL
- Version: choose the latest available version
- Name of your source: olake_mysql
- Host: host.docker.internal
- Port: 3306
- Database: weather
- Username: root
- Password: password
Set up the Destination:
- Connector: Apache Iceberg
- Catalog: REST catalog
- Name of your destination: olake_iceberg
- Version: choose the latest available version
- Iceberg REST Catalog URL: http://host.docker.internal:8181
- Iceberg S3 Path (example): s3://warehouse/weather/
- Iceberg Database (example): weather
- S3 Endpoint (for Iceberg data files written by OLake workers): http://host.docker.internal:9090
- AWS Region: us-east-1
- S3 Access Key: minio
- S3 Secret Key: minio123
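If the destination test fails in the UI, it may be worth confirming from the host that the catalog and object store respond. The sketch below rests on several assumptions: that ports 8181 and 9090 are published to the host (as the host.docker.internal values suggest), that the catalog serves the standard Iceberg REST /v1/config route, and that the S3 endpoint is MinIO with its usual /minio/health/live probe. check_url is a hypothetical helper name.

```shell
#!/bin/sh
# check_url is a hypothetical helper: succeeds if the URL answers with a
# non-error HTTP status within 5 seconds.
check_url() {
  curl -fsS -o /dev/null --max-time 5 "$1" 2>/dev/null
}

# From the host, localhost stands in for host.docker.internal (assumes the
# ports are published to the host).
for url in \
  "http://localhost:8181/v1/config" \
  "http://localhost:9090/minio/health/live"
do
  if check_url "$url"; then
    echo "reachable:   $url"
  else
    echo "unreachable: $url"
  fi
done
```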
Select Streams to sync:
- Select the weather table using its checkbox to sync it from Source to Destination.
- Click on the weather table and set Normalization to true using the toggle button.
Configure Job:
- Set job name and replication frequency.
Save and Run the Job:
- Save the job configuration.
- Run the job manually from the UI by selecting Sync now; this starts the data pipeline from MySQL to Iceberg.
6. Query in Presto
To access the Presto web UI, run the following commands:
cd presto
docker run -d --name olake-presto-coordinator \
--network app-network \
-p 80:8080 \
-v "$(pwd)/etc:/opt/presto-server/etc" \
prestodb/presto:latest
Presto can then be accessed at:
http://localhost:80/ui/
You can run SQL queries in the SQL client within this UI, or from the Presto CLI by using docker exec in the Presto container.
In the UI, where "iceberg" is the catalog and "weather" is the schema, you can run the following query:
SELECT * FROM iceberg.weather.weather LIMIT 10;
You'll be querying live Iceberg tables stored in MinIO and created automatically by OLake. Alternatively, you can run presto-cli inside the olake-presto-coordinator container with the command below:
docker exec -it olake-presto-coordinator presto-cli
presto>
After you run the command, the prompt should change from the shell prompt $ to the presto> CLI prompt. Run the SQL statement show catalogs to see a list of currently configured catalogs:
presto> show catalogs;
Catalog
---------
iceberg
system
We'll be working almost exclusively with the "iceberg" catalog and "weather" schema, so we can use a USE statement to indicate that all the queries we run will be against tables in this catalog/schema combination unless otherwise specified. Without it, we would have to use the fully qualified table name in every statement (iceberg.weather.table_name).
presto> use iceberg.weather;
USE
presto:weather>
After you run the command, the prompt should change from presto> to presto:weather>. You can now run SQL commands here, such as show tables:
presto:weather> show tables;
Table
------------------
olake_test_table
weather
(2 rows)
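As a final end-to-end check, the row count in the MySQL source can be compared with the count Presto reads from the Iceberg table. This is a sketch under stated assumptions: the helper names are hypothetical, the container names and credentials are the ones used in this guide, and presto-cli is invoked with its --execute and --output-format options. Counts would be expected to agree only after a completed sync with no pending changes.

```shell
#!/bin/sh
# All three helpers are hypothetical wrappers around commands from this guide.
mysql_count() {
  docker exec primary_mysql mysql -u root -ppassword -N -e \
    "SELECT COUNT(*) FROM weather.weather;" 2>/dev/null
}

iceberg_count() {
  # --execute runs one statement and exits; TSV output prints the bare value.
  docker exec olake-presto-coordinator presto-cli \
    --execute "SELECT COUNT(*) FROM iceberg.weather.weather" \
    --output-format TSV 2>/dev/null
}

# counts_match succeeds when both arguments are the same non-empty value.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
  src=$(mysql_count) || src=""
  dst=$(iceberg_count) || dst=""
  if counts_match "$src" "$dst"; then
    echo "row counts match: $src"
  else
    echo "row counts differ or a query failed: mysql='$src' iceberg='$dst'"
  fi
fi
```

Treating an empty result as a mismatch keeps a failed query from being reported as a successful comparison.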