
Docker Compose for local setup and testing

This page explains how to set up your local test environment using Docker Compose, based on the docker-compose.yml file used for local setup and testing.

The setup is simple:

Step 1. Clone the OLake repository and navigate to the local test directory.

git clone git@github.com:datazip-inc/olake.git
cd olake/writers/iceberg/local-test

Step 2. Start the Docker Compose services.

Step 3. Create the source configuration file (source.json) in any directory of your choosing. Just make sure to reference its file path in the sync command.

Step 4. Create the destination configuration file (destination.json).

Step 5. Run the discover process for stream schema discovery. (This command creates a streams.json file with the complete schema of your database: all streams / tables / collections, column names, column data types, etc. You can also set partition_regex here.)

Step 6. Run the sync process.

Step 7. Verify the data population via spark-sql.

Now let's move on to the detailed setup instructions.

Prerequisites Installation

This section covers the installation steps for Java 17, Docker & Docker Compose, and Maven for macOS (M1 & Intel), Linux, and Windows.

Java 17

Install via Homebrew:

brew install openjdk@17

Or follow the Adoptium Installation Guide.

Verification:

java -version

Ensure it mentions Java 17.

Note: If needed, add export PATH="/usr/local/opt/openjdk@17/bin:$PATH" (or equivalent) to your shell config (e.g., ~/.bashrc or ~/.zshrc).
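
For example, on an Apple Silicon Mac where Homebrew installs under /opt/homebrew, the update would look roughly like this (a minimal sketch, assuming zsh):

echo 'export PATH="/opt/homebrew/opt/openjdk@17/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
java -version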

Docker & Docker Compose

Option 1: Docker Desktop

  1. Install Docker Desktop from Docker Hub.
  2. Ensure Docker is running.

Option 2: Homebrew

brew install --cask docker

Verification:

docker --version
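
Since the setup also relies on Docker Compose, it is worth confirming that the Compose plugin is available (it ships with Docker Desktop):

docker compose version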

Maven Installation

Install via Homebrew:

brew install maven

Or download from Apache Maven.

Verification:

mvn -version

Local Catalog Test Setup

You can test the following catalogs using the local test setup:

  1. Glue
  2. Hive
  3. JDBC
  4. REST

Steps to run:

  1. Prerequisite: Ensure Docker is installed (instructions in the Prerequisites section above).

  2. Clone OLake and navigate to the local test directory:

    git clone git@github.com:datazip-inc/olake.git
    cd olake/writers/iceberg/local-test
  3. Start the Docker Compose services:

    docker compose up

    This command starts the following services:

    • Postgres: Acts as the JDBC catalog.
    • Lakekeeper: Acts as the Iceberg catalog.
    • hive-metastore: Acts as the Hive catalog.
    • Minio: Provides an AWS S3-like filesystem for storing Iceberg data.
    • Spark: For querying the Iceberg data.
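
    Once the stack is up, you can optionally confirm that all containers are running:

    docker compose ps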
  4. Create the source configuration file:
    Find the source configs for all the databases you wish to sync data from here.
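
    For orientation only, a PostgreSQL source.json looks roughly like the sketch below. The field names are illustrative assumptions rather than the authoritative schema, so copy the exact config from the source docs for your database:

source.json (illustrative)
{
  "host": "localhost",
  "port": 5432,
  "database": "postgres",
  "username": "postgres",
  "password": "password",
  "max_threads": 5
}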

  5. Create the destination configuration file:
    Create a file named destination.json in your working directory with the following content:

destination.json
{
  "type": "ICEBERG",
  "writer": {
    "catalog_type": "glue",
    "normalization": false,
    "iceberg_s3_path": "s3://<BUCKET_NAME>/<S3_PREFIX_VALUE>",
    "aws_region": "ap-south-1",
    "aws_access_key": "XXX",
    "aws_secret_key": "XXX",
    "iceberg_db": "ICEBERG_DATABASE_NAME",
    "grpc_port": 50051,
    "server_host": "localhost"
  }
}

Refer here for more about catalog configs.
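
The example above targets the AWS Glue catalog. To write into the local stack started by docker compose, you would instead pick one of the local catalogs (JDBC, REST/Lakekeeper, or Hive) and point the writer at MinIO. A rough sketch for the REST (Lakekeeper) catalog follows; the catalog URL, S3 endpoint, and credential keys shown here are assumptions, so take the exact field names from the catalog configuration docs:

destination.json (illustrative, local REST catalog)
{
  "type": "ICEBERG",
  "writer": {
    "catalog_type": "rest",
    "normalization": false,
    "rest_catalog_url": "http://localhost:8181/catalog",
    "iceberg_s3_path": "s3://warehouse/",
    "s3_endpoint": "http://localhost:9000",
    "aws_access_key": "minio-access-key",
    "aws_secret_key": "minio-secret-key",
    "iceberg_db": "olake_iceberg"
  }
}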

  6. Run the Sync Process:
    • Discover Command: <DISCOVER_COMMAND>
    • Sync Command: <SYNC_COMMAND>
    • Sync with State Command: <SYNC_WITH_STATE_COMMAND>

Refer to the respective database docs for the exact commands to discover the schema and sync the data.
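
For orientation only, these commands generally take the shape sketched below, assuming a PostgreSQL source and the config files in your current directory. The image name and flags here are assumptions modelled on the general OLake pattern, so treat the database-specific docs as the authoritative reference:

# Discover (illustrative)
docker run -v "$(pwd):/mnt/config" olakego/source-postgres:latest \
  discover --config /mnt/config/source.json

# Sync (illustrative)
docker run -v "$(pwd):/mnt/config" olakego/source-postgres:latest \
  sync --config /mnt/config/source.json \
       --catalog /mnt/config/streams.json \
       --destination /mnt/config/destination.json

# Sync with state (illustrative): add --state to resume from a previous run
#   --state /mnt/config/state.json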

note
  1. The first time you run the sync command, it may take a while to build and download the necessary dependencies, Docker images, and .jar files. Subsequent runs will be faster.
  2. The destination.json file should be in the same directory where you run the sync command.

If you see the final success screen after running the sync command, the data has been synced successfully to the Iceberg table.

  7. Verify Data Population:

    Connect to the spark-iceberg container:

    docker exec -it spark-iceberg bash

    Start Spark SQL:

    spark-sql

    Run a query to inspect your data:

    select * from olake_iceberg.olake_iceberg.table_name;
info

The __op column tracks the operation performed on each record. Its values are "r" for read/backfill, "c" for create, "u" for update, and "d" for delete.
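
For example, to see how many records were written per operation type, you can group on this column (using the same table as above):

select __op, count(*) from olake_iceberg.olake_iceberg.table_name group by __op;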

Using Spark UI

You can also query the data using Spark SQL from a Jupyter notebook, which is accessible through the Spark UI.

Step 1: Head over to http://localhost:8888 to access the Spark UI.

Step 2: Click on File > New > Notebook.


Step 3: Choose Python 3 as the kernel.


Step 4: The Jupyter Notebook will open in a new tab.

Step 5: To run SQL queries, use the following code snippet in a Jupyter Notebook cell:

%%sql

SELECT * FROM CATALOG_NAME.ICEBERG_DATABASE_NAME.TABLE_NAME;


  • CATALOG_NAME can be: jdbc_catalog, hive_catalog, rest_catalog, etc.
  • ICEBERG_DATABASE_NAME is the name of the Iceberg database you created / set as a value in the destination.json file.

Now you can run queries on your Iceberg data using Spark SQL. Some useful commands are:

  • show databases;
  • use <database_name>;
  • show tables from olake_iceberg.olake_iceberg;
  • describe formatted olake_iceberg.olake_iceberg.table_name;
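
Iceberg also exposes metadata tables through Spark SQL, which are handy for checking what a sync actually wrote (standard Iceberg metadata tables, shown here for the same table as above):

select * from olake_iceberg.olake_iceberg.table_name.snapshots;
select * from olake_iceberg.olake_iceberg.table_name.history;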
info

Refer to spark-defaults.conf for more information about the values set for catalog configurations.


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss future roadmaps, report bugs, help folks debug the issues they face, and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!