What is the Iceberg Writer?
The Iceberg Writer syncs data from databases (MySQL, MongoDB, PostgreSQL) into Apache Iceberg. Apache Iceberg is an open table format that manages data files (typically Parquet or ORC) and offers a number of benefits over querying those files directly: Iceberg tables are designed to be efficient for both reads and writes, and they support schema evolution, ACID transactions, and time travel.
Supported Catalogs
| Catalog type | Docs / example | Comments |
|---|---|---|
| REST – Lakekeeper | LINK | [Officially Supported] Rust-native catalog with optimistic locking; Helm chart available for K8s. |
| REST – Unity | LINK | [Supported and tested] Unity Catalog (Databricks) with Personal Access Token authentication. |
| REST – Gravitino | LINK | [Supported, yet to be tested] Uses the standard Iceberg REST spec; Gravitino adds multi-cloud routing. |
| REST – Nessie | LINK | [Supported, yet to be tested] Time-travel branches/tags; supply `nessie.endpoint` in `destination.json`. |
| REST – Polaris | LINK | [Supported, yet to be tested] |
| Hive Metastore | Hive catalog config | Classic HMS; a good fit for on-prem Hadoop or EMR. |
| JDBC Catalog (Postgres/MySQL) | JDBC catalog sample | Stores Iceberg metadata in an RDBMS (e.g., Postgres); easiest to spin up locally with Postgres. |
| AWS Glue Catalog | Glue catalog IAM & config | Best choice on AWS; lets Athena, EMR, and Redshift query the same tables instantly. |
| Azure Purview | – | Not planned; submit a request. |
| BigLake Metastore | – | Not planned; submit a request. |
For more catalog options, please refer to the OLake roadmap.
Quick Start Guide
OLake UI is live (beta)! You can now use the UI to configure your Iceberg destination and to manage and configure various catalogs. See the OLake UI docs for how to set it up with Docker Compose and run it locally.
- Use OLake UI for Apache Iceberg
- Use OLake CLI for Apache Iceberg
Create an Iceberg Destination in OLake UI
Follow the steps below to get started with an Iceberg destination using the OLake UI (assuming the OLake UI is running locally on localhost:8000):
- Navigate to the Destinations tab.
- Click on `+ Create Destination`.
- Select `Apache Iceberg` as the destination type from the Connector dropdown, then select the catalog type (AWS Glue, REST, JDBC, Hive).
- Fill in the required connection details in the form. For more on these fields, refer to the Iceberg Destination Configuration docs section on the right side of the UI.
- Click on `Create ->`.
- OLake will test the destination connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.
This will create an Iceberg destination in OLake; you can now use this destination in your Jobs pipeline to sync data from any database to Apache Iceberg.
Edit Iceberg Destination in OLake UI
To edit an existing Iceberg destination in OLake UI, follow these steps:
- Navigate to the Destinations tab.
- Locate the Iceberg destination you want to edit from the `Active destination` or `Inactive destination` tabs, or by using the destination search bar.
- Click on the `Edit` button next to the destination from the `Actions` tab (3 dots).
- Update the connection details as needed in the form and click on `Save Changes`.
Editing a destination can break its associated pipelines. You will see a notification saying "Due to editing, the jobs are going to get affected", followed by a confirmation modal:
"Editing this destination will affect the following jobs that are associated with this destination and as a result will fail immediately. Do you still want to edit the destination?"
- OLake will test the updated destination connection once you hit confirm in the destination editing caution modal. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.
Jobs Associated with Iceberg Destination
In the Destination Edit page, you can see the list of jobs that are associated with this destination. You can also see the status of each job (running, failed, or completed) and can pause a job from the same screen as well.
Delete Iceberg Destination in OLake UI
To delete an existing Iceberg Destination in OLake UI, follow these steps:
- Navigate to the Destinations tab.
- Locate the destination you want to delete from the `Active Destinations` or `Inactive Destinations` tabs, or by using the destination search bar.
- Click on the `Delete` button next to the destination from the `Actions` tab (3 dots).
- A confirmation dialog will appear asking you to confirm the deletion.
- Click on `Delete` to confirm the deletion.
This will remove the Iceberg Destination from OLake.
You can also delete a destination from the Destination Edit page by clicking on the `Delete` button at the bottom of the page.
Create an Iceberg Destination using OLake CLI
It's a simple 3-step process:
- Create a source config file; let's name it `source.json` (see the sketch below).
- Create another config file named `destination.json`.
- Run the discover and sync commands to fetch the schema and start syncing the data, respectively.

- `source.json` - holds the source database information like host, port, username, password, database name, etc.
- `destination.json` - holds the Iceberg destination configuration like the Iceberg table name, Iceberg database name, catalog information, etc.
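For illustration, creating the working directory and a skeleton `source.json` might look like the sketch below. The directory path and JSON keys are placeholders, not the connector's exact schema; copy the real field names from the source config docs linked below.

```sh
# Sketch only: create a working directory that will later be mounted
# into the OLake container at /mnt/config.
mkdir -p ~/OLAKE_DIRECTORY && cd ~/OLAKE_DIRECTORY

# source.json -- placeholder keys for a MongoDB source; the exact
# schema is documented in the MongoDB Source Config docs linked below.
cat > source.json <<'EOF'
{
  "hosts": ["localhost:27017"],
  "username": "mongo",
  "password": "secret",
  "database": "appdb"
}
EOF
```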
Now, depending on where you are syncing from (source) and to (destination), choose from the configurations below.
- PostgreSQL to Iceberg | Postgres Source Config
- MongoDB to Iceberg | MongoDB Source Config
- MySQL to Iceberg | MySQL Source Config
Now that you have the source configuration set, let's move on to the destination configuration.
OLake supports four catalog types for the Iceberg writer: JDBC, AWS Glue, REST, and Hive Metastore. Choose whichever destination configuration matches your requirement.
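As a rough illustration, a `destination.json` for a JDBC (Postgres) catalog could look like the sketch below. The field names are assumptions made for illustration; always copy the exact schema from the JDBC catalog sample linked above.

```sh
# Illustrative destination.json for a JDBC catalog backed by Postgres.
# Field names are assumptions -- confirm them against the JDBC catalog
# sample in the OLake docs before use.
cat > destination.json <<'EOF'
{
  "type": "ICEBERG",
  "writer": {
    "catalog_type": "jdbc",
    "jdbc_url": "jdbc:postgresql://localhost:5432/iceberg",
    "jdbc_username": "iceberg",
    "jdbc_password": "password",
    "iceberg_s3_path": "s3a://warehouse",
    "iceberg_db": "olake_iceberg"
  }
}
EOF
```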
- Run sync commands: to replicate data from the source database to the Iceberg table, run the sync commands. The sync command reads the data from the source database and writes it to the Iceberg table.
  - Discover command: `<DISCOVER_COMMAND>`
  - Sync command: `<SYNC_COMMAND>`
  - Sync with state command: `<SYNC_WITH_STATE_COMMAND>`
Refer to the respective database docs for the exact discover and sync commands:
- MongoDB Discover and sync command
- Postgres Discover and sync command
- MySQL Discover and sync command
A sample discover command would look like this:
```sh
docker run --pull=always \
  -v /Users/USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config \
  olakego/source-mongodb:latest \
  discover \
  --config /mnt/config/source.json
```
`olakego/source-mongodb` is the OLake image for the MongoDB source. You can replace it with the respective source image for PostgreSQL (source-postgres) or MySQL (source-mysql), or build one locally.
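Once discover has produced a streams/schema file, the sync step follows the same pattern. The flag names and the `streams.json`/`state.json` file names below are assumptions by analogy with the discover example; verify them against the per-database command docs linked above.

```sh
# Initial sync: read from the source and write to the Iceberg table.
# The --catalog and --destination flags (and file names) are assumed here.
docker run --pull=always \
  -v /Users/USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config \
  olakego/source-mongodb:latest \
  sync \
  --config /mnt/config/source.json \
  --catalog /mnt/config/streams.json \
  --destination /mnt/config/destination.json

# Subsequent incremental runs can resume from a saved state file
# (the --state flag is likewise an assumption).
docker run --pull=always \
  -v /Users/USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config \
  olakego/source-mongodb:latest \
  sync \
  --config /mnt/config/source.json \
  --catalog /mnt/config/streams.json \
  --destination /mnt/config/destination.json \
  --state /mnt/config/state.json
```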
OLake System Columns (Iceberg storage layer)
| Column name | Iceberg data type | Description |
|---|---|---|
| `data` | string (materialised as a single JSON string in Parquet) | Snapshot of the source row at the time of the event, used when normalization is turned off. Contains every source-table column as key → value pairs. If the pipeline is running in "normalised" mode, individual keys may also appear as first-class columns in the Iceberg table. |
| `_olake_id` | string | Deterministic, content-addressable identifier for the record. Computed as a hash of the source table's primary-key value(s). |
| `_olake_timestamp` | timestamp (stored as INT64 epoch µs in Parquet) | Ingestion timestamp generated by OLake at write time (always UTC). Useful for auditing, incremental reads, and late-arrival handling. |
| `_op_type` | string (one-char code) | Change-event type emitted by the connector: `r` = historical backfill/read, `c` = insert/create, `u` = update, `d` = delete. |
| `_cdc_timestamp` | timestamp (stored as INT64 epoch µs in Parquet) | Exact commit timestamp captured from the source database's change-data-capture stream (e.g., WAL, binlog). Represents when the mutation happened on the upstream system, independent of when OLake processed it. Defaults to the epoch start time if no CDC operation was performed. |
If a column from the source database has no values in it, it won't get created in the Iceberg table.
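To make these columns concrete, a single update event captured with normalization turned off might land in the table looking roughly like the record below. All values are invented for illustration, and the timestamps are shown in their logical UTC form (physically they are stored as INT64 epoch microseconds in Parquet).

```json
{
  "data": "{\"id\": 42, \"email\": \"ada@example.com\"}",
  "_olake_id": "9b1d6aefc3...",
  "_olake_timestamp": "2024-05-01T10:15:27.123456Z",
  "_op_type": "u",
  "_cdc_timestamp": "2024-05-01T10:15:26.987654Z"
}
```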