What is the Iceberg Writer?

The Iceberg Writer syncs data from databases (MySQL, MongoDB, PostgreSQL) into Apache Iceberg. Apache Iceberg is an open table format that sits on top of data file formats such as Parquet and ORC and offers several benefits over managing those files directly. Iceberg tables are designed to be efficient for both reads and writes, and they support schema evolution, ACID transactions, and time travel.

Supported Catalogs

| Catalog type | Docs / example | Comments |
| --- | --- | --- |
| REST – Lakekeeper | LINK | [Officially Supported] Rust-native catalog with optimistic locking; Helm chart available for K8s. |
| REST – Unity | LINK | [Supported and tested] Unity Catalog (Databricks) with Personal Access Token authentication. |
| REST – Gravitino | LINK | [Supported, yet to be tested] Uses standard Iceberg REST; Gravitino adds multi-cloud routing. |
| REST – Nessie | LINK | [Supported, yet to be tested] Time-travel branches/tags; supply nessie.endpoint in destination.json. |
| REST – Polaris | LINK | [Supported, yet to be tested] |
| Hive Metastore | Hive catalog config | Classic HMS; good fit for on-prem Hadoop or EMR. |
| JDBC Catalog (Postgres/MySQL) | JDBC catalog sample | Stores Iceberg metadata in an RDBMS (Postgres); easiest to spin up locally with Postgres. |
| AWS Glue Catalog | Glue catalog IAM & config | Best choice on AWS; lets Athena, EMR, and Redshift query the same tables instantly. |
| Azure Purview | | Not planned; submit a request. |
| BigLake Metastore | | Not planned; submit a request. |

For more catalog options, please refer to the OLake roadmap.

Quick Start Guide

info

OLake UI is live (beta)! You can now use the UI to configure your Iceberg destination and to manage and configure catalogs. See OLake UI for how to set it up with Docker Compose and run it locally.

Create an Iceberg Destination in OLake UI

Follow the steps below to get started with Iceberg Destination using the OLake UI (assuming the OLake UI is running locally on localhost:8000):

  1. Navigate to the Destinations tab.
  2. Click + Create Destination.
  3. Select Apache Iceberg as the Destination type from the Connector drop-down, then select the Catalog type (AWS Glue, REST, JDBC, or Hive).
  4. Fill in the required connection details in the form. For a description of each field, refer to the Iceberg Destination Configuration docs section on the right side of the UI.
  5. Click Create ->.
  6. OLake will test the destination connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.

This creates an Iceberg destination in OLake. You can now use this destination in your Jobs pipeline to sync data from any supported database to Apache Iceberg.

Edit Iceberg Destination in OLake UI

To edit an existing Iceberg destination in OLake UI, follow these steps:

  1. Navigate to the Destinations tab.
  2. Locate the Iceberg destination you want to edit in the Active destinations or Inactive destinations tab, or by using the destination search bar.
  3. Click the Edit button for that destination in the Actions menu (three dots).
  4. Update the connection details as needed in the form and click Save Changes.
caution

Editing a destination can break the pipelines that use it.

You will see a notification saying "Due to editing, the jobs are going to get affected".

Editing this destination will affect the following jobs that are associated with this destination and as a result will fail immediately. Do you still want to edit the destination?

  5. OLake will test the updated destination connection once you confirm in the caution dialog. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.

Jobs Associated with Iceberg Destination

On the Destination Edit page, you can see the list of jobs associated with this destination. You can also see the status of each job (running, failed, or completed) and pause a job from the same screen.

Delete Iceberg Destination in OLake UI

To delete an existing Iceberg Destination in OLake UI, follow these steps:

  1. Navigate to the Destinations tab.
  2. Locate the destination you want to delete in the Active Destinations or Inactive Destinations tab, or by using the destination search bar.
  3. Click the Delete button for that destination in the Actions menu (three dots).

  4. A confirmation dialog will appear asking you to confirm the deletion.
  5. Click Delete to confirm the deletion.

This will remove the Iceberg Destination from OLake.

note

You can also delete a Destination from the Destination Edit page by clicking on the Delete button at the bottom of the page.

OLake System Columns (Iceberg storage layer)

| Column name | Iceberg data type | Description |
| --- | --- | --- |
| data | string (materialised as a single JSON string in Parquet) | Snapshot of the source row at the time of the event, when normalization is turned off. Contains every source-table column as key → value pairs. If the pipeline is running in “normalised” mode, individual keys may also appear as first-class columns in the Iceberg table. |
| _olake_id | string | Deterministic, content-addressable identifier for the record, computed as a hash of the source table’s primary-key value(s). |
| _olake_timestamp | timestamp (stored as INT64 epoch µs in Parquet) | Ingestion timestamp generated by OLake at write time (always UTC). Useful for auditing, incremental reads, and late-arrival handling. |
| _op_type | string (one-char code) | Change-event type emitted by the connector: r = historical back-fill/read, c = insert/create, u = update, d = delete. |
| _cdc_timestamp | timestamp (stored as INT64 epoch µs in Parquet) | Exact commit timestamp captured from the source database’s change-data-capture stream (e.g., WAL, binlog). Represents when the mutation happened on the upstream system, independent of when OLake processed it. Defaults to the epoch start time if no CDC operation was performed. |
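
To make the column semantics above concrete, here is a minimal Python sketch of how a downstream consumer might interpret these columns. It is an illustration under stated assumptions, not OLake's implementation: the use of SHA-256 and a "|" separator for _olake_id is hypothetical (the table only says the ID is a hash of the primary-key values), ordering events by _olake_timestamp is an assumption, and the helper names olake_id, from_epoch_micros, and latest_state are invented for this example. The data field is shown as an already-parsed dictionary rather than a JSON string.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def olake_id(primary_key_values):
    """Hash primary-key values into a deterministic ID.

    Assumption: SHA-256 over the values joined with "|". The actual hash
    function and encoding used by OLake are not specified in this document.
    """
    joined = "|".join(str(v) for v in primary_key_values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def from_epoch_micros(micros):
    """Convert an INT64 epoch-microsecond value to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


def latest_state(events):
    """Replay change events and return the live rows keyed by _olake_id.

    Events are applied in _olake_timestamp order (an assumption).
    r, c, and u upsert the row; d removes it.
    """
    state = {}
    for event in sorted(events, key=lambda e: e["_olake_timestamp"]):
        if event["_op_type"] == "d":
            state.pop(event["_olake_id"], None)
        else:
            state[event["_olake_id"]] = event["data"]
    return state


# Hypothetical events for two source rows with primary keys 1 and 2.
k1, k2 = olake_id([1]), olake_id([2])
events = [
    {"_olake_id": k1, "_op_type": "r", "_olake_timestamp": 100, "data": {"id": 1, "name": "Ada"}},
    {"_olake_id": k2, "_op_type": "c", "_olake_timestamp": 200, "data": {"id": 2, "name": "Bob"}},
    {"_olake_id": k1, "_op_type": "u", "_olake_timestamp": 300, "data": {"id": 1, "name": "Ada L."}},
    {"_olake_id": k2, "_op_type": "d", "_olake_timestamp": 400, "data": {"id": 2, "name": "Bob"}},
]
state = latest_state(events)
```

After the replay, state holds only the updated row for primary key 1, because the row for primary key 2 was deleted. A _cdc_timestamp of 0 converts to the epoch start time, which is how a back-filled record (no CDC operation) can be recognised.
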
note

If a column in the source database has no values, it will not be created in the Iceberg table.
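
The effect of this note can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not OLake's schema-inference code; the function name columns_with_values and the sample rows are hypothetical.

```python
def columns_with_values(rows):
    """Return the column names that hold at least one non-null value,
    in the order in which a non-null value is first seen."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:
                seen.setdefault(name, None)
    return list(seen)


# Hypothetical source rows: "nickname" is null everywhere.
rows = [
    {"id": 1, "nickname": None, "email": None},
    {"id": 2, "nickname": None, "email": "b@example.com"},
]
created = columns_with_values(rows)
```

Here created contains id and email but not nickname, mirroring a column that never appears in the destination table because it has no values in the source.
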


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: We discuss future roadmaps and bugs, help people debug the issues they are facing, and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!