
What is the Iceberg Writer?

The Iceberg Writer syncs data from databases (MySQL, MongoDB, PostgreSQL) into Apache Iceberg. Apache Iceberg is an open table format that manages data files (stored in file formats such as Parquet and ORC) and offers a number of benefits over working with those files directly. Iceberg tables are designed to be efficient for both reads and writes, and they support schema evolution, ACID transactions, and time travel.

Supported Catalogs

Catalog Type | Description
JDBC | Uses PostgreSQL as the metadata catalog (local testing)
AWS Glue | Uses AWS Glue for metadata catalog and AWS S3 for storage
REST | Uses a REST API for metadata catalog and storage

For more catalog options, please refer to the OLake roadmap.

Quick Start Guide

It's a simple three-step process:

  1. Create a config file named config.json.
  2. Create another config file named writer.json.
  3. Run the discover and sync commands to fetch the schema and start syncing the data, respectively.
Info:
  1. config.json - holds the source database information, such as host, port, username, password, and database name.
  2. writer.json - holds the Iceberg writer configuration, such as the Iceberg table name, Iceberg database name, and catalog information.
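
Since both files must exist and be valid JSON before the discover and sync commands can use them, a small pre-flight check can catch typos early. The following is a minimal sketch, not part of OLake: the function name check_config_dir is hypothetical, and it only verifies that the two files named above are present and parse as JSON, not that their contents are what OLake expects.

```python
import json
from pathlib import Path

# File names taken from the quick start steps above.
REQUIRED_FILES = ("config.json", "writer.json")


def check_config_dir(config_dir):
    """Return a list of problems found; an empty list means both files parse as JSON."""
    problems = []
    for name in REQUIRED_FILES:
        path = Path(config_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
    return problems
```

Calling check_config_dir on the directory that holds your config files returns an empty list when both files are in place and well-formed.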

Depending on which source database you want to sync into Iceberg, choose the matching source configuration below.

  1. PostgreSQL to Iceberg | Postgres Source Config
  2. MongoDB to Iceberg | MongoDB Source Config
  3. MySQL to Iceberg | MySQL Source Config

Now that you have the source configuration set, let's move on to the destination configuration.

Here's what the writer.json looks like for the AWS Glue catalog configuration:

writer.json
{
  "type": "ICEBERG",
  "writer": {
    "normalization": false,
    "s3_path": "s3://bucket_name/olake_iceberg/test_olake",
    "aws_region": "ap-south-1",
    "aws_access_key": "XXX",
    "aws_secret_key": "XXX",
    "database": "olake_iceberg",
    "grpc_port": 50051,
    "server_host": "localhost"
  }
}

For more information, refer here.
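
If you would rather not type AWS credentials into the file by hand, the writer.json shown above can be generated. The sketch below is one possible approach, not an OLake tool: the function names are hypothetical, and reading the keys from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY is an assumption based on the conventional AWS variable names. The remaining keys and defaults are copied from the sample above.

```python
import json
import os


def build_writer_config(s3_path, database, aws_region, env=None):
    """Build the writer.json structure shown above, reading AWS keys from the environment."""
    env = os.environ if env is None else env
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are the conventional AWS variable
    # names; using them here is an assumption, not an OLake requirement.
    return {
        "type": "ICEBERG",
        "writer": {
            "normalization": False,
            "s3_path": s3_path,
            "aws_region": aws_region,
            "aws_access_key": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_key": env["AWS_SECRET_ACCESS_KEY"],
            "database": database,
            # grpc_port and server_host are copied from the sample config.
            "grpc_port": 50051,
            "server_host": "localhost",
        },
    }


def write_writer_config(path, config):
    """Write the config to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
```

A missing environment variable raises KeyError, which is deliberate: it is better to fail before the sync starts than to write a file with empty credentials.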

  1. Run Sync Commands:
    • Discover Command: <DISCOVER_COMMAND>
    • Sync Command: <SYNC_COMMAND>
    • Sync with State Command: <SYNC_WITH_STATE_COMMAND>

Refer to the respective database docs for the exact commands to discover the schema and sync the data.

A sample discover command would look like this:

docker run \
  -v /Users/USERNAME/Desktop/projects/olake-docker:/mnt/config \
  olakego/source-mongodb:latest \
  discover \
  --config /mnt/config/config.json
Info: olakego/source-mongodb is the OLake image for the MongoDB source. To sync from PostgreSQL or MySQL, replace it with the respective source image (source-postgres or source-mysql), or build one locally.
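
To make the image swap concrete, here is a minimal sketch that assembles the sample discover command for any of the three sources. It is illustrative only: the function name discover_command is hypothetical, and the olakego/ prefix and :latest tag for the PostgreSQL and MySQL images are assumed to mirror the MongoDB image shown in the sample. Only the discover step is covered, since that is the only command spelled out above.

```python
import shlex

# source-mongodb appears in the sample command; source-postgres and source-mysql
# are named in the note above. The olakego/ prefix and :latest tag for those two
# are assumed to match the MongoDB image.
SOURCE_IMAGES = {
    "mongodb": "olakego/source-mongodb:latest",
    "postgres": "olakego/source-postgres:latest",
    "mysql": "olakego/source-mysql:latest",
}


def discover_command(source, host_config_dir):
    """Return the docker argument list for the discover step of the given source."""
    if source not in SOURCE_IMAGES:
        raise ValueError(f"unsupported source: {source!r}")
    return [
        "docker", "run",
        # Mount the host directory holding config.json at /mnt/config.
        "-v", f"{host_config_dir}:/mnt/config",
        SOURCE_IMAGES[source],
        "discover",
        "--config", "/mnt/config/config.json",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line; the host path is a placeholder.
    print(shlex.join(discover_command("postgres", "/path/to/olake-config")))
```

Returning an argument list rather than a single string means the result can be passed straight to subprocess.run without shell quoting issues, while shlex.join gives a printable form for the terminal.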


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: We discuss future roadmaps and bugs, help folks debug the issues they are facing, and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!
