
writer.json

OLake currently supports two writer modes for exporting your database data in Parquet format, with a third on the way:

  1. Local Parquet Writer
  2. S3 Writer
  3. Iceberg Writer (coming soon)

Before proceeding with either configuration, make sure you have completed the getting started instructions. For more background, refer to the README.

1. Local Parquet Writer

The local writer mode writes Parquet files directly to a local directory inside your Docker container. That directory is mapped to your host file system via a Docker volume, so the files are also visible on the host. To run OLake via Docker, follow the getting started guide.

Sample Configuration

{
  "type": "PARQUET",
  "writer": {
    "normalization": false,
    "local_path": "/mnt/config"
  }
}

Configuration Key Details

Key | Data Type | Example Value | Description & Possible Values
--- | --- | --- | ---
type | string | "PARQUET" | Specifies the output file format. Currently, only the Parquet format is supported.
writer.normalization | boolean | false | Determines whether OLake applies level-1 JSON flattening to nested objects. Set to true if you require normalized output; otherwise, use false.
writer.local_path | string | "/mnt/config" | The local directory inside the Docker container where Parquet files are stored. This path is mapped to your host file system via a Docker volume.
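To make level-1 flattening concrete, here is a minimal Python sketch of what flattening one level of nesting means for a single record. It is a generic illustration, not OLake's implementation: the `flatten_level1` name and the `_` separator used to build column names are assumptions, and OLake's actual column naming may differ.

```python
from typing import Any


def flatten_level1(record: dict[str, Any], sep: str = "_") -> dict[str, Any]:
    """Expand only the first level of nested objects into top-level keys.

    {"a": {"b": 1}} becomes {"a_b": 1}; objects nested deeper than that
    are kept as-is. The "_" separator is an assumption for illustration.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Promote each child of a nested object to a top-level key.
            for child_key, child_value in value.items():
                flat[f"{key}{sep}{child_key}"] = child_value
        else:
            # Scalars and lists pass through unchanged.
            flat[key] = value
    return flat
```

For example, `{"id": 7, "customer": {"name": "Ana", "address": {"city": "Pune"}}}` becomes `{"id": 7, "customer_name": "Ana", "customer_address": {"city": "Pune"}}`: the `address` object stays nested because only one level is expanded.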

Note: This configuration enables the Parquet local writer. For more details, check out the README section.
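Because local_path refers to a location inside the container, the Parquet files appear on the host under whatever directory you bound to it. The minimal Python sketch below translates a container path into the corresponding host path, assuming a bind mount such as `-v /home/alice/olake:/mnt/config`; the host directory, the file name, and the `container_to_host` helper are all hypothetical.

```python
from pathlib import PurePosixPath


def container_to_host(container_file: str, container_dir: str, host_dir: str) -> str:
    """Map a file path inside the container to its path on the host.

    Assumes container_dir is bind-mounted from host_dir
    (e.g. `-v <host_dir>:<container_dir>`). Raises ValueError if the
    file does not live under the mounted directory.
    """
    # relative_to raises ValueError when container_file is outside container_dir.
    relative = PurePosixPath(container_file).relative_to(container_dir)
    return str(PurePosixPath(host_dir) / relative)
```

With the hypothetical mount above, `container_to_host("/mnt/config/orders/part-0001.parquet", "/mnt/config", "/home/alice/olake")` returns `"/home/alice/olake/orders/part-0001.parquet"`. Files written outside the mounted directory are not visible on the host, which is why the helper rejects them.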

2. S3 Writer

OLake’s Parquet S3 writer allows you to write your data directly into an Amazon S3 bucket in Parquet format. This mode is ideal for users who want to leverage S3’s scalable storage for their data outputs.

Sample Configuration

{
  "type": "PARQUET",
  "writer": {
    "normalization": false,
    "s3_bucket": "olake-s3-test",
    "s3_region": "ap-south-1",
    "s3_access_key": "xxxxxxxxxxxxxxxxxxxx",
    "s3_secret_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "s3_path": "/data"
  }
}
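Hard-coding access keys in writer.json makes them easy to leak, for example through version control. One option is to generate the file at deploy time. The minimal Python sketch below assumes credentials are exported as the conventional AWS variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. The helper names are hypothetical, and OLake itself is not assumed to read these variables; the script only copies them into the same keys shown in the sample.

```python
import json
import os
from collections.abc import Mapping


def s3_writer_config(bucket: str, region: str, path: str,
                     env: Mapping[str, str] = os.environ,
                     normalization: bool = False) -> dict:
    """Build a writer.json dict for the S3 writer, taking keys from env."""
    try:
        access_key = env["AWS_ACCESS_KEY_ID"]
        secret_key = env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"missing credential variable: {missing}") from None
    # Same structure and key names as the sample configuration above.
    return {
        "type": "PARQUET",
        "writer": {
            "normalization": normalization,
            "s3_bucket": bucket,
            "s3_region": region,
            "s3_access_key": access_key,
            "s3_secret_key": secret_key,
            "s3_path": path,
        },
    }


def write_config(config: dict, path: str) -> None:
    """Write the config as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
```

For example, `write_config(s3_writer_config("olake-s3-test", "ap-south-1", "/data"), "writer.json")` produces a file with the same keys as the sample. Accepting `env` as a parameter keeps the function easy to test without touching the real environment.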

Configuration Key Details

For configuration details and other information about S3, refer to the S3 writer docs.
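The local and S3 samples share the top-level `type` and `writer.normalization` keys and differ only in their storage keys. As a pre-flight check before starting a sync, a small Python sketch can tell which mode a writer.json targets. The key sets below are taken from the two samples on this page; treating them as required, and rejecting a config that fully specifies both modes, are assumptions rather than OLake's documented validation rules.

```python
import json
from typing import Any

# Assumption: "required" keys inferred from this page's sample configs only.
LOCAL_KEYS = {"local_path"}
S3_KEYS = {"s3_bucket", "s3_region", "s3_access_key", "s3_secret_key", "s3_path"}


def detect_writer_mode(config: dict[str, Any]) -> str:
    """Return "local" or "s3" for a writer.json dict, or raise ValueError."""
    if config.get("type") != "PARQUET":
        raise ValueError('type must be "PARQUET"')
    writer = config.get("writer")
    if not isinstance(writer, dict):
        raise ValueError("missing writer object")
    keys = set(writer)
    has_local = LOCAL_KEYS <= keys  # all local keys present
    has_s3 = S3_KEYS <= keys        # all S3 keys present
    if has_local and has_s3:
        raise ValueError("config fully specifies both local and S3 writers")
    if has_local:
        return "local"
    if has_s3:
        return "s3"
    missing = sorted(S3_KEYS - keys)
    raise ValueError(f"no complete writer block; S3 keys missing: {missing}")


def check_file(path: str) -> str:
    """Load a writer.json file and report its writer mode."""
    with open(path, encoding="utf-8") as fh:
        return detect_writer_mode(json.load(fh))
```

Running `check_file("writer.json")` on the local sample returns `"local"`, and on the S3 sample returns `"s3"`; an incomplete S3 block raises an error listing the missing keys.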

3. Data Partitioning

While not directly configured in writer.json, data partitioning is an important aspect of how your data is organized when written to storage. For more details on partitioning strategies (especially when using S3), please refer to the S3 partitioning documentation.

4. Upcoming Features: Iceberg Writer

The Iceberg writer is an upcoming feature, expected to be released by the end of February 2025. Stay tuned for updates and further documentation on its configuration and usage.

If you have any further questions or need additional support, please refer to the getting started section or the OLake README.


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss the roadmap and bugs, help folks debug the issues they are facing, and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!