writer.json
OLake supports the following writer modes for exporting your database data in Parquet format:
- Local Parquet Writer
- S3 Writer
- Iceberg Writer (coming soon)
Before proceeding with either configuration, please ensure you have completed the getting started instructions for the database you want to connect from. For more background details, refer to the README section.
Apache Iceberg
The Iceberg Writer syncs data from databases (MySQL, MongoDB, PostgreSQL) into Apache Iceberg.
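Two sample configurations follow: the first uses AWS Glue as the catalog, and the second uses a JDBC catalog with a local MinIO store.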
{
  "type": "ICEBERG",
  "writer": {
    "normalization": false,
    "s3_path": "s3://bucket_name/olake_iceberg/test_olake",
    "aws_region": "ap-south-1",
    "aws_access_key": "XXX",
    "aws_secret_key": "XXX",
    "database": "olake_iceberg",
    "grpc_port": 50051,
    "server_host": "localhost"
  }
}
{
  "type": "ICEBERG",
  "writer": {
    "catalog_type": "jdbc",
    "jdbc_url": "jdbc:postgresql://localhost:5432/iceberg",
    "jdbc_username": "iceberg",
    "jdbc_password": "password",
    "normalization": false,
    "iceberg_s3_path": "s3a://warehouse",
    "s3_endpoint": "http://localhost:9000",
    "s3_use_ssl": false,
    "s3_path_style": true,
    "aws_access_key": "admin",
    "aws_secret_key": "password",
    "iceberg_db": "olake_iceberg"
  }
}
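The sample values above hard-code credentials. If you generate writer.json from a script, one option is to read the secrets from the environment instead. The following is a minimal sketch under that assumption: the helper name build_iceberg_jdbc_config and the environment variable names are hypothetical, and only the JSON keys are taken from the JDBC + MinIO sample.

```python
import json
import os


def build_iceberg_jdbc_config(env=None):
    """Return a writer.json dict shaped like the JDBC + MinIO sample.

    The environment variable names are hypothetical; only the JSON keys
    come from the sample configuration.
    """
    env = os.environ if env is None else env
    return {
        "type": "ICEBERG",
        "writer": {
            "catalog_type": "jdbc",
            "jdbc_url": env.get(
                "ICEBERG_JDBC_URL", "jdbc:postgresql://localhost:5432/iceberg"
            ),
            "jdbc_username": env.get("ICEBERG_JDBC_USERNAME", "iceberg"),
            # Secrets have no default: a missing variable raises KeyError
            # instead of silently writing a placeholder.
            "jdbc_password": env["ICEBERG_JDBC_PASSWORD"],
            "normalization": False,
            "iceberg_s3_path": "s3a://warehouse",
            "s3_endpoint": env.get("S3_ENDPOINT", "http://localhost:9000"),
            "s3_use_ssl": False,
            "s3_path_style": True,
            "aws_access_key": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_key": env["AWS_SECRET_ACCESS_KEY"],
            "iceberg_db": "olake_iceberg",
        },
    }


def write_config(config, path="writer.json"):
    """Serialize the config dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

Calling write_config(build_iceberg_jdbc_config()) with the three secret variables set produces a file with the same keys as the sample.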
For a sample configuration and other details regarding Apache Iceberg, refer to the Iceberg writer docs.
S3 Writer
OLake’s Parquet S3 writer allows you to write your data directly into an Amazon S3 bucket in Parquet format. This mode is ideal for users who want to leverage S3’s scalable storage for their data outputs.
{
  "type": "PARQUET",
  "writer": {
    "normalization": false,
    "s3_bucket": "olake-s3-test",
    "s3_region": "ap-south-1",
    "s3_access_key": "xxxxxxxxxxxxxxxxxxxx",
    "s3_secret_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "s3_path": "/data"
  }
}
Key | Description | Data Type | Probable Values |
---|---|---|---|
type | Specifies the output file format for writing data. Currently, only the "PARQUET" format is supported. | string | "PARQUET" |
writer.normalization | Indicates whether data normalization (JSON flattening) should be applied before writing data to S3. Use true if you require normalized output, or false if not. | boolean | true or false (this example uses false ) |
writer.s3_bucket | The name of the Amazon S3 bucket where your output files will be stored. Ensure that the bucket exists and that you have proper access. | string | A valid S3 bucket name (e.g. "olake-s3-test" ) |
writer.s3_region | The AWS region where the specified S3 bucket is hosted. | string | AWS region codes such as "ap-south-1" , "us-west-2" , etc. |
writer.s3_access_key | The AWS access key used for authenticating S3 requests. This is typically a 20-character alphanumeric string. | string | A valid AWS access key |
writer.s3_secret_key | The AWS secret key used for S3 authentication. This key is generally longer (often 40+ characters) and should be kept secure. | string | A valid AWS secret key |
writer.s3_path | The specific path (or prefix) within the S3 bucket where data files will be written. This is typically a folder path that starts with a / (e.g. "/data" ). | string | A valid path string |
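The table above can be turned into a quick pre-flight check before starting a sync. The sketch below is illustrative only: validate_s3_writer_config is a hypothetical helper, not part of OLake, and it assumes that every key listed in the table is required.

```python
import json

# Expected Python type for each writer key, taken from the table above.
S3_WRITER_KEYS = {
    "normalization": bool,
    "s3_bucket": str,
    "s3_region": str,
    "s3_access_key": str,
    "s3_secret_key": str,
    "s3_path": str,
}


def validate_s3_writer_config(config):
    """Return a list of problems found in an S3 writer config dict."""
    problems = []
    if config.get("type") != "PARQUET":
        problems.append('type must be "PARQUET"')
    writer = config.get("writer")
    if not isinstance(writer, dict):
        return problems + ["writer must be an object"]
    for key, expected in S3_WRITER_KEYS.items():
        if key not in writer:
            problems.append(f"writer.{key} is missing")
        elif not isinstance(writer[key], expected):
            problems.append(f"writer.{key} must be of type {expected.__name__}")
    return problems


def validate_s3_writer_file(path="writer.json"):
    """Load a writer.json file and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_s3_writer_config(json.load(handle))
```

An empty list means the file matches the documented shape; it does not confirm that the bucket exists or that the credentials are valid.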
- The generated .parquet files use SNAPPY compression. Note that SNAPPY is no longer supported by S3 Select when performing queries.
- OLake creates a test folder named olake_writer_test containing a single text file (.txt) with the content "S3 write test". This is used to verify that you have the necessary permissions to write to S3.
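The same kind of write check can be run by hand before a sync. The sketch below only illustrates the idea and is not OLake's implementation: check_s3_write_access is a hypothetical helper, the object name write_test.txt is a placeholder (the file name is not given here), and client is assumed to behave like a boto3 S3 client exposing put_object(Bucket=..., Key=..., Body=...).

```python
def check_s3_write_access(client, bucket, prefix=""):
    """Try to write a small text object and report whether it succeeded.

    `client` is assumed to behave like a boto3 S3 client. The object name
    "write_test.txt" is a placeholder chosen for this sketch.
    """
    parts = (prefix.strip("/"), "olake_writer_test", "write_test.txt")
    key = "/".join(part for part in parts if part)
    try:
        client.put_object(Bucket=bucket, Key=key, Body=b"S3 write test")
    except Exception as exc:  # boto3 reports access errors as exceptions
        return False, f"cannot write s3://{bucket}/{key}: {exc}"
    return True, f"wrote s3://{bucket}/{key}"
```

With boto3 installed, a client created by boto3.client("s3") for the bucket's region could be passed as client; a failed write then points to missing permissions or a wrong bucket name.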
For a sample configuration and other details regarding S3, refer to the S3 writer docs.
Local Parquet Writer
The local writer mode writes Parquet files directly to a local directory inside your Docker container. The local directory is mapped to your host file system via a Docker volume. To run OLake via Docker, follow the getting started guide.
Sample Configuration
{
  "type": "PARQUET",
  "writer": {
    "normalization": false,
    "local_path": "./mnt/config"
  }
}
Configuration Key Details
Key | Data Type | Example Value | Description & Possible Values |
---|---|---|---|
type | string | "PARQUET" | Specifies the output file format. Currently, only the Parquet format is supported. |
writer.normalization | boolean | false | Determines whether OLake applies level-1 JSON flattening to nested objects. Set to true if you require normalized output; otherwise, use false . |
writer.local_path | string | "/mnt/config" | The local directory inside the Docker container where Parquet files will be stored. This path is mapped to your host file system via a Docker volume. |
Note: This configuration enables the Parquet local writer. For more details, check out the README section.
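To make the normalization flag more concrete, here is a rough sketch of what level-1 flattening could look like. It assumes one reading of the term: each top-level key becomes its own column, while nested objects and arrays are kept as JSON strings. The function flatten_level_1 is hypothetical, and OLake's actual output may differ.

```python
import json


def flatten_level_1(record):
    """Promote top-level keys to columns; keep nested values as JSON strings.

    This is an assumed interpretation of level-1 flattening, not OLake code.
    """
    row = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            # Nested structures are not expanded further.
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row
```

Under this reading, a record with id and address fields yields two columns, with the whole address object serialized into a single string column.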
For a sample configuration and other details regarding the local writer, refer to the local writer docs.