AWS S3 Writer
This component allows you to efficiently write your database data into Amazon S3 in Parquet format. For more background details, please refer to the README. Before proceeding, make sure you have completed the getting started instructions.
Quick Start Guide
OLake UI is live (beta)! You can now use the UI to configure your AWS S3 destination and to manage and configure various catalogs. Check out OLake UI to learn how to set it up with Docker Compose and run it locally.
- Use OLake UI for AWS S3
- Use OLake CLI for AWS S3
Create an S3 Destination in OLake UI
Follow the steps below to get started with S3 Destination using the OLake UI (assuming the OLake UI is running locally on localhost:8000):
- Navigate to the Destinations Tab.
- Click on + Create Destination.
- Select AWS S3 as the Destination type from the Connector drop-down.
- Fill in the required connection details in the form. For more information about the connection details, refer to the S3 Destination Configuration docs section on the right side of the UI.
- Click on Create ->
- OLake will test the destination connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.
This creates an S3 destination in OLake. You can now use this destination in your Jobs Pipeline to sync data from any database to AWS S3.
Edit S3 Destination in OLake UI
To edit an existing S3 destination in OLake UI, follow these steps:
- Navigate to the Destinations Tab.
- Locate the S3 destination you want to edit from the Active destination or Inactive destination tabs, or by using the search destination bar.
- Click on the Edit button next to the destination from the Actions tab (3 dots).
- Update the connection details as needed in the form and click on Save Changes.
Editing a destination can break the pipelines that use it.
You will see a notification saying "Due to editing, the jobs are going to get affected".
Editing this destination will affect the following jobs that are associated with this destination and as a result will fail immediately. Do you still want to edit the destination?
- OLake will test the updated destination connection once you confirm in the destination Editing Caution Modal. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.
Jobs Associated with S3 Destination
On the Destination Edit page, you can see the list of jobs that are associated with this destination. You can also see the status of each job (running, failed, or completed) and pause a job from the same screen.
Delete S3 Destination in OLake UI
To delete an existing S3 Destination in OLake UI, follow these steps:
- Navigate to the Destinations Tab.
- Locate the destination you want to delete from the Active Destinations or Inactive Destinations tabs, or by using the search destination bar.
- Click on the Delete button next to the destination from the Actions tab (3 dots).
- A confirmation dialog will appear asking you to confirm the deletion.
- Click on
Delete
to confirm the deletion.
This will remove the S3 Destination from OLake.
You can also delete a Destination from the Destination Edit page by clicking on the Delete button at the bottom of the page.
It's a simple 3-step process:
- Create a source config file named source.json,
- Create another config file named destination.json, and
- Run the discover and sync commands to fetch the schema and start syncing the data, respectively.
source.json - holds the source database information like host, port, username, password, database name, etc.
destination.json - holds the S3 destination configuration.
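Before running any commands, it can help to confirm that both files are actually in place. The following is a minimal sketch, not part of OLake itself: `check_configs` and `CONFIG_DIR` are hypothetical names, and the check only verifies that each file exists and is non-empty, not that its contents are valid.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative helper, not an OLake command):
# confirm that source.json and destination.json exist and are non-empty.

check_configs() {
  local dir="$1"      # directory expected to hold both config files
  local missing=0
  for f in source.json destination.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 when both files are present, 1 otherwise
}

# CONFIG_DIR is a hypothetical variable; it defaults to the current directory.
CONFIG_DIR="${CONFIG_DIR:-$PWD}"
if check_configs "$CONFIG_DIR"; then
  echo "config files found in $CONFIG_DIR"
fi
```

Running the script from the directory you plan to mount into the container gives a quick signal before any Docker image is pulled.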
Depending on the source and destination you want to sync between, choose one of the configurations below.
- PostgreSQL to Iceberg | Postgres Source Config
- MongoDB to Iceberg | MongoDB Source Config
- MySQL to Iceberg | MySQL Source Config
Now that you have the source configuration set, let's move on to the S3 destination configuration.
- Run Sync Commands:
  To replicate data from the source database to S3, you need to run the sync commands. The sync command reads the data from the source database and writes it to S3.
  - Discover Command:
    <DISCOVER_COMMAND>
  - Sync Command:
    <SYNC_COMMAND>
  - Sync with State Command:
    <SYNC_WITH_STATE_COMMAND>
Refer to the respective database docs for the commands to discover the schema and sync the data.
- MongoDB Discover and sync command
- Postgres Discover and sync command
- MySQL Discover and sync command
A sample discover command would look like this:
docker run --pull=always \
-v /Users/USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config \
olakego/source-mongodb:latest \
discover \
--config /mnt/config/source.json
olakego/source-mongodb is the OLake image for the MongoDB source. You can replace it with the respective source image for PostgreSQL (source-postgres) or MySQL (source-mysql), or build one locally.
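Swapping the image name by hand is easy to get wrong, so here is a small sketch that builds the discover command for a given source type. It assumes the images follow the `olakego/source-<type>` pattern described above; `discover_cmd` is a hypothetical helper, and it only prints the command so you can review it before running anything.

```shell
#!/usr/bin/env bash
# Build the discover command for a given source type (illustrative helper).
# Assumes the image naming pattern olakego/source-<type> from the text above.

discover_cmd() {
  local source_type="$1"   # one of: mongodb, postgres, mysql
  local config_dir="$2"    # host directory that contains source.json
  case "$source_type" in
    mongodb|postgres|mysql) ;;
    *) echo "unsupported source type: $source_type" >&2; return 1 ;;
  esac
  # Print the command on one line instead of executing it.
  printf 'docker run --pull=always -v %s:/mnt/config olakego/source-%s:latest discover --config /mnt/config/source.json\n' \
    "$config_dir" "$source_type"
}

# Example: show the command for a PostgreSQL source using the current directory.
discover_cmd postgres "$PWD"
```

The helper rejects anything other than the three source types mentioned here. If your config directory path contains spaces, quote it yourself when you run the printed command.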