MySQL Source
MySQL Source enables data synchronization from MySQL to your desired destination.
OLake UI is live (beta)! You can now use the UI to configure your MySQL source, discover streams, and sync data. See OLake UI for how to set it up with Docker Compose and run it locally.
- Use OLake UI for MySQL
- Use OLake CLI for MySQL
Create a MySQL Source in OLake UI
Follow the steps below to get started with the MySQL Source using the OLake UI (assuming the OLake UI is running locally on localhost:8000):
- Navigate to the Sources tab.
- Click on + Create Source.
- Select MySQL as the source type from Connector type.
- Fill in the required connection details in the form. For an explanation of each field, refer to the MySQL Source Configuration section on the right side of the UI.
- Click on Create ->
- OLake will test the source connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.
This creates a MySQL source in OLake. You can now use this source in your job pipelines to sync data from MySQL to Apache Iceberg or AWS S3.
Edit MySQL Source in OLake UI
To edit an existing MySQL source in OLake UI, follow these steps:
- Navigate to the Sources tab.
- Locate the MySQL source you want to edit in the Active Sources or Inactive Sources tab, or by using the search bar.
- Click on the Edit button next to the source in the Actions menu (3 dots).
- Update the connection details as needed in the form and click on Save Changes.
Editing a source can break pipelines.
You will see a notification saying "Due to the editing, the jobs are going to get affected", along with the following warning:
"Editing this source will affect the following jobs that are associated with this source and as a result will fail immediately. Do you still want to edit the source?"
- OLake will test the updated source connection once you confirm in the Source Editing Caution modal. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.
Jobs Associated with MySQL Source
On the Source Edit page, you can see the list of jobs associated with this source, along with the status of each job (running, failed, or completed). You can also pause a job from the same screen.
Delete MySQL Source in OLake UI
To delete an existing MySQL source in OLake UI, follow these steps:
- Navigate to the Sources tab.
- Locate the MySQL source you want to delete in the Active Sources or Inactive Sources tab, or by using the search bar.
- Click on the Delete button next to the source in the Actions menu (3 dots).
- A confirmation dialog will appear asking you to confirm the deletion.
- Click on Delete to confirm the deletion.
This will remove the MySQL source from OLake.
You can also delete a source from the Source Edit page by clicking on the Delete button at the bottom of the page.
To sync data with the OLake CLI (TL;DR):
- Create a source.json with your MySQL connection details.
- Create a destination.json with your writer (Apache Iceberg / AWS S3 / Azure ADLS / Google Cloud Storage) connection details.
- Run discover to generate a streams.json of available streams.
- Run sync to replicate data to your specified destination.
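The discover and sync steps above can be chained in a small shell script. This is a minimal sketch, not an official OLake script: it assumes Docker is installed, that the directory passed as the first argument is the absolute host path holding source.json and destination.json, and that discover writes streams.json into that mounted directory, as described in the Discover section. The function names are hypothetical; the docker run arguments are copied from the commands on this page.

```bash
# Sketch (hypothetical helpers): run OLake discover, then sync, for one config directory.
# Assumption: $1 is the absolute host directory containing source.json and
# destination.json; it is mounted at /mnt/config as in the commands on this page.

OLAKE_IMAGE="olakego/source-mysql:latest"

olake_discover() {
  # Generate streams.json from the MySQL connection details in source.json.
  docker run --pull=always \
    -v "$1:/mnt/config" \
    "$OLAKE_IMAGE" \
    discover \
    --config /mnt/config/source.json
}

olake_sync() {
  # Replicate the discovered streams to the configured destination.
  docker run --pull=always \
    -v "$1:/mnt/config" \
    "$OLAKE_IMAGE" \
    sync \
    --config /mnt/config/source.json \
    --catalog /mnt/config/streams.json \
    --destination /mnt/config/destination.json
}

olake_pipeline() {
  # Run sync only if discover exited successfully.
  olake_discover "$1" && olake_sync "$1"
}

# Example (commented out so that sourcing this file has no side effects):
# olake_pipeline "$HOME/PATH_TO_OLAKE_DIRECTORY"
```

Chaining with && means a failed discover (for example, wrong credentials) stops the run before sync is attempted against a missing or stale streams.json.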
Below is an overview of the supported modes and writers for MySQL data replication, along with tables summarizing the details.
Supported Modes
Our replication process supports various modes to fit different data ingestion needs.
The Full Refresh mode retrieves the entire dataset from MySQL and is ideal for initial data loads or when a complete dataset copy is required.
In contrast, CDC (Change Data Capture) continuously tracks and synchronizes incremental changes in real time, making sure that your destination remains updated with minimal latency.
The Incremental mode is currently under development (WIP).
Mode | Description |
---|---|
Full Refresh | Fetches the complete dataset from MySQL. |
CDC (Change Data Capture) | Tracks and syncs incremental changes from MySQL in real time. |
Strict CDC (Change Data Capture) | Tracks only new changes from the current position in the MySQL binlog, without performing an initial backfill. |
Incremental | Not yet supported (work in progress). |
Supported Destinations
Destination | Supported | Docs | Comments |
---|---|---|---|
![]() | Yes | Link | |
![]() | Yes | Link | Supports both plain-Parquet and Iceberg format writes; requires aws_access_key / IAM role. |
![]() | Yes | Link | |
![]() | Yes | | Any S3 protocol compliant object store can work with OLake |
![]() | Yes | Link | |
Setup and Configuration
To run the MySQL Driver, configure the following files with your specific credentials and settings:
- source.json: MySQL connection details.
- streams.json: List of streams (tables) and fields to sync (generated using the Discover command).
- destination.json: Configuration for the destination where the data will be written.
Place these files in your project directory before running the commands.
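Before running the commands, it can help to confirm that the files are actually in place. The function below is a hypothetical helper (olake_check_files is not part of OLake); it assumes the file names used by the discover and sync commands on this page (source.json, streams.json, destination.json) and simply reports whichever ones are missing from a given directory. Because streams.json only exists after discover has run, a first run would check source.json alone.

```bash
# Hypothetical pre-flight check (not an OLake command): report which of the
# given config files are missing from a directory.
# Usage: olake_check_files DIR FILE...
# Prints each missing file name on its own line; returns 1 if any are missing.

olake_check_files() {
  dir=$1
  shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      printf '%s\n' "$f"
      missing=1
    fi
  done
  return "$missing"
}

# Before discover: olake_check_files "$HOME/PATH_TO_OLAKE_DIRECTORY" source.json
# Before sync:     olake_check_files "$HOME/PATH_TO_OLAKE_DIRECTORY" source.json streams.json destination.json
```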
Source File
Add your MySQL credentials to the source.json file in the following format, as shown here.
Commands
Discover Command
The Discover command generates the JSON content for the streams.json file, which defines the schema of the streams (tables) to be synced.
Usage
To run the Discover command, use the following syntax:
OLake Docker (macOS / Linux):

```bash
docker run --pull=always \
  -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
  olakego/source-mysql:latest \
  discover \
  --config /mnt/config/source.json
```

OLake Docker (CMD):

```bat
docker run --pull=always ^
  -v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
  olakego/source-mysql:latest ^
  discover ^
  --config /mnt/config/source.json
```

OLake Docker (PowerShell):

```powershell
docker run --pull=always `
  -v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
  olakego/source-mysql:latest `
  discover `
  --config /mnt/config/source.json
```

Locally run OLake (macOS / Linux):

```bash
OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/mysql/config" && \
./build.sh driver-mysql discover \
  --config "$OLAKE_BASE_PATH/source.json"
```

Locally run OLake (CMD):

```bat
set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mysql\config"
./build.sh driver-mysql discover ^
  --config "%OLAKE_BASE_PATH%\source.json"
```

Locally run OLake (PowerShell):

```powershell
$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mysql\config"
./build.sh driver-mysql discover `
  --config "$OLAKE_BASE_PATH\source.json"
```
PATH_TO_OLAKE_DIRECTORY is the path of the directory you created [as discussed above], relative to your home directory. For example, on macOS and Linux, -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" resolves to -v /Users/JOHN_DOE_USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config. Follow the same pattern on other systems (%USERPROFILE% in CMD, $env:USERPROFILE in PowerShell).
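The path mapping for macOS and Linux can be expressed as a tiny helper that builds the -v value. This is an illustrative sketch: olake_mount_arg is a hypothetical name, and it assumes PATH_TO_OLAKE_DIRECTORY is given relative to $HOME, as in the Docker commands above, while an argument that already starts with / is treated as absolute and used unchanged.

```bash
# Hypothetical helper: build the "host:container" value for docker's -v flag.
# A relative path is resolved against $HOME (mirroring "$HOME/PATH_TO_OLAKE_DIRECTORY");
# an absolute path (starting with /) is used as-is.

olake_mount_arg() {
  case $1 in
    /*) host_dir=$1 ;;
    *)  host_dir="$HOME/$1" ;;
  esac
  printf '%s\n' "$host_dir:/mnt/config"
}

# Usage:
# docker run --pull=always -v "$(olake_mount_arg Desktop/projects/OLAKE_DIRECTORY)" \
#   olakego/source-mysql:latest discover --config /mnt/config/source.json
```

With HOME=/Users/JOHN_DOE_USERNAME, olake_mount_arg Desktop/projects/OLAKE_DIRECTORY prints the same mount value as the example above.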
Streams File
After you run the Discover command, a streams.json file is created. Read more about the Streams File here.
Writer File
Read more about the writer (destination.json) file here.
Sync Command
The Sync command fetches data from MySQL and ingests it into the destination.
OLake Docker (macOS / Linux):

```bash
docker run --pull=always \
  -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
  olakego/source-mysql:latest \
  sync \
  --config /mnt/config/source.json \
  --catalog /mnt/config/streams.json \
  --destination /mnt/config/destination.json
```

OLake Docker (CMD):

```bat
docker run --pull=always ^
  -v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
  olakego/source-mysql:latest ^
  sync ^
  --config /mnt/config/source.json ^
  --catalog /mnt/config/streams.json ^
  --destination /mnt/config/destination.json
```

OLake Docker (PowerShell):

```powershell
docker run --pull=always `
  -v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
  olakego/source-mysql:latest `
  sync `
  --config /mnt/config/source.json `
  --catalog /mnt/config/streams.json `
  --destination /mnt/config/destination.json
```

Locally run OLake (macOS / Linux):

```bash
OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/mysql/config" && \
./build.sh driver-mysql sync \
  --config "$OLAKE_BASE_PATH/source.json" \
  --catalog "$OLAKE_BASE_PATH/streams.json" \
  --destination "$OLAKE_BASE_PATH/destination.json"
```

Locally run OLake (CMD):

```bat
set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mysql\config"
./build.sh driver-mysql sync ^
  --config "%OLAKE_BASE_PATH%\source.json" ^
  --catalog "%OLAKE_BASE_PATH%\streams.json" ^
  --destination "%OLAKE_BASE_PATH%\destination.json"
```

Locally run OLake (PowerShell):

```powershell
$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mysql\config"
./build.sh driver-mysql sync `
  --config "$OLAKE_BASE_PATH\source.json" `
  --catalog "$OLAKE_BASE_PATH\streams.json" `
  --destination "$OLAKE_BASE_PATH\destination.json"
```
To run sync with state:
OLake Docker (macOS / Linux):

```bash
docker run --pull=always \
  -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
  olakego/source-mysql:latest \
  sync \
  --config /mnt/config/source.json \
  --catalog /mnt/config/streams.json \
  --destination /mnt/config/destination.json \
  --state /mnt/config/state.json
```

OLake Docker (CMD):

```bat
docker run --pull=always ^
  -v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
  olakego/source-mysql:latest ^
  sync ^
  --config /mnt/config/source.json ^
  --catalog /mnt/config/streams.json ^
  --destination /mnt/config/destination.json ^
  --state /mnt/config/state.json
```

OLake Docker (PowerShell):

```powershell
docker run --pull=always `
  -v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
  olakego/source-mysql:latest `
  sync `
  --config /mnt/config/source.json `
  --catalog /mnt/config/streams.json `
  --destination /mnt/config/destination.json `
  --state /mnt/config/state.json
```

Locally run OLake (macOS / Linux):

```bash
OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/mysql/config" && \
./build.sh driver-mysql sync \
  --config "$OLAKE_BASE_PATH/source.json" \
  --catalog "$OLAKE_BASE_PATH/streams.json" \
  --destination "$OLAKE_BASE_PATH/destination.json" \
  --state "$OLAKE_BASE_PATH/state.json"
```

Locally run OLake (CMD):

```bat
set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mysql\config"
./build.sh driver-mysql sync ^
  --config "%OLAKE_BASE_PATH%\source.json" ^
  --catalog "%OLAKE_BASE_PATH%\streams.json" ^
  --destination "%OLAKE_BASE_PATH%\destination.json" ^
  --state "%OLAKE_BASE_PATH%\state.json"
```

Locally run OLake (PowerShell):

```powershell
$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mysql\config"
./build.sh driver-mysql sync `
  --config "$OLAKE_BASE_PATH\source.json" `
  --catalog "$OLAKE_BASE_PATH\streams.json" `
  --destination "$OLAKE_BASE_PATH\destination.json" `
  --state "$OLAKE_BASE_PATH\state.json"
```
Find more about the state file and its configuration here.
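If you run sync repeatedly (for example from a scheduler), you may want to pass --state only when a state file from an earlier run is present. The sketch below does that under stated assumptions: olake_sync_with_optional_state is a hypothetical helper, it assumes state.json lives in the same mounted directory as the other files (as in the commands above), and it only runs the docker command shown on this page, with or without the --state flag.

```bash
# Hypothetical wrapper: run OLake sync, adding --state only if state.json exists.
# Assumption: $1 is the absolute host directory mounted at /mnt/config.

olake_sync_with_optional_state() {
  dir=$1
  if [ -f "$dir/state.json" ]; then
    # A saved state exists: pass it so sync continues from that state.
    set -- --state /mnt/config/state.json
  else
    # No saved state: run sync without --state.
    set --
  fi
  # "$@" expands to the optional --state arguments (or to nothing).
  docker run --pull=always \
    -v "$dir:/mnt/config" \
    olakego/source-mysql:latest \
    sync \
    --config /mnt/config/source.json \
    --catalog /mnt/config/streams.json \
    --destination /mnt/config/destination.json \
    "$@"
}
```

Reusing the function's positional parameters via set -- keeps the sketch POSIX-compatible, since plain sh has no arrays for building an optional argument list.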
MySQL to Iceberg Data Type Mapping
When syncing data from MySQL to Iceberg, OLake handles data type conversions to ensure compatibility. Below is a table that outlines how MySQL data types are mapped to Iceberg data types:
MySQL Data Types | Iceberg Data Type |
---|---|
int, int unsigned, mediumint, mediumint unsigned, smallint, smallint unsigned, tinyint, tinyint unsigned | int |
bigint, bigint unsigned | bigint |
float, decimal(10,2) | float |
double, double precision, real | double |
datetime, timestamp | timestamp |
char, varchar, text, tinytext, mediumtext, longtext, enum, json, bit(1), time | string |
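For scripting or quick checks, the table can be encoded as a lookup function. This sketch mirrors only the rows listed above: the function name is hypothetical, input is lower-cased before matching, and names are matched literally as the table writes them, so decimal(10,2) and bit(1) match but a type with a length suffix such as varchar(255) would need the suffix stripped first. Anything not in the table is reported as unmapped rather than guessed.

```bash
# Hypothetical lookup mirroring the MySQL -> Iceberg table above.
# Prints the Iceberg type; prints a message to stderr and returns 1
# for types the table does not list.

mysql_to_iceberg_type() {
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $t in
    "int" | "int unsigned" | "mediumint" | "mediumint unsigned" | \
    "smallint" | "smallint unsigned" | "tinyint" | "tinyint unsigned")
      echo "int" ;;
    "bigint" | "bigint unsigned")
      echo "bigint" ;;
    "float" | "decimal(10,2)")
      echo "float" ;;
    "double" | "double precision" | "real")
      echo "double" ;;
    "datetime" | "timestamp")
      echo "timestamp" ;;
    "char" | "varchar" | "text" | "tinytext" | "mediumtext" | "longtext" | \
    "enum" | "json" | "bit(1)" | "time")
      echo "string" ;;
    *)
      echo "unmapped: $1" >&2
      return 1 ;;
  esac
}
```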
Changelog
Version | Date | Pull Request | Subject |
---|---|---|---|
v0.0.2 | 14.04.2025 | https://github.com/datazip-inc/olake/pull/203 | |
v0.0.3 | 31.04.2025 | https://github.com/datazip-inc/olake/pull/250 | |