MongoDB Driver
The MongoDB Driver enables data synchronization from MongoDB to your desired destination.
To sync data (TL;DR):
- Create a config.json with your MongoDB connection details.
- Create a writer.json with your Writer (Apache Iceberg or AWS S3) connection details.
- Run discover to generate a catalog.json of available streams.
- Run sync to replicate data to your specified destination.
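As a rough sketch, the last two steps can be chained in a single shell script once config.json and writer.json exist. The helper names olake and olake_quickstart and the OLAKE_DIR variable are hypothetical (they are not part of OLake); the Docker image, subcommands, and flags are the ones used in the commands on this page. The sketch assumes the catalog.json produced by discover ends up in the mounted directory and is used without manual edits.

```shell
# Hypothetical wrappers around the Docker commands documented on this page.
# OLAKE_DIR is an assumed variable naming the host directory that holds
# config.json and writer.json.

olake() {
  # Run one driver subcommand with the host directory mounted at /mnt/config.
  docker run --pull=always \
    -v "${OLAKE_DIR:-$HOME/PATH_TO_OLAKE_DIRECTORY}:/mnt/config" \
    olakego/source-mongodb:latest "$@"
}

olake_quickstart() {
  # Discover the available streams; stop here if discovery fails.
  olake discover --config /mnt/config/config.json || return 1
  # Replicate the discovered streams to the configured destination.
  olake sync \
    --config /mnt/config/config.json \
    --catalog /mnt/config/catalog.json \
    --destination /mnt/config/writer.json
}
```

To try it, set OLAKE_DIR to your own directory and call olake_quickstart. In practice you may want to review catalog.json between the two steps rather than running them back to back.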
Below is an overview of the supported modes and writers for MongoDB data replication, along with tables summarizing the details.
Supported Modes
Our replication process supports various modes to fit different data ingestion needs.
The Full Refresh mode retrieves the entire dataset from MongoDB and is ideal for initial data loads or when a complete dataset copy is required.
In contrast, CDC (Change Data Capture) continuously tracks and synchronizes incremental changes in real time, making sure that your destination remains updated with minimal latency.
The Incremental mode is currently under development (WIP).
Mode | Description |
---|---|
Full Refresh | Fetches the complete dataset from MongoDB. |
CDC (Change Data Capture) | Tracks and syncs incremental changes from MongoDB in real time. |
Incremental | Not yet supported (work in progress). |
Supported Writers
OLake replicates data to multiple destinations to cater to a variety of deployment scenarios.
Whether you're storing data locally for quick access or using cloud storage services like S3 for scalability, our system is designed to integrate seamlessly.
Apache Iceberg is also supported as a destination, to facilitate advanced analytics and data lake management.
Destination | Supported | Docs |
---|---|---|
Apache Iceberg | Yes | Link |
AWS S3 | Yes | Link |
Local filesystem | Yes | Link |
Setup and Configuration
To run the MongoDB Driver, configure the following files with your specific credentials and settings:
- config.json: MongoDB connection details.
- catalog.json: List of collections and fields to sync (generated using the Discover command).
- writer.json: Configuration for the destination where the data will be written.
Place these files in your project directory before running the commands.
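Before running a command, it can help to confirm that the files it needs are actually in that directory. The following is a minimal sketch of such a pre-flight check; check_olake_files is a hypothetical helper (not part of OLake), and the file lists per command are taken from the flags used by the discover and sync commands on this page.

```shell
# Hypothetical pre-flight helper (not part of OLake): verifies that the files
# a command needs exist in the directory that will be mounted at /mnt/config.
# Usage: check_olake_files <directory> <discover|sync>
check_olake_files() {
  local dir="$1" cmd="$2" missing=0 required f
  case "$cmd" in
    discover) required="config.json" ;;
    sync)     required="config.json catalog.json writer.json" ;;
    *) echo "unknown command: $cmd" >&2; return 2 ;;
  esac
  # $required is intentionally unquoted so it splits into one name per word.
  for f in $required; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_olake_files "$HOME/PATH_TO_OLAKE_DIRECTORY" sync returns a non-zero status and lists each missing file on standard error.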
Config File
Add your MongoDB credentials to the config.json file in the format shown here.
Commands
Discover Command
The Discover command generates the JSON content for the catalog.json file, which defines the schema of the collections to be synced.
Usage
To run the Discover command, use the following syntax:
OLake Docker (macOS / Linux):
docker run --pull=always \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-mongodb:latest \
discover \
--config /mnt/config/config.json
OLake Docker (CMD):
docker run --pull=always ^
-v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
olakego/source-mongodb:latest ^
discover ^
--config /mnt/config/config.json
OLake Docker (PowerShell):
docker run --pull=always `
-v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
olakego/source-mongodb:latest `
discover `
--config /mnt/config/config.json
Locally run OLake (macOS / Linux):
OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/mongodb/config" && \
./build.sh driver-mongodb discover \
--config "$OLAKE_BASE_PATH/config.json"
Locally run OLake (CMD; set runs as its own command because CMD expands %OLAKE_BASE_PATH% when a line is parsed, before set would take effect):
set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mongodb\config"
./build.sh driver-mongodb discover ^
--config "%OLAKE_BASE_PATH%\config.json"
Locally run OLake (PowerShell):
$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mongodb\config"; `
./build.sh driver-mongodb discover `
--config "$OLAKE_BASE_PATH\config.json"
PATH_TO_OLAKE_DIRECTORY is the path, relative to your home directory, to the directory you created (as discussed above).
For example, on macOS, -v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" expands to -v /Users/JOHN_DOE_USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config. On Linux the home directory is typically under /home instead of /Users. Follow the same pattern on other systems.
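The path mapping described above can be illustrated with a small shell helper. This is only a sketch: olake_mount_arg is a hypothetical function name, and Desktop/projects/OLAKE_DIRECTORY is the placeholder path from the example, not a required location.

```shell
# Hypothetical helper: builds the value passed to docker's -v flag.
# $1 is PATH_TO_OLAKE_DIRECTORY, given relative to the home directory.
olake_mount_arg() {
  printf '%s\n' "$HOME/$1:/mnt/config"
}

# Prints the host path joined to the container mount point, using the
# placeholder path from the example above.
olake_mount_arg "Desktop/projects/OLAKE_DIRECTORY"
```

With HOME set to /Users/JOHN_DOE_USERNAME, the call prints /Users/JOHN_DOE_USERNAME/Desktop/projects/OLAKE_DIRECTORY:/mnt/config, which is the value that follows -v in the Docker commands.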
Catalog File
After executing the Discover command, a catalog.json file is created. Read more about Catalog File here.
Writer File
Read about the writer configurations:
- Apache Iceberg writer
- S3 writer
- Local writer
Sync Command
The Sync command fetches data from MongoDB and ingests it into the destination.
OLake Docker (macOS / Linux):
docker run --pull=always \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-mongodb:latest \
sync \
--config /mnt/config/config.json \
--catalog /mnt/config/catalog.json \
--destination /mnt/config/writer.json
OLake Docker (CMD):
docker run --pull=always ^
-v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
olakego/source-mongodb:latest ^
sync ^
--config /mnt/config/config.json ^
--catalog /mnt/config/catalog.json ^
--destination /mnt/config/writer.json
OLake Docker (PowerShell):
docker run --pull=always `
-v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
olakego/source-mongodb:latest `
sync `
--config /mnt/config/config.json `
--catalog /mnt/config/catalog.json `
--destination /mnt/config/writer.json
Locally run OLake (macOS / Linux):
OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/mongodb/config" && \
./build.sh driver-mongodb sync \
--config "$OLAKE_BASE_PATH/config.json" \
--catalog "$OLAKE_BASE_PATH/catalog.json" \
--destination "$OLAKE_BASE_PATH/writer.json"
Locally run OLake (CMD; set runs as its own command because CMD expands %OLAKE_BASE_PATH% when a line is parsed, before set would take effect):
set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mongodb\config"
./build.sh driver-mongodb sync ^
--config "%OLAKE_BASE_PATH%\config.json" ^
--catalog "%OLAKE_BASE_PATH%\catalog.json" ^
--destination "%OLAKE_BASE_PATH%\writer.json"
Locally run OLake (PowerShell):
$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mongodb\config"; `
./build.sh driver-mongodb sync `
--config "$OLAKE_BASE_PATH\config.json" `
--catalog "$OLAKE_BASE_PATH\catalog.json" `
--destination "$OLAKE_BASE_PATH\writer.json"
To run sync with state:
OLake Docker (macOS / Linux):
docker run --pull=always \
-v "$HOME/PATH_TO_OLAKE_DIRECTORY:/mnt/config" \
olakego/source-mongodb:latest \
sync \
--config /mnt/config/config.json \
--catalog /mnt/config/catalog.json \
--destination /mnt/config/writer.json \
--state /mnt/config/state.json
OLake Docker (CMD):
docker run --pull=always ^
-v "%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY:/mnt/config" ^
olakego/source-mongodb:latest ^
sync ^
--config /mnt/config/config.json ^
--catalog /mnt/config/catalog.json ^
--destination /mnt/config/writer.json ^
--state /mnt/config/state.json
OLake Docker (PowerShell):
docker run --pull=always `
-v "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY:/mnt/config" `
olakego/source-mongodb:latest `
sync `
--config /mnt/config/config.json `
--catalog /mnt/config/catalog.json `
--destination /mnt/config/writer.json `
--state /mnt/config/state.json
Locally run OLake (macOS / Linux):
OLAKE_BASE_PATH="$HOME/PATH_TO_OLAKE_DIRECTORY/olake/drivers/mongodb/config" && \
./build.sh driver-mongodb sync \
--config "$OLAKE_BASE_PATH/config.json" \
--catalog "$OLAKE_BASE_PATH/catalog.json" \
--destination "$OLAKE_BASE_PATH/writer.json" \
--state "$OLAKE_BASE_PATH/state.json"
Locally run OLake (CMD; set runs as its own command because CMD expands %OLAKE_BASE_PATH% when a line is parsed, before set would take effect):
set "OLAKE_BASE_PATH=%USERPROFILE%\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mongodb\config"
./build.sh driver-mongodb sync ^
--config "%OLAKE_BASE_PATH%\config.json" ^
--catalog "%OLAKE_BASE_PATH%\catalog.json" ^
--destination "%OLAKE_BASE_PATH%\writer.json" ^
--state "%OLAKE_BASE_PATH%\state.json"
Locally run OLake (PowerShell):
$OLAKE_BASE_PATH = "$env:USERPROFILE\PATH_TO_OLAKE_DIRECTORY\olake\drivers\mongodb\config"; `
./build.sh driver-mongodb sync `
--config "$OLAKE_BASE_PATH\config.json" `
--catalog "$OLAKE_BASE_PATH\catalog.json" `
--destination "$OLAKE_BASE_PATH\writer.json" `
--state "$OLAKE_BASE_PATH\state.json"
Read more about the state file and its configuration here.
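The two sync variants differ only in the --state flag, so a wrapper can choose between them. The sketch below is one possible approach, under the assumption that a state.json from an earlier run sits in the same mounted directory; olake_sync_args is a hypothetical helper name, not an OLake command.

```shell
# Hypothetical helper: prints the sync arguments for the container, adding
# --state only when a state.json exists in the host directory given as $1.
olake_sync_args() {
  local host_dir="$1"
  local args="sync --config /mnt/config/config.json --catalog /mnt/config/catalog.json --destination /mnt/config/writer.json"
  if [ -f "$host_dir/state.json" ]; then
    args="$args --state /mnt/config/state.json"
  fi
  printf '%s\n' "$args"
}

# Usage (left commented out so that sourcing this file does not start a container).
# The command substitution is deliberately unquoted so it splits into separate arguments.
# docker run --pull=always -v "$OLAKE_DIR:/mnt/config" \
#   olakego/source-mongodb:latest $(olake_sync_args "$OLAKE_DIR")
```

Here OLAKE_DIR is an assumed variable holding the host directory path. Without a state.json the wrapper produces the plain sync command shown earlier.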
Changelog
Version | Date | Pull Request | Subject |
---|---|---|---|