MongoDB Source Documentation
Overview
The OLake MongoDB Source connector supports multiple synchronization modes. It offers features like parallel chunking, checkpointing, and automatic resume for failed full loads. This connector can be used within the OLake UI or run locally via Docker for open-source workflows.
Sync Modes Supported
- Full Refresh
- Full Refresh + Incremental
- Full Refresh + CDC
- CDC Only
Prerequisites
Version Prerequisites
MongoDB Version 4.0 or higher
CDC Prerequisites
For CDC mode, MongoDB must meet the following requirements:
- MongoDB must be running in replica set mode (--replSet rs0)
- Oplog must be enabled (automatic in replica sets)
- CDC in OLake is not a continuous always-on process. It requires execution through the Orchestrator.
- If you don't have access to enable CDC (replica sets + oplog), OLake also supports Incremental sync.
To set up MongoDB for CDC, please refer to the MongoDB and Atlas CDC Setup guide.
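As a rough illustration of the replica set requirement, the sketch below inspects a reply from MongoDB's hello command (isMaster on older servers) and reports whether the node belongs to an initialized replica set. The helper name is_replica_set_member is hypothetical and not part of OLake, and the sample replies are hand-written and trimmed. Fetching the reply from a live server with a MongoDB driver is left out.

```python
from typing import Any, Mapping


def is_replica_set_member(hello_reply: Mapping[str, Any]) -> bool:
    """Return True if a hello/isMaster reply comes from a replica set member.

    Members of an initialized replica set report the set name in the
    "setName" field; a standalone mongod omits that field.
    """
    return bool(hello_reply.get("setName"))


# Hand-written sample replies, trimmed to the fields used here (assumed shapes).
standalone_reply = {"isWritablePrimary": True, "ok": 1.0}
replica_reply = {"setName": "rs0", "isWritablePrimary": True, "ok": 1.0}

print(is_replica_set_member(standalone_reply))  # False
print(is_replica_set_member(replica_reply))     # True
```

A False result here suggests that CDC will not work against that node and that Incremental sync may be the better fit.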
Connection Prerequisites
- The MongoDB user must have read access to the collections being replicated.
Once these prerequisites are met, the MongoDB source can be configured.
Configuration
MongoDB can be configured through either the OLake UI or the OLake CLI.
Use OLake UI for MongoDB
1. Navigate to the Source Configuration Page
- Complete the OLake UI Setup Guide.
- After logging in to the OLake UI, select the Sources tab from the left sidebar.
- Click Create Source in the top right corner.
- Select MongoDB from the connector dropdown.
- Provide a name for this source.
2. Provide Configuration Details
- Enter MongoDB credentials.
Field | Description | Example Value |
---|---|---|
Hosts (required) | List of MongoDB hosts. Use DNS SRV format if srv = true | x.xxx.xxx.120:27017, x.xxx.xxx.133:27017 (multiple hosts supported) |
Username (required) | MongoDB authentication username | test |
Password (required) | MongoDB authentication password | test |
Auth DB (required) | Authentication database name | admin |
Replica Set | Name of the replica set (if applicable) | rs0 |
Read Preference | MongoDB read preference setting | secondaryPreferred |
Use SRV | Enable DNS SRV connection strings. When true, only one host is allowed in the Hosts field | true or false |
Database Name (required) | Target MongoDB database name to replicate | database_name |
Max Threads | Maximum parallel threads for chunk-based snapshotting | 3 |
Retry Count | Number of retry attempts with exponential backoff. Defaults to 3 | 3 |
Chunking Strategy | Data chunking strategy for backfill: timestamp (time-based), bucket_auto (MongoDB's $bucketAuto), splitVector (built-in command). Defaults to splitVector if empty | splitVector |
3. Test Connection
- Once the connection is validated, the MongoDB source is created. Jobs can then be configured using this source.
- In case of connection failure, refer to the Troubleshooting section.
Use OLake CLI for MongoDB
1. Create Configuration File
- Once the OLake CLI is set up, create a folder to store configuration files such as source.json and destination.json.
2. Provide Configuration Details
An example source.json file looks like this:
{
  "hosts": ["host1:27017", "host2:27017", "host3:27017"],
  "username": "your_username",
  "password": "your_password",
  "authdb": "admin",
  "replica_set": "rs0",
  "read_preference": "secondaryPreferred",
  "srv": false,
  "database": "your_db",
  "max_threads": 5,
  "backoff_retry_count": 4,
  "chunking_strategy": ""
}
Field | Description | Example Value | Type |
---|---|---|---|
hosts | List of MongoDB hosts. Use DNS SRV format if srv = true | ["x.xxx.xxx.120:27017", "x.xxx.xxx.133:27017"] | STRING[] |
username | MongoDB authentication username | "test" | STRING |
password | MongoDB authentication password | "test" | STRING |
authdb | Authentication database name | "admin" | STRING |
replica_set | Name of the replica set (if applicable) | "rs0" | STRING |
read_preference | MongoDB read preference setting | "secondaryPreferred" | STRING |
srv | Enable DNS SRV connection strings. When true, only one host is allowed in hosts | false | BOOLEAN |
database | Target MongoDB database name to replicate | "database_name" | STRING |
max_threads | Maximum parallel threads for chunk-based snapshotting | 3 | INTEGER |
backoff_retry_count | Number of retry attempts with exponential backoff. Defaults to 3 | 3 | INTEGER |
chunking_strategy | Data chunking strategy for backfill: timestamp (time-based), bucket_auto (MongoDB's $bucketAuto), splitVector (built-in command). Defaults to splitVector if empty | "splitVector" | STRING |
Similarly, a destination.json file can be created inside this folder. For more information, see the destination documentation.
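Before running any command, it can help to sanity-check source.json against the rules in the table above. The following is a minimal sketch, not OLake's own validation: the function validate_source_config is hypothetical, and it assumes the fields marked as required in the UI (hosts, username, password, authdb, database) are also required in the CLI file.

```python
import json
from typing import List

# Assumption: the fields marked "required" in the UI are required here too.
REQUIRED_FIELDS = ("hosts", "username", "password", "authdb", "database")
# An empty string falls back to splitVector, per the table above.
CHUNKING_STRATEGIES = ("", "timestamp", "bucket_auto", "splitVector")


def validate_source_config(config: dict) -> List[str]:
    """Return a list of problems found in a source.json-style dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing required field: {field}")
    hosts = config.get("hosts") or []
    if config.get("srv") and len(hosts) != 1:
        problems.append("srv is true, so hosts must contain exactly one entry")
    strategy = config.get("chunking_strategy", "")
    if strategy not in CHUNKING_STRATEGIES:
        problems.append(f"unknown chunking_strategy: {strategy}")
    return problems


example = json.loads("""
{
  "hosts": ["host1:27017", "host2:27017", "host3:27017"],
  "username": "your_username",
  "password": "your_password",
  "authdb": "admin",
  "replica_set": "rs0",
  "read_preference": "secondaryPreferred",
  "srv": false,
  "database": "your_db",
  "max_threads": 5,
  "backoff_retry_count": 4,
  "chunking_strategy": ""
}
""")

print(validate_source_config(example))  # []
```

Returning a list of problems rather than raising on the first one lets all issues in the file be fixed in a single pass.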
3. Check Source Connection
To verify the database connection, run the following command:
docker run --pull=always \
-v "[PATH_OF_CONFIG_FOLDER]:/mnt/config" \
olakego/source-mongodb:latest \
check \
--config /mnt/config/source.json
- If OLake is able to connect to MongoDB, the following response is returned:
{"connectionStatus":{"status":"SUCCEEDED"},"type":"CONNECTION_STATUS"}
- In case of connection failure, refer to the Troubleshooting section.
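When the check is run from a script, the response can be inspected programmatically. The sketch below is one possible approach under stated assumptions: the helper connection_succeeded is hypothetical, it assumes the command prints one JSON message per line (possibly mixed with plain log lines), and it treats any status other than SUCCEEDED as a failure because the failure format is not documented here.

```python
import json


def connection_succeeded(output: str) -> bool:
    """Scan command output line by line for a CONNECTION_STATUS message."""
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain log lines
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if message.get("type") == "CONNECTION_STATUS":
            status = message.get("connectionStatus", {}).get("status")
            return status == "SUCCEEDED"
    return False  # no CONNECTION_STATUS message was found


sample = '{"connectionStatus":{"status":"SUCCEEDED"},"type":"CONNECTION_STATUS"}'
print(connection_succeeded(sample))  # True
```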
Data Type Mapping
MongoDB Data Types | Destination Data Type |
---|---|
int, timestamp | int |
long | bigint |
double | double |
boolean | boolean |
date | timestamptz |
string, object, objectId, binData (binary), code, regex (BSONRegExp), decimal128, maxKey, minKey, array, undefined | string |
OLake always ingests timestamp data in UTC format, independent of the source timezone.
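The mapping above can be expressed as a simple lookup, shown here as an illustrative sketch rather than OLake's implementation. The names DESTINATION_TYPES, destination_type, and to_utc are hypothetical, and the UTC helper only demonstrates what normalizing a timezone-aware value to UTC looks like.

```python
from datetime import datetime, timedelta, timezone

# MongoDB types that the table maps to the destination type "string".
STRING_TYPES = (
    "string", "object", "objectId", "binData", "code", "regex",
    "decimal128", "maxKey", "minKey", "array", "undefined",
)

DESTINATION_TYPES = {
    "int": "int",
    "timestamp": "int",
    "long": "bigint",
    "double": "double",
    "boolean": "boolean",
    "date": "timestamptz",
}
DESTINATION_TYPES.update({name: "string" for name in STRING_TYPES})


def destination_type(mongo_type: str) -> str:
    """Look up the destination type; raises KeyError for unlisted types."""
    return DESTINATION_TYPES[mongo_type]


def to_utc(value: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return value.astimezone(timezone.utc)


print(destination_type("long"))      # bigint
print(destination_type("objectId"))  # string

# 12:33 at UTC+05:30 is 07:03 UTC.
local = datetime(2025, 2, 17, 12, 33, tzinfo=timezone(timedelta(hours=5, minutes=30)))
print(to_utc(local).isoformat())     # 2025-02-17T07:03:00+00:00
```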
Troubleshooting
1. Connection Failed (UI/CLI):
Cause: Wrong host or port, or MongoDB is not running.
Solution: Check that the host and port entered are correct and that MongoDB is up and accessible.
2. CDC Not Working:
Cause: MongoDB is not running as a replica set, or the oplog is not accessible.
Solution: Verify that the replica set is active by running rs.status()
3. File not found (CLI):
Cause: The commands were run from the wrong directory.
Solution: Make sure source.json is present in the correct directory and that the commands are executed from inside that directory.
4. file name too long & FATAL error occurred while reading records: failed to finish backfill chunk 381: main writer closed:
Cause: The generated file or directory name exceeded the Linux limit of 255 bytes (often happens when partitioning on very long string values).
2025-02-17T07:03:00Z ERROR main writer closed, with error: failed to create parititon file: failed to create directories[output/otter_db/stream_8/H.
Solution: The maximum filename length is 255 bytes, and this error shows that the limit was exceeded during file creation. This can happen when partitioning on a STRING field whose values are too long. On Linux systems, these limits are typically defined in:
cat /usr/include/linux/limits.h
...
#define NAME_MAX 255 /* # chars in a file name */
#define PATH_MAX 4096 /* # chars in a path name including nul */
...
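To spot partition values that would trigger this error before a sync, each value can be checked against the 255-byte limit. This is a minimal sketch with a hypothetical helper, exceeds_name_max; it assumes the partition value becomes a single file or directory name encoded as UTF-8, which may not match exactly how OLake builds its paths.

```python
NAME_MAX = 255  # bytes per file or directory name, as in limits.h


def exceeds_name_max(component: str) -> bool:
    """True if one path component is longer than NAME_MAX bytes in UTF-8."""
    return len(component.encode("utf-8")) > NAME_MAX


print(exceeds_name_max("a" * 255))       # False
print(exceeds_name_max("a" * 256))       # True
# The limit is in bytes, not characters: U+00E9 takes 2 bytes in UTF-8.
print(exceeds_name_max("\u00e9" * 128))  # True (128 characters, 256 bytes)
```

Because the limit counts bytes, values with non-ASCII characters can exceed it at well under 255 characters.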
ChangeLogs
Date of Release | Version | Description |
---|---|---|
Aug 27, 2025 | v0.1.11 | override default timeout in Discover |