
MongoDB Source Documentation

Overview​

The OLake MongoDB Source connector supports multiple synchronization modes. It offers features like parallel chunking, checkpointing, and automatic resume for failed full loads. This connector can be used within the OLake UI or run locally via Docker for open-source workflows.

Sync Modes Supported​

  • Full Refresh
  • Full Refresh + Incremental
  • Full Refresh + CDC
  • CDC Only

Prerequisites​

Version Prerequisites​

MongoDB Version 4.0 or higher
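As a minimal sketch, the version requirement can be checked by comparing the server's reported version string against 4.0. The function name `meets_min_version` is hypothetical and not part of OLake. In practice the string would typically come from the `version` field returned by MongoDB's `buildInfo` command.

```python
def meets_min_version(version: str, minimum: tuple = (4, 0)) -> bool:
    """Return True if a dotted version string such as "4.4.6" is >= minimum."""
    parts = []
    for piece in version.split(".")[: len(minimum)]:
        # Keep only the leading digits so suffixes like "0-rc1" still parse.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short strings such as "5" to the length of the minimum tuple.
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts) >= minimum
```

For example, `meets_min_version("3.6.23")` is false, while `meets_min_version("4.0.28")` is true.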

CDC Prerequisites​

For CDC mode, MongoDB must meet the following requirements:

  • MongoDB must be running in replica set mode (--replSet rs0)
  • Oplog must be enabled (automatic in replica sets)
info
  1. CDC in OLake is not a continuous always-on process. It requires execution through the Orchestrator.

  2. If you don’t have access to enable CDC (replica sets + oplog), OLake also supports Incremental sync.

To set up MongoDB for CDC, please refer to the MongoDB and Atlas CDC Setup guide.

Connection Prerequisites​

  • The MongoDB user must have read access to the collections to be synced.

Once these prerequisites are met, the MongoDB source can be configured.


Configuration​

1. Navigate to the Source Configuration Page​

  1. Complete the OLake UI Setup Guide
  2. After logging in to the OLake UI, select the Sources tab from the left sidebar.
  3. Click Create Source in the top-right corner.
  4. Select MongoDB from the connector dropdown.
  5. Provide a name for this source.

2. Provide Configuration Details​

  • Enter MongoDB credentials.

OLake UI MongoDB Source Setup

| Field | Description | Example Value |
| --- | --- | --- |
| Hosts (required) | List of MongoDB hosts. Use DNS SRV format if srv = true | x.xxx.xxx.120:27017, x.xxx.xxx.133:27017 (multiple hosts supported) |
| Username (required) | MongoDB authentication username | test |
| Password (required) | MongoDB authentication password | test |
| Auth DB (required) | Authentication database name | admin |
| Replica Set | Name of the replica set (if applicable) | rs0 |
| Read Preference | MongoDB read preference setting | secondaryPreferred |
| Use SRV | Enable DNS SRV connection strings. When true, only one host is allowed in the hosts field | true or false |
| Database Name (required) | Target MongoDB database name to replicate | database_name |
| Max Threads | Maximum parallel threads for chunk-based snapshotting | 3 |
| Retry Count | Number of retry attempts with exponential backoff. Defaults to 3 | 3 |
| Chunking Strategy | Data chunking strategy for backfill: timestamp (time-based), bucket_auto (MongoDB's $bucketAuto), splitVector (built-in command). Defaults to splitVector if empty | splitVector |
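
The rules in the table above (required fields, the single-host restriction when Use SRV is enabled, and the stated defaults) can be sketched as a small validation routine. This is an illustration only: the key names (`hosts`, `authdb`, `srv`, `retry_count`, `chunking_strategy`, and so on) and the function `validate_source_config` are assumptions made for this sketch, not confirmed OLake configuration keys or APIs.

```python
# Key names below are hypothetical stand-ins for the fields in the table.
REQUIRED_FIELDS = ("hosts", "username", "password", "authdb", "database")
CHUNKING_STRATEGIES = ("timestamp", "bucket_auto", "splitVector")


def validate_source_config(config: dict) -> dict:
    """Check a config dict against the table's rules and fill in defaults."""
    missing = [name for name in REQUIRED_FIELDS if not config.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    result = dict(config)
    # Use SRV: only one host is allowed when DNS SRV is enabled.
    if result.get("srv", False) and len(result["hosts"]) != 1:
        raise ValueError("exactly one host is allowed when srv is true")

    # Defaults stated in the table: retry count 3, splitVector chunking.
    if result.get("retry_count") is None:
        result["retry_count"] = 3
    if not result.get("chunking_strategy"):
        result["chunking_strategy"] = "splitVector"
    if result["chunking_strategy"] not in CHUNKING_STRATEGIES:
        raise ValueError(
            f"unknown chunking strategy: {result['chunking_strategy']}"
        )
    return result
```

Passing a dict with two hosts and `srv` set to true raises a `ValueError`, which mirrors the restriction described for the Use SRV field.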

3. Test Connection​

  • Once the connection is validated, the MongoDB source is created. Jobs can then be configured using this source.

  • In case of connection failure, refer to the Troubleshooting section.


Data Type Mapping​

| MongoDB Data Types | Destination Data Type |
| --- | --- |
| int, timestamp | int |
| long | bigint |
| double | double |
| boolean | boolean |
| date | timestamptz |
| string, object, objectId, binData (binary), code, regex (BSONRegExp), decimal128, maxKey, minKey, array, undefined | string |
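
The mapping above can be expressed as a simple lookup. The sketch below uses the type names exactly as they appear in the table. The function `destination_type` is hypothetical and only illustrates the mapping, not how OLake implements it.

```python
# Types with a dedicated destination type, as listed in the table.
DIRECT_MAPPING = {
    "int": "int",
    "timestamp": "int",
    "long": "bigint",
    "double": "double",
    "boolean": "boolean",
    "date": "timestamptz",
}

# Types the table maps to string.
STRING_TYPES = {
    "string", "object", "objectId", "binData", "code", "regex",
    "decimal128", "maxKey", "minKey", "array", "undefined",
}


def destination_type(mongo_type: str) -> str:
    """Return the destination type for a MongoDB type name from the table."""
    if mongo_type in DIRECT_MAPPING:
        return DIRECT_MAPPING[mongo_type]
    if mongo_type in STRING_TYPES:
        return "string"
    raise KeyError(f"type not covered by the mapping table: {mongo_type}")
```

Types that are not listed in the table raise a `KeyError` here, because the table does not say how they are handled.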
timestamptz timezone

OLake always ingests timestamp data in UTC format, independent of the source timezone.
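
To illustrate what UTC normalization means in practice, the sketch below converts a timezone-aware value to UTC using Python's standard library. It is not OLake's implementation. The helper name `to_utc` is hypothetical, and treating naive values as already being in UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value: datetime) -> datetime:
    """Normalize a datetime to UTC.

    Naive values (no tzinfo) are assumed to already be in UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


# A value recorded at UTC+05:30 keeps the same instant but is shown in UTC.
local = datetime(2025, 2, 17, 12, 33, tzinfo=timezone(timedelta(hours=5, minutes=30)))
utc_value = to_utc(local)
```

Here `utc_value` is 07:03 UTC on the same day, the same instant as 12:33 at UTC+05:30.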


Troubleshooting​

1. Connection Failed (UI/CLI):​

Cause: The host or port is wrong, or MongoDB is not running.

Solution: Check that the host and port are correct and that MongoDB is up and reachable.
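
One quick way to rule out a wrong host or port is a plain TCP reachability check, sketched below with Python's standard library. The function name `can_reach` is hypothetical. A successful connection only shows that something is listening on that address. It does not confirm that the listener is MongoDB or that the credentials are valid.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For a host entered as `x.xxx.xxx.120:27017`, the call would take the address and `27017` as separate arguments.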

2. CDC Not Working:​

Cause: MongoDB is not running as a replica set, or the oplog is not accessible.

Solution: Verify that the replica set is active by running rs.status().
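
The document returned by rs.status() includes a `set` name, an `ok` flag, and a `members` list whose entries carry `name`, `health`, and `stateStr`. As a rough sketch, the helper below summarizes such a document once it has been loaded into a Python dict. The function name `replica_set_summary` and the sample values used to try it are illustrative assumptions.

```python
def replica_set_summary(status: dict) -> dict:
    """Summarize an rs.status()-style document held as a Python dict."""
    members = status.get("members", [])
    primaries = [m.get("name") for m in members if m.get("stateStr") == "PRIMARY"]
    unhealthy = [m.get("name") for m in members if m.get("health") != 1]
    return {
        "set": status.get("set"),
        # Healthy here means the command succeeded and exactly one PRIMARY exists.
        "ok": status.get("ok") == 1 and len(primaries) == 1,
        "primary": primaries[0] if primaries else None,
        "unhealthy": unhealthy,
    }
```

If `ok` is false or `primary` is `None` in the summary, the replica set is not in a state where CDC can be expected to work.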

3. File not found (CLI):​

Cause: The commands were run from the wrong directory.

Solution: Make sure source.json is present in the correct directory and that the commands are executed from inside that directory.

4. file name too long & FATAL error occurred while reading records: failed to finish backfill chunk 381: main writer closed:​

Cause: The generated file or directory name exceeded the Linux limit of 255 bytes (often happens when partitioning on very long string values).

2025-02-17T07:03:00Z ERROR main writer closed, with error: failed to create parititon file: failed to create directories[output/otter_db/stream_8/H.

Solution: The maximum filename length is 255 bytes, and this error indicates that the limit was exceeded during file creation. This can happen when partitioning on a STRING field whose values are very long. On a Linux system, these limits are usually defined in:

cat /usr/include/linux/limits.h

...
#define NAME_MAX 255 /* # chars in a file name */
#define PATH_MAX 4096 /* # chars in a path name including nul */
...
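
To spot values that would exceed this limit before running a sync, the length of each path component can be checked in bytes. The sketch below is a standalone illustration. The function name `overlong_components` and the example path are hypothetical, and the check assumes UTF-8 encoded names with `/` as the separator.

```python
NAME_MAX = 255  # per-component limit in bytes, as in limits.h above


def overlong_components(path: str, limit: int = NAME_MAX) -> list:
    """Return the path components whose UTF-8 encoding exceeds the limit."""
    return [
        part
        for part in path.split("/")
        if len(part.encode("utf-8")) > limit
    ]


# A made-up partition value of 300 characters exceeds the 255-byte limit.
long_value = "x" * 300
offenders = overlong_components("output/db/stream/" + long_value)
```

The limit is counted in bytes rather than characters, so values containing multi-byte characters can hit it with fewer than 255 characters.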

ChangeLogs​

| Date of Release | Version | Description |
| --- | --- | --- |
| Aug 27, 2025 | v0.1.11 | override default timeout in Discover |


