OLake Commands and Flags
OLake provides a set of CLI commands, each designed for specific use cases.
Commands can be executed in three ways:
- Using the build.sh script.
- Running the generated binary.
- Through the OLake Docker CLI.
./build.sh driver-[SOURCE-TYPE] [COMMAND] [FLAG]
The ./build.sh script is a Unix shell script. It runs natively on Linux and macOS terminals.
For Windows:
- Use Git Bash, WSL (Windows Subsystem for Linux), or another Unix-like shell.
- Alternatively, use the OLake Docker CLI, which works consistently across platforms.
Explanation of placeholders:
- SOURCE-TYPE → The source driver being used.
- COMMAND → The action to perform (listed in the sections below).
- FLAG → Additional configuration for fine-tuning the command.
Executing build.sh creates a binary file in the configuration directory.
Once available, the binary can be called directly to run commands, without needing build.sh.
Commands
1. Check
./build.sh driver-[SOURCE-TYPE] check [FLAG]
Description:
Verifies the connection to either a source or a destination.
Required flags:
One of the following is required:
- --config → Checks the connection to a source.
- --destination → Checks the connection to a destination.
2. Spec
./build.sh driver-[SOURCE-TYPE] spec [FLAG]
Description:
Generates a JSON Schema and UI Schema. These schemas are used by RJSF to render and validate configuration forms.
When no flag is provided, the spec of the defined SOURCE-TYPE is generated.
Optional flags:
- --destination-type → Generates the spec for a destination driver instead of the source.
Example:
./build.sh driver-mysql spec --destination-type iceberg
3. Discover
./build.sh driver-[SOURCE-TYPE] discover [FLAG]
Description:
Generates a streams.json file containing information about all streams in the source.
Required flags:
- --config → Specifies the path to the source configuration file.
Optional flags:
- --destination-database-prefix → Adds a custom prefix to the destination database name.
- --streams → Useful when streams.json has been manually modified. This flag preserves the existing changes while also merging in any new updates from the database (such as newly added tables).
- --timeout → Overrides the default timeout value for the command.
- --max-discover-threads → Sets the maximum number of parallel threads used to discover tables in the database.
4. Sync
./build.sh driver-[SOURCE-TYPE] sync [FLAG]
Description:
Syncs data from the source to the destination.
Required flags:
All of these flags must be specified:
- --config → Specifies the path to the source configuration file.
- --streams → Specifies the path to the streams.json file (produced by the discover command).
- --destination → Specifies the path to the destination configuration file.
Optional flags:
- --state → Specifies the path to the state file.
➡️ For details on the stats.json file, refer to the Stats Config guide.
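The required and optional flags above can be assembled programmatically, for example when sync is scheduled from a wrapper script. The sketch below only builds the argument list for `./build.sh driver-<SOURCE-TYPE> sync` using the flags documented in this section. The file names are hypothetical placeholders, and the helper name `build_sync_command` is illustrative, not part of OLake.

```python
import shlex
from typing import List, Optional


def build_sync_command(
    source_type: str,
    config: str,
    streams: str,
    destination: str,
    state: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `./build.sh driver-<source_type> sync` (hypothetical helper).

    --config, --streams and --destination are required; --state is optional.
    """
    required = {"--config": config, "--streams": streams, "--destination": destination}
    # Fail early if any required path is missing.
    for flag, path in required.items():
        if not path:
            raise ValueError(f"{flag} is required for sync")

    argv = ["./build.sh", f"driver-{source_type}", "sync"]
    for flag, path in required.items():
        argv += [flag, path]
    if state:  # optional: resume from the position recorded in a previous run
        argv += ["--state", state]
    return argv


if __name__ == "__main__":
    cmd = build_sync_command("mysql", "source.json", "streams.json",
                             "destination.json", state="state.json")
    # Prints the shell-quoted command line; nothing is executed here.
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell-quoting problems with paths that contain spaces.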
5. Clear Destination
./build.sh driver-[SOURCE-TYPE] [COMMAND] --clear-destination
Description:
- Clears data in the destination, only for the selected streams defined in streams.json.
- Resets the state file for those streams.
Flags
1. Help
./build.sh driver-[SOURCE-TYPE] --help
Description:
- Lists all available commands and flags for the current OLake CLI version.
- Can be run without specifying a command.
- The shorthand -h can also be used.
2. Config
./build.sh driver-[SOURCE-TYPE] [COMMAND] --config [PATH_TO_CONFIG_FILE]
Description:
- Specifies the path to the source configuration file.
- For details about configuration files for different sources, see:
3. Streams
./build.sh driver-[SOURCE-TYPE] [COMMAND] --streams [PATH_TO_STREAMS_FILE]
Description:
Specifies the path to the streams.json file, which is generated by the discover command. When used during discovery, this flag updates the existing streams.json:
- Keeps prior manual changes.
- Adds new streams detected in the source database.
- Allows selecting which columns should be synced for each table.
- Allows updating the destination database name for each stream.
➡️ For details on streams.json configuration, refer to the Streams Config guide.
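The merge behaviour described above (keep prior manual changes, add newly detected streams) can be sketched as follows. This is a simplified model, not OLake's implementation. The real streams.json layout is defined in the Streams Config guide. Here each stream is keyed by a hypothetical "namespace.stream" id, and the per-stream fields `selected` and `destination_database` are invented for illustration. The sketch only adds new tables. It does not merge new columns into existing streams or drop streams that no longer exist in the source.

```python
from typing import Any, Dict

StreamSettings = Dict[str, Any]  # hypothetical per-stream settings


def merge_streams(existing: Dict[str, StreamSettings],
                  discovered: Dict[str, StreamSettings]) -> Dict[str, StreamSettings]:
    """Merge a fresh discovery into a hand-edited catalog (simplified model).

    Entries already in `existing` win, so manual edits survive; streams that
    appear only in `discovered` are added with their discovered defaults.
    """
    merged = dict(existing)                      # keep prior manual changes
    for stream_id, settings in discovered.items():
        merged.setdefault(stream_id, settings)   # add newly detected streams only
    return merged


existing = {"public.orders": {"selected": True, "destination_database": "sales"}}
discovered = {
    "public.orders": {"selected": False, "destination_database": "postgres_public"},
    "public.refunds": {"selected": False, "destination_database": "postgres_public"},
}
merged = merge_streams(existing, discovered)
print(merged["public.orders"]["destination_database"])  # sales
print(sorted(merged))  # ['public.orders', 'public.refunds']
```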
4. Destination
./build.sh driver-[SOURCE-TYPE] [COMMAND] --destination [PATH_TO_DESTINATION_FILE]
Description:
- Specifies the path to the destination configuration file.
- For details about destination configuration files, see:
5. State
./build.sh driver-[SOURCE-TYPE] [COMMAND] --state [PATH_TO_STATE_FILE]
Description:
- Specifies the path to the state file.
- The state file contains metadata (such as offsets and positions) that enables:
  - Resuming interrupted syncs.
  - Continuing incremental or CDC syncs without restarting from scratch.
- The state file also stores a version number to maintain backward compatibility.
The state.json file is organized into two main sections:
1. Global State
The global section contains global state information that applies to all streams, and driver-specific replication metadata that tracks the overall position in the source database's change log. The structure varies by database driver:
- MySQL
- PostgreSQL
- MongoDB
MySQL uses binlog position for global state tracking to maintain the replication position across all streams.
{
"type": "STREAM",
"version": 1,
"global": {
"state": {
"server_id": 261398335,
"state": {
"position": {
"Name": "mysql-bin.000070",
"Pos": 811746
}
}
},
"streams": [
"my_db.decimal_test",
"my_db.incr_test"
]
}
}
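A binlog position is a pair: a file name and a byte offset within that file. MySQL numbers binlog files sequentially, so two saved positions compare first by the file's numeric extension and then by `Pos`. The sketch below reads the position from the example above and turns it into a sortable key. The helper name `binlog_key` is illustrative only.

```python
import json
from typing import Any, Dict, Tuple


def binlog_key(position: Dict[str, Any]) -> Tuple[int, int]:
    """Turn {"Name": "mysql-bin.000070", "Pos": 811746} into a sortable key.

    The numeric extension orders binlog files; the byte offset (Pos) orders
    events within one file.
    """
    file_index = int(position["Name"].rsplit(".", 1)[1])
    return file_index, int(position["Pos"])


state = json.loads("""
{"type": "STREAM", "version": 1,
 "global": {"state": {"server_id": 261398335,
                      "state": {"position": {"Name": "mysql-bin.000070", "Pos": 811746}}},
            "streams": ["my_db.decimal_test", "my_db.incr_test"]}}
""")
saved = state["global"]["state"]["state"]["position"]  # path from the example above
print(binlog_key(saved))  # (70, 811746)

# A later file is ahead of the saved position even with a smaller offset.
print(binlog_key({"Name": "mysql-bin.000071", "Pos": 4}) > binlog_key(saved))  # True
```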
PostgreSQL uses LSN (Log Sequence Number) for global state tracking to maintain the replication position across all streams.
{
"type": "STREAM",
"version": 1,
"global": {
"state": {
"lsn": "BD7/650015C8"
},
"streams": [
"public.sample_data",
"public.employees"
]
}
}
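PostgreSQL writes an LSN as two hexadecimal numbers separated by a slash. The part before the slash is the upper 32 bits of a 64-bit WAL byte position, and the part after it is the lower 32 bits. Converting the text form to an integer makes it possible to compare positions, or to measure how many bytes of WAL lie between them. The helper names below are illustrative.

```python
def parse_lsn(lsn: str) -> int:
    """Convert PostgreSQL's "X/Y" LSN text to a 64-bit integer.

    X holds the upper 32 bits and Y the lower 32 bits, both in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Inverse of parse_lsn, using uppercase hex like PostgreSQL's output."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


current = parse_lsn("BD7/650015C8")   # e.g. global.state.lsn from state.json
earlier = parse_lsn("BD7/65000000")
print(current - earlier)              # 5576 (bytes of WAL between the two)
print(format_lsn(current))            # BD7/650015C8
```

Plain string comparison of LSNs is not reliable ("BD7/..." vs "BD10/..."), which is why the sketch compares integers.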
MongoDB does not use a global state section. The state is maintained at the stream level only.
2. Streams State
The streams section is an array where each element tracks the synchronization state for a specific stream. The structure varies by database driver:
- MySQL & PostgreSQL
- MongoDB
Each stream state object contains:
{
"stream": "table1",
"namespace": "my_db",
"sync_mode": "",
"state": {
"chunks": []
}
}
MongoDB stream state includes a _data field that stores the resume token:
{
"stream": "users",
"namespace": "public",
"sync_mode": "",
"state": {
"_data": "82696F0837000000012B0429296E1404",
"chunks": []
}
}
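The two sections can be read together to summarise a state file, for example before passing it to --state. The sketch below assumes that the per-stream objects live in a top-level "streams" array. It identifies the driver from the fields shown in the examples above: `lsn` for PostgreSQL, `server_id` for MySQL, and a stream-level `_data` token for MongoDB. This heuristic is an assumption for illustration, not OLake's own logic, and the sample document is hypothetical.

```python
import json
from typing import Any, Dict


def describe_state(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise a state.json document using the fields shown above.

    Driver detection is a heuristic for this sketch, not OLake's own logic.
    """
    global_state = (doc.get("global") or {}).get("state") or {}
    stream_states = doc.get("streams") or []  # assumed top-level "streams" array

    if "lsn" in global_state:
        driver = "postgres"
    elif "server_id" in global_state:
        driver = "mysql"
    elif any("_data" in s.get("state", {}) for s in stream_states):
        driver = "mongodb"
    else:
        driver = "unknown"

    return {
        "driver": driver,
        "version": doc.get("version"),
        "streams": [f'{s["namespace"]}.{s["stream"]}' for s in stream_states],
    }


mongo_state = json.loads("""
{"version": 1,
 "streams": [{"stream": "users", "namespace": "public", "sync_mode": "",
              "state": {"_data": "82696F0837000000012B0429296E1404", "chunks": []}}]}
""")
print(describe_state(mongo_state))
# {'driver': 'mongodb', 'version': 1, 'streams': ['public.users']}
```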
State Configuration Elements
| Component | Type | Example Value | Description |
|---|---|---|---|
| version | integer | 1 or 0 | Version 0 enables legacy, lenient handling for backward compatibility. Version 1 and above enforce stricter validation and fail-fast behavior for newly created state files. |
| global | object | For PostgreSQL: "global": { "state": { "lsn": "BD7/650015C8" }, "streams": [ "public.sample_data", "public.employees" ] } | Contains global replication metadata. The structure varies by driver: MySQL uses the binlog position, PostgreSQL uses the LSN, and MongoDB has no global section. |
| global.state.server_id | integer | 261398335 | (MySQL only) The MySQL server ID used for replication tracking. |
| global.state.state.position | object | {"Name": "mysql-bin.000070", "Pos": 811746} | (MySQL only) Tracks the current binlog file name and position for CDC replication. |
| global.state.lsn | string | "BD7/650015C8" | (PostgreSQL only) Log Sequence Number (LSN) that tracks the position in the PostgreSQL write-ahead log (WAL) for CDC replication. |
| global.streams | array | ["public.decimal_test", "public.incr_test"] | List of all streams tracked in this state file. |
| stream | string | "decimal_test", "incr_test" | The name of the stream being tracked. Must match the stream name defined in streams.json. |
| namespace | string | "my_db", "public" | The namespace (database/schema) that the stream belongs to. |
| sync_mode | string | "", "incremental", "cdc" | The synchronization mode used for this stream. May be empty if not explicitly set. |
| state.chunks | array | [] | Tracks data chunks that have been processed. Used for resuming partial syncs and managing large data transfers. |
| state._data | string | "82696F0837000000012B0429296E1404" | (MongoDB only) Resume token that tracks the position in MongoDB's change stream for CDC replication. |
What's Next: The state file is automatically created and updated during sync operations. You can manually specify a state file using the --state flag to resume from a previous synchronization point.
6. Destination database prefix
./build.sh driver-[SOURCE-TYPE] [COMMAND] --destination-database-prefix [PREFIX_TO_ADD]
Description:
- Adds a custom prefix to the database name created in the destination.
Example:
./build.sh driver-mysql discover --config [PATH_TO_SOURCE_CONFIG_FILE] --destination-database-prefix olake
If the source database is sales-db and the driver is mysql:
- Default (Normalized) → mysql_sales_db
- With prefix (Normalized) → olake_sales_db
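The example above can be reproduced with a small sketch. The normalization rule used here is an assumption that happens to match the two documented outputs: lowercase the name and replace every character outside `[a-z0-9_]` with an underscore. The default prefix is taken to be the driver name, and a custom prefix replaces it. OLake's actual rule may differ for other characters. The function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    # Assumed rule: lowercase, then replace every character outside [a-z0-9_] with "_".
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


def destination_database(source_db: str, driver: str, prefix: str = "") -> str:
    # Assumed rule: the driver name is the default prefix; a custom prefix replaces it.
    return normalize(f"{prefix or driver}_{source_db}")


print(destination_database("sales-db", "mysql"))           # mysql_sales_db
print(destination_database("sales-db", "mysql", "olake"))  # olake_sales_db
```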
7. Destination type
./build.sh driver-[SOURCE-TYPE] [COMMAND] --destination-type [TYPE_OF_DESTINATION]
Description:
- Used with the spec command to generate JSON Schema and UI Schema for the specified destination.
- TYPE_OF_DESTINATION can be any OLake-supported destination, for example iceberg or parquet.
8. Decryption of configuration files
./build.sh driver-[SOURCE-TYPE] [COMMAND] --encryption-key [DECRYPTION_KEY]
Description:
- Provides a key for OLake to decrypt encrypted configuration files during execution.
- Supported values include KMS keys, UUIDs, or custom strings.
- In the command, the flag must come after the path to the encrypted file.
Example:
./build.sh driver-mysql check --config [PATH_TO_SOURCE_CONFIG_FILE] --encryption-key hello-world
In this case, if the source config file is encrypted, OLake uses the provided key (hello-world) to decrypt and parse it.
9. No Save
./build.sh driver-[SOURCE-TYPE] [COMMAND] --no-save
Description:
- Prevents saving of any files generated by the command. This flag is valid for all available commands.
- Example: If used with discover, the streams.json file and related logs are not saved.
10. Timeout
./build.sh driver-[SOURCE-TYPE] [COMMAND] --timeout [TIMEOUT_IN_SECONDS]
Description:
- Applies only to the discover command.
- Overrides the default timeout of 300 seconds (5 minutes).
- This is helpful when working with large datasets or slower networks where the operation may need extra time to complete.
11. Max discover threads
./build.sh driver-[SOURCE-TYPE] [COMMAND] --max-discover-threads [NUMBER_OF_THREADS]
Description:
- Applies only to the discover command.
- Sets the maximum number of parallel threads used for discovering table schemas in the database.
- Value: An integer, required whenever the flag is used. The minimum value is 1.
- Default: If the flag is not provided, the value defaults to 50.
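The sketch below shows how a thread cap like this is typically applied. It implements the documented value rules (integer, at least 1, default 50) and uses the cap as the size of a worker pool, so that at most that many tables are inspected at the same time. `discover_table` is a hypothetical stand-in that returns fixed columns. It does not query a real database and is not OLake's discovery code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Optional, Tuple

DEFAULT_MAX_DISCOVER_THREADS = 50  # documented default when the flag is omitted


def resolve_max_threads(flag_value: Optional[str] = None) -> int:
    """Apply the documented rules: integer, at least 1, default 50."""
    if flag_value is None:
        return DEFAULT_MAX_DISCOVER_THREADS
    threads = int(flag_value)  # raises ValueError for non-integers
    if threads < 1:
        raise ValueError("--max-discover-threads must be greater than 0")
    return threads


def discover_table(table: str) -> Tuple[str, List[str]]:
    # Hypothetical stand-in for querying one table's schema from the source.
    return table, ["id", "updated_at"]


def discover_all(tables: Iterable[str], flag_value: Optional[str] = None) -> Dict[str, List[str]]:
    # The pool never runs more than `workers` discover_table calls at once.
    workers = resolve_max_threads(flag_value)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(discover_table, tables))


print(resolve_max_threads())  # 50
print(discover_all(["public.orders", "public.refunds"], "2"))
```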