OLake Commands and Flags

OLake provides a set of CLI commands, each designed for specific use cases.

Commands can be executed in three ways:

  • Using the build.sh script
  • Running the generated binary
  • Through the OLake Docker CLI.
build.sh command
./build.sh driver-[SOURCE-TYPE] [COMMAND] [FLAG]
Compatibility

The ./build.sh script is a Unix shell script. It runs natively on Linux and macOS terminals.

For Windows:

  • Use Git Bash, WSL (Windows Subsystem for Linux), or another Unix-like shell.
  • Alternatively, use the OLake Docker CLI, which works consistently across platforms.

Explanation of placeholders:

  • SOURCE-TYPE → The source driver being used.
  • COMMAND → The action to perform (listed in the sections below).
  • FLAG → Additional configuration for fine-tuning the command.
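To make the placeholders concrete, the sketch below assembles a command line from its three parts and prints it instead of running it. The `mysql` driver is taken from the examples on this page; the config path is a hypothetical placeholder, not a path OLake requires.

```shell
# Assemble an OLake command from its three parts and print it (dry run).
SOURCE_TYPE="mysql"                    # driver name used in this page's examples
COMMAND="check"                        # one of the commands described below
FLAG="--config ./olake/source.json"    # hypothetical path; adjust to your setup

OLAKE_CMD="./build.sh driver-${SOURCE_TYPE} ${COMMAND} ${FLAG}"
echo "$OLAKE_CMD"
```

Printing the command first is a cheap way to confirm the parts are in the expected order before executing it.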
Using the binary

Executing build.sh creates a binary file in the configuration directory.
Once available, the binary can be called directly to run commands, without needing build.sh.

Commands

1. Check

./build.sh driver-[SOURCE-TYPE] check [FLAG]

Description:

Verifies the connection to either a source or a destination.

Required flags:

One of the following flags is required:

  • --config → Checks the connection to a source.
  • --destination → Checks the connection to a destination.
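As an illustration of choosing between the two flags, here is a small helper that builds the matching `check` command for a source or a destination and prints it. The function name `olake_check_cmd` and both file paths are hypothetical and exist only for this sketch.

```shell
# Build a `check` command for either a source or a destination (dry run).
# Usage: olake_check_cmd <source-type> <source|destination> <path>
olake_check_cmd() {
  case "$2" in
    source)      flag="--config" ;;        # checks the source connection
    destination) flag="--destination" ;;   # checks the destination connection
    *) echo "target must be 'source' or 'destination'" >&2; return 1 ;;
  esac
  echo "./build.sh driver-$1 check $flag $3"
}

olake_check_cmd mysql source ./olake/source.json
olake_check_cmd mysql destination ./olake/destination.json
```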

2. Spec

./build.sh driver-[SOURCE-TYPE] spec [FLAG]

Description:

Generates a JSON Schema and UI Schema. These schemas are used by RJSF (react-jsonschema-form) to render and validate configuration forms.
When no flag is provided, the spec of the defined SOURCE-TYPE is generated.

Optional flags:

  • --destination-type → Generates the spec for a destination driver instead of the source.
    example:
    ./build.sh driver-mysql spec --destination-type iceberg

3. Discover

./build.sh driver-[SOURCE-TYPE] discover [FLAG]

Description:

Generates a streams.json file containing information about all streams in the source.

Required flags:

  • --config → Specifies path to the source configuration file.

Optional flags:

  • --destination-database-prefix → Adds a custom prefix to the destination database name.
  • --streams → Useful when streams.json has been manually modified. This flag ensures that the existing changes are preserved, while also merging in any new updates from the database (such as newly added tables).
  • --timeout → Overrides the default timeout value for the command.
  • --max-discover-threads → Sets the maximum number of parallel threads used to discover tables in the database.

4. Sync

./build.sh driver-[SOURCE-TYPE] sync [FLAG]

Description:

Syncs data from the source to the destination.

Required flags:

All of these flags need to be specified:

  • --config → Specifies path to the source configuration file.
  • --streams → Specifies path to the streams.json file (produced by the discover command).
  • --destination → Specifies path to the destination configuration file.

Optional flags:

  • --state → Specifies path to the state file.
info

➡️ You must learn about state.json configuration. Refer to the State Config guide.
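A common pattern, sketched here as an assumption rather than documented OLake behavior, is to pass `--state` only when a state file from an earlier run already exists. The helper name `olake_sync_cmd`, the directory layout, and the file names are hypothetical; the function prints the command instead of running it.

```shell
# Compose a sync command, adding --state only when a state file already exists.
# Usage: olake_sync_cmd <source-type> <config-dir>
# Assumes (hypothetically) that all four files live in one directory.
olake_sync_cmd() {
  source_type=$1
  dir=$2
  cmd="./build.sh driver-$source_type sync --config $dir/source.json --streams $dir/streams.json --destination $dir/destination.json"
  if [ -f "$dir/state.json" ]; then
    cmd="$cmd --state $dir/state.json"   # resume from the previous sync point
  fi
  echo "$cmd"
}

olake_sync_cmd mysql ./olake
```

The printed string is meant for inspection; paths containing spaces would need quoting before the command is executed.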

5. Clear Destination

./build.sh driver-[SOURCE-TYPE] [COMMAND] --clear-destination

Description:

  • Clears data in the destination, only for the selected streams defined in streams.json.
  • Resets the state file for those streams.

Flags

1. Help

./build.sh driver-[SOURCE-TYPE] --help

Description:

  • Lists all available commands and flags for the current OLake CLI version.
  • Can be run without specifying a command.
  • The shorthand -h can also be used.

2. Config

./build.sh driver-[SOURCE-TYPE] [COMMAND] --config [PATH_TO_CONFIG_FILE]

Description: Specifies the path to the source configuration file.

3. Streams

./build.sh driver-[SOURCE-TYPE] [COMMAND] --streams [PATH_TO_STREAMS_FILE]

Description: Specifies the path to the streams.json file. This file is generated after the discover command. When used during discovery, this flag updates the existing streams.json:

  • Keeps prior manual changes.
  • Adds new streams detected in the source database.
  • Allows selecting which columns should be synced for each table.
  • Allows updating the destination database name for each stream.
info

➡️ You must learn about streams.json configuration. Refer to the Streams Config guide.
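The two ways of running discover can be sketched as follows: a first run without `--streams`, and a later refresh that passes the existing file so manual edits are kept. Both paths are hypothetical placeholders, and the commands are printed rather than executed.

```shell
# Dry run: print the initial discover command and the later refresh command.
CONFIG="./olake/source.json"      # hypothetical source config path
STREAMS="./olake/streams.json"    # hypothetical, produced by the first discover

FIRST_RUN="./build.sh driver-mysql discover --config $CONFIG"
REFRESH="./build.sh driver-mysql discover --config $CONFIG --streams $STREAMS"

echo "$FIRST_RUN"   # generates streams.json
echo "$REFRESH"     # merges new streams into the edited streams.json
```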

4. Destination

./build.sh driver-[SOURCE-TYPE] [COMMAND] --destination [PATH_TO_DESTINATION_FILE]

Description: Specifies the path to the destination configuration file.

5. State

./build.sh driver-[SOURCE-TYPE] [COMMAND] --state [PATH_TO_STATE_FILE]

Description:

  • Specifies the path to the state file.
  • The state file contains metadata (such as offsets and positions) that enables:
    • Resuming interrupted syncs.
    • Continuing incremental or CDC syncs without restarting from scratch.
    • Storing the version in the state file for maintaining backward compatibility.

The state.json file is organized into two main sections:

1. Global State

The global section contains state information that applies to all streams, along with driver-specific replication metadata that tracks the overall position in the source database's change log. The structure varies by database driver:

MySQL uses binlog position for global state tracking to maintain the replication position across all streams.

state.json (MySQL)
{
  "type": "STREAM",
  "version": 1,
  "global": {
    "state": {
      "server_id": 261398335,
      "state": {
        "position": {
          "Name": "mysql-bin.000070",
          "Pos": 811746
        }
      }
    },
    "streams": [
      "my_db.decimal_test",
      "my_db.incr_test"
    ]
  }
}

2. Streams State

The streams section is an array where each element tracks the synchronization state for a specific stream. The details vary by database driver, but each stream state object has the following shape:

state.json
{
  "stream": "table1",
  "namespace": "my_db",
  "sync_mode": "",
  "state": {
    "chunks": []
  }
}

State Configuration Elements

| Component | Type | Example Value | Description |
| --- | --- | --- | --- |
| version | integer | 1 or 0 | Version 0 enables legacy, lenient handling for backward compatibility. Version 1 and above enforce stricter validation and fail-fast behavior for newly created state. |
| global | object | For PostgreSQL: "global": { "state": { "lsn": "BD7/650015C8" }, "streams": [ "public.sample_data", "public.employees" ] } | Contains global replication metadata. Structure varies by driver: MySQL uses binlog position, PostgreSQL uses LSN, MongoDB does not have a global section. |
| global.state.server_id | integer | 261398335 | (MySQL only) The MySQL server ID used for replication tracking. |
| global.state.state.position | object | {"Name": "mysql-bin.000070", "Pos": 811746} | (MySQL only) Tracks the current binlog file name and position for CDC replication. |
| global.state.lsn | string | "BD7/650015C8" | (PostgreSQL only) Log Sequence Number (LSN) that tracks the position in the PostgreSQL write-ahead log (WAL) for CDC replication. |
| global.streams | array | ["public.decimal_test", "public.incr_test"] | List of all streams that are being tracked in this state file. |
| stream | string | "decimal_test", "incr_test" | The name of the stream being tracked. Must match the stream name defined in streams.json. |
| namespace | string | "my_db", "public" | The namespace (database/schema) that the stream belongs to. |
| sync_mode | string | "", "incremental", "cdc" | The synchronization mode being used for this stream. May be empty if not explicitly set. |
| state.chunks | array | [] | Array tracking data chunks that have been processed. Used for resuming partial syncs and managing large data transfers. |
| state._data | string | "82696F0837000000012B0429296E1404" | (MongoDB only) Resume token used to track the position in MongoDB's change stream for CDC replication. |

What's Next: The state file is automatically created and updated during sync operations. You can manually specify a state file using the --state flag to resume from a previous synchronization point.
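Before reusing an existing state file, it can help to check which version it was written with. The sketch below reads the `version` field with `sed`; the helper name `state_version` is hypothetical, the sample file is a trimmed copy of the MySQL example, and text-based extraction is only a shortcut for illustration (a JSON-aware tool is more robust).

```shell
# Print the "version" value from a state file (first match only).
state_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Demo with a throwaway file shaped like the MySQL state example.
STATE_FILE=$(mktemp)
cat > "$STATE_FILE" <<'EOF'
{
  "type": "STREAM",
  "version": 1,
  "global": { "state": { "server_id": 261398335 }, "streams": ["my_db.incr_test"] }
}
EOF

STATE_VERSION=$(state_version "$STATE_FILE")
if [ "$STATE_VERSION" -ge 1 ]; then
  echo "state version $STATE_VERSION: stricter validation applies"
else
  echo "state version $STATE_VERSION: legacy handling applies"
fi
rm -f "$STATE_FILE"
```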

6. Destination database prefix

./build.sh driver-[SOURCE-TYPE] [COMMAND] --destination-database-prefix [PREFIX_TO_ADD]

Description:

  • Adds a custom prefix to the database name created in the destination.
    Example:
    ./build.sh driver-mysql discover --config [PATH_TO_SOURCE_CONFIG_FILE] --destination-database-prefix olake
    If the source database is sales-db and the driver is mysql:
    • Default (Normalized): mysql_sales_db
    • With prefix (Normalized): olake_sales_db
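The naming in this example can be approximated with a small function. The rule used here (lowercase the name, then replace every character other than letters, digits, and underscores with an underscore) is an assumption that happens to reproduce the two names above; it is not taken from OLake's implementation, and `dest_db_name` is a hypothetical helper.

```shell
# Approximate the destination database name for a given driver and source database.
# Usage: dest_db_name <driver> <source-db> [prefix]
# Without a prefix, the driver name is used as the prefix.
dest_db_name() {
  driver=$1
  source_db=$2
  prefix=${3:-$driver}
  printf '%s_%s\n' "$prefix" "$source_db" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9_]/_/g'     # assumed normalization rule
}

dest_db_name mysql sales-db          # prints mysql_sales_db
dest_db_name mysql sales-db olake    # prints olake_sales_db
```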

7. Destination type

./build.sh driver-[SOURCE-TYPE] [COMMAND] --destination-type [TYPE_OF_DESTINATION]

Description:

  • Used with the spec command to generate JSON Schema and UI Schema for the specified destination.
  • TYPE_OF_DESTINATION can be any OLake supported destination, for example: iceberg or parquet.

8. Decryption of configuration files

./build.sh driver-[SOURCE-TYPE] [COMMAND] --encryption-key [DECRYPTION_KEY]

Description:

  • Provides a key for OLake to decrypt encrypted configuration files during execution.
  • Supported values include KMS keys, UUIDs, or custom strings.
  • The flag must follow the encrypted file in the command.
    Example:
    ./build.sh driver-mysql check --config [PATH_TO_SOURCE_CONFIG_FILE] --encryption-key hello-world
    In this case, if the source config file is encrypted, OLake uses the provided key (hello-world) to decrypt and parse it.

9. No Save

./build.sh driver-[SOURCE-TYPE] [COMMAND] --no-save

Description:

  • Prevents saving of any files generated by the command. This flag is valid for all available commands.
  • Example: If used with discover, the streams.json file and related logs are not saved.

10. Timeout

./build.sh driver-[SOURCE-TYPE] [COMMAND] --timeout [TIMEOUT_IN_SECONDS]

Description:

  • Applies only to the discover command.
  • Overrides the default timeout of 300 seconds (5 minutes).
  • This is helpful when working with large datasets or slower networks where the operation may need extra time to complete.

11. Max discover threads

./build.sh driver-[SOURCE-TYPE] [COMMAND] --max-discover-threads [NUMBER_OF_THREADS]

Description:

  • Applies only to the discover command.
  • Sets the maximum number of parallel threads used for discovering table schemas in the database.
  • Value: An integer, required whenever the flag is used. The minimum value is 1.
  • Default: If the flag is not provided, the value defaults to 50.
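The constraints on the two discover-only flags can be sketched as a wrapper that validates the thread count before composing the command. The function name `olake_discover_cmd` and the config path are hypothetical; the fallback values 50 and 300 are the defaults stated on this page, and the command is printed rather than executed.

```shell
# Compose a discover command after validating the thread count (dry run).
# Usage: olake_discover_cmd <config-path> [threads] [timeout-seconds]
olake_discover_cmd() {
  config=$1
  threads=${2:-50}     # documented default for --max-discover-threads
  timeout=${3:-300}    # documented default for --timeout, in seconds
  case "$threads" in
    ''|*[!0-9]*) echo "threads must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$threads" -lt 1 ]; then
    echo "threads must be at least 1" >&2
    return 1
  fi
  echo "./build.sh driver-mysql discover --config $config --max-discover-threads $threads --timeout $timeout"
}

olake_discover_cmd ./olake/source.json 8 600
```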

