
streams.json Configuration

Overall File Structure

The streams.json file is organized into two main sections:

  • selected_streams: Lists the streams that have been chosen for processing. These are grouped by namespace.
  • streams: Contains an array of stream definitions. Each stream holds details about its data schema, supported synchronization modes, primary keys, and other metadata.
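As a rough illustration of this layout, the following Python sketch parses a minimal streams.json-shaped document and lists the selected streams per namespace. The sample content and the helper name summarize are made up for illustration; only the two top-level keys come from the description above.

```python
import json

# Hypothetical, trimmed-down streams.json content for illustration only.
SAMPLE = """
{
  "selected_streams": {
    "my_db": [
      {"stream_name": "table1"},
      {"stream_name": "table2"}
    ]
  },
  "streams": [
    {"stream": {"name": "table1", "namespace": "my_db"}},
    {"stream": {"name": "table2", "namespace": "my_db"}}
  ]
}
"""


def summarize(config: dict) -> dict:
    """Return {namespace: [stream_name, ...]} from the selected_streams section."""
    selected = config.get("selected_streams", {})
    return {
        namespace: [entry["stream_name"] for entry in entries]
        for namespace, entries in selected.items()
    }


config = json.loads(SAMPLE)
summary = summarize(config)
print(summary)                 # {'my_db': ['table1', 'table2']}
print(len(config["streams"]))  # 2
```

In a real setup the JSON text would be read from the streams.json file rather than from an inline string.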

1. Selected Streams

The selected_streams section groups streams by their namespace (database name). For example, the configuration might look like this:

streams.json
"selected_streams": {
  "my_db": [
    {
      "partition_regex": "/{dropoff_datetime, year}",
      "stream_name": "table1",
      "normalization": true,
      "append_only": false,
      "filter": "UPDATED_AT >= \"08-JUN-25 07.19.23.690870000 AM\""
    },
    {
      "partition_regex": "",
      "stream_name": "table2",
      "normalization": false,
      "append_only": false,
      "filter": "city = \"London\""
    }
  ]
}

Details about all the fields mentioned in selected streams

| Component | Data Type | Example Value | Description |
| --- | --- | --- | --- |
| namespace | string | my_db | Groups streams that belong to a specific database or logical category. |
| stream_name | string | "table1", "table2" | The identifier for the stream. It should match the stream name defined in the stream configurations. |
| partition_regex | string | "/{dropoff_datetime, year}" | A pattern defining how to partition the data. To read more, refer to the Partition Regex Documentation. |
| normalization | boolean | true | Determines whether OLake applies level-1 JSON flattening to level-0 nested objects. Set to true if you require normalized output; otherwise, use false. |
| append_mode | boolean | false | Set to true to disable upserts in Iceberg. |
| filter | string | "UPDATED_AT >= \"08-JUN-25 07.19.23.690870000 AM\"" | Only the data that satisfies the specified condition is synced. |
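The field rules above can be checked mechanically before a sync. The sketch below is a minimal, unofficial example: the function name check_selected_streams, the decision to treat stream_name as required, and the error messages are assumptions of this sketch, not part of OLake. Key names follow the field table.

```python
# Expected Python types per field, taken from the field table.
EXPECTED_TYPES = {
    "stream_name": str,
    "partition_regex": str,
    "normalization": bool,
    "append_mode": bool,
    "filter": str,
}


def check_selected_streams(selected: dict, defined_names: set) -> list:
    """Return a list of human-readable problems found in selected_streams."""
    problems = []
    for namespace, entries in selected.items():
        for entry in entries:
            name = entry.get("stream_name")
            if not isinstance(name, str):
                problems.append(f"{namespace}: entry without a string stream_name")
                continue
            # Type-check only the keys that are actually present.
            for key, expected in EXPECTED_TYPES.items():
                if key in entry and not isinstance(entry[key], expected):
                    problems.append(
                        f"{namespace}.{name}: {key} should be {expected.__name__}"
                    )
            # stream_name should match a stream defined in the streams section.
            if name not in defined_names:
                problems.append(f"{namespace}.{name}: no matching stream definition")
    return problems


selected = {
    "my_db": [
        {"stream_name": "table1", "normalization": True, "partition_regex": ""},
        {"stream_name": "table3", "normalization": "yes"},
    ]
}
problems = check_selected_streams(selected, defined_names={"table1", "table2"})
for problem in problems:
    print(problem)
```

Here table3 is reported twice: once for the non-boolean normalization value and once because no stream with that name is defined.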

2. Streams

The streams section is an array where each element is an object that defines a specific data stream. Each stream object includes a stream key that holds the configuration details. For example, one stream definition looks like this:

streams.json
{
  "stream": {
    "name": "stream_8",
    "namespace": "olake_db",
    "type_schema": {
      "properties": {
        "_id": {
          "type": ["string"]
        },
        "name": {
          "type": ["string"]
        },
        "marks": {
          "type": ["integer"]
        },
        "updated_at": {
          "type": ["timestamp"]
        },
        ...
      }
    },
    "supported_sync_modes": ["full_refresh", "cdc", "incremental"],
    "source_defined_primary_key": ["_id"],
    "available_cursor_fields": ["_id", "name", "marks", "updated_at"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at"
  }
}

2.1 Stream Configuration Elements

| Component | Example Value | Description & Possible Values |
| --- | --- | --- |
| name | "stream_8" | Unique identifier for the stream. Each stream must have a unique name. |
| namespace | "olake_db" | The grouping or database name that the stream belongs to. Helps organize streams by logical or physical data sources. |
| type_schema | (JSON object with properties) | Defines the structure of the records in the stream. Contains a properties object that maps each field (key) to its allowed data types (e.g., string, integer, array, object). |
| supported_sync_modes | ["full_refresh", "cdc", "incremental", "strict_cdc"] | Lists the synchronization modes the stream supports. Typically includes "full_refresh", "cdc", "strict_cdc" and "incremental". |
| source_defined_primary_key | ["_id"] | Specifies the field(s) set as the primary key in the source. |
| available_cursor_fields | ["_id", "name", "marks", "updated_at"] | Lists fields that can be used to track synchronization progress in incremental sync mode. |
| sync_mode | "incremental" | Indicates the active synchronization mode. Possible values are those listed in supported_sync_modes. |
| cursor_field | "updated_at" | Defines the cursor field used to track incremental sync. A secondary cursor field can also be specified, separated by a colon. To read more about incremental sync, refer to the incremental sync documentation. |
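The relationships between these elements can be sketched as a small consistency check. This is an illustrative example only: check_stream is a hypothetical helper, and requiring every cursor field to appear in available_cursor_fields is an assumption inferred from the descriptions above, not confirmed OLake behavior.

```python
def check_stream(stream: dict) -> list:
    """Return a list of problems for one object found under the "stream" key."""
    problems = []

    # sync_mode should be one of the modes the stream supports.
    mode = stream.get("sync_mode")
    if mode not in stream.get("supported_sync_modes", []):
        problems.append(f"sync_mode {mode!r} is not in supported_sync_modes")

    # For incremental sync, cursor_field may name a secondary cursor after a colon.
    if mode == "incremental":
        available = set(stream.get("available_cursor_fields", []))
        for field in stream.get("cursor_field", "").split(":"):
            if field not in available:
                problems.append(
                    f"cursor field {field!r} is not in available_cursor_fields"
                )
    return problems


stream = {
    "name": "stream_8",
    "supported_sync_modes": ["full_refresh", "cdc", "incremental"],
    "available_cursor_fields": ["_id", "name", "marks", "updated_at"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at",
}
print(check_stream(stream))  # []

# A secondary cursor that is not an available cursor field is reported.
bad = dict(stream, cursor_field="updated_at:created_at")
print(check_stream(bad))
```

The first call uses the values from the example stream definition and reports no problems; the second flags created_at because it is not listed in available_cursor_fields.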

For more information about partition_regex, refer to the Iceberg Partition Documentation or the S3 Partition Documentation.


