
streams.json Configuration

Overall File Structure

The streams.json file is organized into two main sections:

  • selected_streams: Lists the streams that have been chosen for processing. These are grouped by namespace.
  • streams: Contains an array of stream definitions. Each stream holds details about its data schema, supported synchronization modes, primary keys, and other metadata.
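As a rough illustration of the two-section layout described above, the following Python sketch parses a trimmed-down configuration and lists what each section contains. The sample content and the helper name `summarize` are assumptions made for this example; a real streams.json carries more fields per stream.

```python
import json

# Hypothetical, trimmed-down streams.json content (illustration only).
SAMPLE = """
{
  "selected_streams": {
    "my_db": [
      {"stream_name": "table1", "partition_regex": "", "normalization": true}
    ]
  },
  "streams": [
    {"stream": {"name": "table1", "namespace": "my_db"}}
  ]
}
"""


def summarize(config: dict) -> dict:
    """Collect selected stream names per namespace and all defined stream names."""
    selected = {
        namespace: [entry["stream_name"] for entry in entries]
        for namespace, entries in config.get("selected_streams", {}).items()
    }
    defined = [item["stream"]["name"] for item in config.get("streams", [])]
    return {"selected": selected, "defined": defined}


config = json.loads(SAMPLE)
summary = summarize(config)
print(summary)
```

The sketch reads the JSON from an inline string so that it runs on its own; in practice the same function could be applied to the parsed contents of the file.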

1. Selected Streams

The selected_streams section groups streams by their namespace (the database name). For example, the configuration might look like this:

streams.json
"selected_streams": {
  "my_db": [
    {
      "partition_regex": "/{dropoff_datetime, year}",
      "stream_name": "table1",
      "normalization": true,
      "append_mode": false,
      "filter": "UPDATED_AT >= \"08-JUN-25 07.19.23.690870000 AM\""
    },
    {
      "partition_regex": "",
      "stream_name": "table2",
      "normalization": false,
      "append_mode": false,
      "filter": "city = \"London\""
    }
  ]
}

Fields used in selected_streams

| Component | Data Type | Example Value | Description |
| --- | --- | --- | --- |
| namespace | string | my_db | Groups streams that belong to a specific database or logical category. |
| stream_name | string | "table1", "table2" | The identifier for the stream. It should match the stream name defined in the stream configurations. |
| partition_regex | string | "/{dropoff_datetime, year}" | A pattern defining how to partition the data. For details, refer to the Partition Regex Documentation. |
| normalization | boolean | true | Determines whether OLake applies level-1 JSON flattening to nested objects at level 0. Set to true if you require normalized output; otherwise, use false. |
| append_mode | boolean | false | Set to true to disable upserts in Iceberg. |
| filter | string | "UPDATED_AT >= \"08-JUN-25 07.19.23.690870000 AM\"" | Only the data that satisfies the specified condition is synced. |
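Since each stream_name is expected to match a stream defined in the streams section, a small consistency check can catch typos before a sync. The sketch below is one possible way to do this; the helper name `unmatched_selections`, the sample data, and the choice to match on both namespace and name are assumptions for illustration, not OLake behavior.

```python
def unmatched_selections(config: dict) -> list:
    """Return (namespace, stream_name) pairs that have no matching stream definition."""
    # Assumption: a selection matches a definition when namespace and name both agree.
    defined = {
        (item["stream"].get("namespace"), item["stream"]["name"])
        for item in config.get("streams", [])
    }
    missing = []
    for namespace, entries in config.get("selected_streams", {}).items():
        for entry in entries:
            key = (namespace, entry["stream_name"])
            if key not in defined:
                missing.append(key)
    return missing


# Hypothetical configuration: table2 is selected but never defined.
config = {
    "selected_streams": {
        "my_db": [
            {"stream_name": "table1", "normalization": True},
            {"stream_name": "table2", "normalization": False},
        ]
    },
    "streams": [
        {"stream": {"name": "table1", "namespace": "my_db"}},
    ],
}

missing = unmatched_selections(config)
print(missing)
```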

2. Streams

The streams section is an array where each element is an object that defines a specific data stream. Each stream object includes a stream key that holds the configuration details. For example, one stream definition looks like this:

streams.json
{
  "stream": {
    "name": "stream_8",
    "namespace": "olake_db",
    "type_schema": {
      "properties": {
        "_id": {
          "type": ["string"]
        },
        "name": {
          "type": ["string"]
        },
        "marks": {
          "type": ["integer"]
        },
        "updated_at": {
          "type": ["timestamp"]
        },
        ...
      }
    },
    "supported_sync_modes": ["full_refresh", "cdc", "incremental"],
    "source_defined_primary_key": ["_id"],
    "available_cursor_fields": ["_id", "name", "marks", "updated_at"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at"
  }
}
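To make the shape of type_schema more concrete, here is a minimal Python sketch that groups field names by their declared types. It uses only the four fields shown in the example above; the helper name `fields_by_type` is hypothetical.

```python
def fields_by_type(type_schema: dict) -> dict:
    """Group field names by each data type listed in their "type" array."""
    grouped = {}
    for field, spec in type_schema.get("properties", {}).items():
        for data_type in spec.get("type", []):
            grouped.setdefault(data_type, []).append(field)
    return grouped


# The properties visible in the example stream definition.
type_schema = {
    "properties": {
        "_id": {"type": ["string"]},
        "name": {"type": ["string"]},
        "marks": {"type": ["integer"]},
        "updated_at": {"type": ["timestamp"]},
    }
}

grouped = fields_by_type(type_schema)
print(grouped)
```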

2.1 Stream Configuration Elements

| Component | Example Value | Description & Possible Values |
| --- | --- | --- |
| name | "stream_8" | Unique identifier for the stream. Each stream must have a unique name. |
| namespace | "olake_db" | The grouping or database name that the stream belongs to. Helps organize streams by logical or physical data sources. |
| type_schema | (JSON object with properties) | Defines the structure of the records in the stream. Contains a properties object that maps each field (key) to its allowed data types (e.g., string, integer, array, object). |
| supported_sync_modes | ["full_refresh", "cdc", "incremental", "strict_cdc"] | Lists the synchronization modes the stream supports. Typically includes "full_refresh", "cdc", "strict_cdc", and "incremental". |
| source_defined_primary_key | ["_id"] | Specifies the field(s) set as the primary key in the source. |
| available_cursor_fields | ["_id", "name", "marks", "updated_at"] | Lists the fields that can be used to track synchronization progress in incremental sync mode. |
| sync_mode | "incremental" | Indicates the active synchronization mode. Must be one of the values listed in supported_sync_modes. |
| cursor_field | "updated_at" | Defines the cursor field used to track incremental sync. A secondary cursor field can also be specified, separated from the primary one by a colon. To read more, refer to the incremental sync documentation. |
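The relationships in the table (sync_mode drawn from supported_sync_modes, cursor_field drawn from available_cursor_fields, with an optional colon-separated secondary cursor) can be checked with a short script. This is a minimal sketch under those stated rules; the helper name `check_stream` and the wording of its messages are assumptions, and OLake itself may validate differently.

```python
def check_stream(stream: dict) -> list:
    """Return a list of human-readable problems found in one stream definition."""
    problems = []
    mode = stream.get("sync_mode")
    if mode not in stream.get("supported_sync_modes", []):
        problems.append(f"sync_mode {mode!r} is not in supported_sync_modes")

    # Assumption: the cursor is only checked when incremental sync is active.
    if mode == "incremental":
        available = stream.get("available_cursor_fields", [])
        # A primary and an optional secondary cursor are separated by a colon.
        for field in stream.get("cursor_field", "").split(":"):
            if field not in available:
                problems.append(f"cursor field {field!r} is not in available_cursor_fields")
    return problems


# Values taken from the example stream definition.
stream = {
    "name": "stream_8",
    "supported_sync_modes": ["full_refresh", "cdc", "incremental"],
    "available_cursor_fields": ["_id", "name", "marks", "updated_at"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at",
}

print(check_stream(stream))
```

An empty list means no problems were found. A missing cursor_field in incremental mode is reported as an empty field name that is not available.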

For more information about partition_regex, refer to the Iceberg Partition Documentation or the S3 Partition Documentation.


