Last updated: 9/3/2025

`streams.json` Configuration

Overall File Structure

The streams.json file is organized into two main sections:

selected_streams: Lists the streams that have been chosen for processing. These are grouped by namespace.
streams: Contains an array of stream definitions. Each stream holds details about its data schema, supported synchronization modes, primary keys, and other metadata.
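
As a rough illustration of the two-section layout described above, the following Python sketch parses a trimmed-down file and lists what each section holds. The sample content and the `summarize` helper are hypothetical and exist only for this example; they are not part of OLake.

```python
import json

# Hypothetical, trimmed-down streams.json content used only for illustration.
EXAMPLE = """
{
  "selected_streams": {
    "my_db": [
      {"stream_name": "table1", "partition_regex": "", "normalization": false}
    ]
  },
  "streams": [
    {"stream": {"name": "table1", "namespace": "my_db"}}
  ]
}
"""


def summarize(config: dict) -> dict:
    """Collect selected stream names per namespace and the defined (namespace, name) pairs."""
    selected = {
        namespace: [entry["stream_name"] for entry in entries]
        for namespace, entries in config.get("selected_streams", {}).items()
    }
    defined = [
        (item["stream"].get("namespace"), item["stream"]["name"])
        for item in config.get("streams", [])
    ]
    return {"selected": selected, "defined": defined}


config = json.loads(EXAMPLE)
summary = summarize(config)
print(summary)
```

A summary like this makes it easy to spot a selected stream that has no matching definition in `streams`.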

1. Selected Streams

The selected_streams section groups streams by namespace (the database name). For example, the configuration might look like this:

streams.json
"selected_streams": {
  "my_db": [
    {
      "partition_regex": "/{dropoff_datetime, year}",
      "stream_name": "table1",
      "normalization": true,
      "append_mode": false,
      "filter": "UPDATED_AT >= \"08-JUN-25 07.19.23.690870000 AM\""
    },
    {
      "partition_regex": "",
      "stream_name": "table2",
      "normalization": false,
      "append_mode": false,
      "filter": "city = \"London\""
    }
  ]
}

Details about all the fields mentioned in selected streams

Component	Data Type	Example Value	Description
`namespace`	string	`my_db`	Groups streams that belong to a specific database or logical category
`stream_name`	string	`"table1"`, `"table2"`	The identifier for the stream. Should match the stream name defined in the stream configurations.
`partition_regex`	string	`"/{dropoff_datetime, year}"`	A pattern defining how to partition the data. To read more, refer to the Partition Regex documentation.
`normalization`	boolean	`true`	Determines whether OLake applies level-1 flattening to top-level nested JSON objects. Set to `true` if you require normalized output; otherwise, use `false`.
`append_mode`	boolean	`false`	Set to `true` to disable upserts in Iceberg.
`filter`	string	`"UPDATED_AT >= \"08-JUN-25 07.19.23.690870000 AM\""`	Only the data that satisfies the specified condition will be synced.
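
The field types in the table above can be checked before a sync is started. The following is a minimal sketch, not an OLake API: `check_selected_streams` is a hypothetical helper, and treating `stream_name` as the only required field is an assumption made for this example.

```python
# Expected types, taken from the field table above.
FIELD_TYPES = {
    "stream_name": str,
    "partition_regex": str,
    "normalization": bool,
    "append_mode": bool,
    "filter": str,
}


def check_selected_streams(selected_streams: dict) -> list[str]:
    """Return human-readable problems found in a selected_streams mapping."""
    problems = []
    for namespace, entries in selected_streams.items():
        for index, entry in enumerate(entries):
            where = f"{namespace}[{index}]"
            # Assumption: stream_name is the only field treated as required here.
            if "stream_name" not in entry:
                problems.append(f"{where}: missing stream_name")
            for field, value in entry.items():
                expected = FIELD_TYPES.get(field)
                # Fields not listed in the table are ignored rather than rejected.
                if expected is not None and not isinstance(value, expected):
                    problems.append(f"{where}: {field} should be {expected.__name__}")
    return problems


selected = {
    "my_db": [
        {"stream_name": "table1", "normalization": "yes"},  # wrong type on purpose
        {"partition_regex": ""},                            # no stream_name on purpose
    ]
}
problems = check_selected_streams(selected)
for problem in problems:
    print(problem)
```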

2. Streams

The streams section is an array where each element is an object that defines a specific data stream. Each stream object includes a stream key that holds the configuration details. For example, one stream definition looks like this:

streams.json
{
  "stream": {
    "name": "stream_8",
    "namespace": "olake_db",
    "type_schema": {
      "properties": {
        "_id": {
          "type": ["string"]
        },
        "name": {
          "type": ["string"]
        },
        "marks": {
          "type": ["integer"]
        },
        "updated_at": {
          "type": ["timestamp"]
        },
        ...
      }
    },
    "supported_sync_modes": ["full_refresh", "cdc", "incremental"],
    "source_defined_primary_key": ["_id"],
    "available_cursor_fields": ["_id", "name", "marks", "updated_at"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at"
  }
}
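
To make the shape of a stream definition concrete, here is a small Python sketch that reads the `type_schema` of one entry and maps each field to its allowed types. The `field_types` helper is hypothetical, and the entry is a shortened copy of the example above.

```python
# Shortened copy of the stream definition shown above.
stream_entry = {
    "stream": {
        "name": "stream_8",
        "namespace": "olake_db",
        "type_schema": {
            "properties": {
                "_id": {"type": ["string"]},
                "marks": {"type": ["integer"]},
                "updated_at": {"type": ["timestamp"]},
            }
        },
    }
}


def field_types(entry: dict) -> dict[str, list[str]]:
    """Map each field in type_schema.properties to its list of allowed types."""
    properties = entry["stream"].get("type_schema", {}).get("properties", {})
    return {field: spec.get("type", []) for field, spec in properties.items()}


types = field_types(stream_entry)
print(types)
```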

2.1 Stream Configuration Elements

Component	Example Value	Description & Possible Values
`name`	`"stream_8"`	Unique identifier for the stream. Each stream must have a unique name.
`namespace`	`"olake_db"`	The grouping or database name that the stream belongs to. Helps organize streams by logical or physical data sources.
`type_schema`	(JSON object with properties)	Defines the structure of the records in the stream. Contains a `properties` object that maps each field (key) to its allowed data types (e.g., string, integer, array, object).
`supported_sync_modes`	`["full_refresh", "cdc", "incremental", "strict_cdc"]`	Lists the synchronization modes the stream supports. Typically includes `"full_refresh"`, `"cdc"`, `"strict_cdc"`, and `"incremental"`.
`source_defined_primary_key`	`["_id"]`	Specifies the field(s) set as the primary key in the source.
`available_cursor_fields`	`["_id", "name", "marks", "updated_at"]`	Lists fields that can be used to track synchronization progress in `incremental` sync mode.
`sync_mode`	`"incremental"`	Indicates the active synchronization mode. Possible values are defined in `supported_sync_modes`.
`cursor_field`	`"updated_at"`	Defines the cursor field used to track incremental sync. A secondary cursor field can also be specified, separated from the first by a colon. To read more, refer to the incremental sync documentation.
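
The relationships in the table above (the active `sync_mode` must be one of `supported_sync_modes`, and each cursor must come from `available_cursor_fields`) can be sketched as a quick consistency check. `check_stream` is a hypothetical helper, and requiring a cursor only for `incremental` mode is an assumption of this sketch rather than documented OLake behavior.

```python
def check_stream(stream: dict) -> list[str]:
    """Return problems found in the contents of one "stream" object."""
    problems = []
    mode = stream.get("sync_mode")
    if mode not in stream.get("supported_sync_modes", []):
        problems.append(f"sync_mode {mode!r} is not in supported_sync_modes")
    # Assumption: a cursor is only needed when the active mode is incremental.
    if mode == "incremental":
        cursor = stream.get("cursor_field", "")
        if not cursor:
            problems.append("incremental sync needs a cursor_field")
        # A secondary cursor may follow the primary one, separated by a colon.
        for part in filter(None, cursor.split(":")):
            if part not in stream.get("available_cursor_fields", []):
                problems.append(f"cursor field {part!r} is not in available_cursor_fields")
    return problems


stream = {
    "name": "stream_8",
    "supported_sync_modes": ["full_refresh", "cdc", "incremental"],
    "available_cursor_fields": ["_id", "name", "marks", "updated_at"],
    "sync_mode": "incremental",
    "cursor_field": "updated_at",
}
ok_result = check_stream(stream)

# "created_at" is a made-up secondary cursor that the stream does not offer.
bad_result = check_stream({**stream, "cursor_field": "updated_at:created_at"})
print(ok_result, bad_result)
```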

For more information about partition_regex, refer to Iceberg Partition Documentation or S3 Partition Documentation.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!
