Last updated:6/23/2025|... min read

`streams.json`

This document explains the structure and contents of your streams.json file, which is used to configure and manage data streams. It covers the following topics:

Overall File Structure
Selected Streams
Streams and Their Configuration
Type Schema: Properties and Data Types
Key-Value Pair Explanation
Synchronization Modes

Below is a consolidated summary table that captures the key components of the streams.json file, followed by a single sample configuration file.

Streams Configuration Summary

Component	Data Type	Example Value	Description
Overall Structure	-	Two sections: `selected_streams` and `streams`	The file is divided into a section for selected streams (grouped by namespace) and detailed stream definitions.
namespace (in selected_streams)	string	`otter_db`	Groups streams by a logical or physical data source.
stream_name (in selected_streams)	string	`"stream_0"`, `"stream_8"`	Identifies the stream and must match the name used in the streams section.
partition_regex (in selected_streams)	string	`{now(),2025,YYYY}-{now(),06,MM}-{now(),13,DD}/{string_change_language,,}` or `{,1999,YYYY}-{,09,MM}-{,31,DD}/{latest_revision,,}`	Defines how to partition data using tokens (like dates or revisions) to organize data into folders or segments.
normalization	boolean	`false`	Determines whether OLake applies level-1 JSON flattening to nested objects. Set to `true` if you require normalized output; otherwise, use `false`.
name (in streams)	string	`"stream_8"`	Unique identifier for a stream's configuration.
namespace (in streams)	string	`"otter_db"`	Groups the stream definition under a specific database or logical category.
type_schema	object	JSON schema (e.g., properties `_id`, `authors`, etc.)	Describes the structure and allowed data types for records in the stream.
supported_sync_modes	array	`["full_refresh", "cdc"]`	Lists synchronization modes supported by the stream—complete reload or incremental updates (CDC).
source_defined_primary_key	array	`["_id"]`	Specifies the field(s) that uniquely identify records in the stream.
available_cursor_fields	array	`[]`	Lists fields that can track sync progress; typically left empty if not used.
sync_mode	string	`"cdc"`	Indicates the active synchronization mode for the stream, either `full_refresh` or `cdc`.
`append_only`	boolean	`false`	The `append_only` flag determines whether records can be written to the iceberg delete file. If set to true, no records will be written to the delete file. Know more about delete file: Iceberg MOR and COW

Sample Configuration File

streams.json
{
  "selected_streams": {
    "otter_db": [
      {
        "partition_regex": "{now(),2025,YYYY}-{now(),06,MM}-{now(),13,DD}/{string_change_language,,}",
        "stream_name": "stream_0",
        "normalization": false,
        "append_only": false
      },
      {
        "partition_regex": "{,1999,YYYY}-{,09,MM}-{,31,DD}/{latest_revision,,}",
        "stream_name": "stream_8",
        "normalization": false,
        "append_only": false
      }
    ]
  },
  "streams": [
    {
      "stream": {
        "name": "stream_8",
        "namespace": "otter_db",
        "type_schema": {
          "properties": {
            "_id": {
              "type": ["string"]
            },
            "authors": {
              "type": ["array"]
            },
            "backreferences": {
              "type": ["array"]
            },
            "birth_date": {
              "type": ["string"]
            }
            // ... additional fields as defined in your schema
          }
        },
        "supported_sync_modes": [
          "full_refresh",
          "cdc"
        ],
        "source_defined_primary_key": [<primary_key>],
        "available_cursor_fields": [<cursor_field>],
        "sync_mode": "cdc"
      }
    }
    // ... additional streams if needed
  ]
}

note

Your streams.json file gets updated (merged) by the discover command (in case new column type or new streams, etc gets added), which generates the latest schema and stream definitions based on your database streams. Refer here for more information.

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!

`streams.json`

Streams Configuration Summary

Need Assistance?

Join our growing community

GitHub

Slack

Twitter

LinkedIn

YouTube

Streams Configuration Summary​

Need Assistance?

Join our growing community

GitHub

Slack

Twitter

LinkedIn

YouTube

Streams Configuration Summary