
streams.json

This document explains the structure and contents of your streams.json file, which is used to configure and manage data streams. It covers the following topics:

  • Overall File Structure
  • Selected Streams
  • Streams and Their Configuration
  • Type Schema: Properties and Data Types
  • Key-Value Pair Explanation
  • Synchronization Modes

Below is a consolidated summary table that captures the key components of the streams.json file, followed by a single sample configuration file.

Streams Configuration Summary

| Component | Data Type | Example Value | Description |
|---|---|---|---|
| Overall structure | n/a | Two sections: selected_streams and streams | The file is divided into selected_streams (the streams to sync, grouped by namespace) and streams (the detailed stream definitions). |
| namespace (in selected_streams) | string (object key) | "otter_db" | Groups the selected streams by a logical or physical data source, such as a database. |
| stream_name (in selected_streams) | string | "stream_0", "stream_8" | Identifies the stream; must match the name of a stream defined in the streams section. |
| partition_regex (in selected_streams) | string | "{now(),2025,YYYY}-{now(),06,MM}-{now(),13,DD}/{string_change_language,,}" or "{,1999,YYYY}-{,09,MM}-{,31,DD}/{latest_revision,,}" | Defines how data is partitioned into folders or segments, using tokens such as dates or revisions. |
| normalization (in selected_streams) | boolean | false | Determines whether OLake applies level-1 JSON flattening to nested objects. Set to true if you need normalized output; otherwise, use false. |
| name (in streams) | string | "stream_8" | Unique identifier for a stream's configuration. |
| namespace (in streams) | string | "otter_db" | Places the stream definition under a specific database or logical category. |
| type_schema | object | JSON schema (e.g., properties _id, authors, etc.) | Describes the structure and allowed data types of records in the stream. |
| supported_sync_modes | array | ["full_refresh", "cdc"] | Lists the synchronization modes the stream supports: full_refresh (complete reload) or cdc (incremental updates via change data capture). |
| source_defined_primary_key | array | ["_id"] | Specifies the field(s) that uniquely identify records in the stream. |
| available_cursor_fields | array | [] | Lists fields that can be used to track sync progress; typically left empty if not used. |
| sync_mode | string | "cdc" | The active synchronization mode for the stream: either full_refresh or cdc. |
| append_only (in selected_streams) | boolean | false | Determines whether records can be written to the Iceberg delete file. If set to true, no records are written to the delete file. To learn more about delete files, see Iceberg MOR and COW. |
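The partition_regex value is built from brace-delimited tokens. As a minimal sketch, assuming each token has the form {column,default,granularity} (inferred from the examples in the table, where the column may be now() or empty and the default or granularity may be empty), the following Python splits a pattern into its tokens. parse_partition_regex and PartitionToken are hypothetical names for illustration, not part of OLake.

```python
import re
from typing import List, NamedTuple

# Matches one {...} token; assumes tokens never contain nested braces.
TOKEN_RE = re.compile(r"\{([^{}]*)\}")


class PartitionToken(NamedTuple):
    # Assumed meaning of the three parts (hypothetical, inferred from examples):
    column: str       # source field, "now()", or empty
    default: str      # fallback value, may be empty
    granularity: str  # e.g. YYYY, MM, DD, or empty


def parse_partition_regex(pattern: str) -> List[PartitionToken]:
    """Split a partition_regex string into its {column,default,granularity} tokens."""
    tokens = []
    for body in TOKEN_RE.findall(pattern):
        parts = [p.strip() for p in body.split(",")]
        if len(parts) != 3:
            raise ValueError(f"expected 3 comma-separated parts in {{{body}}}")
        tokens.append(PartitionToken(*parts))
    return tokens


if __name__ == "__main__":
    pattern = "{now(),2025,YYYY}-{now(),06,MM}-{now(),13,DD}/{string_change_language,,}"
    for tok in parse_partition_regex(pattern):
        print(tok)
```

Literal characters between tokens (such as "-" and "/") are ignored by this sketch; they only shape the resulting folder path.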
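To illustrate what level-1 flattening of nested objects means, here is a minimal sketch that promotes the keys of each top-level nested object by one level and leaves deeper objects and arrays unchanged. The "_" separator and the flatten_level1 helper are assumptions for illustration; OLake's actual column naming may differ.

```python
from typing import Any, Dict


def flatten_level1(record: Dict[str, Any], sep: str = "_") -> Dict[str, Any]:
    """Hypothetical helper: flatten only the first level of nested objects."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Promote each child key one level up, e.g. author.name -> author_name.
            # Deeper objects stay nested as values.
            for child_key, child_value in value.items():
                flat[f"{key}{sep}{child_key}"] = child_value
        else:
            # Scalars and arrays are copied unchanged.
            flat[key] = value
    return flat


if __name__ == "__main__":
    doc = {"_id": "a1", "author": {"name": "Ada", "meta": {"born": 1815}}, "tags": ["x"]}
    print(flatten_level1(doc))
```

With this input, author becomes author_name and author_meta, while author_meta still holds the nested object {"born": 1815}: only the first level of nesting is flattened.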

Sample Configuration File

streams.json
{
  "selected_streams": {
    "otter_db": [
      {
        "partition_regex": "{now(),2025,YYYY}-{now(),06,MM}-{now(),13,DD}/{string_change_language,,}",
        "stream_name": "stream_0",
        "normalization": false,
        "append_only": false
      },
      {
        "partition_regex": "{,1999,YYYY}-{,09,MM}-{,31,DD}/{latest_revision,,}",
        "stream_name": "stream_8",
        "normalization": false,
        "append_only": false
      }
    ]
  },
  "streams": [
    {
      "stream": {
        "name": "stream_8",
        "namespace": "otter_db",
        "type_schema": {
          "properties": {
            "_id": {
              "type": ["string"]
            },
            "authors": {
              "type": ["array"]
            },
            "backreferences": {
              "type": ["array"]
            },
            "birth_date": {
              "type": ["string"]
            }
            // ... additional fields as defined in your schema
          }
        },
        "supported_sync_modes": [
          "full_refresh",
          "cdc"
        ],
        "source_defined_primary_key": ["_id"],
        "available_cursor_fields": [],
        "sync_mode": "cdc"
      }
    }
    // ... additional streams if needed
  ]
}
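Because each stream_name in selected_streams must match a stream defined in streams, it can help to check a file before running a sync. The sketch below uses a hypothetical helper, check_streams_config, not an OLake command. It loads a comment-free excerpt of the sample above trimmed to stream_8 (JSON itself does not allow // comments), matches selections to definitions by namespace and name (an assumption), and confirms that each sync_mode appears in that stream's supported_sync_modes.

```python
import json
from typing import List

# Comment-free excerpt of the sample streams.json, trimmed to stream_8.
SAMPLE = """
{
  "selected_streams": {
    "otter_db": [
      {"partition_regex": "{,1999,YYYY}-{,09,MM}-{,31,DD}/{latest_revision,,}",
       "stream_name": "stream_8", "normalization": false, "append_only": false}
    ]
  },
  "streams": [
    {
      "stream": {
        "name": "stream_8",
        "namespace": "otter_db",
        "type_schema": {"properties": {"_id": {"type": ["string"]}}},
        "supported_sync_modes": ["full_refresh", "cdc"],
        "source_defined_primary_key": ["_id"],
        "available_cursor_fields": [],
        "sync_mode": "cdc"
      }
    }
  ]
}
"""


def check_streams_config(config: dict) -> List[str]:
    """Hypothetical checker: return consistency problems (empty list if none found)."""
    problems = []
    defined = {}
    for entry in config.get("streams", []):
        stream = entry["stream"]
        # Index definitions by (namespace, name) so selections can be matched.
        defined[(stream["namespace"], stream["name"])] = stream
        if stream.get("sync_mode") not in stream.get("supported_sync_modes", []):
            problems.append(f"{stream['name']}: sync_mode {stream.get('sync_mode')!r} is not supported")

    for namespace, selections in config.get("selected_streams", {}).items():
        for sel in selections:
            if (namespace, sel["stream_name"]) not in defined:
                problems.append(f"{namespace}.{sel['stream_name']}: selected but not defined in streams")
    return problems


if __name__ == "__main__":
    print(check_streams_config(json.loads(SAMPLE)) or "OK")
```

Running the check on the full sample, with stream_0 selected but its definition elided, would report stream_0 as selected but not defined.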
Note: The discover command updates (merges) your streams.json file when the source changes, for example when new columns, new column types, or new streams are added. It regenerates the latest schema and stream definitions from your database. See the discover command documentation for more information.
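The following sketch illustrates one possible merge policy when discovery output is combined with an existing file: the freshly discovered definitions supply new streams, new columns, and updated types, while a previously chosen sync_mode is kept if the stream still supports it. This policy and the merge_streams helper are assumptions for illustration only; the discover command documentation describes what OLake actually preserves.

```python
import copy
from typing import List


def merge_streams(existing: List[dict], discovered: List[dict]) -> List[dict]:
    """Hypothetical merge policy (not OLake's actual rules):
    discovered schema wins, but a user's sync_mode choice is kept."""
    previous = {(e["stream"]["namespace"], e["stream"]["name"]): e["stream"] for e in existing}
    merged = []
    for entry in discovered:
        fresh = copy.deepcopy(entry)  # never mutate the discovery output
        stream = fresh["stream"]
        old = previous.get((stream["namespace"], stream["name"]))
        # Carry over the previously chosen sync_mode if the source still supports it.
        if old is not None and old.get("sync_mode") in stream.get("supported_sync_modes", []):
            stream["sync_mode"] = old["sync_mode"]
        merged.append(fresh)
    return merged
```

Under this policy, a stream that disappears from the source also disappears from the merged list; a real merge might choose differently.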


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: discuss the roadmap and bugs with us, and get help debugging any issues you are facing.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!