state.json

The state.json file is a critical component in OLake's synchronization process. It records the point (as a cursor token, resume token, or offset) up to which data has been processed. This lets OLake resume syncing from where it left off instead of reprocessing records it has already handled.

File Structure

A typical state.json file has the following structure:

state.json
{
  "type": "STREAM",
  "streams": [
    {
      "stream": "stream_8",
      "namespace": "otter_db",
      "sync_mode": "",
      "state": {
        "_data": "8267B34D61000000022B0429296E1404"
      }
    },
    {
      "stream": "stream_0",
      "namespace": "otter_db",
      "sync_mode": "",
      "state": {
        "_data": "8267B34D61000000022B0429296E1404"
      }
    }
  ]
}
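
As a minimal sketch of how a file with this structure could be read (this is not OLake code; the `stream_tokens` helper and the inline sample are introduced here only for illustration), the following Python snippet maps each stream to its saved state object:

```python
import json

# Inline copy of the sample above so the snippet runs on its own.
SAMPLE_STATE = """
{
  "type": "STREAM",
  "streams": [
    {"stream": "stream_8", "namespace": "otter_db", "sync_mode": "",
     "state": {"_data": "8267B34D61000000022B0429296E1404"}},
    {"stream": "stream_0", "namespace": "otter_db", "sync_mode": "",
     "state": {"_data": "8267B34D61000000022B0429296E1404"}}
  ]
}
"""


def stream_tokens(state_text):
    """Return {(namespace, stream): state object} for every stream entry."""
    doc = json.loads(state_text)
    return {(s["namespace"], s["stream"]): s["state"] for s in doc["streams"]}


tokens = stream_tokens(SAMPLE_STATE)
for (namespace, stream), state in tokens.items():
    print(f"{namespace}.{stream}: {state}")
```

Keying on the (namespace, stream) pair is a choice made for this sketch, so that two streams with the same name in different namespaces would not collide.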

Key Components

| Key | Data Type | Description | Sample Value |
| --- | --- | --- | --- |
| type | string | Identifies the type of state stored. Typically, it is set to "STREAM". | "STREAM" |
| streams | array | An array containing state objects for each stream. | [ { ... }, { ... } ] |
| stream | string | The unique identifier for the stream whose state is recorded. | "stream_8" or "stream_0" |
| namespace | string | The namespace or logical grouping the stream belongs to. | "otter_db" |
| sync_mode | string | Indicates the active synchronization mode for the stream. This value may be empty or contain a specific mode. | "" (empty string) or sync modes like "cdc", "full_refresh", "incremental" (WIP) |
| state | object | Contains the resume token or offset. This token determines the point until which data has been synced. | { "_data": "8267B34D61000000022B0429296E1404" } |

See the sync modes documentation for more about each mode.
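
The keys and data types listed above can be checked mechanically. The following is a hedged sketch of such a check; `validate_state` is a hypothetical helper written for this page, not an OLake API, and it only verifies the types listed in the table, not whether a token is meaningful to the source database.

```python
from typing import Any

# Expected Python type for each key of a stream entry, per the table above.
STREAM_KEYS = (
    ("stream", str),
    ("namespace", str),
    ("sync_mode", str),
    ("state", dict),
)


def validate_state(doc: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not isinstance(doc.get("type"), str):
        problems.append("type must be a string")
    streams = doc.get("streams")
    if not isinstance(streams, list):
        problems.append("streams must be an array")
        return problems
    for i, entry in enumerate(streams):
        if not isinstance(entry, dict):
            problems.append(f"streams[{i}] must be an object")
            continue
        for key, expected in STREAM_KEYS:
            if not isinstance(entry.get(key), expected):
                problems.append(f"streams[{i}].{key} must be {expected.__name__}")
    return problems


example = {
    "type": "STREAM",
    "streams": [
        {"stream": "stream_8", "namespace": "otter_db", "sync_mode": "",
         "state": {"_data": "8267B34D61000000022B0429296E1404"}},
    ],
}
print(validate_state(example))  # prints []
```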

How It Works

  • Resume Token / Offset:
    The value stored in the state object (in the _data field) represents the cursor token, resume token (in MongoDB), or offset (in other databases) indicating the last processed record.
  • Incremental Syncing:
    By keeping track of the token, OLake can start the next sync run from this point, ensuring that previously processed records are not re-fetched.
  • Multiple Streams:
    Each stream in the streams array maintains its own synchronization state. This allows OLake to handle multiple data sources or partitions independently.
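
The resume behaviour described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not OLake's implementation: the in-memory `source` lists stand in for a real database, the integer `offset` key inside `state` is hypothetical (the MongoDB sample on this page stores a resume token under `_data` instead), and the helper names are invented for the example.

```python
import json
import os
import tempfile


def load_state(path):
    """Read the state file, or start with an empty state on the first run."""
    if not os.path.exists(path):
        return {"type": "STREAM", "streams": []}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_state(path, doc):
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(doc, f, indent=2)
    os.replace(tmp, path)


def find_entry(doc, namespace, stream):
    """Return the state entry for a stream, creating it if it is missing."""
    for entry in doc["streams"]:
        if entry["namespace"] == namespace and entry["stream"] == stream:
            return entry
    entry = {"stream": stream, "namespace": namespace, "sync_mode": "", "state": {}}
    doc["streams"].append(entry)
    return entry


def sync_stream(doc, namespace, stream, records):
    """Return only the records past the saved offset and advance the offset."""
    entry = find_entry(doc, namespace, stream)
    offset = entry["state"].get("offset", 0)  # hypothetical integer offset
    new_records = records[offset:]
    entry["state"]["offset"] = offset + len(new_records)
    return new_records


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "state.json")
    source = {"stream_0": ["a", "b", "c"], "stream_8": ["x"]}

    # First run: no state file yet, so every record is processed.
    doc = load_state(path)
    first_run = {n: sync_stream(doc, "otter_db", n, r) for n, r in source.items()}
    save_state(path, doc)

    source["stream_0"].append("d")  # new data arrives in one stream only

    # Second run: each stream resumes from its own saved offset.
    doc = load_state(path)
    second_run = {n: sync_stream(doc, "otter_db", n, r) for n, r in source.items()}
    save_state(path, doc)

print(first_run)   # {'stream_0': ['a', 'b', 'c'], 'stream_8': ['x']}
print(second_run)  # {'stream_0': ['d'], 'stream_8': []}
```

The second run touches only the record added after the first run, and `stream_8` yields nothing because its offset is tracked independently of `stream_0`.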

Benefits

  • Efficiency:
    Incremental synchronization reduces data transfer and processing by only fetching new or changed records.
  • Data Consistency:
    Tracking the synchronization state prevents duplicate processing, ensuring that data remains consistent.
  • Flexibility:
    The state mechanism supports various data sources (e.g., MongoDB with resume tokens, other databases with offsets), making it adaptable to different backend systems.

If you have any further questions or need additional guidance on setting up your state configuration, please refer to the OLake documentation or contact support.

