Last updated:|... min read
Key | Data Type | Description | Sample Value |
---|---|---|---|
type | string | Identifies the type of state stored. Typically, it is set to "STREAM" . | "STREAM" |
streams | array | An array containing state objects for each stream. | [ { ... }, { ... } ] |
stream | string | The unique identifier for the stream whose state is recorded. | "stream_8" or "stream_0" |
namespace | string | The namespace or logical grouping the stream belongs to. | "otter_db" |
sync_mode | string | Indicates the active synchronization mode for the stream. This value may be empty or contain a specific mode. | "" (empty string) or sync modes like "cdc" , "full_refresh" , "incremental" (WIP) |
state | object | Contains the resume token or offset. This token determines the point until which data has been synced. | { "_data": "8267B34D61000000022B0429296E1404" } |
Refer here for more about sync modes.
How It Works
- Resume Token / Offset:
The value stored in thestate
object (in the_data
field) represents the cursor token, resume token (in MongoDB), or offset (in other databases) indicating the last processed record. - Incremental Syncing:
By keeping track of the token, OLake can start the next sync run from this point, ensuring that previously processed records are not re-fetched. - Multiple Streams:
Each stream in thestreams
array maintains its own synchronization state. This allows OLake to handle multiple data sources or partitions independently.
Benefits
- Efficiency:
Incremental synchronization reduces data transfer and processing by only fetching new or changed records. - Data Consistency:
Tracking the synchronization state prevents duplicate processing, ensuring that data remains consistent. - Flexibility:
The state mechanism supports various data sources (e.g., MongoDB with resume tokens, other databases with offsets), making it adaptable to different backend systems.