
Apache Flink 1.18+

The reference implementation for CDC to Iceberg with comprehensive streaming support, exactly-once semantics, and advanced incremental reads

Key Features

Comprehensive Catalog Support (Full Support, 100)

Hive Metastore, Hadoop catalog, REST catalog (incl. Nessie), AWS Glue, JDBC, plus any custom implementation via catalog-impl

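As a concrete starting point, here is a minimal Java sketch (Flink Table API) that registers a Hive-Metastore-backed Iceberg catalog; the metastore URI and warehouse path are placeholder values, and the same DDL accepts 'hadoop', 'rest', 'glue', 'jdbc', or a custom catalog-impl:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IcebergCatalogSetup {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hive-Metastore-backed Iceberg catalog; the URI and warehouse
        // path are placeholders for your environment.
        tEnv.executeSql(
                "CREATE CATALOG iceberg_hive WITH ("
                        + " 'type'='iceberg',"
                        + " 'catalog-type'='hive',"
                        + " 'uri'='thrift://metastore-host:9083',"
                        + " 'warehouse'='s3://example-bucket/warehouse'"
                        + ")");
        tEnv.executeSql("USE CATALOG iceberg_hive");
    }
}
```

The later sketches on this page reuse this `tEnv` and catalog rather than repeating the boilerplate.
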
Streaming & CDC Excellence (Reference Engine, 100)

Reference engine for CDC → Iceberg: consume Debezium/Kafka changelogs, upsert with exactly-once semantics, FLIP-27 incremental reads

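A hedged sketch of that path, reusing the `tEnv` from the catalog sketch above: a Kafka topic carrying Debezium JSON changelogs is exposed as a table and continuously applied to Iceberg. Topic, broker, and table names are illustrative; the target is the spec v2 upsert table created in the UPSERT sketch further down.

```java
// Changelog source: Kafka topic with Debezium JSON envelopes.
tEnv.executeSql(
        "CREATE TEMPORARY TABLE orders_cdc ("
                + " id BIGINT,"
                + " amount DECIMAL(10,2)"
                + ") WITH ("
                + " 'connector'='kafka',"
                + " 'topic'='orders',"
                + " 'properties.bootstrap.servers'='kafka:9092',"
                + " 'scan.startup.mode'='earliest-offset',"
                + " 'format'='debezium-json'"
                + ")");

// Continuous pipeline: inserts, updates, and deletes from the changelog
// are applied to the Iceberg table; commits happen on each checkpoint.
tEnv.executeSql("INSERT INTO iceberg_hive.db.orders SELECT * FROM orders_cdc");
```
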
Batch and Real-time Processing (Full Support, 100)

Batch and streaming jobs read snapshots or incremental DataStreams; the Iceberg sink commits on each checkpoint with exactly-once semantics

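For example (again reusing `tEnv`; table name and poll interval are illustrative), the same table can be read as a one-shot batch scan or as a continuous incremental stream via Iceberg's read options in a SQL hint:

```java
// One-shot read of the current snapshot.
tEnv.executeSql("SELECT * FROM iceberg_hive.db.orders").print();

// Continuous incremental read: poll for newly committed snapshots
// every 30 seconds (requires streaming execution mode).
tEnv.executeSql(
        "SELECT * FROM iceberg_hive.db.orders "
                + "/*+ OPTIONS('streaming'='true', 'monitor-interval'='30s') */")
        .print();
```
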
UPSERT and Row-level Operations (Partial Support, 75)

INSERT append always available; row-level changes via write.upsert.enabled=true on spec v2 tables; MERGE INTO not supported in Flink SQL

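A minimal sketch of the upsert path, assuming the catalog from the first sketch (database and column names are illustrative): the primary key becomes the equality field set, and a plain INSERT INTO then behaves as an upsert:

```java
// Spec v2 table with a primary key; upsert writes emit equality deletes
// for changed keys instead of rewriting data files.
tEnv.executeSql(
        "CREATE TABLE iceberg_hive.db.orders ("
                + " id BIGINT,"
                + " amount DECIMAL(10,2),"
                + " PRIMARY KEY (id) NOT ENFORCED"
                + ") WITH ("
                + " 'format-version'='2',"
                + " 'write.upsert.enabled'='true'"
                + ")");

// Re-inserting an existing key replaces the previous row.
tEnv.executeSql("INSERT INTO iceberg_hive.db.orders VALUES (1, 19.99), (2, 5.00)");
```
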
MoR/CoW Storage Strategies (Full Support, 100)

Copy-on-Write for static batch rewrites; Merge-on-Read for streaming/UPSERT with delete files instead of partition rewrites

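Streaming upserts from Flink always take the Merge-on-Read path (delete files alongside data files). The standard Iceberg write-mode table properties can additionally be declared through Flink's ALTER TABLE property support, as in this hedged sketch:

```java
// Declare merge-on-read for row-level delete/update operations via
// standard Iceberg table properties (Flink DDL can change properties only).
tEnv.executeSql(
        "ALTER TABLE iceberg_hive.db.orders SET ("
                + " 'write.delete.mode'='merge-on-read',"
                + " 'write.update.mode'='merge-on-read'"
                + ")");
```
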
Format V3 Support (GA, 95)

GA read + write with Flink 1.18+ and Iceberg 1.8+; Binary Deletion Vectors, Row Lineage, new data types, multi-argument transforms

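Opting a table into V3 is a table property at creation time; a minimal sketch, assuming an Iceberg 1.8+ runtime jar on the Flink classpath (table and column names are illustrative):

```java
// 'format-version'='3' enables V3 features such as binary deletion
// vectors and row lineage at the table-format level.
tEnv.executeSql(
        "CREATE TABLE iceberg_hive.db.events_v3 ("
                + " id BIGINT,"
                + " payload STRING"
                + ") WITH ("
                + " 'format-version'='3'"
                + ")");
```
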
Time Travel & Incremental Reads (Full Support, 100)

Filter push-down + partition pruning automatic; point-in-time reads via source options: start-snapshot-id, start-snapshot-timestamp, branch, tag

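These options are passed as SQL hints (or as DataStream source options); a sketch with placeholder snapshot ids and tag names:

```java
// Point-in-time batch read pinned to a specific snapshot (id is a placeholder).
tEnv.executeSql(
        "SELECT * FROM iceberg_hive.db.orders "
                + "/*+ OPTIONS('snapshot-id'='3821550127947089987') */");

// Incremental streaming read of everything committed after a known snapshot.
tEnv.executeSql(
        "SELECT * FROM iceberg_hive.db.orders "
                + "/*+ OPTIONS('streaming'='true',"
                + " 'start-snapshot-id'='3821550127947089987') */");

// Read from a named tag (use 'branch' for a branch instead).
tEnv.executeSql(
        "SELECT * FROM iceberg_hive.db.orders /*+ OPTIONS('tag'='v1') */");
```
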
Enterprise Security (Delegated, 100)

Inherits ACLs from underlying catalog (Hive Ranger, AWS IAM, Nessie authorization); REST catalog secured with credential/token properties

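For the REST case, credentials are plain catalog properties; a hedged sketch with placeholder endpoint and secret (authorization itself is enforced server-side by the catalog):

```java
// REST catalog with OAuth2 client credentials; a pre-issued bearer
// token can be supplied via the 'token' property instead.
tEnv.executeSql(
        "CREATE CATALOG iceberg_rest WITH ("
                + " 'type'='iceberg',"
                + " 'catalog-type'='rest',"
                + " 'uri'='https://catalog.example.com/api',"
                + " 'credential'='client-id:client-secret'"
                + ")");
```
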

Apache Flink Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Flink 1.18+

| Dimension | Support Level | Implementation Details | Min Version |
| --- | --- | --- | --- |
| Catalog Types | Full (Complete) | Hive, Hadoop, REST (incl. Nessie), AWS Glue, JDBC, custom implementations | 1.18+ |
| Batch & Streaming Reads | Full (Complete) | Snapshot reads, incremental DataStreams, FLIP-27 source, exactly-once semantics | 1.18+ |
| Streaming Writes | Full (Exactly-Once) | Checkpoint-based commits, INSERT INTO, automatic snapshot creation | 1.18+ |
| DML Operations | Partial (UPSERT Only) | INSERT always available; UPSERT via write.upsert.enabled; no MERGE INTO in SQL | 1.18+ |
| CDC Integration | Full (Reference Engine) | Native Debezium/Kafka CDC, Flink CDC connectors, pipeline connectors | 1.18+ |
| Format V3 Support | Full (GA) | Deletion vectors, row lineage, new types (Flink 1.18+ and Iceberg 1.8+) | 1.18+ |
| Time Travel | Full (Source Options) | start-snapshot-id, start-snapshot-timestamp, branch, tag options | 1.18+ |
| Schema Evolution | Limited (DDL Restrictions) | ALTER TABLE properties only; no ADD/RENAME columns via SQL | 1.18+ |
| Table Maintenance | Full (Actions API) | Actions.rewriteDataFiles(), expire snapshots, remove orphans as batch jobs | 1.18+ |
| Security & Governance | Full (Catalog Delegated) | Inherits catalog ACLs (Ranger, IAM, Nessie); REST auth with tokens | 1.18+ |
| DDL Limitations | Limited (Known Issues) | No computed columns, watermarks, or column ADD/RENAME in Iceberg DDL | N/A |
| SQL MERGE Operations | None (Not Supported) | MERGE INTO not available in Flink SQL; use UPSERT mode instead | N/A |

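The Table Maintenance row refers to the Actions API shipped in the Iceberg Flink runtime; a minimal sketch of a standalone compaction batch job (the table location is a placeholder, and a catalog-based TableLoader works just as well):

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.actions.Actions;

public class CompactOrders {
    public static void main(String[] args) throws Exception {
        // Load the table directly from its storage location.
        TableLoader loader =
                TableLoader.fromHadoopTable("s3://example-bucket/warehouse/db/orders");
        loader.open();
        Table table = loader.loadTable();

        // Compact small files toward ~512 MB targets as a batch job.
        Actions.forTable(table)
                .rewriteDataFiles()
                .targetSizeInBytes(512L * 1024 * 1024)
                .execute();
    }
}
```
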

Use Cases

Real-time CDC Pipelines

Industry-leading change data capture from databases to data lakes

  • Database-to-lakehouse replication with exactly-once semantics
  • Multi-source CDC aggregation and transformation
  • Real-time data synchronization across systems
  • Event-driven architecture with streaming updates

Stream Processing & Analytics

Complex event processing with stateful computations

  • Real-time fraud detection and alerting
  • IoT sensor data processing and aggregation
  • Financial trading and risk analytics
  • Social media and clickstream analytics

Data Lake Ingestion

High-throughput data ingestion with automatic optimization

  • Kafka-to-Iceberg streaming pipelines
  • Multi-format data ingestion and standardization
  • Schema evolution handling in streaming contexts
  • Automatic data quality validation and cleansing

Incremental ETL Processing

Efficient incremental processing with checkpoint recovery

  • Large-scale incremental transformations
  • Historical data reprocessing with time travel
  • Complex multi-stage pipeline orchestration
  • Fault-tolerant processing with exactly-once guarantees

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss future roadmaps, triage bugs, and help folks debug the issues they're facing.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!