Last updated: 6/30/2025

Apache Flink 1.18+

The reference implementation for CDC to Iceberg, with comprehensive streaming support, exactly-once semantics, and FLIP-27 incremental reads

Key Features

Full Support

Comprehensive Catalog Support

Hive Metastore, Hadoop catalog, REST catalog (incl. Nessie), AWS Glue, JDBC, plus any custom implementation via catalog-impl
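
Catalogs are registered in Flink SQL with a CREATE CATALOG statement. The minimal Python sketch below (standard library only) renders that DDL for three of the catalog types above. You would submit the resulting strings through the Flink SQL client or PyFlink's `TableEnvironment.execute_sql`. The catalog names, hosts, and warehouse paths are placeholders. The property keys (`catalog-type`, `catalog-impl`, `uri`, `warehouse`, `io-impl`) reflect our reading of the Iceberg Flink connector docs.

```python
# Minimal sketch: render Flink SQL CREATE CATALOG statements for the Iceberg
# connector. Names, URIs, and warehouse paths below are placeholders.


def create_catalog_ddl(name: str, props: dict[str, str]) -> str:
    """Return a CREATE CATALOG statement with 'type'='iceberg' plus the given properties."""
    all_props = {"type": "iceberg", **props}
    body = ",\n".join(f"  '{k}'='{v}'" for k, v in all_props.items())
    return f"CREATE CATALOG {name} WITH (\n{body}\n);"


# Built-in catalog types are selected with 'catalog-type'.
hive_ddl = create_catalog_ddl("hive_catalog", {
    "catalog-type": "hive",
    "uri": "thrift://metastore-host:9083",
    "warehouse": "hdfs://namenode:8020/warehouse",
})

rest_ddl = create_catalog_ddl("rest_catalog", {
    "catalog-type": "rest",
    "uri": "https://rest-catalog.example.com",
})

# Other implementations (here AWS Glue) are loaded by class name via
# 'catalog-impl'; it is used instead of 'catalog-type', not alongside it.
glue_ddl = create_catalog_ddl("glue_catalog", {
    "catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "warehouse": "s3://my-bucket/warehouse",
})

if __name__ == "__main__":
    for ddl in (hive_ddl, rest_ddl, glue_ddl):
        print(ddl, end="\n\n")
```

A JDBC or Nessie catalog would follow the same `catalog-impl` pattern with its own implementation class and connection properties.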

Reference Engine

Streaming & CDC Excellence

Reference engine for CDC → Iceberg: consume Debezium/Kafka changelogs, upsert with exactly-once semantics, FLIP-27 incremental reads
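
Flink's `debezium-json` format turns each Debezium change event into changelog rows, and the Iceberg sink applies those rows keyed on the table's primary key. The toy model below is plain Python and is not Flink's deserializer. It shows that mapping for the create/snapshot-read, update, and delete op codes, then replays a short change stream into an in-memory table. The field names and events are made up for illustration.

```python
# Toy model (not Flink's actual deserializer): how a Debezium change event maps
# to Flink changelog rows. RowKind shorthand follows Flink:
# +I (INSERT), -U (UPDATE_BEFORE), +U (UPDATE_AFTER), -D (DELETE).
from typing import Any

Row = dict[str, Any]


def debezium_to_changelog(event: dict[str, Any]) -> list[tuple[str, Row]]:
    op = event["op"]
    if op in ("c", "r"):  # create, or initial snapshot read
        return [("+I", event["after"])]
    if op == "u":  # update: retract the old image, emit the new image
        return [("-U", event["before"]), ("+U", event["after"])]
    if op == "d":  # delete: retract the old image
        return [("-D", event["before"])]
    raise ValueError(f"unsupported Debezium op: {op!r}")


def apply_changelog(table: dict[Any, Row], rows: list[tuple[str, Row]], key: str) -> None:
    """Apply changelog rows to an in-memory table keyed by a primary-key column."""
    for kind, row in rows:
        if kind in ("+I", "+U"):
            table[row[key]] = row
        else:  # -U and -D remove the old image for that key
            table.pop(row[key], None)


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]
state: dict[Any, Row] = {}
for e in events:
    apply_changelog(state, debezium_to_changelog(e), key="id")
print(state)  # {1: {'id': 1, 'status': 'paid'}}
```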

Full Support

Batch and Real-time Processing

Batch jobs read table snapshots and streaming jobs read incremental changes as DataStreams; the Iceberg sink commits on each checkpoint with exactly-once semantics
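
The exactly-once guarantee comes from tying Iceberg commits to Flink checkpoints. Rows written since the last checkpoint stay invisible until the checkpoint completes, and after a failure the source rewinds to the checkpointed offset. The toy model below is plain Python and is not the FlinkSink implementation, which writes data files in parallel writer tasks and commits them from a single committer. It collapses the whole pipeline into one object so the replay behavior is easy to follow. Because commits follow checkpoints, commit frequency in practice tracks Flink's `execution.checkpointing.interval` setting.

```python
# Toy model (not Iceberg's FlinkSink code) of checkpoint-aligned commits:
# source offsets and sink commits advance together, so a failure replays
# exactly the rows that were never committed.


class ToyPipeline:
    def __init__(self, source: list[str]) -> None:
        self.source = source
        self.offset = 0                        # next source position to read
        self.checkpointed_offset = 0           # offset stored by the last checkpoint
        self.in_flight: list[str] = []         # written but not yet committed
        self.snapshots: list[list[str]] = []   # committed snapshots (what readers see)

    def process(self, n: int) -> None:
        for _ in range(n):
            if self.offset < len(self.source):
                self.in_flight.append(self.source[self.offset])
                self.offset += 1

    def checkpoint(self) -> None:
        # Toy simplification: the checkpoint completes immediately, so staged
        # rows are committed as one snapshot together with the source offset.
        if self.in_flight:
            self.snapshots.append(self.in_flight)
        self.in_flight = []
        self.checkpointed_offset = self.offset

    def fail_and_recover(self) -> None:
        # Uncommitted rows are discarded and the source rewinds, so they are
        # re-read once instead of being written twice.
        self.in_flight = []
        self.offset = self.checkpointed_offset

    def visible(self) -> list[str]:
        return [row for snap in self.snapshots for row in snap]


pipe = ToyPipeline(["a", "b", "c", "d"])
pipe.process(2)
pipe.checkpoint()        # snapshot 1: a, b
pipe.process(1)
pipe.fail_and_recover()  # "c" was written but never committed
pipe.process(2)
pipe.checkpoint()        # snapshot 2: c, d
print(pipe.visible())    # ['a', 'b', 'c', 'd']
```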

Partial Support

UPSERT and Row-level Operations

INSERT (append) is always available; row-level changes use write.upsert.enabled=true on format v2 tables with a primary key; MERGE INTO is not supported in Flink SQL
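
Upsert behavior can be switched on for the whole table with the `write.upsert.enabled` property, or for a single statement with the `upsert-enabled` write option in an OPTIONS hint. The sketch below only renders the two Flink SQL statements as Python strings. The catalog, table, and column names, and the source table `customer_updates` in particular, are hypothetical.

```python
# Minimal sketch: Flink SQL for upsert writes into an Iceberg table.
# Catalog, database, table, and column names are hypothetical.

TABLE = "iceberg_catalog.db.customers"

# Table-level: every write to this format v2 table upserts on the primary key.
create_table = f"""CREATE TABLE {TABLE} (
  id BIGINT,
  email STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'format-version' = '2',
  'write.upsert.enabled' = 'true'
);"""

# Statement-level alternative: enable upsert for one INSERT via a hint.
upsert_insert = f"""INSERT INTO {TABLE} /*+ OPTIONS('upsert-enabled'='true') */
SELECT id, email FROM customer_updates;"""

if __name__ == "__main__":
    print(create_table)
    print(upsert_insert)
```

Upsert mode relies on the primary key as its equality fields. On a partitioned table, the partition columns must be part of that key, and upsert cannot be combined with INSERT OVERWRITE.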

Full Support

MoR/CoW Storage Strategies

Copy-on-Write for static batch rewrites; Merge-on-Read for streaming and UPSERT writes, which add delete files instead of rewriting partitions
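
With merge-on-read, an update does not rewrite the old data file. Instead it adds a delete file plus a new data file, and readers reconcile them at scan time. The toy model below is plain Python and is not Iceberg's reader. It uses the spec's rule for equality deletes: a delete applies only to rows in data files with a strictly lower data sequence number, which is why the re-inserted row survives. File contents and sequence numbers are invented for the example.

```python
# Toy model of merge-on-read with equality deletes (not Iceberg's reader).
# An equality delete removes rows whose key matches AND whose data file has
# a strictly lower data sequence number than the delete file.
from dataclasses import dataclass


@dataclass
class DataFile:
    seq: int
    rows: list[dict]


@dataclass
class EqualityDeleteFile:
    seq: int
    keys: set[int]  # deleted values of the equality field (here: "id")


def merge_on_read(data_files: list[DataFile],
                  delete_files: list[EqualityDeleteFile],
                  key: str = "id") -> list[dict]:
    result = []
    for df in data_files:
        for row in df.rows:
            deleted = any(row[key] in d.keys and df.seq < d.seq for d in delete_files)
            if not deleted:
                result.append(row)
    return result


# Commit 1 writes two rows; commit 2 upserts id=1 as an equality delete plus
# a new data file, leaving commit 1's data file untouched.
files = [
    DataFile(seq=1, rows=[{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]),
    DataFile(seq=2, rows=[{"id": 1, "v": "new"}]),
]
deletes = [EqualityDeleteFile(seq=2, keys={1})]
print(merge_on_read(files, deletes))  # [{'id': 2, 'v': 'keep'}, {'id': 1, 'v': 'new'}]
```

The cost moves to read time, which is why compaction later folds accumulated delete files back into rewritten data files.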

Format V3 Support

GA read + write with Flink 1.18+ and Iceberg 1.8+; Binary Deletion Vectors, Row Lineage, new data types, multi-argument transforms
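
A v3 deletion vector records, for one data file, which row positions have been deleted, taking the place of v2 position-delete files. The toy model below uses a Python int as the bitmap. The actual format stores a compressed Roaring bitmap in a Puffin file, so treat this only as an illustration of the idea. The rows are made up.

```python
# Toy model of a deletion vector: a per-data-file bitmap of deleted row
# positions. A plain Python int stands in for the Roaring bitmap that
# Iceberg v3 stores in Puffin files.


class ToyDeletionVector:
    def __init__(self) -> None:
        self.bits = 0

    def delete(self, position: int) -> None:
        self.bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return ((self.bits >> position) & 1) == 1

    def cardinality(self) -> int:
        return bin(self.bits).count("1")


def live_rows(rows: list[str], dv: ToyDeletionVector) -> list[str]:
    """Return rows whose position is not marked in the deletion vector."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


rows = ["r0", "r1", "r2", "r3"]
dv = ToyDeletionVector()
dv.delete(1)
dv.delete(3)
print(live_rows(rows, dv))  # ['r0', 'r2']
print(dv.cardinality())     # 2
```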

Full Support

Time Travel & Incremental Reads

Filter push-down and partition pruning are automatic; incremental reads start from start-snapshot-id or start-snapshot-timestamp, and point-in-time reads can target a branch or tag, all via source options
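
In Flink SQL these source options are usually passed per query through an OPTIONS hint. The Python sketch below only renders such queries. The table name, snapshot id, timestamp, branch, and tag are placeholders. Depending on the Iceberg version and source implementation, incremental streaming reads may also need a `starting-strategy` option.

```python
# Minimal sketch: build Flink SQL queries that pass Iceberg read options
# through an OPTIONS hint. All concrete values below are placeholders.


def select_with_options(table: str, **options: str) -> str:
    """Render SELECT * with an OPTIONS hint; '_' in keyword names becomes '-'."""
    pairs = ", ".join(
        f"'{key.replace('_', '-')}'='{value}'" for key, value in options.items()
    )
    return f"SELECT * FROM {table} /*+ OPTIONS({pairs}) */"


T = "iceberg_catalog.db.events"

# Streaming read of changes committed after a given snapshot.
incremental = select_with_options(
    T, streaming="true", monitor_interval="30s", start_snapshot_id="1234567890"
)
# Streaming read starting from a point in time (epoch milliseconds).
from_timestamp = select_with_options(
    T, streaming="true", start_snapshot_timestamp="1719705600000"
)
# Batch reads pinned to a named branch or tag.
on_branch = select_with_options(T, branch="audit")
on_tag = select_with_options(T, tag="eod-2025-06-30")

if __name__ == "__main__":
    for query in (incremental, from_timestamp, on_branch, on_tag):
        print(query)
```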

Delegated

Enterprise Security

Inherits ACLs from underlying catalog (Hive Ranger, AWS IAM, Nessie authorization); REST catalog secured with credential/token properties
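
A secured REST catalog is configured with either the `token` property (a pre-issued bearer token) or the `credential` property (an OAuth2 `client_id:client_secret` pair). The sketch below reads the secret from an environment variable rather than hard-coding it. The variable names `ICEBERG_REST_TOKEN` and `ICEBERG_REST_CREDENTIAL`, the catalog name, and the URI are hypothetical.

```python
# Minimal sketch: REST catalog DDL whose secret comes from the environment.
# ICEBERG_REST_TOKEN / ICEBERG_REST_CREDENTIAL are hypothetical variable names.
import os


def rest_catalog_ddl(name: str, uri: str) -> str:
    props = {"type": "iceberg", "catalog-type": "rest", "uri": uri}
    token = os.environ.get("ICEBERG_REST_TOKEN")
    credential = os.environ.get("ICEBERG_REST_CREDENTIAL")  # "client_id:client_secret"
    if token:
        props["token"] = token            # pre-issued bearer token
    elif credential:
        props["credential"] = credential  # OAuth2 client credentials
    else:
        raise RuntimeError("set ICEBERG_REST_TOKEN or ICEBERG_REST_CREDENTIAL")
    body = ",\n".join(f"  '{k}'='{v}'" for k, v in props.items())
    return f"CREATE CATALOG {name} WITH (\n{body}\n);"
```

For example, with `ICEBERG_REST_TOKEN` exported, `rest_catalog_ddl("rest_catalog", "https://rest-catalog.example.com")` returns a statement that includes the `'token'` property. The rendered statement contains the secret, so avoid logging it.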

Apache Flink Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Flink 1.18+

Dimension	Support Level	Implementation Details	Min Flink Version
Catalog Types	Full (Complete)	Hive, Hadoop, REST (incl. Nessie), AWS Glue, JDBC, custom implementations	1.18+
Batch & Streaming Reads	Full (Complete)	Snapshot reads, incremental DataStreams, FLIP-27 source, exactly-once semantics	1.18+
Streaming Writes	Full (Exactly-Once)	Checkpoint-based commits, INSERT INTO, automatic snapshot creation	1.18+
DML Operations	Partial (UPSERT Only)	INSERT always available; UPSERT via write.upsert.enabled; no MERGE INTO in SQL	1.18+
CDC Integration	Full (Reference Engine)	Native Debezium/Kafka CDC, Flink CDC connectors, pipeline connectors	1.18+
Format V3 Support	Full (GA)	Deletion vectors, row lineage, new types (Flink 1.18+ with Iceberg 1.8+)	1.18+
Time Travel	Full (Source Options)	start-snapshot-id, start-snapshot-timestamp, branch, tag options	1.18+
Schema Evolution	Limited (DDL Restrictions)	ALTER TABLE properties only; no ADD/RENAME columns via SQL	1.18+
Table Maintenance	Full (Actions API)	Actions.forTable(table).rewriteDataFiles(); expire snapshots and remove orphan files as batch jobs	1.18+
Security & Governance	Full (Catalog Delegated)	Inherits catalog ACLs (Ranger, IAM, Nessie); REST auth with tokens	1.18+
DDL Limitations	Limited (Known Issues)	No computed columns, watermarks, or column ADD/RENAME in Iceberg DDL	N/A
SQL MERGE Operations	None (Not Supported)	MERGE INTO not available in Flink SQL; use UPSERT mode instead	N/A
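
Compaction with `Actions.forTable(table).rewriteDataFiles()` combines many small files into fewer files near a target size, set with `targetSizeInBytes`. The grouping step is essentially bin packing. The toy planner below uses first-fit decreasing in plain Python to show the idea and is not Iceberg's planning code. The file sizes are invented.

```python
# Toy model of bin-packing compaction planning (not Iceberg's planner):
# group small data files into rewrite groups no larger than a target size.


def plan_bin_pack(file_sizes: list[int], target_bytes: int) -> list[list[int]]:
    groups: list[list[int]] = []
    totals: list[int] = []
    # First-fit decreasing: place the largest files first.
    for size in sorted(file_sizes, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_bytes:
                groups[i].append(size)
                totals[i] += size
                break
        else:  # no existing group has room: start a new one
            groups.append([size])
            totals.append(size)
    return groups


MB = 1024 * 1024
small_files = [40 * MB, 10 * MB, 90 * MB, 30 * MB, 60 * MB, 20 * MB]
for group in plan_bin_pack(small_files, target_bytes=128 * MB):
    print([s // MB for s in group], sum(group) // MB, "MB")
# [90, 30] 120 MB
# [60, 40, 20] 120 MB
# [10] 10 MB
```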

To report an issue, open one on GitHub.

Use Cases

Real-time CDC Pipelines

Industry-leading change data capture from databases to data lakes

Database-to-lakehouse replication with exactly-once semantics
Multi-source CDC aggregation and transformation
Real-time data synchronization across systems
Event-driven architecture with streaming updates

Stream Processing & Analytics

Complex event processing with stateful computations

Real-time fraud detection and alerting
IoT sensor data processing and aggregation
Financial trading and risk analytics
Social media and clickstream analytics

Data Lake Ingestion

High-throughput data ingestion with automatic optimization

Kafka-to-Iceberg streaming pipelines
Multi-format data ingestion and standardization
Schema evolution handling in streaming contexts
Automatic data quality validation and cleansing

Incremental ETL Processing

Efficient incremental processing with checkpoint recovery

Large-scale incremental transformations
Historical data reprocessing with time travel
Complex multi-stage pipeline orchestration
Fault-tolerant processing with exactly-once guarantees

Resources & Documentation

Official Documentation

Complete API reference and guides

Getting Started Guide

Quick start tutorials and examples

Flink Configuration

Documentation

Flink CDC Documentation

Documentation

Flink Actions API

Documentation

Streaming Best Practices

Documentation

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can reach the OLake community on GitHub, Slack, Twitter, LinkedIn, and YouTube.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!
