Apache Flink 1.18+
The reference engine for CDC into Iceberg, with streaming reads and writes, exactly-once commits, and incremental reads
Key Features
Comprehensive Catalog Support
Hive Metastore, Hadoop catalog, REST catalog (incl. Nessie), AWS Glue, JDBC, plus any custom implementation via catalog-impl
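As a rough illustration of how these catalog types are selected, the sketch below renders Flink SQL `CREATE CATALOG` statements from a property map. The helper names (`sql_quote`, `create_catalog_sql`), the catalog names, and every endpoint and bucket are hypothetical placeholders; the property keys (`type`, `catalog-type`, `catalog-impl`, `io-impl`, `uri`, `warehouse`) are the ones the Iceberg Flink connector documents, but check them against the version you deploy.

```python
def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_catalog_sql(name: str, properties: dict) -> str:
    """Render a Flink SQL CREATE CATALOG statement for an Iceberg catalog."""
    # 'type'='iceberg' selects the Iceberg catalog factory in Flink SQL.
    props = {"type": "iceberg", **properties}
    body = ",\n".join(
        f"  {sql_quote(key)}={sql_quote(value)}" for key, value in props.items()
    )
    return f"CREATE CATALOG {name} WITH (\n{body}\n);"


# Built-in catalog types are chosen with 'catalog-type' (hypothetical endpoints).
hive_catalog = create_catalog_sql("hive_cat", {
    "catalog-type": "hive",
    "uri": "thrift://metastore.example.com:9083",
    "warehouse": "s3://example-bucket/warehouse",
})
rest_catalog = create_catalog_sql("rest_cat", {
    "catalog-type": "rest",
    "uri": "https://catalog.example.com/api",
})
# Other implementations, such as AWS Glue, are loaded by class name via 'catalog-impl'.
glue_catalog = create_catalog_sql("glue_cat", {
    "catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "warehouse": "s3://example-bucket/warehouse",
})

print(hive_catalog)
```

The generated statements can be pasted into the Flink SQL client; only the property map changes between catalog backends.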
Streaming & CDC Excellence
Reference engine for CDC → Iceberg: consume Debezium/Kafka changelogs, upsert with exactly-once semantics, and read incrementally through the FLIP-27 source
Batch and Real-time Processing
Batch and streaming jobs read snapshots or incremental DataStreams; Iceberg Sink commits on each checkpoint with exactly-once semantics
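Because the sink commits when a checkpoint completes, the checkpoint interval bounds how quickly new data becomes visible in the table. A minimal sketch of the corresponding Flink SQL client settings follows; the helper name `checkpoint_settings` and the 60-second interval are illustrative assumptions, while the two configuration keys are standard Flink options.

```python
def checkpoint_settings(interval_seconds: int) -> list:
    """Render Flink SQL client SET statements that enable periodic checkpoints."""
    if interval_seconds <= 0:
        raise ValueError("checkpoint interval must be positive")
    return [
        # Each completed checkpoint triggers one Iceberg commit (one new snapshot).
        f"SET 'execution.checkpointing.interval' = '{interval_seconds}s';",
        # Exactly-once checkpointing is what the sink's commit guarantee relies on.
        "SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';",
    ]


settings = checkpoint_settings(60)
print("\n".join(settings))
```

Shorter intervals lower end-to-end latency but produce more snapshots and smaller files, which increases the need for compaction.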
UPSERT and Row-level Operations
INSERT (append) is always available; row-level changes require write.upsert.enabled=true on a format v2 table; MERGE INTO is not supported in Flink SQL
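One way to picture an upsert-enabled table is the DDL sketched below, rendered by a small helper. The helper `upsert_table_ddl`, the table name, and the columns are hypothetical; the table properties `format-version` and `write.upsert.enabled` and the `PRIMARY KEY ... NOT ENFORCED` clause follow the Iceberg Flink documentation, where the primary key supplies the equality fields used for upserts.

```python
def upsert_table_ddl(table: str, columns: dict, key_columns: list) -> str:
    """Render CREATE TABLE DDL for a format v2 Iceberg table with upsert enabled."""
    missing = [name for name in key_columns if name not in columns]
    if not key_columns or missing:
        raise ValueError(f"key columns must be non-empty and declared: {missing}")
    lines = [f"  {name} {sql_type}" for name, sql_type in columns.items()]
    # The primary key tells the sink which fields identify a row for upserts.
    lines.append(f"  PRIMARY KEY ({', '.join(key_columns)}) NOT ENFORCED")
    column_block = ",\n".join(lines)
    return (
        f"CREATE TABLE {table} (\n{column_block}\n) WITH (\n"
        "  'format-version'='2',\n"
        "  'write.upsert.enabled'='true'\n"
        ");"
    )


ddl = upsert_table_ddl(
    "lake.db.customers",  # hypothetical catalog.database.table
    {"id": "BIGINT", "name": "STRING", "updated_at": "TIMESTAMP(6)"},
    ["id"],
)
print(ddl)
```

With such a table, a plain `INSERT INTO lake.db.customers SELECT ...` from a changelog source is applied as upserts keyed on `id`. For partitioned tables, the partition fields are expected to be part of the key.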
MoR/CoW Storage Strategies
Copy-on-Write for static batch rewrites; Merge-on-Read for streaming/UPSERT, which writes delete files instead of rewriting data files
Format V3 Support
Read and write are GA with Flink 1.18+ and Iceberg 1.8+: binary deletion vectors, row lineage, new data types, and multi-argument transforms
Time Travel & Incremental Reads
Filter push-down and partition pruning are automatic; time travel and incremental reads are configured through source options such as start-snapshot-id, start-snapshot-timestamp, branch, and tag
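In Flink SQL these source options are usually passed as an `OPTIONS` hint on the query. The sketch below renders two such queries; the helper `select_with_options`, the table name, the snapshot id, and the tag name are made-up placeholders, and the option keys (`streaming`, `monitor-interval`, `start-snapshot-id`, `tag`) are taken from the Iceberg Flink read options.

```python
def select_with_options(table: str, options: dict) -> str:
    """Render a SELECT with a Flink SQL OPTIONS hint carrying Iceberg read options."""
    if not options:
        return f"SELECT * FROM {table};"
    hints = ", ".join(f"'{key}'='{value}'" for key, value in options.items())
    return f"SELECT * FROM {table} /*+ OPTIONS({hints}) */;"


# Incremental streaming read starting at a given snapshot (hypothetical id).
incremental = select_with_options("lake.db.events", {
    "streaming": "true",
    "monitor-interval": "10s",
    "start-snapshot-id": "3821550127947089987",
})
# Batch read of the table state referenced by a tag (hypothetical tag name).
tagged = select_with_options("lake.db.events", {"tag": "end-of-quarter"})

print(incremental)
print(tagged)
```

Swapping `tag` for `branch` reads a branch head in the same way.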
Enterprise Security
Inherits ACLs from underlying catalog (Hive Ranger, AWS IAM, Nessie authorization); REST catalog secured with credential/token properties
Apache Flink Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in Apache Flink 1.18+
| Dimension | Support Level | Implementation Details | Min Version |
|---|---|---|---|
| Catalog Types | Full (Complete) | Hive, Hadoop, REST (incl. Nessie), AWS Glue, JDBC, custom implementations | 1.18+ |
| Batch & Streaming Reads | Full (Complete) | Snapshot reads, incremental DataStreams, FLIP-27 source, exactly-once semantics | 1.18+ |
| Streaming Writes | Full (Exactly-Once) | Checkpoint-based commits, INSERT INTO, automatic snapshot creation | 1.18+ |
| DML Operations | Partial (UPSERT Only) | INSERT always available; UPSERT via write.upsert.enabled; no MERGE INTO in SQL | 1.18+ |
| CDC Integration | Full (Reference Engine) | Native Debezium/Kafka CDC, Flink CDC connectors, pipeline connectors | 1.18+ |
| Format V3 Support | Full (GA) | Deletion vectors, row lineage, new types (Flink 1.18+ with Iceberg 1.8+) | 1.18+ |
| Time Travel | Full (Source Options) | start-snapshot-id, start-snapshot-timestamp, branch, tag options | 1.18+ |
| Schema Evolution | Limited (DDL Restrictions) | ALTER TABLE properties only; no ADD/RENAME columns via SQL | 1.18+ |
| Table Maintenance | Full (Actions API) | Actions.rewriteDataFiles(), expire snapshots, remove orphans as batch jobs | 1.18+ |
| Security & Governance | Full (Catalog Delegated) | Inherits catalog ACLs (Ranger, IAM, Nessie); REST auth with tokens | 1.18+ |
| DDL Limitations | Limited (Known Issues) | No computed columns, watermarks, or column ADD/RENAME in Iceberg DDL | N/A |
| SQL MERGE Operations | None (Not Supported) | MERGE INTO not available in Flink SQL; use UPSERT mode instead | N/A |
Use Cases
Real-time CDC Pipelines
Change data capture from operational databases into data lakes
- Database-to-lakehouse replication with exactly-once semantics
- Multi-source CDC aggregation and transformation
- Real-time data synchronization across systems
- Event-driven architecture with streaming updates
Stream Processing & Analytics
Complex event processing with stateful computations
- Real-time fraud detection and alerting
- IoT sensor data processing and aggregation
- Financial trading and risk analytics
- Social media and clickstream analytics
Data Lake Ingestion
High-throughput data ingestion with automatic optimization
- Kafka-to-Iceberg streaming pipelines
- Multi-format data ingestion and standardization
- Schema evolution handling in streaming contexts
- Automatic data quality validation and cleansing
Incremental ETL Processing
Efficient incremental processing with checkpoint recovery
- Large-scale incremental transformations
- Historical data reprocessing with time travel
- Complex multi-stage pipeline orchestration
- Fault-tolerant processing with exactly-once guarantees