Apache Doris v2.1+
MPP analytical database with comprehensive Iceberg read/write capabilities, vectorized execution, materialized view acceleration, and multi-catalog support for lake ingestion and analytics
Key Features
Multi-Catalog Excellence
CREATE CATALOG supports Hive Metastore (HMS), AWS Glue, REST, Hadoop, Alibaba Cloud DLF, and S3 Tables catalogs. Each catalog is configured with a metastore URI or REST endpoint plus credentials
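As a rough illustration of the catalog setup described above, the sketch below renders CREATE CATALOG statements for an HMS-backed and a REST-backed Iceberg catalog. The property keys ("type", "iceberg.catalog.type", "hive.metastore.uris", "uri") follow the Doris 2.1 documentation as recalled here and may differ in other releases; the helper name, catalog names, and endpoints are hypothetical, and credential properties are omitted.

```python
def create_iceberg_catalog_sql(name: str, catalog_type: str, extra: dict[str, str]) -> str:
    """Render a CREATE CATALOG statement for an Iceberg catalog in Doris."""
    # "type" selects the Iceberg connector; "iceberg.catalog.type" selects its backend.
    props = {"type": "iceberg", "iceberg.catalog.type": catalog_type, **extra}
    body = ",\n".join(f'    "{key}" = "{value}"' for key, value in props.items())
    return f"CREATE CATALOG {name} PROPERTIES (\n{body}\n);"


# Hypothetical endpoints: replace with your own metastore / REST catalog.
hms_sql = create_iceberg_catalog_sql(
    "iceberg_hms", "hms", {"hive.metastore.uris": "thrift://metastore.example.internal:9083"}
)
rest_sql = create_iceberg_catalog_sql(
    "iceberg_rest", "rest", {"uri": "http://rest-catalog.example.internal:8181"}
)
print(hms_sql)
print(rest_sql)
```

The rendered statements can be pasted into any MySQL-protocol client connected to the Doris frontend. Check the property names against the documentation for the Doris version in use before relying on them.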
MPP Lake Ingestion Engine
Full SELECT plus write-back through INSERT, INSERT OVERWRITE, and CTAS. Doris writes Parquet/ORC data files and commits Iceberg snapshots, so it can serve as both the lake-ingestion engine and the analytics layer
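The three write paths named above can be sketched as statement templates. This is a minimal illustration only: the helper name and the table names (iceberg_hms.sales.orders_daily, internal.staging.orders) are hypothetical, and the statements are rendered as text rather than executed.

```python
def write_sql(mode: str, target: str, query: str) -> str:
    """Render one of the write-back statements for an Iceberg table."""
    templates = {
        # CTAS: create the Iceberg table from the query result.
        "ctas": "CREATE TABLE {target} AS {query};",
        # Append: add the query result to the existing table.
        "append": "INSERT INTO {target} {query};",
        # Overwrite: replace the table's existing data with the query result.
        "overwrite": "INSERT OVERWRITE TABLE {target} {query};",
    }
    if mode not in templates:
        raise ValueError(f"unknown write mode: {mode!r}")
    return templates[mode].format(target=target, query=query)


# Hypothetical names: an Iceberg target fed from a Doris internal table.
source_query = "SELECT * FROM internal.staging.orders"
for mode in ("ctas", "append", "overwrite"):
    print(write_sql(mode, "iceberg_hms.sales.orders_daily", source_query))
```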
Evolving DML Support
INSERT INTO (append), INSERT OVERWRITE, and UPDATE and DELETE via Iceberg v2 delete files (v2.1+). MERGE is not yet available as a single statement, but it can be emulated with multi-statement patterns
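The text does not say which emulation patterns it has in mind. One common possibility, shown here purely as an assumption, is delete-then-insert: remove target rows whose key appears in a staging table, then append the staged rows. The helper name, table names, and key column are hypothetical, and whether Doris accepts this exact DELETE form on an Iceberg table should be verified for the version in use.

```python
def emulate_merge(target: str, staging: str, key: str) -> list[str]:
    """Render a two-statement upsert: delete matching keys, then append staged rows."""
    return [
        # Step 1: drop the target rows that the staging table will replace.
        f"DELETE FROM {target} WHERE {key} IN (SELECT {key} FROM {staging});",
        # Step 2: append every staged row (both updated and brand-new keys).
        f"INSERT INTO {target} SELECT * FROM {staging};",
    ]


# Hypothetical tables and key column.
for statement in emulate_merge(
    "iceberg_hms.sales.orders", "iceberg_hms.sales.orders_updates", "order_id"
):
    print(statement)
```

The two statements commit as separate snapshots, so a reader that queries between them sees the matched rows missing. That gap is the main trade-off against a true single-statement MERGE.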
Complete Storage Strategy
Reads: applies position and equality delete files (MoR) automatically. Writes: generates position/equality delete files for UPDATE/DELETE; INSERT OVERWRITE rewrites files (CoW)
No Native Streaming
No native streaming/CDC writer; use external tools (Flink-Iceberg) to land data, then query with sub-second latency. Routine Load only targets internal tables
Stable Format Support
Reads and writes Parquet (v1/v2) and ORC (v1/v2). Supports Iceberg spec v1 and v2; equality-delete support for ORC is available from v2.1.3. No v3 support yet
Comprehensive Time Travel
Query historical data with FOR TIME AS OF / FOR VERSION AS OF or the iceberg_meta() function. System tables ($snapshots, $manifests, $history) are exposed
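A short sketch of the time-travel queries follows. It assumes the clause spelling used in the Doris documentation, where the timestamp form is written FOR TIME AS OF, and the iceberg_meta() table function with "table" and "query_type" arguments. The helper names, table name, timestamp, and snapshot id are hypothetical.

```python
from typing import Optional


def time_travel_sql(
    table: str, *, as_of_time: Optional[str] = None, snapshot_id: Optional[int] = None
) -> str:
    """Render a SELECT pinned to either a timestamp or a snapshot id."""
    if (as_of_time is None) == (snapshot_id is None):
        raise ValueError("pass exactly one of as_of_time or snapshot_id")
    if as_of_time is not None:
        return f'SELECT * FROM {table} FOR TIME AS OF "{as_of_time}";'
    return f"SELECT * FROM {table} FOR VERSION AS OF {snapshot_id};"


def snapshots_sql(table: str) -> str:
    """Render the iceberg_meta() call that lists a table's snapshots."""
    return f'SELECT * FROM iceberg_meta("table" = "{table}", "query_type" = "snapshots");'


# Hypothetical table, timestamp, and snapshot id.
orders = "iceberg_hms.sales.orders"
print(snapshots_sql(orders))
print(time_travel_sql(orders, as_of_time="2024-05-01 00:00:00"))
print(time_travel_sql(orders, snapshot_id=868895038966572))
```

A typical flow is to list snapshots first, pick a snapshot id or commit time from the result, and then pin the query to it.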
Layered Security Model
Doris RBAC layered on the underlying catalog and storage IAM. Ranger and Lake Formation policies apply at the metastore/storage level; Doris adds row policies and column masking at query time
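To make the row-policy layer concrete, the sketch below renders a Doris CREATE ROW POLICY statement. The statement shape follows the Doris row-policy syntax as recalled here; applying it to a catalog-qualified Iceberg table is an assumption to verify, and the helper name, policy name, user, table, and predicate are hypothetical. Column masking is not shown.

```python
def row_policy_sql(
    policy: str, table: str, user: str, predicate: str, restrictive: bool = True
) -> str:
    """Render a CREATE ROW POLICY statement that filters rows for one user."""
    # RESTRICTIVE policies are AND-ed together; PERMISSIVE policies are OR-ed.
    kind = "RESTRICTIVE" if restrictive else "PERMISSIVE"
    return f"CREATE ROW POLICY {policy} ON {table} AS {kind} TO {user} USING ({predicate});"


# Hypothetical policy: analysts in the EU team only see EU orders.
print(row_policy_sql("eu_orders_only", "iceberg_hms.sales.orders", "eu_analyst", "region = 'EU'"))
```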
Advanced Performance Features
Vectorized reader, manifest and data-file caching, partition-predicate push-down, and materialized-view acceleration on Iceberg sources via CREATE MATERIALIZED VIEW … REFRESH
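The materialized-view acceleration mentioned above might look roughly like the following. The sketch renders an asynchronous materialized view that is rebuilt on a schedule, using the BUILD / REFRESH / DISTRIBUTED BY clauses of Doris async materialized views as recalled here. The helper name, view name, query, bucket count, and the single-replica property are assumptions suited to a small test cluster.

```python
def scheduled_mv_sql(name: str, query: str, every_hours: int = 1, buckets: int = 2) -> str:
    """Render an async materialized view that fully refreshes every N hours."""
    if every_hours < 1 or buckets < 1:
        raise ValueError("every_hours and buckets must be positive")
    return (
        f"CREATE MATERIALIZED VIEW {name}\n"
        # Build once at creation, then rebuild completely on a fixed schedule.
        f"BUILD IMMEDIATE REFRESH COMPLETE ON SCHEDULE EVERY {every_hours} HOUR\n"
        f"DISTRIBUTED BY RANDOM BUCKETS {buckets}\n"
        # replication_num = 1 is an assumption for a single-node test setup.
        'PROPERTIES ("replication_num" = "1")\n'
        f"AS {query};"
    )


# Hypothetical daily revenue aggregate over an Iceberg table.
print(
    scheduled_mv_sql(
        "analytics.orders_daily_mv",
        "SELECT order_date, SUM(amount) AS revenue "
        "FROM iceberg_hms.sales.orders GROUP BY order_date",
    )
)
```

The view itself is stored in Doris, so queries that match the aggregate can be answered without rescanning the Iceberg data files.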
Known Limitations
No MERGE statement; no continuous streaming writes; Avro data files unsupported; concurrent multi-engine writes may need manual conflict retries
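The manual conflict retries mentioned above can be sketched as a small wrapper with exponential backoff. Everything here is illustrative: CommitConflict is a hypothetical exception that stands in for whatever error your SQL driver raises when an Iceberg commit loses a race with another engine, and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable


class CommitConflict(Exception):
    """Hypothetical marker for a failed commit; map your driver's real error to it."""


def run_with_retries(
    execute: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run execute(), retrying on CommitConflict; return the attempt that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            execute()
            return attempt
        except CommitConflict:
            if attempt == attempts:
                raise  # out of retries: surface the conflict to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


# Demo with a stand-in write that conflicts twice, then succeeds.
attempts_seen = []
delays = []


def flaky_insert() -> None:
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise CommitConflict("table metadata changed during commit")


used = run_with_retries(flaky_insert, sleep=delays.append)
print(used, delays)  # prints: 3 [0.5, 1.0]
```

Retrying is only safe when the wrapped statement is idempotent or is re-planned against the latest snapshot. A blind re-run of an append could otherwise write the same rows twice.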
Apache Doris Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in Apache Doris v2.1+
Dimension | Support Level | Implementation Details | Since Version |
---|---|---|---|
Catalog Types | Full (6+ Types) | HMS, Glue, REST, Hadoop, DLF, S3Tables with unified CREATE CATALOG syntax | 2.1+ |
SQL Analytics | Full (MPP + Lake Ingestion) | Full SELECT + write-back (INSERT, INSERT OVERWRITE, CTAS); Parquet/ORC writing | 2.1+ |
DML Operations | Partial (UPDATE/DELETE ✓, MERGE ✗) | INSERT, INSERT OVERWRITE, UPDATE/DELETE via delete files; MERGE emulated with multi-statement patterns | 2.1+ |
Storage Strategy | Full (MoR + CoW) | Reads position/equality deletes automatically; generates delete files; INSERT OVERWRITE (CoW) | 2.1.3+ |
Streaming Support | None (External Tools) | No native streaming/CDC; use Flink-Iceberg + query with sub-second latency | N/A |
Format Support | Partial (v1/v2 + Parquet/ORC) | Parquet + ORC (v1/v2); Iceberg spec v1/v2; no Avro or v3 support | 2.1.3+ |
Time Travel | Full (SQL + System Tables) | FOR TIME AS OF / FOR VERSION AS OF; iceberg_meta() function; system tables ($snapshots, etc.) | 2.1+ |
Schema Evolution | Full (Metadata-only) | ADD/DROP/RENAME columns, type evolution; automatic schema discovery | 2.1+ |
Security & Governance | Partial (Layered Model) | Doris RBAC + catalog IAM; row/column policies; limited Ranger/Lake Formation enforcement | 2.1+ |
Performance Features | Full (Vectorized + MVs) | Vectorized reader, caching, predicate pushdown, materialized views with auto-refresh | 2.1+ |
Known Limitations | Several (Clear Constraints) | No MERGE; no streaming; no Avro; concurrent write conflicts need manual handling | 2.1+ |
Iceberg Library | Current (v1.6.1) | Bundled Iceberg client 1.6.1; follows upstream roadmap for v3 support | 2.1+ |
Use Cases
MPP Lake Analytics
High-performance analytics on Iceberg data lakes
- Complex analytical queries with vectorized execution
- Multi-catalog data lake analytics and federation
- High-performance OLAP workloads on lake data
- Real-time analytics on externally ingested streaming data
Lake Ingestion and ETL
Data ingestion and transformation into Iceberg tables
- ETL processes writing directly to Iceberg tables
- Data warehouse modernization with lake storage
- Batch data processing with INSERT OVERWRITE patterns
- Data transformation pipelines with CTAS operations
Unified Analytics Platform
Single engine for both ingestion and analytics
- Organizations wanting unified lake architecture
- Teams requiring both transformation and querying capabilities
- Materialized view acceleration for frequently accessed aggregates
- Cross-catalog analytics with comprehensive catalog support
Hybrid Streaming-Batch Architecture
Batch analytics layer with external streaming ingestion
- Lambda architectures with Flink streaming + Doris analytics
- Real-time ingestion via external tools, sub-second query latency
- Batch processing layer in streaming architectures
- Historical analysis on continuously updated datasets
Resources & Documentation
Official Documentation
Complete API reference and guides
Getting Started Guide
Quick start tutorials and examples
- Iceberg Catalog Documentation
- Iceberg Data Building Guide
- Iceberg Catalog Configuration
- ICEBERG_META Function
- Doris and Iceberg Best Practices
- Metadata Cache Documentation
- Built-in Authorization
- Next-Generation Data Lakehouse