Apache Doris v2.1+

MPP analytical database with comprehensive Iceberg read/write capabilities, vectorized execution, materialized view acceleration, and multi-catalog support for lake ingestion and analytics

Key Features

100
6+ Catalog Types

Multi-Catalog Excellence

CREATE CATALOG supports six Iceberg catalog types (hms, glue, rest, hadoop, dlf, s3tables). Each is configured with a metastore URI or REST endpoint plus the credentials for the underlying storage
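As an illustration of the unified CREATE CATALOG syntax, the sketch below renders the statement text for a Hive Metastore catalog and a REST catalog. The helper name `create_catalog_sql`, the catalog names, and the host names are hypothetical. The property keys (`iceberg.catalog.type`, `hive.metastore.uris`, `uri`) are assumptions based on the Doris 2.1 conventions and should be checked against the documentation for your version.

```python
def create_catalog_sql(name, properties):
    """Render a Doris CREATE CATALOG statement from a dict of properties."""
    # Assumption: every Iceberg catalog is declared with "type" = "iceberg".
    props = {"type": "iceberg", **properties}
    body = ",\n".join(f'    "{key}" = "{value}"' for key, value in props.items())
    return f"CREATE CATALOG {name} PROPERTIES (\n{body}\n);"


# Hive Metastore backed catalog (placeholder host name).
hms_sql = create_catalog_sql("iceberg_hms", {
    "iceberg.catalog.type": "hms",
    "hive.metastore.uris": "thrift://metastore.example.internal:9083",
})

# REST catalog (placeholder endpoint).
rest_sql = create_catalog_sql("iceberg_rest", {
    "iceberg.catalog.type": "rest",
    "uri": "http://rest-catalog.example.internal:8181",
})

print(hms_sql)
print(rest_sql)
```

The generated text can be submitted through any MySQL-protocol client connected to the Doris frontend; storage credentials would be added as further properties.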

90
Full Write Capabilities

MPP Lake Ingestion Engine

Full SELECT and write-back: INSERT, INSERT OVERWRITE, CTAS. Doris writes Parquet/ORC files and commits Iceberg snapshots, so it can serve as both a lake-ingestion engine and an analytics layer

80
UPDATE/DELETE ✓ MERGE ✗

Evolving DML Support

INSERT INTO (append), INSERT OVERWRITE, UPDATE & DELETE via Iceberg-v2 delete files (v2.1+). MERGE is not yet available as a single statement; it has to be emulated with multi-statement patterns
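The source does not spell out which emulation patterns it has in mind. One common choice, assumed here, is delete-then-insert: a DELETE removes target rows whose key appears in the staged batch, then an INSERT INTO appends the staged rows. The sketch below models that logic on in-memory rows rather than running SQL; `emulate_merge` and the sample data are hypothetical.

```python
def emulate_merge(target, staging, key):
    """Upsert staging rows into target: delete matching keys, then append."""
    staged_keys = {row[key] for row in staging}
    # Step 1 (the DELETE): drop target rows whose key also appears in staging.
    kept = [row for row in target if row[key] not in staged_keys]
    # Step 2 (the INSERT INTO): append every staged row.
    return kept + list(staging)


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
staging = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

merged = emulate_merge(target, staging, "id")
print(merged)
```

Unlike a true MERGE, the two steps are separate statements, so a reader may observe the table between the delete and the insert.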

95
MoR + CoW

Complete Storage Strategy

Reads: applies position and equality delete files (MoR) automatically. Writes: generates position/equality delete files for UPDATE/DELETE; INSERT OVERWRITE rewrites files (CoW)
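To make the merge-on-read behaviour concrete, here is a simplified model of how a reader combines data files with the two kinds of delete files. It is a conceptual sketch, not Doris code: the function name and data layout are hypothetical, and the sequence-number rules that decide which delete files apply to which data files are deliberately left out.

```python
def read_merge_on_read(data_files, position_deletes, equality_deletes):
    """Return live rows after applying position and equality deletes.

    data_files:       {file_path: [row_dict, ...]}
    position_deletes: set of (file_path, row_position) pairs
    equality_deletes: list of dicts; a row is deleted when it matches
                      every column value of at least one dict
    """
    live = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            # Position delete: a specific row in a specific file is gone.
            if (path, pos) in position_deletes:
                continue
            # Equality delete: any row with these column values is gone.
            if any(all(row.get(col) == val for col, val in d.items())
                   for d in equality_deletes):
                continue
            live.append(row)
    return live


data_files = {
    "f1.parquet": [{"id": 1}, {"id": 2}, {"id": 3}],
    "f2.parquet": [{"id": 4}, {"id": 5}],
}
live_rows = read_merge_on_read(
    data_files,
    position_deletes={("f1.parquet", 1)},  # removes id=2
    equality_deletes=[{"id": 5}],          # removes id=5 wherever it lives
)
print([row["id"] for row in live_rows])
```

A copy-on-write operation such as INSERT OVERWRITE avoids this read-time work by rewriting the data files instead.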

10
External Tools Required

No Native Streaming

No native streaming/CDC writer; use external tools (Flink-Iceberg) to land data, then query with sub-second latency. Routine Load only targets internal tables

70
v1/v2 + Parquet/ORC

Stable Format Support

Reads & writes Parquet (v1/v2) + ORC (v1/v2). Supports Iceberg spec v1 & v2; equality-delete support for ORC arrives in Doris v2.1.3+. No Iceberg spec v3 support yet

100
SQL + System Tables

Comprehensive Time Travel

Query historical data with FOR TIMESTAMP AS OF / FOR VERSION AS OF or iceberg_meta() function. System tables ($snapshots, $manifests, $history) exposed
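Conceptually, a timestamp-based time-travel query resolves to the most recent snapshot committed at or before the requested time, while a version-based query names the snapshot ID directly. The sketch below models only that resolution step on a hypothetical snapshot history; it does not call Doris or the Iceberg library.

```python
from bisect import bisect_right


def snapshot_as_of(snapshots, ts_ms):
    """Pick the snapshot current at ts_ms.

    snapshots: list of (committed_at_ms, snapshot_id), sorted by commit time.
    """
    commit_times = [committed_at for committed_at, _ in snapshots]
    # Index of the first snapshot committed strictly after ts_ms.
    idx = bisect_right(commit_times, ts_ms)
    if idx == 0:
        raise ValueError("no snapshot at or before the requested timestamp")
    return snapshots[idx - 1][1]


# Hypothetical history: three commits at 1000 ms, 2000 ms and 3000 ms.
history = [(1000, 11), (2000, 22), (3000, 33)]

print(snapshot_as_of(history, 2500))
print(snapshot_as_of(history, 3000))
```

The snapshot history itself is the kind of information the $snapshots and $history system tables expose.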

85
Doris RBAC + Catalog

Layered Security Model

Doris RBAC plus underlying catalog/storage IAM. Ranger/Lake Formation policies apply at metastore/storage; Doris adds row-policies & column masking on query

95
Vectorized + MVs

Advanced Performance Features

Vectorized reader, manifest & data-file caching, partition-predicate push-down, materialized-view acceleration on Iceberg sources, CREATE MATERIALIZED VIEW … REFRESH
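Partition-predicate push-down means files are discarded during planning, from partition metadata alone, before any data is read. The sketch below illustrates the idea on a hypothetical file listing; the names and the listing format are made up for the example and do not reflect how Doris stores manifests internally.

```python
def prune_by_partition(files, predicate):
    """Keep only files whose partition values satisfy the predicate."""
    return [f["path"] for f in files if predicate(f["partition"])]


# Hypothetical listing: one data file per daily partition.
files = [
    {"path": "dt=2024-01-01/a.parquet", "partition": {"dt": "2024-01-01"}},
    {"path": "dt=2024-01-02/b.parquet", "partition": {"dt": "2024-01-02"}},
    {"path": "dt=2024-01-03/c.parquet", "partition": {"dt": "2024-01-03"}},
]

# Equivalent of WHERE dt >= '2024-01-02': the first file is never opened.
selected = prune_by_partition(files, lambda part: part["dt"] >= "2024-01-02")
print(selected)
```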

75
Several Constraints

Known Limitations

No MERGE statement; no continuous streaming writes; Avro data files unsupported; concurrent multi-engine writes may need manual conflict retries
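Manual conflict handling usually amounts to retrying the failed write after a short pause. A minimal sketch of such a retry loop follows, assuming the client can tell a commit conflict apart from other failures. `CommitConflict` is a hypothetical exception type; in practice the conflict would have to be detected from the error returned by the SQL client.

```python
import random
import time


class CommitConflict(Exception):
    """Hypothetical error standing in for a rejected Iceberg commit."""


def commit_with_retry(write_fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write_fn, retrying on CommitConflict with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except CommitConflict:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Exponential backoff with jitter so competing writers spread out.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


# Demo: a write that loses the race twice and succeeds on the third try.
attempts = {"n": 0}


def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise CommitConflict("another engine committed first")
    return "committed"


result = commit_with_retry(flaky_write, sleep=lambda seconds: None)
print(result, attempts["n"])
```

Retrying is only safe when the write is idempotent or recomputed from current table state, for example an INSERT OVERWRITE of a whole partition.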


Apache Doris Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Doris v2.1+

| Dimension | Support Level | Implementation Details | Since Version |
|---|---|---|---|
| Catalog Types | Full (6+ Types) | HMS, Glue, REST, Hadoop, DLF, S3Tables with unified CREATE CATALOG syntax | 2.1+ |
| SQL Analytics | Full (MPP + Lake Ingestion) | Full SELECT + write-back (INSERT, INSERT OVERWRITE, CTAS); Parquet/ORC writing | 2.1+ |
| DML Operations | Partial (UPDATE/DELETE ✓ MERGE ✗) | INSERT, INSERT OVERWRITE, UPDATE/DELETE via delete files; MERGE emulated with patterns | 2.1+ |
| Storage Strategy | Full (MoR + CoW) | Reads position/equality deletes automatically; generates delete files; INSERT OVERWRITE (CoW) | 2.1.3+ |
| Streaming Support | None (External Tools) | No native streaming/CDC; use Flink-Iceberg + query with sub-second latency | N/A |
| Format Support | Partial (v1/v2 + Parquet/ORC) | Parquet + ORC (v1/v2); Iceberg spec v1/v2; no Avro or v3 support | 2.1.3+ |
| Time Travel | Full (SQL + System Tables) | FOR TIMESTAMP/VERSION AS OF; iceberg_meta() function; system tables ($snapshots, etc.) | 2.1+ |
| Schema Evolution | Full (Metadata-only) | ADD/DROP/RENAME columns, type evolution; automatic schema discovery | 2.1+ |
| Security & Governance | Partial (Layered Model) | Doris RBAC + catalog IAM; row/column policies; Ranger/LF limited enforcement | 2.1+ |
| Performance Features | Full (Vectorized + MVs) | Vectorized reader, caching, predicate pushdown, materialized views with auto-refresh | 2.1+ |
| Known Limitations | Several (Clear Constraints) | No MERGE; no streaming; no Avro; concurrent write conflicts need manual handling | 2.1+ |
| Iceberg Library | Current (v1.6.1) | Bundled Iceberg client 1.6.1; follows upstream roadmap for v3 support | 2.1+ |

Use Cases

MPP Lake Analytics

High-performance analytics on Iceberg data lakes

  • Complex analytical queries with vectorized execution
  • Multi-catalog data lake analytics and federation
  • High-performance OLAP workloads on lake data
  • Real-time analytics on externally ingested streaming data

Lake Ingestion and ETL

Data ingestion and transformation into Iceberg tables

  • ETL processes writing directly to Iceberg tables
  • Data warehouse modernization with lake storage
  • Batch data processing with INSERT OVERWRITE patterns
  • Data transformation pipelines with CTAS operations

Unified Analytics Platform

Single engine for both ingestion and analytics

  • Organizations wanting unified lake architecture
  • Teams requiring both transformation and querying capabilities
  • Materialized view acceleration for frequently accessed aggregates
  • Cross-catalog analytics with comprehensive catalog support

Hybrid Streaming-Batch Architecture

Batch analytics layer with external streaming ingestion

  • Lambda architectures with Flink streaming + Doris analytics
  • Real-time ingestion via external tools, sub-second query latency
  • Batch processing layer in streaming architectures
  • Historical analysis on continuously updated datasets
