DuckDB v1.3+
A lightweight, read-only analytics engine for Iceberg with SQL time travel, external file caching, and REST catalog support
Key Features
Catalog Support
Hadoop (file-system) and Iceberg REST catalogs are supported via the rest option with bearer/OAuth tokens; there is no native Hive or Glue catalog support yet
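As a sketch of connecting to a REST catalog (the endpoint, token, and catalog name are placeholders, and the exact ATTACH option names may differ between extension versions):

```sql
-- Load the Iceberg and remote-filesystem extensions
INSTALL iceberg;
LOAD iceberg;
LOAD httpfs;

-- Attach an Iceberg REST catalog with a bearer token
-- (placeholder endpoint and token; option names hedged)
ATTACH 'my_warehouse' AS lake (
    TYPE iceberg,
    ENDPOINT 'https://catalog.example.com',
    TOKEN 'my-bearer-token'
);

-- Query a table through the attached catalog
SELECT * FROM lake.db.events LIMIT 10;
```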
Read-only Analytics Excellence
Full SELECT support with predicate pushdown, manifest pruning, and an external file cache that avoids re-downloading S3/GCS objects
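A minimal read using the iceberg_scan() table function (the bucket path and column names are placeholders); filters in the WHERE clause are used to prune manifests and data files before download:

```sql
LOAD iceberg;
LOAD httpfs;

-- Scan an Iceberg table directly from its warehouse location;
-- the date predicate is evaluated against file/manifest statistics
SELECT event_type, count(*) AS n
FROM iceberg_scan('s3://my-bucket/warehouse/db/events')
WHERE event_date >= DATE '2025-01-01'
GROUP BY event_type;
```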
Advanced Time Travel
Convenient SQL syntax: SELECT * FROM tbl AT (VERSION => 314159) or AT (TIMESTAMP => '2025-05-01 10:15:00'); the older function-style syntax still works
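Both styles side by side (the snapshot id, timestamp, and function parameter names below are illustrative; the legacy parameter names may vary by version):

```sql
-- New AT syntax: pin the query to a snapshot id or a timestamp
SELECT * FROM tbl AT (VERSION => 314159);
SELECT * FROM tbl AT (TIMESTAMP => '2025-05-01 10:15:00');

-- Legacy function-style equivalent via iceberg_scan
SELECT * FROM iceberg_scan('s3://my-bucket/db/tbl',
                           snapshot_from_timestamp => TIMESTAMP '2025-05-01 10:15:00');
```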
External File Caching
External file cache via SET s3_cache_size='4GB' roughly halves cold-scan latency for Iceberg on S3/GCS by reusing previously downloaded objects
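In practice the cache pays off on repeated scans of the same table (setting name taken from the description above; the table path is a placeholder):

```sql
-- Size the external file cache
SET s3_cache_size = '4GB';

-- Cold scan: objects are fetched from S3 and cached locally
SELECT count(*) FROM iceberg_scan('s3://my-bucket/db/events');

-- Warm scan: unchanged objects are served from the cache
-- instead of being re-downloaded
SELECT count(*) FROM iceberg_scan('s3://my-bucket/db/events');
```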
Format Compatibility
Parquet remains the only supported data-file format; Avro/ORC data files are ignored, limiting compatibility with mixed-format tables
Metadata Operations
iceberg_snapshots() lists snapshots, current first, with each snapshot's summary JSON; iceberg_metadata() exposes file-size and row-count statistics that feed planner optimization
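Both helpers take a table location; the column names selected below are illustrative and may differ by extension version:

```sql
LOAD iceberg;

-- Snapshot history, most recent first, with manifest-list pointers
SELECT snapshot_id, timestamp_ms, manifest_list
FROM iceberg_snapshots('s3://my-bucket/db/events');

-- Per-file metadata (paths, sizes, record counts) used by the planner
SELECT *
FROM iceberg_metadata('s3://my-bucket/db/events');
```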
Security Integration
Delegates to DuckDB's standard S3/Azure credentials via the httpfs extension; REST-catalog tokens are scoped per session; no built-in RBAC or row masking
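A sketch of supplying S3 credentials through DuckDB's secrets mechanism (key, secret, region, and bucket are placeholders):

```sql
LOAD httpfs;

-- Register S3 credentials as a named secret
CREATE SECRET s3_creds (
    TYPE S3,
    KEY_ID 'PLACEHOLDER_KEY_ID',
    SECRET 'PLACEHOLDER_SECRET_KEY',
    REGION 'us-east-1'
);

-- Iceberg reads on s3:// paths now authenticate with this secret
SELECT count(*) FROM iceberg_scan('s3://my-bucket/db/events');
```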
Current Limitations
Read-only engine with no write support; tables containing delete files are not supported; Format V3 capabilities are absent; execution is single-node and constrained by local resources
DuckDB Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in DuckDB v1.3+
| Dimension | Support Level | Implementation Details | Min Version |
|---|---|---|---|
| Catalog Types | Partial (Hadoop + REST) | Hadoop (file-system), REST catalog with OAuth tokens; no native Hive/Glue support | 1.3+ |
| Read Operations | Full (Analytics Optimized) | Complete SELECT support, predicate pushdown, manifest pruning, external file cache | 1.3+ |
| Write Operations | None (Read-Only) | No INSERT/UPDATE/DELETE/CREATE TABLE AS ICEBERG support | N/A |
| Time Travel | Full (SQL Syntax) | New AT (VERSION/TIMESTAMP) syntax plus legacy function-style options | 1.3+ |
| Delete File Support | None (CoW Only) | Reading tables with delete files not yet supported; copy-on-write tables only | N/A |
| Format V3 Support | None (V1/V2 Only) | DuckDB 1.3 reads V1 and V2 tables only; V3 evaluation post-GA | N/A |
| Data File Formats | Limited (Parquet Only) | Parquet files only; Avro/ORC data files are ignored | 1.3+ |
| Streaming Support | None (Batch Only) | Batch-only analytics; no streaming ingestion or CDC capabilities | N/A |
| Metadata Operations | Full (Helper Functions) | iceberg_snapshots(), iceberg_metadata() with summary JSON and planner stats | 1.3+ |
| Cloud Storage Optimization | Full (File Cache) | External file cache via s3_cache_size reduces cold-scan latency by ~50% | 1.3+ |
| Security Integration | Basic (Credential Delegation) | S3/Azure credentials via httpfs, REST tokens; no built-in RBAC/row masking | 1.3+ |
| Scale Limitations | Single-Node (Local Resources) | Single-node execution; large lake queries constrained by local resources | 1.3+ |
Use Cases
Interactive Data Exploration
Fast, ad-hoc analytics on Iceberg tables for data scientists and analysts
- Laptop-based data science with cloud data lakes
- Quick data quality assessment and profiling
- Prototyping data transformations and analysis
- Educational and learning environments
Development & Testing
Lightweight engine for developing and testing data pipelines
- Local development against production Iceberg tables
- Testing query logic before deploying to production
- Debugging data pipeline outputs and transformations
- Schema validation and compatibility testing
Analytical Reporting
Read-only reporting and dashboard data preparation
- Business intelligence report generation
- Data extraction for external systems and tools
- Historical trend analysis with time travel
- Cross-functional data sharing and exploration
Data Lake Auditing
Compliance and audit scenarios leveraging time travel capabilities
- Point-in-time data auditing and compliance
- Data lineage investigation and debugging
- Historical data comparison and validation
- Regulatory reporting with specific timestamps