Apache Impala v4.4+
High-performance analytics engine with Iceberg v2 support, row-level operations via position deletes, and deep HMS integration for enterprise environments
Key Features
HMS-Centric Catalog Integration
Deep integration with Hive Metastore, HadoopCatalog, and HadoopTables; other catalog implementations configurable via Hive site-config
Iceberg v2 Row-level Operations
Full support for INSERT, DELETE, and UPDATE using Iceberg v2 position-delete files; MERGE is available in preview
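As a rough illustration of the row-level operations described above, the sketch below assembles the kind of statements involved: a table created with format version 2, followed by INSERT, UPDATE, and DELETE. The table name `sales.orders`, its columns, and the helper names are hypothetical. The Python wrapper only builds SQL strings; running them is left to whatever Impala client is in use.

```python
# Hypothetical table and columns, used only for illustration.
TABLE = "sales.orders"


def create_v2_table(table: str, columns: dict) -> str:
    """Build a CREATE TABLE statement for an Iceberg v2 table."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    # Row-level deletes require Iceberg format version 2.
    return (
        f"CREATE TABLE {table} ({cols}) STORED AS ICEBERG "
        "TBLPROPERTIES ('format-version'='2')"
    )


def row_level_statements(table: str) -> list:
    """Build one INSERT, one UPDATE, and one DELETE against the table."""
    return [
        f"INSERT INTO {table} VALUES (1, 'new'), (2, 'new')",
        f"UPDATE {table} SET status = 'shipped' WHERE id = 1",
        f"DELETE FROM {table} WHERE id = 2",
    ]


statements = [create_v2_table(TABLE, {"id": "BIGINT", "status": "STRING"})]
statements += row_level_statements(TABLE)
for stmt in statements:
    print(stmt + ";")
```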
Advanced Time Travel
Snapshot queries via FOR SYSTEM_TIME AS OF / FOR SYSTEM_VERSION AS OF, with DESCRIBE HISTORY to list snapshots and ALTER TABLE ... EXECUTE expire_snapshots(...) to expire them
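The time-travel clauses above can be sketched as small statement builders. This is a minimal sketch: the helper names, the table `sales.orders`, and the snapshot ID are hypothetical, and the code only formats SQL text rather than talking to a cluster.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def as_of_time(table: str, ts: datetime) -> str:
    """Query the table as it was at a given wall-clock time."""
    return f"SELECT * FROM {table} FOR SYSTEM_TIME AS OF '{ts.strftime(TS_FORMAT)}'"


def as_of_version(table: str, snapshot_id: int) -> str:
    """Query one specific snapshot by its ID."""
    return f"SELECT * FROM {table} FOR SYSTEM_VERSION AS OF {snapshot_id}"


def describe_history(table: str) -> str:
    """List the table's snapshots; useful for finding IDs and timestamps."""
    return f"DESCRIBE HISTORY {table}"


def expire_snapshots(table: str, older_than: datetime) -> str:
    """Expire snapshots older than the given timestamp."""
    stamp = older_than.strftime(TS_FORMAT)
    return f"ALTER TABLE {table} EXECUTE expire_snapshots('{stamp}')"


cutoff = datetime(2024, 1, 4, 10, 0, 0)
print(describe_history("sales.orders"))
print(as_of_time("sales.orders", cutoff))
print(as_of_version("sales.orders", 123456))  # hypothetical snapshot ID
print(expire_snapshots("sales.orders", cutoff))
```

Expired snapshots can no longer be reached through either AS OF clause, so the cutoff passed to `expire_snapshots` effectively sets how far back time travel can go.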
High-Performance Optimizations
Hidden-partition pruning, LLVM-compiled query paths, in-memory data caching, parallel manifest reads, and Puffin NDV statistics support
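To make hidden-partition pruning concrete, here is a toy model of the idea, assuming a table partitioned by a day transform on a timestamp column. It is a conceptual sketch and not Impala's implementation; `DataFile`, `prune`, and the file names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Iceberg-style day transform: whole days since 1970-01-01."""
    return (ts.date() - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    path: str
    event_day: int  # partition value kept in metadata, not a user-visible column


def prune(files, start: datetime, end: datetime):
    """Keep only files whose day partition can contain rows in [start, end]."""
    lo, hi = day_transform(start), day_transform(end)
    return [f for f in files if lo <= f.event_day <= hi]


files = [
    DataFile("d1.parquet", day_transform(datetime(2024, 1, 1))),
    DataFile("d2.parquet", day_transform(datetime(2024, 1, 2))),
    DataFile("d3.parquet", day_transform(datetime(2024, 1, 3))),
]

# Models: WHERE event_ts BETWEEN '2024-01-02 08:00:00' AND '2024-01-02 17:00:00'
kept = prune(files, datetime(2024, 1, 2, 8), datetime(2024, 1, 2, 17))
print([f.path for f in kept])  # prints ['d2.parquet']
```

The query filters on the raw timestamp only; the engine derives the partition range from the predicate, which is why the partitioning is called hidden.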
Storage Strategy Support
Copy-on-Write for overwrites and Merge-on-Read for row-level operations using position-delete files; equality deletes not supported
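The Merge-on-Read path with position deletes can be modelled in a few lines. This is a simplified sketch under stated assumptions: data files are in-memory lists, a position delete is a `(file path, row position)` pair, and the file names are hypothetical. Real delete files are Parquet files tracked in Iceberg manifests.

```python
def scan(data_files: dict, position_deletes: list) -> list:
    """Return live rows: every data row whose (file, position) is not deleted."""
    deleted = {}
    for path, pos in position_deletes:
        deleted.setdefault(path, set()).add(pos)

    live = []
    for path, rows in data_files.items():
        skip = deleted.get(path, set())
        live.extend(row for pos, row in enumerate(rows) if pos not in skip)
    return live


# One immutable data file with three (id, value) rows.
data_files = {"data-1.parquet": [(1, "a"), (2, "b"), (3, "c")]}
position_deletes = []

# DELETE WHERE id = 2: record the row's position, leave the data file untouched.
position_deletes.append(("data-1.parquet", 1))

# UPDATE id = 3 to 'z': delete the old position, write the new row to a new file.
position_deletes.append(("data-1.parquet", 2))
data_files["data-2.parquet"] = [(3, "z")]

result = scan(data_files, position_deletes)
print(result)  # prints [(1, 'a'), (3, 'z')]
```

Writes stay cheap because no existing data file is rewritten; the cost moves to reads, which must apply the accumulated deletes on every scan.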
Format Compatibility
Reads & writes Iceberg spec v1 and v2 tables with Parquet data; ORC/Avro read-only; default write format is Parquet with Snappy compression
Enterprise Security Integration
Relies on Hive Metastore + Apache Ranger ACLs with storage-layer permissions (HDFS/S3/Ozone) for comprehensive enterprise security
Current Limitations & Requirements
Position deletes only, no streaming/CDC, schema evolution limits on complex types, HMS dependency, and v4.4+ requirement for full DML support
Apache Impala Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in Apache Impala v4.4+
Dimension | Support Level | Implementation Details | Min Version |
---|---|---|---|
Catalog Types | Limited (HMS-Centric) | HiveCatalog (HMS), HadoopCatalog, HadoopTables; other catalog-impl via Hive site-config | 4.0+ |
Read & Write Operations | Full (ACID Isolation) | Complete SELECT/INSERT/CTAS/ALTER/DROP with ACID snapshot-isolation | 4.0+ |
DML Operations | Partial (MERGE Preview) | INSERT, DELETE, UPDATE with position deletes; MERGE in CDW 1.5.5 preview | 4.4+ |
MoR/CoW Storage | Partial (Position Only) | CoW for overwrites; MoR for row-level ops with position deletes only | 4.4+ |
Time Travel | Full (SQL Syntax) | FOR SYSTEM_TIME/VERSION AS OF; DESCRIBE HISTORY, EXPIRE SNAPSHOTS | 4.0+ |
Performance Optimization | Full (LLVM Compiled) | LLVM compilation, hidden-partition pruning, manifest caching, parallel reads | 4.0+ |
Format Support | v1/v2 (Parquet Focus) | Iceberg v1 & v2; Parquet read/write; ORC/Avro read-only | 4.0+ |
Security & Governance | Full (Ranger Integration) | HMS + Apache Ranger ACLs; storage-layer permissions (HDFS/S3/Ozone) | 4.0+ |
Metadata Tables | Full (Virtual Tables) | $snapshots, $history, $manifests, $files virtual tables available | 4.0+ |
Streaming Support | None (Snapshot Only) | No built-in streaming/CDC; reads latest snapshot at query start | N/A |
Iceberg v3 Support | None (v1/v2 Only) | Format-versions 1 & 2 only; v3 features not supported | N/A |
Cloud Catalog Integration | None (HMS Required) | No direct AWS Glue, REST, or Nessie support; HMS dependency | N/A |
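The Streaming Support row notes that a query reads the latest snapshot at query start. A toy model of that behaviour, assuming a table is just an append-only list of immutable snapshots, might look like the following; the `Table` and `Query` classes are hypothetical and exist only to show the pinning.

```python
class Table:
    """Toy table: every commit produces a new immutable snapshot."""

    def __init__(self):
        self.snapshots = {}  # snapshot_id -> tuple of rows
        self.current = None  # ID of the latest committed snapshot

    def commit(self, rows) -> int:
        snapshot_id = len(self.snapshots) + 1
        self.snapshots[snapshot_id] = tuple(rows)
        self.current = snapshot_id
        return snapshot_id


class Query:
    """Toy query: pins the table's current snapshot when it starts."""

    def __init__(self, table: Table):
        self.table = table
        self.snapshot_id = table.current  # fixed for the query's lifetime

    def rows(self) -> list:
        return list(self.table.snapshots[self.snapshot_id])


table = Table()
table.commit(["a"])
early = Query(table)       # starts before the next commit
table.commit(["a", "b"])   # concurrent writer adds a row
late = Query(table)        # starts after it

print(early.rows())  # prints ['a']
print(late.rows())   # prints ['a', 'b']
```

A running query never sees rows committed after it started, so picking up new data means issuing a new query rather than tailing a stream.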
Use Cases
Enterprise Hadoop Analytics
High-performance analytics in existing Hadoop ecosystems with HMS integration
- Cloudera Data Platform deployments with existing HMS infrastructure
- Migration from Hive tables to Iceberg with minimal disruption
- Enterprise data warehousing with Apache Ranger security
- Traditional BI tools requiring SQL interface to data lakes
Interactive Business Intelligence
Sub-second analytics for dashboards and reporting applications
- Low-latency dashboards with LLVM-optimized query performance
- Interactive analytics requiring hidden-partition pruning
- Business intelligence platforms with complex analytical queries
- Self-service analytics with time travel capabilities
Data Warehouse Modernization
Transitioning from traditional data warehouses to modern lakehouse architecture
- RDBMS to Iceberg migration with transactional consistency
- Legacy data warehouse replacement with ACID guarantees
- Enterprise reporting modernization with existing security models
- Gradual migration strategies with format compatibility
Compliance & Audit Workloads
Regulatory environments requiring detailed access control and audit trails
- Financial services regulatory reporting with time travel
- Healthcare data governance with Apache Ranger integration
- Audit trail requirements with comprehensive metadata tables
- Compliance frameworks requiring detailed access logging