Last updated: 6/30/2025

Apache Impala v4.4+

High-performance analytics engine with Iceberg v2 support, row-level operations via position deletes, and deep HMS integration for enterprise environments

Key Features

Enterprise Ready

HMS-Centric Catalog Integration

Deep integration with Hive Metastore, HadoopCatalog, and HadoopTables; other catalog implementations configurable via Hive site-config
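The three catalog options above can be sketched in Impala SQL as follows. This is a minimal illustration, not a reference configuration: the database, table names, and storage paths are hypothetical, and it assumes the documented `iceberg.catalog` and `iceberg.catalog_location` table properties.

```sql
-- Hypothetical names and paths throughout; adjust to your environment.

-- 1. HiveCatalog (HMS): the default when no catalog property is set.
CREATE TABLE ice_db.orders_hms (id BIGINT, amount DOUBLE)
STORED AS ICEBERG;

-- 2. HadoopCatalog: tables live under a shared catalog directory.
CREATE TABLE ice_db.orders_hadoop_catalog (id BIGINT, amount DOUBLE)
STORED AS ICEBERG
TBLPROPERTIES (
  'iceberg.catalog' = 'hadoop.catalog',
  'iceberg.catalog_location' = '/warehouse/iceberg_catalog'
);

-- 3. HadoopTables: the table is identified by its own location.
CREATE TABLE ice_db.orders_hadoop_tables (id BIGINT, amount DOUBLE)
STORED AS ICEBERG
LOCATION '/warehouse/iceberg_tables/orders'
TBLPROPERTIES ('iceberg.catalog' = 'hadoop.tables');
```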

Position Deletes

Iceberg v2 Row-level Operations

Full support for INSERT, DELETE, UPDATE operations using Iceberg v2 position-delete files with MERGE operations in preview
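A minimal sketch of the row-level operations described above, assuming Impala 4.4+ and a format-version 2 table. The table and column names are hypothetical.

```sql
-- Row-level DELETE and UPDATE need an Iceberg v2 table.
CREATE TABLE ice_db.customers (id BIGINT, email STRING, status STRING)
STORED AS ICEBERG
TBLPROPERTIES ('format-version' = '2');

INSERT INTO ice_db.customers VALUES
  (1, 'a@example.com', 'active'),
  (2, 'b@example.com', 'active'),
  (3, 'c@example.com', 'closed');

-- Writes position-delete files rather than rewriting data files.
DELETE FROM ice_db.customers WHERE status = 'closed';

-- Writes position deletes for the old rows plus new data files for the updated rows.
UPDATE ice_db.customers SET status = 'inactive' WHERE id = 2;
```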

SQL Syntax

Advanced Time Travel

Manual snapshot queries via FOR SYSTEM_TIME AS OF / FOR SYSTEM_VERSION AS OF with DESCRIBE HISTORY & EXPIRE SNAPSHOTS commands
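The time travel clauses above might be used as in the following sketch. The table name, timestamp, and snapshot ID are placeholders. It also assumes that snapshot expiry is issued through `ALTER TABLE ... EXECUTE expire_snapshots(...)`, which is how Impala exposes it.

```sql
-- List the snapshots of a (hypothetical) table, with their IDs and creation times.
DESCRIBE HISTORY ice_db.events;

-- Query the table as it was at a point in time.
SELECT count(*) FROM ice_db.events
FOR SYSTEM_TIME AS OF '2025-06-01 00:00:00';

-- Query a specific snapshot; take the ID from DESCRIBE HISTORY (placeholder value here).
SELECT count(*) FROM ice_db.events
FOR SYSTEM_VERSION AS OF 1234567890123456789;

-- Expire snapshots older than seven days. Expired snapshots can no longer be queried.
ALTER TABLE ice_db.events EXECUTE expire_snapshots(now() - interval 7 days);
```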

LLVM Compiled

High-Performance Optimizations

Hidden-partition pruning, LLVM-compiled query paths, in-memory data caching, parallel manifest reads, and Puffin NDV statistics support
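To illustrate hidden-partition pruning, the sketch below partitions a hypothetical table by a day transform and then filters on the source column directly. No separate partition column appears in the query; the planner is expected to map the timestamp range onto the day partitions.

```sql
-- Hypothetical table partitioned by a transform of event_ts.
CREATE TABLE ice_db.page_views (
  user_id  BIGINT,
  event_ts TIMESTAMP,
  url      STRING
)
PARTITIONED BY SPEC (DAY(event_ts))
STORED AS ICEBERG;

-- The filter is on event_ts itself; only files for the matching day should be scanned.
SELECT count(*)
FROM ice_db.page_views
WHERE event_ts >= CAST('2025-06-01 00:00:00' AS TIMESTAMP)
  AND event_ts <  CAST('2025-06-02 00:00:00' AS TIMESTAMP);

-- EXPLAIN shows the scan node for the same query, which helps confirm pruning.
EXPLAIN SELECT count(*)
FROM ice_db.page_views
WHERE event_ts >= CAST('2025-06-01 00:00:00' AS TIMESTAMP)
  AND event_ts <  CAST('2025-06-02 00:00:00' AS TIMESTAMP);
```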

CoW & Position MoR

Storage Strategy Support

Copy-on-Write for overwrites and Merge-on-Read for row-level operations using position-delete files; equality deletes not supported
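The two strategies can be contrasted with a short sketch. Both tables are hypothetical: `ice_db.region_totals` is assumed to be an unpartitioned Iceberg table with columns `(region STRING, total DOUBLE)`, and `ice_db.orders` a format-version 2 table.

```sql
-- Copy-on-Write: INSERT OVERWRITE replaces the existing data files with newly written ones.
INSERT OVERWRITE ice_db.region_totals
SELECT region, sum(amount) AS total
FROM ice_db.orders
GROUP BY region;

-- Merge-on-Read: a row-level DELETE leaves data files in place and records
-- the removed row positions in position-delete files, applied at read time.
DELETE FROM ice_db.orders WHERE order_id = 1001;
```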

v1/v2 Support

Format Compatibility

Reads & writes Iceberg spec v1 and v2 tables with Parquet data; ORC/Avro read-only; default write format is Parquet with Snappy compression
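The format settings above correspond to Iceberg table properties. The sketch below sets them explicitly for a hypothetical table. The values shown match the defaults described above, so in practice they can usually be omitted.

```sql
-- Hypothetical table; the properties spell out the default write behavior.
CREATE TABLE ice_db.metrics (
  metric_name STRING,
  metric_ts   TIMESTAMP,
  value       DOUBLE
)
STORED AS ICEBERG
TBLPROPERTIES (
  'format-version' = '2',
  'write.format.default' = 'parquet',
  'write.parquet.compression-codec' = 'snappy'
);
```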

Ranger & HMS

Enterprise Security Integration

Relies on Hive Metastore + Apache Ranger ACLs with storage-layer permissions (HDFS/S3/Ozone) for comprehensive enterprise security

Important Constraints

Current Limitations & Requirements

Position deletes only, no streaming/CDC, schema evolution limits on complex types, HMS dependency, and v4.4+ requirement for full DML support

Apache Impala Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Impala v4.4+

Dimension	Support Level	Implementation Details	Min Version
Catalog Types	Limited (HMS-centric)	HiveCatalog (HMS), HadoopCatalog, HadoopTables; other catalog-impl via Hive site-config	4.0+
Read & Write Operations	Full (ACID isolation)	Complete SELECT/INSERT/CTAS/ALTER/DROP with ACID snapshot isolation	4.0+
DML Operations	Partial (MERGE in preview)	INSERT, DELETE, UPDATE with position deletes; MERGE in CDW 1.5.5 preview	4.4+
MoR/CoW Storage	Partial (position deletes only)	CoW for overwrites; MoR for row-level ops with position deletes only	4.4+
Time Travel	Full (SQL syntax)	FOR SYSTEM_TIME/VERSION AS OF; DESCRIBE HISTORY, EXPIRE SNAPSHOTS	4.0+
Performance Optimization	Full (LLVM compiled)	LLVM compilation, hidden-partition pruning, manifest caching, parallel reads	4.0+
Format Support	v1/v2 (Parquet focus)	Iceberg v1 & v2; Parquet read/write; ORC/Avro read-only	4.0+
Security & Governance	Full (Ranger integration)	HMS + Apache Ranger ACLs; storage-layer permissions (HDFS/S3/Ozone)	4.0+
Metadata Tables	Full (virtual tables)	$snapshots, $history, $manifests, $files virtual tables available	4.0+
Streaming Support	None (snapshot only)	No built-in streaming/CDC; reads latest snapshot at query start	N/A
Iceberg v3 Support	None (v1/v2 only)	Format versions 1 & 2 only; v3 features not supported	N/A
Cloud Catalog Integration	None (HMS required)	No direct AWS Glue, REST, or Nessie support; HMS dependency	N/A
Cloud Catalog Integration	NoneHMS Required	No direct AWS Glue, REST, or Nessie support; HMS dependency	N/A

Use Cases

Enterprise Hadoop Analytics

High-performance analytics in existing Hadoop ecosystems with HMS integration

Cloudera Data Platform deployments with existing HMS infrastructure
Migration from Hive tables to Iceberg with minimal disruption
Enterprise data warehousing with Apache Ranger security
Traditional BI tools requiring SQL interface to data lakes

Interactive Business Intelligence

Sub-second analytics for dashboards and reporting applications

Real-time dashboards with LLVM-optimized query performance
Interactive analytics requiring hidden-partition pruning
Business intelligence platforms with complex analytical queries
Self-service analytics with time travel capabilities

Data Warehouse Modernization

Transitioning from traditional data warehouses to modern lakehouse architecture

RDBMS to Iceberg migration with transactional consistency
Legacy data warehouse replacement with ACID guarantees
Enterprise reporting modernization with existing security models
Gradual migration strategies with format compatibility

Compliance & Audit Workloads

Regulatory environments requiring detailed access control and audit trails

Financial services regulatory reporting with time travel
Healthcare data governance with Apache Ranger integration
Audit trail requirements with comprehensive metadata tables
Compliance frameworks requiring detailed access logging

Resources & Documentation

Official Documentation: complete API reference and guides

Getting Started Guide: quick start tutorials and examples

Iceberg V2 Tables Documentation

Row-level Operations Guide

Ranger Iceberg Integration

Impala 4.0 Change Log

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!
