Apache Impala v4.4+

High-performance analytics engine with Iceberg v2 support, row-level operations via position deletes, and deep HMS integration for enterprise environments

Key Features

75
Enterprise Ready

HMS-Centric Catalog Integration

Deep integration with Hive Metastore, HadoopCatalog, and HadoopTables; other catalog implementations configurable via Hive site-config
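As a rough illustration of the three catalog options named above, the sketch below builds the matching CREATE TABLE statements. The helper name `create_iceberg_table` and the example paths are hypothetical; the `iceberg.catalog` and `iceberg.catalog_location` table properties follow the forms shown in the Impala Iceberg documentation, so check them against your Impala version before relying on them.

```python
def create_iceberg_table(name, columns, catalog="hive", location=None):
    """Build an Impala CREATE TABLE statement for an Iceberg table.

    catalog:  "hive" (default, HiveCatalog backed by HMS),
              "hadoop.catalog", or "hadoop.tables".
    location: catalog root for hadoop.catalog, table path for hadoop.tables.
    This helper is illustrative only; it does not validate identifiers.
    """
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    sql = f"CREATE TABLE {name} ({cols}) STORED AS ICEBERG"
    if catalog == "hive":
        # No extra properties: the table is registered through HMS.
        return sql + ";"
    if location is None:
        raise ValueError(f"{catalog} requires a location")
    if catalog == "hadoop.tables":
        return (f"{sql} LOCATION '{location}' "
                "TBLPROPERTIES('iceberg.catalog'='hadoop.tables');")
    if catalog == "hadoop.catalog":
        return (f"{sql} TBLPROPERTIES('iceberg.catalog'='hadoop.catalog', "
                f"'iceberg.catalog_location'='{location}');")
    raise ValueError(f"unsupported catalog: {catalog}")


if __name__ == "__main__":
    cols = [("id", "INT"), ("name", "STRING")]
    print(create_iceberg_table("ice_hms", cols))
    print(create_iceberg_table("ice_cat", cols, "hadoop.catalog", "/warehouse/iceberg"))
    print(create_iceberg_table("ice_tbl", cols, "hadoop.tables", "/warehouse/ice_tbl"))
```

The generated strings can be submitted through whichever Impala client you already use.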

80
Position Deletes

Iceberg v2 Row-level Operations

INSERT, DELETE, and UPDATE are fully supported through Iceberg v2 position-delete files; MERGE is available in preview
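For orientation, the row-level statements might look like the following on a format-version 2 table. The table name `orders_ice` and its columns are made up for the example, and the statements are kept as plain Python strings so they can be passed to any Impala client; treat this as a sketch rather than a tested deployment script.

```python
# Hypothetical table used only for illustration.
TABLE = "orders_ice"

ROW_LEVEL_STATEMENTS = [
    # Row-level DELETE/UPDATE need an Iceberg v2 table.
    f"CREATE TABLE {TABLE} (id INT, qty INT) STORED AS ICEBERG "
    "TBLPROPERTIES('format-version'='2');",
    f"INSERT INTO {TABLE} VALUES (1, 10), (2, 20);",
    # UPDATE and DELETE are recorded with position-delete files.
    f"UPDATE {TABLE} SET qty = 25 WHERE id = 2;",
    f"DELETE FROM {TABLE} WHERE id = 1;",
]

if __name__ == "__main__":
    for stmt in ROW_LEVEL_STATEMENTS:
        print(stmt)
```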

90
SQL Syntax

Advanced Time Travel

Manual snapshot queries via FOR SYSTEM_TIME AS OF / FOR SYSTEM_VERSION AS OF with DESCRIBE HISTORY & EXPIRE SNAPSHOTS commands
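A minimal sketch of the time-travel statements mentioned above, written as small Python helpers that return SQL strings. The helper names and the table name are hypothetical. Snapshot expiry is shown in the `ALTER TABLE ... EXECUTE expire_snapshots(...)` form that, to my understanding, the Impala documentation uses; verify it against your release.

```python
def as_of_time(table, ts):
    """Query the table as it was at a given timestamp."""
    return f"SELECT * FROM {table} FOR SYSTEM_TIME AS OF '{ts}';"


def as_of_version(table, snapshot_id):
    """Query a specific snapshot by its numeric snapshot id."""
    return f"SELECT * FROM {table} FOR SYSTEM_VERSION AS OF {int(snapshot_id)};"


def describe_history(table):
    """List the table's snapshots (ids and creation times)."""
    return f"DESCRIBE HISTORY {table};"


def expire_snapshots(table, older_than):
    """Expire snapshots older than the given timestamp."""
    return f"ALTER TABLE {table} EXECUTE expire_snapshots('{older_than}');"


if __name__ == "__main__":
    print(describe_history("orders_ice"))
    print(as_of_time("orders_ice", "2024-01-02 10:00:00"))
    print(as_of_version("orders_ice", 1234567890))
    print(expire_snapshots("orders_ice", "2024-01-01 00:00:00"))
```

A typical flow is to run DESCRIBE HISTORY first, pick a snapshot id or timestamp from its output, and then issue the AS OF query. Once a snapshot has been expired it can no longer be queried.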

85
LLVM Compiled

High-Performance Optimizations

Hidden-partition pruning, LLVM-compiled query paths, in-memory data caching, parallel manifest reads, and Puffin NDV statistics support
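To make hidden-partition pruning concrete, here is a toy model in Python. It assumes a table partitioned with Iceberg's day transform on a timestamp column and shows how a range predicate on the raw column can be turned into a filter on partition values, so files outside the range are never opened. The function names and file list are invented for the example, timestamps are treated as timezone-naive, and this is not how Impala's planner is actually implemented.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Iceberg day transform: whole days since 1970-01-01."""
    return (ts.date() - EPOCH).days


def prune_files(files, lo: datetime, hi: datetime):
    """Keep data files whose day partition may hold rows with lo <= ts <= hi.

    files: iterable of (path, partition_day) pairs taken from table metadata.
    """
    lo_day, hi_day = day_transform(lo), day_transform(hi)
    return [path for path, day in files if lo_day <= day <= hi_day]


if __name__ == "__main__":
    files = [
        ("d1.parquet", day_transform(datetime(2024, 1, 1))),
        ("d2.parquet", day_transform(datetime(2024, 1, 2))),
        ("d3.parquet", day_transform(datetime(2024, 1, 3))),
    ]
    # A predicate like: ts BETWEEN '2024-01-02 06:00' AND '2024-01-02 18:00'
    print(prune_files(files, datetime(2024, 1, 2, 6), datetime(2024, 1, 2, 18)))
```

The query never mentions the partition column; the engine derives the partition filter from the predicate on `ts`, which is what makes the partitioning "hidden".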

70
CoW & Position MoR

Storage Strategy Support

Copy-on-Write for overwrites and Merge-on-Read for row-level operations using position-delete files; equality deletes not supported
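The Merge-on-Read behaviour can be sketched with a small in-memory model. In the Iceberg spec a position-delete file records `(file_path, pos)` pairs, and readers skip those row positions when scanning the referenced data file. The code below is a simplified illustration with hypothetical names; it does not reflect Impala internals.

```python
def read_merge_on_read(data_files, position_deletes):
    """Yield live rows: every row whose (file_path, pos) is not deleted.

    data_files:       dict mapping file path -> list of rows
    position_deletes: iterable of (file_path, pos) pairs
    """
    deleted = {}
    for file_path, pos in position_deletes:
        deleted.setdefault(file_path, set()).add(pos)
    for path, rows in data_files.items():
        skip = deleted.get(path, set())
        for pos, row in enumerate(rows):
            if pos not in skip:
                yield row


def update_row(data_files, position_deletes, path, pos, new_row, new_path):
    """Model UPDATE as: delete the old position, write the new row to a new file."""
    position_deletes.append((path, pos))
    data_files.setdefault(new_path, []).append(new_row)


if __name__ == "__main__":
    data = {"f1.parquet": [(1, "a"), (2, "b"), (3, "c")]}
    deletes = [("f1.parquet", 1)]            # DELETE ... WHERE id = 2
    print(list(read_merge_on_read(data, deletes)))
    update_row(data, deletes, "f1.parquet", 2, (3, "C"), "f2.parquet")
    print(list(read_merge_on_read(data, deletes)))
```

Existing data files are never rewritten in this model, which is why row-level changes are cheap to write but add work at read time. Copy-on-Write takes the opposite trade-off by rewriting the affected files.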

75
v1/v2 Support

Format Compatibility

Reads & writes Iceberg spec v1 and v2 tables with Parquet data; ORC/Avro read-only; default write format is Parquet with Snappy compression

85
Ranger & HMS

Enterprise Security Integration

Relies on Hive Metastore + Apache Ranger ACLs with storage-layer permissions (HDFS/S3/Ozone) for comprehensive enterprise security

60
Important Constraints

Current Limitations & Requirements

Only position deletes are supported (no equality deletes); there is no built-in streaming or CDC; schema evolution is limited for complex types; HMS is required; and full DML support needs v4.4 or later

Apache Impala Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Impala v4.4+

| Dimension | Support Level | Implementation Details | Min Version |
| --- | --- | --- | --- |
| Catalog Types | Limited (HMS-Centric) | HiveCatalog (HMS), HadoopCatalog, HadoopTables; other catalog-impl via Hive site-config | 4.0+ |
| Read & Write Operations | Full (ACID Isolation) | Complete SELECT/INSERT/CTAS/ALTER/DROP with ACID snapshot-isolation | 4.0+ |
| DML Operations | Partial (MERGE Preview) | INSERT, DELETE, UPDATE with position deletes; MERGE in CDW 1.5.5 preview | 4.4+ |
| MoR/CoW Storage | Partial (Position Only) | CoW for overwrites; MoR for row-level ops with position deletes only | 4.4+ |
| Time Travel | Full (SQL Syntax) | FOR SYSTEM_TIME/VERSION AS OF; DESCRIBE HISTORY, EXPIRE SNAPSHOTS | 4.0+ |
| Performance Optimization | Full (LLVM Compiled) | LLVM compilation, hidden-partition pruning, manifest caching, parallel reads | 4.0+ |
| Format Support | v1/v2 (Parquet Focus) | Iceberg v1 & v2; Parquet read/write; ORC/Avro read-only | 4.0+ |
| Security & Governance | Full (Ranger Integration) | HMS + Apache Ranger ACLs; storage-layer permissions (HDFS/S3/Ozone) | 4.0+ |
| Metadata Tables | Full (Virtual Tables) | $snapshots, $history, $manifests, $files virtual tables available | 4.0+ |
| Streaming Support | None (Snapshot Only) | No built-in streaming/CDC; reads latest snapshot at query start | N/A |
| Iceberg v3 Support | None (v1/v2 Only) | Format-versions 1 & 2 only; v3 features not supported | N/A |
| Cloud Catalog Integration | None (HMS Required) | No direct AWS Glue, REST, or Nessie support; HMS dependency | N/A |

Use Cases

Enterprise Hadoop Analytics

High-performance analytics in existing Hadoop ecosystems with HMS integration

  • Cloudera Data Platform deployments with existing HMS infrastructure
  • Migration from Hive tables to Iceberg with minimal disruption
  • Enterprise data warehousing with Apache Ranger security
  • Traditional BI tools requiring SQL interface to data lakes

Interactive Business Intelligence

Sub-second analytics for dashboards and reporting applications

  • Low-latency dashboards with LLVM-optimized query performance
  • Interactive analytics requiring hidden-partition pruning
  • Business intelligence platforms with complex analytical queries
  • Self-service analytics with time travel capabilities

Data Warehouse Modernization

Transitioning from traditional data warehouses to modern lakehouse architecture

  • RDBMS to Iceberg migration with transactional consistency
  • Legacy data warehouse replacement with ACID guarantees
  • Enterprise reporting modernization with existing security models
  • Gradual migration strategies with format compatibility

Compliance & Audit Workloads

Regulatory environments requiring detailed access control and audit trails

  • Financial services regulatory reporting with time travel
  • Healthcare data governance with Apache Ranger integration
  • Audit trail requirements with comprehensive metadata tables
  • Compliance frameworks requiring detailed access logging
