
Apache Hive 4.0+

Traditional data warehouse with first-class Iceberg support, full SQL DML, hidden partitioning, and Ranger-based governance for batch analytics

Key Features

First-Class Catalog Integration (Native Support, 100)

Hive Metastore is the default catalog, wired in through HiveIcebergStorageHandler; Hadoop, REST/Nessie, AWS Glue, JDBC, and custom catalogs are configurable.
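As a minimal sketch of how this looks in practice: the DDL below creates an Iceberg table in the default HMS catalog, then registers and targets a non-default catalog. The table names (`sales`, `events`), the catalog name `rest_cat`, and the URI are illustrative; the `iceberg.catalog.*` property scheme follows the Iceberg Hive integration docs, but verify the exact keys for your deployment.

```sql
-- Iceberg table in the default Hive Metastore catalog (Hive 4 shorthand).
CREATE TABLE sales (
  id BIGINT,
  amount DECIMAL(10, 2),
  sale_date DATE
) STORED BY ICEBERG;

-- Register a non-default catalog (here a hypothetical REST catalog)
-- via session configuration, then bind a table to it.
SET iceberg.catalog.rest_cat.type=rest;
SET iceberg.catalog.rest_cat.uri=http://rest-catalog:8181;

CREATE EXTERNAL TABLE events (id BIGINT, payload STRING)
STORED BY ICEBERG
TBLPROPERTIES ('iceberg.catalog'='rest_cat');
```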
Traditional SQL Analytics (Full SQL Support, 100)

SELECT, INSERT INTO, atomic INSERT OVERWRITE, CTAS, and CREATE TABLE LIKE all work, executed through Tez or MapReduce jobs.
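The statements above can be sketched as follows; all table names are hypothetical, and the CTAS shape assumes the Hive 4 `STORED BY ICEBERG` shorthand shown earlier.

```sql
-- Plain batch ingestion into an Iceberg table.
INSERT INTO sales VALUES (1, 42.50, DATE '2024-01-15');

-- Atomic overwrite: the replacement lands as a single snapshot commit.
INSERT OVERWRITE TABLE sales SELECT * FROM sales_staging;

-- CTAS: create a new Iceberg table from a query result.
CREATE TABLE sales_2024 STORED BY ICEBERG AS
SELECT * FROM sales WHERE YEAR(sale_date) = 2024;

-- CREATE TABLE LIKE copies the schema, not the data.
CREATE TABLE sales_backup LIKE sales;
```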
Complete DML Operations (Tez Required, 85)

SQL DELETE, UPDATE, and MERGE INTO are supported when Hive runs on Tez; these operations rewrite whole data files (copy-on-write).
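A sketch of the three DML forms, assuming the illustrative `sales` table and a hypothetical `sales_updates` source. Each statement rewrites the affected files in full, so expect copy-on-write amplification on small changes.

```sql
-- DML requires the Tez execution engine.
SET hive.execution.engine=tez;

UPDATE sales SET amount = 0 WHERE id = 1;

DELETE FROM sales WHERE sale_date < DATE '2020-01-01';

-- Upsert pattern: update matched rows, insert the rest.
MERGE INTO sales AS t
USING sales_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.amount, s.sale_date);
```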
Copy-on-Write Operations (CoW Only, 70)

All Hive writes use copy-on-write; merge-on-read delete files are readable but never produced by Hive.
No Streaming Support (Batch Only, 0)

There is no native streaming; ingestion is limited to micro-batch jobs. CDC pipelines typically land data with Spark or Flink, then query it with Hive.
Schema Evolution & Metadata (Full Support, 95)

ALTER TABLE ADD/RENAME COLUMN is supported; metadata tables ($snapshots, $history) are queryable; compaction runs via ALTER TABLE ... COMPACT.
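The operations above might look like this; table and column names are illustrative, and the metadata-table query form varies by engine (some use a `$snapshots` suffix rather than a dotted name), so check your version's syntax.

```sql
-- Columns are tracked by Iceberg field ID, so existing data stays readable.
ALTER TABLE sales ADD COLUMNS (region STRING);
ALTER TABLE sales CHANGE COLUMN region sales_region STRING;

-- Inspect table state through metadata tables.
SELECT * FROM default.sales.snapshots;
SELECT * FROM default.sales.history;

-- Rewrite small files into larger ones.
ALTER TABLE sales COMPACT 'major';
```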
Enterprise Security (Ranger Integration, 100)

Inherits Ranger and SQL-standard authorization policies from the Hive Metastore; Ranger policies can target Iceberg tables and storage-handler paths.
Legacy Format Support (No V3, 40)

Hive 4 bundles Iceberg 1.4.3, which predates spec v3, so it cannot write (or reliably read) v3 tables until it upgrades to Iceberg 1.8.0 or later.

Apache Hive Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Hive 4.0+

| Dimension | Support Level | Implementation Details | Min Version |
| --- | --- | --- | --- |
| Catalog Types | Full (Native HMS) | Hive Metastore (default), Hadoop, REST/Nessie, AWS Glue, JDBC, custom implementations | 4.0+ |
| SQL Analytics | Full (Complete) | SELECT, INSERT INTO, INSERT OVERWRITE, CTAS, CREATE TABLE LIKE via Tez/MapReduce | 4.0+ |
| DML Operations | Full (Tez Required) | DELETE, UPDATE, MERGE INTO supported when running on the Tez execution engine | 4.0+ |
| Storage Strategy | Partial (CoW Only) | Copy-on-write for all writes; can read but not produce merge-on-read files | 4.0+ |
| Streaming Support | None (Batch Only) | No native streaming; micro-batch jobs only; pair with Spark/Flink for real-time | N/A |
| Format Support | Limited (v1/v2 Only) | Reads/writes spec v1/v2; no v3 support (bundles Iceberg 1.4.3) | 4.0+ |
| Time Travel | Partial (Properties Only) | Hidden partitioning supported; time travel via snapshot/branch properties, not SQL | 4.0+ |
| Schema Evolution | Full (Complete DDL) | ALTER TABLE ADD/RENAME COLUMN; metadata tables queryable; ALTER TABLE ... COMPACT | 4.0+ |
| Security & Governance | Full (Ranger Native) | Apache Ranger integration; HMS policies; table and storage-path access control | 4.0+ |
| Table Migration | Full (Native) | ALTER TABLE ... SET STORED AS ICEBERG migrates existing Hive tables in place | 4.0+ |
| Performance Limitations | Issues (CoW Overhead) | Copy-on-write rewrites are costly for small updates; HMS locks limit concurrency | 4.0+ |
| Known Limitations | Several (Engine Constraints) | Early Hive 4 snapshot bugs; DML requires Tez; no SQL time-travel syntax | 4.0+ |
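The table-migration row can be sketched as below; `legacy_sales` is a made-up name, and the statement assumes the in-place migration path named in the matrix (no data rewrite, only metadata conversion).

```sql
-- Convert an existing Hive-format (e.g. ORC/Parquet) table to Iceberg in place.
ALTER TABLE legacy_sales SET STORED AS ICEBERG;

-- Optionally pin the Iceberg spec version; v3 is unavailable while
-- Hive bundles Iceberg 1.4.3.
ALTER TABLE legacy_sales SET TBLPROPERTIES ('format-version'='2');
```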


Use Cases

Traditional Data Warehouse Analytics

Large-scale batch analytics with familiar SQL interface

  • Complex analytical queries on historical data
  • Business intelligence and reporting workloads
  • Data warehouse modernization projects
  • Migration from traditional RDBMS systems

Batch ETL Processing

Scheduled data transformation and loading operations

  • Daily/weekly ETL job processing
  • Data quality validation and cleansing
  • Large-scale data aggregation and summarization
  • Slowly changing dimension processing

Lambda Architecture Batch Layer

Batch processing component in hybrid architectures

  • Historical data processing in Lambda architectures
  • Batch views for real-time streaming applications
  • Data reconciliation between batch and speed layers
  • Long-term data retention and archival

Legacy System Integration

Bridge between traditional Hadoop and modern data lakes

  • Gradual migration from traditional Hive tables
  • Integration with existing Hadoop ecosystem tools
  • Leveraging existing Hive skills and workflows
  • Maintaining compatibility with legacy applications


πŸ’‘ Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
πŸ‘‰ Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. πŸš€

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!