
Apache Hive 4.0+

Traditional data warehouse with first-class Iceberg support, full SQL DML, hidden partitioning, and Ranger-based governance for batch analytics

Key Features

First-Class Catalog Integration (100 · Native Support)

The Hive Metastore is the default catalog, wired in via HiveIcebergStorageHandler; Hadoop, REST/Nessie, AWS Glue, JDBC, and custom catalog implementations are also configurable.
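As a sketch of what catalog configuration can look like in Hive 4 (the catalog name, URI, and table identifiers below are illustrative assumptions, not taken from this page):

```sql
-- Register a REST catalog under an illustrative name; property keys
-- follow Hive's iceberg.catalog.<name>.* convention.
SET iceberg.catalog.my_rest.type=rest;
SET iceberg.catalog.my_rest.uri=http://localhost:8181;

-- Default path: create an Iceberg table in the Hive Metastore catalog
-- via the Iceberg storage handler.
CREATE TABLE orders (
  order_id BIGINT,
  amount   DECIMAL(10,2),
  ts       TIMESTAMP
)
STORED BY ICEBERG;

-- Opt a table into the custom catalog instead of HMS (illustrative).
CREATE TABLE orders_rest (
  order_id BIGINT
)
STORED BY ICEBERG
TBLPROPERTIES ('iceberg.catalog'='my_rest');
```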
Traditional SQL Analytics (100 · Full SQL Support)

SELECT, INSERT INTO, atomic INSERT OVERWRITE, CTAS, and CREATE TABLE LIKE are all supported, executed as Tez or MapReduce jobs.
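The batch SQL surface above can be sketched as follows (table and column names are illustrative):

```sql
-- CTAS: create an Iceberg table from an existing source table.
CREATE TABLE daily_totals
STORED BY ICEBERG
AS
SELECT order_date, SUM(amount) AS total_amount
FROM orders
GROUP BY order_date;

-- Append rows.
INSERT INTO daily_totals VALUES (DATE '2024-01-01', 1250.00);

-- Atomically replace the table's contents in a single new snapshot.
INSERT OVERWRITE TABLE daily_totals
SELECT order_date, SUM(amount)
FROM orders
GROUP BY order_date;
```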
Complete DML Operations (85 · Tez Required)

SQL DELETE, UPDATE, and MERGE INTO are supported when Hive runs on the Tez execution engine; these operations rewrite whole data files (copy-on-write).
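A minimal sketch of the three DML operations, assuming illustrative `customers` and `customer_updates` tables (Hive requires Tez for all of these):

```sql
-- Upsert changed customer emails from a staging table.
MERGE INTO customers t
USING customer_updates s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET email = s.email
WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.email);

-- Each statement rewrites the affected data files in full (copy-on-write),
-- so small, frequent changes carry a disproportionate I/O cost.
UPDATE customers SET email = 'new@example.com' WHERE customer_id = 42;
DELETE FROM customers WHERE customer_id = 42;
```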
Copy-on-Write Operations (70 · CoW Only)

Hive writes use copy-on-write exclusively; merge-on-read delete files produced by other engines are readable, but Hive never produces them.
No Streaming Support (0 · Batch Only)

Hive has no native streaming; ingestion is limited to micro-batch jobs. CDC pipelines typically land data with Spark or Flink and use Hive for querying.
Schema Evolution & Metadata (95 · Full Support)

ALTER TABLE ADD/RENAME COLUMN works as expected; metadata tables such as $snapshots and $history are queryable; compaction is triggered via ALTER TABLE ... COMPACT.
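A sketch of the evolution and maintenance commands, with illustrative identifiers; the metadata-table query uses the dotted `db.table.snapshots` form, and exact syntax and exposed columns may vary by distribution:

```sql
-- Add a column, then rename it (rename uses Hive's CHANGE COLUMN form).
ALTER TABLE orders ADD COLUMNS (region STRING);
ALTER TABLE orders CHANGE COLUMN region geo_region STRING;

-- Inspect snapshot history through an Iceberg metadata table.
SELECT snapshot_id, committed_at
FROM default.orders.snapshots;

-- Trigger a major compaction to coalesce small files.
ALTER TABLE orders COMPACT 'major';
```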
Enterprise Security (100 · Ranger Integration)

Iceberg tables inherit Ranger and SQL-standard authorization policies through the Hive Metastore; Ranger policies can target Iceberg tables as well as storage-handler paths.
Legacy Format Support (40 · No V3)

Hive 4 bundles Iceberg 1.4.3, which predates spec v3, so it cannot write (or reliably read) v3 tables until it ships Iceberg ≥ 1.8.0.

Apache Hive Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Apache Hive 4.0+

| Dimension | Support Level | Implementation Details | Min Version |
| --- | --- | --- | --- |
| Catalog Types | Full (Native HMS) | Hive Metastore (default), Hadoop, REST/Nessie, AWS Glue, JDBC, custom implementations | 4.0+ |
| SQL Analytics | Full (Complete) | SELECT, INSERT INTO, INSERT OVERWRITE, CTAS, CREATE TABLE LIKE via Tez/MapReduce | 4.0+ |
| DML Operations | Full (Tez Required) | DELETE, UPDATE, MERGE INTO supported when running on the Tez execution engine | 4.0+ |
| Storage Strategy | Partial (CoW Only) | Copy-on-write for all writes; can read but not produce merge-on-read files | 4.0+ |
| Streaming Support | None (Batch Only) | No native streaming; micro-batch jobs only; pair with Spark/Flink for real-time | N/A |
| Format Support | Limited (v1/v2 Only) | Reads/writes spec v1/v2; no v3 support (bundles Iceberg 1.4.3) | 4.0+ |
| Time Travel | Partial (Properties Only) | Hidden partitioning supported; time travel via snapshot/branch properties, not SQL | 4.0+ |
| Schema Evolution | Full (Complete DDL) | ALTER TABLE ADD/RENAME COLUMN; metadata tables queryable; ALTER TABLE COMPACT | 4.0+ |
| Security & Governance | Full (Ranger Native) | Apache Ranger integration; HMS policies; table and storage-path access control | 4.0+ |
| Table Migration | Full (Native) | ALTER TABLE SET STORED AS ICEBERG for migrating existing Hive tables | 4.0+ |
| Performance Limitations | Issues (CoW Overhead) | Copy-on-write rewrites are costly for small updates; HMS locks limit concurrency | 4.0+ |
| Known Limitations | Several (Engine Constraints) | Early Hive 4 snapshot bugs; requires Tez for DML; no SQL time-travel syntax | 4.0+ |
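Given the migration path in the matrix (ALTER TABLE ... SET STORED AS ICEBERG), an in-place conversion might look like the following sketch; the table names are illustrative:

```sql
-- Migrate an existing Hive table to Iceberg in place,
-- keeping its name and location.
ALTER TABLE legacy_events SET STORED AS ICEBERG;

-- Or migrate by copy, leaving the original table untouched.
CREATE TABLE events_iceberg
STORED BY ICEBERG
AS SELECT * FROM legacy_events;
```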


Use Cases

Traditional Data Warehouse Analytics

Large-scale batch analytics with familiar SQL interface

  • Complex analytical queries on historical data
  • Business intelligence and reporting workloads
  • Data warehouse modernization projects
  • Migration from traditional RDBMS systems

Batch ETL Processing

Scheduled data transformation and loading operations

  • Daily/weekly ETL job processing
  • Data quality validation and cleansing
  • Large-scale data aggregation and summarization
  • Slowly changing dimension processing

Lambda Architecture Batch Layer

Batch processing component in hybrid architectures

  • Historical data processing in Lambda architectures
  • Batch views for real-time streaming applications
  • Data reconciliation between batch and speed layers
  • Long-term data retention and archival

Legacy System Integration

Bridge between traditional Hadoop and modern data lakes

  • Gradual migration from traditional Hive tables
  • Integration with existing Hadoop ecosystem tools
  • Leveraging existing Hive skills and workflows
  • Maintaining compatibility with legacy applications

Need Assistance?

If you have any questions about setting up OLake, contributing to the project, or troubleshooting issues, we're here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: discuss future roadmaps, report bugs, get help debugging issues you're facing, and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!