Apache Hive 4.0+
Traditional data warehouse with first-class Iceberg support, full SQL DML, hidden partitioning, and Ranger-based governance for batch analytics
Key Features
First-Class Catalog Integration
Hive Metastore default via HiveIcebergStorageHandler; Hadoop, REST/Nessie, AWS Glue, JDBC, or custom catalogs configurable
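A minimal sketch of the default HMS-backed path, using hypothetical table and column names; the second statement assumes a REST catalog has been registered in the Hive configuration under the `iceberg.catalog.<name>.*` convention, and exact transform and property names can vary by release.

```sql
-- Iceberg table in the default Hive Metastore catalog (HiveIcebergStorageHandler)
CREATE TABLE sales_iceberg (
  order_id BIGINT,
  amount   DECIMAL(10,2),
  ts       TIMESTAMP
)
PARTITIONED BY SPEC (day(ts))      -- hidden partitioning via an Iceberg transform
STORED BY ICEBERG
TBLPROPERTIES ('format-version'='2');

-- Binding a table to a non-default catalog, assuming e.g.
--   iceberg.catalog.rest_cat.type=rest
--   iceberg.catalog.rest_cat.uri=https://iceberg-rest.example.com
-- have been set in the Hive configuration
CREATE EXTERNAL TABLE events (event_id BIGINT, payload STRING)
STORED BY ICEBERG
TBLPROPERTIES ('iceberg.catalog'='rest_cat');
```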
Traditional SQL Analytics
SELECT, INSERT INTO, atomic INSERT OVERWRITE, CTAS, CREATE TABLE LIKE; works through Tez or MapReduce jobs
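A sketch of the batch statements listed above, against hypothetical `raw_sales` source tables:

```sql
-- CTAS into a new Iceberg table
CREATE TABLE daily_sales STORED BY ICEBERG AS
SELECT order_date, SUM(amount) AS total_amount
FROM raw_sales
GROUP BY order_date;

-- Append new rows, then atomically replace the table contents
INSERT INTO daily_sales
SELECT order_date, SUM(amount) FROM raw_sales_delta GROUP BY order_date;

INSERT OVERWRITE TABLE daily_sales
SELECT order_date, SUM(amount) FROM raw_sales GROUP BY order_date;

-- Clone the table definition without copying data
CREATE TABLE daily_sales_backup LIKE daily_sales;
```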
Complete DML Operations
SQL DELETE, UPDATE, and MERGE INTO supported when Hive runs on Tez; each operation rewrites the affected data files in full (copy-on-write)
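A sketch of the row-level statements on the Tez engine, using hypothetical `customers` and `customer_updates` tables:

```sql
SET hive.execution.engine=tez;   -- DML on Iceberg tables requires Tez

UPDATE customers
SET tier = 'gold'
WHERE lifetime_value > 10000;

DELETE FROM customers
WHERE is_deleted = true;

-- Upsert changes from a staging table
MERGE INTO customers AS t
USING customer_updates AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET email = s.email, tier = s.tier
WHEN NOT MATCHED THEN
  INSERT VALUES (s.customer_id, s.email, s.tier, false, 0);
```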
Copy-on-Write Operations
Copy-on-Write for all Hive writes; Merge-on-Read delete files are readable but not produced by Hive
No Streaming Support
No native streaming; ingestion is limited to micro-batch jobs; CDC pipelines typically write with Spark/Flink, with Hive used for querying
Schema Evolution & Metadata
ALTER TABLE ADD/RENAME COLUMN; metadata tables ($snapshots, $history) queryable; compaction via ALTER TABLE COMPACT
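A sketch of the DDL and metadata queries described above, continuing with the hypothetical `sales_iceberg` table; the dotted metadata-table reference and the compaction statement are shown as assumptions, since the exact syntax differs between releases:

```sql
-- Schema evolution
ALTER TABLE sales_iceberg ADD COLUMNS (channel STRING);
ALTER TABLE sales_iceberg CHANGE COLUMN channel sales_channel STRING;   -- rename a column

-- Inspect table metadata (snapshots and history metadata tables)
SELECT snapshot_id, committed_at, operation
FROM default.sales_iceberg.snapshots;

SELECT * FROM default.sales_iceberg.history;

-- Rewrite small files
ALTER TABLE sales_iceberg COMPACT 'major';
```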
Enterprise Security
Inherits Ranger/SQL-standard policies from Hive Metastore; Ranger policies can target Iceberg tables and storage-handler paths
Legacy Format Support
Hive 4 bundles Iceberg 1.4.3, which predates spec v3; it cannot write, or reliably read, v3 tables until the bundled library is upgraded to Iceberg ≥ 1.8.0
Apache Hive Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in Apache Hive 4.0+
| Dimension | Support Level | Implementation Details | Min Version |
|---|---|---|---|
| Catalog Types | Full (Native HMS) | Hive Metastore (default), Hadoop, REST/Nessie, AWS Glue, JDBC, custom implementations | 4.0+ |
| SQL Analytics | Full (Complete) | SELECT, INSERT INTO, INSERT OVERWRITE, CTAS, CREATE TABLE LIKE via Tez/MapReduce | 4.0+ |
| DML Operations | Full (Tez Required) | DELETE, UPDATE, MERGE INTO supported when running on the Tez execution engine | 4.0+ |
| Storage Strategy | Partial (CoW Only) | Copy-on-Write for all writes; can read but not produce Merge-on-Read files | 4.0+ |
| Streaming Support | None (Batch Only) | No native streaming; micro-batch jobs only; pair with Spark/Flink for real-time | N/A |
| Format Support | Limited (v1/v2 Only) | Reads/writes spec v1/v2; no v3 support (bundles Iceberg 1.4.3) | 4.0+ |
| Time Travel | Partial (Properties Only) | Hidden partitioning supported; time travel via snapshot/branch properties, not SQL | 4.0+ |
| Schema Evolution | Full (Complete DDL) | ALTER TABLE ADD/RENAME COLUMN; metadata tables queryable; ALTER TABLE COMPACT | 4.0+ |
| Security & Governance | Full (Ranger Native) | Apache Ranger integration; HMS policies; table and storage-path access control | 4.0+ |
| Table Migration | Full (Native) | ALTER TABLE SET STORED AS ICEBERG for migrating existing Hive tables (example below) | 4.0+ |
| Performance Limitations | Issues (CoW Overhead) | Copy-on-Write rewrites hurt small updates; HMS locks limit concurrency | 4.0+ |
| Known Limitations | Several (Engine Constraints) | Early Hive 4 snapshot bugs; requires Tez for DML; no SQL time-travel syntax | 4.0+ |
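For the Table Migration row above, a minimal sketch of an in-place conversion using a hypothetical legacy table; the statement follows the syntax given in the matrix, and the exact form may vary by Hive release:

```sql
-- Convert an existing Hive table to Iceberg in place (per the Table Migration row above)
ALTER TABLE legacy_web_logs SET STORED AS ICEBERG;

-- Confirm the storage handler and table properties after conversion
DESCRIBE FORMATTED legacy_web_logs;
```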
Use Cases
Traditional Data Warehouse Analytics
Large-scale batch analytics with familiar SQL interface
- Complex analytical queries on historical data
- Business intelligence and reporting workloads
- Data warehouse modernization projects
- Migration from traditional RDBMS systems
Batch ETL Processing
Scheduled data transformation and loading operations
- Daily/weekly ETL job processing
- Data quality validation and cleansing
- Large-scale data aggregation and summarization
- Slowly changing dimension processing
Lambda Architecture Batch Layer
Batch processing component in hybrid architectures
- Historical data processing in Lambda architectures
- Batch views for real-time streaming applications
- Data reconciliation between batch and speed layers
- Long-term data retention and archival
Legacy System Integration
Bridge between traditional Hadoop and modern data lakes
- Gradual migration from traditional Hive tables
- Integration with existing Hadoop ecosystem tools
- Leveraging existing Hive skills and workflows
- Maintaining compatibility with legacy applications