Google BigQuery
Serverless Google Cloud data warehouse with managed Iceberg tables, automatic optimization, Storage Write API streaming, and deep GCP ecosystem integration
Key Features
Dual Table Model
BigQuery-managed Iceberg (internal catalog, full DML) and BigLake external Iceberg (Dataplex/HMS/Glue via GCS, query + limited writes)
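A managed Iceberg table is created with ordinary DDL plus a connection and a storage location. The helper below is a minimal sketch that only assembles the statement text. The function name, table path, connection name, and bucket are hypothetical. The `WITH CONNECTION` clause and the `file_format`, `table_format`, and `storage_uri` options are assumed from Google's published DDL shape and should be checked against current documentation, since the feature is Pre-GA.

```python
from typing import Dict, List, Optional


def managed_iceberg_ddl(table: str, columns: Dict[str, str], connection: str,
                        storage_uri: str,
                        cluster_by: Optional[List[str]] = None) -> str:
    """Assemble CREATE TABLE text for a BigQuery-managed Iceberg table.

    All identifiers passed in are illustrative; the OPTIONS keys are an
    assumption to verify against the current BigQuery documentation.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    cluster = f"CLUSTER BY {', '.join(cluster_by)}\n" if cluster_by else ""
    return (
        f"CREATE TABLE `{table}` (\n  {cols}\n)\n"
        f"{cluster}"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  file_format = 'PARQUET',\n"
        "  table_format = 'ICEBERG',\n"
        f"  storage_uri = '{storage_uri}'\n"
        ");"
    )


if __name__ == "__main__":
    print(managed_iceberg_ddl(
        "my_project.analytics.orders",            # hypothetical table
        {"order_id": "INT64", "status": "STRING"},
        "my_project.us.my_connection",            # hypothetical connection
        "gs://my-bucket/iceberg/orders",          # hypothetical bucket path
        cluster_by=["order_id"],
    ))
```

The generated text can be pasted into the BigQuery console or submitted through any client that runs GoogleSQL.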
Automatic Optimization
Fully automatic file-size tuning, clustering, metadata compaction & orphan-file GC. No user-issued OPTIMIZE or VACUUM commands required
Asymmetric DML Support
Managed tables: INSERT, UPDATE, DELETE, MERGE with GoogleSQL semantics. External tables: limited INSERT support via Dataflow/Spark
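For managed tables, an upsert can be expressed as a single GoogleSQL `MERGE`. The sketch below builds such a statement as text. The helper name and the table and column names in the usage lines are hypothetical, and the statement shape is standard GoogleSQL rather than anything specific to Iceberg.

```python
from typing import List


def merge_upsert_sql(target: str, source: str, key: str,
                     columns: List[str]) -> str:
    """Assemble a GoogleSQL MERGE that upserts `source` rows into `target`.

    Rows are matched on `key`; every other column is overwritten on match,
    and unmatched source rows are inserted.
    """
    set_clause = ", ".join(f"{c} = S.{c}" for c in columns if c != key)
    col_list = ", ".join(columns)
    val_list = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE `{target}` AS T\n"
        f"USING `{source}` AS S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN\n  UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN\n  INSERT ({col_list}) VALUES ({val_list});"
    )


if __name__ == "__main__":
    # Hypothetical managed Iceberg target and staging source.
    print(merge_upsert_sql(
        "my_project.analytics.orders",
        "my_project.staging.orders_delta",
        key="order_id",
        columns=["order_id", "status", "amount"],
    ))
```

External BigLake Iceberg tables would not accept this statement, per the asymmetry described above; writes to them go through Dataflow or Spark.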
Intelligent Storage Strategy
DML operations write position/equality delete files (merge-on-read, MoR). Background compaction, clustering, and garbage collection then rewrite the data files (copy-on-write, CoW)
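The interplay between merge-on-read deletes and background rewrites can be illustrated with a toy in-memory model. This is a conceptual sketch only: `ToyTable`, its methods, and the file names are hypothetical and do not reflect BigQuery's internal implementation. It shows only position deletes, not equality deletes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToyTable:
    # data file name -> rows stored in that file
    data_files: Dict[str, List[int]] = field(default_factory=dict)
    # position deletes: (data file name, row index within that file)
    position_deletes: Set[Tuple[str, int]] = field(default_factory=set)

    def delete_where(self, predicate: Callable[[int], bool]) -> None:
        """MoR delete: record row positions, leave data files untouched."""
        for name, rows in self.data_files.items():
            for pos, row in enumerate(rows):
                if predicate(row):
                    self.position_deletes.add((name, pos))

    def scan(self) -> List[int]:
        """Read path: filter out deleted positions while reading."""
        return [row
                for name, rows in self.data_files.items()
                for pos, row in enumerate(rows)
                if (name, pos) not in self.position_deletes]

    def compact(self, new_name: str = "compacted-0.parquet") -> None:
        """CoW-style rewrite: materialize live rows, drop the delete set."""
        live = self.scan()
        self.data_files = {new_name: live}
        self.position_deletes = set()


if __name__ == "__main__":
    t = ToyTable({"a.parquet": [1, 2, 3], "b.parquet": [4, 5]})
    t.delete_where(lambda r: r % 2 == 0)   # cheap write, costlier reads
    print(t.scan(), len(t.data_files))     # live rows, still two files
    t.compact()                            # background rewrite
    print(t.scan(), len(t.data_files))     # same rows, one file
```

The delete is cheap because no data file is rewritten; the cost moves to readers until compaction folds the deletes back into fresh data files.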
Storage Write API Streaming
High-throughput streaming via the Storage Write API (Preview) from Dataflow, Beam, and Spark. No built-in CDC apply; use Datastream + Dataflow patterns instead
Parquet-Only Format
Parquet only (preview). ORC/Avro not yet supported. Iceberg spec v2 required for managed tables; v3 evaluation planned for 2025
Differential Time Travel
Managed tables: FOR SYSTEM_TIME AS OF syntax, which resolves to Iceberg snapshots. External BigLake tables: no SQL time travel currently
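A time-travel read on a managed table uses the same `FOR SYSTEM_TIME AS OF` clause as native BigQuery tables. The helper below is a small sketch that formats such a query for a given point in time. The function name and table path are hypothetical, and the sketch does not check whether the timestamp falls within the table's retained history.

```python
from datetime import datetime, timezone


def time_travel_sql(table: str, as_of: datetime) -> str:
    """Assemble a SELECT that reads `table` as of a past UTC timestamp."""
    if as_of.tzinfo is None:
        # Refuse naive datetimes so the query never depends on local time.
        raise ValueError("as_of must be timezone-aware")
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00:00")
    return (f"SELECT * FROM `{table}` "
            f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}';")


if __name__ == "__main__":
    # Hypothetical managed Iceberg table.
    print(time_travel_sql(
        "my_project.analytics.orders",
        datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    ))
```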
BigQuery-Native Security
IAM permissions work as they do for native BigQuery tables. Column-level security & masking on managed Iceberg. External tables are governed via BigLake/Dataplex policy tags
Preview Status Limitations
Feature is Pre-GA (behavior may change). No table rename/clone, limited concurrency, external writes rely on Dataflow/Spark
GCP Ecosystem Integration
BigQuery ML on Iceberg data, Dataform transformations, BigQuery Omni cross-cloud queries, end-to-end lineage through Dataplex
Google BigQuery Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in Google BigQuery
| Dimension | Support Level | Implementation Details | Status |
|---|---|---|---|
| Catalog Types | Partial (Dual Model) | BigQuery-managed (internal) + BigLake external (Dataplex/HMS/Glue); no REST/Nessie | Preview |
| SQL Analytics | Partial (Table-dependent) | Managed: full CREATE/CTAS/INSERT/DML; External: SELECT + limited INSERT via Dataflow | Preview |
| DML Operations | Partial (Managed Full) | Managed: INSERT/UPDATE/DELETE/MERGE; External: limited INSERT via external tools | Preview |
| Storage Strategy | Full (Auto Optimization) | MoR operations + automatic CoW optimization; background compaction/clustering/GC | Preview |
| Streaming Support | Partial (Storage Write API) | High-throughput via Storage Write API; Dataflow/Beam/Spark; no built-in CDC | Preview |
| Format Support | Limited (Parquet Only) | Parquet only; no ORC/Avro; v2 required; v3 evaluation planned 2025 | Preview |
| Time Travel | Partial (Managed Only) | Managed: FOR SYSTEM_TIME AS OF; External: no SQL time travel currently | Preview |
| Schema Evolution | Full (Metadata-only) | ADD/DROP/RENAME columns; type widening; instant reflection in information_schema | Preview |
| Security & Governance | Full (IAM + Column Masking) | BigQuery IAM + column-level security/masking; BigLake/Dataplex policy tags | Preview |
| GCP Integration | Full (Native Ecosystem) | BigQuery ML, Dataform, Omni, Dataplex lineage; multi-engine access to managed tables | Preview |
| Automatic Optimization | Full (Zero Maintenance) | Automatic file compaction, clustering, metadata optimization, garbage collection | Preview |
| Preview Limitations | Several (Pre-GA) | No table rename/clone; concurrency limits; external DML limited; behavior may change | Preview |
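The Schema Evolution row in the matrix lists ADD, DROP, and RENAME as metadata-only column changes. The sketch below generates the corresponding GoogleSQL `ALTER TABLE` statements. The helper name and the table and column names are hypothetical, and whether each statement is accepted on a given Iceberg table during Preview is an assumption to verify. Type widening is left out.

```python
from typing import Dict, List, Optional


def schema_evolution_sql(table: str,
                         add: Optional[Dict[str, str]] = None,
                         rename: Optional[Dict[str, str]] = None,
                         drop: Optional[List[str]] = None) -> List[str]:
    """Return ALTER TABLE statements for column adds, renames, and drops."""
    stmts = []
    for name, dtype in (add or {}).items():
        stmts.append(f"ALTER TABLE `{table}` ADD COLUMN {name} {dtype};")
    for old, new in (rename or {}).items():
        stmts.append(f"ALTER TABLE `{table}` RENAME COLUMN {old} TO {new};")
    for name in (drop or []):
        stmts.append(f"ALTER TABLE `{table}` DROP COLUMN {name};")
    return stmts


if __name__ == "__main__":
    # Hypothetical table and columns.
    for stmt in schema_evolution_sql(
        "my_project.analytics.orders",
        add={"coupon_code": "STRING"},
        rename={"status": "order_status"},
        drop=["legacy_flag"],
    ):
        print(stmt)
```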
Use Cases
Serverless Data Warehouse
Fully managed Iceberg tables with automatic optimization
- Modern data warehouse with zero maintenance overhead
- Analytics workloads requiring automatic optimization
- Teams wanting BigQuery's serverless benefits on Iceberg
- High-frequency update scenarios with background optimization
GCP-Native Analytics Platform
Deep integration with Google Cloud ecosystem services
- BigQuery ML on Iceberg data for machine learning
- Dataform transformations on Iceberg tables
- Cross-cloud analytics with BigQuery Omni
- End-to-end data lineage through Dataplex
Streaming Analytics with Storage Write API
High-throughput streaming ingestion for real-time analytics
- Real-time streaming data analysis
- High-volume ingestion via Dataflow/Beam/Spark
- Near real-time dashboard and reporting
- CDC processing with Datastream integration
Multi-Engine Data Lake
Iceberg tables accessible from multiple GCP services
- Data shared between BigQuery and Dataproc Spark/Flink
- Multi-engine analytical workloads
- Hybrid batch and streaming architectures
- Open format data lake with BigQuery performance
Resources & Documentation
- Official Documentation: complete API reference and guides
- Getting Started Guide: quick start tutorials and examples
- BigQuery Iceberg Tables Documentation
- Announcing BigQuery Iceberg Support
- Create External Iceberg Tables
- DML Operations Guide
- Column-level Security
- Time Travel Documentation
- BigQuery Iceberg Limitations
- Metadata Caching for External Tables