Google BigQuery
Serverless Google Cloud data warehouse with managed Iceberg tables, automatic optimization, Storage Write API streaming, and deep GCP ecosystem integration
Key Features
Dual Table Model
BigQuery-managed Iceberg (internal catalog, full DML) and BigLake external Iceberg (Dataplex/HMS/Glue via GCS, query + limited writes)
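The two table types are created with different DDL. Below is a minimal Python sketch that only builds the DDL strings for each type. It assumes the DDL shape documented for BigQuery Iceberg tables:

- Managed tables use `WITH CONNECTION` plus the `file_format`, `table_format`, and `storage_uri` options.
- BigLake external tables use `format = 'ICEBERG'` and a `uris` list that points at an Iceberg metadata file.

The feature is Pre-GA, so check the option names against current documentation. All project, dataset, connection, and bucket names are hypothetical.

```python
from __future__ import annotations

import re

# Conservative checks so caller-supplied names cannot inject extra SQL.
_PATH = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _path(name: str) -> str:
    """Backtick-quote a project.dataset.table (or connection) path."""
    if not _PATH.match(name):
        raise ValueError(f"unsafe path: {name!r}")
    return f"`{name}`"


def _gcs_uri(uri: str) -> str:
    """Accept only gs:// URIs that contain no quote characters."""
    if not uri.startswith("gs://") or "'" in uri:
        raise ValueError(f"unexpected GCS URI: {uri!r}")
    return uri


def managed_iceberg_ddl(table: str, columns: dict[str, str],
                        connection: str, storage_uri: str) -> str:
    """CREATE TABLE for a BigQuery-managed Iceberg table (full DML)."""
    for col in columns:
        if not _COLUMN.match(col):
            raise ValueError(f"unsafe column: {col!r}")
    # Column types are assumed to be trusted GoogleSQL type names.
    cols = ",\n  ".join(f"{c} {t}" for c, t in columns.items())
    return (
        f"CREATE TABLE {_path(table)} (\n  {cols}\n)\n"
        f"WITH CONNECTION {_path(connection)}\n"
        "OPTIONS (\n"
        "  file_format = 'PARQUET',\n"
        "  table_format = 'ICEBERG',\n"
        f"  storage_uri = '{_gcs_uri(storage_uri)}'\n"
        ")"
    )


def external_iceberg_ddl(table: str, connection: str, metadata_uri: str) -> str:
    """CREATE EXTERNAL TABLE over existing Iceberg metadata (BigLake)."""
    return (
        f"CREATE EXTERNAL TABLE {_path(table)}\n"
        f"WITH CONNECTION {_path(connection)}\n"
        "OPTIONS (\n"
        "  format = 'ICEBERG',\n"
        f"  uris = ['{_gcs_uri(metadata_uri)}']\n"
        ")"
    )
```

Each string would then be submitted with something like `google.cloud.bigquery.Client().query(sql).result()`. Only the managed table accepts the full DML set from BigQuery.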
Automatic Optimization
Fully automatic file-size tuning, clustering, metadata compaction & orphan-file GC. No user-issued OPTIMIZE or VACUUM commands required
Asymmetric DML Support
Managed tables: INSERT, UPDATE, DELETE, and MERGE with GoogleSQL semantics. External tables: only limited INSERT support, performed through Dataflow or Spark
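MERGE is the usual way to express an upsert. The minimal sketch below builds a GoogleSQL MERGE that updates matching rows and inserts new ones, keyed on a single column. The table and column names are hypothetical. The target is assumed to be a managed Iceberg table, since external tables do not accept this DML from BigQuery.

```python
from __future__ import annotations

import re

_PATH = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_upsert_sql(target: str, source: str, key: str, columns: list[str]) -> str:
    """Build a GoogleSQL MERGE that upserts rows from `source` into `target`."""
    for path in (target, source):
        if not _PATH.match(path):
            raise ValueError(f"unsafe table path: {path!r}")
    for col in (key, *columns):
        if not _COLUMN.match(col):
            raise ValueError(f"unsafe column: {col!r}")
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")
    all_cols = [key, *non_key]
    set_clause = ", ".join(f"{c} = S.{c}" for c in non_key)
    return (
        f"MERGE `{target}` AS T\n"
        f"USING `{source}` AS S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(all_cols)}) "
        f"VALUES ({', '.join('S.' + c for c in all_cols)})"
    )
```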
Intelligent Storage Strategy
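DML operations write position or equality delete files (merge-on-read, MoR). Background compaction later rewrites data files with those deletes applied (a copy-on-write-style result), together with automatic clustering and garbage collection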
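A toy, in-memory model of Iceberg v2 position deletes can illustrate the merge-on-read idea. This is not BigQuery's internal implementation, which the source describes only at this level. `DataFile`, `TableState`, `delete_where`, `scan`, and `compact` are hypothetical names, and equality deletes are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataFile:
    path: str
    rows: list[dict]  # treated as immutable once written, like a Parquet file


@dataclass
class TableState:
    data_files: list[DataFile]
    # Position deletes: (data file path, row position) pairs.
    position_deletes: set[tuple[str, int]] = field(default_factory=set)


def delete_where(state: TableState, predicate: Callable[[dict], bool]) -> None:
    """Merge-on-read DELETE: record row positions, leave data files untouched."""
    for f in state.data_files:
        for pos, row in enumerate(f.rows):
            if predicate(row):
                state.position_deletes.add((f.path, pos))


def scan(state: TableState) -> list[dict]:
    """Readers merge deletes on the fly by skipping recorded positions."""
    return [
        row
        for f in state.data_files
        for pos, row in enumerate(f.rows)
        if (f.path, pos) not in state.position_deletes
    ]


def compact(state: TableState, new_path: str = "compacted-0.parquet") -> TableState:
    """Background compaction: rewrite live rows into one file, drop delete records."""
    return TableState(data_files=[DataFile(new_path, scan(state))])
```

After `compact`, reads no longer pay the cost of applying delete records. That cost is the trade-off the automatic background optimization is meant to manage.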
Storage Write API Streaming
High-throughput streaming through the Storage Write API (Preview) from Dataflow, Apache Beam, or Spark. There is no built-in CDC apply; use Datastream + Dataflow patterns instead
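Because there is no built-in CDC apply, a pipeline (for example, Datastream feeding Dataflow) has to reduce each batch of change events to a final state per key before writing. The minimal sketch below shows that reduction step. The `ChangeEvent` shape, its `op` values, and the `seq` ordering field are assumptions, not Datastream's actual record format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    key: int
    op: str                      # hypothetical: "INSERT", "UPDATE" or "DELETE"
    seq: int                     # source commit order; higher means newer
    row: Optional[dict] = None   # full row image; None for deletes


def collapse_changes(events: list[ChangeEvent]) -> tuple[dict[int, dict], set[int]]:
    """Keep the latest event per key, then split into upserts and deletes."""
    latest: dict[int, ChangeEvent] = {}
    for ev in events:
        current = latest.get(ev.key)
        # Events may arrive out of order; the highest seq wins.
        if current is None or ev.seq > current.seq:
            latest[ev.key] = ev
    upserts = {k: ev.row for k, ev in latest.items() if ev.op != "DELETE"}
    deletes = {k for k, ev in latest.items() if ev.op == "DELETE"}
    return upserts, deletes
```

The upserts could then be staged and applied with a MERGE. The deletes could be handled by a `WHEN MATCHED AND ... THEN DELETE` branch or by a separate DELETE statement.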
Parquet-Only Format
Parquet only (Preview); ORC and Avro are not yet supported. Managed tables require Iceberg format v2; v3 evaluation is planned for 2025
Differential Time Travel
Managed tables: FOR SYSTEM_TIME AS OF queries, which resolve to Iceberg snapshots. External BigLake tables: no SQL time travel currently
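A time-travel query on a managed table looks like an ordinary SELECT with a `FOR SYSTEM_TIME AS OF` clause. The minimal sketch below builds one from a timezone-aware datetime and pins it to a literal UTC timestamp. The table name is hypothetical. The query only returns data if the timestamp falls within whatever time-travel window the table retains, and it does not apply to external BigLake tables.

```python
from __future__ import annotations

import re
from datetime import datetime, timezone

_PATH = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*$")


def as_of_query(table: str, ts: datetime) -> str:
    """SELECT a managed Iceberg table as of `ts`, normalized to UTC."""
    if not _PATH.match(table):
        raise ValueError(f"unsafe table path: {table!r}")
    if ts.tzinfo is None:
        # Naive datetimes are ambiguous; require an explicit offset.
        raise ValueError("ts must be timezone-aware")
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    return f"SELECT * FROM `{table}` FOR SYSTEM_TIME AS OF TIMESTAMP '{stamp}'"
```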
BigQuery-Native Security
IAM permissions work as they do for native BigQuery tables. Managed Iceberg tables support column-level security and masking; external tables are governed through BigLake/Dataplex policy tags
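Since IAM works as it does for native tables, a table-level grant can be expressed with GoogleSQL DCL. The minimal sketch below builds such a `GRANT` statement. It assumes that DCL applies to managed Iceberg tables the same way it does to native tables; the source does not confirm this, and the feature is Pre-GA. The role, table, and principals are hypothetical. Column-level security and masking are configured separately through policy tags, not with this statement.

```python
from __future__ import annotations

import re

_PATH = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*$")
_PRINCIPAL = re.compile(r"^(user|group|serviceAccount|domain):[^\s\"`]+$")


def grant_table_role(role: str, table: str, principals: list[str]) -> str:
    """GoogleSQL DCL granting an IAM role on a single table."""
    if not role.startswith("roles/") or "`" in role:
        raise ValueError(f"unexpected role: {role!r}")
    if not _PATH.match(table):
        raise ValueError(f"unsafe table path: {table!r}")
    if not principals or not all(_PRINCIPAL.match(p) for p in principals):
        raise ValueError("principals must look like 'user:...', 'group:...', etc.")
    members = ", ".join(f'"{p}"' for p in principals)
    return f"GRANT `{role}` ON TABLE `{table}` TO {members}"
```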
Preview Status Limitations
The feature is Pre-GA, so behavior may change. No table rename or clone, limited concurrency, and external writes rely on Dataflow/Spark
GCP Ecosystem Integration
BigQuery ML on Iceberg data, Dataform transformations, BigQuery Omni cross-cloud queries, end-to-end lineage through Dataplex
Google BigQuery Iceberg Feature Matrix
Comprehensive breakdown of Iceberg capabilities in Google BigQuery
| Dimension | Support Level | Implementation Details | Status |
|---|---|---|---|
| Catalog Types | Partial (dual model) | BigQuery-managed (internal) + BigLake external (Dataplex/HMS/Glue); no REST/Nessie | Preview |
| SQL Analytics | Partial (table-dependent) | Managed: full CREATE/CTAS/INSERT/DML; External: SELECT + limited INSERT via Dataflow | Preview |
| DML Operations | Partial (full on managed) | Managed: INSERT/UPDATE/DELETE/MERGE; External: limited INSERT via external tools | Preview |
| Storage Strategy | Full (auto optimization) | MoR operations + automatic CoW optimization; background compaction/clustering/GC | Preview |
| Streaming Support | Partial (Storage Write API) | High-throughput via Storage Write API; Dataflow/Beam/Spark; no built-in CDC | Preview |
| Format Support | Limited (Parquet only) | Parquet only; no ORC/Avro; v2 required; v3 evaluation planned 2025 | Preview |
| Time Travel | Partial (managed only) | Managed: FOR SYSTEM_TIME AS OF; External: no SQL time travel currently | Preview |
| Schema Evolution | Full (metadata-only) | ADD/DROP/RENAME columns; type widening; instant reflection in INFORMATION_SCHEMA | Preview |
| Security & Governance | Full (IAM + column masking) | BigQuery IAM + column-level security/masking; BigLake/Dataplex policy tags | Preview |
| GCP Integration | Full (native ecosystem) | BigQuery ML, Dataform, Omni, Dataplex lineage; multi-engine access to managed tables | Preview |
| Automatic Optimization | Full (zero maintenance) | Automatic file compaction, clustering, metadata optimization, garbage collection | Preview |
| Preview Limitations | Several (Pre-GA) | No table rename/clone; concurrency limits; external DML limited; behavior may change | Preview |
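The Schema Evolution row describes metadata-only column changes. The minimal sketch below builds the corresponding GoogleSQL `ALTER TABLE` statements, plus an INFORMATION_SCHEMA query to confirm each change. Table and column names are hypothetical. Which type widenings Iceberg tables permit is left to the current docs rather than encoded here.

```python
from __future__ import annotations

import re

_PATH = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _table(path: str) -> str:
    """Validate and backtick-quote a project.dataset.table path."""
    if not _PATH.match(path):
        raise ValueError(f"expected project.dataset.table, got {path!r}")
    return f"`{path}`"


def _col(name: str) -> str:
    if not _COLUMN.match(name):
        raise ValueError(f"unsafe column: {name!r}")
    return name


def add_column(table: str, col: str, sql_type: str) -> str:
    return f"ALTER TABLE {_table(table)} ADD COLUMN {_col(col)} {sql_type}"


def rename_column(table: str, old: str, new: str) -> str:
    return f"ALTER TABLE {_table(table)} RENAME COLUMN {_col(old)} TO {_col(new)}"


def drop_column(table: str, col: str) -> str:
    return f"ALTER TABLE {_table(table)} DROP COLUMN {_col(col)}"


def widen_column(table: str, col: str, wider_type: str) -> str:
    # Whether this particular widening is allowed is NOT checked here.
    return f"ALTER TABLE {_table(table)} ALTER COLUMN {_col(col)} SET DATA TYPE {wider_type}"


def columns_query(table: str) -> str:
    """List the table's current columns from INFORMATION_SCHEMA."""
    _table(table)  # validation only
    project, dataset, name = table.split(".")
    return (
        "SELECT column_name, data_type "
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS "
        f"WHERE table_name = '{name}' ORDER BY ordinal_position"
    )
```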
Use Cases
Serverless Data Warehouse
Fully managed Iceberg tables with automatic optimization
- Real-world example: An e-commerce company manages 5 TB of customer transaction data in BigQuery managed Iceberg tables. The automatic optimization service continuously compacts small files and optimizes table layout in the background, eliminating the need for manual OPTIMIZE commands. The data engineering team saves 15+ hours per week of maintenance work while queries remain fast
- Modern data warehouse with zero maintenance overhead for production workloads
- Analytics workloads requiring automatic optimization without manual intervention
- High-frequency update scenarios with background optimization and clustering
GCP-Native Analytics Platform
Deep integration with Google Cloud ecosystem services
- Real-world example: A fintech startup uses BigQuery ML to train fraud detection models directly on Iceberg tables containing payment transaction data. They use Dataform to transform raw data into analytics-ready tables, track end-to-end lineage through Dataplex, and use BigQuery Omni to query data across Google Cloud and AWS without moving it
- BigQuery ML on Iceberg data for machine learning model training and inference
- Dataform transformations on Iceberg tables for ELT pipelines
- Cross-cloud analytics with BigQuery Omni for multi-cloud data access
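The BigQuery ML use case amounts to pointing `CREATE MODEL` at an Iceberg table as you would at any other table. The minimal sketch below assumes a logistic-regression fraud classifier. The model name, table names, feature columns, and the `is_fraud` label are hypothetical.

```python
from __future__ import annotations

import re

_PATH = re.compile(r"^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(pattern: re.Pattern, *names: str) -> None:
    """Reject identifiers that do not match the given pattern."""
    for n in names:
        if not pattern.match(n):
            raise ValueError(f"unsafe identifier: {n!r}")


def train_classifier_sql(model: str, table: str, label: str, features: list[str]) -> str:
    """CREATE MODEL statement training a logistic regression on an Iceberg table."""
    _check(_PATH, model, table)
    _check(_COLUMN, label, *features)
    cols = ", ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT {cols} FROM `{table}`"
    )


def predict_sql(model: str, table: str) -> str:
    """Batch inference over new rows with ML.PREDICT."""
    _check(_PATH, model, table)
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{table}`)"
```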
Streaming Analytics with Storage Write API
High-throughput streaming ingestion for real-time analytics
- Real-world example: A gaming company ingests player event data from 10 million active users using Dataflow with the Storage Write API into BigQuery managed Iceberg tables. Events become queryable within 2-3 seconds, powering real-time leaderboards and player analytics dashboards. They process 50,000 events per second with near real-time visibility
- Real-time streaming data analysis with sub-second to seconds-level latency
- High-volume ingestion via Dataflow/Apache Beam/Spark for operational analytics
- CDC processing for database replication, built from Datastream + Dataflow since there is no built-in CDC apply
Multi-Engine Data Lake
Iceberg tables accessible from multiple GCP services
- Real-world example: A media company stores video metadata in BigQuery managed Iceberg tables. Their data analysts use BigQuery SQL for reporting, while data scientists use Dataproc Spark for ML feature engineering, and streaming engineers use Flink for real-time processing. All three teams access the same Iceberg tables without data duplication or ETL
- Data shared between BigQuery and Dataproc Spark/Flink for unified analytics
- Multi-engine analytical workloads with consistent data access
- Hybrid batch and streaming architectures with open format interoperability
Resources & Documentation
Official Documentation
Complete API reference and guides
Getting Started Guide
Quick start tutorials and examples
- BigQuery Iceberg Tables Documentation
- Announcing BigQuery Iceberg Support
- Create External Iceberg Tables
- DML Operations Guide
- Column-level Security
- Time Travel Documentation
- BigQuery Iceberg Limitations
- Metadata Caching for External Tables