Google BigQuery

Serverless Google Cloud data warehouse with managed Iceberg tables, automatic optimization, Storage Write API streaming, and deep GCP ecosystem integration

Key Features

Dual Table Model (Score: 80 | Managed + External)

BigQuery-managed Iceberg (internal catalog, full DML) and BigLake external Iceberg (Dataplex/HMS/Glue via GCS, query + limited writes)
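
As a quick sketch of the two models, the statements below create a BigQuery-managed Iceberg table and a BigLake external Iceberg table registered from existing metadata in GCS. The project, dataset, connection, bucket, and metadata paths are placeholders, and the exact option set may shift while the feature is in Preview.

```sql
-- Managed Iceberg table: BigQuery owns the catalog and supports full DML.
-- Project, connection, dataset, and bucket names are placeholders.
CREATE TABLE mydataset.orders (
  order_id    INT64,
  customer_id INT64,
  order_ts    TIMESTAMP,
  amount      NUMERIC
)
WITH CONNECTION `my-project.us.gcs-connection`
OPTIONS (
  file_format  = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri  = 'gs://my-bucket/iceberg/orders'
);

-- External BigLake Iceberg table: registered from an existing Iceberg
-- metadata file in GCS; queryable from BigQuery with only limited writes.
CREATE EXTERNAL TABLE mydataset.orders_external
WITH CONNECTION `my-project.us.gcs-connection`
OPTIONS (
  format = 'ICEBERG',
  uris   = ['gs://my-bucket/iceberg/orders_external/metadata/v1.metadata.json']
);
```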

Automatic Optimization (Score: 100 | Zero Maintenance)

Fully automatic file-size tuning, clustering, metadata compaction & orphan-file GC. No user-issued OPTIMIZE or VACUUM commands required

Asymmetric DML Support (Score: 75 | Managed Full, External Limited)

Managed tables: INSERT, UPDATE, DELETE, MERGE with GoogleSQL semantics. External tables: limited INSERT support via Dataflow/Spark
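
On managed Iceberg tables this is ordinary GoogleSQL; for instance, an upsert with MERGE looks the same as on native tables. The sketch below reuses the placeholder orders table and assumes a hypothetical orders_staging table with matching columns.

```sql
-- Upsert staged changes into the managed Iceberg table (placeholder names).
MERGE mydataset.orders AS t
USING mydataset.orders_staging AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount, order_ts = s.order_ts
WHEN NOT MATCHED THEN
  INSERT (order_id, customer_id, order_ts, amount)
  VALUES (s.order_id, s.customer_id, s.order_ts, s.amount);
```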

Intelligent Storage Strategy (Score: 95 | Auto MoR + CoW)

DML operations generate position/equality delete files (MoR); automatic compaction, clustering, and garbage collection (CoW) run in the background

Storage Write API Streaming (Score: 70 | High-Throughput Preview)

High-throughput streaming via the Storage Write API (Preview) through Dataflow, Beam, or Spark. No built-in CDC apply; use Datastream + Dataflow patterns instead

Parquet-Only Format (Score: 40 | Limited File Formats)

Parquet only (Preview); ORC/Avro not yet supported. Iceberg format v2 is required for managed tables; v3 evaluation is planned for 2025

Differential Time Travel (Score: 60 | Managed vs External)

Managed tables: FOR SYSTEM_TIME AS OF syntax translating to snapshots. External BigLake tables: no SQL time travel currently
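
A minimal sketch of the managed-table syntax, again using the placeholder orders table:

```sql
-- Read the managed Iceberg table as of one hour ago.
-- Not currently available on external BigLake Iceberg tables.
SELECT *
FROM mydataset.orders
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```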

BigQuery-Native Security (Score: 95 | IAM + Column Masking)

IAM permissions work like those on native BigQuery tables. Column-level security and masking on managed Iceberg; external tables governed via BigLake/Dataplex policy tags
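
Since managed Iceberg tables carry standard BigQuery IAM, access can be granted with GoogleSQL DCL just as for native tables; the role, table path, and principal below are placeholders, and column-level masking itself is configured through policy tags rather than in this statement.

```sql
-- Grant read-only access on the managed Iceberg table to one user.
-- Role, table path, and principal are placeholders.
GRANT `roles/bigquery.dataViewer`
ON TABLE `my-project.mydataset.orders`
TO "user:analyst@example.com";
```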

Preview Status Limitations (Score: 50 | Pre-GA Features)

The feature is Pre-GA (behavior may change). No table rename/clone, limited concurrency, and external writes rely on Dataflow/Spark

GCP Ecosystem Integration (Score: 100 | Native Services)

BigQuery ML on Iceberg data, Dataform transformations, BigQuery Omni cross-cloud queries, and end-to-end lineage through Dataplex

Google BigQuery Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Google BigQuery

| Dimension | Support Level | Implementation Details | Status |
| --- | --- | --- | --- |
| Catalog Types | Partial (Dual Model) | BigQuery-managed (internal) + BigLake external (Dataplex/HMS/Glue); no REST/Nessie | Preview |
| SQL Analytics | Partial (Table-dependent) | Managed: full CREATE/CTAS/INSERT/DML; External: SELECT + limited INSERT via Dataflow | Preview |
| DML Operations | Partial (Managed Full) | Managed: INSERT/UPDATE/DELETE/MERGE; External: limited INSERT via external tools | Preview |
| Storage Strategy | Full (Auto Optimization) | MoR operations + automatic CoW optimization; background compaction/clustering/GC | Preview |
| Streaming Support | Partial (Storage Write API) | High-throughput via Storage Write API; Dataflow/Beam/Spark; no built-in CDC | Preview |
| Format Support | Limited (Parquet Only) | Parquet only; no ORC/Avro; v2 required; v3 evaluation planned for 2025 | Preview |
| Time Travel | Partial (Managed Only) | Managed: FOR SYSTEM_TIME AS OF; External: no SQL time travel currently | Preview |
| Schema Evolution | Full (Metadata-only) | ADD/DROP/RENAME columns; type widening; instant reflection in INFORMATION_SCHEMA (see the sketch after this table) | Preview |
| Security & Governance | Full (IAM + Column Masking) | BigQuery IAM + column-level security/masking; BigLake/Dataplex policy tags | Preview |
| GCP Integration | Full (Native Ecosystem) | BigQuery ML, Dataform, Omni, Dataplex lineage; multi-engine access to managed tables | Preview |
| Automatic Optimization | Full (Zero Maintenance) | Automatic file compaction, clustering, metadata optimization, garbage collection | Preview |
| Preview Limitations | Several (Pre-GA) | No table rename/clone; concurrency limits; external DML limited; behavior may change | Preview |

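As referenced in the Schema Evolution row above, these changes are plain GoogleSQL DDL and are metadata-only; the statements below continue the placeholder orders example and assume a managed Iceberg table.

```sql
-- Metadata-only schema changes on the managed Iceberg table (placeholder names).
ALTER TABLE mydataset.orders ADD COLUMN promo_code STRING;
ALTER TABLE mydataset.orders RENAME COLUMN promo_code TO promotion_code;
ALTER TABLE mydataset.orders DROP COLUMN promotion_code;
```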

Use Cases

Serverless Data Warehouse

Fully managed Iceberg tables with automatic optimization

  • Real-world example: An e-commerce company manages 5TB of customer transaction data in BigQuery managed Iceberg tables. The automatic optimization service continuously compacts small files and optimizes table layout in the background, eliminating the need for manual OPTIMIZE commands. The data engineering team saves 15+ hours per week of maintenance work while queries remain fast
  • Modern data warehouse with zero maintenance overhead for production workloads
  • Analytics workloads requiring automatic optimization without manual intervention
  • High-frequency update scenarios with background optimization and clustering

GCP-Native Analytics Platform

Deep integration with Google Cloud ecosystem services

  • Real-world example: A fintech startup uses BigQuery ML to train fraud detection models directly on Iceberg tables containing payment transaction data. They use Dataform to transform raw data into analytics-ready tables, track end-to-end lineage through Dataplex, and use BigQuery Omni to query data across Google Cloud and AWS without moving it
  • BigQuery ML on Iceberg data for machine learning model training and inference (see the sketch after this list)
  • Dataform transformations on Iceberg tables for ELT pipelines
  • Cross-cloud analytics with BigQuery Omni for multi-cloud data access
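
A minimal sketch of the BigQuery ML bullet above, assuming a hypothetical mydataset.transactions Iceberg table that carries an is_fraud label column:

```sql
-- Train a simple fraud classifier directly on the Iceberg table.
-- Dataset, table, model, and column names are illustrative placeholders.
CREATE OR REPLACE MODEL mydataset.fraud_model
OPTIONS (
  model_type       = 'LOGISTIC_REG',
  input_label_cols = ['is_fraud']
) AS
SELECT
  amount,
  merchant_category,
  is_fraud
FROM mydataset.transactions;
```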

Streaming Analytics with Storage Write API

High-throughput streaming ingestion for real-time analytics

  • Real-world example: A gaming company ingests player event data from 10 million active users using Dataflow with the Storage Write API into BigQuery managed Iceberg tables. Events become queryable within 2-3 seconds, powering real-time leaderboards and player analytics dashboards. They process 50,000 events per second with near real-time visibility
  • Real-time streaming data analysis with sub-second to second latency
  • High-volume ingestion via Dataflow/Apache Beam/Spark for operational analytics
  • CDC processing with Datastream integration for database replication

Multi-Engine Data Lake

Iceberg tables accessible from multiple GCP services

  • Real-world example: A media company stores video metadata in BigQuery managed Iceberg tables. Their data analysts use BigQuery SQL for reporting, while data scientists use Dataproc Spark for ML feature engineering, and streaming engineers use Flink for real-time processing. All three teams access the same Iceberg tables without data duplication or ETL
  • Data shared between BigQuery and Dataproc Spark/Flink for unified analytics
  • Multi-engine analytical workloads with consistent data access
  • Hybrid batch and streaming architectures with open format interoperability


💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don't hesitate to contact us if you need any help or further clarification!