
Google BigQuery

Serverless Google Cloud data warehouse with managed Iceberg tables, automatic optimization, Storage Write API streaming, and deep GCP ecosystem integration

Key Features

80
Managed + External

Dual Table Model

BigQuery-managed Iceberg (internal catalog, full DML) and BigLake external Iceberg (Dataplex/HMS/Glue via GCS, query + limited writes)
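As a rough illustration of the managed side of this model, the sketch below builds a `CREATE TABLE` statement for a BigQuery-managed Iceberg table. The project, dataset, connection, and bucket names are hypothetical placeholders, and the `OPTIONS` keys reflect the documented syntax at the time of writing; since the feature is in preview, check the current BigQuery DDL reference before relying on them.

```python
from __future__ import annotations


def managed_iceberg_ddl(table: str, columns: dict[str, str],
                        connection: str, storage_uri: str) -> str:
    """Build DDL for a BigQuery-managed Iceberg table (sketch, not validated against a live project)."""
    # Render "name TYPE" pairs, one per line.
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE `{table}` (\n  {cols}\n)\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  file_format = 'PARQUET',\n"    # Parquet is the only supported file format
        "  table_format = 'ICEBERG',\n"
        f"  storage_uri = '{storage_uri}'\n"
        ");"
    )


# Hypothetical names, for illustration only.
ddl = managed_iceberg_ddl(
    table="my-project.analytics.orders",
    columns={"order_id": "INT64", "status": "STRING", "amount": "NUMERIC"},
    connection="my-project.us.my-gcs-connection",
    storage_uri="gs://my-bucket/iceberg/orders",
)
print(ddl)
```

The generated statement can be submitted through any BigQuery client; external BigLake Iceberg tables are registered differently and are not covered by this sketch.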

100
Zero Maintenance

Automatic Optimization

Fully automatic file-size tuning, clustering, metadata compaction & orphan-file GC. No user-issued OPTIMIZE or VACUUM commands required

75
Managed Full, External Limited

Asymmetric DML Support

Managed tables: INSERT, UPDATE, DELETE, MERGE with GoogleSQL semantics. External tables: limited INSERT support via Dataflow/Spark
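To make the managed-table DML path concrete, here is a minimal sketch that assembles a GoogleSQL `MERGE` (upsert) statement. The helper name `build_merge` and the table and column names are assumptions made for illustration; the statement targets a managed Iceberg table, since external tables do not accept `MERGE`.

```python
from __future__ import annotations


def build_merge(target: str, source: str, key: str, update_cols: list[str]) -> str:
    """Build an upsert MERGE: update matching rows, insert new ones."""
    set_clause = ", ".join(f"{c} = S.{c}" for c in update_cols)
    all_cols = [key, *update_cols]
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"S.{c}" for c in all_cols)
    return (
        f"MERGE `{target}` AS T\n"
        f"USING `{source}` AS S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN\n  UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN\n  INSERT ({insert_cols}) VALUES ({insert_vals});"
    )


# Hypothetical table names, for illustration only.
merge_sql = build_merge(
    target="my-project.analytics.orders",
    source="my-project.analytics.orders_staging",
    key="order_id",
    update_cols=["status", "amount"],
)
print(merge_sql)
```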

95
Auto MoR + CoW

Intelligent Storage Strategy

DML operations write position/equality delete files (merge-on-read, MoR). Background compaction, clustering, and garbage collection then rewrite the data files (copy-on-write, CoW) automatically

70
High-Throughput Preview

Storage Write API Streaming

High-throughput streaming via the Storage Write API (Preview), typically driven from Dataflow, Beam, or Spark. No built-in CDC apply; use Datastream + Dataflow patterns instead

40
Limited File Formats

Parquet-Only Format

Parquet only (preview). ORC/Avro not yet supported. v2 required for managed tables, v3 evaluation planned for 2025

60
Managed vs External

Differential Time Travel

Managed tables: FOR SYSTEM_TIME AS OF syntax translating to snapshots. External BigLake tables: no SQL time travel currently
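A time-travel read on a managed table might look like the sketch below, which renders a `FOR SYSTEM_TIME AS OF` query for a given point in time. The helper name and table name are hypothetical, and how far back a query can reach depends on the table's time-travel window, which this sketch does not check.

```python
from datetime import datetime, timezone


def time_travel_query(table: str, as_of: datetime) -> str:
    """Build a query that reads a managed table as of a past timestamp."""
    if as_of.tzinfo is None:
        # Refuse naive datetimes so the snapshot time is never ambiguous.
        raise ValueError("as_of must be timezone-aware")
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00:00")
    return (
        f"SELECT *\n"
        f"FROM `{table}`\n"
        f"  FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}';"
    )


# Hypothetical table name, for illustration only.
query = time_travel_query(
    "my-project.analytics.orders",
    datetime(2025, 1, 15, 8, 30, tzinfo=timezone.utc),
)
print(query)
```

The same query against an external BigLake Iceberg table would be rejected, because SQL time travel is currently limited to managed tables.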

95
IAM + Column Masking

BigQuery-Native Security

IAM permissions like native BigQuery tables. Column-level security & masking on managed Iceberg. External via BigLake/Dataplex policy tags

50
Pre-GA Features

Preview Status Limitations

Feature is Pre-GA (behavior may change). No table rename/clone, limited concurrency, external writes rely on Dataflow/Spark

100
Native Services

GCP Ecosystem Integration

BigQuery ML on Iceberg data, Dataform transformations, BigQuery Omni cross-cloud queries, end-to-end lineage through Dataplex


Google BigQuery Iceberg Feature Matrix

Comprehensive breakdown of Iceberg capabilities in Google BigQuery

| Dimension | Support Level | Implementation Details | Status |
|---|---|---|---|
| Catalog Types | Partial (Dual Model) | BigQuery-managed (internal) + BigLake external (Dataplex/HMS/Glue); no REST/Nessie | Preview |
| SQL Analytics | Partial (Table-dependent) | Managed: full CREATE/CTAS/INSERT/DML; External: SELECT + limited INSERT via Dataflow | Preview |
| DML Operations | Partial (Managed Full) | Managed: INSERT/UPDATE/DELETE/MERGE; External: limited INSERT via external tools | Preview |
| Storage Strategy | Full (Auto Optimization) | MoR operations + automatic CoW optimization; background compaction/clustering/GC | Preview |
| Streaming Support | Partial (Storage Write API) | High-throughput via Storage Write API; Dataflow/Beam/Spark; no built-in CDC | Preview |
| Format Support | Limited (Parquet Only) | Parquet only; no ORC/Avro; v2 required; v3 evaluation planned 2025 | Preview |
| Time Travel | Partial (Managed Only) | Managed: FOR SYSTEM_TIME AS OF; External: no SQL time travel currently | Preview |
| Schema Evolution | Full (Metadata-only) | ADD/DROP/RENAME columns; type widening; instant reflection in information_schema | Preview |
| Security & Governance | Full (IAM + Column Masking) | BigQuery IAM + column-level security/masking; BigLake/Dataplex policy tags | Preview |
| GCP Integration | Full (Native Ecosystem) | BigQuery ML, Dataform, Omni, Dataplex lineage; multi-engine access to managed tables | Preview |
| Automatic Optimization | Full (Zero Maintenance) | Automatic file compaction, clustering, metadata optimization, garbage collection | Preview |
| Preview Limitations | Several (Pre-GA) | No table rename/clone; concurrency limits; external DML limited; behavior may change | Preview |
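
The Schema Evolution row of the matrix lists ADD/DROP/RENAME and type widening as metadata-only changes. A minimal sketch of the corresponding GoogleSQL `ALTER TABLE` statements follows; the helper names, table name, and column names are hypothetical, and which type changes count as valid widening is decided by BigQuery, not by this code.

```python
def add_column(table: str, name: str, sql_type: str) -> str:
    """ALTER TABLE statement that appends a new column."""
    return f"ALTER TABLE `{table}` ADD COLUMN {name} {sql_type};"


def drop_column(table: str, name: str) -> str:
    """ALTER TABLE statement that removes a column."""
    return f"ALTER TABLE `{table}` DROP COLUMN {name};"


def rename_column(table: str, old: str, new: str) -> str:
    """ALTER TABLE statement that renames a column."""
    return f"ALTER TABLE `{table}` RENAME COLUMN {old} TO {new};"


def widen_column(table: str, name: str, new_type: str) -> str:
    """ALTER TABLE statement that changes a column to a wider type."""
    return f"ALTER TABLE `{table}` ALTER COLUMN {name} SET DATA TYPE {new_type};"


# Hypothetical table and columns, for illustration only.
table = "my-project.analytics.orders"
statements = [
    add_column(table, "coupon_code", "STRING"),
    rename_column(table, "amount", "total_amount"),
    widen_column(table, "quantity", "NUMERIC"),
    drop_column(table, "legacy_flag"),
]
for stmt in statements:
    print(stmt)
```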


Use Cases

Serverless Data Warehouse

Fully managed Iceberg tables with automatic optimization

  • Modern data warehouse with zero maintenance overhead
  • Analytics workloads requiring automatic optimization
  • Teams wanting BigQuery's serverless benefits on Iceberg
  • High-frequency update scenarios with background optimization

GCP-Native Analytics Platform

Deep integration with Google Cloud ecosystem services

  • BigQuery ML on Iceberg data for machine learning
  • Dataform transformations on Iceberg tables
  • Cross-cloud analytics with BigQuery Omni
  • End-to-end data lineage through Dataplex

Streaming Analytics with Storage Write API

High-throughput streaming ingestion for real-time analytics

  • Real-time streaming data analysis
  • High-volume ingestion via Dataflow/Beam/Spark
  • Near real-time dashboard and reporting
  • CDC processing with Datastream integration

Multi-Engine Data Lake

Iceberg tables accessible from multiple GCP services

  • Data shared between BigQuery and Dataproc Spark/Flink
  • Multi-engine analytical workloads
  • Hybrid batch and streaming architectures
  • Open format data lake with BigQuery performance

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: Discuss the roadmap, report bugs, and get help debugging issues you run into.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!