REST Catalog

The REST catalog is a standardized API that simplifies managing Apache Iceberg tables across engines and programming languages. Because every client talks to the same interface, engines such as Spark, Flink, and Trino, and languages such as Java, do not each need a separate catalog integration.

Built on an OpenAPI specification, the REST catalog offers a modern, flexible alternative to the Hive Metastore's Thrift interface, tailored specifically for Iceberg's architecture.

The Generic REST Catalog is the default implementation of the Apache Iceberg REST Catalog API. It provides a standard, engine-agnostic way to manage Iceberg tables without depending on a specific catalog service like Nessie, Polaris, or Unity.

Prerequisites

Required services:

  • Object store – e.g., S3, MinIO, or another S3-compatible storage for table data and metadata.
  • Metadata database – typically PostgreSQL, with a dedicated database and user for Iceberg.
  • REST service – the Iceberg REST Catalog service.

Permissions:

  • The REST Catalog service user needs read and write permissions on the object store buckets used by Iceberg tables.
  • The catalog user should have full DDL and DML privileges on the Iceberg metadata database and tables.

Configuration

Set up the source before you set up the destination. Once the source is ready, configure your destination to use the REST Catalog.

Create Destination

REST Configuration Parameters

| Parameter | Sample Value | Description |
| --- | --- | --- |
| REST Catalog URL | http://<REST_ENDPOINT>:8181 | Specifies the endpoint URL for the REST catalog service that the writer will connect to. |
| Iceberg S3 Path | s3://<BUCKET_NAME> | Determines the S3 path or storage location for Iceberg data. "warehouse" represents the designated storage directory. |
| Iceberg Database | <DATABASE_NAME> | Specifies the name of the Iceberg database that will be used by the destination configuration. |
| S3 Endpoint | http://<S3_ENDPOINT>:9000 | Endpoint for the S3 service. |
| AWS Region | <S3_REGION> | Specifies the AWS region associated with the S3 bucket where the data is stored. |
| AWS Access Key | <S3_ACCESS_KEY> | AWS access key (optional). |
| AWS Secret Key | <S3_SECRET_KEY> | AWS secret key (optional). |
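
Once a sync has run, the Iceberg Database should be visible through the catalog's REST API as a namespace. The sketch below builds the listing URLs defined by the Iceberg REST OpenAPI specification. The helper names (`api_root`, `namespaces_url`, `tables_url`) are hypothetical, and the optional prefix argument assumes your catalog reports a prefix from `/v1/config`; many local setups have none.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OLake or Iceberg tooling).

# api_root BASE_URL [PREFIX]: root of the catalog API, tolerating a trailing slash.
api_root() {
  base="${1%/}"
  if [ -n "${2:-}" ]; then
    printf '%s/v1/%s\n' "$base" "$2"
  else
    printf '%s/v1\n' "$base"
  fi
}

# namespaces_url BASE_URL [PREFIX]: URL that lists namespaces (databases).
namespaces_url() {
  printf '%s/namespaces\n' "$(api_root "$1" "${2:-}")"
}

# tables_url BASE_URL DATABASE [PREFIX]: URL that lists tables in one database.
tables_url() {
  printf '%s/namespaces/%s/tables\n' "$(api_root "$1" "${3:-}")" "$2"
}
```

For example, `curl --silent --fail "$(tables_url http://localhost:8181 my_db)"` would list the tables in a database named `my_db` (a placeholder), assuming the catalog is reachable without authentication.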

Authentication Fields (optional)

| Parameter | Sample Value | Description |
| --- | --- | --- |
| Token | abc...xyz | Specifies the Bearer token sent in the Authorization header for authenticating with the REST catalog service. |
| OAuth2 Auth URI | https://auth.server.com/oauth/token | OAuth2 server URI for OAuth2 authentication. |
| REST Auth Type | oauth2 | Authentication type (e.g., "oauth2"). |
| Credential (OAuth2) | your_id:your_secret | Specifies the client ID and secret for OAuth2, formatted as client_id:client_secret. |
| Scope (OAuth2) | api.read api.write | OAuth2 scopes (space-separated). |
| REST Signing Name | s3tables | Service name for AWS Signature V4 (e.g., "s3tables"). |
| REST Signing Region | us-east-1 | Region for AWS Signature V4 signing. |
| REST Enable Signature V4 | true | Enable AWS Signature V4 signing (boolean). |
| Disable Identifier Tables | false | Set to true for Databricks Unity Catalog, which does not support identifier fields. |
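
To make the `client_id:client_secret` format concrete, here is a minimal sketch that splits a credential at the first colon and requests a token with the OAuth2 client-credentials grant. The function names are hypothetical, and sending the client ID and secret as form fields is an assumption: some authorization servers expect HTTP Basic authentication instead, so check your provider's documentation.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Everything before the first colon is the client ID.
credential_id() {
  printf '%s\n' "${1%%:*}"
}

# Everything after the first colon is the client secret (it may contain colons).
credential_secret() {
  printf '%s\n' "${1#*:}"
}

# fetch_token AUTH_URI CREDENTIAL SCOPE
# Assumption: the server accepts client credentials as POST form fields.
fetch_token() {
  curl --silent --show-error --fail \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$(credential_id "$2")" \
    --data-urlencode "client_secret=$(credential_secret "$2")" \
    --data-urlencode "scope=$3" \
    "$1"
}
```

A call such as `fetch_token https://auth.server.com/oauth/token your_id:your_secret "api.read api.write"` uses the sample values from the table; a successful response is typically a JSON document containing an `access_token` field.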

After you have successfully set up the destination: Configure your streams


Setup For Local Testing

Save the following docker-compose.yml, which starts the services required for an Iceberg REST Catalog:

  1. REST Catalog Service (Tabular image) – Provides the REST API for managing Iceberg tables.
  2. PostgreSQL – Serves as the metadata database to track Iceberg table metadata and schema evolution.
  3. MinIO + MinIO Client – An S3-compatible object store used to store table data and snapshots.
docker-compose.yml
version: "3.9"

services:
  # Iceberg REST catalog: keeps catalog state in Postgres and table files in MinIO.
  rest:
    image: tabulario/iceberg-rest
    container_name: iceberg-rest
    ports:
      - 8181:8181
    volumes:
      - catalog-data:/catalog
    environment:
      AWS_ACCESS_KEY_ID: admin
      AWS_SECRET_ACCESS_KEY: password
      AWS_REGION: us-east-1
      CATALOG_WAREHOUSE: s3://warehouse/
      CATALOG_IO__IMPL: org.apache.iceberg.aws.s3.S3FileIO
      CATALOG_S3_ENDPOINT: http://minio:9090
      CATALOG_URI: jdbc:postgresql://postgres:5432/iceberg
      CATALOG_JDBC_USER: iceberg
      CATALOG_JDBC_PASSWORD: password
    networks:
      - iceberg_net
    depends_on:
      postgres:
        condition: service_healthy
      mc:
        condition: service_completed_successfully

  # Metadata database for the catalog.
  postgres:
    image: postgres:15
    container_name: postgres
    networks:
      - iceberg_net
    environment:
      POSTGRES_USER: iceberg
      POSTGRES_PASSWORD: password
      POSTGRES_DB: iceberg
    healthcheck:
      # Healthy once Postgres accepts connections for the iceberg database.
      test: [ "CMD", "pg_isready", "-U", "iceberg", "-d", "iceberg" ]
      interval: 2s
      timeout: 10s
      retries: 3
      start_period: 10s
    ports:
      - 5432:5432
    volumes:
      - ./data/postgres-data:/var/lib/postgresql/data

  # S3-compatible object store: API on port 9090, web console on port 9091.
  minio:
    image: minio/minio
    hostname: minio
    container_name: minio
    ports:
      - 9090:9090
      - 9091:9091
    volumes:
      - minio-data:/data
    environment:
      MINIO_ACCESS_KEY: admin
      MINIO_SECRET_KEY: password
      MINIO_DOMAIN: minio
    command: server --address ":9090" --console-address ":9091" /data
    networks:
      iceberg_net:
        aliases:
          - warehouse.minio

  # One-shot MinIO client: waits for MinIO, then recreates the warehouse bucket.
  mc:
    image: minio/mc
    container_name: mc
    environment:
      AWS_ACCESS_KEY_ID: admin
      AWS_SECRET_ACCESS_KEY: password
      AWS_REGION: us-east-1
    entrypoint: >
      /bin/sh -c "
      until (/usr/bin/mc alias set minio http://minio:9090 admin password) do echo '...waiting...' && sleep 1; done;
      echo 'Ensuring warehouse bucket exists and is public...';
      if /usr/bin/mc stat minio/warehouse > /dev/null 2>&1; then
      echo 'Warehouse bucket exists, removing for fresh start...';
      /usr/bin/mc rm -r --force minio/warehouse || echo 'Failed to remove warehouse, proceeding...';
      fi;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc anonymous set public minio/warehouse;
      echo 'Minio warehouse bucket setup complete.';
      "
    networks:
      - iceberg_net
    depends_on:
      - minio

volumes:
  catalog-data:
  minio-data:

networks:
  iceberg_net:

Start the services:

docker-compose up -d
Note: All services involved in the sync (OLake, the REST Catalog service, MinIO, and Postgres) must run in the same Docker network.
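
After starting the stack, it can help to wait until each service actually responds before running a sync. The following is a minimal sketch of a retry loop; `wait_for` and the `WAIT_INTERVAL` variable are hypothetical names, and the host ports are the ones published by the compose file above.

```shell
#!/bin/sh
# wait_for ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping WAIT_INTERVAL seconds (default 2)
# between tries. Returns 0 on success, 1 once all attempts are used up.
wait_for() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_INTERVAL:-2}"
  done
  return 1
}
```

For instance, `wait_for 30 curl --silent --fail http://localhost:8181/v1/config` waits for the REST catalog, and `wait_for 30 curl --silent --fail http://localhost:9090/minio/health/live` does the same for MinIO using its liveness endpoint.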


Troubleshooting

  • Your authentication credentials are invalid ... unauthorized_client
    • Fix:
      • Ensure the correct OAuth credential or token, or database username and password, is provided.
      • Reissue tokens or refresh secrets if they have expired.
    • Check using this command:
      curl -H "Authorization: Bearer <token>" https://<catalog-endpoint>/v1/config
  • User: <ARN> is not authorized to perform: sts:AssumeRole
    • Fix:
      • Make sure the correct IAM role is assigned.
      • Validate role trust relationships and necessary permissions to S3 and Catalog.
    • Check using this command:
      aws sts assume-role --role-arn <S3_role_arn> --role-session-name test-session
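
The curl check above can be wrapped so that the HTTP status code is translated into a likely cause. This is a sketch under assumptions: the function names are hypothetical, and the hints reflect the general meaning of each status code rather than messages produced by OLake or by a specific catalog.

```shell
#!/bin/sh
# Translate an HTTP status code into a probable cause (general HTTP semantics).
explain_status() {
  case "$1" in
    200) echo "ok: catalog reachable and credentials accepted" ;;
    401) echo "unauthorized: token missing, invalid, or expired" ;;
    403) echo "forbidden: credentials valid but lacking permission" ;;
    000) echo "no response: check the endpoint URL and network" ;;
    *)   echo "unexpected status $1: inspect the catalog logs" ;;
  esac
}

# probe_catalog ENDPOINT TOKEN
# Calls /v1/config with a Bearer token and explains the resulting status.
# curl reports 000 when no HTTP response was received at all.
probe_catalog() {
  status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    -H "Authorization: Bearer $2" "${1%/}/v1/config")
  explain_status "$status"
}
```

Running `probe_catalog "https://<catalog-endpoint>" "<token>"` with your own values prints one line that narrows down whether the problem is the token, the permissions, or connectivity.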
