Unity Catalog (Databricks)

OLake supports Databricks Unity Catalog as a REST catalog destination with token-based or OAuth2 authentication.

Important Limitations

⚠️ Current Unity Catalog Limitations:

  • Append Only: At the time of writing, Unity Catalog supports only append operations (no updates) for Iceberg writes
  • Managed Tables: Unity Catalog supports Iceberg writes only to its own managed Iceberg tables
  • OAuth2 Status: Personal Access Token authentication has been tested and works well. OAuth2 is available as an alternative, but during testing Databricks returned internal server errors for it

For more details on Unity Catalog Iceberg support, refer to the official Databricks documentation.

This guide focuses on creating the tokens needed to ingest data into Unity Catalog (Databricks) managed Iceberg tables with OLake.

Prerequisites

  • Admin access to Databricks workspace
  • Access to AWS S3 bucket (if using AWS for external storage)

Setup Instructions

Step 1: Create a New User & Token

Create the User

For security, create a dedicated user rather than reusing an existing one.

  1. Open the user menu in the top-right corner and go to Settings → Identity and Access → Users (Manage)

  2. Add a new user with their email address

  3. Go to Settings → Advanced → Access Control

  4. Turn on Personal Access Tokens if it's not already enabled

    [Screenshot: Personal Access Tokens settings]

  5. Go to Permissions and add the new user to allow token generation

Create the Token

  1. Log in with the new user's credentials
  2. Go to Settings → Developer → Access Tokens → Manage
  3. Create a new token and set the appropriate validity period
  4. Copy and save the token immediately (it's only visible once)

Step 2: Create External Location for Data Storage

  1. Create a new external location. When using AWS, Unity Catalog Iceberg requires S3-based external storage.

    Navigate to the Catalog section and click "Create an external location":

    [Screenshot: Create External Location]

  2. Follow the recommended Databricks guide to add S3 bucket storage quickly and securely.

Step 3: Create Schema with Proper Permissions

  1. Create a new schema using the storage location created in Step 2

    [Screenshot: Create New Schema dialog]

  2. Go to Schema → Permissions

  3. Grant the following permissions to the newly created user:

    • ALL PRIVILEGES
    • EXTERNAL USE SCHEMA
    • MANAGE

    [Screenshot: Grant Schema Permissions]
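If you prefer SQL to the Permissions dialog, the same three privileges can likely be granted with Databricks `GRANT` statements. The helper below is only a sketch that builds those statements as strings. The function name, the catalog and schema names, and the user email are hypothetical placeholders, and the SQL is assumed to be equivalent to the UI steps above rather than something OLake itself runs.

```python
def schema_grant_statements(catalog: str, schema: str, principal: str) -> list[str]:
    """Build one GRANT statement per schema privilege listed in Step 3."""
    privileges = ["ALL PRIVILEGES", "EXTERNAL USE SCHEMA", "MANAGE"]
    # Backticks quote identifiers and principals that contain special characters.
    target = f"`{catalog}`.`{schema}`"
    return [
        f"GRANT {privilege} ON SCHEMA {target} TO `{principal}`;"
        for privilege in privileges
    ]


# Placeholder names: replace with your own catalog, schema and user.
for statement in schema_grant_statements("workspace", "default", "olake-writer@example.com"):
    print(statement)
```

The printed statements can be pasted into a Databricks SQL editor by a user who is allowed to manage the schema.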

Configuration

Unity Catalog supports two authentication methods: Token-based (recommended) and OAuth2 (alternative).

Common Configuration Fields

These fields are required for both authentication methods:

Parameter | Sample Value | Description
REST Catalog URL | https://adb-123456789.databricks.com/api/2.1/unity-catalog/iceberg-rest | Databricks workspace URL with Unity Catalog REST API endpoint. Use your actual workspace URL.
Iceberg S3 Path (Warehouse) | workspace | Name of the catalog in Unity Catalog (e.g., "workspace", "main"). This appears as iceberg_s3_path in JSON config.
Iceberg Database | default | Namespace name inside the catalog (e.g., "default", "production"). This appears as iceberg_db in JSON config.
Normalization | true | Enable data normalization for proper formatting in Unity Catalog. Recommended to keep enabled.
No Identifier Fields | true | Required for Unity Catalog managed Iceberg tables that don't support equality delete-based updates. Must be enabled.
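To see how the common fields might fit together, here is a minimal Python sketch that assembles them into a JSON fragment. The keys `iceberg_s3_path`, `iceberg_db`, `normalization` and `no_identifier_fields` come from this page. The key `rest_catalog_url` and the function name are assumptions made for illustration, and the sketch does not show how the fragment nests inside a complete OLake destination file.

```python
import json


def build_unity_writer_fields(rest_catalog_url: str, catalog_name: str, namespace: str) -> dict:
    """Collect the common Unity Catalog fields into one dictionary."""
    return {
        "rest_catalog_url": rest_catalog_url,  # assumed key name, not confirmed by this page
        "iceberg_s3_path": catalog_name,       # Unity Catalog catalog name, not an S3 path
        "iceberg_db": namespace,               # namespace (schema) inside the catalog
        "normalization": True,
        "no_identifier_fields": True,          # must stay enabled for managed Iceberg tables
    }


fields = build_unity_writer_fields(
    "https://adb-123456789.databricks.com/api/2.1/unity-catalog/iceberg-rest",
    "workspace",
    "default",
)
print(json.dumps(fields, indent=2))
```

Note that `iceberg_s3_path` holds the catalog name here, which is easy to get wrong given the field's name.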

Token-Based Authentication (Recommended)

✅ This method has been thoroughly tested and confirmed working.

Additional field required for token-based authentication:

Parameter | Sample Value | Description
Token | dapi1234567890abcdef... | Databricks Personal Access Token for authentication. Created in Settings > Developer > Access Tokens.
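Before wiring the token into OLake, it can help to check it against the REST catalog directly. The sketch below only builds the request. It assumes the Databricks endpoint serves the Iceberg REST specification's `GET /v1/config` route with a `warehouse` query parameter and accepts the token as a bearer header. The function name and the `DATABRICKS_TOKEN` environment variable are hypothetical.

```python
import os
import urllib.parse
import urllib.request


def build_config_request(rest_catalog_url: str, catalog_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the catalog's config route."""
    query = urllib.parse.urlencode({"warehouse": catalog_name})
    url = f"{rest_catalog_url.rstrip('/')}/v1/config?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


request = build_config_request(
    "https://adb-123456789.databricks.com/api/2.1/unity-catalog/iceberg-rest",
    "workspace",
    # Read the token from the environment so it never lands in source control.
    os.environ.get("DATABRICKS_TOKEN", "dapi-placeholder"),
)
print(request.full_url)
# To send it against a real workspace:
# urllib.request.urlopen(request, timeout=10)
```

A 401 or 403 response would point at the token or its permissions, while a successful response suggests the URL, catalog name and token line up.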

OAuth2 Authentication (Alternative)

⚠️ Note: During testing, OAuth2 authentication returned internal server errors from Databricks, so treat it as unverified.

Additional fields required for OAuth2 authentication:

Parameter | Sample Value | Description
REST Auth Type | oauth2 | Set authentication type to OAuth2
OAuth2 URI | https://adb-123456789.databricks.com/oidc/v1/token | OAuth2 server URI for your Databricks workspace
Credential | client_id:client_secret | Client ID and secret in format "id:secret"
Scope | sql offline_access | OAuth2 scopes (space-separated)
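The Credential and Scope fields above can be illustrated with a short sketch that splits the "id:secret" string and encodes a token request body. It assumes the OAuth2 client credentials grant, which is how Iceberg REST clients conventionally use a `credential` value. Whether OLake sends exactly this body is not confirmed here, and the function name is hypothetical.

```python
import urllib.parse


def build_token_request_body(credential: str, scope: str) -> str:
    """Split "client_id:client_secret" and form-encode a client credentials request."""
    client_id, separator, client_secret = credential.partition(":")
    if not separator or not client_id or not client_secret:
        raise ValueError('credential must look like "client_id:client_secret"')
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",  # assumed grant type
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # space-separated scopes, encoded as one form field
    })


body = build_token_request_body("my-client-id:my-client-secret", "sql offline_access")
print(body)
```

The body would be sent as `application/x-www-form-urlencoded` to the OAuth2 URI. Given the server errors noted above, token-based authentication remains the safer choice.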

Important Notes

  • Unity Catalog Compatibility: The no_identifier_fields: true setting is crucial for Unity Catalog managed Iceberg tables as they don't support equality delete-based updates
  • Normalization: Set normalization: true to ensure proper data formatting for Unity Catalog
  • REST API: Unity Catalog uses Iceberg's REST catalog API for table operations
  • Permissions: Ensure your user or service principal has appropriate permissions on the target catalog and schema
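The notes above lend themselves to a quick pre-flight check. The following sketch inspects a configuration dictionary and reports settings that conflict with them. The function name is hypothetical, and the check covers only the four JSON keys named on this page.

```python
def unity_config_problems(config: dict) -> list[str]:
    """Return human-readable problems found in a Unity Catalog writer config."""
    problems = []
    if config.get("no_identifier_fields") is not True:
        problems.append(
            "no_identifier_fields must be true: managed Iceberg tables "
            "do not support equality delete-based updates"
        )
    if config.get("normalization") is not True:
        problems.append("normalization should be true for proper data formatting")
    for key in ("iceberg_s3_path", "iceberg_db"):
        if not config.get(key):
            problems.append(f"{key} is missing or empty")
    return problems


sample = {
    "iceberg_s3_path": "workspace",
    "iceberg_db": "default",
    "normalization": True,
    "no_identifier_fields": False,  # deliberately wrong to show the check
}
for problem in unity_config_problems(sample):
    print(problem)
```

An empty list means none of the checked settings contradict the notes. It does not prove the configuration will work end to end.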

Troubleshooting

Common Issues

  1. Authentication Errors: Verify your Personal Access Token is valid and has the necessary permissions
  2. Catalog Not Found: Ensure the catalog name exists in your Unity Catalog
  3. Schema Permissions: Check that you have CREATE TABLE permissions on the target schema
  4. Network Access: Verify your OLake instance can reach the Databricks workspace URL
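For the network access item, a plain TCP connection test from the machine running OLake can rule out firewall or DNS problems before you look at credentials. This is a minimal sketch with hypothetical function names. It checks only that the host accepts connections on the HTTPS port, not that the REST catalog itself is healthy.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url: str) -> tuple[str, int]:
    """Extract the host and port from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host found in URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call (performs a real network connection):
# can_reach("https://adb-123456789.databricks.com/api/2.1/unity-catalog/iceberg-rest")
```

If `can_reach` returns False, look at DNS, proxies and outbound firewall rules first. If it returns True and requests still fail, the cause is more likely one of the authentication or permission items above.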

For more general guidance on Iceberg integration, refer to the Iceberg writer documentation.


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: We discuss the roadmap, triage bugs, and help people debug the issues they are facing.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!