
AWS Glue Catalog Write Guide

OLake writes Apache Iceberg tables using AWS Glue Data Catalog as the Iceberg catalog. In this setup:

  • Data is stored in Amazon S3 (Parquet + metadata files)
  • Table definitions are registered in AWS Glue Data Catalog (databases, schemas, table properties, and the location of each table's current Iceberg metadata file)
  • OLake creates and updates the Iceberg tables through the Glue APIs

Prerequisites

Before configuring OLake with AWS Glue Catalog, ensure the following are set up:

1. Amazon S3 Bucket

  • Create an S3 bucket in the same AWS region as your Glue Catalog.
  • Example: s3://olake-iceberg/

2. AWS IAM Permissions

  • Create an IAM role or user with access to AWS Glue and Amazon S3.

Here is a sample IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueAccess",
      "Effect": "Allow",
      "Action": [
        "glue:CreateTable",
        "glue:CreateDatabase",
        "glue:GetTable",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:SearchTables",
        "glue:UpdateDatabase",
        "glue:UpdateTable"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/{AWS_GLUE_DATABASE_NAME}",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/{AWS_GLUE_DATABASE_NAME}/*"
      ]
    },
    {
      "Sid": "S3BucketReadWrite",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucket*",
        "s3:*Object"
      ],
      "Resource": [
        "arn:aws:s3:::{S3_BUCKET_NAME}",
        "arn:aws:s3:::{S3_BUCKET_NAME}/*"
      ]
    },
    {
      "Sid": "ListAllBuckets",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
IAM Policy Notes
  • Replace <REGION>, <ACCOUNT_ID>, {AWS_GLUE_DATABASE_NAME}, and {S3_BUCKET_NAME} with your actual values.
  • If the database and all target tables already exist in Glue Catalog, you can remove the glue:CreateDatabase and glue:CreateTable permissions.
  • The glue:SearchTables permission is optional; it is used for table discovery.
  • The policy grants no permission to drop or delete tables. Add glue:DeleteTable if you need to delete tables.
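If you generate this policy from a script rather than editing it by hand, the placeholder substitution can be automated and checked. The following is a minimal Python sketch, not an OLake tool: the function `render_policy` and the sample values (`123456789012`, `iceberg_db`, `olake-iceberg`) are hypothetical. It embeds the sample policy as a string template, replaces the four placeholders, fails if any placeholder is left over, and parses the result with `json.loads` so malformed JSON is caught before you paste it into IAM.

```python
import json
import re

# The sample policy from this guide, keeping its original placeholder styles:
# <REGION>, <ACCOUNT_ID>, {AWS_GLUE_DATABASE_NAME}, {S3_BUCKET_NAME}.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueAccess",
      "Effect": "Allow",
      "Action": [
        "glue:CreateTable", "glue:CreateDatabase", "glue:GetTable",
        "glue:GetDatabase", "glue:GetDatabases", "glue:SearchTables",
        "glue:UpdateDatabase", "glue:UpdateTable"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/{AWS_GLUE_DATABASE_NAME}",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/{AWS_GLUE_DATABASE_NAME}/*"
      ]
    },
    {
      "Sid": "S3BucketReadWrite",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucket*", "s3:*Object"],
      "Resource": [
        "arn:aws:s3:::{S3_BUCKET_NAME}",
        "arn:aws:s3:::{S3_BUCKET_NAME}/*"
      ]
    },
    {
      "Sid": "ListAllBuckets",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
"""

# Matches any leftover <PLACEHOLDER> or {PLACEHOLDER} token (digits allowed, e.g. S3).
LEFTOVER = re.compile(r"<[A-Z0-9_]+>|\{[A-Z0-9_]+\}")


def render_policy(region: str, account_id: str, database: str, bucket: str) -> dict:
    """Substitute the placeholders and return the parsed policy document."""
    text = (
        POLICY_TEMPLATE.replace("<REGION>", region)
        .replace("<ACCOUNT_ID>", account_id)
        .replace("{AWS_GLUE_DATABASE_NAME}", database)
        .replace("{S3_BUCKET_NAME}", bucket)
    )
    leftover = LEFTOVER.findall(text)
    if leftover:
        raise ValueError(f"unreplaced placeholders: {sorted(set(leftover))}")
    # json.loads raises json.JSONDecodeError if the result is not valid JSON.
    return json.loads(text)


if __name__ == "__main__":
    policy = render_policy("us-east-1", "123456789012", "iceberg_db", "olake-iceberg")
    print(json.dumps(policy, indent=2))
```

Passing a value that still contains a placeholder, such as `<REGION>`, raises `ValueError`, which is the mistake the first note warns about.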

Configuration

  • Before setting up the destination, make sure you have successfully set up the source.

AWS Glue UI Configuration

| Parameter | Sample Value | Description |
|---|---|---|
| Iceberg S3 Path (Warehouse) | s3://<BUCKET_NAME>/ | S3 bucket path where Iceberg table data and metadata files are stored. |
| AWS Region | us-east-1 | AWS region containing the S3 bucket and the Glue Data Catalog resources. |
| AWS Access Key | XXX | AWS access key ID. Optional if you use IAM roles or instance profiles. |
| AWS Secret Key | XXX | AWS secret access key. Optional if you use IAM roles or instance profiles. |
| Iceberg Database | iceberg_db | Glue Data Catalog database that holds the Iceberg tables. OLake creates it if it does not exist and glue:CreateDatabase is granted. |
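If you keep destination settings in a file instead of entering them in the UI, the same fields map naturally onto a JSON config. The Python sketch below is illustrative only: the top-level `type` and the writer keys `catalog_type`, `iceberg_s3_path`, `aws_region`, `aws_access_key`, `aws_secret_key`, and `iceberg_db` are assumptions (only `aws_region` is named elsewhere in this guide), so confirm them against your OLake version. It enforces two rules from the table: the warehouse is an `s3://` path with a trailing slash, and the access key and secret key are supplied together or omitted together so that an IAM role or instance profile is used instead.

```python
import json
from typing import Optional


def build_glue_destination(
    warehouse: str,
    region: str,
    database: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
) -> dict:
    """Build a hypothetical OLake Glue destination config (key names are assumptions)."""
    if not warehouse.startswith("s3://") or not warehouse.endswith("/"):
        raise ValueError("warehouse must look like s3://<BUCKET_NAME>/ (with a trailing slash)")
    if not region:
        raise ValueError("aws_region is required")
    if not database:
        raise ValueError("iceberg_db is required")
    # Static keys are optional but only make sense as a pair; leave both unset
    # to rely on an IAM role or instance profile.
    if bool(access_key) != bool(secret_key):
        raise ValueError("set both aws_access_key and aws_secret_key, or neither")

    writer = {
        "catalog_type": "glue",
        "iceberg_s3_path": warehouse,
        "aws_region": region,
        "iceberg_db": database,
    }
    if access_key:
        writer["aws_access_key"] = access_key
        writer["aws_secret_key"] = secret_key
    return {"type": "ICEBERG", "writer": writer}


if __name__ == "__main__":
    # Role-based authentication: no static keys end up in the config.
    cfg = build_glue_destination("s3://olake-iceberg/", "us-east-1", "iceberg_db")
    print(json.dumps(cfg, indent=2))
```

Leaving the keys out of the file when a role is available keeps long-lived secrets off disk, which is why the sketch only writes them when both are given.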

Click Next to test the connection. OLake checks that it can access both the Glue Data Catalog and the S3 warehouse path.

After the destination is set up, configure your streams.

Connection Testing

OLake will automatically test:

  • AWS credentials validity
  • S3 bucket access permissions
  • Glue Catalog connectivity
  • Database creation/access permissions
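To run equivalent checks yourself, for example from the host that will run OLake, you can call the AWS CLI directly. This Python sketch is not how OLake performs its test; it only maps each check to an existing AWS CLI command and runs them in order, stopping at the first failure. `preflight_commands` and `run_preflight` are hypothetical names, and the `runner` parameter exists so the logic can be exercised without AWS access. Note that `aws glue get-database` fails if the database does not exist yet, which is expected when OLake is meant to create it.

```python
import subprocess
from typing import Callable, List, Tuple

Check = Tuple[str, List[str]]


def preflight_commands(warehouse: str, region: str, database: str) -> List[Check]:
    """Map each connection check to an AWS CLI command (hypothetical helper)."""
    return [
        ("AWS credentials", ["aws", "sts", "get-caller-identity", "--region", region]),
        ("S3 bucket access", ["aws", "s3", "ls", warehouse, "--region", region]),
        ("Glue Catalog connectivity", ["aws", "glue", "get-databases", "--region", region]),
        ("Glue database access",
         ["aws", "glue", "get-database", "--name", database, "--region", region]),
    ]


def run_preflight(checks: List[Check], runner: Callable = subprocess.run) -> List[str]:
    """Run the checks in order and return the names of those that passed.

    Stops at the first failing command, since later checks usually fail for the
    same underlying reason (for example, invalid credentials).
    """
    passed = []
    for name, cmd in checks:
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {name}: {result.stderr.strip()}")
            break
        print(f"ok: {name}")
        passed.append(name)
    return passed


if __name__ == "__main__":
    # Only print the commands here; call run_preflight() to actually execute them.
    for name, cmd in preflight_commands("s3://olake-iceberg/", "us-east-1", "iceberg_db"):
        print(f"{name}: {' '.join(cmd)}")
```

Call `run_preflight(preflight_commands(...))` to execute the commands; this requires the AWS CLI to be installed and configured with the same credentials OLake will use.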

Querying Data

Query Your Data with Amazon Athena

Once OLake has written data to Iceberg tables in AWS Glue Data Catalog, you can query them with Amazon Athena:

SELECT * FROM "ICEBERG_DATABASE_NAME"."TABLE_NAME" LIMIT 10;
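When you generate such queries in code, quote the database and table names as identifiers instead of pasting them in raw. A minimal Python sketch, assuming Athena's Presto/Trino-style identifier rules (names in double quotes, an embedded double quote written twice); `quote_ident`, `preview_query`, and the table name `orders` are hypothetical. Glue folds database and table names to lowercase when it stores them, so the sketch lowercases them too.

```python
def quote_ident(name: str) -> str:
    """Quote an identifier for Athena SQL; embedded double quotes are doubled."""
    # Glue stores database and table names in lowercase, so match that here.
    return '"' + name.lower().replace('"', '""') + '"'


def preview_query(database: str, table: str, limit: int = 10) -> str:
    """Build a SELECT * ... LIMIT preview query for one Iceberg table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {quote_ident(database)}.{quote_ident(table)} LIMIT {int(limit)};"


if __name__ == "__main__":
    print(preview_query("iceberg_db", "orders"))
    # SELECT * FROM "iceberg_db"."orders" LIMIT 10;
```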

Troubleshooting

The OLake Iceberg Writer with AWS Glue Catalog stops immediately upon encountering errors to ensure data integrity. Below are common issues and their fixes:

  • AccessDeniedException: User is not authorized to perform action
    • Cause: The IAM role or user lacks permissions required for AWS Glue or S3 operations.
    • Fix:
      • Ensure your IAM policy includes the necessary Glue permissions (in production, narrow "Resource" from "*" to the catalog, database, and table ARNs used in the sample policy under Prerequisites):
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "glue:CreateDatabase",
                "glue:GetDatabase",
                "glue:CreateTable",
                "glue:GetTable",
                "glue:UpdateTable",
                "glue:GetPartitions"
              ],
              "Resource": "*"
            }
          ]
        }
      • Add a statement with S3 permissions for the warehouse bucket to the same Statement array:
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::your-bucket-name/*",
            "arn:aws:s3:::your-bucket-name"
          ]
        }
  • NoSuchBucket: The specified bucket does not exist
    • Cause: The S3 bucket does not exist, is in a different region, or the bucket name in the configuration is wrong.
    • Fix:
      • Verify that the bucket exists and check its region (a null LocationConstraint means us-east-1):
        aws s3 ls s3://your-bucket-name/ --region us-east-1
        aws s3api get-bucket-location --bucket your-bucket-name
      • Create the bucket if it doesn't exist:
        aws s3 mb s3://your-bucket-name --region us-east-1
      • Ensure the AWS Region in your destination configuration (aws_region) matches the bucket's region.
  • Database does not exist in Glue Catalog
    • Cause: The specified database does not exist in AWS Glue Data Catalog.
    • Fix: OLake creates the database automatically if it has the glue:CreateDatabase permission. Grant that permission, or create the database manually:
      aws glue create-database --database-input Name=iceberg_db --region us-east-1
  • InvalidInputException: Invalid S3 location
    • Cause: Malformed S3 path or unsupported S3 URI format.
    • Fix:
      • Ensure the S3 path uses the format s3://bucket-name/path/
      • Warehouse paths must end with a trailing slash
      • Bucket names may contain only lowercase letters, numbers, dots, and hyphens; in the rest of the path, avoid special characters other than hyphens and underscores
      • Example valid paths:
        s3://my-iceberg-warehouse/
        s3://data-lake-bucket/iceberg/warehouse/
  • Connection timeout or network errors
    • Cause: Network connectivity issues, VPC configuration, or security group restrictions.
    • Fix:
      • Verify that the host running OLake can reach AWS service endpoints
      • If OLake runs in a private subnet without internet access, make sure VPC endpoints exist for AWS Glue and Amazon S3
      • Ensure security groups allow outbound HTTPS (port 443) traffic
      • Test connectivity:
        aws sts get-caller-identity --region us-east-1
        aws glue get-databases --region us-east-1
  • Table already exists with different schema
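    • Cause: OLake is trying to create an Iceberg table whose schema conflicts with an existing table of the same name.
    • Fix:
      • Check the existing table schema in the Glue Console
      • If the schema change is intended, drop and recreate the table. This requires glue:DeleteTable, which the sample policy does not grant, and it removes only the Glue Catalog entry; the table's files in S3 are not deleted:
        aws glue delete-table --database-name iceberg_db --name table_name --region us-east-1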
      • Or use OLake's schema evolution capabilities for compatible changes
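For the AccessDeniedException case, you can check a policy document offline for the actions listed in its fix before re-running a sync. This Python sketch is a simplified check, not IAM policy evaluation: it looks only at `Allow` statements, ignores `Resource`, `Condition`, and any `Deny`, and matches wildcards such as `s3:*Object` with `fnmatch` after lowercasing, because IAM action names are case-insensitive. `REQUIRED`, `SAMPLE_POLICY`, and `missing_actions` are hypothetical names. For an authoritative answer, use the IAM policy simulator.

```python
import fnmatch
from typing import Iterable, List, Set

# Actions named in the AccessDeniedException fix.
REQUIRED = {
    "glue:CreateDatabase", "glue:GetDatabase", "glue:CreateTable",
    "glue:GetTable", "glue:UpdateTable", "glue:GetPartitions",
    "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket",
}

# Action lists from the sample policy under Prerequisites (resources omitted).
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": [
            "glue:CreateTable", "glue:CreateDatabase", "glue:GetTable", "glue:GetDatabase",
            "glue:GetDatabases", "glue:SearchTables", "glue:UpdateDatabase", "glue:UpdateTable"]},
        {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucket*", "s3:*Object"]},
        {"Effect": "Allow", "Action": "s3:ListAllMyBuckets"},
    ],
}


def _as_list(value) -> List[str]:
    # "Action" may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def missing_actions(policy: dict, required: Iterable[str] = REQUIRED) -> Set[str]:
    """Return required actions not matched by any Allow statement's Action patterns."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") == "Allow" and "Action" in stmt:
            patterns.extend(p.lower() for p in _as_list(stmt["Action"]))
    return {
        action for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    }


if __name__ == "__main__":
    print(missing_actions(SAMPLE_POLICY))  # {'glue:GetPartitions'}
```

Against the sample policy from Prerequisites, the check reports only `glue:GetPartitions`, which appears in the troubleshooting snippet but not in that sample policy.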

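The InvalidInputException path rules can be checked before you save the destination. This Python sketch is an approximation: the bucket check follows the general-purpose S3 bucket naming rules (3 to 63 characters of lowercase letters, numbers, dots, and hyphens, beginning and ending with a letter or number, no adjacent dots), while the rest of the path is limited to letters, numbers, hyphens, and underscores as this guide recommends. It does not implement every S3 naming rule (for example, names formatted as IP addresses are not rejected), and `warehouse_path_errors` is a hypothetical name.

```python
import re

# General-purpose bucket names: 3-63 chars of lowercase letters, digits, dots,
# and hyphens, beginning and ending with a letter or digit.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Path segments after the bucket: letters, digits, hyphens, underscores only.
SEGMENT_RE = re.compile(r"[A-Za-z0-9_-]+")


def warehouse_path_errors(path: str) -> list:
    """Return a list of problems with an Iceberg warehouse path (empty if it looks valid)."""
    if not path.startswith("s3://"):
        return ["path must start with s3://"]
    errors = []
    if not path.endswith("/"):
        errors.append("path must end with a trailing slash")
    bucket, _, rest = path[len("s3://"):].partition("/")
    if not BUCKET_RE.fullmatch(bucket) or ".." in bucket:
        errors.append(f"invalid bucket name: {bucket!r}")
    # filter(None, ...) skips the empty segment produced by the trailing slash.
    for segment in filter(None, rest.split("/")):
        if not SEGMENT_RE.fullmatch(segment):
            errors.append(f"unsupported characters in path segment: {segment!r}")
    return errors


if __name__ == "__main__":
    for p in ["s3://my-iceberg-warehouse/", "s3://data-lake-bucket/iceberg/warehouse/",
              "s3://My_Bucket/data"]:
        print(p, warehouse_path_errors(p) or "ok")
```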

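Before dropping a table over a schema conflict, it can be worth checking whether the change is actually incompatible. This Python sketch compares two flat column maps using the primitive type promotions in the Apache Iceberg table specification (int to long, float to double, and decimal(P,S) to decimal(P',S) with P' > P), and lists added and removed columns without judging them. It is an approximation under stated assumptions: types are plain strings, nested types are ignored, and OLake's schema evolution may accept more or fewer changes. The function names and sample columns are hypothetical.

```python
import re
from typing import Dict, List

_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")

# Primitive promotions allowed by the Iceberg spec (old type, new type).
_PROMOTIONS = {("int", "long"), ("float", "double")}


def is_allowed_promotion(old: str, new: str) -> bool:
    """True if changing a column from `old` to `new` is a spec-allowed widening."""
    if (old, new) in _PROMOTIONS:
        return True
    a, b = _DECIMAL.fullmatch(old), _DECIMAL.fullmatch(new)
    # decimal(P, S) -> decimal(P', S) is allowed when P' > P; the scale must not change.
    return bool(
        a and b
        and int(a.group(2)) == int(b.group(2))
        and int(b.group(1)) > int(a.group(1))
    )


def compare_schemas(existing: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify column differences between an existing table schema and an incoming one."""
    report = {"added": [], "removed": [], "promoted": [], "incompatible": []}
    for col, new_type in incoming.items():
        old_type = existing.get(col)
        if old_type is None:
            report["added"].append(col)
        elif old_type != new_type:
            key = "promoted" if is_allowed_promotion(old_type, new_type) else "incompatible"
            report[key].append(col)
    report["removed"] = [c for c in existing if c not in incoming]
    return report


if __name__ == "__main__":
    existing = {"id": "int", "price": "decimal(10,2)", "name": "string"}
    incoming = {"id": "long", "price": "decimal(12,2)", "name": "int", "email": "string"}
    print(compare_schemas(existing, incoming))
```

Only columns in `incompatible` suggest that dropping and recreating the table may be necessary; added columns and allowed promotions are the kind of change schema evolution is meant to handle.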
💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!