AWS Glue Catalog Write Guide
OLake integrates with AWS Glue Catalog to provide full support for Apache Iceberg tables. This setup ensures that:
- Data is stored in Amazon S3 (Parquet + metadata files)
- Metadata is managed in AWS Glue Catalog (schemas, partitions, table properties)
- OLake seamlessly writes into Iceberg tables through Glue APIs
Prerequisites
Before configuring OLake with AWS Glue Catalog, ensure the following are set up:
1. Amazon S3 Bucket
- Create an S3 bucket in the same AWS region as your Glue Catalog.
- Example: s3://olake-iceberg/
2. AWS IAM Permissions
- Create an IAM role or user with Glue + S3 access.
A sample IAM policy:
IAM Policy JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueAccess",
      "Effect": "Allow",
      "Action": [
        "glue:CreateTable",
        "glue:CreateDatabase",
        "glue:GetTable",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:SearchTables",
        "glue:UpdateDatabase",
        "glue:UpdateTable"
      ],
      "Resource": [
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:catalog",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:database/{AWS_GLUE_DATABASE_NAME}",
        "arn:aws:glue:<REGION>:<ACCOUNT_ID>:table/{AWS_GLUE_DATABASE_NAME}/*"
      ]
    },
    {
      "Sid": "S3BucketReadWrite",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucket*",
        "s3:*Object"
      ],
      "Resource": [
        "arn:aws:s3:::{S3_BUCKET_NAME}",
        "arn:aws:s3:::{S3_BUCKET_NAME}/*"
      ]
    },
    {
      "Sid": "ListAllBuckets",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
- Replace <REGION>, <ACCOUNT_ID>, {AWS_GLUE_DATABASE_NAME}, and {S3_BUCKET_NAME} with your actual values.
- If you already have databases and tables in Glue Catalog, you can remove the glue:CreateDatabase and glue:CreateTable permissions.
- The glue:SearchTables permission is optional and is used for table discovery operations.
- Note: Permissions to drop or delete tables are not included. Add glue:DeleteTable if you need table deletion capabilities.
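Filling in the four placeholders by hand is easy to get wrong, so the substitution can be scripted. The following is a minimal sketch, not part of OLake: the function name render_glue_policy and the allow_create flag are hypothetical, and the account ID in the usage lines is a dummy value.

```python
import json


def render_glue_policy(region, account_id, database, bucket, allow_create=True):
    """Return the sample policy as a dict with its four placeholders filled in."""
    glue_actions = [
        "glue:CreateTable",
        "glue:CreateDatabase",
        "glue:GetTable",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:SearchTables",
        "glue:UpdateDatabase",
        "glue:UpdateTable",
    ]
    if not allow_create:
        # Per the notes above, the Create* actions can be dropped when the
        # database and tables already exist in Glue Catalog.
        glue_actions = [a for a in glue_actions if not a.startswith("glue:Create")]

    glue_arn = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueAccess",
                "Effect": "Allow",
                "Action": glue_actions,
                "Resource": [
                    f"{glue_arn}:catalog",
                    f"{glue_arn}:database/{database}",
                    f"{glue_arn}:table/{database}/*",
                ],
            },
            {
                "Sid": "S3BucketReadWrite",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucket*", "s3:*Object"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
            {
                "Sid": "ListAllBuckets",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Dummy values for illustration only; substitute your own.
    policy = render_glue_policy("us-east-1", "123456789012", "iceberg_db", "olake-iceberg")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or saved to a file and attached to the role or user with your usual tooling.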
Configuration
- OLake UI
- OLake CLI
- Before setting up the destination, make sure you have successfully set up the source.
Parameter | Sample Value | Description |
---|---|---|
Iceberg S3 Path (Warehouse) | s3://<BUCKET_NAME>/ | S3 bucket path where Iceberg table data and metadata files will be stored. |
AWS Region | us-east-1 | AWS region containing the S3 bucket and Glue Data Catalog resources. |
AWS Access Key | XXX | AWS access key ID for authentication. Optional if using IAM roles or instance profiles. |
AWS Secret Key | XXX | AWS secret access key for authentication. Optional if using IAM roles or instance profiles. |
Iceberg Database | iceberg_db | Database name to create in AWS Glue Data Catalog for organizing Iceberg tables. |
Click Next -> to test the connection and verify that OLake can access both Glue Catalog and S3.
After you have successfully set up the destination: Configure your streams
Create a destination.json file with the following configuration:
{
  "type": "ICEBERG",
  "writer": {
    "catalog_type": "glue",
    "iceberg_s3_path": "s3://<BUCKET_NAME>/",
    "aws_region": "us-east-1",
    "aws_access_key": "XXX",
    "aws_secret_key": "XXX",
    "iceberg_db": "iceberg_db"
  }
}
Parameter | Sample Value | Description |
---|---|---|
iceberg_s3_path | s3://<BUCKET_NAME>/ | S3 bucket path where Iceberg table data and metadata files will be stored. |
aws_region | us-east-1 | AWS region containing the S3 bucket and Glue Data Catalog resources. |
aws_access_key | XXX | AWS access key with sufficient permissions for S3 and Glue. Optional if using IAM role attached to running instance/pod. |
aws_secret_key | XXX | AWS secret key with sufficient permissions for S3 and Glue. Optional if using IAM role attached to running instance/pod. |
iceberg_db | iceberg_db | Database name to create in AWS Glue Data Catalog for organizing Iceberg tables. |
Using this destination.json, you can run the sync command.
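If you prefer to generate destination.json rather than edit it by hand, a sketch along these lines may help. The helper names build_destination and write_destination are hypothetical. Leaving out the two credential keys when an IAM role is in use is an assumption based on the table marking them optional, so confirm that your OLake version accepts a file without them.

```python
import json


def build_destination(bucket, region, database, access_key=None, secret_key=None):
    """Build the Glue destination config shown above as a Python dict."""
    writer = {
        "catalog_type": "glue",
        "iceberg_s3_path": f"s3://{bucket}/",  # warehouse path with trailing slash
        "aws_region": region,
        "iceberg_db": database,
    }
    # Assumption: when an IAM role is attached to the instance or pod, the
    # key pair is simply omitted instead of being written as empty strings.
    if access_key and secret_key:
        writer["aws_access_key"] = access_key
        writer["aws_secret_key"] = secret_key
    return {"type": "ICEBERG", "writer": writer}


def write_destination(config, path="destination.json"):
    """Write the config to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")


if __name__ == "__main__":
    # Example with an attached IAM role: no static keys in the file.
    print(json.dumps(build_destination("olake-iceberg", "us-east-1", "iceberg_db"), indent=2))
```

Call write_destination(build_destination(...)) to produce the file. Keeping static keys out of it when a role is available also keeps secrets out of version control.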
After you have successfully set up the destination: Run the Discover command
OLake will automatically test:
- AWS credentials validity
- S3 bucket access permissions
- Glue Catalog connectivity
- Database creation/access permissions
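To run comparable checks yourself before starting a sync, the sketch below probes the same four areas with boto3. It is not OLake's implementation: the preflight and make_clients helpers are hypothetical, and the choice of API calls (get_caller_identity, head_bucket, get_databases, get_database) is an assumption about what reasonably approximates each check.

```python
def preflight(sts, s3, glue, bucket, database):
    """Run each check and return {check_name: None on success, else error text}."""
    checks = {
        # Credentials resolve to a valid AWS identity.
        "credentials": lambda: sts.get_caller_identity(),
        # The bucket exists and the caller may access it.
        "s3_bucket": lambda: s3.head_bucket(Bucket=bucket),
        # The Glue API is reachable in the configured region.
        "glue_catalog": lambda: glue.get_databases(MaxResults=1),
        # The target database exists. A failure here is not necessarily fatal,
        # since OLake can create the database given glue:CreateDatabase.
        "glue_database": lambda: glue.get_database(Name=database),
    }
    results = {}
    for name, call in checks.items():
        try:
            call()
            results[name] = None
        except Exception as exc:  # boto3 surfaces botocore exceptions here
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def make_clients(region):
    """Create the three boto3 clients; boto3 is a third-party dependency."""
    import boto3

    session = boto3.Session(region_name=region)
    return session.client("sts"), session.client("s3"), session.client("glue")


# Usage (requires boto3 and AWS credentials in the environment):
#   sts, s3, glue = make_clients("us-east-1")
#   for check, error in preflight(sts, s3, glue, "olake-iceberg", "iceberg_db").items():
#       print(check, "ok" if error is None else error)
```

Passing the clients in as arguments keeps the function easy to exercise with stubs, and lets you reuse a session that already assumes the role OLake will run under.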
Querying Data
Once OLake has written data to Iceberg tables in AWS Glue Catalog, you can query the data using Amazon Athena:
SELECT * FROM "ICEBERG_DATABASE_NAME"."TABLE_NAME" LIMIT 10;
Troubleshooting
The OLake Iceberg Writer with AWS Glue Catalog stops immediately upon encountering errors to ensure data integrity. Below are common issues and their fixes:
- AccessDeniedException: User is not authorized to perform action
- Cause: IAM role or user lacks required permissions for AWS Glue or S3 operations.
- Fix:
- Ensure your IAM policy includes necessary Glue permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:GetDatabase",
        "glue:CreateTable",
        "glue:GetTable",
        "glue:UpdateTable",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
- Add S3 permissions for the warehouse path:
{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket"
  ],
  "Resource": [
    "arn:aws:s3:::your-bucket-name/*",
    "arn:aws:s3:::your-bucket-name"
  ]
}
- NoSuchBucket: The specified bucket does not exist
- Cause: The S3 bucket doesn't exist, is in a different region, or the bucket name in the configuration is incorrect.
- Fix:
- Verify bucket exists in the correct region:
aws s3 ls s3://your-bucket-name/ --region us-east-1
- Create bucket if it doesn't exist:
aws s3 mb s3://your-bucket-name --region us-east-1
- Ensure the aws_region in your configuration matches the bucket's region.
- Database does not exist in Glue Catalog
- Cause: Specified database name doesn't exist in AWS Glue Data Catalog.
- Fix: OLake will automatically create the database if you have the glue:CreateDatabase permission. Verify permissions or create the database manually:
aws glue create-database --database-input Name=iceberg_db --region us-east-1
- InvalidInputException: Invalid S3 location
- Cause: Malformed S3 path or unsupported S3 URI format.
- Fix:
- Ensure S3 path follows the correct format:
s3://bucket-name/path/
- Path must end with a trailing slash for warehouse locations
- Avoid special characters except hyphens and underscores
- Example valid paths:
s3://my-iceberg-warehouse/
s3://data-lake-bucket/iceberg/warehouse/
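The path rules above can be checked before a sync with a small validator. This is a sketch, not OLake's own validation: the function name is hypothetical, the bucket part follows the general S3 bucket naming rules (3 to 63 characters, lowercase letters, digits, dots, hyphens), and the prefix part encodes the "hyphens and underscores only" guidance from this list.

```python
import re

# s3://<bucket>/ followed by zero or more <segment>/ parts.
# Bucket: 3-63 chars, lowercase letters, digits, dots, hyphens,
# starting and ending with a letter or digit.
# Segments: letters, digits, hyphens, underscores (per the guidance above).
_WAREHOUSE_PATH = re.compile(
    r"s3://[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]/(?:[A-Za-z0-9_-]+/)*"
)


def is_valid_warehouse_path(path: str) -> bool:
    """Return True if path looks like a well-formed Iceberg warehouse location."""
    return _WAREHOUSE_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    for candidate in (
        "s3://my-iceberg-warehouse/",
        "s3://data-lake-bucket/iceberg/warehouse/",
        "s3://my-iceberg-warehouse",        # missing trailing slash
        "s3://bucket/path with space/",     # space in prefix
    ):
        print(candidate, is_valid_warehouse_path(candidate))
```

The check is deliberately stricter than S3 itself, which permits many more characters in object keys. Loosen the segment pattern if your existing prefixes need it.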
- Connection timeout or network errors
- Cause: Network connectivity issues, VPC configuration, or security group restrictions.
- Fix:
- Verify internet connectivity to AWS services
- Check VPC endpoints for Glue and S3 if running in private subnets
- Ensure security groups allow outbound HTTPS (port 443) traffic
- Test connectivity:
aws sts get-caller-identity --region us-east-1
aws glue get-databases --region us-east-1
- Table already exists with different schema
- Cause: OLake attempted to create an Iceberg table whose schema conflicts with that of an existing table.
- Fix:
- Check existing table schema in Glue Console
- Drop and recreate table if schema change is intended:
aws glue delete-table --database-name iceberg_db --name table_name --region us-east-1
- Or use OLake's schema evolution capabilities for compatible changes