Hive Catalog
Use iceberg_s3_path with the s3a prefix (for example, s3a://warehouse/) if your Hive is configured that way. This works for most use cases. Otherwise, use iceberg_path with the s3 prefix.
{
  "type": "ICEBERG",
  "writer": {
    "catalog_type": "hive",
    "normalization": false,
    "iceberg_s3_path": "s3a://warehouse/",
    "aws_region": "us-east-1",
    "aws_access_key": "admin",
    "aws_secret_key": "password",
    "s3_endpoint": "http://localhost:9000",
    "hive_uri": "thrift://localhost:9083",
    "s3_use_ssl": false,
    "s3_path_style": true,
    "hive_clients": 5,
    "hive_sasl_enabled": false,
    "iceberg_db": "ICEBERG_DATABASE_NAME"
  }
}
Hive Configuration Parameters
Parameter | Sample Value | Description |
---|---|---|
catalog_type | hive | Indicates the catalog type used by the writer. "hive" means that the writer uses the Hive Metastore for catalog operations. |
normalization | false | Specifies whether data normalization is applied. "false" means that normalization is disabled. |
iceberg_s3_path | s3a://warehouse/ | Determines the S3 path or storage location for Iceberg data. The value "s3a://warehouse/" represents the designated S3 bucket or directory. |
aws_region | us-east-1 | Specifies the AWS region associated with the S3 bucket where the data is stored. |
aws_access_key | admin | Provides the AWS access key used for authentication when connecting to S3. |
aws_secret_key | password | Provides the AWS secret key used for authentication when connecting to S3. |
s3_endpoint | http://localhost:9000 | Specifies the endpoint URL for the S3 service. This may be used when connecting to an S3-compatible storage service like MinIO running on localhost. |
hive_uri | thrift://localhost:9083 | Defines the URI of the Hive Metastore service that the writer will connect to for catalog interactions. |
s3_use_ssl | false | Indicates whether SSL is enabled for S3 connections. "false" means that SSL is disabled for these communications. |
s3_path_style | true | Determines if path-style access is used for S3. "true" means that the writer will use path-style addressing instead of the default virtual-hosted style. |
hive_clients | 5 | Specifies the number of Hive clients allocated for managing interactions with the Hive Metastore. |
hive_sasl_enabled | false | Indicates whether SASL authentication is enabled for the Hive connection. "false" means that SASL is disabled. |
iceberg_db | olake_iceberg | Specifies the name of the Iceberg database to be used by the destination configuration. |
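
Before running a sync, it can help to sanity-check the configuration against the parameters described above. The following is a minimal sketch, not part of the product: the function name check_hive_writer, the choice of which keys count as required, and the specific checks are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Keys treated as mandatory here; this set is an assumption for illustration.
REQUIRED_WRITER_KEYS = ("catalog_type", "iceberg_s3_path", "hive_uri", "iceberg_db")


def check_hive_writer(config):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if config.get("type") != "ICEBERG":
        problems.append('type should be "ICEBERG"')

    writer = config.get("writer", {})
    for key in REQUIRED_WRITER_KEYS:
        if key not in writer:
            problems.append(f"writer.{key} is missing")

    if writer.get("catalog_type", "hive") != "hive":
        problems.append('writer.catalog_type should be "hive" for this catalog')

    # The storage path is expected to carry the s3a or s3 prefix.
    path = writer.get("iceberg_s3_path")
    if path is not None and urlparse(path).scheme not in ("s3a", "s3"):
        problems.append("writer.iceberg_s3_path should start with s3a:// or s3://")

    # The Hive Metastore is reached over Thrift.
    uri = writer.get("hive_uri")
    if uri is not None and urlparse(uri).scheme != "thrift":
        problems.append("writer.hive_uri should start with thrift://")

    return problems


# Sample values taken from the configuration shown earlier.
sample = {
    "type": "ICEBERG",
    "writer": {
        "catalog_type": "hive",
        "iceberg_s3_path": "s3a://warehouse/",
        "hive_uri": "thrift://localhost:9083",
        "iceberg_db": "ICEBERG_DATABASE_NAME",
    },
}
print(check_hive_writer(sample))
```

In practice the dictionary would be loaded from destination.json with json.load, and a non-empty result would be reported before starting the writer.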
You can query the data via:
SELECT * FROM CATALOG_NAME.ICEBERG_DATABASE_NAME.TABLE_NAME;
CATALOG_NAME can be jdbc_catalog, hive_catalog, rest_catalog, etc.
ICEBERG_DATABASE_NAME is the name of the Iceberg database you created or set as a value in the destination.json file.
For the S3 permissions needed to write data to S3, refer to the AWS S3 Permissions documentation.
If you wish to test out the REST Catalog locally, you can use the docker-compose setup. The local test setup uses MinIO as S3-compatible storage and includes all the other supported catalog types.
You can then set up a local Spark installation to run queries on the Iceberg tables created in the local test setup.
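
For querying through Spark, the writer settings map fairly directly onto Iceberg and Hadoop S3A properties. The sketch below builds those properties as a plain dictionary. The helper name spark_conf_for_hive_catalog and the mapping from writer keys to Spark properties are assumptions for illustration; the property names themselves are the standard Iceberg Spark catalog and Hadoop S3A settings. It also assumes the Iceberg Spark runtime and hadoop-aws jars are available on the Spark classpath.

```python
def spark_conf_for_hive_catalog(writer, catalog_name="hive_catalog"):
    """Translate writer settings into Spark properties for an Iceberg Hive catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        # Register an Iceberg catalog backed by the Hive Metastore.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        f"{prefix}.uri": writer["hive_uri"],
        f"{prefix}.warehouse": writer["iceberg_s3_path"],
        # Hadoop S3A settings used to read the data files.
        "spark.hadoop.fs.s3a.access.key": writer["aws_access_key"],
        "spark.hadoop.fs.s3a.secret.key": writer["aws_secret_key"],
        "spark.hadoop.fs.s3a.path.style.access": str(writer.get("s3_path_style", False)).lower(),
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(writer.get("s3_use_ssl", True)).lower(),
    }
    # A custom endpoint is only needed for S3-compatible stores such as MinIO.
    if writer.get("s3_endpoint"):
        conf["spark.hadoop.fs.s3a.endpoint"] = writer["s3_endpoint"]
    return conf


# Sample values taken from the local configuration shown earlier.
writer = {
    "iceberg_s3_path": "s3a://warehouse/",
    "aws_access_key": "admin",
    "aws_secret_key": "password",
    "s3_endpoint": "http://localhost:9000",
    "hive_uri": "thrift://localhost:9083",
    "s3_use_ssl": False,
    "s3_path_style": True,
}
conf = spark_conf_for_hive_catalog(writer)
for key in sorted(conf):
    print(key, "=", conf[key])

# With PySpark installed, the properties could be applied like this:
#   builder = SparkSession.builder.appName("iceberg-query")
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   spark.sql("SELECT * FROM hive_catalog.ICEBERG_DATABASE_NAME.TABLE_NAME").show()
```

The catalog name passed to the helper becomes the CATALOG_NAME used in the query, so the two need to match.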