OLake Maintenance Kubernetes Installation with Helm
This guide details the process for deploying OLake Fusion on Kubernetes using the official Helm chart. The Fusion components run alongside the standard OLake stack and provide Iceberg table maintenance services.
Components & Architecture
- OLake UI: Main web interface for job management and configuration
- OLake Worker: Background worker for processing data replication jobs
- PostgreSQL: Primary database for storing job data, configurations, sync state, and Temporal visibility data
- Temporal: Workflow orchestration engine for managing job execution
- Signup Init: One-time initialization service that creates the default admin user
- Fusion: Maintenance control plane and API server
- Fusion Spark Optimizer: Spark-on-Kubernetes execution layer for Maintenance jobs

Prerequisites
Ensure the following requirements are met before proceeding:
- Kubernetes 1.19+: Administrative access to a Kubernetes cluster
- Helm 3.2.0+: Helm client installed and configured (Installation Guide)
- kubectl: Configured kubectl command-line tool (Installation Guide)
- StorageClass: A StorageClass is required by the chart to provision persistent volumes for PostgreSQL and the shared storage volume
  # List available StorageClass
  kubectl get storageclass
- System Requirements: Minimum resources per workload:

| Component | RAM | vCPU |
| --- | --- | --- |
| OLake UI | 4 GB | 2 |
| Temporal | 4 GB | 2 |
| OLake Worker | 4 GB | 2 |
| PostgreSQL | 4 GB | 2 |
| Sync pod (OLake Go) | 8 GB | 4 |
| Fusion AMS | 8 GB | 4 |
| Optimizer | 1 GB | - |
| Spark executor | 4 GB | - |

Sync pod sizing depends on the data volume to be synced. Use JobID-Based Scheduling to map jobs to nodes with sufficient capacity.
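The version minimums listed above can be checked before installing. The following is a minimal sketch, assuming a `sort` that supports version ordering (`-V`); `version_ge` and `check_helm_version` are hypothetical helper names and are not part of OLake or Helm:

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), which is an assumption about the local toolchain.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_helm_version: compares the installed Helm client with the 3.2.0 minimum.
check_helm_version() {
  helm_version="$(helm version --template '{{.Version}}')"
  # Strip the leading "v" that Helm prints before comparing.
  if version_ge "${helm_version#v}" "3.2.0"; then
    echo "Helm ${helm_version} meets the 3.2.0 minimum"
  else
    echo "Helm ${helm_version} is older than 3.2.0" >&2
    return 1
  fi
}
```

Calling `check_helm_version` prints a one-line verdict. The same `version_ge` helper can be reused for the Kubernetes 1.19 minimum once the server version has been obtained.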
Tip: Node-class suggestions
OLake will run on mixed capacity, but scheduling the components below on stable, non-preemptible (on-demand) nodes lowers the chance of spot or preemptible churn interrupting core services and compaction work:
- nfs-server (if enabled)
- PostgreSQL (if enabled)
- Temporal
- OLake workers
- OLake UI
- Fusion and Spark optimizer workloads
Quick Start
1. Add OLake Helm Repository
helm repo add olake https://datazip-inc.github.io/olake-helm
helm repo update
2. Install the Chart
AWS EKS (Ingestion + Maintenance):
helm install olake olake/olake --set global.storageClass="gp2" --set fusion.enabled=true
Others (Ingestion + Maintenance):
helm install olake olake/olake --set fusion.enabled=true
3. Access OLake UI
Forward the UI service port to the local machine:
kubectl port-forward svc/olake-ui 8000:8000
Open a browser and navigate to: http://localhost:8000
Default Credentials:
- Username: admin
- Password: password
If OLake is installed with Ingress enabled, port-forwarding is not necessary. Access the application using the configured Ingress hostname.
With Fusion enabled, follow Configure your first compaction for step-by-step guidance on adding catalogs and scheduling maintenance on the Iceberg tables in the OLake UI.
Configuration Options
The ingestion-focused configurations are documented in Ingestion Installation with Helm.
Customizing Fusion Spark Optimizer Configuration
The Fusion Spark optimizer can be customized by providing additional Spark configurations via fusion.optimizer.spark.extraConfig and other container-specific properties via fusion.optimizer.spark.properties.
For example, to set spark.sql.shuffle.partitions and export.JAVA_HOME:
fusion:
  optimizer:
    spark:
      extraConfig:
        spark.sql.shuffle.partitions: "200"
      properties:
        export.JAVA_HOME: "/usr/lib/jvm/java-17-openjdk"
Refer to the Spark documentation for a comprehensive list of configurable properties for Spark containers.
Compaction Scheduling
Configure fusion.nodeSelector, fusion.tolerations, and fusion.affinity in values.yaml to define scheduling behavior for compaction workloads.
These settings are applied to the Fusion pod and to Spark driver/executor pods created by Fusion. The following example shows the expected values structure:
fusion:
  nodeSelector:
    workload: "fusion"
  tolerations: []
  affinity: {}
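For a nodeSelector such as `workload: "fusion"` to match, at least one node must carry that label. The following is a minimal sketch, assuming the label key and value from the example; `label_fusion_node` and `list_fusion_nodes` are hypothetical helper names, and the node name is supplied by the caller:

```shell
# label_fusion_node NODE: adds the workload=fusion label used by the example nodeSelector.
label_fusion_node() {
  kubectl label node "$1" workload=fusion --overwrite
}

# list_fusion_nodes: lists the nodes that the example nodeSelector would match.
list_fusion_nodes() {
  kubectl get nodes -l workload=fusion
}
```

If `list_fusion_nodes` returns no nodes, pods that use this nodeSelector remain in the Pending state.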
Updating OLake UI Version
Pull the latest images and restart the deployments without downtime:
# Restart OLake components
kubectl rollout restart deployment/olake-ui
kubectl rollout restart deployment/olake-workers
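To confirm that the restart has finished, the rollout status of each deployment can be watched. This is a minimal sketch; `wait_for_rollouts` is a hypothetical helper name and the 300-second timeout is an arbitrary assumption:

```shell
# wait_for_rollouts: blocks until each restarted deployment reports a completed rollout.
# Stops at the first deployment that fails or exceeds the assumed timeout.
wait_for_rollouts() {
  for deployment in olake-ui olake-workers; do
    kubectl rollout status "deployment/${deployment}" --timeout=300s || return 1
  done
}
```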
Initial User Setup
Create a Kubernetes secret to replace default credentials:
kubectl create secret generic olake-admin-credentials \
--from-literal=username='superadmin' \
--from-literal=password='a-very-secure-password' \
--from-literal=email='admin@mycompany.com'
Then configure in values.yaml:
olakeUI:
  initUser:
    existingSecret: "olake-admin-credentials"
    secretKeys:
      username: "username"
      password: "password"
      email: "email"
Apply the configuration:
helm upgrade olake olake/olake -f values.yaml --set fusion.enabled=true
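Before relying on the secret, it can help to confirm that it holds the expected keys. The following is a minimal sketch; `read_secret_key` is a hypothetical helper name, and it is intended for non-sensitive keys such as the username or email:

```shell
# read_secret_key SECRET KEY: prints the decoded value stored under KEY in SECRET.
# Secret data is base64-encoded by Kubernetes, so it is decoded before printing.
read_secret_key() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 --decode
}
```

For example, `read_secret_key olake-admin-credentials username` should print the username given when the secret was created.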
Ingress Configuration
To expose OLake through an ingress controller, create a custom values file:
# values.yaml
olakeUI:
  ingress:
    enabled: true
    className: "nginx"
    hosts:
      - host: olake.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: olake-tls
        hosts:
          - olake.example.com
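The tls section references a secret named olake-tls, which must exist in the release namespace. The following is a minimal sketch, assuming a certificate and key are already available as local files; `create_ingress_tls_secret` and `show_ingress` are hypothetical helper names:

```shell
# create_ingress_tls_secret CERT_FILE KEY_FILE: creates the TLS secret named in the example values.
create_ingress_tls_secret() {
  kubectl create secret tls olake-tls --cert="$1" --key="$2"
}

# show_ingress: lists Ingress resources so the configured host and address can be checked.
show_ingress() {
  kubectl get ingress
}
```

If certificates are issued by a separate tool in the cluster, the manual secret creation step may not be needed.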
Persistent Storage Configuration
The OLake application components (UI, Worker, and Activity Pods) require a shared ReadWriteMany (RWX) volume for coordinating pipeline state and metadata.
For production, a robust, highly-available RWX-capable storage solution such as AWS EFS, GKE Filestore, or Azure Files must be used. This is achieved by disabling the built-in NFS server and providing an existing Kubernetes StorageClass that is backed by a managed storage service. An example for using StorageClass is given below:
nfsServer:
  # 1. The development NFS server is disabled
  enabled: false
  # 2. An existing ReadWriteMany-capable StorageClass is specified
  external:
    storageClass: "efs-csi"
For development and quick starts, a simple NFS server is included and enabled by default. This provides an out-of-the-box shared storage solution without any external dependencies. However, because this server runs as a single pod, it represents a single point of failure and is not recommended for production use.
Bottlerocket OS on AWS EKS: The built-in NFS server is incompatible with Bottlerocket OS worker nodes. For such AWS EKS configurations, AWS EFS must be used as an alternative, which requires setting nfsServer.enabled: false and configuring the EFS CSI driver as shown above.
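After installation, the access mode of the shared volume can be checked. This is a minimal sketch, assuming the shared claim is named olake-shared-storage (the name listed in the Uninstallation section); `pvc_is_rwx` is a hypothetical helper name:

```shell
# pvc_is_rwx PVC_NAME: succeeds when the claim lists ReadWriteMany among its access modes.
pvc_is_rwx() {
  modes="$(kubectl get pvc "$1" -o 'jsonpath={.spec.accessModes[*]}')"
  case " ${modes} " in
    *" ReadWriteMany "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `pvc_is_rwx olake-shared-storage && echo "shared volume is RWX"`.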
External PostgreSQL Configuration
An external PostgreSQL database can be used instead of the built-in postgresql deployment. PostgreSQL is the primary database for storing job data, configurations, and sync state.
Requirements:
- PostgreSQL 12+ with the btree_gin extension enabled
  - This can be enabled with CREATE EXTENSION IF NOT EXISTS btree_gin; then run \dx to verify that it is enabled.
- Both OLake and Temporal databases created on the PostgreSQL instance
- Network connectivity from the Kubernetes cluster to the PostgreSQL instance
There are two ways to configure an external PostgreSQL database:
Option 1: Using existingSecret
Reference a pre-existing Kubernetes Secret containing the database credentials. The secret must be created manually before installing the chart.
1. Create the database secret:
kubectl create secret generic external-postgres-secret \
--from-literal=host="postgres-host" \
--from-literal=port="5432" \
--from-literal=olake_database="olakeDB" \
--from-literal=temporal_database="temporalDB" \
--from-literal=username="username" \
--from-literal=password="password" \
--from-literal=ssl_mode="require"
2. Configure values.yaml:
postgresql:
  enabled: false
  external:
    existingSecret: "external-postgres-secret"
Option 2: Using properties (Recommended for ArgoCD/GitOps)
Specify the database connection details directly in values.yaml. The chart automatically creates a Kubernetes Secret from these values at template time. This approach is fully compatible with ArgoCD and other GitOps tools that use helm template for rendering.
postgresql:
  enabled: false
  external:
    properties:
      host: "postgres-host"
      port: 5432
      username: "username"
      password: "password"
      olake_database: "olakeDB"
      temporal_database: "temporalDB"
      ssl_mode: "require"
Global Environment Variables
Environment variables defined in global.env are automatically propagated to OLake UI, OLake Workers, and Activity Pods:
global:
  env:
    OLAKE_SECRET_KEY: "your-secret-encryption-key"
    RUN_MODE: "production"
    # Add any custom environment variables here
Private Container Registry
For deployments in air-gapped environments or clusters without access to public registries (Docker Hub, registry.k8s.io), all container images can be pulled from a private registry by setting CONTAINER_REGISTRY_BASE in global.env:
global:
  env:
    CONTAINER_REGISTRY_BASE: "1234567890123.dkr.ecr.us-east-1.amazonaws.com/dockerhub_mirror"
When set, all container images are automatically prefixed with this registry base — no additional image.repository overrides are needed. If left unset, images are pulled from Docker Hub (registry-1.docker.io) by default.
Ensure the following images are mirrored to the private registry (CONTAINER_REGISTRY_BASE/...) before deploying:
- library/busybox:latest
- curlimages/curl:8.1.2
- olakego/ui, olakego/ui-worker
- olakego/fusion, olakego/fusion-spark
- olakego/source-* (connector images, example 1234567890123.dkr.ecr.us-east-1.amazonaws.com/dockerhub_mirror/olakego/source-mysql:v0.4.0)
- temporalio/auto-setup:1.22.3, temporalio/ui:2.16.2
- library/postgres:14-alpine
- sig-storage/nfs-provisioner:v4.0.8 (built-in NFS server; sourced from registry.k8s.io, not Docker Hub)
When using a private container registry (e.g., Amazon ECR, Google Artifact Registry, Azure ACR), the olake-ui pod requires permissions to list repositories and image tags in order to discover available source connectors. To grant access, either:
- Pass registry credentials as environment variables under olakeUI.env (e.g., AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY for ECR), or
- Attach the required read-only registry permissions to the Cloud Provider's IAM role referenced by global.jobServiceAccount.
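Mirroring an image amounts to pulling it from its public source and pushing it under the registry base with the same path and tag. The following is a minimal sketch, assuming Docker is the local image tool and is already authenticated to the private registry; `mirrored_ref` and `mirror_image` are hypothetical helper names:

```shell
# mirrored_ref BASE IMAGE: prints the reference IMAGE has once mirrored under BASE.
mirrored_ref() {
  printf '%s/%s\n' "${1%/}" "$2"
}

# mirror_image BASE IMAGE [SOURCE_REGISTRY]: pulls IMAGE and pushes it under BASE.
# SOURCE_REGISTRY is optional; when omitted, Docker's default registry is used.
mirror_image() {
  source_ref="${3:+$3/}$2"
  target_ref="$(mirrored_ref "$1" "$2")"
  docker pull "${source_ref}" &&
    docker tag "${source_ref}" "${target_ref}" &&
    docker push "${target_ref}"
}
```

For example, `mirror_image "$BASE" olakego/ui-worker:TAG` mirrors a Docker Hub image, while `mirror_image "$BASE" sig-storage/nfs-provisioner:v4.0.8 registry.k8s.io` pulls from registry.k8s.io. `BASE` and `TAG` are placeholders here.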
Security Context Configuration
OLake supports configuring securityContext for all its components to comply with restricted Kubernetes environments.
General Component Configuration
The securityContext can be set for olakeUI, olakeWorker, postgresql, and temporal in values.yaml. All pods created by olakeWorker (for example, sync pods) inherit the same securityContext.
For example:
olakeWorker:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    runAsNonRoot: true
- PostgreSQL (UID 70 Requirement): The default PostgreSQL image (Alpine-based) used by OLake requires a specific UID/GID to function correctly. If enforcing runAsNonRoot: true, the UID, GID, and FSGroup must be set to 70.
postgresql:
  securityContext:
    runAsUser: 70
    runAsGroup: 70
    fsGroup: 70
    runAsNonRoot: true
- OLake does not support setting securityContext for nfsServer.
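The applied settings can be inspected on the running pods. This is a minimal sketch, assuming the chart applies securityContext at the pod level; `show_security_context` is a hypothetical helper name, and the label selector is supplied by the caller:

```shell
# show_security_context SELECTOR: prints each matching pod's name and pod-level securityContext.
show_security_context() {
  kubectl get pods -l "$1" \
    -o 'jsonpath={range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext}{"\n"}{end}'
}
```

For example, `show_security_context app.kubernetes.io/name=postgresql` should show UID, GID, and fsGroup 70 when the PostgreSQL settings above are in effect.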
Upgrading Chart Version
Currently, the ES Stack cannot be upgraded directly to the Maintenance stack. To use the maintenance feature, first upgrade the legacy stack to the latest stack.
Upgrade to Latest Version
helm repo update
helm upgrade olake olake/olake --set fusion.enabled=true
Upgrade with New Configuration
helm upgrade olake olake/olake -f new-values.yaml --set fusion.enabled=true
For conflict-free compaction of OLake-ingested tables, upgrade the OLake Go (Ingestion) driver to v0.7.0 or higher.
Post-Upgrade: Resume Stopped Syncs
After upgrading, perform a rollout restart of the olake-workers:
kubectl rollout restart deployment/olake-workers
Troubleshooting
Check Pod Logs
# OLake UI logs
kubectl logs -l app.kubernetes.io/name=olake-ui -f
# Fusion logs
kubectl logs -l app.kubernetes.io/component=fusion -f
# Temporal server logs
kubectl logs -l app.kubernetes.io/name=temporal-server -f
Common Issues
Pods Stuck in Pending State:
- Check if your cluster has sufficient resources
- Check node selectors or affinity rules
- Verify StorageClass is available and configured correctly
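The checks above usually start from the scheduler's own explanation, which appears in the Events section of a pod description. A minimal sketch; `list_pending_pods` and `explain_pod` are hypothetical helper names:

```shell
# list_pending_pods: lists pods in the current namespace that have not been scheduled or started.
list_pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending
}

# explain_pod POD: prints the pod description, including its Events section.
explain_pod() {
  kubectl describe pod "$1"
}
```

Events that mention insufficient CPU or memory, unmatched node selectors, or unbound PersistentVolumeClaims point to the corresponding item in the list above.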
Pods Stuck in CrashLoopBackOff State:
- If pod logs show an error like failed to ping database, check the Database Connection Issues section below
- If the pod restarts and goes into the CrashLoopBackOff state, check the resource requests and limits defined in the Helm values file
- If pod events show an error like failed to mount volume, check if the nfs-server pod is up and running
Database Connection Issues:
- Verify the PostgreSQL pod is running: kubectl get pods -l app.kubernetes.io/name=postgresql
- If the setup uses External PostgreSQL Configuration, check the following:
  - Check that the host points to the Writer instance and not a Reader of the database
  - Check if the password contains special characters like @ or #. If yes, use a different password
  - Check that ssl_mode is correctly set in the Kubernetes secret for the external database
- Check database connectivity from other pods
- Review database credentials in secrets
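Two of the checks above can be scripted. The following is a minimal sketch; `has_risky_chars` and `check_pg_reachable` are hypothetical helper names, the pod name pg-check is arbitrary, and only the two characters named above are tested:

```shell
# has_risky_chars PASSWORD: succeeds when the password contains @ or #.
has_risky_chars() {
  case "$1" in
    *[@#]*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_pg_reachable HOST [PORT]: runs pg_isready from a temporary pod inside the cluster.
# The pod is removed when the command exits; the port defaults to 5432.
check_pg_reachable() {
  kubectl run pg-check --rm -i --restart=Never --image=postgres:14-alpine -- \
    pg_isready -h "$1" -p "${2:-5432}"
}
```

`check_pg_reachable` only confirms that the server accepts connections from inside the cluster. It does not validate the credentials or ssl_mode.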
Migration Guides
Migrating to v0.0.12
When upgrading from a previous version, the olake-signup-init Job must be deleted before running helm upgrade. Kubernetes does not allow modifications to a Job's pod template, and the updated image references in this version will cause the upgrade to fail.
kubectl delete job olake-signup-init -n olake
helm upgrade olake olake/olake --set fusion.enabled=true
Migrating to v0.0.7 (Standard Resources)
Version 0.0.7 introduced a significant change to how ServiceAccount, RBAC, and Secret resources are managed. By default, useStandardResources is now set to true, which converts these from Helm Hooks to standard resources. This improves compatibility with ArgoCD and prevents race conditions during updates.
For New Installations:
No action needed. The new default (true) is the recommended configuration.
For Existing Installations:
Upgrading directly may cause resource already exists errors because Helm tries to adopt resources that were previously created by hooks.
Option 1: Maintain Legacy Behavior (Easiest)
Set the flag to false in custom values.yaml to keep the old hook-based behavior:
useStandardResources: false
Option 2: Migrate to Standard Resources (Recommended)
To adopt the new behavior, manually remove the hook annotations and label the resources for Helm adoption before upgrading:
# 1. ServiceAccount
kubectl annotate serviceaccount olake-workers meta.helm.sh/release-name=olake meta.helm.sh/release-namespace=olake helm.sh/hook- helm.sh/hook-weight- helm.sh/hook-delete-policy- -n olake --overwrite
kubectl label serviceaccount olake-workers app.kubernetes.io/managed-by=Helm -n olake --overwrite
# 2. Role
kubectl annotate role olake-workers meta.helm.sh/release-name=olake meta.helm.sh/release-namespace=olake helm.sh/hook- helm.sh/hook-weight- helm.sh/hook-delete-policy- -n olake --overwrite
kubectl label role olake-workers app.kubernetes.io/managed-by=Helm -n olake --overwrite
# 3. RoleBinding
kubectl annotate rolebinding olake-workers meta.helm.sh/release-name=olake meta.helm.sh/release-namespace=olake helm.sh/hook- helm.sh/hook-weight- helm.sh/hook-delete-policy- -n olake --overwrite
kubectl label rolebinding olake-workers app.kubernetes.io/managed-by=Helm -n olake --overwrite
# 4. Secret
kubectl annotate secret olake-workers-secret meta.helm.sh/release-name=olake meta.helm.sh/release-namespace=olake helm.sh/hook- helm.sh/hook-weight- helm.sh/hook-delete-policy- -n olake --overwrite
kubectl label secret olake-workers-secret app.kubernetes.io/managed-by=Helm -n olake --overwrite
# 5. Perform the upgrade
helm upgrade olake olake/olake --set fusion.enabled=true
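The annotate and label commands follow the same pattern for every resource, so they can be wrapped in a small function. The following is a minimal sketch, assuming the release name and namespace are both olake as in the commands above; `adopt_resource` and `adopt_all` are hypothetical helper names:

```shell
# adopt_resource KIND NAME [RELEASE] [NAMESPACE]: removes the hook annotations and adds
# the Helm ownership metadata so that Helm can adopt the existing resource.
adopt_resource() {
  kind="$1"; name="$2"; release="${3:-olake}"; namespace="${4:-olake}"
  kubectl annotate "${kind}" "${name}" \
    "meta.helm.sh/release-name=${release}" \
    "meta.helm.sh/release-namespace=${namespace}" \
    helm.sh/hook- helm.sh/hook-weight- helm.sh/hook-delete-policy- \
    -n "${namespace}" --overwrite &&
    kubectl label "${kind}" "${name}" app.kubernetes.io/managed-by=Helm \
      -n "${namespace}" --overwrite
}

# adopt_all: applies adopt_resource to the four resources listed above, stopping on the first failure.
adopt_all() {
  adopt_resource serviceaccount olake-workers &&
    adopt_resource role olake-workers &&
    adopt_resource rolebinding olake-workers &&
    adopt_resource secret olake-workers-secret
}
```

After `adopt_all` succeeds, run the helm upgrade command as in step 5.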
Uninstallation
Remove OLake Installation
helm uninstall olake
Some resources are intentionally preserved after helm uninstall to prevent accidental data loss:
- PersistentVolumeClaims (PVCs): olake-shared-storage and database PVCs are retained to preserve job data, configurations, and historical information
- NFS Server Resources: If installed using the built-in NFS server, the following resources persist:
  - Service/olake-nfs-server
  - StatefulSet/olake-nfs-server
  - ClusterRole/olake-nfs-server
  - ClusterRoleBinding/olake-nfs-server
  - StorageClass/nfs-server
  - ServiceAccount/olake-nfs-server
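The preserved resources can be listed after uninstalling to decide what to clean up. This is a minimal sketch, assuming the release was installed in the current namespace; `list_leftovers` is a hypothetical helper name:

```shell
# list_leftovers: lists the resources that helm uninstall intentionally leaves behind.
# Namespaced and cluster-scoped resources are queried separately.
list_leftovers() {
  kubectl get pvc
  kubectl get statefulset/olake-nfs-server service/olake-nfs-server \
    serviceaccount/olake-nfs-server --ignore-not-found
  kubectl get clusterrole/olake-nfs-server clusterrolebinding/olake-nfs-server \
    storageclass/nfs-server --ignore-not-found
}
```

Any of these can then be removed with kubectl delete. Deleting a PVC such as olake-shared-storage permanently discards the data it holds.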