Set Up OLake Sync as a Kubernetes CronJob
This guide details the process for deploying and managing an OLake data synchronization task as a scheduled Kubernetes CronJob. This configuration uses standard Kubernetes resources (ConfigMaps, PVC, CronJob) to automate recurring OLake sync operations using a pre-generated streams file.
Prerequisites
Ensure the following requirements are met before proceeding:
- Kubernetes Cluster Access: Administrative access to a Kubernetes cluster.
- kubectl: Configured kubectl command-line tool. Installation Guide.
- Pre-generated OLake Stream file (streams.json): This setup requires a streams.json generated beforehand using the OLake discover command against your source database.
  - Stream Generation Guides:
  - The content of this file will be placed within the cm_olake-streams-config.yaml ConfigMap.
- Kubernetes Namespace: The manifests default to the olake namespace. Create it (kubectl create namespace olake) or update the namespace fields in all YAML files if using a different target.
- Node Labels (Optional): If using nodeAffinity in cronjob_olake.yaml, ensure target nodes possess the specified labels.
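The namespace prerequisite can be made repeatable with a small helper. The sketch below is one possible approach, not part of the OLake manifests: the function name ensure_namespace is hypothetical, and it assumes kubectl is already configured for the target cluster.

```shell
# Hypothetical helper: create the namespace only if it does not already exist.
# Usage: ensure_namespace [namespace]   (defaults to "olake")
ensure_namespace() (
  ns="${1:-olake}"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
)
```

Calling ensure_namespace with no argument targets the default olake namespace; pass another name if the manifests were edited to use a different one.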
Configuration Steps
This setup requires several Kubernetes manifest files. Download these files first, then customize them for your specific environment.
1. Download Manifest Files
Use the links below (or curl) to download the necessary YAML files:
- Source ConfigMap: cm_olake-source-config.yaml
  curl -Lo cm_olake-source-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-source-config.yaml
- Destination ConfigMap: cm_olake-destination-config.yaml
  curl -Lo cm_olake-destination-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-destination-config.yaml
- Streams ConfigMap: cm_olake-streams-config.yaml
  curl -Lo cm_olake-streams-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-streams-config.yaml
- CronJob & PVC Manifest: cronjob_olake.yaml
  curl -Lo cronjob_olake.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cronjob_olake.yaml
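The four files listed above share one base URL, so they can be fetched in a single loop. This is a minimal sketch: the function name download_manifests and the variable BASE_URL are illustrative, and the -f flag (fail on HTTP errors rather than saving an error page) is an addition to the curl commands shown above.

```shell
# Base URL taken from the download links in this guide.
BASE_URL="https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes"

# Hypothetical helper: download all four manifests into the current directory.
# Stops at the first failed download and returns a non-zero status.
download_manifests() {
  for f in \
    cm_olake-source-config.yaml \
    cm_olake-destination-config.yaml \
    cm_olake-streams-config.yaml \
    cronjob_olake.yaml
  do
    curl -fLo "$f" "$BASE_URL/$f" || { echo "failed to download $f" >&2; return 1; }
  done
}
```

Run download_manifests from the directory where the manifests should be saved.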
2. Customize Downloaded Files
Edit the downloaded files with your specific configurations:
- cm_olake-source-config.yaml: Update source.json with your source connection parameters.
- cm_olake-destination-config.yaml: Update destination.json with your destination configuration.
- cm_olake-streams-config.yaml: Replace the content under streams.json with your complete, pre-generated OLake streams JSON.
- cronjob_olake.yaml:
  - Set metadata.namespace, spec.jobTemplate.metadata.namespace, and the PVC metadata.namespace (default olake).
  - Configure spec.schedule. For schedule syntax, refer to CronTab Guru.
  - Set spec.suspend to false to enable the schedule.
  - Update spec.jobTemplate.spec.template.spec.containers[0].image with the correct OLake source image (e.g., olakego/source-mongodb:latest). See OLake Docker Hub Images.
  - Configure nodeAffinity (optional) under spec.jobTemplate.spec.template.spec.affinity.
  - Adjust resources.requests and resources.limits for the main OLake container.
  - Crucially, set the correct storageClassName in the PersistentVolumeClaim definition at the end of the file. This StorageClass must support the ReadWriteMany access mode specified in the PVC.
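Pasting a large streams.json into YAML by hand is easy to get wrong, so one alternative is to let kubectl render the ConfigMap from the file. The sketch below is an assumption-laden option rather than the documented workflow: the function name render_streams_configmap is hypothetical, the ConfigMap name olake-streams-config is taken from the cleanup commands in this guide, and the key streams.json is assumed to match the key used in the downloaded manifest. The rendered output may differ from the downloaded file in details such as labels, so compare the two before applying.

```shell
# Hypothetical helper: print a ConfigMap manifest built from a local streams file.
# --dry-run=client -o yaml prints the manifest instead of creating the object.
# Usage: render_streams_configmap [streams-file] [namespace] > cm_olake-streams-config.yaml
render_streams_configmap() (
  file="${1:-streams.json}"
  ns="${2:-olake}"
  [ -f "$file" ] || { echo "missing $file" >&2; exit 1; }
  kubectl create configmap olake-streams-config \
    --from-file=streams.json="$file" \
    --namespace "$ns" \
    --dry-run=client -o yaml
)
```

Redirecting the output to cm_olake-streams-config.yaml keeps the rest of the procedure unchanged.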
Deployment Procedure
Apply the configured manifests to your Kubernetes cluster:
- Apply ConfigMaps:
  kubectl apply -f cm_olake-source-config.yaml -n olake
  kubectl apply -f cm_olake-destination-config.yaml -n olake
  kubectl apply -f cm_olake-streams-config.yaml -n olake
- Apply CronJob and PVC:
  kubectl apply -f cronjob_olake.yaml -n olake
- Verify PVC Status:
  kubectl get pvc olake-config-pvc -n olake
  (Ensure STATUS is Bound. If it is Pending, troubleshoot the StorageClass or provisioner.)
- Verify CronJob Status:
  kubectl get cronjob olake-mongodb-sync -n olake
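The PVC check can be automated by polling the claim's phase instead of re-running the get command by hand. This is a rough sketch under stated assumptions: the function name wait_for_pvc_bound and its default of 30 attempts at 5-second intervals are illustrative choices, not values from the OLake manifests.

```shell
# Hypothetical helper: poll the PVC phase until it is Bound or the attempts run out.
# Usage: wait_for_pvc_bound [pvc] [namespace] [attempts] [sleep-seconds]
wait_for_pvc_bound() (
  pvc="${1:-olake-config-pvc}"
  ns="${2:-olake}"
  attempts="${3:-30}"
  delay="${4:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # An empty phase means the lookup failed (for example, the PVC does not exist yet).
    phase="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)" || phase=""
    if [ "$phase" = "Bound" ]; then
      echo "$pvc is Bound"
      exit 0
    fi
    echo "attempt $i/$attempts: $pvc phase is ${phase:-unknown}" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$pvc did not reach Bound; check the StorageClass and provisioner" >&2
  exit 1
)
```

Note that a StorageClass whose volumeBindingMode is WaitForFirstConsumer keeps the claim Pending until a pod that uses it is scheduled, so a timeout here does not always indicate a misconfiguration.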
Manual Job Execution and Control
- Trigger Manually: Create a Job instance immediately from the CronJob template. This is useful for testing or running outside the defined schedule.
  kubectl create job --from=cronjob/olake-mongodb-sync manual-olake-sync-$(date +%s) -n olake
  Monitor the manually created job using the commands in the Monitoring section below.
- Suspend CronJob: Prevent the CronJob from creating new jobs based on the schedule. This does not affect jobs that are already running.
  kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":true}}'
- Unsuspend CronJob: Re-enable the CronJob's schedule.
  kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":false}}'
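For test runs it can be convenient to trigger a Job, block until it finishes, and print its logs in one step. The following sketch assumes the CronJob name used in this guide; the function name run_sync_now and the 30m default timeout are hypothetical choices to adapt to your sync duration.

```shell
# Hypothetical helper: create a one-off Job from the CronJob, wait for it, print its logs.
# Usage: run_sync_now [cronjob] [namespace] [timeout]
run_sync_now() (
  cronjob="${1:-olake-mongodb-sync}"
  ns="${2:-olake}"
  timeout="${3:-30m}"
  job="manual-olake-sync-$(date +%s)"
  kubectl create job --from="cronjob/$cronjob" "$job" -n "$ns" || exit 1
  # kubectl wait returns non-zero if the "complete" condition is not met within
  # the timeout, which also covers a Job that fails.
  status=0
  kubectl wait --for=condition=complete "job/$job" -n "$ns" --timeout="$timeout" || status=$?
  # Print logs in either case so a failed run can be inspected.
  kubectl logs "job/$job" -n "$ns"
  exit "$status"
)
```

The function's exit status reflects whether the Job completed, which makes it usable as a smoke test after changing the manifests.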
Cleanup
To remove the deployed resources:
- Delete CronJob:
  kubectl delete cronjob olake-mongodb-sync -n olake
- Delete PVC:
  kubectl delete pvc olake-config-pvc -n olake
  (Caution: Deletes persisted data).
- Delete ConfigMaps:
  kubectl delete configmap olake-source-config -n olake
  kubectl delete configmap olake-destination-config -n olake
  kubectl delete configmap olake-streams-config -n olake
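The cleanup commands can be wrapped so they are safe to re-run and so the PVC can be preserved when its data is still needed. This is only a sketch: the function name cleanup_olake and the keep-pvc argument are hypothetical, while the resource names are the ones used throughout this guide.

```shell
# Hypothetical helper: remove the deployed resources.
# --ignore-not-found lets the function be re-run without errors on missing objects.
# Usage: cleanup_olake [namespace] [keep-pvc]
cleanup_olake() (
  ns="${1:-olake}"
  keep_pvc="${2:-}"
  kubectl delete cronjob olake-mongodb-sync -n "$ns" --ignore-not-found
  # Deleting the PVC discards persisted data, so it can be skipped explicitly.
  if [ "$keep_pvc" != "keep-pvc" ]; then
    kubectl delete pvc olake-config-pvc -n "$ns" --ignore-not-found
  fi
  kubectl delete configmap \
    olake-source-config olake-destination-config olake-streams-config \
    -n "$ns" --ignore-not-found
)
```

For example, cleanup_olake olake keep-pvc removes the CronJob and ConfigMaps but leaves the claim and its data in place.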