Setup OLake Sync as a Kubernetes CronJob
This guide details the process for deploying and managing an OLake data synchronization task as a scheduled Kubernetes CronJob. This configuration uses standard Kubernetes resources (ConfigMaps, PVC, CronJob) to automate recurring OLake sync
operations using a pre-generated streams file.
Prerequisites
Ensure the following requirements are met before proceeding:
- Kubernetes Cluster Access: Administrative access to a Kubernetes cluster.
- kubectl: Configured kubectl command-line tool. Installation Guide.
- Pre-generated OLake Stream file (streams.json): This setup requires a streams.json generated beforehand using the OLake discover command against your source database.
- Kubernetes Namespace: The manifests default to the olake namespace. Create it (kubectl create namespace olake) or update the namespace fields in all YAML files if using a different target.
- Node Labels (Optional): If using nodeAffinity in cronjob_olake.yaml, ensure target nodes possess the specified labels.
Configuration Steps
This setup requires several Kubernetes manifest files. Download these files first, then customize them for your specific environment.
1. Download Manifest Files
Use the links below (or curl) to download the necessary YAML files:
- Source ConfigMap: Download cm_olake-source-config.yaml
  curl -Lo cm_olake-source-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-source-config.yaml
- Destination ConfigMap: Download cm_olake-destination-config.yaml
  curl -Lo cm_olake-destination-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-destination-config.yaml
- Streams ConfigMap: Download cm_olake-streams-config.yaml
  curl -Lo cm_olake-streams-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-streams-config.yaml
- CronJob & PVC Manifest: Download cronjob_olake.yaml
  curl -Lo cronjob_olake.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cronjob_olake.yaml
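The four downloads above share one base URL, so they can be scripted in a single loop. The following is a minimal sketch rather than an official installer: the variable and function names (BASE_URL, MANIFESTS, manifest_url, download_manifests) are hypothetical, and the added -f flag only makes curl fail on HTTP errors instead of saving an error page.

```shell
# Base URL and file names are taken from the curl commands in this guide.
BASE_URL="https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes"
MANIFESTS="cm_olake-source-config.yaml cm_olake-destination-config.yaml cm_olake-streams-config.yaml cronjob_olake.yaml"

# Hypothetical helper: print the raw GitHub URL for one manifest file.
manifest_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# Hypothetical helper: download every manifest into the current directory.
# Stops at the first failed download.
download_manifests() {
  for f in $MANIFESTS; do
    curl -fLo "$f" "$(manifest_url "$f")" || return 1
  done
}
```

After sourcing the snippet, run download_manifests from the directory where the manifests should be saved.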
2. Customize Downloaded Files
Edit the downloaded files with your specific configurations:
- cm_olake-source-config.yaml: Update source.json with your source connection parameters.
- cm_olake-destination-config.yaml: Update destination.json with your destination configuration.
- cm_olake-streams-config.yaml: Replace the content under streams.json with your complete, pre-generated OLake streams JSON.
- cronjob_olake.yaml:
  - Set metadata.namespace, spec.jobTemplate.metadata.namespace, and the PVC metadata.namespace (default: olake).
  - Configure spec.schedule. For schedule syntax, refer to CronTab Guru.
  - Set spec.suspend to false to enable the schedule.
  - Update spec.jobTemplate.spec.template.spec.containers[0].image with the correct OLake source image (e.g., olakego/source-mongodb:latest). See OLake Docker Hub Images.
  - Configure nodeAffinity (optional) under spec.jobTemplate.spec.template.spec.affinity.
  - Adjust resources.requests and resources.limits for the main OLake container.
  - Crucially, set the correct storageClassName in the PersistentVolumeClaim definition at the end of the file. This StorageClass must support the ReadWriteMany access mode specified in the PVC.
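Kubernetes CronJob schedules use the standard five-field cron format (minute, hour, day of month, month, day of week). As a rough sanity check before editing spec.schedule, the sketch below counts the fields in a candidate expression. The helper names and the example schedule are hypothetical, and the check is deliberately shallow: it does not validate field ranges, and it rejects macros such as @hourly even though Kubernetes accepts them.

```shell
# Hypothetical helper: count the whitespace-separated fields in a cron expression.
# The body runs in a subshell so that disabling globbing does not leak to the caller.
cron_field_count() (
  set -f          # keep '*' literal instead of expanding it to file names
  set -- $1       # intentional word splitting on whitespace
  echo "$#"
)

# Hypothetical helper: succeed only for five-field expressions.
is_five_field_cron() {
  [ "$(cron_field_count "$1")" -eq 5 ]
}

SCHEDULE="0 */6 * * *"   # example value: minute 0 of every sixth hour
if is_five_field_cron "$SCHEDULE"; then
  echo "schedule looks well-formed: $SCHEDULE"
else
  echo "expected 5 fields in: $SCHEDULE" >&2
fi
```

A passing check only means the expression has the right shape; the API server remains the authority when the manifest is applied.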
Deployment Procedure
Apply the configured manifests to your Kubernetes cluster:
- Apply ConfigMaps:
  kubectl apply -f cm_olake-source-config.yaml -n olake
  kubectl apply -f cm_olake-destination-config.yaml -n olake
  kubectl apply -f cm_olake-streams-config.yaml -n olake
- Apply CronJob and PVC:
  kubectl apply -f cronjob_olake.yaml -n olake
- Verify PVC Status:
  kubectl get pvc olake-config-pvc -n olake
  (Ensure STATUS is Bound. If it is Pending, troubleshoot the StorageClass or provisioner.)
- Verify CronJob Status:
  kubectl get cronjob olake-mongodb-sync -n olake
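The PVC check above can be automated by polling the claim's phase. This is a minimal sketch, assuming the PVC name and namespace used in this guide; the function names, retry count, and two-second interval are hypothetical choices. Note that a StorageClass with volumeBindingMode set to WaitForFirstConsumer keeps the claim Pending until a pod mounts it, so a timeout here does not always indicate a fault.

```shell
# Hypothetical helper: print the phase (e.g. Pending, Bound) of a PVC.
# $1 = PVC name, $2 = namespace (defaults to olake).
pvc_phase() {
  kubectl get pvc "$1" -n "${2:-olake}" -o jsonpath='{.status.phase}'
}

# Hypothetical helper: poll until the PVC is Bound or the attempts run out.
# $1 = PVC name, $2 = namespace (default olake), $3 = attempts (default 30).
wait_for_pvc_bound() {
  pvc="$1"; ns="${2:-olake}"; tries="${3:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(pvc_phase "$pvc" "$ns")" = "Bound" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "PVC $pvc is still not Bound; check the StorageClass and provisioner" >&2
  return 1
}
```

For example, wait_for_pvc_bound olake-config-pvc olake returns 0 as soon as the claim is bound and 1 after roughly a minute otherwise.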
Manual Job Execution and Control
- Trigger Manually: Create a Job instance immediately from the CronJob template. This is useful for testing or running outside the defined schedule.
  kubectl create job --from=cronjob/olake-mongodb-sync manual-olake-sync-$(date +%s) -n olake
  Monitor the manually created job using the commands in the Monitoring section below.
- Suspend CronJob: Prevent the CronJob from creating new jobs based on the schedule. This does not affect jobs that are already running.
  kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":true}}'
- Unsuspend CronJob: Re-enable the CronJob's schedule.
  kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":false}}'
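For test runs, the manual trigger can be combined with a wait and a log dump so that one command reports the outcome. The sketch below assumes the CronJob name and namespace used in this guide; the function name run_manual_sync and the 30m default timeout are hypothetical. One limitation worth knowing: kubectl wait with condition=complete does not return early when a Job fails, so a failed sync is only reported once the timeout expires.

```shell
# Hypothetical helper: trigger a one-off sync, wait for it to finish, print its logs.
# $1 = optional timeout passed to kubectl wait (default 30m).
run_manual_sync() {
  job_name="manual-olake-sync-$(date +%s)"
  # Create a Job from the CronJob template, as in the manual trigger step.
  kubectl create job --from=cronjob/olake-mongodb-sync "$job_name" -n olake &&
    # Block until the Job reports the Complete condition or the timeout expires.
    kubectl wait --for=condition=complete "job/$job_name" -n olake --timeout="${1:-30m}" &&
    # Print the logs of the Job's pod.
    kubectl logs "job/$job_name" -n olake
}
```

Calling run_manual_sync 10m, for instance, shortens the wait for a small test dataset.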
Cleanup
To remove the deployed resources:
- Delete CronJob:
  kubectl delete cronjob olake-mongodb-sync -n olake
- Delete PVC:
  kubectl delete pvc olake-config-pvc -n olake
  (Caution: Deletes persisted data.)
- Delete ConfigMaps:
  kubectl delete configmap olake-source-config -n olake
  kubectl delete configmap olake-destination-config -n olake
  kubectl delete configmap olake-streams-config -n olake
Support
- Email: hello@olake.io
- Join Slack Community
- Schedule a Call