Set Up OLake Sync as a Kubernetes CronJob
This guide details the process for deploying and managing an OLake data synchronization task as a scheduled Kubernetes CronJob. This configuration uses standard Kubernetes resources (ConfigMaps, PVC, CronJob) to automate recurring OLake sync
operations using a pre-generated streams file.
Prerequisites
Ensure the following requirements are met before proceeding:
- Kubernetes Cluster Access: Administrative access to a Kubernetes cluster.
- kubectl: A configured kubectl command-line tool. See the Installation Guide.
- Pre-generated OLake streams file (streams.json): This setup requires a streams.json generated beforehand by running the OLake discover command against your source database.
- Kubernetes Namespace: The manifests default to the olake namespace. Create it (kubectl create namespace olake) or update the namespace fields in all YAML files if you use a different namespace.
- Node Labels (Optional): If using nodeAffinity in cronjob_olake.yaml, ensure the target nodes carry the specified labels.
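The namespace prerequisite can be made safe to re-run. The following is a minimal sketch, assuming kubectl already points at the intended cluster; the NAMESPACE variable and the ensure_namespace helper are illustrative names, not part of OLake.

```shell
# Sketch: create the target namespace only if it does not exist yet.
# NAMESPACE and ensure_namespace are hypothetical names used for illustration.
NAMESPACE="${NAMESPACE:-olake}"

ensure_namespace() {
  ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}

# Usage: ensure_namespace "$NAMESPACE"
```

Checking first avoids the AlreadyExists error that kubectl create returns when the namespace is present.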
Configuration Steps
This setup requires several Kubernetes manifest files. Download these files first, then customize them for your specific environment.
1. Download Manifest Files
Use the links below (or curl) to download the necessary YAML files:
- Source ConfigMap: Download cm_olake-source-config.yaml
  curl -Lo cm_olake-source-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-source-config.yaml
- Destination ConfigMap: Download cm_olake-destination-config.yaml
  curl -Lo cm_olake-destination-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-destination-config.yaml
- Streams ConfigMap: Download cm_olake-streams-config.yaml
  curl -Lo cm_olake-streams-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-streams-config.yaml
- CronJob & PVC Manifest: Download cronjob_olake.yaml
  curl -Lo cronjob_olake.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cronjob_olake.yaml
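Since the four files share one base URL, the downloads can be scripted in a single loop. This is a sketch only: BASE_URL and the file names come from the commands in this step, while manifest_url and download_manifests are hypothetical helper names.

```shell
# Sketch: fetch all four manifests in one pass.
# manifest_url and download_manifests are illustrative helpers, not OLake tooling.
BASE_URL="https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes"
MANIFESTS="cm_olake-source-config.yaml cm_olake-destination-config.yaml cm_olake-streams-config.yaml cronjob_olake.yaml"

manifest_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

download_manifests() {
  for name in $MANIFESTS; do
    # -f makes curl return an error on HTTP failures instead of saving an error page
    curl -fLo "$name" "$(manifest_url "$name")" || return 1
  done
}

# Usage: download_manifests
```

Adding -f is a deliberate change from the plain -Lo form: without it, a 404 response would be written to disk as if it were a manifest.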
2. Customize Downloaded Files
Edit the downloaded files with your specific configurations:
- cm_olake-source-config.yaml: Update source.json with your source connection parameters.
- cm_olake-destination-config.yaml: Update destination.json with your destination configuration.
- cm_olake-streams-config.yaml: Replace the content under streams.json with your complete, pre-generated OLake streams JSON.
- cronjob_olake.yaml:
  - Set metadata.namespace / spec.jobTemplate.metadata.namespace / PVC metadata.namespace (default olake).
  - Configure spec.schedule. For schedule syntax, refer to CronTab Guru.
  - Set spec.suspend to false to enable the schedule.
  - Update spec.jobTemplate.spec.template.spec.containers[0].image with the correct OLake source image (e.g., olakego/source-mongodb:latest). See OLake Docker Hub Images.
  - Configure nodeAffinity (optional) under spec.jobTemplate.spec.template.spec.affinity.
  - Adjust resources.requests and resources.limits for the main OLake container.
  - Crucially, set the correct storageClassName in the PersistentVolumeClaim definition at the end of the file. This StorageClass must support the ReadWriteMany access mode specified in the PVC.
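A malformed spec.schedule is an easy mistake to make while editing. The sketch below is a rough local sanity check, assuming the standard five-field cron format; it only counts fields and does not validate their values, and both function names are hypothetical.

```shell
# Sketch: rough sanity check for a CronJob schedule string before applying it.
# Counts fields only; value ranges are still validated by Kubernetes itself.
schedule_field_count() {
  printf '%s\n' "$1" | awk '{ print NF }'
}

check_schedule() {
  case "$1" in
    @*) return 0 ;;  # macros such as @hourly are left for Kubernetes to validate
  esac
  [ "$(schedule_field_count "$1")" -eq 5 ]
}

# Usage: check_schedule "*/30 * * * *" && echo "schedule looks plausible"
```

Always quote the schedule when passing it, otherwise the shell expands each * into file names.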
Deployment Procedure
Apply the configured manifests to your Kubernetes cluster:
- Apply ConfigMaps:
  kubectl apply -f cm_olake-source-config.yaml -n olake
  kubectl apply -f cm_olake-destination-config.yaml -n olake
  kubectl apply -f cm_olake-streams-config.yaml -n olake
- Apply CronJob and PVC:
  kubectl apply -f cronjob_olake.yaml -n olake
- Verify PVC Status:
  kubectl get pvc olake-config-pvc -n olake
  (Ensure STATUS is Bound. If it is Pending, troubleshoot the StorageClass or provisioner.)
- Verify CronJob Status:
  kubectl get cronjob olake-mongodb-sync -n olake
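The PVC check can be automated by polling the claim's phase instead of reading the STATUS column by hand. This is a minimal sketch, assuming the PVC name and namespace used above; pvc_phase and wait_for_pvc_bound are illustrative helpers, and the 2-second interval is an arbitrary choice.

```shell
# Sketch: poll the PVC until it reports the Bound phase.
# pvc_phase and wait_for_pvc_bound are hypothetical helper names.
NAMESPACE="${NAMESPACE:-olake}"
PVC_NAME="${PVC_NAME:-olake-config-pvc}"

pvc_phase() {
  kubectl get pvc "$PVC_NAME" -n "$NAMESPACE" -o jsonpath='{.status.phase}'
}

wait_for_pvc_bound() {
  attempts="${1:-30}"   # number of polls before giving up
  while [ "$attempts" -gt 0 ]; do
    phase="$(pvc_phase 2>/dev/null)"
    if [ "$phase" = "Bound" ]; then
      echo "PVC $PVC_NAME is Bound"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  echo "PVC $PVC_NAME is not Bound (last phase: ${phase:-unknown})" >&2
  return 1
}

# Usage: wait_for_pvc_bound 30
```

If the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the claim stays Pending until the first Job pod is scheduled, so a timeout here does not always indicate a fault.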
Manual Job Execution and Control
- Trigger Manually: Create a Job instance immediately from the CronJob template. This is useful for testing or running outside the defined schedule.
  kubectl create job --from=cronjob/olake-mongodb-sync manual-olake-sync-$(date +%s) -n olake
  Monitor the manually created job using the commands in the Monitoring section below.
- Suspend CronJob: Prevent the CronJob from creating new jobs based on the schedule. This does not affect jobs that are already running.
  kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":true}}'
- Unsuspend CronJob: Re-enable the CronJob's schedule.
  kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":false}}'
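For test runs, the manual trigger can be combined with a wait and a log dump so the result is visible in one step. The sketch below assumes the CronJob name and namespace used above; run_sync_now is a hypothetical helper, and the 30m default timeout and 50-line log tail are arbitrary.

```shell
# Sketch: trigger a one-off sync, wait for it to finish, then show recent logs.
# run_sync_now is an illustrative helper, not part of OLake.
NAMESPACE="${NAMESPACE:-olake}"
CRONJOB_NAME="${CRONJOB_NAME:-olake-mongodb-sync}"

run_sync_now() {
  job_name="manual-olake-sync-$(date +%s)"
  kubectl create job --from="cronjob/$CRONJOB_NAME" "$job_name" -n "$NAMESPACE" || return 1
  # Block until the Job reports the Complete condition, or until the timeout expires
  status=0
  kubectl wait --for=condition=complete "job/$job_name" -n "$NAMESPACE" --timeout="${1:-30m}" || status=$?
  # Print the tail of the logs whether or not the wait succeeded
  kubectl logs "job/$job_name" -n "$NAMESPACE" --tail=50
  return "$status"
}

# Usage: run_sync_now 45m
```

A Job that fails never reaches the Complete condition, so in that case the wait runs until the timeout and the function returns a non-zero status after printing the logs.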
Cleanup
To remove the deployed resources:
- Delete CronJob:
  kubectl delete cronjob olake-mongodb-sync -n olake
- Delete PVC:
  kubectl delete pvc olake-config-pvc -n olake
  (Caution: This deletes persisted data.)
- Delete ConfigMaps:
  kubectl delete configmap olake-source-config -n olake
  kubectl delete configmap olake-destination-config -n olake
  kubectl delete configmap olake-streams-config -n olake
Support
- Email: hello@olake.io
- Join Slack Community
- Schedule a Call