Skip to main content

Setup OLake Sync as a Kubernetes CronJob

This guide details the process for deploying and managing an OLake data synchronization task as a scheduled Kubernetes CronJob. This configuration uses standard Kubernetes resources (ConfigMaps, PVC, CronJob) to automate recurring OLake sync operations using a pre-generated streams file.

Prerequisites

Ensure the following requirements are met before proceeding:

  1. Kubernetes Cluster Access: Administrative access to a Kubernetes cluster.
  2. kubectl: Configured kubectl command-line tool. Installation Guide.
  3. Pre-generated OLake Stream file (streams.json): This setup requires a streams.json generated beforehand using the OLake discover command against your source database.
    • Stream Generation Guides:
    • The content of this file will be placed within the cm_olake-streams-config.yaml ConfigMap.
  4. Kubernetes Namespace: The manifests default to the olake namespace. Create it (kubectl create namespace olake) or update the namespace fields in all YAML files if using a different target.
  5. Node Labels (Optional): If using nodeAffinity in cronjob_olake.yaml, ensure target nodes possess the specified labels.

Configuration Steps

This setup requires several Kubernetes manifest files. Download these files first, then customize them for your specific environment.

1. Download Manifest Files

Use the links below (or curl) to download the necessary YAML files:

  • Source ConfigMap:

    • Download cm_olake-source-config.yaml
      curl -Lo cm_olake-source-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-source-config.yaml
  • Destination ConfigMap:

  • Streams ConfigMap:

    • Download cm_olake-streams-config.yaml
      curl -Lo cm_olake-streams-config.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cm_olake-streams-config.yaml
  • CronJob & PVC Manifest:

    • Download cronjob_olake.yaml
      curl -Lo cronjob_olake.yaml https://raw.githubusercontent.com/datazip-inc/olake-docs/refs/heads/master/kubernetes/cronjob_olake.yaml

2. Customize Downloaded Files

Edit the downloaded files with your specific configurations:

  • cm_olake-source-config.yaml: Update source.json with your source connection parameters.
  • cm_olake-destination-config.yaml: Update destination.json with your destination configuration.
  • cm_olake-streams-config.yaml: Replace the content under streams.json with your complete, pre-generated OLake streams JSON.
  • cronjob_olake.yaml:
    • Set metadata.namespace / spec.jobTemplate.metadata.namespace / PVC metadata.namespace (default olake).
    • Configure spec.schedule. Refer for schedule syntax: CronTab Guru
    • Set spec.suspend to false to enable the schedule.
    • Update spec.jobTemplate.spec.template.spec.containers[0].image with the correct OLake source image (e.g., olakego/source-mongodb:latest). See OLake Docker Hub Images.
    • Configure nodeAffinity (optional) under spec.jobTemplate.spec.template.spec.affinity.
    • Adjust resources.requests and resources.limits for the main OLake container.
    • Crucially, set the correct storageClassName in the PersistentVolumeClaim definition at the end of the file. This StorageClass must support the ReadWriteMany access mode specified in the PVC.

Deployment Procedure

Apply the configured manifests to your Kubernetes cluster:

  1. Apply ConfigMaps:

    kubectl apply -f cm_olake-source-config.yaml -n olake
    kubectl apply -f cm_olake-destination-config.yaml -n olake
    kubectl apply -f cm_olake-streams-config.yaml -n olake
  2. Apply CronJob and PVC:

    kubectl apply -f cronjob_olake.yaml -n olake
  3. Verify PVC Status:

    kubectl get pvc olake-config-pvc -n olake

    (Ensure STATUS is Bound. Troubleshoot StorageClass or provisioner if Pending).

  4. Verify CronJob Status:

    kubectl get cronjob olake-mongodb-sync -n olake

Manual Job Execution and Control

  • Trigger Manually: Create a Job instance immediately from the CronJob template. This is useful for testing or running outside the defined schedule.

    kubectl create job --from=cronjob/olake-mongodb-sync manual-olake-sync-$(date +%s) -n olake

    Monitor the manually created job using the commands in the Monitoring section below.

  • Suspend CronJob: Prevent the CronJob from creating new jobs based on the schedule. Does not affect running jobs.

    kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":true}}'
  • Unsuspend CronJob: Re-enable the CronJob's schedule.

    kubectl patch cronjob olake-mongodb-sync -n olake -p '{"spec":{"suspend":false}}'

Cleanup

To remove the deployed resources:

  1. Delete CronJob: kubectl delete cronjob olake-mongodb-sync -n olake
  2. Delete PVC: kubectl delete pvc olake-config-pvc -n olake (Caution: Deletes persisted data).
  3. Delete ConfigMaps:
    kubectl delete configmap olake-source-config -n olake
    kubectl delete configmap olake-destination-config -n olake
    kubectl delete configmap olake-streams-config -n olake

Support


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss future roadmaps, discuss bugs, help folks to debug issues they are facing and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!