Skip to main content

1. Create Jobs

You can create a job by:

  1. Clicking on the "Jobs" tab in the OLake UI.
  2. and then clicking on the "Create Job" button.

olake-job-1

2. Configure the Source

Now, you need to setup your source configuration:

Enter the source configuration details in the provided fields.

Here’s a clearer and more polished version:

You can refer to the source documentation for detailed setup instructions. Alternatively, documentation specific to your source type will appear on the right side of the screen.

olake-job-2

Click on the "Next" button to proceed to the destination configuration.

OLake will test the source connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.

olake-job-3

info

You can either create a new source configuration or select an existing source by toggling the radio button at top. If you choose to create a new source, you will need to provide the necessary details for the source configuration.

3. Configure the Destination

Now you need to setup your destination configuration.

Enter the destination configuration details in the provided fields. You can pick from:

Destination:

Catalog (for Apache Iceberg as destination):

olake-job-4

You can refer to the Destination documentation for detailed setup instructions. Alternatively, documentation specific to your source type will appear on the right side of the screen.

OLake will test the destination connection and display the results. If the connection is successful, you will see a success message. If there are any issues, OLake will provide error messages to help you troubleshoot.

info

You can either create a new destination configuration or select an existing destination by toggling the radio button at top. If you choose to create a new source, you will need to provide the necessary details for the source configuration.

4. Stream Selection

Select the stream you want to sync for the job. You can choose from the available streams you want to sync data from.

By default all streams are pre-selected, but you can uncheck the ones you don't want to include in the job.

olake-job-5

For each stream, you have the follow configurations options:

Under Config, you have:

  • Sync Mode: Choose between Full Refresh or Change Data Capture (CDC).
  • Enable backfill: If you want to backfill the data, you can enable this option. This will sync all historical data from the source to the destination. This option will be OFF by default, and you can enable it only if the sync mode is set to CDC.
  • Normalize: If you want to normalize the data, you can enable this option. This will convert the data into a normalized format (Level 1 flattening) before syncing it to the destination.

olake-job-7

Other stream filtering options include:

  • All tables: View all tables from the source.
  • Change Data Capture (CDC): View tables that have CDC enabled.
  • Full Refresh: View tables that are not part of CDC.
  • Selected Tables: Allows you to view specific tables that the user selected from the source.
  • Not Selected Tables: Shows you the tables that are not part of the selected tables.

olake-job-6

Under Schema, you can see all the column and their data types. You can also see the primary key and unique key columns for the stream that will later help you to define partition_regex for the destination table.

olake-job-8

Under Partitioning, you can define how the data should be partitioned. There are 2 types of data partitioning depending upon your destination type.

  1. For Data Partitioning in the Apache Iceberg table, follow the Iceberg Partitioning guide for detailed information.
  2. For Data Partitioning in AWS S3, follow the AWS S3 Partitioning guide for detailed information.

olake-job-9

5. Job Configuration

Under the Job Configuration section, you can set the following options:

  1. Job Name: Enter a name for your job.
  2. Replication Frequency: Choose how often you want the job to run. You can select from the available options or set a custom frequency.
  3. Replication Frequency Value: Select the value for the replication frequency. This will determine how often the job will run:
    • Minutes: Runs every minute.
    • Hourly: Runs every hour.
    • Days: Runs every day.
    • Weeks: Runs every week.
    • Months: Runs every week.
    • Years: Runs every week.

olake-job-10

Once the job configuration is complete, click on the "Create Job" button to create the job.

info

You can also set the job to run immediately after creation by clicking on the Sync Now button. This will start the job immediately and sync the data from the source to the destination.

olake-job-11


Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

  • Email Support: Reach out to our team at hello@olake.io for prompt assistance.
  • Join our Slack Community: where we discuss future roadmaps, discuss bugs, help folks to debug issues they are facing and more.
  • Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!