Learning Modules

If you are new to the technologies and tools OLake relies on, start here before setting up a development environment or contributing. This page hosts learning modules covering the essential concepts and technologies you need to get started with OLake development.


Required Learning Modules

These modules are mandatory and should be completed before you start contributing to OLake. They are listed in the recommended order of priority.

1. Golang (Go)

OLake is primarily written in Golang, so having a solid grasp of Go makes it much easier to understand the codebase, contribute features, and debug issues.

  • If you prefer videos: Watch this YouTube playlist for a step‑by‑step Go learning path (from basics to advanced topics):

  • If you prefer reading and already know basic programming concepts: Follow this structured Go tutorial series:


2. Docker

OLake relies heavily on Docker, both for local development and for its Docker CLI workflow. Understanding Docker helps you:

  • Run OLake containers correctly
  • Understand volumes, networks, and images used in the examples
  • Debug environment‑related issues

Recommended video:


3. ETL and ELT

OLake is fundamentally a data migration / ETL tool, so understanding ETL/ELT concepts is critical:

  • What it means to extract, transform, and load data
  • How data flows from operational systems into warehouses or lakehouses
  • Why transformation might happen before (ETL) or after loading (ELT)
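The difference between ETL and ELT comes down to where the transform step runs relative to the load. The following is a minimal sketch, not OLake code: the `Record` type, the in-memory "destination" slice, and the helper names `extract`, `transform`, `load`, `runETL`, and `runELT` are all hypothetical and exist only to show the ordering.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical row pulled from an operational system.
type Record struct {
	ID    int
	Email string
}

// extract stands in for reading rows from a source database.
func extract() []Record {
	return []Record{
		{ID: 1, Email: " Alice@Example.com "},
		{ID: 2, Email: "BOB@example.com"},
	}
}

// transform normalizes emails and returns a new slice, leaving the input untouched.
func transform(in []Record) []Record {
	out := make([]Record, len(in))
	for i, r := range in {
		r.Email = strings.ToLower(strings.TrimSpace(r.Email))
		out[i] = r
	}
	return out
}

// load appends rows to an in-memory slice that stands in for a destination table.
func load(dest []Record, rows []Record) []Record {
	return append(dest, rows...)
}

// runETL transforms before loading, so the destination only ever holds cleaned rows.
func runETL() []Record {
	return load(nil, transform(extract()))
}

// runELT loads the raw rows first and transforms them afterwards,
// so the destination keeps the raw copy alongside the cleaned result.
func runELT() (raw, cleaned []Record) {
	raw = load(nil, extract())
	cleaned = transform(raw)
	return raw, cleaned
}

func main() {
	fmt.Println("ETL:", runETL())
	raw, cleaned := runELT()
	fmt.Println("ELT raw:", raw)
	fmt.Println("ELT cleaned:", cleaned)
}
```

In the ELT variant the raw rows remain available in the destination, which is one reason the pattern is common when loading into warehouses or lakehouses.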

Recommended reading:


4. Apache Iceberg

OLake writes data into Apache Iceberg tables; Iceberg is the core table format behind many OLake destinations. Understanding Iceberg helps you reason about:

  • How OLake writes snapshots, data files, and metadata
  • How schema evolution and partitioning work
  • Why Iceberg is chosen over traditional table formats

Recommended video:

Key concepts to grasp:

  • Snapshots and time‑travel
  • Partitioning
  • Manifest files and metadata

5. OLake Docs

Familiarizing yourself with OLake's official documentation is essential for understanding how to set up, configure, and work with OLake effectively. The documentation provides comprehensive guides on:

  • Setting up a development environment
  • Understanding OLake's architecture and data flow
  • Configuring sources, destinations, and sync modes
  • Using OLake CLI commands and flags
  • Debugging and troubleshooting

Recommended starting point:


6. Change Data Capture (CDC)

OLake supports Change Data Capture (CDC) and incremental syncs. Knowing CDC concepts helps you understand:

  • How OLake tracks inserts, updates, and deletes over time
  • Why resume tokens, log positions, and state files (state.json) are important
  • How incremental syncs differ from full refreshes

Recommended reading:

Focus on:

  • The high‑level CDC flow (source logs → captured changes → target)
  • Common CDC implementation patterns (log‑based CDC, triggers, etc.)
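The high-level flow above (source log, captured changes, target) and the role of saved state can be sketched as follows. This is an illustration under assumptions, not OLake's implementation: the `Change`, `State`, and `Apply` names are hypothetical, the log position is modeled as a plain increasing integer, and the `State` struct does not reflect the actual layout of OLake's state.json.

```go
package main

import "fmt"

// Op is the kind of change recorded in the source log.
type Op string

const (
	Insert Op = "insert"
	Update Op = "update"
	Delete Op = "delete"
)

// Change is a hypothetical captured change event.
type Change struct {
	Position int64 // increasing log position (assumption for this sketch)
	Op       Op
	Key      string
	Value    string
}

// State remembers the last applied position so a later sync can resume
// instead of starting over.
type State struct {
	LastPosition int64
}

// Apply replays changes newer than the saved position onto target,
// advances the state, and returns how many changes were applied.
func Apply(target map[string]string, st *State, changes []Change) int {
	applied := 0
	for _, c := range changes {
		if c.Position <= st.LastPosition {
			continue // already applied by an earlier run
		}
		switch c.Op {
		case Insert, Update:
			target[c.Key] = c.Value
		case Delete:
			delete(target, c.Key)
		}
		st.LastPosition = c.Position
		applied++
	}
	return applied
}

func main() {
	target := map[string]string{}
	st := &State{}
	changes := []Change{
		{Position: 1, Op: Insert, Key: "u1", Value: "alice"},
		{Position: 2, Op: Update, Key: "u1", Value: "alicia"},
		{Position: 3, Op: Delete, Key: "u1"},
	}
	fmt.Println(Apply(target, st, changes), "changes applied")
	fmt.Println(Apply(target, st, changes), "changes applied on rerun")
}
```

A full refresh would instead discard the target and copy every source row again; the saved position is what lets an incremental run skip work it has already done.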

Additional Resources (Optional)

The following tutorials and resources are not mandatory but are recommended for a deeper understanding of related technologies and concepts that can enhance your OLake development experience.

1. Data Lakehouse Architecture

Understanding the data lakehouse concept helps you appreciate how OLake fits into modern data architectures. A lakehouse combines the best of data lakes and data warehouses, enabling both structured and unstructured data processing.

Recommended video:


2. Apache Parquet File Format

OLake supports Parquet as a destination format. Learning Parquet helps you understand:

  • How columnar storage works and why it's efficient for analytics
  • How OLake writes data in Parquet format
  • The relationship between Parquet and Iceberg (Iceberg can use Parquet files)
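The idea behind columnar storage can be shown with plain Go slices. This sketch only contrasts row-oriented and column-oriented layouts in memory; it does not use a Parquet library and ignores what the real format adds on top (row groups, pages, encodings, compression). The `Row`, `Columns`, `ToColumns`, and `SumAmount` names are hypothetical.

```go
package main

import "fmt"

// Row is a row-oriented record: all fields of one entity are stored together.
type Row struct {
	ID     int64
	Name   string
	Amount float64
}

// Columns stores the same data column by column: all values of one field
// are stored together.
type Columns struct {
	ID     []int64
	Name   []string
	Amount []float64
}

// ToColumns pivots row-oriented data into a columnar layout.
func ToColumns(rows []Row) Columns {
	c := Columns{
		ID:     make([]int64, 0, len(rows)),
		Name:   make([]string, 0, len(rows)),
		Amount: make([]float64, 0, len(rows)),
	}
	for _, r := range rows {
		c.ID = append(c.ID, r.ID)
		c.Name = append(c.Name, r.Name)
		c.Amount = append(c.Amount, r.Amount)
	}
	return c
}

// SumAmount scans only the Amount column; the ID and Name columns are never read.
func SumAmount(c Columns) float64 {
	total := 0.0
	for _, a := range c.Amount {
		total += a
	}
	return total
}

func main() {
	rows := []Row{
		{ID: 1, Name: "alice", Amount: 10.5},
		{ID: 2, Name: "bob", Amount: 20},
		{ID: 3, Name: "carol", Amount: 4.5},
	}
	fmt.Println("total amount:", SumAmount(ToColumns(rows)))
}
```

An analytical query that aggregates one field touches a single contiguous column here, whereas a row-oriented layout would force it to step over every field of every row.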

Recommended video:


3. PostgreSQL Fundamentals

While not required if you're only working with other sources, understanding PostgreSQL is valuable since it's one of the most commonly used sources with OLake. This knowledge helps you:

  • Understand source database concepts and structures
  • Better configure PostgreSQL connections and CDC settings
  • Debug source-related issues

Recommended reading:



💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!