Learning Modules

Last updated: 2/16/2026

If you are new to the technologies and tools OLake relies on, start here before setting up a development environment or contributing. This page hosts learning modules covering the essential concepts and technologies you need to get started with OLake development.

Required Learning Modules

These modules are mandatory and should be completed before you start contributing to OLake. They are listed in the recommended order of priority.

1. Golang (Go)

OLake is primarily written in Golang, so having a solid grasp of Go makes it much easier to understand the codebase, contribute features, and debug issues.

If you prefer videos: Watch this YouTube playlist for a step‑by‑step Go learning path (from basics to advanced topics):
- Golang Video Playlist
If you prefer reading and already know basic programming concepts: Follow this structured Go tutorial series:
- Go Language Tutorial – GeeksforGeeks

2. Docker

OLake relies heavily on Docker, both for local development and for its Docker CLI workflow. Understanding Docker helps you:

Run OLake containers correctly
Understand volumes, networks, and images used in the examples
Debug environment‑related issues

Recommended video:

Docker Tutorial for Beginners

3. ETL and ELT

OLake is fundamentally a data migration / ETL tool, so understanding ETL/ELT concepts is critical:

What it means to extract, transform, and load data
How data flows from operational systems into warehouses or lakehouses
Why transformation might happen before loading (ETL) or after loading (ELT)
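
The difference in ordering can be sketched in a few lines of Go. This is a minimal illustration, not OLake code: `Record`, `Store`, `extract`, `transform`, `runETL`, and `runELT` are all hypothetical names, and the in-memory `Store` only stands in for a real warehouse or lakehouse.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical row pulled from a source system.
type Record struct {
	ID    int
	Email string
}

// extract stands in for reading rows from an operational database.
func extract() []Record {
	return []Record{
		{ID: 1, Email: "  Alice@Example.com "},
		{ID: 2, Email: "BOB@example.com"},
	}
}

// transform normalizes each email: trims whitespace and lowercases it.
func transform(rows []Record) []Record {
	out := make([]Record, len(rows))
	for i, r := range rows {
		out[i] = Record{ID: r.ID, Email: strings.ToLower(strings.TrimSpace(r.Email))}
	}
	return out
}

// Store is an in-memory stand-in for a warehouse or lakehouse table.
type Store struct {
	Rows []Record
}

// load appends rows to the destination.
func (s *Store) load(rows []Record) {
	s.Rows = append(s.Rows, rows...)
}

// runETL transforms in the pipeline, then loads the cleaned rows.
func runETL(dst *Store) {
	dst.load(transform(extract()))
}

// runELT loads the raw rows first, then transforms inside the destination.
func runELT(dst *Store) {
	dst.load(extract())
	dst.Rows = transform(dst.Rows)
}

func main() {
	etl, elt := &Store{}, &Store{}
	runETL(etl)
	runELT(elt)
	fmt.Println(etl.Rows[0].Email, elt.Rows[0].Email)
}
```

Both paths end with the same cleaned rows. What changes is where the transformation runs and whether the raw data ever lands in the destination.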

4. Apache Iceberg

OLake writes data into Apache Iceberg tables; Iceberg is the core table format behind many OLake destinations. Understanding Iceberg helps you reason about:

How OLake writes snapshots, data files, and metadata
How schema evolution and partitioning work
Why Iceberg is chosen over traditional table formats

Recommended video:

Introduction to Apache Iceberg

Key concepts to grasp:

Snapshots and time‑travel
Partitioning
Manifest files and metadata
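
To make snapshots and time travel concrete, here is a toy model in Go. It is a deliberately simplified sketch and not the Iceberg API or its metadata layout: `Snapshot`, `Table`, `Append`, and `AsOf` are hypothetical names, real Iceberg tracks data files through manifest lists and manifest files rather than a plain slice, and the sketch assumes snapshots are committed in increasing timestamp order.

```go
package main

import "fmt"

// Snapshot is a simplified stand-in for an Iceberg snapshot: an immutable
// view of the table listing the data files visible at that point.
type Snapshot struct {
	ID          int
	TimestampMs int64
	DataFiles   []string
}

// Table keeps every snapshot ever committed, oldest first.
type Table struct {
	Snapshots []Snapshot
}

// Append commits new data files as a new snapshot. Earlier snapshots are
// never modified, which is what makes time travel possible.
func (t *Table) Append(tsMs int64, files ...string) Snapshot {
	var prev []string
	if n := len(t.Snapshots); n > 0 {
		prev = t.Snapshots[n-1].DataFiles
	}
	all := make([]string, 0, len(prev)+len(files))
	all = append(all, prev...)
	all = append(all, files...)
	snap := Snapshot{ID: len(t.Snapshots) + 1, TimestampMs: tsMs, DataFiles: all}
	t.Snapshots = append(t.Snapshots, snap)
	return snap
}

// AsOf returns the latest snapshot committed at or before tsMs.
func (t *Table) AsOf(tsMs int64) (Snapshot, bool) {
	var found Snapshot
	ok := false
	for _, s := range t.Snapshots {
		if s.TimestampMs <= tsMs {
			found, ok = s, true
		}
	}
	return found, ok
}

func main() {
	tbl := &Table{}
	tbl.Append(1000, "data-001.parquet")
	tbl.Append(2000, "data-002.parquet")

	old, _ := tbl.AsOf(1500)
	fmt.Println(old.ID, len(old.DataFiles)) // prints: 1 1
	cur, _ := tbl.AsOf(2500)
	fmt.Println(cur.ID, len(cur.DataFiles)) // prints: 2 2
}
```

Querying "as of" an earlier timestamp returns the older snapshot with only the files that existed then, which is the essence of time travel.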

5. OLake Docs

Familiarizing yourself with OLake's official documentation is essential for understanding how to set up, configure, and work with OLake effectively. The documentation provides comprehensive guides on:

Setting up a development environment
Understanding OLake's architecture and data flow
Configuring sources, destinations, and sync modes
Using OLake CLI commands and flags
Debugging and troubleshooting

Recommended starting point:

Setting up a Development Environment – OLake Docs

6. Change Data Capture (CDC)

OLake supports CDC (Change Data Capture) and Incremental syncs. Knowing CDC concepts helps you understand:

How OLake tracks inserts, updates, and deletes over time
Why resume tokens, log positions, and state files (state.json) are important
How incremental syncs differ from full refreshes
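
The role of a saved position can be illustrated with a small Go sketch. Everything here is hypothetical: `Change`, `State`, and `Apply` are illustrative names, `State` is not OLake's actual state.json schema, and `Position` stands in for whatever log position or resume token the source database provides (assumed here to increase monotonically).

```go
package main

import "fmt"

// Op is the kind of change captured from the source's log.
type Op string

const (
	Insert Op = "insert"
	Update Op = "update"
	Delete Op = "delete"
)

// Change is a hypothetical CDC event.
type Change struct {
	Position int64
	Op       Op
	Key      int
	Value    string
}

// State is an illustrative checkpoint persisted between sync runs.
type State struct {
	LastPosition int64
}

// Apply replays changes onto dst, skipping anything at or before the
// checkpoint, and advances the checkpoint as it goes. It returns the
// number of changes actually applied.
func Apply(dst map[int]string, st *State, changes []Change) int {
	applied := 0
	for _, c := range changes {
		if c.Position <= st.LastPosition {
			continue // already processed in an earlier run
		}
		switch c.Op {
		case Insert, Update:
			dst[c.Key] = c.Value
		case Delete:
			delete(dst, c.Key)
		}
		st.LastPosition = c.Position
		applied++
	}
	return applied
}

func main() {
	dst := map[int]string{}
	st := &State{}
	events := []Change{
		{Position: 1, Op: Insert, Key: 10, Value: "alice"},
		{Position: 2, Op: Insert, Key: 11, Value: "bob"},
		{Position: 3, Op: Update, Key: 10, Value: "alice-v2"},
		{Position: 4, Op: Delete, Key: 11},
	}
	fmt.Println(Apply(dst, st, events[:2])) // first run sees two events
	fmt.Println(Apply(dst, st, events))     // second run resumes after position 2
	fmt.Println(dst, st.LastPosition)
}
```

The second call receives the full log but applies only the two events after the checkpoint. A full refresh, by contrast, would ignore the state and re-read everything from the source.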

Additional Resources (Optional)

The following tutorials and resources are not mandatory but are recommended for a deeper understanding of related technologies and concepts that can enhance your OLake development experience.

1. Data Lakehouse Architecture

Understanding the data lakehouse concept helps you appreciate how OLake fits into modern data architectures. A lakehouse combines the best of data lakes and data warehouses, enabling both structured and unstructured data processing.

Recommended video:

Introduction to Data Lakehouse

2. Apache Parquet File Format

OLake supports Parquet as a destination format. Knowing Parquet helps you understand:

How columnar storage works and why it's efficient for analytics
How OLake writes data in Parquet format
The relationship between Parquet and Iceberg (Iceberg can use Parquet files)
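
The idea behind columnar storage can be shown with a short Go sketch. This is only an in-memory illustration under simplifying assumptions, not Parquet's on-disk format (it has no row groups, encodings, or compression), and `Row`, `Columns`, `ToColumns`, and `SumAmount` are hypothetical names.

```go
package main

import "fmt"

// Row is one record in a row-oriented layout.
type Row struct {
	ID     int64
	Name   string
	Amount float64
}

// Columns holds the same data column by column. Index i across the three
// slices corresponds to row i.
type Columns struct {
	ID     []int64
	Name   []string
	Amount []float64
}

// ToColumns pivots row-oriented data into a columnar layout.
func ToColumns(rows []Row) Columns {
	c := Columns{
		ID:     make([]int64, 0, len(rows)),
		Name:   make([]string, 0, len(rows)),
		Amount: make([]float64, 0, len(rows)),
	}
	for _, r := range rows {
		c.ID = append(c.ID, r.ID)
		c.Name = append(c.Name, r.Name)
		c.Amount = append(c.Amount, r.Amount)
	}
	return c
}

// SumAmount reads only the Amount column; the ID and Name columns are
// never touched, which is why column scans suit analytical queries.
func SumAmount(c Columns) float64 {
	total := 0.0
	for _, a := range c.Amount {
		total += a
	}
	return total
}

func main() {
	rows := []Row{
		{ID: 1, Name: "alice", Amount: 10},
		{ID: 2, Name: "bob", Amount: 20.5},
		{ID: 3, Name: "carol", Amount: 5},
	}
	cols := ToColumns(rows)
	fmt.Println(SumAmount(cols)) // prints: 35.5
}
```

An aggregate over one field scans a single contiguous slice instead of walking every full record. Values of the same type stored together also tend to compress well.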

Recommended video:

Understanding Apache Parquet

3. PostgreSQL Fundamentals

While not required if you're only working with other sources, understanding PostgreSQL is valuable since it's one of the most commonly used sources with OLake. This knowledge helps you:

Understand source database concepts and structures
Better configure PostgreSQL connections and CDC settings
Debug source-related issues

💡 Join the OLake Community!

GitHub

Slack

Twitter

LinkedIn

YouTube
