Types of Optimizations Supported in OLake
OLake provides three types of optimizations that can be performed on a table, depending on the level of optimization required.
- Lite – Performs a lightweight optimization by converting equality delete files into positional delete files and merging small files. Files smaller than 1/8 of the target file size are merged into new files of up to about 1/8 of that target. For example, with a 128 MB target, files under 16 MB are merged into files of roughly 16 MB.
- Medium – Applies all deletes and merges data files. Output files are typically between 1/8 of the target file size and the target file size itself, depending on how much data is available to merge. For example, with a 128 MB target file size, merged files might be anywhere between 16 MB and 128 MB.
- Full – Performs the deepest level of optimization by rewriting data files so that they align with the configured target file size. This results in a complete copy-on-write (COW) rewrite of the table’s data files, producing the most efficient file layout. Full optimization is typically used when tables have accumulated significant fragmentation or when maximum query performance is required.
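The size arithmetic behind Lite and Medium can be sketched in a few lines of Python. This is only an illustration of the 1/8 rule described above, not OLake's implementation; the names `SizeBounds`, `size_bounds`, and `lite_merge_candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

MB = 1024 * 1024


@dataclass(frozen=True)
class SizeBounds:
    """Byte sizes derived from a target file size (hypothetical helper type)."""
    small_file_threshold: int  # files below this size are Lite merge candidates
    medium_min_output: int     # typical lower bound of Medium output files
    medium_max_output: int     # typical upper bound of Medium output files


def size_bounds(target_file_size: int) -> SizeBounds:
    """Apply the 1/8 rule from the text to a target file size in bytes."""
    eighth = target_file_size // 8
    return SizeBounds(
        small_file_threshold=eighth,
        medium_min_output=eighth,
        medium_max_output=target_file_size,
    )


def lite_merge_candidates(file_sizes: Iterable[int], target_file_size: int) -> List[int]:
    """Return the sizes of files that fall under the Lite small-file threshold."""
    threshold = size_bounds(target_file_size).small_file_threshold
    return [size for size in file_sizes if size < threshold]


bounds = size_bounds(128 * MB)
print(bounds.small_file_threshold // MB)  # threshold in MB for a 128 MB target
print(lite_merge_candidates([4 * MB, 16 * MB, 20 * MB, 15 * MB], 128 * MB))
```

With a 128 MB target the threshold works out to 16 MB, so in this sketch only the 4 MB and 15 MB files count as small; a file of exactly 16 MB is not "under 16 MB" and is left alone.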
When more than one optimization type is due to run on a table at the same time, only the highest level runs: Full overrides Medium and Lite, and Medium overrides Lite. For example, if Full, Medium, and Lite are all due together, Full runs alone; if Medium and Lite are due together, Medium runs alone.
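The precedence rule can be modeled as picking the maximum of an ordered enum. The following is a minimal sketch under that assumption; `OptimizationType` and `select_optimization` are hypothetical names used for illustration and are not part of OLake's API.

```python
from enum import IntEnum
from typing import Iterable, Optional


class OptimizationType(IntEnum):
    """Optimization levels ordered so that a higher value overrides a lower one."""
    LITE = 1
    MEDIUM = 2
    FULL = 3


def select_optimization(due: Iterable[OptimizationType]) -> Optional[OptimizationType]:
    """Return the single optimization that runs when several are due together.

    Returns None when nothing is due.
    """
    due_list = list(due)
    return max(due_list) if due_list else None


# Full, Medium, and Lite due together: Full runs alone.
print(select_optimization(
    [OptimizationType.LITE, OptimizationType.MEDIUM, OptimizationType.FULL]
).name)

# Medium and Lite due together: Medium runs alone.
print(select_optimization([OptimizationType.LITE, OptimizationType.MEDIUM]).name)
```

Using an `IntEnum` keeps the ordering explicit in one place, so the selection logic reduces to a single `max` call.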
Choosing the Right Optimization Type
| Optimization Type | Output | What it Does | Cost Incurred | When to Use |
|---|---|---|---|---|
| Lite | Equality delete files are converted to positional delete files and small files are merged | Improves query engine compatibility and reduces the small-file count without rewriting larger data files | Low | Use when the table has too many small files and you want lightweight maintenance with low compute. |
| Medium | Deletes are applied and data files are merged; output sizes fall between 1/8 of target file size and the target file size itself | Reduces fragmentation by merging data files into larger files up to the target size | Medium | Use when you need more than Lite: deletes fully applied and files merged toward the target size without a full table rewrite. |
| Full | Data files are completely rewritten into files aligned with the target file size | Performs a full copy-on-write rewrite of the table to produce the most efficient file layout | High | Use when tables are heavily fragmented or when maximum query performance is required. |