Last updated: 2/13/2025

Sample Datasets

To get sample JSON data for testing OLake, you can use the awesome-json-datasets GitHub repository, or just use this data.

Our MongoDB benchmarks are based on the Twitter dataset from Archive.org. This JSON dataset has 4 levels of complex nesting and 230 million rows (664.81 GB uncompressed).
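To compare your own sample JSON against the nesting described above, a small sketch like the following can report how deeply a document is nested. The function name `nesting_depth` and the sample record are illustrative only, and the counting convention (scalars count as 0, each enclosing object or array adds 1) is an assumption; the benchmark figure may have been counted differently.

```python
import json


def nesting_depth(value):
    """Return how many object/array levels enclose the deepest scalar.

    Assumed convention: a scalar has depth 0, and every enclosing
    dict or list adds one level. Empty containers count as depth 1.
    """
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


# Hypothetical tweet-like record, not taken from the benchmark dataset.
record = json.loads(
    '{"id": 1, "user": {"name": "a", "entities": {"urls": [{"url": "x"}]}}}'
)
depth = nesting_depth(record)
print(depth)
```

For a file with one JSON document per line, calling `nesting_depth(json.loads(line))` on each line and taking the maximum gives the depth of the whole sample.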

For SQL datasets, you can generate one with TPC-H: click here to download the TPC-H tool, then follow this guide to generate as much sample data as you need and load it into PostgreSQL or MySQL.
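One detail to watch when loading: the TPC-H `dbgen` tool writes pipe-delimited `.tbl` files in which every row ends with a trailing `|`, and PostgreSQL's `COPY` treats that as an extra column. The sketch below removes the trailing delimiter. `strip_trailing_pipe` is a hypothetical helper written for this example, not part of OLake or the TPC-H kit, and the sample rows are made up.

```python
import io


def strip_trailing_pipe(src, dst):
    """Copy rows from src to dst, dropping one trailing '|' per row.

    src and dst are text file objects. Blank lines are skipped.
    Returns the number of rows written.
    """
    count = 0
    for line in src:
        row = line.rstrip("\r\n")
        if not row:
            continue
        if row.endswith("|"):
            row = row[:-1]
        dst.write(row + "\n")
        count += 1
    return count


# In-memory demo with made-up rows shaped like a dbgen region.tbl file.
sample = io.StringIO("0|AFRICA|comment one|\n1|AMERICA|comment two|\n")
cleaned = io.StringIO()
rows = strip_trailing_pipe(sample, cleaned)
print(rows)
print(cleaned.getvalue(), end="")
```

With real files, open the `.tbl` file for reading and a new file for writing, then pass both to the helper. The cleaned file can then be loaded from `psql` with something like `\copy region FROM 'region.clean.tbl' WITH (DELIMITER '|')`, where the table and file names are examples and the table is assumed to exist already.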

Need Assistance?

If you have any questions or uncertainties about setting up OLake, contributing to the project, or troubleshooting any issues, we’re here to help. You can:

Email Support: Reach out to our team at hello@olake.io for prompt assistance.
Join our Slack Community: We discuss the roadmap and bugs, help people debug the issues they are facing, and more.
Schedule a Call: If you prefer a one-on-one conversation, schedule a call with our CTO and team.

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!
