| File Format | Source Data Type | Destination Data Type | Notes |
|---|---|---|---|
| CSV | Inferred from data | string | All CSV fields initially treated as strings |
| CSV | Numeric patterns | int, bigint, double | Integer and floating-point numbers auto-detected |
| CSV | ISO 8601 dates | timestamptz | Date/datetime strings converted to timestamp |
| CSV | Boolean values | boolean | true/false strings converted to boolean |
| JSON | string | string | JSON string fields |
| JSON | number (integer) | bigint | JSON integer values |
| JSON | number (float) | double | JSON floating-point values |
| JSON | boolean | boolean | JSON boolean values |
| JSON | object, array | string | Nested objects/arrays serialized to JSON strings |
| JSON | null | string | Null values converted to empty strings |
| Parquet | STRING, BINARY | string | Parquet string types |
| Parquet | INT32, INT64 | int, bigint | Parquet integer types |
| Parquet | FLOAT, DOUBLE | float, double | Parquet floating-point types |
| Parquet | BOOLEAN | boolean | Parquet boolean type |
| Parquet | TIMESTAMP_MILLIS | timestamptz | Parquet timestamp types |
| Parquet | DATE | date | Parquet date type |
| Parquet | DECIMAL | float | Parquet decimal types converted to float64 |
| All Formats | _last_modified_time | timestamptz | S3 LastModified metadata (added by connector) |
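
The JSON rows of the mapping can be illustrated with a minimal Python sketch. This is not OLake's implementation; the function name `map_json_value` is hypothetical and the code only mirrors the rules listed in the table (integers to bigint, floats to double, nested values serialized to JSON strings, null to an empty string).

```python
import json

def map_json_value(value):
    """Return (destination_type, converted_value) per the JSON rows of the table.

    Hypothetical helper for illustration only.
    """
    if value is None:
        return "string", ""                 # null -> empty string
    if isinstance(value, bool):             # check before int: bool subclasses int
        return "boolean", value
    if isinstance(value, int):
        return "bigint", value
    if isinstance(value, float):
        return "double", value
    if isinstance(value, (dict, list)):
        return "string", json.dumps(value)  # nested values serialized to JSON text
    return "string", str(value)

record = json.loads(
    '{"id": 7, "score": 1.5, "ok": true, "tags": ["a", "b"], "note": null, "name": "x"}'
)
mapped = {key: map_json_value(val) for key, val in record.items()}
```

Here `mapped["tags"]` holds a string containing JSON text rather than a list, which is why nested fields need to be parsed again downstream if their structure matters.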
Schema Inference
  • CSV: Uses AND logic. All sampled rows are examined, and a column gets the most restrictive type that every sampled value satisfies
  • JSON: Auto-detects types from JSON primitives
  • Parquet: Schema read directly from file metadata (no inference needed)
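
The AND logic used for CSV can be sketched roughly as follows. This is an illustrative approximation, not OLake's code: the candidate order, the regular expressions, the choice to skip empty cells, and the use of bigint for every integer column are all assumptions made for the example.

```python
import re
from datetime import datetime

INT_RE = re.compile(r"^[+-]?\d+$")
FLOAT_RE = re.compile(r"^[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?$")

def is_int(s):
    return bool(INT_RE.match(s))

def is_float(s):
    return bool(FLOAT_RE.match(s))

def is_bool(s):
    return s.lower() in ("true", "false")

def is_timestamp(s):
    # Require a YYYY-MM-DD prefix, then let the standard library validate.
    if not re.match(r"^\d{4}-\d{2}-\d{2}", s):
        return False
    try:
        datetime.fromisoformat(s.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False

# Most restrictive candidates first (assumed order); string is the fallback.
CANDIDATES = [
    ("bigint", is_int),
    ("double", is_float),
    ("boolean", is_bool),
    ("timestamptz", is_timestamp),
]

def infer_column_type(samples):
    """Pick a type only if EVERY sampled value matches it (AND logic)."""
    values = [s for s in samples if s != ""]  # assumption: empty cells are ignored
    if not values:
        return "string"
    for name, check in CANDIDATES:
        if all(check(v) for v in values):
            return name
    return "string"
```

For example, `infer_column_type(["1", "2.5"])` yields `double` because not every value is an integer, and a single non-numeric value such as `"abc"` drops the whole column back to `string`.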
timestamptz timezone

OLake always ingests timestamp data in UTC, regardless of the source timezone.
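
To make the effect of UTC normalization concrete, here is a small Python sketch. The helper `to_utc` is hypothetical, and treating timezone-naive values as already being in UTC is an assumption of this example, not documented OLake behavior.

```python
from datetime import datetime, timedelta, timezone

def to_utc(ts):
    """Normalize a datetime to UTC (illustrative helper)."""
    if ts.tzinfo is None:
        # Assumption: a naive timestamp is interpreted as UTC.
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)

# A source value recorded at 16:00 in a +05:30 timezone.
source_tz = timezone(timedelta(hours=5, minutes=30))
local_value = datetime(2024, 1, 15, 16, 0, tzinfo=source_tz)

utc_value = to_utc(local_value)
print(utc_value.isoformat())  # 2024-01-15T10:30:00+00:00
```

The instant in time is unchanged; only its representation moves to a zero offset, so queries on the destination should compare against UTC values.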


