| File Format | Source Data Type | Destination Data Type | Notes |
|---|---|---|---|
| CSV | Inferred from data | string | All CSV fields initially treated as strings |
| CSV | Numeric patterns | int, bigint, double | Integer and floating-point numbers auto-detected |
| CSV | ISO 8601 dates | timestamptz | Date/datetime strings converted to timestamp |
| CSV | Boolean values | boolean | true/false strings converted to boolean |
| JSON | string | string | JSON string fields |
| JSON | number (integer) | bigint | JSON integer values |
| JSON | number (float) | double | JSON floating-point values |
| JSON | boolean | boolean | JSON boolean values |
| JSON | object, array | string | Nested objects/arrays serialized to JSON strings |
| JSON | null | string | Null values converted to empty strings |
| Parquet | STRING, BINARY | string | Parquet string types |
| Parquet | INT32, INT64 | int, bigint | Parquet integer types |
| Parquet | FLOAT, DOUBLE | float, double | Parquet floating-point types |
| Parquet | BOOLEAN | boolean | Parquet boolean type |
| Parquet | TIMESTAMP_MILLIS | timestamptz | Parquet timestamp types |
| Parquet | DATE | date | Parquet date type |
| Parquet | DECIMAL | float | Parquet decimal types converted to float64 |
| All Formats | _last_modified_time | timestamptz | S3 LastModified metadata (added by connector) |
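As an illustration of the JSON rows in the table above, the mapping can be sketched in Python. This is a minimal sketch, not OLake's implementation: the function names `json_destination_type` and `json_destination_value` are hypothetical, and the code only mirrors the rules listed in the table.

```python
import json

def json_destination_type(value):
    """Return the destination type name for a parsed JSON value (hypothetical helper)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    # Strings, nulls, objects, and arrays all land in a string column per the table.
    return "string"

def json_destination_value(value):
    """Return the value as it would be stored under the table's rules (hypothetical helper)."""
    if value is None:
        return ""  # null values become empty strings
    if isinstance(value, (dict, list)):
        return json.dumps(value)  # nested objects/arrays are serialized to JSON strings
    return value

record = json.loads('{"id": 7, "score": 1.5, "ok": true, "tags": ["a"], "note": null, "name": "x"}')
types = {key: json_destination_type(val) for key, val in record.items()}
values = {key: json_destination_value(val) for key, val in record.items()}
```

Here `types` maps `id` to `bigint`, `score` to `double`, `ok` to `boolean`, and the remaining fields to `string`.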
Schema Inference
- CSV: Uses AND logic. A column is assigned a type only if every sampled row matches it, so the result is the most restrictive type that all sampled values satisfy
- JSON: Auto-detects types from JSON primitives
- Parquet: Schema read directly from file metadata (no inference needed)
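The AND logic for CSV columns can be sketched as follows. This is a hedged illustration, not the connector's actual code: the helper names, the candidate order, the 32-bit/64-bit split between `int` and `bigint`, and the use of `datetime.fromisoformat` as the ISO 8601 check are all assumptions made for the example.

```python
import re
from datetime import datetime

_INT_RE = re.compile(r"[+-]?\d+")
_FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")

def _is_int32(v):
    return bool(_INT_RE.fullmatch(v)) and -2**31 <= int(v) < 2**31

def _is_int64(v):
    return bool(_INT_RE.fullmatch(v)) and -2**63 <= int(v) < 2**63

def _is_double(v):
    return bool(_FLOAT_RE.fullmatch(v))

def _is_boolean(v):
    return v.lower() in ("true", "false")

def _is_timestamp(v):
    try:
        datetime.fromisoformat(v)
        return True
    except ValueError:
        return False

# Ordered from most to least restrictive (assumed order for this sketch).
CANDIDATES = [
    ("int", _is_int32),
    ("bigint", _is_int64),
    ("double", _is_double),
    ("boolean", _is_boolean),
    ("timestamptz", _is_timestamp),
]

def infer_column_type(samples):
    """Return the first candidate type that EVERY sampled value satisfies."""
    if not samples:
        return "string"
    for name, check in CANDIDATES:
        if all(check(v) for v in samples):  # AND across all sampled rows
            return name
    return "string"  # fallback: CSV fields start out as strings
```

A single non-conforming value is enough to demote a column: `infer_column_type(["1", "abc"])` falls back to `string`, while `infer_column_type(["1", "2.5"])` widens to `double`.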
timestamptz timezone
OLake always stores ingested timestamps in UTC, regardless of the source timezone.
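To make the effect of UTC normalization concrete, here is a small Python sketch. The function `to_utc` is hypothetical, and treating timezone-naive values as already being in UTC is an assumption of this example, not documented OLake behavior.

```python
from datetime import datetime, timedelta, timezone

def to_utc(ts, assume_tz=timezone.utc):
    """Convert a datetime to UTC (hypothetical helper).

    Naive datetimes are interpreted in `assume_tz`; this is an assumption
    of the sketch, not a documented rule.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)

# A source value recorded at 10:30 in a UTC+05:30 zone...
source = datetime(2024, 1, 15, 10, 30, tzinfo=timezone(timedelta(hours=5, minutes=30)))
# ...represents the same instant as 05:00 UTC.
stored = to_utc(source)
```

The instant in time is preserved; only the offset changes, so downstream queries should expect UTC values rather than the source's local time.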