Discussed in #3815
Originally posted by gero90 on February 14, 2025
If there is any way to estimate Parquet file size in df.write_iceberg(), it would be really nice to produce Parquet files close in size to the Iceberg table property write.target-file-size-bytes (default: 512 MiB).
Having Parquet files near that size makes Iceberg reads more efficient and leaves less table maintenance (compaction) to perform.
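Rather than hardcoding 512 MiB, the target could be read from the table's own properties. A minimal sketch, assuming a pyiceberg catalog is configured; the catalog name and table identifier here are hypothetical:

```python
from pyiceberg.catalog import load_catalog

# Hypothetical catalog name and table identifier.
catalog = load_catalog("default")
table = catalog.load_table("db.events")

# Fall back to Iceberg's documented default when the property is unset.
target_bytes = int(
    table.properties.get("write.target-file-size-bytes", 512 * 1024 * 1024)
)
```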
As an example, where I know the total data is small, I call df.into_partitions(1) right before df.write_iceberg() to get a single file per write.
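A minimal sketch of that workaround generalized beyond a single partition, assuming the `table` and `target_bytes` from the snippet above; the bytes-per-row figure is a hand-picked assumption, not something daft estimates here:

```python
import daft

EST_BYTES_PER_ROW = 200  # assumption: rough per-row size for this dataset

df = daft.read_parquet("s3://bucket/staging/*.parquet")  # hypothetical input

# count_rows() executes the plan, so this costs an extra pass over the data.
est_total_bytes = df.count_rows() * EST_BYTES_PER_ROW

# One partition per desired output file: write_iceberg() writes each
# partition out separately, so this only approximates the target size.
num_files = max(1, est_total_bytes // target_bytes)
df.into_partitions(num_files).write_iceberg(table)
```

This remains a heuristic: Parquet encoding and compression make on-disk size hard to predict from row counts alone, which is exactly why it would be nicer for the writer to do the estimation itself.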
Thanks in advance for taking a look and for making daft awesome!