Iceberg writes to match file size #3823

kevinzwang · 2025-02-19T00:34:48Z

Discussed in #3815

Originally posted by gero90 February 14, 2025
If there is any way to estimate the Parquet file size in df.write_iceberg(), it would be really nice to produce Parquet files close in size to the Iceberg table property write.target-file-size-bytes (default is 512 MiB).

Having parquet files close to that size makes iceberg reads more efficient, and there is less table maintenance (compaction) to perform.

As an example, I'm calling df.into_partitions(1) right before df.write_iceberg() when I know the total data is small, to get a single file per write.
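The workaround above can be generalized to larger writes. What follows is a minimal sketch, not Daft's implementation: `partitions_for_target` and `write_with_target_size` are hypothetical helpers, and `estimated_parquet_bytes` is an assumption that the caller must supply (for example, from the size of earlier writes), since the on-disk Parquet size depends on encoding and compression. It also assumes `table` is a PyIceberg table whose `properties` mapping holds string values.

```python
# Iceberg's documented default for write.target-file-size-bytes (512 MiB).
DEFAULT_TARGET_FILE_SIZE_BYTES = 512 * 1024 * 1024


def partitions_for_target(estimated_parquet_bytes,
                          target_file_size_bytes=DEFAULT_TARGET_FILE_SIZE_BYTES):
    """Hypothetical helper: number of partitions so each file is at most the target size."""
    if target_file_size_bytes <= 0:
        raise ValueError("target_file_size_bytes must be positive")
    if estimated_parquet_bytes < 0:
        raise ValueError("estimated_parquet_bytes must be non-negative")
    # Integer ceiling division; always write at least one partition.
    return max(1, -(-estimated_parquet_bytes // target_file_size_bytes))


def write_with_target_size(df, table, estimated_parquet_bytes):
    """Hypothetical helper: repartition a Daft DataFrame, then write it to Iceberg.

    Assumes one output file per partition, as in the into_partitions(1) workaround.
    """
    # Table properties are stored as strings, so convert before use.
    target = int(table.properties.get("write.target-file-size-bytes",
                                      DEFAULT_TARGET_FILE_SIZE_BYTES))
    num_partitions = partitions_for_target(estimated_parquet_bytes, target)
    return df.into_partitions(num_partitions).write_iceberg(table)
```

With the default target, an estimated 10 GiB of Parquet output would be split into 20 partitions, and anything at or below 512 MiB stays in a single partition.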

Thanks in advance for taking a look and for making daft awesome!

kevinzwang added the enhancement (New feature or request) and p2 (backlog: Nice to have features) labels on Feb 19, 2025
