What happened:
When a write triggers schema evolution, we generate a whole new metadata line to represent the new schema, which makes sense. But we also regenerate the table's ID, which breaks Spark streaming jobs that use the metadata ID to identify the table being read. When the ID changes, the Spark streaming read treats the table as a brand-new one.
This net new metadata struct and ID are generated here:
let metadata = Metadata::try_new(schema, part_cols, HashMap::new())?;
What you expected to happen:
A new metadata struct is generated, but it reuses the metadata ID from the table's existing state instead of a freshly generated one.
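To make the expected behavior concrete, here is a minimal sketch using stand-in types. It is not the real delta-rs `Metadata` API: `TableMetadata`, `new_table`, and `with_evolved_schema` are hypothetical names I made up for illustration, and the ID is passed in as a plain string rather than generated. The only point is that the evolved metadata is a new value whose `id` is copied from the existing state.

```rust
/// Stand-in for a table metadata entry. This is NOT the real delta-rs
/// `Metadata` type; all names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct TableMetadata {
    pub id: String,
    pub schema_fields: Vec<String>,
    pub partition_columns: Vec<String>,
}

impl TableMetadata {
    /// Builds metadata for a brand-new table. The caller supplies the ID so
    /// the sketch stays dependency-free (assumption: a real writer would
    /// generate a unique ID at this point).
    pub fn new_table(
        id: &str,
        schema_fields: Vec<String>,
        partition_columns: Vec<String>,
    ) -> Self {
        Self {
            id: id.to_string(),
            schema_fields,
            partition_columns,
        }
    }

    /// Hypothetical helper: returns new metadata describing the evolved
    /// schema while carrying over the existing table ID and partition columns.
    pub fn with_evolved_schema(&self, schema_fields: Vec<String>) -> Self {
        Self {
            id: self.id.clone(),
            schema_fields,
            partition_columns: self.partition_columns.clone(),
        }
    }
}
```

With something shaped like this, a schema evolution write would call the equivalent of `with_evolved_schema` on the table's current metadata rather than constructing metadata from scratch, so a reader keyed on the ID still sees the same table.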
How to reproduce it:
Perform any schema evolution write on a table and compare the original metadata.id to the new one.
More details:
I'm tinkering locally and think I have a fix, but I'm not a Rust expert, so if anyone has strong opinions on how to implement it, feel free to override me.
liamphmurphy changed the title from "Schema evolution causing table ID to be rewritten" to "Schema evolution causing table ID to be regenerated, breaks Spark streaming jobs" on Feb 27, 2025
Environment
Delta-rs version: 0.25.2
Binding: Python
Environment: AWS, Local
Code reference: delta-rs/crates/core/src/writer/record_batch.rs, line 286 in 94a2009