Handle unchanged status on schema dry-run / diff #341
Comments
Scenario to be validated before implementation:
Schema v1
{"namespace":"com.michelin.kafka.producer.showcase.avro","type":"record","name":"PersonAvro","fields":[{"name":"firstName","type":["null","string"],"default":null,"doc":"First name of the person"},{"name":"lastName","type":["null","string"],"default":null,"doc":"Last name of the person"},{"name":"dateOfBirth","type":["null",{"type":"long","logicalType":"timestamp-millis"}],"default":null,"doc":"Date of birth of the person"}]}
and we just swap the order of the fields:
Schema v2
{"namespace":"com.michelin.kafka.producer.showcase.avro","type":"record","name":"PersonAvro","fields":[{"name":"lastName","type":["null","string"],"default":null,"doc":"Last name of the person"},{"name":"firstName","type":["null","string"],"default":null,"doc":"First name of the person"},{"name":"dateOfBirth","type":["null",{"type":"long","logicalType":"timestamp-millis"}],"default":null,"doc":"Date of birth of the person"}]}
Based on canonical strings, the schemas are different. But when applying v2 to the Schema Registry, is the Schema Registry smart enough to detect that it is the exact same schema (with fields in a different order), or does it create a new schema?
➡️ If a new schema is created, the scenario is OK (and it seems to be, according to my quick tests).
A full integration test to validate this could be:
💡 The check for
💡 Think about adding the Confluent repository when using
repositories {
    mavenCentral()
    maven {
        url "https://packages.confluent.io/maven"
    }
}
Problem
Suppose that we have a subject with a v1 schema created with ns4kafka.
When doing a dry-run / diff with the same schema, kafkactl will return a CHANGED status.
This is because ns4kafka never compares the submitted schema with the latest schema definition. It only checks whether the subject is already present, to decide between CHANGED and CREATED.
An apply without dry-run won't create a new version, because the schemas are equal, so it returns an UNCHANGED status. The dry-run and the apply without dry-run therefore behave differently.
Suggestion
A simple string comparison is not sufficient because it would not account for formatting changes that users may have made (spaces, tabs, reordering, etc.).
The kafka-schema-registry-client dependency provides an AvroSchema class that we can instantiate with the input schema string and with the latest schema definition. The constructor normalizes the schema and allows us to compare the two schemas, even if users made some formatting changes.
My suggestion would be to improve the dry-run comparison (https://github.com/michelin/ns4kafka/blob/master/src/main/java/com/michelin/ns4kafka/controllers/SchemaController.java#L121) by instantiating two AvroSchema instances, one per definition, and comparing their AvroSchema.canonicalString() strings.
Input schema
Normalized schema
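The suggested dry-run decision could be sketched roughly as follows. This is a minimal, dependency-free illustration, not the actual ns4kafka code: the class `SchemaDryRun`, the method `status` and the `STRIP_WHITESPACE` normalizer are hypothetical names introduced here. In ns4kafka the normalizer would presumably be something like `s -> new AvroSchema(s).canonicalString()` from kafka-schema-registry-client, as proposed above. The whitespace-stripping stand-in only exists to keep the sketch runnable without that dependency.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of ns4kafka: decides the dry-run status of a schema.
final class SchemaDryRun {

    enum Status { CREATED, CHANGED, UNCHANGED }

    /**
     * @param latest     latest schema registered for the subject, empty if the subject does not exist
     * @param input      schema string submitted by the user
     * @param normalizer maps a schema string to a form that can be compared with equals()
     */
    static Status status(Optional<String> latest, String input, UnaryOperator<String> normalizer) {
        if (!latest.isPresent()) {
            return Status.CREATED; // no subject yet, so the apply would create it
        }
        String normalizedLatest = normalizer.apply(latest.get());
        String normalizedInput = normalizer.apply(input);
        return normalizedLatest.equals(normalizedInput) ? Status.UNCHANGED : Status.CHANGED;
    }

    // Stand-in normalizer for this sketch only (assumption): strips all whitespace.
    // It is NOT Avro-aware and would also strip spaces inside string values such as "doc".
    static final UnaryOperator<String> STRIP_WHITESPACE = s -> s.replaceAll("\\s+", "");

    public static void main(String[] args) {
        String latest = "{\"type\":\"record\",\"name\":\"PersonAvro\",\"fields\":[]}";
        String reformatted = "{\n  \"type\": \"record\",\n  \"name\": \"PersonAvro\",\n  \"fields\": []\n}";

        // Raw string comparison reports a difference although only the formatting changed.
        System.out.println(latest.equals(reformatted)); // false
        // Comparing the normalized forms reports UNCHANGED, matching a real apply.
        System.out.println(status(Optional.of(latest), reformatted, STRIP_WHITESPACE)); // UNCHANGED
        // Without an existing subject the status is CREATED.
        System.out.println(status(Optional.empty(), reformatted, STRIP_WHITESPACE)); // CREATED
    }
}
```

Passing the normalizer as a parameter keeps the status logic independent of how schemas are normalized, so the stand-in can be swapped for the AvroSchema-based comparison without touching the decision itself.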
Alternatives Considered
N/A
Additional Context
N/A