Schema Registry considers avro.java.string as part of the schema comparison #868
Comments
I have one question: Why did you manually register the schema instead of letting the Serializer do it? Also, can you post the rest of the stack trace and mention version numbers of the Registry, please? |
I edited my previous post: added the complete stack trace and version. To answer your question: the setup is that there are multiple clients: Java, .NET, Ruby and even Node. So defining schemas and maintaining the Schema Registry is done centrally (manually for now; I am in the process of writing a tool for it) on the one hand, and in a language-agnostic way on the other. If each producer were allowed to register its own schemas, all consumers (for example: Java is producing and all others are consuming) would need to be able to live with those language-specific properties. |
Thanks! And yes, that makes sense. I come from an almost entirely Java producer environment, so this error is a new one to me, but it makes sense given that the MD5 computation is on the raw Schema string body, I believe, not the Parsing Canonical Form, as you mention |
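To make the hashing point concrete, here is a minimal Java sketch. It assumes, as the comment above guesses, that the lookup key is an MD5 digest of the raw schema text. The class name `RawSchemaDigest` and the two sample schemas are made up for illustration and are not Schema Registry code; the sketch only shows that adding `avro.java.string` changes the digest.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Schema Registry.
final class RawSchemaDigest {
    // Sample schema as it might be registered by hand (language agnostic).
    static final String REGISTERED =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
        + "[{\"name\":\"name\",\"type\":\"string\"}]}";

    // The same sample schema with the Java-specific string annotation.
    static final String GENERATED =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
        + "[{\"name\":\"name\",\"type\":"
        + "{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}";

    // Hex MD5 of the schema text exactly as given (no normalization).
    static String md5Hex(String schemaText) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(schemaText.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("registered: " + md5Hex(REGISTERED));
        System.out.println("generated:  " + md5Hex(GENERATED));
        System.out.println("equal: " + md5Hex(REGISTERED).equals(md5Hex(GENERATED)));
    }
}
```

Because the digest is taken over the text, any textual difference (including whitespace or property order) would produce a different key, not only the `avro.java.string` property.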
Yes, the key point is to find the right canonical form, because the one implemented by Avro has at least one big issue: it does not consider So something more is needed, like Btw., do you know why |
Did you look at the PR mentioned on #28? It mentions that defaults are preserved. The |
Yes, the PR did a custom implementation to preserve them, but it seems not to work with 5.0.0 anymore. As far as I have experienced so far, the code generator does not actually need the property, but it generates it anyway, and that causes the issue. It only makes sense if the property is used by the serializer. I have not debugged that deeply yet, but will hopefully find time soon. |
By code generator, I mean the Avro Maven Plugin's That determines if Java classes are made with |
Sure, I understand this part, but what I mean is: why does the generator add |
In order to generate Java classes from the schema at a later point. |
I believe not. My process is the following:
Only now does this property occur. So there is no reason for any later generator or compiler steps. Either the serializer needs this property when the generated class gets serialized, or I do not see the reason. |
I think I have now found the purpose. It is used by the deserializer, but it seems that the deserializer takes two things: reader- (<- the Anyhow: the original problem still exists: how can the @confluentinc Schema Registry stay language agnostic and still be usable in a situation where there is a Java client with generated code? |
Yes, I am having the same issue. We have non-java consumers. The java producer, which uses the generated avro class with the SCHEMA$ field, writes this to the registry - which includes the avro.java.string type. |
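As an illustration of the difference described here, the following Java sketch strips the Java-specific annotation from a schema's JSON text so that it matches the language-agnostic form. It is plain regex-based text rewriting, under the assumption that annotated strings are written as `{"type":"string","avro.java.string":"..."}` with no other properties. The class and method names are hypothetical, and a real implementation would more likely work on a parsed schema than on text.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Avro or Schema Registry.
final class JavaStringPropertyStripper {
    // Matches {"type":"string","avro.java.string":"<Name>"} with optional whitespace.
    private static final Pattern ANNOTATED_STRING = Pattern.compile(
        "\\{\\s*\"type\"\\s*:\\s*\"string\"\\s*,"
        + "\\s*\"avro\\.java\\.string\"\\s*:\\s*\"[A-Za-z0-9]+\"\\s*\\}");

    // Replaces every annotated string type with the plain "string" type.
    static String strip(String schemaJson) {
        return ANNOTATED_STRING.matcher(schemaJson).replaceAll("\"string\"");
    }

    public static void main(String[] args) {
        String generated =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
            + "[{\"name\":\"name\",\"type\":"
            + "{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}";
        System.out.println(strip(generated));
    }
}
```

The sketch does not handle annotated strings that carry further properties or a different property order; those cases would need a proper JSON or schema parser.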
+1 |
+1 |
+1 |
+1 |
+1 |
Is there any timeline for this issue fix? Thanks for your contributions. |
What is the workaround for this? |
I am having this problem. As a workaround, I manually register the schema with the avro.java.string type instead of the original schema from the avsc file. Alternatively, I set auto.register.schemas = true in my producer's Kafka Avro serde config, but I don't recommend this, because during development it may register a wrong schema in the production Schema Registry. I think I will go this way:
|
I have filed an issue with the Avro project: https://issues.apache.org/jira/browse/AVRO-2838 |
The same issue arises when producing Avro data from Kafka Connect and consuming it with Kafka Streams. What workaround are you using while the solution is still in development? |
One way to resolve this is to replace all string types in the avsc file with the same type that the plugin will generate. For example: Keep the avro-maven-plugin config:
This way, the generated schema inserted into the jar file will be identical to the one used by the plugin to upload and verify schemas. I have only tested that the inserted schema is identical to the one in the avsc file, but this should work. |
This is addressed in 5.5.1 and later by specifying |
I'll just add here that Confluent recently introduced a flag in their version of avro serializer, which solves this problem. You need to put: |
In the end, we ended up using our own serializer, which does not require code generation at all: https://github.com/productboardlabs/jackson-kafka-avro-serializer |
The new property:
seems like an ugly hack. I guess it is easier for downstream projects to bend over backwards than to fix the root of the problem at the source? I submitted a patch to the Avro project months ago with no movement, so I guess the answer is "yes": apache/avro#1235 |
Given
is published to Schema-Registry
when a Java producer using POJOs generated with avro-maven-plugin tries to serialize these events with KafkaAvroSerializer, the lookup for the schema fails because the SCHEMA$ property in the generated classes looks a little different: it has these avro.java.string properties.
(I still do not get why these properties are needed at all. Because of them, the schemas are treated as not equal. Btw., consumers can handle this situation.)
Remark
I guess that #28 is related, but because of the Map<MD5, SchemaIdAndSubjects> schemaHashToGuid, AFAIK it is not enough to just change io.confluent.kafka.schemaregistry.avro.AvroUtils#parseSchema to produce a more or less schema-equal form. Parsing Canonical Form for Schemas in particular is not suitable, since it lacks at a minimum the default properties (what about order?). I already tried that, but I assume it is more of a concept change: storing canonical schemas in the registry from startup, not only changing the lookup.
Can anyone from Confluent guide me towards a solution? I would be happy to help with the implementation.
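The concept change mentioned in this remark can be sketched in Java as an index that applies the same normalization when a schema is stored and when it is looked up. This is only an illustration under stated assumptions: `NormalizingSchemaIndex` is a hypothetical class, the normalizer is passed in by the caller, and the key is the normalized text itself, not the MD5-keyed map used by the registry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;
import java.util.function.UnaryOperator;

// Hypothetical in-memory index, not Schema Registry code.
final class NormalizingSchemaIndex {
    private final Map<String, Integer> idsByKey = new HashMap<>();
    private final UnaryOperator<String> normalizer;
    private int nextId = 1;

    NormalizingSchemaIndex(UnaryOperator<String> normalizer) {
        this.normalizer = normalizer;
    }

    // Stores the schema under its normalized text; returns the existing id if present.
    int register(String schemaText) {
        String key = normalizer.apply(schemaText);
        Integer existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int id = nextId++;
        idsByKey.put(key, id);
        return id;
    }

    // Looks the schema up using the same normalization as register().
    OptionalInt lookup(String schemaText) {
        Integer id = idsByKey.get(normalizer.apply(schemaText));
        return id == null ? OptionalInt.empty() : OptionalInt.of(id);
    }

    public static void main(String[] args) {
        String plain = "{\"name\":\"name\",\"type\":\"string\"}";
        String generated = "{\"name\":\"name\",\"type\":"
            + "{\"type\":\"string\",\"avro.java.string\":\"String\"}}";
        // Illustrative normalizer: literal replacement of one annotated form.
        NormalizingSchemaIndex index = new NormalizingSchemaIndex(s -> s.replace(
            "{\"type\":\"string\",\"avro.java.string\":\"String\"}", "\"string\""));
        int id = index.register(plain);
        System.out.println("registered id: " + id);
        System.out.println("lookup found: " + index.lookup(generated).isPresent());
    }
}
```

If only `lookup` normalized its argument while `register` stored raw text, a schema registered in annotated form could never be found again, which is why the remark treats this as a change to what is stored and not only to the lookup.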
Version
Confluent Open Source: 5.0.0