
Schema Registry considers avro.java.string as part of the schema comparison #868

Closed
markush81 opened this issue Aug 15, 2018 · 27 comments

@markush81

markush81 commented Aug 15, 2018

Given

{
  "type": "record",
  "name": "Ping",
  "namespace": "de.markush.kafka",
  "fields": [
    {
      "name": "id",
      "type": "string",
      "logicalType": "uuid"
    },
    {
      "name": "created_at",
      "type": "string",
      "logicalType": "iso8601Timestamp"
    }
  ]
}

is published to the Schema Registry

when

a Java producer using POJOs generated with the avro-maven-plugin tries to serialize these events with KafkaAvroSerializer, the schema lookup fails because the SCHEMA$ property in the generated classes looks a little different
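
For reference, a minimal sketch of the producer side (broker/registry URLs are placeholders; the topic name and builder setters are assumed from the schema above):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://localhost:8081");
props.put("auto.register.schemas", false); // schemas are registered centrally

Producer<String, Ping> producer = new KafkaProducer<>(props);
// The serializer looks up Ping.SCHEMA$ in the registry; since SCHEMA$
// carries the extra "avro.java.string" properties, the lookup fails:
producer.send(new ProducerRecord<>("ping",
    Ping.newBuilder().setId("42").setCreatedAt("2018-08-15T00:00:00Z").build()));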

then

org.apache.kafka.common.errors.SerializationException: Error retrieving Avro schema: {"type":"record","name":"Ping","namespace":"de.markush.kafka","fields":[{"name":"id","type":{"type":"string","avro.java.string":"String"},"logicalType":"uuid"},{"name":"created_at","type":{"type":"string","avro.java.string":"String"},"logicalType":"iso8601Timestamp"}]}
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
	at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:203) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:229) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:296) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:284) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getIdFromRegistry(CachedSchemaRegistryClient.java:132) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getId(CachedSchemaRegistryClient.java:264) ~[kafka-schema-registry-client-5.0.0.jar:na]
	at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:82) ~[kafka-avro-serializer-5.0.0.jar:na]
	at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:53) ~[kafka-avro-serializer-5.0.0.jar:na]
	at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65) ~[kafka-clients-1.0.2.jar:na]
	at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55) ~[kafka-clients-1.0.2.jar:na]
	at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:791) ~[kafka-clients-1.0.2.jar:na]
	at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:768) ~[kafka-clients-1.0.2.jar:na]
	at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:285) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
	at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:357) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
	at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:188) ~[spring-kafka-2.1.8.RELEASE.jar:2.1.8.RELEASE]
	at de.markush.kafka.schemaregistryexamples.producer.PingProducer.ping(PingProducer.java:29) ~[classes/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_172]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_172]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_172]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_172]
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.0.8.RELEASE.jar:5.0.8.RELEASE]
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.0.8.RELEASE.jar:5.0.8.RELEASE]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_172]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_172]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_172]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_172]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_172]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_172]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]

SCHEMA$

{  
   "type":"record",
   "name":"Ping",
   "namespace":"de.markush.kafka",
   "fields":[  
      {  
         "name":"id",
         "type":{  
            "type":"string",
            "avro.java.string":"String"
         },
         "logicalType":"uuid"
      },
      {  
         "name":"created_at",
         "type":{  
            "type":"string",
            "avro.java.string":"String"
         },
         "logicalType":"iso8601Timestamp"
      }
   ]
}

it has these avro.java.string properties.
(I still do not get why these are needed at all; because of them the schemas are treated as not equal. Btw. consumers can handle this situation.)

Remark

I guess that #28 is related, but due to the Map<MD5, SchemaIdAndSubjects> schemaHashToGuid, AFAIK it is not enough to just change io.confluent.kafka.schemaregistry.avro.AvroUtils#parseSchema to a more or less schema-equal form. In particular, Parsing Canonical Form for Schemas will not do, since it lacks, at a minimum, the default properties (and what about order?). I already tried that, but I assume it is more of a concept change: storing canonical schemas in the registry from startup, not only changing the lookup.
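
To illustrate the point about defaults, a minimal sketch using Avro's SchemaNormalization: the Parsing Canonical Form output below no longer contains the "default" attribute.

import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;

public class CanonicalFormDemo {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Ping\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\",\"default\":\"none\"}]}");
        // Parsing Canonical Form keeps only the structural attributes
        // (name, type, fields, ...); the "default" property is dropped:
        System.out.println(SchemaNormalization.toParsingForm(schema));
    }
}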

Can anyone from Confluent guide us towards a solution? Concerning the implementation, I would be happy to help.

Version

Confluent Open Source: 5.0.0

@markush81 markush81 changed the title Schema Registry consideres avro.java.string as part of the schema comparison Schema Registry considers avro.java.string as part of the schema comparison Aug 17, 2018
@OneCricketeer
Contributor

OneCricketeer commented Aug 23, 2018

I have one question: Why did you manually register the schema instead of letting the Serializer do it?

Also, can you post the rest of the stack trace and mention version numbers of the Registry, please?

@markush81
Author

I edited my previous input: added complete stacktrace and version.

To answer your question:

The setup is that there are multiple clients: Java, .NET, Ruby and even Node. So the process of defining schemas and maintaining the Schema Registry is done centrally (manually for now; we are in the process of writing a tool for it) on the one hand, and language-agnostically on the other. If each producer were allowed to register its own schemas, all consumers (example: Java is producing and all others are consuming) would need to be able to live with those language-specific properties.

@OneCricketeer
Contributor

Thanks! And yes, that makes sense. I come from an almost entirely Java producer environment, so this error is a new one to me, but it makes sense given that the MD5 computation is on the raw schema string body, I believe, not the Parsing Canonical Form, as you mention.

@markush81
Author

markush81 commented Aug 23, 2018

Yes, the key point is to find the right canonical form, because the one implemented by Avro has at least one big issue: it does not consider the default property, which is essential.

So something more like a language-agnostic form is needed, basically removing all custom properties.

Btw. do you know why avro.java.string is needed at all? AFAIK the consumer does not need it, and I have no clue why a producer should need it either.

@OneCricketeer
Contributor

Did you look at the PR mentioned in #28? It mentions that defaults are preserved.

The java.string property can be set to String, or to CharSequence, which String implements. That way, the Avro schema can refer to any other sibling of a String type, and it's generally only used by code generators, as far as I know.

@markush81
Author

The PR did a custom implementation to preserve them, yes, but it seems to no longer work with 5.0.0.

Actually, the code generator does not need the property, according to what I have experienced so far, but it does generate it, and that causes the issue. It would only make sense if it were used by the serializer, but until now I have not debugged that deep; I will hopefully find time soon.

@OneCricketeer
Contributor

By code generator, I mean the Avro Maven Plugin's <stringType> tag.

https://github.com/apache/avro/blob/master/lang/java/maven-plugin/src/main/java/org/apache/avro/mojo/AbstractAvroMojo.java#L101-L106

That determines whether Java classes are generated with String x or CharSequence x.
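
For illustration, this is roughly what the generator does to every string schema when <stringType>String</stringType> is set (a sketch, not actual generated output):

import org.apache.avro.Schema;

public class StringTypeDemo {
    public static void main(String[] args) {
        Schema s = Schema.create(Schema.Type.STRING);
        // The compiler tags string schemas with this property so that
        // deserialization yields java.lang.String instead of Utf8:
        s.addProp("avro.java.string", "String");
        System.out.println(s); // {"type":"string","avro.java.string":"String"}
    }
}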

@markush81
Author

Sure, I understand this part, but what I mean is: why does the generator add avro.java.string to the SCHEMA$ of each generated class? But I will find out.

@OneCricketeer
Contributor

@markush81
Author

markush81 commented Aug 25, 2018

I believe no.

My process is the following:

  1. Take a schema, without this property
  2. Generate code from this schema
  3. Look into the generated class and see the property in SCHEMA$.

Only then does this property occur, so it is not there for any later generator or compiler steps. Either the serializer needs this property when the generated class gets serialized, or I do not see the reason for it.
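
As a side note, a quick way to inspect the schema embedded in a generated class (a sketch, assuming the generated Ping class from above):

import org.apache.avro.Schema;

public class InspectSchema {
    public static void main(String[] args) {
        Schema embedded = Ping.getClassSchema(); // returns the static SCHEMA$ field
        System.out.println(embedded.toString(true)); // pretty-printed JSON
    }
}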

@markush81
Author

markush81 commented Aug 28, 2018

I think I have now found out the purpose. It is used by the deserializer, but the deserializer takes two things: the reader schema (<- the SCHEMA$ property, with avro.java.string) and the writer schema (<- from the Schema Registry, without avro.java.string), and somehow this works. So I question (at least for my setup) the need for it. But I am pretty sure there are reasons for it.
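
Roughly, this is the schema resolution the deserializer performs (a simplified sketch; the writer schema would normally be fetched from the registry by ID):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

static Ping readWithResolution(byte[] payload, Schema writerSchema) throws IOException {
    // writer schema: from the registry, without avro.java.string
    // reader schema: the generated SCHEMA$, with avro.java.string
    DatumReader<Ping> reader =
        new SpecificDatumReader<>(writerSchema, Ping.getClassSchema());
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    return reader.read(null, decoder);
}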

Anyhow: the original problem still exists, namely how to let the @confluentinc Schema Registry be language-agnostic and still use it in a situation where a Java client with generated code is involved.

@chrisdoberman

Yes, I am having the same issue. We have non-Java consumers. The Java producer, which uses the generated Avro class with the SCHEMA$ field, writes this to the registry, and what it writes includes the avro.java.string type.

@mbieser

mbieser commented Feb 22, 2019

+1
Having the Java libraries accept a language-independent schema would be helpful.

@plinioj

plinioj commented Jun 18, 2019

+1

@cbsmith

cbsmith commented Jun 26, 2019

+1

@clande

clande commented Jul 12, 2019

+1

@alinazemian

+1

@ghost

ghost commented Sep 10, 2019

Is there any timeline for fixing this issue? Thanks for your contributions.

@anilkulkarni87

What is the workaround for this?

@zhuguangbin

I am having this problem. As a workaround, I manually register the schema with the avro.java.string type instead of the original schema from the avsc.

Or else, I set auto.register.schemas = true in my producer Kafka Avro serde config. But I don't recommend this way, because during development this may register a wrong schema to the production Schema Registry.

I think I will go this way (see the sketch after this list):

  1. Write a schema in an avsc file.
  2. Build a Java jar with Maven; in the test stage, use kafka-schema-registry-maven-plugin to test compatibility between the local avsc schema and the remote Kafka Schema Registry.
  3. If compatible, and the schema is confirmed final, manually register it with the remote Schema Registry. (Use a Java tool: given a jar artifact groupId/artifactId/version and the full Avro class name for a topic, the tool extracts this SpecificRecord's SCHEMA$ field, which is the right schema, and POSTs it to the registry. We must restrict who has the privilege to use this tool to register schemas.)
  4. Consumers/producers upgrade the artifact to the right version for consuming/producing the Kafka topic.
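
A sketch of the registration tool from step 3 (class and subject names are examples; the registry URL is a placeholder):

import org.apache.avro.Schema;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;

public class SchemaRegistrar {
    public static void main(String[] args) throws Exception {
        String className = args[0]; // e.g. "de.markush.kafka.Ping"
        String subject = args[1];   // e.g. "ping-value"

        // Extract the SCHEMA$ field from the generated SpecificRecord class:
        Class<?> generated = Class.forName(className);
        Schema schema = (Schema) generated.getField("SCHEMA$").get(null);

        CachedSchemaRegistryClient client =
            new CachedSchemaRegistryClient("http://localhost:8081", 100);
        System.out.println("Registered schema id: " + client.register(subject, schema));
    }
}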

@lukas-krecan

I have filed an issue with the Avro project: https://issues.apache.org/jira/browse/AVRO-2838

@enima2684

The same issue arises when producing Avro data from Kafka Connect and consuming it with Kafka Streams.
It would be really helpful to have a solution for this.

What workaround are you using while the solution is still in development?

@peknu

peknu commented Sep 10, 2020

One way to resolve this is to replace all string types in the avsc file with the same type that the plugin will generate. For example:
Replace: { "name": "comment", "type": "string" }
With: { "name": "comment", "type": { "type": "string", "avro.java.string": "String" } }

Keep the avro-maven-plugin config:

<configuration>
    <stringType>String</stringType>
</configuration>

This way, the generated schema inserted into the jar file will be identical to the one used by the plugin to upload and verify schemas. I have only tested that the inserted schema is identical to the one in the avsc file, but this should work.

@rayokota
Member

This is addressed in 5.5.1 and later by specifying auto.register.schemas=false and use.latest.version=true in the Avro serializer configs.
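
In code form, a minimal sketch of that configuration (the registry URL is a placeholder):

import java.util.HashMap;
import java.util.Map;
import io.confluent.kafka.serializers.KafkaAvroSerializer;

Map<String, Object> config = new HashMap<>();
config.put("schema.registry.url", "http://localhost:8081");
config.put("auto.register.schemas", false); // do not auto-register from the producer
config.put("use.latest.version", true);     // use the latest registered schema for the subject

KafkaAvroSerializer serializer = new KafkaAvroSerializer();
serializer.configure(config, false); // false = configure as a value serializer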

@mnowaczyk

I'll just add here that Confluent recently introduced a flag in their version of the Avro serializer which solves this problem. You need to put:
KafkaAvroSerializerConfig.AVRO_REMOVE_JAVA_PROPS_CONFIG = true
in the serde config, and it will send the correct schemas to the registry, without "avro.java.string".
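
For example (a sketch; the registry URL is a placeholder):

import java.util.HashMap;
import java.util.Map;
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import io.confluent.kafka.serializers.KafkaAvroSerializerConfig;

Map<String, Object> config = new HashMap<>();
config.put("schema.registry.url", "http://localhost:8081");
// Strips "avro.java.string" before the schema is sent to the registry:
config.put(KafkaAvroSerializerConfig.AVRO_REMOVE_JAVA_PROPS_CONFIG, true);

KafkaAvroSerializer serializer = new KafkaAvroSerializer();
serializer.configure(config, false);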

@lukas-krecan

In the end, we ended up using our own serializer, which does not require code generation at all: https://github.com/productboardlabs/jackson-kafka-avro-serializer

@slominskir
Contributor

The new property:

KafkaAvroSerializerConfig.AVRO_REMOVE_JAVA_PROPS_CONFIG = true

seems like an ugly hack. I guess it is easier for downstream projects to bend over backwards than to fix the root of the problem at the source? I submitted a patch to the AVRO project months ago with no movement, so I guess the answer is "yes": apache/avro#1235
