An in-depth look into the behavior of Spring-Kafka producers making multiple calls to the Confluent Schema Registry instead of caching schema information, and solutions to optimize this process.
---
This video is based on the question https://stackoverflow.com/q/62904356/ asked by the user 'Raj' ( https://stackoverflow.com/u/7257971/ ) and on the answer https://stackoverflow.com/a/62921939/ provided by the user 'Gary Russell' ( https://stackoverflow.com/u/1240763/ ) at 'Stack Overflow' website. Thanks to these great users and the Stack Exchange community for their contributions.
Visit these links for the original content and further details, such as alternate solutions, latest updates/developments on the topic, comments, revision history etc. For example, the original title of the Question was: Why producer is making more calls to Schema registry instead of using producer cache when using spring-kafka?
Also, content (except music) is licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.
If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
Why Spring-Kafka Producers Are Making Multiple Calls to Schema Registry
When using Spring-Kafka with Avro messages, developers can run into an unexpected issue: the producer calls the (Confluent) Schema Registry for every message sent instead of using a cache. This can hurt performance and add latency to message delivery. Let's look at the problem, its causes, and how to resolve it.
The Problem at a Glance
In the spring-kafka configuration from the original question, two calls to the Schema Registry are recorded for each message sent:
One call shows up as getInputStream.
The other shows up as getOutputStream.
That adds up to 20 calls for 10 messages, a strong sign that the registry client's cache is not being used.
Key Questions to Address
Why is it making two calls to the Schema Registry for each message?
Why is the schema information not being cached on the producer side?
Understanding the Calls to Schema Registry
Why Two Calls per Message?
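The registry traffic comes from Confluent's KafkaAvroSerializer rather than from spring-kafka itself. Every serialized record carries the ID of its schema, so the serializer has to resolve that ID for each message. With the default auto.register.schemas=true it does this by calling register() on its schema registry client, which behaves as follows:
Cache hit: the ID is returned from memory and no HTTP request is sent.
Cache miss: the client sends a request to the registry and remembers the returned ID.
The getInputStream and getOutputStream names seen in the question match java.net.HttpURLConnection methods used by the Confluent REST client. A single POST writes its body through getOutputStream and reads the response through getInputStream, so each pair most likely represents one registry round trip per message rather than two separate lookups. Either way, when every message uses the same schema, only the first send should need to reach the registry.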
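To make the per-message ID lookup concrete, here is a minimal, stdlib-only Java sketch of the Confluent wire format that KafkaAvroSerializer writes: a magic byte of 0, the 4-byte big-endian schema ID, then the Avro-encoded body. The class and method names (WireFormatSketch, frame, schemaIdOf) are hypothetical, and the actual Avro encoding is replaced by a placeholder byte array.

```java
import java.nio.ByteBuffer;

// Sketch of the framing around an Avro payload: magic byte, schema ID, body.
// Hypothetical helper for illustration; not Confluent's serializer code.
final class WireFormatSketch {
    static final byte MAGIC_BYTE = 0x0;

    // Prepends the header to an already Avro-encoded payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        // ByteBuffer is big-endian by default, matching the 4-byte ID layout.
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buffer.put(MAGIC_BYTE);
        buffer.putInt(schemaId);
        buffer.put(avroPayload);
        return buffer.array();
    }

    // Reads the schema ID back out of a framed message, as a deserializer would.
    static int schemaIdOf(byte[] framed) {
        ByteBuffer buffer = ByteBuffer.wrap(framed);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buffer.getInt();
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[] {1, 2, 3}); // placeholder payload
        System.out.println(framed.length + " bytes, schema id " + schemaIdOf(framed));
    }
}
```

Because the ID is part of every record, the serializer must know it for each send; whether that costs a network call depends entirely on whether the client's cache already holds it.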
Caching Mechanism of the Producer
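Caching is handled by Confluent's CachedSchemaRegistryClient, which KafkaAvroSerializer creates from its configuration; spring-kafka does not add a schema cache of its own. As long as the same client instance is reused, it should keep calls to the Schema Registry to a minimum:
CachedSchemaRegistryClient stores the schemas and IDs it has already fetched, so repeated lookups for the same schema are served from memory.
The cache lives inside the client instance. If the producer is misconfigured, or the cache is otherwise bypassed (for example, a new serializer, and therefore a new client, being created for each send), the producer makes more calls than necessary.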
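The effect of reusing versus recreating the client can be shown with a small, self-contained model. This is not Confluent's code: SchemaIdLookup, CountingRemoteRegistry and CachingLookup are hypothetical names, the cache only loosely mirrors the idea behind CachedSchemaRegistryClient (only cache misses reach the registry), and the subject name orders-value is made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts registry requests for repeated sends of one schema.
final class CacheDemo {
    static int requestsFor(int sends, boolean reuseClient) {
        CountingRemoteRegistry remote = new CountingRemoteRegistry();
        CachingLookup shared = new CachingLookup(remote);
        for (int i = 0; i < sends; i++) {
            // A fresh CachingLookup per send mimics creating a new serializer each time.
            SchemaIdLookup client = reuseClient ? shared : new CachingLookup(remote);
            client.register("orders-value", "{\"type\":\"string\"}");
        }
        return remote.requests.get();
    }

    public static void main(String[] args) {
        System.out.println("shared client: " + requestsFor(10, true));
        System.out.println("new client per send: " + requestsFor(10, false));
    }
}

// Hypothetical lookup contract: returns the registry ID for a subject and schema.
interface SchemaIdLookup {
    int register(String subject, String schema);
}

// Fake remote registry: every call stands for one HTTP round trip and is counted.
final class CountingRemoteRegistry implements SchemaIdLookup {
    final AtomicInteger requests = new AtomicInteger();
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<String, Integer> ids = new ConcurrentHashMap<>();

    @Override
    public int register(String subject, String schema) {
        requests.incrementAndGet();
        return ids.computeIfAbsent(subject + ":" + schema, k -> nextId.incrementAndGet());
    }
}

// In-memory cache in front of the remote registry: only misses are forwarded.
final class CachingLookup implements SchemaIdLookup {
    private final SchemaIdLookup remote;
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();

    CachingLookup(SchemaIdLookup remote) {
        this.remote = remote;
    }

    @Override
    public int register(String subject, String schema) {
        return cache.computeIfAbsent(subject + ":" + schema, k -> remote.register(subject, schema));
    }
}
```

With one shared client, ten sends of the same schema produce a single registry request; creating a new caching client for every send produces ten, because each new instance starts with an empty cache.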
Troubleshooting and Solutions
1. Verify Configuration
First, make sure the producer configuration is correct. A simple starting point is to check that SCHEMA_REGISTRY_URL_CONFIG (the schema.registry.url property) is set to the intended registry in your application context.
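As a reference point, a producer configuration for Avro values usually sets the serializer classes and the registry URL roughly as below. The sketch uses the literal property names behind constants such as ProducerConfig.BOOTSTRAP_SERVERS_CONFIG and SCHEMA_REGISTRY_URL_CONFIG so it compiles without Kafka on the classpath; the class name ProducerConfigSketch and the hosts and ports are placeholder assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper building producer properties with literal keys.
final class ProducerConfigSketch {
    static Map<String, Object> producerConfigs(String bootstrapServers, String registryUrl) {
        Map<String, Object> props = new HashMap<>();
        // ProducerConfig.BOOTSTRAP_SERVERS_CONFIG
        props.put("bootstrap.servers", bootstrapServers);
        // ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG: Confluent's Avro serializer
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // SCHEMA_REGISTRY_URL_CONFIG: where the serializer's registry client sends requests
        props.put("schema.registry.url", registryUrl);
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> props = producerConfigs("localhost:9092", "http://localhost:8081");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a spring-kafka application, a map like this is typically passed to DefaultKafkaProducerFactory. If your code builds a new producer for each message, each one gets a fresh serializer and therefore an empty schema cache.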
2. Use a Breakpoint and Debug
To see why the calls are not being cached, set a breakpoint in the CachedSchemaRegistryClient.register() method. This lets you observe:
Whether subsequent identical calls are resolved from the cache or trigger a fresh request to the Schema Registry.
3. Adjust Version or Dependencies
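The question mentions spring-kafka 2.2.8. If possible, consider upgrading to a newer version, which may include related improvements. Because the schema caching itself lives in Confluent's kafka-avro-serializer, check that dependency's version as well.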
4. Monitor Messages and Calls
By tracking the number of Schema Registry calls against the number of messages sent, you can tell whether caching is effective or whether the extra calls persist across sends.
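One way to do this is to keep two counters and compare them. RegistryCallMonitor below is a hypothetical helper, not part of spring-kafka or the Confluent client; how you feed onRegistryCall() from real registry traffic (for example, from logs or a proxy in front of the registry) is left as an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters for comparing registry calls with messages sent.
final class RegistryCallMonitor {
    private final AtomicLong messagesSent = new AtomicLong();
    private final AtomicLong registryCalls = new AtomicLong();

    void onMessageSent() {
        messagesSent.incrementAndGet();
    }

    void onRegistryCall() {
        registryCalls.incrementAndGet();
    }

    // Falls towards 0 when the cache is hit; stays constant (e.g. 2.0)
    // when every send goes back to the registry. Returns 0.0 before any send.
    double callsPerMessage() {
        long sent = messagesSent.get();
        return sent == 0 ? 0.0 : (double) registryCalls.get() / sent;
    }

    public static void main(String[] args) {
        RegistryCallMonitor monitor = new RegistryCallMonitor();
        for (int i = 0; i < 10; i++) {
            monitor.onMessageSent();
            monitor.onRegistryCall(); // the pattern from the question:
            monitor.onRegistryCall(); // two registry calls per message
        }
        System.out.println("calls per message: " + monitor.callsPerMessage());
    }
}
```

A ratio that stays at a constant value such as 2.0 matches the pattern from the question; with working caching, it should shrink towards zero as the number of messages grows.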
Conclusion
Excessive calls to the Schema Registry when using Spring-Kafka with Avro messages may seem daunting, but by carefully verifying your configuration and observing how your application interacts with the CachedSchemaRegistryClient, you can optimize this process. Checking your versions and configuration while monitoring actual runtime behavior is the key to resolving these issues efficiently.
By following these steps, you can help ensure that your producers use the Schema Registry more efficiently.