Kafka Message Retention vs. Committed Offset Retention

Anshita Singh
Apr 29, 2021

Many of us confuse two important Kafka concepts: Kafka Message Retention and Committed Offset Retention. The two sound similar, but in reality they are not. The first is the retention of the messages in a Kafka topic, whereas the second is the retention of an offset committed by a consumer. FYI, after Kafka server version 0.8, with the new consumer clients, consumer offsets are stored in a special topic on the Kafka brokers called __consumer_offsets.

Here is a brief summary of both:

Kafka Message Retention: A Kafka topic's message retention is set by one of the broker properties (a) log.retention.hours, (b) log.retention.minutes, or (c) log.retention.ms, or by the topic-level property retention.ms. If more than one broker property is set, log.retention.ms takes precedence over log.retention.minutes, which in turn takes precedence over log.retention.hours, and a topic-level retention.ms overrides the broker value for that topic. Other properties, such as log.retention.bytes and log.retention.check.interval.ms, also play a role in retention, but they are out of scope for this discussion. The time-based retention properties specify how long a message is retained in a Kafka topic. Once this duration has elapsed, messages become eligible for deletion. (There is still a good chance that a message remains in the topic after that point, because Kafka deletes whole log segments rather than individual messages, and the active segment is not deleted until it has been rolled.)
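To make the precedence between these properties concrete, here is a minimal sketch in Python. It is not Kafka code: the function name effective_retention_ms and its parameters are made up for illustration, and it only models the time-based properties discussed above (it ignores log.retention.bytes and the special value -1).

```python
from typing import Optional

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE


def effective_retention_ms(
    topic_retention_ms: Optional[int] = None,     # topic-level retention.ms
    log_retention_ms: Optional[int] = None,       # broker log.retention.ms
    log_retention_minutes: Optional[int] = None,  # broker log.retention.minutes
    log_retention_hours: int = 168,               # broker log.retention.hours (default 168)
) -> int:
    """Return the time-based retention, in milliseconds, that applies to a topic.

    Hypothetical helper: a topic-level override wins, then the broker
    properties are checked from the finest unit (ms) to the coarsest (hours).
    """
    if topic_retention_ms is not None:
        return topic_retention_ms
    if log_retention_ms is not None:
        return log_retention_ms
    if log_retention_minutes is not None:
        return log_retention_minutes * MS_PER_MINUTE
    return log_retention_hours * MS_PER_HOUR


# Broker set to 7 days via minutes, one topic overridden to 1 hour.
broker_default = effective_retention_ms(log_retention_minutes=10080)
overridden = effective_retention_ms(
    topic_retention_ms=MS_PER_HOUR, log_retention_minutes=10080
)
```

With nothing set, the helper falls back to the 168-hour default, which is the same 7 days as log.retention.minutes=10080.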

Committed Offset Retention: Once a Kafka consumer starts consuming data from a topic, it commits the offset of its last consumed message to the Kafka brokers' internal topic called __consumer_offsets. This topic lets a consumer identify the offset from which it should resume reading the topic. How long an offset committed by a consumer is retained is decided by the broker property offsets.retention.minutes. Once this retention expires, the committed offset is reset and the consumer can no longer find its last committed offset in the __consumer_offsets topic. What happens next depends on the consumer config auto.offset.reset: with “earliest” the consumer reads all data from the topic, with “latest” it reads only the latest data, and with “none” it raises an error instead of picking a position.
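The decision a consumer faces when its committed offset is gone can be sketched as follows. This is a simplified Python model, not the Kafka client: starting_offset and NoOffsetForPartitionError are hypothetical names (the latter is a stand-in for the exception the real client throws), and the case of a committed offset that is out of range is left out.

```python
from typing import Optional


class NoOffsetForPartitionError(Exception):
    """Hypothetical stand-in for the error raised when auto.offset.reset is 'none'."""


def starting_offset(
    committed: Optional[int],   # None models an expired or missing committed offset
    log_start_offset: int,      # oldest offset still present in the partition
    log_end_offset: int,        # offset the next produced message will get
    auto_offset_reset: str = "latest",
) -> int:
    """Return the offset a consumer would start reading from."""
    if committed is not None:
        return committed
    if auto_offset_reset == "earliest":
        return log_start_offset
    if auto_offset_reset == "latest":
        return log_end_offset
    if auto_offset_reset == "none":
        raise NoOffsetForPartitionError("no committed offset and no reset policy")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset}")


# Committed offset still present: resume from it.
resume = starting_offset(42, log_start_offset=10, log_end_offset=100)
# Committed offset expired: the reset policy decides.
from_beginning = starting_offset(None, 10, 100, auto_offset_reset="earliest")
from_end = starting_offset(None, 10, 100, auto_offset_reset="latest")
```

Note that “earliest” means the oldest message still retained in the topic, so it is itself bounded by the message retention described above.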

Now the question is how they are different from each other:

Let’s see this example:

Assumption:
log.retention.minutes=10080 (7 days)
offsets.retention.minutes=7200 (5 days)
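As a quick sanity check of these two values, the conversion from minutes to days can be written out. The variable names below are only for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440

log_retention_minutes = 10080
offsets_retention_minutes = 7200

log_retention_days = log_retention_minutes / MINUTES_PER_DAY          # 7.0
offsets_retention_days = offsets_retention_minutes / MINUTES_PER_DAY  # 5.0

# Window in which the committed offset has expired but the
# message retention period has not yet passed.
gap_days = log_retention_days - offsets_retention_days  # 2.0
```

This 2-day window is why the example looks at the 6th day (inside the window) and the 8th day (after both periods have passed).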

Current data in __consumer_offsets topic will look like below:

On the 6th day, __consumer_offsets topic will look like below:

Here, we can see that on the 6th day the value for key K1 has not been deleted from the __consumer_offsets topic. The reasons are that (a) the __consumer_offsets topic has the cleanup policy “compact” and compaction has not been triggered yet, and (b) the retention of the __consumer_offsets topic is determined by the cluster-level topic retention config, i.e. log.retention.minutes=10080 (7 days). Because the 5-day offset retention had expired by the 6th day, another record was committed for key K1 with the value NULL, which indicates that the committed offset for K1 has been reset.

Now on the 8th day, __consumer_offsets topic will look like below:

It shows that the values for K1, K2, and K3 have all been reset. Since log.retention.minutes has now passed, the old values in the topic will be compacted away the next time the log cleaner thread runs, because the __consumer_offsets topic has cleanup.policy set to “compact”.
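The effect of compaction in this example can be imitated with a toy model. The sketch below is not how the Kafka log cleaner is implemented; it only illustrates the idea of keeping the latest record per key. The keys K1 to K3 come from the example above, while the offset values (100, 250, 75) and the function name compact are made up for illustration, and None plays the role of the NULL value.

```python
from typing import Dict, List, Optional, Tuple

# (key, value): value is a committed offset, or None for a "reset" (NULL) record
Record = Tuple[str, Optional[int]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, preserving log order."""
    last_index: Dict[str, int] = {}
    for i, (key, _) in enumerate(log):
        last_index[key] = i
    return [record for i, record in enumerate(log) if last_index[record[0]] == i]


# Hypothetical committed offsets; the numbers are invented for this sketch.
log: List[Record] = [("K1", 100), ("K2", 250), ("K3", 75)]

# Day 6: the offset for K1 has expired, so a NULL record is appended for K1.
# The old value ("K1", 100) is still in the log because nothing has compacted it.
log.append(("K1", None))
day6_before_compaction = list(log)

# Day 8: K2 and K3 have been reset as well, and the cleaner has run.
log.append(("K2", None))
log.append(("K3", None))
day8_after_compaction = compact(log)
```

After compaction only the NULL records remain, one per key. In a real cluster such NULL records are themselves removed later, which this toy model does not show.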

So in conclusion, Kafka Message Retention describes how long messages are retained in a Kafka topic, whereas Committed Offset Retention determines when an offset committed by a consumer is reset. After that reset, the old offset is eventually deleted/compacted by the cleaner threads because of the compact policy of the __consumer_offsets topic.
