A practical Spring Boot Kafka debugging story about stuck consumers, misleading restarts, and why environment-specific consumer group IDs matter.
Messages were being published to Apache Kafka. The Spring Boot service was running. The consumer did not crash.
But messages were not being processed reliably.
At first, it looked like a Kafka stability issue. The logs showed consumer group rebalancing. Restarting the service helped for a while. Clearing messages seemed to help too.
The real problem was much smaller.
Dev and staging were using the same Kafka consumer group ID.
Kafka was not broken. Spring Boot was not broken. Kafka was doing exactly what we configured it to do.
The Symptom
The first symptom was simple.
Kafka messages were being sent, but the consumer was not processing them consistently.
Testers reported that queued messages were not being picked up. The service did not crash. There was no clear application error. There was no obvious stack trace pointing to the root cause.
The Spring Boot Kafka logs showed repeated group rejoin behavior.
For example, the logs looked similar to this:
INFO o.a.k.c.c.internals.AbstractCoordinator :
[Consumer clientId=order-service-consumer-1, groupId=order-service]
Attempt to heartbeat failed since group is rebalancing
INFO o.a.k.c.c.internals.ConsumerCoordinator :
[Consumer clientId=order-service-consumer-1, groupId=order-service]
Revoking previously assigned partitions order-created-0
INFO o.a.k.c.c.internals.AbstractCoordinator :
[Consumer clientId=order-service-consumer-1, groupId=order-service]
Successfully joined group with generation Generation{generationId=42}
This kind of log is easy to misread.
It tells you the consumer is joining or rejoining the group, but it does not directly say:
“Another environment is using the same group ID.”
So the first reaction was to treat it like a Kafka consumer stability issue.
The Misleading Workarounds
Restarting the Spring Boot service helped sometimes.
After a restart, the consumer started processing again. Testers could continue testing. The team got some breathing room.
But the issue came back.
Clearing messages from the Kafka topic also seemed to help sometimes.
That made the issue even more confusing.
If restarting worked, maybe the service was stuck.
If clearing the queue worked, maybe one bad message blocked processing.
If Kafka logs showed group rejoining, maybe Kafka itself had a broker or coordinator problem.
Each theory sounded possible.
But none of them explained why the same issue kept happening.
What We Checked First
Before finding the real cause, we checked the usual things.
A developer should usually start with these areas:
- Application logs
- Kafka consumer logs
- Message payloads
- Topic and partition state
- Consumer lag
- Service restart behavior
- Recent deployments
- Failed message handling
- Network or broker connectivity
That is a reasonable debugging path.
For example, if the consumer code throws an exception for every message, the consumer may appear stuck. If consumer lag keeps growing, the application may be too slow or not consuming at all.
But in this case, the code was not the real issue.
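Consumer lag, one of the checks above, is just the gap between the last offset written to a partition and the last offset the group committed. Here is a minimal sketch of that arithmetic in plain Java. The class and method names are made up for illustration, and treating a missing committed offset as 0 is a simplifying assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConsumerLagSketch {

    /** Lag for one partition: records written but not yet committed by the group. */
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    /** Sums the lag over all partitions. Both maps are keyed by partition number. */
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0L;
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // Assumption: a partition without a committed offset counts from 0.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            total += lag(entry.getValue(), committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEnd = new LinkedHashMap<>();
        logEnd.put(0, 120L);
        logEnd.put(1, 80L);
        Map<Integer, Long> committed = new LinkedHashMap<>();
        committed.put(0, 100L);
        committed.put(1, 80L);
        System.out.println("total lag = " + totalLag(logEnd, committed));
    }
}
```

Lag is tracked per group ID, so two environments that share a group also share one set of committed offsets.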
A Simple Spring Boot Kafka Consumer
Here is a simple Spring Boot Kafka consumer:
package com.example.orders;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
@Component
public class OrderCreatedConsumer {
    @KafkaListener(topics = "order-created")
    public void consume(String message) {
        System.out.println("Received message: " + message);
    }
}
This consumer listens to the order-created topic.
When a message arrives, Spring Kafka calls the consume method. In a real project, the developer would usually parse the message, validate it, call business logic, and update a database.
The important part is that this code does not show the consumer group ID.
That value usually comes from configuration.
The Bad Configuration
Here was the dangerous configuration:
spring.kafka.consumer.group-id=order-service
At first, this looks normal.
The service is called order-service, so the group ID is also order-service.
The problem appears when multiple environments share the same Kafka broker and the same topic.
For example:
Dev service     -> topic: order-created -> group.id: order-service
Staging service -> topic: order-created -> group.id: order-service
In this scenario, Kafka does not know that dev and staging are separate environments.
Kafka only sees two consumers with the same group ID.
So Kafka treats them as members of the same logical consumer group.
The Real Root Cause
The root cause was painfully simple.
Dev and staging used the same Kafka server.
They also used the same Kafka topic.
And they used the same Kafka consumer group ID.
That means Kafka saw both consumers as part of the same group.
So the environments started competing for messages.
Sometimes dev received messages that staging was expected to process. Sometimes staging was affected after dev was deployed. Sometimes the consumers kept rejoining the group because deployments and restarts triggered rebalances.
Kafka was doing exactly what it was configured to do.
The bug was not in Kafka.
The bug was unclear environment isolation.
Why Consumer Group IDs Matter In Kafka
Kafka uses group.id to identify a group of consumers that work together.
Consumers in the same group share topic partitions.
For example, if a topic has three partitions and one consumer group has three consumers, Kafka may assign one partition to each consumer.
That is useful when the consumers are part of the same application environment.
But it becomes dangerous when unrelated environments share the same group ID.
For example:
Topic: order-created
Consumer group: order-service
Members:
- dev-order-service
- staging-order-service
From Kafka’s point of view, this is one group.
From the team’s point of view, these are two different environments.
That mismatch causes the problem.
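To make the mismatch concrete, here is a small in-memory model of a consumer group. It is not Kafka's real coordinator or assignor: the SharedGroupSketch class, the round-robin assignment, and the generation counter are simplifying assumptions. It only shows how a second member with the same group ID takes over part of the partitions and triggers a new generation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class SharedGroupSketch {
    private final int partitionCount;
    private final TreeSet<String> members = new TreeSet<>();
    private int generation = 0;

    SharedGroupSketch(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    /** A membership change is modeled as a rebalance, which bumps the generation. */
    void join(String memberId) {
        if (members.add(memberId)) {
            generation++;
        }
    }

    void leave(String memberId) {
        if (members.remove(memberId)) {
            generation++;
        }
    }

    int generation() {
        return generation;
    }

    /** Spreads partitions over the current members in round-robin order (sorted by member ID). */
    Map<String, List<Integer>> assignment() {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String member : members) {
            result.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return result;
        }
        List<String> order = new ArrayList<>(members);
        for (int partition = 0; partition < partitionCount; partition++) {
            result.get(order.get(partition % order.size())).add(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        SharedGroupSketch group = new SharedGroupSketch(3);
        group.join("staging-order-service");
        System.out.println("generation " + group.generation() + ": " + group.assignment());
        group.join("dev-order-service"); // same group ID, different environment
        System.out.println("generation " + group.generation() + ": " + group.assignment());
    }
}
```

With three partitions, staging alone owns all of them. As soon as dev joins with the same group ID, dev owns partitions 0 and 2 in this model and staging is left with partition 1.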
The Fix
The fix is to use environment-specific Kafka consumer group IDs.
For example:
# application-dev.properties
spring.kafka.consumer.group-id=order-service-dev
# application-staging.properties
spring.kafka.consumer.group-id=order-service-staging
# application-prod.properties
spring.kafka.consumer.group-id=order-service-prod
Now each environment has its own Kafka consumer identity.
Dev consumers belong to order-service-dev.
Staging consumers belong to order-service-staging.
Production consumers belong to order-service-prod.
Even if they share the same Kafka broker, they no longer accidentally join the same consumer group.
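One way to keep the naming consistent is to build the group ID from the service name and the environment instead of typing it by hand. A minimal sketch, assuming the three environment names used in this article. GroupIdNaming and groupIdFor are hypothetical names, not part of Spring or Kafka.

```java
import java.util.Locale;
import java.util.Set;

final class GroupIdNaming {
    // Assumption: these are the only environments, matching the profiles in this article.
    private static final Set<String> KNOWN_ENVIRONMENTS = Set.of("dev", "staging", "prod");

    /** Builds "<service>-<environment>" and rejects environments that are not on the list. */
    static String groupIdFor(String serviceName, String environment) {
        String env = environment.trim().toLowerCase(Locale.ROOT);
        if (!KNOWN_ENVIRONMENTS.contains(env)) {
            throw new IllegalArgumentException("Unknown environment: " + environment);
        }
        return serviceName + "-" + env;
    }

    public static void main(String[] args) {
        System.out.println(groupIdFor("order-service", "staging"));
    }
}
```

Rejecting unknown environment names means a typo fails at startup instead of quietly creating a new group.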
Spring Profile Setup
In Spring Boot, you can manage this with profiles.
Example project structure:
src/main/resources/
  application.properties
  application-dev.properties
  application-staging.properties
  application-prod.properties
Base configuration:
# application.properties
spring.kafka.bootstrap-servers=${KAFKA_BOOTSTRAP_SERVERS}
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
Dev configuration:
# application-dev.properties
spring.kafka.consumer.group-id=order-service-dev
Staging configuration:
# application-staging.properties
spring.kafka.consumer.group-id=order-service-staging
Production configuration:
# application-prod.properties
spring.kafka.consumer.group-id=order-service-prod
The base file contains shared Kafka settings.
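Each environment file contains the environment-specific group ID. This keeps the common configuration reusable while still keeping consumer identities separate. One side effect to plan for when renaming a group: the new group has no committed offsets, so with auto-offset-reset=earliest it starts reading from the oldest retained message on the topic.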
Running With A Spring Profile
You can run the dev profile locally like this:
java -jar order-service.jar --spring.profiles.active=dev
For staging:
java -jar order-service.jar --spring.profiles.active=staging
For production:
java -jar order-service.jar --spring.profiles.active=prod
The active profile decides which application-{profile}.properties file Spring Boot loads on top of application.properties.
This makes it easier to avoid accidentally using the same group ID everywhere.
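The layering can be pictured as two property sets merged in order, with the profile file applied last. The sketch below uses only java.util.Properties to show that idea. It is a simplified model and not how Spring Boot actually resolves configuration, which involves more sources such as environment variables and command-line arguments. ProfileOverlaySketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class ProfileOverlaySketch {

    /** Parses text in .properties format. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Returns the base settings with the profile-specific values layered on top. */
    static Properties effective(Properties base, Properties profile) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.putAll(profile); // profile values win when a key exists in both
        return merged;
    }

    public static void main(String[] args) {
        Properties base = parse("spring.kafka.consumer.auto-offset-reset=earliest\n");
        Properties staging = parse("spring.kafka.consumer.group-id=order-service-staging\n");
        Properties result = effective(base, staging);
        System.out.println(result.getProperty("spring.kafka.consumer.group-id"));
        System.out.println(result.getProperty("spring.kafka.consumer.auto-offset-reset"));
    }
}
```

If the base file also defined a group ID, the profile value would replace it in this model. If no profile were active, the base value would be used everywhere, which is how a shared group ID can slip in.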
Alternative Configuration With Environment Variables
Another common option is to inject the group ID from an environment variable.
spring.kafka.consumer.group-id=${KAFKA_CONSUMER_GROUP_ID}
Then configure each environment differently.
Dev:
KAFKA_CONSUMER_GROUP_ID=order-service-dev
Staging:
KAFKA_CONSUMER_GROUP_ID=order-service-staging
Production:
KAFKA_CONSUMER_GROUP_ID=order-service-prod
This approach works well with Docker, Kubernetes, Helm, and CI/CD systems.
The important part is that the deployment pipeline must clearly set the value for each environment.
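If the value comes from the environment, it helps to fail loudly when it is missing. Below is a minimal sketch of such a check in plain Java. GroupIdFromEnvironment is a hypothetical helper, and it takes the environment as a Map so it can be exercised without changing real variables.

```java
import java.util.Map;

final class GroupIdFromEnvironment {
    static final String VARIABLE = "KAFKA_CONSUMER_GROUP_ID";

    /** Returns the configured group ID or throws if the variable is missing or blank. */
    static String resolve(Map<String, String> environment) {
        String value = environment.get(VARIABLE);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(
                    VARIABLE + " is not set; refusing to start without an explicit group ID");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            System.out.println("Kafka consumer group ID: " + resolve(System.getenv()));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

An explicit check like this keeps a forgotten pipeline variable from turning into a silent default.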
Example Docker Compose Configuration
For local or dev testing, the configuration may look like this:
services:
  order-service:
    image: order-service:latest
    environment:
      SPRING_PROFILES_ACTIVE: dev
      KAFKA_BOOTSTRAP_SERVERS: kafka:9092
      KAFKA_CONSUMER_GROUP_ID: order-service-dev
This makes the environment identity explicit.
The service does not need to know where it is running from code. The deployment configuration provides that information.
Example Helm Values
In Kubernetes, you may use Helm values like this.
Dev values:
springProfile: dev
kafka:
  bootstrapServers: kafka.shared.svc.cluster.local:9092
  consumerGroupId: order-service-dev
Staging values:
springProfile: staging
kafka:
  bootstrapServers: kafka.shared.svc.cluster.local:9092
  consumerGroupId: order-service-staging
Then the deployment template can pass the values as environment variables:
env:
  - name: SPRING_PROFILES_ACTIVE
    value: "{{ .Values.springProfile }}"
  - name: KAFKA_BOOTSTRAP_SERVERS
    value: "{{ .Values.kafka.bootstrapServers }}"
  - name: KAFKA_CONSUMER_GROUP_ID
    value: "{{ .Values.kafka.consumerGroupId }}"
This is useful when multiple environments share the same Kafka infrastructure.
The rule is simple:
Shared broker can be acceptable. Shared consumer identity is not.
How We Confirmed The Issue
The team confirmed the issue by comparing Kafka consumer groups and deployment configuration.
The key checks were:
kafka-consumer-groups.sh \
  --bootstrap-server kafka:9092 \
  --list
Then we described the suspicious group:
kafka-consumer-groups.sh \
  --bootstrap-server kafka:9092 \
  --describe \
  --group order-service
The result showed that consumers from more than one environment were using the same group.
That explained the repeated rebalancing and inconsistent processing behavior.
In this scenario, the issue was not that messages disappeared. The issue was that the wrong environment could become part of the same group and affect partition assignment.
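The comparison itself is simple once you know which host belongs to which environment. A rough sketch, assuming you copy the member hosts from the describe output and maintain your own host-to-environment mapping. The addresses below are hypothetical placeholders, not values from the real incident, and GroupMembershipCheck is a made-up name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class GroupMembershipCheck {

    /** Maps each member host to its environment. Hosts that are not in the mapping become "unknown". */
    static Set<String> environmentsInGroup(List<String> memberHosts,
                                           Map<String, String> hostToEnvironment) {
        Set<String> environments = new TreeSet<>();
        for (String host : memberHosts) {
            environments.add(hostToEnvironment.getOrDefault(host, "unknown"));
        }
        return environments;
    }

    /** A group is suspicious when its members come from more than one environment. */
    static boolean isSharedAcrossEnvironments(Set<String> environments) {
        return environments.size() > 1;
    }

    public static void main(String[] args) {
        // Hypothetical addresses, for illustration only.
        Map<String, String> hostToEnvironment = Map.of(
                "10.0.1.15", "dev",
                "10.0.2.27", "staging");
        List<String> memberHosts = List.of("10.0.1.15", "10.0.2.27");

        Set<String> environments = environmentsInGroup(memberHosts, hostToEnvironment);
        System.out.println("environments in group: " + environments);
        System.out.println("shared: " + isSharedAcrossEnvironments(environments));
    }
}
```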
Expected Result After The Fix
After separating the consumer group IDs, the behavior became predictable again.
Dev deployments no longer affected staging.
Staging consumers processed their own messages.
Kafka consumers stopped competing across environments.
Restarting the service was no longer needed as a workaround.
Queue processing became easier to reason about.
The system did not need a complex code change. It needed a clearer configuration boundary.
Debugging Checklist
When a Spring Boot Kafka consumer is not picking up messages, check these questions:
- Are multiple environments sharing the same Kafka broker?
- Are they using the same topic?
- Are they using the same consumer group ID?
- Are consumers constantly rebalancing?
- Did the issue start after deploying another environment?
- Does a restart fix the issue only temporarily?
- Are offsets being committed under the expected group ID?
- Is the active Spring profile correct?
- Are environment variables different between dev, staging, and production?
This checklist is useful because Kafka issues are often configuration issues before they are code issues.
Practical Notes
Sharing a Kafka broker across environments is possible, but you need clear boundaries.
If dev and staging share the same topic and the same consumer group ID, Kafka will treat them as one group.
Restarting can hide the real issue because it forces a rebalance.
Clearing queues can create false confidence because it changes the state of the problem without fixing the configuration.
Do not reuse production group IDs in dev or staging.
Treat group.id as an application identity, not a random string.
For production systems, also consider separating topics by environment.
For example:
order-created-dev
order-created-staging
order-created-prod
This gives stronger isolation.
A common production setup is to separate both topic names and group IDs per environment.
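The reason topic separation is stronger: separate group IDs on a shared topic stop the competition, but each group keeps its own offsets, so every environment still reads every message. The sketch below models one partition as a list with an offset per group ID. TopicLogSketch is a hypothetical in-memory model, not Kafka code, and starting new groups at offset 0 mirrors the earliest setting shown earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopicLogSketch {
    private final List<String> records = new ArrayList<>();
    // Each group ID tracks its own position in the log.
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    void append(String record) {
        records.add(record);
    }

    /** Returns everything the given group has not read yet and advances its offset. */
    List<String> poll(String groupId) {
        int from = committedOffsets.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(records.subList(from, records.size()));
        committedOffsets.put(groupId, records.size());
        return batch;
    }

    public static void main(String[] args) {
        TopicLogSketch orderCreated = new TopicLogSketch();
        orderCreated.append("order-1");
        orderCreated.append("order-2");

        // Two groups on the same topic: both receive both records.
        System.out.println("dev:     " + orderCreated.poll("order-service-dev"));
        System.out.println("staging: " + orderCreated.poll("order-service-staging"));
        // A second poll by the same group returns nothing new.
        System.out.println("dev:     " + orderCreated.poll("order-service-dev"));
    }
}
```

If dev should never see staging traffic at all, the groups need separate topics as well.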
Recommended Naming Pattern
Use environment-specific naming for Kafka resources.
For example:
Topic: order-created-staging
Consumer group: order-service-staging
Also add configuration checks during deployment.
For example, a CI/CD step can fail the deployment if staging tries to use a production group ID.
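Such a check can be very small. Here is a sketch that takes the configured group ID per environment and reports any value used more than once. How the map is filled (Helm values, property files, pipeline variables) is left open, and GroupIdCollisionCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupIdCollisionCheck {

    /** Returns one message per group ID that is configured for more than one environment. */
    static List<String> findCollisions(Map<String, String> groupIdByEnvironment) {
        Map<String, List<String>> environmentsByGroupId = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : groupIdByEnvironment.entrySet()) {
            environmentsByGroupId
                    .computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                    .add(entry.getKey());
        }
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : environmentsByGroupId.entrySet()) {
            if (entry.getValue().size() > 1) {
                problems.add("group.id '" + entry.getKey() + "' is shared by " + entry.getValue());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> configured = new LinkedHashMap<>();
        configured.put("dev", "order-service");
        configured.put("staging", "order-service");
        configured.put("prod", "order-service-prod");

        List<String> problems = findCollisions(configured);
        problems.forEach(System.err::println);
        // A real pipeline step would exit with a non-zero status when problems is not empty.
    }
}
```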
You can also log the active Kafka group ID during application startup:
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class KafkaStartupLogger {
    @Bean
    ApplicationRunner logKafkaGroupId(
            @Value("${spring.kafka.consumer.group-id}") String groupId) {
        return args -> System.out.println("Kafka consumer group ID: " + groupId);
    }
}
This small startup log can save debugging time.
It makes the active group ID visible when the service starts, especially in container logs.
Conclusion
Kafka was not refusing to process messages.
Kafka was following the consumer group ID it was given.
The real problem was that dev and staging had the same Kafka identity. Once both environments used separate consumer group IDs, the confusing behavior stopped.
This bug looked bigger than it was because there was no dramatic crash, no clear error, and no single stack trace pointing to the cause.
The fix was not a code rewrite. It was better environment isolation.
A small configuration value like group.id can quietly merge two environments that were supposed to stay separate.



