Why Spring Boot Kafka Consumers Generate Duplicate Reports

A practical Spring Kafka guide for duplicate report jobs, acknowledgments, retries, and idempotency

Kafka can deliver the same message more than once.

That is normal.

The real problem starts when a Spring Boot report worker treats every delivery as a brand-new job.

This article walks through how I would debug a Spring Boot application using spring-kafka when it sometimes generates the same report twice.

Step 1: Start With Kafka’s Delivery Guarantee

Kafka consumers usually give you at-least-once delivery.

That means a message can be processed more than once if the consumer crashes, times out, loses its partition, or fails to commit the offset after doing the work.

For report generation, duplicate delivery can become duplicate output:

report_123.pdf generated
report_123.pdf generated again
email sent twice
billing export created twice

Kafka is not broken when this happens. The Spring Kafka consumer needs to be designed for it.

Step 2: Understand Spring Kafka Offset Commits

Offset commits are one of the easiest places to create accidental duplicate work.

Kafka does not commit “the message.” A consumer group commits the next offset it should read for each partition.

For example, if the listener successfully processes this record:

topic=report-jobs partition=2 offset=10491

The committed offset for that partition should become:

topic=report-jobs partition=2 committed_offset=10492

That means “start from 10492 next time.”

In Spring Kafka, you normally do not call consumer.commitSync() directly inside an @KafkaListener. You configure the listener container and acknowledge the record after the business operation is safe.

A safe manual-ack flow usually looks like this:

Poll record at offset 10491
Spring Kafka invokes @KafkaListener
Write durable business state
Generate or confirm report output
Call acknowledgment.acknowledge()
Container commits offset 10492 according to the ack mode

A risky flow looks like this:

Listener receives offset 10491
Generate report
Application crashes
Acknowledgment is never called
Offset 10492 is never committed
Application restarts from offset 10491
Report is attempted again

For slow report jobs, I usually prefer manual acknowledgment:

spring:
  kafka:
    consumer:
      group-id: report-generator
      enable-auto-commit: false
      auto-offset-reset: earliest
      max-poll-records: 5
      properties:
        max.poll.interval.ms: 900000
        session.timeout.ms: 30000
        heartbeat.interval.ms: 10000
    listener:
      ack-mode: manual_immediate

Then the listener acknowledges only after the durable work is safe:

@KafkaListener(
    topics = "report-jobs",
    groupId = "report-generator"
)
public void handleReportJob(
    ReportRequest request,
    Acknowledgment acknowledgment,
    @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
    @Header(KafkaHeaders.RECEIVED_PARTITION) int partition,
    @Header(KafkaHeaders.OFFSET) long offset
) {
    reportService.processReport(request, topic, partition, offset);

    acknowledgment.acknowledge();
}

The important part is not the method call itself. The significant part is where it happens.

Call acknowledge() after the report is completed or after the report job is safely stored in a durable system that can finish or retry it.

Do not acknowledge before the business state is safe.

Step 3: Check Long-Running Processing Time

Long-running jobs are risky because Kafka consumers are expected to keep polling.

A report duration metric may look like this:

report_generation_duration_seconds{report_type="monthly_sales"} 420

This report took 420 seconds, or 7 minutes.

Compare that number with:

  • max.poll.interval.ms
  • max.poll.records
  • worst-case processing time per record
  • Spring Kafka listener concurrency
  • rebalance frequency
  • offset commit failures
  • application shutdown behavior

A common mistake is thinking session.timeout.ms is the only timeout that matters.

In modern Kafka clients, heartbeats can continue in the background. But the application still needs to call poll() often enough to stay within max.poll.interval.ms. If processing takes longer than that interval, the consumer can be removed from the group and its partitions can be assigned elsewhere.

That can produce this sequence:

Pod A receives offset 10491
Pod A spends too long generating the report
Pod A exceeds max.poll.interval.ms
Kafka triggers a rebalance
Pod B receives partition 2
Pod B reads offset 10491 again

For long jobs in Spring Boot, consider one of these designs:

  • Keep max.poll.records small.
  • Increase max.poll.interval.ms with care.
  • Use listener concurrency intentionally, not accidentally.
  • Pause partitions while work is in progress if your design supports it.
  • Let the Kafka listener create a durable database job quickly, then acknowledge after the job is safely stored.
  • Move heavy report generation to a separate worker pool that reads from the database job table.

The last design is frequently better for production. Kafka remains the ingestion layer, while the database and worker pool handle long-running execution.

Step 4: Use a Spring Kafka Container Factory Deliberately

If you want explicit control, define the listener container factory instead of relying on scattered defaults.

Example:

@Bean
ConcurrentKafkaListenerContainerFactory<String, ReportRequest> reportKafkaListenerContainerFactory(
    ConsumerFactory<String, ReportRequest> consumerFactory,
    DefaultErrorHandler errorHandler
) {
    ConcurrentKafkaListenerContainerFactory<String, ReportRequest> factory =
        new ConcurrentKafkaListenerContainerFactory<>();

    factory.setConsumerFactory(consumerFactory);
    factory.setCommonErrorHandler(errorHandler);
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);

    return factory;
}

Then attach the listener to that factory:

@KafkaListener(
    topics = "report-jobs",
    groupId = "report-generator",
    containerFactory = "reportKafkaListenerContainerFactory"
)
public void handleReportJob(ReportRequest request, Acknowledgment ack) {
    reportService.processReport(request);
    ack.acknowledge();
}

Spring Kafka supports multiple acknowledgment modes. For report jobs, the main ones to understand are:

  • RECORD: commit after the listener returns for each record.
  • BATCH: commit after all records from the previous poll are processed.
  • MANUAL: listener acknowledges, but commits are queued by the container.
  • MANUAL_IMMEDIATE: listener acknowledges, and the container can commit immediately when called on the consumer thread.

For long-running report jobs, avoid hiding commit behavior. Make the acknowledgment strategy obvious in code and configuration.

Step 5: Look for Rebalances

Rebalances are normal, but frequent rebalances are a warning sign.

A rebalance can happen when:

  • a consumer crashes
  • a consumer stops polling for too long
  • a new pod joins the group
  • a pod leaves the group during deployment
  • the broker thinks a consumer is unhealthy

If rebalances happen while reports are still running, another Spring Boot instance may receive the same partition and read from the last committed offset.

Look for logs like:

Revoking previously assigned partitions
Lost partition report-jobs-2
Assigned partitions report-jobs-2
Member removed from group

If these logs line up with duplicate report generation, your consumer lifecycle is part of the bug.

Step 6: Check Spring Boot Shutdown Behavior

A bad shutdown path can create duplicates.

For example:

Spring Kafka listener receives report job
Listener starts generating report
Deployment begins
Pod receives SIGTERM
Report finishes
Application exits before acknowledgment/commit completes
New pod starts
Same offset is consumed again

A safer shutdown path should stop polling new messages, finish or safely checkpoint current work, acknowledge only after safe completion, and then let the listener container stop cleanly.

For Kubernetes workers, check:

  • termination grace period
  • preStop hooks
  • Spring Boot graceful shutdown settings
  • maximum report duration
  • whether the pod is killed before the offset commit completes

Example Spring Boot shutdown settings:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 60s

These settings do not magically make long reports safe. They only give the application more time to stop cleanly. Your idempotency design still matters.

Step 7: Test Duplicate Delivery Directly

Do not only test the happy path.

Force duplicate delivery and check the business result.

One useful test is:

1. Produce report_id=report_123
2. Let the Spring Kafka listener generate the report
3. Kill the application before acknowledgment
4. Restart the application
5. Confirm the same Kafka message is read again
6. Confirm the report is not generated twice

Expected logs should look more like this:

INFO report job completed report_id=report_123 offset=10491
WARN duplicate report job skipped report_id=report_123 existing_status=completed offset=10491
INFO acknowledged report job topic=report-jobs partition=2 offset=10491 committed_offset=10492

The important assertion is not just “the offset appeared twice.” The important assertion is “the duplicate message did not duplicate the report.”

Step 8: Make Report Generation Idempotent

This is the most important fix.

Even with perfect Spring Kafka settings, a consumer should assume the same message can be delivered more than once.

The goal is not to make duplicate delivery impossible. The goal is to make duplicate delivery harmless.

For report generation, use a durable idempotency guard in the database.

Example table:

CREATE TABLE report_jobs (
  report_id VARCHAR(100) PRIMARY KEY,
  user_id VARCHAR(100) NOT NULL,
  report_type VARCHAR(100) NOT NULL,
  status VARCHAR(30) NOT NULL,
  attempt_count INTEGER NOT NULL DEFAULT 0,
  worker_id VARCHAR(100),
  lease_until TIMESTAMP,
  started_at TIMESTAMP,
  completed_at TIMESTAMP,
  output_url TEXT,
  last_error TEXT,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

For PostgreSQL, the first guard can be an insert like this:

INSERT INTO report_jobs (
  report_id,
  user_id,
  report_type,
  status,
  attempt_count,
  started_at
)
VALUES (
  :report_id,
  :user_id,
  :report_type,
  'processing',
  1,
  CURRENT_TIMESTAMP
)
ON CONFLICT (report_id) DO NOTHING;

Then fetch the row and decide what to do.

Example Spring service flow:

@Transactional
public ReportJobDecision reserveReportJob(ReportRequest request, String workerId) {
    reportJobRepository.insertIfAbsent(request.reportId(), request.userId(), request.reportType());

    ReportJob job = reportJobRepository.findByIdForUpdate(request.reportId())
        .orElseThrow();

    if (job.isCompleted()) {
        return ReportJobDecision.alreadyCompleted(job);
    }

    if (job.isProcessing() && job.hasLiveLease(clock.instant())) {
        return ReportJobDecision.activeWorkerExists(job);
    }

    if (job.isRetryable(clock.instant())) {
        reportJobRepository.acquireLease(
            job.reportId(),
            workerId,
            clock.instant().plus(Duration.ofMinutes(15))
        );
        return ReportJobDecision.acquired(job);
    }

    return ReportJobDecision.notRetryable(job);
}

The listener can then decide whether to acknowledge:

@KafkaListener(
    topics = "report-jobs",
    groupId = "report-generator",
    containerFactory = "reportKafkaListenerContainerFactory"
)
public void handleReportJob(
    ReportRequest request,
    Acknowledgment ack,
    @Header(KafkaHeaders.OFFSET) long offset
) {
    ReportJobDecision decision = reportService.reserveReportJob(request, workerId);

    if (decision.isAlreadyCompleted()) {
        log.warn("duplicate report job skipped report_id={} offset={}",
            request.reportId(), offset);
        ack.acknowledge();
        return;
    }

    if (decision.hasActiveWorker()) {
        log.info("report job already leased report_id={} offset={}",
            request.reportId(), offset);
        return;
    }

    reportService.generateReport(request);
    reportService.markCompleted(request.reportId());

    ack.acknowledge();
}

Be careful with processing jobs.

If another worker is actively processing the report, starting a second copy creates duplicate work. But blindly acknowledging the Kafka record can also be risky if no durable retry path exists.

A safer pattern is to use a database lease:

worker_id=worker-7
lease_until=2026-05-05T10:30:00Z

If the worker crashes, the lease eventually expires and another worker can retry the job.

The database should be the source of truth for whether a report is new, processing, completed, failed, or retriable.

This protects the business operation when Kafka delivers the same record again.

Step 9: Configure Spring Kafka Retries and DLQ Clearly

Retries should be intentional.

A vague retry strategy creates duplicate work, blocked partitions, and confusing logs.

Common options:

  • Retry briefly in the listener container for transient errors.
  • Store failed state in the database.
  • Publish retriable failures to a retry topic.
  • Publish poison messages to a dead-letter topic.

Be careful with listener retries. Kafka preserves ordering within a partition. If one message keeps failing and the listener keeps retrying it for several minutes, later messages in that same partition may be blocked.

Spring Kafka commonly uses DefaultErrorHandler with a BackOff. A DLQ can be handled with DeadLetterPublishingRecoverer.

Example:

@Bean
DefaultErrorHandler reportErrorHandler(KafkaTemplate<Object, Object> kafkaTemplate) {
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
        kafkaTemplate,
        (record, exception) -> new TopicPartition(
            record.topic() + ".DLT",
            record.partition()
        )
    );

    FixedBackOff backOff = new FixedBackOff(1_000L, 3L);

    return new DefaultErrorHandler(recoverer, backOff);
}

A retry and DLQ flow may look like this:

report-jobs -> Spring Kafka listener -> retry attempts -> report-jobs.DLT

A dead-letter event or DLQ record should include enough context to debug the failure. You can include this as payload fields or as headers, depending on your topic contract:

{
  "original_topic": "report-jobs",
  "original_partition": 2,
  "original_offset": 10491,
  "consumer_group": "report-generator",
  "report_id": "report_123",
  "attempt": 4,
  "error_type": "InvalidReportType",
  "error_message": "Unknown report type: monthly_sales_v9",
  "failed_at": "2026-05-05T10:15:00Z"
}

Do not put sensitive report contents, access tokens, signed URLs, or private user data in retry or DLQ messages unless the topic is protected and retention is intentional.

A poison message is a message that will not succeed without human or code intervention. Examples include:

  • deleted user
  • invalid report type
  • missing required file
  • unsupported schema version
  • unauthorized tenant access

Retries should answer three questions:

Should this error be retried?
How many times?
Where can an operator see the final failure?

Step 10: Add Alerts That Catch the Real Problem

Offset lag alone is not enough.

For duplicate report generation, I would alert on:

rebalance_count > normal_baseline * 3 over 10 minutes
max_poll_interval_exceeded_count > 0
offset_commit_failure_count > 0
duplicate_report_attempt_count > 0
report_job_lease_expired_count > 0
dlt_record_count > 0

Also log the Kafka coordinates with every report attempt:

report_id=report_123 topic=report-jobs partition=2 offset=10491 attempt=2 status=duplicate_skipped

Spring Kafka makes these coordinates available through ConsumerRecord or listener headers such as KafkaHeaders.RECEIVED_TOPIC, KafkaHeaders.RECEIVED_PARTITION, and KafkaHeaders.OFFSET.

Without topic, partition, offset, and report ID in the same log line, duplicate debugging becomes guesswork.

Security Note

Report jobs often contain sensitive identifiers.

Be careful with:

  • user IDs
  • account IDs
  • tenant IDs
  • date ranges
  • report filters
  • export URLs
  • signed URLs
  • PII in report parameters
  • access tokens or authorization context

Keep sensitive payloads out of retry topics, DLT topics, screenshots, and logs unless those systems have the right access controls and retention policy.

Official References

For exact behavior, check the documentation for Kafka and Spring Kafka:

Spring Kafka versions differ. Acknowledgment modes, error handlers, retry configuration, and DLT behavior can change across major versions, so check the reference docs for the version used by your Spring Boot release.

Conclusion

The fix is not “make Kafka never send duplicates.”

Kafka consumers should assume duplicate delivery can happen.

For this Spring Boot report worker, the safer design is:

Receive Kafka message in @KafkaListener
Create or find durable report job
Use database state and leases to prevent duplicate work
Generate the report once
Mark the job completed
Call acknowledgment.acknowledge()
Let Spring Kafka commit the next offset

The main lesson is that Spring Kafka acknowledgments protect Kafka progress, but idempotency protects the business operation.

For report generation, the database should decide whether the report is new, already completed, actively leased, failed, or retriable. Once that state is durable, duplicate Kafka delivery becomes something your Spring Boot application can handle instead of something that surprises you in production.