Kafka Lag vs DB Writes — Real Performance Test

Executive Summary

A controlled benchmark was conducted to determine whether Kafka consumer lag was caused by Kafka itself or by downstream database performance.

Using:

  • Apache Kafka
  • Spring Boot
  • PostgreSQL

We produced 50,000 messages per test and measured lag drain time under increasing payload size and database indexing conditions.

Results demonstrate that database write latency directly impacts Kafka lag and system throughput.


Architecture

Producer → Kafka (1 partition) → Consumer → PostgreSQL

The consumer is synchronous, processing each step in order:

Poll → Insert → Commit offset
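
The ordering above can be modeled in a few lines. This is a minimal sketch, not the benchmark's actual consumer: Source and Sink are hypothetical interfaces standing in for the Kafka consumer and the PostgreSQL insert, and an empty poll is treated as "drained" only to keep the example finite (a real consumer loop keeps polling).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SyncLoopSketch {
    /** Hypothetical stand-in for the Kafka consumer. */
    interface Source {
        List<String> poll();           // next batch of records, empty when nothing is left
        void commit(long nextOffset);  // acknowledge every record before nextOffset
    }

    /** Hypothetical stand-in for the PostgreSQL insert. */
    interface Sink {
        void insert(String payload);
    }

    /** Runs Poll -> Insert -> Commit until the source is empty; returns records processed. */
    static long drain(Source source, Sink sink) {
        long nextOffset = 0;
        while (true) {
            List<String> batch = source.poll();
            if (batch.isEmpty()) {
                return nextOffset;
            }
            for (String payload : batch) {
                sink.insert(payload);  // blocks until the row is written
                nextOffset++;
            }
            source.commit(nextOffset); // only after every insert in the batch succeeded
        }
    }

    public static void main(String[] args) {
        Deque<List<String>> batches =
                new ArrayDeque<>(List.of(List.of("a", "b"), List.of("c")));
        List<String> table = new ArrayList<>();
        Source source = new Source() {
            public List<String> poll() {
                return batches.isEmpty() ? List.of() : batches.poll();
            }
            public void commit(long nextOffset) {
                System.out.println("committed up to offset " + nextOffset);
            }
        };
        long processed = drain(source, table::add);
        System.out.println(processed + " rows inserted: " + table);
    }
}
```

Because the loop cannot poll again until the inserts return, every millisecond of database latency is a millisecond the consumer is not reading from Kafka.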

Lag was measured with:

kafka-consumer-groups --bootstrap-server <broker> --describe --group <consumer-group>
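
The tool reports CURRENT-OFFSET, LOG-END-OFFSET and LAG per partition, where lag is the difference between the two offsets. The sketch below shows that arithmetic and one possible way to time a drain by polling a lag source until it reaches zero. The class and method names are illustrative assumptions, and the lag source is left abstract rather than tied to any Kafka API.

```java
import java.util.function.LongSupplier;

final class LagSketch {
    /** Lag for one partition: LOG-END-OFFSET minus CURRENT-OFFSET, never negative. */
    static long lag(long currentOffset, long logEndOffset) {
        return Math.max(0L, logEndOffset - currentOffset);
    }

    /**
     * Polls lagSource until it reports zero and returns the elapsed milliseconds.
     * Resolution is limited by pollIntervalMillis. Stops early if interrupted.
     */
    static long measureDrainMillis(LongSupplier lagSource, long pollIntervalMillis) {
        long start = System.nanoTime();
        while (lagSource.getAsLong() > 0) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                break;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Example: consumer at offset 12,000 of 50,000 produced messages.
        System.out.println("lag = " + lag(12_000, 50_000));
    }
}
```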

Test Variables

Controlled:

  • Same Kafka cluster
  • Same hardware
  • Same consumer group
  • Same number of messages (50,000)

Changed:

  • Payload size
  • Database indexing

Results

1️⃣ 1 KB Payload

Drain time: 125 seconds
Throughput: ~400 msg/sec

This run is the baseline. The system remained stable.


2️⃣ 20 KB Payload

Drain time: 152 seconds
Throughput: ~329 msg/sec

Throughput fell by about 18% compared to the 1 KB baseline.


3️⃣ 100 KB Payload

Drain time: 288 seconds
Throughput: ~173 msg/sec

Throughput fell by about 57% compared to the 1 KB baseline.


4️⃣ 100 KB + Additional Indexes

Drain time: 336 seconds
Throughput: ~149 msg/sec

Additional indexing reduced throughput by a further ~14% relative to the 100 KB run with minimal indexing.


Throughput Comparison

Payload   Indexing      Drain Time   Throughput
1 KB      Minimal       125 s        400 msg/sec
20 KB     Minimal       152 s        329 msg/sec
100 KB    Minimal       288 s        173 msg/sec
100 KB    Extra Index   336 s        149 msg/sec

Technical Analysis

Effective throughput:

Throughput = Total Messages / Drain Time
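
As a worked example, the formula can be applied to the four measured drain times. The sketch below also derives the average time spent per message and the drop relative to the 1 KB baseline. The helper names are illustrative, and the only inputs are the message count and drain times reported above.

```java
final class DrainMath {
    /** Throughput = total messages / drain time. */
    static double throughput(long messages, double drainSeconds) {
        return messages / drainSeconds;
    }

    /** Average wall-clock time spent on each message, in milliseconds. */
    static double perMessageMillis(long messages, double drainSeconds) {
        return drainSeconds * 1000.0 / messages;
    }

    /** Percentage drop of current relative to baseline. */
    static double reductionPercent(double baseline, double current) {
        return (baseline - current) / baseline * 100.0;
    }

    public static void main(String[] args) {
        long messages = 50_000;
        double[] drainSeconds = {125, 152, 288, 336}; // the four measured runs
        double baseline = throughput(messages, drainSeconds[0]);
        for (double seconds : drainSeconds) {
            double t = throughput(messages, seconds);
            System.out.printf("%4.0f s  %6.1f msg/sec  %5.2f ms/msg  %5.1f%% below baseline%n",
                    seconds, t, perMessageMillis(messages, seconds),
                    reductionPercent(baseline, t));
        }
    }
}
```

Read this way, the average cost per message rises from 2.5 ms at 1 KB to 6.72 ms at 100 KB with extra indexes.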

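The database write cost per message increases with:

  • Larger payload size (more bytes written per row)
  • WAL growth (every insert is also written to the write-ahead log)
  • Index maintenance (each additional index is updated on every insert)
  • Disk I/O
  • Memory allocation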

Kafka broker performance remained stable throughout testing.

The bottleneck was the downstream write cost.


Key Engineering Insight

Kafka lag is a symptom, not a root cause.

Lag reflects:

Producer Rate > Consumer Processing Capacity

Consumer processing capacity in synchronous designs is tightly coupled to database latency.
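
This coupling can be expressed as a simple model: a synchronous consumer handles at most one message per round of processing latency, and lag grows by whatever the producer sends beyond that. The sketch below is an idealized illustration that assumes constant rates. The method names and the 300 msg/sec producer rate are hypothetical, and 5.76 ms is the average per-message time implied by the 100 KB run (288 s / 50,000 messages).

```java
final class LagModel {
    /** Capacity of a synchronous consumer that spends perMessageMillis on each message. */
    static double capacityPerSecond(double perMessageMillis) {
        return 1000.0 / perMessageMillis;
    }

    /** Backlog after the given time at constant rates; lag never drops below zero. */
    static double lagAfter(double initialLag, double producerRate,
                           double consumerCapacity, double seconds) {
        return Math.max(0.0, initialLag + (producerRate - consumerCapacity) * seconds);
    }

    public static void main(String[] args) {
        double capacity = capacityPerSecond(5.76); // 100 KB payload, minimal indexing
        double producerRate = 300.0;               // hypothetical steady producer rate
        System.out.printf("capacity %.1f msg/sec, lag after 60 s: %.0f messages%n",
                capacity, lagAfter(0, producerRate, capacity, 60));
    }
}
```

With these inputs the consumer falls behind by about 126 messages every second, so the backlog reaches roughly 7,583 messages after one minute, even though the broker itself is healthy.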


Production Recommendations

For high-ingest systems:

  • Minimize secondary indexes on write-heavy tables
  • Consider batching inserts
  • Monitor P95/P99 DB write latency
  • Measure drain time, not just lag
  • Consider asynchronous offset commit strategies
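
As an illustration of the batching recommendation, the sketch below groups rows into JDBC batches inside a single transaction instead of issuing one INSERT per message. It is a sketch under assumptions, not code from the benchmark project: the events table, the payload column, and the batch size are hypothetical. To keep at-least-once delivery, the Kafka offset would be committed only after the database transaction commits.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    // Hypothetical table and column; the benchmark's real schema may differ.
    static final String INSERT_SQL = "INSERT INTO events (payload) VALUES (?)";

    /** Number of executeBatch() calls needed to write the given number of rows. */
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    /** Inserts all payloads in one transaction, flushing every batchSize rows. */
    static void insertAll(Connection conn, List<String> payloads, int batchSize)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String payload : payloads) {
                ps.setString(1, payload);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();       // nothing from this call stays in the table
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

With a batch size of 500, the 50,000 messages of one test run would need 100 executeBatch() calls rather than 50,000 individual statements. Whether that translates into a proportional speedup depends on the driver and the schema, so it should be measured rather than assumed.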

Conclusion

This benchmark shows:

Kafka consumer lag can be caused by downstream database bottlenecks rather than by Kafka itself.

As payload size and write amplification increase, consumer throughput decreases, though not in proportion: a 100× larger payload (1 KB to 100 KB) cut throughput by about 57%.

Kafka is often blamed first.

But in this case, and in many real systems,

the database was the limiting factor.


🔗 Source Code & Benchmark Project

This benchmark is fully reproducible.

The complete project setup is available on GitHub and includes:

  • Producer & Consumer (Spring Boot)
  • Docker Compose for Kafka
  • Database scripts for Scenario A & B
  • Real benchmark logs
  • Payload test variations (1 KB → 100 KB)

👉 https://github.com/nithidol/kafka-pe

Clone it. Run it. Reproduce the results.