Executive Summary
A controlled benchmark was conducted to determine whether Kafka consumer lag was caused by Kafka or downstream database performance.
Stack:
- Apache Kafka
- Spring Boot
- PostgreSQL

We produced 50,000 messages per test and measured lag drain time under increasing payload sizes and varying database indexing conditions.
Results demonstrate that database write latency directly impacts Kafka lag and system throughput.
Architecture
Producer → Kafka (1 partition) → Consumer → PostgreSQL
Consumer is synchronous:
Poll → Insert → Commit offset
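The loop above can be sketched as follows. This is a minimal sketch, not the project's actual consumer: the Kafka poll and the JDBC insert are replaced with in-memory stubs (`topic`, `table`, and `runOnce` are hypothetical names) so the control flow — insert first, commit the offset only afterwards — is explicit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Minimal sketch of the synchronous consumer loop: Poll -> Insert -> Commit.
// A real implementation would use KafkaConsumer.poll() and a JDBC insert;
// both are stubbed here so the ordering of the three steps is the point.
public class SyncConsumerSketch {
    final Queue<String> topic = new ArrayDeque<>();   // stands in for the Kafka partition
    final List<String> table = new ArrayList<>();     // stands in for the PostgreSQL table
    long committedOffset = 0;                         // last committed offset

    void runOnce(int maxPollRecords) {
        // Poll: take up to maxPollRecords messages.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < maxPollRecords && !topic.isEmpty(); i++) {
            records.add(topic.poll());
        }
        // Insert: synchronous write per record (the measured bottleneck).
        for (String r : records) {
            table.add(r);
        }
        // Commit: advance the offset only after the insert succeeded.
        committedOffset += records.size();
    }
}
```

Because the commit happens only after the insert, any slowdown in the insert step delays the next poll — which is exactly how database latency surfaces as consumer lag.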
Lag was measured with Kafka's consumer group tool (broker address and group name below are placeholders):

```shell
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group <consumer-group>
```
Test Variables
Controlled:
- Same Kafka cluster
- Same hardware
- Same consumer group
- Same number of messages (50,000)
Changed:
- Payload size
- Database indexing
Results
1️⃣ 1 KB Payload
Drain time: 125 seconds
Throughput: ~400 msg/sec
System stable.
2️⃣ 20 KB Payload
Drain time: 152 seconds
Throughput: ~329 msg/sec
Throughput reduced by 18%.
3️⃣ 100 KB Payload
Drain time: 288 seconds
Throughput: ~173 msg/sec
Throughput reduced by 57% compared to baseline.
4️⃣ 100 KB + Additional Indexes
Drain time: 336 seconds
Throughput: ~149 msg/sec
Additional indexing further reduced throughput by ~14%.
Throughput Comparison
| Payload | Indexing | Drain Time | Throughput |
|---|---|---|---|
| 1 KB | Minimal | 125s | 400 msg/sec |
| 20 KB | Minimal | 152s | 329 msg/sec |
| 100 KB | Minimal | 288s | 173 msg/sec |
| 100 KB | Extra Index | 336s | 149 msg/sec |
Technical Analysis
Effective throughput:
Throughput = Total Messages / Drain Time
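Plugging the measured drain times into this formula reproduces the reported numbers (a trivial sketch; the class and method names are illustrative):

```java
// Effective throughput = total messages / drain time (seconds).
public class Throughput {
    static double msgPerSec(long totalMessages, double drainSeconds) {
        return totalMessages / drainSeconds;
    }

    public static void main(String[] args) {
        long n = 50_000;
        double[] drains = {125, 152, 288, 336}; // measured drain times (s)
        for (double d : drains) {
            System.out.printf("%.0fs -> about %.0f msg/sec%n", d, msgPerSec(n, d));
        }
    }
}
```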
Per-message write cost grows with payload size because of:
- WAL (write-ahead log) growth
- Index maintenance
- Additional disk I/O
- Memory allocation pressure
Kafka broker performance remained stable throughout testing.
The bottleneck was the downstream write cost.
Key Engineering Insight
Kafka lag is a symptom, not a root cause.
Lag reflects:
Producer Rate > Consumer Processing Capacity
Consumer processing capacity in synchronous designs is tightly coupled to database latency.
Production Recommendations
For high-ingest systems:
- Minimize secondary indexes on write-heavy tables
- Consider batching inserts
- Monitor P95/P99 DB write latency
- Measure drain time, not just lag
- Consider asynchronous offset commit strategies
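The batching recommendation can be sketched as below. This is an assumption-laden illustration, not the benchmark's code: the database write is abstracted as a `Consumer<List<String>>` flush target, and `BatchingInserter`, `add`, and `flush` are hypothetical names. In production the flush would typically be a JDBC batch (`PreparedStatement.addBatch` / `executeBatch`) followed by an offset commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of micro-batching: buffer polled records and flush them in one
// database round trip instead of issuing one INSERT per message.
public class BatchingInserter {
    private final int batchSize;
    private final Consumer<List<String>> flushTarget; // e.g. a JDBC batch insert
    private final List<String> buffer = new ArrayList<>();

    BatchingInserter(int batchSize, Consumer<List<String>> flushTarget) {
        this.batchSize = batchSize;
        this.flushTarget = flushTarget;
    }

    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Also call on a timer / at shutdown so a partial batch is not stranded.
    void flush() {
        if (!buffer.isEmpty()) {
            flushTarget.accept(new ArrayList<>(buffer)); // one round trip per batch
            buffer.clear();
        }
    }
}
```

Batching amortizes per-insert overhead (network round trip, WAL sync, index updates) across many messages, which raises the consumer's processing capacity without touching Kafka at all.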
Conclusion
This benchmark confirms:
Kafka consumer lag is frequently caused by downstream database bottlenecks.
As payload size and write amplification increase, consumer throughput degrades accordingly.
Kafka is often blamed first, but in this case, as in many real systems, the database was the limiting factor.
🔗 Source Code & Benchmark Project
This benchmark is fully reproducible.
The complete project setup — including:
- Producer & Consumer (Spring Boot)
- Docker Compose for Kafka
- Database scripts for Scenario A & B
- Real benchmark logs
- Payload test variations (1 KB → 100 KB)
is available on GitHub:
👉 https://github.com/nithidol/kafka-pe
Clone it. Run it. Reproduce the results.



