A practical backend lesson on data readiness, reconciliation, and why a report generator is not enough when the numbers keep changing.
“The report is wrong again. We need to regenerate it.”
That sentence sounds small, but it usually means something bigger is happening.
An automated reporting system collects data, calculates totals, generates an output, and usually sends or stores the result without manual work.
For example, a Spring Boot service may generate a monthly finance report at 2 AM on the first day of every month.
That sounds simple.
But in real projects, the hard part is not always generating the file.
The hard part is knowing whether the data is complete and correct before the file is generated.
In this scenario, the monthly report was automated, but the team still had to fix data and regenerate the report again and again.
The real issue was not only a report bug.
It was a data trust problem.
The Real Problem
The system was supposed to generate a monthly transaction report automatically.
At the end of the month, the backend job should read transactions from the database, calculate totals, save a monthly summary, and generate the report.
But the data kept changing after the report was generated.
Some transactions arrived late.
Some batches were incomplete.
Some manual fixes happened after the monthly summary was already created.
So the report was technically automated, but nobody fully trusted the result.
The team still had to manually compare database numbers, fix data, regenerate the report, and explain why the first version was wrong.
That is not trusted automation.
That is a scheduled generate button with manual cleanup around it.
Why The Report Generator Was Not Enough
When a report is wrong, developers usually check the final query first.
Maybe the SQL is wrong.
Maybe the summary calculation is wrong.
Maybe the export format is broken.
Maybe the scheduled job ran at the wrong time.
Those are valid checks.
But if the same mismatch keeps coming back, the developer should look upstream.
The source data lifecycle may be unstable.
A report can only be as reliable as the data it reads.
If transactions are incomplete, batches are still running, or late data keeps arriving, the report generator may produce the correct result for the wrong moment in time.
Common Causes Of Wrong Reports
Wrong reports often come from timing and lifecycle issues.
For example:
- Late-arriving transactions
- Incomplete import batches
- Summary generated before all transactions are inserted
- Manual data fixes after report generation
- Missing cutoff rules
- No status showing whether data is ready
- No reconciliation between raw transactions and summary tables
- No audit trail showing which data version was used
In this scenario, the report was generated automatically, but the system did not know whether the data was ready.
That is the key problem.
Automation should not start with:
“Can we schedule it?”
It should start with:
“How do we know the result is correct?”
Example Backend Setup
Assume the backend uses:
- Java 21
- Spring Boot
- PostgreSQL
- Spring Scheduler
- A monthly transaction report
- Raw transactions table
- monthly_summary table
- report_run audit table
- report_verification_result table
The same idea also applies to MySQL, Oracle, SQL Server, Quartz, cron jobs, and batch workers.
The specific database or scheduler matters less than the design rule:
Verify data readiness before generating the final report.
Example Tables
Here is a simple transaction table:
```sql
CREATE TABLE transactions (
    id BIGSERIAL PRIMARY KEY,
    transaction_date DATE NOT NULL,
    amount NUMERIC(18, 2) NOT NULL,
    status VARCHAR(30) NOT NULL,
    batch_id VARCHAR(100),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
```
This table stores raw transaction data.
In a real finance or sales system, there may be more fields such as customer ID, merchant ID, currency, payment method, settlement status, or source system.
Now add a summary table:
```sql
CREATE TABLE monthly_summary (
    id BIGSERIAL PRIMARY KEY,
    report_month VARCHAR(7) NOT NULL,
    transaction_count BIGINT NOT NULL,
    total_amount NUMERIC(18, 2) NOT NULL,
    summary_version INT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
```
This table stores calculated monthly totals.
For example, report_month can contain 2026-04. The summary should match the raw transaction records for that month.
Step 1: Define What “Data Ready” Means
Before generating the report, the team needs a clear rule for data readiness.
For example, April 2026 data is ready only when:
- All April transaction batches are completed.
- No pending imports remain.
- No failed transaction records exist.
- The cutoff time has passed.
- The transaction total matches the monthly summary total.
- No late-arriving data is waiting for review.
The business team should agree with these rules.
This matters because “ready” is not only a technical word.
In reporting systems, “ready” is a business decision expressed as backend rules.
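The rules above can be sketched as a small check that returns every rule that failed, not just a yes or no. This is a minimal sketch, not the project's implementation: `ReadinessSnapshot`, `ReadinessRules`, and the rule names are hypothetical, and the counts are assumed to come from queries the team writes against its own batch and import tables.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of the facts the readiness rules need for one month.
record ReadinessSnapshot(
        int incompleteBatches,
        int pendingImports,
        int failedRecords,
        int lateRecordsAwaitingReview,
        boolean totalsMatchSummary,
        LocalDateTime cutoff,
        LocalDateTime checkedAt) {
}

final class ReadinessRules {

    private ReadinessRules() {
    }

    // Returns the names of all rules that failed; an empty list means the month is ready.
    static List<String> failedRules(ReadinessSnapshot s) {
        List<String> failed = new ArrayList<>();
        if (s.incompleteBatches() > 0) failed.add("ALL_BATCHES_COMPLETED");
        if (s.pendingImports() > 0) failed.add("NO_PENDING_IMPORTS");
        if (s.failedRecords() > 0) failed.add("NO_FAILED_RECORDS");
        if (s.checkedAt().isBefore(s.cutoff())) failed.add("CUTOFF_PASSED");
        if (!s.totalsMatchSummary()) failed.add("TOTALS_MATCH_SUMMARY");
        if (s.lateRecordsAwaitingReview() > 0) failed.add("NO_LATE_DATA_IN_REVIEW");
        return failed;
    }

    static boolean isReady(ReadinessSnapshot s) {
        return failedRules(s).isEmpty();
    }
}
```

Returning the failed rule names makes it easier to explain to the business team why a month was not considered ready.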
Step 2: Add Report Statuses
A report requires a lifecycle.
A simple enum can help:
```java
public enum ReportStatus {
    DRAFT,
    VERIFYING,
    VERIFIED,
    FINAL,
    FAILED
}
```
DRAFT means the report exists but should not be trusted yet.
VERIFYING means the system is checking readiness and reconciliation.
VERIFIED means the numbers passed validation.
FINAL means the report is released and should not be changed without a controlled process.
FAILED means generation or verification failed.
The workflow can stay simple at first.
The important part is that a report should not jump directly from “generated” to “final” without verification.
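One way to enforce that rule is to let the enum itself say which transitions are allowed. The sketch below uses the same five statuses; the transition table is an assumption for illustration and should be adjusted to whatever workflow the team agrees on.

```java
import java.util.EnumSet;
import java.util.Set;

enum ReportStatus {
    DRAFT, VERIFYING, VERIFIED, FINAL, FAILED;

    // Allowed next states. This table is an illustrative assumption, not a fixed rule.
    Set<ReportStatus> allowedNext() {
        return switch (this) {
            case DRAFT -> EnumSet.of(VERIFYING, FAILED);
            case VERIFYING -> EnumSet.of(VERIFIED, FAILED);
            case VERIFIED -> EnumSet.of(FINAL, VERIFYING); // re-verify if data changes
            case FAILED -> EnumSet.of(VERIFYING);          // retry after a fix
            case FINAL -> EnumSet.noneOf(ReportStatus.class);
        };
    }

    boolean canTransitionTo(ReportStatus next) {
        return allowedNext().contains(next);
    }
}
```

With this in place, code that tries to move a report from DRAFT or VERIFYING straight to FINAL can be rejected before anything is written to the database.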
Step 3: Add An Audit Table For Report Runs
Every generated report should leave a trace.
```sql
CREATE TABLE report_run (
    id BIGSERIAL PRIMARY KEY,
    report_month VARCHAR(7) NOT NULL,
    status VARCHAR(30) NOT NULL,
    source_start_date DATE NOT NULL,
    source_end_date DATE NOT NULL,
    summary_version INT,
    generated_by VARCHAR(100),
    generated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    finalized_at TIMESTAMP
);
```
This table answers important questions.
Which month was generated?
Which date range was used?
Which summary version was used?
Who or what generated it?
When was it finalized?
Without this audit trail, debugging becomes guesswork.
Step 4: Store Verification Results
The developer should also store reconciliation results.
```sql
CREATE TABLE report_verification_result (
    id BIGSERIAL PRIMARY KEY,
    report_run_id BIGINT NOT NULL REFERENCES report_run(id),
    check_name VARCHAR(100) NOT NULL,
    expected_value NUMERIC(18, 2),
    actual_value NUMERIC(18, 2),
    passed BOOLEAN NOT NULL,
    message TEXT,
    checked_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
```
This table gives the team a history of why a report passed or failed.
For example, it can store a failed check saying:
Transaction total does not match monthly summary total.
That is much better than asking someone to remember what happened last month.
Step 5: Add Reconciliation Before Report Generation
Reconciliation means comparing source data with summary data.
For example, the developer can calculate raw transaction totals:
```sql
SELECT COUNT(*) AS transaction_count,
       COALESCE(SUM(amount), 0) AS transaction_total
FROM transactions
WHERE transaction_date >= '2026-04-01'
  AND transaction_date < '2026-05-01'
  AND status = 'COMPLETED';
```
This query reads the raw transaction data for April 2026.
It counts completed transactions and sums the transaction amount.
Then compare it with the monthly summary:
```sql
SELECT transaction_count,
       total_amount
FROM monthly_summary
WHERE report_month = '2026-04'
ORDER BY summary_version DESC
LIMIT 1;
```
This query reads the latest summary for the same month.
If the count or total does not match, the report should not be finalized.
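The comparison itself can be sketched in plain Java once both queries have returned their numbers. `MonthlyTotals` and `Reconciler` are hypothetical names, and `ReconciliationResult` is only assumed to look like this; the one detail worth copying is that money values are compared with `compareTo`, because `BigDecimal.equals` treats 350.0 and 350.00 as different.

```java
import java.math.BigDecimal;

// Totals read either from the raw transactions query or from monthly_summary.
record MonthlyTotals(long transactionCount, BigDecimal totalAmount) {
}

// Assumed shape of a reconciliation outcome: a pass flag plus a message to store.
record ReconciliationResult(boolean passed, String message) {
}

final class Reconciler {

    private Reconciler() {
    }

    static ReconciliationResult reconcile(MonthlyTotals raw, MonthlyTotals summary) {
        if (raw.transactionCount() != summary.transactionCount()) {
            return new ReconciliationResult(false,
                    "Transaction count does not match monthly summary count: raw="
                            + raw.transactionCount() + ", summary=" + summary.transactionCount());
        }
        // compareTo ignores scale, so 350.0 and 350.00 count as equal (equals would not).
        if (raw.totalAmount().compareTo(summary.totalAmount()) != 0) {
            return new ReconciliationResult(false,
                    "Transaction total does not match monthly summary total: raw="
                            + raw.totalAmount() + ", summary=" + summary.totalAmount());
        }
        return new ReconciliationResult(true, "Count and total match");
    }
}
```

The message includes both numbers so that a stored failure explains itself without someone re-running the queries.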
Step 6: Add The Scheduled Job
A basic Spring Boot scheduled report job may look like this:
```java
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Scheduling must be enabled elsewhere with @EnableScheduling on a configuration class.
@Component
public class MonthlyReportScheduler {

    private final ReportService reportService;

    public MonthlyReportScheduler(ReportService reportService) {
        this.reportService = reportService;
    }

    // Spring cron fields: second, minute, hour, day of month, month, day of week.
    @Scheduled(cron = "0 0 2 1 * *")
    public void generateMonthlyReport() {
        reportService.generateMonthlyReport();
    }
}
```
This job runs at 2 AM on the first day of every month.
But the scheduler should not blindly generate the final report.
It should start the reporting workflow.
The important logic belongs inside ReportService.
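Because the job fires on the first day of a month, the workflow has to work out which month it is reporting on. A minimal sketch of that calculation follows; `ReportPeriod` is a hypothetical helper, and the end date is exclusive so that it matches the `>=` and `<` filters used in the reconciliation query.

```java
import java.time.LocalDate;
import java.time.YearMonth;

// Hypothetical helper: the month a run covers plus its half-open date range.
record ReportPeriod(String reportMonth, LocalDate start, LocalDate endExclusive) {

    // A job that runs on the 1st reports on the previous calendar month.
    static ReportPeriod previousMonthOf(LocalDate runDate) {
        YearMonth month = YearMonth.from(runDate).minusMonths(1);
        return new ReportPeriod(
                month.toString(),               // ISO year-month, e.g. "2026-04"
                month.atDay(1),                 // first day of the reported month
                month.plusMonths(1).atDay(1));  // first day of the following month
    }
}
```

For a run on 2026-05-01 this yields the month 2026-04 with the range 2026-04-01 up to, but not including, 2026-05-01. A run on 2026-01-01 rolls back to 2025-12.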
Step 7: Build A Safer Service Flow
Here is a simplified service flow:
```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    private final DataReadinessService dataReadinessService;
    private final ReconciliationService reconciliationService;
    private final ReportGenerator reportGenerator;
    private final ReportAuditService reportAuditService;

    public ReportService(
            DataReadinessService dataReadinessService,
            ReconciliationService reconciliationService,
            ReportGenerator reportGenerator,
            ReportAuditService reportAuditService) {
        this.dataReadinessService = dataReadinessService;
        this.reconciliationService = reconciliationService;
        this.reportGenerator = reportGenerator;
        this.reportAuditService = reportAuditService;
    }

    // Caveat: if the audit service joins this transaction, an exception thrown by
    // generate() would roll back the audit rows too. Audit writes may need their own transaction.
    @Transactional
    public void generateMonthlyReport() {
        // Hardcoded to keep the example short; a real job would derive the previous month.
        String reportMonth = "2026-04";

        Long reportRunId = reportAuditService.createRun(reportMonth, ReportStatus.VERIFYING);

        // Stop early when the data for the month is not ready yet.
        if (!dataReadinessService.isReady(reportMonth)) {
            reportAuditService.markFailed(reportRunId, "Data is not ready");
            return;
        }

        // Store the reconciliation outcome whether it passed or failed.
        ReconciliationResult result = reconciliationService.reconcile(reportMonth);
        reportAuditService.saveVerificationResult(reportRunId, result);

        if (!result.passed()) {
            reportAuditService.markFailed(reportRunId, "Reconciliation failed");
            return;
        }

        // Only a month that passed both checks reaches the generator.
        reportGenerator.generate(reportMonth);
        reportAuditService.markVerified(reportRunId);
    }
}
```
This code shows the main idea.
The service checks readiness first, runs reconciliation, generates the report, and updates the audit status.
The developer can later add approval logic before moving from VERIFIED to FINAL.
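The service above calls `createRun`, `markFailed`, and `markVerified` without showing them. The sketch below is one possible shape for that audit service, kept in memory so it can run on its own. The class name, the `ReportRun` record, and the `find` method are assumptions; a real version would write to the report_run table instead of a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

enum ReportStatus { DRAFT, VERIFYING, VERIFIED, FINAL, FAILED }

// In-memory stand-in for one row of report_run (only a few columns are modelled).
record ReportRun(long id, String reportMonth, ReportStatus status, String failureReason) {
}

final class InMemoryReportAuditService {

    private final AtomicLong ids = new AtomicLong();
    private final Map<Long, ReportRun> runs = new LinkedHashMap<>();

    Long createRun(String reportMonth, ReportStatus initialStatus) {
        long id = ids.incrementAndGet();
        runs.put(id, new ReportRun(id, reportMonth, initialStatus, null));
        return id;
    }

    void markFailed(Long runId, String reason) {
        ReportRun run = find(runId);
        runs.put(runId, new ReportRun(run.id(), run.reportMonth(), ReportStatus.FAILED, reason));
    }

    void markVerified(Long runId) {
        ReportRun run = find(runId);
        runs.put(runId, new ReportRun(run.id(), run.reportMonth(), ReportStatus.VERIFIED, null));
    }

    // Fails loudly on an unknown id instead of silently creating a run.
    ReportRun find(Long runId) {
        ReportRun run = runs.get(runId);
        if (run == null) {
            throw new IllegalArgumentException("Unknown report run: " + runId);
        }
        return run;
    }
}
```

An in-memory version like this is also handy in unit tests of the service flow, because the test can assert on the final status of a run without a database.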
Step 8: Check For Late-Arriving Data
Late-arriving data must be part of the design.
For example, if April data should be closed after May 1 at 2 AM, the developer can search for records inserted after the cutoff:
```sql
SELECT COUNT(*) AS late_transaction_count
FROM transactions
WHERE transaction_date >= '2026-04-01'
  AND transaction_date < '2026-05-01'
  AND created_at >= '2026-05-01 02:00:00';
```
This query finds April transactions that arrived after the expected cutoff.
If late data appears after the report is finalized, the system should do something visible.
For example:
- Mark the report as stale.
- Create a new report version.
- Block final release until reconciliation passes again.
- Log late data for review.
- Send an alert to the team.
Ignoring late data is risky because the report may look final while the database has already changed.
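The reaction to late data can be written down as a small policy so it is not decided ad hoc each month. This is a sketch under assumptions: `LateDataPolicy` and `LateDataAction` are hypothetical names, and the choice between blocking and marking stale is one reasonable mapping of the options listed above, not the only one.

```java
import java.time.LocalDateTime;
import java.util.List;

enum LateDataAction { NONE, BLOCK_RELEASE, MARK_STALE }

final class LateDataPolicy {

    private LateDataPolicy() {
    }

    // Counts rows created at or after the cutoff, mirroring created_at >= cutoff in SQL.
    static long countLate(List<LocalDateTime> createdAtValues, LocalDateTime cutoff) {
        return createdAtValues.stream()
                .filter(createdAt -> !createdAt.isBefore(cutoff))
                .count();
    }

    // reportIsFinal: whether the report for that month has already been released.
    static LateDataAction decide(long lateCount, boolean reportIsFinal) {
        if (lateCount == 0) {
            return LateDataAction.NONE;
        }
        // A released report cannot be blocked any more, so it is flagged as stale instead.
        return reportIsFinal ? LateDataAction.MARK_STALE : LateDataAction.BLOCK_RELEASE;
    }
}
```

Whatever action is chosen, the point is that the system takes it explicitly and records it, so late data never changes the numbers silently.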
Step 9: Add Monitoring And Alerts
A reporting system should not depend only on people noticing mismatches.
The developer can add alerts for important conditions.
Example alert condition:
```sql
SELECT r.report_month,
       v.check_name,
       v.expected_value,
       v.actual_value,
       v.message
FROM report_verification_result v
JOIN report_run r ON r.id = v.report_run_id
WHERE v.passed = false
  AND v.checked_at >= CURRENT_DATE;
```
This query finds failed verification checks from today.
The system can send the result to Slack, email, PagerDuty, or another monitoring tool.
Useful alerts include:
- Transaction totals and summary totals differ.
- A required batch is incomplete.
- A report is regenerated too many times.
- Late data appears after finalization.
- A monthly report is still not verified after the expected time.
Good alerts should be actionable.
An alert should tell the developer what failed and which month or report run is affected.
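As a rough illustration of an actionable alert, the sketch below turns one failed check into a single message line. `FailedCheck` and `AlertFormatter` are hypothetical, the message layout is an assumption, and the run id is assumed to be selected alongside the columns in the alert query. How the text is delivered to Slack, email, or a pager is left out.

```java
import java.math.BigDecimal;

// One failed row from report_verification_result joined with its report_run.
record FailedCheck(
        String reportMonth,
        long reportRunId,
        String checkName,
        BigDecimal expectedValue,
        BigDecimal actualValue,
        String message) {
}

final class AlertFormatter {

    private AlertFormatter() {
    }

    // Puts the month, run, and both values in one line so the reader knows where to look.
    // Null expected or actual values (the columns are nullable) are printed as "null".
    static String format(FailedCheck check) {
        return "[REPORT CHECK FAILED]"
                + " month=" + check.reportMonth()
                + " run=" + check.reportRunId()
                + " check=" + check.checkName()
                + " expected=" + check.expectedValue()
                + " actual=" + check.actualValue()
                + " : " + check.message();
    }
}
```

A message built this way names the affected month and report run directly, which is what separates an actionable alert from a generic "report failed" notification.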
Testing The Flow
The developer can test the reporting flow with a small dataset.
Insert sample transactions:
```sql
INSERT INTO transactions (transaction_date, amount, status, batch_id)
VALUES
    ('2026-04-01', 100.00, 'COMPLETED', 'APRIL-BATCH-1'),
    ('2026-04-02', 200.00, 'COMPLETED', 'APRIL-BATCH-1'),
    ('2026-04-03', 50.00, 'COMPLETED', 'APRIL-BATCH-1');
```
Insert a matching summary:
```sql
INSERT INTO monthly_summary (
    report_month,
    transaction_count,
    total_amount,
    summary_version
)
VALUES ('2026-04', 3, 350.00, 1);
```
Now run the report job.
The reconciliation should pass because the raw transactions and summary both show three transactions and a total amount of 350.00.
Then test a mismatch:
```sql
UPDATE monthly_summary
SET total_amount = 300.00
WHERE report_month = '2026-04'
  AND summary_version = 1;
```
Run the report job again.
This time, reconciliation should fail. The system should mark the report run as FAILED and store the verification result.
That is the behavior we want.
A wrong report should be blocked before it becomes final.
Expected Result
After adding readiness checks and reconciliation:
- Reports are generated only when data is ready.
- The team can see whether a report is draft, verified, or final.
- Summary numbers are compared with transaction data automatically.
- Late-arriving data becomes visible.
- Regeneration becomes controlled instead of random.
- Audit history shows what happened for each report run.
- The team trusts the report output more.
This does not mean reports will never fail.
It means failures become visible, explainable, and controlled.
That is a big improvement.
Temporary Manual Verification Is Not Failure
Manual checking can be the right step when the system is unstable.
If the team does not yet understand why data changes after generation, manual verification can help reveal the pattern.
But each manual check should teach the team something.
Every mismatch should become a future rule, validation, query, status, or alert.
For example:
- If failed imports cause wrong totals, add an import completion check.
- If late transactions change monthly numbers, add late data detection.
- If summary tables drift from raw data, add reconciliation.
- If report regeneration is common, add report versioning.
Manual verification should be a temporary checkpoint, not the permanent system design.
Practical Notes
Do not automate a report just because the job can run on a schedule.
Do not treat manual regeneration as a long-term solution.
Do not rely only on the final report query if the source data is unstable.
Late-arriving data must be part of the design.
Always keep an audit trail for generated reports.
Reconciliation rules should be agreed with the business team.
A wrong report can be worse than a delayed report, especially in finance, sales, billing, settlement, and compliance systems.
The developer should also be careful with performance.
Large monthly transaction tables may need indexes:
```sql
CREATE INDEX idx_transactions_month_status
    ON transactions (transaction_date, status);
```
This helps monthly reconciliation queries filter data faster.
For very large systems, the developer may need partitioned tables, incremental summaries, or background reconciliation jobs instead of scanning the full transaction table every time.
Security And Production Notes
Reports often contain sensitive business or customer data.
The developer should control who can generate, verify, download, and finalize reports.
Production systems should store:
- Who generated the report
- Who verified it
- Who finalized it
- When it was generated
- Which data range was used
- Which version was released
Avoid editing finalized reports directly.
If late data appears, create a new version or correction flow.
This protects the team from silent changes and unclear history.
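The "new version instead of an edit" idea can be sketched as an append-only history. `ReportVersion` and `ReportVersionHistory` are hypothetical names; a production system would persist versions in a table and tie each one to the report run that produced it.

```java
import java.util.ArrayList;
import java.util.List;

// One released version of a monthly report; a record is immutable once created.
record ReportVersion(String reportMonth, int version, String reason) {
}

final class ReportVersionHistory {

    private final String reportMonth;
    private final List<ReportVersion> versions = new ArrayList<>();

    ReportVersionHistory(String reportMonth) {
        this.reportMonth = reportMonth;
    }

    // Appends a new version; earlier versions are never modified or removed.
    ReportVersion release(String reason) {
        ReportVersion next = new ReportVersion(reportMonth, versions.size() + 1, reason);
        versions.add(next);
        return next;
    }

    ReportVersion latest() {
        if (versions.isEmpty()) {
            throw new IllegalStateException("No version released for " + reportMonth);
        }
        return versions.get(versions.size() - 1);
    }

    // Read-only copy, so callers cannot change the history.
    List<ReportVersion> all() {
        return List.copyOf(versions);
    }
}
```

Because every correction carries a reason, the history itself explains why a second or third version of a month exists.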
Recommended Build Order
Build reporting automation in this order:
1. Define data readiness.
2. Add report statuses.
3. Add reconciliation.
4. Store audit history.
5. Detect late-arriving data.
6. Add alerts.
7. Automate final generation only when the checks are reliable.
The goal is not only to generate reports automatically.
The goal is to generate reports the team can trust.
Conclusion
Good automation is not just a scheduler.
In this reporting flow, the useful work was not only generating the monthly report. It was adding the backend checks around it: data readiness, report statuses, reconciliation, audit history, late-data detection, and alerts.
That changed the system from “a job that runs” into “a workflow that can explain whether the numbers are safe to use.”
Sometimes manual verification is not a step backward.
It can be a temporary checkpoint that helps the team understand the data lifecycle and build stronger automation later.
The real goal is not a system that runs by itself.
The real goal is a system that runs by itself without making everyone wonder whether the numbers are correct.



