Comma-separated values (CSV) files are widely used for data exchange due to their simplicity and human-readable format. They are a standard method for representing structured data in a text format, making them popular for applications ranging from data analysis to configuration settings. However, while reading CSV files in Java may seem straightforward, the real challenge often lies in validating the data they contain.
Validation is crucial because CSV files can be prone to inconsistencies, formatting errors, and unexpected data types, especially when the source data comes from various origins. Ensuring data integrity is essential for maintaining the quality of your applications, as improper data can lead to incorrect processing and erroneous outcomes.
This guide will explore how to read and validate CSV files using Java. We will cover essential libraries, such as OpenCSV, that facilitate reading CSV data efficiently and discuss best practices for validating the data once it has been imported into Java objects. By mastering these techniques, you will be better equipped to handle CSV data robustly, ensuring that your applications can manage and process data confidently and reliably.
Scenario
The client has a business requirement to upload a CSV file and validate it before saving data into the database server.
A CSV (Comma-Separated Values) file is a simple, widely used file format for storing tabular data. It is a plain text file where each line represents a row of data, and the values within each row are separated by commas (or other delimiters like tabs, semicolons, etc.).
Navigating the Challenges of CSV Data Validation: Using Apache Tika and ICU4J for Encoding Detection
Reading a CSV file is the easy part; validating its contents is harder, because the source is plain text and the developer controls neither its format nor its encoding. Libraries such as Apache Tika or ICU4J can detect the encoding automatically, so the file is read with the correct charset.
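Before reaching for a detection library, it can help to confirm that a file really is in the charset you expect. The sketch below is not the Tika or ICU4J approach: it uses only the JDK's strict decoder to report whether a byte array decodes cleanly in a given charset. The class name `EncodingCheck` and the method `isValidFor` are hypothetical names chosen for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class EncodingCheck {

    /**
     * Returns true when the bytes decode without errors in the given charset.
     * This verifies an assumed charset; it does not detect an unknown one.
     */
    static boolean isValidFor(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   // REPORT makes the decoder throw instead of silently substituting characters
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A caller could read the uploaded file with `Files.readAllBytes`, call `EncodingCheck.isValidFor(bytes, StandardCharsets.UTF_8)`, and fall back to a detection library only when the check fails.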

Example CSV file
The example CSV file contains five columns separated by commas (","):
column 1: index
column 2: first name
column 3: last name
column 4: weight
column 5: date of birth

"1","Terika","Burt","70","05/10/1988"
"2","Matilde","Cristina","65","05/10/1988"
Read CSV Files using Java without libraries
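1. Read the CSV file line by line and split each line on commas.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

String csvFilePath = "C:/workspace/demo_csv.csv";
Charset charset = StandardCharsets.UTF_8;

public void readCsv() {
    // FileReader(String, Charset) requires Java 11 or later
    try (BufferedReader br = new BufferedReader(new FileReader(csvFilePath, charset))) {
        String line;
        while ((line = br.readLine()) != null) {
            // Naive split: does not handle commas or quotes inside a field
            String[] fields = line.split(",");
            System.out.println(Arrays.asList(fields));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Output:

["1", "Terika", "Burt", "70", "05/10/1988"]
["2", "Matilde", "Cristina", "65", "05/10/1988"]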
2. Suppose the CSV changes so that a row has the wrong number of columns, for example because the comma between the last two fields of line 2 is missing.

"1","Terika","Burt","70","05/10/1988"
"2","Matilde","Cristina","65""05/10/1988"

Output:

["1", "Terika", "Burt", "70", "05/10/1988"]
["2", "Matilde", "Cristina", "65""05/10/1988"]

3. Every row must have five columns, but line 2 now has only four.
How to validate the data is demonstrated in the validation section.
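For the library-free approach, a column-count check can be sketched as follows. It reuses the same naive `split(",")` as the reader above, so it assumes that no field contains a comma. `ColumnCountCheck` and `invalidLines` are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnCountCheck {

    /** Returns the 1-based numbers of the lines whose field count differs from the expected count. */
    static List<Integer> invalidLines(List<String> lines, int expectedColumns) {
        List<Integer> invalid = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // A limit of -1 keeps trailing empty fields, so "a,b," counts as three columns
            int columns = lines.get(i).split(",", -1).length;
            if (columns != expectedColumns) {
                invalid.add(i + 1);
            }
        }
        return invalid;
    }
}
```

Calling `invalidLines` with the two lines from step 2 and an expected count of 5 reports line 2, which the caller can then reject or send back to the client.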
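Read a CSV file with the OpenCSV library
1. Add the OpenCSV dependency and create a Java class to read the CSV file.

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.10</version>
</dependency>

import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvValidationException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

String csvFilePath = "C:/workspace/demo_csv.csv";
Charset charset = StandardCharsets.UTF_8;

public void readOpenCsv() {
    try (CSVReader reader = new CSVReader(new FileReader(csvFilePath, charset))) {
        String[] nextLine;
        // readNext() returns null at the end of the file
        while ((nextLine = reader.readNext()) != null) {
            System.out.println(Arrays.asList(nextLine));
        }
    } catch (IOException | CsvValidationException e) {
        e.printStackTrace();
    }
}

Output:

[1, Terika, Burt, 70, 05/10/1988]
[2, Matilde, Cristina, 65, 05/10/1988]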
2. The result shows the data in each line without the surrounding double quotes ("). Now suppose the CSV changes and a double quote appears in the middle of the last name column.

"1","Terika","Burt","70","05/10/1988"
"2","Matilde","Cris"tina","65","05/10/1988"

Output:

com.opencsv.exceptions.CsvMalformedLineException: Unterminated quoted field at end of CSV line. Beginning of lost text: [Cris"tina,65,05/10/1988

3. The stray quote causes an error. To fix it, escape the quote by doubling it.

"1","Terika","Burt","70","05/10/1988"
"2","Matilde","Cris""tina","65","05/10/1988"

Output:

[1, Terika, Burt, 70, 05/10/1988]
[2, Matilde, Cris"tina, 65, 05/10/1988]

4. The file is now parsed without any error.
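If your own code produces the CSV file, the doubling rule from step 3 can be applied when writing each field. The following is a minimal sketch using only the JDK; `CsvEscaper`, `quote`, and `toLine` are hypothetical names, and the sketch quotes every field, as the example file does.

```java
class CsvEscaper {

    /** Wraps a value in double quotes and doubles any double quote inside it. */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the values into one CSV line with every field quoted. */
    static String toLine(String... values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(quote(values[i]));
        }
        return line.toString();
    }
}
```

With this helper, the last name `Cris"tina` is written as `"Cris""tina"`, which is the form the reader accepted in step 3.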
Reading CSV Files with OpenCSV and Converting to Java Objects: A Practical Example
The following example reads a CSV file with OpenCSV and converts each row to a Java object.

import com.opencsv.bean.CsvBindByPosition;
import lombok.Data;

@Data
public class PersonalDataBean {
    @CsvBindByPosition(position = 0)
    private String index;

    @CsvBindByPosition(position = 1)
    private String firstName;

    @CsvBindByPosition(position = 2)
    private String lastName;

    @CsvBindByPosition(position = 3)
    private String weight;

    @CsvBindByPosition(position = 4)
    private String dateOfBirth;
}

import com.opencsv.CSVReader;
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

String csvFilePath = "C:/workspace/demo_csv.csv";
Charset charset = StandardCharsets.UTF_8;

public void convertCSVtoObject() {
    try (CSVReader reader = new CSVReader(new FileReader(csvFilePath, charset))) {
        CsvToBean<PersonalDataBean> csvToBean = new CsvToBeanBuilder<PersonalDataBean>(reader)
                .withType(PersonalDataBean.class)
                .build();
        // parse() maps every row to a PersonalDataBean
        List<PersonalDataBean> beans = csvToBean.parse();
        beans.forEach(System.out::println);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Output:

PersonalDataBean(index=1, firstName=Terika, lastName=Burt, weight=70, dateOfBirth=05/10/1988)
PersonalDataBean(index=2, firstName=Matilde, lastName=Cristina, weight=65, dateOfBirth=05/10/1988)
Switching from Column Index to Column Name Mapping in Java
The mapping can be switched from column position to column name. The CSV file then needs a header row:

firstName,lastName,weight,dateOfBirth,index
"Terika","Burt","70","05/10/1988","1"
"Matilde","Cristina","65","05/10/1988","2"

import com.opencsv.bean.CsvBindByName;
import lombok.Data;

@Data
public class PersonalDataBean {
    @CsvBindByName(column = "index")
    private String index;

    @CsvBindByName(column = "firstName")
    private String firstName;

    @CsvBindByName(column = "lastName")
    private String lastName;

    @CsvBindByName(column = "weight")
    private String weight;

    @CsvBindByName(column = "dateOfBirth")
    private String dateOfBirth;
}

Output:

PersonalDataBean(index=1, firstName=Terika, lastName=Burt, weight=70, dateOfBirth=05/10/1988)
PersonalDataBean(index=2, firstName=Matilde, lastName=Cristina, weight=65, dateOfBirth=05/10/1988)

The benefit of mapping columns by name is that the columns in the CSV file can change position without breaking the mapping.
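To see why name-based mapping tolerates reordered columns, here is a small sketch of the idea in plain Java. It is an illustration only, not how OpenCSV implements it: the header row is turned into a name-to-position map, and values are then looked up by name. `HeaderLookup`, `indexByName`, and `value` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class HeaderLookup {

    /** Maps each header name to its column position. */
    static Map<String, Integer> indexByName(String[] header) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            index.put(header[i].trim(), i);
        }
        return index;
    }

    /** Reads a value by column name, independent of the column order in the file. */
    static String value(String[] row, Map<String, Integer> index, String column) {
        Integer position = index.get(column);
        if (position == null) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return row[position];
    }
}
```

Because the lookup goes through the header, moving `index` from the first column to the last changes only the map, not the code that reads the value.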
Handling CSV with Multi-Value Fields (Using @CsvBindAndSplitByPosition for Lists)
Define a Class with a List Field
import com.opencsv.bean.CsvBindAndSplitByPosition;
import com.opencsv.bean.CsvBindByPosition;
import lombok.*;
import java.util.List;

@AllArgsConstructor
@NoArgsConstructor
@Data
public class EmployeeMultiSkill {
    @CsvBindByPosition(position = 0)
    private int id;

    @CsvBindByPosition(position = 1)
    private String name;

    // Multiple skills separated by "|"
    @CsvBindAndSplitByPosition(position = 2, elementType = String.class, splitOn = "\\|")
    private List<String> skills;
}
CSV Sample (demo.csv)

1,Alice,Java|Spring Boot|PostgreSQL
2,Bob,Python|Django|AWS
3,Charlie,JavaScript|React|Node.js
Reading CSV with Multi-Value Fields
import com.opencsv.bean.CsvToBeanBuilder;
import java.io.FileReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CsvMultiValueReader {
    public static void main(String[] args) {
        String csvFile = "C:/workspace/demo.csv";
        // Pass the charset explicitly instead of relying on the platform default
        try (Reader reader = new FileReader(csvFile, StandardCharsets.UTF_8)) {
            List<EmployeeMultiSkill> employees = new CsvToBeanBuilder<EmployeeMultiSkill>(reader)
                    .withType(EmployeeMultiSkill.class)
                    .build()
                    .parse();
            // The original snippet called log.info without declaring a logger
            employees.forEach(employee -> System.out.println("Employee data: " + employee));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Expected Output
Employee data: EmployeeMultiSkill(id=1, name=Alice, skills=[Java, Spring Boot, PostgreSQL])
Employee data: EmployeeMultiSkill(id=2, name=Bob, skills=[Python, Django, AWS])
Employee data: EmployeeMultiSkill(id=3, name=Charlie, skills=[JavaScript, React, Node.js])

This approach is useful when a single CSV field contains a list of values.
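The bean above passes "\\|" rather than "|" as splitOn, which suggests that the delimiter is treated as a regular expression, where an unescaped pipe means alternation. The sketch below shows the same escaping with the JDK's `String.split`; `SkillSplitter` and its methods are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SkillSplitter {

    /** Splits on a literal pipe; the backslashes escape the regex alternation operator. */
    static List<String> splitEscaped(String field) {
        return Arrays.asList(field.split("\\|"));
    }

    /** Equivalent form: Pattern.quote turns any delimiter into a literal pattern. */
    static List<String> splitQuoted(String field, String delimiter) {
        return Arrays.asList(field.split(Pattern.quote(delimiter)));
    }
}
```

With an unescaped "|" the pattern matches the empty string, so `String.split` would break the field apart between characters instead of at the pipes.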
Conclusion
The sections above describe several ways to read a CSV file in Java. Of these, using OpenCSV to map columns to Java beans is the recommended approach.
Validating CSV Data with Java: A Step-by-Step Example of Java Bean Validation
A common practice is to validate CSV data with Java Bean Validation after converting the rows to Java beans. In the file below, the weight on line 2 is empty.

"1","Terika","Burt","70","05/10/1988"
"2","Matilde","Cristina",,"05/10/1988"

package com.example.demo.bean;

import com.opencsv.bean.CsvBindByPosition;
import jakarta.validation.constraints.NotEmpty;
import lombok.Data;

@Data
public class PersonalDataBean {
    @CsvBindByPosition(position = 0)
    private String index;

    @CsvBindByPosition(position = 1)
    private String firstName;

    @CsvBindByPosition(position = 2)
    private String lastName;

    @CsvBindByPosition(position = 3)
    @NotEmpty
    private String weight;

    @CsvBindByPosition(position = 4)
    private String dateOfBirth;
}

import jakarta.validation.Validation;
import jakarta.validation.Validator;

String csvFilePath = "C:/workspace/demo_csv.csv";
Charset charset = StandardCharsets.UTF_8;

public void validate() {
    // Requires a Bean Validation provider such as Hibernate Validator on the classpath.
    // Build the validator once and reuse it for every row.
    Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
    try (CSVReader reader = new CSVReader(new FileReader(csvFilePath, charset))) {
        CsvToBean<PersonalDataBean> csvToBean = new CsvToBeanBuilder<PersonalDataBean>(reader)
                .withType(PersonalDataBean.class)
                .build();
        List<PersonalDataBean> beans = csvToBean.parse();
        beans.forEach(person -> {
            System.out.println(person);
            // An empty list means the bean passed every constraint
            System.out.println(Arrays.asList(validator.validate(person).toArray()));
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Output:

PersonalDataBean(index=1, firstName=Terika, lastName=Burt, weight=70, dateOfBirth=05/10/1988)
[]
PersonalDataBean(index=2, firstName=Matilde, lastName=Cristina, weight=, dateOfBirth=05/10/1988)
[ConstraintViolationImpl{interpolatedMessage='must not be empty', propertyPath=weight, rootBeanClass=class com.example.demo.bean.PersonalDataBean, messageTemplate='{jakarta.validation.constraints.NotEmpty.message}'}]
After executing the validate method, the results show that line 1 passes validation but line 2 does not, because PersonalDataBean declares the @NotEmpty annotation on weight. Other constraints are available in the jakarta.validation.constraints package.
The developer can customize the validator's error messages and return the validation result for every row to the front end, so the client can see which lines failed and fix only those lines.
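The idea of reporting failures per line can be sketched without any validation framework. The example below uses plain Java checks instead of Bean Validation and collects one message per problem, prefixed with the line number. It assumes that the date of birth uses the day/month/year pattern dd/MM/uuuu, which the source does not state, and `RowErrorReport` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;

class RowErrorReport {

    // Assumption: dates are day/month/year. STRICT rejects impossible dates such as 31/02/1988.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    /** Returns one message per problem found, each prefixed with the 1-based line number. */
    static List<String> validate(List<String[]> rows) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i);
            int line = i + 1;
            if (row.length != 5) {
                errors.add("line " + line + ": expected 5 columns but found " + row.length);
                continue; // the remaining checks need all five columns
            }
            if (row[3].isBlank()) {
                errors.add("line " + line + ": weight must not be empty");
            } else if (!row[3].matches("\\d+(\\.\\d+)?")) {
                errors.add("line " + line + ": weight must be a number");
            }
            try {
                LocalDate.parse(row[4], DATE);
            } catch (DateTimeParseException e) {
                errors.add("line " + line + ": date of birth must be a valid dd/MM/yyyy date");
            }
        }
        return errors;
    }
}
```

The returned list can be serialized as the response of the upload endpoint, so the client sees every failing line in one pass instead of fixing errors one at a time.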
Customizing Error Messages in Java: A Simple Example
A basic customization of the error message looks like this:

package com.example.demo.bean;

import com.opencsv.bean.CsvBindByPosition;
import jakarta.validation.constraints.NotEmpty;
import lombok.Data;

@Data
public class PersonalDataBean {
    @CsvBindByPosition(position = 0)
    private String index;

    @CsvBindByPosition(position = 1)
    private String firstName;

    @CsvBindByPosition(position = 2)
    private String lastName;

    // The message attribute replaces the default "must not be empty" text
    @CsvBindByPosition(position = 3)
    @NotEmpty(message = "weight cannot be empty")
    private String weight;

    @CsvBindByPosition(position = 4)
    private String dateOfBirth;
}

Output:

PersonalDataBean(index=1, firstName=Terika, lastName=Burt, weight=70, dateOfBirth=05/10/1988)
[]
PersonalDataBean(index=2, firstName=Matilde, lastName=Cristina, weight=, dateOfBirth=05/10/1988)
[ConstraintViolationImpl{interpolatedMessage='weight cannot be empty', propertyPath=weight, rootBeanClass=class com.example.demo.bean.PersonalDataBean, messageTemplate='weight cannot be empty'}]

The output shows that the messageTemplate in ConstraintViolationImpl has changed from '{jakarta.validation.constraints.NotEmpty.message}' to 'weight cannot be empty', which reads more naturally to a client.
A clear error message matters: clients should understand it as soon as they read it, which keeps the process smooth and leaves them satisfied with the application. It also reduces the help desk load, because fewer clients need to ask what an error message means.
Alternative Libraries
- Apache Commons CSV — Another flexible CSV library for Java.
- Super CSV — Feature-rich but with a steeper learning curve.
- Jackson CSV — Best if you’re already using Jackson for JSON processing.
Finally
CSV files are plain text, so validation is essential to prevent bad data from being saved to the database, where it causes many problems. The developer should take the time to consider every way the data in a CSV source file can be invalid and validate against each of them.