What is uniVocity-parsers?
uniVocity-parsers is a high-performance library. It helps Java developers parse and write data. It supports CSV, TSV, fixed-width, and more. The library is open-source and developer-friendly.
Why Choose uniVocity-parsers?
Many libraries can parse data, but uniVocity-parsers stands out for a few reasons:
- It’s fast.
- It’s flexible.
- It supports annotations.
- It handles edge cases well.
- It has strong documentation.
Version 4.11.0 adds performance improvements and bug fixes. It’s more stable than previous versions.
Setting Up uniVocity-parsers
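You can include it using Maven. Note that the coordinates below belong to camel-univocity-parsers, the Apache Camel component that wraps the library and pulls it in as a transitive dependency. The standalone library is published under the com.univocity group ID with its own version numbering.
<!-- https://mvnrepository.com/artifact/org.apache.camel/camel-univocity-parsers -->
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-univocity-parsers</artifactId>
    <version>4.11.0</version>
</dependency>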
Parsing CSV Files
Parsing CSV is easy. First, create a CsvParserSettings object.
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);
Then, create the parser:
CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(new FileReader("data.csv"));
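This code reads all rows from a file named data.csv and returns them as a list of string arrays. Because header extraction is enabled, the first row is treated as the header and is not included in the list.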
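Once the rows are in memory they are ordinary Java data. The plain-Java sketch below makes no library calls; the class name RowsDemo, the column helper, and the sample values are invented for illustration. It shows one way to pull a single column out of a List<String[]> shaped like the parser's result, tolerating rows that are shorter than expected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of uniVocity-parsers.
final class RowsDemo {

    // Collects the value at the given column index from every row.
    // Rows that are too short contribute null instead of throwing.
    static List<String> column(List<String[]> rows, int index) {
        List<String> values = new ArrayList<>();
        for (String[] row : rows) {
            values.add(index < row.length ? row[index] : null);
        }
        return values;
    }

    public static void main(String[] args) {
        // Stand-in for the list returned by parser.parseAll(...).
        List<String[]> rows = Arrays.asList(
                new String[] {"Alice", "30", "USA"},
                new String[] {"Bob", "25"});
        System.out.println(column(rows, 2));
    }
}
```

Returning null for a missing cell is a design choice of this sketch; throwing or substituting a default would work just as well depending on how strict the input should be.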
Prepare the data
// Create sample data
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(1, "John Smith", "john.smith@example.com", 32, "Engineering", 85000.00));
employees.add(new Employee(2, "Jane Doe", "jane.doe@example.com", 28, "Marketing", 72000.00));
employees.add(new Employee(3, "Bob Johnson", "bob.johnson@example.com", 45, "Finance", 95000.00));
employees.add(new Employee(4, "Alice Brown", "alice.brown@example.com", 37, "HR", 68000.00));
employees.add(new Employee(5, "Charlie Black", "charlie.black@example.com", 41, "Sales", 78000.00));
Advanced settings
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true, ','); // Detect the delimiter automatically, with ',' as a candidate
settings.setQuoteDetectionEnabled(true); // Detect the quote character automatically
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE); // On an unescaped quote, keep reading until the closing quote
settings.setKeepQuotes(true); // Keep the enclosing quotes in parsed values
settings.setNullValue("N/A"); // Value returned in place of null (missing) fields
settings.setEmptyValue(""); // Value returned for empty quoted fields
settings.setSkipEmptyLines(true); // Skip empty lines when parsing
settings.setHeaders("Name", "Age", "Country"); // Define the column headers
settings.getFormat().setComment('#'); // Treat lines starting with '#' as comments
Writing CSV Files
Writing CSV is just as simple. Use CsvWriterSettings:
package com.example.jobrunr.univocity;

import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

import java.io.FileWriter;
import java.io.IOException;

public class WriteCSV {
    public static void main(String[] args) throws IOException {
        CsvWriterSettings writerSettings = new CsvWriterSettings();
        CsvWriter writer = new CsvWriter(new FileWriter("D:\\logs\\csv\\univocity\\output.csv"), writerSettings);
        writer.writeHeaders("Name", "Age", "Country");
        writer.writeRow("Alice", "30", "USA");
        writer.writeRow("Bob", "25", "Canada");
        writer.close();
    }
}
You now have a clean output.csv file.
Name,Age,Country
Alice,30,USA
Bob,25,Canada
Writing CSV Using BeanWriterProcessor with a Java bean
private static void write() throws IOException {
    List<Person> people = Arrays.asList(
            new Person("Alice", 30, "USA"),
            new Person("Bob", 25, "Canada"),
            new Person("Carlos", 28, "Brazil")
    );
    Writer fileWriter = new FileWriter("people.csv");
    CsvWriterSettings settings = new CsvWriterSettings();
    settings.setHeaderWritingEnabled(true);
    BeanWriterProcessor<Person> processor = new BeanWriterProcessor<>(Person.class);
    settings.setRowWriterProcessor(processor);
    CsvWriter writer = new CsvWriter(fileWriter, settings);
    writer.writeHeaders(); // Writes: name, age, country
    for (Person person : people) {
        writer.processRecord(person); // Process each record manually
    }
    writer.close();
}
Using Annotations for Bean Parsing
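uniVocity-parsers supports Java beans. Annotate fields with @Parsed. The class below uses Lombok annotations to generate the accessors and constructors.
@Getter
@Setter
@ToString
@NoArgsConstructor  // No-argument constructor so the library can instantiate the bean
@AllArgsConstructor // Enables new Person("Alice", 30, "USA") as used in the writing example
public class Person {
    @Parsed
    private String name;

    @Parsed
    private int age;

    @Parsed
    private String country;
}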
Now use a BeanListProcessor:
// Configure CSV parser settings
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true, ','); // Detect the delimiter automatically, with ',' as a candidate
settings.setQuoteDetectionEnabled(true); // Detect the quote character automatically
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
settings.setKeepQuotes(true); // Keep the enclosing quotes in parsed values
settings.setNullValue("N/A"); // Value returned in place of null (missing) fields
settings.setEmptyValue(""); // Value returned for empty quoted fields
settings.setSkipEmptyLines(true); // Skip empty lines when parsing
settings.setHeaders("Name", "Age", "Country"); // Define the column headers
settings.setHeaderExtractionEnabled(true); // output.csv starts with a header row: skip it instead of parsing it as data
settings.getFormat().setComment('#'); // Treat lines starting with '#' as comments

BeanListProcessor<Person> processor = new BeanListProcessor<>(Person.class);
settings.setProcessor(processor);

CsvParser parser = new CsvParser(settings);
parser.parse(new FileReader("output.csv"));
List<Person> people = processor.getBeans();
This approach keeps your code clean and object-oriented.
Handling Fixed-Width Files
Parsing fixed-width files is easy too. Define the column lengths:
FixedWidthFields lengths = new FixedWidthFields(10, 5, 20);
FixedWidthParserSettings settings = new FixedWidthParserSettings(lengths);
FixedWidthParser parser = new FixedWidthParser(settings);
List<String[]> rows = parser.parseAll(new FileReader("fixedwidth.txt"));
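You can also define names. Start from an empty FixedWidthFields instead of passing the lengths to the constructor, otherwise the named fields are appended after the three unnamed ones:
FixedWidthFields lengths = new FixedWidthFields();
lengths.addField("Name", 10);
lengths.addField("Age", 5);
lengths.addField("Email", 20);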
This helps make your data more readable.
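To make the role of the column lengths concrete, here is a minimal plain-Java sketch of what cutting a line into 10, 5, and 20 character fields amounts to. It is not the library's implementation (the real parser also deals with padding characters, alignment, and malformed records); the class FixedWidthSketch and its split method are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration of fixed-width slicing; not the library's code.
final class FixedWidthSketch {

    // Cuts one line into consecutive fields of the given widths
    // and strips the space padding from each field.
    static String[] split(String line, int... widths) {
        String[] fields = new String[widths.length];
        int start = 0;
        for (int i = 0; i < widths.length; i++) {
            // Clamp both ends so short lines yield empty trailing fields.
            int begin = Math.min(start, line.length());
            int end = Math.min(start + widths[i], line.length());
            fields[i] = line.substring(begin, end).trim();
            start += widths[i];
        }
        return fields;
    }

    public static void main(String[] args) {
        // Build a 35-character record: Name (10), Age (5), Email (20).
        String line = String.format("%-10s%-5s%-20s", "Alice", "30", "alice@example.com");
        System.out.println(Arrays.toString(split(line, 10, 5, 20)));
    }
}
```

Clamping the offsets means a truncated line produces empty strings for the missing fields rather than an exception, which is one reasonable policy among several.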
Skipping Headers and Comments
The parser can skip headers and comments automatically.
settings.setHeaderExtractionEnabled(true);
settings.setLineSeparatorDetectionEnabled(true);
settings.getFormat().setComment('#');
With this, lines starting with # will be ignored.
Validating Input
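You can handle data errors on the fly by registering an error handler. The handler receives the exception, the offending row, and the parsing context:
settings.setProcessorErrorHandler((error, inputRow, context) -> {
    // The third argument is the parsing context, not a row index
    System.err.println("Error at record " + context.currentRecord() + ": " + error.getMessage());
});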
This feature improves data reliability.
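The idea behind an error handler is that one bad row gets reported and skipped instead of aborting the whole run. The following plain-Java sketch mimics that pattern without the library; the RowErrorHandler interface and the parseAges method are hypothetical and exist only to illustrate the control flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the report-and-continue pattern.
final class ErrorHandlingSketch {

    // Made-up callback type, loosely modelled on an error handler.
    interface RowErrorHandler {
        void handle(Exception error, String[] row, int rowIndex);
    }

    // Converts one column to int. Rows that fail conversion are passed
    // to the handler and skipped; processing continues with the next row.
    static List<Integer> parseAges(List<String[]> rows, int ageIndex, RowErrorHandler onError) {
        List<Integer> ages = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String[] row = rows.get(i);
            try {
                ages.add(Integer.parseInt(row[ageIndex].trim()));
            } catch (RuntimeException e) {
                // Covers non-numeric text, missing columns, and null cells.
                onError.handle(e, row, i);
            }
        }
        return ages;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Alice", "30"},
                new String[] {"Bob", "abc"},
                new String[] {"Carlos", "28"});
        List<Integer> ages = parseAges(rows, 1, (error, row, index) ->
                System.err.println("Error at row " + index + ": " + error.getMessage()));
        System.out.println(ages);
    }
}
```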
Trimming and Null Handling
You can remove leading and trailing whitespace easily:
settings.setIgnoreLeadingWhitespaces(true);
settings.setIgnoreTrailingWhitespaces(true);
And choose the value returned for missing (null) fields:
settings.setNullValue("");
This avoids common issues with blank data.
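As a rough approximation of what these settings do to an unquoted field, the plain-Java sketch below trims a raw value and substitutes a placeholder when nothing is left. It is only an illustration under that assumption: the class CleanupSketch and the clean method are hypothetical, and the library's exact rules (for example for quoted fields) may differ.

```java
// Hypothetical illustration of trim-then-substitute; not the library's code.
final class CleanupSketch {

    // Trims a raw field and returns nullValue when the result is empty
    // or when the field was missing altogether.
    static String clean(String raw, String nullValue) {
        String trimmed = (raw == null) ? "" : raw.trim();
        return trimmed.isEmpty() ? nullValue : trimmed;
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("  Alice ", "") + "]");
        System.out.println("[" + clean("   ", "N/A") + "]");
        System.out.println("[" + clean(null, "N/A") + "]");
    }
}
```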
Parsing Large Files
uniVocity-parsers supports row-by-row processing. Use a row processor to avoid memory issues:
settings.setProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        System.out.println(Arrays.toString(row));
    }
});
// Create the parser after the processor has been set on the settings
CsvParser parser = new CsvParser(settings);
parser.parse(new FileReader("bigfile.csv"));
This handles each row without loading everything in memory.
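The memory benefit comes from the callback shape: each row is handed over and then discarded before the next one is read. A minimal plain-Java sketch of that pattern follows. It deliberately uses a naive comma split with no quoting or escaping support, so it is not a substitute for the parser; the class StreamingSketch and the forEachRow method are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical illustration of row-at-a-time processing.
final class StreamingSketch {

    // Reads one line at a time, passes the split row to the callback,
    // and returns the number of rows seen. Only the current line is held
    // in memory.
    static long forEachRow(Reader input, Consumer<String[]> callback) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split: no quoting or escaping, unlike a real CSV parser.
                callback.accept(line.split(",", -1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        long rows = forEachRow(new StringReader("a,b\nc,d\n"),
                row -> System.out.println(String.join("|", row)));
        System.out.println(rows + " rows");
    }
}
```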
What’s New in Version 4.11.0?
Version 4.11.0 introduces minor updates:
- Faster column processing.
- Bug fixes for edge case delimiters.
- Enhanced annotation support.
- Better memory performance for large datasets.
You can view the changelog on the official GitHub page.
Best Practices
Follow these tips for better results:
- Always validate your input files.
- Use beans for structured data.
- Trim and clean data during parsing.
- Use row processors for large files.
- Benchmark different settings.
Real-World Use Cases
You can use uniVocity-parsers in:
- Financial data pipelines
- Log processing systems
- ETL operations
- Machine learning preprocessing
- Inventory systems
It’s robust enough for enterprise applications.
Conclusion
uniVocity-parsers makes Java data parsing effortless. It’s flexible, fast, and easy to use. Whether you deal with CSV, TSV, or fixed-width files, this library has you covered.
Version 4.11.0 builds on a strong foundation. It ensures cleaner, safer, and faster parsing for your applications.
Start using uniVocity-parsers today. You’ll never struggle with data formats again.