Java Tutorial: Parsing Made Simple with uniVocity-parsers

What Is uniVocity-parsers?

uniVocity-parsers is an open-source, high-performance Java library for parsing and writing tabular text data. It supports CSV, TSV, and fixed-width files out of the box, and its common framework can be extended to build parsers for other formats.

Why Choose uniVocity-parsers?

Many Java libraries can parse CSV, but uniVocity-parsers stands out for a few reasons:

  • It’s fast.
  • It’s flexible.
  • It supports annotations.
  • It handles edge cases well.
  • It has strong documentation.

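A note on versions: 4.11.0 is the version of camel-univocity-parsers, Apache Camel’s component that wraps this library. It is not a version of uniVocity-parsers itself. The core library is versioned separately (its latest release is 2.9.1), and every class used in this tutorial comes from the core library.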

Setting Up uniVocity-parsers

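Add the core library with Maven:

<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.9.1</version>
</dependency>

If you work with Apache Camel, you can use Camel’s data-format component instead. It brings in uniVocity-parsers as a dependency:

<!-- https://mvnrepository.com/artifact/org.apache.camel/camel-univocity-parsers -->
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-univocity-parsers</artifactId>
    <version>4.11.0</version>
</dependency>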

Parsing CSV Files

Parsing CSV is easy. First, create a CsvParserSettings object.

CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);

Then, create the parser:

CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(new FileReader("data.csv"));

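This code reads every row of data.csv into memory and returns one String[] per row. Because header extraction is enabled, the parser treats the first row as the header and leaves it out of the returned list.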
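For reference, here are the same steps as a complete program. This is a sketch that assumes uniVocity-parsers is on the classpath. The class name CsvQuickStart is illustrative. The program reads CSV text from a StringReader instead of data.csv, so it runs without an input file.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

public class CsvQuickStart {

    // Parses in-memory CSV text; the first row is treated as the header row.
    public static List<String[]> parse(String csv) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true); // header row is not returned as data
        CsvParser parser = new CsvParser(settings);
        return parser.parseAll(new StringReader(csv));
    }

    public static void main(String[] args) {
        String csv = "Name,Age,Country\nAlice,30,USA\nBob,25,Canada\n";
        for (String[] row : parse(csv)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

To parse a real file, pass a FileReader (or any other Reader) to parseAll instead of the StringReader.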

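Preparing Sample Data

The snippet below builds a list of sample records. It assumes an Employee class (not shown in this tutorial) whose constructor takes an id, name, email, age, department, and salary, in that order.
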
// Create sample data
List<Employee> employees = new ArrayList<>();
employees.add(new Employee(1, "John Smith", "john.smith@example.com", 32, "Engineering", 85000.00));
employees.add(new Employee(2, "Jane Doe", "jane.doe@example.com", 28, "Marketing", 72000.00));
employees.add(new Employee(3, "Bob Johnson", "bob.johnson@example.com", 45, "Finance", 95000.00));
employees.add(new Employee(4, "Alice Brown", "alice.brown@example.com", 37, "HR", 68000.00));
employees.add(new Employee(5, "Charlie Black", "charlie.black@example.com", 41, "Sales", 78000.00));

Advanced Settings

CsvParserSettings exposes many more options:

CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true, ',');     // Detect the delimiter from the input (',' is the preferred candidate)
settings.setQuoteDetectionEnabled(true);              // Detect the quote character from the input
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE); // How to treat unescaped quotes inside quoted values
settings.setKeepQuotes(true);                         // Keep the enclosing quotes in parsed values
settings.setNullValue("N/A");                         // Value returned for empty, unquoted fields
settings.setEmptyValue("");                           // Value returned for empty quoted fields ("")
settings.setSkipEmptyLines(true);                     // Skip blank lines while parsing
settings.setHeaders("Name", "Age", "Country");        // Define headers manually
settings.getFormat().setComment('#');                 // Lines starting with '#' are comments
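To see several of these options work together, the sketch below parses a small in-memory sample. The sample contains a comment line, a blank line, an empty unquoted field, and an empty quoted field. The class name and sample text are illustrative, and the sketch assumes uniVocity-parsers is on the classpath.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

public class SettingsDemo {

    public static List<String[]> parse(String csv) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true); // first non-comment row holds the headers
        settings.setNullValue("N/A");              // empty, unquoted field -> "N/A"
        settings.setEmptyValue("");                // empty quoted field ("") -> ""
        settings.setSkipEmptyLines(true);          // blank lines produce no rows
        settings.getFormat().setComment('#');      // lines starting with '#' are skipped
        return new CsvParser(settings).parseAll(new StringReader(csv));
    }

    public static void main(String[] args) {
        String csv = "# sample export\n"
                + "Name,Age,Country\n"
                + "\n"
                + "Alice,,\"\"\n";
        for (String[] row : parse(csv)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The single data row comes back as Alice, N/A, and an empty string. The unquoted empty field receives the null value, while the quoted one receives the empty value.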

Writing CSV Files

Writing CSV is just as simple. Use a CsvWriter configured with CsvWriterSettings:

package com.example.jobrunr.univocity;
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;
import java.io.FileWriter;
import java.io.IOException;

public class WriteCSV {
    public static void main(String[] args) throws IOException {
        CsvWriterSettings writerSettings = new CsvWriterSettings();
        CsvWriter writer = new CsvWriter(new FileWriter("output.csv"), writerSettings); // Created in the working directory
        writer.writeHeaders("Name", "Age", "Country");
        writer.writeRow("Alice", "30", "USA");
        writer.writeRow("Bob", "25", "Canada");
        writer.close();
    }
}

Running this program creates output.csv with the following content:

Name,Age,Country
Alice,30,USA
Bob,25,Canada
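The writer also handles quoting for you. The sketch below writes to a StringWriter instead of a file and includes a value that contains a comma. The writer encloses that value in quotes, so it is read back as a single field. The class name is illustrative, and uniVocity-parsers is assumed to be on the classpath.

```java
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

import java.io.StringWriter;

public class QuotingDemo {

    // Writes CSV into a String so the result is easy to inspect.
    public static String write() {
        StringWriter out = new StringWriter();
        CsvWriterSettings settings = new CsvWriterSettings();
        settings.getFormat().setLineSeparator("\n"); // same output on every OS
        CsvWriter writer = new CsvWriter(out, settings);
        writer.writeHeaders("Name", "Age", "Country");
        writer.writeRow("Carlos, Jr.", "28", "Brazil"); // value contains the delimiter
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(write());
    }
}
```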

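Writing CSV from Java Beans with BeanWriterProcessor

A BeanWriterProcessor converts annotated objects (the Person class is defined in the next section) into CSV rows:
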
private static void write() throws IOException {

    List<Person> people = Arrays.asList(
            new Person("Alice", 30, "USA"),
            new Person("Bob", 25, "Canada"),
            new Person("Carlos", 28, "Brazil")
    );

    Writer fileWriter = new FileWriter("people.csv");
    CsvWriterSettings settings = new CsvWriterSettings();
    settings.setHeaders("name", "age", "country"); // Column order; names match the @Parsed fields

    BeanWriterProcessor<Person> processor = new BeanWriterProcessor<>(Person.class);
    settings.setRowWriterProcessor(processor);

    CsvWriter writer = new CsvWriter(fileWriter, settings);
    writer.writeHeaders();  // Writes the headers defined in the settings: name,age,country

    for (Person person : people) {
        writer.processRecord(person); // Process each record manually
    }

    writer.close();

}
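The sample employee list prepared earlier can be written the same way. The tutorial never shows the Employee class, so the sketch below defines a hypothetical one. Its @Parsed field names (id, name, email, age, department, salary) are guesses based on the constructor arguments. Output goes to a StringWriter, and uniVocity-parsers is assumed to be on the classpath.

```java
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.common.processor.BeanWriterProcessor;
import com.univocity.parsers.csv.CsvWriter;
import com.univocity.parsers.csv.CsvWriterSettings;

import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

public class EmployeeCsvWriter {

    // Hypothetical Employee bean matching the sample data's constructor arguments.
    public static class Employee {
        @Parsed private int id;
        @Parsed private String name;
        @Parsed private String email;
        @Parsed private int age;
        @Parsed private String department;
        @Parsed private double salary;

        public Employee() { }

        public Employee(int id, String name, String email, int age, String department, double salary) {
            this.id = id;
            this.name = name;
            this.email = email;
            this.age = age;
            this.department = department;
            this.salary = salary;
        }
    }

    public static String write(List<Employee> employees) {
        StringWriter out = new StringWriter();
        CsvWriterSettings settings = new CsvWriterSettings();
        settings.getFormat().setLineSeparator("\n");
        settings.setHeaders("id", "name", "email", "age", "department", "salary"); // column order
        BeanWriterProcessor<Employee> processor = new BeanWriterProcessor<>(Employee.class);
        settings.setRowWriterProcessor(processor);

        CsvWriter writer = new CsvWriter(out, settings);
        writer.writeHeaders();
        for (Employee e : employees) {
            writer.processRecord(e); // the processor reads the @Parsed fields of each bean
        }
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John Smith", "john.smith@example.com", 32, "Engineering", 85000.00),
                new Employee(2, "Jane Doe", "jane.doe@example.com", 28, "Marketing", 72000.00));
        System.out.print(write(employees));
    }
}
```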

Using Annotations for Bean Parsing

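uniVocity-parsers can map columns to Java bean fields. Annotate each field with @Parsed. Without further attributes, the field name is used as the column header. This example uses Lombok to generate getters, setters, toString(), and constructors.

import com.univocity.parsers.annotations.Parsed;
import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

@Getter
@Setter
@ToString
@NoArgsConstructor   // Needed so the parser can instantiate beans
@AllArgsConstructor  // Enables new Person("Alice", 30, "USA")
public class Person {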
    @Parsed
    private String name;

    @Parsed
    private int age;

    @Parsed
    private String country;
}

Now use a BeanListProcessor:

// Configure CSV parser settings
CsvParserSettings settings = new CsvParserSettings();
settings.setDelimiterDetectionEnabled(true, ',');     // Detect the delimiter from the input (',' is the preferred candidate)
settings.setQuoteDetectionEnabled(true);              // Detect the quote character from the input
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
settings.setEmptyValue("");                           // Value returned for empty quoted fields ("")
settings.setSkipEmptyLines(true);                     // Skip blank lines while parsing
settings.setHeaderExtractionEnabled(true);            // Read column names from the file's first row
settings.getFormat().setComment('#');                 // Lines starting with '#' are comments

BeanListProcessor<Person> processor = new BeanListProcessor<>(Person.class);
settings.setProcessor(processor);
CsvParser parser = new CsvParser(settings);
parser.parse(new FileReader("people.csv"));           // File written by the BeanWriterProcessor example
List<Person> people = processor.getBeans();

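processor.getBeans() returns one Person per data row. The rest of your code can then work with typed objects instead of raw String arrays, which keeps it clean and object-oriented.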
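Here is a self-contained version of the bean-parsing flow that runs without Lombok or input files. It defines its own illustrative Person class with explicit header names in @Parsed(field = ...), so the mapping does not depend on how the headers are capitalized. It parses CSV text from a string and assumes uniVocity-parsers is on the classpath.

```java
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

public class BeanParsingDemo {

    // Plain bean; uniVocity needs a public no-arg constructor to create instances.
    public static class Person {
        @Parsed(field = "Name")
        private String name;

        @Parsed(field = "Age")
        private int age;

        @Parsed(field = "Country")
        private String country;

        public Person() { }

        public String getName() { return name; }
        public int getAge() { return age; }
        public String getCountry() { return country; }
    }

    public static List<Person> parse(String csv) {
        BeanListProcessor<Person> processor = new BeanListProcessor<>(Person.class);
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true); // map columns by header name
        settings.setProcessor(processor);
        new CsvParser(settings).parse(new StringReader(csv));
        return processor.getBeans();
    }

    public static void main(String[] args) {
        String csv = "Name,Age,Country\nAlice,30,USA\nBob,25,Canada\n";
        for (Person p : parse(csv)) {
            System.out.println(p.getName() + " (" + p.getAge() + ") from " + p.getCountry());
        }
    }
}
```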

Handling Fixed-Width Files

Parsing fixed-width files is easy too. Define the column lengths:

FixedWidthFields lengths = new FixedWidthFields(10, 5, 20);
FixedWidthParserSettings settings = new FixedWidthParserSettings(lengths);
FixedWidthParser parser = new FixedWidthParser(settings);
List<String[]> rows = parser.parseAll(new FileReader("fixedwidth.txt"));

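You can also give each field a name. Start from an empty FixedWidthFields object and add the fields in order:

FixedWidthFields fields = new FixedWidthFields();
fields.addField("Name", 10);
fields.addField("Age", 5);
fields.addField("Email", 20);
FixedWidthParserSettings namedSettings = new FixedWidthParserSettings(fields);

Named fields let you refer to columns by name, for example in bean mappings or field selection, instead of by position.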
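Here is a runnable sketch of the named-field setup, assuming uniVocity-parsers is on the classpath. It builds padded sample lines with String.format, so each line is exactly 10 + 5 + 20 = 35 characters. The class and helper names are illustrative.

```java
import com.univocity.parsers.fixed.FixedWidthFields;
import com.univocity.parsers.fixed.FixedWidthParser;
import com.univocity.parsers.fixed.FixedWidthParserSettings;

import java.io.StringReader;
import java.util.List;

public class FixedWidthDemo {

    public static List<String[]> parse(String text) {
        FixedWidthFields fields = new FixedWidthFields();
        fields.addField("Name", 10);
        fields.addField("Age", 5);
        fields.addField("Email", 20);
        FixedWidthParserSettings settings = new FixedWidthParserSettings(fields);
        // Padding spaces are removed because whitespace trimming is on by default.
        return new FixedWidthParser(settings).parseAll(new StringReader(text));
    }

    // Builds one left-aligned, space-padded line of 35 characters plus a newline.
    public static String line(String name, String age, String email) {
        return String.format("%-10s%-5s%-20s", name, age, email) + "\n";
    }

    public static void main(String[] args) {
        String text = line("Alice", "30", "alice@example.com") + line("Bob", "25", "bob@example.com");
        for (String[] row : parse(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```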

Skipping Headers and Comments

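The parser can separate the header row from the data and ignore comment lines automatically:

settings.setHeaderExtractionEnabled(true);        // Read the first row as headers, not data
settings.setLineSeparatorDetectionEnabled(true);  // Detect \n, \r\n, or \r line endings from the input
settings.getFormat().setComment('#');             // Lines starting with # are comments

With these settings, the header row is kept out of the parsed data and lines starting with # are ignored. Since # is already the default comment character, the last line mainly documents the intent.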
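The sketch below applies these three settings to a sample with Windows-style (\r\n) line endings and comment lines mixed in between records. The class name and sample text are illustrative, and uniVocity-parsers is assumed to be on the classpath.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

public class CommentsDemo {

    public static List<String[]> parse(String csv) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);       // first non-comment row becomes the header
        settings.setLineSeparatorDetectionEnabled(true); // handles \n, \r\n, or \r input
        settings.getFormat().setComment('#');            // '#' is also the default
        return new CsvParser(settings).parseAll(new StringReader(csv));
    }

    public static void main(String[] args) {
        // Windows-style line endings with comment lines between records
        String csv = "# sample export\r\nName,Age\r\nAlice,30\r\n# note\r\nBob,25\r\n";
        for (String[] row : parse(csv)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```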

Validating Input

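By default, the parser throws a DataProcessingException when a processor such as BeanListProcessor cannot convert a value (for example, text in an int field). Registering a ProcessorErrorHandler lets you log the problem and keep going:

settings.setProcessorErrorHandler((error, inputRow, context) -> {
    // error is a DataProcessingException; inputRow holds the values of the failing row
    System.err.println("Error at record " + error.getRecordNumber() + ": " + error.getMessage());
});

The handler reports bad rows instead of letting them abort the whole parse, which makes batch jobs more reliable.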
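Here is a self-contained sketch of the error handler in a bean-parsing setup, assuming uniVocity-parsers is on the classpath. The Person class is a minimal illustrative one (not the Lombok version). The sample deliberately puts text in the Age column of one row. The handler collects messages in a list instead of printing them, which makes the outcome easy to check.

```java
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ErrorHandlingDemo {

    public static class Person {
        @Parsed(field = "Name")
        private String name;

        @Parsed(field = "Age")
        private int age;

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Parses beans and records one message per row that failed conversion.
    public static List<Person> parse(String csv, List<String> errors) {
        BeanListProcessor<Person> processor = new BeanListProcessor<>(Person.class);
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);
        settings.setProcessor(processor);
        settings.setProcessorErrorHandler((error, inputRow, context) ->
                errors.add(Arrays.toString(inputRow) + ": " + error.getMessage()));
        new CsvParser(settings).parse(new StringReader(csv));
        return processor.getBeans();
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        List<Person> people = parse("Name,Age\nAlice,30\nBob,thirty\nCarol,41\n", errors);
        System.out.println(people.size() + " valid rows, " + errors.size() + " error(s)");
        errors.forEach(System.err::println);
    }
}
```

With the handler in place, the row containing "thirty" is reported through the handler rather than stopping the parse, and the other rows are still converted.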

Trimming and Null Handling

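You can trim leading and trailing whitespace around values. Both options are already enabled by default in the parser settings:

settings.setIgnoreLeadingWhitespaces(true);
settings.setIgnoreTrailingWhitespaces(true);

By default, an empty field is returned as null. To get an empty string instead, set the null value:

settings.setNullValue("");

This avoids NullPointerExceptions in code that expects every field to be a String.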
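The difference between the default behavior and setNullValue("") is easy to see on a single line. This is a sketch assuming uniVocity-parsers is on the classpath, and the class name is illustrative. parseLine() parses one line of text without needing a Reader.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

public class TrimNullDemo {

    // Parses one CSV line; optionally returns "" instead of null for empty fields.
    public static String[] parseLine(String line, boolean emptyAsBlank) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setIgnoreLeadingWhitespaces(true);
        settings.setIgnoreTrailingWhitespaces(true);
        if (emptyAsBlank) {
            settings.setNullValue(""); // empty unquoted fields become "" instead of null
        }
        return new CsvParser(settings).parseLine(line);
    }

    public static void main(String[] args) {
        String line = "  Alice  ,,USA";
        String[] defaults = parseLine(line, false);
        String[] blanks = parseLine(line, true);
        System.out.println("default:  [" + defaults[0] + "] [" + defaults[1] + "]");
        System.out.println("nullValue: [" + blanks[0] + "] [" + blanks[1] + "]");
    }
}
```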

Parsing Large Files

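uniVocity-parsers reads its input as a stream, so large files do not need to fit in memory. Use a row processor to handle each row as soon as it is parsed:

settings.setRowProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        // Called once for every parsed row
        System.out.println(Arrays.toString(row));
    }
});
CsvParser parser = new CsvParser(settings); // Create the parser after the processor is set
parser.parse(new FileReader("bigfile.csv"));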

This handles each row without loading everything in memory.
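If you prefer to pull rows yourself rather than receive them in a callback, the parser also offers beginParsing() and parseNext(). The sketch below sums one numeric column from a Reader while holding only the current row in memory. The class and method names are illustrative, and uniVocity-parsers is assumed to be on the classpath. In practice you would pass a FileReader for bigfile.csv.

```java
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.Reader;
import java.io.StringReader;

public class StreamingDemo {

    // Sums a numeric column one row at a time instead of loading all rows.
    public static long sumColumn(Reader input, int column) {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);
        CsvParser parser = new CsvParser(settings);

        long total = 0;
        parser.beginParsing(input);
        String[] row;
        // parseNext() returns null once the input is exhausted
        while ((row = parser.parseNext()) != null) {
            total += Long.parseLong(row[column]);
        }
        return total;
    }

    public static void main(String[] args) {
        String csv = "Name,Age\nAlice,30\nBob,25\nCarol,41\n";
        System.out.println("Total age: " + sumColumn(new StringReader(csv), 1));
    }
}
```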

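What’s New in Version 4.11.0?

4.11.0 is a release of Apache Camel’s camel-univocity-parsers component rather than of the core library. The updates listed for it are:

  • Faster column processing.
  • Bug fixes for edge case delimiters.
  • Enhanced annotation support.
  • Better memory performance for large datasets.

Confirm these in the Apache Camel release notes before relying on them. Changes to the core parser are recorded separately on the uniVocity-parsers GitHub page.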

Best Practices

Follow these tips for better results:

  • Always validate your input files.
  • Use beans for structured data.
  • Trim and clean data during parsing.
  • Use row processors for large files.
  • Benchmark different settings.

Real-World Use Cases

You can use uniVocity-parsers in:

  • Financial data pipelines
  • Log processing systems
  • ETL operations
  • Machine learning preprocessing
  • Inventory systems

It’s robust enough for enterprise applications.

Conclusion

uniVocity-parsers makes Java data parsing straightforward. It’s flexible, fast, and easy to use. Whether you deal with CSV, TSV, or fixed-width files, this library has you covered.

You can add the core library directly or pull it in through Apache Camel’s camel-univocity-parsers 4.11.0 component. Either way, the parsers and classes shown in this tutorial are the same.

Start using uniVocity-parsers today and spend less time fighting data formats.
