Parser
A CsvParser is at the core of the library. It parses the given CSV data into strongly typed objects.
Constructing a Parser
Constructing a CsvParser requires two things: the CsvParserOptions and a CsvMapping.
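As a minimal sketch of how the two pieces fit together, the following assumes a Person class and mapping like the ones in the Quickstart document. The namespaces in the using directives and the constructor argument order (options first, then mapping) are assumptions and may differ between library versions.

```csharp
using System;
using TinyCsvParser;          // assumed namespace of CsvParser and CsvParserOptions
using TinyCsvParser.Mapping;  // assumed namespace of CsvMapping

// Hypothetical entity, mirroring the Quickstart example.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

// Maps CSV column indices to the properties of Person.
public class CsvPersonMapping : CsvMapping<Person>
{
    public CsvPersonMapping()
    {
        MapProperty(0, x => x.FirstName);
        MapProperty(1, x => x.LastName);
        MapProperty(2, x => x.BirthDate);
    }
}

public static class ParserSetup
{
    public static CsvParser<Person> CreateParser()
    {
        // Do not skip the first row; split columns on ';'.
        var csvParserOptions = new CsvParserOptions(false, ';');
        var csvMapping = new CsvPersonMapping();

        // Assumed constructor shape: options first, then the mapping.
        return new CsvParser<Person>(csvParserOptions, csvMapping);
    }
}
```

The parser holds no CSV data itself, so a single instance created this way can be reused for several files that share the same layout.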
Mapping
The parser has to know how to map between the textual CSV data and the strongly typed .NET object. This mapping is defined with a CsvMapping, which maps each CSV column index to a property of the .NET object. It is an abstract base class that has to be implemented by the user of the library.
The CsvMapping exposes the method MapProperty to define the actual property mapping.
You have seen an example of a CsvMapping in the Quickstart document.
private class CsvPersonMapping : CsvMapping<Person>
{
    public CsvPersonMapping()
        : base()
    {
        // Column index in the CSV line => property of Person.
        MapProperty(0, x => x.FirstName);
        MapProperty(1, x => x.LastName);
        MapProperty(2, x => x.BirthDate);
    }
}
Options
By default, the CsvParser does not know whether the header row of the CSV data should be skipped or how to tokenize a line (see Tokenizer). These options are set in the CsvParserOptions and passed to the CsvParser. Since input data can be processed in parallel, there are also options for the degree of parallelism.
In the simplest case it is sufficient to pass the flag for the header skip and the column delimiter.
You have seen an example of CsvParserOptions in the Quickstart document.
CsvParserOptions csvParserOptions = new CsvParserOptions(false, ';');
Parsing CSV Data
The CsvParser exposes the methods ReadFromFile and ReadFromString to read the CSV data from a given file or string.
You have seen an example of ReadFromFile in the Quickstart document.
var result = csvParser
    .ReadFromFile("person.csv", Encoding.UTF8)
    .ToList();
Working with the Results
The return value of the CsvParser.ReadFromFile and CsvParser.ReadFromString methods is a ParallelQuery<CsvMappingResult<TEntity>>.
A ParallelQuery? A ParallelQuery is a special IEnumerable from Parallel LINQ that behaves almost like a normal IEnumerable (with a few exceptions). To evaluate the results, you can iterate through the ParallelQuery, which is the preferred way of working with the results. If you are uncomfortable with enumerables, you can also turn the data into a simple list by calling the method ToList() on it.
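To illustrate the two ways of consuming a ParallelQuery, here is a small sketch that uses only the standard System.Linq types. The squared integers are a stand-in for parsed rows, not output of the CsvParser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelQueryDemo
{
    public static void Main()
    {
        // Stand-in for parsed rows: a ParallelQuery<int> built with plain PLINQ.
        ParallelQuery<int> query = Enumerable.Range(1, 10)
            .AsParallel()
            .Select(x => x * x);

        // Iterating works as with any IEnumerable. A plain PLINQ query
        // does not guarantee element order unless AsOrdered() is used.
        foreach (int value in query)
        {
            Console.WriteLine(value);
        }

        // ToList() materializes the query into an ordinary list.
        List<int> list = query.ToList();
        Console.WriteLine(list.Count); // prints 10
    }
}
```

Note that the query is evaluated each time it is enumerated, so the loop and the ToList() call above each run it once. If the results are needed more than once, materializing them a single time avoids repeating the work.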
Note
The library uses Parallel LINQ (PLINQ) to support a high degree of parallelism. Building a parallel processing pipeline with PLINQ may not be intuitive, so reading up on the most important PLINQ concepts is suggested. MSDN has good documentation on working with it: Parallel LINQ (PLINQ).
The CsvMappingResult holds the parse results. You can access the parsed object through the property CsvMappingResult<TEntity>.Result, but this property is only populated when parsing was successful. You can check whether the CSV data was parsed successfully by evaluating the property CsvMappingResult<TEntity>.IsValid.
Attention
The CsvParser doesn’t throw any exceptions during parsing, because the input data is processed in parallel and the CsvParser can’t stop parsing just because a single line has an error. Instead, the CsvMappingResult contains an error if parsing a line was not successful.
If a CSV line could not be parsed, the property CsvMappingResult<TEntity>.Error is populated and contains the problematic column and error message.
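The IsValid, Result and Error properties described above can be combined as in the following sketch. The helper name CollectValid, the namespace in the using directive and the generic constraint are assumptions for illustration. The error is printed through its default string conversion rather than through specific members, since those may vary between versions.

```csharp
using System;
using System.Collections.Generic;
using TinyCsvParser.Mapping;  // assumed namespace of CsvMappingResult

public static class ResultHandling
{
    // Hypothetical helper: keeps the successfully parsed entities
    // and reports every line that could not be parsed.
    public static List<TEntity> CollectValid<TEntity>(
        IEnumerable<CsvMappingResult<TEntity>> results)
        where TEntity : class, new()
    {
        var entities = new List<TEntity>();

        foreach (var item in results)
        {
            if (item.IsValid)
            {
                // Result is only populated for successfully parsed lines.
                entities.Add(item.Result);
            }
            else
            {
                // Error is only populated for lines that failed to parse.
                Console.WriteLine($"Could not parse line: {item.Error}");
            }
        }

        return entities;
    }
}
```

Because a ParallelQuery is also an IEnumerable, the return value of ReadFromFile or ReadFromString can be passed to such a helper directly, without calling ToList() first.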