b-maslennikov/CsvSorter

CsvSorter

CsvSorter is a small package that helps you sort large CSV files.

Installation

PM> Install-Package CsvSorter
> dotnet add package CsvSorter
<PackageReference Include="CsvSorter" Version="2.0.0" />

Dependencies

| Package name | Version |
| --- | --- |
| CsvHelper | >=30.0.1 |

Available methods

| Method name | Parameter type | Description |
| --- | --- | --- |
| Using | CsvConfiguration | Sets CsvConfiguration. See CsvHelper documentation |
| Using | TypeConverterOptions | Sets TypeConverterOptions. See CsvHelper documentation |
| Using | IIndexProvider<T> or IAsyncIndexProvider<T> | Sets index provider. Default: MemoryIndexProvider |
| Using | SortDirection | Sets sorting direction. Default: Ascending |
| ToFileAsync | string, CancellationToken | Saves sorted data to a file |
| ToFile | string | Saves sorted data to a file |
| ToWriterAsync | TextWriter, CancellationToken | Saves sorted data using provided writer |
| ToWriter | TextWriter | Saves sorted data using provided writer |

Basic usage

using CsvSorter;

await new StreamReader(@"C:\my_large_file.csv")
    .GetCsvSorter<int>("id")
    .ToFileAsync(@"C:\my_large_file_sorted_by_id.csv");
	
// or

await new CsvSorter<int>(streamReader, "id")
    .ToFileAsync(@"C:\my_large_file_sorted_by_id.csv");
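The synchronous ToWriter method listed in the table can be used in much the same way. The following is a minimal sketch rather than an official sample: the file names and the sample data are hypothetical, and it assumes the header row is kept in the output.

```csharp
using System;
using System.IO;
using CsvSorter;

// Sample data for illustration only (hypothetical file name and contents).
File.WriteAllText("input.csv", "id,name\n3,c\n1,a\n2,b\n");

// The using blocks close both files once sorting has finished.
using (var reader = new StreamReader("input.csv"))
using (var writer = new StreamWriter("sorted.csv"))
{
    // Sort by the "id" column and write the result through the TextWriter.
    reader.GetCsvSorter<int>("id")
        .ToWriter(writer);
}
```

Any TextWriter should work here, so the same call can target a StringWriter or a network stream wrapped in a StreamWriter.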

Index providers

The default index provider is MemoryIndexProvider<T>, which stores index data in memory.
You can create your own provider (an AzureIndexProvider, for example) by implementing the IIndexProvider<T> or IAsyncIndexProvider<T> interface:

public interface IIndexProvider<T> where T: IComparable<T>
{
    void Add(CsvSorterIndex<T> record);
    IEnumerable<CsvSorterIndex<T>> GetSorted(SortDirection sortDirection);
    void Clear();
}

public interface IAsyncIndexProvider<T> where T: IComparable<T>
{
    Task AddAsync(CsvSorterIndex<T> record, CancellationToken cancellationToken);
    IAsyncEnumerable<CsvSorterIndex<T>> GetSorted(SortDirection sortDirection, CancellationToken cancellationToken);
    Task ClearAsync(CancellationToken cancellationToken);
}

Please note that only one index provider can be used at a time. If Using is called with several providers, only the last one is used:

await new CsvSorter<int>(streamReader, "id")
    .Using(new FirebaseIndexProvider<int>()) // will be ignored
    .Using(new AzureIndexProvider<int>())    // will be used
    .ToWriterAsync(writer);
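As a rough illustration of the IIndexProvider<T> interface shown above, a custom synchronous provider might look like the sketch below. The class name ListIndexProvider is hypothetical, and the sketch assumes two things this README does not confirm: that the interfaces live in the CsvSorter namespace, and that CsvSorterIndex<T> exposes its sort key through a property named Value. Adjust both to match the actual package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using CsvSorter; // assumption: the provider interfaces are in this namespace

// Hypothetical provider that keeps index entries in a list and sorts on demand.
public class ListIndexProvider<T> : IIndexProvider<T> where T : IComparable<T>
{
    private readonly List<CsvSorterIndex<T>> _records = new();

    // Called for each record while the index is being built.
    public void Add(CsvSorterIndex<T> record) => _records.Add(record);

    // Returns the collected entries ordered by their key.
    public IEnumerable<CsvSorterIndex<T>> GetSorted(SortDirection sortDirection)
    {
        // Assumption: the sort key is exposed as `Value`.
        return sortDirection == SortDirection.Ascending
            ? _records.OrderBy(r => r.Value)
            : _records.OrderByDescending(r => r.Value);
    }

    // Drops all collected entries.
    public void Clear() => _records.Clear();
}
```

Such a provider would then be passed to the sorter with .Using(new ListIndexProvider<int>()). A provider backed by external storage would follow the same shape, most likely through IAsyncIndexProvider<T> so that the storage calls can be awaited.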

Events

You can subscribe to four events: OnIndexCreationStarted, OnIndexCreationFinished, OnSortingStarted, and OnSortingFinished.

await new StreamReader(@"C:\my_large_file.csv")
    .GetCsvSorter<int>(0)
    .OnIndexCreationStarted(() => { logger.Info("Index creation has started"); })
    .OnIndexCreationFinished(() => { logger.Info("Index creation completed"); })
    .OnSortingStarted(() => { logger.Info("Sorting has started"); })
    .OnSortingFinished(() => { logger.Info("Sorting completed"); })
    .ToWriterAsync(writer);

A few more examples

var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    HasHeaderRecord = false
};

var dateTimeConverterOptions = new TypeConverterOptions
{ 
    Formats = new[] { "dd_MM_yyyy" }
};

await new StreamReader(@"C:\my_large_file.csv")
    .GetCsvSorter<DateTime>(3)
    .Using(SortDirection.Descending)
    .Using(csvConfig)
    .Using(dateTimeConverterOptions)
    .ToFileAsync(@"C:\my_large_file_sorted_by_date.csv", cancellationToken);

// Another example: a pipe-delimited file sorted by the "email" column
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|"
};

await new StreamReader(@"C:\my_large_file.csv")
    .GetCsvSorter<string>("email")
    .Using(csvConfig)
    .Using(new AzureIndexProvider<string>())
    .ToWriterAsync(writer);