
v2.4.0

@mjakubowski84 mjakubowski84 released this 02 Apr 13:27

This is a major release that brings important improvements and fixes.

TypedSchemaDef fix

A long time ago it seemed like a great idea to make TypedSchemaDef a tagged type alias for SchemaDef. As it was used only by ParquetSchemaResolver, it was defined in its scope, and all implicit implementations were defined inside the companion object of ParquetSchemaResolver. This design stayed largely unchanged for a long time, but it had a big flaw: it was not a proper type class. Because the implicit implementations were not defined in a companion object of TypedSchemaDef, users could run into ambiguous implicits when defining their own schema definitions.

In this release TypedSchemaDef becomes a proper type class with its own trait and companion object. ParquetSchemaResolver.TypedSchemaDef remains as an alias to the new trait but is marked as deprecated. All implementations have moved to the companion object of TypedSchemaDef, so if you reference the provided schema definitions explicitly, this is a breaking change for you. However, this change is necessary to deliver the improvement.
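To illustrate why the location of the implicits matters, here is a minimal, library-free sketch of the type class pattern. The names SchemaFor, Resolver and Demo are hypothetical stand-ins and are not part of the Parquet4s API. The sketch only shows the general Scala rule: instances placed in the companion object of the type class are found through implicit scope, so a user-defined instance in local scope takes precedence instead of clashing with them.

```scala
// Hypothetical stand-in for a schema type class; NOT the real Parquet4s API.
trait SchemaFor[T] {
  def describe: String
}

object SchemaFor {
  // Default instances live in the companion object of the type class.
  // They are found via implicit scope, which is consulted only when
  // no eligible instance exists in the local (lexical) scope.
  implicit val intSchema: SchemaFor[Int] =
    new SchemaFor[Int] { def describe: String = "INT32" }

  implicit val stringSchema: SchemaFor[String] =
    new SchemaFor[String] { def describe: String = "BINARY (UTF8)" }
}

// Hypothetical resolver that summons an instance of the type class.
object Resolver {
  def resolve[T](implicit schema: SchemaFor[T]): String = schema.describe
}

object Demo {
  // No local instance: the default from the companion object is used.
  def default: String = Resolver.resolve[Int]

  // A user-defined instance in local scope wins over the companion default,
  // so there is no ambiguity between the two.
  def custom: String = {
    implicit val unsignedInt: SchemaFor[Int] =
      new SchemaFor[Int] { def describe: String = "INT32 (unsigned)" }
    Resolver.resolve[Int]
  }
}
```

If the default instances lived in another object and had to be imported, they would sit in the same lexical scope as the user's own definition, which is the kind of setup where ambiguity errors can appear.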

Chunks in FS2

FS2 recommends processing stream elements in chunks for the best performance. Until now, Parquet4s did not follow this advice. Processing one element at a time kept the code simple and allowed library users to take an action for each processed element. However, this approach had a significant impact on performance, which was especially visible when reading or writing local files.

Now you can choose whether to process your data in chunks. By default, fromParquet and viaParquet use chunks of 16 elements. To keep the previous behaviour, set the chunkSize property to 1. Or increase it if you are not afraid of higher memory consumption.
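The trade-off behind the chunk size can be sketched without any library. The snippet below is plain Scala and does not use the FS2 or Parquet4s APIs; ChunkingSketch, Stats and process are hypothetical names. It assumes that some fixed-cost step runs once per chunk and counts how many times that step is triggered for a given chunk size.

```scala
// Library-free illustration; NOT the FS2 or Parquet4s API.
object ChunkingSketch {
  // batches: how many times the per-chunk step ran
  // elements: how many records were processed in total
  final case class Stats(batches: Int, elements: Int)

  def process(records: Seq[Int], chunkSize: Int): Stats =
    records.grouped(chunkSize).foldLeft(Stats(0, 0)) { (stats, chunk) =>
      // Assumption: one fixed-cost step happens here per chunk,
      // while the whole chunk is held in memory at once.
      Stats(stats.batches + 1, stats.elements + chunk.size)
    }
}
```

Under that assumption, 1000 records with a chunk size of 1 trigger the per-chunk step 1000 times, while a chunk size of 16 triggers it 63 times. A larger chunk size reduces the count further at the cost of holding more elements in memory.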

Simple write uses chunking provided by the upstream and remains unchanged.

Local benchmarks show that with chunks of 16 elements reading is up to 10x faster, while viaParquet is ~2x faster than in previous releases of Parquet4s.

Upgraded dependencies

  1. Shapeless upgraded to 2.3.9
  2. Akka upgraded to 2.6.19
  3. FS2 upgraded to 3.2.7
  4. Cats effect upgraded to 3.3.9