How is it intended to do multi-threading with this library? #7284

**Which part is this question about**
Library API / UX

**Describe your question**
Many functions in the `pyarrow` package already implement some multithreading for you.
As far as I understand, reading from a file, for example, is multithreaded by having a separate thread process each column.

As far as I know, nothing comparable is directly available in this Rust crate. What are users expected to do here?

For example, to simply read a Parquet file in a parallelized manner, would one do something like this?
1. first look at the schema to find out what the columns are
2. spawn an async worker for each column that reads from the same file, but with a filter for just one column
3. collect all RecordBatches from each worker and merge them into one RecordBatch containing all the data
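
To make the three steps above concrete, here is a minimal sketch of the pattern using only the Rust standard library. It is not this crate's API: `read_column` is a hypothetical stand-in for a projected read of a single column, the in-memory `BTreeMap` plays the role of the file, and `read_all_parallel` is an illustrative name. It also uses plain scoped threads rather than async workers, just to keep the example dependency-free.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Hypothetical stand-in for "open the file and read only this column".
/// A real version would use a Parquet reader with a column projection.
fn read_column(file: &BTreeMap<String, Vec<i64>>, name: &str) -> Vec<i64> {
    file.get(name).cloned().unwrap_or_default()
}

/// Reads every column on its own thread, then merges the results
/// back into a single table in schema order.
fn read_all_parallel(file: &BTreeMap<String, Vec<i64>>) -> Vec<(String, Vec<i64>)> {
    // Step 1: look at the "schema" to find the column names.
    let schema: Vec<String> = file.keys().cloned().collect();

    thread::scope(|s| {
        // Step 2: spawn one scoped worker per column.
        let handles: Vec<_> = schema
            .iter()
            .map(|name| s.spawn(move || (name.clone(), read_column(file, name))))
            .collect();

        // Step 3: join the workers in schema order and merge the columns.
        handles
            .into_iter()
            .map(|h| h.join().expect("column worker panicked"))
            .collect()
    })
}
```

Joining the handles in the order they were spawned keeps the merged columns in schema order regardless of which worker finishes first.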

If this crate does not already offer something that does this, should it maybe be part of the crate?

How could parallelized writes work? It's not easily possible to just write Parquet files containing one column each and then merge them afterwards, right?
