Indexing a nested list with 0 or an index larger than list size is not handled correctly #5310

ahmedriza · 2023-02-16T19:27:25Z

Describe the bug
Given a nested list, indexing works correctly as long as the index is neither 0 nor larger than the size of the list. However, if the index is 0 or larger than the list size, the query fails with an error similar to the following:

Error: Arrow error: Invalid argument error: column types must match schema types, expected Float64 but found List(Field { name: "item", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 1

To Reproduce
Use the attached Parquet file, list.parquet.gz.

This Parquet file contains a single row of data as follows:

+------------------+---+
|a                 |id |
+------------------+---+
|[1.71, 2.71, 3.71]|1  |
+------------------+---+

Example code that demonstrates the bug (after uncompressing the file):

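use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    ctx.register_parquet("t", "list.parquet", ParquetReadOptions::default()).await?;
    // a[0] is out of range (array indexing is 1-based), which triggers the error.
    let df = ctx.sql("select id, a[0] from t").await?;
    df.show().await?;
    Ok(())
}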

Expected behavior
We expect to get a null value when the index is out of range. For example, the above code should produce the following output:

+----+--------+
| id | t.a[0] |
+----+--------+
| 1  |        |
+----+--------+
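The expected per-value semantics can be pinned down with a small std-only Rust sketch. It assumes 1-based indexing (as the table under Additional context implies, where a[1] is 1.71), models one list of nullable Float64 values as a slice, and also treats negative indices as out of range, which is an assumption since the issue only discusses 0 and too-large indices. The helper name `list_element` is hypothetical; it is not DataFusion's API or the fix in #5311.

```rust
/// Hypothetical helper illustrating the expected behavior: return the element
/// at a 1-based `index`, or `None` (SQL NULL) when the index is 0, negative
/// (assumed out of range), or larger than the list length.
fn list_element(list: &[Option<f64>], index: i64) -> Option<f64> {
    if index < 1 {
        return None;
    }
    // Convert the 1-based SQL index to a 0-based slice index.
    let i = usize::try_from(index - 1).ok()?;
    // `get` yields None past the end; `flatten` keeps a null element null.
    list.get(i).copied().flatten()
}

fn main() {
    // The single row of list.parquet: a = [1.71, 2.71, 3.71].
    let a = [Some(1.71), Some(2.71), Some(3.71)];
    for index in [0, 1, 2, 3, 100] {
        match list_element(&a, index) {
            Some(v) => println!("a[{index}] = {v}"),
            None => println!("a[{index}] = NULL"),
        }
    }
}
```

Returning `Option<f64>` rather than the list itself keeps the result type equal to the element type, so an out-of-range index produces a null of the right type instead of a different column type.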

Additional context

A valid index should return the corresponding element (indexing is 1-based), and an invalid index such as 0 or one larger than the list size should return null. Example:

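use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    ctx.register_parquet("t", "list.parquet", ParquetReadOptions::default()).await?;
    // Mix out-of-range (0, 100) and valid (1..=3) indices in one query.
    let df = ctx.sql("select id, a[0], a[1], a[2], a[3], a[100] from t").await?;
    df.show().await?;
    Ok(())
}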

This should produce the following output:

+----+--------+--------+--------+--------+----------+
| id | t.a[0] | t.a[1] | t.a[2] | t.a[3] | t.a[100] |
+----+--------+--------+--------+--------+----------+
| 1  |        | 1.71   | 2.71   | 3.71   |          |
+----+--------+--------+--------+--------+----------+
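Applied to a whole column, the same rule must hold row by row. The Arrow columnar format stores a list column as a flat values buffer plus an offsets buffer, where row r spans values[offsets[r]..offsets[r + 1]]. The sketch below mimics that layout with plain std slices (no arrow crate), ignores row-level null bitmaps for brevity, and uses a hypothetical `extract_element` function. It emits exactly one element-typed value per row; the error message at the top of this issue suggests the failing path instead produced a column that was still of List type.

```rust
/// Hypothetical column-level version. `offsets` has one more entry than there
/// are rows and is assumed non-decreasing and within `values`; row `r` holds
/// `values[offsets[r]..offsets[r + 1]]`. Returns one value per row, `None`
/// where the 1-based `index` is out of range for that row.
fn extract_element(offsets: &[usize], values: &[Option<f64>], index: i64) -> Vec<Option<f64>> {
    offsets
        .windows(2)
        .map(|w| -> Option<f64> {
            let row = &values[w[0]..w[1]];
            if index < 1 {
                return None;
            }
            let i = usize::try_from(index - 1).ok()?;
            row.get(i).copied().flatten()
        })
        .collect()
}

fn main() {
    // One row, [1.71, 2.71, 3.71], as in list.parquet.
    let offsets: [usize; 2] = [0, 3];
    let values = [Some(1.71), Some(2.71), Some(3.71)];
    for index in [0, 1, 2, 3, 100] {
        println!("a[{index}] -> {:?}", extract_element(&offsets, &values, index));
    }
}
```

Because the output length always equals the number of rows, rows of different lengths are handled uniformly: a row shorter than the index simply contributes a null.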


ahmedriza added the bug label ("Something isn't working") Feb 16, 2023

ahmedriza mentioned this issue Feb 16, 2023

Fix nested list indexing when the index is 0 or larger than the list size #5311 (merged)

alamb closed this as completed in #5311 Feb 19, 2023
