send_columns has false-positive when promoting batch to column for list-types #7137

Closed
jleibs opened this issue Aug 9, 2024 · 2 comments
Labels: 🪳 bug, 🐍 Python API

Comments

jleibs commented Aug 9, 2024

Our check for whether a batch is already a column currently only tests whether it is a list array:

rerun/rerun_py/src/arrow.rs

Lines 167 to 169 in 5a2d5dd

let batch = if let Some(batch) = value.as_any().downcast_ref::<ListArray<i32>>() {
    batch.clone()
} else {

However, for types that are already list arrays, such as ImageBuffer, this check is a false positive: the batch is passed through without being wrapped in an outer list, which causes downstream errors.

This came up in the context of a proof-of-concept for logging image batches:

import numpy as np
import pyarrow as pa
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 3

# Create our time
times = np.arange(0, COUNT)

# Create a batch of images
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(dtype=np.uint8)

# Log the ImageFormat once, as static
format_static = rr.components.ImageFormat(width=WIDTH, height=HEIGHT, color_model="RGB", channel_datatype="U8")
rr.log("image", [format_static], static=True)

# Manually create an ImageBufferBatch
image_buffers = (row.tobytes() for row in image_batch.reshape(COUNT, -1))
raw_arrow = pa.array(image_buffers, type=rr.components.ImageBufferType())
# Note: despite the name, this is still a plain batch until it is partitioned
buffers_column = rr.components.ImageBufferBatch(raw_arrow)

# Uncomment this to work around the problem
# buffers_column = buffers_column.partition([1] * COUNT)


rr.send_columns(
    "image",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.Image.indicator(), buffers_column],
)

This ends up with a fairly cryptic error:

/home/jleibs/rerun/docs/snippets/all/tutorials/send_columns.py:37: RerunWarning: send_columns: RuntimeError(Detected malformed Chunk: The outer array in chunked component batch must be a sparse list, got List(Field { name: "item", data_type: UInt8, is_nullable: false, metadata: {} }))
  rr.send_columns(
jleibs commented Aug 9, 2024

Even though it's a bit less performant, this could be a good argument for moving column promotion from Rust back to Python, where we still have the full object context.
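
As a rough sketch of what "full object context" could buy us, the promotion decision can be made on the Python type of the object instead of on the shape of its Arrow data. Everything below is hypothetical: `HypotheticalBatch`, `HypotheticalColumn` and `as_column` are illustrative names, not the actual Rerun classes or the eventual fix.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import List, Sequence, Union


@dataclass
class HypotheticalColumn:
    """Stand-in for a partitioned batch: flat elements plus row offsets."""

    elements: List[object]
    offsets: List[int]

    def rows(self) -> List[List[object]]:
        # Slice the flat elements into rows using consecutive offsets.
        return [self.elements[a:b] for a, b in zip(self.offsets, self.offsets[1:])]


@dataclass
class HypotheticalBatch:
    """Stand-in for a component batch: a flat sequence of elements."""

    elements: List[object]

    def partition(self, lengths: Sequence[int]) -> HypotheticalColumn:
        if sum(lengths) != len(self.elements):
            raise ValueError("partition lengths must sum to the batch length")
        return HypotheticalColumn(self.elements, list(accumulate(lengths, initial=0)))


def as_column(obj: Union[HypotheticalBatch, HypotheticalColumn]) -> HypotheticalColumn:
    """Promote by Python type, so list-typed elements cannot be mistaken for rows."""
    if isinstance(obj, HypotheticalColumn):
        return obj
    if isinstance(obj, HypotheticalBatch):
        # Default promotion: one element per row.
        return obj.partition([1] * len(obj.elements))
    raise TypeError(f"expected a batch or a column, got {type(obj).__name__}")


# Elements that are themselves list-like (raw bytes) are still promoted correctly.
buffers = HypotheticalBatch([b"\x00\x01", b"\x02"])
promoted = as_column(buffers)
```

Because the decision keys off the wrapper type, a batch whose elements happen to be lists is still promoted, and an object that is already a column passes through unchanged.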

Wumpf commented Aug 13, 2024

Noticed this issue only now. Moving column promotion to Python is essentially what I did in

Wumpf closed this as completed Aug 13, 2024
Wumpf self-assigned this Aug 13, 2024