
Commit 781968d

Fix python's send_columns failing to convert correctly from ComponentBatch to ComponentColumn in some cases (#7155)
### What

* Fixes #6592
* technically not for tensors, which is covered by its union-avoidance ticket, but this PR adds a note about this

Makes it a lot easier to log batches of images in Python:

```python
from __future__ import annotations

import numpy as np
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 3

# Create our time.
times = np.arange(0, COUNT)

# Create a batch of images.
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(dtype=np.uint8)

# Log the ImageFormat and indicator once, as static.
format_static = rr.components.ImageFormat(
    width=WIDTH, height=HEIGHT, color_model="RGB", channel_datatype="U8"
)
rr.log("images", [format_static, rr.Image.indicator()], static=True)

# Reshape the images so `ImageBufferBatch` can tell that this is several blobs.
rr.send_columns(
    "images",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.components.ImageBufferBatch(image_batch.reshape(COUNT, -1))],
)
```

cc: @Famok

Related things that went into this PR:

* add a note that tensors don't support batching right now
* support arrays of bytes for blob arrays

Manually tested the snippets for `send_columns` to make sure nothing else broke. Considered adding the above as a snippet, but I'm still not sure if we want to advertise this prominently. There are some situations where this is useful, but generally we don't have a great use case for it.
Not sure about this 🤔

### Checklist

* [x] I have read and agree to the [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7155?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7155?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7155)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.
1 parent 354124e commit 781968d

File tree: 8 files changed (+59, −20 lines)


crates/store/re_types/definitions/rerun/archetypes/tensor.fbs (+4)

```diff
@@ -7,6 +7,10 @@ namespace rerun.archetypes;
 /// \cpp data can be passed in without a copy from raw pointers or by reference from `std::vector`/`std::array`/c-arrays.
 /// \cpp If needed, this "borrow-behavior" can be extended by defining your own `rerun::CollectionAdapter`.
 ///
+/// \py It's not currently possible to use `send_columns` with tensors since construction
+/// \py of `rerun.components.TensorDataBatch` does not support more than a single element.
+/// \py This will be addressed as part of https://github.com/rerun-io/rerun/issues/6832.
+///
 /// \example archetypes/tensor_simple title="Simple tensor" image="https://static.rerun.io/tensor_simple/baacb07712f7b706e3c80e696f70616c6c20b367/1200w.png"
 table Tensor (
     "attr.rust.derive": "PartialEq",
```

crates/store/re_types/definitions/rerun/datatypes/tensor_data.fbs (+4)

```diff
@@ -10,6 +10,10 @@ namespace rerun.datatypes;
 ///
 /// These dimensions are combined with an index to look up values from the `buffer` field,
 /// which stores a contiguous array of typed values.
+///
+/// \py It's not currently possible to use `send_columns` with tensors since construction
+/// \py of `rerun.components.TensorDataBatch` does not support more than a single element.
+/// \py This will be addressed as part of https://github.com/rerun-io/rerun/issues/6832.
 table TensorData (
     "attr.python.aliases": "npt.ArrayLike",
     "attr.python.array_aliases": "npt.ArrayLike",
```

rerun_py/rerun_sdk/rerun/_baseclasses.py (−9)

```diff
@@ -310,15 +310,6 @@ def as_arrow_array(self) -> pa.Array:
         return pa.ListArray.from_arrays(offsets, array)


-ComponentColumnLike = ComponentBatchLike | ComponentColumn
-"""
-Type alias for component column-like objects.
-
-Every component batch can be interpreted as a component column.
-`ComponentColumn` implements the `ComponentBatchLike` interface but is still explicitly included here.
-"""
-
-
 class ComponentBatchMixin(ComponentBatchLike):
     def component_name(self) -> str:
         """
```

rerun_py/rerun_sdk/rerun/archetypes/tensor.py (+4; generated file, diff not rendered)

rerun_py/rerun_sdk/rerun/datatypes/blob_ext.py (+2)

```diff
@@ -54,6 +54,8 @@ def native_to_pa_array_override(data: BlobArrayLike, data_type: pa.DataType) ->
         inners = []
     elif isinstance(data[0], Blob):
         inners = [pa.array(np.array(datum.data, dtype=np.uint8).flatten()) for datum in data]  # type: ignore[union-attr]
+    elif isinstance(data[0], bytes):
+        inners = [pa.array(np.frombuffer(datum, dtype=np.uint8)) for datum in data]  # type: ignore[arg-type]
     else:
         inners = [pa.array(np.array(datum, dtype=np.uint8).flatten()) for datum in data]
```

rerun_py/rerun_sdk/rerun/datatypes/tensor_data.py (+4; generated file, diff not rendered)

rerun_py/rerun_sdk/rerun/send_columns.py (+23, −9)

```diff
@@ -1,11 +1,11 @@
 from __future__ import annotations

-from typing import Iterable, Protocol, TypeVar
+from typing import Iterable, Protocol, TypeVar, Union

 import pyarrow as pa
 import rerun_bindings as bindings

-from ._baseclasses import Archetype, ComponentColumnLike
+from ._baseclasses import Archetype, ComponentBatchMixin, ComponentColumn
 from ._log import IndicatorComponentBatch
 from .error_utils import catch_and_log_exceptions
 from .recording_stream import RecordingStream
@@ -84,7 +84,7 @@ def as_arrow_array(self) -> pa.Array:
 def send_columns(
     entity_path: str,
     times: Iterable[TimeColumnLike],
-    components: Iterable[ComponentColumnLike],
+    components: Iterable[Union[ComponentBatchMixin, ComponentColumn]],
     recording: RecordingStream | None = None,
     strict: bool | None = None,
 ) -> None:
@@ -148,7 +148,11 @@ def send_columns(
         of timestamps. Generally you should use one of the provided classes: [`TimeSequenceColumn`][],
         [`TimeSecondsColumn`][], or [`TimeNanosColumn`][].
     components:
-        The batches of components to log. Each `ComponentColumnLike` object represents a single column of data.
+        The columns of components to log. Each object represents a single column of data.
+
+        If a batch of components is passed, it will be partitioned with one element per timepoint.
+        In order to send multiple components per time value, explicitly create a [`ComponentColumn`][rerun.ComponentColumn]
+        either by constructing it directly, or by calling the `.partition()` method on a `ComponentBatch` type.
     recording:
         Specifies the [`rerun.RecordingStream`][] to use.
         If left unspecified, defaults to the current active data recording, if there is one.
@@ -182,15 +186,25 @@ def send_columns(
             indicators.append(c)
             continue
         component_name = c.component_name()
-        component_column = c.as_arrow_array()  # type: ignore[union-attr]
+
+        if isinstance(c, ComponentColumn):
+            component_column = c
+        elif isinstance(c, ComponentBatchMixin):
+            component_column = c.partition([1] * len(c))  # type: ignore[arg-type]
+        else:
+            raise TypeError(
+                f"Expected either a type that implements the `ComponentMixin` or a `ComponentColumn`, got: {type(c)}"
+            )
+        arrow_list_array = component_column.as_arrow_array()
+
         if expected_length is None:
-            expected_length = len(component_column)
-        elif len(component_column) != expected_length:
+            expected_length = len(arrow_list_array)
+        elif len(arrow_list_array) != expected_length:
             raise ValueError(
-                f"All times and components in a batch must have the same length. Expected length: {expected_length} but got: {len(component_column)} for component: {component_name}"
+                f"All times and components in a batch must have the same length. Expected length: {expected_length} but got: {len(arrow_list_array)} for component: {component_name}"
             )

-        components_args[component_name] = component_column
+        components_args[component_name] = arrow_list_array

     for i in indicators:
         if expected_length is None:
```

rerun_py/tests/unit/test_blob.py (+18, −2)

```diff
@@ -7,7 +7,23 @@
 def test_blob() -> None:
     """Blob should accept bytes input."""

-    bites = b"Hello world"
+    bytes = b"Hello world"
     array = np.array([72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100], dtype=np.uint8)

-    assert rr.components.BlobBatch(bites).as_arrow_array() == rr.components.BlobBatch(array).as_arrow_array()
+    assert rr.datatypes.BlobBatch(bytes).as_arrow_array() == rr.datatypes.BlobBatch(array).as_arrow_array()
+
+
+def test_blob_arrays() -> None:
+    COUNT = 10
+
+    # bytes & array
+    bytes = [b"Hello world"] * COUNT
+    array = [np.array([72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100], dtype=np.uint8)] * COUNT
+    assert rr.datatypes.BlobBatch(bytes).as_arrow_array() == rr.datatypes.BlobBatch(array).as_arrow_array()
+    assert len(rr.datatypes.BlobBatch(bytes)) == COUNT
+    assert len(rr.datatypes.BlobBatch(array)) == COUNT
+
+    # 2D numpy array
+    array_2d = np.array([[72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]] * COUNT, dtype=np.uint8)
+    assert rr.datatypes.BlobBatch(bytes).as_arrow_array() == rr.datatypes.BlobBatch(array_2d).as_arrow_array()
+    assert len(rr.datatypes.BlobBatch(array_2d)) == COUNT
```
