
Commit 07fb9e7

abey79 and teh-cmc authored
Add support for NumPy arrays to the arrow serializer for string datatypes (#7689)
### What

This adds support for NumPy arrays for batches of `Utf8` datatypes.

For example, this facilitates logging a `TextBatch` when using a Pandas dataframe:

```python
rr.send_columns(
    "/entity/path",
    times=[rr.TimeSequenceColumn("frame_nr", df["frame_nr"])],
    components=[
        rr.components.TextBatch(np.where(df["mouth_open"], "OPEN", "CLOSE")),
    ],
)
```

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested the web demo (if applicable):
  * Using examples from latest `main` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7689?manifest_url=https://app.rerun.io/version/main/examples_manifest.json)
  * Using full set of examples from `nightly` build: [rerun.io/viewer](https://rerun.io/viewer/pr/7689?manifest_url=https://app.rerun.io/version/nightly/examples_manifest.json)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
* [x] If applicable, add a new check to the [release checklist](https://github.com/rerun-io/rerun/blob/main/tests/python/release_checklist)!
* [x] I have noted any breaking changes to the log API in `CHANGELOG.md` and the migration guide

- [PR Build Summary](https://build.rerun.io/pr/7689)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.

---------

Co-authored-by: Clement Rey <cr.rey.clement@gmail.com>
1 parent ca6ad40 commit 07fb9e7

8 files changed, +58 -8 lines changed
crates/build/re_types_builder/src/codegen/python/mod.rs

+3 -1
@@ -1995,9 +1995,11 @@ fn quote_arrow_serialization(
             return Ok(unindent(
                 r##"
                 if isinstance(data, str):
-                    array = [data]
+                    array: Union[list[str], npt.ArrayLike] = [data]
                 elif isinstance(data, Sequence):
                     array = [str(datum) for datum in data]
+                elif isinstance(data, np.ndarray):
+                    array = data
                 else:
                     array = [str(data)]
 
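
To make the effect of the new branch concrete, here is a minimal standalone sketch of the dispatch logic in the hunk above. The function names `normalize_before` and `normalize_after` are hypothetical and exist only for illustration; the real generated serializer goes on to build an Arrow array from the result, which is left out here. The sketch relies on `np.ndarray` not being a `collections.abc.Sequence`, so an array skips the `Sequence` branch.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any

import numpy as np


def normalize_before(data: Any) -> list[str]:
    """Illustrative copy of the branch order prior to this commit."""
    if isinstance(data, str):
        return [data]
    elif isinstance(data, Sequence):
        return [str(datum) for datum in data]
    else:
        # An ndarray lands here and is stringified as one value.
        return [str(data)]


def normalize_after(data: Any) -> list[str] | np.ndarray:
    """Illustrative copy of the branch order introduced by this commit."""
    if isinstance(data, str):
        return [data]
    elif isinstance(data, Sequence):
        return [str(datum) for datum in data]
    elif isinstance(data, np.ndarray):
        # New branch: pass the array through untouched.
        return data
    else:
        return [str(data)]


arr = np.array(["hell", "worlds"])
print(len(normalize_before(arr)))   # one element: the whole array as a single string
print(normalize_after(arr) is arr)  # the array is forwarded as-is
```

Under these assumptions, the old branch order collapses a two-element array into a single string, which is presumably the behavior the new `np.ndarray` branch is meant to avoid.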

crates/store/re_types/definitions/rerun/datatypes/utf8.fbs

+1 -1
@@ -8,7 +8,7 @@ namespace rerun.datatypes;
 table Utf8 (
   "attr.arrow.transparent",
   "attr.python.aliases": "str",
-  "attr.python.array_aliases": "str, Sequence[str]",
+  "attr.python.array_aliases": "str, Sequence[str], npt.ArrayLike",
   "attr.rust.derive": "Default, PartialEq, Eq, PartialOrd, Ord, Hash",
   "attr.rust.override_crate": "re_types_core",
   "attr.rust.repr": "transparent",

rerun_py/rerun_sdk/rerun/datatypes/entity_path.py

+5 -1 (generated file; diff not rendered)

rerun_py/rerun_sdk/rerun/datatypes/utf8.py

+6 -2 (generated file; diff not rendered)

rerun_py/tests/test_types/components/affix_fuzzer10.py

+5 -1 (generated file; diff not rendered)

rerun_py/tests/test_types/components/affix_fuzzer9.py

+5 -1 (generated file; diff not rendered)

rerun_py/tests/test_types/datatypes/string_component.py

+5 -1 (generated file; diff not rendered)

rerun_py/tests/unit/test_utf8.py

+28
@@ -0,0 +1,28 @@
+from __future__ import annotations
+
+import numpy as np
+from rerun import datatypes
+
+
+def test_utf8_batch_single() -> None:
+    single_string = "hello"
+    list_of_one_string = ["hello"]
+    array_of_one_string = np.array(["hello"])
+
+    assert (
+        datatypes.Utf8Batch(single_string).as_arrow_array() == datatypes.Utf8Batch(list_of_one_string).as_arrow_array()
+    )
+
+    assert (
+        datatypes.Utf8Batch(single_string).as_arrow_array() == datatypes.Utf8Batch(array_of_one_string).as_arrow_array()
+    )
+
+
+def test_utf8_batch_many() -> None:
+    # different string length to be sure
+    list_of_strings = ["hell", "worlds"]
+    array_of_strings = np.array(["hell", "worlds"])
+
+    assert (
+        datatypes.Utf8Batch(list_of_strings).as_arrow_array() == datatypes.Utf8Batch(array_of_strings).as_arrow_array()
+    )
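
The tests above build their arrays with `np.array([...])`, while the commit message's use case produces one with `np.where`. As a rough sketch that needs only NumPy (no Rerun SDK), the snippet below shows what kind of object that call yields. The `mouth_open` array is a hypothetical stand-in for the `df["mouth_open"]` column and is not part of the commit.

```python
import numpy as np

# Hypothetical stand-in for the boolean df["mouth_open"] column.
mouth_open = np.array([True, False, True])

# Same expression as in the commit message's TextBatch example.
labels = np.where(mouth_open, "OPEN", "CLOSE")

# The result is an ndarray of fixed-width unicode strings, not a Python list,
# so it is the np.ndarray branch of the serializer that handles it.
print(isinstance(labels, np.ndarray), labels.dtype.kind)
print(labels.tolist())
```

Assuming the serializer behaves as the unit tests suggest, passing `labels` directly should be equivalent to passing `labels.tolist()`.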
