New temporal batch APIs #6587
Conversation
Looks great.
One question though: is there any particular reason you went with the "FFI-heavy" approach of passing py-dicts around vs. crafting the (Transport)Chunk directly in Python?
I imagine it's because we want to avoid duplicating all the metadata handling and sanity checking code etc. On the one hand I like it, on the other I can't help but wonder how that's gonna go in C++ where the FFI is much more barebones... Eh 🤷.
(Also yes -- I feel like we've already had this discussion at one point... it's all a big blur)
rerun_py/src/arrow.rs (Outdated)
```rust
} else if Some(value.len()) == expected_length {
    // The batch length matches the expected row count: wrap each element
    // in a unit-length list, turning the flat component batch into one
    // instance per row (the automatic "mono-batch" wrapping).
    let offsets = Offsets::try_from_lengths(std::iter::repeat(1).take(value.len()))
        .map_err(|err| ChunkError::Malformed {
            reason: format!("Failed to create offsets: {err}"),
        })?;
    let data_type = ListArray::<i32>::default_datatype(value.data_type().clone());
    ListArray::<i32>::try_new(data_type, offsets.into(), value, None).map_err(
        |err| ChunkError::Malformed {
            reason: format!("Failed to wrap in List array: {err}"),
        },
    )?
```
Ha. Interesting that this is where this happens.
@rerun-bot full-check

Started a full build: https://github.com/rerun-io/rerun/actions/runs/9796809649
Hi @jleibs, I've been testing the pre-release build (releases). I tried different variants of adding multiple images at once using rr.components.TensorDataBatch (which I thought would be a fitting equivalent to ScalarBatch), but was unable to get it working. If it is possible in this release, could you please provide an example using log_temporal_batch to log a video / series of images for different times at once? (The video being a numpy array of shape (image_count, height, width), or a list [img0, img1, ...].)

What I've tried:

--> results in a warning: RerunWarning: log_temporal_batch: ValueError(All times and components in a batch must have the same length. Expected length: 2 but got: 1 for component: rerun.components.TensorData)

--> results in a warning: RerunWarning: TensorDataBatch: ValueError(Tensors do not support batches)
@Famok thanks for testing the pre-release! It's very nice to get early feedback like this. What you are trying to do is conceptually correct, but is currently blocked by:

I'm not sure whether we will manage to get that one done in time for this release, so you may still need to iterate over separate log calls when logging images for 0.18. I believe the inherent overhead of logging images makes the performance difference more negligible there than for batch-logging of scalars. That said, I absolutely appreciate the desire to use this pattern to clean up code like this when you are starting with an array of images to begin with.
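For illustration, a minimal sketch of that per-image fallback; the image shape, entity path, and test data here are assumptions based on the use case described above:

```python
import numpy as np
import rerun as rr

rr.init("per_image_logging", spawn=True)

# A stack of small grayscale images: (image_count, height, width), uint8.
images = np.random.randint(0, 256, (1000, 100, 100), dtype=np.uint8)

# One log call per image: set the frame time, then log that single frame.
for i, img in enumerate(images):
    rr.set_time_sequence("frame", i)
    rr.log("video", rr.Image(img))
```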
@jleibs Thanks for the clarification!

This probably depends on the images. In my case they are small (100x100 px) and there are many of them (>>10k), so I hope to profit ;)
Interesting. That makes sense. Even without ImageBatch support I think we will have a way to do this directly with an arrow constructor. I'll try to see if I can come up with some example code.
@Famok here's an example of manually creating an ImageBuffersBatch using pyarrow. This may still change a bit syntactically before the release, but should give you the basic idea.
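A minimal sketch of that idea, assuming `rr.components.ImageBufferBatch` and `rr.components.ImageFormat` wrappers plus the `rr.TimeSequenceBatch` helper from this PR; these names and signatures are assumptions and may differ in the released build:

```python
import numpy as np
import pyarrow as pa
import rerun as rr

rr.init("image_batch_example", spawn=True)

# Many small grayscale images: (image_count, height, width), uint8.
image_count, height, width = 1000, 100, 100
images = np.random.randint(0, 256, (image_count, height, width), dtype=np.uint8)

# Build one contiguous u8 buffer, then slice it into one list entry per image.
flat = pa.array(images.ravel())
offsets = pa.array(
    np.arange(0, (image_count + 1) * height * width, height * width, dtype=np.int32)
)
buffers = pa.ListArray.from_arrays(offsets, flat)

# The format is the same for every frame, so log it once as static data.
# NOTE: `ImageFormat` with these keyword arguments is an assumption.
rr.log(
    "video",
    [rr.components.ImageFormat(width=width, height=height, color_model="L", channel_datatype="U8")],
    static=True,
)

rr.log_temporal_batch(
    "video",
    times=[rr.TimeSequenceBatch("frame", np.arange(image_count))],
    # NOTE: `ImageBufferBatch` accepting a pyarrow ListArray is an assumption.
    components=[rr.components.ImageBufferBatch(buffers)],
)
```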
@jleibs Thank you so much, I'll give this a try on Monday! Can this also be found in the docs somewhere?
@jleibs I tried running your code, but it seems like the pre-release build is missing some stuff:
@Famok the API has shifted around quite a bit again, and it seems your dev build doesn't have the latest changes yet. Try again with the latest pre-release.
What
This primarily introduces a new logging API for temporal batches. This API is slightly lower-level than the existing `log` API but has a fairly familiar feel to it if you've worked with our data-types.

The biggest difference is that it does not currently support Archetypes. Data must be logged using raw components arranged in (possibly partitioned) batches. The main reason for this is that Archetypes aren't required to have matched-length components, and as such you would need to provide a per-component partitioning, which starts to look very similar to the manual component-level API.
Note that we automatically wrap regular batches in the correct way to turn them into "mono-batches". Partitions only need to be generated manually to support batch-batches.
This also requires a few helper classes for the 3 `TimeBatch` types, but otherwise makes use of the existing `ComponentBatch` constructors.

New Python API docstring:
Directly log a temporal batch of data to Rerun.
Unlike the regular `log` API, which is row-oriented, this API lets you submit the data in a columnar form. Each `TimeBatchLike` and `ComponentBatchLike` object represents a column of data that will be sent to Rerun. The lengths of all of these columns must match, and all data that shares the same index across the different columns will act as a single logical row, equivalent to a single call to `rr.log()`.

Note that this API ignores any stateful time set on the log stream via the `rerun.set_time_*` APIs.

When using a regular `ComponentBatch` input, the batch data will map to a single-valued component instance at each timepoint.
For example, scalars would be logged as:
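A sketch of what that call looks like, assuming the `TimeSequenceBatch` helper this PR introduces:

```python
import numpy as np
import rerun as rr

rr.init("scalar_batch_example", spawn=True)

times = np.arange(0, 64)
scalars = np.sin(times / 10.0)

# One column of 64 timepoints, one column of 64 scalars:
# each index pairs up into one logical row.
rr.log_temporal_batch(
    "scalars",
    times=[rr.TimeSequenceBatch("step", times)],
    components=[rr.components.ScalarBatch(scalars)],
)
```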
In the viewer this will show up as 64 individual scalar values, one for each timepoint.
However, it is still possible to log temporal batches of batch data. To do this, the source data must first be created as a single contiguous batch, which can then be partitioned using the `.partition()` helper on the `ComponentBatch` objects.

For example, to log 5 batches of 20 point clouds, first create a batch of 100 (20 * 5) point clouds and then partition it into 5 batches of 20 point clouds:
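Again a sketch under the same assumptions (`TimeSequenceBatch`, and `.partition()` taking a list of per-row lengths):

```python
import numpy as np
import rerun as rr

rr.init("point_batch_example", spawn=True)
rng = np.random.default_rng()

times = np.arange(0, 5)
# 100 points total: 5 timepoints x 20 points, created as one contiguous batch.
positions = rng.uniform(-5, 5, size=(100, 3))

rr.log_temporal_batch(
    "points",
    times=[rr.TimeSequenceBatch("step", times)],
    components=[
        # Partition the 100 points into 5 runs of 20, one run per timepoint.
        rr.components.Position3DBatch(positions).partition([20] * 5),
    ],
)
```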
In the viewer this will show up as 5 individual point clouds, one for each timepoint.
TODO

Checklist
- `main` build: rerun.io/viewer
- `nightly` build: rerun.io/viewer

To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.