Log indicators in their own dedicated chunks #8768

teh-cmc · 2025-01-21T17:18:19Z

Indicators are meant to go, but this won't happen overnight. In fact, they will be here for the foreseeable future.
In the meantime, it would be nice to reduce their negative impact wherever that can be done with minimal effort.

There are two problems with them that I want to focus on in this issue.

First problem: on the standard row-oriented log path, indicators actually consume 4 bytes of space per row in each chunk.
This is because chunks can currently only carry ListArrays. We want to be able to use native-typed arrays as columns in the future, but that is not a small change and it won't happen for a long time.
Corollary: each time you log an indicator (i.e. each time you log an archetype), the indicator's NullArray gets wrapped in a ListArray, and that list must have one offset value per row in the chunk (strictly speaking, N+1 32-bit offsets, where N is the number of rows). The null array itself doesn't take space (well, it takes a fixed 4 bytes), but the outer list's offsets do.
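To make the cost concrete, here is a rough back-of-the-envelope sketch in plain Python (no Arrow dependency). It only models the offsets buffer of a list column with 32-bit offsets. The function names are made up for illustration, and treating every row as an empty list is an assumption of the sketch, not something taken from the actual implementation.

```python
import struct

# A list column with 32-bit offsets stores N+1 offsets for N rows.
OFFSET_WIDTH_BYTES = 4


def indicator_offsets_buffer(num_rows: int) -> bytes:
    """Build the offsets buffer for a list column with `num_rows` rows.

    Assumption (sketch only): every row is an empty list, so all offsets are 0.
    """
    offsets = [0] * (num_rows + 1)
    return struct.pack(f"<{len(offsets)}i", *offsets)


def indicator_overhead_bytes(num_rows: int, num_indicators: int = 1) -> int:
    """Bytes spent on offsets alone, for `num_indicators` indicator columns."""
    return num_indicators * len(indicator_offsets_buffer(num_rows))


if __name__ == "__main__":
    for rows in (1, 1_000, 100_000):
        print(rows, indicator_overhead_bytes(rows))
```

The overhead grows linearly with the number of rows even though the indicator itself carries no data, which is the roughly 4 bytes per row described above.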

Second problem: all of that is just as true on the column-oriented path, and it becomes even more relevant today with the new send_columns APIs, which actually enforce that the appropriate indicators are logged.

We should investigate the possibility of always logging indicators into their own little chunk, on both the row-oriented and column-oriented paths.

On the row-oriented path, that means splitting them in a separate chunk if e.g. the main chunk has more than a predefined threshold of rows.
On the column-oriented path, that means always logging them in a separate chunk.
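The splitting policy proposed above could be sketched roughly as follows. This is a toy model in plain Python, not the real chunk type: the `Chunk` class, the `Indicator` name suffix used to detect indicator columns, and the default threshold of 1024 rows are all hypothetical. The sketch only partitions columns between two chunks that share the same rows; what the dedicated indicator chunk should look like beyond that is left open here.

```python
from dataclasses import dataclass, field

# Hypothetical naming convention, used only by this sketch.
INDICATOR_SUFFIX = "Indicator"


@dataclass
class Chunk:
    """Toy stand-in for a chunk: one time value per row, plus named columns."""

    times: list
    columns: dict = field(default_factory=dict)

    @property
    def num_rows(self) -> int:
        return len(self.times)


def is_indicator(name: str) -> bool:
    return name.endswith(INDICATOR_SUFFIX)


def split_indicators(chunk: Chunk, *, always: bool = False, row_threshold: int = 1024) -> list:
    """Move indicator columns into their own chunk.

    always=True models the column-oriented path (always split).
    always=False models the row-oriented path (split only above the threshold).
    """
    indicators = {n: c for n, c in chunk.columns.items() if is_indicator(n)}
    # Nothing to split if there are no indicators, or nothing but indicators.
    if not indicators or len(indicators) == len(chunk.columns):
        return [chunk]
    if not always and chunk.num_rows <= row_threshold:
        return [chunk]
    data = {n: c for n, c in chunk.columns.items() if not is_indicator(n)}
    return [Chunk(chunk.times, data), Chunk(chunk.times, indicators)]
```

For example, a three-row chunk holding a data column and an indicator column would stay whole on the row-oriented path with the default threshold, and would be split in two with `always=True`.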



teh-cmc added the labels ⛃ re_datastore (affects the datastore itself), 🔩 data model (Sorbet), 🚀 performance (Optimization, memory use, etc), 🪵 Log & send APIs (Affects the user-facing API for all languages) on Jan 21, 2025

This was referenced Jan 21, 2025

Tagged columnar updates: Rust #8764

Merged

Make tagged columnar updates work with mono-components too #8769

Merged

Columnar APIs: do not autogenerate indicator for Scalar #8771

Merged

teh-cmc mentioned this issue Jan 27, 2025

Always split indicators into their own dedicated chunks #8833

Merged

1 task

Wumpf closed this as completed in #8833 Jan 28, 2025
