Log indicators in their own dedicated chunks #8768

Closed
teh-cmc opened this issue Jan 21, 2025 · 0 comments · Fixed by #8833
Labels
🔩 data model Sorbet 🪵 Log & send APIs Affects the user-facing API for all languages 🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself

Comments

@teh-cmc
Member

teh-cmc commented Jan 21, 2025

Indicators are meant to go away eventually, but this won't happen overnight. In fact, they will be here for the foreseeable future.
In the meantime, it would be nice to reduce their negative impact wherever that can be done with very minimal effort.

There are two problems with them that I want to focus on in this issue.

First problem: on the standard row-oriented log path, indicators actually consume 4 bytes of space per row in each chunk.
The reason is that chunks can, at the moment, only carry ListArrays (we want to be able to use native-typed arrays as columns in the future, but that is not a small change and won't happen for a long time).
Corollary: each time you log an indicator (i.e. each time you log an archetype), the indicator's NullArray gets wrapped in a ListArray, and that list must have N offset values, where N is the number of rows in the chunk. The null array itself doesn't take space (well, it takes a fixed 4 bytes), but the outer list's offsets do.
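
To put a number on that overhead, here is a rough back-of-the-envelope sketch. It assumes the standard Arrow List layout with 32-bit offsets, which stores N + 1 offset entries for N rows; the function name is made up for illustration and is not part of any Rerun API.

```python
# Assumption: standard Arrow List layout, i.e. 32-bit (4-byte) offsets.
INT32_OFFSET_BYTES = 4


def indicator_offsets_bytes(num_rows: int) -> int:
    """Size of the offsets buffer of a ListArray holding `num_rows` rows.

    Arrow stores one more offset than there are rows (N + 1 entries),
    so the cost is roughly 4 bytes per row, whatever the inner array holds.
    """
    return (num_rows + 1) * INT32_OFFSET_BYTES


if __name__ == "__main__":
    for n in (1, 1_000, 1_000_000):
        print(f"{n} rows: {indicator_offsets_bytes(n)} bytes of offsets")
```

So the cost grows linearly with the number of rows in the chunk, even though the indicator itself carries no data.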

Second problem: all of that is just as true on the column-oriented path, and it becomes even more relevant today with the new send_columns APIs, which actually enforce that the appropriate indicators are logged.

We should investigate the possibility of always logging indicators into their own little chunk, on both the row- and column-oriented paths.

  • On the row-oriented path, that means splitting them into a separate chunk if, e.g., the main chunk has more than a predefined threshold of rows.
  • On the column-oriented path, that means always logging them in a separate chunk.
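
The two rules above could be sketched roughly as follows. This is only an illustration of the policy, not how the datastore implements it: a chunk is modeled as a plain dict of column name to per-row values, and the threshold value, the name-suffix check and the choice to keep a single row in the indicator chunk are all assumptions made for this sketch.

```python
# Hypothetical value: the issue only says "a predefined threshold of rows".
INDICATOR_SPLIT_THRESHOLD = 1024


def is_indicator(column_name: str) -> bool:
    # Assumption: indicator columns can be recognized by a name suffix.
    return column_name.endswith("Indicator")


def split_indicators(chunk, *, column_oriented, threshold=INDICATOR_SPLIT_THRESHOLD):
    """Return (main_chunk, indicator_chunk), where indicator_chunk may be None.

    `chunk` maps column names to lists holding one value per row.
    """
    num_rows = max((len(col) for col in chunk.values()), default=0)
    indicator_names = [name for name in chunk if is_indicator(name)]

    # Column-oriented path: always split.
    # Row-oriented path: split only above the row threshold.
    should_split = bool(indicator_names) and (column_oriented or num_rows > threshold)
    if not should_split:
        return chunk, None

    main = {name: col for name, col in chunk.items() if not is_indicator(name)}
    # Assumption: the dedicated chunk only needs a single row per indicator.
    indicators = {name: chunk[name][:1] for name in indicator_names}
    return main, indicators
```

With this shape, the main chunk no longer pays the per-row offsets cost for the indicator column, and the dedicated chunk stays small regardless of how many rows were logged.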