Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Chunk component-level helpers and UnitChunk #6990

Merged
merged 3 commits into from
Jul 29, 2024

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Jul 25, 2024

This introduces all the usual crazy helpers for when you want to retrieve some very particular piece of data out of a chunk, in one (hopefully) neat, consistent package.

In particular this adds UnitChunk, a wrapper type for Chunk with is guaranteed to only ever hold one row of data, which is going to be very useful when introducing the new Chunk-based latest-at API later on.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@teh-cmc teh-cmc changed the title TODO 1: todo Introduce Chunk component-level helpers and UnitChunk Jul 25, 2024
@teh-cmc teh-cmc added 🧑‍💻 dev experience developer experience (excluding CI) 🔍 re_query affects re_query itself do-not-merge Do not merge this PR include in changelog labels Jul 25, 2024
@teh-cmc teh-cmc marked this pull request as ready for review July 25, 2024 16:15
@jleibs
Copy link
Member

jleibs commented Jul 26, 2024

which is going to be very useful when introducing the new Chunk-based latest-at API later on.

Interesting -- you're thinking of materializing a LatestAt chunk on-demand for the query results?

&self,
component_name: &ComponentName,
row_index: usize,
) -> Option<ChunkResult<Box<dyn ArrowArray>>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and in all the signatures below, I think we should try to keep result types on the outside as in: ChunkResult<Option<Box<dyn ArrowArray>>>

This seems like it lets us handle serialization errors consistently and then end up with Option<Box<dyn ArrowArray>> inline with:

Copy link
Member Author

@teh-cmc teh-cmc Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had ChunkResult<Option<Box<dyn ArrowArray>>> at the beginning, and it was just as awkward but for different reasons.
In the end I ended up with that variant, but ultimately I don't really have a strong opinion for one or the other.

Two notes, though:

  • I purposefully ignored / stayed away from Proposal: avoid empty arrays #6602, as I quickly realized that this PR stream would be complicated enough without attempting to modify our existing semantics. I don't mind revisiting that in the future (and switching the Option<Result>), but for now I don't think this should have any bearing on the current PR.
  • I wanted these Chunk APIs to be low-level enough so that more high-level semantics could be built on top of them for the common cases. These high-level APIs are introduced later in this PR stream, at the re_query layer.

component_name: &ComponentName,
row_index: usize,
instance_index: usize,
) -> Option<ChunkResult<Box<dyn ArrowArray>>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an missing component array is semantically consider "empty", then a row_index within an empty array should be an error. I'm not sure we need an Option here at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I commented above: I wanted these APIs to be a lower-level building block. It can be valuable to the caller to know whether the component as a whole is missing with a single API.

Higher-level APIs with the kind of semantics you're describing are introduced later in this PR stream at the re_query layer, and we can decide to treat the missing component as an error there.

Base automatically changed from cmc/chunkified_queries_0_chunk_iter to main July 29, 2024 10:38
@teh-cmc teh-cmc force-pushed the cmc/chunkified_queries_1_chunk_helpers branch from 80bcf4d to 2510837 Compare July 29, 2024 10:41
@teh-cmc
Copy link
Member Author

teh-cmc commented Jul 29, 2024

which is going to be very useful when introducing the new Chunk-based latest-at API later on.

Interesting -- you're thinking of materializing a LatestAt chunk on-demand for the query results?

(I assume you got your answer while reviewing the next PR!)

@teh-cmc teh-cmc removed the do-not-merge Do not merge this PR label Jul 29, 2024
@teh-cmc teh-cmc merged commit e88aaa2 into main Jul 29, 2024
34 of 35 checks passed
@teh-cmc teh-cmc deleted the cmc/chunkified_queries_1_chunk_helpers branch July 29, 2024 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🧑‍💻 dev experience developer experience (excluding CI) include in changelog 🔍 re_query affects re_query itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants