-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
beam.map()
#5449
base: develop
Are you sure you want to change the base?
beam.map()
#5449
Conversation
WalkthroughThis pull request introduces a new utility in the Apache Beam integration. The changes add a Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant beam_map
participant ApacheBeam
participant MapBatch
participant ReduceFcn
User->>+beam_map: Call beam_map(sample_collection, map_fcn, ...)
beam_map->>+ApacheBeam: Set up pipeline with options
ApacheBeam->>+MapBatch: Process sample batches using map_fcn
MapBatch-->>-ApacheBeam: Return mapped results
alt reduce/aggregate provided
ApacheBeam->>+ReduceFcn: Combine results
ReduceFcn-->>-ApacheBeam: Return aggregated output
end
ApacheBeam-->>-beam_map: Complete execution with processed samples
beam_map-->>-User: Return final results
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
21601b2
to
249a280
Compare
I have done a little bit of refactor and bug fixes so that we can run evaluate detections with |
a8aeeee
to
11b1309
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
fiftyone/utils/beam.py (1)
8-16
: Remove unused import
Theinspect
module (line 8) is never referenced in this file. Removing it will help avoid clutter and remain consistent with best practices.- import inspect + # import inspect # remove this import statement🧰 Tools
🪛 Ruff (0.8.2)
8-8:
inspect
imported but unusedRemove unused import:
inspect
(F401)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fiftyone/utils/beam.py
(3 hunks)setup.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/utils/beam.py
8-8: inspect
imported but unused
Remove unused import: inspect
(F401)
557-560: Use ternary operator reduce_cls = reduce_fcn if reduce_fcn is not None else ReduceFcn
instead of if
-else
-block
Replace if
-else
-block with reduce_cls = reduce_fcn if reduce_fcn is not None else ReduceFcn
(SIM108)
581-582: Use a single with
statement with multiple contexts instead of nested with
statements
(SIM117)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (7)
fiftyone/utils/beam.py (6)
20-22
: Environment variable usage
SettingGRPC_VERBOSITY
to"NONE"
effectively silences gRPC logs. This usage is fine for reducing log noise. No issues here.
349-361
: Function signature clarity
Thebeam_map
function signature is well-structured, with clearly named arguments matching the docstring. The choice to provide flexible parameters (reduce_fcn
,aggregate_fcn
,save
) supports a wide range of data processing needs. Looks good!
362-461
: Rich and instructive docstring
The docstring thoroughly explains various usage patterns (map-only, map-reduce, and map-aggregate) with clear examples. This level of detail is beneficial for maintainability and user adoption. Great work!
768-832
: Parallel mapping implementation
TheMapBatch
class provides a parallel-friendly way to applymap_fcn
to each sample. The conditional logic for slice-based vs. ID-based shards is clearly separated. Good job ensuringautosave=self.save
is leveraged within the iteration.
833-893
: Reducer design
TheReduceFcn
class cleanly leverages the Apache BeamCombineFn
interface to aggregate outputs. Storing the final results via_store_output()
is straightforward and obvious. This design is flexible for custom reduce or aggregate operations.
894-904
: Convenient intermediate results storage
_set_key
and_get_key
provide a neat way to store and retrieve intermediate results. Consider clarifying concurrency implications if multiple workers simultaneously set the same key. Otherwise, this method is well-conceived.setup.py (1)
73-73
: Good addition of tqdm
Including"tqdm"
in theINSTALL_REQUIRES
ensures that progress bars function correctly in thebeam_map
feature without requiring separate manual installation.
Change log
Adds a
beam_map()
method that leverages Apache Beam to apply amap_fcn
to each sample in asample_collection
in parallel, optionally saving any sample edits and/or applying areduce_fcn
oraggregate_fcn
to the outputs.Example 1: map-save
Same operation with
iter_samples(autosave=True)
:Example 2: map-reduce
Example 3: map-aggregate
Summary by CodeRabbit
New Features
Chores