Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of query_stats function #157

Merged
merged 6 commits into from
Jun 20, 2024
Merged

Conversation

Yejashi
Copy link
Collaborator

@Yejashi Yejashi commented May 2, 2024

This PR implements the query_stats function, introducing query language functionality to the statsframe. This enhancement allows end users to perform queries based on data within the statsframe.

Summary of Changes

To achieve this, extensive modifications were necessary. Specifically, all existing stats functions now return a list of output column names. This adjustment was essential to support the caching functionality required for the query_stats function.

Further Explanation

When the query_stats function is called with the squash flag enabled, nodes are sometimes merged into one. This creates two primary issues:

  1. A mismatch between the expected number of nodes in the performance data and the statsframe.
  2. Invalid data within the statsframe, as some nodes have been merged and their values aggregated.

To address these issues, the statsframe needs to be recalculated after a squash operation. To facilitate this recalculation, I added a caching mechanism. This mechanism records the most recent stats operations and stores them in statsframe_ops_cache. These cached operations are then reapplied on a new statsframe.

By implementing this caching functionality, we ensure that the statsframe remains consistent and accurate after squash operations, enabling reliable query results.

Important Notice

For the methods in this PR to work in the future, we must ensure a standard across all stats function such that they must return a list of output column names. This is an integral aspect of the caching mechanism.

@Yejashi Yejashi self-assigned this May 2, 2024
@Yejashi Yejashi marked this pull request as draft May 2, 2024 06:35
@Yejashi Yejashi added area-thicket Issues and PRs involving Thicket's core Thicket datastructure and associated classes priority-normal Normal priority issues and PRs type-feature Requests for new features or PRs which implement new features labels May 2, 2024
@pearce8 pearce8 added the status-work-in-progress PR is currently being worked on label May 8, 2024
@Yejashi Yejashi requested review from ilumsden and pearce8 May 15, 2024 14:28
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 95.69892% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 80.56%. Comparing base (9c53fd9) to head (d51de55).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #157      +/-   ##
===========================================
+ Coverage    80.15%   80.56%   +0.40%     
===========================================
  Files           52       53       +1     
  Lines         3442     3535      +93     
===========================================
+ Hits          2759     2848      +89     
- Misses         683      687       +4     
Files Coverage Δ
thicket/tests/test_query_stats.py 98.48% <98.48%> (ø)
thicket/thicket.py 86.66% <88.88%> (+0.12%) ⬆️

@Yejashi Yejashi added status-ready-for-review This PR is ready to be reviewed by assigned reviewers and removed status-work-in-progress PR is currently being worked on labels May 15, 2024
@Yejashi Yejashi marked this pull request as ready for review May 15, 2024 17:03
Copy link
Collaborator

@ilumsden ilumsden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one major issue right now with query_stats and the corresponding tests:

They don't update or check the statsframe.

Since you are querying the statsframe, I would expect the output Thicket to contain the following:

  1. A statsframe containing only nodes captured by the query
  2. A graph containing only nodes captured by the query
  3. A performance dataframe containing only nodes captured by the query
  4. An unmodified metadata table

Your code right now produces Thicket objects with elements 2, 3, and 4, but it doesn't add the filtered statsframe (i.e., element 1) back to the new Thicket.

@Yejashi Yejashi force-pushed the query_stats branch 2 times, most recently from f806693 to a97b34d Compare May 20, 2024 15:13
Copy link
Collaborator

@ilumsden ilumsden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good, but there's still a few things that need to be addressed

@Yejashi Yejashi removed the status-ready-for-review This PR is ready to be reviewed by assigned reviewers label Jun 4, 2024
@michaelmckinsey1 michaelmckinsey1 self-requested a review June 5, 2024 17:11
@Yejashi Yejashi added the status-ready-for-review This PR is ready to be reviewed by assigned reviewers label Jun 10, 2024
@Yejashi Yejashi requested a review from ilumsden June 10, 2024 17:17
Copy link
Collaborator

@ilumsden ilumsden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very close to being good to go. I've got a few more changes, but I suspect this PR will probably be good after that.

@Yejashi Yejashi requested a review from ilumsden June 12, 2024 23:52
Copy link
Collaborator

@ilumsden ilumsden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the next review will be an approval for this PR.

There are just a handful of minor things to change. Most of them amount to removing stringification of multi-level column names for the cache. The remaining changes are just minor one line things.

Yejashi added 2 commits June 14, 2024 13:30
flake, black

added tests for query_stats

fixed PR issues

prog

temporary resolve PR concerns
added more stats functions

added more stats functions

added additional stats func support
Yejashi added 3 commits June 14, 2024 13:30
flake8, black

fixed all tests

added tests for reapply function

added prelim test for cache decorator

fixed issue caused by graph copy

pr fixes
Copy link
Collaborator

@ilumsden ilumsden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! @pearce8 this PR is ready for your review

@pearce8
Copy link
Collaborator

pearce8 commented Jun 17, 2024

@Yejashi Do we have docs/examples (tutorial?) to go with this PR?
@Yejashi @ilumsden Any further thoughts on how we could enforce that the future functions use the decorator? Example of using this with a different decorator (e.g., profiling the functions)?

@Yejashi
Copy link
Collaborator Author

Yejashi commented Jun 17, 2024

@pearce8 , in terms of documentation, there really isn't any meaningful difference on how the stats functions are called. All that's really changed from the user end of things is that the stats functions now return a list of output columns. This is already indicated in the docstrings and but not in the thicket documentation website. A concise sentence could accomplish this task.

In terms of queries, i think we should add a small section in the query portion of the tutorials indicating that the query language can now be used on the statsframe, albeit with a slight difference. An example or two would be enough for this.

@ilumsden
Copy link
Collaborator

@pearce8 I mentioned this on Slack in more detail, but this branch shows an example of using Befikir's cache decorator with other decorators: https://github.com/TauferLab/thicket/tree/memray_decorator_example

@pearce8 pearce8 merged commit 9c83ea0 into LLNL:develop Jun 20, 2024
4 checks passed
@slabasan slabasan added this to the 2024.2.0 milestone Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-thicket Issues and PRs involving Thicket's core Thicket datastructure and associated classes priority-normal Normal priority issues and PRs status-ready-for-review This PR is ready to be reviewed by assigned reviewers type-feature Requests for new features or PRs which implement new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants