-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of query_stats function #157
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #157 +/- ##
===========================================
+ Coverage 80.15% 80.56% +0.40%
===========================================
Files 52 53 +1
Lines 3442 3535 +93
===========================================
+ Hits 2759 2848 +89
- Misses 683 687 +4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one major issue right now with query_stats
and the corresponding tests:
They don't update or check the statsframe.
Since you are querying the statsframe, I would expect the output Thicket to contain the following:
- A statsframe containing only nodes captured by the query
- A graph containing only nodes captured by the query
- A performance dataframe containing only nodes captured by the query
- An unmodified metadata table
Your code right now produces Thicket
objects with elements 2, 3, and 4, but it doesn't add the filtered statsframe (i.e., element 1) back to the new Thicket
.
f806693
to
a97b34d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good, but there's still a few things that need to be addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very close to being good to go. I've got a few more changes, but I suspect this PR will probably be good after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that the next review will be an approval for this PR.
There are just a handful of minor things to change. Most of them amount to removing stringification of multi-level column names for the cache. The remaining changes are just minor one line things.
flake, black added tests for query_stats fixed PR issues prog temporary resolve PR concerns
added more stats functions added more stats functions added additional stats func support
flake8, black fixed all tests added tests for reapply function added prelim test for cache decorator fixed issue caused by graph copy pr fixes
pr resolve pr fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! @pearce8 this PR is ready for your review
@pearce8 , in terms of documentation, there really isn't any meaningful difference on how the stats functions are called. All that's really changed from the user end of things is that the stats functions now return a list of output columns. This is already indicated in the docstrings and but not in the thicket documentation website. A concise sentence could accomplish this task. In terms of queries, i think we should add a small section in the query portion of the tutorials indicating that the query language can now be used on the statsframe, albeit with a slight difference. An example or two would be enough for this. |
@pearce8 I mentioned this on Slack in more detail, but this branch shows an example of using Befikir's cache decorator with other decorators: https://github.com/TauferLab/thicket/tree/memray_decorator_example |
This PR implements the query_stats function, introducing query language functionality to the statsframe. This enhancement allows end users to perform queries based on data within the statsframe.
Summary of Changes
To achieve this, extensive modifications were necessary. Specifically, all existing stats functions now return a list of output column names. This adjustment was essential to support the caching functionality required for the query_stats function.
Further Explanation
When the query_stats function is called with the squash flag enabled, nodes are sometimes merged into one. This creates two primary issues:
To address these issues, the statsframe needs to be recalculated after a squash operation. To facilitate this recalculation, I added a caching mechanism. This mechanism records the most recent stats operations and stores them in
statsframe_ops_cache
. These cached operations are then reapplied on a new statsframe.By implementing this caching functionality, we ensure that the statsframe remains consistent and accurate after squash operations, enabling reliable query results.
Important Notice
For the methods in this PR to work in the future, we must ensure a standard across all stats function such that they must return a list of output column names. This is an integral aspect of the caching mechanism.