-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Statistics #36
Statistics #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I really understand the apply/transform methods; don't we just need one of them?
Pretreatments and statistics do different things so maybe it makes sense for the methods to be named different things -- pretreatments transform dataframes to dataframes, and statistics transform dataframes to StatisticsResultCollections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the design overall. I guess we still need to figure out more details about the pre treatments like what they are and are they always the same for a given statistic like you mentioned.
It looks like we're hitting python/mypy#4717. We should just |
I made changes to represent statistics in the config for metrics like discussed: [metrics]
weekly = ["active_ticks"]
[metrics.active_ticks.statistics.bootstrap_quantiles]
quantiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
[metrics.active_ticks.statistics.statistic_without_arguments]
[metrics.active_ticks.statistics.bootstrap_means]
pre_treatments = ["trim", "log"]
[metrics.my_cool_metric]
select_expr = "1"
data_source = "whatever"
# This "inline table" syntax produces the same output as the explicit table syntax
statistics = {statistic_without_arguments={}, bootstrap_quantiles={}} I'll add more tests if the current approach looks okay. |
Summary( | ||
metric=mozanalysis.metrics.desktop.search_count, | ||
treatment=BootstrapMean(num_samples=1000), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also move those into a .toml
config file that lives in the pensieve repository instead of hardcode them. It might be easier to find and change if there is a separate file for this and could also serve as a configuration example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good!
I think maybe we should drop "treatment" everywhere in favor of "statistic" just to avoid introducing a new noun. If you don't like "pretreatment" without a "treatment" then maybe they're "filter"s but either seems fine to me.
of the experiment. | ||
""" | ||
|
||
ref_branch_label: str = "control" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fair to assume that every statistic has ref_branch_label
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every, but many statistics will be interested in making a comparison to the control group.
if statistic.name() == statistic_name: | ||
pre_treatments = [pt.resolve for pt in self.pre_treatments] | ||
|
||
if "ref_branch_label" not in params: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So unless ref_branch_label
is explicitly set in the config we'll use the control branch from experimenter. Or else it's just the default "control"
branch
pensieve/pre_treatment.py
Outdated
"""Return snake-cased name of the statistic.""" | ||
return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower() | ||
|
||
#@abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy doesn't agree with this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hype for this.
This PR adds support for defining and processing statistics. Currently still a WIP, but I think it's makes sense to see if this is going in the right direction.