Add an optional cohort block to science experiments #170
(This is the first of several improvements to scientist based on extractions from the GitHub monolith)
This adds the concept of a "cohort" to an experiment result, to enable and encourage bucketed result publishing.
Many experiments operate on data with a very long tail, and the fat part of the distribution can completely wash out notable results in lower-frequency sub-groups. For example, experiment results derived from the data of very large customers often look quite different from the results derived from small data, yet the latter can be so much more common that they make the former statistically invisible. Even percentile metrics can't overcome these effects, since the relevant percentiles are often very high (above the 99th percentile).
To address this issue, this PR adds an optional block to Scientist::Experiment which should return a "cohort" when called. The block is passed the result of the experiment so it can determine the cohort from the context data, whether the result is a mismatch, or any of the observation data.
The determined cohort value is available as Scientist::Result#cohort and is intended to be used by the user-defined publication mechanism. Here's an example of how it might be used to segment the results of an experiment between "large" and "small" users:
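A minimal, self-contained sketch of that segmentation follows. It uses stand-in classes rather than the scientist gem itself, so SketchExperiment, SketchResult, build_result, the :repo_count context key, and the 1000 threshold are all assumptions made for illustration. Only the idea of a cohort block that receives the result, and a cohort reader on the result, comes from the description above.

```ruby
# Stand-in for a result object; NOT the real Scientist::Result.
SketchResult = Struct.new(:context, :mismatched, :cohort) do
  def mismatched?
    mismatched
  end
end

# Stand-in for an experiment that accepts an optional cohort block.
class SketchExperiment
  def initialize(name)
    @name = name
    @cohort_block = nil
  end

  # Register the block that determines the cohort from a result.
  def cohort(&block)
    @cohort_block = block
  end

  # Build a result and, if a cohort block was given, attach its return value.
  def build_result(context:, mismatched:)
    result = SketchResult.new(context, mismatched, nil)
    result.cohort = @cohort_block.call(result) if @cohort_block
    result
  end
end

experiment = SketchExperiment.new("repo-access")

# Hypothetical rule: more than 1000 repositories counts as a "large" user.
experiment.cohort do |result|
  result.context[:repo_count] > 1000 ? "large" : "small"
end

results = [
  experiment.build_result(context: { repo_count: 12 },   mismatched: false),
  experiment.build_result(context: { repo_count: 5000 }, mismatched: true),
  experiment.build_result(context: { repo_count: 3 },    mismatched: false),
]

# A user-defined publication step can bucket mismatches by cohort, so a
# rare "large" mismatch is not drowned out by the many "small" matches.
mismatches_by_cohort = Hash.new(0)
results.each do |result|
  mismatches_by_cohort[result.cohort] += 1 if result.mismatched?
end
```

In a real publisher, the cohort would more likely be appended to a metric name or attached as a tag, so that each bucket gets its own mismatch rate and timing distribution.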