add some benchmarking guidance #267

oojo12 · 2022-11-07T14:10:36Z

This PR adds documentation to CONTRIBUTE.md that provides guidance for writing benchmarks. At a later time, an example benchmark will be referenced in that section to assist the community.

Resolves #265

YuhanLiin · 2022-11-08T01:13:54Z

CONTRIBUTE.md

+
+1. Test for a variety of sample sizes for most algorithms [1_000, 10_000, 20_000] will be sufficient
+2. Test of variety feature dimensions
+    - Spatial algorithms: [3, 5, 8]


3 and 8 are representative enough.

YuhanLiin · 2022-11-08T01:14:42Z

CONTRIBUTE.md

+    - Spatial algorithms: [3, 5, 8]
+    - Dimentionality reduction algorithms: [10, 20, 50]
+    - Others: [5, 10]
+3. Use Criterion


Go into why we use Criterion over Iai.

YuhanLiin · 2022-11-08T01:15:08Z

CONTRIBUTE.md

+    - Dimentionality reduction algorithms: [10, 20, 50]
+    - Others: [5, 10]
+3. Use Criterion
+4. Test various alg implementations


Be more specific and use examples like PLS.

YuhanLiin · 2022-11-08T01:17:22Z

CONTRIBUTE.md

+3. Use Criterion
+4. Test various alg implementations
+5. Set a random seed for algorithm if applicable
+6. Test multi-target case if algorithm supports it: [4 targets]


Mention that we generally only want to benchmark one or two target counts, and give a range (for example, 3 to 8).

YuhanLiin · 2022-11-08T01:25:32Z

CONTRIBUTE.md

+    - Others: [5, 10]
+3. Use Criterion
+4. Test various alg implementations
+5. Set a random seed for algorithm if applicable


"For algorithms that require an RNG or random seed as input, use a constant seed for reproducibility"
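As a rough illustration of why a constant seed matters, here is a minimal sketch that uses only the standard library. The SplitMix64-style generator and the `make_dataset` helper are hypothetical stand-ins and are not taken from this PR or from the project; an actual benchmark would presumably pass a constant seed to whatever RNG the algorithm already accepts.

```rust
/// SplitMix64-style generator, used here only as a std-only stand-in
/// for a seeded RNG (an assumption, not the project's actual RNG).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: row-major `n_samples x n_features` data
/// generated from a fixed seed, so every run sees the same input.
fn make_dataset(seed: u64, n_samples: usize, n_features: usize) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    (0..n_samples * n_features).map(|_| rng.next_f64()).collect()
}
```

Calling `make_dataset(42, 1_000, 3)` twice yields identical data, so timing differences between runs come from the code under test and not from the input.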

YuhanLiin · 2022-11-08T01:34:25Z

CONTRIBUTE.md

+It is important to the project that we have benchmarks in place to evaluate the benefit of performance related changes. To make that process easier we provide some guidelines for writing benchmarks.
+
+1. Test for a variety of sample sizes for most algorithms [1_000, 10_000, 20_000] will be sufficient
+2. Test of variety feature dimensions


"Test for a variety of feature dimensions. Two is usually enough for most algorithms. The following are suggested feature dimensions to use for different types of algorithms:"

YuhanLiin · 2022-11-08T01:35:03Z

CONTRIBUTE.md

+
+It is important to the project that we have benchmarks in place to evaluate the benefit of performance related changes. To make that process easier we provide some guidelines for writing benchmarks.
+
+1. Test for a variety of sample sizes for most algorithms [1_000, 10_000, 20_000] will be sufficient


For algorithms where it's not too slow, use 100k samples instead of 20k.
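The sample-size and feature-dimension suggestions in this review can be combined into a small parameter grid. The following is a sketch only: `benchmark_grid` and `case_label` are hypothetical names, the label format is an assumption, and in a Criterion benchmark each pair would presumably become one parameterized case.

```rust
/// Hypothetical helper: every (n_samples, n_features) pair to benchmark.
fn benchmark_grid(sample_sizes: &[usize], feature_dims: &[usize]) -> Vec<(usize, usize)> {
    let mut grid = Vec::with_capacity(sample_sizes.len() * feature_dims.len());
    for &n_samples in sample_sizes {
        for &n_features in feature_dims {
            grid.push((n_samples, n_features));
        }
    }
    grid
}

/// Hypothetical label format for one benchmark case.
fn case_label(algorithm: &str, n_samples: usize, n_features: usize) -> String {
    format!("{}/{}x{}", algorithm, n_samples, n_features)
}
```

For a spatial algorithm that is fast enough for the larger size, `benchmark_grid(&[1_000, 10_000, 100_000], &[3, 8])` gives six cases, which keeps the benchmark run short while still covering small and large inputs.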

codecov-commenter · 2022-11-08T02:14:37Z

Codecov Report

Base: 38.68% // Head: 38.75% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (2a7927e) compared to base (fb17c62).
Patch has no changes to coverable lines.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #267      +/-   ##
==========================================
+ Coverage   38.68%   38.75%   +0.06%     
==========================================
  Files          93       93              
  Lines        6087     6087              
==========================================
+ Hits         2355     2359       +4     
+ Misses       3732     3728       -4

Impacted Files	Coverage Δ
...rithms/linfa-trees/src/decision_trees/algorithm.rs	`39.73% <0.00%> (+1.78%)`	⬆️


YuhanLiin · 2022-11-08T02:47:17Z

CONTRIBUTE.md

+3. Use Criterion. Iai another popular benchmarking tool is not being actively maintained at the moment and at the time of this writing it hasn't been updated since Feb 25, 2021.
+4. Test various alg implementations for instance Pls has the following algorithms: Nipals and Svd.
+5. For algorithms that require an RNG or random seed as input, use a constant seed for reproducibility
+6. In most cases we only want to benchmark 1D or 2D targets. When benchmarking 2D targets the 2nd axis should be within the following range: [2, 4].


Delete the first sentence. In the 2nd sentence put "multi-target" instead of "2D targets" and "target count" instead of "2nd axis".

add some benchmarking guidance

0573725

YuhanLiin reviewed Nov 8, 2022

address feedback

4619e22

YuhanLiin reviewed Nov 8, 2022

YuhanLiin approved these changes Nov 8, 2022

oojo12 and others added 3 commits November 7, 2022 21:57

address feedback

0e22e60

Merge branch 'master' into doc_benchmark_standards

2a7927e

note on data creation for benchmark

d3a71d8

YuhanLiin approved these changes Nov 8, 2022

YuhanLiin merged commit 66877ac into rust-ml:master Nov 8, 2022

oojo12 deleted the doc_benchmark_standards branch November 9, 2022 01:12
