vignettes/fasthplus.Rmd (+5 −9)
@@ -144,9 +144,7 @@ where $\alpha \in (0,1)$ is the proportion of total distances $N_{d}$ that are w
**Interpreting $H_{+}$ with two sets $A$ and $B$**.
-$H_{+}$ can be thought of as the product of two parameters $\gamma_{A},\gamma_{B}$, which lends itself to a simple interpretation for $H_{+}$: $\gamma_{A}\times100\%$ of $a \in A$ (or $d_{ij}\in D_{W}$) are strictly greater than $\gamma_{B}\times100\%$ of $b \in B$ (or $d_{kl}\in D_{B}$).
-
-We provide two equivalent algorithms for this estimation process, with the further benefit that our algorithms yield a range of reasonable values for $\gamma_{A},\gamma_{B}$. For further exploration of discordance, these estimators ($G_{+}$ and $H_{+}$), as well as their theoretical properties, please see Dyjack et al. (2022).
+$H_{+}$ can be thought of as the product of two parameters $\gamma_{A},\gamma_{B}$, which lends itself to a simple interpretation for $H_{+}$: $\gamma_{A}\times100\%$ of $a \in A$ (or $d_{ij}\in D_{W}$) are strictly greater than $\gamma_{B}\times100\%$ of $b \in B$ (or $d_{kl}\in D_{B}$). We provide two equivalent algorithms for this estimation process, with the further benefit that our algorithms yield a range of reasonable values for $\gamma_{A},\gamma_{B}$. For further exploration of discordance, these estimators ($G_{+}$ and $H_{+}$), as well as their theoretical properties, please see Dyjack et al. (2022).
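As a purely hypothetical illustration of this reading (numbers invented for the example, not taken from the package): if $\gamma_{A} = 0.5$ and $\gamma_{B} = 0.4$, then

$$H_{+} = \gamma_{A}\gamma_{B} = 0.5 \times 0.4 = 0.2,$$

i.e., $50\%$ of $a \in A$ (or $d_{ij}\in D_{W}$) are strictly greater than $40\%$ of $b \in B$ (or $d_{kl}\in D_{B}$).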
### Approximating $H_{+}$
@@ -180,25 +178,23 @@ As input, `hpe()` can estimate $H_{+}$ using **either**:
1. Two arbitrary vectors: $A = \{a_1, \ldots \}$ and $B = \{b_1, \ldots \}$
2. A dissimilarity matrix $D$ of dimension $n \times n$ and set of labels $L$ of length $n$
-The `hpe()` function returns a numeric value for $H_{+}$ given $A,B$ or $D,L$, or it returns a list with the additional parameters (alpha and gamma) specified. In comparison to similar cluster-fitness packages (`clusterCrit`), which induce Euclidean distance from the observations, `fasthplus` is designed to handle an arbitrary dissimilarity matrix.
+The `hpe()` function returns a numeric value for $H_{+}$ given $A,B$ or $D,L$, or it returns a list with the additional parameters (`alpha` and `gammas`) specified. In comparison to similar cluster-fitness packages (`clusterCrit`), which induce Euclidean distance from the observations, `fasthplus` is designed to handle an arbitrary dissimilarity matrix.
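A minimal sketch of the two input modes described above (not run; the argument names `A`, `B`, `D`, and `L` are assumed to mirror the mathematical notation, and the data, clustering, and dissimilarity are arbitrary placeholders):

```r
library(fasthplus)

set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100)        # 100 observations, 5 features
l <- kmeans(x, centers = 3)$cluster            # a candidate labeling L

# Input 2: an arbitrary n x n dissimilarity matrix D plus labels L
d  <- as.matrix(dist(x, method = "manhattan"))
h1 <- hpe(D = d, L = l)

# Input 1: two arbitrary vectors A and B, e.g. within- and between-cluster distances
a  <- d[outer(l, l, "==") & upper.tri(d)]
b  <- d[outer(l, l, "!=") & upper.tri(d)]
h2 <- hpe(A = a, B = b)
```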
## Additional arguments
-In addition to either $A,B$ or $D,L$, the user may also provide `hpe()` with four additional arguments: `p`, `alg`, `alpha`, `gammas`. We describe these in greater detail below.
+In addition to either $A,B$ or $D,L$, the user may also provide `hpe()` with four additional arguments: `p`, `alg`, `alpha`, `gammas`.
### `p`
-An `integer` value that specifies the number of percentiles to calculate for each set $A,B$.
-`p` is used to specify the desired accuracy of the $H_{+}$ estimate, where the theoretical accuracy is guaranteed within $\pm \frac{1}{p-1}$.
-The default value `p=101`, that is, an accuracy of 0.01.
+An `integer` value that specifies the number of percentiles to calculate for each set $A,B$. In Dyjack et al. (2022), we derive a numerical bound for the accuracy of this estimator as a function of $p$, the number of percentiles taken from the two sets of interest. The user can specify $p$, and `hpe()` guarantees accuracy within $\pm \frac{1}{p-1}$ of the true $H_{+}$. The default value is `p=101`, that is, an accuracy of 0.01.
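Because the guarantee is simply $\frac{1}{p-1}$, the trade-off between `p` and accuracy can be tabulated directly (a small illustration, not taken from the vignette):

```r
# Accuracy guaranteed for a few choices of p: 1 / (p - 1)
p_values <- c(11, 101, 1001)
data.frame(p = p_values, accuracy = 1 / (p_values - 1))
# p = 11 -> 0.1, p = 101 -> 0.01 (the default), p = 1001 -> 0.001
```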
### `alg`
A character string referring to two algorithms used to estimate $H_{+}$.
For most values of $p$ the algorithms have comparable performance.
-`alg = "brute_force"` (default): this is a 'brute-force' estimation that performs best (in `R`) for smaller values of $p$.
--`alg = "grid_search"`: this is a percentile-based ($p$) grid-search solution for the same value that dramatically reduces the number of calculations relative to Algorithm 1. In Dyjack et al. (2022), we derive a numerical bound for the accuracy of this estimator as a function of $p$, the number of percentiles taken from the two sets of interest. The user can specify $p$, and `hpe()` guarantees accuracy within $\pm \frac{1}{p-1}$ of the true $H_{+}$.
+-`alg = "grid_search"`: this is a percentile-based ($p$) grid-search solution for the same value that dramatically reduces the number of calculations relative to Algorithm 1.
In practice, the two algorithms have similar performance for most values of $p$. However, as Algorithm 2 performs strictly fewer calculations than Algorithm 1, we suggest Algorithm 2 for any $p > 101$.
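A hedged sketch of that suggestion (argument names assumed as above; `a` and `b` are arbitrary placeholder vectors):

```r
library(fasthplus)

set.seed(2)
a <- rexp(5000)       # e.g. within-cluster distances
b <- rexp(5000) + 1   # e.g. between-cluster distances

# Default: alg = "brute_force" with p = 101 (accuracy within 0.01)
h_default <- hpe(A = a, B = b)

# For p > 101, the grid-search algorithm is the suggested choice
h_precise <- hpe(A = a, B = b, p = 1001, alg = "grid_search")
```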