
Commit 13e6ebf

cran submission 1
1 parent 5ee3f6d commit 13e6ebf


3 files changed (+8 -12 lines)


R/hpe.R

+3-3
@@ -132,11 +132,11 @@ hpe <- function(A, B, D, L, p = 101, alg = "brute_force",alpha=F,gammas=F) {
 #ab <- bn / (an+bn)
 }

-if (alphas & gammas){
+if (alpha & gammas){
 fin <- list(h=he, gammas=c(gw =gAr, gb = gB), alpha=aw)
-} else if (alphas & !gammas) {
+} else if (alpha & !gammas) {
 fin <- list(h=he, alpha=aw)
-} else if (!alphas & gammas) {
+} else if (!alpha & gammas) {
 fin <- list(h=he, gammas=c(gw =gAr, gb = gB))
 }

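The hunk above renames `alphas` to `alpha` so that the conditions match the function's `alpha` argument. With the old name, R would look up an undefined variable and stop with an `object 'alphas' not found` error whenever those lines ran. Below is a minimal sketch of the corrected branch logic, written as a hypothetical standalone helper `select_hpe_output()` (not part of `fasthplus`). It assumes that when neither flag is set the result is just the numeric estimate, as the vignette describes. It also uses scalar `&&` inside `if ()`, which is the idiomatic choice for single logical values.

```r
# Hypothetical helper mirroring the corrected branch logic in hpe().
# he: H+ estimate; aw: alpha value; gAr, gB: gamma values (names taken from the hunk).
select_hpe_output <- function(he, aw, gAr, gB, alpha = FALSE, gammas = FALSE) {
  if (alpha && gammas) {
    list(h = he, gammas = c(gw = gAr, gb = gB), alpha = aw)
  } else if (alpha && !gammas) {
    list(h = he, alpha = aw)
  } else if (!alpha && gammas) {
    list(h = he, gammas = c(gw = gAr, gb = gB))
  } else {
    # Assumption: with neither flag set, only the numeric estimate is returned
    he
  }
}

# Usage with made-up placeholder values
out <- select_hpe_output(he = 0.25, aw = 0.4, gAr = 0.5, gB = 0.5, alpha = TRUE)
str(out)
```
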
vignettes/.DS_Store

0 Bytes
Binary file not shown.

vignettes/fasthplus.Rmd

+5-9
@@ -144,9 +144,7 @@ where $\alpha \in (0,1)$ is the proportion of total distances $N_{d}$ that are w

 **Interpreting $H_{+}$ with two sets $A$ and $B$**.

-$H_{+}$ can be thought of as the product of two parameters $\gamma_{A},\gamma_{B}$, which lends itself to a simple interpretation for $H_{+}$: $\gamma_{A}\times100\%$ of $a \in A$ (or $d_{ij}\in D_{W}$) are strictly greater than $\gamma_{B}\times100\%$ of $b \in B$ (or $d_{kl}\in D_{B}$).
-
-We provide two equivalent algorithms for this estimation process, with the further benefit that our algorithms yield a range of reasonable values for $\gamma_{A},\gamma_{B}$. For further discussion of discordance and of these estimators ($G_{+}$ and $H_{+}$), including their theoretical properties, please see Dyjack et al. (2022).
+$H_{+}$ can be thought of as the product of two parameters $\gamma_{A},\gamma_{B}$, which lends itself to a simple interpretation for $H_{+}$: $\gamma_{A}\times100\%$ of $a \in A$ (or $d_{ij}\in D_{W}$) are strictly greater than $\gamma_{B}\times100\%$ of $b \in B$ (or $d_{kl}\in D_{B}$). We provide two equivalent algorithms for this estimation process, with the further benefit that our algorithms yield a range of reasonable values for $\gamma_{A},\gamma_{B}$. For further discussion of discordance and of these estimators ($G_{+}$ and $H_{+}$), including their theoretical properties, please see Dyjack et al. (2022).

 ### Approximating $H_{+}$

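To make the $\gamma_{A},\gamma_{B}$ interpretation in the hunk above concrete, the sketch below computes $H_{+}$ exhaustively for two small vectors. It assumes that $H_{+}$ is the proportion of all pairs $(a, b)$, with $a \in A$ and $b \in B$, for which $a > b$. This is the quantity that `hpe()` estimates. `hplus_exact()` is a hypothetical name, and this $O(|A||B|)$ comparison is for illustration only. It is not either of the package's algorithms.

```r
# Hypothetical exhaustive H+: share of (a, b) pairs with a strictly greater than b.
# Assumes A holds within-cluster distances (D_W) and B between-cluster distances (D_B).
hplus_exact <- function(A, B) {
  # outer() builds the |A| x |B| logical matrix of comparisons a > b;
  # mean() of a logical matrix is the proportion of TRUE entries.
  mean(outer(A, B, ">"))
}

A <- c(1, 2, 3, 4)
B <- c(2.5, 5)
hplus_exact(A, B)  # 0.25
```

In this example, half of $A$ (3 and 4) is strictly greater than half of $B$ (2.5), and $0.5 \times 0.5 = 0.25$, which matches the product interpretation. Because `outer()` materializes every comparison, this approach only suits small inputs. For large sets, an approximation such as `hpe()` is the practical option.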
@@ -180,25 +178,23 @@ As input, `hpe()` can estimate $H_{+}$ using **either**:
 1. Two arbitrary vectors: $A = \{a_1, \ldots \}$ and $B = \{b_1, \ldots \}$
 2. A dissimilarity matrix $D$ of dimension $n \times n$ and a set of labels $L$ of length $n$

-By default, the `hpe()` function returns a numeric value for $H_{+}$ given $A,B$ or $D,L$; when requested, it instead returns a list that also contains the additional parameters (alpha and gamma). In contrast to similar cluster-fitness packages (`clusterCrit`), which compute Euclidean distances from the observations, `fasthplus` is designed to handle an arbitrary dissimilarity matrix.
+By default, the `hpe()` function returns a numeric value for $H_{+}$ given $A,B$ or $D,L$; when requested, it instead returns a list that also contains the additional parameters (`alpha` and `gammas`). In contrast to similar cluster-fitness packages (`clusterCrit`), which compute Euclidean distances from the observations, `fasthplus` is designed to handle an arbitrary dissimilarity matrix.

 ## Additional arguments

-In addition to either $A,B$ or $D,L$, the user may also provide `hpe()` with four optional arguments: `p`, `alg`, `alpha`, `gammas`. We describe these in greater detail below.
+In addition to either $A,B$ or $D,L$, the user may also provide `hpe()` with four optional arguments: `p`, `alg`, `alpha`, `gammas`.

 ### `p`

-An `integer` value that specifies the number of percentiles to calculate for each set $A,B$.
-`p` sets the desired accuracy of the $H_{+}$ estimate; the theoretical accuracy is guaranteed to be within $\pm \frac{1}{p-1}$.
-The default value is `p=101`, that is, an accuracy of 0.01.
+An `integer` value that specifies the number of percentiles to calculate for each set $A,B$. In Dyjack et al. (2022), we derive a numerical bound for the accuracy of this estimator as a function of $p$, the number of percentiles taken from the two sets of interest. The user can specify $p$, and `hpe()` guarantees accuracy within $\pm \frac{1}{p-1}$ of the true $H_{+}$. The default value is `p=101`, that is, an accuracy of 0.01.

 ### `alg`

 A character string that selects one of two algorithms used to estimate $H_{+}$.
 For most values of $p$ the algorithms have comparable performance.

 - `alg = "brute_force"` (default, Algorithm 1): this is a 'brute-force' estimation that performs best (in `R`) for smaller values of $p$.
-- `alg = "grid_search"` (Algorithm 2): this is a percentile-based grid search (using $p$ percentiles) for the same quantity that dramatically reduces the number of comparisons relative to Algorithm 1. In Dyjack et al. (2022), we derive a numerical bound for the accuracy of this estimator as a function of $p$, the number of percentiles taken from the two sets of interest. The user can specify $p$, and `hpe()` guarantees accuracy within $\pm \frac{1}{p-1}$ of the true $H_{+}$.
+- `alg = "grid_search"` (Algorithm 2): this is a percentile-based grid search (using $p$ percentiles) for the same quantity that dramatically reduces the number of comparisons relative to Algorithm 1.

 In practice, the two algorithms have similar performance for most values of $p$. However, because Algorithm 2 performs strictly fewer calculations than Algorithm 1, we suggest Algorithm 2 for any $p > 101$.

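For the $D, L$ input described in the second hunk, the two sets correspond to within-cluster distances ($A$, or $D_{W}$) and between-cluster distances ($B$, or $D_{B}$). The sketch below shows one way to perform that split. It assumes $D$ is a symmetric $n \times n$ matrix and counts each unordered pair $(i, j)$ with $i < j$ once. `split_distances()` is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper: split a symmetric dissimilarity matrix D into
# within-cluster (A) and between-cluster (B) distances using labels L.
split_distances <- function(D, L) {
  D <- as.matrix(D)
  stopifnot(nrow(D) == ncol(D), length(L) == nrow(D))
  # Use each unordered pair (i, j) with i < j exactly once
  upper <- upper.tri(D)
  same <- outer(L, L, "==")
  list(A = D[upper & same], B = D[upper & !same])
}

set.seed(1)
x <- c(rnorm(5, 0), rnorm(5, 4))
L <- rep(c("c1", "c2"), each = 5)
D <- as.matrix(dist(x))
ab <- split_distances(D, L)
lengths(ab)  # 20 within-cluster and 25 between-cluster distances
# Exhaustive H+ on the split: proportion of (a, b) pairs with a > b
mean(outer(ab$A, ab$B, ">"))
```

Inverting the stated bound gives a rule for choosing $p$. A target accuracy $\varepsilon$ requires $p \ge 1/\varepsilon + 1$, so $\varepsilon = 0.01$ gives $p = 101$, which is the default.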