Title: | Power Analysis for PLS Classification |
---|---|
Description: | Estimates power and sample size for the Partial Least Squares-based methods described in Andreella et al. (2024) <doi:10.48550/arXiv.2403.10289>. |
Authors: | Angela Andreella [aut, cre] (Main author, <https://orcid.org/0000-0002-1141-3041>) |
Maintainer: | Angela Andreella |
License: | GPL (>= 2) |
Version: | 0.2.0 |
Built: | 2025-01-08 05:28:21 UTC |
Source: | https://github.com/angeella/powerpls |
59 post-mortem aqueous humor samples collected from closed and opened sheep eyes
aqueous_humour
A data frame with 59 rows and 45 variables:
ID observation
class membership (C, O)
43 columns of metabolic values
Angela Andreella
https://link.springer.com/article/10.1007/s11306-019-1533-2
Estimates power for a given sample size, type I error level and number of score components.
computePower(X, Y, A, n, seed = 123, Nsim = 100, nperm = 200, alpha = 0.05, scaling = "auto-scaling", test = "R2", Y.prob = FALSE, eps = 0.01, post.transformation = TRUE, fast = FALSE, transformation = "clr")
Argument | Description |
---|---|
X | Data matrix where columns represent the variables and rows the observations. |
Y | Data matrix where columns represent the two classes and rows the observations. |
A | Number of score components. |
n | Sample size. |
seed | Seed value. Default 123. |
Nsim | Number of simulations. Default 100. |
nperm | Number of permutations. Default 200. |
alpha | Type I error level. Default 0.05. |
scaling | Type of scaling, one of … Default "auto-scaling". |
test | Type of test statistic, one of … Default "R2". |
Y.prob | Boolean value. Default FALSE. |
eps | Default 0.01. |
post.transformation | Boolean value. Default TRUE. |
fast | Logical; use the function … Default FALSE. |
transformation | Transformation used to map … Default "clr". |
Returns a matrix of estimated power for each number of components and tests selected.
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
## Not run:
datas <- simulatePilotData(nvar = 10, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
out <- computePower(X = datas$X, Y = datas$Y, A = 3, n = 20, test = "R2")
## End(Not run)
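Estimating power from Nsim simulated datasets typically means counting how often the test rejects at level alpha. The base-R sketch below illustrates only that Monte Carlo idea; it is not the powerpls implementation. The helpers perm_pvalue() and estimate_power() are hypothetical, and a simple difference of class means stands in for the PLS-based test statistics.

```r
# Hypothetical sketch of Monte Carlo power estimation (not powerpls code).
# A difference of class means stands in for the PLS-based statistics.
set.seed(123)

# Permutation p-value for a two-class mean difference
perm_pvalue <- function(x, y, nperm = 200) {
  stat <- function(lab) abs(mean(x[lab == 1]) - mean(x[lab == 0]))
  obs  <- stat(y)
  perm <- replicate(nperm, stat(sample(y)))   # statistic under permuted labels
  (1 + sum(perm >= obs)) / (nperm + 1)        # add-one correction, never 0
}

# Power = proportion of simulated datasets in which the test rejects
estimate_power <- function(n, delta, Nsim = 100, nperm = 200, alpha = 0.05) {
  rejections <- replicate(Nsim, {
    y <- rep(c(0, 1), length.out = n)         # balanced classes
    x <- rnorm(n, mean = delta * y)           # class 1 shifted by delta
    perm_pvalue(x, y, nperm) <= alpha
  })
  mean(rejections)
}

power_20 <- estimate_power(n = 20, delta = 1)
power_20
```

Increasing Nsim reduces the Monte Carlo error of the estimate, while nperm bounds the smallest attainable p-value at 1 / (nperm + 1).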
Compute optimal sample size
computeSampleSize(n, X, Y, A, alpha, beta, nperm, Nsim, seed, test = "R2", ...)
Argument | Description |
---|---|
n | Vector of sample sizes to consider. |
X | Data matrix where columns represent the variables and rows the observations. |
Y | Data matrix where columns represent the two classes and rows the observations. |
A | Number of score components. |
alpha | Type I error level. Default 0.05. |
beta | Type II error level. Default 0.2. |
nperm | Number of permutations. Default 100. |
Nsim | Number of simulations. Default 100. |
seed | Seed value. |
test | Type of test statistic, one of … Default "R2". |
... | Further parameters. |
Returns a data frame that contains the estimated power for each sample size and number of components considered.
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
## Not run:
datas <- simulatePilotData(nvar = 10, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
out <- computeSampleSize(X = datas$X, Y = datas$Y, A = 3, n = 20, test = "R2")
## End(Not run)
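Since computeSampleSize reports estimated power for each candidate n, a natural follow-up is to take the smallest n whose power reaches 1 - beta. The hypothetical helper choose_sample_size() below sketches that selection step in base R; the power values are made up for illustration and are not package output.

```r
# Hypothetical helper: smallest sample size whose estimated power >= 1 - beta
choose_sample_size <- function(n, power, beta = 0.2) {
  ok <- n[power >= 1 - beta]
  if (length(ok) == 0) return(NA_real_)   # target power never reached
  min(ok)
}

n_grid     <- c(10, 20, 30, 40)
power_grid <- c(0.35, 0.62, 0.81, 0.93)   # illustrative values only
choose_sample_size(n_grid, power_grid, beta = 0.2)
# returns 30
```

Returning NA when no candidate reaches the target makes it explicit that the grid of sample sizes should be extended.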
Performs permutation-based test based on Matthews Correlation Coefficient
mccTest(X, Y, nperm = 200, A, randomization = FALSE, Y.prob = FALSE, eps = 0.01, scaling = "auto-scaling", post.transformation = TRUE)
Argument | Description |
---|---|
X | Data matrix where columns represent the variables and rows the observations. |
Y | Data matrix where columns represent the two classes and rows the observations. |
nperm | Number of permutations. Default 200. |
A | Number of score components. |
randomization | Boolean value. Default FALSE. |
Y.prob | Boolean value. Default FALSE. |
eps | Default 0.01. |
scaling | Type of scaling, one of … Default "auto-scaling". |
post.transformation | Boolean value. Default TRUE. |
List with the following objects:
raw p-value; NA if randomization = FALSE
adjusted p-value; NA if randomization = FALSE
estimated test statistic
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
Other test statistics implemented: scoreTest, R2Test.
datas <- simulatePilotData(nvar = 30, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 1)
out <- mccTest(X = datas$X, Y = datas$Y, A = 1)
out
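The statistic behind mccTest is the Matthews Correlation Coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), computed from the confusion matrix of true versus predicted classes. The base-R function mcc() below is a hypothetical illustration of that formula; how mccTest derives predicted classes from the PLS model is not shown here.

```r
# Hypothetical helper: Matthews Correlation Coefficient for 0/1 labels
mcc <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)
  tn <- sum(truth == 0 & pred == 0)
  fp <- sum(truth == 0 & pred == 1)
  fn <- sum(truth == 1 & pred == 0)
  # as.numeric avoids integer overflow in the product for large samples
  den <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)   # common convention when a margin is empty
  (tp * tn - fp * fn) / den
}

truth <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 1, 1, 0, 1, 0, 0, 0, 0)
mcc(truth, pred)
# returns 0.6  (TP = 4, TN = 4, FP = 1, FN = 1: 15 / 25)
```

MCC ranges from -1 to 1 and stays informative with unbalanced classes, which is one reason it is preferred over plain accuracy.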
Performs Partial Least Squares classification
PLSc(X, Y, A, scaling = "auto-scaling", post.transformation = TRUE, eps = 0.01, Y.prob = FALSE, transformation = "ilr")
Argument | Description |
---|---|
X | Data matrix where columns represent the variables and rows the observations. |
Y | Data matrix where columns represent the two classes and rows the observations. |
A | Number of score components. |
scaling | Type of scaling, one of … Default "auto-scaling". |
post.transformation | Boolean value. Default TRUE. |
eps | Default 0.01. |
Y.prob | Boolean value. Default FALSE. |
transformation | Transformation used to map … Default "ilr". |
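The transformation values "clr" and "ilr" are the usual names of the centred and isometric log-ratio transforms from compositional data analysis. Assuming they are applied to the two-column class memberships after pulling them away from 0 and 1 by eps (an assumption; the role of eps is not documented here), a two-class sketch could look like this. The helpers to_composition(), clr() and ilr2() are hypothetical, and the ilr sign depends on the chosen basis.

```r
# Hypothetical sketch: log-ratio coordinates for two-class memberships.
# ASSUMPTION: 0/1 memberships are clipped to [eps, 1 - eps] before logs.
to_composition <- function(Y, eps = 0.01) {
  P <- pmin(pmax(Y, eps), 1 - eps)   # pmax/pmin keep the matrix dims
  P / rowSums(P)                     # close each row so it sums to 1
}

# Centred log-ratio: log parts minus their row-wise mean
clr <- function(P) log(P) - rowMeans(log(P))

# Isometric log-ratio for two parts (one sign convention)
ilr2 <- function(P) log(P[, 1] / P[, 2]) / sqrt(2)

Y <- cbind(C = c(1, 0, 1), O = c(0, 1, 0))
P <- to_composition(Y)
clr(P)
ilr2(P)
```

With two parts the two coordinates carry the same information: the ilr value equals sqrt(2) times the first clr column.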
List with the following objects:
Matrix of weights
Matrix of X loadings
Matrix of Y loadings
Matrix of X data (predictor variables)
Matrix of Y data (dependent variable)
Matrix of scores
Fitted Y matrix
Matrix of regression coefficients
Number of orthogonal components, if post.transformation = TRUE is applied
Angela Andreella
Stocchero, M., De Nardi, M., & Scarpa, B. (2021). PLS for classification. Chemometrics and Intelligent Laboratory Systems, 216, 104374.
datas <- simulatePilotData(nvar = 30, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
out <- PLSc(X = datas$X, Y = datas$Y, A = 3)
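The weights, scores and loadings returned by PLSc can be illustrated with the first component of a generic single-response PLS fit after auto-scaling. The sketch below is a textbook NIPALS-style step on simulated data, not the PLSc algorithm of Stocchero et al. (2021), which works with a two-column Y and an optional post-transformation; all variable names are illustrative.

```r
# Generic first PLS component for a single 0/1 response (not PLSc itself)
set.seed(1)
n <- 20; p <- 6
y <- rep(c(0, 1), each = n / 2)
X <- matrix(rnorm(n * p), n, p)
X[, 1:2] <- X[, 1:2] + 2 * y          # two informative variables

Xs <- scale(X)                          # auto-scaling: centre, unit variance
yc <- y - mean(y)                       # centred response

w <- crossprod(Xs, yc)                  # weight direction X'y
w <- w / sqrt(sum(w^2))                 # unit-norm weight vector
t_score <- Xs %*% w                     # score vector
p_load  <- crossprod(Xs, t_score) / sum(t_score^2)   # X loadings
q_load  <- sum(yc * t_score) / sum(t_score^2)        # y loading
round(drop(w), 2)
```

Further components would be extracted the same way after deflating Xs by t_score %*% t(p_load).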
Performs permutation-based test based on R2
R2Test(X, Y, nperm = 100, A, randomization = FALSE, Y.prob = FALSE, eps = 0.01, scaling = "auto-scaling", post.transformation = TRUE)
Argument | Description |
---|---|
X | Data matrix where columns represent the variables and rows the observations. |
Y | Data matrix where columns represent the two classes and rows the observations. |
nperm | Number of permutations. Default 100. |
A | Number of score components. |
randomization | Boolean value. Default FALSE. |
Y.prob | Boolean value. Default FALSE. |
eps | Default 0.01. |
scaling | Type of scaling, one of … Default "auto-scaling". |
post.transformation | Boolean value. Default TRUE. |
List with the following objects:
raw p-value; NA if randomization = FALSE
adjusted p-value; NA if randomization = FALSE
estimated test statistic
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
Other test statistics implemented: mccTest, scoreTest.
datas <- simulatePilotData(nvar = 30, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
out <- R2Test(X = datas$X, Y = datas$Y, A = 1)
out
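A permutation test on R2 compares the observed coefficient of determination, R2 = 1 − SSE/SST, with its distribution when the rows of Y are shuffled. The base-R sketch below shows that mechanism with an ordinary least-squares fit on a few predictors; it is not the R2Test implementation, which works with a PLS model, and r2() and r2_perm_test() are hypothetical helpers.

```r
# Hypothetical sketch of a permutation test on R2 (OLS instead of PLS)
set.seed(42)

r2 <- function(X, y) {
  fit <- lm.fit(cbind(1, X), y)                # least squares with intercept
  1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
}

r2_perm_test <- function(X, y, nperm = 100) {
  obs  <- r2(X, y)
  null <- replicate(nperm, r2(X, sample(y)))   # R2 under permuted responses
  list(statistic = obs,
       p.value = (1 + sum(null >= obs)) / (nperm + 1))
}

n <- 30
y <- rep(c(0, 1), each = n / 2)
X <- cbind(rnorm(n, mean = y), rnorm(n), rnorm(n))
res <- r2_perm_test(X, y, nperm = 100)
res$p.value
```

Permuting the responses keeps the predictor structure intact, so the null distribution automatically accounts for the optimism of in-sample R2.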
Performs permutation-based test based on predictive score vector
scoreTest(X, Y, nperm = 200, A, randomization = FALSE, Y.prob = FALSE, eps = 0.01, scaling = "auto-scaling", post.transformation = TRUE)
Argument | Description |
---|---|
X | Data matrix where columns represent the variables and rows the observations. |
Y | Data matrix where columns represent the two classes and rows the observations. |
nperm | Number of permutations. Default 200. |
A | Number of score components. |
randomization | Boolean value. Default FALSE. |
Y.prob | Boolean value. Default FALSE. |
eps | Default 0.01. |
scaling | Type of scaling, one of … Default "auto-scaling". |
post.transformation | Boolean value. Default TRUE. |
List with the following objects:
raw p-value; NA if randomization = FALSE
adjusted p-value; NA if randomization = FALSE
estimated test statistic
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
Other test statistics implemented: mccTest, R2Test.
datas <- simulatePilotData(nvar = 30, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
out <- scoreTest(X = datas$X, Y = datas$Y, A = 1)
out
Simulate a data matrix with n observations under the alternative hypothesis by kernel density estimation
sim_XY(out, n, seed = 123, post.transformation = TRUE, A, fast = FALSE)
Argument | Description |
---|---|
out | Output from PLSc. |
n | Number of observations to simulate. |
seed | Seed value. Default 123. |
post.transformation | Boolean value. Default TRUE. |
A | Number of score components used in PLSc. |
fast | Logical; use the function … Default FALSE. |
Returns a list:
dependent variable, a matrix with 2 columns and n rows (observations)
predictor variables, a matrix with n rows (observations) and the same number of columns as out$X (i.e., the original dataset)
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
datas <- simulatePilotData(nvar = 10, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
out <- PLSc(X = datas$X, Y = datas$Y, A = 3)
out_sim <- sim_XY(out = out, n = 10, A = 3)
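Drawing new values from a Gaussian kernel density estimate is equivalent to a smoothed bootstrap: pick an observed value at random and add N(0, bw²) noise. The univariate base-R sketch below shows only that sampling step; which quantities sim_XY actually smooths (for example, score vectors) is not stated here, and rkde() is a hypothetical helper.

```r
# Hypothetical helper: sample n values from a Gaussian KDE of x
rkde <- function(x, n, bw = bw.nrd0(x)) {
  # sample.int avoids sample()'s special case for length-one x
  centres <- x[sample.int(length(x), n, replace = TRUE)]  # kernel centres
  centres + rnorm(n, sd = bw)                             # kernel noise
}

set.seed(123)
x_pilot <- rnorm(10, mean = 2)    # small pilot sample
x_new   <- rkde(x_pilot, n = 50)  # 50 simulated observations
summary(x_new)
```

bw.nrd0() is the default bandwidth rule of stats::density(), so the draws follow the same density that density(x_pilot) would plot.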
Simulate cluster pilot data
simulatePilotData(seed = 123, nvar, clus.size, nvar_rel, m, A = 2, S1 = NULL, S2 = NULL)
Argument | Description |
---|---|
seed | Seed value. Default 123. |
nvar | Number of variables. |
clus.size | Vector of two elements specifying the size of each class (only two classes are considered). |
nvar_rel | Number of variables relevant to predict the dependent variable. |
m | Effect size of the separation between classes. |
A | Oracle number of score components. Default 2. |
S1 | Covariance matrix for the first class. Default NULL. |
S2 | Covariance matrix for the second class. Default NULL. |
List with the following objects:
matrix of predictor variables with nvar columns and sum(clus.size) rows
vector of the dependent variable with length sum(clus.size)
Angela Andreella
For the general framework of power analysis for PLS-based methods see:
Andreella, A., Fino, L., Scarpa, B., & Stocchero, M. (2024). Towards a power analysis for PLS-based methods. arXiv preprint https://arxiv.org/abs/2403.10289.
datas <- simulatePilotData(nvar = 10, clus.size = c(5, 5), m = 6, nvar_rel = 5, A = 2)
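To make the roles of nvar, clus.size, nvar_rel and m concrete, the base-R sketch below simulates two classes with identity covariance and shifts the mean of the first nvar_rel variables by m in the second class. This is an assumption-laden illustration, not simulatePilotData itself: the hypothetical sim_two_class() ignores A, S1 and S2, whose exact use is not documented here.

```r
# Hypothetical sketch of two-class pilot data (not simulatePilotData)
sim_two_class <- function(nvar, clus.size, nvar_rel, m, seed = 123) {
  set.seed(seed)
  n <- sum(clus.size)
  y <- rep(c(0, 1), times = clus.size)           # class labels
  X <- matrix(rnorm(n * nvar), n, nvar)          # identity covariance
  rel <- seq_len(nvar_rel)
  X[y == 1, rel] <- X[y == 1, rel] + m           # shift relevant variables
  list(X = X, Y = y)
}

d <- sim_two_class(nvar = 10, clus.size = c(5, 5), nvar_rel = 5, m = 6)
dim(d$X)
# 10 rows, 10 columns
```

Only the first nvar_rel columns differ in mean between classes; the remaining columns are pure noise, which is what makes variable selection and power depend on m.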
32 urine samples from children at risk of early-onset asthma and those with transient wheezing.
wheezing
A data frame with 32 rows and 176 variables.
Angela Andreella
https://onlinelibrary.wiley.com/doi/10.1111/pai.12879