
Generate CIBRA impact score permutation distribution
generate_permutation_dist.Rd
Generate CIBRA impact score permutation distribution
Usage
generate_permutation_dist(
data,
case_list,
control_list,
control_definition,
confidence,
iterations,
covariates = c(),
covariate_matrix = NULL,
parallel = FALSE,
method = "DESeq2",
permutation = "full"
)
Arguments
- data
RNA count dataframe with genes as rows and samples as columns
- case_list
vector with the numbers of cases to test
- control_list
vector with the numbers of controls to test
- control_definition
Definition term that will be used as reference for the comparison (e.g. WT)
- confidence
alpha threshold used to calculate the proportion (0.1 in the standard analysis; the signature under Usage sets no default, so pass it explicitly)
- iterations
number of iterations (int)
- covariates
list of column names from the definition matrix to use as covariates (supported only with DESeq2)
- covariate_matrix
design dataframe of the covariates, with the covariate values as columns and the samples as row names.
- parallel
whether the run should be done in parallel (boolean)
- method
method used to perform the differential expression analysis; supported methods are DESeq2, edgeR and limma
- permutation
permutation style to perform: either "full", where all values in the matrix are permuted, or "sample", where only the samples are permuted.
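The difference between the two permutation styles can be illustrated with a minimal base R sketch. The helpers `permute_full` and `permute_sample` are hypothetical and not part of CIBRA; they assume that "full" shuffles every value in the count matrix and that "sample" moves whole sample columns while leaving each expression profile intact. The package's internal implementation may differ.

```r
# Hypothetical helpers (not CIBRA functions): they only illustrate the two styles.
set.seed(1)

# Toy count matrix: 4 genes (rows) x 6 samples (columns)
counts <- matrix(
  rpois(24, lambda = 20), nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)

# "full" (assumed): shuffle every value, breaking both gene and sample structure
permute_full <- function(m) {
  out <- m
  out[] <- sample(as.vector(m))  # out[] keeps dimensions and dimnames
  out
}

# "sample" (assumed): shuffle whole columns, so each profile stays intact
# but ends up under a different sample label
permute_sample <- function(m) {
  out <- m[, sample(ncol(m)), drop = FALSE]
  colnames(out) <- colnames(m)
  out
}

full_perm <- permute_full(counts)
sample_perm <- permute_sample(counts)
```

Under these assumptions, both results contain exactly the same values as `counts`; only `sample_perm` still has columns that each match one of the original columns.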
Value
list containing the permutation results (dataframe) and the p-values, adjusted p-values and fold changes generated by the DE analysis of the permutations
Examples
# load transcriptomics data
count_data <- CIBRA::TCGA_CRC_rna_data
# subset the samples for a quicker run-time in the example
count_data <- count_data[,1:50]
# set parameters for the reference distribution
control_definition <- "NO_SNV"
confidence <- 0.1 # use the same confidence as used in standard analysis
iterations <- 9 # at least 1000 permutations are recommended to explore the full space
# create lists of different case and control sizes
case_list <- seq(10, ncol(count_data), length.out = 3) # at least 20 different case and control values are recommended
control_list <- seq(10, ncol(count_data), length.out = 3)
# run permutation screen
CIBRA_res <- generate_permutation_dist(
  data = count_data,
  case_list = case_list,
  control_list = control_list,
  control_definition = control_definition,
  confidence = confidence,
  iterations = iterations,
  parallel = FALSE,
  permutation = "sample"
)
#> [1] "testing: 10and 10"
#> [1] "cases: 10, controls: 10, iteration: 1"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3316 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 10, controls: 10, iteration: 2"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 1598 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 10, controls: 10, iteration: 3"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2953 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "Done with cases: 10, controls: 10"
#> [1] "testing: 30and 10"
#> [1] "cases: 30, controls: 10, iteration: 1"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2803 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> Warning: One or both parameters are on the limit of the defined parameter space
#> [1] "cases: 30, controls: 10, iteration: 2"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3228 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 30, controls: 10, iteration: 3"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2822 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "Done with cases: 30, controls: 10"
#> [1] "testing: 10and 30"
#> [1] "cases: 10, controls: 30, iteration: 1"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3031 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 10, controls: 30, iteration: 2"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3133 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> Warning: One or both parameters are on the limit of the defined parameter space
#> [1] "cases: 10, controls: 30, iteration: 3"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2862 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "Done with cases: 10, controls: 30"
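The output above reports only three case/control combinations, although `case_list` and `control_list` each hold three sizes. A short base R sketch shows one way this could arise, assuming (inferred from the log, not confirmed by the documentation) that combinations needing more samples than the 50 available are skipped. The names `n_samples`, `grid` and `feasible` are introduced here for illustration only.

```r
# Sketch under an assumption: combinations whose total size exceeds the number
# of available samples are not tested. This is inferred from the logged output.
n_samples <- 50  # the example subsets count_data to 50 columns

case_list <- seq(10, n_samples, length.out = 3)     # 10, 30, 50
control_list <- seq(10, n_samples, length.out = 3)  # 10, 30, 50

# All pairings of case and control sizes (cases vary fastest)
grid <- expand.grid(cases = case_list, controls = control_list)

# Keep only the pairings that fit within the available samples
feasible <- grid[grid$cases + grid$controls <= n_samples, ]
print(feasible)
```

Under this assumption the three remaining rows are 10/10, 30/10 and 10/30 (cases/controls), in the same order as the "testing:" lines in the log.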