
Generate CIBRA impact score permutation distribution

Usage

generate_permutation_dist(
  data,
  case_list,
  control_list,
  control_definition,
  confidence,
  iterations,
  covariates = c(),
  covariate_matrix = NULL,
  parallel = FALSE,
  method = "DESeq2",
  permutation = "full"
)

Arguments

data

RNA count dataframe with genes as rows and samples as columns

case_list

numeric vector of case group sizes to test

control_list

numeric vector of control group sizes to test

control_definition

definition term used as the reference (control) group in the comparison (e.g. "WT")

confidence

alpha threshold used to calculate the proportion (no default is set in this function; 0.1 is a typical value)

iterations

number of permutation iterations (integer)

covariates

character vector of column names from the definition matrix to use as covariates (supported only with DESeq2)

covariate_matrix

design data frame of the covariates, with the covariate values as columns and the samples as row names.

parallel

logical; whether the permutations should be run in parallel (default FALSE)

method

method used for the differential expression analysis; supported methods are "DESeq2" (default), "edgeR" and "limma"

permutation

permutation style to perform: "full" (default) permutes all values in the matrix, while "sample" permutes only the samples.
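To make the two permutation styles concrete, here is a minimal base-R sketch of how the description above can be read, using a small toy count matrix. The helpers permute_full() and permute_samples() are hypothetical and not part of CIBRA; the package's internal implementation may differ.

```r
# Toy count matrix: 4 genes (rows) x 6 samples (columns)
set.seed(1)
counts <- matrix(rpois(24, lambda = 20), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))

# Hypothetical helper (assumption): "full" style shuffles every value in the
# matrix, breaking both the gene-wise and the sample-wise structure
permute_full <- function(m) {
  matrix(sample(m), nrow = nrow(m), dimnames = dimnames(m))
}

# Hypothetical helper (assumption): "sample" style shuffles only the column
# order, so each sample keeps its own expression profile and only its
# position (and hence its case/control assignment) changes
permute_samples <- function(m) {
  m[, sample(ncol(m)), drop = FALSE]
}

full_perm <- permute_full(counts)
sample_perm <- permute_samples(counts)
```

Under this reading, the "sample" style keeps the gene-gene correlation within each sample intact, whereas the "full" style removes it.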

Value

list containing the permutation results (data frame) and the p-values, adjusted p-values and fold changes generated by the DE analysis of the permutations
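As an illustration of how the confidence threshold relates to adjusted p-values like the ones returned here, the sketch below computes the proportion of genes whose adjusted p-value falls below alpha. It assumes Benjamini-Hochberg adjustment via p.adjust() and a plain numeric vector of made-up p-values; both the adjustment method and this exact definition of the proportion are assumptions, not taken from the CIBRA source.

```r
# Made-up raw p-values for five genes from a single permutation
pvals <- c(0.001, 0.01, 0.02, 0.5, 0.9)
confidence <- 0.1

# Assumption: Benjamini-Hochberg correction
padj <- p.adjust(pvals, method = "BH")

# Proportion of genes called significant at the chosen alpha
proportion <- mean(padj < confidence)
proportion  # 0.6
```

With DESeq2, some adjusted p-values can be NA (for example after independent filtering), so real results need an explicit choice of how to count them.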

Examples


# load transcriptomics data
count_data <- CIBRA::TCGA_CRC_rna_data

# subset the samples for a quicker run time in this example
count_data <- count_data[,1:50]

# set parameters for the reference distribution
control_definition <- "NO_SNV"
confidence <- 0.1 # use the same confidence as used in standard analysis
iterations <- 9 # at least 1000 permutations are recommended to explore the full space

# create lists of different case and control sizes
case_list <- seq(10, ncol(count_data), length.out = 3) # at least 20 different case and control sizes are recommended
control_list <- seq(10, ncol(count_data), length.out = 3)

# run permutation screen
CIBRA_res <- generate_permutation_dist(
  data = count_data,
  case_list = case_list,
  control_list = control_list,
  control_definition = control_definition,
  confidence = confidence,
  iterations = iterations,
  parallel = FALSE,
  permutation = "sample"
)
#> [1] "testing: 10and 10"
#> [1] "cases: 10, controls: 10, iteration: 1"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3316 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 10, controls: 10, iteration: 2"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 1598 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 10, controls: 10, iteration: 3"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2953 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "Done with cases: 10, controls: 10"
#> [1] "testing: 30and 10"
#> [1] "cases: 30, controls: 10, iteration: 1"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2803 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> Warning: One or both parameters are on the limit of the defined parameter space
#> [1] "cases: 30, controls: 10, iteration: 2"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3228 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 30, controls: 10, iteration: 3"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2822 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "Done with cases: 30, controls: 10"
#> [1] "testing: 10and 30"
#> [1] "cases: 10, controls: 30, iteration: 1"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3031 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "cases: 10, controls: 30, iteration: 2"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 3133 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> Warning: One or both parameters are on the limit of the defined parameter space
#> [1] "cases: 10, controls: 30, iteration: 3"
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
#> -- replacing outliers and refitting for 2862 genes
#> -- DESeq argument 'minReplicatesForReplace' = 7 
#> -- original counts are preserved in counts(dds)
#> estimating dispersions
#> fitting model and testing
#> [1] "Done with cases: 10, controls: 30"
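With 50 samples, case_list and control_list both evaluate to 10, 30 and 50, yet the log above shows only three of the nine possible combinations being tested: 10/10, 30/10 and 10/30. This is consistent with combinations whose total exceeds the number of available samples being skipped. The base-R sketch below reproduces that filter under this assumption; it is an inference from the log, not code taken from CIBRA.

```r
n_samples <- 50
case_list <- seq(10, n_samples, length.out = 3)
control_list <- seq(10, n_samples, length.out = 3)

# All case/control size combinations (cases vary fastest, matching the log order)
grid <- expand.grid(cases = case_list, controls = control_list)

# Assumption: a combination is only run if cases + controls fit in the data
feasible <- grid[grid$cases + grid$controls <= n_samples, ]
feasible
```

Checking the grid this way before a long run shows how many case/control combinations will actually be permuted for a given number of samples.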