Runs several iterations of a full COI sensitivity analysis with varying parameters.

Usage

disc_sensitivity(
  repetitions = 10,
  coi = 3,
  max_coi = 25,
  plmaf = runif(1000, 0, 0.5),
  coverage = 200,
  alpha = 1,
  overdispersion = 0,
  relatedness = 0,
  epsilon = 0,
  seq_error = 0.01,
  bin_size = 20,
  comparison = "overall",
  distance = "squared",
  coi_method = "variant",
  use_bins = FALSE
)

Arguments

repetitions

The number of times each sample will be run.

coi

Complexity of infection.

max_coi

A number indicating the maximum COI to compare the simulated data to.

plmaf

Vector of population-level minor allele frequencies at each locus.

coverage

Coverage at each locus. If a single value is supplied then the same coverage is applied over all loci.

alpha

Shape parameter of the symmetric Dirichlet prior on strain proportions.
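As an illustration of what alpha controls, strain proportions can be drawn from a symmetric Dirichlet distribution by normalizing independent gamma draws. This is a minimal base R sketch, not the package's internal code; the helper name rdirichlet_sym is hypothetical.

```r
# Sketch only: draw strain proportions from a symmetric Dirichlet(alpha).
# rdirichlet_sym is a hypothetical helper, not part of the documented API.
set.seed(1)

rdirichlet_sym <- function(coi, alpha) {
  # Independent Gamma(alpha, 1) draws, normalized to sum to 1,
  # follow a symmetric Dirichlet distribution with parameter alpha
  g <- rgamma(coi, shape = alpha, rate = 1)
  g / sum(g)
}

props <- rdirichlet_sym(coi = 3, alpha = 1)
```

With alpha = 1 every vector of proportions is equally likely; larger values concentrate the draws around equal proportions, and smaller values favor one dominant strain.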

overdispersion

The extent to which counts are over-dispersed relative to the binomial distribution. Counts are Beta-Binomially distributed, with the beta distribution having shape parameters \(\frac{p}{\text{overdispersion}}\) and \(\frac{1-p}{\text{overdispersion}}\).
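The shape parameters above can be turned into a small sampler to see the effect of overdispersion on read counts. This is a sketch in base R under stated assumptions: the helper name rbetabinom_sketch is hypothetical, and treating overdispersion = 0 as plain binomial sampling is an assumption made here, since the shape parameters are undefined at zero.

```r
# Sketch only: Beta-Binomial read counts using the documented shape parameters.
# rbetabinom_sketch is a hypothetical helper, not part of the documented API.
set.seed(1)

rbetabinom_sketch <- function(n, size, p, overdispersion) {
  if (overdispersion == 0) {
    # Assumption: no overdispersion means ordinary binomial sampling
    return(rbinom(n, size = size, prob = p))
  }
  # Shape parameters p / overdispersion and (1 - p) / overdispersion
  shape1 <- p / overdispersion
  shape2 <- (1 - p) / overdispersion
  # Draw a per-locus success probability, then the binomial count
  rbinom(n, size = size, prob = rbeta(n, shape1, shape2))
}

counts <- rbetabinom_sketch(n = 1000, size = 200, p = 0.3, overdispersion = 0.1)
binom_counts <- rbetabinom_sketch(n = 1000, size = 200, p = 0.3, overdispersion = 0)
```

The beta distribution with these shape parameters has mean p, so the expected count is unchanged; only the spread of the counts grows as overdispersion increases.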

relatedness

The probability that a strain in a mixed infection is related to another strain in the same host. The implementation is similar to relatedness as defined in THE REAL McCOIL simulations (doi:10.1371/journal.pcbi.1005348): "... simulated relatedness (r) among lineages within the same host by sampling alleles either from an existing lineage within the same host (with probability r) or from the population (with probability (1-r))."
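The quoted sampling scheme can be sketched as follows. This is an illustrative reading of the quotation, not the package's implementation: the helper name simulate_related_haplotypes is hypothetical, and choosing the donor lineage uniformly at random at each locus is an assumption.

```r
# Sketch only: haplotypes for one host with within-host relatedness r.
# simulate_related_haplotypes is a hypothetical helper.
set.seed(1)

simulate_related_haplotypes <- function(coi, plmaf, relatedness) {
  n_loci <- length(plmaf)
  haplotypes <- matrix(0L, nrow = coi, ncol = n_loci)
  # The first lineage is always sampled from the population (1 = minor allele)
  haplotypes[1, ] <- rbinom(n_loci, 1, plmaf)
  if (coi > 1) {
    for (k in 2:coi) {
      from_population <- rbinom(n_loci, 1, plmaf)
      # Assumption: the donor is picked uniformly among existing lineages
      donor <- sample.int(k - 1, n_loci, replace = TRUE)
      from_host <- haplotypes[cbind(donor, seq_len(n_loci))]
      # With probability r copy from the host, otherwise use the population draw
      copy <- runif(n_loci) < relatedness
      haplotypes[k, ] <- ifelse(copy, from_host, from_population)
    }
  }
  haplotypes
}

h_full <- simulate_related_haplotypes(coi = 3, plmaf = rep(0.3, 50), relatedness = 1)
h_none <- simulate_related_haplotypes(coi = 3, plmaf = rep(0.3, 50), relatedness = 0)
```

With relatedness = 1 every lineage is a copy of the first, so the infection is effectively monoclonal; with relatedness = 0 all lineages are independent draws from the population.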

epsilon

The probability of a single read being miscalled as the other allele. This error is applied in both directions.
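A symmetric per-read miscall can be sketched in base R as below. The helper name apply_miscall is hypothetical, and the sketch assumes each read is miscalled independently; the package's own simulation may differ in detail.

```r
# Sketch only: flip each read to the other allele with probability epsilon.
# apply_miscall is a hypothetical helper, not part of the documented API.
set.seed(1)

apply_miscall <- function(minor_reads, coverage, epsilon) {
  major_reads <- coverage - minor_reads
  n <- length(minor_reads)
  # Minor reads that are still called as minor
  kept_minor <- rbinom(n, size = minor_reads, prob = 1 - epsilon)
  # Major reads that are miscalled as minor
  flipped_major <- rbinom(n, size = major_reads, prob = epsilon)
  kept_minor + flipped_major
}

true_minor <- c(0, 50, 100, 200)
observed_clean <- apply_miscall(true_minor, coverage = 200, epsilon = 0)
observed_noisy <- apply_miscall(true_minor, coverage = 200, epsilon = 0.05)
```

Because the error acts in both directions, loci with no true minor reads can still show a few, and loci fixed for the minor allele can lose a few.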

seq_error

The assumed level of sequencing error. If no value is supplied, the level of sequencing error is inferred.

bin_size

[Deprecated] This argument is no longer supported; all data points are used to estimate the COI. Data points are no longer grouped into bins by PLMAF.

comparison

[Deprecated] This argument is no longer supported; this function will compare the theoretical curve and sample curve for all PLMAFs.

distance

[Deprecated] This argument is no longer supported; this function will solve a weighted least squares minimization problem.

coi_method

The method used to generate the theoretical relationship, either "variant" or "frequency". The default value is "variant".

use_bins

[Deprecated] This argument is no longer supported; all data points are used to estimate the COI. Data points are no longer grouped into bins by PLMAF.

Value

A list of the following:

  • predicted_coi: A dataframe of the predicted COIs, computed using compute_coi(). Each column represents a separate set of parameters and each row a single predicted COI. The number of predictions per parameter set depends on the value of repetitions.

  • probability: A list of matrices containing the probability that the model predicted each COI value. Each row contains the probabilities for a different run. The first row contains the average probabilities over all runs.

  • param_grid: The parameter grid, containing all possible combinations of the supplied parameters. Each row represents a unique combination.

  • boot_error: A dataframe containing information about the error of the algorithm. The first column indicates the COI that was fed into the simulation. The other columns indicate the mean absolute error (mae), the lower and upper bounds of the 95% confidence interval, and the bias.
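To make the param_grid and boot_error entries more concrete, the sketch below builds a grid of parameter combinations and summarizes a made-up set of predictions. It uses only base R; the predicted values are invented for illustration, and the percentile bootstrap used for the interval is an assumption, since the page does not state how the confidence interval is computed.

```r
# Sketch only: a parameter grid and error summary built with base R.
set.seed(42)

# All combinations of the supplied values, one combination per row
param_grid <- expand.grid(coi = c(2, 5), coverage = c(100, 200))

# Invented predictions for a single parameter set with a true COI of 3
true_coi <- 3
predicted <- c(3, 3, 4, 2, 3, 3, 5, 3, 3, 2)

abs_err <- abs(predicted - true_coi)
mae <- mean(abs_err)                # mean absolute error
bias <- mean(predicted - true_coi)  # signed average error

# Assumption: percentile bootstrap of the mean absolute error
boot_mae <- replicate(1000, mean(sample(abs_err, replace = TRUE)))
ci <- quantile(boot_mae, c(0.025, 0.975))

boot_error <- data.frame(
  coi = true_coi,
  mae = mae,
  lower = ci[[1]],
  upper = ci[[2]],
  bias = bias
)
```

A positive bias means the COI tends to be overestimated, a negative bias that it tends to be underestimated, while the mean absolute error measures the typical size of the miss regardless of direction.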