
Generate the simulated COI curve.

Usage

process_sim(sim, seq_error = 0.01, bin_size = 20, coi_method = "variant")

Arguments

sim

Output of sim_biallelic().

seq_error

The assumed level of sequencing error. If no value is supplied, the level of sequencing error is inferred.

bin_size

[Deprecated] This argument is no longer supported. All data points are used to estimate the COI; they are not grouped into bins of changing PLAF.

coi_method

The method used to generate the theoretical relationship, either "variant" or "frequency". The default value is "variant".

Value

A list of the following:

  • data: A tibble with:

      • plmaf_cut: Breaks of the form [a, b).

      • m_variant: The average WSMAF or proportion of variant sites in each segment defined by plmaf_cut.

      • bucket_size: The number of loci in each bucket.

      • midpoints: The midpoint of each bucket.

  • seq_error: The inferred sequencing error.

  • bin_size: The minimum size of each bin.

  • cuts: The breaks used to split the data.
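The fields of data describe a per-bucket summary over the PLMAF. The following is a minimal base R sketch of how such a summary could be formed; it is not the package's implementation. The vectors plmaf and variant, the break points in cuts, and the name summary_tbl are all hypothetical values chosen for illustration, and the midpoint is assumed to be the centre of each interval.

```r
# Hypothetical per-locus inputs: population-level minor allele frequency
# and a 0/1 flag for whether the locus was called a variant site.
plmaf   <- c(0.02, 0.07, 0.12, 0.18, 0.31, 0.44)
variant <- c(0, 1, 0, 1, 1, 1)

# Hypothetical break points; right = FALSE gives intervals of the form [a, b).
cuts <- c(0, 0.1, 0.2, 0.5)
plmaf_cut <- cut(plmaf, breaks = cuts, right = FALSE)

# Per-bucket proportion of variant sites and number of loci.
m_variant   <- tapply(variant, plmaf_cut, mean)
bucket_size <- tapply(variant, plmaf_cut, length)

# Assumed definition: the midpoint is the centre of each interval.
midpoints <- (head(cuts, -1) + tail(cuts, -1)) / 2

summary_tbl <- data.frame(
  plmaf_cut   = levels(plmaf_cut),
  m_variant   = as.vector(m_variant),
  bucket_size = as.vector(bucket_size),
  midpoints   = midpoints
)
print(summary_tbl)
```

Each row of summary_tbl corresponds to one interval of plmaf_cut, mirroring the column names listed above.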

Details

This function processes the output of sim_biallelic(), which creates simulated data. The PLMAF is kept, and the function determines whether each SNP is a variant site based on the simulated WSMAF at that SNP, accounting for potential sequencing error. To check whether the simulated WSMAF correctly identified a variant site, the phased haplotype of the parasites is computed.
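As a rough illustration of calling variant sites while allowing for sequencing error, the base R sketch below treats a locus as a variant site only when its WSMAF lies strictly between seq_error and 1 - seq_error. This threshold rule, the helper name call_variant, and the example values are assumptions made for illustration, not a description of the package's internals.

```r
# Hypothetical helper: flag loci whose WSMAF is far enough from 0 and 1
# that the signal is unlikely to be explained by sequencing error alone.
call_variant <- function(wsmaf, seq_error = 0.01) {
  # Assumed rule: variant if seq_error < WSMAF < 1 - seq_error.
  wsmaf > seq_error & wsmaf < (1 - seq_error)
}

# Hypothetical simulated within-sample minor allele frequencies.
wsmaf <- c(0, 0.004, 0.2, 0.5, 0.997, 1)

is_variant <- call_variant(wsmaf, seq_error = 0.01)
print(is_variant)
```

Under this assumed rule, the loci at 0.004 and 0.997 are treated as non-variant because they fall within the sequencing error margin of 0 and 1.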

See also

process_real() to process real data.

Other simulated data functions: plot-simulation, sim_biallelic()