Introduction to miplicorn

Parse

While {MIPTools} outputs several .csv files that can be analyzed, their non-rectangular structure makes them difficult to parse. Reading them with default parameters therefore produces an unusable table: the metadata rows are mixed in with the sample rows, and every column is read as character.

#> # A tibble: 10 × 5
#>    X1            X2               X3               X4               X5          
#>    <chr>         <chr>            <chr>            <chr>            <chr>       
#>  1 Gene ID       PF3D7_0106300    PF3D7_0106300    PF3D7_0106300    PF3D7_01063…
#>  2 Gene          atp6             atp6             atp6             atp6        
#>  3 Mutation Name atp6-Ala623Glu   atp6-Glu431Lys   atp6-Gly639Asp   atp6-Ser466…
#>  4 ExonicFunc    missense_variant missense_variant missense_variant missense_va…
#>  5 AA Change     Ala623Glu        Glu431Lys        Gly639Asp        Ser466Asn   
#>  6 Targeted      Yes              Yes              No               No          
#>  7 D10-JJJ-23    608.0            699.0            608.0            237.0       
#>  8 D10-JJJ-43    20.0             30.0             20.0             0.0         
#>  9 D10-JJJ-55    158.0            242.0            158.0            61.0        
#> 10 D10-JJJ-5     2.0              9.0              2.0              1.0

As shown, the first six rows contain metadata: the gene ID, gene, mutation name, exonic function, amino acid change, and whether the position was targeted. The remaining rows contain the data of interest, with one row per sample and one column per position. The difficulty in reading these files is that the metadata must be extracted and handled separately from the data itself.

miplicorn therefore provides a family of functions, read_tbl_*(), that read these non-rectangular files directly. Each function returns a tibble in long format, in which each row represents one sample at one position, so every sample appears in multiple rows.

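# Locate the example coverage table shipped with miplicorn
cov_file <- miplicorn::miplicorn_example("coverage_AA_table.csv")

# Read it into a long-format tibble: one row per sample and position
data <- miplicorn::read_tbl_coverage(cov_file)
data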
#> # A tibble: 6,344 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 6,334 more rows
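To make the reshaping concrete, the base-R sketch below shows one way such a file could be turned into long format. It is not miplicorn's implementation: the function name read_mip_table_sketch(), the assumption that exactly six metadata rows sit above the sample rows, and the generic value column are all hypothetical.

```r
# Hypothetical sketch, NOT miplicorn's read_tbl_coverage(): reshape a
# MIPTools-style table (metadata rows on top, one row per sample below)
# into long format with one row per sample and position. Assumes the
# first `n_meta` rows hold metadata and the first column holds labels.
read_mip_table_sketch <- function(path, n_meta = 6) {
  # Read every cell as character so metadata and counts can coexist
  raw <- read.csv(path, header = FALSE, colClasses = "character")
  meta <- raw[seq_len(n_meta), , drop = FALSE]
  counts <- raw[-seq_len(n_meta), , drop = FALSE]

  # Turn labels such as "Gene ID" or "AA Change" into snake_case names
  meta_names <- tolower(gsub("\\s+", "_", trimws(meta[[1]])))

  # Build one block of rows per position (column 2 onwards), then stack them
  blocks <- lapply(seq_len(ncol(raw))[-1], function(j) {
    block <- data.frame(sample = counts[[1]], stringsAsFactors = FALSE)
    for (i in seq_len(n_meta)) {
      block[[meta_names[i]]] <- meta[i, j]  # repeat metadata for every sample
    }
    block$value <- as.numeric(counts[[j]])
    block
  })
  result <- do.call(rbind, blocks)
  rownames(result) <- NULL
  result
}

# Usage with a tiny file built from values shown in the raw table above
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Gene ID,PF3D7_0106300,PF3D7_0106300",
  "Gene,atp6,atp6",
  "Mutation Name,atp6-Ala623Glu,atp6-Glu431Lys",
  "ExonicFunc,missense_variant,missense_variant",
  "AA Change,Ala623Glu,Glu431Lys",
  "Targeted,Yes,Yes",
  "D10-JJJ-23,608.0,699.0",
  "D10-JJJ-43,20.0,30.0"
), tmp)
long <- read_mip_table_sketch(tmp)
long
```

read_tbl_coverage() returns a tibble with a coverage column and names such as exonic_func; the sketch keeps a plain data frame and a generic value column so that it runs without extra packages.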

Manipulate

Amino acids

In some cases, the user may want to convert amino acid abbreviations from three-letter to one-letter codes, or vice versa, to make the data easier to interpret.

# Convert three-letter codes (e.g. Ala623Glu) to one-letter codes (A623E)
data %>%
  dplyr::mutate(aa_change = convert_three(aa_change))
#> # A tibble: 6,344 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… A623E     Yes           175
#> # … with 6,334 more rows
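For readers curious how such a conversion might work, here is a minimal base-R sketch using a named lookup vector over the 20 standard amino acids. The names three_to_one and convert_three_sketch() are hypothetical; this is not miplicorn's convert_three(), and it does not handle stop codons or non-standard residues.

```r
# Hypothetical sketch, not miplicorn's convert_three(): map each
# three-letter amino acid code to its one-letter equivalent.
three_to_one <- c(
  Ala = "A", Arg = "R", Asn = "N", Asp = "D", Cys = "C",
  Gln = "Q", Glu = "E", Gly = "G", His = "H", Ile = "I",
  Leu = "L", Lys = "K", Met = "M", Phe = "F", Pro = "P",
  Ser = "S", Thr = "T", Trp = "W", Tyr = "Y", Val = "V"
)

convert_three_sketch <- function(x) {
  # Replace every occurrence of each code (exact, case-sensitive match)
  for (code in names(three_to_one)) {
    x <- gsub(code, three_to_one[[code]], x, fixed = TRUE)
  }
  x
}

convert_three_sketch(c("Ala623Glu", "Glu431Lys", "Gly639Asp", "Ser466Asn"))
# "A623E" "E431K" "G639D" "S466N"
```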

Sort

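When plotting data, it is useful to control the order in which it appears. While dplyr::arrange() can sort numeric or character data, it cannot naturally sort alphanumeric vectors, i.e. vectors containing both letters and numbers: a character sort places D10-JJJ-10 before D10-JJJ-2, whereas a natural sort places it after. Furthermore, row order is not preserved when data is passed to plotting functions; ggplot2, for example, orders a discrete axis by factor levels, which default to alphabetical order for character vectors. arrange_natural() attempts to address both limitations: it sorts naturally and returns the sorted columns as factors (note the <fct> column types below), so that the order can carry through to plots.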

arrange_natural(data, sample, gene)
#> # A tibble: 6,344 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <fct>     <chr>   <fct> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-1 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            10
#>  2 D10-JJJ-1 PF3D7_… atp6  atp6-Glu431L… missense_v… Glu431Lys Yes             5
#>  3 D10-JJJ-1 PF3D7_… atp6  atp6-Gly639A… missense_v… Gly639Asp No             10
#>  4 D10-JJJ-1 PF3D7_… atp6  atp6-Ser466A… missense_v… Ser466Asn No              2
#>  5 D10-JJJ-1 PF3D7_… atp6  atp6-Ser769A… missense_v… Ser769Asn Yes             1
#>  6 D10-JJJ-1 PF3D7_… crt   crt-Ala220Ser missense_v… Ala220Ser Yes             2
#>  7 D10-JJJ-1 PF3D7_… crt   crt-Asn326Asp missense_v… Asn326Asp No              2
#>  8 D10-JJJ-1 PF3D7_… crt   crt-Asn326Ser missense_v… Asn326Ser Yes             2
#>  9 D10-JJJ-1 PF3D7_… crt   crt-Asn75Glu  missense_v… Asn75Glu  Yes             0
#> 10 D10-JJJ-1 PF3D7_… crt   crt-Cys72Ser  missense_v… Cys72Ser  Yes             0
#> # … with 6,334 more rows
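The idea behind natural sorting can be sketched in base R by splitting each string into a text prefix and a trailing number, then ordering by both. This is a hypothetical illustration (natural_order_sketch() is not part of miplicorn) that only handles a single trailing number, which suffices for sample IDs such as D10-JJJ-23 but not for arbitrary alphanumeric strings.

```r
# Hypothetical sketch, not miplicorn's arrange_natural(): order strings of the
# form "<prefix><number>" by prefix first, then by the trailing number.
natural_order_sketch <- function(x) {
  prefix <- sub("[0-9]+$", "", x)               # text before the trailing digits
  number <- suppressWarnings(
    as.numeric(sub("^.*[^0-9]", "", x))         # trailing digits (NA if none)
  )
  order(prefix, number)
}

ids <- c("D10-JJJ-23", "D10-JJJ-5", "D10-JJJ-43", "D10-JJJ-1")
sort(ids)      # character sort: "D10-JJJ-1" "D10-JJJ-23" "D10-JJJ-43" "D10-JJJ-5"

sorted <- ids[natural_order_sketch(ids)]
sorted         # natural sort: "D10-JJJ-1" "D10-JJJ-5" "D10-JJJ-23" "D10-JJJ-43"

# Store the order as factor levels so that plotting functions keep it
ids_fct <- factor(ids, levels = unique(sorted))
levels(ids_fct)
```

Converting to a factor is what lets the order survive plotting, since a discrete axis follows factor levels rather than row order.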

Visualize

There are almost limitless ways to visualize a single dataset. While miplicorn cannot cover every method, it aims to simplify the creation of key figures.

Chromosome map

miplicorn has two built-in ways to create chromosome maps, each with its own strengths and weaknesses: plot_chromoMap() creates an interactive map, while plot_karyoploteR() creates a more detailed, static karyoplot.

colours <- c("#006A8EFF", "#A8A6A7FF", "#B1283AFF")
map <- plot_chromoMap(genome_Pf3D7, probes, colours = colours)

# Wrap the interactive widget so it can be embedded in an HTML page via an iframe
widgetframe::frameableWidget(map)

plot_karyoploteR(genome_Pf3D7, probes, colours = colours)

Average coverage

N.B. The remaining figures have not yet been incorporated into miplicorn; more visualization methods will be added over time.