The read_tbl_*() family of functions is designed to read data tables
generated by the software program MIPtools. Data is read lazily using the
vroom package. Data can be filtered, retaining all rows that satisfy the
conditions. To be retained, the row in question must produce a value of TRUE
for all conditions. Note that when a condition evaluates to NA, the row will
be dropped.
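The NA rule is easiest to see on a small table. The sketch below calls dplyr::filter() directly on made-up data rather than reading a MIPtools file, on the assumption that the ... expressions of read_tbl_*() follow the same semantics. The toy tibble and its values are hypothetical.

```r
library(dplyr)

# Toy data: column names mirror the example output on this page,
# the values are made up for illustration.
toy <- tibble(
  sample   = c("S1", "S2", "S3", "S4"),
  gene     = c("atp6", "atp6", "crt", NA),
  coverage = c(608, NA, 20, 158)
)

# Multiple expressions are combined with &, so a row is kept only when
# every condition evaluates to TRUE.
kept <- filter(toy, gene == "atp6", coverage > 100)

# S2 is dropped because `coverage > 100` is NA for it, and S4 because
# `gene == "atp6"` is NA. To keep rows with a missing coverage value,
# ask for them explicitly with is.na().
kept_na <- filter(toy, gene == "atp6", coverage > 100 | is.na(coverage))

kept$sample
kept_na$sample
```

If rows with missing values matter for an analysis, spelling out the is.na() condition is usually clearer than filtering after the fact.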
Usage
read_tbl_reference(.tbl, ...)
read_tbl_alternate(.tbl, ...)
read_tbl_coverage(.tbl, ...)
read_tbl_genotype(.tbl, ...)
read_tbl_haplotype(.tbl, ..., .col_select = NULL)
read_tbl_ref_alt_cov(
  .tbl_ref,
  .tbl_alt,
  .tbl_cov,
  ...,
  chrom = deprecated(),
  gene = deprecated()
)
Arguments
- .tbl
File path to the table.
- ...
data-masking
Expressions that return a logical value and are used to filter the data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.
- .col_select
One or more selection expressions, like in dplyr::select(). Use c() or list() to use more than one expression. See ?dplyr::select for details on available selection options.
- .tbl_ref
File path to the reference table.
- .tbl_alt
File path to the alternate table.
- .tbl_cov
File path to the coverage table.
- chrom
Deprecated.
- gene
Deprecated.
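As a rough illustration of the selection expressions accepted by .col_select, the sketch below applies dplyr::select() to a made-up tibble. It only demonstrates the selection syntax. The toy columns are hypothetical and are not the columns of a real haplotype table.

```r
library(dplyr)

# Toy data with hypothetical columns.
toy <- tibble(
  sample   = c("S1", "S2"),
  gene_id  = c("PF3D7_0106300", "PF3D7_0106300"),
  gene     = c("atp6", "atp6"),
  coverage = c(608, 20)
)

# Combine several selection expressions with c().
by_helper <- select(toy, c(sample, starts_with("gene")))

# Negate a selection to drop columns instead.
without_cov <- select(toy, !coverage)

names(by_helper)
names(without_cov)
```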
Value
A tibble(). The first six columns contain the metadata associated with each
sample and mutation. The last column contains the information parsed from the
table. In some cases, this may be the umi_count and in other cases it may be
the coverage of the associated data point.
Data structure
Input data must contain six rows of metadata. The metadata can vary depending on what type of file is read, but typically contains information about the location of a mutation. The remaining rows represent the data for each sample sequenced.
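To make the layout concrete, here is a minimal base R sketch that builds a tiny table in this shape and reshapes it into one row per sample and mutation. It is not the package's implementation. The row labels reuse the output column names shown on this page, which is an assumption (the labels in a real MIPtools file may differ), and the second mutation is made up.

```r
# Toy table: six metadata rows, then one row per sample.
# The first column holds the row label, each further column is one mutation.
raw <- rbind(
  c("gene_id",       "PF3D7_0106300",    "PF3D7_0106300"),
  c("gene",          "atp6",             "atp6"),
  c("mutation_name", "atp6-Ala623Glu",   "atp6-made-up"),
  c("exonic_func",   "missense_variant", "missense_variant"),
  c("aa_change",     "Ala623Glu",        "made-up"),
  c("targeted",      "Yes",              "No"),
  c("D10-JJJ-23",    "608",              "12"),
  c("D10-JJJ-43",    "20",               "0")
)

n_meta  <- 6
meta    <- raw[seq_len(n_meta), -1, drop = FALSE]   # metadata x mutation
samples <- raw[-seq_len(n_meta), 1]                 # sample identifiers
values  <- raw[-seq_len(n_meta), -1, drop = FALSE]  # sample x mutation

# One output row per (sample, mutation) pair; samples vary fastest.
grid <- expand.grid(s = seq_along(samples), m = seq_len(ncol(values)))

long <- data.frame(
  sample = samples[grid$s],
  t(meta)[grid$m, , drop = FALSE],
  value = as.numeric(values[cbind(grid$s, grid$m)]),
  stringsAsFactors = FALSE
)
names(long)[2:7] <- raw[seq_len(n_meta), 1]

long
```

The result has a sample column, six metadata columns, and a final value column, which matches the shape of the tibbles printed in the examples.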
Useful filter functions
The dplyr::filter() function is used to subset the rows of the data. It
applies the expressions in ... to the column values to determine which rows
should be retained.
Many functions and operators are useful when constructing these expressions, for example the comparison operator == and the logical operators & and |.
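A few commonly used building blocks are sketched here on a made-up tibble, calling dplyr::filter() directly. The same expressions could be passed through ... to a read_tbl_*() function, assuming the file contains columns with these names. The toy data is hypothetical.

```r
library(dplyr)

# Toy data; values are made up for illustration.
toy <- tibble(
  sample   = c("S1", "S2", "S3", "S4", "S5"),
  gene     = c("atp6", "crt", "dhfr-ts", "k13", "atp6"),
  targeted = c("Yes", "No", "Yes", "Yes", "No"),
  coverage = c(608, 20, NA, 158, 2)
)

# Comparison operators: ==, !=, >, >=, <, <=
high_cov <- filter(toy, coverage >= 100)

# Set membership with %in%
two_genes <- filter(toy, gene %in% c("atp6", "k13"))

# Logical operators: &, |, !
targeted_other <- filter(toy, targeted == "Yes" & !(gene == "atp6"))

# Missing values with is.na()
missing_cov <- filter(toy, is.na(coverage))

# Inclusive range with between()
mid_cov <- filter(toy, between(coverage, 2, 200))
```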
Examples
# Get paths to the example files
ref_file <- miplicorn_example("reference_AA_table.csv")
alt_file <- miplicorn_example("alternate_AA_table.csv")
cov_file <- miplicorn_example("coverage_AA_table.csv")
ref_file
#> [1] "/home/runner/work/_temp/Library/miplicorn/extdata/reference_AA_table.csv"
# Input sources -------------------------------------------------------------
# Read from a path
read_tbl_reference(ref_file)
#> # A tibble: 6,344 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 6,334 more rows, and 1 more variable: ref_umi_count <dbl>
# You can also use paths directly
# read_tbl_alternate("alternate_AA_table.csv")
# Read entire file ----------------------------------------------------------
read_tbl_coverage(cov_file)
#> # A tibble: 6,344 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 6,334 more rows
# Data filtering ------------------------------------------------------------
# Filtering by one criterion
read_tbl_reference(ref_file, gene == "atp6")
#> # A tibble: 260 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 250 more rows, and 1 more variable: ref_umi_count <dbl>
# Filtering by multiple criteria within a single logical expression
read_tbl_alternate(alt_file, gene == "atp6" & targeted == "Yes")
#> # A tibble: 156 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 146 more rows, and 1 more variable: alt_umi_count <dbl>
read_tbl_coverage(cov_file, gene == "atp6" | targeted == "Yes")
#> # A tibble: 2,496 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 2,486 more rows
# When multiple expressions are used, they are combined using &
read_tbl_reference(ref_file, gene == "atp6", targeted == "Yes")
#> # A tibble: 156 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 146 more rows, and 1 more variable: ref_umi_count <dbl>
# Read multiple files together ----------------------------------------------
read_tbl_ref_alt_cov(ref_file, alt_file, cov_file)
#> # A tibble: 6,344 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 6,334 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>