read_file() has been replaced by read_tbl_reference(), read_tbl_alternate(), and read_tbl_coverage() to provide more specific functionality.
read() has been renamed to read_tbl_ref_alt_cov().
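As a rough migration sketch, the deprecated calls might be swapped for their replacements as shown below. This assumes miplicorn 0.2.0 or later is installed and that the replacement readers accept a file path as their first argument, as the deprecated functions do; check the reference pages of the new functions before relying on it. The requireNamespace() guard is only there so that the snippet does nothing when miplicorn is not installed.

```r
# Sketch only: assumes miplicorn >= 0.2.0 and that the replacement readers
# take a file path as their first argument (an assumption, not confirmed here).
if (requireNamespace("miplicorn", quietly = TRUE)) {
  ref_file <- miplicorn::miplicorn_example("reference_AA_table.csv")
  alt_file <- miplicorn::miplicorn_example("alternate_AA_table.csv")
  cov_file <- miplicorn::miplicorn_example("coverage_AA_table.csv")

  # Previously: read_file(cov_file, .name = "coverage")
  cov <- miplicorn::read_tbl_coverage(cov_file)

  # Previously: read(ref_file, alt_file, cov_file)
  all_tbls <- miplicorn::read_tbl_ref_alt_cov(ref_file, alt_file, cov_file)
}
```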
Usage
read(
.ref_file,
.alt_file,
.cov_file,
...,
chrom = deprecated(),
gene = deprecated()
)
read_file(.file, ..., .name = "value")
Arguments
- .ref_file
File path to the reference table.
- .alt_file
File path to the alternate table.
- .cov_file
File path to the coverage table.
- ...
data-masking
Expressions that return a logical value and are used to filter the data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.
- chrom
- gene
- .file
File path to a file.
- .name
The information contained in the specific file, for example "coverage" or "ref_umi_count".
Value
A tibble(). The first six columns contain the metadata associated with each sample and mutation. Columns ref_umi_count and alt_umi_count contain the UMI count of the reference and alternate allele, respectively. Column coverage contains the coverage for each data point.
Details
Read files containing MIPtools' data tables. read_file() reads a single file. read() is a convenience function that reads all files output by MIPtools and combines them. Data files include the reference table, the alternate table, and the coverage table. Data is read lazily using the vroom package.
Data can be filtered, retaining all rows that satisfy the conditions. To be retained, the row in question must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row will be dropped.
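The NA behaviour described above can be illustrated with dplyr::filter() directly. This is a minimal sketch on a toy data frame (the sample names and values are made up for illustration), not a call into miplicorn itself.

```r
# Toy data: one sample has a missing coverage value (hypothetical values).
toy <- data.frame(
  sample = c("s1", "s2", "s3"),
  coverage = c(608, NA, 0),
  stringsAsFactors = FALSE
)

# `NA > 0` evaluates to NA, so s2 is dropped along with s3 (FALSE).
kept <- dplyr::filter(toy, coverage > 0)

# To keep rows with missing values, test for NA explicitly.
kept_with_na <- dplyr::filter(toy, is.na(coverage) | coverage > 0)
```

Here kept contains only s1, while kept_with_na contains s1 and s2.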
Data structure
Input data must contain six rows of metadata. The metadata can vary depending on what type of file is read, but typically contains information about the location of a mutation. The remaining rows represent the data for each sample sequenced. Together, the alternate, reference, and coverage tables can provide information about mutations observed and the coverage at each site.
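To make the layout concrete, the sketch below builds a toy table with six metadata rows followed by one row per sample and reshapes it into one row per sample and mutation. The row labels, sample names, and values are hypothetical, and the reshaping is only an illustration of the idea in base R, not the package's actual implementation.

```r
# Toy input: six metadata rows, then one row per sample (all values hypothetical).
raw <- rbind(
  c("gene_id",       "id_A",     "id_A"),
  c("gene",          "geneA",    "geneA"),
  c("mutation_name", "mutA",     "mutB"),
  c("exonic_func",   "missense", "missense"),
  c("aa_change",     "changeA",  "changeB"),
  c("targeted",      "Yes",      "No"),
  c("sample1",       "608",      "54"),
  c("sample2",       "20",       "7")
)

meta    <- raw[1:6, -1, drop = FALSE]     # metadata: one column per mutation
counts  <- raw[-(1:6), -1, drop = FALSE]  # data: one row per sample
samples <- raw[-(1:6), 1]

# One row of metadata per mutation, named after the metadata row labels.
meta_df <- as.data.frame(t(meta), stringsAsFactors = FALSE)
names(meta_df) <- raw[1:6, 1]

n_mut <- nrow(meta_df)
n_smp <- length(samples)

# as.numeric() flattens the matrix column by column, so samples cycle fastest
# and each mutation's metadata is repeated once per sample.
long <- data.frame(
  sample = rep(samples, times = n_mut),
  meta_df[rep(seq_len(n_mut), each = n_smp), ],
  coverage = as.numeric(counts),
  stringsAsFactors = FALSE
)
rownames(long) <- NULL
```

The result has one row per sample and mutation, with the sample, the six metadata columns, and the value column.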
Useful filter functions
The dplyr::filter() function is employed to subset the rows of the data, applying the expressions in ... to the column values to determine which rows should be retained.
There are many functions and operators that are useful when constructing the expressions used to filter the data:
- ==, >, >= etc.
- &, |, !, xor()
- is.na()
- between(), near()
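A few of these helpers are sketched below on a toy data frame using dplyr::filter() directly. The data, including the gene name "toy_gene", are made up for illustration; the same expressions could presumably be passed through ... to the reading functions.

```r
# Toy data (hypothetical values) with one missing coverage value.
toy <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  gene = c("atp6", "atp6", "toy_gene", "toy_gene"),
  coverage = c(608, 20, 158, NA),
  stringsAsFactors = FALSE
)

# Set membership with %in%.
by_set <- dplyr::filter(toy, gene %in% c("atp6"))

# Range check with between(); the NA coverage yields NA and is dropped.
in_range <- dplyr::filter(toy, dplyr::between(coverage, 100, 700))

# Negation and an explicit missing-value check, combined with &.
not_atp6 <- dplyr::filter(toy, gene != "atp6", !is.na(coverage))
```

Here by_set keeps s1 and s2, in_range keeps s1 and s3, and not_atp6 keeps only s3.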
Examples
# Get paths to example files
ref_file <- miplicorn_example("reference_AA_table.csv")
alt_file <- miplicorn_example("alternate_AA_table.csv")
cov_file <- miplicorn_example("coverage_AA_table.csv")
cov_file
#> [1] "/home/runner/work/_temp/Library/miplicorn/extdata/coverage_AA_table.csv"
# Input sources -------------------------------------------------------------
# Read from a path
read_file(cov_file, .name = "coverage")
#> Warning: `read_file()` was deprecated in miplicorn 0.2.0.
#> The function has been replaced by three more specific functions:
#> `read_tbl_reference()`, `read_tbl_alternate()`, and `read_tbl_coverage()`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
#> Input detected as the coverage table.
#> # A tibble: 6,344 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 6,334 more rows
read(ref_file, alt_file, cov_file)
#> Warning: `read()` was deprecated in miplicorn 0.2.0.
#> Please use `read_tbl_ref_alt_cov()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
#> # A tibble: 6,344 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 6,334 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>
# You can also use paths directly
# read_file("reference_AA_table.csv")
# read("reference_AA_table.csv", "alternate_AA_table.csv", "coverage_AA_table.csv")
# Read entire file ----------------------------------------------------------
read_file(cov_file, .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 6,344 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 6,334 more rows
read(ref_file, alt_file, cov_file)
#> # A tibble: 6,344 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 6,334 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>
# Data filtering ------------------------------------------------------------
# Filtering by one criterion
read_file(cov_file, gene == "atp6", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 260 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 250 more rows
read(ref_file, alt_file, cov_file, gene == "atp6")
#> # A tibble: 260 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 250 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>
# Filtering by multiple criteria within a single logical expression
read_file(cov_file, gene == "atp6" & targeted == "Yes", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 156 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 146 more rows
read_file(cov_file, gene == "atp6" | targeted == "Yes", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 2,496 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 2,486 more rows
read(ref_file, alt_file, cov_file, gene == "atp6" & targeted == "Yes")
#> # A tibble: 156 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 146 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>
read(ref_file, alt_file, cov_file, gene == "atp6" | targeted == "Yes")
#> # A tibble: 2,496 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 2,486 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>
# When multiple expressions are used, they are combined using &
read_file(cov_file, gene == "atp6", targeted == "Yes", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 156 × 8
#> sample gene_id gene mutation_name exonic_func aa_change targeted coverage
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 608
#> 2 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 20
#> 3 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 158
#> 4 D10-JJJ-5 PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 2
#> 5 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 1
#> 6 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 129
#> 7 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 8 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 0
#> 9 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 90
#> 10 D10-JJJ-… PF3D7_… atp6 atp6-Ala623G… missense_v… Ala623Glu Yes 175
#> # … with 146 more rows
read(ref_file, alt_file, cov_file, gene == "atp6", targeted == "Yes")
#> # A tibble: 156 × 10
#> sample gene_id gene mutation_name exonic_func aa_change targeted
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 D10-JJJ-23 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 2 D10-JJJ-43 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 3 D10-JJJ-55 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 4 D10-JJJ-5 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 5 D10-JJJ-47 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 6 D10-JJJ-15 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 7 D10-JJJ-27 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 8 D10-JJJ-10 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 9 D10-JJJ-28 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> 10 D10-JJJ-52 PF3D7_0106300 atp6 atp6-Ala623Glu missense_va… Ala623Glu Yes
#> # … with 146 more rows, and 3 more variables: ref_umi_count <dbl>,
#> # alt_umi_count <dbl>, coverage <dbl>