
[Deprecated]

read_file() has been replaced by read_tbl_reference(), read_tbl_alternate(), and read_tbl_coverage() to provide more specific functionality.

read() has been renamed to read_tbl_ref_alt_cov().

Usage

read(
  .ref_file,
  .alt_file,
  .cov_file,
  ...,
  chrom = deprecated(),
  gene = deprecated()
)

read_file(.file, ..., .name = "value")

Arguments

.ref_file

File path to the reference table.

.alt_file

File path to the alternate table.

.cov_file

File path to the coverage table.

...

<data-masking> Expressions that return a logical value and are used to filter the data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.

chrom

[Deprecated]: The chromosome(s) to filter to.

gene

[Deprecated]: The gene(s) to filter to.

.file

Path to the file to read.

.name

The kind of information the file contains, used to name the resulting value column, for example "coverage" or "ref_umi_count". Defaults to "value".

Value

A tibble(). The first six columns contain the metadata associated with each sample and mutation. Columns ref_umi_count and alt_umi_count contain the UMI count of the reference and alternate allele, respectively. Column coverage contains the coverage for each data point.
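To make the shape of the combined result concrete, here is a minimal sketch that joins three toy long-format tables on their shared identifying columns. It is not miplicorn's implementation: the toy values, the reduced set of key columns, and the use of dplyr::inner_join() are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical key columns shared by all three tables (toy values)
keys <- tibble(
  sample        = c("s1", "s2"),
  mutation_name = c("mut_1", "mut_1")
)

# One value column per table, named after what the table contains
ref <- mutate(keys, ref_umi_count = c(600, 18))
alt <- mutate(keys, alt_umi_count = c(8, 2))
cov <- mutate(keys, coverage = c(608, 20))

# Combine the three tables so each row carries all three values
combined <- ref %>%
  inner_join(alt, by = c("sample", "mutation_name")) %>%
  inner_join(cov, by = c("sample", "mutation_name"))

combined
```

The identifying columns come first, followed by ref_umi_count, alt_umi_count, and coverage, mirroring the column order described above.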

Details

Read files containing MIPtools' data tables. read_file() reads a single file. read() is a convenience function that reads all files output by MIPtools and combines them. Data files include the reference table, the alternate table, and the coverage table. Data is read lazily using the vroom package. Data can be filtered, retaining all rows that satisfy the conditions. To be retained, the row in question must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row will be dropped.
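The treatment of NA described above is easy to overlook, so the following sketch reproduces it on a small toy tibble using dplyr::filter() directly. The table and its values are made up for illustration, and the example does not call any miplicorn function.

```r
library(dplyr)

# Toy table with missing values in both a character and a numeric column
toy <- tibble(
  sample   = c("s1", "s2", "s3", "s4"),
  gene     = c("atp6", "atp6", NA, "crt"),
  coverage = c(608, NA, 20, 158)
)

# Two expressions are combined with &, so a row is kept only if both are TRUE.
# Row s2 evaluates to NA (TRUE & NA) and is therefore dropped.
kept <- filter(toy, gene == "atp6", coverage > 100)

# To keep rows with missing coverage, ask for them explicitly with is.na()
kept_na <- filter(toy, is.na(coverage) | coverage > 100)
```

Here kept contains only sample s1, whereas kept_na contains s1, s2, and s4.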

Data structure

Input data must contain six rows of metadata. The metadata can vary depending on what type of file is read, but typically contains information about the location of a mutation. The remaining rows represent the data for each sample sequenced. Together, the alternate, reference, and coverage tables can provide information about mutations observed and the coverage at each site.
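As a rough sketch of this layout, the base R code below builds a toy wide table with six metadata rows followed by one row per sample, then reshapes it into one row per sample and mutation. The row labels, placeholder values, and reshaping steps are assumptions for illustration only; they are not taken from MIPtools output or from miplicorn's code.

```r
# Toy wide table: first column holds row labels, remaining columns are mutations
raw <- rbind(
  c("gene_id",       "id_A",   "id_B"),
  c("gene",          "geneA",  "geneB"),
  c("mutation_name", "mut_1",  "mut_2"),
  c("exonic_func",   "func_1", "func_2"),
  c("aa_change",     "aa_1",   "aa_2"),
  c("targeted",      "Yes",    "No"),
  c("sample_1",      "10",     "0"),
  c("sample_2",      "3",      "25")
)

n_meta <- 6

# Split the table into metadata (6 x mutations) and data (samples x mutations)
meta <- raw[seq_len(n_meta), -1, drop = FALSE]
rownames(meta) <- raw[seq_len(n_meta), 1]
values <- raw[-seq_len(n_meta), -1, drop = FALSE]
samples <- raw[-seq_len(n_meta), 1]

# Enumerate every sample/mutation pair (sample index varies fastest)
idx <- expand.grid(sample = seq_along(samples), mutation = seq_len(ncol(meta)))

# One row per pair: sample, the six metadata fields, then the value
long <- data.frame(
  sample = samples[idx$sample],
  t(meta)[idx$mutation, , drop = FALSE],
  coverage = as.numeric(values[cbind(idx$sample, idx$mutation)]),
  row.names = NULL
)

long
```

The result has one identifying sample column, six metadata columns, and a single value column, which is the general shape of the tibbles shown in the examples.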

Useful filter functions

Rows are subset with dplyr::filter(): the expressions in ... are evaluated against the column values to determine which rows are retained.

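Many functions and operators are useful when constructing the filter expressions, including comparison operators (==, >, >=, and so on), logical operators (&, |, !, xor()), is.na(), and dplyr's between() and near().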
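A few operators and helpers commonly used in such expressions are sketched below on a toy tibble. The table is invented for illustration; the same expressions could presumably be passed through ..., but that is not tested here.

```r
library(dplyr)

# Toy table loosely modelled on the columns shown in the examples
toy <- tibble(
  gene     = c("atp6", "atp6", "crt", "k13"),
  targeted = c("Yes", "No", "Yes", "No"),
  coverage = c(608, 20, 158, 0)
)

# Set membership: keep rows whose gene is one of several values
in_set <- filter(toy, gene %in% c("atp6", "k13"))

# Range check: between() is inclusive on both ends
in_range <- filter(toy, between(coverage, 20, 200))

# Exclusive or: exactly one of the two conditions holds
either <- filter(toy, xor(gene == "atp6", targeted == "Yes"))
```

With this table, in_set keeps rows 1, 2, and 4, while in_range and either both keep rows 2 and 3.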

Examples

# Get paths to the example files
ref_file <- miplicorn_example("reference_AA_table.csv")
alt_file <- miplicorn_example("alternate_AA_table.csv")
cov_file <- miplicorn_example("coverage_AA_table.csv")
cov_file
#> [1] "/home/runner/work/_temp/Library/miplicorn/extdata/coverage_AA_table.csv"

# Input sources -------------------------------------------------------------
# Read from a path
read_file(cov_file, .name = "coverage")
#> Warning: `read_file()` was deprecated in miplicorn 0.2.0.
#> The function has been replaced by three more specific functions:
#>  `read_tbl_reference()`, `read_tbl_alternate()`, and `read_tbl_coverage()`.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
#> Input detected as the coverage table.
#> # A tibble: 6,344 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 6,334 more rows
read(ref_file, alt_file, cov_file)
#> Warning: `read()` was deprecated in miplicorn 0.2.0.
#> Please use `read_tbl_ref_alt_cov()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
#> # A tibble: 6,344 × 10
#>    sample     gene_id       gene  mutation_name  exonic_func  aa_change targeted
#>    <chr>      <chr>         <chr> <chr>          <chr>        <chr>     <chr>   
#>  1 D10-JJJ-23 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  2 D10-JJJ-43 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  3 D10-JJJ-55 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  4 D10-JJJ-5  PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  5 D10-JJJ-47 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  6 D10-JJJ-15 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  7 D10-JJJ-27 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  8 D10-JJJ-10 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  9 D10-JJJ-28 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> 10 D10-JJJ-52 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> # … with 6,334 more rows, and 3 more variables: ref_umi_count <dbl>,
#> #   alt_umi_count <dbl>, coverage <dbl>

# You can also use paths directly
# read_file("reference_AA_table.csv")
# read("reference_AA_table.csv", "alternate_AA_table.csv", "coverage_AA_table.csv")

# Read entire file ----------------------------------------------------------
read_file(cov_file, .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 6,344 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 6,334 more rows
read(ref_file, alt_file, cov_file)
#> # A tibble: 6,344 × 10
#>    sample     gene_id       gene  mutation_name  exonic_func  aa_change targeted
#>    <chr>      <chr>         <chr> <chr>          <chr>        <chr>     <chr>   
#>  1 D10-JJJ-23 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  2 D10-JJJ-43 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  3 D10-JJJ-55 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  4 D10-JJJ-5  PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  5 D10-JJJ-47 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  6 D10-JJJ-15 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  7 D10-JJJ-27 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  8 D10-JJJ-10 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  9 D10-JJJ-28 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> 10 D10-JJJ-52 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> # … with 6,334 more rows, and 3 more variables: ref_umi_count <dbl>,
#> #   alt_umi_count <dbl>, coverage <dbl>

# Data filtering ------------------------------------------------------------
# Filtering by one criterion
read_file(cov_file, gene == "atp6", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 260 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 250 more rows
read(ref_file, alt_file, cov_file, gene == "atp6")
#> # A tibble: 260 × 10
#>    sample     gene_id       gene  mutation_name  exonic_func  aa_change targeted
#>    <chr>      <chr>         <chr> <chr>          <chr>        <chr>     <chr>   
#>  1 D10-JJJ-23 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  2 D10-JJJ-43 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  3 D10-JJJ-55 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  4 D10-JJJ-5  PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  5 D10-JJJ-47 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  6 D10-JJJ-15 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  7 D10-JJJ-27 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  8 D10-JJJ-10 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  9 D10-JJJ-28 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> 10 D10-JJJ-52 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> # … with 250 more rows, and 3 more variables: ref_umi_count <dbl>,
#> #   alt_umi_count <dbl>, coverage <dbl>

# Filtering by multiple criteria within a single logical expression
read_file(cov_file, gene == "atp6" & targeted == "Yes", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 156 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 146 more rows
read_file(cov_file, gene == "atp6" | targeted == "Yes", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 2,496 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 2,486 more rows
read(ref_file, alt_file, cov_file, gene == "atp6" & targeted == "Yes")
#> # A tibble: 156 × 10
#>    sample     gene_id       gene  mutation_name  exonic_func  aa_change targeted
#>    <chr>      <chr>         <chr> <chr>          <chr>        <chr>     <chr>   
#>  1 D10-JJJ-23 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  2 D10-JJJ-43 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  3 D10-JJJ-55 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  4 D10-JJJ-5  PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  5 D10-JJJ-47 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  6 D10-JJJ-15 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  7 D10-JJJ-27 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  8 D10-JJJ-10 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  9 D10-JJJ-28 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> 10 D10-JJJ-52 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> # … with 146 more rows, and 3 more variables: ref_umi_count <dbl>,
#> #   alt_umi_count <dbl>, coverage <dbl>
read(ref_file, alt_file, cov_file, gene == "atp6" | targeted == "Yes")
#> # A tibble: 2,496 × 10
#>    sample     gene_id       gene  mutation_name  exonic_func  aa_change targeted
#>    <chr>      <chr>         <chr> <chr>          <chr>        <chr>     <chr>   
#>  1 D10-JJJ-23 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  2 D10-JJJ-43 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  3 D10-JJJ-55 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  4 D10-JJJ-5  PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  5 D10-JJJ-47 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  6 D10-JJJ-15 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  7 D10-JJJ-27 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  8 D10-JJJ-10 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  9 D10-JJJ-28 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> 10 D10-JJJ-52 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> # … with 2,486 more rows, and 3 more variables: ref_umi_count <dbl>,
#> #   alt_umi_count <dbl>, coverage <dbl>

# When multiple expressions are used, they are combined using &
read_file(cov_file, gene == "atp6", targeted == "Yes", .name = "coverage")
#> Input detected as the coverage table.
#> # A tibble: 156 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 146 more rows
read(ref_file, alt_file, cov_file, gene == "atp6", targeted == "Yes")
#> # A tibble: 156 × 10
#>    sample     gene_id       gene  mutation_name  exonic_func  aa_change targeted
#>    <chr>      <chr>         <chr> <chr>          <chr>        <chr>     <chr>   
#>  1 D10-JJJ-23 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  2 D10-JJJ-43 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  3 D10-JJJ-55 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  4 D10-JJJ-5  PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  5 D10-JJJ-47 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  6 D10-JJJ-15 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  7 D10-JJJ-27 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  8 D10-JJJ-10 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#>  9 D10-JJJ-28 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> 10 D10-JJJ-52 PF3D7_0106300 atp6  atp6-Ala623Glu missense_va… Ala623Glu Yes     
#> # … with 146 more rows, and 3 more variables: ref_umi_count <dbl>,
#> #   alt_umi_count <dbl>, coverage <dbl>