
File Sizes

In genomic sequencing, files are often several gigabytes in size and contain millions of data points. Reading such files into a local machine, such as your laptop, can take an excruciatingly long time.

While there are programs that can handle large amounts of data, a simple solution is to process your data in chunks. For instance, instead of looking at ten chromosomes simultaneously, it may be simpler to focus on two or three at a time.
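As a rough illustration of the chunking idea, the sketch below splits a set of chromosome names into batches of three and summarises one batch at a time. The chromosome names, the `toy` data frame, and its columns are made up for illustration. Here the data already sits in memory, whereas with a real file each batch would be selected when the data is read.

```r
# Toy stand-in for a large table: 10 hypothetical chromosomes, 5 rows each.
chroms <- paste0("chr", 1:10)
toy <- data.frame(
  chrom = rep(chroms, each = 5),
  coverage = as.numeric(seq_len(50))
)

# Split the chromosome names into batches of at most three.
batches <- split(chroms, ceiling(seq_along(chroms) / 3))

# Work on one batch at a time and keep only a small summary per batch,
# so the full table never needs to be analysed in one go.
batch_totals <- vapply(
  batches,
  function(batch) sum(toy$coverage[toy$chrom %in% batch]),
  numeric(1)
)

batch_totals
```

Keeping only a per-batch summary means the results from all batches can be combined afterwards without holding every chromosome in the analysis at once.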

Filters

The entire read_tbl_*() family of functions provides the ability to filter data so that it loads and runs faster. The filter is applied before the object is loaded into R. You can filter on any of the information present in the metadata, and you can filter on multiple conditions at once.

cov_file <- miplicorn_example("coverage_AA_table.csv")

read_tbl_coverage(cov_file)
#> # A tibble: 6,344 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 6,334 more rows

read_tbl_coverage(cov_file, gene == "atp6")
#> # A tibble: 260 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 250 more rows

read_tbl_coverage(cov_file, gene == "atp6", targeted == "Yes")
#> # A tibble: 156 × 8
#>    sample    gene_id gene  mutation_name exonic_func aa_change targeted coverage
#>    <chr>     <chr>   <chr> <chr>         <chr>       <chr>     <chr>       <dbl>
#>  1 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           608
#>  2 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            20
#>  3 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           158
#>  4 D10-JJJ-5 PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             2
#>  5 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             1
#>  6 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           129
#>  7 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  8 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes             0
#>  9 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes            90
#> 10 D10-JJJ-… PF3D7_… atp6  atp6-Ala623G… missense_v… Ala623Glu Yes           175
#> # … with 146 more rows
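
The row counts above shrink as conditions are added, which suggests that comma-separated conditions must all hold for a row to be kept. The sketch below shows that behaviour with dplyr::filter() on a made-up toy tibble. That read_tbl_coverage() combines its conditions in the same way is an assumption here, not something confirmed by the output above.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration.
toy <- tibble(
  gene = c("atp6", "atp6", "crt", "crt"),
  targeted = c("Yes", "No", "Yes", "No"),
  coverage = c(608, 20, 158, 2)
)

# Two comma-separated conditions ...
both <- filter(toy, gene == "atp6", targeted == "Yes")

# ... keep the same rows as a single condition joined with `&`.
joined <- filter(toy, gene == "atp6" & targeted == "Yes")

identical(both, joined)
```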