This vignette introduces the package hyperSpec.tidyverse.
hyperSpec.tidyverse “fortifies” hyperSpec objects so that they can be used in tidyverse style. In particular, this makes piping convenient. On a more technical level, hyperSpec.tidyverse provides functions such as filter, select, and so on for hyperSpec objects.
The spectra matrix in $spc behaves a bit differently from the extra-data columns, because it holds a whole matrix inside one column. Many tidyverse functions can deal with this directly. Other cases need special attention, which is explained in this vignette.
filter() or slice()
dplyr::filter() selects rows, and filtering by extra data columns works as with any data.frame.
The filter conditions (logical predicates) must yield one logical value per row. Logical expressions on the spectra matrix such as spc > 100 yield a logical matrix. This matrix indicates which elements of the spectra matrix match the condition.
To obtain the logical vector that indicates which spectra (rows) should be kept, further information is needed: should a spectrum be kept if any of its intensity values (spectra matrix elements) fulfills the condition, or only if all of them do? The hyperSpec functions any_wl() and all_wl() summarize the logical matrix in this way.
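The difference between an element-wise condition and a per-row condition can be sketched in base R. This is only an illustration on a made-up matrix: the `apply()` calls stand in for the row-wise reduction that any_wl() and all_wl() are described as performing, and are not the package's implementation.

```r
# Toy stand-in for a spectra matrix: 3 spectra (rows) x 4 wavelengths (columns).
# All values are made up for illustration.
spc <- matrix(
  c( 10, 120,  30,  40,
    150, 160, 170, 180,
      5,   6,   7,   8),
  nrow = 3, byrow = TRUE
)

# Element-wise comparison: a 3 x 4 logical matrix, not one value per row.
above <- spc > 100
dim(above)                         # 3 4

# Row-wise summaries give one logical value per spectrum.
any_above <- apply(above, 1, any)  # TRUE if at least one element matches
all_above <- apply(above, 1, all)  # TRUE only if every element matches

any_above                          # TRUE TRUE FALSE
all_above                          # FALSE TRUE FALSE
```

Only the per-row vectors have the shape that a filter condition needs, and the two summaries can disagree, as for the first spectrum here.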
Typical filtering tasks based on the spectra matrix are:
remove “empty” spectra which have NAs at all wavelengths (which may be produced by certain import functions),
remove spectra which have NAs at any wavelength (for example after spikes or other artifacts have been marked by NAs),
keep only spectra with all intensities inside a specified range (e.g. non-negative and below a saturation threshold),
keep only spectra with average intensity inside a specified range (e.g. non-negative and below a saturation threshold).
In the last case, any_wl() or all_wl() are not needed: the spectra are already summarized into a single intensity value each by rowMeans(). The condition therefore already evaluates to one logical value per spectrum, which is exactly what is needed.
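The four tasks above can be sketched in base R on a small made-up matrix. The range limits 0 and 500 are arbitrary, and the `apply()` calls are an assumption about what any_wl() and all_wl() compute row-wise. With a hyperSpec object, the same predicates would presumably be written inside filter(), for example filter(!all_wl(is.na(spc))).

```r
# Toy spectra matrix: 4 spectra x 3 wavelengths (made-up values).
spc <- matrix(
  c( NA,  NA,  NA,    # "empty" spectrum
     10,  NA,  30,    # one value marked as artifact
     20,  40,  60,    # complete, all values in range
     -5, 300, 900),   # complete, but partly out of range
  nrow = 4, byrow = TRUE
)

# 1. Drop spectra that are NA at all wavelengths.
keep_not_empty <- !apply(is.na(spc), 1, all)

# 2. Drop spectra that are NA at any wavelength.
keep_complete <- !apply(is.na(spc), 1, any)

# 3. Keep spectra whose intensities all lie in [0, 500].
in_range <- spc >= 0 & spc <= 500
keep_all_in_range <- apply(in_range, 1, all)

# 4. Keep spectra whose mean intensity lies in [0, 500];
#    rowMeans() already returns one value per spectrum.
avg <- rowMeans(spc)
keep_avg_in_range <- avg >= 0 & avg <= 500

keep_not_empty      # FALSE TRUE TRUE TRUE
keep_complete       # FALSE FALSE TRUE TRUE
keep_all_in_range   # NA NA TRUE FALSE
keep_avg_in_range   # NA NA TRUE TRUE
```

Spectra containing NAs give NA rather than FALSE in the range conditions. dplyr::filter() drops rows whose condition is NA, so such spectra would be removed as well. The last spectrum shows how the "all intensities" and "average intensity" conditions can differ.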
To select rows by indices rather than a condition, use slice():
chondro %>%
  slice (1:3)
chondro %>%
  slice (800 : n())
chondro %>%
  slice (-10 : -n())
flu %>%
  slice (1, 3, 5)
Also, head() and tail() work as usual.
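The negative range in the slice() examples relies on R's operator precedence, which is easy to misread. The following base-R sketch uses a hypothetical row count `n` and a plain integer vector in place of the rows of a hyperSpec object. Inside slice(), n() plays the role of `n`.

```r
n <- 12L              # hypothetical number of rows
rows <- seq_len(n)    # stand-in for the rows of an object

# Unary minus binds tighter than `:`, so -10:-n is (-10):(-n),
# i.e. the negative indices -10, -11, -12.
idx <- -10:-n

rows[1:3]     # first three rows
rows[10:n]    # from row 10 to the last row
rows[idx]     # everything except rows 10 to n, i.e. rows 1 to 9
```

Read this way, slice (-10 : -n()) keeps only the first nine spectra, since negative positions drop rows.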
select()
select() selects or discards particular columns from a hyperSpec object. If the spectra matrix (column spc) is included in the selection, the result is still a hyperSpec object:
chondro %>%
  select (clusters, spc)
#> hyperSpec object
#> 875 spectra
#> 2 data columns
#> 300 data points / spectrum
#> wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798
#> data: (875 rows x 2 columns)
#> 1. clusters: clusters [factor] matrix matrix ... lacuna + NA
#> 2. spc: I / a.u. [matrix300] 501.8194 500.4552 ... 169.2942
If the result does not contain the spectra matrix in $spc, it is returned as a data.frame:
flu %>%
  select (-spc)
#> filename c
#> 1 rawdata/flu1.txt 0.05
#> 2 rawdata/flu2.txt 0.10
#> 3 rawdata/flu3.txt 0.15
#> 4 rawdata/flu4.txt 0.20
#> 5 rawdata/flu5.txt 0.25
#> 6 rawdata/flu6.txt 0.30
To convert such a data.frame into a hyperSpec object again, use as.hyperSpec():
flu %>%
  select (-spc) %>%
  as.hyperSpec ()
#> hyperSpec object
#> 6 spectra
#> 3 data columns
#> 0 data points / spectrum
#> wavelength: lambda/nm [numeric]
#> data: (6 rows x 3 columns)
#> 1. filename: filename [character] rawdata/flu1.txt rawdata/flu2.txt ... rawdata/flu6.txt
#> 2. c: c / (mg / l) [numeric] 0.05 0.10 ... 0.3
#> 3. spc: [matrix0]
The resulting hyperSpec object has 0 wavelengths: its $spc column contains a spectra matrix with 0 columns, and its wavelength vector is also of length 0. Behind the scenes, the data.frame returned by select() gets an attribute labels which stores the labels of the hyperSpec object. as.hyperSpec() restores the labels from this attribute (if available). Therefore, the “back-converted” hyperSpec object has its labels preserved.
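The general mechanism of carrying metadata along in an attribute can be sketched in base R. This is not the package's code: the data.frame below is a hypothetical stand-in for the result of select (-spc), and storing the labels as a named list is an assumption made for illustration.

```r
# Hypothetical stand-in for the data.frame returned by select (-spc).
df <- data.frame(
  filename = c("rawdata/flu1.txt", "rawdata/flu2.txt"),
  c = c(0.05, 0.10)
)

# Attach the labels as an attribute; the column data stay unchanged.
# (The list layout is assumed here, not taken from the package.)
attr(df, "labels") <- list(filename = "filename", c = "c / (mg / l)")

# A later conversion step can look the labels up again, if present.
restored <- attr(df, "labels")
if (is.null(restored)) restored <- list()

restored$c   # "c / (mg / l)"
```

Because the attribute travels with the data.frame, a conversion function can recover the labels without any extra argument, as long as no intermediate step strips the attribute.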
rename()
rename() renames extra data columns of the hyperSpec object. Unlike select(), rename() keeps all the variables of the data frame intact.
chondro %>%
  rename (region = clusters)
#> hyperSpec object
#> 875 spectra
#> 5 data columns
#> 300 data points / spectrum
#> wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798
#> data: (875 rows x 5 columns)
#> 1. y: y [numeric] -4.77 -4.77 ... 19.23
#> 2. x: x [numeric] -11.55 -10.55 ... 22.45
#> 3. filename: filename [character] rawdata/chondro.txt rawdata/chondro.txt ... rawdata/chondro.txt
#> 4. region: region [factor] matrix matrix ... lacuna + NA
#> 5. spc: I / a.u. [matrix300] 501.8194 500.4552 ... 169.2942
TODO
sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.4
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] grid stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] hyperSpec.tidyverse_0.1.0 magrittr_1.5
#> [3] dplyr_0.8.5 hyperSpec_0.99-20200213
#> [5] xml2_1.2.5 ggplot2_3.3.0
#> [7] lattice_0.20-38
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.0.0 xfun_0.12 remotes_2.1.1
#> [4] purrr_0.3.3 vctrs_0.2.4 colorspace_1.4-1
#> [7] testthat_2.3.2 htmltools_0.4.0 usethis_1.5.1
#> [10] yaml_2.2.1 rlang_0.4.5 pkgbuild_1.0.6
#> [13] pillar_1.4.3 glue_1.3.2 withr_2.1.2
#> [16] RColorBrewer_1.1-2 sessioninfo_1.1.1 jpeg_0.1-8.1
#> [19] lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0
#> [22] commonmark_1.7 gtable_0.3.0 devtools_2.2.2.9000
#> [25] memoise_1.1.0 evaluate_0.14 latticeExtra_0.6-29
#> [28] knitr_1.28 callr_3.4.2 ps_1.3.2
#> [31] fansi_0.4.1 Rcpp_1.0.4 backports_1.1.5
#> [34] scales_1.1.0 desc_1.2.0 pkgload_1.0.2
#> [37] fs_1.3.2 png_0.1-7 digest_0.6.25
#> [40] stringi_1.4.6 processx_3.4.2 rprojroot_1.3-2
#> [43] cli_2.0.2 tools_3.6.3 lazyeval_0.2.2
#> [46] tibble_2.1.3 crayon_1.3.4 pkgconfig_2.0.3
#> [49] ellipsis_0.3.0 prettyunits_1.1.1 assertthat_0.2.1
#> [52] rmarkdown_2.1 roxygen2_7.1.0 rstudioapi_0.11
#> [55] R6_2.4.1 compiler_3.6.3