hyperSpec.tidyverse

Claudia Beleites

2020-04-03

This vignette introduces the package hyperSpec.tidyverse.
hyperSpec.tidyverse “fortifies” hyperSpec objects so that they can be used in tidyverse style. In particular, it makes piping convenient. On a more technical level, hyperSpec.tidyverse provides functions such as filter(), select(), and so on for hyperSpec objects.

The Spectra Matrix

The spectra matrix in $spc behaves a bit differently from the extra-data columns, because it holds a whole matrix inside one column. Many tidyverse functions can handle this directly. In other cases special attention is needed. This vignette explains those cases.
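To show what "a whole matrix inside one column" means, the following base-R sketch builds a plain data.frame whose column spc holds a matrix. It is not hyperSpec code. The intensities and wavelength names are invented for illustration.

```r
# Base-R sketch (not hyperSpec): one ordinary column plus one matrix column.
# Intensities and wavelength labels are invented for illustration.
spectra <- matrix(c(150, 250, 120,
                    120, 300,  80,
                     95,  60,  70),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(NULL, c("500", "501", "502")))

df <- data.frame(c = c(0.05, 0.10, 0.15))
df$spc <- spectra        # the whole matrix is stored as a single column

ncol(df)                 # 2 columns: c and spc
dim(df$spc)              # 3 rows (spectra) x 3 columns (wavelengths)

# Row subsetting keeps the matrix column aligned with the other columns
sub <- df[df$c > 0.07, ]
dim(sub$spc)             # 2 x 3
```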

Selecting a subset of spectra with filter() or slice()

dplyr::filter() selects rows, and filtering by extra data columns works as with any data.frame:

flu %>% 
  filter (c > 0.2)

flu %>% 
  filter (c %>% between (0.15, 0.25))
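The following base-R sketch shows which concentrations the two filter conditions above keep. The values are copied from the flu output printed in the select() section. It relies on dplyr::between(x, left, right) being documented as a shortcut for x >= left & x <= right, so both bounds are included.

```r
# Concentrations as printed for flu in the select() section of this vignette
conc <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30)

above    <- conc[conc > 0.2]                    # like filter (c > 0.2)
in_range <- conc[conc >= 0.15 & conc <= 0.25]   # like filter (c %>% between (0.15, 0.25))

above      # 0.25 0.30
in_range   # 0.15 0.20 0.25 (both bounds included)
```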

The filter conditions (logical predicates) must yield one logical value per row. Logical expressions on the spectra matrix, such as spc > 100, instead yield a logical matrix that indicates which elements of the spectra matrix match the condition.

To obtain the logical vector that indicates which spectra (rows) should be kept, further information is needed: should a spectrum be kept if any of its intensity values (spectra matrix elements) fulfills the condition, or only if all of them do? The hyperSpec functions any_wl() and all_wl() summarize the logical matrix in this way.
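The following base-R sketch shows the two ways of collapsing a logical matrix to one value per spectrum. It assumes that any_wl() and all_wl() reduce each row with any() and all(), respectively. In a pipeline the call would presumably look like flu %>% filter (any_wl (spc > 100)). Treat that as assumed usage and check the hyperSpec documentation for the exact interface. The intensities below are invented.

```r
# Base-R sketch of the row-wise summary; intensities are invented.
spc <- matrix(c(150, 250, 120,
                120, 300,  80,
                 95,  60,  70),
              nrow = 3, byrow = TRUE)

hits <- spc > 100                 # logical matrix, one value per intensity

keep_any <- apply(hits, 1, any)   # TRUE if at least one intensity exceeds 100
keep_all <- apply(hits, 1, all)   # TRUE only if every intensity exceeds 100

keep_any   # TRUE  TRUE FALSE
keep_all   # TRUE FALSE FALSE

# One logical value per row can now select spectra
spc[keep_any, , drop = FALSE]     # rows 1 and 2
```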

Typical filtering tasks based on the spectra matrix are:

To select rows by indices rather than a condition, use slice():

chondro %>% 
  slice (1:3)

chondro %>% 
  slice (800 : n())

chondro %>% 
  slice (-10 : -n())

flu %>% 
  slice (1, 3, 5)

Also, head() and tail() work as usual.
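The index vectors used in the slice() calls above follow R's usual indexing rules, and negative indices drop rows. The following base-R sketch spells out which rows each call refers to for chondro. It takes n = 875 from the "875 spectra" in the chondro output of the select() section.

```r
# Base-R sketch of the row indices in the slice() calls above.
n <- 875                 # number of spectra in chondro, as printed by select()
rows <- seq_len(n)

first3   <- rows[1:3]    # slice (1:3): the first three spectra
from800  <- rows[800:n]  # slice (800 : n()): rows 800 to 875
dropped  <- rows[-10:-n] # slice (-10 : -n()): drops rows 10 to 875, keeps 1 to 9

length(from800)          # 76
dropped                  # 1 2 3 4 5 6 7 8 9
identical(head(rows, 3), first3)   # TRUE: head() matches slice (1:3)
```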

Selecting particular data columns: select()

select() selects or discards particular columns of a hyperSpec object. If the spectra matrix (column spc) is included in the selection, the result is still a hyperSpec object:

chondro %>% 
  select (clusters, spc)
#> hyperSpec object
#>    875 spectra
#>    2 data columns
#>    300 data points / spectrum
#> wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798 
#> data:  (875 rows x 2 columns)
#>    1. clusters: clusters [factor] matrix matrix ... lacuna + NA
#>    2. spc: I / a.u. [matrix300] 501.8194 500.4552 ... 169.2942

If the result does not contain the spectra matrix in $spc, it is returned as a data.frame:

flu %>% 
  select (-spc) 
#>           filename    c
#> 1 rawdata/flu1.txt 0.05
#> 2 rawdata/flu2.txt 0.10
#> 3 rawdata/flu3.txt 0.15
#> 4 rawdata/flu4.txt 0.20
#> 5 rawdata/flu5.txt 0.25
#> 6 rawdata/flu6.txt 0.30
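In plain data.frame terms, the spectra matrix is kept or dropped as one single column. The following base-R sketch shows only that behaviour. It does not reproduce how hyperSpec.tidyverse's select() decides between returning a hyperSpec object and a data.frame. The file names and intensities are invented.

```r
# Base-R sketch (not hyperSpec): selecting columns of a data.frame that
# contains a matrix column. File names and intensities are invented.
df <- data.frame(filename = sprintf("rawdata/flu%d.txt", 1:3),
                 c = c(0.05, 0.10, 0.15),
                 stringsAsFactors = FALSE)
df$spc <- matrix(seq_len(12), nrow = 3)     # 3 spectra x 4 wavelengths

with_spc    <- df[c("c", "spc")]            # the matrix travels as one column
without_spc <- df[setdiff(names(df), "spc")] # only the extra data remain

ncol(with_spc)            # 2
dim(with_spc$spc)         # 3 x 4
names(without_spc)        # "filename" "c"
```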

To convert such a data.frame into a hyperSpec object again, use as.hyperSpec():

flu %>% 
  select (-spc) %>%
  as.hyperSpec ()
#> hyperSpec object
#>    6 spectra
#>    3 data columns
#>    0 data points / spectrum
#> wavelength: lambda/nm [numeric]
#> data:  (6 rows x 3 columns)
#>    1. filename: filename [character] rawdata/flu1.txt rawdata/flu2.txt ... rawdata/flu6.txt 
#>    2. c: c / (mg / l) [numeric] 0.05 0.10 ... 0.3 
#>    3. spc:  [matrix0]

The resulting hyperSpec object has 0 wavelengths: its $spc column contains a spectra matrix with 0 columns, and its wavelength vector also has length 0. Behind the scenes, the data.frame returned by select() gets an attribute labels that stores the labels of the hyperSpec object. as.hyperSpec() restores the labels from this attribute (if available). Therefore, the “back-converted” hyperSpec object keeps its labels.
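The mechanism of carrying labels along in an attribute can be sketched in base R. The layout of the labels list is an assumption: plain strings keyed by column name, modeled on the labels printed for flu above. hyperSpec may store labels differently, for example as expressions.

```r
# Base-R sketch of stashing labels in a data.frame attribute.
# The labels list layout is an assumption (plain strings keyed by column).
extra <- data.frame(filename = c("rawdata/flu1.txt", "rawdata/flu2.txt"),
                    c = c(0.05, 0.10),
                    stringsAsFactors = FALSE)

attr(extra, "labels") <- list(filename = "filename", c = "c / (mg / l)")

# A converter can read the labels back later ...
restored <- attr(extra, "labels")
restored$c                                   # "c / (mg / l)"

# ... and gets NULL when no labels were stored, so there is nothing to restore
is.null(attr(data.frame(a = 1), "labels"))   # TRUE
```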

Renaming extra data columns: rename()

rename() renames extra data columns of the hyperSpec object. Unlike select(), rename() keeps all other columns, including the spectra matrix, intact.

chondro %>% 
  rename(region = clusters)
#> hyperSpec object
#>    875 spectra
#>    5 data columns
#>    300 data points / spectrum
#> wavelength: Delta * tilde(nu)/cm^-1 [numeric] 602 606 ... 1798 
#> data:  (875 rows x 5 columns)
#>    1. y: y [numeric] -4.77 -4.77 ... 19.23 
#>    2. x: x [numeric] -11.55 -10.55 ... 22.45 
#>    3. filename: filename [character] rawdata/chondro.txt rawdata/chondro.txt ... rawdata/chondro.txt 
#>    4. region: region [factor] matrix matrix ... lacuna + NA
#>    5. spc: I / a.u. [matrix300] 501.8194 500.4552 ... 169.2942
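As a point of comparison, the following base-R sketch renames one column of a data.frame that contains a matrix column. It is not hyperSpec code, and the values are invented. Only the column name changes, and the matrix column stays as it was. In the hyperSpec output above, the label of the renamed column also reads region.

```r
# Base-R sketch (not hyperSpec): renaming one column leaves the others,
# including a matrix column, untouched. Values are invented.
df <- data.frame(x = c(-11.55, -10.55, -9.55),
                 clusters = factor(c("matrix", "lacuna", "matrix")))
df$spc <- matrix(1:6, nrow = 3)            # 3 spectra x 2 wavelengths

names(df)[names(df) == "clusters"] <- "region"

names(df)        # "x" "region" "spc"
dim(df$spc)      # still 3 x 2
```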

Selecting wavelength ranges

TODO


sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.4
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] grid      stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] hyperSpec.tidyverse_0.1.0 magrittr_1.5             
#> [3] dplyr_0.8.5               hyperSpec_0.99-20200213  
#> [5] xml2_1.2.5                ggplot2_3.3.0            
#> [7] lattice_0.20-38          
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.0.0    xfun_0.12           remotes_2.1.1      
#>  [4] purrr_0.3.3         vctrs_0.2.4         colorspace_1.4-1   
#>  [7] testthat_2.3.2      htmltools_0.4.0     usethis_1.5.1      
#> [10] yaml_2.2.1          rlang_0.4.5         pkgbuild_1.0.6     
#> [13] pillar_1.4.3        glue_1.3.2          withr_2.1.2        
#> [16] RColorBrewer_1.1-2  sessioninfo_1.1.1   jpeg_0.1-8.1       
#> [19] lifecycle_0.2.0     stringr_1.4.0       munsell_0.5.0      
#> [22] commonmark_1.7      gtable_0.3.0        devtools_2.2.2.9000
#> [25] memoise_1.1.0       evaluate_0.14       latticeExtra_0.6-29
#> [28] knitr_1.28          callr_3.4.2         ps_1.3.2           
#> [31] fansi_0.4.1         Rcpp_1.0.4          backports_1.1.5    
#> [34] scales_1.1.0        desc_1.2.0          pkgload_1.0.2      
#> [37] fs_1.3.2            png_0.1-7           digest_0.6.25      
#> [40] stringi_1.4.6       processx_3.4.2      rprojroot_1.3-2    
#> [43] cli_2.0.2           tools_3.6.3         lazyeval_0.2.2     
#> [46] tibble_2.1.3        crayon_1.3.4        pkgconfig_2.0.3    
#> [49] ellipsis_0.3.0      prettyunits_1.1.1   assertthat_0.2.1   
#> [52] rmarkdown_2.1       roxygen2_7.1.0      rstudioapi_0.11    
#> [55] R6_2.4.1            compiler_3.6.3