rOpenSci Traits Package

The rOpenSci traits package is an R client for various sources of species trait data. The traits package provides functions that interface with the BETYdb API.

These instructions are from the traits package documentation, which is released with a MIT-BSD license.

Installation

Launch R

$ R

Install the stable version of the package from CRAN:

install.packages("traits")

Or development version from GitHub:

devtools::install_github("ropensci/traits")

Load traits

library("traits")

Example 1: Query trait data for Willow

Get trait data for Willow (Salix spp.)

(salix <- betydb_search("Salix Vcmax"))
#> Source: local data frame [14 x 31]
#> 
#>    access_level       author checked citation_id citation_year  city
#>           (int)        (chr)   (int)       (int)         (int) (chr)
#> 1             4       Merilo       1         430          2005 Saare
#> 2             4       Merilo       1         430          2005 Saare
#> 3             4       Merilo       1         430          2005 Saare
#> 4             4       Merilo       1         430          2005 Saare
#> 5             4 Wullschleger       1          51          1993    NA
#> 6             4       Merilo       1         430          2005 Saare
#> 7             4       Merilo       1         430          2005 Saare
#> 8             4       Merilo       1         430          2005 Saare
#> 9             4       Merilo       1         430          2005 Saare
#> 10            4       Merilo       1         430          2005 Saare
#> 11            4       Merilo       1         430          2005 Saare
#> 12            4       Merilo       1         430          2005 Saare
#> 13            4       Merilo       1         430          2005 Saare
#> 14            4         Wang       1         381          2010    NA
#> Variables not shown: commonname (chr), cultivar_id (int), date (chr),
#>   dateloc (chr), genus (chr), id (int), lat (dbl), lon (dbl), mean (chr),
#>   month (dbl), n (int), notes (chr), result_type (chr), scientificname
#>   (chr), site_id (int), sitename (chr), species_id (int), stat (chr),
#>   statname (chr), trait (chr), trait_description (chr), treatment (chr),
#>   treatment_id (int), units (chr), year (dbl)
# equivalent:
# (out <- betydb_search("willow"))

Summarise data from the output data.frame

library("dplyr")
salix %>%
  group_by(scientificname, trait) %>%
  mutate(.mean = as.numeric(mean)) %>%
  summarise(mean = round(mean(.mean, na.rm = TRUE), 2),
            min = round(min(.mean, na.rm = TRUE), 2),
            max = round(max(.mean, na.rm = TRUE), 2),
            n = length(n))
#> Source: local data frame [4 x 6]
#> Groups: scientificname [?]
#> 
#>                    scientificname trait  mean   min   max     n
#>                             (chr) (chr) (dbl) (dbl) (dbl) (int)
#> 1                           Salix Vcmax 65.00 65.00 65.00     1
#> 2                Salix dasyclados Vcmax 46.08 34.30 56.68     4
#> 3 Salix sachalinensis × miyabeana Vcmax 79.28 79.28 79.28     1
#> 4                 Salix viminalis Vcmax 43.04 19.99 61.29     8

BETYdb is the Biofuel Ecophysiological Traits and Yields Database. You can get many different types of data from this database, including trait data.

Function setup: All functions are prefixed with betydb_. Plural function names like betydb_traits() accept parameters and always give back a data.frame, while singular function names like betydb_trait() accept an ID and give back a list.

The idea with the functions with plural names is to search for either traits, species, etc., and with the singular function names to get data by one or more IDs.

Example 2: Get yield data for Switchgrass (Panicum virgatum)

out <- betydb_search(query = "Switchgrass Yield")

Summarise data from the output data.frame

library("dplyr")
out %>%
  group_by(id) %>%
  summarise(mean_result = mean(as.numeric(mean), na.rm = TRUE)) %>%
  arrange(desc(mean_result))
## Source: local data frame [509 x 2]
## 
##       id mean_result
## 1   1666       27.36
## 2  16845       27.00
## 3   1669       26.36
## 4  16518       26.00
## 5   1663       25.35
## 6  16742       25.00
## 7   1594       24.78
## 8   1674       22.71
## 9   1606       22.54
## 10  1665       22.46
## ..   ...         ...

Note: this code illustrates how to join management events to yield records. It replicates figure 4a from LeBauer et al 2018. Could similarly be done with traits.

All code used in the manuscript is available on GitHub at https://github.com/ebimodeling/betydb_manuscript/.

library(traits)
library(dplyr)
library(ggplot2)
options(betydb_url = 'https://www.betydb.org',
        betydb_api_version = 'v1') 

Step 1: Query Yield data for switchgrass and miscanthus:

yields <- betydb_search(result_type = 'yields', limit = 'none')
grass_yields <- yields %>% 
    dplyr::filter(genus %in% c('Miscanthus', 'Panicum')) 

Query and agronomic metadata

Note: treatments are categorical, each study has >=1 treatment; managements describe the actual activities (planting, fertilization, irrigation, etc) and sometimes the level (planting density, fertilization rate, etc).

There is a many-to-many relationship between treatments and managements. One treatment can have many managements (e.g. control treatment had a planting date, a level of fertilization, etc). And each management can be associated with one or more treatments - e.g. the same planting for both a control and fertilized treatment.

So first we query the tables, then join them, then create new columns for the date and level of specific managements.

treatments <- betydb_query(table = 'treatments', limit = 'none') %>% 
  dplyr::mutate(treatment_id = id) %>% 
  dplyr::select(treatment_id, name, definition, control)
managements <- betydb_query(table = 'managements', limit = 'none') %>%
  dplyr::filter(mgmttype %in% c('fertilizer_N', 'fertilizer_N_rate', 'planting', 'irrigation')) %>%
  dplyr::mutate(management_id = id) %>%
  dplyr::select(management_id, date, mgmttype, level, units) 
# now link managements to treatments
m <- betydb_query(table = 'managements', associations_mode = 'ids', limit = 'none') 
managements_treatments <- m %>%
 select(treatment_id = `associated treatment ids`, management_id = id) %>% 
  tidyr::unnest()
managements <- managements %>%
  left_join(managements_treatments, by = 'management_id') %>%
  left_join(treatments, by = 'treatment_id') 

Now compute specific managements of interest

nitrogen <- managements %>% 
  dplyr::filter(mgmttype == "fertilizer_N_rate") %>%
  dplyr::select(treatment_id, nrate = level)
planting <- managements %>% 
  dplyr::filter(mgmttype == "planting") %>%
  dplyr::select(treatment_id, planting_date = date)
planting_rate <- managements %>% 
  dplyr::filter(mgmttype == "planting") %>%
  dplyr::select(treatment_id, planting_date = date, planting_density = level) %>% 
  dplyr::filter(!is.na(planting_density))
irrigation <- managements %>% 
  dplyr::filter(mgmttype == 'irrigation') 
irrigation_rate <- irrigation %>% 
  dplyr::filter(units == 'mm', !is.na(treatment_id)) %>% 
  group_by(treatment_id, year = sql("extract(year from date)"), units) %>% 
  summarise(irrig.mm = sum(level)) %>% 
  group_by(treatment_id) %>% 
  summarise(irrig.mm.y = mean(irrig.mm))
irrigation_boolean <- irrigation %>%
  group_by(treatment_id) %>% 
  dplyr::mutate(irrig = as.logical(mean(level))) %>% 
  dplyr::select(treatment_id, irrig = irrig)
irrigation_all <- irrigation_boolean %>%
  full_join(irrigation_rate, copy = TRUE, by = 'treatment_id')

Subset species of interest; combine with agronomic data

grass_yields <- grass_yields %>% 
  dplyr::filter(genus %in% c('Miscanthus', 'Panicum')) %>%
  left_join(nitrogen, by = 'treatment_id') %>% 
  left_join(planting, by = 'treatment_id') %>% 
  left_join(planting_rate, by = 'treatment_id') %>% 
  left_join(irrigation_all, by = 'treatment_id', copy = TRUE) %>% 
  dplyr::mutate(age = lubridate::year(raw_date)- lubridate::year(planting_date),
         nrate = ifelse(is.na(nrate), 0, nrate),
         SE = ifelse(statname == "SE", stat, ifelse(statname == 'SD', stat / sqrt(n), NA)),
         continent = ifelse(lon < -30, 'united_states', ifelse(lon < 75, 'europe', 'asia'))) %>%
  dplyr::select(date, lat, lon, nrate, planting_date, planting_density, irrig, 
         irrig.mm.y, age, mean, n, SE, scientificname, genus, continent, 
         sitename, author, year) %>% 
  dplyr::filter(!duplicated(.))
save(grass_yields, file = "grass_yields.RData")

Reproduce figure 4a, but without regression fits for simplicity

ggplot(data = grass_yields, aes(x = nrate, color = genus)) +
  geom_point(aes(x = jitter(nrate, 20), y = mean), alpha = 0.25, size = 0.25) +
  ylab(expression(Yield~~"(Mg "*ha^"-1"*yr^"-1"*")")) +
  xlab(expression("Nitrogen Fertilization Rate"~~"(kg "*ha^"-1"*yr^"-1"*")")) + 
  xlim(0,250) +
  scale_colour_brewer(palette = "Set1", labels = c('Miscanthus', 'Panicum (Switchgrass)'))

Advanced Queries

The tables above will return values of _tablename_id that can be used to query other tables

Query a Single trait record by its id

betydb_trait(id = 10)
## $created_at
## NULL
## 
## $description
## [1] "Leaf Percent Nitrogen"
## 
## $id
## [1] 10
## 
## $label
## NULL
## 
## $max
## [1] "10"
## 
## $min
## [1] "0.02"
## 
## $name
## [1] "leafN"
## 
## $notes
## [1] ""
## 
## $standard_name
## NULL
## 
## $standard_units
## NULL
## 
## $units
## [1] "percent"
## 
## $updated_at
## [1] "2011-06-06T09:40:42-05:00"

Query a single Species

betydb_specie(id = 10)
## $AcceptedSymbol
## [1] "ACKA2"
## 
## $commonname
## [1] "karroothorn"
## 
## $created_at
## NULL
## 
## $genus
## [1] "Acacia"
## 
## $id
## [1] 10
## 
## $notes
## [1] ""
## 
## $scientificname
## [1] "Acacia karroo"
## 
## $spcd
## NULL
## 
## $species
## [1] "karroo"
## 
## $updated_at
## [1] "2011-03-01T15:02:25-06:00"

Query a single Citation

betydb_citation(10)
## $author
## [1] "Casler"
## 
## $created_at
## NULL
## 
## $doi
## [1] "10.2135/cropsci2003.2226"
## 
## $id
## [1] 10
## 
## $journal
## [1] "Crop Science"
## 
## $pdf
## [1] "http://crop.scijournals.org/cgi/reprint/43/6/2226.pdf"
## 
## $pg
## [1] "2226–2233"
## 
## $title
## [1] "Cultivar X environment interactions in switchgrass"
## 
## $updated_at
## NULL
## 
## $url
## [1] "http://crop.scijournals.org/cgi/content/abstract/43/6/2226"
## 
## $user_id
## NULL
## 
## $vol
## [1] 43
## 
## $year
## [1] 2003

Query a single Site

betydb_site(id = 1)
## $city
## [1] "Aliartos"
## 
## $country
## [1] "GR"
## 
## $geometry
## [1] "POINT (23.17 38.37 114.0)"
## 
## $greenhouse
## [1] FALSE
## 
## $notes
## [1] ""
## 
## $sitename
## [1] "Aliartos"
## 
## $state
## [1] ""

Last updated