rOpenSci Traits Package
The rOpenSci traits package is an R client for various sources of species trait data. The traits package provides functions that interface with the BETYdb API.
These instructions are from the traits package documentation, which is released with an MIT-BSD license.
Launch R
$ R
Install the stable version of the package from CRAN:
install.packages("traits")
Or install the development version from GitHub:
devtools::install_github("ropensci/traits")
library("traits")
Get trait data for Willow (Salix spp.)
(salix <- betydb_search("Salix Vcmax"))
#> Source: local data frame [14 x 31]
#>
#> access_level author checked citation_id citation_year city
#> (int) (chr) (int) (int) (int) (chr)
#> 1 4 Merilo 1 430 2005 Saare
#> 2 4 Merilo 1 430 2005 Saare
#> 3 4 Merilo 1 430 2005 Saare
#> 4 4 Merilo 1 430 2005 Saare
#> 5 4 Wullschleger 1 51 1993 NA
#> 6 4 Merilo 1 430 2005 Saare
#> 7 4 Merilo 1 430 2005 Saare
#> 8 4 Merilo 1 430 2005 Saare
#> 9 4 Merilo 1 430 2005 Saare
#> 10 4 Merilo 1 430 2005 Saare
#> 11 4 Merilo 1 430 2005 Saare
#> 12 4 Merilo 1 430 2005 Saare
#> 13 4 Merilo 1 430 2005 Saare
#> 14 4 Wang 1 381 2010 NA
#> Variables not shown: commonname (chr), cultivar_id (int), date (chr),
#> dateloc (chr), genus (chr), id (int), lat (dbl), lon (dbl), mean (chr),
#> month (dbl), n (int), notes (chr), result_type (chr), scientificname
#> (chr), site_id (int), sitename (chr), species_id (int), stat (chr),
#> statname (chr), trait (chr), trait_description (chr), treatment (chr),
#> treatment_id (int), units (chr), year (dbl)
# equivalent:
# (out <- betydb_search("willow"))
Summarise data from the output data.frame
library("dplyr")
salix %>%
  group_by(scientificname, trait) %>%
  mutate(.mean = as.numeric(mean)) %>%
  summarise(mean = round(mean(.mean, na.rm = TRUE), 2),
            min = round(min(.mean, na.rm = TRUE), 2),
            max = round(max(.mean, na.rm = TRUE), 2),
            n = length(n))
#> Source: local data frame [4 x 6]
#> Groups: scientificname [?]
#>
#> scientificname trait mean min max n
#> (chr) (chr) (dbl) (dbl) (dbl) (int)
#> 1 Salix Vcmax 65.00 65.00 65.00 1
#> 2 Salix dasyclados Vcmax 46.08 34.30 56.68 4
#> 3 Salix sachalinensis × miyabeana Vcmax 79.28 79.28 79.28 1
#> 4 Salix viminalis Vcmax 43.04 19.99 61.29 8
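The search output above lists `mean` as a character column (`chr`), which is why the pipeline converts it with `as.numeric()` before summarising. The following is a minimal offline sketch of the same pattern. The data frame `toy` and its values are made up for illustration and are not BETYdb records. Unlike the pipeline above, which counts rows with `length(n)`, this sketch counts only the non-missing values.

```r
library(dplyr)

# Hypothetical stand-in for a betydb_search() result; all values are invented.
toy <- data.frame(
  scientificname = c("Species A", "Species A", "Species B", "Species B"),
  trait = "Vcmax",
  mean = c("40.5", "59.5", "70", NA),  # character, as in the search output
  stringsAsFactors = FALSE
)

toy_summary <- toy %>%
  mutate(.mean = as.numeric(mean)) %>%          # convert text to numbers first
  group_by(scientificname, trait) %>%
  summarise(mean_value = mean(.mean, na.rm = TRUE),
            n = sum(!is.na(.mean)),             # number of non-missing values
            .groups = "drop")

print(toy_summary)
```

Without the conversion step, `mean()` on a character column returns `NA` with a warning, so the conversion has to happen before any numeric summary.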
BETYdb is the Biofuel Ecophysiological Traits and Yields Database. You can get many different types of data from this database, including trait data.
Function setup: All functions are prefixed with betydb_. Plural function names like betydb_traits() accept parameters and always give back a data.frame, while singular function names like betydb_trait() accept an ID and give back a list. The idea is to use the functions with plural names to search for traits, species, etc., and the functions with singular names to get data by one or more IDs.
out <- betydb_search(query = "Switchgrass Yield")
Summarise data from the output data.frame
library("dplyr")
out %>%
  group_by(id) %>%
  summarise(mean_result = mean(as.numeric(mean), na.rm = TRUE)) %>%
  arrange(desc(mean_result))
## Source: local data frame [509 x 2]
##
## id mean_result
## 1 1666 27.36
## 2 16845 27.00
## 3 1669 26.36
## 4 16518 26.00
## 5 1663 25.35
## 6 16742 25.00
## 7 1594 24.78
## 8 1674 22.71
## 9 1606 22.54
## 10 1665 22.46
## .. ... ...
Note: this code illustrates how to join management events to yield records. It replicates Figure 4a from LeBauer et al. (2018). The same approach could be applied to trait records.
All code used in the manuscript is available on GitHub at https://github.com/ebimodeling/betydb_manuscript/.
library(traits)
library(dplyr)
library(ggplot2)
options(betydb_url = 'https://www.betydb.org',
        betydb_api_version = 'v1')
Step 1: Query yield data for switchgrass and miscanthus:
yields <- betydb_search(result_type = 'yields', limit = 'none')
grass_yields <- yields %>%
  dplyr::filter(genus %in% c('Miscanthus', 'Panicum'))
Note: treatments are categorical, and each study has at least one treatment. Managements describe the actual activities (planting, fertilization, irrigation, etc.) and sometimes the level (planting density, fertilization rate, etc.).
There is a many-to-many relationship between treatments and managements. One treatment can have many managements (e.g. a control treatment has a planting date, a level of fertilization, etc.), and each management can be associated with one or more treatments (e.g. the same planting applies to both a control and a fertilized treatment).
So first we query the tables, then join them, then create new columns for the date and level of specific managements.
treatments <- betydb_query(table = 'treatments', limit = 'none') %>%
  dplyr::mutate(treatment_id = id) %>%
  dplyr::select(treatment_id, name, definition, control)
managements <- betydb_query(table = 'managements', limit = 'none') %>%
  dplyr::filter(mgmttype %in% c('fertilizer_N', 'fertilizer_N_rate', 'planting', 'irrigation')) %>%
  dplyr::mutate(management_id = id) %>%
  dplyr::select(management_id, date, mgmttype, level, units)
# now link managements to treatments
m <- betydb_query(table = 'managements', associations_mode = 'ids', limit = 'none')
# one row per (management, treatment) pair; recent tidyr versions expect `cols` to be named
managements_treatments <- m %>%
  select(treatment_id = `associated treatment ids`, management_id = id) %>%
  tidyr::unnest(cols = treatment_id)
managements <- managements %>%
  left_join(managements_treatments, by = 'management_id') %>%
  left_join(treatments, by = 'treatment_id')
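The linking step above can be tried offline on toy tables. The sketch below assumes the association arrives as a list-column of treatment IDs per management, as in the query above. The table contents, IDs, and the name `linked` are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tables; IDs and values are invented for illustration.
treatments <- data.frame(
  treatment_id = c(1, 2),
  name = c("control", "fertilized"),
  stringsAsFactors = FALSE
)

managements <- tibble(
  management_id = c(10, 11),
  mgmttype = c("planting", "fertilizer_N_rate"),
  level = c(NA, 100),
  # list-column: the treatments each management applies to
  treatment_id = list(c(1, 2), 2)
)

# Expand the list-column to one row per (management, treatment) pair,
# then attach the treatment attributes.
linked <- managements %>%
  unnest(cols = treatment_id) %>%
  left_join(treatments, by = "treatment_id")

# Derive a per-treatment column for one management type.
nitrogen <- linked %>%
  filter(mgmttype == "fertilizer_N_rate") %>%
  select(treatment_id, nrate = level)

print(linked)
print(nitrogen)
```

The shared planting (management 10) appears once for each of its two treatments, while the fertilizer rate (management 11) appears only for the fertilized treatment, which is the many-to-many relationship described above.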
Now compute specific managements of interest
nitrogen <- managements %>%
  dplyr::filter(mgmttype == "fertilizer_N_rate") %>%
  dplyr::select(treatment_id, nrate = level)
planting <- managements %>%
  dplyr::filter(mgmttype == "planting") %>%
  dplyr::select(treatment_id, planting_date = date)
planting_rate <- managements %>%
  dplyr::filter(mgmttype == "planting") %>%
  dplyr::select(treatment_id, planting_date = date, planting_density = level) %>%
  dplyr::filter(!is.na(planting_density))
irrigation <- managements %>%
  dplyr::filter(mgmttype == 'irrigation')
# betydb_query() returns a local data frame, so the year is extracted in R
# rather than with sql(), which only applies to database backends
irrigation_rate <- irrigation %>%
  dplyr::filter(units == 'mm', !is.na(treatment_id)) %>%
  group_by(treatment_id, year = lubridate::year(date), units) %>%
  summarise(irrig.mm = sum(level)) %>%
  group_by(treatment_id) %>%
  summarise(irrig.mm.y = mean(irrig.mm))
irrigation_boolean <- irrigation %>%
  group_by(treatment_id) %>%
  dplyr::mutate(irrig = as.logical(mean(level))) %>%
  dplyr::select(treatment_id, irrig = irrig)
irrigation_all <- irrigation_boolean %>%
  full_join(irrigation_rate, copy = TRUE, by = 'treatment_id')
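The irrigation_rate calculation above aggregates in two stages: it first sums the irrigation events within each treatment and year, then averages those annual totals per treatment. Here is a minimal offline sketch of that two-stage aggregation, using an invented table (`toy_irrigation` and its values are hypothetical) and base R to extract the year:

```r
library(dplyr)

# Hypothetical irrigation events; dates and amounts are invented.
toy_irrigation <- data.frame(
  treatment_id = c(1, 1, 1, 2),
  date = as.Date(c("2009-06-01", "2009-07-01", "2010-06-15", "2009-06-01")),
  level = c(20, 30, 70, 10),
  units = "mm",
  stringsAsFactors = FALSE
)

toy_rate <- toy_irrigation %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(treatment_id, year) %>%
  summarise(irrig.mm = sum(level), .groups = "drop") %>%      # annual total
  group_by(treatment_id) %>%
  summarise(irrig.mm.y = mean(irrig.mm), .groups = "drop")    # mean of annual totals

print(toy_rate)
```

Averaging the raw events directly would give a different number: for treatment 1 the mean per event is 40 mm, while the mean annual total is 60 mm.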
Subset species of interest; combine with agronomic data
grass_yields <- grass_yields %>%
  dplyr::filter(genus %in% c('Miscanthus', 'Panicum')) %>%
  left_join(nitrogen, by = 'treatment_id') %>%
  left_join(planting, by = 'treatment_id') %>%
  # join on planting_date as well, so it is not split into planting_date.x / planting_date.y
  left_join(planting_rate, by = c('treatment_id', 'planting_date')) %>%
  left_join(irrigation_all, by = 'treatment_id', copy = TRUE) %>%
  dplyr::mutate(age = lubridate::year(raw_date) - lubridate::year(planting_date),
                nrate = ifelse(is.na(nrate), 0, nrate),
                SE = ifelse(statname == "SE", stat, ifelse(statname == 'SD', stat / sqrt(n), NA)),
                continent = ifelse(lon < -30, 'united_states', ifelse(lon < 75, 'europe', 'asia'))) %>%
  dplyr::select(date, lat, lon, nrate, planting_date, planting_density, irrig,
                irrig.mm.y, age, mean, n, SE, scientificname, genus, continent,
                sitename, author, year) %>%
  dplyr::filter(!duplicated(.))
save(grass_yields, file = "grass_yields.RData")
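The SE column in the pipeline above puts the reported statistics on a common scale: a value reported as a standard error is kept as is, a standard deviation is converted with SE = SD / sqrt(n), and anything else becomes NA. This can be checked on a small invented table (`toy_stats` and its values are hypothetical):

```r
# Hypothetical records: one reports SE, one reports SD, one reports no statistic.
toy_stats <- data.frame(
  statname = c("SE", "SD", NA),
  stat = c(1.5, 4, NA),
  n = c(3, 16, 5),
  stringsAsFactors = FALSE
)

# Same nested ifelse() as in the pipeline above.
toy_stats$SE <- ifelse(toy_stats$statname == "SE", toy_stats$stat,
                       ifelse(toy_stats$statname == "SD",
                              toy_stats$stat / sqrt(toy_stats$n), NA))

print(toy_stats)
```

The second record has SD = 4 with n = 16, so its standard error is 4 / 4 = 1. The record with a missing statname propagates NA through `ifelse()`.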
Reproduce figure 4a, but without regression fits for simplicity
ggplot(data = grass_yields, aes(x = nrate, color = genus)) +
  geom_point(aes(x = jitter(nrate, 20), y = mean), alpha = 0.25, size = 0.25) +
  ylab(expression(Yield~~"(Mg "*ha^"-1"*yr^"-1"*")")) +
  xlab(expression("Nitrogen Fertilization Rate"~~"(kg "*ha^"-1"*yr^"-1"*")")) +
  xlim(0,250) +
  scale_colour_brewer(palette = "Set1", labels = c('Miscanthus', 'Panicum (Switchgrass)'))
The tables above return values of tablename_id (for example, treatment_id, site_id, species_id, and citation_id) that can be used to query other tables:
betydb_trait(id = 10)
## $created_at
## NULL
##
## $description
## [1] "Leaf Percent Nitrogen"
##
## $id
## [1] 10
##
## $label
## NULL
##
## $max
## [1] "10"
##
## $min
## [1] "0.02"
##
## $name
## [1] "leafN"
##
## $notes
## [1] ""
##
## $standard_name
## NULL
##
## $standard_units
## NULL
##
## $units
## [1] "percent"
##
## $updated_at
## [1] "2011-06-06T09:40:42-05:00"
betydb_specie(id = 10)
## $AcceptedSymbol
## [1] "ACKA2"
##
## $commonname
## [1] "karroothorn"
##
## $created_at
## NULL
##
## $genus
## [1] "Acacia"
##
## $id
## [1] 10
##
## $notes
## [1] ""
##
## $scientificname
## [1] "Acacia karroo"
##
## $spcd
## NULL
##
## $species
## [1] "karroo"
##
## $updated_at
## [1] "2011-03-01T15:02:25-06:00"
betydb_citation(10)
## $author
## [1] "Casler"
##
## $created_at
## NULL
##
## $doi
## [1] "10.2135/cropsci2003.2226"
##
## $id
## [1] 10
##
## $journal
## [1] "Crop Science"
##
## $pdf
## [1] "http://crop.scijournals.org/cgi/reprint/43/6/2226.pdf"
##
## $pg
## [1] "2226–2233"
##
## $title
## [1] "Cultivar X environment interactions in switchgrass"
##
## $updated_at
## NULL
##
## $url
## [1] "http://crop.scijournals.org/cgi/content/abstract/43/6/2226"
##
## $user_id
## NULL
##
## $vol
## [1] 43
##
## $year
## [1] 2003
betydb_site(id = 1)
## $city
## [1] "Aliartos"
##
## $country
## [1] "GR"
##
## $geometry
## [1] "POINT (23.17 38.37 114.0)"
##
## $greenhouse
## [1] FALSE
##
## $notes
## [1] ""
##
## $sitename
## [1] "Aliartos"
##
## $state
## [1] ""