This page provides a general description of how to insert trait data via the v1 version of the BETYdb API. For information about accessing data via the v1 BETYdb API, visit https://pecan.gitbooks.io/betydb-data-access/content/API/beta_API.html. For a list of URLs of API endpoints, visit https://www.betydb.org/api/docs.
The path to use for trait insertion is /api/v1/traits(.EXT), where EXT is csv, xml, or json. These are used to submit data in CSV, XML, and JSON format, respectively. If no extension is given, JSON format is assumed. A user must have Creator status (page access level 3) in order to use the trait insertion API.
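The endpoint rules can be summarized in a small helper. This is a hypothetical Python sketch (the name trait_endpoint is ours, not part of BETYdb). It only builds the path and reports which input format the server will assume; it does not contact a server.

```python
from __future__ import annotations

from typing import Optional

# Extensions accepted by the trait insertion endpoint, mapped to the data format
# each one selects (taken from the description above; helper is hypothetical).
SUPPORTED_FORMATS = {"csv": "CSV", "xml": "XML", "json": "JSON"}


def trait_endpoint(ext: Optional[str] = None) -> tuple[str, str]:
    """Return (path, data_format) for the v1 trait insertion endpoint.

    With no extension, the bare path is used and the server assumes JSON.
    """
    if ext is None:
        return "/api/v1/traits", "JSON"
    ext = ext.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return f"/api/v1/traits.{ext}", SUPPORTED_FORMATS[ext]


print(trait_endpoint("csv"))  # ('/api/v1/traits.csv', 'CSV')
print(trait_endpoint())       # ('/api/v1/traits', 'JSON')
```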
To see some valid sample files in all three of the supported formats, see Running the Examples below.
XML data files must validate against the schema specified by app/lib/api/validation/TraitData.xsd. It is possible to validate files on the command line with xmllint, passing the schema file with its --schema option.
Note that data files of other types (CSV and JSON) are converted to XML internally and then validated using this schema.
The root-level element is named trait-data-set.
trait elements appear below the root, either directly or nested within intervening trait-group elements.
A trait element must include a mean attribute with a value of type double. An exponent may be used; for example, in place of "0.0123" one may write "1.23E-2".
A trait element may have the following attributes:
An attribute utc_datetime, whose value has the form "YYYY-MM-DDZ" or "YYYY-MM-DDTHH:MM:SSZ", representing the date, or the date and time, at which the trait measurement was taken. The trailing "Z" indicates that the value is in UTC and is required. If a time of day is given, the letter "T" must separate the date from the time, and fractional seconds may be included.
A local_datetime attribute may be used in place of utc_datetime. This represents the date, or the date and time, of a trait measurement in local time. This attribute may only be used if the trait is associated with a site having a specified time zone. The value format is the same as for utc_datetime except that there is no trailing "Z".
An access_level attribute, which must be supplied unless it is provided through the defaulting mechanism (see below).
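The two date-time formats can be checked before submission. The following Python sketch is one way to do it, assuming that at most six fractional-second digits are needed (a limit imposed by strptime's %f directive, not by BETYdb). The function name parse_trait_datetime is hypothetical, and the server's own parser may be more or less lenient. The sketch does not convert local times to UTC, since that depends on the site's time zone stored in the database.

```python
from __future__ import annotations

import re
from datetime import datetime

# A date, optionally followed by "T", a time with seconds, and up to six
# fractional-second digits (a limit of this sketch, matching strptime's %f).
_BODY = r"\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d{1,6})?)?"
UTC_RE = re.compile(_BODY + "Z", re.ASCII)
LOCAL_RE = re.compile(_BODY, re.ASCII)


def parse_trait_datetime(value: str, utc: bool) -> datetime:
    """Check and parse a utc_datetime (utc=True) or local_datetime value.

    Hypothetical helper. Returns a naive datetime; raises ValueError if the
    value is badly formed.
    """
    if not (UTC_RE if utc else LOCAL_RE).fullmatch(value):
        raise ValueError(f"badly formed datetime: {value!r}")
    body = value[:-1] if utc else value  # drop the trailing "Z"
    if "T" not in body:
        return datetime.strptime(body, "%Y-%m-%d")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in body else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt)  # also rejects impossible dates
```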
A trait element may have the following child elements:
site: This may have any of the attributes id, city, state, country, and sitename. It must have enough of these attributes to uniquely identify an existing site. Within the database table sites, the values of sitename should be unique, and sitename is the preferred attribute for identifying a site. Unfortunately, uniqueness is not currently enforced, and there are in fact several cases of multiple sites sharing the same site name. In general, using the id attribute to specify a particular trait association is strongly discouraged and should be done only when necessary.
species: Allowed attributes: id, genus, species, scientificname, commonname, AcceptedSymbol. scientificname is the preferred attribute for identifying a species. Within the database table species, non-blank values of scientificname should be unique, but this constraint is not yet enforced. (scientificname is left blank in cases where a species row represents a category of plant; in this case, the commonname column is used to describe the category.)
If a particular cultivar of the species is intended, a child cultivar element should be included. This element may use either a name attribute (preferred) or an id attribute to identify the cultivar. Cultivar names are guaranteed to be unique within a given species.
citation: Allowed attributes: id, author, year, title, doi. The preferred way of selecting a citation is by doi or by author, year, and title (often just author and year will suffice).
treatment: Allowed attributes: id, name, control. The preferred way of selecting a treatment is by name. [In the process of implementation: A citation is required, either directly on the trait or as a default for a group of traits, if a treatment is to be specified. Moreover, the specified treatment must be associated with the specified citation. This will often make it possible to use the name attribute to specify a treatment, since only treatments associated with the given citation will be considered when selecting by name.]
variable: This specifies what the trait measures. This element must be included if the variable is not specified in a defaults element (see below). Allowed attributes: id, name, description. name is the preferred attribute for specifying the variable and should be unique, but this isn't yet enforced and there are a few cases of duplicates.
method: Allowed attributes: name. This element must have a citation child element. (This citation need not bear any relation to the citation associated with the trait.) Together, the name and the citation should uniquely determine which method is being used. [To do: Constrain the methods table to ensure that this is always possible.]
covariates: This element specifies which covariates are associated with a trait measurement. It allows no attributes but must contain one or more covariate child elements. Each covariate element must contain a variable element (specifying what the covariate measures) and must have a level attribute (specifying the value of that measurement).
entity: Allowed attributes: name and notes. An entity with the given values for name and notes will be created if no entity with the given name exists. [To be implemented: It is an error to specify an entity at the trait level having a blank name. It is an error to supply a notes attribute if name refers to an existing entity.] [To do: Guarantee uniqueness of non-blank names in the entities table.]
The eight elements just mentioned specify how the trait is associated with data in other tables. In addition, a trait may include two elements that further describe it:
stat: If a trait describes a group of measurements (as opposed to a single measurement), a stat element may be included. It must have the following three attributes:
sample_size, a positive integer.
name, the name of the statistic measured. Possible values are "SD", "SE", "MSE", "95%CI", "LSD", and "MSD".
value, a double giving the value of the named statistic.
notes: An element having no attributes but containing free-form textual content.
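To make the element structure concrete, the sketch below uses Python's standard xml.etree.ElementTree to build a one-trait document. The site name, cultivar name, variable name, access level, and numbers are placeholders rather than values known to exist in BETYdb, and the output has not been validated against TraitData.xsd (which may, for example, require a particular child-element order).

```python
import xml.etree.ElementTree as ET


def build_trait_data_set() -> str:
    """Build a minimal, hypothetical trait-data-set document as a string."""
    root = ET.Element("trait-data-set")
    trait = ET.SubElement(
        root,
        "trait",
        mean="1.23E-2",                      # exponent notation is allowed
        utc_datetime="2004-07-15T13:45:00Z",  # placeholder date and time
        access_level="2",                    # placeholder access level
    )
    # Associations, each identified by its preferred attribute (placeholder values).
    ET.SubElement(trait, "site", sitename="Example Site")
    species = ET.SubElement(trait, "species", scientificname="Sorghum bicolor")
    ET.SubElement(species, "cultivar", name="Example Cultivar")
    ET.SubElement(trait, "variable", name="example_variable")
    # A stat element must carry all three attributes.
    ET.SubElement(trait, "stat", sample_size="4", name="SE", value="0.002")
    notes = ET.SubElement(trait, "notes")
    notes.text = "Free-form notes about this measurement."
    return ET.tostring(root, encoding="unicode")


print(build_trait_data_set())
```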
Using a single entity for the whole data set.
If all of the traits in the data set should share the same entity, this can be specified by placing an entity element as the first child of the root trait-data-set element. The element has the same form as an entity element contained inside a trait element, except that this global entity is allowed to be anonymous, that is, to have no name or notes attribute. If a global entity element is used, it must be the only entity element in the document, and no trait-group elements may be used in the document (see below).
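The global-entity constraints lend themselves to a quick structural check. This hypothetical Python function (check_global_entity is our name) uses only the standard library and checks nothing beyond the three rules just stated; schema validation is still done by the server.

```python
import xml.etree.ElementTree as ET


def check_global_entity(doc: str) -> None:
    """Raise ValueError if a data-set-level entity breaks the rules above.

    Hypothetical sketch: if trait-data-set has an entity child, that entity
    must be the first child and the only entity element in the document,
    and no trait-group elements may appear.
    """
    root = ET.fromstring(doc)
    children = list(root)
    if not any(child.tag == "entity" for child in children):
        return  # no global entity, so these rules do not apply
    if children[0].tag != "entity":
        raise ValueError("a global entity must be the first child of trait-data-set")
    if len(root.findall(".//entity")) != 1:
        raise ValueError("a global entity must be the only entity element")
    if root.findall(".//trait-group"):
        raise ValueError("trait-group elements cannot be used with a global entity")
```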
Trait groups.
If a group of traits share a number of characteristics, they can be nested within a trait-group element. This is mainly useful in the following two cases:
Some (but not all) of the traits in the file should be associated with the same entity.
Some (but not all) of the traits in the file share the same metadata (site, citation, treatment, variable, date, species, etcetera).
Multiple levels of nesting may be used: trait-group elements may themselves contain trait-group elements.
Entities for trait groups.
If a trait-group element has no trait-group child element, then it may contain, as its first child element, an entity element. This usage is similar to the data-set entity usage described above, except that the entity will only be used for the traits in the trait group. If a trait group does use an entity element, then none of the traits in the trait group can have their own entity element.
Specifying metadata for sets of traits.
If many traits have a common citation, site, species, etcetera, it is possible to avoid repeating this information for each individual trait by using a defaults element. A defaults element may appear as a child of the trait-data-set element (in which case the defaults apply to all of the traits in the document) or as a child of a trait-group element (in which case they apply only to the traits within that group).
defaults elements have many of the same attributes and child elements as trait elements:
Allowed attributes are access_level, utc_datetime, and local_datetime. local_datetime is allowed only if a site having a time zone is specified in the defaults element or in a defaults element at a higher level, and if the specified site is not overridden at a lower level (see below).
Allowed child elements are site, species, citation, treatment, variable, and method.
As for the other attributes and elements used with trait elements: since the mean attribute and the stat, notes, and covariates elements are inherently trait-specific, they cannot be used with the defaults element. (entity elements applying to groups of traits are direct children of the trait-data-set element or a trait-group element rather than being nested within a defaults element.)
A default specified by a defaults element applies to all traits occurring within the parent of the defaults element unless overridden. A default may be overridden either by another defaults element appearing at a lower level or by the attributes and child elements of an individual trait.
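One way to picture the defaulting rules is a recursive walk that carries the current defaults down the tree and lets lower levels override them. The Python sketch below is illustrative only (effective_traits and own_settings are hypothetical names, and this is not necessarily how BETYdb implements it). It keeps only each associated element's attributes, so a nested cultivar is dropped. It also does not check the local_datetime time-zone condition or resolve a conflict between an inherited utc_datetime and a trait-level local_datetime.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Attributes and child elements that a defaults element may supply.
DEFAULT_ATTRS = ("access_level", "utc_datetime", "local_datetime")
DEFAULT_CHILDREN = ("site", "species", "citation", "treatment", "variable", "method")


def own_settings(elem: ET.Element) -> dict:
    """Collect the default-able attributes and child elements set on elem.

    Only each child's own attributes are kept (a nested cultivar is dropped).
    """
    settings = {a: elem.get(a) for a in DEFAULT_ATTRS if elem.get(a) is not None}
    for tag in DEFAULT_CHILDREN:
        child = elem.find(tag)
        if child is not None:
            settings[tag] = dict(child.attrib)
    return settings


def effective_traits(doc: str) -> list[dict]:
    """List each trait's settings after applying inherited defaults."""
    results = []

    def walk(container: ET.Element, inherited: dict) -> None:
        scope = dict(inherited)
        defaults = container.find("defaults")  # direct child only
        if defaults is not None:
            scope.update(own_settings(defaults))  # lower-level defaults win
        for child in container:
            if child.tag == "trait":
                settings = {**scope, **own_settings(child)}  # trait values win
                settings["mean"] = child.get("mean")
                results.append(settings)
            elif child.tag == "trait-group":
                walk(child, scope)

    walk(ET.fromstring(doc), {})
    return results
```

With a site named in the root defaults and a different site named in a trait-group's defaults, traits inside the group get the group's site and all other traits get the root's site.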
[To-do]
The format to use for CSV trait data files is largely the same as that required for the Bulk Upload wizard explained in the previous section. (See the templates traits.csv and traits_by_doi.csv.) Some significant differences from the bulk-upload case are:
The date of a trait measurement must be given in a column with one of the following headings.
If the heading "utc_datetime" is used, the supplied values must conform to one of the following formats: 1918-11-11T10:00:00Z or 1918-11-11Z. In particular, the time must be given in UTC (hence the "Z"), and if the time is specified (first format), the letter "T" must separate the date and time portions. If the time is specified, seconds must be included; optionally, fractional seconds may be included as well. The resulting dateloc value will always be 5 (exact date); the timeloc value will be 1 (time to the second) if a time is given and 9 (no data) otherwise.
If the heading "local_datetime" is used, the supplied values must conform to one of the following formats: 1918-11-11T11:00:00 or 1918-11-11. In particular, if the time is specified (first format), the letter "T" must separate the date and time portions. If the time is specified, seconds must be included; optionally, fractional seconds may be included as well. The resulting dateloc value will always be 5 (exact date); the timeloc value will be 1 (time to the second) if a time is given and 9 (no data) otherwise.
When "local_datetime" is used to specify the date-time value of a trait measurement, the date and time are assumed to be local (site) time if a site is given and if that site has a time zone value stored. Otherwise, the value given is assumed to be UTC time. (The date-time value is always stored in the database as UTC time. This paragraph has to do with how the supplied date-time value is interpreted when read.)
Note that only one of these two columns may occur in a CSV file; including both results in an error.
Metadata cannot be specified interactively. Thus any associated citation, site, species, cultivar, or treatment must be specified in each row of the CSV file. (This may change later so that repeated metadata specification can be avoided.)
Unlike the bulk upload case, matching of metadata entries is case sensitive. Thus, if the CSV file specifies the species as "Sorghum Bicolor" but the database entry for the species specifies the scientificname as "Sorghum bicolor", the upload will not be successful.
Unlike the bulk upload case, it is not necessary to specify an associated citation, site, species, or treatment. The sample file SIMPLE_CSV_TEST_DATA demonstrates the case where a trait value having no associated metadata is inserted.
It is required, however, to specify an access level for each trait; therefore, the CSV file must have a column named access_level.
When specifying the citation in a CSV file for use with the Bulk Upload wizard, it is necessary to have either a citation_doi column or all three of the columns citation_author, citation_year, and citation_title. When using the API, however, any combination of these may be used so long as the values specified in each row determine a unique citation. For example, if there is only one citation with author "Doe", and if that is the value that occurs in the citation_author column of every row of the table, then it is unnecessary to have a citation_year or citation_title column.
Just as for bulk uploads, the trait_covariate_associations table is consulted to determine which column names correspond to trait variables and which correspond to covariate variables, and further, which covariates correspond to which traits. But, unlike in bulk uploads, failing to specify a required covariate for one or more traits will not result in an error. (Thus, in essence, required covariates are treated just like optional covariates; they will be associated if present, but no complaint will be made if they are not.)
No rounding of floating-point values is done except to the extent required to fit within PostgreSQL's 8-byte float type. (Note that all floating-point values may be specified with an exponent; for example, 3.20E-2 in place of 0.0320.)
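The date-column rules in the list above (at most one of the two date headings, and fixed dateloc and timeloc values) can be mirrored in a client-side pre-check. This Python sketch uses hypothetical function names. The numeric codes come from the text; treating the date column as optional is an assumption, inferred from SIMPLE_CSV_TEST_DATA supplying only a variable, a value, and an access level.

```python
from __future__ import annotations

import re
from typing import Optional

# The two accepted date headings and the value pattern each one requires.
DATE_PATTERNS = {
    "utc_datetime": re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?)?Z"),
    "local_datetime": re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?)?"),
}


def date_heading(headings: list[str]) -> Optional[str]:
    """Return the date heading in a CSV header row, if any (hypothetical helper).

    Raises ValueError if both date headings are present.
    """
    found = [h for h in headings if h in DATE_PATTERNS]
    if len(found) > 1:
        raise ValueError("use only one of utc_datetime and local_datetime")
    return found[0] if found else None


def date_and_time_loc(heading: str, value: str) -> tuple[int, int]:
    """Return the (dateloc, timeloc) pair recorded for a date value."""
    if not DATE_PATTERNS[heading].fullmatch(value):
        raise ValueError(f"badly formed {heading} value: {value!r}")
    dateloc = 5                         # always "exact date"
    timeloc = 1 if "T" in value else 9  # "time to the second" or "no data"
    return dateloc, timeloc


print(date_and_time_loc("utc_datetime", "1918-11-11T10:00:00Z"))  # (5, 1)
print(date_and_time_loc("local_datetime", "1918-11-11"))          # (5, 9)
```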
As mentioned above, all trait insertion API calls generate an HTTP response. The response will use the same format as the format of the file submitted except in the case of CSV files, where the response is given in JSON format.
In the case of unsuccessful API calls, the response will contain information about the types of errors that caused the call to be unsuccessful. These errors can be classified as follows:
If an invalid API key is given, or if the given key is for a user who isn't authorized to perform the given action, an authorization error is returned. (To do: Distinguish between authentication and authorization.)
If a citation, site, species, or treatment is specified that doesn't match exactly one item in the database, a lookup error occurs. This causes the whole data set to be returned in tree form as annotated_post_data, with error items placed next to the data items that caused the errors.
As mentioned above, data files in CSV and JSON format are converted to XML format and then validated against an XML schema; a document that fails this check produces a validation error. For CSV files, since the structure of the XML document generated by the converter is generally correct, this error usually arises only when a data value of the wrong type is given (for example, an alphabetical string where a number is expected). But other situations can also trigger a validation error: for example, a sample size column (n) given without a standard error (SE) column, or vice versa.
Errors may also occur when attempting to save a Trait object to the database, for example if a variable value is found to be out of range or if a required attribute (e.g. access_level) is missing. As in the lookup error case, this causes the whole data set to be returned in tree form as annotated_post_data.
There are five sample data files in the directory app/lib/api/test.
SIMPLE_XML_TEST_DATA: A minimal XML data file consisting of a single trait. It provides only the (currently) required values: the trait variable name, the trait value, and an access level specifying who may view this data item.
SIMPLE_CSV_TEST_DATA: A minimal CSV data file consisting of a single trait. It provides only a single variable name and value, and the access level.
TEST_XML_DATA: A full-fledged XML data file making use of all of the features available for XML trait data insertion: specification of defaults for groups of traits, metadata lookup, and complete latitude for associating specific groups of covariates and specific sample statistics with specific traits.
TEST_JSON_DATA: This JSON data file is an exact analogue of TEST_XML_DATA; it should result in exactly the same trait data being inserted.
TEST_CSV_DATA: This CSV data file has five column headings corresponding to trait variables and two column headings corresponding to covariate variables. There is a single data row, so when this file is ingested, a single new entity will be created having five associated traits, and each trait will have two associated covariates. Complete metadata is given for the entity (or equivalently, for the traits it comprises).
You can upload the data in these files using curl. To try this out, start your Rails server locally with rails s and then, from the app/lib/api/test directory, run the curl command that posts TEST_XML_DATA to the XML endpoint.
(Substitute any of the other sample file names for TEST_XML_DATA as desired, changing the .xml extension to .json or .csv where appropriate. If a CSV file is being uploaded, add the option -H "Content-Type: text/csv" to the curl command.)
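For readers who prefer scripting the upload, the following Python sketch builds an equivalent POST request with the standard urllib.request module. Two details are assumptions rather than facts stated here: that the API key is passed as a key query parameter, and that the local server listens on port 3000 (the usual rails s default). As in the curl instructions, only CSV uploads set an explicit Content-Type.

```python
from __future__ import annotations

import urllib.parse
import urllib.request


def build_upload_request(data: bytes, endpoint_ext: str, api_key: str,
                         base_url: str = "http://localhost:3000") -> urllib.request.Request:
    """Build (without sending) a POST request carrying a trait data file.

    Hypothetical helper. ASSUMPTIONS: the API key goes in a "key" query
    parameter, and the local server listens on port 3000.
    """
    if endpoint_ext not in ("xml", "json", "csv"):
        raise ValueError(f"unsupported endpoint extension: {endpoint_ext!r}")
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{base_url}/api/v1/traits.{endpoint_ext}?{query}"
    # Mirror the curl instructions: only CSV uploads set an explicit type.
    headers = {"Content-Type": "text/csv"} if endpoint_ext == "csv" else {}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Usage against a running server (the file path and key are placeholders):
# from pathlib import Path
# req = build_upload_request(Path("TEST_XML_DATA").read_bytes(), "xml", "YOUR_KEY")
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode())
```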
These API calls all generate a response (in XML format for the XML endpoint and in JSON format for the JSON and CSV endpoints). If the call is successful, the response will contain a list of the ids of the new traits that were inserted. Note that new entities and possibly new covariates will also be inserted, but information about these is not (currently) contained in the response.
It's too easy to make a mistake without realizing it.
Examples:
a. If you misspell a trait variable name in the heading, that column will simply be ignored; no error will occur as long as there is at least one valid trait variable in the heading.
b. If you include the same heading twice, the values in one column will overwrite those in the other.
Some error messages are obscure and seemingly unrelated to the error that triggered them.
These errors should really be detected during CSV file parsing before attempting to convert to a valid XML file.
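Checks of the kind the last point calls for can be run on the heading row before upload. The Python sketch below flags duplicate and unrecognized headings and shows how covariate columns would be matched to trait columns. The association table, variable names, and set of accepted non-variable headings are all hypothetical placeholders; in BETYdb the real associations live in the trait_covariate_associations table.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical excerpt of trait_covariate_associations:
# trait variable -> {covariate variable: required?}
ASSOCIATIONS = {
    "trait_a": {"covariate_x": False, "covariate_y": True},
    "trait_b": {"covariate_y": True},
}
# Hypothetical set of non-variable headings accepted in a trait CSV file.
OTHER_HEADINGS = {"access_level", "utc_datetime", "local_datetime", "site",
                  "species", "cultivar", "treatment", "citation_author",
                  "citation_year", "citation_title", "citation_doi",
                  "n", "SE", "notes"}


def lint_headings(headings: list[str]) -> list[str]:
    """Warn about duplicate headings and headings the server would ignore."""
    covariates = {c for covs in ASSOCIATIONS.values() for c in covs}
    known = OTHER_HEADINGS | set(ASSOCIATIONS) | covariates
    warnings = [f"duplicate heading: {h!r}"
                for h, count in Counter(headings).items() if count > 1]
    warnings += [f"unrecognized heading: {h!r}"
                 for h in dict.fromkeys(headings) if h not in known]
    return warnings


def associate_covariates(headings: list[str]) -> dict[str, list[str]]:
    """Map each trait column present to the covariate columns present.

    Like the API, this does not complain when a required covariate is missing.
    """
    present = set(headings)
    return {trait: [c for c in covs if c in present]
            for trait, covs in ASSOCIATIONS.items() if trait in present}


header = ["trait_a", "trail_b", "covariate_x", "trait_a", "access_level"]
print(lint_headings(header))
print(associate_covariates(header))  # {'trait_a': ['covariate_x']}
```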
There are three phases for a basic bulk upload of data:
Use the web interface
to enter metadata pertaining to your data set (new sites, species, cultivars, citations, or treatments);
to obtain a template appropriate for your data set.
Fill in the template with your data. There are four templates to choose from:
yields.csv — Use this template if you are uploading yield data and you wish to specify the citation in the file by author, year, and title.
If your data includes standard error and cultivar information and you do not plan to specify any of the required information interactively, you will be able to use this template “as-is”. Otherwise, you will need to delete one or more columns:
If your data has no standard error information, delete both the SE and the n columns.
If your data set has a single uniform value for the site, species, cultivar, treatment, access_level, or date, then these values may be entered interactively through the web interface; in this case you should delete the corresponding column(s) from the template.
Note that cultivar information can’t be specified interactively unless species information is as well; delete the cultivar column if and only if you either have no cultivar information or you are specifying both the species and the cultivar interactively.
yields_by_doi.csv — Use this template if you are uploading yield data and you wish to specify the citation in the file by doi.
Again, if you do not have data for all of the columns listed in the template, or if you plan to specify some of the data interactively, you will have to delete one or more columns.
You may also use this template if all of the data in your data set pertains to a single citation and you wish to specify that citation interactively. In this case, you must delete the citation_doi column.
traits.csv — Use this template if you are uploading trait data and you wish to specify the citation in the file by author, year, and title.
This template must be modified before it can be used. In particular, the column headings [trait variable 1]…[trait variable n] must be replaced by actual variable names that exactly match names of variables in the database that have been marked to be recognized as trait variables. The number of these trait variable columns may need to be increased or decreased to accommodate the data set.
Some trait variables allow or even require corresponding covariate information to be included. Again, the column headings [covariate 1]…[covariate n] must be changed to actual covariate variable names, and the number of these columns may need to be increased or decreased to match the available information. As with the yield data templates, some columns may also need to be deleted. For a list of recognized trait variable names and their corresponding required and optional covariates, visit the trait variable/covariates list at www.betydb.org. [TO-DO: Make this Web page.]
traits_by_doi.csv — As with the corresponding yield data template, use this template if you are uploading trait data and you wish to specify the citation in the file by doi, or if you plan to specify the citation interactively (in which case delete the citation_doi column). Again, this template must be modified before it can be used.
Use the web interface to upload your data set and insert it into the database.
In what follows, the term “field” always refers either to a column name used in the heading of the uploaded CSV file or to an entry in that column in some particular row of the file. On the other hand, the term “column” may refer either to a column of data in the uploaded CSV file or to an attribute of a trait or yield datum in the traits or yields table of the database.
Example of a template for bulk upload of yield data:
For yield uploads, the only required field is a yield column.
For trait uploads, there must be at least one column whose label exactly matches the variable name for the trait value being specified. (Leading and trailing spaces are permitted, but letter case must exactly match the name of the variable specified in the database.) If this trait variable has any required covariates, columns for these covariates must be included.
Data values may be specified interactively only if there is a single value that pertains to the whole data set.
Information that references existing database entries
Citation
If only one citation for the entire dataset exists, it may be specified interactively by choosing a citation on the citations page instead of including citation information in the CSV file.
Otherwise, specify the citation in the CSV file, either by doi or by author, year, and title.
If a DOI is available for all citations in the data set, the citation corresponding to each row may be specified in a citation_doi column. In this case, the citation_author, citation_year, and citation_title columns must not be in the column heading list. (If such information is already included in the data set, these columns can be kept for purely informational purposes by appending the string -ignore to each of their headings. One might want to do this, for example, to keep a visual record of the author, year, and title even when it is the citation doi that is being used to determine how the data will be associated with a citation in the database.) Each value in the citation_doi column must exactly match the doi attribute of some row in the citations table, except that letter case and leading and trailing spaces are ignored.
Conversely, if a DOI is not available for all citations in the data set, or if it is preferred to specify the citation by author, year, and title, then the citation_doi column should not be included and the columns citation_author, citation_year, and citation_title must all be present. (Again, if some DOI information is already included and you wish to retain it for purely informational purposes, simply give the column some descriptive name other than citation_doi and it will be ignored by the upload code.)
Site
If all of the data in the data set pertains to a single site, that site may be specified interactively.
Otherwise, a site column is required. The value must match an existing sitename column value in the sites table of the database. (Letter case, leading and trailing spaces, and extra internal spaces are ignored when searching for a match.)
Species
If all of the data in the data set pertains to a single species, that species may be specified interactively.
Otherwise, the species column is required. The value must match an existing scientificname column value in the species table of the database. (Again, letter case, leading and trailing spaces, and extra internal spaces are ignored when searching for a match.)
Treatment
If a single treatment and a single citation applies to all of the data in the data set, the treatment may be specified interactively provided that the citation is specified interactively as well.
Otherwise, a treatment column is required. The value must match an existing name column value in the treatments table of the database; moreover, this matching treatment must be consistent with the specified citation. (Again, letter case, leading and trailing spaces, and extra internal spaces are ignored when searching for a match.)
Other information that may be specified interactively
Date
If a single date applies to all of the data in the data set, the date may be specified interactively.
Otherwise, a date column is required.
Date values must be in the form YYYY-MM-DD. For example, July 25, 2003 must be entered as “2003-07-25”. (Eventually, month and day may become optional, in which case any of the forms “2003-07-25”, “2003-07”, and “2003” would represent dates of varying degrees of specificity. Note that uploading dateloc, time, and timeloc information is not supported.)
Rounding
The amount of rounding for numerical data can only be specified interactively. Any value from 1 to 4 significant digits may be chosen. The amount of rounding for the standard error SE (if present) may be specified separately from the amount of rounding for yield and for trait variables and their covariates.
By default, all numerical data is rounded to three significant digits. For example, with this default in place, 999.1 will be rounded to 999 and 1001.1 will be rounded to 1000.
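Significant-digit rounding as described can be expressed in a few lines. This Python sketch (round_sig is a hypothetical name) reproduces the two examples above. Python's round() breaks exact ties toward the even digit, and the text does not say how the web interface handles ties.

```python
import math


def round_sig(x: float, digits: int = 3) -> float:
    """Round x to `digits` significant digits (1 to 4, as in the upload form).

    Hypothetical helper; Python's round() breaks exact ties toward even.
    """
    if not 1 <= digits <= 4:
        raise ValueError("digits must be between 1 and 4")
    if x == 0:
        return 0.0
    # Decimal position of the leading digit: 2 for 999.1, 3 for 1001.1.
    leading = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - leading)


print(round_sig(999.1))   # 999.0
print(round_sig(1001.1))  # 1000.0
```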
Data for Yields
Yield
Every yield data upload file must have a yield column. The data in this column must always be a parsable non-negative number and must never be blank. Scientific notation is not currently supported. As noted above, the number given in the file is subject to rounding before being inserted into the database.
Sample Size
An n column is required if and only if an SE column is included. The value must always be an integer greater than 1.
Standard Error
An SE column is required if and only if an n column is included; this datum will be inserted into the stat column of the yields table, and the statname column value will be set to “SE”.
Data for Traits
Trait variable values
Every trait data upload file must have at least one column whose heading matches the name of some recognized trait variable. A list of recognized trait variables is available on the BETYdb web site. If multiple trait variable columns are used, each row in the CSV file will produce one row in the traits table for each trait variable column. (These resulting rows will be effectively grouped by assigning them a unique entity id. Said another way, there is a one-to-one correspondence between rows in the CSV file and resultant rows in the entities table, the table that keeps track of this grouping.) As with yield numbers, the data in these columns must always be a parsable number and is subject to rounding before being inserted into the database. In addition, it must conform to any range restrictions on the value of the variable.
The template for traits uploads provides dummy column headings [trait variable 1], [trait variable 2], etc., which must be changed to actual variable names before data can be uploaded.
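The row-to-trait expansion can be sketched as follows. The Python function below is hypothetical: the variable names are placeholders, entity numbers are a local counter rather than database ids, and rounding and range checks are omitted.

```python
from __future__ import annotations

import csv
import io


def expand_rows(csv_text: str, trait_variables: set[str]) -> list[dict]:
    """Turn each CSV row into one trait record per trait-variable column.

    Records from the same row share an entity number (a local counter here,
    not a database id), mirroring the one-to-one correspondence between CSV
    rows and rows of the entities table.
    """
    records = []
    for entity, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for heading, value in row.items():
            variable = heading.strip()  # leading/trailing spaces are permitted
            if variable in trait_variables:
                records.append({"entity": entity, "variable": variable,
                                "mean": float(value)})
    return records


sample = "trait_a,trait_b,site\n1.5,2.0,Site A\n3.0,4.5,Site B\n"
for record in expand_rows(sample, {"trait_a", "trait_b"}):
    print(record)
```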
Covariate values
If any of the included trait variables has a required covariate, there must be a column corresponding to that covariate.
For any of the included trait variables that has an optional covariate, a column corresponding to that covariate may be included.
The template for traits uploads provides dummy column headings [covariate 1], [covariate 2], etc., which must be changed to actual variable names before data can be uploaded.
Sample Size and Standard Error
An SE column is required if and only if an n column is included; this datum will be inserted into the stat column of the traits table, and the statname column value will be set to “SE”. Note that if you have more than one trait variable column, each trait will get the same values of n and SE. There is currently no way to use different sample size and standard error values for different trait variables. Also, the n and SE values for any associated covariates will be set to NULL. (Eventually, we may enable associating differing values of n and SE with different trait variables and covariates. In this case, we might add columns [trait variable 1] n and [trait variable 1] SE, etc., or [covariate 1] n and [covariate 1] SE, prefixing the usual column heading with a variable name to indicate which variable the sample size and standard error value is to be associated with.)
Again, values of n must be at least 2, and columns for n and SE must both be present or both be absent.
Sample Size and Standard Error
As noted above, these are both optional, but if one of these is included, the other must be included as well. In other words, the column heading list must include both n and SE (or, in the case of traits, [trait or covariate variable k] n and [trait or covariate variable k] SE) or neither. Note that if n and SE are not fields of the uploaded CSV file, the value of the n column of the traits or yields table will default to 1, and the stat and statname column values will default to NULL.
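The n and SE rules, together with the defaults that apply when both columns are absent, can be sketched as a per-row helper. The Python function stat_fields is hypothetical; it covers only the plain n and SE headings (not the possible future per-variable forms) and does not round SE.

```python
from __future__ import annotations


def stat_fields(headings: list[str], row: dict) -> dict:
    """Return the n, stat, and statname values stored for one CSV row.

    Hypothetical helper. Raises ValueError if n/SE break the rules: both
    columns or neither, and n must be an integer of at least 2.
    """
    has_n, has_se = "n" in headings, "SE" in headings
    if has_n != has_se:
        raise ValueError("n and SE columns must both be present or both be absent")
    if not has_n:
        # Defaults used when neither column is given.
        return {"n": 1, "stat": None, "statname": None}
    try:
        n = int(row["n"])
    except ValueError:
        raise ValueError(f"n must be an integer, got {row['n']!r}") from None
    if n < 2:
        raise ValueError(f"n must be at least 2, got {n}")
    return {"n": n, "stat": float(row["SE"]), "statname": "SE"}
```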
Cultivar
If a uniform value for the species is provided interactively when uploading the data set, the cultivar may be specified this way as well, provided that it also has a uniform value for the whole data set.
Otherwise, to include cultivar information in the upload file, both a species and a cultivar column must be included. The values in the cultivar column are allowed to be blank (in which case a value of NULL is inserted into the cultivar_id column for the given row); but if provided, the value must match the value of the name column in some row of the cultivars table, and moreover, this row must be associated with the species corresponding to the value given in the species column. Again, matching is case insensitive, and leading, trailing, and excess internal whitespace is ignored.
Notes
To include notes, use a notes column. There is no restriction on what can be included in this column, but leading and trailing whitespace will be stripped before insertion into the database. Non-ASCII characters entered in the file in UTF-8 encoding are allowed. If there is no notes column, each row inserted into the traits or yields table will use the empty string as the value for the notes column.
In general, a 'trait' is a phenotype (a characteristic that the plant exhibits). The traits that we are primarily interested in collecting data for are listed in the table “Key Traits Stored in BETYdb”. Before adding trait data, it is necessary to have the citation, treatment, and site information already entered. If the correct citation is not identified at the top of the page. To add a new Trait, go to the page Trait → new.
Key Traits Stored in BETYdb
Presently, we are also using the Trait table to record ecosystem level measurements other than Yield. Such ecosystem level measurements can include leaf area index or net primary productivity, but are only collected when required for a particular project. Most of the fields in the Traits table are also used in the Yields table. Here is a list of the fields with a brief description, followed by more thorough explanations:
Species: search for the species in the database using the search box; if the species is not found, it should be added to the database.
Cultivar: primarily used for crops. If the cultivar being used is not found in the drop-down box […]
DateLOC: date level of confidence. See the DateLOC table for values.
TimeLOC: time level of confidence. See the TimeLOC table for values.
Mean: for yield, the mean is in units of metric tons per hectare per year (t ha^-1 y^-1).
Stat name: the name of the statistical method used (usually one of SE, SD, MSE, CI, LSD, HSD, MSD). See the discussion of statistics below for more details.
Statistic: the value of the statistic associated with Stat name.
N: always record N if it is provided. N is the number of experimental replicates, often referred to as the sample size. It represents the number of independent units within each treatment. In a field setting this is often the number of plots in each treatment; in a greenhouse, growth chamber, or pot study it may be the number of chambers, pots, or individual plants. Sometimes this value is not clearly stated.
The date level of confidence (DateLOC; see the DateLOC table) indicates how accurately the date associated with the trait or yield observation is known. The table gives the values that should be entered in this field. If the date is known to a level of precision not defined by an integer in the table, use a fraction; for example, we commonly use 5.5 to indicate a one-week level of precision. If the exact year is not known but the time of year is, use 91 to 97, with the second digit indicating the information known within the year (for example, 96 for an unknown year but known month).
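The DateLOC rules can be expressed as a small lookup. This sketch encodes only the values shown in the DateLOC table plus the 5.5 one-week convention. It assumes that each unknown-year code is 90 plus the within-year code (95, 96, 97 for day, month, season), which follows from the rule that the second digit carries the within-year information. `dateloc` is a hypothetical helper.

```python
# DateLOC codes when the year is known, taken from the DateLOC table;
# 5.5 is the fractional value the text uses for one-week precision.
DATELOC_KNOWN_YEAR = {"day": 5, "week": 5.5, "month": 6, "season": 7, "year": 8}

# Within-year precisions for which an unknown-year code (90 + code) is assumed.
UNKNOWN_YEAR_OK = {"day", "month", "season"}


def dateloc(precision, year_known=True):
    """Return the DateLOC value for a date known to `precision` (None = no date)."""
    if precision is None:
        return 9  # no data
    if year_known:
        return DATELOC_KNOWN_YEAR[precision]
    if precision not in UNKNOWN_YEAR_OK:
        raise ValueError(f"no unknown-year DateLOC assumed for {precision!r}")
    # Unknown year, known time of year: the second digit is the within-year code.
    return 90 + DATELOC_KNOWN_YEAR[precision]
```

For example, `dateloc("week")` gives 5.5 and `dateloc("month", year_known=False)` gives 96.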
Table: Date level of confidence (DateLOC) field. Numbering convention for the DateLOC (date level of confidence) and TimeLOC (time level of confidence) fields, used in the managements, traits, and yields tables.
The time level of confidence (TimeLOC) indicates how accurately the time associated with the trait or yield observation is known. The TimeLOC table gives the values that should be entered in this field.
Where available, direct estimates of variance are preferred. These include Standard Error (SE), sample Standard Deviation (SD), and Mean Squared Error (MSE). SE is usually presented in the format mean (±SE), and MSE is usually presented in a table. When extracting SE or SD from a figure, measure from the mean to the upper or lower end of the error bar. This differs from confidence intervals and range statistics (described below), for which the entire range is collected.
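The conversions below are not part of the entry protocol: the statistic is recorded as reported, together with its Stat name. They only illustrate how SD, SE, and MSE relate to one another, which is why any of them is useful. The function names are hypothetical. `se_from_mse` assumes that MSE is the pooled residual variance from an ANOVA with n replicates per treatment.

```python
import math


def se_from_sd(sd, n):
    """Standard error of the mean from a sample standard deviation and N replicates."""
    return sd / math.sqrt(n)


def sd_from_se(se, n):
    """Sample standard deviation from a standard error and N replicates."""
    return se * math.sqrt(n)


def se_from_mse(mse, n):
    """SE of a treatment mean, assuming MSE is the pooled residual variance: sqrt(MSE / n)."""
    return math.sqrt(mse / n)


def se_from_error_bar(mean, bar_end):
    """SE or SD read off a figure: distance from the mean to one end of the error bar."""
    return abs(bar_end - mean)
```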
If none of MSE, SD, or SE is provided, LSD, MSD, HSD, or CI may be provided instead. These are range statistics. The most frequently found are the 95% Confidence Interval (95% CI), Fisher's Least Significant Difference (LSD), Tukey's Honestly Significant Difference (HSD), and the Minimum Significant Difference (MSD). Fundamentally, these methods calculate a range that indicates whether two means differ, and each penalizes multiple comparisons in a different way. The important point is that these are ranges, and we record the entire range.
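A range statistic still constrains the variance, because an approximate SE can be backed out from a 95% CI or from Fisher's LSD. The sketch below does this under several assumptions. It uses the normal critical value (about 1.96) in place of Student's t, so it overstates precision when there are few replicates. It also assumes alpha = 0.05 and equal replication across treatments. It does not handle Tukey's HSD, whose critical value depends on the number of means compared. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96); a
# large-sample stand-in for the Student t critical value.
Z_975 = NormalDist().inv_cdf(0.975)


def se_from_ci(lower, upper):
    """Approximate SE from the full width of a 95% confidence interval."""
    return (upper - lower) / (2 * Z_975)


def se_from_lsd(lsd):
    """Approximate SE of a treatment mean from Fisher's LSD at alpha = 0.05.

    With equal replication, LSD = crit * sqrt(2 * MSE / n) = crit * sqrt(2) * SE.
    """
    return lsd / (Z_975 * math.sqrt(2))
```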
Another type of statistic is a “test statistic”; most frequently there will be an F-value that can be useful, but this should not be recorded if MSE is available. Only if there is no other information available should you record the P-value.
The protocol for entering yield data is identical to entering data for a trait, with a few exceptions:
There are no covariates associated with yield data
Yield data is always the dry harvestable biomass; if necessary, moisture content can be added as a trait
Yield is equivalent to aboveground biomass on a per-area basis, and has units of Mg ha^-1 y^-1
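The unit bookkeeping for yields can be sketched as follows. Since 1 t = 1 Mg, a yield in metric tons per hectare per year has the same numeric value in Mg ha^-1 y^-1, and 1 g m^-2 equals 0.01 Mg ha^-1. The helper names are hypothetical. `dry_from_fresh` assumes that moisture content is a fraction on a wet (fresh-mass) basis. `annual_yield` assumes that the harvest represents `years` of growth.

```python
G_PER_MG = 1_000_000  # grams per megagram (metric ton)
M2_PER_HA = 10_000    # square meters per hectare


def g_m2_to_mg_ha(biomass_g_m2):
    """Convert areal biomass from g m^-2 to Mg ha^-1 (1 g m^-2 = 0.01 Mg ha^-1)."""
    return biomass_g_m2 * M2_PER_HA / G_PER_MG


def dry_from_fresh(fresh_mass, moisture_fraction):
    """Dry mass from fresh mass, assuming moisture is a wet-basis fraction in [0, 1)."""
    if not 0 <= moisture_fraction < 1:
        raise ValueError("moisture_fraction must be in [0, 1)")
    return fresh_mass * (1 - moisture_fraction)


def annual_yield(dry_biomass_g_m2, years=1.0):
    """Yield in Mg ha^-1 y^-1 for dry harvestable biomass accumulated over `years`."""
    return g_m2_to_mg_ha(dry_biomass_g_m2) / years
```

For example, 2000 g m^-2 of fresh biomass at 25% moisture is 1500 g m^-2 dry, or 15 Mg ha^-1 y^-1 for a single year's harvest.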
Covariates are required for many of the traits. Covariates generally indicate the environmental conditions under which a measurement was made. Without covariate information, the trait data will have limited value.
A complete list of required covariates can be found in the Traits with required covariates table. For all respiration rates and photosynthetic parameters, temperature is recorded as a covariate. Soil moisture, humidity, and other such variables measured at the time of the measurement may be required in order to standardize across studies.
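A check for required covariates can be sketched as a lookup from a trait to groups of acceptable covariates. The dictionary below transcribes only two rows of the Traits with required covariates table. The covariate names (leafT, airT, rootT, soilT) are copied from that table and are assumptions about how the corresponding variables are named in a given BETYdb instance. `REQUIRED_COVARIATES` and `missing_covariates` are hypothetical names, not part of BETYdb.

```python
# Two rows transcribed from the "Traits with required covariates" table;
# each inner set lists interchangeable alternatives (e.g. leafT or airT).
REQUIRED_COVARIATES = {
    "vcmax": [{"leafT", "airT"}],
    "root_respiration_rate": [{"rootT", "soilT"}],
}


def missing_covariates(trait, covariates):
    """Return the requirement groups that none of the recorded covariate names satisfy."""
    recorded = set(covariates)
    return [group for group in REQUIRED_COVARIATES.get(trait, [])
            if not group & recorded]
```

Traits not listed in the dictionary are treated as having no required covariates, so an empty result means "nothing missing as far as this partial table knows."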
When root data is recorded, the root size class needs to be entered as a covariate. The term 'fine root' often refers to the <2 mm size class; in this case the covariate root_maximum_diameter would be set to 2. If the size class is a range, root_minimum_diameter can also be used.
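Turning a size-class label from a paper into these covariates can be sketched as below. The parser is hypothetical and handles only labels of the form "<2 mm", "2-5 mm", or "fine root". Mapping "fine root" to <2 mm follows the usual convention described above, but some studies define fine roots differently, so check the source publication.

```python
import re

_NUM = r"(\d+(?:\.\d+)?)"


def root_size_covariates(size_class):
    """Map a size-class label such as "<2 mm" or "2-5 mm" to covariate values in mm."""
    text = size_class.strip().lower()
    if text == "fine root":
        text = "<2 mm"  # conventional fine-root cutoff; verify against the paper
    upper_only = re.fullmatch(r"<\s*" + _NUM + r"\s*mm", text)
    if upper_only:
        return {"root_maximum_diameter": float(upper_only.group(1))}
    span = re.fullmatch(_NUM + r"\s*-\s*" + _NUM + r"\s*mm", text)
    if span:
        return {"root_minimum_diameter": float(span.group(1)),
                "root_maximum_diameter": float(span.group(2))}
    raise ValueError(f"unrecognized root size class: {size_class!r}")
```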
Table: Traits with required covariates. A list of traits and the covariates that must be recorded along with the trait value so that it can be converted to a common scale across studies. Notes: stomatal conductance (gs) is only useful when reported in conjunction with other photosynthetic data, such as Amax. Specifically, if we have Amax and gs, then the estimate of Vcmax covaries only with dark_respiration_factor and atmospheric CO2 concentration.
In that case, we also have information to help constrain stomatal_slope. If we have Amax but not gs, then our estimate of Vcmax will covary with dark_respiration_factor, CO2, stomatal_slope, cuticular_conductance, and vapor-pressure deficit (VPD). VPD is more difficult to estimate than CO2, but still possible given latitude, longitude, and date. Most important, there will be a strong covariance between Vcmax and stomatal_slope.
Our goal is to record statistics that can be used to estimate standard deviation or standard error. Many different methods can be used to summarize data, and this is reflected in the diversity of statistics that are reported. An overview of these methods is given below.
| Variable | Units | Median (90% CI) or Range | Definition |
|---|---|---|---|
| Vcmax | μmol CO2 m^-2 s^-1 | 44 (12, 125) | maximum rubisco carboxylation capacity |
| SLA | m^2 kg^-1 | 15 (4, 27) | Specific Leaf Area: area of leaf per unit mass of leaf |
| LMA | kg m^-2 | 0.09 (0.03, 0.33) | Leaf Mass per Area (LMA = SLM = 1/SLA): mass of leaf per unit area of leaf |
| leafN | % | 2.2 (0.8, 17) | leaf percent nitrogen |
| c2n leaf | | 39 (21, 79) | leaf C:N ratio; use only if leafN is not provided |
| leaf turnover rate | 1/year | 0.28 (0.03, 1.0) | |
| Jmax | μmol photons m^-2 s^-1 | 121 (30, 262) | maximum rate of electron transport |
| stomatal slope | | 9 (1, 20) | |
| GS | | | stomatal conductance (= gs_max) |
| q* | | 0.2–5 | ratio of fine root to leaf biomass |
| *grasses | | | ratio of root:leaf = belowground:aboveground biomass |
| aboveground biomass | g m^-2 or g plant^-1 | | |
| root biomass | g m^-2 or g plant^-1 | | |
| *trees | | | ratio of fine root:leaf biomass |
| leaf biomass | g m^-2 or g plant^-1 | | |
| fine root biomass (<2 mm) | g m^-2 or g plant^-1 | | |
| root turnover rate | 1/year | 0.1–10 | rate of fine root loss per year (temperature dependent) |
| leaf width | mm | 22 (5, 102) | |
| growth respiration factor | % | 0–1 | proportion of daily carbon gain lost to growth respiration |
| R_dark | μmol CO2 m^-2 s^-1 | | dark respiration |
| quantum efficiency | % | 0–1 | efficiency of light conversion to carbon fixation; see Farquhar model |
| dark respiration factor | % | 0–1 | converts Vm to leaf respiration |
| seedling mortality | % | 0–1 | proportion of seedlings that die |
| r fraction | % | 0–1 | fraction of storage to seed reproduction |
| root respiration rate* | CO2 kg^-1 fine roots s^-1 | 1–100 | rate of fine root respiration at reference soil temperature |
| f labile | % | 0–1 | fraction of litter that goes into the labile carbon pool |
| water conductance | | | |
| DateLOC | Definition |
|---|---|
| 9 | no data |
| 8 | year |
| 7 | season |
| 6 | month |
| 5 | day |
| 95 | unknown year, known day |
| 96 | unknown year, known month |
| ... | etc. |
| TimeLOC | Definition |
|---|---|
| 9 | no data |
| 4 | time of day, e.g. morning, afternoon |
| 3 | hour |
| 2 | minute |
| 1 | second |
| Variable | Required Covariates | Optional Covariates |
|---|---|---|
| vcmax | temperature (leafT or airT) | irradiance |
| any leaf measurement | canopy_height or canopy_layer | |
| root_respiration_rate | temperature (rootT or soilT) | soil moisture |
| root_diameter_max | root size class (usually 2 mm) | |
| any respiration | temperature | |
| root biomass | min. size cutoff, max. size cutoff | |
| root, soil | depth (cm) | used for max and min depths of soil; if only one value, assume min depth = 0; negative values indicate above ground |
| gs (stomatal conductance) | Amax | see notes in caption |
| stomatal_slope (m) | humidity, temperature | specific humidity; assume leaf T = air T |
| SLA | canopy_level | |