library(traits.build)
source("R/custom_R_code.R")
19 Tutorial 3: Adding contexts and complex units
19.1 Overview
This is the third of five tutorials on adding datasets to your traits.build database.
Before you begin this tutorial, ensure you have installed traits.build, cloned the traits.build-template repository, and have successfully built a database from the datasets in traits.build-template. Instructions are available at Tutorial: Example compilation.
Goals
Learn how to add contexts.
Learn some complexities with respect to units.
Learn additional custom_R_code tricks.
New functions introduced
- metadata_add_contexts
19.2 Adding tutorial_dataset_3
Ensure the dataset folder contains the correct data files
In the traits.build-template repository, there is a folder titled tutorial_dataset_3 within the data folder.
Ensure that this folder exists on your computer.
- The file data.csv exists within the tutorial_dataset_3 folder.
- There is a folder raw nested within the tutorial_dataset_3 folder, that contains one file, notes.txt.
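Source necessary functions
- If you have restarted RStudio since last adding a dataset, ensure all functions are loaded from both the traits.build package and the custom functions file, using the library(traits.build) and source("R/custom_R_code.R") calls shown at the top of this tutorial.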
Use functions to create a metadata.yml file
Create a metadata template
To create the metadata template, run:
metadata_create_template("tutorial_dataset_3")
As in the previous tutorials, this function leads you through a series of menus requiring user input. Ensure you select:
data format: wide
taxon_name column: 1: Species
location_name column: 5: site
individual_id column: 1: NA
collection_date column: 1: NA
Enter collection_date range in format ‘2007/2009’: 2011-02/2011-03
Do all traits need repeat_measurements_id’s? 2: No
In this dataset, unlike the first two, the data being input is at the individual level. Since there is only a single data row for each individual, it is not required to map in an individual_id. A column with an individual_id is required only if you want to keep track of multiple rows of data for the same individual.
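If you are unsure whether a dataset has more than one row per individual, a quick check along the following lines may help. This is only a sketch: the rows are made up, and it assumes (hypothetically) that each Species x Treatment x Replicate combination identifies one individual, which the tutorial does not state.

```r
# Hypothetical rows using the Species, Treatment and Replicate column names from data.csv
measurements <- data.frame(
  Species   = c("sp_a", "sp_a", "sp_b", "sp_b"),
  Treatment = c("Drought", "Watered", "Drought", "Watered"),
  Replicate = c(1, 1, 1, 1)
)

# Assumption: one Species x Treatment x Replicate combination is one individual
keys <- measurements[, c("Species", "Treatment", "Replicate")]

# anyDuplicated() returns 0 when no combination occurs on more than one row
has_repeated_individuals <- anyDuplicated(keys) > 0

print(has_repeated_individuals)
```

If the result were TRUE, the same individual would appear on several rows, which is the situation where mapping in an individual_id becomes necessary.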
Navigate to the dataset’s folder and open the metadata.yml file in Visual Studio Code, to ensure information is added to the expected sections as you work through the tutorial.
Propagate source information into the metadata.yml file
This dataset is from a published source and therefore the source information can be added with the function metadata_add_source_doi:
metadata_add_source_doi(dataset_id = "tutorial_dataset_3",
doi = "10.1007/s11104-013-1725-x")
Confirm:
- the authors’ names are formatted as first name last name or first initial last name
- the article title is in sentence case
- the page numbers are filled in as a range, separated by a double dash
The three DOIs you have added across these tutorials all yield perfect reference information, and indeed most references are added correctly. However, for some journals and for many older references the returned fields are in ALL CAPS or are missing page numbers, so it is worth checking.
Add location details
All data for this dataset was collected at a single location, specified in the data.csv file as The University of Melbourne Burnley campus. No additional details are provided. For such studies, it is best to look up the campus location and input approximate latitude/longitude coordinates.
As well as adding locations and location properties from a table, the function metadata_add_locations lets you add a basic location data scaffold in metadata.yml.
For instance, for this study:
- you add the location names from the data.csv file
- the function automatically adds blank fields for latitude, longitude, and description
- values for these fields must then be filled in manually
data <- read_csv("data/tutorial_dataset_3/data.csv")
metadata_add_locations("tutorial_dataset_3", data)
You select the location name, but not any location properties, as none are provided in the data.csv file or another tabular format.
location_name: 4: site
location_property columns: just press enter
This creates the following scaffold in metadata.yml:
locations:
  The University of Melbourne Burnley campus:
    latitude (deg): na_character
    longitude (deg): na_character
    description: na_character
- metadata_add_locations automatically selects the unique values in the location name column.
- if no columns with location properties are specified, the function just adds the three core location properties.
- the values for these location properties are available in the notes file.
Add traits
To select columns in the data.csv file that include trait data, run:
metadata_add_traits(dataset_id = "tutorial_dataset_3")
Select columns 5 6 7 8 9 10, as these contain trait data.
Add contexts
A context is any piece of ancillary information that helps explain why a certain trait value was measured.
In traits.build, some contexts are mapped in as part of the default metadata structure, including the location (and location properties), a general sense of organism age (life_stage), basis_of_record, and the general methods for each trait.
However, most contexts are pieces of information that are essential to record for a specific dataset, but are not recorded for most other datasets. The context field therefore allows any context property to be added manually.
Context properties are divided into 5 categories:
- method contexts: Context properties that capture differences in method between measurements of the same trait. For plants, canopy position and leaf age are two common method contexts.
- temporal contexts: Context properties that capture explicit time-related differences between groups of measurements. This is separate from collection_date, as an explicit meaning should accompany each temporal context property and the distinct values may span a range of collection dates. For plants, sampling season (dry versus wet) is a commonly mapped-in temporal context.
- entity contexts: This context property category pertains to individual-level measurements, and documents features of the individual that explicitly distinguish it from other measured individuals. In addition to features like the sex of an individual, it is the place to document individual-level covariates that are not themselves traits, but are required to interpret other trait values.
- treatment contexts: Any experimental treatment that has been applied to groups of individuals.
- plot contexts: Any variation within a documented location, where different individuals experience known differences in growing/living conditions or growing/living history. For plants, this context category is frequently used to map in slope position or fire history.
Context properties are most frequently included in the data.csv file as columns of values. Occasionally, separate columns of trait values might represent measurements with different context property values, a topic for a later tutorial.
Context properties that are columns in the data file can be added with the function metadata_add_contexts:
metadata_add_contexts("tutorial_dataset_3")
This leads to a user prompt to select the relevant columns:
Indicate all columns that contain additional contextual data for tutorial_dataset_3 (by number separated by space; e.g. ‘1 2 4’):
1: Species
2: Treatment
3: Replicate
4: site
5: life_form
6: WP leaf (Mpa) predawn
7: WP leaf (Mpa) midday
8: LMA kg/m2
9: Stomatal density Upper surface
10: Stomatal density Lower surface
Select column 2 which is the only column with a context property:
Selection: 2
Additional user prompts ask for details about the context property category and values:
What category does context Treatment fit in? (by number separated by space; e.g. ‘1 2 4’):
1: treatment_context
2: plot_context
3: temporal_context
4: method_context
5: entity_context
This is a treatment context, so select 1:
Selection: 1
The following values exist for this context: Drought, Watered.
Are replacement values required? (y/n) y
Although the context values Drought and Watered are probably sufficiently descriptive, for other drought-treatment studies we’ve used drought and well-watered, so we prefer to align the context property values with these.
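The effect of supplying replacement values can be pictured as a simple find-and-replace lookup. The sketch below is illustrative only: the vector is a made-up stand-in for the Treatment column, and the lookup is written in base R rather than taken from the traits.build internals.

```r
# Hypothetical stand-in for the Treatment column of data.csv
treatment <- c("Drought", "Watered", "Watered", "Drought")

# find -> value pairs, mirroring the find and value fields of a context entry
replacements <- c(Drought = "drought", Watered = "well-watered")

# Look up each original value; unname() drops the lookup names from the result
treatment_aligned <- unname(replacements[treatment])

print(treatment_aligned)
```

Keeping the mapping as explicit pairs makes it easy to compare against the values used for the same context property in other datasets.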
Are descriptions required? (y/n) y
The free-form description field lets you add details about the exact meaning of drought vs well-watered for this study.
In the metadata.yml file, there will now be a scaffold for the contexts:
contexts:
- context_property: unknown
  category: treatment
  var_in: Treatment
  values:
  - find: Drought
    value: unknown
    description: unknown
  - find: Watered
    value: unknown
    description: unknown
In addition to filling in the preferred context property values and descriptions, you must also assign a name to the context_property. This is a free-form field, but as with location_property it is best to ensure you align context_property names throughout the database. In the AusTraits plant trait database, this context_property is always called drought treatment.
The finished context section will be:
contexts:
- context_property: drought treatment
  category: treatment
  var_in: Treatment
  values:
  - find: Drought
    value: drought
    description: The plants were watered with 20% of the water used by well-watered
      plants (determined gravimetrically) in the 3-4 days preceding each watering
      event.
  - find: Watered
    value: well-watered
    description: The plants were watered to pot capacity (2 L per pot).
Manual filling in of metadata
The components of this dataset that can be propagated with functions are now complete, and the remaining unknown fields must be filled in manually:
- the contributors section
- description, basis_of_record, life_stage, sampling_strategy, original_file, and notes under the dataset section
- details for each trait, including unit_in, trait_name, entity_type, value_type, basis_of_value, replicates and methods
Adding contributors
The file data/tutorial_dataset_3/raw/tutorial_dataset_3_notes.txt indicates the main data_contributor for this study.
Dataset fields
The file data/tutorial_dataset_3/raw/tutorial_dataset_3_notes.txt indicates how to fill in the unknown dataset fields for this study.
Trait details
The file data/tutorial_dataset_3/raw/tutorial_dataset_3_notes.txt indicates how to fill in the unknown trait fields for this study, but see below as well.
Remember, the trait_name must match a trait concept within the traits dictionary. For this example:
column in dataset | trait concept | unit_in | entity_type | value_type | basis_of_value | replicates |
---|---|---|---|---|---|---|
life_form | life_form | .na | species | mode | expert_score | .na |
WP leaf (Mpa) predawn | water_potential_predawn | neg_MPa | individual | raw | measurement | 1 |
WP leaf (Mpa) midday | water_potential_midday | neg_MPa | individual | raw | measurement | 1 |
LMA kg/m2 | leaf_mass_per_area | kg/m2 | individual | raw | measurement | 1 |
Stomatal density Upper surface | leaf_stomatal_density_adaxial | '{count}/mm2' | individual | raw | measurement | 1 |
Stomatal density Lower surface | leaf_stomatal_density_abaxial | '{count}/mm2' | individual | raw | measurement | 1 |
With the units, note:
- In the data.csv file, all water potential values are positive, indicating the data contributor mapped in the “negative” of the true water potential values (which are always below zero). A negative sign at the beginning of the units field is not recognised and therefore the convention is to use the prefix neg_ to indicate the values input are the negative of the true values.
- Stomatal density is a “count density”, a number of stomata per unit area. The actual UCUM standard for this is simply 1/mm2, but for clarity we use {count}/mm2. The word count is in curly brackets, since it is a “note” rather than a true unit.
- If the unit begins with a curly bracket, the unit needs to be placed in single quotes.
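The arithmetic implied by the neg_ prefix is simply a sign flip. The values below are made up for illustration, and the snippet is not a claim about how traits.build performs unit conversions internally; it only shows the relationship between the recorded and true water potentials.

```r
# Hypothetical predawn water potential values as entered in data.csv (positive numbers)
wp_recorded <- c(0.45, 1.20, 2.85)

# With unit_in set to neg_MPa, the recorded numbers are read as the negative of the true values
wp_true_MPa <- -wp_recorded

print(wp_true_MPa)
```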
Testing, error fixes, and report building
At this point, run the dataset tests and rebuild the dataset:
dataset_test("tutorial_dataset_3")
build_setup_pipeline(method = "base", database_name = "traits.build_database")
source("build.R")
The dataset test should yield an error that one water_potential_predawn value does not convert to numeric, indicating that a placeholder character is being used in place of an NA (note: this error was not triggering when this vignette was being written).
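One way to locate such a placeholder yourself is to coerce the column to numeric and see which entries fail. This is a base R sketch with made-up values, not the check that dataset_test itself runs.

```r
# Hypothetical values for one trait column, read in as character
wp_predawn <- c("0.45", "1.20", "*", NA, "2.85")

# Coerce to numeric; entries that cannot be parsed become NA (warning suppressed on purpose)
as_number <- suppressWarnings(as.numeric(wp_predawn))

# Flag entries that were present in the raw data but failed to convert
not_numeric <- !is.na(wp_predawn) & is.na(as_number)

print(unique(wp_predawn[not_numeric]))
```

Genuine missing values are excluded from the flag, so only placeholder characters are reported.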
Looking at the excluded_data table indicates there is a “*” in one column, so one adds:
custom_R_code: '
  data %>%
    mutate(
      across(c("WP leaf (Mpa) predawn"), ~na_if(.x, "*"))
    )
'
However, now you’ll get the error: Caused by error in na_if(): ! Can’t convert y
This indicates a mismatch between column types, necessitating that you change the column to character:
custom_R_code: '
  data %>%
    mutate(
      across(c("WP leaf (Mpa) predawn"), ~as.character(.x)),
      across(c("WP leaf (Mpa) predawn"), ~na_if(.x, "*"))
    )
'
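The type mismatch can be reproduced outside the build with a few made-up values. This sketch assumes the dplyr package is installed. Recent dplyr versions raise an error when a numeric vector is compared with a character placeholder, while older versions may not, so the first call is wrapped in tryCatch rather than assumed to fail.

```r
library(dplyr)

# Hypothetical column that R has stored as numeric
x_numeric <- c(0.45, 1.20, 2.85)

# Capture the outcome instead of assuming an error, since behaviour depends on the dplyr version
attempt <- tryCatch(na_if(x_numeric, "*"), error = function(e) conditionMessage(e))

# After conversion to character, the character placeholder is accepted
x_character <- c("0.45", "*", "2.85")
cleaned <- na_if(as.character(x_character), "*")

print(cleaned)
```

Converting to character first, as in the custom_R_code above, keeps both arguments of na_if the same type.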
At this point, rerunning the tests and rebuilding the database should not generate any errors or excluded values, so you can build and review the report.
As a final step, build a report for the study:
traits.build_database$build_info$version <- "5.0.0"  # a fix because the function was built around specific AusTraits versions
dataset_report("tutorial_dataset_3", traits.build_database, overwrite = TRUE)