austraits_database_structure.Rmd
This document describes the structure of the AusTraits compilation, corresponding to Version 3.0.2 of the dataset.
Note that the information below is drawn from the file definitions.yml.
AusTraits is essentially a series of linked components that cross-reference each other:
austraits
├── traits
├── sites
├── contexts
├── methods
├── excluded_data
├── taxa
├── taxonomic_updates
├── definitions
├── contributors
├── sources
└── build_info
These include all the data and contextual information submitted with each contributed dataset.
The core components are defined as follows.
Description: The traits table contains measurements of plant traits.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
site_name | Name of the site where the individual was sampled. Cross-references between similar columns in sites and traits. |
context_name | Name of the contextual scenario in which the individual was sampled. Cross-references between similar columns in contexts and traits. |
observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. For wide datasets, these are assigned automatically, based on the dataset_id and row number of the raw data. |
trait_name | Name of the trait sampled. Allowable values are specified in the table definitions. |
value | Measured value. |
unit | Units of the sampled trait value after aligning with AusTraits standards. |
date | Date the sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy, depending on the resolution specified. |
value_type | A categorical variable describing the type of trait value recorded. |
replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
original_name | Name given to the taxon in the original data supplied by the authors. |
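The rules for replicates can be read as a small consistency check. The sketch below is illustrative only: the function name replicates_ok is hypothetical, and the range syntax it accepts (two integers joined by a hyphen, such as 3-5) is an assumption rather than something the definitions specify.

```r
# Hypothetical helper: does a replicates entry fit the rule for its value_type?
replicates_ok <- function(value_type, replicates) {
  expert_types <- c("expert_mean", "expert_min", "expert_max")
  if (value_type == "raw_value") {
    # A direct measurement is a single replicate.
    return(replicates == "1")
  }
  if (value_type %in% expert_types) {
    # Expert estimates carry no replicate count.
    return(replicates == ".na")
  }
  # Remaining value types: a count, a range (assumed syntax "3-5"), or "unknown".
  replicates == "unknown" || grepl("^[0-9]+(-[0-9]+)?$", replicates)
}

replicates_ok("raw_value", "1")
replicates_ok("site_mean", "unknown")
replicates_ok("expert_max", "5")
```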
Details:
Each trait measurement has an associated observation_id. Observation IDs bind together related measurements within any dataset, and thereby allow transformation between long format (e.g. with variables trait_name and value) and wide format (e.g. with traits as columns). When multiple traits were collected on the same individual, the observation_id identifies which measurements were made on the same individual (or population or species, depending on the level of detail supplied). Generally, observation_id has the format dataset_id_XX, where XX is a unique number within each dataset.
For datasets submitted to AusTraits in long format, a column specifying observation_id should already exist, assigned based on a specified grouping variable. If missing, observation_id is assigned based on species_name.
For datasets submitted to AusTraits in wide format, each row is assigned a unique observation_id during processing, allowing conversion to long format.
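The role of observation_id in moving between wide and long format can be sketched with base R. This is a toy illustration, not the AusTraits build code: the dataset name Example_2020, the taxon names, and all values are invented, and the two-digit padding of the row number is an assumption.

```r
# Toy wide-format dataset: one row per individual, one column per trait.
wide <- data.frame(
  taxon_name = c("Plantus albus", "Plantus ruber"),
  leaf_area = c(120, 340),
  wood_density = c(0.71, 0.58),
  stringsAsFactors = FALSE
)

# Assign one observation_id per row, following the dataset_id_XX pattern.
dataset_id <- "Example_2020"
wide$observation_id <- sprintf("%s_%02d", dataset_id, seq_len(nrow(wide)))

# Wide -> long: stack each trait column into trait_name / value pairs.
trait_cols <- c("leaf_area", "wood_density")
long <- do.call(rbind, lapply(trait_cols, function(trait) {
  data.frame(
    observation_id = wide$observation_id,
    taxon_name = wide$taxon_name,
    trait_name = trait,
    value = wide[[trait]],
    stringsAsFactors = FALSE
  )
}))

# Long -> wide: observation_id ties together values from the same row.
wide_again <- reshape(
  long,
  idvar = c("observation_id", "taxon_name"),
  timevar = "trait_name",
  direction = "wide"
)
```

In the round trip, reshape() prefixes the trait columns with the name of the value column, so the traits come back as value.leaf_area and value.wood_density.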
Each record in the table of trait data has an associated value and value_type.
Traits are either numeric or categorical. For traits with numerical values, the recorded value has been converted into standardised units, and we have checked that the value can be converted into a number and lies within the allowable range. For categorical variables, we only include records whose values are included as allowable values (terms) in a trait’s definition in the definitions table. Moreover, we use a format whereby an underscore (_) joins the words of a multi-word term, e.g. semi_deciduous, and a space separates alternative values, e.g. annual biennial for a plant species which is either annual or biennial.
Each trait measurement also has an associated value_type, which is a categorical variable describing the type of trait value recorded.
Possible values are:
key | value |
---|---|
raw_value | Value is a direct measurement. |
site_min | Value is the minimum of measurements on multiple individuals of the taxon at a single site. |
site_mean | Value is the mean or median of measurements on multiple individuals of the taxon at a single site. |
site_max | Value is the maximum of measurements on multiple individuals of the taxon at a single site. |
multisite_min | Value is the minimum of measurements on multiple individuals of the taxon across multiple sites. |
multisite_mean | Value is the mean or median of measurements on multiple individuals of the taxon across multiple sites. |
multisite_max | Value is the maximum of measurements on multiple individuals of the taxon across multiple sites. |
expert_min | Value is the minimum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from floras that represent a taxon’s entire range. |
expert_mean | Value is the mean observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from floras that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
expert_max | Value is the maximum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from floras that represent a taxon’s entire range. |
experiment_min | Value is the minimum of measurements on multiple individuals of the taxon from an experimental study (either in the field or a glasshouse). |
experiment_mean | Value is the mean or median of measurements on multiple individuals of the taxon from an experimental study (either in the field or a glasshouse). |
experiment_max | Value is the maximum of measurements on multiple individuals of the taxon from an experimental study (either in the field or a glasshouse). |
individual_mean | Value is a mean of replicate measurements on an individual. |
individual_max | Value is a maximum of replicate measurements on an individual. |
literature_source | Value is a site_mean, multisite_mean, or expert_mean that has been sourced from an unknown literature source. |
unknown | Value type is not currently known. |
AusTraits does not include intra-individual observations. When multiple measurements per individual are submitted to AusTraits, we take the mean of the values and record the value_type as individual_mean.
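As a rough illustration of that averaging step, the sketch below collapses replicate measurements to one mean per individual using base R. The column name individual and all values are invented for the example; the actual processing code may differ.

```r
# Invented replicate measurements: three on ind_1, two on ind_2.
raw <- data.frame(
  individual = c("ind_1", "ind_1", "ind_1", "ind_2", "ind_2"),
  trait_name = "leaf_length",
  value = c(10, 12, 14, 20, 22),
  stringsAsFactors = FALSE
)

# Collapse to one mean per individual and trait, then label the value type.
individual_means <- aggregate(
  value ~ individual + trait_name,
  data = raw,
  FUN = mean
)
individual_means$value_type <- "individual_mean"
```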
Description: The sites table contains observations of site characteristics associated with information in traits. Cross-referencing between the two data frames is possible using combinations of the variables dataset_id and site_name.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
site_name | Name of the site where the individual was sampled. Cross-references between similar columns in sites and traits. |
site_property | The site characteristic being recorded. The name should include units of measurement, e.g. MAT (C). Ideally we have at least these variables for each site: longitude (deg), latitude (deg), description. |
value | Measured value. |
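Because sites stores one row per site property, attaching a site characteristic to trait records takes a filter followed by a join on dataset_id and site_name. The following is a minimal sketch with invented data; the column name latitude_deg is chosen here for illustration and is not part of AusTraits.

```r
# Toy trait records and site records (all values invented).
traits <- data.frame(
  dataset_id = "Example_2020",
  site_name = c("site_a", "site_b"),
  trait_name = "leaf_area",
  value = c("120", "340"),
  stringsAsFactors = FALSE
)
sites <- data.frame(
  dataset_id = "Example_2020",
  site_name = rep(c("site_a", "site_b"), each = 2),
  site_property = rep(c("latitude (deg)", "longitude (deg)"), times = 2),
  value = c("-33.8", "151.2", "-37.8", "145.0"),
  stringsAsFactors = FALSE
)

# Keep one property, rename its value column to avoid clashing with traits$value.
latitude <- sites[
  sites$site_property == "latitude (deg)",
  c("dataset_id", "site_name", "value")
]
names(latitude)[names(latitude) == "value"] <- "latitude_deg"

# Left join: every trait record is kept, with latitude added where available.
joined <- merge(traits, latitude, by = c("dataset_id", "site_name"), all.x = TRUE)
```

The same pattern should carry over to contexts, joining on dataset_id and context_name instead.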
Description: The contexts table contains observations of contextual characteristics associated with information in traits. Cross-referencing between the two data frames is possible using combinations of the variables dataset_id and context_name.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
context_name | Name of the contextual scenario in which the individual was sampled. Cross-references between similar columns in contexts and traits. |
context_property | The contextual characteristic being recorded. The name should include units of measurement, e.g. CO2 concentration (ppm). |
value | Measured value. |
Description: The methods table contains details on the methods with which data were collected, including time frame and source. Cross-referencing with the traits table is possible using combinations of the variables dataset_id and trait_name.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
trait_name | Name of the trait sampled. Allowable values are specified in the table definitions. |
methods | A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from the referenced source. Methods can include descriptions such as ‘measured on botanical collections’, ‘data from the literature’, or a detailed description of the field or lab methods used to collect the data. |
year_collected_start | The year data collection commenced. |
year_collected_end | The year data collection was completed. |
description | A 1-2 sentence description of the purpose of the study. |
collection_type | A field to indicate where the majority of plants on which traits were measured were collected: in the field, lab, glasshouse, botanical collection, or literature. The latter should only be used when the data were sourced from the literature and the collection type is unknown. |
sample_age_class | A field to indicate if the study was completed on adult or juvenile plants. |
sampling_strategy | A written description of how study sites were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For botanical collections, this field ideally indicates which records were ‘sampled’ to measure a specific trait. |
source_primary_key | Citation key for the primary source in sources. The key is typically of the format Surname_year. |
source_primary_citation | Citation for the primary source. This detail is generated from the primary source in the metadata. |
source_secondary_key | Citation key for the secondary source in sources. The key is typically of the format Surname_year. |
source_secondary_citation | Citations for the secondary source. This detail is generated from the secondary source in the metadata. |
Description: The excluded_data table contains data that did not pass quality tests and so were excluded from the master dataset. Its structure is identical to that of the traits table, with an extra column called error indicating why the record was excluded. Common reasons are missing_unit_conversions, missing_value, and unsupported_trait_value.
Content:
key | value |
---|---|
error | Indicates why the record was excluded. Common reasons are missing_unit_conversions, missing_value, and unsupported_trait_value. |
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
site_name | Name of the site where the individual was sampled. Cross-references between similar columns in sites and traits. |
context_name | Name of the contextual scenario in which the individual was sampled. Cross-references between similar columns in contexts and traits. |
observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. For wide datasets, these are assigned automatically, based on the dataset_id and row number of the raw data. |
trait_name | Name of the trait sampled. Allowable values are specified in the table definitions. |
value | Measured value. |
unit | Units of the sampled trait value after aligning with AusTraits standards. |
date | Date the sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy, depending on the resolution specified. |
value_type | A categorical variable describing the type of trait value recorded. |
replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
original_name | Name given to the taxon in the original data supplied by the authors. |
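The error column makes it straightforward to summarise why records were dropped. The sketch below uses a small invented data frame with only a few of the columns; with the compilation loaded as austraits, the same calls would presumably be applied to austraits$excluded_data.

```r
# Toy stand-in for the excluded_data table (values invented).
excluded_data <- data.frame(
  error = c("missing_value", "missing_unit_conversions",
            "missing_value", "unsupported_trait_value"),
  dataset_id = c("Example_2020", "Example_2020", "Example_2021", "Example_2021"),
  trait_name = c("leaf_area", "leaf_area", "wood_density", "woodiness"),
  stringsAsFactors = FALSE
)

# Number of excluded records per reason, most common first.
error_counts <- sort(table(excluded_data$error), decreasing = TRUE)

# Cross-tabulation of exclusion reasons by contributing dataset.
errors_by_dataset <- table(excluded_data$dataset_id, excluded_data$error)
```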
Description: The taxa table contains details on taxa that are included in the table traits. We have attempted to align species names with known taxonomic units in the Australian Plant Census (APC) and/or the Australian Plant Names Index (APNI); the sourced information is released under a CC-BY3 license.
Version 3.0.2 of AusTraits contains records for 28640 different taxa.
Content:
key | value |
---|---|
taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
source | Source of taxonomic information, either APC or APNI. |
acceptedNameUsageID | The identifier of the accepted concept in APC for this taxon name. |
scientificNameAuthorship | Authority for the taxon indicated under taxon_name; applicable for most taxa in APC. |
taxonRank | Rank of the taxon. |
taxonomicStatus | Taxonomic status of the taxon. accepted indicates the taxon name is designated as current in APC, a taxonomy endorsed by the Council of Heads of Australasian Herbaria. Alternate statuses are: unplaced, indicating the taxon name is included in APNI but has yet to be reviewed for inclusion/exclusion as a current name (or synonym) for APC; genus_known, indicating the taxon name can only be aligned to the genus level; family_known, indicating the taxon name can only be aligned to the family level; and unknown, indicating a taxon name submitted to AusTraits that cannot be aligned at any level. |
family | Family of the taxon. |
taxonDistribution | Known distribution of the taxon, by Australian state. |
ccAttributionIRI | Source of taxonomic information (for taxa designated as current for APC) or name information (for taxa included in APNI, but unplaced for APC). |
genus | Genus of the taxon. |
Description: The taxonomic_updates table lists all taxonomic changes implemented in the construction of AusTraits. Changes are determined by comparing the originally submitted taxon name against the APC (Australian Plant Census) and APNI (Australian Plant Names Index). Cross-referencing with the traits table is possible using combinations of the variables dataset_id and taxon_name.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
original_name | Name given to the taxon in the original data supplied by the authors. |
cleaned_name | Name of the taxon after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding dataset_id. |
taxonIDClean | The APC identifier for the cleaned_name of the taxon (for taxa designated as current for APC) or APNI identifier (for taxa included in APNI, but unplaced for APC). |
taxonomicStatusClean | The APC taxonomic status for the cleaned_name of the taxon identified by taxonIDClean. |
alternativeTaxonomicStatusClean | The APC taxonomic status of alternative APC records with the name cleaned_name. |
acceptedNameUsageID | The APC identifier for the accepted name (taxon_name) for a taxon; different from taxonIDClean if taxonomicStatusClean is not accepted. |
taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
Both the original and the updated taxon names are included in the traits table.
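Since both names travel with every trait record, records whose name was changed can be found by comparing the two columns directly. A minimal sketch follows; the taxon names are deliberately fictitious and the data frame contains only the columns needed for the comparison.

```r
# Toy trait records; "Plantus oldus" stands in for a name that was updated.
traits <- data.frame(
  dataset_id = "Example_2020",
  original_name = c("Plantus albus", "Plantus oldus", "Plantus oldus"),
  taxon_name = c("Plantus albus", "Plantus novus", "Plantus novus"),
  trait_name = c("leaf_area", "leaf_area", "wood_density"),
  stringsAsFactors = FALSE
)

# Distinct name changes: rows where the submitted and aligned names differ.
changed <- traits$original_name != traits$taxon_name
renamed <- unique(traits[changed, c("dataset_id", "original_name", "taxon_name")])
```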
Description: The definitions component is a copy of the definitions for all tables and terms. Information included here was used to process data and generate any documentation for the study.
Details on trait definitions: The allowable trait names and trait values are defined in the definitions file. Each trait is labelled as either numeric or categorical. For example, specific_leaf_area is numeric and woodiness is categorical. For the full list, see the Trait definitions vignette.
Description: The contributors table lists the people contributing to each study.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed to AusTraits; most often these are scientific papers, books, or online resources. By default, this should be the name of the first author and the year of publication, e.g. Falster_2005. |
name | Name of the contributor. |
institution | Last known institution or affiliation. |
role | Their role in the study. |
For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is the original study in which data were collected. The secondary citation is a subsequent study where data were compiled or re-analysed.
The element sources includes BibTeX versions of all sources, which can be imported into your reference library:
library(RefManageR)  # provides WriteBib()
WriteBib(austraits$sources)  # write all sources to a file
WriteBib(austraits$sources["Falster_2005_1"])  # write a single reference to a file
Or individually viewed:
austraits$sources["Falster_2005_1"]
A formatted version of the sources also exists within the table methods.