This document describes the structure of the AusTraits compilation.
As an ongoing collaborative community resource, we would appreciate your contributions.
AusTraits is essentially a series of linked components, which cross-link against each other:
austraits
├── traits
├── sites
├── methods
├── excluded_data
├── taxonomy
├── definitions
├── contributors
├── sources
└── build_info
These include all the data and contextual information submitted with each contributed dataset. It is essential that users of AusTraits data are confident the data have the meaning they expect and were collected using methods they trust. As such, each dataset within AusTraits must include descriptions of the study, sites, and methods used as well as the data itself.
The core components are defined as follows.
Description: A table containing measurements of plant traits.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. These are assigned automatically, based on the dataset_id and row number of the raw data. |
trait_name | Name of trait sampled. Allowable values specified in the table traits. |
value | Measured value. |
unit | Units of the sampled trait value after aligning with AusTraits standards. |
date | Date sample was taken, in the format yyyy-mm-dd, but with days and months only when specified. |
value_type | A categorical variable describing the type of trait value recorded. |
replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
original_name | Name given to taxon in the original data supplied by the authors |
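The replicates rule in the table above can be expressed as a small consistency check. The following is a minimal sketch, not part of the AusTraits code base: the function name `replicates_ok` is hypothetical, values are assumed to arrive as strings, and a range is assumed to be written as `3-10`.

```python
# Value types named in the replicates rule above (an illustrative subset).
SUMMARY_TYPES = {"mean", "median", "min", "max"}
EXPERT_TYPES = {"expert_mean", "expert_min", "expert_max"}


def replicates_ok(value_type: str, replicates: str) -> bool:
    """Return True if `replicates` is consistent with `value_type`."""
    if value_type == "raw_value":
        return replicates == "1"
    if value_type in EXPERT_TYPES:
        return replicates == ".na"
    if value_type in SUMMARY_TYPES:
        if replicates == "unknown":
            return True
        # Accept a single count ("5") or a range ("3-10"); the range
        # notation is an assumption made for this sketch.
        parts = replicates.split("-")
        return len(parts) in (1, 2) and all(p.strip().isdigit() for p in parts)
    # Value types not covered by the rule are not flagged.
    return True
```

A check like this could be run over each row of traits before a dataset is accepted, flagging rows where the two columns disagree.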
Description: A table containing observations of site characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, site_name.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
site_property | The site characteristic being recorded. Name should include units of measurement, e.g. longitude (deg). Ideally we have at least these variables for each site - longitude (deg), latitude (deg), description. |
value | Measured value. |
Description: A table containing observations of contextual characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, context_name.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
context_property | The contextual characteristic being recorded. Name should include units of measurement, e.g. elevation (m). |
value | Measured value. |
Description: A table containing details on methods with which data were collected, including time frame and source.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
trait_name | Name of trait sampled. Allowable values specified in the table traits. |
methods | A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from referenced source. Methods can include descriptions such as ‘measured on botanical collections’, ‘data from the literature’, or a detailed description of the field or lab methods used to collect the data. |
year_collected_start | The year data collection commenced. |
year_collected_end | The year data collection was completed. |
description | A 1-2 sentence description of the purpose of the study. |
collection_type | A field to indicate where the majority of plants on which traits were measured were collected - in the field, lab, glasshouse, botanical collection, or literature. The latter should only be used when the data were sourced from the literature and the collection type is unknown. |
sample_age_class | A field to indicate if the study was completed on adult or juvenile plants. |
sampling_strategy | A written description of how study sites were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For botanical collections, this field ideally indicates which records were ‘sampled’ to measure a specific trait. |
source_primary_citation | Citation for primary source. This detail is generated from the primary source in the metadata. |
source_primary_key | Citation key for primary source in sources. The key is typically of format Surname_year. |
source_secondary_citation | Citations for secondary source. This detail is generated from the secondary source in the metadata. |
source_secondary_key | Citation key for secondary source in sources. The key is typically of format Surname_year. |
Description: A table of data that did not pass quality tests and so were excluded from the master dataset.
Description: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are determined by comparing against the APC (Australian Plant Census) and APNI (Australian Plant Names Index).
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
original_name | Name given to taxon in the original data supplied by the authors |
aligned_name | Name of the taxon after implementing any changes encoded for this taxon in the metadata file in the specified corresponding dataset_id. |
taxonIDClean | Where it could be identified, the taxonID of the aligned_name for this taxon in the APC. |
taxonomicStatusClean | Taxonomic status of the taxon identified by taxonIDClean in the APC. |
alternativeTaxonomicStatusClean | The status of alternative records with the name aligned_name in the APC. |
acceptedNameUsageID | ID of the accepted name for taxon in the APC or APNI. |
taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
Description: A table containing details on taxa associated with information in traits. This information has been sourced from the APC (Australian Plant Census) and APNI (Australian Plant Names Index) and is released under a CC-BY3 license.
Content:
key | value |
---|---|
taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
source | Source of taxonomic information, either APC or APNI. |
acceptedNameUsageID | Identifier for the accepted name of the taxon. |
scientificNameAuthorship | Authority for the accepted name of the taxon indicated under taxon_name. |
taxonRank | Rank of the taxon. |
taxonomicStatus | Taxonomic status of the taxon. |
family | Family of the taxon. |
genus | Genus of the taxon. |
taxonDistribution | Known distribution of the taxon. |
ccAttributionIRI | Source of taxonomic information. |
Description: A copy of the definitions for all tables and terms. Information included here was used to process data and generate any documentation for the study.
Description: A table of people contributing to each study.
Content:
key | value |
---|---|
dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
name | Name of contributor |
institution | Last known institution or affiliation |
role | Their role in the study |
The core organising unit behind AusTraits is the dataset_id. Records are organised as coming from a particular study, defined by the dataset_id. Our preferred format for dataset_id is the surname of the first author of any corresponding publication, followed by the year, as surname_year, e.g. Falster_2005. Wherever there are multiple studies with the same id, we add a suffix _2, _3 etc., e.g. Falster_2005, Falster_2005_2.
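The naming convention can be sketched in a few lines. This is an illustrative helper only, assuming the set of ids already in use is known; `make_dataset_id` is a hypothetical name, not an AusTraits function.

```python
def make_dataset_id(surname: str, year: int, existing: set) -> str:
    """Build a dataset_id as surname_year, adding _2, _3, ... on clashes."""
    base = f"{surname}_{year}"
    if base not in existing:
        return base
    suffix = 2
    # Step through suffixes until an unused id is found.
    while f"{base}_{suffix}" in existing:
        suffix += 1
    return f"{base}_{suffix}"


taken = {"Falster_2005"}
second = make_dataset_id("Falster", 2005, taken)
```

Here `second` takes the first free suffix, so a second 2005 study by the same first author becomes `Falster_2005_2`.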
As well as a dataset_id, each trait measurement has an associated observation_id. Observation IDs bind together related measurements within any dataset, and thereby allow transformation between long (e.g. with variables trait_name and value) and wide (e.g. with traits as columns) formats.
Generally, observation_id has the format dataset_id_XX where XX is a unique number within each dataset. For example, if multiple traits were collected on the same individual, the observation_id allows us to gather these together. For floras reporting species averages, the observation_id is assigned at the species level.
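As a rough illustration of how observation_id supports the long-to-wide transformation, the sketch below pivots a few long-format rows into one record per observation. The rows, trait names, and values are hypothetical, and `to_wide` is a name chosen for this example.

```python
# Hypothetical long-format rows from the traits table.
traits = [
    {"dataset_id": "Falster_2005", "observation_id": "Falster_2005_01",
     "trait_name": "leaf_area", "value": "120"},
    {"dataset_id": "Falster_2005", "observation_id": "Falster_2005_01",
     "trait_name": "plant_height", "value": "2.5"},
    {"dataset_id": "Falster_2005", "observation_id": "Falster_2005_02",
     "trait_name": "leaf_area", "value": "95"},
]


def to_wide(rows):
    """Pivot long rows into one dict per observation, with traits as keys."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row["observation_id"], {})
        record[row["trait_name"]] = row["value"]
    return wide


wide = to_wide(traits)
```

Traits that were not measured for an observation are simply absent from its record, which mirrors the missing cells of a wide table.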
For datasets that arrive in wide format we assume each row has a unique observation_id. For datasets that arrive in long format, the observation_id is assigned based on a specified grouping variable. This variable can be specified in the metadata.yml file under the section variable_match. If missing, observation_id is assigned based on species_name.
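The assignment for long-format data might look like the following sketch. It is not the AusTraits implementation: the function name, the two-digit zero padding, and the example rows are assumptions made for illustration.

```python
def assign_observation_ids(rows, dataset_id, group_var="species_name"):
    """Give rows that share a value of `group_var` the same observation_id."""
    numbers = {}  # group value -> running number within the dataset
    out = []
    for row in rows:
        # First time a group is seen it receives the next free number.
        n = numbers.setdefault(row[group_var], len(numbers) + 1)
        # Two-digit padding is an assumption of this sketch.
        out.append({**row, "observation_id": f"{dataset_id}_{n:02d}"})
    return out


# Hypothetical long-format rows with no grouping variable specified,
# so the default species_name is used.
long_rows = [
    {"species_name": "Acacia longifolia", "trait_name": "leaf_area", "value": "120"},
    {"species_name": "Acacia longifolia", "trait_name": "plant_height", "value": "2.5"},
    {"species_name": "Banksia serrata", "trait_name": "leaf_area", "value": "95"},
]
labelled = assign_observation_ids(long_rows, "Falster_2005")
```

Passing a different `group_var` corresponds to naming a grouping variable in the metadata.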
As well as dataset_id and observation_id, where appropriate, trait values are associated with a site_name. Unique combinations of dataset_id and site_name can be used to cross-match against the sites table, which provides further details on the site sampled.
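A minimal sketch of this cross-match, using plain Python dictionaries rather than any particular dataframe library. The rows and values are hypothetical, and `attach_site_details` is a name invented for this example.

```python
# Hypothetical rows; real tables carry more columns than shown here.
traits = [
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "trait_name": "leaf_area", "value": "120"},
    {"dataset_id": "Falster_2005", "site_name": None,
     "trait_name": "leaf_area", "value": "95"},
]
sites = [
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "latitude (deg)", "value": "-33.7"},
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "longitude (deg)", "value": "151.0"},
]


def attach_site_details(traits, sites):
    """Add a `site` dict of properties to each trait row."""
    lookup = {}
    for s in sites:
        # The pair (dataset_id, site_name) identifies a site.
        key = (s["dataset_id"], s["site_name"])
        lookup.setdefault(key, {})[s["site_property"]] = s["value"]
    joined = []
    for t in traits:
        key = (t["dataset_id"], t.get("site_name"))
        # Rows without a matching site receive an empty dict.
        joined.append({**t, "site": lookup.get(key, {})})
    return joined


joined = attach_site_details(traits, sites)
```

The same pattern applies to context_name and the contexts table, with the key changed accordingly.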
As well as dataset_id, observation_id, and site_name, where appropriate, trait values are associated with a context_name. Unique combinations of dataset_id and context_name can be used to cross-match against the contexts table, which provides further details on the context sampled.
Each record in the table of trait data has an associated value and value_type.
Traits are either numeric or categorical.
For traits with numerical values, the recorded value has been converted into standardised units and we have checked that the value can be converted into a number and lies within the allowable range. For categorical variables, we only include records that are defined in the definitions. Moreover, we use a format whereby an underscore (_) joins multi-word terms, e.g. semi_deciduous, and a space separates multiple values, e.g. annual biennial for something which is either annual or biennial.
Each trait measurement also has an associated value_type, which is a categorical variable describing the type of trait value recorded.
Possible values are:
key | value |
---|---|
raw_value | Value is a direct measurement |
site_min | Value is the minimum of measurements on multiple individuals of the taxon at a single site |
site_mean | Value is the mean or median of measurements on multiple individuals of the taxon at a single site |
site_max | Value is the maximum of measurements on multiple individuals of the taxon at a single site |
multisite_min | Value is the minimum of measurements on multiple individuals of the taxon across multiple sites |
multisite_mean | Value is the mean or median of measurements on multiple individuals of the taxon across multiple sites |
multisite_max | Value is the maximum of measurements on multiple individuals of the taxon across multiple sites |
expert_min | Value is the minimum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
expert_mean | Value is the mean observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
expert_max | Value is the maximum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
experiment_min | Value is the minimum of measurements from an experimental study either in the field or a glasshouse |
experiment_mean | Value is the mean or median of measurements from an experimental study either in the field or a glasshouse |
experiment_max | Value is the maximum of measurements from an experimental study either in the field or a glasshouse |
individual_mean | Value is a mean of replicate measurements on an individual (usually for experimental ecophysiology studies) |
individual_max | Value is a maximum of replicate measurements on an individual (usually for experimental ecophysiology studies) |
literature_source | Value is a site or multi-site mean that has been sourced from an unknown literature source |
unknown | Value type is not currently known |
AusTraits does not include intra-individual observations. When multiple measurements per individual are submitted to AusTraits, we take the mean of the values and record the value_type as individual_mean.
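The averaging step can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: `collapse_individual` is a hypothetical name, and labelling a single measurement as raw_value is an assumption based on the value type table above.

```python
from statistics import mean


def collapse_individual(values):
    """Collapse repeat measurements on one individual into a single record."""
    if len(values) == 1:
        # Assumption: a lone measurement is kept as a direct measurement.
        return {"value": values[0], "value_type": "raw_value"}
    return {"value": mean(values), "value_type": "individual_mean"}


record = collapse_individual([2.0, 4.0])
```

Each individual therefore contributes at most one row per trait, whichever value type it ends up with.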
The latest version of AusTraits contains records for over 28640
different taxa. We have attempted to align species names with known
taxonomic units in the Australian Plant Census
(APC) and/or the Australian Plant Names Index
(APNI).
The table taxa lists all taxa in the database, including additional information about the taxa (see table above).
The traits table reports both the original and the updated taxon name alongside each trait record.
The table taxonomic_updates provides details on all taxonomic name changes implemented when aligning with APC and APNI.
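To show how the taxonomic updates table links a name as originally supplied to its accepted name, here is a small lookup sketch. The row contents are placeholders, not real taxa, and `resolve_taxon` is a hypothetical helper.

```python
# Placeholder rows from the taxonomic updates table (not real taxa).
taxonomic_updates = [
    {"dataset_id": "Falster_2005", "original_name": "Genus oldname",
     "taxon_name": "Genus newname"},
]


def resolve_taxon(dataset_id, original_name, updates):
    """Return the accepted taxon_name for a name as supplied in a dataset."""
    for row in updates:
        if (row["dataset_id"], row["original_name"]) == (dataset_id, original_name):
            return row["taxon_name"]
    return None  # name not listed in the updates


accepted = resolve_taxon("Falster_2005", "Genus oldname", taxonomic_updates)
```

The match is made on dataset_id as well as original_name because the alignment is recorded per dataset.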
For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is the original study in which data were collected, while the secondary citation is a subsequent study where data were compiled or re-analysed and then made available. These references are included in two places: