This document describes the structure of the AusTraits compilation, corresponding to Version 3.0.2 of the dataset.

The information below is drawn from the file definitions.yml.

AusTraits is essentially a series of linked components, which cross-reference each other:

austraits
├── traits
├── sites
├── contexts
├── methods
├── excluded_data
├── taxa
├── definitions
├── contributors
├── sources
└── build_info

These include all the data and contextual information submitted with each contributed dataset.

# Components

The core components are defined as follows.

## traits

Description: A table containing measurements of plant traits.

Content:

| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. For wide datasets, these are assigned automatically, based on the dataset_id and row number of the raw data. |
| trait_name | Name of trait sampled. Allowable values specified in the table definitions. |
| value | Measured value. |
| unit | Units of the sampled trait value after aligning with AusTraits standards. |
| date | Date sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy depending on resolution specified. |
| value_type | A categorical variable describing the type of trait value recorded. |
| replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
| original_name | Name given to taxon in the original data supplied by the authors. |
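
The rules for the replicates field can be sketched as a small check. This is an illustration only, not the validation AusTraits itself runs: the function name `check_replicates` is hypothetical, and the accepted spelling of a range (e.g. "3-5") is an assumption.

```r
# Hypothetical helper: does a replicates entry follow the rules described above?
check_replicates <- function(value_type, replicates) {
  if (value_type == "raw_value") {
    return(replicates == "1")
  }
  if (value_type %in% c("expert_mean", "expert_min", "expert_max")) {
    return(replicates == ".na")
  }
  # Remaining value types: a number, a range such as "3-5" (assumed format),
  # or the literal string "unknown".
  grepl("^([0-9]+(-[0-9]+)?|unknown)$", replicates)
}

check_replicates("raw_value", "1")
check_replicates("expert_mean", ".na")
check_replicates("site_mean", "unknown")
```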

Details:

#### Observation IDs

Each trait measurement has an associated observation_id. Observation IDs bind together related measurements within any dataset, and thereby allow transformation between long format (e.g. with variables trait_name and value) and wide format (e.g. with traits as columns). When multiple traits were collected on the same individual, the observation_id identifies which measurements were made on the same individual (or population or species, depending on the level of detail supplied). Generally, observation_id has the format dataset_id_XX where XX is a unique number within each dataset.

For datasets submitted to AusTraits in long format, a column specifying observation_id should already exist, assigned based on a specified grouping variable. If missing, observation_id is assigned based on species_name.

For datasets submitted to AusTraits in wide format, each row is assigned a unique observation_id during processing, allowing conversion to long format.
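
As a minimal sketch of how observation_id supports the long-to-wide transformation, the following base R example reshapes a toy table. The data values are invented for illustration, and storing value as character (so numeric and categorical traits can share one column) is an assumption.

```r
# Toy long-format data using the column names described above (values invented).
traits_long <- data.frame(
  dataset_id     = "Falster_2005",
  observation_id = c("Falster_2005_01", "Falster_2005_01",
                     "Falster_2005_02", "Falster_2005_02"),
  trait_name     = c("specific_leaf_area", "woodiness",
                     "specific_leaf_area", "woodiness"),
  value          = c("12.5", "woody", "20.1", "herbaceous"),
  stringsAsFactors = FALSE
)

# One row per observation_id, one column per trait.
traits_wide <- reshape(
  traits_long,
  idvar     = c("dataset_id", "observation_id"),
  timevar   = "trait_name",
  direction = "wide"
)

# reshape() prefixes the new columns with "value."; strip that prefix.
names(traits_wide) <- sub("^value\\.", "", names(traits_wide))
traits_wide
```

Measurements that share an observation_id end up on the same row, which is what makes the wide form meaningful.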

#### Values and Value types

Each record in the table of trait data has an associated value and value_type.

Traits are either numeric or categorical. For traits with numerical values, the recorded value has been converted into standardised units and we have checked that the value can be converted into a number and lies within the allowable range. For categorical variables, we only include records whose values are included as allowable values (terms) in a trait’s definition in the definitions table. Moreover, categorical values follow two formatting conventions:

• we use _ for multi-word terms, e.g. semi_deciduous
• we use a space for situations where a single study reports two possible values for that trait, e.g. annual biennial for a plant species which is either annual or biennial
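
Because a space separates alternative values while an underscore joins the words of a single term, a categorical value can be split into its terms with a plain string split. The helper name `split_terms` below is hypothetical; this is a sketch of the convention, not code from AusTraits.

```r
# Split a categorical value into the individual terms it reports.
split_terms <- function(value) {
  strsplit(value, " ", fixed = TRUE)[[1]]
}

split_terms("annual biennial")  # two alternative values
split_terms("semi_deciduous")   # a single multi-word term
```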

Each trait measurement also has an associated value_type, which is a categorical variable describing the type of trait value recorded. Possible values are:

| key | value |
|---|---|
| raw_value | Value is a direct measurement. |
| site_min | Value is the minimum of measurements on multiple individuals of the taxon at a single site. |
| site_mean | Value is the mean or median of measurements on multiple individuals of the taxon at a single site. |
| site_max | Value is the maximum of measurements on multiple individuals of the taxon at a single site. |
| multisite_min | Value is the minimum of measurements on multiple individuals of the taxon across multiple sites. |
| multisite_mean | Value is the mean or median of measurements on multiple individuals of the taxon across multiple sites. |
| multisite_max | Value is the maximum of measurements on multiple individuals of the taxon across multiple sites. |
| expert_min | Value is the minimum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from floras that represent a taxon’s entire range. |
| expert_mean | Value is the mean observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from floras that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| expert_max | Value is the maximum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from floras that represent a taxon’s entire range. |
| experiment_min | Value is the minimum of measurements on multiple individuals of the taxon from an experimental study (either in the field or a glasshouse). |
| experiment_mean | Value is the mean or median of measurements on multiple individuals of the taxon from an experimental study (either in the field or a glasshouse). |
| experiment_max | Value is the maximum of measurements on multiple individuals of the taxon from an experimental study (either in the field or a glasshouse). |
| individual_mean | Value is a mean of replicate measurements on an individual. |
| individual_max | Value is a maximum of replicate measurements on an individual. |
| literature_source | Value is a site_mean, multisite_mean, or expert_mean that has been sourced from an unknown literature source. |
| unknown | Value type is not currently known. |

AusTraits does not include intra-individual observations. When multiple measurements per individual are submitted to AusTraits, we take the mean of the values and record the value_type as individual_mean.
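
The averaging step described above might look like the following in base R. This is a sketch under stated assumptions: the `individual` column and the data values are invented, and labelling a lone measurement as raw_value is an assumption rather than documented behaviour.

```r
# Toy submission with repeated measurements on one individual (values invented).
raw <- data.frame(
  individual = c("plant_a", "plant_a", "plant_a", "plant_b"),
  trait_name = "specific_leaf_area",
  value      = c(10, 12, 14, 20)
)

# Average within each individual and trait.
collapsed <- aggregate(value ~ individual + trait_name, data = raw, FUN = mean)

# Count measurements per individual to decide the value_type.
n <- aggregate(value ~ individual + trait_name, data = raw, FUN = length)$value
collapsed$value_type <- ifelse(n > 1, "individual_mean", "raw_value")
collapsed
```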

## sites

Description: A table containing observations of site characteristics associated with information in traits. Cross-referencing between the two dataframes is possible using the combination of the variables dataset_id and site_name.

Content:

| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| site_property | The site characteristic being recorded. Name should include units of measurement, e.g. MAT (C). Ideally we have at least these variables for each site - longitude (deg), latitude (deg), description. |
| value | Measured value. |
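
To illustrate the cross-reference, the sketch below attaches site coordinates to trait records by joining on dataset_id and site_name. The toy data, the helper `site_property_column`, and the output column names `latitude_deg` and `longitude_deg` are all assumptions made for this example.

```r
# Toy tables using the column names described above (values invented).
traits <- data.frame(
  dataset_id = "Falster_2005",
  site_name  = c("site_1", "site_2"),
  trait_name = "specific_leaf_area",
  value      = c("12.5", "20.1"),
  stringsAsFactors = FALSE
)
sites <- data.frame(
  dataset_id    = "Falster_2005",
  site_name     = c("site_1", "site_1", "site_2", "site_2"),
  site_property = c("latitude (deg)", "longitude (deg)",
                    "latitude (deg)", "longitude (deg)"),
  value         = c("-33.7", "151.1", "-35.3", "149.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pull one site property out as its own named column.
site_property_column <- function(sites, property, new_name) {
  out <- sites[sites$site_property == property,
               c("dataset_id", "site_name", "value")]
  names(out)[names(out) == "value"] <- new_name
  out
}

keys <- c("dataset_id", "site_name")
# all.x = TRUE keeps trait records that have no matching site information.
traits_sites <- merge(traits,
                      site_property_column(sites, "latitude (deg)", "latitude_deg"),
                      by = keys, all.x = TRUE)
traits_sites <- merge(traits_sites,
                      site_property_column(sites, "longitude (deg)", "longitude_deg"),
                      by = keys, all.x = TRUE)
traits_sites
```

The same pattern should apply to the contexts table, joining on dataset_id and context_name instead.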

## contexts

Description: A table containing observations of contextual characteristics associated with information in traits. Cross-referencing between the two dataframes is possible using the combination of the variables dataset_id and context_name.

Content:

| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| context_property | The contextual characteristic being recorded. Name should include units of measurement, e.g. CO2 concentration (ppm). |
| value | Measured value. |

## methods

Description: A table containing details on methods with which data were collected, including time frame and source. Cross-referencing with the traits table is possible using the combination of the variables dataset_id and trait_name.

Content:

| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| trait_name | Name of trait sampled. Allowable values specified in the table definitions. |
| methods | A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from referenced source. Methods can include descriptions such as ‘measured on botanical collections’, ‘data from the literature’, or a detailed description of the field or lab methods used to collect the data. |
| year_collected_start | The year data collection commenced. |
| year_collected_end | The year data collection was completed. |
| description | A 1-2 sentence description of the purpose of the study. |
| collection_type | A field to indicate where the majority of plants on which traits were measured were collected - in the field, lab, glasshouse, botanical collection, or literature. The latter should only be used when the data were sourced from the literature and the collection type is unknown. |
| sample_age_class | A field to indicate if the study was completed on adult or juvenile plants. |
| sampling_strategy | A written description of how study sites were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For botanical collections, this field ideally indicates which records were ‘sampled’ to measure a specific trait. |
| source_primary_key | Citation key for primary source in sources. The key is typically of format Surname_year. |
| source_primary_citation | Citation for primary source. This detail is generated from the primary source in the metadata. |
| source_secondary_key | Citation key for secondary source in sources. The key is typically of format Surname_year. |
| source_secondary_citation | Citations for secondary source. This detail is generated from the secondary source in the metadata. |

## excluded_data

Description: A table of data that did not pass quality tests and so were excluded from the master dataset. The structure is identical to that of the traits table, with one extra column called error indicating why the record was excluded. Common reasons are missing_unit_conversions, missing_value, and unsupported_trait_value.

Content:

| key | value |
|---|---|
| error | Indicating why the record was excluded. Common reasons are missing_unit_conversions, missing_value, and unsupported_trait_value. |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. For wide datasets, these are assigned automatically, based on the dataset_id and row number of the raw data. |
| trait_name | Name of trait sampled. Allowable values specified in the table definitions. |
| value | Measured value. |
| unit | Units of the sampled trait value after aligning with AusTraits standards. |
| date | Date sample was taken, in the format yyyy-mm-dd, yyyy-mm or yyyy depending on resolution specified. |
| value_type | A categorical variable describing the type of trait value recorded. |
| replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
| original_name | Name given to taxon in the original data supplied by the authors. |
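
One simple use of the error column is to tally why records were excluded. The sketch below uses an invented toy table; only the column names and the error labels come from the description above.

```r
# Toy excluded records (rows invented; error labels as listed above).
excluded_data <- data.frame(
  error      = c("missing_value", "missing_unit_conversions",
                 "missing_value", "unsupported_trait_value"),
  dataset_id = c("Falster_2005", "Falster_2005", "Falster_2005", "Falster_2005"),
  trait_name = c("specific_leaf_area", "specific_leaf_area",
                 "woodiness", "woodiness"),
  stringsAsFactors = FALSE
)

# Number of excluded records per reason, most common first.
error_counts <- sort(table(excluded_data$error), decreasing = TRUE)
error_counts
```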

## taxa

Description: A table containing details on taxa that are included in the table traits. We have attempted to align species names with known taxonomic units in the Australian Plant Census (APC) and/or the Australian Plant Names Index (APNI); the sourced information is released under a CC-BY3 license.

Version 3.0.2 of AusTraits contains records for 28640 different taxa.

Content:

| key | value |
|---|---|
| taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |
| source | Source of taxonomic information, either APC or APNI. |
| acceptedNameUsageID | The identifier of the accepted concept in APC for this taxon name. |
| scientificNameAuthorship | Authority for the taxon indicated under taxon_name; applicable for most taxa in APC. |
| taxonRank | Rank of the taxon. |
| taxonomicStatus | Taxonomic status of the taxon. accepted indicates the taxon name is designated as current in APC, a taxonomy endorsed by the Council of Heads of Australasian Herbaria. Alternate statuses are: unplaced indicating the taxon name is included in APNI but has yet to be reviewed for inclusion/exclusion as a current name (or synonym) for APC; genus_known indicating the taxon name can only be aligned to the genus level; family_known indicating the taxon name can only be aligned to the family level; and unknown indicating a taxon name submitted to AusTraits that cannot be aligned at any level. |
| family | Family of the taxon. |
| taxonDistribution | Known distribution of the taxon, by Australian state. |
| ccAttributionIRI | Source of taxonomic information (for taxa designated as current for APC) or name information (for taxa included in APNI, but unplaced for APC). |
| genus | Genus of the taxon. |

## taxonomic_updates

Description: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are determined by comparing the originally submitted taxon name against the APC (Australian Plant Census) and APNI (Australian Plant Names Index). Cross-referencing with the traits table is possible using the combination of the variables dataset_id and taxon_name.

Content:

| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| original_name | Name given to taxon in the original data supplied by the authors. |
| cleaned_name | Name of the taxon after implementing automated syntax standardisation and spelling changes as well as manually encoded syntax alignments for this taxon in the metadata file for the corresponding dataset_id. |
| taxonIDClean | The APC identifier for the cleaned_name of the taxon (for taxa designated as current for APC) or APNI identifier (for taxa included in APNI, but unplaced for APC). |
| taxonomicStatusClean | The APC taxonomic status for the cleaned_name of the taxon identified by taxonIDClean. |
| alternativeTaxonomicStatusClean | The APC taxonomic status of alternative APC records with the name cleaned_name. |
| acceptedNameUsageID | The APC identifier for the accepted name (taxon_name) for a taxon; different from taxonIDClean if taxonomicStatusClean is not accepted. |
| taxon_name | Whenever possible, this field indicates the currently accepted scientific name of a taxon, per the Australian Plant Census (APC). Alternatively, taxon_name can indicate an alignment with a name included in the comprehensive Australian Plant Names Index (APNI) but not currently marked as accepted in APC; or a name that cannot be aligned to available lists of Australian plant names. |

Both the original and the updated taxon names are included in the traits table.
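
Since the traits table carries both names, records whose name was changed can be listed by comparing the two columns. The sketch below uses placeholder names rather than real taxa, and the variable `name_changes` is introduced here for illustration.

```r
# Toy traits table with placeholder taxon names (not real taxa).
traits <- data.frame(
  dataset_id    = "Falster_2005",
  original_name = c("Genus_a species_x", "Genus_b speces_y", "Genus_c species_z"),
  taxon_name    = c("Genus_a species_x", "Genus_b species_y", "Genus_d species_z"),
  stringsAsFactors = FALSE
)

# Distinct name changes: rows where the submitted and aligned names differ.
changed <- traits$original_name != traits$taxon_name
name_changes <- unique(traits[changed, c("dataset_id", "original_name", "taxon_name")])
name_changes
```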

## definitions

Description: A copy of the definitions for all tables and terms. Information included here was used to process data and generate any documentation for the study.

Details on trait definitions: The allowable trait names and trait values are defined in the definitions file. Each trait is labelled as either numeric or categorical. An example of each type is as follows. For the full list, see the Trait definitions vignette.

specific_leaf_area

• label: Leaf area per unit leaf dry mass (specific leaf area, SLA)
• description: Leaf area per unit leaf dry mass; SLA
• number of records: 32691
• number of studies: 130
• type: numeric
• units: mm2/mg
• allowable range: 0.1 - 500 mm2/mg

woodiness

• label: Woodiness
• description: A plant’s degree of lignification in stems
• number of records: 14138
• number of studies: 15
• type: categorical
• allowable values:
    • herbaceous: Plant with non-lignified stems
    • semi_woody: Plant with partially lignified stems
    • woody: Plant that produces secondary xylem and has lignin
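
The two example definitions suggest how a value might be checked against its trait definition. The following is a sketch only: the nested-list layout of `definitions` and the helper `is_allowed` are assumptions made for illustration and do not reproduce the actual structure of definitions.yml.

```r
# Hypothetical, simplified representation of the two definitions above.
definitions <- list(
  specific_leaf_area = list(type = "numeric", units = "mm2/mg",
                            range = c(0.1, 500)),
  woodiness          = list(type = "categorical",
                            values = c("herbaceous", "semi_woody", "woody"))
)

# Hypothetical helper: is a recorded value allowed for the given trait?
is_allowed <- function(trait_name, value, definitions) {
  def <- definitions[[trait_name]]
  if (is.null(def)) {
    return(FALSE)  # unknown trait name
  }
  if (def$type == "numeric") {
    x <- suppressWarnings(as.numeric(value))
    return(!is.na(x) && x >= def$range[1] && x <= def$range[2])
  }
  # Categorical: every space-separated term must be an allowable value.
  terms <- strsplit(value, " ", fixed = TRUE)[[1]]
  length(terms) > 0 && all(terms %in% def$values)
}

is_allowed("specific_leaf_area", "12.5", definitions)
is_allowed("specific_leaf_area", "1000", definitions)
is_allowed("woodiness", "tree", definitions)
```

Records failing a check of this kind are the sort that would be expected to appear in excluded_data with an error such as unsupported_trait_value.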

## contributors

Description: A table of people contributing to each study.

Content:

| key | value |
|---|---|
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| name | Name of contributor |
| institution | Last known institution or affiliation |
| role | Their role in the study |

## sources

For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is defined as ‘the original study in which data were collected’. The secondary citation is defined as ‘a subsequent study where data were compiled or re-analysed’.

The element sources includes BibTeX versions of all sources, which can be imported into your reference library:

    library(RefManageR)                              # WriteBib() is provided by the RefManageR package
    WriteBib(austraits$sources)                      # write all sources to a file
    WriteBib(austraits$sources["Falster_2005_1"])    # write a single reference to a file

Or individually viewed:

    austraits$sources["Falster_2005_1"]

A formatted version of the sources also exists within the table methods.

## build_info

Description: A description of the computing environment used to create this version of the dataset, including version number, git commit and R session_info.