This document describes the structure of the AusTraits compilation.

AusTraits is an ongoing collaborative community resource, and we would appreciate your contribution on any of the following:

• Reporting Errors: If you notice a possible error in AusTraits, please get in contact.
• Refining documentation: We welcome additions and edits that make using the existing data or adding new data easier for the community.
• Contributing new data: We gladly accept new data contributions to AusTraits. For full instructions on preparing data for inclusion in AusTraits, please get in contact.

AusTraits is essentially a series of linked components that cross-link against each other:

austraits
├── traits
├── sites
├── methods
├── excluded_data
├── taxonomy
├── definitions
├── contributors
├── sources
└── build_info

These include all the data and contextual information submitted with each contributed dataset. It is essential that users of AusTraits data are confident the data have the meaning they expect and were collected using methods they trust. As such, each dataset within AusTraits must include descriptions of the study, sites, and methods used, as well as the data itself.

## Components

The core components are defined as follows.

### traits

Description: A table containing measurements of plant traits.

Content:

| key | value |
| --- | --- |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| observation_id | A unique identifier for the observation, useful for joining traits coming from the same observation_id. These are assigned automatically, based on the dataset_id and row number of the raw data. |
| trait_name | Name of trait sampled. Allowable values specified in the table traits. |
| value | Measured value. |
| unit | Units of the sampled trait value after aligning with AusTraits standards. |
| date | Date sample was taken, in the format yyyy-mm-dd, but with days and months only when specified. |
| value_type | A categorical variable describing the type of trait value recorded. |
| replicates | Number of replicate measurements that comprise the data points for the trait for each measurement. A numeric value (or range) is ideal and appropriate if the value type is a mean, median, min or max. For these value types, if replication is unknown the entry should be unknown. If the value type is raw_value the replicate value should be 1. If the value type is expert_mean, expert_min, or expert_max the replicate value should be .na. |
| original_name | Name given to taxon in the original data supplied by the authors. |
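The rules given for replicates can be written as a small consistency check. The Python sketch below is illustrative only: the function name replicates_ok and the hyphenated range syntax (e.g. 3-10) are assumptions, not part of AusTraits, and value types not mentioned in the replicates description are simply accepted.

```python
import re

EXPERT_TYPES = {"expert_mean", "expert_min", "expert_max"}
# A count such as "5" or a range such as "3-10" (the range syntax is an assumption).
NUMERIC_OR_RANGE = re.compile(r"^\d+(-\d+)?$")


def replicates_ok(value_type: str, replicates: str) -> bool:
    """Check a replicates entry against the rules described for its value_type."""
    if value_type == "raw_value":
        return replicates == "1"
    if value_type in EXPERT_TYPES:
        return replicates == ".na"
    if value_type.endswith(("_mean", "_min", "_max")):
        # Means, medians, minima and maxima: a number, a range, or "unknown".
        return replicates == "unknown" or bool(NUMERIC_OR_RANGE.match(replicates))
    # Remaining value types are not covered by the stated rules.
    return True
```

For example, replicates_ok("site_mean", "unknown") passes, while replicates_ok("expert_max", "5") does not.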

### sites

Description: A table containing observations of site characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, site_name.

Content:

| key | value |
| --- | --- |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| site_name | Name of site where individual was sampled. Cross-references between similar columns in sites and traits. |
| site_property | The site characteristic being recorded. Name should include units of measurement, e.g. longitude (deg). Ideally we have at least these variables for each site - longitude (deg), latitude (deg), description. |
| value | Measured value. |
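Because sites stores one row per site property, it is often convenient to gather the properties of each site into a single record. The following Python sketch shows one way this could be done; the function name sites_to_wide, the site name and the coordinate values are made up for illustration.

```python
def sites_to_wide(rows):
    """Collect site_property/value pairs into one dict per (dataset_id, site_name)."""
    wide = {}
    for row in rows:
        key = (row["dataset_id"], row["site_name"])
        wide.setdefault(key, {})[row["site_property"]] = row["value"]
    return wide


# Hypothetical rows in the long layout described for the sites table.
site_rows = [
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "latitude (deg)", "value": "-33.8"},
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "longitude (deg)", "value": "151.2"},
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "description", "value": "example woodland site"},
]

wide_sites = sites_to_wide(site_rows)
```

Values are kept as strings here because a single value column holds both numeric and text properties.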

### contexts

Description: A table containing observations of contextual characteristics associated with information in traits. Cross referencing between the two dataframes is possible using combinations of the variables dataset_id, context_name.

Content:

| key | value |
| --- | --- |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| context_name | Name of contextual scenario where individual was sampled. Cross-references between similar columns in contexts and traits. |
| context_property | The contextual characteristic being recorded. Name should include units of measurement, e.g. elevation (m). |
| value | Measured value. |

### methods

Description: A table containing details on methods with which data were collected, including time frame and source.

Content:

| key | value |
| --- | --- |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| trait_name | Name of trait sampled. Allowable values specified in the table traits. |
| methods | A textual description of the methods used to collect the trait data. Whenever available, methods are taken near-verbatim from referenced source. Methods can include descriptions such as ‘measured on botanical collections’, ‘data from the literature’, or a detailed description of the field or lab methods used to collect the data. |
| year_collected_start | The year data collection commenced. |
| year_collected_end | The year data collection was completed. |
| description | A 1-2 sentence description of the purpose of the study. |
| collection_type | A field to indicate where the majority of plants on which traits were measured were collected - in the field, lab, glasshouse, botanical collection, or literature. The latter should only be used when the data were sourced from the literature and the collection type is unknown. |
| sample_age_class | A field to indicate if the study was completed on adult or juvenile plants. |
| sampling_strategy | A written description of how study sites were selected and how study individuals were selected. When available, this information is lifted verbatim from a published manuscript. For botanical collections, this field ideally indicates which records were ‘sampled’ to measure a specific trait. |
| source_primary_citation | Citation for primary source. This detail is generated from the primary source in the metadata. |
| source_primary_key | Citation key for primary source in sources. The key is typically of format Surname_year. |
| source_secondary_citation | Citations for secondary source. This detail is generated from the secondary source in the metadata. |
| source_secondary_key | Citation key for secondary source in sources. The key is typically of format Surname_year. |

### excluded_data

Description: A table of data that did not pass quality tests and so were excluded from the master dataset.

### taxonomic_updates

Description: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are determined by comparing against the APC (Australian Plant Census) and APNI (Australian Plant Names Index).

Content:

| key | value |
| --- | --- |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| original_name | Name given to taxon in the original data supplied by the authors. |
| cleaned_name | Name of the taxon after implementing any changes encoded for this taxon in the metadata file for the corresponding dataset_id. |
| taxonIDClean | Where it could be identified, the taxonID of the cleaned_name for this taxon in the APC. |
| taxonomicStatusClean | Taxonomic status of the taxon identified by taxonIDClean in the APC. |
| alternativeTaxonomicStatusClean | The status of alternative records with the name cleaned_name in the APC. |
| acceptedNameUsageID | ID of the accepted name for taxon in the APC or APNI. |
| taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |

### taxa

Description: A table containing details on taxa associated with information in traits. This information has been sourced from the APC (Australian Plant Census) and APNI (Australian Plant Names Index) and is released under a CC BY 3.0 license.

Content:

| key | value |
| --- | --- |
| taxon_name | Currently accepted name of taxon in the Australian Plant Census or, for unplaced species, in the Australian Plant Names Index. |
| source | Source of taxonomic information, either APC or APNI. |
| acceptedNameUsageID | Identifier for the accepted name of the taxon. |
| scientificNameAuthorship | Authority for the accepted name of the taxon indicated under taxon_name. |
| taxonRank | Rank of the taxon. |
| taxonomicStatus | Taxonomic status of the taxon. |
| family | Family of the taxon. |
| genus | Genus of the taxon. |
| taxonDistribution | Known distribution of the taxon. |

### definitions

Description: A copy of the definitions for all tables and terms. Information included here was used to process data and generate any documentation for the study.

### contributors

Description: A table of people contributing to each study.

Content:

| key | value |
| --- | --- |
| dataset_id | Primary identifier for each study contributed into AusTraits; most often these are scientific papers, books, or online resources. By default should be name of first author and year of publication, e.g. Falster_2005. |
| name | Name of contributor. |
| institution | Last known institution or affiliation. |
| role | Their role in the study. |

### sources

Description: Bibtex entries for all primary and secondary sources in the compilation.

### build_info

Description: A description of the computing environment used to create this version of the dataset, including version number, git commit and R session_info.

## Dataset IDs

The core organising unit behind AusTraits is the dataset_id. Records are organised as coming from a particular study, defined by the dataset_id. Our preferred format for dataset_id is the surname of the first author of any corresponding publication, followed by the year, as surname_year, e.g. Falster_2005. Wherever there are multiple studies with the same id, we add a suffix _2, _3 etc., e.g. Falster_2005, Falster_2005_2.
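The naming convention can be illustrated with a short Python sketch. The helper make_dataset_id is hypothetical and not part of AusTraits; it simply follows the surname_year pattern and appends _2, _3 and so on when an id is already in use.

```python
def make_dataset_id(surname, year, existing_ids):
    """Build a surname_year id, adding a numeric suffix if the id is already taken."""
    base = f"{surname}_{year}"
    if base not in existing_ids:
        return base
    suffix = 2
    while f"{base}_{suffix}" in existing_ids:
        suffix += 1
    return f"{base}_{suffix}"


first = make_dataset_id("Falster", 2005, set())              # "Falster_2005"
second = make_dataset_id("Falster", 2005, {"Falster_2005"})  # "Falster_2005_2"
```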

## Observation IDs

As well as a dataset_id, each trait measurement has an associated observation_id. Observation IDs bind together related measurements within any dataset, and thereby allow transformation between long (e.g. with variables trait_name and value) and wide (e.g. with traits as columns) formats.

Generally, observation_id has the format dataset_id_XX where XX is a unique number within each dataset. For example, if multiple traits were collected on the same individual, the observation_id allows us to gather these together. For floras reporting species averages, the observation_id is assigned at the species level.
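As a rough illustration of the long-to-wide transformation that observation_id makes possible, the Python sketch below groups trait records by observation_id. The function name traits_to_wide, the trait names and the values are hypothetical examples rather than real AusTraits records.

```python
def traits_to_wide(rows):
    """Turn long-format rows (trait_name, value) into one dict of traits per observation_id."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row["observation_id"], {})
        record[row["trait_name"]] = row["value"]
    return wide


# Hypothetical long-format records: two traits measured on the same individual,
# plus one trait measured on another.
long_rows = [
    {"observation_id": "Falster_2005_01", "trait_name": "leaf_area", "value": 120.0},
    {"observation_id": "Falster_2005_01", "trait_name": "plant_height", "value": 2.5},
    {"observation_id": "Falster_2005_02", "trait_name": "leaf_area", "value": 95.0},
]

wide_rows = traits_to_wide(long_rows)
```

Each key of wide_rows is an observation_id, and each value holds the traits measured together in that observation.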

For datasets that arrive in wide format we assume each row has a unique observation_id. For datasets that arrive in long format, the observation_id is assigned based on a specified grouping variable. This variable can be specified in the metadata.yml file under the section variable_match. If missing, observation_id is assigned based on species_name.
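The assignment for long-format data might look something like the following Python sketch. It is not the AusTraits implementation: the function name, the two-digit zero padding and the numbering by order of first appearance are all assumptions made for illustration.

```python
def assign_observation_ids(rows, dataset_id, group_key="species_name"):
    """Give every row sharing the same grouping value the same observation_id."""
    ids = {}
    result = []
    for row in rows:
        group = row[group_key]
        if group not in ids:
            # Number groups in order of first appearance (an assumption).
            ids[group] = f"{dataset_id}_{len(ids) + 1:02d}"
        result.append({**row, "observation_id": ids[group]})
    return result


# Hypothetical long-format rows grouped by the default variable, species_name.
raw_rows = [
    {"species_name": "species A", "trait_name": "leaf_area", "value": 120.0},
    {"species_name": "species B", "trait_name": "leaf_area", "value": 95.0},
    {"species_name": "species A", "trait_name": "plant_height", "value": 2.5},
]

with_ids = assign_observation_ids(raw_rows, "Falster_2005")
```

Passing a different group_key mimics specifying another grouping variable in the metadata.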

## Site names

As well as dataset_id and observation_id, where appropriate, trait values are associated with a site_name. Unique combinations of dataset_id and site_name can be used to cross-match against the sites table, which provides further details on the site sampled.

## Context names

As well as dataset_id, observation_id, and site_name, where appropriate, trait values are associated with a context_name. Unique combinations of dataset_id and context_name can be used to cross-match against the contexts table, which provides further details on the context sampled.
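The cross-matching described for sites and contexts follows the same pattern: look up a trait record by its dataset_id together with its site_name or context_name. A minimal Python sketch is given below; the function names and all example rows are hypothetical.

```python
def build_index(rows, name_column, property_column):
    """Map (dataset_id, name) to a dict of property -> value."""
    index = {}
    for row in rows:
        key = (row["dataset_id"], row[name_column])
        index.setdefault(key, {})[row[property_column]] = row["value"]
    return index


def cross_match(trait_row, index, name_column):
    """Return the properties recorded for the site or context of one trait record."""
    key = (trait_row["dataset_id"], trait_row.get(name_column))
    return index.get(key, {})


# Hypothetical rows following the layouts of the sites and contexts tables.
sites = [
    {"dataset_id": "Falster_2005", "site_name": "site_A",
     "site_property": "latitude (deg)", "value": "-33.8"},
]
contexts = [
    {"dataset_id": "Falster_2005", "context_name": "context_1",
     "context_property": "elevation (m)", "value": "250"},
]
trait = {"dataset_id": "Falster_2005", "site_name": "site_A",
         "context_name": "context_1", "trait_name": "leaf_area", "value": 120.0}

site_index = build_index(sites, "site_name", "site_property")
context_index = build_index(contexts, "context_name", "context_property")
site_details = cross_match(trait, site_index, "site_name")
context_details = cross_match(trait, context_index, "context_name")
```

Records without a matching site or context simply return an empty dict.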

## Values and Value types

Each record in the table of trait data has an associated value and value_type.

Traits are either numeric or categorical. For traits with numerical values, the recorded value has been converted into standardised units and we have checked that the value can be converted into a number and lies within the allowable range. For categorical variables, we only include records that are defined in the definitions. Moreover, we use a format whereby:

• we use _ for multi-word terms, e.g. semi_deciduous
• we use a space for situations where there are two possible values for that trait, e.g. annual biennial for something which is either annual or biennial
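Under this convention a categorical value can be split on whitespace to recover its individual terms, since underscores keep multi-word terms together. The Python sketch below illustrates the idea; the function names and the set of allowed terms are hypothetical and do not reproduce the actual definitions.

```python
def parse_categorical(value):
    """Split a categorical value into its individual terms."""
    return value.split()


def is_allowed(value, allowed_terms):
    """Check that every term in the value is one of the allowed terms."""
    return all(term in allowed_terms for term in parse_categorical(value))


# Hypothetical set of allowed terms for a life-history trait.
allowed = {"annual", "biennial", "perennial"}

terms = parse_categorical("annual biennial")      # two alternative values
single = parse_categorical("semi_deciduous")      # one multi-word term
```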

Each trait measurement also has an associated value_type, a categorical variable describing the type of trait value recorded. Possible values are:

| key | value |
| --- | --- |
| raw_value | Value is a direct measurement |
| site_min | Value is the minimum of measurements on multiple individuals of the taxon at a single site |
| site_mean | Value is the mean or median of measurements on multiple individuals of the taxon at a single site |
| site_max | Value is the maximum of measurements on multiple individuals of the taxon at a single site |
| multisite_min | Value is the minimum of measurements on multiple individuals of the taxon across multiple sites |
| multisite_mean | Value is the mean or median of measurements on multiple individuals of the taxon across multiple sites |
| multisite_max | Value is the maximum of measurements on multiple individuals of the taxon across multiple sites |
| expert_min | Value is the minimum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| expert_mean | Value is the mean observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| expert_max | Value is the maximum observed for a taxon across its range or in this particular dataset, as estimated by an expert based on their knowledge of the taxon. Data fitting this category include estimates from flora that represent a taxon’s entire range, and values for categorical variables obtained from a reference book, or identified by an expert. |
| experiment_min | Value is the minimum of measurements from an experimental study either in the field or a glasshouse |
| experiment_mean | Value is the mean or median of measurements from an experimental study either in the field or a glasshouse |
| experiment_max | Value is the maximum of measurements from an experimental study either in the field or a glasshouse |
| individual_mean | Value is a mean of replicate measurements on an individual (usually for experimental ecophysiology studies) |
| individual_max | Value is a maximum of replicate measurements on an individual (usually for experimental ecophysiology studies) |
| literature_source | Value is a site or multi-site mean that has been sourced from an unknown literature source |
| unknown | Value type is not currently known |

AusTraits does not include intra-individual observations. When multiple measurements per individual are submitted to AusTraits, we take the mean of the values and record the value_type as individual_mean.
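The collapsing of repeated measurements on one individual can be sketched in Python as follows. This is an illustration under stated assumptions, not the AusTraits code: the function name is hypothetical, and recording the number of measurements as replicates is an assumption based on the description of that field.

```python
from statistics import mean


def summarise_individual(measurements):
    """Collapse the measurements taken on one individual into a single record."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    if len(measurements) == 1:
        # A single direct measurement stays a raw_value with one replicate.
        return {"value": measurements[0], "value_type": "raw_value", "replicates": 1}
    return {
        "value": mean(measurements),
        "value_type": "individual_mean",
        "replicates": len(measurements),  # assumption: count of measurements averaged
    }


record = summarise_individual([2.0, 4.0, 6.0])
```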

## Taxonomy

The latest version of AusTraits contains records for over 28640 different taxa. We have attempted to align species names with known taxonomic units in the Australian Plant Census (APC) and/or the Australian Plant Names Index (APNI).

The table taxa lists all taxa in the database, including additional information about the taxa (see Table above).

The traits table reports both the original and the updated taxon name alongside each trait record.

The table taxonomic_updates provides details on all taxonomic name changes implemented when aligning with APC and APNI.
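To show how the name changes could be used, the Python sketch below builds a lookup from (dataset_id, original_name) to taxon_name using the columns described for the taxonomic updates table. The function names and the placeholder taxon names are hypothetical, and falling back to the original name when no update is recorded is an assumption.

```python
def build_name_lookup(taxonomic_updates):
    """Map (dataset_id, original_name) to the currently accepted taxon_name."""
    return {
        (row["dataset_id"], row["original_name"]): row["taxon_name"]
        for row in taxonomic_updates
    }


def resolve_taxon(lookup, dataset_id, original_name):
    """Return the aligned name, or the original name if no update is recorded (assumption)."""
    return lookup.get((dataset_id, original_name), original_name)


# Placeholder names only; these are not real taxa.
updates = [
    {"dataset_id": "Falster_2005", "original_name": "Genus oldname",
     "taxon_name": "Genus newname"},
]

name_lookup = build_name_lookup(updates)
aligned = resolve_taxon(name_lookup, "Falster_2005", "Genus oldname")
```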

## Sources

For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is the original study in which data were collected, while the secondary citation is a subsequent study where data were compiled or re-analysed and then made available. These references are included in two places:

1. Within the table methods, where we provide a formatted version of each.
2. In the element sources, where we provide bibtex versions of all sources which can be imported into your reference library. The keys for these references are listed within the methods.
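The link between the two places can be sketched as a key lookup. In the Python example below, sources is assumed to be a mapping from citation key to bibtex text; that representation, the function names and the placeholder entry are illustrative assumptions rather than the actual AusTraits structure.

```python
def citation_keys(methods_row):
    """Return the primary and secondary source keys present in a methods record."""
    keys = [methods_row.get("source_primary_key"),
            methods_row.get("source_secondary_key")]
    return [key for key in keys if key]


def bibtex_for(methods_row, sources):
    """Collect the bibtex entries referenced by one methods record."""
    return {key: sources[key] for key in citation_keys(methods_row) if key in sources}


# Placeholder bibtex text; not a real reference.
sources = {"Falster_2005": "@misc{Falster_2005, note = {placeholder entry}}"}
methods_row = {"dataset_id": "Falster_2005",
               "source_primary_key": "Falster_2005",
               "source_secondary_key": None}

entries = bibtex_for(methods_row, sources)
```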