AusTraits is an open-source, harmonised database of Australian plant trait data. Traits vary in scope from physiological measures of performance (e.g. photosynthetic gas exchange, water-use efficiency) to morphological attributes (e.g. leaf area, seed mass, plant height).

This vignette provides an overview of our workflow, to demonstrate our commitment to creating a reliable, reproducible resource for anyone interested in plant traits.

AusTraits workflow

Data sources

The data in AusTraits are derived from nearly 300 distinct sources, each contributed by an individual researcher, government entity (e.g. herbaria), or NGO. Each reflects the research agenda of the individual or organisation that contributed the data: the species selected, traits measured, manipulative treatments performed, and locations sampled encompass the diversity of research interests in Australia over many decades. The AusTraits data curators approached as many researchers as time permitted and asked them to share their data, without explicitly soliciting datasets for specific traits. The patchy coverage by trait or location therefore reflects only what has been merged into AusTraits to date.

These datasets use different variable names, data structures, units and sometimes methods.

Standardising and harmonising data

To create a single database for distribution to the research community, we developed a reproducible and transparent workflow in R for merging each dataset into AusTraits. A metadata file for each study documents how the data tables submitted by an individual contributor are translated into the standardised terms used in the AusTraits database. The pipeline ensures that the following information is standardised across all datasets in AusTraits:

  • Taxonomic nomenclature follows the Australian Plant Census (APC), with a pipeline to update outdated taxonomy, correct minor spelling mistakes, and align to a known genus when a full species name isn’t provided.
  • Trait names are defined in our definitions file, and only data for traits included in this file can be merged into AusTraits. The trait names used in the incoming dataset are mapped onto the appropriate AusTraits trait name.
  • For numeric traits, the definitions file includes units and the allowable range of values. All incoming data are converted to the appropriate units.
  • For categorical traits, the definitions file lists the allowable values and defines each one. Lists of substitutions translate the exact syntax and terms in the submitted spreadsheet into the values allowed by AusTraits. This ensures that a given value of a trait has the same meaning throughout the AusTraits database.
  • Site locations are recorded in decimal degrees.
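
The trait standardisation steps listed above can be illustrated with a minimal sketch. It is written in Python purely for illustration; the actual AusTraits workflow is implemented in R, and this is not its code. All names here (the NumericTrait class, the lookup tables, the trait names, units, ranges, and conversion factors) are hypothetical assumptions chosen to show the idea of mapping a contributor's column onto a defined trait, converting units, checking the allowable range, and substituting categorical terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericTrait:
    """Hypothetical definitions-file entry for a numeric trait."""
    name: str
    units: str
    min_value: float
    max_value: float


# Hypothetical definitions; real trait names, units and ranges may differ.
DEFINITIONS = {"leaf_area": NumericTrait("leaf_area", "mm2", 0.1, 1.0e7)}

# Hypothetical multiplicative factors keyed by (source units, target units).
UNIT_CONVERSIONS = {("cm2", "mm2"): 100.0, ("mm2", "mm2"): 1.0}

# Hypothetical mapping from a contributor's column name to a defined trait.
TRAIT_NAME_MAP = {"Leaf area (cm2)": "leaf_area"}

# Hypothetical allowable values and substitutions for a categorical trait.
ALLOWED_VALUES = {"plant_growth_form": {"tree", "shrub", "herb"}}
SUBSTITUTIONS = {
    ("plant_growth_form", "Tree"): "tree",
    ("plant_growth_form", "T"): "tree",
}


def standardise_numeric(column: str, value: float, source_units: str) -> dict:
    """Map a column onto a defined trait, convert units, and check the range."""
    trait_name = TRAIT_NAME_MAP.get(column)
    if trait_name is None:
        # Only traits present in the definitions can be merged.
        raise KeyError(f"No trait defined for column {column!r}")
    definition = DEFINITIONS[trait_name]
    factor = UNIT_CONVERSIONS[(source_units, definition.units)]
    converted = value * factor
    return {
        "trait_name": trait_name,
        "value": converted,
        "units": definition.units,
        "in_range": definition.min_value <= converted <= definition.max_value,
    }


def standardise_categorical(trait_name: str, raw_value: str) -> Optional[str]:
    """Apply substitutions; return None when the value is still not allowed."""
    value = SUBSTITUTIONS.get((trait_name, raw_value), raw_value)
    return value if value in ALLOWED_VALUES[trait_name] else None


result = standardise_numeric("Leaf area (cm2)", 2.5, "cm2")
growth_form = standardise_categorical("plant_growth_form", "Tree")
unknown = standardise_categorical("plant_growth_form", "vine-ish")
```

In this sketch an unsupported categorical value returns None rather than raising an error, so that a curator could collect all such values in one pass and then add substitutions for them; that is a design choice of the example, not a description of the AusTraits pipeline.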

Referencing sources and recording methods

The metadata file also includes all metadata associated with the study:

  • The source information for each dataset is recorded. Most frequently, these are the primary publications derived from the dataset.
  • People associated with the collection of the data are listed, including their role in the project.
  • Collection methods, including value type (raw value, site mean, species mean) and replicate number are recorded.
  • Sampling date is recorded.
  • Sampling age class and collection type are recorded.
  • Available data on site properties are recorded.
  • Available data on contextual properties are recorded.

Error checking

  • The AusTraits data curator runs a series of tests on each dataset, detailed in the adding data vignette.
  • These tests identify misaligned units, unrecognised taxon names, and unsupported categorical trait values.
  • These tests also identify and eliminate most duplicate data, that is, instances where the same numeric trait data are submitted by multiple people.
  • A report is then compiled for each dataset, summarising the metadata and plotting its trait values against other measurements of the same traits in AusTraits. The data contributor reviews the report to ensure that metadata are complete and data values are as expected.
  • A second member of the AusTraits team double checks each dataset before it is merged into the main repository.
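
As an illustration of the duplicate check described above, the following minimal sketch flags numeric values that appear for the same taxon and trait in more than one dataset. It is written in Python for illustration only and is not the AusTraits implementation, which is in R. The record fields (dataset_id, taxon_name, trait_name, value) and the example dataset identifiers are hypothetical, and exact equality of values is an assumption of the sketch; a real check might need to allow for rounding.

```python
from collections import defaultdict


def find_possible_duplicates(records):
    """Return values reported for the same taxon and trait by several datasets.

    Each record is a dict with the hypothetical keys 'dataset_id',
    'taxon_name', 'trait_name' and 'value'. Values are compared exactly.
    """
    datasets_by_key = defaultdict(set)
    for record in records:
        key = (record["taxon_name"], record["trait_name"], record["value"])
        datasets_by_key[key].add(record["dataset_id"])
    # Keep only keys contributed by more than one dataset.
    return {
        key: sorted(ids)
        for key, ids in datasets_by_key.items()
        if len(ids) > 1
    }


# Hypothetical example records, invented for this sketch.
records = [
    {"dataset_id": "Smith_2001", "taxon_name": "Acacia aneura",
     "trait_name": "seed_mass", "value": 12.5},
    {"dataset_id": "Jones_2010", "taxon_name": "Acacia aneura",
     "trait_name": "seed_mass", "value": 12.5},
    {"dataset_id": "Jones_2010", "taxon_name": "Acacia aneura",
     "trait_name": "plant_height", "value": 6.0},
]

duplicates = find_possible_duplicates(records)
```

The sketch only flags candidates; deciding which copy to keep would remain a judgement for the curator.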