10  The package

The traits.build package provides a workflow for harmonising data from disconnected primary sources and arises from the AusTraits project. In 2023 this package was spun out as a separate package from the autraits.build repository.

The traits.build package provides the functions needed to build a compilation from the primary sources ino a standardised database. specified

The core components of the {traits.build} package are:

  1. 15 functions functions, supplemented by a detailed protocol to wrangle diverse datasets into input files with a common structure that captures both the trait data and all essential metadata and context properties. These are a table (data.csv) containing all trait data, taxon names, location names (if relevant), and any context properties (if relevant) and a structured metadata file (metadata.yml) that assigns the columns from the data.csv file to their specific variables and maps all additional dataset metadata in a structured format.

  2. An R-based pipeline to combine the input files into a single harmonised database with aligned trait names, aligned units, aligned categorical trait values, and aligned taxon names. Four database-specific configuration files are required for the build process, 1) a trait dictionary; 2) a units conversion file; 3) a taxon list; and 4) a database metadata file.

Guided by the information in the configuration files, the R-scripted workflow combines the data.csv and metadata.yml files for the individual datasets into a unified, harmonised database. There are three distinct steps to this process, processed by a trio of functions, dataset_configure, dataset_process, and dataset_taxonomic_updates. These functions cyclically build each dataset, only combining them into a single database at the end of the workflow.

A combination of automated tests and other quality controls ensure each dataset has been appropriately merged in and the output data are reliable, accurate, and supported by detailed metadata.