12  File organisation

This chapter describes the typical files you may encounter in a traits.build compilation. The description is based on the austraits.build compilation.

We strongly suggest you create a standalone folder for your repository, e.g. austraits.build. This folder should contain all files needed to build your compilation. We’re big fans of GitHub as a platform for collaboration. If you’re not familiar with Git or GitHub, we suggest you check out the book Happy Git and GitHub for the useR.

12.1 Repository structure

The main directory for the austraits.build repository contains the following files and folders, with purpose as indicated. Not all of these files are required for a compilation; some support extra features such as a website. They are included here for completeness.

Files used for data compilation

├── remake.yml/build.R    # instructions for build
├── config                # configuration files
├── data                  # raw data files
├── R                     # folder with custom R functions
├── export                # folder for output
└── scripts               # R scripts for processing files before/after build

R project file

└── austraits.build.Rproj     # RStudio project

Files for maintaining a repo on GitHub

├── README.md         # landing page
├── .github           # folder containing GitHub Actions, issue templates, code of conduct
├── LICENCE
├── NEWS.md
└── inst              # contains images that appear on the GitHub repo

Additional files describing the compendium and testing the database build process

├── DESCRIPTION           # compendium description
└── tests                 # tests for whether database builds

12.2 /config folder

The folder config contains four files which govern the building of the dataset.

config
├── metadata.yml
├── traits.yml
├── taxon_list.csv
└── unit_conversions.csv
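
As a quick sanity check before a build, the presence of these four files can be verified with a few lines of code. The following is a minimal sketch in Python rather than part of traits.build itself; the function name `missing_config_files` is hypothetical, and the file names are taken from the listing above.

```python
from pathlib import Path

# File names copied from the config listing above.
CONFIG_FILES = ["metadata.yml", "traits.yml", "taxon_list.csv", "unit_conversions.csv"]


def missing_config_files(repo_root):
    """Return the expected config files that are absent from repo_root/config."""
    config_dir = Path(repo_root) / "config"
    return [name for name in CONFIG_FILES if not (config_dir / name).is_file()]
```

Calling `missing_config_files(".")` from the repository root returns an empty list when all four files are in place.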

metadata.yml

The file metadata.yml documents dataset-level metadata, including a database description, authors, and funders.

traits.yml

The file traits.yml provides the trait definitions used to compile the trait database, including allowable trait values. See creating a trait dictionary for more information on the process of creating this file. A .yml file is a structured data file where information is presented in a hierarchical format (see appendix for details).

taxon_list.csv

The file taxon_list.csv is our master list of taxa in the trait database.

It includes all unique taxon names after typos have been corrected (through taxonomic_updates). It includes both accepted/valid taxon concepts and outdated taxonomic names. It includes names that can be identified to species as well as names that can only be resolved to a coarser taxon rank, such as genus.

There are only three required columns within taxon_list.csv: aligned_name, taxon_name and taxon_rank. Here, aligned_name is the taxon name after any typos have been corrected, while taxon_name is the name after updating to the currently accepted/valid taxon name (when available). The column taxon_rank indicates the resolution of taxon_name.

However, it is best practice to include additional columns when available, including taxon identifiers for species (and infraspecific taxon concepts) or genera that align with known taxon concepts.
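
The three required columns can be checked mechanically. The sketch below is an illustration in Python, not the traits.build implementation; the function name `missing_taxon_columns` is hypothetical, and the two-line example file is made up for the demonstration.

```python
import csv
import io

# Required columns as stated in the text above.
REQUIRED_COLUMNS = {"aligned_name", "taxon_name", "taxon_rank"}


def missing_taxon_columns(csv_file):
    """Return the required columns absent from the header of an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)


# Illustrative file lacking taxon_rank; real use would pass
# open("config/taxon_list.csv", newline="") instead.
example = io.StringIO(
    "taxon_name,aligned_name,family\n"
    "Abelmoschus ficulneus,Abelmoschus ficulneus,Malvaceae\n"
)
print(missing_taxon_columns(example))  # → ['taxon_rank']
```

Extra columns such as family or taxon_id do not affect the result, which matches the idea that only the three named columns are mandatory.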

New taxa should be added to taxon_list.csv whenever a study includes taxa not previously represented in the trait database. The file must be compiled outside of the traits.build workflow, as each compilation will draw on different taxonomic datasets, with different columns of information.

For AusTraits, the taxonomic datasets referenced are the two vascular plant lists within the National Species List (NSL): the Australian Plant Census (APC) and the Australian Plant Name Index (APNI). The workflow used by AusTraits to rebuild the taxon list is available here.

| taxon_name | aligned_name | family | taxonomic_dataset | taxon_rank | aligned_name_taxonomic_status | taxon_id | scientific_name | scientific_name_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Abelia x grandiflora | Abelia x grandiflora | Caprifoliaceae | APC | Species | accepted | https://id.biodiversity.org.au/taxon/apni/51432945 | Abelia x grandiflora (Rovelli ex André) Rehder | https://id.biodiversity.org.au/name/apni/190758 |
| Abelmoschus ficulneus | Abelmoschus ficulneus | Malvaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2897916 | Abelmoschus ficulneus (L.) Wight | https://id.biodiversity.org.au/name/apni/55929 |
| Abelmoschus manihot | Abelmoschus manihot | Malvaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2901085 | Abelmoschus manihot (L.) Medik. | https://id.biodiversity.org.au/name/apni/55937 |
| Abelmoschus manihot subsp. manihot | Abelmoschus manihot subsp. manihot | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2917035 | Abelmoschus manihot (L.) Medik. subsp. manihot | https://id.biodiversity.org.au/name/apni/116920 |
| Abelmoschus manihot subsp. tetraphyllus | Abelmoschus manihot subsp. tetraphyllus | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2892917 | Abelmoschus manihot subsp. tetraphyllus (Roxb. ex Hornem.) Borss.Waalk. | https://id.biodiversity.org.au/name/apni/55945 |
| Abelmoschus moschatus | Abelmoschus moschatus | Malvaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2900572 | Abelmoschus moschatus Medik. | https://id.biodiversity.org.au/name/apni/55953 |
| Abelmoschus moschatus subsp. biakensis | Abelmoschus moschatus subsp. biakensis | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2907435 | Abelmoschus moschatus subsp. biakensis (Hochr.) Borss.Waalk. | https://id.biodiversity.org.au/name/apni/116595 |
| Abelmoschus moschatus subsp. moschatus | Abelmoschus moschatus subsp. moschatus | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2911283 | Abelmoschus moschatus Medik. subsp. moschatus | https://id.biodiversity.org.au/name/apni/243806 |
| Abelmoschus moschatus subsp. tuberosus | Abelmoschus moschatus subsp. tuberosus | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2919287 | Abelmoschus moschatus subsp. tuberosus (Span.) Borss.Waalk. | https://id.biodiversity.org.au/name/apni/55961 |
| Abildgaardia ovata | Abildgaardia ovata | Cyperaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2919627 | Abildgaardia ovata (Burm.f.) Kral | https://id.biodiversity.org.au/name/apni/150737 |

unit_conversions.csv

The file unit_conversions.csv defines the unit conversions that are used when converting contributed trait data to common units, e.g.

| unit_from | unit_to | function |
| --- | --- | --- |
| % | mg/g | x*10 |
| % | g/g | x*0.01 |
| % | mg/mg | x*0.01 |
| % | mg/kg | x*10000 |
| % | {dimensionless} | x*.01 |
| % | {count}/{count} | x*.01 |
| {dimensionless} | {count}/{count} | x*1 |
| a | mo | x*12 |
| {count}/m2 | {count}/mm2 | x*1/1000000 |
| cm | m | x*0.01 |
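
To illustrate how rows like these can drive a conversion, here is a minimal sketch in Python. It is not how traits.build applies the file; the helper names `load_conversions` and `convert` are hypothetical, and the sketch assumes the function column holds an arithmetic expression in which x stands for the contributed value.

```python
import csv
import io

# A few rows from the example above, written out as CSV.
CONVERSIONS_CSV = """unit_from,unit_to,function
%,mg/g,x*10
a,mo,x*12
cm,m,x*0.01
"""


def load_conversions(csv_file):
    """Map (unit_from, unit_to) to its conversion expression, e.g. 'x*10'."""
    return {
        (row["unit_from"], row["unit_to"]): row["function"]
        for row in csv.DictReader(csv_file)
    }


def convert(value, unit_from, unit_to, conversions):
    """Evaluate the stored expression with x bound to value."""
    expression = conversions[(unit_from, unit_to)]
    # eval is tolerable here only because the expressions come from a trusted,
    # version-controlled file; builtins are disabled as a precaution.
    return eval(expression, {"__builtins__": {}}, {"x": value})


conversions = load_conversions(io.StringIO(CONVERSIONS_CSV))
print(convert(2.5, "%", "mg/g", conversions))  # → 25.0
print(convert(3, "a", "mo", conversions))      # → 36
```

Keying the lookup on the pair (unit_from, unit_to) reflects the table layout, where the same source unit (such as %) can map to several target units with different functions.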

12.3 /data folder

The folder data contains the raw data from individual studies included in the trait database.

Records within the data folder are organised by study, with each study identified by its dataset_id. Data from each study are kept in a separate folder containing two files:

  • data.csv: a table containing the actual trait data.
  • metadata.yml: a file that contains study metadata (source, methods, locations, and context), maps trait names and units onto standard types, and lists any substitutions applied to the data in processing.

The folder data thus contains a long list of folders, one for each study and each containing two files:

data
├── Angevin_2010
│   ├── data.csv
│   └── metadata.yml
├── Barlow_1981
│   ├── data.csv
│   └── metadata.yml
├── Bean_1997
│   ├── data.csv
│   └── metadata.yml
├── ....

where Angevin_2010, Barlow_1981, and Bean_1997 are each a unique dataset_id in the final dataset.
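
Because every dataset folder is expected to hold the same two files, the layout is easy to audit. The following is a minimal sketch in Python, not a traits.build function; the name `incomplete_datasets` is hypothetical.

```python
from pathlib import Path

# The two files each dataset folder is expected to contain, per the text above.
EXPECTED_FILES = ("data.csv", "metadata.yml")


def incomplete_datasets(data_dir):
    """Map each dataset_id folder under data_dir to the expected files it lacks."""
    problems = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files sitting directly in the data folder
        missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
        if missing:
            problems[folder.name] = missing
    return problems
```

Calling `incomplete_datasets("data")` from the repository root returns an empty dictionary when every dataset folder is complete; otherwise the keys are the affected dataset_id values.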

This file can be added to within specific traits.build projects, as required for different dataset styles.