taxon_name | aligned_name | family | taxonomic_dataset | taxon_rank | aligned_name_taxonomic_status | taxon_id | scientific_name | scientific_name_id |
---|---|---|---|---|---|---|---|---|
Abelia x grandiflora | Abelia x grandiflora | Caprifoliaceae | APC | Species | accepted | https://id.biodiversity.org.au/taxon/apni/51432945 | Abelia x grandiflora (Rovelli ex André) Rehder | https://id.biodiversity.org.au/name/apni/190758 |
Abelmoschus ficulneus | Abelmoschus ficulneus | Malvaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2897916 | Abelmoschus ficulneus (L.) Wight | https://id.biodiversity.org.au/name/apni/55929 |
Abelmoschus manihot | Abelmoschus manihot | Malvaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2901085 | Abelmoschus manihot (L.) Medik. | https://id.biodiversity.org.au/name/apni/55937 |
Abelmoschus manihot subsp. manihot | Abelmoschus manihot subsp. manihot | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2917035 | Abelmoschus manihot (L.) Medik. subsp. manihot | https://id.biodiversity.org.au/name/apni/116920 |
Abelmoschus manihot subsp. tetraphyllus | Abelmoschus manihot subsp. tetraphyllus | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2892917 | Abelmoschus manihot subsp. tetraphyllus (Roxb. ex Hornem.) Borss.Waalk. | https://id.biodiversity.org.au/name/apni/55945 |
Abelmoschus moschatus | Abelmoschus moschatus | Malvaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2900572 | Abelmoschus moschatus Medik. | https://id.biodiversity.org.au/name/apni/55953 |
Abelmoschus moschatus subsp. biakensis | Abelmoschus moschatus subsp. biakensis | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2907435 | Abelmoschus moschatus subsp. biakensis (Hochr.) Borss.Waalk. | https://id.biodiversity.org.au/name/apni/116595 |
Abelmoschus moschatus subsp. moschatus | Abelmoschus moschatus subsp. moschatus | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2911283 | Abelmoschus moschatus Medik. subsp. moschatus | https://id.biodiversity.org.au/name/apni/243806 |
Abelmoschus moschatus subsp. tuberosus | Abelmoschus moschatus subsp. tuberosus | Malvaceae | APC | Subspecies | accepted | https://id.biodiversity.org.au/node/apni/2919287 | Abelmoschus moschatus subsp. tuberosus (Span.) Borss.Waalk. | https://id.biodiversity.org.au/name/apni/55961 |
Abildgaardia ovata | Abildgaardia ovata | Cyperaceae | APC | Species | accepted | https://id.biodiversity.org.au/node/apni/2919627 | Abildgaardia ovata (Burm.f.) Kral | https://id.biodiversity.org.au/name/apni/150737 |
12 File organisation
This chapter describes the typical files you may encounter in a traits.build compilation. The description is based on the austraits.build compilation.
We strongly suggest you create a standalone folder for your repository, e.g. austraits.build. This folder should contain all files needed to build your compilation. We’re big fans of GitHub as a platform for collaboration. If you’re not familiar with Git or GitHub, we suggest you check out the Happy Git and GitHub for the useR book.
12.1 Repository structure
The main directory for the austraits.build repository contains the following files and folders, with purpose as indicated. Not all of these files are required for a compilation; some are used for extra features such as a website. They are included here for completeness.
Files used for data compilation
├── remake.yml/build.R # instructions for build
├── config # configuration files
├── data # raw data files
├── R # folder with custom R functions
├── export # folder for output
└── scripts # R scripts for processing files before/after build
R project file
└── austraits.build.Rproj # RStudio project
Files for maintaining a repo on GitHub
├── README.md # landing page
├── .github # folder containing GitHub Actions, issue templates, code of conduct
├── LICENCE
├── NEWS.md
└── inst # contains images that appear on GitHub repo
Additional files describing the compendium and testing the database build process
├── DESCRIPTION # compendium description
└── tests # tests for whether database builds
12.2 /config folder
The folder config contains four files that govern the building of the dataset.
config
├── metadata.yml
├── traits.yml
├── taxon_list.csv
└── unit_conversions.csv
metadata.yml
The file metadata.yml documents dataset-level metadata, including a database description, authors, and funders.
traits.yml
The file traits.yml provides the trait definitions used to compile the trait database, including allowable trait values. See creating a trait dictionary for more information on the process of creating this file. A .yml file is a structured data file where information is presented in a hierarchical format (see appendix for details).
taxon_list.csv
The file taxon_list.csv is our master list of taxa in the trait database.
It includes all unique taxon names after typos have been corrected (through taxonomic_updates). It includes both accepted/valid taxon concepts and outdated taxonomic names. It includes names that can be identified to species as well as names that can only be resolved to a coarser taxon rank, such as genus.
There are only three required columns within taxon_list.csv: aligned_name, taxon_name and taxon_rank. In the file, aligned_name refers to the taxon name after any typos have been corrected, while taxon_name is the taxon name following updates to the currently accepted/valid taxon name (when available). The column taxon_rank indicates the resolution of the taxon_name.
However, it is best practice to include additional columns when available, including taxon identifiers for species (and infraspecific taxon concepts) or genera that align with known taxon concepts.
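As a quick illustration of the three required columns, the following minimal sketch checks whether a taxon list header contains them. It is written in Python purely for illustration and is not part of traits.build; the function name missing_required_columns and the inline example text are hypothetical, and the example rows reuse taxa from the austraits.build taxon list.

```python
import csv
import io

# The three columns this chapter lists as required in taxon_list.csv.
REQUIRED_COLUMNS = ("aligned_name", "taxon_name", "taxon_rank")


def missing_required_columns(csv_text):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [column for column in REQUIRED_COLUMNS if column not in header]


# Hypothetical, cut-down taxon list with a subset of the optional columns.
example = """taxon_name,aligned_name,family,taxon_rank
Abelmoschus ficulneus,Abelmoschus ficulneus,Malvaceae,Species
Abelmoschus manihot subsp. manihot,Abelmoschus manihot subsp. manihot,Malvaceae,Subspecies
"""

complete = missing_required_columns(example)
incomplete = missing_required_columns("taxon_name,family\n")
print(complete)    # no required columns are missing
print(incomplete)  # aligned_name and taxon_rank are missing
```

Extra columns such as family are ignored by the check, which matches the idea that only three columns are mandatory and everything else is optional.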
New rows should be added to taxon_list.csv whenever a study includes taxa not previously represented in the trait database. The file must be compiled outside of the traits.build workflow, as each trait database will use different taxonomic datasets, with different columns of information.
For AusTraits, the taxonomic datasets referenced are the two vascular plant lists within the National Species Lists (NSL): the Australian Plant Census (APC) and the Australian Plant Name Index (APNI). The workflow used by AusTraits to rebuild the taxon list is available here.
unit_conversions.csv
The file unit_conversions.csv defines the unit conversions that are used when converting contributed trait data to common units, e.g.
unit_from | unit_to | function |
---|---|---|
% | mg/g | x*10 |
% | g/g | x*0.01 |
% | mg/mg | x*0.01 |
% | mg/kg | x*10000 |
% | {dimensionless} | x*.01 |
% | {count}/{count} | x*.01 |
{dimensionless} | {count}/{count} | x*1 |
a | mo | x*12 |
{count}/m2 | {count}/mm2 | x*1/1000000 |
cm | m | x*0.01 |
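To make the role of the function column concrete, here is a minimal sketch of how such a table could be applied, where each function is an arithmetic expression in x. It is written in Python for illustration only; traits.build is an R package, and how it evaluates these expressions internally is not described in this chapter. The helper names load_conversions and convert are hypothetical, and the CSV text copies a few rows from the table above.

```python
import csv
import io

# A few rows copied from the example table, written as CSV text.
UNIT_CONVERSIONS_CSV = """unit_from,unit_to,function
%,mg/g,x*10
%,g/g,x*0.01
a,mo,x*12
cm,m,x*0.01
"""


def load_conversions(csv_text):
    """Map each (unit_from, unit_to) pair to its conversion expression."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["unit_from"], row["unit_to"]): row["function"] for row in reader}


def convert(value, unit_from, unit_to, conversions):
    """Convert value between units using the expression stored for that pair."""
    if unit_from == unit_to:
        return value
    expression = conversions[(unit_from, unit_to)]  # KeyError if no rule exists
    # Evaluate the expression with x bound to the value and no builtins exposed.
    # This assumes the table comes from a trusted source.
    return eval(expression, {"__builtins__": {}}, {"x": value})


conversions = load_conversions(UNIT_CONVERSIONS_CSV)
print(convert(2.5, "%", "mg/g", conversions))  # 2.5 % expressed in mg/g
print(convert(3, "a", "mo", conversions))      # 3 years expressed in months
```

Looking up conversions by the pair (unit_from, unit_to) mirrors the table layout: the same source unit, such as %, can map to several target units, each with its own expression.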
12.3 /data folder
The folder data contains the raw data from individual studies included in the trait database.
Records within the data folder are organised by study, with each study identified by its dataset_id. Data from each study are stored in a separate folder containing two files:
data.csv: a table containing the actual trait data.
metadata.yml: a file that contains study metadata (source, methods, locations, and context), maps trait names and units onto standard types, and lists any substitutions applied to the data in processing.
The folder data thus contains a long list of folders, one for each study, each containing these two files:
data
├── Angevin_2010
│ ├── data.csv
│ └── metadata.yml
├── Barlow_1981
│ ├── data.csv
│ └── metadata.yml
├── Bean_1997
│ ├── data.csv
│ └── metadata.yml
├── ....
where Angevin_2010, Barlow_1981, and Bean_1997 are each a unique dataset_id in the final dataset.
This file can be added to within specific traits.build projects, as required for different dataset styles.
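The one-folder-per-study layout of the data folder lends itself to a simple consistency check. The sketch below is a Python illustration, not a traits.build function: it reports any study folder that lacks data.csv or metadata.yml. The function name check_data_folder is hypothetical, and the demonstration builds a throwaway folder rather than reading a real repository.

```python
import tempfile
from pathlib import Path

# Each study folder is expected to hold exactly these two files.
EXPECTED_FILES = ("data.csv", "metadata.yml")


def check_data_folder(data_dir):
    """Map each dataset_id (sub-folder name) to the expected files it lacks."""
    problems = {}
    study_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for study_dir in study_dirs:
        missing = [name for name in EXPECTED_FILES
                   if not (study_dir / name).is_file()]
        if missing:
            problems[study_dir.name] = missing
    return problems


# Demonstration on a temporary folder: one complete study, one incomplete.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    for dataset_id in ("Angevin_2010", "Barlow_1981"):
        (data_dir / dataset_id).mkdir(parents=True)
        (data_dir / dataset_id / "data.csv").touch()
    (data_dir / "Angevin_2010" / "metadata.yml").touch()

    result = check_data_folder(data_dir)
    print(result)  # only Barlow_1981 is reported, lacking metadata.yml
```

Because the folder name doubles as the dataset_id, the keys of the returned dictionary can be read directly as the studies that need attention.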