14 Adding datasets, an introduction
This section gives a brief overview of how to add datasets to the database. For a more detailed guide, see the Guide to adding data part of this book, starting with Tutorial: Adding datasets.
All steps must be followed for the automated workflow to proceed without problems.
- Install the {traits.build} R package from GitHub.
- Clone the traits.build-template repository from GitHub.
- For each dataset, create a new branch in the repo, named for the new dataset_id in author_year format, e.g. Gallagher_2014.
- Create a new folder within the folder data with the name dataset_id, e.g. Gallagher_2014.
- Prepare the file data.csv and place it within the new folder.
- Prepare the file metadata.yml and place it within the new folder.
- Run tests on the newly added dataset and correct the data.csv and metadata.yml files as necessary.
- Add the new study into the build framework and rebuild the trait database, by running build_setup_pipeline() and source("build.R").
You can then rebuild the database, including the new dataset.
- Run quality checks on the newly added dataset and correct the data.csv and metadata.yml files as necessary.
- Generate and proofread a report on the data. In particular, check that numeric trait values fall within a logical range relative to other studies, and that individual trait observations are not unnecessarily excluded because their trait values are unsupported.
- Return to step 6 if changes are made to the data.csv or metadata.yml files.
- Push the GitHub branch to your database repository.
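The naming and folder-layout steps in the list above can be sketched in base R. This is only an illustration under stated assumptions, not part of traits.build: the helpers is_valid_dataset_id() and dataset_folder_status() are hypothetical names, the regular expression is one assumed reading of the author_year format, and a temporary directory stands in for your cloned repository.

```r
# Hypothetical helper (not a traits.build function): TRUE if the id looks like
# author_year, e.g. "Gallagher_2014". The pattern is an assumption; relax it
# if your dataset ids follow a looser convention.
is_valid_dataset_id <- function(dataset_id) {
  grepl("^[A-Za-z]+_[0-9]{4}$", dataset_id)
}

# Hypothetical helper: report which of the two required files exist in
# data/<dataset_id>/ under the repository root.
dataset_folder_status <- function(dataset_id, root = ".") {
  required <- c("data.csv", "metadata.yml")
  present <- file.exists(file.path(root, "data", dataset_id, required))
  names(present) <- required
  present
}

# A unique temporary directory stands in for the cloned repository.
root <- tempfile("traits-build-demo")
dataset_id <- "Gallagher_2014"
stopifnot(is_valid_dataset_id(dataset_id))

# Create the new dataset folder inside the data folder.
folder <- file.path(root, "data", dataset_id)
dir.create(folder, recursive = TRUE, showWarnings = FALSE)

# Nothing has been prepared yet, so both entries are FALSE here.
status_before <- dataset_folder_status(dataset_id, root)

# Empty placeholders only; real files need the content described in this book.
file.create(file.path(folder, c("data.csv", "metadata.yml")))

# Both files are now present in the expected location.
status_after <- dataset_folder_status(dataset_id, root)
print(status_after)
```

A check like this only confirms that the files sit where the workflow expects them. It says nothing about their content, which is what the tests and quality checks in the list are for.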
The best way to start learning how to add datasets is to work through the series of 7 tutorials. Each one introduces specific traits.build functions that help you add dataset metadata, or the metadata formats required for specific types of datasets.
The chapter Adding datasets, a lengthy guide then offers comprehensive instructions for generating the data.csv and metadata.yml files and error-checking your results. That chapter is likely to be overwhelming until you are familiar with the traits.build workflow and metadata format.
It may also help to download one of the two sample datasets to use as a template for your own files and as a guide to required content. Alternatively, to see a greater diversity of dataset styles, look at the austraits.build repository.
You should also look at the files in the config folder, particularly the definitions file, for the list of traits in AusTraits and how trait definitions are formatted.