14 Adding datasets, an introduction
This section gives a brief overview of how to add datasets to the database. For a more detailed guide, see the Guide to adding data part of this book, starting with Tutorial: Adding datasets.
All steps must be followed for the automated workflow to proceed without problems.
1. Install the `{traits.build}` R package from GitHub.
2. Clone the `traits.build-template` repository from GitHub.
3. For each dataset, create a new branch in the repo, named for the new `dataset_id` in `author_year` format, e.g. `Gallagher_2014`.
4. Create a new folder within the folder `data` with the name `dataset_id`, e.g. `Gallagher_2014`.
5. Prepare the file `data.csv` and place it within the new folder.
6. Prepare the file `metadata.yml` and place it within the new folder.
7. Run tests on the newly added dataset and correct the `data.csv` and `metadata.yml` files as necessary.
8. Add the new study into the build framework and rebuild the trait database by running `build_setup_pipeline()` and `source("build.R")`.
   You can then rebuild the database, including the new dataset.
9. Run quality checks on the newly added dataset and correct the `data.csv` and `metadata.yml` files as necessary.
10. Generate and proofread a report on the data. In particular, check that numeric trait values fall within a logical range relative to other studies, and that individual trait observations are not unnecessarily excluded because their trait values are unsupported.
11. Return to step 6 if changes are made to the `data.csv` or `metadata.yml` files.
12. Push the GitHub branch to your database repository.
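
As a rough illustration of the folder set-up and the range check described in the steps above, here is a minimal sketch in base R. The helpers `is_valid_dataset_id()`, `scaffold_dataset()` and `flag_out_of_range()` are hypothetical names invented for this example and are not part of traits.build. The pattern used to check the `author_year` format is an assumption, and the numeric values are made up.

```r
# Hypothetical helpers for illustration only; none of these are traits.build functions.

# Check that a dataset_id looks like "Author_Year", e.g. "Gallagher_2014".
# Assumption: a capitalised author name, an underscore, then a four-digit year.
# Adjust the pattern if your ids follow a different convention.
is_valid_dataset_id <- function(dataset_id) {
  grepl("^[A-Z][A-Za-z]*_[0-9]{4}$", dataset_id)
}

# Create data/<dataset_id> under `root` and report which of the two
# required files (data.csv, metadata.yml) are already present.
scaffold_dataset <- function(dataset_id, root = ".") {
  stopifnot(is_valid_dataset_id(dataset_id))
  folder <- file.path(root, "data", dataset_id)
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  required <- c("data.csv", "metadata.yml")
  present <- file.exists(file.path(folder, required))
  names(present) <- required
  present
}

# Flag new trait values that fall outside the range seen in other studies.
flag_out_of_range <- function(new_values, reference_values) {
  limits <- range(reference_values, na.rm = TRUE)
  new_values < limits[1] | new_values > limits[2]
}

# Usage: work in a temporary directory so nothing in a real repo is touched.
root <- tempfile("traits_demo_")
status <- scaffold_dataset("Gallagher_2014", root = root)
print(status)  # both files are reported as missing until you add them

# Made-up values: compare three new observations against four existing ones.
flags <- flag_out_of_range(c(0.5, 12, 300), reference_values = c(1, 5, 40, 120))
print(flags)

# Once both files are in place, the rebuild described above is run from
# the repository root (requires traits.build and the cloned template):
# build_setup_pipeline()
# source("build.R")
```

A flagged value is not necessarily wrong. It is a prompt to check units and data entry before deciding whether the observation should stay.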
The best way to start learning how to add datasets is to work through the series of seven tutorials. Each introduces specific traits.build functions that help you add dataset metadata, or the metadata formats required for particular types of datasets.
The chapter Adding datasets, a lengthy guide then gives comprehensive instructions for generating the data.csv and metadata.yml files and error-checking your results. It is likely to feel overwhelming until you are familiar with the traits.build workflow and metadata format.
It may also help to download one of the two sample datasets to use as a template for your own files and as a guide to required content. Alternatively, to see a greater diversity of dataset styles, look at the austraits.build repository.
You should look at the files in the config folder, particularly the definitions file for the list of traits in AusTraits and how trait definitions are formatted.