14  Adding datasets, an introduction

This section gives a brief overview of how to add datasets to the database. For a more detailed guide, see the Guide to adding data part of this book, starting with Tutorial: Adding datasets.

All steps must be followed for the automated workflow to proceed without problems.

  1. Install the {traits.build} R-package from github
  2. Clone the traits.build-template repository from github
  3. For each dataset, create a new branch in the repo, named for the new dataset_id in author_year format, e.g. Gallagher_2014.
  4. Create a new folder within the folder data with the name dataset_id, e.g. Gallagher_2014.
  5. Prepare the file data.csv and place it within the new folder.
  6. Prepare the file metadata.yml and place it within the new folder.
  7. Run tests on the newly added dataset and correct the data.csv and metadata.yml files as necessary.
  8. Add the new study into the build framework and rebuild the trait database, by running build_setup_pipeline() and source(build.R).

You can then rebuild the database, including the new dataset.

  1. Run quality checks on the newly added dataset and correct the data.csv and metadata.yml files as necessary.
  2. Generate and proofread a report on the data. In particular, check that numeric trait values fall within a logical range relative to other studies, and that individual trait observations are not unnecessarily excluded because their trait values are unsupported.
  3. Return to step 6 if changes are made to the data.csv or metadata.yml files.
  4. Push the GitHub branch to your database repository.

The best place to get started learning how to add datasets is to work through a series of 7 tutorials. Each introduces you to specific traits.build functions designed to facilitate the addition of dataset metadata or the metadata formats for specific types of datasets.

The chapter Adding datasets, a lengthy guide then offers a comprehensive guide to generating the data.csv and metadata.yml files and error-checking your results. This document is likely overwhelming until you are familiar with the traits.build workflow and metadata format.

It may also help to download one of the two sample datasets to use as a template for your own files and a guide on required content. Or alteratively, to see a greater diversity of dataset styles, look at the austraits.build repository

You should look at the files in the config folder, particularly the definitions file for the list of traits in AusTraits and how trait definitions are formatted.