26 Common issues

Note, this chapter is a work in progress. It will be expanded over time.

26.1 Unsupported trait values

This error occurs when, for a categorical trait, the value in data.csv is different to the value in the traits dictionary (config/traits.yml).

table <- my_database$excluded_data %>%
  filter(dataset_id == current_study) %>%
  filter(error == "Unsupported trait value") %>%
  select(dataset_id, trait_name, value) %>%
  distinct()

You can individually add substitutions to metadata.yml using the function metadata_add_substitution

metadata_add_substitution(
  dataset_id = current_study,
  trait_name = "plant_growth_form",
  find = "T",
  replace = "tree"
)

Or, you can add an additional column to the table output (code above) and read it into metadata.yml using the function metadata_add_substitutions_table

The table read in must have the columns dataset_id, trait_name, find, and replace.

This is a hypothetical example for a table that contains 5 rows with plant_growth_form value that need updating.

table <- table %>%
  rename(find = value) %>%
  mutate(replace = c("tree", "mallee", "shrub", "graminoid", "herb"))

metadata_add_substitutions_table(
  table,
  dataset_id = dataset_id,
  trait_name = trait_name,
  find = find,
  replace = replace
)

You can of course also write the table to a csv file, edit it in Excel or a text editor, then read it back into R.

write_csv(table, "data/dataset_id/raw/substitutions_required.csv")

...edit outside of R

table <- read_csv("data/dataset_id/raw/substitutions_required.csv")

26.2 Dataset can’t pivot wider

In order to convert a traits.build database into a wide format, the traits.build$traits table must be able to pivot wider.

One of the tests run during dataset_test is the function traits.build::check_pivot_wider(). This function checks that each row in the traits table has a unique combination of a particular 7 columns: dataset_id, trait_name, observation_id, value_type, repeat_measurements_id, method_id, method_context_id

There are two likely explanations – and solutions – to this error:

If your dataset combines individual (or population) level measurements with species-level measurements, the same species-level measurement may be read in many times. To solve this problem, you need to retain only the first instance of each species-level measurement, by including the following custom_R_code, where taxon_name is the column that contains taxon names and column 1, column 2, etc is a vector of the columns with categorical traits that require de-duplicating.

data %>%
  group_by(taxon_name) %>%
  mutate(across(c("column 1", "column 2", "column 3"), replace_duplicates_with_NA))
  ungroup()

Rows of data that represent measurements made at different times,

… TBC