table <- my_database$excluded_data %>%
  filter(dataset_id == current_study) %>%
  filter(error == "Unsupported trait value") %>%
  select(dataset_id, trait_name, value) %>%
  distinct()
18 Common issues
Note, this chapter is a work in progress. It will be expanded over time.
18.1 Unsupported trait values
This error occurs when the value of a categorical trait in data.csv does not match an allowed value for that trait in the traits dictionary (config/traits.yml).
You can add substitutions to metadata.yml one at a time using the function metadata_add_substitution:
metadata_add_substitution(dataset_id = current_study, trait_name = "plant_growth_form", find = "T", replace = "tree")
Or you can add a column of replacement values to the table output (code above) and read the whole table into metadata.yml using the function metadata_add_substitutions_table.
The table read in must have the columns dataset_id, trait_name, find, and replace.
This is a hypothetical example for a table that contains 5 rows with plant_growth_form values that need updating.
table <- table %>%
  rename(find = value) %>%
  mutate(replace = c("tree", "mallee", "shrub", "graminoid", "herb"))
metadata_add_substitutions_table(table, dataset_id = dataset_id, trait_name = trait_name, find = find, replace = replace)
You can of course also write the table to a csv file, edit it in Excel or a text editor, then read it back into R.
write_csv(table, "data/dataset_id/raw/substitutions_required.csv")
# ...edit outside of R
table <- read_csv("data/dataset_id/raw/substitutions_required.csv")
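Because the table must have the columns dataset_id, trait_name, find, and replace, it can be worth checking an edited file before passing it on. The following is a minimal sketch in base R. The helper check_substitutions_table and the example data are hypothetical and are not part of traits.build.

```r
# Hypothetical helper (not a traits.build function): check that a
# substitutions table has the columns the text above says are required.
check_substitutions_table <- function(substitutions) {
  required <- c("dataset_id", "trait_name", "find", "replace")
  missing_cols <- setdiff(required, names(substitutions))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Flag rows where no replacement was filled in (NA or empty string)
  incomplete <- is.na(substitutions$replace) | substitutions$replace == ""
  if (any(incomplete)) {
    warning(sum(incomplete), " row(s) have no replace value")
  }
  invisible(substitutions)
}

# Made-up example table with the four required columns
example_table <- data.frame(
  dataset_id = "example_dataset",
  trait_name = "plant_growth_form",
  find = c("T", "S"),
  replace = c("tree", "shrub")
)

check_substitutions_table(example_table)
```

If a column is missing, the helper stops with a message naming it. Rows without a replacement only trigger a warning, since you may still be part-way through editing the file.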
18.2 Dataset can’t pivot wider
In order to convert a traits.build database into a wide format, the traits.build$traits table must be able to pivot wider. This dataset was unable to pivot, due to duplication in the following rows:
my_database$traits %>%
  filter(dataset_id == dataset_ids) %>%
  select(
    dplyr::all_of(c("dataset_id", "trait_name", "value", "observation_id", "source_id", "taxon_name",
    "entity_type", "life_stage", "basis_of_record", "value_type", "population_id", "individual_id",
    "temporal_id", "method_id", "method_context_id", "entity_context_id", "original_name"))
  ) %>%
  pivot_wider(names_from = trait_name, values_from = value, values_fn = length) %>%
  pivot_longer(cols = 16:ncol(.)) %>%
  rename(trait_name = name, number_of_duplicates = value) %>%
  select(dataset_id, taxon_name, trait_name, number_of_duplicates, observation_id, entity_type, value_type, population_id, everything()) %>%
  filter(number_of_duplicates > 1)
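To see why values_fn = length exposes the problem, here is a small sketch on a toy table. The data and the reduced set of identifier columns are made up for illustration; a real traits table has many more columns.

```r
library(dplyr)
library(tidyr)

# Made-up data: observation "001" has two leaf_length values,
# so it cannot be spread into a single wide cell.
toy_traits <- tibble(
  dataset_id = "example_dataset",
  observation_id = c("001", "001", "002"),
  trait_name = c("leaf_length", "leaf_length", "leaf_length"),
  value = c("10", "12", "9")
)

# values_fn = length counts the values that land in each wide cell
# instead of storing the values themselves.
toy_counts <- toy_traits %>%
  pivot_wider(names_from = trait_name, values_from = value, values_fn = length)

# Keep only the cells that received more than one value
toy_duplicates <- toy_counts %>%
  pivot_longer(cols = leaf_length, names_to = "trait_name", values_to = "number_of_duplicates") %>%
  filter(number_of_duplicates > 1)
```

Any count greater than 1 marks a combination of identifier columns and trait_name with more than one value. Those are the rows that block the pivot.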
There are two likely explanations – and solutions – to this error:
- If your dataset combines individual (or population) level measurements with species-level measurements, the same species-level measurement may be read in many times. To solve this problem, you need to retain only the first instance of each species-level measurement by including the following custom_R_code, where taxon_name is the column that contains taxon names and column 1, column 2, etc. are the columns with categorical traits that require de-duplicating.
data %>%
  group_by(taxon_name) %>%
  mutate(across(c("column 1", "column 2", "column 3"), replace_duplicates_with_NA)) %>%
  ungroup()
- Rows of data that represent measurements made at different times,
… TBC
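The de-duplication described in the first bullet above can be illustrated on a toy table. This is a sketch under stated assumptions: keep_first_only is a hypothetical stand-in written for this example and may not match how replace_duplicates_with_NA is implemented in traits.build, and the data are made up.

```r
library(dplyr)

# Hypothetical stand-in for replace_duplicates_with_NA (an assumption, not
# the package's code): keep the first occurrence of each value, set repeats to NA.
keep_first_only <- function(x) replace(x, duplicated(x), NA)

# Made-up data: growth_form is a species-level value repeated on every
# individual-level row of the same taxon.
toy_data <- tibble(
  taxon_name = c("Acacia aneura", "Acacia aneura", "Banksia serrata"),
  leaf_length = c(10, 12, 9),
  growth_form = c("tree", "tree", "tree")
)

deduplicated <- toy_data %>%
  group_by(taxon_name) %>%
  mutate(across(c("growth_form"), keep_first_only)) %>%
  ungroup()
```

Within each taxon, only the first row keeps its growth_form value. The individual-level leaf_length values are left untouched because they are not listed in across().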