table <- my_database$excluded_data %>%
  filter(dataset_id == current_study) %>%
  filter(error == "Unsupported trait value") %>%
  select(dataset_id, trait_name, value) %>%
  distinct()
26 Common issues
Note, this chapter is a work in progress. It will be expanded over time.
26.1 Unsupported trait values
This error occurs when, for a categorical trait, the value in data.csv is different to the value in the traits dictionary (config/traits.yml).
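As a minimal sketch of the mismatch described above, the following base R snippet compares a set of values against a list of allowed values. Both vectors are hypothetical and invented for illustration; they are not taken from the real traits dictionary or from any dataset.

```r
# Hypothetical allowed values for one categorical trait
# (illustrative only, not the real entries in config/traits.yml)
allowed_values <- c("tree", "shrub", "herb")

# Hypothetical values as they might appear in data.csv
data_values <- c("tree", "T", "shrub", "S", "herb")

# Values present in the data but absent from the allowed list;
# these are the ones that would need a substitution
unsupported <- setdiff(data_values, allowed_values)
unsupported
```

Each value left in unsupported needs a matching find/replace pair before it can be accepted.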
You can add substitutions to metadata.yml one at a time using the function metadata_add_substitution:
metadata_add_substitution(
  dataset_id = current_study,
  trait_name = "plant_growth_form",
  find = "T",
  replace = "tree"
)
Alternatively, you can add a replace column to the table of excluded values (output by the code above) and read it into metadata.yml using the function metadata_add_substitutions_table.
The table read in must have the columns dataset_id, trait_name, find, and replace.
This is a hypothetical example for a table that contains 5 rows with plant_growth_form values that need updating.
table <- table %>%
  rename(find = value) %>%
  mutate(replace = c("tree", "mallee", "shrub", "graminoid", "herb"))
metadata_add_substitutions_table(
  table,
  dataset_id = dataset_id,
  trait_name = trait_name,
  find = find,
  replace = replace
)
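Before passing a table to metadata_add_substitutions_table, it can help to confirm that the four required columns are present. The following is a minimal base R sketch; the data frame substitutions and its contents are hypothetical, and the check is not part of traits.build itself.

```r
# Columns the substitutions table must contain (as stated in the text)
required_cols <- c("dataset_id", "trait_name", "find", "replace")

# Hypothetical substitutions table, invented for illustration
substitutions <- data.frame(
  dataset_id = c("study_1", "study_1"),
  trait_name = c("plant_growth_form", "plant_growth_form"),
  find = c("T", "S"),
  replace = c("tree", "shrub"),
  stringsAsFactors = FALSE
)

# Any required column that is absent from the table
missing_cols <- setdiff(required_cols, names(substitutions))

if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```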
You can also write the table to a CSV file, edit it in Excel or a text editor, then read it back into R.
write_csv(table, "data/dataset_id/raw/substitutions_required.csv")
# ...edit outside of R
table <- read_csv("data/dataset_id/raw/substitutions_required.csv")
26.2 Dataset can’t pivot wider
In order to convert a traits.build database into a wide format, the traits.build$traits table must be able to pivot wider.
One of the tests run during dataset_test is the function traits.build::check_pivot_wider(). This function checks that each row in the traits table has a unique combination of a particular 7 columns: dataset_id, trait_name, observation_id, value_type, repeat_measurements_id, method_id, method_context_id.
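To see which rows would prevent pivoting, you can look for rows that share the same combination of these seven columns. This is a minimal base R sketch on a hypothetical traits table invented for illustration; it is not the implementation of check_pivot_wider().

```r
# The seven columns that must jointly identify each row
key_cols <- c(
  "dataset_id", "trait_name", "observation_id", "value_type",
  "repeat_measurements_id", "method_id", "method_context_id"
)

# Hypothetical traits table: rows 1 and 2 share the same key combination
traits <- data.frame(
  dataset_id = c("study_1", "study_1", "study_1"),
  trait_name = c("leaf_area", "leaf_area", "plant_height"),
  observation_id = c("001", "001", "001"),
  value_type = c("raw", "raw", "raw"),
  repeat_measurements_id = c(NA, NA, NA),
  method_id = c("01", "01", "01"),
  method_context_id = c(NA, NA, NA),
  value = c("10", "12", "2.5"),
  stringsAsFactors = FALSE
)

# Flag every row whose key combination occurs more than once
keys <- traits[, key_cols]
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Rows that would stop the table from pivoting wider
traits[is_dup, ]
```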
There are two likely explanations – and solutions – to this error:
- If your dataset combines individual (or population) level measurements with species-level measurements, the same species-level measurement may be read in many times. To solve this problem, you need to retain only the first instance of each species-level measurement, by including the following custom_R_code, where taxon_name is the column that contains taxon names and column 1, column 2, etc. is a vector of the columns with categorical traits that require de-duplicating.
data %>%
  group_by(taxon_name) %>%
  mutate(across(c("column 1", "column 2", "column 3"), replace_duplicates_with_NA)) %>%
  ungroup()
- Rows of data that represent measurements made at different times,
… TBC
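For the first explanation above, the idea of keeping only the first instance of a repeated species-level value can be sketched in base R as follows. The helper keep_first_instance is a hypothetical stand-in written for illustration; it is an assumption about the behaviour wanted and may differ from how replace_duplicates_with_NA is implemented.

```r
# Hypothetical helper: keep the first occurrence of each value in a vector
# and replace every later repeat with NA
keep_first_instance <- function(x) {
  replace(x, duplicated(x), NA)
}

# Hypothetical species-level values repeated across three individuals
# of the same taxon
growth_form <- c("tree", "tree", "tree")

deduplicated <- keep_first_instance(growth_form)
deduplicated
```

Applied within each taxon_name group, a function like this leaves one copy of the species-level value per taxon.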