Create a table with the best-possible scientific name match for Australian plant names
Source:R/create_taxonomic_update_lookup.R
create_taxonomic_update_lookup.Rd
This function takes a list of Australian plant names that need to be reconciled with current taxonomy and generates a lookup table of the best-possible scientific name match for each input name.
Usage case: This is APCalign’s core function, merging together the alignment and updating of taxonomy.
Usage
create_taxonomic_update_lookup(
taxa,
stable_or_current_data = "stable",
version = default_version(),
taxonomic_splits = "most_likely_species",
full = FALSE,
fuzzy_abs_dist = 3,
fuzzy_rel_dist = 0.2,
fuzzy_matches = TRUE,
APNI_matches = TRUE,
imprecise_fuzzy_matches = FALSE,
identifier = NA_character_,
resources = load_taxonomic_resources(quiet = quiet),
quiet = FALSE,
output = NULL
)
Arguments
- taxa
A list of Australian plant species that needs to be reconciled with current taxonomy.
- stable_or_current_data
either "stable" for a consistent version, or "current" for the leading edge version.
- version
The version number of the dataset to use.
- taxonomic_splits
How to handle one_to_many taxonomic matches. Default is "return_all". The other options are "collapse_to_higher_taxon" and "most_likely_species". most_likely_species defaults to the original_name if that name is accepted by the APC; this will be right for certain species subsets, but make errors in other cases, use with caution.
- full
logical for whether the full lookup table is returned or just key columns
- fuzzy_abs_dist
The number of characters allowed to be different for a fuzzy match.
- fuzzy_rel_dist
The proportion of characters allowed to be different for a fuzzy match.
- fuzzy_matches
Fuzzy matches are turned on as a default. The relative and absolute distances allowed for fuzzy matches to species and infraspecific taxon names are defined by the parameters
fuzzy_abs_dist
andfuzzy_rel_dist
.- APNI_matches
Name matches to the APNI (Australian Plant Names Index) are turned off as a default.
- imprecise_fuzzy_matches
Imprecise fuzzy matches uses the fuzzy matching function with lenient levels set (absolute distance of 5 characters; relative distance = 0.25). It offers a way to get a wider range of possible names, possibly corresponding to very distant spelling mistakes. This is FALSE as default and all outputs should be checked as it often makes erroneous matches.
- identifier
A dataset, location or other identifier, which defaults to NA.
- resources
These are the taxonomic resources used for cleaning, this will default to loading them from a local place on your computer. If this is to be called repeatedly, it's much faster to load the resources using
load_taxonomic_resources
separately and pass the data in.- quiet
Logical to indicate whether to display messages while loading data and aligning taxa.
- output
file path to save the output. If this file already exists, this function will check if it's a subset of the species passed in and try to add to this file. This can be useful for large and growing projects.
Value
A lookup table containing the accepted and suggested names for each original name input, and additional taxonomic information such as taxon rank, taxonomic status, taxon IDs and genera.
original_name: the original plant name.
aligned_name: the input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
accepted_name: the APC-accepted plant name, when available.
suggested_name: the suggested plant name to use. Identical to the accepted_name, when an accepted_name exists; otherwise the the suggested_name is the aligned_name.
genus: the genus of the accepted (or suggested) name; only APC-accepted genus names are filled in.
family: the family of the accepted (or suggested) name; only APC-accepted family names are filled in.
taxon_rank: the taxonomic rank of the suggested (and accepted) name.
taxonomic_dataset: the source of the suggested (and accepted) names (APC or APNI).
taxonomic_status: the taxonomic status of the suggested (and accepted) name.
taxonomic_status_aligned: the taxonomic status of the aligned name, before any taxonomic updates have been applied.
aligned_reason: the explanation of a specific taxon name alignment (from an original name to an aligned name).
update_reason: the explanation of a specific taxon name update (from an aligned name to an accepted or suggested name).
subclass: the subclass of the accepted name.
taxon_distribution: the distribution of the accepted name; only filled in if an APC accepted_name is available.
scientific_name_authorship: the authorship information for the accepted (or synonymous) name; available for both APC and APNI names.
taxon_ID: the unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available.
taxon_ID_genus: an identifier for the genus; only filled in if an APC-accepted genus name is available.
scientific_name_ID: an identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names.
row_number: the row number of a specific original_name in the input.
number_of_collapsed_taxa: when taxonomic_splits == "collapse_to_higher_taxon", the number of possible taxon names that have been collapsed.
Details
It uses first the function
align_taxa
, then the functionupdate_taxonomy
to achieve the output. The aligned name is plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.
Notes:
If you will be running the function APCalign::create_taxonomic_update_lookup many times, it is best to load the taxonomic resources separately using
resources <- load_taxonomic_resources()
, then add the argument resources = resourcesThe name Banksia cerrata does not align as the fuzzy matching algorithm does not allow the first letter of the genus and species epithet to change.
The argument taxonomic_splits allows you to choose the outcome for updating the names of taxa with ambiguous taxonomic histories; this applies to scientific names that were once attached to a more broadly circumscribed taxon concept, that was then split into several more narrowly circumscribed taxon concepts, one of which retains the original name. There are three options: most_likely_species returns the name that is retained, with alternative names documented in square brackets; return_all adds additional rows to the output, one for each possible taxon concept; collapse_to_higher_taxon returns the genus with possible names in square brackets.
The argument identifier allows you to add a fix text string to all genus- and family- level names, such as identifier = "Royal NP" would return
Acacia sp. \[Royal NP]
.
See also
Other taxonomic alignment functions:
align_taxa()
,
update_taxonomy()
Examples
# \donttest{
resources <- load_taxonomic_resources()
#>
#> Loading resources into memory...
#>
===========================
=====================================================
================================================================================
#> ...done
# example 1
create_taxonomic_update_lookup(c("Eucalyptus regnans",
"Acacia melanoxylon",
"Banksia integrifolia",
"Not a species"),
resources = resources)
#> Checking alignments of 4 taxa
#> -> of these 3 names have a perfect match to a scientific name in the APC.
#> Alignments being sought for remaining names.
#> # A tibble: 4 × 12
#> original_name aligned_name accepted_name suggested_name genus taxon_rank
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Eucalyptus regnans Eucalyptus … Eucalyptus r… Eucalyptus re… Euca… species
#> 2 Acacia melanoxylon Acacia mela… Acacia melan… Acacia melano… Acac… species
#> 3 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
#> 4 Not a species NA NA NA NA NA
#> # ℹ 6 more variables: taxonomic_dataset <chr>, taxonomic_status <chr>,
#> # scientific_name <chr>, aligned_reason <chr>, update_reason <chr>,
#> # number_of_collapsed_taxa <dbl>
# example 2
input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata",
"Banksea serrata", "Banksia serrrrata", "Dryandra")
create_taxonomic_update_lookup(
taxa = input,
identifier = "APCalign test",
full = TRUE,
resources = resources
)
#> Checking alignments of 6 taxa
#> -> of these 1 names have a perfect match to a scientific name in the APC.
#> Alignments being sought for remaining names.
#> # A tibble: 6 × 21
#> original_name aligned_name accepted_name suggested_name genus family
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Banksia serrata Banksia serrata Banksia serr… Banksia serra… Bank… Prote…
#> 2 Banksia serrate Banksia serrata Banksia serr… Banksia serra… Bank… Prote…
#> 3 Banksia cerrata Banksia sp. [Bank… NA Banksia sp. [… Bank… Prote…
#> 4 Banksea serrata Banksia serrata Banksia serr… Banksia serra… Bank… Prote…
#> 5 Banksia serrrrata Banksia serrata Banksia serr… Banksia serra… Bank… Prote…
#> 6 Dryandra Dryandra sp. [Dry… NA Banksia sp. [… Bank… Prote…
#> # ℹ 15 more variables: taxon_rank <chr>, taxonomic_dataset <chr>,
#> # taxonomic_status <chr>, taxonomic_status_aligned <chr>,
#> # aligned_reason <chr>, update_reason <chr>, subclass <chr>,
#> # taxon_distribution <chr>, scientific_name <chr>, taxon_ID <chr>,
#> # taxon_ID_genus <chr>, scientific_name_ID <chr>, canonical_name <chr>,
#> # row_number <dbl>, number_of_collapsed_taxa <dbl>
# example 3
taxon_list <-
readr::read_csv(
system.file("extdata", "test_taxa.csv", package = "APCalign"),
show_col_types = FALSE)
create_taxonomic_update_lookup(
taxa = taxon_list$original_name,
identifier = taxon_list$notes,
full = TRUE,
resources = resources
)
#> Checking alignments of 32 taxa
#> -> of these 10 names have a perfect match to a scientific name in the APC.
#> Alignments being sought for remaining names.
#> # A tibble: 32 × 21
#> original_name aligned_name accepted_name suggested_name genus family
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Banksia serrata Banksia ser… Banksia serr… Banksia serra… Bank… Prote…
#> 2 Banksia serrate Banksia ser… Banksia serr… Banksia serra… Bank… Prote…
#> 3 Banksee serrate Banksia ser… Banksia serr… Banksia serra… Bank… Prote…
#> 4 Banksia cerrata Banksia sp.… NA Banksia sp. [… Bank… Prote…
#> 5 Banksia sp. Banksia sp.… NA Banksia sp. [… Bank… Prote…
#> 6 Dryandra sp. Dryandra sp… NA Banksia sp. [… Bank… Prote…
#> 7 Argyrodendron (Whyanb… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#> 8 Argyrodendron ssp. (W… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#> 9 Argyrodendron Whyanbe… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#> 10 Argyrodendron sp. (Wh… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#> # ℹ 22 more rows
#> # ℹ 15 more variables: taxon_rank <chr>, taxonomic_dataset <chr>,
#> # taxonomic_status <chr>, taxonomic_status_aligned <chr>,
#> # aligned_reason <chr>, update_reason <chr>, subclass <chr>,
#> # taxon_distribution <chr>, scientific_name <chr>, taxon_ID <chr>,
#> # taxon_ID_genus <chr>, scientific_name_ID <chr>, canonical_name <chr>,
#> # row_number <dbl>, number_of_collapsed_taxa <dbl>
# }