
This function takes a list of Australian plant names that need to be reconciled with current taxonomy and generates a lookup table of the best-possible scientific name match for each input name.

Usage case: This is APCalign's core function, combining taxon-name alignment and taxonomic updating in a single step.

Usage

create_taxonomic_update_lookup(
  taxa,
  stable_or_current_data = "stable",
  version = default_version(),
  taxonomic_splits = "most_likely_species",
  full = FALSE,
  fuzzy_abs_dist = 3,
  fuzzy_rel_dist = 0.2,
  fuzzy_matches = TRUE,
  APNI_matches = TRUE,
  imprecise_fuzzy_matches = FALSE,
  identifier = NA_character_,
  resources = load_taxonomic_resources(quiet = quiet),
  quiet = FALSE,
  output = NULL
)

Arguments

taxa

A character vector of Australian plant names that need to be reconciled with current taxonomy.

stable_or_current_data

Either "stable" for a consistent version or "current" for the leading-edge version.

version

The version number of the dataset to use.

taxonomic_splits

How to handle one-to-many taxonomic matches. Default is "most_likely_species". The other options are "return_all" and "collapse_to_higher_taxon". "most_likely_species" defaults to the original_name if that name is accepted by the APC; this will be correct for certain species subsets but will make errors in other cases, so use it with caution.
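To make the three options concrete, here is a minimal base-R sketch of how one input name whose taxon concept was split might be resolved under each setting. It is not APCalign's implementation: the helper `resolve_split()`, the placeholder names "Acacia alpha" and "Acacia beta", and the " | " separator inside the square brackets are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of APCalign: illustrates the three
# `taxonomic_splits` strategies for a single split name.
# Assumptions: `original_name` is the retained name, and alternatives in
# square brackets are joined with " | ".
resolve_split <- function(original_name, candidates, genus,
                          strategy = c("most_likely_species", "return_all",
                                       "collapse_to_higher_taxon")) {
  strategy <- match.arg(strategy)
  switch(strategy,
    # keep the retained name; document the other concepts in brackets
    most_likely_species = paste0(
      original_name, " [",
      paste(setdiff(candidates, original_name), collapse = " | "), "]"
    ),
    # one result (one output row in APCalign) per possible taxon concept
    return_all = candidates,
    # fall back to the genus, listing every possible concept in brackets
    collapse_to_higher_taxon = paste0(
      genus, " sp. [", paste(candidates, collapse = " | "), "]"
    )
  )
}

candidates <- c("Acacia alpha", "Acacia beta")  # placeholder names
resolve_split("Acacia alpha", candidates, "Acacia", "most_likely_species")
#> [1] "Acacia alpha [Acacia beta]"
resolve_split("Acacia alpha", candidates, "Acacia", "collapse_to_higher_taxon")
#> [1] "Acacia sp. [Acacia alpha | Acacia beta]"
```

The sketch only shows the shape of each outcome; in the real function, return_all expands the lookup table row-wise rather than returning a character vector.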

full

Logical; whether to return the full lookup table (TRUE) or only the key columns (FALSE).

fuzzy_abs_dist

The number of characters allowed to be different for a fuzzy match.

fuzzy_rel_dist

The proportion of characters allowed to be different for a fuzzy match.

fuzzy_matches

Logical. Fuzzy matches are turned on by default. The relative and absolute distances allowed for fuzzy matches to species and infraspecific taxon names are defined by the parameters fuzzy_abs_dist and fuzzy_rel_dist.
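A minimal sketch of how the two thresholds could combine, assuming a candidate is accepted only when the edit distance satisfies both limits and that the relative distance is measured against the length of the input name. It uses base R's `utils::adist()` (generalized Levenshtein distance); APCalign's own fuzzy matcher may compute distances differently, and `within_fuzzy_limits()` is a hypothetical helper.

```r
# Hypothetical helper, NOT APCalign's matcher.
# Assumptions: both limits must hold; relative distance = edits / nchar(input).
within_fuzzy_limits <- function(input, candidate, abs_dist = 3, rel_dist = 0.2) {
  edits <- as.numeric(utils::adist(input, candidate))  # Levenshtein distance
  edits <= abs_dist && edits / nchar(input) <= rel_dist
}

within_fuzzy_limits("Banksia serrate", "Banksia serrata")    # 1 edit
#> [1] TRUE
within_fuzzy_limits("Banksia serrrrata", "Banksia serrata")  # 2 edits
#> [1] TRUE
within_fuzzy_limits("Poa", "Poe")                            # 1 edit, but 1/3 > 0.2
#> [1] FALSE
```

Under these assumptions, short strings are constrained mainly by the relative limit and long strings by the absolute limit. Calling the helper with abs_dist = 5 and rel_dist = 0.25 mirrors the lenient settings described for imprecise_fuzzy_matches.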

APNI_matches

Logical. Name matches to the APNI (Australian Plant Names Index) are turned on by default.

imprecise_fuzzy_matches

Imprecise fuzzy matching uses the fuzzy matching function with lenient settings (absolute distance of 5 characters; relative distance of 0.25). It offers a way to retrieve a wider range of possible names, possibly corresponding to very distant spelling mistakes. It is FALSE by default, and all outputs should be checked, as it often makes erroneous matches.

identifier

A dataset, location or other identifier, which defaults to NA.

resources

The taxonomic resources used for cleaning. By default these are loaded from a local location on your computer. If this function is to be called repeatedly, it is much faster to load the resources separately with load_taxonomic_resources() and pass them in.

quiet

Logical to indicate whether to display messages while loading data and aligning taxa.

output

File path at which to save the output. If this file already exists, the function checks whether it covers a subset of the species passed in and tries to add to this file. This can be useful for large and growing projects.
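The incremental behaviour of output can be pictured with the base-R sketch below. It is not APCalign's implementation: `cached_lookup()` and `fake_lookup()` are hypothetical, and `lookup_fn` stands in for the real lookup. It only illustrates the idea of reusing saved rows and looking up names that are not yet in the file.

```r
# Hypothetical sketch of the incremental-caching idea behind `output`;
# NOT APCalign's implementation. `lookup_fn` stands in for the real lookup
# and must return a data frame with an `original_name` column.
cached_lookup <- function(taxa, output, lookup_fn) {
  existing <- NULL
  if (file.exists(output)) {
    existing <- utils::read.csv(output, stringsAsFactors = FALSE)
  }
  # only names not already saved need to be looked up
  new_taxa <- setdiff(unique(taxa), existing$original_name)
  combined <- existing
  if (length(new_taxa) > 0) {
    combined <- rbind(existing, lookup_fn(new_taxa))
    utils::write.csv(combined, output, row.names = FALSE)
  }
  # return rows for the requested names only
  combined[combined$original_name %in% taxa, , drop = FALSE]
}

# placeholder lookup, used only to exercise the caching logic
fake_lookup <- function(x) {
  data.frame(original_name = x, suggested_name = toupper(x),
             stringsAsFactors = FALSE)
}

path <- tempfile(fileext = ".csv")
cached_lookup(c("Banksia serrata", "Dryandra"), path, fake_lookup)
# the second call only looks up "Acacia melanoxylon"; the other row is reused
cached_lookup(c("Banksia serrata", "Acacia melanoxylon"), path, fake_lookup)
```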

Value

A lookup table containing the accepted and suggested names for each original name input, and additional taxonomic information such as taxon rank, taxonomic status, taxon IDs and genera.

  • original_name: the original plant name.

  • aligned_name: the input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.

  • accepted_name: the APC-accepted plant name, when available.

  • suggested_name: the suggested plant name to use. Identical to the accepted_name, when an accepted_name exists; otherwise the suggested_name is the aligned_name.

  • genus: the genus of the accepted (or suggested) name; only APC-accepted genus names are filled in.

  • family: the family of the accepted (or suggested) name; only APC-accepted family names are filled in.

  • taxon_rank: the taxonomic rank of the suggested (and accepted) name.

  • taxonomic_dataset: the source of the suggested (and accepted) names (APC or APNI).

  • taxonomic_status: the taxonomic status of the suggested (and accepted) name.

  • taxonomic_status_aligned: the taxonomic status of the aligned name, before any taxonomic updates have been applied.

  • aligned_reason: the explanation of a specific taxon name alignment (from an original name to an aligned name).

  • update_reason: the explanation of a specific taxon name update (from an aligned name to an accepted or suggested name).

  • subclass: the subclass of the accepted name.

  • taxon_distribution: the distribution of the accepted name; only filled in if an APC accepted_name is available.

  • scientific_name_authorship: the authorship information for the accepted (or synonymous) name; available for both APC and APNI names.

  • taxon_ID: the unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available.

  • taxon_ID_genus: an identifier for the genus; only filled in if an APC-accepted genus name is available.

  • scientific_name_ID: an identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names.

  • row_number: the row number of a specific original_name in the input.

  • number_of_collapsed_taxa: when taxonomic_splits == "collapse_to_higher_taxon", the number of possible taxon names that have been collapsed.
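Because taxonomic_splits = "return_all" can add several rows per input name, row_number is a convenient key for finding inputs that matched more than one taxon concept. The sketch below uses a small made-up data frame that has only three of the lookup table's columns and invented values. With real output you would use the data frame returned by create_taxonomic_update_lookup() instead.

```r
# Made-up rows shaped like part of the lookup table; NOT real APCalign output.
lookup <- data.frame(
  original_name  = c("Name A", "Name B", "Name B"),
  suggested_name = c("Acacia alpha", "Acacia beta", "Acacia gamma"),
  row_number     = c(1, 2, 2),
  stringsAsFactors = FALSE
)

# count output rows per input row; more than one means a split was expanded
rows_per_input <- table(lookup$row_number)
split_rows <- as.integer(names(rows_per_input)[rows_per_input > 1])

lookup[lookup$row_number %in% split_rows, ]  # both candidate rows for "Name B"
```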

Details

  • It first calls align_taxa and then update_taxonomy to produce the output. The aligned name is the plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function.

Notes:

  • If you will be running the function APCalign::create_taxonomic_update_lookup many times, it is best to load the taxonomic resources separately using resources <- load_taxonomic_resources() and then pass the argument resources = resources.

  • The name Banksia cerrata does not align to Banksia serrata because the fuzzy matching algorithm does not allow the first letter of the genus or species epithet to change.

  • The argument taxonomic_splits allows you to choose the outcome for updating the names of taxa with ambiguous taxonomic histories; this applies to scientific names that were once attached to a more broadly circumscribed taxon concept, which was then split into several more narrowly circumscribed taxon concepts, one of which retains the original name. There are three options: most_likely_species returns the name that is retained, with alternative names documented in square brackets; return_all adds additional rows to the output, one for each possible taxon concept; and collapse_to_higher_taxon returns the genus with possible names in square brackets.

  • The argument identifier allows you to add a fixed text string to all genus- and family-level names; for example, identifier = "Royal NP" would return Acacia sp. [Royal NP].
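The first-letter restriction described in the note on Banksia cerrata can be sketched as a pre-filter applied before any distance check. `first_letters_match()` is a hypothetical helper, not APCalign code. The assumption that only the first two words (genus and species epithet) are compared comes from that note, not from APCalign's source.

```r
# Hypothetical helper, NOT APCalign code: a candidate is only eligible for a
# fuzzy match if the genus and species epithet keep their first letters.
first_letters_match <- function(input, candidate) {
  initials <- function(x) {
    words <- strsplit(trimws(x), "\\s+")[[1]]
    substr(utils::head(words, 2), 1, 1)  # first letter of genus and epithet
  }
  a <- initials(input)
  b <- initials(candidate)
  length(a) == length(b) && all(a == b)
}

first_letters_match("Banksia cerrata", "Banksia serrata")  # "c" vs "s"
#> [1] FALSE
first_letters_match("Banksea serrata", "Banksia serrata")
#> [1] TRUE
```

This is consistent with example 2, where Banksea serrata aligns to Banksia serrata but Banksia cerrata only aligns at genus level.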

See also

load_taxonomic_resources

Other taxonomic alignment functions: align_taxa(), update_taxonomy()

Examples

# \donttest{
resources <- load_taxonomic_resources()
#> Loading resources into memory...
#> ...done

# example 1
create_taxonomic_update_lookup(c("Eucalyptus regnans",
                                 "Acacia melanoxylon",
                                 "Banksia integrifolia",
                                 "Not a species"),
                                 resources = resources)
#> Checking alignments of 4 taxa
#>   -> of these 3 names have a perfect match to a scientific name in the APC. 
#>       Alignments being sought for remaining names.
#> # A tibble: 4 × 12
#>   original_name       aligned_name accepted_name suggested_name genus taxon_rank
#>   <chr>               <chr>        <chr>         <chr>          <chr> <chr>     
#> 1 Eucalyptus regnans  Eucalyptus … Eucalyptus r… Eucalyptus re… Euca… species   
#> 2 Acacia melanoxylon  Acacia mela… Acacia melan… Acacia melano… Acac… species   
#> 3 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species   
#> 4 Not a species       NA           NA            NA             NA    NA        
#> # ℹ 6 more variables: taxonomic_dataset <chr>, taxonomic_status <chr>,
#> #   scientific_name <chr>, aligned_reason <chr>, update_reason <chr>,
#> #   number_of_collapsed_taxa <dbl>
                                 
# example 2
input <- c("Banksia serrata", "Banksia serrate", "Banksia cerrata",
           "Banksea serrata", "Banksia serrrrata", "Dryandra")

create_taxonomic_update_lookup(
    taxa = input,
    identifier = "APCalign test",
    full = TRUE,
    resources = resources
  )
#> Checking alignments of 6 taxa
#>   -> of these 1 names have a perfect match to a scientific name in the APC. 
#>       Alignments being sought for remaining names.
#> # A tibble: 6 × 21
#>   original_name     aligned_name       accepted_name suggested_name genus family
#>   <chr>             <chr>              <chr>         <chr>          <chr> <chr> 
#> 1 Banksia serrata   Banksia serrata    Banksia serr… Banksia serra… Bank… Prote…
#> 2 Banksia serrate   Banksia serrata    Banksia serr… Banksia serra… Bank… Prote…
#> 3 Banksia cerrata   Banksia sp. [Bank… NA            Banksia sp. [… Bank… Prote…
#> 4 Banksea serrata   Banksia serrata    Banksia serr… Banksia serra… Bank… Prote…
#> 5 Banksia serrrrata Banksia serrata    Banksia serr… Banksia serra… Bank… Prote…
#> 6 Dryandra          Dryandra sp. [Dry… NA            Banksia sp. [… Bank… Prote…
#> # ℹ 15 more variables: taxon_rank <chr>, taxonomic_dataset <chr>,
#> #   taxonomic_status <chr>, taxonomic_status_aligned <chr>,
#> #   aligned_reason <chr>, update_reason <chr>, subclass <chr>,
#> #   taxon_distribution <chr>, scientific_name <chr>, taxon_ID <chr>,
#> #   taxon_ID_genus <chr>, scientific_name_ID <chr>, canonical_name <chr>,
#> #   row_number <dbl>, number_of_collapsed_taxa <dbl>

# example 3
taxon_list <-
  readr::read_csv(
  system.file("extdata", "test_taxa.csv", package = "APCalign"),
  show_col_types = FALSE)

create_taxonomic_update_lookup(
    taxa = taxon_list$original_name,
    identifier = taxon_list$notes,
    full = TRUE,
    resources = resources
  )
#> Checking alignments of 32 taxa
#>   -> of these 10 names have a perfect match to a scientific name in the APC. 
#>       Alignments being sought for remaining names.
#> # A tibble: 32 × 21
#>    original_name          aligned_name accepted_name suggested_name genus family
#>    <chr>                  <chr>        <chr>         <chr>          <chr> <chr> 
#>  1 Banksia serrata        Banksia ser… Banksia serr… Banksia serra… Bank… Prote…
#>  2 Banksia serrate        Banksia ser… Banksia serr… Banksia serra… Bank… Prote…
#>  3 Banksee serrate        Banksia ser… Banksia serr… Banksia serra… Bank… Prote…
#>  4 Banksia cerrata        Banksia sp.… NA            Banksia sp. [… Bank… Prote…
#>  5 Banksia sp.            Banksia sp.… NA            Banksia sp. [… Bank… Prote…
#>  6 Dryandra sp.           Dryandra sp… NA            Banksia sp. [… Bank… Prote…
#>  7 Argyrodendron (Whyanb… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#>  8 Argyrodendron ssp. (W… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#>  9 Argyrodendron Whyanbe… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#> 10 Argyrodendron sp. (Wh… Argyrodendr… Argyrodendro… Argyrodendron… Argy… Malva…
#> # ℹ 22 more rows
#> # ℹ 15 more variables: taxon_rank <chr>, taxonomic_dataset <chr>,
#> #   taxonomic_status <chr>, taxonomic_status_aligned <chr>,
#> #   aligned_reason <chr>, update_reason <chr>, subclass <chr>,
#> #   taxon_distribution <chr>, scientific_name <chr>, taxon_ID <chr>,
#> #   taxon_ID_genus <chr>, scientific_name_ID <chr>, canonical_name <chr>,
#> #   row_number <dbl>, number_of_collapsed_taxa <dbl>
# }