Methods for updating taxon names in APCalign
Aligning taxon names with taxon concepts/names in APC and APNI
The following table lists the rules for each of the separate algorithms (54 alignment codes) that are applied in sequence to try to align each submitted name to a taxon concept in the APC or a scientific name in the APNI.
Note: if the table is truncated on your screen, scroll horizontally to view the entire table.
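The sequence works as a cascade: each algorithm is tried in turn, and the first one that produces a match sets the aligned name and its `alignment_code`. The Python sketch below illustrates that first-match-wins idea with a handful of rules loosely modelled on rows of the table. It is not APCalign's implementation (the package is written in R): the reference sets, the `exact`/`fuzzy`/`first_words` helpers and the 0.9 cutoff are hypothetical, and `difflib`'s similarity ratio stands in for the package's own fuzzy-matching criteria.

```python
from difflib import get_close_matches
from typing import Callable, Optional

Rule = Callable[[str], Optional[str]]

# Hypothetical stand-ins for APC name lists (not real APC data structures).
APC_ACCEPTED = {"Acacia aneura", "Eucalyptus camaldulensis"}
APC_OTHER = {"Racosperma aneurum"}  # e.g. a non-accepted name


def exact(reference: set) -> Rule:
    """Rule that succeeds only on an exact string match."""
    def rule(name: str) -> Optional[str]:
        return name if name in reference else None
    return rule


def fuzzy(reference: set, cutoff: float = 0.9) -> Rule:
    """Rule that returns the closest reference name above a similarity cutoff."""
    def rule(name: str) -> Optional[str]:
        hits = get_close_matches(name, sorted(reference), n=1, cutoff=cutoff)
        return hits[0] if hits else None
    return rule


def first_words(n: int, inner: Rule) -> Rule:
    """Apply another rule to the first n words only, discarding trailing notes."""
    def rule(name: str) -> Optional[str]:
        return inner(" ".join(name.split()[:n]))
    return rule


# Ordered (alignment_code, rule) pairs; order matters because the first hit wins.
RULES = [
    ("match_01c", exact(APC_ACCEPTED)),
    ("match_01d", exact(APC_OTHER)),
    ("match_05a", fuzzy(APC_ACCEPTED)),
    ("match_05b", fuzzy(APC_OTHER)),
    ("match_10a", first_words(2, exact(APC_ACCEPTED))),
]


def align(name: str) -> tuple:
    """Return (aligned_name, alignment_code), or (None, None) if no rule matches."""
    for code, rule in RULES:
        hit = rule(name)
        if hit is not None:
            return hit, code
    return None, None


print(align("Acacia aneuro"))                   # → ('Acacia aneura', 'match_05a')
print(align("Acacia aneura with field notes"))  # → ('Acacia aneura', 'match_10a')
```

Because rules are plain functions in an ordered list, inserting a new step (for example an APNI lookup) only changes its position in `RULES`, which mirrors how the alignment codes encode their place in the sequence.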
alignment_code | search algorithm | original name variant matched to | match type | taxonomic dataset aligned to | taxon_rank of alignment | notes about sequence |
---|---|---|---|---|---|---|
match_01a | Detect scientific names, including authorship | original_name | exact | APC accepted taxon concepts | species/infraspecific | Check if strings are full scientific names, including authorship. |
match_01b | Detect scientific names, including authorship | original_name | exact | other APC taxon concepts | species/infraspecific | NA |
match_01c | Detect canonical names, lacking authorship | cleaned_name | exact | APC accepted taxon concepts | species/infraspecific | Check if strings are taxon names, lacking authorship. |
match_01d | Detect canonical names, lacking authorship | cleaned_name | exact | other APC taxon concepts | species/infraspecific | NA |
match_02a | Detect `genus sp.`, `genus ssp.` and `genus spp.` | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | The first goal is to align two-word strings that indicate an unknown species within a genus (or family). |
match_02b | Detect `genus sp.`, `genus ssp.` and `genus spp.` | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_02c | Detect `genus sp.`, `genus ssp.` and `genus spp.` | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_02d | Detect `family sp.`, `family ssp.` and `family spp.` | first word (“genus”) | exact | APC accepted taxon concepts | family | NA |
match_03a | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating that a name reflects an intergrade between two taxa. These names can only be aligned to a genus. |
match_03b | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_03c | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_03d | Detect `--` (intergrade taxa) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_03e | Detect `--` (intergrade taxa), but fail to align to genus | NA | no match | NA | NA | NA |
match_04a | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Next, find strings indicating a data collector’s indecision about which of two (or more) taxa is the appropriate taxon. These names can only be aligned to a genus. |
match_04b | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_04c | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_04d | Detect `\` (indecision between taxa) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_04e | Detect `\` (indecision between taxa), but fail to align to genus | NA | no match | NA | NA | NA |
match_05a | Detect canonical names, lacking authorship | stripped_name | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
match_05b | Detect canonical names, lacking authorship | stripped_name | fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_05c | Detect canonical names, lacking authorship | cleaned_name | exact | APNI | species/infraspecific | NA |
match_06a | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Find strings indicating that a name has an affinity to a specific taxon but is not itself that taxon. Unless documented in APC (and therefore aligned by an earlier match), such names can only be aligned to genus. |
match_06b | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_06c | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_06d | Detect `aff`, `affinis` (affinity to) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_06e | Detect `aff`, `affinis` (affinity to), but fail to align to genus | NA | no match | NA | NA | NA |
match_07a | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | APC accepted taxon concepts | species/infraspecific | Further checks if strings are taxon names, lacking authorship, now with imprecise fuzzy matching |
match_07b | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_08a | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | exact | APC accepted taxon concepts, other APC taxon concepts, APNI | genus | Find strings indicating that a name is a hybrid between two taxa. Unless documented in APC (and therefore aligned by an earlier match), such names can only be aligned to genus. |
match_08b | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_08c | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_08d | Detect `x` (hybrid taxon) and align to genus | first word (“genus”) | fuzzy | APNI | genus | NA |
match_08e | Detect `x` (hybrid taxon), but fail to align to genus | NA | no match | NA | NA | NA |
match_09a | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | APC accepted taxon concepts | species/infraspecific | Check if the first three words in the name string match with a taxon name, allowing notes to be discarded. Also useful for aligning phrase names. |
match_09b | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | other APC taxon concepts | species/infraspecific | NA |
match_09c | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
match_09d | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_10a | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | APC accepted taxon concepts | species/infraspecific | Check if the first two words in the name string match with a taxon name, allowing notes and invalid infraspecific names to be discarded. Also useful for aligning phrase names. |
match_10b | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | other APC taxon concepts | species/infraspecific | NA |
match_10c | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | fuzzy | APC accepted taxon concepts | species/infraspecific | NA |
match_10d | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | fuzzy | other APC taxon concepts | species/infraspecific | NA |
match_11a | Detect canonical names, lacking authorship | stripped_name | fuzzy | APNI | species/infraspecific | Further checks if strings are APNI taxon names, lacking authorship, now with fuzzy matching or considering just the first three or two words in the string. |
match_11b | Detect canonical names, lacking authorship | stripped_name | imprecise fuzzy | APNI | species/infraspecific | NA |
match_11c | Detect canonical names, by checking first three words in string | three words (from stripped_name_2) | exact | APNI | species/infraspecific | NA |
match_11d | Detect canonical names, by checking first two words in string | two words (from stripped_name_2) | exact | APNI | species/infraspecific | NA |
match_12a | Detect genus, by checking the first word in the string | first word (“genus”) | exact | APC accepted taxon concepts | genus | Check if the first word in the name string matches a genus or family name, allowing an alignment at the genus or family level. |
match_12b | Detect genus, by checking the first word in the string | first word (“genus”) | exact | other APC taxon concepts | genus | NA |
match_12c | Detect genus, by checking the first word in the string | first word (“genus”) | exact | APNI | genus | NA |
match_12d | Detect family, by checking the first word in the string | first word (“genus”) | exact | APC accepted taxon concepts | family | NA |
match_12e | Detect family, by checking the first word in the string | first word (“genus”) | exact | other APC taxon concepts | family | NA |
match_12f | Detect genus, by checking the first word in the string | first word (“genus”) | fuzzy | APC accepted taxon concepts | genus | NA |
match_12g | Detect genus, by checking the first word in the string | first word (“genus”) | fuzzy | other APC taxon concepts | genus | NA |
match_12h | Detect family, by checking the first word in the string | first word (“genus”) | fuzzy | APC accepted taxon concepts | family | NA |
match_12i | Detect family, by checking the first word in the string | first word (“genus”) | fuzzy | other APC taxon concepts | family | NA |
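Rows match_02 to match_08 share one pattern: detect a marker in the name, then align only the first word to a genus (exact first, then fuzzy), recording a failure code when no genus is found. A minimal Python sketch of that logic follows. The regular expressions, the tiny `GENERA` set, the 0.8 fuzzy cutoff and the `genus_level_alignment` helper are illustrative assumptions rather than APCalign's own code; in particular, the sketch accepts both `\` and `/` as indecision markers and collapses the separate lookups against APC accepted, other APC and APNI names into a single set.

```python
import re
from difflib import get_close_matches
from typing import Optional, Tuple

# Illustrative marker patterns, checked in table order (not APCalign's own regexes).
MARKERS = [
    ("match_02", re.compile(r"^\S+\s+(sp|ssp|spp)\.?$")),   # two-word "genus sp." etc.
    ("match_03", re.compile(r"--")),                        # intergrade
    ("match_04", re.compile(r"[\\/]")),                     # indecision (\ or /, assumed)
    ("match_06", re.compile(r"\s(aff|affinis)\.?(\s|$)")),  # affinity
    ("match_08", re.compile(r"\sx\s")),                     # hybrid
]

# Hypothetical genus list standing in for APC/APNI genus names.
GENERA = {"Acacia", "Eucalyptus", "Grevillea"}


def genus_level_alignment(name: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (code, genus) if a marker is found; genus is None when no genus matches."""
    for code, pattern in MARKERS:
        if not pattern.search(name):
            continue
        first_word = name.split()[0]
        if first_word in GENERA:
            return code, first_word          # exact match on the first word
        close = get_close_matches(first_word, sorted(GENERA), n=1, cutoff=0.8)
        if close:
            return code, close[0]            # fuzzy match on the first word
        return code, None                    # marker found, genus not aligned ("e" rows)
    return None                              # no marker: leave to the other rules


for example in ["Acacia sp.", "Grevilea aff. rosmarinifolia", "Foo x bar"]:
    print(example, "->", genus_level_alignment(example))
```

The misspelled "Grevilea" is recovered as "Grevillea" by the fuzzy step, while "Foo x bar" is recognised as a hybrid string whose genus cannot be aligned.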
Updating taxonomy
The following table indicates the separate functions used to:
- update aligned names to accepted names in the APC
- add best-practice suggested names to all submitted names
- add identifiers to taxon concepts (in the APC) or scientific names (in the APC or APNI)
Different functions are used depending on the taxon rank of the aligned name and the taxonomic dataset to which the name was aligned (APC vs APNI).
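In other words, the update step dispatches on two properties of the aligned name. The Python sketch below shows that dispatch using the function names listed in the table (as strings only). The `choose_update_function` helper and the assumption that every rank other than genus or family counts as species or infraspecific are illustrations, not part of APCalign's interface.

```python
from typing import Optional

# Genus- and family-level alignments map to dedicated functions (names from the table).
RANK_SPECIFIC = {
    ("APC", "genus"): "update_taxonomy_APC_genus",
    ("APNI", "genus"): "update_taxonomy_APNI_genus",
    ("APC", "family"): "update_taxonomy_APC_family",
}


def choose_update_function(taxonomic_dataset: Optional[str],
                           taxon_rank: Optional[str]) -> Optional[str]:
    """Return the name of the update function for an aligned name, or None."""
    if taxonomic_dataset is None:
        return None  # name not aligned: the original name is carried through
    if taxon_rank in ("genus", "family"):
        return RANK_SPECIFIC.get((taxonomic_dataset, taxon_rank))
    # Any other rank is treated as species or infraspecific (assumption).
    return f"update_taxonomy_{taxonomic_dataset}_species_and_infraspecific_taxa"


print(choose_update_function("APNI", "variety"))
# → update_taxonomy_APNI_species_and_infraspecific_taxa
```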
function name | taxonomic dataset | taxon rank | updates to aligned name | format of suggested_name | accepted name (& taxon_ID) | genus (& taxon_ID_genus) | scientific_name_ID |
---|---|---|---|---|---|---|---|
update_taxonomy_APC_genus | APC | genus | to APC accepted genus | genus sp. [notes]* | no | yes | no |
update_taxonomy_APNI_genus | APNI | genus | none | genus sp. [notes] | no | no | no |
update_taxonomy_APC_family | APC | family | none | family sp. [notes] | no | no | no |
update_taxonomy_APC_species_and_infraspecific_taxa | APC | species & infraspecific | NA | APC accepted species** name | yes | yes | yes |
– taxonomic_splits = “most_likely_species” | NA | NA | to APC accepted taxon concept | most likely APC accepted species** name [alternative possible names] | yes | yes | yes |
– taxonomic_splits = “return_all” | NA | NA | to APC accepted taxon concept | all possible APC accepted species** names (extra rows added) | yes | yes | yes |
– taxonomic_splits = “collapse_to_higher_taxon” | NA | NA | collapsed to APC accepted genus | genus sp. [collapsed names] | no | yes | no |
update_taxonomy_APNI_species_and_infraspecific_taxa | APNI | species & infraspecific | none to species name; genus to APC accepted genus if possible | APNI listed species** name* | no | sometimes | yes |
(names not aligned) | (not aligned) | (not aligned) | none | original name | no | no | no |
\* genus updated to the APC accepted genus if possible; \*\* species or infraspecific taxon name
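The three `taxonomic_splits` options differ only in how an aligned name that maps to several APC accepted species is reported. The Python sketch below mirrors the suggested_name formats in the table. It assumes the candidate accepted names arrive already ordered with the most likely first, that they share a genus, and that names inside the brackets are joined with ` | `; the `resolve_split` helper, its dictionary keys and the placeholder names are hypothetical, not APCalign's implementation.

```python
from typing import Dict, List


def resolve_split(aligned_name: str, candidates: List[str],
                  taxonomic_splits: str = "most_likely_species") -> List[Dict[str, object]]:
    """Report a split taxon; candidates are accepted names, most likely first (assumed)."""
    if not candidates:
        raise ValueError("at least one candidate accepted name is required")
    if taxonomic_splits == "most_likely_species":
        best, *alternatives = candidates
        suggested = f"{best} [{' | '.join(alternatives)}]" if alternatives else best
        return [{"aligned_name": aligned_name, "accepted_name": best,
                 "suggested_name": suggested}]
    if taxonomic_splits == "return_all":
        # One output row per possible accepted name.
        return [{"aligned_name": aligned_name, "accepted_name": name,
                 "suggested_name": name} for name in candidates]
    if taxonomic_splits == "collapse_to_higher_taxon":
        genus = candidates[0].split()[0]  # assumes all candidates share a genus
        return [{"aligned_name": aligned_name, "accepted_name": None,
                 "suggested_name": f"{genus} sp. [{' | '.join(candidates)}]",
                 "number_of_collapsed_taxa": len(candidates)}]
    raise ValueError(f"unknown taxonomic_splits option: {taxonomic_splits!r}")


rows = resolve_split("Genusname alpha", ["Genusname alpha", "Genusname beta"],
                     "collapse_to_higher_taxon")
print(rows[0]["suggested_name"])  # → Genusname sp. [Genusname alpha | Genusname beta]
```

Only the collapse option leaves the accepted name empty, matching the "no" in the accepted name column of the table.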
Outputs of APCalign
The following columns are output by the core function `create_taxonomic_update_lookup()` and its two component functions, `align_taxa()` and `update_taxonomy()`.
variable | returned by | description |
---|---|---|
original_name | default | The original plant name. |
aligned_name | default | The input plant name that has been aligned to a taxon name in the APC or APNI by the align_taxa function. |
accepted_name | default | The APC-accepted plant name when available. |
suggested_name | default | The suggested plant name to use. Identical to the accepted_name when an accepted_name exists; otherwise the suggested_name is the aligned_name or the aligned name with an outdated genus updated. |
genus | default | The genus of the accepted (or suggested) name; only APC-accepted genus names are filled in. |
family | full | The family of the accepted (or suggested) name; only APC-accepted family names are filled in. |
taxon_rank | default | The taxonomic rank of the suggested (and accepted) name. |
taxonomic_dataset | default | The source of the suggested (and accepted) names (APC or APNI). |
taxonomic_status | full | The taxonomic status of the suggested (and accepted) name. |
aligned_reason | default | The explanation of a specific taxon name alignment (from an original name to an aligned name). |
update_reason | default | The explanation of a specific taxon name update (from an aligned name to an accepted or suggested name). |
subclass | full | The subclass of the accepted name. |
taxon_distribution | full | The distribution of the accepted name; only filled in if an APC accepted_name is available. |
scientific_name_authorship | default | The authorship information for the accepted (or synonymous) name; available for both APC and APNI names. |
taxon_ID | full | The unique taxon concept identifier for the accepted_name; only filled in if an APC accepted_name is available. |
taxon_ID_genus | full | An identifier for the genus; only filled in if an APC-accepted genus name is available. |
scientific_name_ID | full | An identifier for the nomenclatural (not taxonomic) details of a scientific name; available for both APC and APNI names. |
taxonomic_status_aligned | full | The taxonomic status of the aligned name before any taxonomic updates have been applied. |
row_number | full | The row number of a specific original_name in the input. |
number_of_collapsed_taxa | default | The number of possible taxon names that have been collapsed when taxonomic_splits == “collapse_to_higher_taxon”. |
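The `returned by` column separates the columns present in the default output from those added only in the full output. As an illustration, the Python sketch below encodes that split directly from the table; the `full` flag and the `select_output_columns` helper are hypothetical stand-ins for the package's own option, and the column order shown need not match APCalign's actual output order.

```python
from typing import Dict, List, Optional

# Column groups transcribed from the table above.
DEFAULT_COLUMNS: List[str] = [
    "original_name", "aligned_name", "accepted_name", "suggested_name", "genus",
    "taxon_rank", "taxonomic_dataset", "aligned_reason", "update_reason",
    "scientific_name_authorship", "number_of_collapsed_taxa",
]
FULL_ONLY_COLUMNS: List[str] = [
    "family", "taxonomic_status", "subclass", "taxon_distribution", "taxon_ID",
    "taxon_ID_genus", "scientific_name_ID", "taxonomic_status_aligned", "row_number",
]


def select_output_columns(record: Dict[str, object],
                          full: bool = False) -> Dict[str, Optional[object]]:
    """Keep the default columns (plus full-only ones if requested); missing values become None."""
    columns = DEFAULT_COLUMNS + (FULL_ONLY_COLUMNS if full else [])
    return {column: record.get(column) for column in columns}


row = select_output_columns({"original_name": "Acacia aneura", "family": "Fabaceae"})
print("family" in row)  # → False: family is only part of the full output
```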