austraits

austraits allows users to access, explore and wrangle data from traits.build relational databases. It is also an R interface to AusTraits, the Australian plant trait database. This package contains functions for joining data from various tables, filtering to specific records, combining multiple databases and visualising the distribution of the data. Below, we’ve included a tutorial using the AusTraits database to illustrate how some of these functions work together to generate useful outputs.

Install and load `austraits`

austraits is still under development. To install the current version from GitHub:

#install.packages("remotes")
remotes::install_github("traitecoevo/austraits", dependencies = TRUE, upgrade = "ask")

# Load the austraits package
library(austraits)

Retrieve AusTraits database

We will use the latest AusTraits database as an example database.

We can download the AusTraits database by calling load_austraits(). This function downloads AusTraits to the specified path (data/austraits by default) and reloads the database from that location on later calls. Set update = TRUE to download a fresh copy from Zenodo. Note that load_austraits() also accepts the DOI of a particular version.

austraits <- load_austraits(version = "6.0.0", path = "data/austraits")

You can check out different versions of AusTraits and their associated DOI by using:

get_versions(path = "data/austraits")

#> # A tibble: 6 × 4
#>   publication_date doi                     version id      
#>   <date>           <chr>                   <chr>   <chr>   
#> 1 2024-05-14       10.5281/zenodo.11188867 6.0.0   11188867
#> 2 2023-11-19       10.5281/zenodo.10156222 5.0.0   10156222
#> 3 2023-09-18       10.5281/zenodo.8353840  4.2.0   8353840 
#> 4 2023-01-30       10.5281/zenodo.7583087  4.1.0   7583087 
#> 5 2022-11-27       10.5281/zenodo.7368074  4.0.0   7368074 
#> 6 2021-07-14       10.5281/zenodo.5112001  3.0.2   5112001
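
Because get_versions() returns an ordinary table, the DOI for a given release can be looked up by version number. The sketch below rebuilds three rows of the output above by hand so that it runs offline. The `doi` argument in the final, commented-out call is an assumption based on the note that load_austraits() accepts a DOI; check ?load_austraits before relying on it.

```r
# Rows copied by hand from the get_versions() output above (nothing is downloaded)
versions <- data.frame(
  publication_date = as.Date(c("2024-05-14", "2023-11-19", "2023-09-18")),
  doi = c("10.5281/zenodo.11188867", "10.5281/zenodo.10156222", "10.5281/zenodo.8353840"),
  version = c("6.0.0", "5.0.0", "4.2.0"),
  stringsAsFactors = FALSE
)

# Look up the DOI that corresponds to a particular version
doi_5 <- versions$doi[versions$version == "5.0.0"]
doi_5

# Find the most recent release by publication date
newest_first <- order(versions$publication_date, decreasing = TRUE)
latest <- versions$version[newest_first[1]]
latest

# Assumed usage (needs an internet connection, so it is left commented out):
# austraits_5 <- load_austraits(doi = doi_5, path = "data/austraits")
```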

AusTraits, like all traits.build databases, is a relational database. In R, it is a very large list with multiple tables. If you are not familiar with working with lists in R, we recommend having a quick look at this tutorial. To learn more about the structure of austraits, check out the structure of the database.

austraits

#> ── This is 6.0.0 of AusTraits: a curated plant trait database for the Australian flora! ──────────────────────────────

#> ℹ This database is built using traits.build version 1.1.0.9000
#> ℹ This database contains a total of 1726024 records, for 33494 taxa and 497 traits.

#> ── This object is a 'list' with the following components: ──
#> 
#> • traits: A table containing measurements of traits.
#> • locations: A table containing observations of location/site characteristics associated with information in
#> `traits`. Cross referencing between the two dataframes is possible using combinations of the variables `dataset_id`,
#> `location_name`.
#> • contexts: A table containing observations of contextual characteristics associated with information in `traits`.
#> Cross referencing between the two dataframes is possible using combinations of the variables `dataset_id`, `link_id`,
#> and `link_vals`.
#> • methods: A table containing details on methods with which data were collected, including time frame and source.
#> Cross referencing with the `traits` table is possible using combinations of the variables `dataset_id`, `trait_name`.
#> • excluded_data: A table of data that did not pass quality test and so were excluded from the master dataset.
#> • taxonomic_updates: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are
#> determined by comparing against the APC (Australian Plant Census) and APNI (Australian Plant Names Index).
#> • taxa: A table containing details on taxa associated with information in `traits`. This information has been sourced
#> from the APC (Australian Plant Census) and APNI (Australian Plant Names Index) and is released under a CC-BY3
#> license.
#> • contributors: A table of people contributing to each study.
#> • sources: Bibtex entries for all primary and secondary sources in the compilation.
#> • definitions: A copy of the definitions for all tables and terms. Information included here was used to process data
#> and generate any documentation for the study.
#> • schema: A copy of the schema for all tables and terms. Information included here was used to process data and
#> generate any documentation for the study.
#> • metadata: Metadata associated with the dataset, including title, creators, license, subject, funding sources.
#> • build_info: A description of the computing environment used to create this version of the dataset, including
#> version number, git commit and R session_info.

#> ℹ To access a component, try using the $ e.g. austraits$traits
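
Since the database is a named list of tables, the usual list tools apply. Here is a minimal sketch using a small hand-made stand-in: `toy_db` and its values are invented for illustration, and the real object has the components listed above.

```r
# A toy stand-in for a traits.build database: a named list of data frames
toy_db <- list(
  traits = data.frame(
    dataset_id = c("Study_A", "Study_A", "Study_B"),
    taxon_name = c("Acacia aneura", "Acacia aneura", "Banksia serrata"),
    trait_name = c("leaf_area", "plant_height", "leaf_area"),
    value = c("120", "4", "950"),
    stringsAsFactors = FALSE
  ),
  taxa = data.frame(
    taxon_name = c("Acacia aneura", "Banksia serrata"),
    genus = c("Acacia", "Banksia"),
    family = c("Fabaceae", "Proteaceae"),
    stringsAsFactors = FALSE
  )
)

names(toy_db)                        # list the component names
toy_db$traits                        # access a table with $
toy_db[["taxa"]]$family              # or with [[ ]], then pick a column
rows_per_table <- sapply(toy_db, nrow)  # number of rows in each table
rows_per_table
```

The same calls, for example names(austraits) or austraits$taxa, should work on the real object because it is also a list.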

Descriptive summaries of traits and taxa

AusTraits contains 497 plant traits. Check out definitions of the traits to learn more about how each trait is defined.

Have a look at data coverage by trait or taxa with:

summarise_database(austraits, "trait_name")

#> # A tibble: 497 × 5
#>    trait_name                    n_records n_dataset n_taxa percent_total
#>    <chr>                             <int>     <int>  <int>         <dbl>
#>  1 accessory_cost_fraction              47         1     47     0.0000272
#>  2 accessory_cost_mass                  47         1     47     0.0000272
#>  3 atmospheric_CO2_concentration       840         4    121     0.000487 
#>  4 bark_Al_per_dry_mass                 70         1     10     0.0000406
#>  5 bark_B_per_dry_mass                  70         1     10     0.0000406
#>  6 bark_C_per_dry_mass                 229         2     27     0.000133 
#>  7 bark_Ca_per_dry_mass                104         3     21     0.0000603
#>  8 bark_Cu_per_dry_mass                 70         1     10     0.0000406
#>  9 bark_Fe_per_dry_mass                 70         1     10     0.0000406
#> 10 bark_K_per_dry_mass                 104         3     21     0.0000603
#> # ℹ 487 more rows

summarise_database(austraits, "family")

#> # A tibble: 310 × 5
#>    family           n_records n_dataset n_taxa percent_total
#>    <chr>                <int>     <int>  <int>         <dbl>
#>  1 Acanthaceae           3719        57    149     0.00216  
#>  2 Achariaceae            162        14      3     0.0000939
#>  3 Actinidiaceae          186        16      3     0.000108 
#>  4 Agapanthaceae          107        13      3     0.000062 
#>  5 Aizoaceae             5004        63    102     0.0029   
#>  6 Akaniaceae             123        16      1     0.0000713
#>  7 Alismataceae           892        30     20     0.000517 
#>  8 Alliaceae              561        19     18     0.000325 
#>  9 Alseuosmiaceae         318        13      3     0.000184 
#> 10 Alstroemeriaceae       175        15      2     0.000101 
#> # ℹ 300 more rows

summarise_database(austraits, "genus")

#> # A tibble: 3,177 × 5
#>    genus        n_records n_dataset n_taxa percent_total
#>    <chr>            <int>     <int>  <int>         <dbl>
#>  1 (Dockrillia          3         2      1    0.00000174
#>  2 Abelia              16         4      1    0.00000928
#>  3 Abelmoschus        271        19      8    0.000157  
#>  4 Abildgaardia        74         7      2    0.0000429 
#>  5 Abrodictyum        123        14      3    0.0000713 
#>  6 Abroma              39         7      2    0.0000226 
#>  7 Abrophyllum        181        19      3    0.000105  
#>  8 Abrotanella        183        18      4    0.000106  
#>  9 Abrus              202        26      3    0.000117  
#> 10 Abutilon          1975        52     54    0.00115   
#> # ℹ 3,167 more rows
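
In the outputs above, percent_total equals n_records divided by the total number of records (for example 47 / 1726024 ≈ 0.0000272), so it is a proportion rather than a percentage. The sketch below shows how such a summary could be computed in base R on an invented toy table. `summarise_by` is a hypothetical helper written for this illustration, not the package's implementation.

```r
# Invented toy traits table
toy_traits <- data.frame(
  dataset_id = c("Study_A", "Study_A", "Study_B", "Study_B", "Study_C"),
  taxon_name = c("Acacia aneura", "Banksia serrata", "Acacia aneura",
                 "Acacia aneura", "Banksia serrata"),
  trait_name = c("leaf_area", "leaf_area", "leaf_area",
                 "plant_height", "plant_height"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count records, datasets and taxa per group
summarise_by <- function(traits, var) {
  groups <- split(traits, traits[[var]])
  out <- data.frame(
    group = names(groups),
    n_records = vapply(groups, nrow, integer(1)),
    n_dataset = vapply(groups, function(g) length(unique(g$dataset_id)), integer(1)),
    n_taxa = vapply(groups, function(g) length(unique(g$taxon_name)), integer(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
  # Share of all records that fall in each group (a proportion, not a percentage)
  out$percent_total <- out$n_records / nrow(traits)
  names(out)[1] <- var
  out
}

toy_summary <- summarise_by(toy_traits, "trait_name")
toy_summary
```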

Quickly look up data

Interested in a specific trait or context property, but unsure what terms we use? Try our lookup_ functions.

lookup_trait(austraits, "leaf") %>% head()

#> [1] "leaf_compoundness" "leaf_phenology"    "leaf_length"       "leaf_width"        "leaf_margin"      
#> [6] "leaf_shape"

lookup_context_property(austraits, "fire") %>% head()

#> [1] "fire intensity"     "fire history"       "fire response type" "fire severity"      "fire season"

lookup_location_property(austraits, "temperature") %>% head()

#> [1] "temperature, max (C)"             "temperature, MAT (C)"             "temperature, mean summer max (C)"
#> [4] "temperature, mean winter max (C)" "temperature, max MAT (C)"         "temperature, min MAT (C)"

Extracting data

In most cases, users would like to extract a subset of a database for their research purposes.

extract_dataset() filters for a particular study
extract_trait() filters for a certain trait
extract_taxa() filters for a specific taxon

Note you can supply a vector to each of these functions to filter for more than one study/trait/taxon. All our extract_ functions support partial matching, e.g. extract_trait(austraits, "leaf") would return all traits containing leaf.

If you would like to extract from other tables or columns, use extract_data.

All extract_ functions simultaneously filter across all tables in the database.
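
To make "partial matching" and "filtering across all tables" concrete, here is a minimal base R sketch on an invented two-table list. `extract_trait_sketch` is a hypothetical helper, not the package's code, and it assumes plain substring matching, which may differ from how the real functions match.

```r
# Invented toy database with two linked tables
toy_db <- list(
  traits = data.frame(
    dataset_id = c("Study_A", "Study_A", "Study_B", "Study_B"),
    trait_name = c("leaf_area", "plant_height", "leaf_N_per_dry_mass", "seed_dry_mass"),
    value = c("120", "4", "23.2", "14"),
    stringsAsFactors = FALSE
  ),
  methods = data.frame(
    dataset_id = c("Study_A", "Study_A", "Study_B", "Study_B"),
    trait_name = c("leaf_area", "plant_height", "leaf_N_per_dry_mass", "seed_dry_mass"),
    methods = c("scanner", "tape measure", "combustion", "balance"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: keep traits whose name contains `pattern`,
# then keep only the methods rows linked to the retained traits
extract_trait_sketch <- function(db, pattern) {
  keep_traits <- db$traits[grepl(pattern, db$traits$trait_name, fixed = TRUE), ]
  # dataset_id + trait_name is the cross-reference key between the two tables
  key <- function(d) paste(d$dataset_id, d$trait_name, sep = "::")
  keep_methods <- db$methods[key(db$methods) %in% key(keep_traits), ]
  list(traits = keep_traits, methods = keep_methods)
}

leaf_only <- extract_trait_sketch(toy_db, "leaf")
leaf_only$traits$trait_name
leaf_only$methods$methods
```

Filtering the linked tables together keeps the result internally consistent: every row that remains in methods still refers to a trait that remains in traits.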

Extracting by dataset

Filtering one particular dataset and assigning it to an object

one_study <- extract_dataset(austraits, "Falster_2005_2")

one_study$traits

#> # A tibble: 165 × 26
#>    dataset_id     taxon_name    observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>          <chr>         <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Falster_2005_2 Acacia longi… 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  2 Falster_2005_2 Acacia longi… 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  3 Falster_2005_2 Acacia longi… 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  4 Falster_2005_2 Acacia longi… 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  5 Falster_2005_2 Acacia longi… 01             leaf_N_pe… 23.2  mg/g  population  mean       measurement    4         
#>  6 Falster_2005_2 Acacia longi… 01             leaf_area  1761  mm2   population  mean       measurement    4         
#>  7 Falster_2005_2 Acacia longi… 01             leaf_mass… 128   g/m2  population  mean       measurement    4         
#>  8 Falster_2005_2 Acacia longi… 01             plant_hei… 4     m     population  maximum    measurement    unknown   
#>  9 Falster_2005_2 Acacia longi… 01             resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#> 10 Falster_2005_2 Acacia longi… 01             seed_dry_… 14    mg    population  mean       measurement    unknown   
#> # ℹ 155 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>

Filtering multiple datasets and assigning them to an object

multi_studies <- extract_dataset(austraits,
                                 dataset_id = c("Thompson_2001", "Ilic_2000"))

multi_studies$traits

#> # A tibble: 2,209 × 26
#>    dataset_id taxon_name       observation_id trait_name  value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>            <chr>          <chr>       <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Ilic_2000  Acacia acradenia 0001           wood_densi… 0.904 mg/m… individual  raw        measurement    unknown   
#>  2 Ilic_2000  Acacia acuminata 0002           wood_densi… 0.895 mg/m… individual  raw        measurement    unknown   
#>  3 Ilic_2000  Acacia acuminata 0003           wood_densi… 1.008 mg/m… individual  raw        measurement    unknown   
#>  4 Ilic_2000  Acacia adsurgens 0004           wood_densi… 0.887 mg/m… individual  raw        measurement    unknown   
#>  5 Ilic_2000  Acacia alleniana 0005           wood_densi… 0.56  mg/m… individual  raw        measurement    unknown   
#>  6 Ilic_2000  Acacia ampliceps 0006           wood_densi… 0.568 mg/m… individual  raw        measurement    unknown   
#>  7 Ilic_2000  Acacia aneura    0007           wood_densi… 1.035 mg/m… individual  raw        measurement    unknown   
#>  8 Ilic_2000  Acacia aneura    0008           wood_densi… 1.019 mg/m… individual  raw        measurement    unknown   
#>  9 Ilic_2000  Acacia aneura    0009           wood_densi… 0.861 mg/m… individual  raw        measurement    unknown   
#> 10 Ilic_2000  Acacia aneura    0010           wood_densi… 0.996 mg/m… individual  raw        measurement    unknown   
#> # ℹ 2,199 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>

Filtering multiple datasets by the same lead author (e.g. Falster) and assigning them to an object.

falster_studies <- extract_dataset(austraits, "Falster")

falster_studies$traits

#> # A tibble: 685 × 26
#>    dataset_id   taxon_name      observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>        <chr>           <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Falster_2003 Acacia floribu… 01             leaf_area  142   mm2   population  mean       measurement    3         
#>  2 Falster_2003 Acacia floribu… 01             leaf_incl… 57    deg   population  mean       measurement    3         
#>  3 Falster_2003 Acacia floribu… 02             leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  4 Falster_2003 Acacia myrtifo… 03             leaf_area  319   mm2   population  mean       measurement    3         
#>  5 Falster_2003 Acacia myrtifo… 03             leaf_incl… 66.1  deg   population  mean       measurement    3         
#>  6 Falster_2003 Acacia myrtifo… 04             leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  7 Falster_2003 Acacia suaveol… 05             leaf_area  562   mm2   population  mean       measurement    3         
#>  8 Falster_2003 Acacia suaveol… 05             leaf_incl… 71.7  deg   population  mean       measurement    3         
#>  9 Falster_2003 Acacia suaveol… 06             leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#> 10 Falster_2003 Angophora hisp… 07             leaf_area  1590  mm2   population  mean       measurement    3         
#> # ℹ 675 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>

Extracting by taxonomy

# By family 
proteaceae <- extract_taxa(austraits, family = "Proteaceae")
# Checking that only taxa in Proteaceae have been extracted
proteaceae$taxa$family %>% unique()

#> [1] "Proteaceae"

# By genus 
acacia <- extract_taxa(austraits, genus = "Acacia")
# Checking that only taxa in Acacia have been extracted
acacia$traits$taxon_name %>% unique() %>% head()

#> [1] "Acacia abbatiana"                        "Acacia abbreviata"                      
#> [3] "Acacia abrupta"                          "Acacia acanthaster"                     
#> [5] "Acacia acanthoclada subsp. acanthoclada" "Acacia acanthoclada subsp. glaucescens"

acacia$taxa$genus %>% unique()

#> [1] "Acacia"

Extracting by trait

data_fruit <- extract_trait(austraits, "fruit")

data_fruit$traits

#> # A tibble: 216,465 × 26
#>    dataset_id taxon_name        observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>             <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 ABRS_1981  Ceratophyllum de… 0566           fruit_len… 4     mm    species     minimum    measurement    <NA>      
#>  2 ABRS_1981  Ceratophyllum de… 0566           fruit_len… 6     mm    species     maximum    measurement    <NA>      
#>  3 ABRS_1981  Ceratophyllum de… 0566           fruit_wid… 3     mm    species     minimum    measurement    <NA>      
#>  4 ABRS_1981  Ceratophyllum de… 0566           fruit_wid… 3.5   mm    species     maximum    measurement    <NA>      
#>  5 ABRS_1981  Conospermum peti… 0680           fruit_len… 2.5   mm    species     minimum    measurement    <NA>      
#>  6 ABRS_1981  Conospermum peti… 0680           fruit_wid… 3     mm    species     minimum    measurement    <NA>      
#>  7 ABRS_1981  Proiphys amboine… 3182           fruit_len… 15    mm    species     minimum    measurement    <NA>      
#>  8 ABRS_1981  Proiphys amboine… 3182           fruit_len… 30    mm    species     maximum    measurement    <NA>      
#>  9 ABRS_1981  Proiphys amboine… 3182           fruit_wid… 15    mm    species     minimum    measurement    <NA>      
#> 10 ABRS_1981  Proiphys amboine… 3182           fruit_wid… 30    mm    species     maximum    measurement    <NA>      
#> # ℹ 216,455 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>

You can combine lookup_trait with extract_trait to obtain all traits with ‘leaf’ in the trait name and assign the result to an object. Note that we use the . notation to pass the lookup_trait results on to extract_trait.

leaf <- lookup_trait(austraits, "leaf") %>% extract_trait(austraits, .) 

leaf$traits

#> # A tibble: 511,952 × 26
#>    dataset_id taxon_name        observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>             <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 ABRS_1981  Acanthocarpus ca… 0001           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  2 ABRS_1981  Acanthocarpus hu… 0002           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  3 ABRS_1981  Acanthocarpus pa… 0003           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  4 ABRS_1981  Acanthocarpus pr… 0004           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  5 ABRS_1981  Acanthocarpus ro… 0005           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  6 ABRS_1981  Acanthocarpus ru… 0006           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  7 ABRS_1981  Acanthocarpus ve… 0007           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  8 ABRS_1981  Acer pseudoplata… 0008           leaf_phen… deci… <NA>  species     mode       expert_score   <NA>      
#>  9 ABRS_1981  Acidonia microca… 0009           leaf_comp… comp… <NA>  species     mode       expert_score   <NA>      
#> 10 ABRS_1981  Callitris acumin… 0010           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#> # ℹ 511,942 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>

Extracting from other tables

You may want to extract data from tables that have specific column values. For example, calling the code below will return data where “fire” is mentioned in the context_property column.

data_fire <- extract_data(austraits, 
                          table =  "contexts",
                          col =  "context_property", 
                          col_value = "fire")

data_fire

Extracting from a single table

If you have already manipulated the original database and are working with just the traits table, the extract functions will also work on a single table.

seedling_data <- extract_data(austraits$traits,
                          col =  "life_stage", 
                          col_value = "seedling")

Falster_data <- extract_data(austraits$traits,
                          col =  "dataset_id", 
                          col_value = "Falster")

leaf_data <- extract_trait(austraits$traits, 
                          c("leaf_area", "leaf_N_per_dry_mass"))

Join data from other tables

Once users have extracted the data they want, they may want to merge other study details into the main traits dataframe for their analyses. For example, users may require taxonomic information for a phylogenetic analysis. This is where the join_ functions come in.

There are five join_ functions in total, each designed to append specific information from other tables and elements in the austraits object. Their suffixes refer to the type of information that is joined, e.g. join_taxa appends taxonomic information to the traits dataframe.

We recommend pulling up the help file for each one for more details, e.g. ?join_location_coordinates

Each of the functions has specific default parameters and formatting, but offers versatile joining options.
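
Conceptually, each join_ function appends columns from another table to the traits table by matching on shared key columns. Here is a minimal base R sketch of that idea using merge() on invented toy tables. It is not the package's implementation, and the real join_taxa adds more columns than shown here.

```r
# Invented toy tables that share the key column taxon_name
toy_traits <- data.frame(
  taxon_name = c("Acacia aneura", "Banksia serrata", "Acacia aneura"),
  trait_name = c("leaf_area", "leaf_area", "plant_height"),
  value = c("120", "950", "4"),
  stringsAsFactors = FALSE
)
toy_taxa <- data.frame(
  taxon_name = c("Acacia aneura", "Banksia serrata"),
  genus = c("Acacia", "Banksia"),
  family = c("Fabaceae", "Proteaceae"),
  stringsAsFactors = FALSE
)

# Left join: keep every traits row and append the matching taxonomic columns
joined <- merge(toy_traits, toy_taxa, by = "taxon_name", all.x = TRUE, sort = FALSE)
joined
```

A left join (all.x = TRUE) is used so that no trait record is dropped if its taxon has no match; the missing taxonomic fields would simply be NA.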

# Join taxonomic information 
(data_fire %>% join_taxa)$traits

#> # A tibble: 1,822 × 30
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 20 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>, family <chr>,
#> #   genus <chr>, taxon_rank <chr>, establishment_means <chr>

# Join methodological information 
(data_fire %>% join_methods)$traits

#> # A tibble: 1,822 × 27
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 17 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>, methods <chr>

# Join location coordinates 
(data_fire %>% join_location_coordinates)$traits

#> # A tibble: 1,822 × 29
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 19 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>, location_name <chr>,
#> #   `latitude (deg)` <chr>, `longitude (deg)` <chr>

# Join information pertaining to location properties 
(data_fire %>% join_location_properties)$traits

#> # A tibble: 1,822 × 28
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 18 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>, location_name <chr>,
#> #   location_properties <chr>

# Join a single location property as its own column
(data_fire %>% join_location_properties(format = "many_columns", vars = "temperature, min MAT (C)"))$traits

#> # A tibble: 1,822 × 28
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 18 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>, location_name <chr>,
#> #   `location_property: temperature, min MAT (C)` <chr>

# Join context information 
(data_fire %>% join_context_properties())$traits

#> # A tibble: 1,822 × 31
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 21 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>,
#> #   treatment_context_properties <chr>, plot_context_properties <chr>, entity_context_properties <chr>,
#> #   temporal_context_properties <chr>, method_context_properties <chr>

# Join information from multiple tables 
(data_fire %>% join_context_properties() %>% join_location_coordinates())$traits

#> # A tibble: 1,822 × 34
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 24 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>,
#> #   treatment_context_properties <chr>, plot_context_properties <chr>, entity_context_properties <chr>,
#> #   temporal_context_properties <chr>, method_context_properties <chr>, location_name <chr>, …

Alternatively, users can join all information using flatten_database():

data_fire %>% flatten_database()

#> # A tibble: 1,822 × 66
#>    dataset_id    taxon_name     observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>          <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falcif… 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falcif… 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falcif… 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falcif… 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falcif… 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falcif… 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrora… 004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrora… 004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrora… 004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrora… 005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 56 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>, location_name <chr>,
#> #   `latitude (deg)` <chr>, `longitude (deg)` <chr>, location_properties <chr>, treatment_context_properties <chr>,
#> #   plot_context_properties <chr>, entity_context_properties <chr>, temporal_context_properties <chr>, …
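
To make the idea behind these joins concrete, here is a minimal sketch of appending location information to trait records. It uses small toy tables rather than AusTraits itself, and it only illustrates the general relational-join concept with dplyr; it is not a description of how the join_ functions or flatten_database() are implemented. The table contents and the choice of dataset_id and location_id as join keys are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-ins for two tables of a relational trait database (illustrative values)
traits <- data.frame(
  dataset_id  = c("Study_A", "Study_A", "Study_B"),
  location_id = c("01", "02", "01"),
  trait_name  = c("leaf_area", "leaf_area", "wood_density"),
  value       = c("12.1", "8.4", "0.61")
)

locations <- data.frame(
  dataset_id    = c("Study_A", "Study_A", "Study_B"),
  location_id   = c("01", "02", "01"),
  location_name = c("Site north", "Site south", "Plot 1")
)

# Keep every trait row and append the matching location columns
traits_joined <- traits %>%
  left_join(locations, by = c("dataset_id", "location_id"))

traits_joined
```

Each trait record keeps its original columns and gains location_name from the matching row of the locations table.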

Visualising data by site

plot_locations() graphically summarises where trait data were collected and how much data is available. The legend refers to the number of neighbouring points: the warmer the colour, the more data are available. This function only works for studies that are geo-referenced. Users must first use join_location_coordinates() to append latitude and longitude information from the locations table to the traits table before plotting.

plot_locations() defaults to dividing the data by trait_name (feature = "trait_name"), but you can select any of the columns within the traits table, including columns you add with the join_ functions. However, selecting taxon_name will likely crash R if you are working with a dataframe that still contains a large number of species.

data_fire <- data_fire %>% join_location_coordinates()
plot_locations(data_fire$traits)

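
Because a feature with many distinct values can overwhelm the plot, it may help to count the distinct values of a candidate column before passing it to plot_locations(). The following is a minimal base R sketch on a toy table; count_feature_levels() is a hypothetical helper written for this example and is not part of austraits.

```r
# Toy traits table; in practice this would be a traits table such as data_fire$traits
traits <- data.frame(
  taxon_name = c("Acacia falcata", "Acacia irrorata", "Banksia serrata", "Acacia falcata"),
  trait_name = c("resprouting_capacity", "resprouting_capacity",
                 "bud_bank_location", "bud_bank_location")
)

# Hypothetical helper: number of distinct groups a feature column would produce
count_feature_levels <- function(traits, feature) {
  length(unique(traits[[feature]]))
}

n_traits <- count_feature_levels(traits, "trait_name")
n_taxa   <- count_feature_levels(traits, "taxon_name")

n_traits
n_taxa
```

If the count for a column runs into the hundreds or thousands, filter the data further before plotting by that column.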

Visualising data distribution and variance

plot_trait_distribution_beeswarm() creates histograms and beeswarm plots for a specific trait to help users visualise the variance of the data. Users can specify whether to create separate beeswarm plots at the level of taxonomic family or genus, or by a column in the traits table, such as dataset_id.

austraits %>% plot_trait_distribution_beeswarm(trait_name = "wood_density", y_axis_category = "family")


austraits %>% plot_trait_distribution_beeswarm(trait_name = "wood_density", y_axis_category = "dataset_id")


Reshaping the traits table

The traits table in AusTraits is in long format: every measurement is stored in its own row, with the trait identified in the trait_name column and its measurement in the value column. You can convert this to wide format, where each trait is in a separate column, using the function trait_pivot_wider().

Note that the columns unit, replicates, measurement_remarks and basis_of_value are dropped when pivoting in order to provide a useful output.
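
The difference between long and wide format can be seen in a small toy example. This sketch uses tidyr::pivot_wider() on an invented table to show the general reshaping idea; it is not the implementation of trait_pivot_wider(), and the rows are illustrative only.

```r
library(tidyr)

# Long format: one row per trait measurement
traits_long <- data.frame(
  taxon_name     = c("Acacia falcata", "Acacia falcata", "Banksia serrata"),
  observation_id = c("001", "001", "002"),
  trait_name     = c("plant_growth_form", "dispersers", "plant_growth_form"),
  value          = c("tree", "ants", "shrub")
)

# Wide format: one row per observation, one column per trait
traits_wide <- pivot_wider(
  traits_long,
  names_from  = trait_name,
  values_from = value
)

traits_wide
```

Traits that were not measured for an observation appear as NA in the wide table.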

Pivot wider

Note that the latest version of trait_pivot_wider() no longer supports AusTraits database versions <=4.0.2. Please refer to our README to install an older version of the austraits R package to work with older versions of the AusTraits database.

data_fire %>% trait_pivot_wider()

#> # A tibble: 1,366 × 49
#>    dataset_id  taxon_name observation_id entity_type value_type basis_of_record life_stage population_id individual_id
#>    <chr>       <chr>      <chr>          <chr>       <chr>      <chr>           <chr>      <chr>         <chr>        
#>  1 Campbell_2… Acacia fa… 001            population  mode       field           adult      01            <NA>         
#>  2 Campbell_2… Acacia fa… 002            population  mode       field           seedling   01            <NA>         
#>  3 Campbell_2… Acacia fa… 003            species     mode       field           adult      <NA>          <NA>         
#>  4 Campbell_2… Acacia ir… 004            population  mode       field           adult      01            <NA>         
#>  5 Campbell_2… Acacia ir… 005            population  mode       field           seedling   01            <NA>         
#>  6 Campbell_2… Acacia ir… 006            species     mode       field           adult      <NA>          <NA>         
#>  7 Campbell_2… Acacia ma… 007            population  mode       field           adult      02            <NA>         
#>  8 Campbell_2… Acacia ma… 008            population  mode       field           seedling   02            <NA>         
#>  9 Campbell_2… Acacia ma… 009            species     mode       field           adult      <NA>          <NA>         
#> 10 Campbell_2… Acacia me… 010            population  mode       field           adult      02            <NA>         
#> # ℹ 1,356 more rows
#> # ℹ 40 more variables: repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   method_id <chr>, method_context_id <chr>, original_name <chr>, location_name <chr>, `latitude (deg)` <chr>,
#> #   `longitude (deg)` <chr>, bud_bank_location <chr>, resprouting_capacity <chr>, seedbank_location <chr>,
#> #   post_fire_recruitment <chr>, dispersers <chr>, plant_growth_form <chr>, stem_dark_respiration_per_area <chr>,
#> #   bark_thickness <chr>, huber_value <chr>, leaf_dry_matter_content <chr>, leaf_dark_respiration_per_area <chr>, …
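
If you work with several database versions, the version restriction on trait_pivot_wider() noted above can be checked programmatically. The sketch below compares version strings with utils::compareVersion() from base R; supports_pivot_wider() is a hypothetical helper written for this example, not an austraits function, and you would need to supply the version string of your own database.

```r
# Hypothetical helper, not part of austraits: TRUE if the database version
# is newer than the last unsupported version (4.0.2)
supports_pivot_wider <- function(db_version, last_unsupported = "4.0.2") {
  utils::compareVersion(db_version, last_unsupported) > 0
}

# Version strings as listed by get_versions()
versions <- c("6.0.0", "5.0.0", "4.2.0", "4.1.0", "4.0.0", "3.0.2")

supported <- vapply(versions, supports_pivot_wider, logical(1))
supported
```

Comparing with compareVersion() rather than comparing the strings directly avoids mistakes with multi-digit components, such as treating "10.0.0" as older than "4.0.2".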

Binding trait values

Some datasets will have multiple observations for some traits. For instance, datasets from floras often report a minimum and maximum fruit length for a species. You can use bind_trait_values() to merge these into a single cell.

data_fruit <- austraits %>% 
  extract_trait("fruit_length") %>% 
  extract_taxa(family = "Rutaceae") %>% 
  extract_data(table = "traits", col = "value_type", col_value = c("minimum", "maximum"))

data_trait_bound <- data_fruit$traits %>%
  bind_trait_values() # Joining multiple obs with `--`
  
data_trait_bound  %>%   
  dplyr::filter(stringr::str_detect(value, "--"))

#> # A tibble: 288 × 26
#>    dataset_id taxon_name        observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>             <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 ABRS_2023  Acronychia aberr… 01324          fruit_len… 13--… mm    species     minimum--… measurement--… NA--NA    
#>  2 ABRS_2023  Acronychia acidu… 01325          fruit_len… 13--… mm    species     minimum--… measurement--… NA--NA    
#>  3 ABRS_2023  Acronychia acron… 01326          fruit_len… 8--13 mm    species     minimum--… measurement--… NA--NA    
#>  4 ABRS_2023  Acronychia acumi… 01327          fruit_len… 12--… mm    species     minimum--… measurement--… NA--NA    
#>  5 ABRS_2023  Acronychia baeue… 01328          fruit_len… 10--… mm    species     minimum--… measurement--… NA--NA    
#>  6 ABRS_2023  Acronychia choor… 01329          fruit_len… 10--… mm    species     minimum--… measurement--… NA--NA    
#>  7 ABRS_2023  Acronychia crass… 01330          fruit_len… 10--… mm    species     minimum--… measurement--… NA--NA    
#>  8 ABRS_2023  Acronychia imper… 01332          fruit_len… 9--16 mm    species     minimum--… measurement--… NA--NA    
#>  9 ABRS_2023  Acronychia laevis 01333          fruit_len… 7--10 mm    species     minimum--… measurement--… NA--NA    
#> 10 ABRS_2023  Acronychia litto… 01334          fruit_len… 8--14 mm    species     minimum--… measurement--… NA--NA    
#> # ℹ 278 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>
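
The binding step can be pictured as collapsing the rows of one observation into a single row, with the individual entries pasted together using `--`. The following is a minimal dplyr sketch of that idea on a toy table; it assumes grouping by taxon_name, observation_id and trait_name, which is a simplification for illustration and not necessarily what bind_trait_values() does internally.

```r
library(dplyr)

# Toy table: one observation with a minimum and a maximum, one with a maximum only
traits <- data.frame(
  taxon_name     = c("Acronychia laevis", "Acronychia laevis", "Citrus garrawayi"),
  observation_id = c("01333", "01333", "04178"),
  trait_name     = "fruit_length",
  value          = c("7", "10", "100"),
  value_type     = c("minimum", "maximum", "maximum")
)

# Collapse rows that belong to the same observation and trait
bound <- traits %>%
  group_by(taxon_name, observation_id, trait_name) %>%
  summarise(
    value      = paste(value, collapse = "--"),
    value_type = paste(value_type, collapse = "--"),
    .groups    = "drop"
  )

bound
```

Observations with a single entry pass through unchanged, which is why the example above filters for values containing `--` to show only the merged rows.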

If you would like to separate the bound trait values again, call separate_trait_values():

data_trait_bound %>% 
  separate_trait_values(., austraits$definitions)

#> # A tibble: 119 × 26
#>    dataset_id  taxon_name       observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>       <chr>            <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Cooper_2013 Acronychia baeu… 0071           fruit_len… 15    mm    species     maximum    measurement    <NA>      
#>  2 ABRS_2023   Acronychia aber… 01324          fruit_len… 16    mm    species     maximum    measurement    <NA>      
#>  3 ABRS_2023   Acronychia aber… 01324          fruit_len… 13    mm    species     minimum    measurement    <NA>      
#>  4 ABRS_2023   Acronychia eung… 01331          fruit_len… 12    mm    species     maximum    measurement    <NA>      
#>  5 ABRS_2023   Asterolasia ele… 02248          fruit_len… 10    mm    species     maximum    measurement    <NA>      
#>  6 ABRS_2023   Boronia angusti… 02910          fruit_len… 6     mm    species     maximum    measurement    <NA>      
#>  7 ABRS_2023   Boronia quadril… 03056          fruit_len… 6     mm    species     maximum    measurement    <NA>      
#>  8 ABRS_2023   Bosistoa floydii 03120          fruit_len… 10    mm    species     maximum    measurement    <NA>      
#>  9 ABRS_2023   Citrus australa… 04176          fruit_len… 50    mm    species     maximum    measurement    <NA>      
#> 10 ABRS_2023   Citrus garrawayi 04178          fruit_len… 100   mm    species     maximum    measurement    <NA>      
#> # ℹ 109 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>,
#> #   measurement_remarks <chr>, method_id <chr>, method_context_id <chr>, original_name <chr>
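
Conceptually, separating is the reverse of binding: each cell containing `--` is split back into one row per entry. A minimal sketch of that idea with tidyr::separate_rows() on a toy table follows; it is an illustration only and does not use the trait definitions that separate_trait_values() takes as its second argument.

```r
library(tidyr)

# Toy table of bound values
bound <- data.frame(
  taxon_name = c("Acronychia laevis", "Citrus garrawayi"),
  trait_name = "fruit_length",
  value      = c("7--10", "100"),
  value_type = c("minimum--maximum", "maximum")
)

# Split value and value_type in parallel, one row per entry
separated <- separate_rows(bound, value, value_type, sep = "--")

separated
```

Splitting value and value_type in the same call keeps each value aligned with its own value type.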

Fonti Kar, Elizabeth Wenk, Daniel Falster

2025-03-28

Install and load `austraits`

Retrieve AusTraits database

Descriptive summaries of traits and taxa

Quickly look up data

Extracting data

Extracting by dataset

Extracting by taxonomy

Extracting by trait

Extracting from other tables

Extracting from a single table

Join data from other tables

Visualising data by site

Visualising data distribution and variance

Reshaping the traits table

Pivot wider

Binding trait values
