austraits

austraits allows users to access, explore and wrangle data from traits.build relational databases. It is also an R interface to AusTraits, the Australian plant trait database. This package contains functions for joining data from various tables, filtering to specific records, combining multiple databases and visualising the distribution of the data. Below, we’ve included a tutorial using the AusTraits database to illustrate how some of these functions work together to generate useful outputs.

Install and load austraits

austraits is still under development. To install the current version from GitHub:

#install.packages("remotes")
remotes::install_github("traitecoevo/austraits", dependencies = TRUE, upgrade = "ask")

# Load the austraits package
library(austraits)

Retrieve AusTraits database

We will use the latest AusTraits database as an example database.

We can download the AusTraits database by calling load_austraits(). This function downloads AusTraits to the path you specify (data/austraits by default) and reloads the database from that location on later calls. Set update = TRUE to download a fresh copy from Zenodo. load_austraits() also accepts the DOI of a particular version.

austraits <- load_austraits(version = "6.0.0", path = "data/austraits")

You can check out different versions of AusTraits and their associated DOI by using:

get_versions(path = "data/austraits")
#> # A tibble: 6 × 4
#>   publication_date doi                     version id      
#>   <date>           <chr>                   <chr>   <chr>   
#> 1 2024-05-14       10.5281/zenodo.11188867 6.0.0   11188867
#> 2 2023-11-19       10.5281/zenodo.10156222 5.0.0   10156222
#> 3 2023-09-18       10.5281/zenodo.8353840  4.2.0   8353840 
#> 4 2023-01-30       10.5281/zenodo.7583087  4.1.0   7583087 
#> 5 2022-11-27       10.5281/zenodo.7368074  4.0.0   7368074 
#> 6 2021-07-14       10.5281/zenodo.5112001  3.0.2   5112001

AusTraits, like all traits.build databases, is a relational database. In R, it is a very large list with multiple tables. If you are not familiar with working with lists in R, we recommend having a quick look at this tutorial. To learn more about the structure of austraits, check out the structure of the database.
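As a minimal sketch of what "a list with multiple tables" means in practice, the toy object below mimics the shape of a traits.build database in base R. The table and column names follow AusTraits, but `toy_db` and all of its values are invented for illustration and are not part of the package.

```r
# A toy stand-in for a traits.build database: a named list of data frames.
# The values are invented; only the overall shape resembles AusTraits.
toy_db <- list(
  traits = data.frame(
    dataset_id = c("Study_A", "Study_A", "Study_B"),
    taxon_name = c("Acacia aneura", "Acacia aneura", "Banksia serrata"),
    trait_name = c("leaf_area", "plant_height", "leaf_area"),
    value      = c("120", "4", "950"),
    stringsAsFactors = FALSE
  ),
  taxa = data.frame(
    taxon_name = c("Acacia aneura", "Banksia serrata"),
    genus      = c("Acacia", "Banksia"),
    family     = c("Fabaceae", "Proteaceae"),
    stringsAsFactors = FALSE
  )
)

names(toy_db)              # names of the tables held in the list
toy_db$traits              # `$` returns one table by name
toy_db[["taxa"]]$family    # `[[` does the same, here followed by a column
sapply(toy_db, nrow)       # number of rows in each table
```

The same accessors (`names()`, `$`, `[[`) apply to the real austraits object.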

austraits
#> ── This is 6.0.0 of AusTraits: a curated plant trait database for the Australian flora! ──────────────────────────────────
#> ℹ This database is built using traits.build version 1.1.0.9000
#> ℹ This database contains a total of 1822 records, for 454 taxa and 26 traits.
#> 
#> ── This object is a 'list' with the following components: ──
#> 
#> • traits: A table containing measurements of traits.
#> • locations: A table containing observations of location/site characteristics associated with information in `traits`.
#> Cross referencing between the two dataframes is possible using combinations of the variables `dataset_id`,
#> `location_name`.
#> • contexts: A table containing observations of contextual characteristics associated with information in `traits`. Cross
#> referencing between the two dataframes is possible using combinations of the variables `dataset_id`, `link_id`, and
#> `link_vals`.
#> • methods: A table containing details on methods with which data were collected, including time frame and source. Cross
#> referencing with the `traits` table is possible using combinations of the variables `dataset_id`, `trait_name`.
#> • excluded_data: A table of data that did not pass quality test and so were excluded from the master dataset.
#> • taxonomic_updates: A table of all taxonomic changes implemented in the construction of AusTraits. Changes are
#> determined by comapring against the APC (Australian Plant Census) and APNI (Australian Plant Names Index).
#> • taxa: A table containing details on taxa associated with information in `traits`. This information has been sourced
#> from the APC (Australian Plant Census) and APNI (Australian Plant Names Index) and is released under a CC-BY3 license.
#> • contributors: A table of people contributing to each study.
#> • sources: Bibtex entries for all primary and secondary sources in the compilation.
#> • definitions: A copy of the definitions for all tables and terms. Information included here was used to process data and
#> generate any documentation for the study.
#> • schema: A copy of the schema for all tables and terms. Information included here was used to process data and generate
#> any documentation for the study.
#> • metadata: Metadata associated with the dataset, including title, creators, license, subject, funding sources.
#> • build_info: A description of the computing environment used to create this version of the dataset, including version
#> number, git commit and R session_info.
#> ℹ To access a component, try using the $ e.g. austraits$traits

Descriptive summaries of traits and taxa

AusTraits contains 497 plant traits. Check out definitions of the traits to learn more about how each trait is defined.

Have a look at data coverage by trait or taxa with:

summarise_database(austraits, "trait_name") 
#> # A tibble: 497 × 5
#>    trait_name                    n_records n_dataset n_taxa percent_total
#>    <chr>                             <int>     <int>  <int>         <dbl>
#>  1 accessory_cost_fraction              47         1     47     0.0000272
#>  2 accessory_cost_mass                  47         1     47     0.0000272
#>  3 atmospheric_CO2_concentration       840         4    121     0.000487 
#>  4 bark_Al_per_dry_mass                 70         1     10     0.0000406
#>  5 bark_B_per_dry_mass                  70         1     10     0.0000406
#>  6 bark_C_per_dry_mass                 229         2     27     0.000133 
#>  7 bark_Ca_per_dry_mass                104         3     21     0.0000603
#>  8 bark_Cu_per_dry_mass                 70         1     10     0.0000406
#>  9 bark_Fe_per_dry_mass                 70         1     10     0.0000406
#> 10 bark_K_per_dry_mass                 104         3     21     0.0000603
#> # ℹ 487 more rows
summarise_database(austraits, "family") 
#> # A tibble: 310 × 5
#>    family           n_records n_dataset n_taxa percent_total
#>    <chr>                <int>     <int>  <int>         <dbl>
#>  1 Acanthaceae           3719        57    149     0.00216  
#>  2 Achariaceae            162        14      3     0.0000939
#>  3 Actinidiaceae          186        16      3     0.000108 
#>  4 Agapanthaceae          107        13      3     0.000062 
#>  5 Aizoaceae             5004        63    102     0.0029   
#>  6 Akaniaceae             123        16      1     0.0000713
#>  7 Alismataceae           892        30     20     0.000517 
#>  8 Alliaceae              561        19     18     0.000325 
#>  9 Alseuosmiaceae         318        13      3     0.000184 
#> 10 Alstroemeriaceae       175        15      2     0.000101 
#> # ℹ 300 more rows
summarise_database(austraits, "genus") 
#> # A tibble: 3,177 × 5
#>    genus        n_records n_dataset n_taxa percent_total
#>    <chr>            <int>     <int>  <int>         <dbl>
#>  1 (Dockrillia          3         2      1    0.00000174
#>  2 Abelia              16         4      1    0.00000928
#>  3 Abelmoschus        271        19      8    0.000157  
#>  4 Abildgaardia        74         7      2    0.0000429 
#>  5 Abrodictyum        123        14      3    0.0000713 
#>  6 Abroma              39         7      2    0.0000226 
#>  7 Abrophyllum        181        19      3    0.000105  
#>  8 Abrotanella        183        18      4    0.000106  
#>  9 Abrus              202        26      3    0.000117  
#> 10 Abutilon          1975        52     54    0.00115   
#> # ℹ 3,167 more rows
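To make the columns of these summaries concrete, here is a rough base R sketch of how such a coverage table could be computed from a traits table. It is not the package's implementation. The toy data are invented, and treating percent_total as the number of records divided by the total number of records is an assumption, although it is consistent with the values printed above.

```r
# Toy traits table; the values are invented for illustration.
traits <- data.frame(
  dataset_id = c("Study_A", "Study_A", "Study_B", "Study_B", "Study_C"),
  taxon_name = c("Acacia aneura", "Banksia serrata", "Acacia aneura",
                 "Acacia aneura", "Banksia serrata"),
  trait_name = c("leaf_area", "leaf_area", "leaf_area",
                 "wood_density", "wood_density"),
  stringsAsFactors = FALSE
)

# Count the distinct values in a vector
n_unique <- function(x) length(unique(x))

# Split the records by trait, then summarise each group
by_trait <- split(traits, traits$trait_name)
coverage <- data.frame(
  trait_name = names(by_trait),
  n_records  = vapply(by_trait, nrow, integer(1), USE.NAMES = FALSE),
  n_dataset  = vapply(by_trait, function(d) n_unique(d$dataset_id),
                      integer(1), USE.NAMES = FALSE),
  n_taxa     = vapply(by_trait, function(d) n_unique(d$taxon_name),
                      integer(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Assumption: share of all records, expressed as a fraction
coverage$percent_total <- coverage$n_records / nrow(traits)
coverage
```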

Quickly look up data

Interested in a specific trait or context property, but unsure what terms we use? Try our lookup_ functions.

lookup_trait(austraits, "leaf") %>% head()
#> [1] "leaf_compoundness" "leaf_phenology"    "leaf_length"       "leaf_width"        "leaf_margin"      
#> [6] "leaf_shape"
lookup_context_property(austraits, "fire") %>% head() 
#> [1] "fire intensity"     "fire history"       "fire response type" "fire severity"      "fire season"
lookup_location_property(austraits, "temperature") %>% head()
#> [1] "temperature, max (C)"             "temperature, MAT (C)"             "temperature, mean summer max (C)"
#> [4] "temperature, mean winter max (C)" "temperature, max MAT (C)"         "temperature, min MAT (C)"
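The idea behind these lookups can be sketched in base R as a substring search over the distinct terms in a column. `lookup_toy()` below is a hypothetical helper, not a package function, and the trait names are invented. Whether the real lookup_ functions use fixed or regular-expression matching is not stated here, so fixed matching is an assumption.

```r
# Invented trait names standing in for a trait_name column
trait_names <- c("leaf_area", "leaf_length", "leaf_area", "wood_density",
                 "seed_dry_mass", "leaf_N_per_dry_mass")

# Hypothetical helper: return every distinct term that contains `term`
lookup_toy <- function(x, term) {
  unique(grep(term, x, value = TRUE, fixed = TRUE))
}

lookup_toy(trait_names, "leaf")
lookup_toy(trait_names, "mass")
```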

Extracting data

In most cases, users would like to extract a subset of a database for their research purposes.

  • extract_dataset() filters for a particular study
  • extract_trait() filters for a certain trait
  • extract_taxa() filters for a specific taxon

Note that you can supply a vector to each of these functions to filter for more than one study, trait or taxon. All our extract_ functions support partial matching, e.g. extract_trait("leaf") would return all traits containing leaf.

If you would like to extract from other tables or columns, use extract_data().

All extract_ functions simultaneously filter across all tables in the database.
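To illustrate what filtering across all tables involves, the sketch below filters a toy traits table by partial match and then trims the linked tables to the rows still referenced. `toy_db` and `extract_trait_toy()` are hypothetical and written in base R; the linking columns (dataset_id plus trait_name for methods, taxon_name for taxa) are assumptions based on the table descriptions printed earlier, not a copy of the package's logic.

```r
# Toy database; all values are invented for illustration.
toy_db <- list(
  traits = data.frame(
    dataset_id = c("Study_A", "Study_A", "Study_B"),
    taxon_name = c("Acacia aneura", "Banksia serrata", "Acacia aneura"),
    trait_name = c("leaf_area", "wood_density", "wood_density"),
    stringsAsFactors = FALSE
  ),
  methods = data.frame(
    dataset_id = c("Study_A", "Study_A", "Study_B"),
    trait_name = c("leaf_area", "wood_density", "wood_density"),
    methods    = c("scanner", "water displacement", "calipers"),
    stringsAsFactors = FALSE
  ),
  taxa = data.frame(
    taxon_name = c("Acacia aneura", "Banksia serrata"),
    family     = c("Fabaceae", "Proteaceae"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: keep traits whose name contains `term`, then trim
# the other tables to the rows still referenced by the kept trait records.
extract_trait_toy <- function(db, term) {
  keep <- grepl(term, db$traits$trait_name, fixed = TRUE)
  db$traits <- db$traits[keep, , drop = FALSE]
  # methods are linked by dataset_id + trait_name
  trait_key  <- paste(db$traits$dataset_id, db$traits$trait_name)
  method_key <- paste(db$methods$dataset_id, db$methods$trait_name)
  db$methods <- db$methods[method_key %in% trait_key, , drop = FALSE]
  # taxa are linked by taxon_name
  in_traits <- db$taxa$taxon_name %in% db$traits$taxon_name
  db$taxa <- db$taxa[in_traits, , drop = FALSE]
  db
}

leaf_only <- extract_trait_toy(toy_db, "leaf")
sapply(leaf_only, nrow)   # every table is reduced, not just traits
```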

Extracting by dataset

Filtering one particular dataset and assigning it to an object

one_study <- extract_dataset(austraits, "Falster_2005_2")

one_study$traits 
#> # A tibble: 165 × 26
#>    dataset_id     taxon_name        observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>          <chr>             <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Falster_2005_2 Acacia longifolia 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  2 Falster_2005_2 Acacia longifolia 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  3 Falster_2005_2 Acacia longifolia 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  4 Falster_2005_2 Acacia longifolia 01             huber_val… 0.00… mm2{… population  mean       measurement    unknown   
#>  5 Falster_2005_2 Acacia longifolia 01             leaf_N_pe… 23.2  mg/g  population  mean       measurement    4         
#>  6 Falster_2005_2 Acacia longifolia 01             leaf_area  1761  mm2   population  mean       measurement    4         
#>  7 Falster_2005_2 Acacia longifolia 01             leaf_mass… 128   g/m2  population  mean       measurement    4         
#>  8 Falster_2005_2 Acacia longifolia 01             plant_hei… 4     m     population  maximum    measurement    unknown   
#>  9 Falster_2005_2 Acacia longifolia 01             resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#> 10 Falster_2005_2 Acacia longifolia 01             seed_dry_… 14    mg    population  mean       measurement    unknown   
#> # ℹ 155 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>

Filtering multiple datasets and assigning them to an object

multi_studies <- extract_dataset(austraits,
                                 dataset_id = c("Thompson_2001", "Ilic_2000"))
 
multi_studies$traits 
#> # A tibble: 2,209 × 26
#>    dataset_id taxon_name       observation_id trait_name   value unit   entity_type value_type basis_of_value replicates
#>    <chr>      <chr>            <chr>          <chr>        <chr> <chr>  <chr>       <chr>      <chr>          <chr>     
#>  1 Ilic_2000  Acacia acradenia 0001           wood_density 0.904 mg/mm3 individual  raw        measurement    unknown   
#>  2 Ilic_2000  Acacia acuminata 0002           wood_density 0.895 mg/mm3 individual  raw        measurement    unknown   
#>  3 Ilic_2000  Acacia acuminata 0003           wood_density 1.008 mg/mm3 individual  raw        measurement    unknown   
#>  4 Ilic_2000  Acacia adsurgens 0004           wood_density 0.887 mg/mm3 individual  raw        measurement    unknown   
#>  5 Ilic_2000  Acacia alleniana 0005           wood_density 0.56  mg/mm3 individual  raw        measurement    unknown   
#>  6 Ilic_2000  Acacia ampliceps 0006           wood_density 0.568 mg/mm3 individual  raw        measurement    unknown   
#>  7 Ilic_2000  Acacia aneura    0007           wood_density 1.035 mg/mm3 individual  raw        measurement    unknown   
#>  8 Ilic_2000  Acacia aneura    0008           wood_density 1.019 mg/mm3 individual  raw        measurement    unknown   
#>  9 Ilic_2000  Acacia aneura    0009           wood_density 0.861 mg/mm3 individual  raw        measurement    unknown   
#> 10 Ilic_2000  Acacia aneura    0010           wood_density 0.996 mg/mm3 individual  raw        measurement    unknown   
#> # ℹ 2,199 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>

Filtering multiple datasets by the same lead author (e.g. Falster) and assigning them to an object.

falster_studies <- extract_dataset(austraits, "Falster")

falster_studies$traits 
#> # A tibble: 685 × 26
#>    dataset_id   taxon_name        observation_id trait_name   value unit  entity_type value_type basis_of_value replicates
#>    <chr>        <chr>             <chr>          <chr>        <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Falster_2003 Acacia floribunda 01             leaf_area    142   mm2   population  mean       measurement    3         
#>  2 Falster_2003 Acacia floribunda 01             leaf_inclin… 57    deg   population  mean       measurement    3         
#>  3 Falster_2003 Acacia floribunda 02             leaf_compou… simp… <NA>  species     mode       expert_score   <NA>      
#>  4 Falster_2003 Acacia myrtifolia 03             leaf_area    319   mm2   population  mean       measurement    3         
#>  5 Falster_2003 Acacia myrtifolia 03             leaf_inclin… 66.1  deg   population  mean       measurement    3         
#>  6 Falster_2003 Acacia myrtifolia 04             leaf_compou… simp… <NA>  species     mode       expert_score   <NA>      
#>  7 Falster_2003 Acacia suaveolens 05             leaf_area    562   mm2   population  mean       measurement    3         
#>  8 Falster_2003 Acacia suaveolens 05             leaf_inclin… 71.7  deg   population  mean       measurement    3         
#>  9 Falster_2003 Acacia suaveolens 06             leaf_compou… simp… <NA>  species     mode       expert_score   <NA>      
#> 10 Falster_2003 Angophora hispida 07             leaf_area    1590  mm2   population  mean       measurement    3         
#> # ℹ 675 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>

Extracting by taxonomy

# By family 
proteaceae <- extract_taxa(austraits, family = "Proteaceae")
# Checking that only taxa in Proteaceae have been extracted
proteaceae$taxa$family %>% unique()
#> [1] "Proteaceae"
# By genus 
acacia <- extract_taxa(austraits, genus = "Acacia")
# Checking that only taxa in Acacia have been extracted
acacia$traits$taxon_name %>% unique() %>% head()
#> [1] "Acacia abbatiana"                        "Acacia abbreviata"                      
#> [3] "Acacia abrupta"                          "Acacia acanthaster"                     
#> [5] "Acacia acanthoclada subsp. acanthoclada" "Acacia acanthoclada subsp. glaucescens"
acacia$taxa$genus %>% unique()
#> [1] "Acacia"

Extracting by trait

data_fruit <- extract_trait(austraits, "fruit")

data_fruit$traits 
#> # A tibble: 216,465 × 26
#>    dataset_id taxon_name            observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>                 <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 ABRS_1981  Ceratophyllum demers… 0566           fruit_len… 4     mm    species     minimum    measurement    <NA>      
#>  2 ABRS_1981  Ceratophyllum demers… 0566           fruit_len… 6     mm    species     maximum    measurement    <NA>      
#>  3 ABRS_1981  Ceratophyllum demers… 0566           fruit_wid… 3     mm    species     minimum    measurement    <NA>      
#>  4 ABRS_1981  Ceratophyllum demers… 0566           fruit_wid… 3.5   mm    species     maximum    measurement    <NA>      
#>  5 ABRS_1981  Conospermum petiolare 0680           fruit_len… 2.5   mm    species     minimum    measurement    <NA>      
#>  6 ABRS_1981  Conospermum petiolare 0680           fruit_wid… 3     mm    species     minimum    measurement    <NA>      
#>  7 ABRS_1981  Proiphys amboinensis  3182           fruit_len… 15    mm    species     minimum    measurement    <NA>      
#>  8 ABRS_1981  Proiphys amboinensis  3182           fruit_len… 30    mm    species     maximum    measurement    <NA>      
#>  9 ABRS_1981  Proiphys amboinensis  3182           fruit_wid… 15    mm    species     minimum    measurement    <NA>      
#> 10 ABRS_1981  Proiphys amboinensis  3182           fruit_wid… 30    mm    species     maximum    measurement    <NA>      
#> # ℹ 216,455 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>

Combining lookup_trait() with extract_trait() to obtain all traits with ‘leaf’ in the trait name and assigning the result to an object. Note that we use the . notation to pass the lookup_trait() results on to extract_trait().

leaf <- lookup_trait(austraits, "leaf") %>% extract_trait(austraits, .) 

leaf$traits
#> # A tibble: 511,952 × 26
#>    dataset_id taxon_name            observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>                 <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 ABRS_1981  Acanthocarpus canali… 0001           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  2 ABRS_1981  Acanthocarpus humilis 0002           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  3 ABRS_1981  Acanthocarpus parvif… 0003           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  4 ABRS_1981  Acanthocarpus preiss… 0004           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  5 ABRS_1981  Acanthocarpus robust… 0005           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  6 ABRS_1981  Acanthocarpus rupest… 0006           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  7 ABRS_1981  Acanthocarpus vertic… 0007           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#>  8 ABRS_1981  Acer pseudoplatanus   0008           leaf_phen… deci… <NA>  species     mode       expert_score   <NA>      
#>  9 ABRS_1981  Acidonia microcarpa   0009           leaf_comp… comp… <NA>  species     mode       expert_score   <NA>      
#> 10 ABRS_1981  Callitris acuminata   0010           leaf_comp… simp… <NA>  species     mode       expert_score   <NA>      
#> # ℹ 511,942 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>

Extracting from other tables

You may want to extract data from tables that have specific column values. For example, the code below returns data where “fire” is mentioned in the context_property column.

data_fire <- extract_data(austraits, 
                          table =  "contexts",
                          col =  "context_property", 
                          col_value = "fire")

data_fire

Extracting from a single table

If you have already manipulated the original database and are working with just the traits table, the extract functions will also work on a single table.

seedling_data <- extract_data(austraits$traits,
                              col = "life_stage",
                              col_value = "seedling")

Falster_data <- extract_data(austraits$traits,
                             col = "dataset_id",
                             col_value = "Falster")

leaf_data <- extract_trait(austraits$traits,
                           c("leaf_area", "leaf_N_per_dry_mass"))

Join data from other tables

Once users have extracted the data they want, they may want to merge other study details into the main traits dataframe for their analyses. For example, users may require taxonomic information for a phylogenetic analysis. This is where the join_ functions come in.

There are five join_ functions in total, each designed to append specific information from other tables and elements in the austraits object. Their suffixes refer to the type of information that is joined, e.g. join_taxa appends taxonomic information to the traits dataframe.

  • join_taxa()
  • join_methods()
  • join_location_coordinates()
  • join_location_properties()
  • join_context_properties()

We recommend pulling up the help file for each one for more details, e.g. ?join_location_coordinates
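Conceptually, each join_ function behaves like a left join onto the traits table: every trait record is kept and the matching details from another table are appended as extra columns. The base R sketch below shows that idea for taxonomic information. The data are invented and the code is an illustration of the concept under that assumption, not how the package implements join_taxa().

```r
# Toy tables; all values are invented for illustration.
traits <- data.frame(
  dataset_id = c("Study_A", "Study_A", "Study_B"),
  taxon_name = c("Acacia aneura", "Banksia serrata", "Acacia aneura"),
  trait_name = c("leaf_area", "leaf_area", "wood_density"),
  value      = c("120", "950", "1.03"),
  stringsAsFactors = FALSE
)
taxa <- data.frame(
  taxon_name = c("Acacia aneura", "Banksia serrata", "Hakea sericea"),
  genus      = c("Acacia", "Banksia", "Hakea"),
  family     = c("Fabaceae", "Proteaceae", "Proteaceae"),
  stringsAsFactors = FALSE
)

# Left join: keep every trait record and look up its taxon details.
# match() preserves the row order of `traits`; merge() sorts by default.
idx <- match(traits$taxon_name, taxa$taxon_name)
traits_with_taxa <- cbind(traits, taxa[idx, c("genus", "family")])
rownames(traits_with_taxa) <- NULL
traits_with_taxa
```

The number of rows is unchanged and only the number of columns grows, which matches the pattern in the join_ outputs: 1,822 rows throughout, with the column count rising above the original 26.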

Each of the functions has specific default parameters and formatting, but offers versatile joining options.

# Join taxonomic information 
(data_fire %>% join_taxa)$traits 
#> # A tibble: 1,822 × 30
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 20 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, family <chr>, genus <chr>, taxon_rank <chr>, establishment_means <chr>
# Join methodological information 
(data_fire %>% join_methods)$traits
#> # A tibble: 1,822 × 27
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 17 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, methods <chr>
# Join location coordinates 
(data_fire %>% join_location_coordinates)$traits 
#> # A tibble: 1,822 × 29
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 19 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, location_name <chr>, `latitude (deg)` <chr>, `longitude (deg)` <chr>
# Join information pertaining to location properties 
(data_fire %>% join_location_properties)$traits 
#> # A tibble: 1,822 × 28
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 18 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, location_name <chr>, location_properties <chr>
# Join a single location property as its own column
(data_fire %>% join_location_properties(format = "many_columns", vars = "temperature, min MAT (C)"))$traits 
#> # A tibble: 1,822 × 28
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 18 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, location_name <chr>,
#> #   `location_property: temperature, min MAT (C)` <chr>
# Join context information 
(data_fire %>% join_context_properties)$traits
#> # A tibble: 1,822 × 31
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 21 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, treatment_context_properties <chr>, plot_context_properties <chr>,
#> #   entity_context_properties <chr>, temporal_context_properties <chr>, method_context_properties <chr>
# Join information from multiple tables 
(data_fire %>% join_context_properties %>% join_location_coordinates)$traits 
#> # A tibble: 1,822 × 34
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 24 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, treatment_context_properties <chr>, plot_context_properties <chr>,
#> #   entity_context_properties <chr>, temporal_context_properties <chr>, method_context_properties <chr>,
#> #   location_name <chr>, `latitude (deg)` <chr>, `longitude (deg)` <chr>

Alternatively, users can join all information using flatten_database():

data_fire %>% flatten_database() 
#> # A tibble: 1,822 × 66
#>    dataset_id    taxon_name         observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>         <chr>              <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 Campbell_2006 Acacia falciformis 001            bud_bank_… basa… <NA>  population  mode       expert_score   <NA>      
#>  2 Campbell_2006 Acacia falciformis 001            resprouti… resp… <NA>  population  mode       expert_score   <NA>      
#>  3 Campbell_2006 Acacia falciformis 001            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#>  4 Campbell_2006 Acacia falciformis 002            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#>  5 Campbell_2006 Acacia falciformis 003            dispersers ants  <NA>  species     mode       expert_score   <NA>      
#>  6 Campbell_2006 Acacia falciformis 003            plant_gro… tree  <NA>  species     mode       expert_score   <NA>      
#>  7 Campbell_2006 Acacia irrorata    004            bud_bank_… none  <NA>  population  mode       expert_score   <NA>      
#>  8 Campbell_2006 Acacia irrorata    004            resprouti… fire… <NA>  population  mode       expert_score   <NA>      
#>  9 Campbell_2006 Acacia irrorata    004            seedbank_… soil… <NA>  population  mode       expert_score   <NA>      
#> 10 Campbell_2006 Acacia irrorata    005            post_fire… post… <NA>  population  mode       expert_score   <NA>      
#> # ℹ 1,812 more rows
#> # ℹ 56 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, location_name <chr>, `latitude (deg)` <chr>, `longitude (deg)` <chr>,
#> #   location_properties <chr>, treatment_context_properties <chr>, plot_context_properties <chr>,
#> #   entity_context_properties <chr>, temporal_context_properties <chr>, method_context_properties <chr>, methods <chr>, …
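
As a rough illustration of what these joins do, the sketch below uses base R merge() on two toy tables. The tables, their values and the simplified column names (latitude, longitude) are hypothetical, and treating the join as a left join keyed on dataset_id and location_id is an assumption for illustration, not a description of the package internals.

```r
# Toy stand-ins for the traits and locations tables (hypothetical values)
traits <- data.frame(
  dataset_id  = c("A", "A", "B"),
  location_id = c("01", "02", "01"),
  trait_name  = c("plant_growth_form", "plant_growth_form", "dispersers"),
  value       = c("tree", "shrub", "ants"),
  stringsAsFactors = FALSE
)

locations <- data.frame(
  dataset_id  = c("A", "A"),
  location_id = c("01", "02"),
  latitude    = c("-33.8", "-35.1"),
  longitude   = c("151.2", "149.1"),
  stringsAsFactors = FALSE
)

# Left join (assumed behaviour): keep every trait record and
# fill in coordinates where a matching location exists
joined <- merge(
  traits, locations,
  by = c("dataset_id", "location_id"),
  all.x = TRUE, sort = FALSE
)

joined
```

Records from dataset "B" have no matching location in this toy example, so their coordinates come back as NA rather than the row being dropped.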

Visualising data by site

plot_locations() graphically summarises where trait data were collected and how much data is available. The legend refers to the number of neighbouring points: the warmer the colour, the more data are available. This function only works for studies that are geo-referenced. Users must first use join_location_coordinates() to append latitude and longitude information from the locations dataframe into the traits dataframe before plotting.

plot_locations() defaults to dividing the data by trait_name (feature = "trait_name"), but you can select any of the columns within the traits table, including columns you add with join_ functions. However, selecting taxon_name will likely crash R if you are working with a dataframe that still contains a large number of species.

data_fire <- data_fire %>% join_location_coordinates()
plot_locations(data_fire$traits)
plot of chunk site_plot

Visualising data distribution and variance

plot_trait_distribution_beeswarm() creates histograms and beeswarm plots for specific traits to help users visualise the variance of the data. Through the y_axis_category argument, users can specify whether to create separate beeswarm plots at the level of taxonomic family or genus, or by a column in the traits table, such as dataset_id.

austraits %>% plot_trait_distribution_beeswarm(trait_name = "wood_density", y_axis_category = "family")
plot of chunk beeswarm

austraits %>% plot_trait_distribution_beeswarm(trait_name = "wood_density", y_axis_category = "dataset_id")
plot of chunk beeswarm

Reshaping the traits table

The traits table in AusTraits is in long format: each row holds a single measurement, with the trait identified in the trait_name column and its measurement in the value column. You can convert this to wide format, where each trait is in a separate column, using the function trait_pivot_wider().

Note that, to provide a useful output, the following columns are dropped when pivoting: unit, replicates, measurement_remarks, and basis_of_value.

Pivot wider

Note that the latest version of trait_pivot_wider() no longer supports AusTraits database versions <=4.0.2. Please refer to our README to install an older version of the austraits R package to work with old versions of the AusTraits database.

data_fire %>% trait_pivot_wider()
#> # A tibble: 1,366 × 49
#>    dataset_id    taxon_name   observation_id entity_type value_type basis_of_record life_stage population_id individual_id
#>    <chr>         <chr>        <chr>          <chr>       <chr>      <chr>           <chr>      <chr>         <chr>        
#>  1 Campbell_2006 Acacia falc… 001            population  mode       field           adult      01            <NA>         
#>  2 Campbell_2006 Acacia falc… 002            population  mode       field           seedling   01            <NA>         
#>  3 Campbell_2006 Acacia falc… 003            species     mode       field           adult      <NA>          <NA>         
#>  4 Campbell_2006 Acacia irro… 004            population  mode       field           adult      01            <NA>         
#>  5 Campbell_2006 Acacia irro… 005            population  mode       field           seedling   01            <NA>         
#>  6 Campbell_2006 Acacia irro… 006            species     mode       field           adult      <NA>          <NA>         
#>  7 Campbell_2006 Acacia maid… 007            population  mode       field           adult      02            <NA>         
#>  8 Campbell_2006 Acacia maid… 008            population  mode       field           seedling   02            <NA>         
#>  9 Campbell_2006 Acacia maid… 009            species     mode       field           adult      <NA>          <NA>         
#> 10 Campbell_2006 Acacia mela… 010            population  mode       field           adult      02            <NA>         
#> # ℹ 1,356 more rows
#> # ℹ 40 more variables: repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>,
#> #   entity_context_id <chr>, plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>, location_name <chr>, `latitude (deg)` <chr>, `longitude (deg)` <chr>,
#> #   bud_bank_location <chr>, resprouting_capacity <chr>, seedbank_location <chr>, post_fire_recruitment <chr>,
#> #   dispersers <chr>, plant_growth_form <chr>, stem_dark_respiration_per_area <chr>, bark_thickness <chr>,
#> #   huber_value <chr>, leaf_dry_matter_content <chr>, leaf_dark_respiration_per_area <chr>, …
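
To make the long-to-wide idea concrete, here is a minimal sketch on a toy table using base R reshape(). It is not how trait_pivot_wider() is implemented; the data frame and its values are hypothetical, and only the general shape of the transformation is meant to carry over.

```r
# Toy long-format table: one row per (observation, trait) pair
traits_long <- data.frame(
  observation_id = c("001", "001", "002", "002"),
  taxon_name     = c("Acacia falciformis", "Acacia falciformis",
                     "Acacia irrorata", "Acacia irrorata"),
  trait_name     = c("plant_growth_form", "dispersers",
                     "plant_growth_form", "dispersers"),
  value          = c("tree", "ants", "tree", "ants"),
  stringsAsFactors = FALSE
)

# Spread trait_name into columns, one row per observation
traits_wide <- reshape(
  traits_long,
  idvar     = c("observation_id", "taxon_name"),
  timevar   = "trait_name",
  v.names   = "value",
  direction = "wide"
)

# reshape() prefixes the new columns with "value."; strip that prefix
names(traits_wide) <- sub("^value\\.", "", names(traits_wide))

traits_wide
```

The four long rows collapse to two wide rows, one per observation_id, which mirrors why the pivoted table above has fewer rows than the original traits table.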

Binding trait values

Some datasets have multiple observations for some traits. For instance, datasets from floras often report a minimum and maximum fruit length for a species. You can use bind_trait_values() to merge these into a single cell.

data_fruit <- austraits %>% 
  extract_trait("fruit_length") %>% 
  extract_taxa(family = "Rutaceae") %>% 
  extract_data(table = "traits", col = "value_type", col_value = c("minimum", "maximum"))

data_trait_bound <- data_fruit$traits %>%
  bind_trait_values() # Joining multiple obs with `--`
  
data_trait_bound  %>%   
  dplyr::filter(stringr::str_detect(value, "--"))
#> # A tibble: 288 × 26
#>    dataset_id taxon_name            observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>      <chr>                 <chr>          <chr>      <chr> <chr> <chr>       <chr>      <chr>          <chr>     
#>  1 ABRS_2023  Acronychia aberrans   01324          fruit_len… 13--… mm    species     minimum--… measurement--… NA--NA    
#>  2 ABRS_2023  Acronychia acidula    01325          fruit_len… 13--… mm    species     minimum--… measurement--… NA--NA    
#>  3 ABRS_2023  Acronychia acronychi… 01326          fruit_len… 8--13 mm    species     minimum--… measurement--… NA--NA    
#>  4 ABRS_2023  Acronychia acuminata  01327          fruit_len… 12--… mm    species     minimum--… measurement--… NA--NA    
#>  5 ABRS_2023  Acronychia baeuerlen… 01328          fruit_len… 10--… mm    species     minimum--… measurement--… NA--NA    
#>  6 ABRS_2023  Acronychia chooreech… 01329          fruit_len… 10--… mm    species     minimum--… measurement--… NA--NA    
#>  7 ABRS_2023  Acronychia crassipet… 01330          fruit_len… 10--… mm    species     minimum--… measurement--… NA--NA    
#>  8 ABRS_2023  Acronychia imperfora… 01332          fruit_len… 9--16 mm    species     minimum--… measurement--… NA--NA    
#>  9 ABRS_2023  Acronychia laevis     01333          fruit_len… 7--10 mm    species     minimum--… measurement--… NA--NA    
#> 10 ABRS_2023  Acronychia littoralis 01334          fruit_len… 8--14 mm    species     minimum--… measurement--… NA--NA    
#> # ℹ 278 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>
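
The binding step can be pictured with a small base R sketch. This is an illustration on a hypothetical toy table using aggregate(), not the package's implementation; the assumption is simply that values sharing an observation and trait are pasted together with `--`.

```r
# Toy records: observation "01" has a minimum and a maximum,
# observation "02" has a single value (hypothetical data)
fruit <- data.frame(
  observation_id = c("01", "01", "02"),
  trait_name     = "fruit_length",
  value          = c("8", "13", "15"),
  value_type     = c("minimum", "maximum", "mean"),
  stringsAsFactors = FALSE
)

# Collapse value and value_type within each observation/trait group
bound <- aggregate(
  fruit[c("value", "value_type")],
  by  = fruit[c("observation_id", "trait_name")],
  FUN = paste, collapse = "--"
)

bound
```

Observation "01" ends up with value "8--13" and value_type "minimum--maximum", while the single-valued observation "02" is left unchanged.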

If you would like to separate the bound trait values again, call separate_trait_values():

data_trait_bound %>% 
  separate_trait_values(., austraits$definitions)
#> # A tibble: 119 × 26
#>    dataset_id  taxon_name           observation_id trait_name value unit  entity_type value_type basis_of_value replicates
#>    <chr>       <chr>                <chr>          <chr>      <chr> <chr> <chr>       <fct>      <chr>          <chr>     
#>  1 Cooper_2013 Acronychia baeuerle… 0071           fruit_len… 15    mm    species     <NA>       measurement    <NA>      
#>  2 ABRS_2023   Acronychia aberrans  01324          fruit_len… 13    mm    species     <NA>       measurement    <NA>      
#>  3 ABRS_2023   Acronychia aberrans  01324          fruit_len… 16    mm    species     <NA>       measurement    <NA>      
#>  4 ABRS_2023   Acronychia eungelle… 01331          fruit_len… 12    mm    species     <NA>       measurement    <NA>      
#>  5 ABRS_2023   Asterolasia elegans  02248          fruit_len… 10    mm    species     <NA>       measurement    <NA>      
#>  6 ABRS_2023   Boronia angustisepa… 02910          fruit_len… 6     mm    species     <NA>       measurement    <NA>      
#>  7 ABRS_2023   Boronia quadrilata   03056          fruit_len… 6     mm    species     <NA>       measurement    <NA>      
#>  8 ABRS_2023   Bosistoa floydii     03120          fruit_len… 10    mm    species     <NA>       measurement    <NA>      
#>  9 ABRS_2023   Citrus australasica  04176          fruit_len… 50    mm    species     <NA>       measurement    <NA>      
#> 10 ABRS_2023   Citrus garrawayi     04178          fruit_len… 100   mm    species     <NA>       measurement    <NA>      
#> # ℹ 109 more rows
#> # ℹ 16 more variables: basis_of_record <chr>, life_stage <chr>, population_id <chr>, individual_id <chr>,
#> #   repeat_measurements_id <chr>, temporal_context_id <chr>, source_id <chr>, location_id <chr>, entity_context_id <chr>,
#> #   plot_context_id <chr>, treatment_context_id <chr>, collection_date <chr>, measurement_remarks <chr>, method_id <chr>,
#> #   method_context_id <chr>, original_name <chr>
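
The reverse operation can be sketched in base R by splitting on the `--` separator. Again this is a toy illustration on hypothetical data rather than the package's code, and it assumes that value and value_type were bound with the same number of pieces in each row.

```r
# Toy bound table (hypothetical data)
bound <- data.frame(
  observation_id = c("01", "02"),
  value          = c("8--13", "15"),
  value_type     = c("minimum--maximum", "mean"),
  stringsAsFactors = FALSE
)

# Split each bound cell back into its pieces
values <- strsplit(bound$value, "--", fixed = TRUE)
types  <- strsplit(bound$value_type, "--", fixed = TRUE)

# Assumption: both columns split into the same number of pieces per row
stopifnot(identical(lengths(values), lengths(types)))

# Repeat the identifier once per piece and unnest the pieces
separated <- data.frame(
  observation_id = rep(bound$observation_id, times = lengths(values)),
  value          = unlist(values),
  value_type     = unlist(types),
  stringsAsFactors = FALSE
)

separated
```

Each piece of a bound cell becomes its own row again, so the two bound rows expand back to three records.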