Title: A workflow for harmonising trait data from diverse sources into a documented standard structure
Description: The traits.build package provides a workflow to harmonise trait data from diverse sources. The code was originally built to support AusTraits (see Falster et al. 2021, <doi:10.1038/s41597-021-01006-6>, <https://github.com/traitecoevo/austraits.build>) and has been generalised here to support construction of other trait databases. For detailed instructions and examples see <https://traitecoevo.github.io/traits.build-book/>.
Authors: Daniel Falster [cre, aut], Elizabeth Wenk [cur, aut], Sophie Yang [cur, aut], Fonti Kar [aut, ctb], ARDC [fnd], ARC [fnd]
Maintainer: Daniel Falster <[email protected]>
License: BSD_2_clause + file LICENCE
Version: 2.0.0
Built: 2024-12-20 01:20:05 UTC
Source: https://github.com/traitecoevo/traits.build

bib_print: Format a BibEntry object according to the desired style using RefManageR.

Usage: bib_print(bib, .opts = list(first.inits = TRUE, max.names = 1000, style = "markdown"))

Arguments:
- bib: BibEntry object
- .opts: List of parameters for formatting style

Value: Character string of formatted reference.

build_add_version: Add version information to AusTraits.

Usage: build_add_version(austraits, version, git_sha)

Arguments:
- austraits: AusTraits database object
- version: Version number
- git_sha: Git SHA

Value: AusTraits database object with version information added.

build_combine: Compiles all the loaded studies into a single AusTraits database object as a large list.

Usage: build_combine(..., d = list(...))

Arguments:
- ...: Arguments passed to other functions
- d: List of all the AusTraits studies

Value: AusTraits compilation database as a large list.
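
For example, a minimal sketch combining two built datasets (dataset_1 and dataset_2 are assumed to be outputs of dataset_build()):

## Not run:
database <- build_combine(dataset_1, dataset_2)
## End(Not run)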

build_setup_pipeline: Rewrites the remake.yml file to include new studies.

Usage: build_setup_pipeline(dataset_ids = dir("data"), method = "base", database_name = "database", template = select_pipeline_template(method), workers = 1)

Arguments:
- dataset_ids: Vector of dataset_id's to include; defaults to all folders within the data directory
- method: Approach to use in build
- database_name: Name of database to be built
- template: Template used to build
- workers: Number of workers/parallel processes to use when using method = "furrr"

Value: Updated remake.yml file.
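
For example, a sketch setting up the pipeline for all datasets in the data folder, run in parallel (assumes the working directory is the root of a traits.build compilation):

## Not run:
build_setup_pipeline(method = "furrr", database_name = "austraits", workers = 4)
## End(Not run)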

check_pivot_duplicates: Identify duplicates preventing pivoting wider.

Usage: check_pivot_duplicates(database_object, dataset_ids = unique(database_object$traits$dataset_id))

Arguments:
- database_object: Database object
- dataset_ids: Vector of dataset_id's to check; defaults to all datasets in the database object

Value: Tibble with duplicates and pivot columns.
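
For example, a sketch checking a single dataset for duplicates (database is assumed to be a built database object):

## Not run:
check_pivot_duplicates(database, dataset_ids = "Falster_2003")
## End(Not run)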

check_pivot_wider: Test whether the traits table of a dataset can pivot wider with the minimum required columns.

Usage: check_pivot_wider(dataset)

Arguments:
- dataset: Built dataset

Value: Number of rows with duplicates preventing pivoting wider.

create_tree_branch: Creates a tree structure to show how things are related. In AusTraits, this is used in the vignettes to show the file structure of the repository and also the different components of the AusTraits database.

Usage: create_tree_branch(x, title, prefix = "")

Arguments:
- x: Vector of terms
- title: Name of branch
- prefix: Specifies the amount of indentation

Value: Vector of character strings for the tree structure.

dataset_build: Build specified dataset. This function completes three steps, which can be executed separately if desired: dataset_configure, dataset_process, dataset_update_taxonomy.

Usage: dataset_build(filename_metadata, filename_data_raw, definitions, unit_conversion_functions, schema, resource_metadata, taxon_list, filter_missing_values = TRUE)

Arguments:
- filename_metadata: Metadata yaml file for a given study
- filename_data_raw: Raw data.csv file for a given study
- definitions: Definitions read in from the traits.yml file
- unit_conversion_functions: List of unit conversion functions, as generated by get_unit_conversions()
- schema: Schema for traits.build
- resource_metadata: Metadata for the compilation
- taxon_list: Taxon list
- filter_missing_values: If TRUE (the default), missing values are filtered from the excluded data table; set to FALSE to see the rows with missing values

Value: AusTraits database object (a list).

Examples:
## Not run:
dataset_build(
  "data/Falster_2003/metadata.yml",
  "data/Falster_2003/data.csv",
  read_yaml("config/traits.yml"),
  get_unit_conversions("config/unit_conversions.csv"),
  get_schema(),
  get_schema("config/metadata.yml", "metadata"),
  read_csv_char("config/taxon_list.csv")
)
## End(Not run)

dataset_configure: Creates the config object that gets passed on to dataset_process. The config list contains the subset of definitions and unit conversions for the traits in each study. dataset_configure is used in the remake::make process to configure individual studies, mapping the traits found in each study to any relevant unit conversions and definitions. dataset_configure and dataset_process are applied to every study in the remake.yml file.

Usage: dataset_configure(filename_metadata, definitions)

Arguments:
- filename_metadata: Metadata yaml file for a given study
- definitions: Definitions read in from the traits.yml file

Value: List with dataset_id, metadata, definitions and unit_conversion_functions.

Examples:
## Not run:
dataset_configure("data/Falster_2003/metadata.yml", read_yaml("config/traits.yml"))
## End(Not run)

dataset_find_taxon: Find the list of unique datasets within a compilation containing specified taxa.

Usage: dataset_find_taxon(taxa, austraits, original_name = FALSE)

Arguments:
- taxa: A vector of species names
- austraits: AusTraits compilation
- original_name: Logical; if TRUE, use the column in the compilation that contains original species names, default = FALSE

Value: List of unique datasets within the compilation containing each taxon.
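
For example, a sketch locating one taxon across a built compilation (austraits is assumed to be a built database object):

## Not run:
dataset_find_taxon(c("Acacia aneura"), austraits)
## End(Not run)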

dataset_process: Loads an individual study using the config file generated by dataset_configure(). dataset_configure and dataset_process are applied to every study in the remake.yml file.

Usage: dataset_process(filename_data_raw, config_for_dataset, schema, resource_metadata, unit_conversion_functions, filter_missing_values = TRUE)

Arguments:
- filename_data_raw: Raw data.csv file for a given study
- config_for_dataset: Config settings generated by dataset_configure()
- schema: Schema for traits.build
- resource_metadata: Metadata about the traits compilation read in from the config folder
- unit_conversion_functions: List of unit conversion functions, as generated by get_unit_conversions()
- filter_missing_values: If TRUE (the default), missing values are filtered from the excluded data table; set to FALSE to see the rows with missing values

Value: AusTraits database object (a list).

Examples:
## Not run:
dataset_process(
  "data/Falster_2003/data.csv",
  dataset_configure("data/Falster_2003/metadata.yml", read_yaml("config/traits.yml")),
  get_schema(),
  get_schema("config/metadata.yml", "metadata"),
  get_unit_conversions("config/unit_conversions.csv")
)
## End(Not run)

dataset_report: Builds a detailed report for every dataset with a unique dataset_id, based on the template Rmd file provided. The reports are rendered as HTML files and saved in the specified output folder.

Usage: dataset_report(dataset_id, austraits, overwrite = FALSE, output_path = "export/reports", input_file = system.file("support", "report_dataset.Rmd", package = "traits.build"), quiet = TRUE, keep = FALSE)

Arguments:
- dataset_id: Name of specific study/dataset
- austraits: Compiled AusTraits database
- overwrite: Logical value to determine whether to overwrite an existing report
- output_path: Location where the rendered report will be saved
- input_file: Report script (.Rmd) file used to build the study report
- quiet: An option to suppress printing during rendering, from knitr, pandoc command line and others
- keep: Keep the intermediate Rmd file used?

Value: HTML file of the rendered report, located in the specified output folder.
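
For example, a sketch rendering the report for a single study (austraits is assumed to be a built compilation):

## Not run:
dataset_report("Falster_2003", austraits, overwrite = TRUE)
## End(Not run)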

dataset_test: Run tests to ensure that the specified dataset_id's have the correct setup.

Usage: dataset_test(dataset_ids, path_config = "config", path_data = "data", reporter = testthat::CompactProgressReporter)

Arguments:
- dataset_ids: Vector of dataset_id's to test
- path_config: Path to folder containing configuration files
- path_data: Path to folder containing data files
- reporter: testthat reporter to use; defaults to testthat::CompactProgressReporter
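
For example, a sketch testing two studies within a compilation:

## Not run:
dataset_test(c("Falster_2003", "Falster_2005_1"))
## End(Not run)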

dataset_test_worker: Run tests to ensure that the specified dataset_id's have the correct setup.

Usage: dataset_test_worker(test_dataset_ids, path_config = "config", path_data = "data", schema = get_schema(), definitions = get_schema(file.path(path_config, "traits.yml"), I("traits")))

Arguments:
- test_dataset_ids: Vector of dataset_id's to test
- path_config: Path to folder containing configuration files
- path_data: Path to folder containing data files
- schema: Data schema
- definitions: Trait definitions

dataset_update_taxonomy: Applies taxonomic updates to austraits_raw.

Usage: dataset_update_taxonomy(austraits_raw, taxa)

Arguments:
- austraits_raw: AusTraits compiled data, as a large list, without taxonomic updates applied
- taxa: Taxon list

Value: List of AusTraits compiled data with taxonomic updates applied.

get_schema: Load the schema for a traits.build data compilation (excluding traits).

Usage: get_schema(path = system.file("support", "traits.build_schema.yml", package = "traits.build"), subsection = NULL)

Arguments:
- path: Path to schema file; by default loads the version included with the package
- subsection: Section to load

Value: A list.

Examples:
schema <- get_schema()

get_unit_conversions: Make unit conversion functions.

Usage: get_unit_conversions(filename)

Arguments:
- filename: Name of file containing unit conversions

Value: List of conversion functions.

Examples:
## Not run: get_unit_conversions("config/unit_conversions.csv") ## End(Not run)

metadata_add_contexts: For a specified dataset_id, import context data from a dataframe. This function asks users which columns in the dataframe they would like to keep and records this appropriately in the metadata. The input data is assumed to be in wide format. The output may require additional manual editing.

Usage: metadata_add_contexts(dataset_id, overwrite = FALSE, user_responses = NULL)

Arguments:
- dataset_id: Identifier for a particular study in the database
- overwrite: Overwrite existing information
- user_responses: Named list containing simulated user input for manual selection of variables, mainly for testing purposes

metadata_add_locations: For a specified dataset_id, import location data from a dataframe. This function asks users which columns in the dataframe they would like to keep and records this appropriately in the metadata. The input data is assumed to be in wide format. The output may require additional manual editing.

Usage: metadata_add_locations(dataset_id, location_data, user_responses = NULL)

Arguments:
- dataset_id: Identifier for a particular study in the database
- location_data: A dataframe of site variables
- user_responses: Named list containing simulated user input for manual selection of variables, mainly for testing purposes

Examples:
## Not run:
austraits$locations %>%
  dplyr::filter(dataset_id == "Falster_2005_1") %>%
  select(-dataset_id) %>%
  spread(location_property, value) %>%
  type_convert() -> location_data
metadata_add_locations("Falster_2005_1", location_data)
## End(Not run)

metadata_add_source_bibtex: Adds citation details to a metadata file for a given study.

Usage: metadata_add_source_bibtex(dataset_id, file, type = "primary", drop = c("dateobj", "month"))

Arguments:
- dataset_id: Identifier for a particular study in the database
- file: Name of file where the reference is saved
- type: Type of reference, e.g. "primary" (the default) or "secondary"
- drop: Variables in the bibtex to ignore

Value: metadata.yml file with citation details added.

metadata_add_source_doi: Adds citation details to a metadata file for a given dataset_id from a doi. Uses the rcrossref package to access publication details from the crossref database.

Usage: metadata_add_source_doi(..., doi, bib = NULL)

Arguments:
- ...: Arguments passed on to metadata_add_source_bibtex()
- doi: doi of reference to add
- bib: (Only use for testing purposes) Result of a prior crossref query, passed in to bypass the doi lookup

Value: metadata.yml file with citation details added.

metadata_add_substitution: Add a categorical trait value substitution into the metadata.yml file for a dataset_id. metadata_add_substitution is used to align the categorical trait values used by a contributor with the categorical values supported by the database. These values are defined in the traits.yml file.

Usage: metadata_add_substitution(dataset_id, trait_name, find, replace)

Arguments:
- dataset_id: Identifier for a particular study in the database
- trait_name: The database-defined name for a particular trait
- find: Trait value in the original data.csv file
- replace: Trait value supported by the database

Value: metadata.yml file with a substitution added.
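
For example, a sketch correcting a misspelled categorical value (the trait name and values are hypothetical):

## Not run:
metadata_add_substitution("Falster_2003", "leaf_shape", find = "eliptic", replace = "elliptic")
## End(Not run)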

metadata_add_substitutions_list: Add a dataframe of trait value substitutions into a metadata file for a dataset_id.

Usage: metadata_add_substitutions_list(dataset_id, substitutions)

Arguments:
- dataset_id: Identifier for a particular study in the database
- substitutions: Dataframe of trait value substitutions

Value: metadata.yml file with multiple trait value substitutions added.

metadata_add_substitutions_table: Simultaneously adds many trait value replacements, potentially across many trait_name's and dataset_id's, to the respective metadata.yml files. This function can be used to quickly re-align/re-assign trait values across all studies.

Usage: metadata_add_substitutions_table(dataframe_of_substitutions, dataset_id, trait_name, find, replace)

Arguments:
- dataframe_of_substitutions: Dataframe with columns indicating dataset_id, trait_name, original trait values and database-aligned trait values
- dataset_id: Name of column containing study dataset_id's
- trait_name: Name of column containing trait name(s) for which a trait value replacement needs to be made
- find: Name of column containing trait values submitted by the contributor for a data observation
- replace: Name of column containing database-aligned trait values

Value: Modified metadata files with trait value replacements.

Examples:
## Not run:
read_csv("export/dispersal_syndrome_substitutions.csv") %>%
  select(-extra) %>%
  filter(dataset_id == "Angevin_2011") -> dataframe_of_substitutions
metadata_add_substitutions_table(dataframe_of_substitutions, dataset_id, trait_name, find, replace)
## End(Not run)

metadata_add_taxonomic_change: Add a single taxonomic change into the metadata.yml file for a specific study.

Usage: metadata_add_taxonomic_change(dataset_id, find, replace, reason, taxonomic_resolution, overwrite = TRUE)

Arguments:
- dataset_id: Identifier for a particular study in the database
- find: Original name used by the contributor
- replace: Taxonomic name accepted by APC or APNI
- reason: Reason for taxonomic change
- taxonomic_resolution: The rank of the most specific taxon name (or scientific name) to which a submitted original name resolves
- overwrite: Parameter indicating whether pre-existing find-replace entries should be overwritten; defaults to TRUE

Value: metadata.yml file with taxonomic change added.
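
For example, a sketch aligning a contributor's name with an accepted name (the names and reason are hypothetical):

## Not run:
metadata_add_taxonomic_change("Falster_2003",
  find = "Acacia aneura var. aneura", replace = "Acacia aneura",
  reason = "align with APC accepted name", taxonomic_resolution = "species")
## End(Not run)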

metadata_add_taxonomic_changes_list: Add multiple taxonomic changes to the metadata.yml file for a dataset_id, using a dataframe containing the taxonomic changes to be made.

Usage: metadata_add_taxonomic_changes_list(dataset_id, taxonomic_updates)

Arguments:
- dataset_id: Identifier for a particular study in the database
- taxonomic_updates: Dataframe of taxonomic updates

Value: metadata.yml file with multiple taxonomic updates added.

metadata_add_traits: For a specified dataset_id, populate columns for traits into the metadata. This function asks users which traits they would like to keep, and adds a template for those traits in the metadata. This template must then be completed manually. Can also be used to add a trait to an existing metadata file.

Usage: metadata_add_traits(dataset_id, user_responses = NULL)

Arguments:
- dataset_id: Identifier for a particular study in the database
- user_responses: Named list containing simulated user input for manual selection of variables, mainly for testing purposes

metadata_check_custom_R_code: Check the output of running the custom_R_code specified in the metadata.yml file for a specified dataset_id. For the specified dataset_id, reads in the file data.csv and applies the manipulations described in the file metadata.yml.

Usage: metadata_check_custom_R_code(dataset_id, path_data = "data")

Arguments:
- dataset_id: Identifier for a particular study in the database
- path_data: Path to folder with data
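
For example, a sketch previewing the effect of the custom R code for one study:

## Not run: metadata_check_custom_R_code("Falster_2003") ## End(Not run)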

metadata_create_template: Create a metadata.yml template for a specified dataset_id. Includes placeholders for major sections of the metadata.

Usage: metadata_create_template(dataset_id, path = file.path("data", dataset_id), skip_manual = FALSE, user_responses = NULL)

Arguments:
- dataset_id: Identifier for a particular study in the database
- path: Location of file where output is saved
- skip_manual: Allows skipping of manual selection of variables, default = FALSE
- user_responses: Named list containing simulated user input for manual selection of variables, mainly for testing purposes

Value: A yaml file template for metadata.
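
For example, a sketch creating a template for a new study while skipping the interactive prompts (the dataset_id is hypothetical):

## Not run: metadata_create_template("Yang_2023", skip_manual = TRUE) ## End(Not run)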

metadata_exclude_observations: Exclude observations in the metadata.yml file for a dataset_id.

Usage: metadata_exclude_observations(dataset_id, variable, find, reason)

Arguments:
- dataset_id: Identifier for a particular study in the database
- variable: Variable name
- find: Term to find by
- reason: Reason for exclusion

Value: metadata.yml file with excluded observations.
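
For example, a sketch excluding observations of an unresolved taxon (the variable, value and reason are hypothetical):

## Not run:
metadata_exclude_observations("Falster_2003",
  variable = "taxon_name", find = "Acacia sp.", reason = "taxon cannot be resolved")
## End(Not run)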

metadata_find_taxonomic_change: Find dataset_id's with a given taxonomic change.

Usage: metadata_find_taxonomic_change(find, replace = NULL, studies = NULL)

Arguments:
- find: Name of original species
- replace: Name of replacement species, default = NULL
- studies: Name of studies to look through, default = NULL

metadata_path_dataset_id: Path to the metadata.yml file for a specified dataset_id.

Usage: metadata_path_dataset_id(dataset_id, path_data = "data")

Arguments:
- dataset_id: Identifier for a particular study in the database
- path_data: Path to folder with data

Value: A string.

metadata_remove_taxonomic_change: Remove a taxonomic change from the metadata.yml file for a dataset_id.

Usage: metadata_remove_taxonomic_change(dataset_id, find)

Arguments:
- dataset_id: Identifier for a particular study in the database
- find: Taxonomic name to find

Value: metadata.yml file with a taxonomic change removed.

metadata_update_taxonomic_change: Update a taxonomic change in the metadata.yml file for a dataset_id.

Usage: metadata_update_taxonomic_change(dataset_id, find, replace, reason, taxonomic_resolution)

Arguments:
- dataset_id: Identifier for a particular study in the database
- find: Original taxonomic name
- replace: Updated taxonomic name to replace the original taxonomic name
- reason: Reason for change
- taxonomic_resolution: The rank of the most specific taxon name (or scientific name) to which a submitted original name resolves

Value: metadata.yml file with the updated taxonomic change.

metadata_user_select_column: Select which column in a dataframe/tibble corresponds to the variable of interest. It is used to compile the metadata yaml file by prompting the user to choose the relevant columns, and is used in metadata_add_locations and metadata_create_template.

Usage: metadata_user_select_column(column, choices)

Arguments:
- column: Name of the variable of interest
- choices: The options that can be selected from

metadata_user_select_names: Prompts the user to select the variables that are relevant for compiling the metadata yaml file. It is currently used in metadata_add_traits, metadata_add_locations and metadata_add_contexts.

Usage: metadata_user_select_names(title, vars)

Arguments:
- title: Character string providing the instruction for the user
- vars: Variable names

notes_random_string: Creates a string of random letters, 8 characters long by default, useful for defining unique hyperlinks.

Usage: notes_random_string(n = 8)

Arguments:
- n: Integer giving the number of letters, default is 8

Value: Character string of n random letters.

notetaker_add_note: Add a note to the note recorder as a new row.

Usage: notetaker_add_note(notes, new_note)

Arguments:
- notes: Object containing the report notes
- new_note: Vector of character notes to be added to existing notes

Value: A tibble with additional notes added.

notetaker_as_note: Creates a tibble with two columns, one of which is a randomly generated string of letters.

Usage: notetaker_as_note(note, link = NA_character_)

Arguments:
- note: Character string
- link: Character string, default is NA_character_, which generates a random string

Value: A tibble with two columns named note and link.

notetaker_get_note: Returns a specific row from notes, specified by i. The default is nrow(notes), which returns the last note.

Usage: notetaker_get_note(notes, i = nrow(notes))

Arguments:
- notes: Object containing the report notes
- i: Numerical; row number for the corresponding note, default is nrow(notes)

Value: A single row from a tibble.

notetaker_print_all: Print all notes.

Usage: notetaker_print_all(notes, ..., numbered = TRUE)

Arguments:
- notes: Object containing the report notes
- ...: Arguments passed to other functions
- numbered: Logical, default is TRUE

Value: Character string containing the notes.

notetaker_print_note: Print note (needs review?).

Usage: notetaker_print_note(note, as_anchor = FALSE, anchor_text = "", link_text = "link")

Arguments:
- note: Object containing the report notes
- as_anchor: Logical, default is FALSE
- anchor_text: Character string, default is ""
- link_text: Character string, default is "link"

Value: Character string containing the notes.

notetaker_print_notes: Prints a specific row from notes, specified by i.

Usage: notetaker_print_notes(notes, i = nrow(notes), ...)

Arguments:
- notes: Object containing the report notes
- i: Specify the row which contains the note to be returned
- ...: Arguments passed to notetaker_print_note()

Value: Character string containing the notes.

notetaker_start: Initiate the note recorder used in the report_study.Rmd file.

Usage: notetaker_start()

Value: A tibble where notes are recorded.

process_add_all_columns: Add or remove columns of data as needed so that all datasets have the same columns. Also adds an error column.

Usage: process_add_all_columns(data, vars, add_error_column = TRUE)

Arguments:
- data: Dataframe containing study data read in from a csv file
- vars: Vector of variable column names to be included in the final formatted tibble
- add_error_column: Adds an extra column called error if TRUE

Value: Tibble with the correct selection of columns, including an error column.

process_convert_units: Convert units to the desired type.

Usage: process_convert_units(data, definitions, unit_conversion_functions)

Arguments:
- data: Tibble or dataframe containing the study data
- definitions: Definitions read in from the traits.yml file
- unit_conversion_functions: List of unit conversion functions, as generated by get_unit_conversions()

Value: Tibble with converted units.

process_create_observation_id: Creates 3-part entity id codes that combine a segment for species, population, and, when applicable, individual. This depends upon a parsing_id being established when the data.csv file is first read in.

Usage: process_create_observation_id(data, metadata)

Arguments:
- data: The traits table at the point where this function is called
- metadata: Yaml file with metadata

Value: Character string.

process_custom_code: Applies custom data manipulations if the metadata field custom_R_code is not empty; otherwise the identity function is applied and no manipulations are made. The code in custom_R_code assumes a single input.

Usage: process_custom_code(txt)

Arguments:
- txt: Character text within the custom_R_code field of a metadata.yml file

Value: Character text containing the custom_R_code if custom_R_code is not empty; otherwise no changes are made.

process_flag_excluded_observations: Checks the metadata yaml file for any excluded observations. If there are none, returns the original data; if there are excluded observations, returns the mutated data with excluded observations flagged in a new column.

Usage: process_flag_excluded_observations(data, metadata)

Arguments:
- data: Tibble or dataframe containing the study data
- metadata: Yaml file with metadata

Value: Dataframe with flagged excluded observations, if there are any.

process_flag_out_of_range_values: Flags any numeric values that are outside the allowable range defined in the traits.yml file.

Usage: process_flag_out_of_range_values(data, definitions)

Arguments:
- data: Tibble or dataframe containing the study data
- definitions: Definitions read in from the traits.yml file

Value: Tibble with flagged values outside of the allowable range.

process_flag_unsupported_characters: Flags disallowed characters as errors, including for numeric traits, prior to unit conversions, to avoid their conversion to NAs during the unit conversion process.

Usage: process_flag_unsupported_characters(data)

Arguments:
- data: Tibble or dataframe containing the study data

Value: Tibble with flagged values containing unsupported characters.

process_flag_unsupported_traits: Flags any unrecognised traits, i.e. traits not defined in the traits.yml file.

Usage: process_flag_unsupported_traits(data, definitions)

Arguments:
- data: Tibble or dataframe containing the study data
- definitions: Definitions read in from the traits.yml file

Value: Tibble with unrecognised traits flagged in the "error" column.

process_flag_unsupported_values: Flags any categorical trait values that are not on the list of allowed values defined in the traits.yml file. NA values are flagged as errors, as are values for numeric traits that cannot be converted to numeric.

Usage: process_flag_unsupported_values(data, definitions)

Arguments:
- data: Tibble or dataframe containing the study data
- definitions: Definitions read in from the traits.yml file

Value: Tibble with flagged values that are unsupported categorical trait values, missing values, or numeric trait values that cannot be converted to numeric.

process_format_contexts: Format context data read in from the metadata.yml file. Converts from list to tibble.

Usage: process_format_contexts(my_list, dataset_id, traits)

Arguments:
- my_list: List of input information
- dataset_id: Identifier for a particular study in the AusTraits database
- traits: Table of trait data (for this function, just the data.csv file with custom_R_code applied)

Value: Tibble with context details, if available.

Examples:
## Not run:
process_format_contexts(read_metadata("data/Apgaua_2017/metadata.yml")$context, dataset_id, traits)
## End(Not run)

process_format_contributors: Format contributors read in from the metadata.yml file. Converts from list to tibble.

Usage: process_format_contributors(my_list, dataset_id, schema)

Arguments:
- my_list: List of input information
- dataset_id: Identifier for a particular study in the AusTraits database
- schema: Schema for traits.build

Value: Tibble with details of contributors.

Examples:
## Not run:
process_format_contributors(read_metadata("data/Falster_2003/metadata.yml")$contributors, "Falster_2003", get_schema())
## End(Not run)

process_format_locations: Format location data read in from the metadata.yml file. Converts from list to tibble.

Usage: process_format_locations(my_list, dataset_id, schema)

Arguments:
- my_list: List of input information
- dataset_id: Identifier for a particular study in the AusTraits database
- schema: Schema for traits.build

Value: Tibble with location details, if available.

Examples:
## Not run:
process_format_locations(read_metadata("data/Falster_2003/metadata.yml")$locations, "Falster_2003", get_schema())
## End(Not run)

process_generate_id: Generate a sequence of integer ids from a vector of names. Determines the number of leading zeros needed based on the number of records.

Usage: process_generate_id(x, prefix, sort = FALSE)

Arguments:
- x: Vector of text to convert
- prefix: Text to put before the id integer
- sort: Logical to indicate whether x should be sorted before ids are generated

Value: Vector of ids.
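
For example, a minimal sketch generating location ids (the inputs are hypothetical; the exact output format depends on the number of records):

## Not run: process_generate_id(c("site A", "site B", "site C"), prefix = "location_", sort = TRUE) ## End(Not run)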

process_generate_method_ids: Generate a sequence of integer ids for methods.

Usage: process_generate_method_ids(metadata_traits)

Arguments:
- metadata_traits: The traits section of the metadata

Value: Tibble with traits, methods, and method_id.

process_parse_data: Process a single dataset with dataset_id, using the associated data.csv and metadata.yml files. Adds a unique observation id for each row of observations; trait names are formatted using AusTraits accepted names and trait substitutions are added. process_parse_data is used in the core workflow pipeline (i.e. in dataset_process).

Usage: process_parse_data(data, dataset_id, metadata, contexts, schema)

Arguments:
- data: Tibble or dataframe containing the study data
- dataset_id: Identifier for a particular study in the AusTraits database
- metadata: Yaml file with metadata
- contexts: Dataframe of contexts for this study
- schema: Schema for traits.build

Value: Tibble in long format with AusTraits-formatted trait names, trait substitutions, and a unique observation id added.

process_standardise_names: Enforces some standards on species names.

Usage: process_standardise_names(x)

Arguments:
- x: Vector, dataframe or list containing original species names

Value: Vector with standardised species names.

process_taxonomic_updates: Applies taxonomic updates from the metadata.yml file to the study data.

Usage: process_taxonomic_updates(data, metadata)

Arguments:
- data: Tibble or dataframe containing the study data
- metadata: Yaml file with metadata

Value: Tibble with the taxonomic updates applied.

process_unit_conversion_name: Creates the unit conversion name based on the original units and the units to be converted to.

Usage: process_unit_conversion_name(from, to)

Arguments:
- from: Character string of original units
- to: Character string of units to be converted to

Value: Character string naming the conversion from the original units to the target units.

read_csv_char: Reads in a csv file using the read_csv function from readr, with all columns read as characters.

Usage: read_csv_char(...)

Arguments:
- ...: Arguments passed to the read_csv function

Value: A tibble.
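
For example, a minimal sketch reading a study's raw data with every column as character:

## Not run: data <- read_csv_char("data/Falster_2003/data.csv") ## End(Not run)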

read_metadata: Read in a metadata.yml file for a study.

Usage: read_metadata(path)

Arguments:
- path: Location of the metadata file

read_metadata_dataset: Read the metadata.yml file for a specified dataset_id.

Usage: read_metadata_dataset(dataset_id, path_data = "data")

Arguments:
- dataset_id: Identifier for a particular study in the database
- path_data: Path to folder with data

Value: A list with the contents of the metadata for the specified dataset_id.

util_append_to_list: Add an item to the end of a list.

Usage: util_append_to_list(my_list, to_append)

Arguments:
- my_list: A list
- to_append: A list

Value: A merged list with the added item at the end.

Examples:
## Not run:
util_append_to_list(as.list(dplyr::starwars)[c(1, 2)], as.list(dplyr::starwars)[c(3, 4)])
## End(Not run)

util_bib_to_list: Convert a BibEntry object to a list.

Usage: util_bib_to_list(bib)

Arguments:
- bib: BibEntry object

Value: List.

util_check_all_values_in: Checks whether the values in vector x are in y. Values in x may contain multiple values separated by sep, so these are split first using str_split.

Usage: util_check_all_values_in(x, y, sep = " ")

Arguments:
- x: Vector
- y: Vector
- sep: Separator between multiple values within a cell, default = " " (a single space)

Value: Vector of logical values.
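
For example, a minimal sketch based on the description above (the inputs are hypothetical; each cell of x is split on sep and the resulting values are checked against y):

## Not run: util_check_all_values_in(c("a b", "a d"), y = c("a", "b", "c")) ## End(Not run)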

util_check_disallowed_chars: Checks that values in a vector do not contain disallowed characters, i.e. characters outside of ASCII.

Usage: util_check_disallowed_chars(object)

Arguments:
- object: Vector

Value: Vector of logical values.

util_df_convert_character: Convert all columns in a data frame to character.

Usage: util_df_convert_character(df)

Arguments:
- df: A dataframe

Value: A dataframe.

Examples:
lapply(traits.build:::util_df_convert_character(dplyr::starwars), class)

util_df_to_list: Convert a dataframe to a named list, useful when converting to yaml.

Usage: util_df_to_list(df)

Arguments:
- df: A dataframe

Value: A (yaml) list.

Examples:
util_df_to_list(dplyr::starwars)

util_extract_list_element: Extract a trait element from definitions$traits$elements.

Usage: util_extract_list_element(i, my_list, var)

Arguments:
- i: A value within the definitions$traits$elements list which refers to types of traits
- my_list: The list that contains the element of interest (i.e. definitions$traits$elements)
- var: The type of variable of a trait

Value: The element/properties of a trait.

Examples:
## Not run: util_extract_list_element(1, definitions$traits$elements, "units") ## End(Not run)

util_get_SHA: Get the SHA string for the latest commit on GitHub for the repository. The SHA is the 40-digit hexadecimal SHA-1 hash that GitHub uses as the commit ID to track changes made to a repo.

Usage: util_get_SHA(path = ".")

Arguments:
- path: Root directory of the repository; default is the current directory

Value: 40-digit SHA character string for the latest commit to the repository.

util_get_version: Retrieve the version for the compilation from the definitions.

Usage: util_get_version(path = "config/metadata.yml")

Arguments:
- path: Path to traits definitions

Value: A string.

util_kable_styling_html: Format a table with kable and default styling for html.

Usage: util_kable_styling_html(...)

Arguments:
- ...: Arguments passed to the underlying table-formatting function

util_list_to_bib: Convert a list of elements into a BibEntry object.

Usage: util_list_to_bib(ref)

Arguments:
- ref: List of elements for a reference

Value: BibEntry object.

util_list_to_df1: Convert a list with single entries to a two-column tibble.

Usage: util_list_to_df1(my_list)

Arguments:
- my_list: A list with single entries

Value: A tibble with two columns.

Examples:
## Not run: util_list_to_df1(as.list(dplyr::starwars)[2]) ## End(Not run)

util_list_to_df2: Convert a list of lists to a dataframe; requires that every list have the same named elements.

Usage: util_list_to_df2(my_list, as_character = TRUE, on_empty = NA)

Arguments:
- my_list: A list of lists to convert to a dataframe
- as_character: A logical value, indicating whether the values are read as character
- on_empty: Value to return if my_list is NULL, NA or of length 0, default = NA

Examples:
util_list_to_df2(util_df_to_list(dplyr::starwars))

util_replace_null: Converts NULL values to a different value; the default is converting NULL to NA.

Usage: util_replace_null(x, val = NA)

Arguments:
- x: A NULL value or a non-NULL object
- val: What a NULL value should be returned as, default is NA

Value: NA or a non-NULL object.

Examples:
## Not run: util_replace_null(NULL) ## End(Not run)

util_separate_and_sort: For a vector x in which an individual cell may have multiple values (separated by sep), sort the records within each cell alphabetically.

Usage: util_separate_and_sort(x, sep = " ")

Arguments:
- x: A vector whose cells may contain multiple values
- sep: A separator, a whitespace by default

Value: A vector of alphabetically sorted records.

Examples:
## Not run: util_separate_and_sort("z y x") ## End(Not run)

util_standardise_doi: Standardise a doi.

Usage: util_standardise_doi(doi)

Arguments:
- doi: doi of reference to add

write_metadata: Write the metadata.yml file for a study, with custom R code formatted to allow line breaks.

Usage: write_metadata(data, path, style_code = FALSE)

Arguments:
- data: List of metadata contents to write
- path: Location where the metadata file is to be written
- style_code: Should the R code be styled?

Examples:
## Not run:
f <- "data/Falster_2003/metadata.yml"
data <- read_metadata(f)
write_metadata(data, f)
## End(Not run)

write_metadata_dataset: Write the YAML representation of metadata.yml for a specified dataset_id to the file data/dataset_id/metadata.yml.

Usage: write_metadata_dataset(metadata, dataset_id)

Arguments:
- metadata: Metadata file
- dataset_id: Identifier for a particular study in the database

Value: A yaml file.

write_plaintext: Export an AusTraits version as plain text.

Usage: write_plaintext(austraits, path)

Arguments:
- austraits: AusTraits database object
- path: Path where files are saved

Value: csv files of tibbles containing traits, locations, contexts, methods, excluded_data, taxonomic_updates, taxa and contributors.