Main dispatch function to run multiple cleaning operations on a dataset
Source: run_clean_checks.Rd
Runs several data-cleaning checks on a dataset in a single call. It currently checks for negative values and for values outside the expected percent range. Additional checks can be added by supplying custom functions through custom_checks.
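The two built-in checks can be pictured as simple column-wise tests that record one issue per flagged value. The sketch below is illustrative only and is not the package's implementation. The helpers make_issues(), flag_negative() and flag_percent() are hypothetical, and the sketch assumes that "percent range" means 0 to 100 inclusive and that NA values are left unflagged.

```r
# Hypothetical helper: build an issue-log data frame for rows where `bad` is TRUE.
make_issues <- function(data, col, bad, issue) {
  rows <- which(bad)  # which() drops NA comparisons, so missing values are not flagged
  data.frame(
    row = rows,
    column = rep(col, length(rows)),
    value = data[[col]][rows],
    issue = rep(issue, length(rows)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical check: flag negative values in the given columns.
flag_negative <- function(data, cols) {
  do.call(rbind, lapply(cols, function(col) {
    make_issues(data, col, data[[col]] < 0, "negative value")
  }))
}

# Hypothetical check: flag values outside [lower, upper]; 0-100 is an assumption.
flag_percent <- function(data, cols, lower = 0, upper = 100) {
  do.call(rbind, lapply(cols, function(col) {
    x <- data[[col]]
    make_issues(data, col, x < lower | x > upper, "outside percent range")
  }))
}

wq <- data.frame(
  temperature = c(12.5, -3.0, 15.2, NA),
  do_pct_sat = c(95, 104, -1, 88)
)
flag_negative(wq, "temperature")  # flags row 2 (-3.0); the NA in row 4 is ignored
flag_percent(wq, "do_pct_sat")    # flags rows 2 (104) and 3 (-1)
```

Returning a long-format log (one row per flagged value) keeps results from different columns and checks easy to stack with rbind().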
run_clean_checks(
data,
dataset_name,
diagnostics_dir = "data/clean/diagnostics",
negative_cols = NULL,
percent_cols = NULL,
custom_checks = NULL,
verbose = FALSE
)
data: Data frame to clean.
dataset_name: Name of the dataset, used for logging and file naming.
diagnostics_dir: Directory in which to save diagnostic CSV files (created if it does not exist).
negative_cols: Character vector of columns to check for negative values (optional).
percent_cols: Character vector of columns to check for values outside the percent range (optional).
custom_checks: List of custom cleaning functions (optional).
verbose: Logical; if TRUE, print detailed progress messages (default: FALSE).
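One way to see how these arguments fit together is a loop that applies every check function to the data, writes a diagnostic CSV per check into the diagnostics directory, and stacks the results into one issue log. The sketch below is self-contained and hypothetical: dispatch_checks() is not part of the package, the file naming scheme is invented for illustration, and it assumes each check is a function(data) that returns a data frame of issues with shared columns, passed as a named list. The contract that custom_checks actually expects is not documented here, so confirm it in the package source before writing your own checks. The sketch defaults to a temporary directory rather than the real default, "data/clean/diagnostics".

```r
# Hypothetical dispatcher illustrating how the arguments could interact.
# Assumed contract: `checks` is a named list of function(data) that each
# return a data frame of issues with the same columns (row, column, issue).
dispatch_checks <- function(data, dataset_name, checks,
                            diagnostics_dir = file.path(tempdir(), "diagnostics"),
                            verbose = FALSE) {
  # Create the diagnostics directory if it does not exist yet.
  dir.create(diagnostics_dir, recursive = TRUE, showWarnings = FALSE)

  logs <- lapply(names(checks), function(check_name) {
    issues <- checks[[check_name]](data)
    issues$check <- rep(check_name, nrow(issues))  # record which check fired
    if (verbose) message(check_name, ": ", nrow(issues), " issue(s)")
    if (nrow(issues) > 0) {
      # Hypothetical file naming scheme: <dataset_name>_<check_name>.csv
      path <- file.path(diagnostics_dir,
                        paste0(dataset_name, "_", check_name, ".csv"))
      utils::write.csv(issues, path, row.names = FALSE)
    }
    issues
  })

  # Data is returned unchanged: this sketch only flags values, it does not clean them.
  list(data = data, issue_log = do.call(rbind, logs))
}

# Two example checks written against the assumed contract.
check_negative_temp <- function(data) {
  rows <- which(data$temperature < 0)
  data.frame(row = rows, column = rep("temperature", length(rows)),
             issue = rep("negative value", length(rows)),
             stringsAsFactors = FALSE)
}
check_missing_do <- function(data) {
  rows <- which(is.na(data$do_mgl))
  data.frame(row = rows, column = rep("do_mgl", length(rows)),
             issue = rep("missing value", length(rows)),
             stringsAsFactors = FALSE)
}

wq <- data.frame(temperature = c(12.5, -3.0, 15.2), do_mgl = c(8.1, NA, 7.4))
res <- dispatch_checks(wq, "water_quality_2024",
                       checks = list(negative = check_negative_temp,
                                     missing_do = check_missing_do),
                       verbose = TRUE)
res$issue_log  # one row per flagged value, tagged with the check name
```

Tagging each row with the check name keeps the combined log traceable even when several checks flag the same row.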
Returns a list containing the cleaned data (data), the issue log (issue_log), and all diagnostic details.
if (FALSE) { # \dontrun{
# Basic usage
result <- run_clean_checks(
data = my_data,
dataset_name = "water_quality_2024",
negative_cols = c("temperature", "do_mgl"),
percent_cols = c("do_pct_sat", "turbidity_pct")
)
# Access cleaned data
clean_data <- result$data
# View issue log
print(result$issue_log)
# Save issue log
write.csv(result$issue_log, "cleaning_log.csv", row.names = FALSE)
} # }