Main dispatch function to run multiple cleaning operations on a dataset.

It currently includes checks for negative values and for values outside the valid percent range. Other checks can be added by supplying custom functions through custom_checks.
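The dispatch pattern described above (run each check in turn, pass the data along, and collect every flagged issue into one log) can be sketched in base R. This is a minimal illustration under assumed conventions, not the package's implementation. run_checks_sketch, flag_nonpositive_depth, and the depth_m column are hypothetical. The contract (each check is a function of the data that returns list(data, issues)) is also an assumption and may differ from what run_clean_checks expects of custom_checks.

```r
# Hypothetical dispatcher: applies each check in order, feeds the (possibly
# modified) data into the next check, and appends each check's issues to one log.
run_checks_sketch <- function(data, checks) {
  issue_log <- data.frame(check = character(0), column = character(0),
                          row = integer(0), stringsAsFactors = FALSE)
  for (name in names(checks)) {
    result <- checks[[name]](data)
    data <- result$data                            # cleaned data moves on
    issue_log <- rbind(issue_log, result$issues)   # accumulate flagged rows
  }
  list(data = data, issue_log = issue_log)
}

# Hypothetical custom check: flags (but does not change) depths <= 0.
flag_nonpositive_depth <- function(data) {
  rows <- which(!is.na(data$depth_m) & data$depth_m <= 0)
  list(data = data,
       issues = data.frame(check = rep("nonpositive_depth", length(rows)),
                           column = rep("depth_m", length(rows)),
                           row = rows, stringsAsFactors = FALSE))
}

df <- data.frame(depth_m = c(1.5, -0.2, 0, 3))
out <- run_checks_sketch(df, list(depth = flag_nonpositive_depth))
print(out$issue_log)
```

Keeping every check behind the same small interface is what makes a dispatcher like this extensible: adding a check means adding one more function to the list, without touching the loop.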

run_clean_checks(
  data,
  dataset_name,
  diagnostics_dir = "data/clean/diagnostics",
  negative_cols = NULL,
  percent_cols = NULL,
  custom_checks = NULL,
  verbose = FALSE
)

Arguments

data

Data frame to clean

dataset_name

Name of the dataset (used for logging and file naming)

diagnostics_dir

Directory in which to save diagnostic CSV files (created if it does not exist)

negative_cols

Character vector of columns to check for negative values (optional)

percent_cols

Character vector of columns to check for values outside the valid percent range (optional)

custom_checks

List of custom cleaning functions (optional)

verbose

Logical, print detailed progress messages (default: FALSE)
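The negative-value and percent-range checks, together with the diagnostics directory, can be sketched in base R. This is an assumption-laden illustration, not the package's code. flag_out_of_range and save_diagnostics are hypothetical helpers. The percent range is assumed to be 0 to 100, the file name pattern <dataset_name>_<check>.csv is invented for illustration, and canopy_pct is a toy column. Flagged values are only reported, not altered, because this page does not say how run_clean_checks modifies them.

```r
# Hypothetical helper: flags non-missing values below `lower` or above `upper`
# in each listed column and returns one row per flagged value.
flag_out_of_range <- function(data, cols, lower = -Inf, upper = Inf, check) {
  found <- lapply(cols, function(col) {
    x <- data[[col]]
    rows <- which(!is.na(x) & (x < lower | x > upper))
    data.frame(check = rep(check, length(rows)),
               column = rep(col, length(rows)),
               row = rows,
               value = x[rows],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, found)
}

# Hypothetical helper: creates the diagnostics directory if it is missing
# (silently if it already exists) and writes the flagged rows to a CSV file.
save_diagnostics <- function(issues, dataset_name, check, diagnostics_dir) {
  dir.create(diagnostics_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(diagnostics_dir, paste0(dataset_name, "_", check, ".csv"))
  write.csv(issues, path, row.names = FALSE)
  invisible(path)
}

wq <- data.frame(temperature = c(12.1, -3, 15.4),
                 do_mgl = c(8.2, 7.9, -0.5),
                 canopy_pct = c(45, 104, 88))

# Negative check: lower bound 0, no upper bound.
neg <- flag_out_of_range(wq, c("temperature", "do_mgl"),
                         lower = 0, check = "negative")
# Percent check: assumed bounds 0 to 100.
pct <- flag_out_of_range(wq, "canopy_pct",
                         lower = 0, upper = 100, check = "percent_range")

out_dir <- file.path(tempdir(), "diagnostics")
save_diagnostics(neg, "water_quality_2024", "negative", out_dir)
```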

Value

A list containing the cleaned data (data), the issue log (issue_log), and all diagnostic details

Examples

if (FALSE) { # \dontrun{
# Basic usage
result <- run_clean_checks(
  data = my_data,
  dataset_name = "water_quality_2024",
  negative_cols = c("temperature", "do_mgl"),
  percent_cols = c("do_pct_sat", "turbidity_pct")
)

# Access cleaned data
clean_data <- result$data

# View issue log
print(result$issue_log)

# Save issue log
write.csv(result$issue_log, "cleaning_log.csv", row.names = FALSE)
} # }