
Splits a data frame by a single grouping column and writes each group to a separate CSV file. Optionally writes a manifest file listing the output files, their group values, and row counts.

Usage

write_by_group(data, group_col, output_dir = NULL, manifest = FALSE)

Arguments

data

A data frame or tibble to split and save.

group_col

A string. The name of the column to group by.

output_dir

A string or NULL. Path to the directory where output files will be written; it is created if it does not exist. Defaults to NULL, which is not a usable location, so a path must always be supplied explicitly.

manifest

A logical. Whether to write a manifest.csv file to output_dir listing the output files, group values, and row counts. Defaults to FALSE.

Value

Invisibly returns output_dir.

Details

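Output filenames are derived from the values of group_col, and each group is written to <sanitized value>.csv in output_dir (for example, "Adelie" becomes adelie.csv). Values are sanitized before being used as filenames: they are converted to lowercase, spaces and special characters are replaced with -, consecutive dashes are collapsed, and leading and trailing dashes are stripped.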
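The sanitizing rules can be sketched as a small base-R helper. This is a hypothetical function, sanitize_group_value(), not part of the package, and it assumes "special characters" means anything other than ASCII letters and digits; the package's exact character class may differ.

```r
# Hypothetical helper mirroring the documented sanitizing steps.
# Assumption: "special characters" = anything except ASCII a-z and 0-9.
sanitize_group_value <- function(x) {
  x <- tolower(as.character(x))                 # convert to lowercase
  x <- gsub("[^a-z0-9]+", "-", x, perl = TRUE)  # spaces/special chars -> "-"
  x <- gsub("-{2,}", "-", x, perl = TRUE)       # collapse consecutive dashes
  gsub("^-+|-+$", "", x, perl = TRUE)           # strip leading/trailing dashes
}

sanitize_group_value(c("Adelie", "Chinstrap Penguin!", " Gentoo -- (sub) "))
# returns c("adelie", "chinstrap-penguin", "gentoo-sub")
```

Because the `+` in the second pattern already merges runs of special characters, the explicit collapse step is redundant here; it is kept only to match the documented order of steps.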

If manifest = TRUE, a manifest.csv is written to output_dir containing three columns: group_value, n_rows, and file_path.

Note: output_dir defaults to NULL rather than to a usable location. Always supply an explicit path to avoid writing files to unexpected locations, and use tempdir() for temporary output during testing or exploration.
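The behaviour described above can be approximated in base R. The following is a minimal, hypothetical stand-in named write_by_group_sketch(), not the package's implementation. It assumes that a NULL output_dir raises an error, that "special characters" means anything other than ASCII letters and digits, and that files are written with utils::write.csv() without row names; its messages only loosely mirror the example output.

```r
# Hypothetical stand-in for write_by_group(); NOT the package's source.
write_by_group_sketch <- function(data, group_col, output_dir = NULL,
                                  manifest = FALSE) {
  # Assumption: a NULL output_dir is rejected rather than silently defaulted.
  if (is.null(output_dir)) {
    stop("`output_dir` must be supplied explicitly.", call. = FALSE)
  }
  # Create the output directory if it does not exist yet.
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

  # Sanitizing rules from Details (same assumptions as noted above).
  sanitize <- function(x) {
    x <- tolower(as.character(x))
    x <- gsub("[^a-z0-9]+", "-", x, perl = TRUE)
    x <- gsub("-{2,}", "-", x, perl = TRUE)
    gsub("^-+|-+$", "", x, perl = TRUE)
  }

  # One data frame per distinct value of group_col.
  groups <- split(data, data[[group_col]])
  paths <- file.path(output_dir, paste0(sanitize(names(groups)), ".csv"))

  # Write each group to its own CSV file and report the row count.
  for (i in seq_along(groups)) {
    utils::write.csv(groups[[i]], paths[[i]], row.names = FALSE)
    message(sprintf('Written "%s" (%d rows) to %s',
                    names(groups)[[i]], nrow(groups[[i]]), paths[[i]]))
  }

  # Optional manifest: one row per group with value, row count, and path.
  if (manifest) {
    manifest_df <- data.frame(
      group_value = names(groups),
      n_rows      = unname(vapply(groups, nrow, integer(1))),
      file_path   = paths
    )
    manifest_path <- file.path(output_dir, "manifest.csv")
    utils::write.csv(manifest_df, manifest_path, row.names = FALSE)
    message("Manifest written to ", manifest_path)
  }

  invisible(output_dir)
}

# Usage: write into a dedicated subdirectory of tempdir().
penguins <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  mass    = c(3750, 3800, 5000)
)
out <- write_by_group_sketch(penguins, "species",
                             output_dir = file.path(tempdir(), "by-species"),
                             manifest = TRUE)
```

In this sketch, two group values that sanitize to the same name (for example "Gentoo" and "gentoo") would overwrite each other's file; whether write_by_group() guards against such collisions is not stated on this page.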

Examples

# \donttest{
# Split a small data frame by group and write to a temp directory
data <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo"),
  mass    = c(3750, 3800, 5000)
)
write_by_group(data, group_col = "species", output_dir = tempdir())
#>  Written "Adelie" (2 rows) to /tmp/Rtmpip9DJO/adelie.csv
#>  Written "Gentoo" (1 rows) to /tmp/Rtmpip9DJO/gentoo.csv

# Same but also write a manifest
write_by_group(data, group_col = "species",
               output_dir = tempdir(), manifest = TRUE)
#>  Written "Adelie" (2 rows) to /tmp/Rtmpip9DJO/adelie.csv
#>  Written "Gentoo" (1 rows) to /tmp/Rtmpip9DJO/gentoo.csv
#>  Manifest written to /tmp/Rtmpip9DJO/manifest.csv
# }