Generate an HTCondor executable shell script for an R job

htc_gen_executable() writes a ready-to-use bash script (.sh) that HTCondor runs inside the container for each job. The script changes to HTCondor's writable scratch directory, creates a results folder, runs the R script via Rscript using absolute paths to the baked-in files, and compresses the results into a tarball for transfer back to the submit node.

Usage

htc_gen_executable(
  output_file = "job.sh",
  r_script = NULL,
  data_files = NULL,
  results_folder = "results",
  home_dir = "/home",
  mode = "single",
  set_executable = TRUE,
  verbose = FALSE,
  comments = FALSE,
  output = "."
)

Arguments

output_file: A character string. Name of the shell script to write. Must end in ".sh". Defaults to "job.sh".
r_script: A character string. Name of the R script that HTCondor will run inside the container, e.g. "analysis.R". Must be supplied explicitly; the NULL shown in the signature is a placeholder, not a usable default. If you used toolero::create_qmd(use_purl = TRUE), the script is the .R file produced by purl.R after rendering.
data_files: A character vector or NULL. Paths to data files baked into the container that should be passed to the R script as positional arguments. These are converted to absolute paths inside the container (e.g. "data-raw/sample.csv" becomes "/home/data-raw/sample.csv"). The R script receives them via commandArgs(trailingOnly = TRUE). Defaults to NULL.
results_folder: A character string. Name of the folder created in the scratch directory to hold job outputs before compression. Defaults to "results".
home_dir: A character string. The working directory inside the container where baked-in files live. Used to construct absolute paths for Rscript and data file arguments. Must match the home_dir used in containr::generate_dockerfile(). Defaults to "/home".
mode: A character string. Execution mode. "single" (the default) runs the R script with only the data file arguments (if any), producing a single fixed-name results tarball. "multiple" also passes the subset filename as the first positional argument via ${1}, producing a per-job tarball named ${1}-results.tar.gz. Must match the mode used in htc_gen_submit().
set_executable: Logical. If TRUE, sets executable permissions on the generated script via Sys.chmod() so the file is ready to copy to the CHTC submit node without any additional steps. Defaults to TRUE. Set to FALSE if you prefer to manage permissions manually, in which case you must run chmod +x on the script before submitting your job.
verbose: Logical. If TRUE, prints progress messages as each section of the script is written. Defaults to FALSE.
comments: Logical. If TRUE, annotates each section of the script with a comment explaining what that section does. Useful for researchers learning the HTCondor executable script conventions. Defaults to FALSE.
output: A character string. Directory where the shell script will be written. Defaults to "." (current working directory).
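
The path conversion described for data_files and home_dir can be illustrated with a few lines of base R. This is a sketch of the conversion only, not the package's internal code, and the helper name to_container_path() is hypothetical.

```r
# Hypothetical helper: mimic how relative data paths are assumed to be
# turned into absolute paths under home_dir inside the container.
to_container_path <- function(paths, home_dir = "/home") {
  if (is.null(paths)) {
    return(character(0))
  }
  file.path(home_dir, paths)
}

to_container_path("data-raw/sample.csv")
#> [1] "/home/data-raw/sample.csv"
```

Because the prefix comes from home_dir, a container built with a different home_dir needs the same value passed here, otherwise the paths handed to the R script will not exist.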

Value

Called for its side effects. Writes a bash script to file.path(output, output_file) and sets executable permissions when set_executable = TRUE. Returns invisible(NULL).
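
If set_executable = FALSE was used, or you simply want to confirm the permissions, the check can be done from R rather than the shell. The sketch below uses only base R on a stand-in file with placeholder contents, not a real generated script, and the permission bits are only meaningful on Unix-like systems.

```r
# Stand-in for a generated script (placeholder contents, not real output).
script_path <- file.path(tempdir(), "job.sh")
writeLines(c("#!/bin/bash", "echo placeholder"), script_path)

# Set rwxr-xr-x, which includes the execute bit that `chmod +x` would add.
Sys.chmod(script_path, mode = "0755")

# Test the owner-execute bit (octal 100) in the file's mode.
owner_exec_bit <- strtoi("100", base = 8L)
has_exec <- bitwAnd(as.integer(file.mode(script_path)), owner_exec_bit) > 0
has_exec
```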

How file paths work inside the container

The generated script uses two directories:

Reading – the R script and data files are baked into the container at build time by containr::generate_dockerfile(). They live under home_dir (default "/home"). The Rscript line uses an absolute path (e.g. Rscript /home/analysis.R) so the script is found regardless of the working directory.

Writing – the script changes to HTCondor's scratch directory (_CONDOR_SCRATCH_DIR) before creating the results folder. This directory is writable and is where HTCondor looks for transfer_output_files. The R script writes outputs to "results/" using a relative path; because the scratch directory is the working directory when Rscript runs, that path resolves to the results folder inside it.

This separation means the R script stays portable – "results/" works in RStudio, in quarto render, and on HTCondor – while the .sh script handles the HTCondor-specific directory setup.
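
To make the two-directory layout concrete, the following sketch assembles script lines in the order described above. It is a hypothetical reconstruction: sketch_script() is not part of the package, the real output of htc_gen_executable() may differ in wording and quoting, and the single-mode tarball name "results.tar.gz" is an assumption, since only the "multiple" name is documented.

```r
# Hypothetical reconstruction of the script body, for illustration only.
sketch_script <- function(r_script,
                          data_files = NULL,
                          results_folder = "results",
                          home_dir = "/home",
                          mode = c("single", "multiple")) {
  mode <- match.arg(mode)

  # Reading side: absolute paths to the baked-in R script and data files.
  script_path <- file.path(home_dir, r_script)
  data_paths <- if (is.null(data_files)) character(0) else file.path(home_dir, data_files)

  # In "multiple" mode the subset filename (${1}) is passed first.
  subset_arg <- if (mode == "multiple") "${1}" else character(0)
  rscript_line <- paste(c("Rscript", script_path, subset_arg, data_paths), collapse = " ")

  # Assumed tarball name in single mode; "${1}-results.tar.gz" is documented.
  tarball <- if (mode == "multiple") "${1}-results.tar.gz" else paste0(results_folder, ".tar.gz")

  c(
    "#!/bin/bash",
    "cd \"${_CONDOR_SCRATCH_DIR}\"",          # writing side: scratch directory
    paste("mkdir -p", results_folder),
    rscript_line,
    paste("tar -czf", tarball, results_folder)
  )
}

writeLines(sketch_script("analysis.R", data_files = "data-raw/sample.csv"))
```

Only the Rscript line refers to home_dir; every other line operates relative to the scratch directory, which is what keeps the R script itself free of HTCondor-specific paths.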

Relationship to htc_gen_submit()

The executable script generated by htc_gen_executable() is the file referenced by the executable argument in htc_gen_submit(). The two functions must use the same mode. In "multiple" mode, HTCondor passes each subset filename to the script as ${1}, which is forwarded to the R script as a positional argument. The R script must be written to accept this argument – the recommended approach is toolero::detect_execution_context():

context <- toolero::detect_execution_context()

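# Pick the input file according to where the code is running
input_file <- switch(context,
  interactive = "data-raw/sample.csv",               # fixed file for interactive work
  quarto      = params$input_file,                   # Quarto document parameter
  rscript     = commandArgs(trailingOnly = TRUE)[1]  # first positional argument (${1})
)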
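
When data_files is combined with "multiple" mode, the R script receives more than one positional argument. One possible way to separate them is sketched below. It assumes the subset filename comes first and the data file paths follow in the order given; the first part matches the description of ${1} above, the rest is an assumption about argument order. split_job_args() and the filename "subset_01.csv" are hypothetical.

```r
# Hypothetical helper: split positional arguments into the per-job subset
# file and the baked-in data file paths.
split_job_args <- function(args = commandArgs(trailingOnly = TRUE),
                           mode = c("single", "multiple")) {
  mode <- match.arg(mode)
  if (mode == "multiple") {
    if (length(args) < 1) {
      stop("Expected the subset filename as the first argument.")
    }
    # First argument is the subset file; anything after it is a data file.
    list(subset_file = args[1], data_files = args[-1])
  } else {
    # Single mode: every argument is a data file path.
    list(subset_file = NULL, data_files = args)
  }
}

job <- split_job_args(
  c("subset_01.csv", "/home/data-raw/sample.csv"),
  mode = "multiple"
)
job$subset_file
job$data_files
```

Passing args explicitly, as in the call above, makes the helper easy to try interactively; on HTCondor the default commandArgs(trailingOnly = TRUE) would supply the values.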

Examples

# Single-job executable script with baked-in data
htc_gen_executable(
  r_script   = "analysis.R",
  data_files = "data-raw/sample.csv",
  output     = tempdir()
)

# Multiple-job executable script
htc_gen_executable(
  r_script = "analysis.R",
  mode     = "multiple",
  output   = tempdir()
)

# Custom names with annotations
htc_gen_executable(
  output_file = "run.sh",
  r_script    = "run-analysis.R",
  data_files  = c("data-raw/train.csv", "data-raw/test.csv"),
  comments    = TRUE,
  verbose     = TRUE,
  output      = tempdir()
)
#> Writing shebang line
#> Writing working directory change
#> Writing results folder creation
#> Writing Rscript execution line (mode: single)
#> Writing compression line
#> Set executable permissions on /tmp/RtmpCr56QM/run.sh
#> ✔ Executable script written to /tmp/RtmpCr56QM/run.sh