
Generate an HTCondor executable shell script for an R job
Source: R/htc-gen-executable.R
htc_gen_executable() writes a ready-to-use bash script (.sh) that
HTCondor runs inside the container for each job. The script changes to
HTCondor's writable scratch directory, creates a results folder, runs
the R script via Rscript using absolute paths to the baked-in files,
and compresses the results into a tarball for transfer back to the
submit node.
Usage
htc_gen_executable(
  output_file = "job.sh",
  r_script = NULL,
  data_files = NULL,
  results_folder = "results",
  home_dir = "/home",
  mode = "single",
  set_executable = TRUE,
  verbose = FALSE,
  comments = FALSE,
  output = "."
)
Arguments
- output_file
  A character string. Name of the shell script to write. Must end in ".sh". Defaults to "job.sh".
- r_script
  A character string. Name of the R script that HTCondor will run inside the container, e.g. "analysis.R". Must be supplied explicitly; there is no default. If you used toolero::create_qmd(use_purl = TRUE), the script is the .R file produced by purl.R after rendering.
- data_files
  A character vector or NULL. Paths to data files baked into the container that should be passed to the R script as positional arguments. These are converted to absolute paths inside the container (e.g. "data-raw/sample.csv" becomes "/home/data-raw/sample.csv"). The R script receives them via commandArgs(trailingOnly = TRUE). Defaults to NULL.
- results_folder
  A character string. Name of the folder created in the scratch directory to hold job outputs before compression. Defaults to "results".
- home_dir
  A character string. The working directory inside the container where baked-in files live. Used to construct absolute paths for Rscript and data file arguments. Must match the home_dir used in containr::generate_dockerfile(). Defaults to "/home".
- mode
  A character string. Execution mode. "single" (the default) runs the R script with only the data file arguments (if any), producing a single fixed-name results tarball. "multiple" also passes the subset filename as the first positional argument via ${1}, producing a per-job tarball named ${1}-results.tar.gz. Must match the mode used in htc_gen_submit().
- set_executable
  Logical. If TRUE, sets executable permissions on the generated script via Sys.chmod() so the file is ready to copy to the CHTC submit node without any additional steps. Defaults to TRUE. Set to FALSE if you prefer to manage permissions manually, in which case you must run chmod +x on the script before submitting your job.
- verbose
  Logical. If TRUE, prints progress messages as each section of the script is written. Defaults to FALSE.
- comments
  Logical. If TRUE, annotates each section of the script with a comment explaining what it does. Useful for researchers learning the HTCondor executable script conventions. Defaults to FALSE.
- output
  A character string. Directory where the shell script will be written. Defaults to "." (current working directory).
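As a rough illustration of how data_files and mode shape what the R script sees, the sketch below mimics the positional arguments described above. The helper parse_job_args() and the file names are hypothetical and not part of the package. The only behaviour taken from the text is that data file paths are prefixed with home_dir and that "multiple" mode places the subset filename first.

```r
# Hypothetical helper (not part of the package): split the positional
# arguments an R script would get from commandArgs(trailingOnly = TRUE).
parse_job_args <- function(args, mode = c("single", "multiple")) {
  mode <- match.arg(mode)
  if (mode == "multiple") {
    if (length(args) < 1) {
      stop("expected a subset filename as the first argument")
    }
    # First argument is the subset filename, the rest are data files.
    list(subset = args[1], data_files = args[-1])
  } else {
    list(subset = NULL, data_files = args)
  }
}

# How relative data_files would map to absolute container paths,
# assuming home_dir = "/home" as in the defaults above.
container_paths <- file.path("/home", c("data-raw/train.csv", "data-raw/test.csv"))

# Simulated argument vectors; a real job would use
# commandArgs(trailingOnly = TRUE) instead.
single_args   <- parse_job_args(container_paths, mode = "single")
multiple_args <- parse_job_args(c("subset_01.csv", container_paths), mode = "multiple")
```

Keeping the parsing in one small function makes it easier to test the R script locally with a hand-written argument vector before submitting any jobs.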
Value
Called for its side effects. Writes a bash script to
file.path(output, output_file) and sets executable permissions when
set_executable = TRUE. Returns invisible(NULL).
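One way to confirm these side effects after a call is to check that the file exists and is executable. The sketch below uses a stand-in file written with writeLines() rather than a real htc_gen_executable() call, and the "0755" mode is an assumption: the text above only states that Sys.chmod() is used.

```r
# Stand-in for the generated script; the contents are illustrative only.
script_path <- file.path(tempdir(), "job.sh")
writeLines(c("#!/bin/bash", "echo 'placeholder'"), script_path)

# Assumed mode: the documentation only says that Sys.chmod() is called.
Sys.chmod(script_path, mode = "0755")

# file.access() returns 0 when the requested access (1 = execute) is allowed.
script_exists <- file.exists(script_path)
is_executable <- unname(file.access(script_path, mode = 1)) == 0
```

The execute check is meaningful on Unix-like systems; on Windows, file permissions behave differently and the result may not reflect what the submit node will see.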
How file paths work inside the container
The generated script uses two directories:
- Reading: the R script and data files are baked into the container
  at build time by containr::generate_dockerfile(). They live under
  home_dir (default "/home"). The Rscript line uses an absolute
  path (e.g. Rscript /home/analysis.R) so the script is found
  regardless of the working directory.
- Writing: the script changes to HTCondor's scratch directory
  (_CONDOR_SCRATCH_DIR) before creating the results folder. This
  directory is writable and is where HTCondor looks for
  transfer_output_files. The R script writes outputs to "results/"
  using a relative path, which resolves to the scratch directory.
This separation means the R script stays portable – "results/" works
in RStudio, in quarto render, and on HTCondor – while the .sh
script handles the HTCondor-specific directory setup.
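A minimal sketch of an R script that follows this read/write split is shown below. It is illustrative only: the summary step and the output file name are made up, and a temporary directory stands in for both the baked-in data location and HTCondor's scratch directory so the snippet can run anywhere.

```r
# Stand-ins so the sketch runs outside a container: in a real job the
# input would be an absolute path under home_dir and the working
# directory would be HTCondor's scratch directory.
scratch <- file.path(tempdir(), "scratch-demo")
dir.create(scratch, showWarnings = FALSE, recursive = TRUE)
input_file <- file.path(tempdir(), "sample.csv")
write.csv(data.frame(x = 1:3, y = c(2, 4, 6)), input_file, row.names = FALSE)

old_wd <- setwd(scratch)  # mimics the directory change made by the shell script

# Read from an absolute path, write to a relative "results/" folder.
dat <- read.csv(input_file)
dir.create("results", showWarnings = FALSE)  # no-op if the folder already exists
write.csv(
  data.frame(mean_y = mean(dat$y)),
  file.path("results", "summary.csv"),
  row.names = FALSE
)

output_path <- normalizePath(file.path("results", "summary.csv"))
setwd(old_wd)  # restore the original working directory
```

Because the R code never mentions the scratch directory itself, the same script can be run unchanged from an interactive session, where "results/" simply resolves against the project directory.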
Relationship to htc_gen_submit()
The executable script generated by htc_gen_executable() is the file
referenced by the executable argument in htc_gen_submit(). The two
functions should always use the same mode. In "multiple" mode,
HTCondor passes each subset filename to the script as ${1}, which is
forwarded to the R script as a positional argument. The R script must
be written to accept this argument – the recommended approach is
toolero::detect_execution_context():
context <- toolero::detect_execution_context()
input_file <- switch(context,
  interactive = "data-raw/sample.csv",
  quarto = params$input_file,
  rscript = commandArgs(trailingOnly = TRUE)[1]
)
Examples
# Single-job executable script with baked-in data
htc_gen_executable(
  r_script = "analysis.R",
  data_files = "data-raw/sample.csv",
  output = tempdir()
)
# Multiple-job executable script
htc_gen_executable(
  r_script = "analysis.R",
  mode = "multiple",
  output = tempdir()
)
# Custom names with annotations
htc_gen_executable(
  output_file = "run.sh",
  r_script = "run-analysis.R",
  data_files = c("data-raw/train.csv", "data-raw/test.csv"),
  comments = TRUE,
  verbose = TRUE,
  output = tempdir()
)
#> Writing shebang line
#> Writing working directory change
#> Writing results folder creation
#> Writing Rscript execution line (mode: single)
#> Writing compression line
#> Set executable permissions on /tmp/RtmpCr56QM/run.sh
#> ✔ Executable script written to /tmp/RtmpCr56QM/run.sh