
Getting started with toolero
Erwin Lares
Created 2026-04-30 | Last updated 2026-05-21
Source: vignettes/toolero-intro.Rmd
Background and motivation
toolero grew out of a recurring observation made while
teaching and supporting researchers at UW-Madison: the habits that make
a project reproducible, shareable, and maintainable are easiest to adopt
at the very beginning — and hardest to retrofit once a project is
already underway.
The package is heavily influenced by the workflows taught in
workshops run by The Carpentries
and the UW-Madison
Libraries. Those workshops emphasize consistent project
organization, version control, and reproducible data practices as
foundational skills — not advanced topics. toolero tries to
operationalize those principles into a small set of functions that
reduce the friction of doing the right thing from the start.
The theming and branding support in toolero is
specifically tailored to UW-Madison. The Quarto-based reporting
templates are baked into the package as defaults. If you are not at
UW-Madison, the branding files are optional, and you can replace them with
your own. The rest of the package works independently of them.
Who is this for?
toolero is designed for researchers and analysts
who:
- Work primarily in R and use RStudio as their IDE
- Write reports or analyses in Quarto
- Want consistent, reproducible project structure without having to think about it every time
- May need to publish content to the UW-Madison Knowledge Base
The package is intentionally small. It does not try to be comprehensive. It tries to make the right defaults easy to reach for from the first line of code.
Installation
You can install toolero from CRAN:
install.packages("toolero")
Or install the development version from GitHub:
pak::pak("erwinlares/toolero")
Project setup: init_project() and create_qmd()
These two functions are designed to be used together, in order.
init_project() creates the scaffold;
create_qmd() populates it with a working Quarto
document.
Starting with init_project()
Starting a new R project usually means the same manual steps every
time: create a folder, set up an RStudio project, create subdirectories
for data and scripts, initialize renv, initialize
git. None of these steps is hard on its own, but skipping
any of them — especially early on — tends to create friction later.
init_project() handles all of this in a single call:
library(toolero)
init_project(path = "~/Documents/my-project")
This creates a new RStudio project at the specified path with the following folder structure already in place:
my-project/
├── data/ # input data
├── data-raw/ # original, unprocessed data
├── R/ # reusable functions
├── scripts/ # analysis scripts
├── plots/ # generated visualizations
├── images/ # static images and assets
├── results/ # processed outputs and tables
└── docs/ # notes, manuscripts, Quarto documents
Why this structure? The folder layout is opinionated but not arbitrary. Separating data/ from data-raw/ makes it clear which files are original and which have been processed. Keeping R/ distinct from scripts/ encourages moving reusable logic into functions over time, which is a natural step toward more maintainable code.
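For comparison, the folder layout alone can be reproduced by hand in base R. This is a minimal sketch, not the package's implementation: scaffold_folders() is a hypothetical name, and it covers only the directories, not the RStudio project file, renv, or git.

```r
# Hypothetical helper: creates the folder layout listed above with base R only.
# It does not create an RStudio project, and it does not initialize renv or git.
scaffold_folders <- function(path, extra_folders = character()) {
  folders <- c("data", "data-raw", "R", "scripts", "plots",
               "images", "results", "docs", extra_folders)
  for (folder in folders) {
    # recursive = TRUE also creates the project directory itself if needed
    dir.create(file.path(path, folder), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(file.path(path, folders))
}

# Try it in a temporary location so nothing is written to your home directory
project_dir <- file.path(tempdir(), "my-project")
created <- scaffold_folders(project_dir, extra_folders = "notebooks")
```

Doing this manually works, but it is exactly the kind of step that gets skipped or done slightly differently each time.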
By default, init_project() also initializes
renv and git. This means the project is
reproducible and version-controlled from the first commit.
Why renv and git by default? renv ensures that the packages your project depends on are recorded and reproducible. git provides a full history of changes. Both are much easier to set up at the start than to retrofit later.
If your project needs folders beyond the defaults:
init_project(
  path = "~/Documents/my-project",
  extra_folders = c("notebooks", "presentations")
)
To apply UW-Madison branding assets to the project:
init_project(
  path = "~/Documents/my-project",
  uw_branding = TRUE
)
This creates an assets/ folder and populates it with
styles.css, header.html, and
rci-banner.png — the same assets used in the Quarto
template scaffolded by create_qmd().
Adding a Quarto document with create_qmd()
Once the project exists, create_qmd() adds a working
Quarto document to it. The function has two modes controlled by
include_examples, and several optional features that can be
mixed and matched.
With examples (the default)
When include_examples = TRUE (the default),
create_qmd() scaffolds a complete, runnable analysis
project:
create_qmd(path = "~/Documents/my-project", filename = "analysis.qmd")
This creates:
- analysis.qmd – a Quarto document with a fully populated YAML header, three-context input resolution via detect_execution_context(), a grouped summary, a scatterplot, and a results-saving section. The document is ready to render immediately.
- data-raw/sample.csv – a subset of the Palmer Penguins dataset to develop against. The template references this file in the params block of the YAML header.
- assets/logo.png – a placeholder logo that reads “your logo goes here.” Replace it with your own branding when you’re ready.
- _quarto.yml – a project file with a post-render hook that runs purl.R
- R/purl.R – extracts R code from the rendered document into a companion .R file automatically on every render
The idea is that you can render the document as-is, see results, and then progressively replace the sample analysis with your own. The sample data, the analysis blocks, and the results-saving pattern are all working examples you can study before modifying.
Without examples
When include_examples = FALSE, create_qmd()
creates a minimal skeleton with no sample data and no pre-filled
analysis:
create_qmd(
  path = "~/Documents/my-project",
  filename = "analysis.qmd",
  include_examples = FALSE
)
This creates:
- analysis.qmd – a Quarto document with the YAML header (title, author, format settings) and a setup chunk that loads library(toolero). The body has a single ## Introduction heading and an HTML comment prompting you to add your content. No params block, no analysis code, no references to sample data.
- _quarto.yml and R/purl.R – the purl hook is included by default regardless of include_examples
No data-raw/ folder is created. No
sample.csv is copied. No placeholder logo is placed in
assets/. The document is a blank canvas with just enough
structure to render.
Use this mode when you already know what your analysis looks like and don’t need the worked example as a starting point.
Custom styling
The use_style argument controls whether CSS and header
assets are wired into the YAML. It works independently of
include_examples:
# Blank document with UW branding (assumes init_project(uw_branding = TRUE) was called)
create_qmd(
  path = "~/Documents/my-project",
  filename = "report.qmd",
  include_examples = FALSE,
  use_style = TRUE
)
# Blank document with custom branding from a different directory
create_qmd(
  path = "~/Documents/my-project",
  filename = "report.qmd",
  include_examples = FALSE,
  use_style = "my-branding/"
)
When use_style = TRUE, the function scans
assets/ for .css and .html files
and adds them to the YAML (css: and
include-before-body: respectively). When
use_style is a directory path, it scans that directory
instead. If the directory contains multiple .css or
.html files, the function errors and asks you to specify
which one via yaml_data.
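The scanning rule described above can be sketched in base R. This is an illustration under assumptions, not toolero's code: find_single_asset() is a hypothetical helper, and the error wording is invented for the example.

```r
# Hypothetical helper: return the single file with the given extension in a
# directory, NULL if there is none, and stop if there is more than one.
find_single_asset <- function(dir, extension) {
  pattern <- paste0("\\.", extension, "$")
  hits <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(hits) > 1) {
    stop("Multiple .", extension, " files found in '", dir,
         "'; specify which one to use.", call. = FALSE)
  }
  if (length(hits) == 0) NULL else hits
}

# Build a throwaway assets directory to scan
assets_dir <- file.path(tempdir(), "assets-demo")
dir.create(assets_dir, showWarnings = FALSE)
file.create(file.path(assets_dir, c("styles.css", "header.html")))

# The two YAML keys named in the text, filled from the scan
yaml_additions <- list(
  css = find_single_asset(assets_dir, "css"),
  `include-before-body` = find_single_asset(assets_dir, "html")
)
```

Failing loudly on ambiguity, rather than picking the first match, keeps the rendered output from depending on file ordering.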
Styling is managed by init_project(uw_branding = TRUE),
which copies the UW-Madison branding files into assets/.
The create_qmd() function does not copy style assets itself
– it only wires up what’s already there.
The purl hook
By default, create_qmd() scaffolds a post-render hook
that extracts R code from the rendered document into a companion
.R file:
- _quarto.yml – contains the post-render: ["Rscript R/purl.R"] directive
- R/purl.R – scans the project root for .qmd files and runs knitr::purl() on each one
The hook runs automatically on every quarto render, so
the .R file always reflects the current state of the
.qmd. This is useful for sharing the analysis as a script,
running it on a remote cluster via submitr, or archiving
the code independently of the document.
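To make the mechanism concrete, here is a rough sketch of what a script like this could contain. It is not the R/purl.R that toolero ships: companion_paths() and purl_all() are hypothetical names, and the sketch assumes the knitr package is installed.

```r
# Map every .qmd file in a directory to a companion .R path next to it.
companion_paths <- function(project_dir = ".") {
  qmd_files <- list.files(project_dir, pattern = "\\.qmd$", full.names = TRUE)
  r_files <- sub("\\.qmd$", ".R", qmd_files)
  stats::setNames(r_files, qmd_files)
}

# Extract the R code from each document (requires knitr).
purl_all <- function(project_dir = ".") {
  targets <- companion_paths(project_dir)
  for (qmd in names(targets)) {
    # documentation = 0 keeps only the code and discards the prose
    knitr::purl(qmd, output = targets[[qmd]], documentation = 0, quiet = TRUE)
  }
  invisible(targets)
}
```

A post-render script built this way would end with a call such as purl_all("."), so that Rscript runs the extraction each time the project is rendered.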
Set use_purl = FALSE to skip the hook if you don’t need
the .R companion:
create_qmd(
  path = "~/Documents/my-project",
  filename = "notes.qmd",
  use_purl = FALSE,
  include_examples = FALSE
)
Pre-populating the YAML header
The yaml_data argument accepts a path to a YAML file
whose keys overwrite the corresponding placeholders in the template.
Keys not present in the file are left as-is:
create_qmd(
  path = "~/Documents/my-project",
  filename = "analysis.qmd",
  yaml_data = "~/my-metadata.yml"
)
Where my-metadata.yml might look like:
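As an illustration only, a metadata file can be written from R with writeLines(). The keys and values below are assumptions chosen for the example (title, author, description, and categories are the header fields mentioned elsewhere in this vignette); check the generated template for the placeholders it actually contains.

```r
# Hypothetical metadata file: key names are assumed to match template
# placeholders, and all values are made up for the example.
metadata_lines <- c(
  'title: "Penguin bill measurements"',
  'author: "Your Name"',
  'description: "Summary of bill length by species."',
  'categories: [penguins, example]'
)

# Written to a temporary directory here; in practice you would keep the file
# somewhere stable and pass its path to yaml_data.
metadata_path <- file.path(tempdir(), "my-metadata.yml")
writeLines(metadata_lines, metadata_path)
```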
This works with both include_examples = TRUE and
FALSE. When combined with use_style, the YAML
substitution runs after the style injection, so yaml_data
can override any auto-generated keys if needed.
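The overwrite rule (supplied keys win, everything else is left alone) behaves much like utils::modifyList() on named lists. The sketch below uses that analogy with made-up header values; it does not show how create_qmd() performs the substitution internally. One difference to keep in mind: modifyList() also adds keys that are absent from the first list, which may or may not match what create_qmd() does.

```r
# Placeholder values standing in for a template's YAML header (illustrative).
template_header <- list(
  title = "Untitled",
  author = "Your Name",
  format = "html"
)

# Values a user might supply through a metadata file.
user_metadata <- list(
  title = "Penguin bill measurements",
  description = "Summary of bill length by species."
)

# Matching keys are replaced, untouched keys keep their template values.
merged_header <- utils::modifyList(template_header, user_metadata)
```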
Summary of what gets created
| File | include_examples = TRUE | include_examples = FALSE |
|---|---|---|
| analysis.qmd | Full example with analysis blocks and params | Skeleton with YAML and empty body |
| data-raw/sample.csv | Yes | No |
| assets/logo.png | Yes | No |
| _quarto.yml | Yes (when use_purl = TRUE) | Yes (when use_purl = TRUE) |
| R/purl.R | Yes (when use_purl = TRUE) | Yes (when use_purl = TRUE) |
| CSS/header in YAML | Only when use_style is set | Only when use_style is set |
Working with data: read_clean_csv() and write_by_group()
These two functions address common friction points in day-to-day data
work. They are general-purpose utilities — useful in any R project, not
just ones set up with toolero.
Reading data with read_clean_csv()
read_clean_csv() combines
readr::read_csv(), janitor::clean_names(), and
optionally tidyr::drop_na() into a single call. The goal is
to get from a raw CSV to a clean, analysis-ready tibble in one step.
The simplest call reads the file and standardizes column names:
data <- read_clean_csv("data/my-file.csv")
Column names are automatically converted to lowercase with
underscores – consistent, predictable, and tidyverse-friendly. A column
called First Name becomes first_name.
Q1 Revenue ($) becomes q1_revenue.
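The renaming rule can be approximated in a few lines of base R. This is a deliberately simplified stand-in for janitor::clean_names(), which handles many more cases (duplicate names, accented characters, symbols such as % and #); approximate_clean_name() is a hypothetical helper written only for this illustration.

```r
# Simplified approximation of snake_case cleaning, base R only.
approximate_clean_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # each run of other characters becomes one "_"
  gsub("^_+|_+$", "", x)           # trim leading and trailing underscores
}

cleaned <- approximate_clean_name(c("First Name", "Q1 Revenue ($)"))
cleaned
# → "first_name" "q1_revenue"
```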
Handling missing values
By default, read_clean_csv() treats empty strings and
"NA" as missing – the same behavior as
readr::read_csv(). If your data uses other conventions for
missing values, pass them via the na argument:
# Treat "N/A", dots, and -999 as missing in addition to blanks and "NA"
data <- read_clean_csv("data/my-file.csv", na = c("", "NA", "N/A", ".", "-999"))
Dropping rows with missing values
The drop_na argument controls whether incomplete rows
are removed after reading. It accepts three forms:
# Keep all rows, including those with missing values (default)
data <- read_clean_csv("data/my-file.csv", drop_na = FALSE)
# Drop any row that has a missing value in any column
data <- read_clean_csv("data/my-file.csv", drop_na = TRUE)
# Drop rows only where specific columns are missing
data <- read_clean_csv("data/my-file.csv", drop_na = c("bill_length_mm", "sex"))
When rows are dropped, a message reports how many were removed and how many remain. This makes the data cleaning step visible in your console output rather than happening silently.
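The difference between dropping on all columns and dropping on selected columns is easy to see on a small data frame. The sketch below mimics the behavior in base R with a hypothetical drop_incomplete_rows() helper; the message wording is invented and is not the text read_clean_csv() prints.

```r
# Hypothetical helper: drop rows with missing values, either in any column
# (cols = NULL) or only in the named columns, and report what happened.
drop_incomplete_rows <- function(df, cols = NULL) {
  check <- if (is.null(cols)) df else df[, cols, drop = FALSE]
  keep <- stats::complete.cases(check)
  message(sum(!keep), " rows dropped, ", sum(keep), " rows remaining")
  df[keep, , drop = FALSE]
}

measurements <- data.frame(
  bill_length_mm = c(39.1, NA, 40.3),
  sex = c("male", "female", NA),
  island = c("Torgersen", NA, "Biscoe")
)

# Any missing value removes the row: only the first row survives
all_complete <- drop_incomplete_rows(measurements)

# Only bill_length_mm is checked: the first and third rows survive
bill_complete <- drop_incomplete_rows(measurements, cols = "bill_length_mm")
```

Restricting the check to the columns an analysis actually uses often keeps far more rows than a blanket drop.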
Ingest summary
Set summary = TRUE to print a brief report after
reading: row and column counts, how many column names were cleaned, and
the total number of missing values. The summary reflects the final state
of the tibble after any drop_na action:
data <- read_clean_csv("data/my-file.csv", summary = TRUE)
Seeing column type messages
By default, the column type messages from readr are
suppressed to keep the console clean. Set verbose = TRUE to
see them – useful when debugging unexpected column types:
data <- read_clean_csv("data/my-file.csv", verbose = TRUE)
Combining arguments
The arguments compose naturally. A common pattern for a first look at a new dataset combines custom missing value codes, row dropping, and the ingest summary:
data <- read_clean_csv(
  "data/my-file.csv",
  na = c("", "NA", "N/A", "."),
  drop_na = TRUE,
  summary = TRUE
)
Passing arguments through to readr
Any additional arguments are forwarded to
readr::read_csv() via .... This means you can
use col_types to force column types, skip to
skip header rows, or locale to handle non-standard decimal
separators:
# The readr:: prefix lets these calls work without library(readr)
# Force a column to character instead of letting readr guess
data <- read_clean_csv("data/my-file.csv", col_types = readr::cols(zip_code = readr::col_character()))
# Skip the first two rows (e.g. metadata rows before the header)
data <- read_clean_csv("data/my-file.csv", skip = 2)
# Handle European-style decimals (comma as decimal separator)
data <- read_clean_csv("data/my-file.csv", locale = readr::locale(decimal_mark = ","))
Splitting data by group with write_by_group()
When a data frame contains multiple groups that need to be written to
separate files, write_by_group() handles the split and the
write in a single call:
write_by_group(
  data = penguins,
  group_col = "species",
  output_dir = "results/by-species"
)
Output filenames are derived from the group values and sanitized for
use as file names — converted to lowercase with spaces and special
characters replaced by dashes. A group called Chinstrap
becomes chinstrap.csv. Palmer Penguins would
become palmer-penguins.csv.
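The split-and-write idea, including the filename rule, can be sketched with base R. Both sanitize_group_name() and write_groups_sketch() are hypothetical helpers written for this illustration; write_by_group() may differ in details such as how it handles two group values that sanitize to the same name.

```r
# Lowercase, replace runs of non-alphanumeric characters with a dash,
# then trim dashes from both ends.
sanitize_group_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "-", x)
  gsub("^-+|-+$", "", x)
}

# Write one CSV per distinct value of group_col and return the file paths.
write_groups_sketch <- function(data, group_col, output_dir) {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  pieces <- split(data, data[[group_col]], drop = TRUE)
  paths <- file.path(output_dir,
                     paste0(sanitize_group_name(names(pieces)), ".csv"))
  for (i in seq_along(pieces)) {
    utils::write.csv(pieces[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# Small made-up data frame standing in for a real dataset
birds <- data.frame(
  species = c("Chinstrap", "Adelie", "Chinstrap"),
  bill_length_mm = c(46.5, 39.1, 50.0)
)
written <- write_groups_sketch(birds, "species",
                               file.path(tempdir(), "by-species"))
```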
To also write a manifest listing the output files, group values, and row counts:
write_by_group(
  data = penguins,
  group_col = "species",
  output_dir = "results/by-species",
  manifest = TRUE
)
Execution context: detect_execution_context()
R code often needs to behave differently depending on where it is
running — interactively in RStudio, during a quarto render,
or as a batch Rscript job on a remote cluster.
detect_execution_context() identifies which of these three
environments is active and returns one of "interactive",
"quarto", or "rscript".
The canonical use case is resolving input file paths portably:
context <- detect_execution_context()
input_file <- switch(context,
  interactive = "data/sample.csv",
  quarto = params$input_file,
  rscript = commandArgs(trailingOnly = TRUE)[1]
)
This pattern is built into the template scaffolded by
create_qmd(), so you get it for free without having to
write it yourself.
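For intuition, a three-way check of this kind can be approximated with two signals that base R and knitr expose. This is a guess at one workable approach, not a description of how detect_execution_context() is implemented: guess_execution_context() is a hypothetical name, and the knitr.in.progress option signals any knitr render, so this sketch cannot tell Quarto apart from R Markdown.

```r
# Hypothetical approximation of a three-way execution context check.
guess_execution_context <- function() {
  if (isTRUE(getOption("knitr.in.progress"))) {
    # knitr sets this option while it is rendering a document
    "quarto"
  } else if (interactive()) {
    "interactive"
  } else {
    # Non-interactive and not rendering: most likely Rscript or R CMD BATCH
    "rscript"
  }
}

context <- guess_execution_context()
```

Checking the render signal first matters because a document can be rendered from within an interactive session.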
Knowledge Base export: generate_kb_xml()
This section is relevant only if you publish content to the UW-Madison Knowledge Base. If you do not, you can safely skip it.
The UW-Madison Knowledge Base requires content to be submitted as XML
with all visual assets embedded in the HTML body.
generate_kb_xml() automates this process entirely.
generate_kb_xml(
  html_path = "docs/analysis.html",
  output_dir = "exports"
)
The function:
- Infers the source .qmd from the HTML path (or accepts it explicitly via qmd_path)
- Re-renders the document with embed-resources: true so all CSS, images, and JavaScript are self-contained
- Extracts metadata from the .qmd YAML header: title → kb_title, description → kb_summary, categories → kb_keywords
- Produces a .xml file ready for direct KB import
This is why the description and categories
fields in the create_qmd() template matter — they flow
through automatically into the KB article metadata without any extra
work.
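The field mapping itself amounts to a rename of three header entries. The sketch below shows that rename on a made-up header list; it says nothing about how generate_kb_xml() parses the YAML or lays out the XML, and the header values are invented.

```r
# Made-up header values, as they might look after parsing a .qmd YAML header.
header <- list(
  title = "Penguin bill measurements",
  description = "Summary of bill length by species.",
  categories = c("penguins", "example"),
  author = "Your Name"
)

# Mapping described in the text: YAML key on the left, KB field on the right.
field_map <- c(
  title = "kb_title",
  description = "kb_summary",
  categories = "kb_keywords"
)

# Keep only the mapped fields and rename them; other keys are ignored here.
kb_metadata <- header[names(field_map)]
names(kb_metadata) <- unname(field_map)
```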
When importing into the KB, check the Decode HTML entity in body content option.