Purpose

Automated grading of student R scripts requires two things: capturing what a student’s script produces, and checking whether those objects meet substantive expectations. Existing tools handle one or the other — but not in a way designed for the heterogeneous output of a political science methods course, where a single problem set might ask for a cleaned data frame, a ggplot2 visualization, an OLS or fixed-effects model, and a formatted regression table.

Robjgrader provides a unified pipeline for this task. It records every analytical object a student’s script assigns or prints, and exposes a flexible validation interface that checks objects by name, type, or structural similarity against a reference solution — without requiring students to follow a fixed naming convention. Written answers can be graded via an LLM backend (Groq or any OpenAI-compatible endpoint), with optional per-student feedback that is constrained to be constructive, precise, and written in plain English.

Installation

# remotes::install_github("NiklasHaehn/robjgrader")
library(Robjgrader)

Workflow

A typical autograder script has three stages:

library(Robjgrader)

# 1. Record all objects produced by the student's script
records <- source_student_file()

# 2. Validate individual objects
res_df <- validate(records, "clean_data",
  checks = list(nrow = 1000, required = c("year", "country", "gdp_pc"))
)

res_model <- validate(records, "ols_model",
  checks = list(estimator = "lm", required = c("gdp_pc", "polity2"))
)

# 3. Run all test cases and write Gradescope-compatible JSON
test_cases <- list(
  list(name = "clean_data", result = res_df,    max_score = 20),
  list(name = "ols_model",  result = res_model, max_score = 30)
)

run_autograder(test_cases)

run_autograder() writes results to /autograder/results/results.json (Gradescope format) and prints a summary to the console.
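The output file follows Gradescope's results.json schema (a tests array with per-test name, score, max_score, and output). A minimal sketch of what the two test cases above might produce — the scores and failure message are illustrative, not actual package output:

```json
{
  "tests": [
    { "name": "clean_data", "score": 20, "max_score": 20,
      "output": "SUCCESS" },
    { "name": "ols_model", "score": 0, "max_score": 30,
      "output": "Required term 'polity2' not found in model." }
  ]
}
```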

Grading Written Answers

Problem sets that include written interpretation questions can be graded via validate_text(), which sends the student’s answer to an OpenAI-compatible LLM endpoint (default: Groq, temperature = 0 for deterministic results). Pass a question and a rubric, and the full grading prompt is built automatically.

Students can submit all written answers in a single file. Use split_student_text() to extract individual sections before grading. Sections are detected automatically: Markdown files are split on # headings; plain-text and PDF files are split on labelled prefixes (Q1:, Question 1., 1., 1)) with a double-blank-line fallback.
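When grading manually rather than through validate_text()'s section argument, splitting can be done up front. A sketch — the named-list return shape shown in the comment is an assumption for illustration, not confirmed package behavior:

```r
# Split a multi-question submission into its sections before grading.
text     <- find_student_text()
sections <- split_student_text(text)
# sections is assumed here to be a named list keyed by the detected
# headings, e.g. list(Q3 = "A 1-unit increase...", Q4 = "After adding...")
```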

# Student submits a single Markdown file:
#
#   # Q3: Interpretation
#   A 1-unit increase in log GDP per capita is associated with...
#
#   # Q4: Fixed Effects
#   After adding country fixed effects, the coefficient...

text <- find_student_text()            # auto-discovers the text file

res_q3 <- validate_text(
  text       = text,
  section    = "Q3",                   # matched by heading name
  n_sections = 2,
  question   = "Interpret the coefficient on gdp_pc in your regression.",
  rubric     = c(
    direction    = "Correctly identifies the sign of the coefficient.",
    magnitude    = "Interprets the substantive size of the effect.",
    significance = "Mentions statistical significance and its implications."
  ),
  reference  = "solution_q3.txt",      # optional model answer for comparison
  feedback   = TRUE                    # include per-student written feedback
)

res_q4 <- validate_text(
  text       = text,
  section    = 2L,                     # matched by position
  n_sections = 2,
  question   = "What changes in the fixed-effects model and why?",
  rubric     = c(direction_change = "...", mechanism = "...")
)

When feedback = TRUE, the LLM returns a short comment (max 3 sentences / 200 words) included in the Gradescope output for incorrect answers. Feedback is constrained to be constructive, precise, and in plain English — no greetings, sign-offs, or filler phrases.

Text results integrate directly into run_autograder() alongside object-based results. Partial credit is supported: the LLM returns a normalized score (0–1), which is multiplied by max_score:

test_cases <- list(
  list(name = "OLS model",      result = res_model, max_score = 30),
  list(name = "Interpretation", result = res_q3,    max_score = 20,
       visibility = "after_published")
)

run_autograder(test_cases)

Key Functions

Recording

  • record_script() — Parse and evaluate a student script expression by expression, capturing all assigned and printed objects
  • source_student_file() — Locate and record a student R submission automatically, excluding the calling script
  • get_records() — Retrieve and filter recorded objects by type
  • grab() — Wrap a single object from the global environment into a one-element records list
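For ad-hoc checks outside a full script run, grab() can wrap an object created interactively so the usual validation interface applies. A sketch, assuming the check names shown (estimator, required) work as in the workflow example above:

```r
# Grade a single model fitted at the console, without sourcing a script.
fit <- lm(mpg ~ wt, data = mtcars)
rec <- grab(fit)                      # one-element records list
res <- validate(rec, "fit",
  checks = list(estimator = "lm", required = "wt")
)
```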

Validation

  • validate() — Validate a recorded object by name, match criteria, or reference object
  • validate_text() — Grade a written answer via an LLM, with optional rubric, reference answer, section extraction, and per-student feedback
  • split_student_text() — Split a multi-question text file into named sections (Markdown headings or numbered prefixes)
  • find_student_text() — Auto-discover a student text submission file in the working directory
  • read_student_text() — Read a .txt, .md, or .pdf file into a character string

Running

  • run_autograder() — Execute a list of test cases and produce Gradescope-compatible output
  • result_to_outcome() — Convert a validation result to a "SUCCESS" string or formatted failure message

Supported Object Types

  • Data frames — dimensions, column names, column types
  • ggplot2 plots — geoms, aesthetic mappings, facets, labels, scales
  • Models — lm, glm, fixest (with fixed effects and clustering), lmer/glmer
  • Regression tables — gt, flextable, tinytable, huxtable; checks for model count, terms, GOF rows, and column labels
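Plot checks follow the same validate() pattern as data frames and models. A sketch, assuming check names that mirror the plot features listed above — the geoms and mapping check names are illustrative, not confirmed package API:

```r
# Check that the student's plot uses the expected geoms and aesthetics.
res_plot <- validate(records, "scatter_plot",
  checks = list(
    geoms   = c("point", "smooth"),          # expected layer geoms
    mapping = c(x = "gdp_pc", y = "polity2") # expected aesthetic mappings
  )
)
```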

Gradescope Integration

The package is designed for deployment on Gradescope via a Docker-based autograder. run_autograder() writes its output directly to the path Gradescope expects. Partial credit, per-test score weights, and descriptive failure messages are all supported.
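A typical Gradescope image pairs a setup script with an executable /autograder/run_autograder entry point; those paths follow Gradescope's own convention, while grade_submission.R is a hypothetical grading script containing the workflow shown earlier:

```shell
#!/usr/bin/env bash
# /autograder/run_autograder — Gradescope's fixed entry point.
# Runs the (hypothetical) grading script, which calls run_autograder()
# in R and writes /autograder/results/results.json.
cd /autograder/source
Rscript grade_submission.R
```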