Dataset Formats

Supported File Formats

hprobes accepts three file formats:

Format	Extension	Notes
JSON Lines	`.jsonl`	One JSON object per line (recommended)
JSON	`.json`	Array of objects
Parquet	`.parquet`	Apache Parquet columnar format
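
As an illustration of the first two formats, the sketch below writes the same samples as JSON Lines and as a JSON array using only the Python standard library. The helper names (`write_jsonl`, `read_jsonl`, `write_json_array`) and the file paths are hypothetical and are not part of hprobes; Parquet is omitted because it needs a third-party writer.

```python
import json
import tempfile
from pathlib import Path

# Two illustrative MCQ samples (dict options and list options).
samples = [
    {
        "question": "Which organ is most commonly affected in sarcoidosis?",
        "options": {"A": "Heart", "B": "Lung", "C": "Liver", "D": "Kidney"},
        "answer": "B",
    },
    {
        "question": "Which organ is most commonly affected?",
        "options": ["Heart", "Lung", "Liver", "Kidney"],
        "answer": "B",
    },
]


def write_jsonl(path, records):
    """Write one JSON object per line (.jsonl)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_json_array(path, records):
    """Write all records as a single JSON array (.json)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Hypothetical output location, used only for this example.
out_dir = Path(tempfile.mkdtemp())
jsonl_path = out_dir / "dataset.jsonl"
json_path = out_dir / "dataset.json"

write_jsonl(jsonl_path, samples)
write_json_array(json_path, samples)
```

JSON Lines keeps each sample on its own line, so large files can be streamed or appended to without parsing the whole dataset.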

MCQ Sample Structure

Each sample is a JSON object (a dict in Python) with a question, a set of answer options, and a ground-truth answer:

{
  "question": "Which organ is most commonly affected in sarcoidosis?",
  "options": {"A": "Heart", "B": "Lung", "C": "Liver", "D": "Kidney"},
  "answer": "B"
}

Options can also be a list, in which case letters are assigned automatically in order (A, B, C, ...):

{
  "question": "Which organ is most commonly affected?",
  "options": ["Heart", "Lung", "Liver", "Kidney"],
  "answer": "B"
}
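
The letter assignment for list options can be pictured as follows. This is a minimal sketch, assuming letters are handed out in list order starting from A; `normalize_options` is a hypothetical helper, not an hprobes function.

```python
from string import ascii_uppercase


def normalize_options(options):
    """Return options as a letter -> text dict.

    Dict options are copied unchanged; list options get letters
    A, B, C, ... in list order (an assumption for this sketch).
    """
    if isinstance(options, dict):
        return dict(options)
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options to label with single letters")
    return dict(zip(ascii_uppercase, options))


lettered = normalize_options(["Heart", "Lung", "Liver", "Kidney"])
```

Under that assumption, the answer `"B"` in the list-style sample refers to the second entry, `"Lung"`.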

Auto-Detected Formats

The CLI auto-detects three common MCQ dataset layouts:

Format	Options Key	Answer Key	Source
`mmlu`	`choices`	`answer`	MMLU / MMLU-Pro
`medqa`	`options`	`answer_idx`	MedQA
`medmcqa`	`options`	`cop`	MedMCQA

Auto-detection works by inspecting the keys in the first sample. To override the detected layout, pass `--format`:

hprobes run --model google/gemma-3-4b-it --data dataset.jsonl --format mmlu
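
Key-based detection of the three layouts listed above could look roughly like the sketch below. It is an illustration built only from the table of option and answer keys; `FORMAT_KEYS` and `detect_format` are hypothetical names, and the real detection logic in hprobes may differ.

```python
# Option key and answer key for each layout, taken from the table above.
FORMAT_KEYS = {
    "mmlu": ("choices", "answer"),
    "medqa": ("options", "answer_idx"),
    "medmcqa": ("options", "cop"),
}


def detect_format(first_sample):
    """Return the first layout whose two keys are both present, else None."""
    for name, (options_key, answer_key) in FORMAT_KEYS.items():
        if options_key in first_sample and answer_key in first_sample:
            return name
    return None


mmlu_like = {"question": "q", "choices": ["a", "b"], "answer": 0}
medqa_like = {"question": "q", "options": {"A": "a"}, "answer_idx": "A"}
medmcqa_like = {"question": "q", "options": ["a", "b"], "cop": 1}
plain = {"question": "q", "options": ["a", "b"], "answer": "A"}
```

A sample that uses the default `options` and `answer` keys matches none of the three layouts in this sketch, so `detect_format(plain)` returns `None`.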

In Python, pass the keys directly:

probe.fit(samples, options_key="choices", answer_key="answer")

Answer Formats

The answer field can be given in several forms, each of which hprobes normalizes to an option letter:

Input	Parsed As
`"A"`	A
`"a"`	A
`0`	A (zero-indexed)
`1`	B (zero-indexed by default; `medmcqa` answers are one-indexed)
`["A"]`	A (list unwrapped)
`"Ans. The key is B..."`	B (free-text extraction)
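
The normalization rules in the table can be approximated with a short function. This is a sketch under stated assumptions, not the parser hprobes ships: `normalize_answer` and its `one_indexed` flag are hypothetical, integers are treated as zero-indexed unless the flag is set, and free-text extraction simply takes the first standalone capital letter from A to J.

```python
import re

LETTERS = "ABCDEFGHIJ"


def normalize_answer(value, one_indexed=False):
    """Map a raw answer value to an option letter (illustrative only)."""
    # Unwrap single-element lists such as ["A"].
    if isinstance(value, (list, tuple)):
        if len(value) != 1:
            raise ValueError(f"expected exactly one answer, got {value!r}")
        value = value[0]

    # bool is a subclass of int, so reject it before the integer branch.
    if isinstance(value, bool):
        raise ValueError(f"unsupported answer value: {value!r}")

    # Integer answers are indices into the option list.
    if isinstance(value, int):
        index = value - 1 if one_indexed else value
        if not 0 <= index < len(LETTERS):
            raise ValueError(f"answer index out of range: {value!r}")
        return LETTERS[index]

    if isinstance(value, str):
        text = value.strip()
        # A bare letter, in either case.
        if len(text) == 1 and text.upper() in LETTERS:
            return text.upper()
        # Free text: first capital letter that stands alone as a word.
        match = re.search(r"\b([A-J])\b", text)
        if match:
            return match.group(1)

    raise ValueError(f"could not parse answer: {value!r}")
```

The free-text rule is deliberately naive: a sentence that begins with the article "A" would be read as option A, so a production parser would need stricter patterns.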

Response-Based Format

For open-ended (free-text) mode with `fit_from_responses()`, each sample has the following structure:

{
  "question": "What is the most common cause of community-acquired pneumonia?",
  "response": "The most common cause is Streptococcus pneumoniae, which accounts for...",
  "answer_tokens": ["Streptococcus", "pneumoniae"],
  "judge": "true"
}

Key	Description
`question`	The input question
`response`	The model's full generated response
`answer_tokens`	List of token strings marking the factual answer span
`judge`	Correctness label: `"true"`/`"false"` or `1`/`0`
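
Before fitting, it can help to check that each record has the four keys above and to normalize the `judge` label. The sketch below is one way to do that; `parse_judge` and `validate_response_sample` are hypothetical helpers, not part of hprobes, and the accepted label values are taken from the table.

```python
REQUIRED_KEYS = ("question", "response", "answer_tokens", "judge")


def parse_judge(value):
    """Convert "true"/"false" or 1/0 into a bool."""
    # bool is a subclass of int, so True/False are accepted here as well.
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower() == "true"
    raise ValueError(f"unrecognized judge label: {value!r}")


def validate_response_sample(sample):
    """Check the record shape and return the correctness label as a bool."""
    missing = [key for key in REQUIRED_KEYS if key not in sample]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    tokens = sample["answer_tokens"]
    if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
        raise TypeError("answer_tokens must be a list of strings")
    return parse_judge(sample["judge"])


record = {
    "question": "What is the most common cause of community-acquired pneumonia?",
    "response": "The most common cause is Streptococcus pneumoniae.",
    "answer_tokens": ["Streptococcus", "pneumoniae"],
    "judge": "true",
}
is_correct = validate_response_sample(record)
```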

Custom Prompt Functions

For non-standard formats, pass a custom `prompt_fn` that takes a sample and returns the prompt string:

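def my_formatter(sample):
    """Build the prompt text from a sample with "query" and "choices" keys."""
    q = sample["query"]
    # One option per line, e.g. "A) Heart".
    opts = "\n".join(f"{k}) {v}" for k, v in sample["choices"].items())
    return f"{q}\n{opts}"

# answer_key names the field that holds the ground-truth answer.
probe.fit(samples, prompt_fn=my_formatter, answer_key="correct")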
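
A formatter can be checked on a single sample before it is handed to `probe.fit`. The example below repeats the formatter so that it runs on its own; the sample record is made up for illustration and uses the `query`, `choices`, and `correct` keys that the formatter and `answer_key` expect.

```python
def my_formatter(sample):
    """Build the prompt text from a sample with "query" and "choices" keys."""
    q = sample["query"]
    opts = "\n".join(f"{k}) {v}" for k, v in sample["choices"].items())
    return f"{q}\n{opts}"


# Illustrative sample in a non-standard layout.
sample = {
    "query": "Which organ is most commonly affected in sarcoidosis?",
    "choices": {"A": "Heart", "B": "Lung", "C": "Liver", "D": "Kidney"},
    "correct": "B",
}

prompt = my_formatter(sample)
print(prompt)
```

The printed prompt is the question followed by one `letter) text` line per option, in the order the `choices` dict was written.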