CLI Reference
hprobes provides three CLI subcommands: run, responses, and transfer.
hprobes run
Fit, score, and causally validate a probe on an MCQ dataset.
hprobes run \
--model google/gemma-3-4b-it \
--data dataset.jsonl \
--samples 500 \
--l1-c 0.01 \
--batch-size 8
Required Arguments
| Argument | Description |
|---|---|
| `--model` | HuggingFace model ID (e.g. google/gemma-3-4b-it) |
| `--data` | Path to `.jsonl`, `.json`, or `.parquet` dataset file |
Optional Arguments
| Argument | Default | Description |
|---|---|---|
| `--format` | `auto` | Dataset format: `auto`, `mmlu`, `medqa`, `medmcqa` |
| `--samples` | `-1` (all) | Number of samples to use |
| `--no-contrastive` | none | Disable contrastive labeling (use binary mode) |
| `--output` | auto-named | Base path for output files (`.json` + `.pkl`) |
| `--device` | `auto` | Device: `auto`, `cpu`, `mps`, `cuda` |
| `--dtype` | `auto` | Precision: `auto`, `float16`, `bfloat16`, `float32` |
| `--l1-c` | `0.5` | Inverse L1 regularisation strength (Python API default is 0.01) |
| `--seed` | `42` | Random seed |
| `--layer-stride` | `1` | Sample every Nth layer |
| `--validation-split` | `0.2` | Fraction held out for validation |
| `--max-tokens` | `1024` | Max input tokens before truncation |
| `--alphas` | `0.0,0.5,1.0,1.5,2.0` | Comma-separated alpha values for causal validation |
| `--batch-size` | `1` | Batch size for CETT extraction |
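
To make the value formats of `--alphas`, `--layer-stride`, and `--validation-split` concrete, here is a minimal Python sketch of one plausible reading of each. The helper names are hypothetical and not part of hprobes. Starting the stride at layer 0 and rounding the validation count are assumptions, not documented behaviour.

```python
def parse_alphas(spec: str) -> list[float]:
    """Turn a comma-separated string such as "0.0,0.5,1.0" into floats."""
    return [float(part) for part in spec.split(",") if part.strip()]


def strided_layers(num_layers: int, stride: int) -> list[int]:
    """Pick every `stride`-th layer index, starting at layer 0 (assumed)."""
    return list(range(0, num_layers, stride))


def split_sizes(num_samples: int, validation_split: float) -> tuple[int, int]:
    """Return (train, validation) counts for a held-out fraction (rounding assumed)."""
    n_val = int(round(num_samples * validation_split))
    return num_samples - n_val, n_val


alphas = parse_alphas("0.0,0.5,1.0,1.5,2.0")      # the documented default string
layers = strided_layers(num_layers=12, stride=4)  # 12 layers is an illustrative count
train_n, val_n = split_sizes(500, 0.2)            # 500 samples, default split
print(alphas, layers, train_n, val_n)
```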
Output
Produces two files:
- `<output>.json`: Human-readable results (H-Neurons, AUROC, accuracy, causal validation)
- `<output>.pkl`: Serialized classifier for later loading
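
Both files share the base path given by `--output`, so they can be read back together. The sketch below uses only the Python standard library. `load_run_outputs` is a hypothetical helper, not an hprobes function, and the layout of the JSON is not assumed beyond it being valid JSON. Unpickling the classifier will likely need the libraries it was built with to be importable.

```python
import json
import pickle
from pathlib import Path


def load_run_outputs(base_path: str):
    """Load the <output>.json results and <output>.pkl classifier for one run."""
    base = Path(base_path)
    # Append suffixes instead of using with_suffix(), so a base name that
    # already contains dots is not truncated.
    json_path = base.parent / (base.name + ".json")
    pkl_path = base.parent / (base.name + ".pkl")

    with json_path.open("r", encoding="utf-8") as fh:
        results = json.load(fh)

    # Only unpickle files you produced yourself: pickle can run code on load.
    with pkl_path.open("rb") as fh:
        classifier = pickle.load(fh)

    return results, classifier
```

For example, `results, classifier = load_run_outputs("results/gemma_medqa")` reads `results/gemma_medqa.json` and `results/gemma_medqa.pkl`.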
hprobes responses
Fit from pre-generated responses with judge labels (open-ended / free-text mode).
hprobes responses \
--model google/gemma-3-4b-it \
--data responses.jsonl \
--question-key question \
--response-key response \
--answer-tokens-key answer_tokens \
--label-key judge
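
As an illustration of the input this command expects, the sketch below builds two records using the key names passed above. The record contents are invented for the example. Storing `answer_tokens` as a list of token strings is an assumption, since this reference only says it is a token list.

```python
import json

# Hypothetical records: only the four key names come from the command above.
records = [
    {
        "question": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "answer_tokens": ["Paris"],  # assumed: the tokens that carry the answer
        "judge": True,               # correctness label from the judge
    },
    {
        "question": "What is the capital of Australia?",
        "response": "The capital of Australia is Sydney.",
        "answer_tokens": ["Sydney"],
        "judge": False,
    },
]


def to_jsonl(rows):
    """Serialize one JSON object per line, the layout a .jsonl file uses."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Writing `jsonl_text` to `responses.jsonl` gives a file in the shape passed to `--data` above.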
Required Arguments
| Argument | Description |
|---|---|
--model |
HuggingFace model ID |
--data |
Path to dataset file |
Additional Arguments
| Argument | Default | Description |
|---|---|---|
| `--question-key` | `question` | Key for question text |
| `--response-key` | `response` | Key for generated response |
| `--answer-tokens-key` | `answer_tokens` | Key for answer token list |
| `--label-key` | `judge` | Key for correctness label (true/false or 1/0) |
| `--aggregation` | `mean` | How to aggregate CETT over the answer span: `mean` or `max` |
All shared arguments (--device, --dtype, --l1-c, etc.) are the same as hprobes run.
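
The difference between the two `--aggregation` modes can be sketched in plain Python. This assumes per-token features are stored as one row per token and one column per neuron. `aggregate_span` is a hypothetical helper and the numbers are made up. It illustrates the reduction only, not how hprobes computes CETT.

```python
def aggregate_span(token_features, span, how="mean"):
    """Collapse the per-token rows inside `span` to one value per neuron.

    token_features: list of rows, one per token, each a list of floats.
    span: (start, end) token indices, end exclusive.
    """
    start, end = span
    rows = token_features[start:end]
    if not rows:
        raise ValueError("answer span is empty")
    columns = list(zip(*rows))  # transpose: one tuple per neuron
    if how == "mean":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregation: {how!r}")


features = [
    [1.0, 0.0],   # token 0, outside the answer span
    [0.25, 0.5],  # token 1, inside the span
    [0.75, 0.0],  # token 2, inside the span
]
span_mean = aggregate_span(features, (1, 3), "mean")
span_max = aggregate_span(features, (1, 3), "max")
print(span_mean, span_max)
```

In this sketch, `mean` smooths over the whole span, while `max` keeps the single strongest value for each neuron.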
hprobes transfer
Score a saved probe on a different model (transfer experiment).
hprobes transfer \
--probe results/gemma_medqa \
--model google/medgemma-27b-text-it \
--data dataset.jsonl
Required Arguments
| Argument | Description |
|---|---|
--probe |
Base path of saved probe (e.g. results/gemma_medqa) |
--model |
HuggingFace model ID for the target model |
--data |
Path to dataset file |
Optional Arguments
| Argument | Default | Description |
|---|---|---|
| `--format` | `auto` | Dataset format |
| `--samples` | `-1` (all) | Number of samples |
| `--output` | auto-named | Output path |
| `--device` | `auto` | Device |
| `--dtype` | `auto` | Precision |
| `--max-tokens` | `1024` | Max input tokens |
What Transfer Does
- Loads the saved classifier (`.pkl`) and attaches it to the target model
- Extracts CETT from the new model using the original layer/neuron mapping
- Normalizes features using the original training statistics
- Scores with the original classifier
- Computes AUROC and random baseline on the new data
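
The steps above can be sketched in plain Python, under several assumptions: the probe is treated as a logistic-regression classifier with stored weights, bias, and per-feature mean and standard deviation; features are taken as already extracted; and the random baseline is taken to be the AUROC under shuffled labels. All names and numbers are hypothetical, and none of this is the hprobes implementation.

```python
import math
import random


def standardize(rows, mean, std):
    """Normalize with statistics saved at training time, not recomputed."""
    return [[(x - m) / s for x, m, s in zip(row, mean, std)] for row in rows]


def score(rows, weights, bias):
    """Logistic probability for each feature row."""
    logits = [sum(w * x for w, x in zip(weights, row)) + bias for row in rows]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def auroc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical saved probe: weights and normalization stats from the source model.
probe = {"weights": [2.0, -1.0], "bias": 0.0, "mean": [0.5, 0.5], "std": [0.25, 0.25]}

# Hypothetical target-model features at the same neuron positions, with labels.
target_features = [[0.9, 0.2], [0.8, 0.4], [0.3, 0.7], [0.1, 0.6]]
target_labels = [1, 1, 0, 0]

z = standardize(target_features, probe["mean"], probe["std"])
probs = score(z, probe["weights"], probe["bias"])
transfer_auroc = auroc(target_labels, probs)

# Assumed baseline: AUROC after shuffling the labels, averaged over repeats.
rng = random.Random(42)
baseline_runs = []
for _ in range(200):
    shuffled = target_labels[:]
    rng.shuffle(shuffled)
    baseline_runs.append(auroc(shuffled, probs))
random_baseline = sum(baseline_runs) / len(baseline_runs)

print(f"transfer AUROC={transfer_auroc:.3f}, random baseline={random_baseline:.3f}")
```

Reusing the training mean and standard deviation, rather than recomputing them on the target data, is what keeps the scores comparable to those from the original model.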