
Production Scoring

After fitting a probe, you can score new prompts without ground truth. This page covers the detect() and detect_batch() methods for production hallucination risk estimation.

How Detection Works

Detection requires no generation — only forward passes. For MCQ, the model never generates text. Instead:

  1. One forward pass reads logits at the last prompt token
  2. The highest-logit letter among A–J is taken as the predicted answer (see the sketch after this list)
  3. In contrastive mode, a second forward pass captures CETT at the predicted answer token
  4. The trained classifier maps the CETT vector to a hallucination probability
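
A minimal sketch of steps 1–2, assuming a Hugging Face-style model and tokenizer; predict_letter and the leading-space token lookup are illustrative, not hprobes internals:

import torch

def predict_letter(model, tokenizer, prompt, letters="ABCDEFGHIJ"):
    # One forward pass; read the logits at the last prompt token
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    # Compare the logits of the candidate letter tokens (" A" ... " J")
    letter_ids = [tokenizer.encode(" " + l, add_special_tokens=False)[0] for l in letters]
    return letters[torch.argmax(logits[letter_ids]).item()]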

Single Sample

risk = probe.detect("Which organ is most affected? A) Heart B) Lung\n\nAnswer:")
print(f"Risk: {risk:.3f}")  # 0.0 = likely correct, 1.0 = likely hallucinating

If you already know the model's predicted letter (e.g. from your own inference pipeline), pass it to skip the first forward pass:

risk = probe.detect(prompt, answer_letter="C")

Note

The answer_letter parameter accepts letters A–J (case-insensitive). Passing an unrecognized letter raises ValueError.

Batched Scoring

detect_batch() uses GPU-vectorized CETT extraction for higher throughput:

prompts = [format_prompt(s) for s in samples]
scores = probe.detect_batch(prompts, batch_size=16)

# With pre-computed letters
letters = ["A", "C", "B", "D"]
scores = probe.detect_batch(prompts, answer_letters=letters, batch_size=16)
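
format_prompt above is a user-supplied helper, not part of hprobes. A minimal sketch, assuming each sample is a dict with question and choices keys and the single-line MCQ layout shown earlier:

def format_prompt(sample):
    # Lay out the question, lettered options, and the trailing
    # "Answer:" cue whose last token the probe reads logits at
    letters = "ABCDEFGHIJ"
    options = " ".join(f"{l}) {c}" for l, c in zip(letters, sample["choices"]))
    return f"{sample['question']} {options}\n\nAnswer:"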

Tip

Use batch_size=8 or batch_size=16 on GPU. Larger batches are faster but use more VRAM.

Threshold-Based Decisions

Calling score() calibrates a decision threshold using Youden's J statistic:

probe.score()
print(probe.threshold_)  # e.g. 0.63

risk = probe.detect(prompt)
is_hallucinating = risk >= probe.threshold_
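
Youden's J picks the point on the ROC curve that maximizes TPR − FPR. A minimal sketch of that calibration, assuming scikit-learn and held-out labels (youden_threshold is illustrative, not the library's internals):

import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    # J = TPR - FPR; pick the score cutoff that maximizes it
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[np.argmax(tpr - fpr)]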

The threshold is persisted through save()/load():

probe.save("results/probe")
loaded = HProbes.load("results/probe", model, tokenizer)
print(loaded.threshold_)  # same value

Warning

The default threshold before calling score() is 0.5 (naive prior). Always call score() to calibrate before using the probe in production.

Consistency Filtering

Enable consistency filtering during fit() to skip ambiguous samples:

probe = HProbes(model, tokenizer, n_consistency=10)
probe.fit(samples, options_key="choices", answer_key="answer")

With n_consistency=10:

  1. For each sample, 10 letters are drawn from softmax(letter_logits), using a single forward pass
  2. If all 10 agree on the same letter → sample is used
  3. If draws disagree → sample is skipped (model is uncertain)

This filters out noisy training samples where the model's prediction is unstable, yielding cleaner H-Neuron discovery.
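
A minimal sketch of the filter, assuming letter_logits is the vector of A–J logits from that single forward pass (is_consistent is illustrative, not the library's API):

import torch

def is_consistent(letter_logits, n_draws=10):
    # Sample n_draws letters from the softmax over letter logits;
    # keep the sample only if every draw lands on the same letter
    probs = torch.softmax(letter_logits, dim=-1)
    draws = torch.multinomial(probs, n_draws, replacement=True)
    return bool((draws == draws[0]).all())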

Save and Load for Production

# After fitting and scoring
probe.save("production/medqa_probe")

# Later, in your serving pipeline
from hprobes import HProbes
probe = HProbes.load("production/medqa_probe", model, tokenizer)

# Score incoming requests
for request in incoming:
    risk = probe.detect(request.prompt, answer_letter=request.predicted_letter)
    if risk >= probe.threshold_:
        flag_for_review(request)

Performance Considerations

| Scenario | Forward Passes | Notes |
| --- | --- | --- |
| detect() without answer_letter | 2 | Predicts letter internally |
| detect() with answer_letter | 1 | Fastest single-sample path |
| detect_batch() without answer_letters | 2 per batch | Batched letter prediction |
| detect_batch() with answer_letters | 1 per batch | Fastest batched path |
| Non-contrastive detect() | 1 | Always a single pass |
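
To sanity-check these counts on your own hardware, a rough timing comparison of the two single-sample paths (illustrative only; wrap the timers with torch.cuda.synchronize() for accurate GPU numbers):

import time

start = time.perf_counter()
probe.detect(prompt)                            # two forward passes
t_two_pass = time.perf_counter() - start

start = time.perf_counter()
probe.detect(prompt, answer_letter="C")         # one forward pass
t_one_pass = time.perf_counter() - start

print(f"two-pass: {t_two_pass:.3f}s  one-pass: {t_one_pass:.3f}s")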