HuggingFace Dataset Support¶
Use well-known HF datasets as prompt sources — no datasets library needed, zero new dependencies.
Problem¶
The bundled 128-prompt datasets may not surface the refusal direction for all models. Standard abliteration tools (Heretic, Labonne tutorial) use larger, well-known HF datasets like mlabonne/harmful_behaviors (416 prompts) and mlabonne/harmless_alpaca (25k prompts). Vauban lets you reference these directly in TOML configs.
Quick Start¶
Short form¶
```toml
[model]
path = "mlx-community/Llama-3.2-3B-Instruct-4bit"

[data]
harmful = "hf:mlabonne/harmful_behaviors"
harmless = "hf:mlabonne/harmless_alpaca"
```
Full control¶
```toml
[model]
path = "mlx-community/Llama-3.2-3B-Instruct-4bit"

[data]
harmless = "default"

[data.harmful]
hf = "JailbreakBench/JBB-Behaviors"
split = "harmful"
column = "Goal"
limit = 200
```
Python API¶
```python
from pathlib import Path

from vauban import DatasetRef, load_hf_prompts, resolve_prompts

# Fetch from HuggingFace (cached after first call)
ref = DatasetRef("mlabonne/harmful_behaviors")
prompts = load_hf_prompts(ref)
print(len(prompts), prompts[0][:80])

# resolve_prompts() dispatches automatically
local = resolve_prompts(Path("my_prompts.jsonl"))                   # reads JSONL
remote = resolve_prompts(DatasetRef("mlabonne/harmful_behaviors"))  # fetches HF
```
TOML Syntax¶
Four ways to specify harmful / harmless in the [data] section:
| Syntax | Example | Description |
|---|---|---|
| Bundled default | `harmful = "default"` | Uses the 128-prompt bundled dataset |
| Local file | `harmful = "my_prompts.jsonl"` | Path relative to the TOML file |
| HF short form | `harmful = "hf:mlabonne/harmful_behaviors"` | HF dataset with default settings |
| HF table form | `[data.harmful]` table | Full control over split, column, limit |
HF table fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| `hf` | string | required | HuggingFace repo ID (e.g. `"mlabonne/harmful_behaviors"`) |
| `split` | string | `"train"` | Dataset split to fetch |
| `column` | string | `"prompt"` | Column name containing the prompt text |
| `config` | string | none | Dataset config name (for multi-config datasets) |
| `limit` | integer | none | Maximum number of prompts to fetch |
Mixed sources¶
You can mix any combination of the four forms: a bundled default or local file on one side, an HF dataset on the other.
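For instance, a sketch combining the bundled harmless set with an HF harmful set (forms taken from the table above):

```toml
[data]
harmless = "default"                       # bundled 128-prompt set
harmful = "hf:mlabonne/harmful_behaviors"  # fetched from HuggingFace
```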
How It Works¶
Fetching¶
Prompts are fetched via the HuggingFace Datasets Server REST API — no datasets library install required. The API is public and rate-limited but generous for typical abliteration workloads.
The endpoint returns rows in pages of 100. Vauban paginates automatically until the limit is reached or the dataset is exhausted.
Caching¶
Fetched prompts are cached locally as JSONL files:
```
~/.cache/vauban/datasets/
  mlabonne__harmful_behaviors/
    default__train__prompt.jsonl
  JailbreakBench__JBB-Behaviors/
    default__harmful__Goal_limit200.jsonl
```
On subsequent runs, prompts are read from cache with no network call. Delete the cache directory to force a re-fetch.
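Using the cache path shown above:

```shell
# Force a re-fetch by removing Vauban's dataset cache
rm -rf ~/.cache/vauban/datasets/
```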
No new dependencies¶
The implementation uses urllib.request (stdlib) for HTTP and json (stdlib) for parsing. huggingface_hub is available as a transitive dependency from mlx-lm but is not used — the REST API is simpler and sufficient.
API Reference¶
DatasetRef¶
```python
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class DatasetRef:
    repo_id: str               # e.g. "mlabonne/harmful_behaviors"
    split: str = "train"
    column: str = "prompt"
    config: str | None = None
    limit: int | None = None
```
Immutable reference to a HuggingFace dataset. Used as a value in PipelineConfig.harmful_path / harmless_path.
load_hf_prompts(ref: DatasetRef) -> list[str]¶
Fetch prompts from a HuggingFace dataset. Returns cached results on subsequent calls. Raises ValueError if the column is empty or missing.
resolve_prompts(source: Path | DatasetRef) -> list[str]¶
Unified dispatcher:
- Path -> reads local JSONL (delegates to load_prompts())
- DatasetRef -> fetches from HF (delegates to load_hf_prompts())
This is what run() uses internally. Call it directly when composing custom pipelines.
Common Datasets¶
| Dataset | Prompts | Column | Notes |
|---|---|---|---|
| `mlabonne/harmful_behaviors` | 416 | `prompt` | Standard harmful prompt set |
| `mlabonne/harmless_alpaca` | 25k | `prompt` | Large harmless set (use `limit`) |
| `JailbreakBench/JBB-Behaviors` | 100 | `Goal` | Split: `harmful`. Curated jailbreak goals |
| `tatsu-lab/alpaca` | 52k | `instruction` | General instructions (use `limit`) |
Of these, JailbreakBench/JBB-Behaviors needs a non-default column and split, so it requires the table form.
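Echoing the full-control example from Quick Start, its table-form entry looks like:

```toml
[data.harmful]
hf = "JailbreakBench/JBB-Behaviors"
split = "harmful"  # non-default split
column = "Goal"    # non-default column
```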
Troubleshooting¶
"No prompts found in column X"¶
The column name doesn't match. Check the dataset on HuggingFace to find the correct column name and specify it with column = "...".
Network errors¶
The Datasets Server API requires internet access. If you're behind a firewall or offline, pre-fetch the data and use a local JSONL file instead.
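A sketch of that pre-fetch workflow is below. The one-`{"prompt": ...}`-object-per-line JSONL schema is an assumption; verify it against what `load_prompts()` actually expects before relying on it.

```python
import json
from pathlib import Path

def save_jsonl(prompts: list[str], path: Path) -> None:
    """Write one JSON object per line (assumed 'prompt' key -- check load_prompts())."""
    with path.open("w", encoding="utf-8") as fh:
        for text in prompts:
            fh.write(json.dumps({"prompt": text}) + "\n")

# On a machine with internet access:
#   prompts = load_hf_prompts(DatasetRef("mlabonne/harmful_behaviors"))
#   save_jsonl(prompts, Path("harmful.jsonl"))
# Then point the TOML at the file:  harmful = "harmful.jsonl"
```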
Large datasets are slow¶
Use limit to cap the number of prompts. For abliteration, 200-500 harmful prompts and a similar number of harmless prompts are typically sufficient.
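For example, capping the 25k-prompt harmless set (the exact limit value is illustrative):

```toml
[data.harmless]
hf = "mlabonne/harmless_alpaca"
limit = 400
```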