HuggingFace Dataset Support¶

Use well-known HuggingFace (HF) datasets as prompt sources. The datasets library is not required, and no new dependencies are added.

Problem¶

The bundled 128-prompt datasets may not surface the refusal direction for all models. Standard abliteration tools and guides (Heretic, the Labonne tutorial) use larger, well-known HF datasets such as mlabonne/harmful_behaviors (416 prompts) and mlabonne/harmless_alpaca (25k prompts). Vauban lets you reference these directly in TOML configs.

Quick Start¶

Short form¶

[model]
path = "mlx-community/Llama-3.2-3B-Instruct-4bit"

[data]
harmful = "hf:mlabonne/harmful_behaviors"
harmless = "hf:mlabonne/harmless_alpaca"

Full control¶

[model]
path = "mlx-community/Llama-3.2-3B-Instruct-4bit"

[data]
harmless = "default"

[data.harmful]
hf = "JailbreakBench/JBB-Behaviors"
split = "harmful"
column = "Goal"
limit = 200

Python API¶

from pathlib import Path

from vauban import DatasetRef, load_hf_prompts, resolve_prompts

# Fetch from HuggingFace (cached after first call)
ref = DatasetRef("mlabonne/harmful_behaviors")
prompts = load_hf_prompts(ref)
print(len(prompts), prompts[0][:80])

# resolve_prompts() dispatches on the type of its argument
local = resolve_prompts(Path("my_prompts.jsonl"))  # reads JSONL
remote = resolve_prompts(DatasetRef("mlabonne/harmful_behaviors"))  # fetches HF

TOML Syntax¶

There are four ways to specify harmful and harmless in the [data] section:

Syntax	Example	Description
Bundled default	`harmful = "default"`	Uses the 128-prompt bundled dataset
Local file	`harmful = "my_prompts.jsonl"`	Path relative to the TOML file
HF short form	`harmful = "hf:mlabonne/harmful_behaviors"`	HF dataset with default settings
HF table form	`[data.harmful]` table	Full control over split, column, limit

HF table fields¶

Field	Type	Default	Description
`hf`	string	required	HuggingFace repo ID (e.g. `"mlabonne/harmful_behaviors"`)
`split`	string	`"train"`	Dataset split to fetch
`column`	string	`"prompt"`	Column name containing the prompt text
`config`	string	none	Dataset config name (for multi-config datasets)
`limit`	integer	none	Maximum number of prompts to fetch

Mixed sources¶

You can mix any combination of source types. Each of the two examples below is a separate config:

[data]
harmful = "hf:mlabonne/harmful_behaviors"
harmless = "default"

[data]
harmful = "my_local_harmful.jsonl"
harmless = "hf:mlabonne/harmless_alpaca"

How It Works¶

Fetching¶

Prompts are fetched via the HuggingFace Datasets Server REST API, so the datasets library does not need to be installed. The API is public and rate-limited, but the limits are generous for typical abliteration workloads.

The endpoint returns at most 100 rows per request. Vauban paginates automatically until the limit is reached or the dataset is exhausted.
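
The pagination loop can be sketched with the standard library alone. This is an illustration under stated assumptions, not Vauban's implementation: `fetch_page` and `collect_prompts` are hypothetical names, and the `/rows` query parameters and response shape (`rows[i]["row"]`) reflect the public Datasets Server API as commonly documented.

```python
import json
import urllib.parse
import urllib.request

ROWS_URL = "https://datasets-server.huggingface.co/rows"
PAGE_SIZE = 100  # page size stated above


def fetch_page(repo_id, config, split, offset, length):
    """Fetch one page of row dicts from the /rows endpoint (needs network)."""
    query = urllib.parse.urlencode({
        "dataset": repo_id, "config": config, "split": split,
        "offset": offset, "length": length,
    })
    with urllib.request.urlopen(f"{ROWS_URL}?{query}", timeout=30) as resp:
        payload = json.load(resp)
    return [item["row"] for item in payload["rows"]]


def collect_prompts(get_page, column, limit=None):
    """Page through rows until `limit` prompts are collected or rows run out.

    `get_page(offset, length)` must return a list of row dicts.
    """
    prompts = []
    offset = 0
    while limit is None or len(prompts) < limit:
        length = PAGE_SIZE if limit is None else min(PAGE_SIZE, limit - len(prompts))
        rows = get_page(offset, length)
        if not rows:
            break  # dataset exhausted
        # Keep only rows whose column is present and non-empty.
        prompts.extend(row[column] for row in rows if row.get(column))
        offset += len(rows)
        if len(rows) < length:
            break  # short page: this was the last one
    if not prompts:
        raise ValueError(f"No prompts found in column {column!r}")
    return prompts


# Example wiring (performs real HTTP requests, so it is left commented out):
# prompts = collect_prompts(
#     lambda off, n: fetch_page("mlabonne/harmful_behaviors", "default", "train", off, n),
#     column="prompt",
#     limit=200,
# )
```

Passing the page fetcher as an argument keeps the loop testable offline: any function that slices a list can stand in for the network call.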

Caching¶

Fetched prompts are cached locally as JSONL files:

~/.cache/vauban/datasets/
  mlabonne__harmful_behaviors/
    default__train__prompt.jsonl
  JailbreakBench__JBB-Behaviors/
    default__harmful__Goal_limit200.jsonl

On subsequent runs, prompts are read from cache with no network call. Delete the cache directory to force a re-fetch:

rm -rf ~/.cache/vauban/datasets/
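
The directory listing above suggests a naming scheme of `<owner>__<name>/<config>__<split>__<column>[_limit<N>].jsonl`. The sketch below reconstructs it from those two examples only; `cache_path` is a hypothetical helper, and the real scheme may differ in cases the listing does not show.

```python
from pathlib import Path

CACHE_ROOT = Path.home() / ".cache" / "vauban" / "datasets"


def cache_path(repo_id, split="train", column="prompt", config=None, limit=None):
    """Guess the cache file for a dataset reference (inferred from the listing)."""
    repo_dir = repo_id.replace("/", "__")
    name = f"{config or 'default'}__{split}__{column}"
    if limit is not None:
        name += f"_limit{limit}"
    return CACHE_ROOT / repo_dir / f"{name}.jsonl"


path = cache_path("JailbreakBench/JBB-Behaviors", split="harmful", column="Goal", limit=200)
print(path.name)  # default__harmful__Goal_limit200.jsonl
```

If the scheme holds, removing a single file such as `cache_path("mlabonne/harmful_behaviors")` would force a re-fetch of that dataset only, leaving the rest of the cache intact.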

No new dependencies¶

The implementation uses urllib.request (stdlib) for HTTP and json (stdlib) for parsing. huggingface_hub is available as a transitive dependency of mlx-lm but is not used, because the REST API is simpler and sufficient.

API Reference¶

`DatasetRef`¶

@dataclass(frozen=True, slots=True)
class DatasetRef:
    repo_id: str              # e.g. "mlabonne/harmful_behaviors"
    split: str = "train"
    column: str = "prompt"
    config: str | None = None
    limit: int | None = None

Immutable reference to a HuggingFace dataset. Used as a value for PipelineConfig.harmful_path and PipelineConfig.harmless_path.

`load_hf_prompts(ref: DatasetRef) -> list[str]`¶

Fetch prompts from a HuggingFace dataset. Returns cached results on subsequent calls. Raises ValueError if the column is empty or missing.

`resolve_prompts(source: Path | DatasetRef) -> list[str]`¶

Unified dispatcher:

- Path -> reads local JSONL (delegates to load_prompts())
- DatasetRef -> fetches from HF (delegates to load_hf_prompts())

This is what run() uses internally. Call it directly when composing custom pipelines.

Common Datasets¶

Dataset	Prompts	Column	Notes
`mlabonne/harmful_behaviors`	416	`prompt`	Standard harmful prompt set
`mlabonne/harmless_alpaca`	25k	`prompt`	Large harmless set (use `limit`)
`JailbreakBench/JBB-Behaviors`	100	`Goal`	Split: `harmful`. Curated jailbreak goals
`tatsu-lab/alpaca`	52k	`instruction`	General instructions (use `limit`)

Example with a non-default column and split:

[data.harmful]
hf = "JailbreakBench/JBB-Behaviors"
split = "harmful"
column = "Goal"
limit = 100

Troubleshooting¶

"No prompts found in column X"¶

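The configured column name doesn't match a column that contains text in the dataset. The default is "prompt", which many datasets do not use. Check the dataset on HuggingFace to find the correct column name and specify it with column = "...".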
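
Column names can also be listed programmatically. This sketch assumes the Datasets Server `/first-rows` endpoint and a response containing a `features` list whose entries have a `name` key; both helper names are hypothetical and not part of Vauban.

```python
import json
import urllib.parse
import urllib.request

FIRST_ROWS_URL = "https://datasets-server.huggingface.co/first-rows"


def fetch_first_rows(repo_id, config="default", split="train"):
    """Download the first-rows preview for one split (needs network)."""
    query = urllib.parse.urlencode(
        {"dataset": repo_id, "config": config, "split": split}
    )
    with urllib.request.urlopen(f"{FIRST_ROWS_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def column_names(payload):
    """Extract column names from a first-rows (or rows) response payload."""
    return [feature["name"] for feature in payload.get("features", [])]


# Example (performs a real HTTP request, so it is left commented out):
# print(column_names(fetch_first_rows("JailbreakBench/JBB-Behaviors", split="harmful")))
```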

Network errors¶

The Datasets Server API requires internet access. If you're behind a firewall or offline, pre-fetch the data and use a local JSONL file instead.
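
One way to pre-fetch is to write the prompts to JSONL on a connected machine and copy the file over. The sketch below assumes one JSON object per line with a `prompt` key; the exact schema that Vauban's local loader expects is not stated here, so check it before relying on this layout. `write_jsonl` and `read_jsonl` are hypothetical helpers.

```python
import json
from pathlib import Path


def write_jsonl(prompts, path, key="prompt"):
    """Write one JSON object per line, e.g. {"prompt": "..."} (assumed schema)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(json.dumps({key: prompt}, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path, key="prompt"):
    """Read the prompts back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line)[key] for line in f if line.strip()]
```

The resulting file can then be referenced with the local-file syntax, for example harmful = "my_prompts.jsonl".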

Large datasets are slow¶

Use limit to cap the number of prompts. For abliteration, 200-500 harmful prompts and a similar number of harmless prompts are typically sufficient:

[data.harmful]
hf = "mlabonne/harmful_behaviors"
limit = 300

[data.harmless]
hf = "mlabonne/harmless_alpaca"
limit = 500