Session API¶

vauban.session.Session is the programmatic entry point. It holds the model, tokenizer, and accumulated state. Every Vauban capability is available as a method. This is the page an AI assistant reads first when using Vauban as a library.

What Session is¶

A Session wraps a loaded model and tracks the results of every operation. Tools declare prerequisites (measure must run before probe), and the Session enforces this automatically — if you call probe() before measure(), it measures first.

from vauban.session import Session

s = Session("mlx-community/Qwen2.5-1.5B-Instruct-bf16")

The constructor loads the model and tokenizer, dequantizes if needed, and loads default harmful/harmless prompt sets. Custom prompts can be passed via harmful_prompts and harmless_prompts keyword arguments.

Tokenizer — the component that converts text into tokens (numbers) and back. The model processes numbers, not letters. The tokenizer handles the translation in both directions. When Session loads a model, it loads the tokenizer too, since they must match.

Dequantize — convert a compressed (quantized) model back to full-precision floating point. Quantized models use fewer bits per number to save memory, but some operations (like extracting precise refusal directions) work better at full precision. Session handles this automatically.

Discovery¶

Four methods for understanding what is available:

s.tools() — returns a list of Tool dataclasses, each with name, description, requires, produces, and category. This is the complete capability inventory.

s.available() — returns the names of tools whose prerequisites are currently met. At session start, this includes tools that only require model (measure, detect, audit, jailbreak) and tools with no requirements (classify, score).

s.needs(tool_name) — returns the unmet prerequisites for a specific tool. s.needs("probe") returns ["direction"] if measure() has not been called.

s.state() — returns a dict of booleans showing what has been computed: model, direction, detect_result, audit_result, modified_model.

Guided workflows¶

s.guide(goal) — returns plain-text instructions for a named workflow. Available goals:

Goal	What it covers
`"audit"`	Red-team assessment with findings and report
`"compliance"`	EU AI Act compliance assessment
`"harden"`	Improve model safety via CAST/SIC
`"abliterate"`	Remove refusal via measure, cut, export
`"inspect"`	Understand model behavior via measure, probe

Calling s.guide() with no argument lists all available workflows.

s.done(goal) — returns (is_done: bool, reason: str). Call after each step to know when to stop.

s.suggest_next() — context-aware recommendations based on current state and findings. Each suggestion is labeled [FACT] (based on measured data) or [ADVICE] (heuristic). Use s.done(goal) rather than suggest_next() to determine completion — suggestions always recommend more work.

Tool categories¶

Assessment¶

Method	Description	Requires	Produces
`s.measure()`	Extract the refusal direction from activations	model	`DirectionResult`
`s.detect()`	Check if model is hardened against abliteration	model	`DetectResult`
`s.evaluate()`	Baseline metrics: refusal rate, perplexity	model, direction	`EvalResult`
`s.audit(...)`	Full red-team assessment with severity-rated findings	model	`AuditResult`

audit() accepts company_name, system_name, and thoroughness ("quick", "standard", "deep").

Inspection¶

Method	Description	Requires	Produces
`s.probe(prompt)`	Per-layer projection onto refusal direction	direction	`ProbeResult`
`s.scan(content)`	Per-token injection detection	direction	`ScanResult`

probe() returns ProbeResult with a projections list (one float per layer), layer_count, and prompt.

scan() returns ScanResult with injection_probability, overall_projection, spans, per_token_projections, and flagged.

Defense¶

Method	Description	Requires	Produces
`s.steer(prompt, alpha=1.0)`	Generate with unconditional activation steering	direction	generation result
`s.cast(prompt, alpha=1.0, threshold=0.0)`	Generate with conditional activation steering	direction	`CastResult`
`s.sic(prompts)`	Iterative input sanitization	direction	`SICResult`

cast() returns CastResult with the generated text and intervention count. sic() accepts a list of prompts and returns SICResult with per-prompt results and aggregate statistics.

Modification¶

Method	Description	Requires	Produces
`s.cut(alpha=1.0, norm_preserve=False)`	Remove refusal direction from weights	direction	modified weight dict
`s.export(output_dir)`	Save modified weights to disk	modified_model	path string

cut() modifies o_proj and down_proj weights across all layers. export() writes a standard model directory (safetensors + tokenizer + config).

Analysis¶

Method	Description	Requires	Produces
`s.classify(text)`	Score text against 13-domain harm taxonomy	none	harm scores
`s.score(prompt, response)`	5-axis quality assessment (length, structure, anti-refusal, directness, relevance)	none	score result

These are static methods — they work without a loaded model and without a measured direction.

Reporting¶

Method	Description	Requires	Produces
`s.report(fmt="markdown")`	Generate report from audit results	audit_result	markdown or JSON string
`s.report_pdf()`	Generate PDF report from audit results	audit_result	PDF bytes

report() accepts "markdown" or "dict" format. report_pdf() returns raw PDF bytes.

Prerequisite chain¶

The dependency graph between tools:

model (always available)
  |
  +-- measure() --> direction
  |     |
  |     +-- probe(prompt)
  |     +-- scan(content)
  |     +-- steer(prompt)
  |     +-- cast(prompt)
  |     +-- sic(prompts)
  |     +-- evaluate()
  |     +-- cut() --> modified_model
  |           |
  |           +-- export(path)
  |
  +-- detect()
  +-- audit() --> audit_result
  |     |
  |     +-- report()
  |     +-- report_pdf()
  |
  +-- jailbreak()

(no prerequisites)
  +-- classify(text)
  +-- score(prompt, response)

If a tool's prerequisite is not met, calling it will auto-trigger the prerequisite. For example, s.probe("test") will call s.measure() first if no direction exists.

Prerequisite tracking — the Session knows which operations depend on which results. You do not need to manually manage the order — if you call a tool that needs a direction, the Session measures one first. This makes programmatic use forgiving: you can call what you need, and dependencies resolve automatically.

Authoritative source¶

This page is a reference summary. The code and its docstrings in vauban/session.py are the authoritative source for method signatures, parameter types, return types, and behavioral details. When this page and the code disagree, the code is correct.