Torch Surface Matrix
Vauban's default runtime surface is Torch-first. The reason is portability: the same behavior-auditing config should run on CPU, CUDA, and MPS without changing product semantics. MLX remains a legacy/reference backend, but new default-facing examples and report workflows should not require MLX.
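As a rough illustration of what "the same config on CPU, CUDA, and MPS" means at the device level, the sketch below lists the Torch device kinds a host can actually use and falls back to CPU when a preferred kind is missing. The function names `available_device_kinds` and `pick_device` are hypothetical and are not part of Vauban; only the `torch.cuda.is_available()` and `torch.backends.mps.is_available()` calls are real PyTorch APIs.

```python
def available_device_kinds() -> list[str]:
    """Return the Torch device kinds usable on this host, CPU first."""
    kinds = ["cpu"]  # CPU is always a valid target
    try:
        import torch
    except ImportError:
        return kinds  # Torch is not installed; only CPU is reported
    if torch.cuda.is_available():
        kinds.append("cuda")
    # Older Torch builds have no MPS backend module, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        kinds.append("mps")
    return kinds


def pick_device(preferred: str = "cuda") -> str:
    """Hypothetical helper: use the preferred kind if present, else CPU."""
    return preferred if preferred in available_device_kinds() else "cpu"


if __name__ == "__main__":
    print(available_device_kinds())
    print(pick_device("mps"))
```

The device kind changes where the work runs. Under the contract described here, it should not change which evidence artifacts are produced.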
This matrix is a contract, not a performance claim. A row marked full means
the Torch runtime declares and tests that the primitive can produce the same
kind of Vauban evidence as the rest of the system expects. Performance claims
still require device-specific measurement.
| Primitive | Support | Evidence artifact | Required tests | Runtime proof |
|---|---|---|---|---|
| `load_model` | full | `LoadedModel` with Torch capabilities | `tests/test_model_io.py`, `tests/test_backend.py` | CPU/CUDA smoke; MPS pending host |
| `tokenize` | full | `TokenizeResult` | `tests/test_runtime_primitives.py` | CPU/CUDA smoke; MPS pending host |
| `forward` | full | `ForwardResult` | `tests/test_runtime_primitives.py` | CPU/CUDA smoke; MPS pending host |
| `runtime_trace` | full | `ForwardTrace` and `TraceArtifact` records | `tests/test_runtime_trace_types.py`, `tests/test_runtime_primitives.py` | CPU/CUDA smoke; MPS pending host |
| `generation` | full | generated text observations | `tests/test_direct_generation_cuda.py`, behavior trace tests | CPU/CUDA smoke; MPS pending host |
| `logits` | full | runtime logits evidence | `tests/test_runtime_primitives.py` | CPU/CUDA smoke; MPS pending host |
| `logprobs` | full | runtime token logprobs evidence | `tests/test_runtime_primitives.py` | CPU/CUDA smoke; MPS pending host |
| `activations` | full | activation trace and projection artifacts | `tests/test_runtime_trace_types.py`, `tests/test_runtime_primitives.py`, `tests/test_runtime_activation_primitive.py`, `tests/test_behavior_trace_toml.py` | CPU/CUDA smoke; MPS pending host |
| `interventions` | full | intervention records with primitive metadata | `tests/test_runtime_primitives.py`, `tests/test_runtime_activation_primitive.py`, `tests/test_behavior_trace_toml.py` | CPU/CUDA smoke; MPS pending host |
| `kv_cache` | full | declared runtime capability | `tests/test_torch_surface_docs.py` | CUDA source/runtime path; MPS pending host |
| `weight_access` | full | declared runtime capability | `tests/test_torch_surface_docs.py` | CPU/CUDA smoke; MPS pending host |
| `mutable_weights` | full | declared runtime capability | `tests/test_torch_surface_docs.py` | CPU/CUDA smoke; MPS pending host |
| `safetensors_io` | full | weight export/load artifacts | export and LoRA tests | CPU/CUDA source path; MPS device-neutral |
| `peft_lora_export` | full | PEFT-format adapter artifacts | LoRA export tests | CPU/CUDA source path; MPS device-neutral |
| `device_profile_cuda` | full | `StageProfile` with CUDA sync/memory fields | CUDA smoke and alignment tests | validated on RTX 4070 Ti |
| `device_profile_mps` | full | `StageProfile` with MPS sync/memory fields | `tests/test_torch_surface_docs.py`, profiling tests | source contract; MPS hardware validation pending |
| `profile_sweep` | full | behavior trace runtime profile sidecars | behavior trace/diff tests | CPU/CUDA smoke; MPS pending host |
Enforcement
The matrix is enforced by `tests/test_torch_surface_docs.py`:
- every primitive above must remain present in this document;
- Torch capabilities must declare `full` for the evidence-producing runtime capabilities used by reports;
- Torch device kinds must include `cpu`, `cuda`, and `mps`;
- default-facing docs and examples must not reintroduce MLX-only model IDs, imports, or MLX runtime defaults;
- `[behavior_trace.activation_primitive]` must keep projection evidence explicit in TOML and in emitted trace artifacts.
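The first four rules above could be checked along the following lines. This is a minimal sketch, not the contents of `tests/test_torch_surface_docs.py`. The function `check_surface`, its arguments, and the regular expression used to detect an MLX import are assumptions made for illustration. The sketch also treats every listed primitive as evidence-producing, which the real test may not do.

```python
import re

# Primitive names copied from the matrix in this document.
REQUIRED_PRIMITIVES = (
    "load_model", "tokenize", "forward", "runtime_trace", "generation",
    "logits", "logprobs", "activations", "interventions", "kv_cache",
    "weight_access", "mutable_weights", "safetensors_io", "peft_lora_export",
    "device_profile_cuda", "device_profile_mps", "profile_sweep",
)
REQUIRED_DEVICE_KINDS = frozenset({"cpu", "cuda", "mps"})
# Assumption: an "MLX import" is a line starting with `import mlx` or `from mlx`.
MLX_IMPORT = re.compile(r"^\s*(?:import|from)\s+mlx\b", re.MULTILINE)


def check_surface(doc_text, declared_support, device_kinds):
    """Return a list of contract violations; an empty list means the checks pass.

    doc_text:         text of the matrix document
    declared_support: hypothetical mapping of primitive name -> support level
    device_kinds:     iterable of device kind strings the runtime declares
    """
    problems = []
    for name in REQUIRED_PRIMITIVES:
        # Word-boundary match so a name only counts when it appears whole.
        if not re.search(rf"\b{re.escape(name)}\b", doc_text):
            problems.append(f"primitive missing from document: {name}")
        level = declared_support.get(name)
        if level != "full":
            problems.append(f"{name} declares {level!r}, expected 'full'")
    missing = REQUIRED_DEVICE_KINDS - set(device_kinds)
    if missing:
        problems.append(f"missing device kinds: {sorted(missing)}")
    if MLX_IMPORT.search(doc_text):
        problems.append("document contains an MLX import")
    return problems


if __name__ == "__main__":
    doc = "\n".join(REQUIRED_PRIMITIVES)
    declared = {name: "full" for name in REQUIRED_PRIMITIVES}
    print(check_surface(doc, declared, ["cpu", "cuda"]))
```

Returning a list of problems rather than raising on the first failure lets one test run report every broken rule at once.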
Claim Boundary
This document does not claim Torch is always faster than MLX. It claims Torch is the portable default surface for Vauban evidence collection. Any future MPS custom kernel must preserve the same trace artifacts and report evidence before it can replace the reference Torch implementation.
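One way to read "preserve the same trace artifacts" is as a parity check between a reference run and a candidate run. The sketch below is an assumption about how such a check might look, not a description of Vauban's tooling. The artifact dictionaries, their keys (`token`, `logprob`), and the tolerances are all hypothetical.

```python
import math


def artifact_diffs(reference, candidate, rel_tol=1e-4, abs_tol=1e-6):
    """List differences between two flat artifact dicts.

    Floats are compared with a tolerance, since kernels on different
    devices rarely agree bit for bit. Everything else must be equal.
    """
    diffs = []
    for key in sorted(set(reference) | set(candidate)):
        if key not in candidate:
            diffs.append(f"missing in candidate: {key}")
            continue
        if key not in reference:
            diffs.append(f"unexpected in candidate: {key}")
            continue
        ref, cand = reference[key], candidate[key]
        if isinstance(ref, float) and isinstance(cand, float):
            if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
                diffs.append(f"value drift in {key}: {ref} vs {cand}")
        elif ref != cand:
            diffs.append(f"mismatch in {key}: {ref!r} vs {cand!r}")
    return diffs


if __name__ == "__main__":
    # Hypothetical artifacts from a reference run and a custom-kernel run.
    reference = {"token": "hi", "logprob": -1.25}
    candidate = {"token": "hi", "logprob": -1.25000001}
    print(artifact_diffs(reference, candidate))
```

An empty result would mean the candidate kept the same fields and values within tolerance. Choosing the tolerance is itself a device-specific judgment and would need measurement.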