Introduction
bunsen is a batteries-included community standard library for the
burn tensor framework. It collects reusable modules,
tensor operations, shape contracts, and lifecycle utilities that fall outside
burn’s core scope but are needed by anyone building real models on top of it.
This book is the long-form companion to the API docs. The API docs answer "what is this type?"; this book answers "why does it exist, when do I reach for it, and how do the pieces fit together?"
What’s in the library?
flowchart LR
burn[burn core] --> bunsen
bunsen --> contracts[bunsen::contracts]
bunsen --> ops[bunsen::ops]
bunsen --> blocks[bunsen::blocks]
bunsen --> kits[bunsen::kits]
kits --> bimm[bimm]
kits --> gpts[gpts]
kits --> sims[sims]
bunsen --> burner[bunsen::burner]
A whirlwind tour:
- bunsen::contracts: runtime tensor-shape contracts, a small DSL that turns paper-style shape notation into a runtime check, fast enough to stay enabled in release.
- bunsen::ops: additional Tensor operations as pure functions, including range generators, clamp, dropout, noise, RMSNorm, repeat-interleave, and convolution shape arithmetic.
- bunsen::blocks: reusable Module building blocks, such as attention and rotary embeddings for transformers, and conv composites / patching / pooling / stochastic regularization for image models.
- bunsen::kits: complete domain implementations built on top of the rest of the crate: image-model families (bimm), GPT/LLM variants (gpts), and iterative tensor simulations (sims).
- bunsen::burner: burn-adjacent infrastructure (parameter descriptors, module reflection, and the composite optimizer family).
Why a “standard library”?
The burn ecosystem moves quickly, and individual extension crates tend to drift
out of sync with each release. bunsen exists to:
- Track the burn release cycle tightly, so dependent code doesn’t have to.
- Provide a single dependency surface for common building blocks instead of a tangle of single-purpose crates.
- Centralize testing, documentation, and contracts so contributed components can be trusted across projects.
Tensor shapes and math
This book uses KaTeX for math. For example, a linear layer computes $y = xW + b$.
See Contracts for how tensor shapes become first-class, machine-checked constraints.
How to read this book
- New to bunsen? Start with Installation and then the Overview tour.
- Already shipping models on burn? Jump to bunsen::contracts, bunsen::ops, or bunsen::blocks for what each layer offers.
- Considering contributing? See the Contributing Guide.
Installation
bunsen is published to crates.io and tracks
the burn release cadence; the major/minor version of bunsen matches the
burn version it is built against.
Add the dependency
[dependencies]
bunsen = "0.21"
burn = "0.21"
bunsen is a workspace with several crates; the bunsen umbrella crate
re-exports the commonly used pieces from bunsen::blocks, bunsen::ops,
bunsen::contracts, and friends.
Toolchain
bunsen targets the Rust edition listed in
rust-toolchain.toml.
A reasonably recent stable toolchain is sufficient for downstream users;
contributors building the workspace should match the pinned toolchain.
Feature flags
The umbrella crate exposes feature flags that line up with the underlying crates. The most important are:
| Feature | Enables |
|---|---|
| default | The most commonly used modules. |
| blocks | The bunsen::blocks module catalog. |
| ops | Additional tensor operations. |
See each component chapter for crate-specific features.
Overview
This page is a tour through the major pieces of bunsen, in roughly
the order you’d encounter them building a model on top of burn. The
sections below each have a dedicated chapter elsewhere in the book;
this page is the orientation map.
Tensor Contracts
Shape errors in tensor code are hard to diagnose: a bad reshape
produces the wrong meaning, not an exception, and the eventual
failure points at a symptom three layers downstream.
bunsen::contracts lets you write the
shape of a tensor the way a paper would,
and then check it at module boundaries, in one step that both asserts the pattern and unpacks the named dimensions you’ll need:
#![allow(unused)]
fn main() {
use bunsen::contracts::unpack_shape_contract;
let tensor = [12, 3 * 4, 5 * 4, 3];
let [b, h_wins, w_wins, c] = unpack_shape_contract!(
[
"batch",
"height" = "h_wins" * "window_size",
"width" = "w_wins" * "window_size",
"channels",
],
&tensor,
&["batch", "h_wins", "w_wins", "channels"],
&[("window_size", 4)],
);
}
If the shape doesn’t match, the failure is loud and specific — it names the offending dimension, the pattern it failed against, and the parameter bindings in scope. The check is fast enough (~160 ns per unpack on a four-dimensional shape) to stay enabled in release builds.
Contracts are the shared vocabulary the rest of bunsen is built
on; almost every block in the crate uses them at its module
boundaries.
See Contracts for the pattern grammar, the full macro surface, the cost-control mechanisms, and the rationale behind the design.
Ops — pure tensor functions
bunsen::ops is the functional layer: pure
functions over tensors that extend burn::tensor::Tensor’s surface
without owning any trainable parameters. Range generators, clamping,
dropout, noise, RMSNorm, repeat-interleave, and a substantial
collection of convolution-shape arithmetic and functional-conv
helpers.
A typical block’s forward is a parametric layer (a Linear, a
Conv2d) wrapped in two or three ops calls. Lifting the
non-parametric work out makes the block readable and makes the
underlying math testable in isolation.
contracts validate shapes between layers
│
▼
ops pure functions over Tensors
│
▼
blocks stateful parameter-owning Modules
│
▼
kits whole models built from blocks
The ops surface also includes small value-object types
(ClampOp, NoiseConfig) designed to be embedded directly into a
#[derive(Config)] struct of a downstream module.
Blocks — reusable Module components
bunsen::blocks is the stateful layer:
burn::module::Module building blocks that own parameters and can be
trained. Organized by domain:
- Transformers: multi-head causal self-attention with grouped-query support, a KV cache for autoregressive decoding, scaled-dot-product attention helpers, and rotary positional embeddings.
- Images: conv composites (ConvNorm2d, CNA2d), ViT-style patch tokenization (PatchEmbed), TF-style same-padding pooling, and stochastic regularization layers (DropBlock, DropPath).
Every block follows the patterns documented in
Building Reusable Modules:
Meta traits for cross-module introspection, Contract→Structure
config splits where the user-facing knobs differ from the
implementation parameter list, and inline shape contracts at module
boundaries.
Kits — complete domain implementations
bunsen::kits is for whole things you pick up
and use end-to-end. The current categories:
- bimm: Bunsen/Burn Image Models, an incremental port of the timm ecosystem. Currently includes the ResNet family with pretrained-weight loaders and the Swin Transformer V2 family.
- gpts: full GPT / LLM variants. Currently includes NanoChat, a compact GPT suitable for experimentation and fine-tuning.
- sims: iterative tensor simulations. Currently includes Conway’s Game of Life in 2D and 3D, and a D2Q9 lattice-Boltzmann fluid solver.
Kits compose the lower layers: each one uses contracts, ops, and
blocks (and, where training matters, burner) to deliver a full
domain solution rather than a building block. They’re also where to
look for worked examples of every convention in real code.
Burner — burn-adjacent infrastructure
bunsen::burner is the infrastructure
layer: the pieces that sit next to burn itself, not on top of
its tensor surface. Most code that uses bunsen won’t import from
burner at all — you reach for it when you need to:
- introspect a model generically: the reflection layer turns a Module into a queryable XML document with an XPath query API, for “select every rank-2 weight under the transformer blocks” problems;
- compose optimizers: the GroupOptimizerAdaptor{N} family mounts multiple optimizers on a single module (e.g. Muon for matrix parameters, AdamW for the rest), each driving a disjoint group of parameters, with per-group learning-rate selectors;
- carry tensor metadata in non-generic code paths (TensorParamDesc);
- work with burn::record outside what the derive macros provide for free.
The reflection and group-optimizer pieces compose: the canonical
pattern is to walk a model with XmlModuleTree, slice it into
parameter groups with XPath, and hand the groups to a
GroupOptimizerAdaptor. The NanoChat training demo
(demos/chat/examples/train) does exactly that.
Contracts Overview
bunsen::contracts is a no_std inline contract programming library
for tensor geometry. It is built around three goals:
- contracts should be easy to read, write, and use,
- contracts should be fast enough at runtime to always be enabled,
- contracts should produce verbose, helpful error messages when they fail.
In practice this means contracts are written next to the code they guard, match the way shapes are written in a paper, such as (B, H, W, C), and stay enabled in release builds.
API: https://docs.rs/bunsen/latest/bunsen/contracts/
Why a contract system?
Shape errors in tensor code are notoriously hard to diagnose. A
reshape that almost-works produces a tensor with the wrong meaning,
not an exception. An off-by-one in a transpose silently swaps batch
and channels. The eventual failure — a matmul panic three
layers later, a loss that won’t go down — points at a symptom,
not a cause.
A contract answers that by writing the expected shape down, as a pattern, in code:
"batch",
"height" = "h_wins" * "window_size",
"width" = "w_wins" * "window_size",
"channels",
This pattern is both a piece of documentation (the paper-style shape (B, H, W, C), with windowed height and width) and a runtime check. Every contract is then used for one or both of two things:
- Assert. Does this tensor’s shape match the pattern? If not, fail loudly with a useful message.
- Unpack. Assuming it matches, give back the named dimension sizes as concrete usize values to use in arithmetic and reshapes.
These are not separate features — unpacking implies asserting. The same single pass through the shape both validates the pattern and solves for whatever named parameters the caller asks for.
This unifies two things that are otherwise written separately. In ad-hoc code you tend to see, for the same tensor:
assert_eq!(tensor.dims()[0], batch);
assert!(tensor.dims()[1] % window_size == 0);
let h_wins = tensor.dims()[1] / window_size;
// ...and so on.
With a contract the assertion and the variable bindings come from one declaration, and they cannot drift out of sync.
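As a rough illustration of what "one declaration" buys, the plain-Rust sketch below folds the ad-hoc asserts and divisions into a single pass that either fails with a message or returns the solved dimensions. The function name unpack_windowed and its error strings are hypothetical and hand-written for this one pattern; they are not part of bunsen, whose macros generalize the idea to arbitrary patterns.

```rust
/// Hypothetical, hand-rolled stand-in for the windowed-image contract:
/// validates a `[batch, h_wins * ws, w_wins * ws, channels]` shape and
/// returns `[batch, h_wins, w_wins, channels]` in one pass.
fn unpack_windowed(shape: &[usize], window_size: usize) -> Result<[usize; 4], String> {
    // Rank check: the pattern has exactly four dimensions.
    let &[b, h, w, c] = shape else {
        return Err(format!("expected rank 4, got shape {shape:?}"));
    };
    if window_size == 0 {
        return Err("window_size must be non-zero".to_string());
    }
    // Each windowed dimension must have an integer solution.
    for (label, size) in [("height", h), ("width", w)] {
        if size % window_size != 0 {
            return Err(format!(
                "{size} !~ {label}: not a multiple of window_size={window_size}"
            ));
        }
    }
    // Validation and unpacking come from the same code path.
    Ok([b, h / window_size, w / window_size, c])
}
```

Calling unpack_windowed(&[12, 12, 20, 3], 4) yields the four solved sizes, while a window size of 5 is rejected because 12 is not a multiple of 5.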
Where contracts go
The natural home for a contract is a module boundary or function boundary: the place where one piece of code hands shape responsibility to another. The pattern is usually:
- Unpack the inputs at the top of the function. You needed the dimensions anyway; the validation comes free.
- Do the actual work, expressed in terms of the unpacked names.
- Assert (often periodically) the shapes of intermediates or outputs that the function promises but doesn’t otherwise consume.
When the function’s docstring says “Input tensor of shape (B, H, W, C)”, the contract at the top is the machine-checked version of that same sentence.
Why it’s enabled in release
A contract that’s too slow to keep on in release is one that only catches bugs the author already hit in tests. The library is designed around the assumption that contracts stay on:
- the pattern parser is a const-evaluable macro, so contracts compile down to a static value with no per-call construction,
- the runtime path is stack-allocated and allocation-free on the happy path,
- a full unpack on a four-dimensional shape benches at ~160 ns,
- and an exponential-backoff variant (periodic asserts) brings the amortized cost on a hot path down to ~4 ns/call.
The user-facing surface
Most code only ever touches three macros:
- unpack_shape_contract!: assert a contract and return named dimension sizes.
- assert_shape_contract!: assert a contract for its side effect.
- assert_shape_contract_periodically!: same, but amortized via an exponential-backoff scheduler.
These wrap a small layer-2 API that you can reach for when you want to hoist work out of a hot path:
- shape_contract!: parse a contract pattern into a value.
- define_shape_contract!: bind a parsed contract to a static.
- ShapeContract::assert_shape / ShapeContract::unpack_shape: the underlying methods (plus their try_* siblings that return Result<_, String> instead of panicking).
- run_periodically!: the amortization primitive used by assert_shape_contract_periodically!.
Sub-chapters
The rest of this section unpacks the surface above:
- Pattern Syntax: the contract DSL itself, plus what counts as a shape (ShapeView).
- Asserting and Unpacking: the unpack_shape_contract! / assert_shape_contract! / define_shape_contract! mechanics, including the shorthand forms.
- Cost Control: how to keep contracts enabled even on hot paths, with periodic asserts, #[cfg(debug_assertions)], and a comparison table.
- Error Messages: what a failing contract looks like, and the panic-vs-try_* split.
Pattern Syntax
A shape contract is written in a small DSL: a comma-separated list of dimension matchers, each describing one dimension of the expected shape.
Dimension matchers
Each matcher is one of:
- _: any dimension size; requires the position to exist but does not constrain its size.
- ...: an ellipsis matching any number of dimensions. At most one ellipsis may appear in a pattern.
- a dimension expression: an integer-valued expression over named parameters that must equal the dimension’s size.
Dimension expressions are written in ordinary infix notation with string-literal identifiers, and may be given an optional label:
ShapeContract => <LabeledExpr> { ',' <LabeledExpr> }* ','?
LabeledExpr => { <Param> '=' }? <Expr>
Expr => <Term> { <AddOp> <Term> }*
Term => <Power> { <MulOp> <Power> }*
Power => <Factor> [ '^' <usize> ]
Factor => <Param> | '(' <Expr> ')' | <NegOp> <Factor>
Param => '"' <identifier> '"'
identifier => { <alpha> | '_' } { <alphanumeric> | '_' }*
NegOp => '+' | '-'
AddOp => '+' | '-'
MulOp => '*'
For example, a (B, H, W, C) image with windowed height and width
is:
#![allow(unused)]
fn main() {
use bunsen::contracts::{ShapeContract, shape_contract};
static CONTRACT: ShapeContract = shape_contract![
"batch",
"height" = "h_wins" * "window_size",
"width" = "w_wins" * "window_size",
"channels",
];
}
"height" here is a label on the expression
"h_wins" * "window_size". The label is what appears in error
messages and what you can ask unpack_shape to return.
A few patterns worth knowing:
- ["batch", _, _, "c"]: a rank-4 shape where you care about batch and channels but not spatial dims.
- [..., "c"]: any rank, with channels last.
- ["b", _, "h" * "w" * "c"]: a flattened channel-spatial dimension that must equal h * w * c.
- ["a", "a"]: a square shape, where the same parameter "a" must satisfy both dimensions.
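To make the matcher semantics concrete, here is a deliberately simplified, std-only sketch that handles only the wildcard, the ellipsis, and bare named parameters (no arithmetic expressions, labels, or bindings). The Matcher enum and match_shape function are hypothetical names invented for this illustration; they are not bunsen's types, and the real solver does considerably more.

```rust
use std::collections::HashMap;

/// Simplified dimension matchers: wildcard, ellipsis, or a bare parameter.
#[derive(Clone, Copy, Debug)]
enum Matcher {
    Any,
    Ellipsis,
    Param(&'static str),
}

/// Match `shape` against `pattern`, returning the solved parameter sizes.
fn match_shape(
    pattern: &[Matcher],
    shape: &[usize],
) -> Result<HashMap<&'static str, usize>, String> {
    let ellipses = pattern
        .iter()
        .filter(|m| matches!(m, Matcher::Ellipsis))
        .count();
    if ellipses > 1 {
        return Err("at most one ellipsis is allowed".to_string());
    }
    let fixed = pattern.len() - ellipses;
    // Without an ellipsis the rank must match exactly; with one, the
    // shape only needs enough dimensions for the fixed matchers.
    if shape.len() < fixed || (ellipses == 0 && shape.len() != fixed) {
        return Err(format!("rank mismatch: {shape:?} vs {pattern:?}"));
    }
    let skipped = shape.len() - fixed;

    let mut bindings: HashMap<&'static str, usize> = HashMap::new();
    let mut i = 0;
    for m in pattern {
        match *m {
            // The ellipsis absorbs every dimension the fixed matchers don't need.
            Matcher::Ellipsis => i += skipped,
            Matcher::Any => i += 1,
            Matcher::Param(name) => {
                // A repeated name must resolve to the same size.
                match bindings.get(name).copied() {
                    Some(prev) if prev != shape[i] => {
                        return Err(format!("{name}: {prev} != {}", shape[i]));
                    }
                    _ => {
                        bindings.insert(name, shape[i]);
                    }
                }
                i += 1;
            }
        }
    }
    Ok(bindings)
}
```

Under these assumptions, the pattern [Param("a"), Param("a")] accepts [3, 3] and rejects [3, 4], which is the "square shape" behaviour described in the list.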
What counts as a shape: ShapeView
The contract methods accept anything that implements
ShapeView.
Out-of-the-box that includes:
- &[usize], &[usize; D]
- &[u32], &[u32; D]
- &[i32], &[i32; D]
- &Vec<usize>, &Vec<u32>, &Vec<i32>
With features = ["burner"] enabled it also includes:
- burner::prelude::Shape, &burner::prelude::Shape
- &burner::prelude::Tensor<_, _, _>
The last one is the common case: you hand a &Tensor directly to
the macro and the contract reads its shape.
Asserting and Unpacking
The three day-to-day contract macros, plus the hoisting helper for naming a contract once and reusing it.
Unpacking: unpack_shape_contract!
unpack_shape_contract! is the workhorse. It matches a shape against
a pattern, solves for unknown parameters, and returns the keys you
ask for as a fixed-size array.
#![allow(unused)]
fn main() {
use bunsen::contracts::unpack_shape_contract;
let shape = [12, 3 * 4, 5 * 4, 3];
// In release builds this benchmarks at ~160 ns.
let [b, h_wins, w_wins, c] = unpack_shape_contract!(
[
"batch",
"height" = "h_wins" * "window_size",
"width" = "w_wins" * "window_size",
"channels",
],
&shape,
&["batch", "h_wins", "w_wins", "channels"],
&[("window_size", 4)],
);
// width = 5 * 4 = 20 with window_size = 4, so w_wins solves to 5.
assert_eq!((b, h_wins, w_wins, c), (12, 3, 5, 3));
}
The arguments, in order, are:
- The contract pattern (or the name of a pre-defined contract).
- The shape (anything ShapeView).
- The keys to unpack: a &[&str] of length K; the macro returns [usize; K] in the same order.
- The bindings: a &[(&str, usize)] of parameters whose values are known up front.
Two shorthand forms are accepted:
- No bindings. When every parameter can be solved from the shape, the bindings slice may be omitted:

    #![allow(unused)]
    fn main() {
    use bunsen::contracts::unpack_shape_contract;
    let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];
    let [h, h_win, w, w_win, ws, c] = unpack_shape_contract!(
        [..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
        &shape,
        &["h", "h_win", "w", "w_win", "ws", "c"],
    );
    }

- Keys are the pattern. When the pattern is just a list of bare identifiers, you can drop the keys slice as well:

    #![allow(unused)]
    fn main() {
    use bunsen::contracts::unpack_shape_contract;
    let shape = [4, 5, 3];
    let [h, w, c] = unpack_shape_contract!(["h", "w", "c"], &shape);
    assert_eq!((h, w, c), (4, 5, 3));
    }
If the shape does not match, or if the bindings are inconsistent
with the shape, the macro panics. See
Error Messages for what that panic looks like
and the try_* variants that return a Result instead.
Asserting without unpacking: assert_shape_contract!
When you only care about validating a shape and don’t need any
dimensions back, use assert_shape_contract!:
#![allow(unused)]
fn main() {
use bunsen::contracts::assert_shape_contract;
let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];
assert_shape_contract!(
[..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
&shape,
&[("ws", 2)],
);
}
Same panic-on-mismatch semantics as unpack_shape_contract!. The
non-panicking form is
ShapeContract::try_assert_shape.
Hoisting a contract: define_shape_contract!
If the same contract is used in more than one place — or you
simply want to read it once at the top of a function — bind it
to a static with define_shape_contract! and pass the name into
the assert or unpack macros:
#![allow(unused)]
fn main() {
use bunsen::contracts::{
assert_shape_contract,
define_shape_contract,
unpack_shape_contract,
};
let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];
define_shape_contract!(
CONTRACT,
[..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
);
assert_shape_contract!(CONTRACT, &shape, &[("ws", 2)]);
let [h, h_win, w, w_win, c] = unpack_shape_contract!(
CONTRACT,
&shape,
&["h", "h_win", "w", "w_win", "c"],
&[("ws", 2)],
);
}
shape_contract![...] itself is a const-evaluable expression, so
the static carries no runtime construction cost.
For the hot-path variants of these macros —
assert_shape_contract_periodically! and the
#[cfg(debug_assertions)] pattern — see
Cost Control.
Cost Control
bunsen::contracts is designed to stay enabled in release builds,
but a 160 ns check inside a tight inner loop still adds up. Two
mechanisms exist to keep contracts on while bringing their cost
down: an exponential-backoff periodic check, and a
#[cfg(debug_assertions)] gate that strips the check entirely from
release builds.
Periodic asserts
assert_shape_contract_periodically! wraps an
assert_shape_contract! in
run_periodically!:
the assertion runs the first 10 calls, then on a doubling
schedule (every 16, 32, 64, … calls) until it reaches a
configurable maximum period (default 1000).
The amortized cost drops to ~4 ns/call in release builds:
assert_shape_every_nth time: [4.4057 ns 4.4769 ns 4.5726 ns]
It takes the same arguments as assert_shape_contract!:
#![allow(unused)]
fn main() {
use bunsen::contracts::assert_shape_contract_periodically;
let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];
assert_shape_contract_periodically!(
[..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
&shape,
&[("ws", 2)],
);
}
The typical pattern in a module method is:
- Unpack the inputs once at the top — you need the dimensions anyway, and the assertion comes free.
- Periodically assert intermediate or output shapes you don’t need to unpack, so violations are still caught without paying the full cost on every call.
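The backoff idea can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the run_periodically! implementation: the Backoff struct, its field names, and the exact schedule (always run for the first 10 calls, then run with a gap that doubles after each run, capped at a maximum period) are hypothetical and may differ from what the library does.

```rust
/// Hypothetical scheduler: always fire during a warm-up window, then fire
/// with a gap that doubles after every run until it reaches `max_period`.
struct Backoff {
    calls: u64,
    next_due: u64,
    period: u64,
    max_period: u64,
}

impl Backoff {
    const WARMUP: u64 = 10;

    fn new(max_period: u64) -> Self {
        Backoff { calls: 0, next_due: 16, period: 16, max_period }
    }

    /// Returns true when the guarded check should run on this call.
    fn should_run(&mut self) -> bool {
        self.calls += 1;
        if self.calls <= Self::WARMUP {
            return true;
        }
        if self.calls < self.next_due {
            return false;
        }
        // Due: lengthen the gap (capped) and schedule the next run.
        self.period = (self.period * 2).min(self.max_period);
        self.next_due = self.calls + self.period;
        true
    }
}

/// Count how many of `calls` invocations would actually run the check.
fn runs_in(calls: u64, max_period: u64) -> u64 {
    let mut gate = Backoff::new(max_period);
    (0..calls).filter(|_| gate.should_run()).count() as u64
}
```

With this particular schedule, 10,000 calls trigger the guarded check only 24 times, which is why the amortized cost falls so far below the cost of a single full check.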
Debug-only contracts: #[cfg(debug_assertions)]
Periodic asserts amortize cost; sometimes you’d rather pay zero
cost in release and accept that the check only runs in development
builds. The standard Rust way to do that is
#[cfg(debug_assertions)], which gates an item on the same flag
that controls debug_assert!. The flag is on for cargo build /
cargo test and off for cargo build --release.
Applied to a contract, this strips the entire macro call — pattern parsing, binding solve, everything — from release builds:
pub fn next_interior_3d<B: Backend>(
state: Tensor<B, 3, Bool>,
rules: &LifeRules,
) -> Tensor<B, 3, Bool> {
// Debug only: unpack [H, W, Z] for the output check below.
#[cfg(debug_assertions)]
let [h, w, z] = unpack_shape_contract!(["h", "w", "z"], &state.dims());
// ... do work, producing `update` with shape [H-2, W-2, Z-2] ...
let update = state;
// Debug only: confirm the interior shrink happened correctly.
#[cfg(debug_assertions)]
assert_shape_contract_periodically!(
["h" - "pad", "w" - "pad", "z" - "pad"],
&update.dims(),
&[("h", h), ("w", w), ("z", z), ("pad", 2)],
);
update
}
This is the pattern used by bunsen::kits::sims::conway::life3d for
its interior-step kernel.
A few things to note:
- Bindings produced by a gated unpack are themselves gated. Above, h, w, and z only exist in debug builds. Anything that consumes them (here, the downstream periodic assert) must be gated too, or release builds won’t compile. This is usually a feature: the compiler tells you when you accidentally depend on a debug-only binding outside its #[cfg].
- Pair #[cfg(debug_assertions)] with the unpack/assert that produced its bindings. Don’t sprinkle the attribute across only half of a chain.
- Combine freely with assert_shape_contract_periodically!. Even inside a debug-only branch, a periodic check is still worthwhile if the function is called many times in tests or development runs: the amortization keeps debug builds responsive.
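The gating rule in the first note can be seen without any tensor library. The sketch below is a std-only stand-in (the interior function is hypothetical and operates on a slice rather than a tensor): the debug-only binding and the statement that consumes it carry the same attribute, so the function compiles in both debug and release profiles.

```rust
/// Drops the first and last element, mimicking an "interior" step.
fn interior(values: &[u32]) -> Vec<u32> {
    // Debug only: remember the input length for the check below.
    #[cfg(debug_assertions)]
    let len = values.len();

    let out: Vec<u32> = values
        .iter()
        .copied()
        .skip(1)
        .take(values.len().saturating_sub(2))
        .collect();

    // Debug only: this statement uses `len`, so it must carry the same gate.
    // Removing the attribute here would break release builds, where `len`
    // does not exist.
    #[cfg(debug_assertions)]
    assert_eq!(out.len(), len.saturating_sub(2));

    out
}
```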
Choosing the cost model
bunsen::contracts gives you three strategies for managing the
runtime cost of a check; they’re not exclusive, and a function often
uses more than one:
| Strategy | Debug cost | Release cost | Use when |
|---|---|---|---|
| assert_shape_contract! / unpack_shape_contract! | full (~160 ns) | full (~160 ns) | The check is on a cold or once-per-batch path, or you want production failures. |
| assert_shape_contract_periodically! | amortized (~4 ns/call) | amortized (~4 ns/call) | The check is on a hot path but you still want some coverage in production. |
| #[cfg(debug_assertions)] on either of the above | full or amortized | zero | The check is purely a developer aid; the calling code is trusted in release, or the cost is unacceptable on any hot path. |
The default should be “no #[cfg]”: contracts are designed to stay
on. Reach for #[cfg(debug_assertions)] when you’ve measured a cost
you can’t afford or when the check exists only to catch upstream
bugs that release builds cannot introduce.
Putting it together
A canonical use site from the window_partition example in the mod
docs:
use burn::prelude::{Tensor, Backend};
use burn::tensor::BasicOps;
use bunsen::contracts::{
assert_shape_contract_periodically,
unpack_shape_contract,
};
/// Window Partition
///
/// ## Parameters
///
/// - `tensor`: Input tensor of shape (B, h_wins * window_size, w_wins * window_size, C).
/// - `window_size`: Window size.
///
/// ## Returns
///
/// Output tensor of shape (B * h_windows * w_windows, window_size, window_size, C).
pub fn window_partition<B: Backend, K>(
tensor: Tensor<B, 4, K>,
window_size: usize,
) -> Tensor<B, 4, K>
where
K: BasicOps<B>,
{
// Validate the input and pull out what we need. ~160 ns.
let [b, h_wins, w_wins, c] = unpack_shape_contract!(
[
"batch",
"height" = "h_wins" * "window_size",
"width" = "w_wins" * "window_size",
"channels",
],
&tensor,
&["batch", "h_wins", "w_wins", "channels"],
&[("window_size", window_size)],
);
let tensor = tensor
.reshape([b, h_wins, window_size, w_wins, window_size, c])
.swap_dims(2, 3)
.reshape([b * h_wins * w_wins, window_size, window_size, c]);
// Amortized check on the output. ~4 ns averaged.
assert_shape_contract_periodically!(
[
"batch" * "h_wins" * "w_wins",
"window_size",
"window_size",
"channels",
],
&tensor,
&[
("batch", b),
("h_wins", h_wins),
("w_wins", w_wins),
("window_size", window_size),
("channels", c),
],
);
tensor
}
The input contract unpacks the dimensions the function needs; the output contract asserts — periodically — that the reshape sequence produced what the docstring promised.
Error Messages
When a contract fails, the panic message names exactly what didn’t fit. Producing useful diagnostics is the main user-visible reason to write a contract in the first place; the unpacking and the performance work exist to make using contracts cheap enough that you reach for them by default.
Anatomy of a failure
A failing contract reports the shape, the pattern that was expected, the bindings in scope, and the specific dimension that did not solve:
#![allow(unused)]
fn main() {
use bunsen::contracts::{ShapeContract, shape_contract};
use indoc::indoc;
static CONTRACT: ShapeContract = shape_contract![
...,
"height" = "h_wins" * "window",
"width" = "w_wins" * "window",
"color",
];
let h_wins = 2;
let w_wins = 3;
let window = 4;
let color = 3;
let shape = [1, 2, 3, h_wins * window, w_wins * window, color];
// Passes: window=4 is consistent with the shape.
let [h, w] = CONTRACT.unpack_shape(
&shape,
&["h_wins", "w_wins"],
&[("window", window), ("color", color)],
);
assert_eq!((h, w), (h_wins, w_wins));
// Fails: we lied about window. The error names the offending dim.
assert_eq!(
CONTRACT
.try_unpack_shape(
&shape,
&["h_wins", "w_wins"],
&[("window", window + 1), ("color", color)],
)
.unwrap_err(),
indoc! {r#"
Shape Error:: 8 !~ height=(h_wins*window) :: No integer solution.
shape:
[1, 2, 3, 8, 12, 3]
expected:
[..., height=(h_wins*window), width=(w_wins*window), color]
{"window": 5, "color": 3}"#
},
);
}
Three things to notice in the error:
- the dimension is named (height=(h_wins*window)), not just indexed,
- the offending value (8) and the offending pattern are paired up,
- the active bindings ({"window": 5, "color": 3}) are printed so you can see what the contract was working with.
This is the payoff for writing the contract in the first place:
when something goes wrong, the panic message tells you which
invariant broke and with what numbers, instead of a bare
assertion failed: lhs == rhs from somewhere downstream.
Panic vs Result
The macros (assert_shape_contract!, unpack_shape_contract!,
assert_shape_contract_periodically!) panic on mismatch. That’s
the right default for the “contract guards a function boundary” use
case — a violation indicates a bug, and a panic is the
loudest, most-debuggable response.
When you want the failure as a value instead, use the underlying
methods on ShapeContract:
- ShapeContract::try_assert_shape: returns Result<(), String>.
- ShapeContract::try_unpack_shape: returns Result<[usize; K], String>.
The Err payload is exactly the multi-line message shown above, so
you can log it, surface it to users, or run snapshot tests against
it (the example here uses indoc! for exactly that purpose).
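The panic/Result split is a common Rust layering, and a minimal std-only sketch of it looks like the following. The function names try_check_rank and check_rank and the message text are hypothetical; the sketch only illustrates the general shape of a fallible method with a panicking wrapper that reuses the same message, not how bunsen implements its own methods.

```rust
/// Non-panicking form: the failure is an ordinary value the caller can
/// log, surface to users, or compare against a snapshot.
fn try_check_rank(shape: &[usize], rank: usize) -> Result<(), String> {
    if shape.len() == rank {
        Ok(())
    } else {
        Err(format!("Shape Error:: expected rank {rank}, got {shape:?}"))
    }
}

/// Panicking form: a thin wrapper that reuses the same message, suited
/// to guarding a function boundary where a violation is a bug.
fn check_rank(shape: &[usize], rank: usize) {
    if let Err(msg) = try_check_rank(shape, rank) {
        panic!("{msg}");
    }
}
```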
Ops Overview
bunsen::ops is a functional tensor API: standalone functions and
small configuration value-objects that extend the surface of
burn::tensor::Tensor itself. It’s where the utilities go that you’d
reach for inside a Module::forward body but that don’t have a home
on Tensor upstream.
Nothing in ops owns trainable parameters — that’s the job of
bunsen::blocks. The two layers compose: a
block bundles a parametric layer (a Linear, a Conv2d, …) with a
sequence of ops::* calls in its forward.
API: https://docs.rs/bunsen/latest/bunsen/ops/
How ops fits in the stack
contracts validate shapes between layers
│
▼
ops pure functions over Tensors ◄── you are here
│
▼
blocks stateful parameter-owning Modules
│
▼
kits whole models built from blocks
A block’s forward typically pulls in two or three ops calls
around its parametric core. Lifting those into named functions keeps
the block readable and makes the underlying math testable in
isolation — the ops are what get unit-tested, the block
asserts they’re wired correctly.
Map of the module
The contents are split across two sub-chapters by character:
- Tensor Functions: element-wise and shape-level helpers, covering range generation (arange), clamping (ClampOp), dropout (drop), noise (NoiseConfig), RMS normalization (rms_norm), and repeat_interleave.
- Convolution Support: the larger conv submodule, covering shape arithmetic for conv outputs, functional convolution (convolve_func_2d) for kernels that aren’t linear, and helper filters.
Value-objects as configuration
Several of the “options” types in ops (notably
ClampOp
and NoiseConfig)
are designed to be embedded in Config structs elsewhere in the
crate. That’s the intentional shape of the seam: when an op has
parameters worth naming, it has a value-object, and that
value-object is the unit of configuration. A Module::Config can
then #[config(default = ...)] one of these directly instead of
duplicating its fields.
Tensor Functions
The element-wise, shape-level, and generation helpers in
bunsen::ops. Organized by intent rather than alphabet.
Tensor generation
arange — floating-point ranges
arange
fills the gap between integer Tensor::arange and the
numpy.arange / numpy.linspace family. The functions come in two
flavours:
- Host-side Vec<f64> builders when you want values to feed into other configuration:
  - float_vec_arange(start, end, step)
  - float_vec_linspace(start, end, n)
- Device-side Tensor builders when you want the values on the same backend as the rest of your computation:
  - float_arange::<B>(start, end, step, &device)
  - float_linspace::<B>(start, end, n, &device)
step is optional on arange; it defaults to 1.0 for ascending
ranges and -1.0 for descending.
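For readers unfamiliar with the NumPy conventions, the host-side behaviour can be approximated with the std-only sketch below. The names arange_f64 and linspace_f64 are hypothetical, and two details are assumptions borrowed from NumPy rather than confirmed for bunsen: the end of an arange is exclusive, and a linspace includes both endpoints.

```rust
/// Hypothetical host-side range builder: values from `start` towards
/// `end` (exclusive) in increments of `step`.
fn arange_f64(start: f64, end: f64, step: f64) -> Vec<f64> {
    assert!(step != 0.0, "step must be non-zero");
    // Number of steps that fit before reaching `end`; a step pointing
    // away from `end` gives a negative count, clamped here to empty.
    let n = ((end - start) / step).ceil().max(0.0) as usize;
    // Multiply instead of accumulating to avoid compounding rounding error.
    (0..n).map(|i| start + i as f64 * step).collect()
}

/// Hypothetical linspace: `n` evenly spaced values including both endpoints.
fn linspace_f64(start: f64, end: f64, n: usize) -> Vec<f64> {
    match n {
        0 => Vec::new(),
        1 => vec![start],
        _ => {
            let delta = (end - start) / (n - 1) as f64;
            (0..n).map(|i| start + i as f64 * delta).collect()
        }
    }
}
```

The two builders differ in what the caller fixes: arange fixes the step and derives the count, while linspace fixes the count and derives the step.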
noise — distribution + clamp value-objects
NoiseConfig
bundles a burn::tensor::Distribution with an optional
ClampOp into
a single reusable value:
use bunsen::ops::{clamp::ClampOp, noise::NoiseConfig};
use burn::tensor::Distribution;
let cfg = NoiseConfig::default()
.with_distribution(Distribution::Normal(0.0, 1.0))
.with_clamp(ClampOp::new(Some(-3.0), Some(3.0)));
// Materialize:
let t = cfg.noise::<B, _, 3>([batch, channels, length], &device);
let t2 = cfg.noise_like(&existing_tensor);
noise() takes a shape and a device; noise_like(t) takes another
tensor and matches its shape and device. The clamp, when present, is
applied in the same pass.
Element-wise transforms
clamp — optional min and max
ClampOp
captures an optional minimum and an optional maximum as a single
serializable value, with .clamp(tensor) to apply it. It’s designed
for places clamping shows up as a setting — a knob on a noise
generator, an entry in a serialized config — not just a
one-shot call:
let op = ClampOp::new(Some(0.0), None); // ReLU-style: clamp below at 0
let y = op.clamp(x);
ClampOp is Config-friendly: it implements Serialize,
Deserialize, and ModuleDisplay, so it drops cleanly into
upstream #[derive(Config)] blocks.
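The "clamp as a setting" idea can be illustrated without burn. In the std-only sketch below, ClampSketch and NoiseSketchConfig are hypothetical stand-ins (they are not ClampOp or NoiseConfig, and they operate on slices rather than tensors); the point is only that a config struct embeds the value-object as a field instead of duplicating its min and max.

```rust
/// Hypothetical stand-in for a clamp value-object: either bound may be absent.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClampSketch {
    min: Option<f64>,
    max: Option<f64>,
}

impl ClampSketch {
    /// Apply whichever bounds are present; a missing bound is a no-op.
    fn apply(&self, values: &[f64]) -> Vec<f64> {
        values
            .iter()
            .map(|&v| {
                let v = self.min.map_or(v, |lo| v.max(lo));
                self.max.map_or(v, |hi| v.min(hi))
            })
            .collect()
    }
}

/// A config struct embeds the value-object instead of duplicating its fields.
#[derive(Clone, Debug, PartialEq)]
struct NoiseSketchConfig {
    scale: f64,
    clamp: ClampSketch,
}

impl NoiseSketchConfig {
    /// Scale the input, then apply the embedded clamp.
    fn apply(&self, values: &[f64]) -> Vec<f64> {
        let scaled: Vec<f64> = values.iter().map(|v| v * self.scale).collect();
        self.clamp.apply(&scaled)
    }
}
```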
drop — functional dropout
dropout(prob, input)
is the functional dropout op: at prob == 0.0 it short-circuits to
the input unchanged; otherwise it samples a Bernoulli mask and
rescales by 1 / (1 - prob). Use this when you need a single call
inside a free function; for a trainable layer with its own toggle
state, use the modules in
bunsen::blocks::images::drop.
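The arithmetic described above is small enough to sketch on a plain slice. In this std-only illustration the keep-mask is passed in explicitly so the result is deterministic; a real dropout op would sample the mask from a Bernoulli distribution. The function name dropout_with_mask is hypothetical and is not the bunsen signature.

```rust
/// Functional dropout over a slice, with the keep-mask supplied by the
/// caller so the example stays deterministic.
fn dropout_with_mask(prob: f64, input: &[f64], keep: &[bool]) -> Vec<f64> {
    assert!((0.0..1.0).contains(&prob), "prob must be in [0, 1)");
    assert_eq!(input.len(), keep.len(), "mask must match input length");
    // Short-circuit: nothing is dropped and nothing is rescaled.
    if prob == 0.0 {
        return input.to_vec();
    }
    // Survivors are scaled by 1 / (1 - prob) so the expected value of
    // each element is unchanged.
    let scale = 1.0 / (1.0 - prob);
    input
        .iter()
        .zip(keep)
        .map(|(&x, &k)| if k { x * scale } else { 0.0 })
        .collect()
}
```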
Normalization
rms_norm — parameter-free RMS normalization
rms_norm(input, &options)
applies RMSNorm without a trainable gain. Configured via
RmsNormOptions,
which carries the epsilon (with_eps(...)).
This is the right call when you need RMSNorm inside a free function
or unit test. The parametric layer with a learned gain lives in
burn::nn::norm::RmsNorm; the two share the same numerics.
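As a reference for the numerics, gain-free RMSNorm over a single vector can be written in a few lines of std-only Rust. The function name rms_norm_sketch is hypothetical, and placing epsilon inside the square root is the usual convention but an assumption here, not something confirmed for the bunsen implementation.

```rust
/// RMS-normalize a vector without a learned gain:
/// y_i = x_i / sqrt(mean(x^2) + eps).
fn rms_norm_sketch(input: &[f64], eps: f64) -> Vec<f64> {
    if input.is_empty() {
        return Vec::new();
    }
    // Mean of squares over the normalized axis.
    let mean_sq = input.iter().map(|x| x * x).sum::<f64>() / input.len() as f64;
    // eps keeps the division finite when the input is all zeros.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    input.iter().map(|x| x * inv_rms).collect()
}
```

After normalization (with eps set to zero) the mean of the squared outputs is 1, which is a convenient property to assert in a unit test.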
Shape transforms
repeat_interleave — NumPy-style interleaving
repeat_interleave(input, repeats, dim)
repeats each element consecutively along a single axis, rather than tiling the whole sequence.
Given [a, b, c] and repeats = 2, it returns [a, a, b, b, c, c]
— matching NumPy / PyTorch semantics, including negative
indexing for dim.
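The difference between interleaving and tiling, and the negative-index rule, can be sketched on a flat vector. The helper names below are hypothetical stand-ins and operate on slices, not on burn tensors.

```rust
/// Repeat every element `repeats` times in place: [a, b] -> [a, a, b, b].
fn repeat_interleave_1d<T: Clone>(input: &[T], repeats: usize) -> Vec<T> {
    input
        .iter()
        .flat_map(|x| std::iter::repeat(x.clone()).take(repeats))
        .collect()
}

/// Tiling, for contrast: [a, b] -> [a, b, a, b].
fn tile_1d<T: Clone>(input: &[T], repeats: usize) -> Vec<T> {
    (0..repeats).flat_map(|_| input.iter().cloned()).collect()
}

/// Map a possibly negative axis index onto 0..rank, NumPy-style.
/// Returns None when the index is out of range.
fn normalize_dim(dim: isize, rank: usize) -> Option<usize> {
    let r = rank as isize;
    if dim >= -r && dim < r {
        Some(((dim + r) % r) as usize)
    } else {
        None
    }
}
```

Here normalize_dim(-1, 3) resolves to the last axis, index 2.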
Convolution Support
bunsen::ops::conv
collects everything around convolution that isn’t itself a trainable
layer: shape arithmetic, a functional convolution kernel, and a few
helper filters.
Shape arithmetic
Given a Conv*d’s kernel size, stride, padding, and dilation, what
output shape does it produce? burn answers this implicitly when you
call forward; bunsen::ops::conv gives you the same answer as a
pure function, which is what you reach for inside Module::Config
methods that need to advertise output sizes before any tensors
exist.
General N-D
- maybe_conv_output_shape::<D>(input_shape, kernel, stride, padding, dilation): const-generic over the spatial rank D. Returns Option<[usize; D]>, with None if the configuration is impossible (e.g., kernel larger than the padded input).
- expect_conv_output_shape::<D>(...): same, but panics on impossible configurations, with a message naming the offending dimension.
The _dyn siblings (maybe_conv_output_shape_dyn,
expect_conv_output_shape_dyn) take slices instead of const-generic
arrays, for cases where the rank isn’t known at compile time.
1-D scalar case
- maybe_conv1d_output_size(input, kernel, stride, padding, dilation): the per-axis math broken out as a single scalar function, for when you only care about one dimension.
- expect_conv1d_output_size(...): same, panicking.
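The per-axis arithmetic is the standard convolution size formula, floor((input + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1. The sketch below is a stand-in written for this explanation: the function names are hypothetical, the argument order is assumed, and symmetric padding is assumed on both sides.

```rust
/// Output length of one convolution axis, or None if the kernel does not fit.
/// Hypothetical helper; assumes the same padding on both ends of the axis.
fn conv1d_output_size(
    input: usize,
    kernel: usize,
    stride: usize,
    padding: usize,
    dilation: usize,
) -> Option<usize> {
    if kernel == 0 || stride == 0 || dilation == 0 {
        return None;
    }
    // Span covered by a dilated kernel.
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + 2 * padding;
    if padded < effective_kernel {
        return None;
    }
    Some((padded - effective_kernel) / stride + 1)
}

/// The same arithmetic applied independently to each of D spatial axes.
fn conv_output_shape<const D: usize>(
    input: [usize; D],
    kernel: [usize; D],
    stride: [usize; D],
    padding: [usize; D],
    dilation: [usize; D],
) -> Option<[usize; D]> {
    let mut out = [0usize; D];
    for i in 0..D {
        out[i] = conv1d_output_size(input[i], kernel[i], stride[i], padding[i], dilation[i])?;
    }
    Some(out)
}
```

For a ResNet-style stem (input 224, kernel 7, stride 2, padding 3, dilation 1) this gives 112, and a 5-wide kernel over an unpadded input of 2 gives None.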
Common shortcuts
- stride_div_output_resolution(input_resolution, stride): the “downsample by stride” arithmetic that comes up in ResNet-style blocks, with a check that the input is a multiple of the stride.
- get_square_conv2d_padding(kernel): the “same-padding for a square odd kernel” formula.
- build_square_conv2d_padding_config(kernel): the same result wrapped in burn::nn::PaddingConfig2d, for handing straight to a Conv2dConfig.
Functional convolution
convolve_func_2d
convolve_func_2d
folds a user-supplied closure across the windows of a 2-D
convolution-shaped iteration:
// Pseudocode of the contract.
fn convolve_func_2d<B, KIn, KOut, F>(
input: Tensor<B, 4, KIn>,
kernel_size: [usize; 2],
stride: [usize; 2],
padding: [usize; 2],
f: F, // window -> per-window output
) -> Tensor<B, 4, KOut>
where
F: FnMut(Tensor<B, 4, KIn>) -> Tensor<B, 4, KOut>;
The closure receives one conv window at a time and returns the
corresponding output. This is the starting point for kernels that
aren’t expressible as a linear Conv2d: median filters, mode
filters, custom per-window scoring, and so on. If your closure is
linear, you almost certainly want burn::nn::conv::Conv2d instead
— this function intentionally trades performance for
flexibility.
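To make the window-folding idea concrete, here is a minimal sketch on a single-channel image stored as nested vectors, with a median filter as the closure. It is not convolve_func_2d: the names are hypothetical, there is no padding or batching, and the input is assumed to be rectangular.

```rust
/// Slide a kernel-sized window over a rectangular 2-D grid and apply `f`
/// to each window (flattened row-major). No padding is applied.
fn fold_windows_2d<F>(
    input: &[Vec<f32>],
    kernel: [usize; 2],
    stride: [usize; 2],
    mut f: F,
) -> Vec<Vec<f32>>
where
    F: FnMut(&[f32]) -> f32,
{
    let [kh, kw] = kernel;
    let [sh, sw] = stride;
    assert!(kh > 0 && kw > 0 && sh > 0 && sw > 0);
    let h = input.len();
    let w = input.first().map_or(0, |row| row.len());
    if h < kh || w < kw {
        return Vec::new();
    }
    let (out_h, out_w) = ((h - kh) / sh + 1, (w - kw) / sw + 1);
    let mut window = Vec::with_capacity(kh * kw);
    let mut out = Vec::with_capacity(out_h);
    for oy in 0..out_h {
        let mut row = Vec::with_capacity(out_w);
        for ox in 0..out_w {
            window.clear();
            for dy in 0..kh {
                window.extend_from_slice(&input[oy * sh + dy][ox * sw..ox * sw + kw]);
            }
            row.push(f(&window));
        }
        out.push(row);
    }
    out
}

/// A non-linear per-window function: the median of the window.
fn median(window: &[f32]) -> f32 {
    let mut sorted = window.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    sorted[sorted.len() / 2]
}
```

A median closure ignores a single outlier in the window, which no linear kernel can do.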
Filters
conv2d_kernel_midpoint_filter
conv2d_kernel_midpoint_filter
is a kernel that picks the spatial mid-point sample from each
window — cheap to compute, useful as a stride-aware
“downsample to the centre pixel” filter and as a sanity-check
baseline for more sophisticated kernels.
Blocks Overview
bunsen::blocks is the library of reusable burn::module::Module
components — the parts you compose into larger models. Where
bunsen::ops supplies pure functional tensor
operations, blocks supplies the stateful layers that own trainable
parameters. Where bunsen::kits supplies whole
end-to-end models, blocks supplies the sub-modules those kits are
assembled from.
A typical user picks a kit; an author building a new model reaches
into blocks and stitches it together.
API: https://docs.rs/bunsen/latest/bunsen/blocks/
Map of the module
blocks is organized by domain: each major sub-chapter covers one
domain’s worth of building blocks.
- Transformers: attention (CausalSelfAttention, scaled-dot-product attention helpers, KVCache) and positional embedding (RotaryEmbedding).
- Images: convolutional composites (ConvNorm2d, CNA2d), patch tokenization (PatchEmbed), same-padding pooling (AvgPool2dSame), and stochastic regularization layers (DropBlock, DropPath).
Conventions used across blocks
Every block follows the patterns described in Building Reusable Modules:
- {Block}Meta trait when other modules will need to introspect this one at runtime. CausalSelfAttentionMeta is implemented on both CausalSelfAttentionConfig and CausalSelfAttention<B>, so a parent transformer can ask n_head / head_dim of whichever form it’s holding without copying metadata.
- Contract → Structure config split when the user-facing knobs differ from the implementation parameter list. The ResidualBlock triple in bimm::resnet is the in-tree reference example.
- Inline shape contracts at module boundaries via bunsen::contracts. Every block’s forward documents its shape in the docstring; the matching unpack_shape_contract! / assert_shape_contract_periodically! calls turn that docstring into a runtime check.
Where blocks come from
Most blocks land here in one of three ways:
- Direct ports. Implementations of well-known layers from the timm / torchvision / reference-paper ecosystems, kept in burn form so they’re available across the entire bunsen stack.
- Extraction from kits. When a layer in a kits model is reusable beyond that one model, it gets promoted out of the kit and into blocks.
- burn-gap fills. Layers that aren’t in burn core today but are needed by everything else (e.g., AvgPool2dSame).
The result is a curated catalog, not a comprehensive one — new blocks land here when a kit or downstream user needs them.
Transformer Blocks
bunsen::blocks::transformers collects the building blocks used by
transformer-family models — attention layers, their caching
machinery, and positional embeddings.
API: https://docs.rs/bunsen/latest/bunsen/blocks/transformers/
Attention
The attention
submodule houses the attention layers themselves and the helpers
they’re built from.
CausalSelfAttention
CausalSelfAttention
is multi-head causal self-attention with optional KV-grouping. The
config carries:
- n_head: number of query heads,
- n_kv_head: number of key/value heads (must divide n_head; equals n_head for plain MHA, less for grouped-query attention),
- n_embed: embedding dimension,
- a pluggable NormalizationConfig applied inside the block.
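The divisibility requirement on n_kv_head exists because each key/value head is shared by a fixed-size group of query heads. A small sketch of that bookkeeping follows; the function name is hypothetical, and assigning contiguous query heads to the same K/V head is the common convention, assumed here rather than confirmed for this layer.

```rust
/// Which key/value head serves a given query head under grouped-query attention.
/// Assumes contiguous grouping; returns None for an invalid configuration.
fn kv_head_for_query_head(q_head: usize, n_head: usize, n_kv_head: usize) -> Option<usize> {
    if n_kv_head == 0 || n_head % n_kv_head != 0 || q_head >= n_head {
        return None;
    }
    let group_size = n_head / n_kv_head;
    Some(q_head / group_size)
}
```

With n_head = 8 and n_kv_head = 2, query heads 0..4 read K/V head 0 and query heads 4..8 read K/V head 1; with n_kv_head = n_head the mapping is the identity, which is plain MHA.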
The module exposes a CausalSelfAttentionMeta trait, implemented on
both the config and the live module. Parents can read n_head,
n_kv_head, and head_dim of whichever form they’re holding, so
larger transformers don’t need to cache those numbers themselves.
This is the pattern documented in
Building Reusable Modules.
forward takes the input embedding plus an optional &mut KVCache
for autoregressive decoding. When the cache is None the layer runs
in training/prefill mode and recomputes K and V each call; when it’s
Some, K and V are appended into the cache and read back across the
full sequence.
KVCache
KVCache
is the per-layer key/value tensor cache for fast incremental
decoding. Built from a KVCacheConfig carrying batch_size,
num_heads, seq_len, head_dim, and num_layers, it provides:
- pos(): the current write head position,
- prefill(...): bulk-load K/V from a prompt encode,
- insert_kv(...): append a single decoded step’s K/V,
- reset(): rewind to position 0 without reallocating.
NanoChatGpt uses one shared KVCache across all its layers; see
bunsen::kits::gpts::nanochat for the integrated
example.
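The write-head behaviour can be pictured with a toy, single-head cache over flat buffers. Everything below is an illustrative stand-in: ToyKvCache and its methods are hypothetical and only mimic the pos / insert / reset semantics described above, not bunsen's tensor-backed KVCache.

```rust
/// Toy cache: rows of `head_dim` floats, preallocated for `seq_len` steps.
struct ToyKvCache {
    head_dim: usize,
    keys: Vec<f32>,
    values: Vec<f32>,
    pos: usize, // write head, counted in steps
}

impl ToyKvCache {
    fn new(seq_len: usize, head_dim: usize) -> Self {
        Self {
            head_dim,
            keys: vec![0.0; seq_len * head_dim],
            values: vec![0.0; seq_len * head_dim],
            pos: 0,
        }
    }

    fn pos(&self) -> usize {
        self.pos
    }

    /// Append one or more steps of K/V at the write head.
    fn insert_kv(&mut self, k: &[f32], v: &[f32]) -> Result<(), String> {
        if self.head_dim == 0 || k.len() != v.len() || k.len() % self.head_dim != 0 {
            return Err("k and v must hold whole rows of head_dim values".into());
        }
        let start = self.pos * self.head_dim;
        let end = start + k.len();
        if end > self.keys.len() {
            return Err("cache is full".into());
        }
        self.keys[start..end].copy_from_slice(k);
        self.values[start..end].copy_from_slice(v);
        self.pos = end / self.head_dim;
        Ok(())
    }

    /// Keys and values written so far: what attention reads back.
    fn filled(&self) -> (&[f32], &[f32]) {
        let end = self.pos * self.head_dim;
        (&self.keys[..end], &self.values[..end])
    }

    /// Rewind the write head; the buffers are kept, not reallocated.
    fn reset(&mut self) {
        self.pos = 0;
    }
}
```

A multi-step insert plays the role of a prefill, and a single-row insert the role of one decode step.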
Scaled-dot-product helpers
When you need to wire attention by hand — for a custom block, a fused-kernel experiment, or unit tests — the functional API is available:
- scaled_dot_product_attention: the full SDPA op given Q, K, V and an optional mask/bias.
- sdpa_attn_weight: just the softmax-of-scaled-QK^T factor.
- sdpa_bias: build an additive bias tensor (causal mask, ALiBi, etc.) of the right shape for SDPA.
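For orientation, the math behind these helpers is softmax(Q K^T / sqrt(d) + bias) V. The sketch below works on one head stored as nested vectors; the function names are hypothetical, the causal mask assumes queries and keys cover the same positions, and it is not the signature of the bunsen functions.

```rust
/// Numerically stable softmax. Assumes at least one finite score per row,
/// which holds for a causal mask because position i always sees itself.
fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// softmax(Q K^T / sqrt(d)), with an optional causal mask as additive -inf.
fn attn_weights(q: &[Vec<f32>], k: &[Vec<f32>], causal: bool) -> Vec<Vec<f32>> {
    let head_dim = q.first().map_or(1, |row| row.len()).max(1);
    let scale = 1.0 / (head_dim as f32).sqrt();
    q.iter()
        .enumerate()
        .map(|(i, qi)| {
            let scores: Vec<f32> = k
                .iter()
                .enumerate()
                .map(|(j, kj)| {
                    if causal && j > i {
                        f32::NEG_INFINITY
                    } else {
                        qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale
                    }
                })
                .collect();
            softmax(&scores)
        })
        .collect()
}

/// Full SDPA for one head: weights times V.
fn sdpa(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>], causal: bool) -> Vec<Vec<f32>> {
    let width = v.first().map_or(0, |row| row.len());
    attn_weights(q, k, causal)
        .iter()
        .map(|w| {
            let mut out = vec![0.0f32; width];
            for (wj, vj) in w.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += wj * x;
                }
            }
            out
        })
        .collect()
}
```

Under the causal mask, the first position can only attend to itself, so its output is exactly its own value row.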
Embedding
The embedding
submodule collects positional embeddings.
RotaryEmbedding
RotaryEmbedding
is RoPE with a precomputed frequency table:
- RotaryEmbeddingConfig::new(seq_len, head_dim) then .init(device) allocates the table once for the maximum sequence length.
- apply(q, k) rotates query and key tensors.
- clip_range(t0..t1) returns a sliced view for serving a partial sequence: the natural fit for KV-cache decoding, where each step only needs the rotations for the new positions.
- cast(dtype) converts the precomputed table between float dtypes without recomputing the trigonometric values.
The free functions inverse_frequency_table and
positional_frequency_table are exposed for callers that want to
build their own variant of rotary embedding without going through
the packaged module.
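As a rough guide to what such tables contain, the usual RoPE construction uses inverse frequencies base^(-2i / head_dim) and angles position * inverse_frequency. The functions below are hypothetical stand-ins (the toy_ prefix marks them as such), and the base of 10000 used in the usage note is the conventional default, assumed here.

```rust
/// Inverse frequencies for the head_dim / 2 rotation planes.
fn toy_inverse_frequencies(head_dim: usize, base: f64) -> Vec<f64> {
    (0..head_dim / 2)
        .map(|i| base.powf(-((2 * i) as f64) / head_dim as f64))
        .collect()
}

/// Rotation angles for a range of positions: one row per position.
fn toy_angles(positions: std::ops::Range<usize>, inv_freq: &[f64]) -> Vec<Vec<f64>> {
    positions
        .map(|p| inv_freq.iter().map(|f| p as f64 * f).collect())
        .collect()
}

/// Rotate one (x0, x1) pair by `angle`; the vector norm is preserved.
fn rotate_pair(x0: f64, x1: f64, angle: f64) -> (f64, f64) {
    let (sin, cos) = angle.sin_cos();
    (x0 * cos - x1 * sin, x0 * sin + x1 * cos)
}
```

Because each row depends only on its position, the rows for positions 2..4 are the same whether computed directly or sliced out of a table built for 0..4, which is why a clipped view is enough during incremental decoding.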
Image Blocks
bunsen::blocks::images collects the building blocks used by
2-D vision models — convolutional composites, patch
tokenization, same-padding pooling, and stochastic-depth-style
regularization layers.
API: https://docs.rs/bunsen/latest/bunsen/blocks/images/
Conv composites
The conv
submodule packages the conv-plus-something composites that show up
across ResNet, EfficientNet, and friends.
ConvNorm2d
ConvNorm2d
is the standard Conv2d + BatchNorm pairing with a single
forward. Beyond the convenience of one module instead of two, it
carries zero_init_norm() — the “zero-initialize the last batch
norm in a residual branch” trick used by ResNet and successors to
make residual branches start as identities.
CNA2d
CNA2d
is the more general Conv / Norm / Activation block. Beyond the
basic forward, it provides:
- match_norm_features(): adapts a generic NormalizationConfig (BatchNorm::new(0), RmsNorm::new(0), etc.) to the right channel count after the conv. Lets callers pass a norm config without knowing the channel count yet.
- map_forward(f): runs the conv and norm, then hands the intermediate tensor to a user closure before the activation. Useful for inserting attention, channel reweighting, or per-residual side-effects without copying out the rest of the block.
Patching
patching
holds patch tokenization, the entry point for transformer-style
vision models.
PatchEmbed
PatchEmbed
is ViT-style patch tokenization: it takes an
image, splits it into non-overlapping patches, and
projects each patch into an embedding, producing a sequence of
tokens of width embed_dim.
Pooling
pool
holds the pooling layers that don’t fit burn’s defaults.
AvgPool2dSame
AvgPool2dSame
is TensorFlow-style same padding for average pooling —
asymmetric where needed to keep the spatial dimensions of the output
aligned with ceil(input / stride). Helpers get_same_padding and
pad_same are exposed for the underlying arithmetic, useful when
you’re matching a Keras / TF-Slim reference implementation.
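The underlying arithmetic for one axis can be sketched as follows. This follows the TensorFlow SAME convention, where any odd leftover goes on the trailing side; the function name is hypothetical, dilation is ignored, and whether get_same_padding matches it exactly is an assumption.

```rust
/// (pad_before, pad_after) for one axis so that the pooled output has
/// ceil(input / stride) elements. Hypothetical helper, no dilation.
fn same_padding_1d(input: usize, kernel: usize, stride: usize) -> (usize, usize) {
    assert!(kernel > 0 && stride > 0);
    let out = (input + stride - 1) / stride; // ceil(input / stride)
    let needed = out.saturating_sub(1) * stride + kernel;
    let total = needed.saturating_sub(input);
    let before = total / 2;
    (before, total - before)
}
```

For input 5, kernel 2, stride 2 this yields (0, 1): one extra cell on the trailing side only, which is the asymmetric case symmetric padding cannot express.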
Stochastic regularization
drop
collects regularization layers that drop structured pieces of the
activations rather than individual scalars.
DropBlock
DropBlock
is structured spatial dropout from
Ghiasi et al., 2018: instead
of dropping independent pixels, it drops contiguous blocks of
activations. For convnets this acts as a substantially stronger
regularizer than plain dropout, because adjacent pixels are highly
correlated and independent dropout barely removes information.
DropPath
DropPath
is stochastic depth from
Huang et al., 2016: with some
probability, the entire residual branch is zeroed for a given
sample, so the network sees a shorter effective depth on each
training step.
Supporting types
- progressive_dpr: rate table that linearly ramps drop rates over a network’s depth, matching the Swin V2 / timm convention of giving deeper blocks higher drop probabilities.
- SizeConfig: a small enum describing a size as either Default, a Ratio(f64), or a Fixed(usize). Used by the drop layers when the effective region size is relative to a wrapped layer’s spatial dimensions.
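A linear ramp of this kind can be sketched in a few lines. The function below is a hypothetical stand-in, not progressive_dpr itself: it assumes the ramp starts at 0 for the first block and reaches the maximum rate at the last block, in the manner of a linspace over the depth.

```rust
/// Per-block drop rates rising linearly from 0.0 to `max_rate`.
/// Assumes a linspace-style ramp; a single block gets rate 0.0.
fn linear_drop_rates(max_rate: f64, depth: usize) -> Vec<f64> {
    if depth <= 1 {
        return vec![0.0; depth];
    }
    (0..depth)
        .map(|i| max_rate * i as f64 / (depth - 1) as f64)
        .collect()
}
```

With a maximum rate of 0.2 over five blocks, the rates step up by 0.05 per block.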
bunsen::kits::bimm
Bunsen/Burn Image Models — an incremental port of the
timm (Torch
Image Models) ecosystem to burn.
bimm is the home for full image-recognition model families inside
bunsen. Each model lives next to its prefab configurations and
pretrained-weight loaders, so picking one up is closer to “pick a
prefab, fetch weights, init” than “wire together a stack of blocks.”
API: https://docs.rs/bunsen/latest/bunsen/kits/bimm/
Current models
ResNet
The classic ResNet family
(arXiv:1512.03385), with prefab
configurations and a pretrained-weight loader that pulls from the
torchvision checkpoints.
A representative usage:
use bunsen::{
    cache::DiskCacheConfig,
    kits::bimm::resnet::{PREFAB_RESNET_MAP, ResNet},
};
use burn::backend::Flex;

let device = Default::default();
let prefab = PREFAB_RESNET_MAP.expect_lookup_prefab("resnet18");
let weights = prefab
    .expect_lookup_pretrained_weights("tv_in1k")
    .fetch_weights(&DiskCacheConfig::default())
    .expect("Failed to fetch weights");
let model: ResNet<Flex> = prefab
    .to_config()
    .to_structure()
    .init(&device)
    .load_pytorch_weights(weights)
    .expect("Failed to load weights")
    // Re-head to 10 classes:
    .with_classes(10)
    // Stochastic block drops for training:
    .with_stochastic_drop_block(0.2)
    // Stochastic depth for training:
    .with_stochastic_path_depth(0.1);
Swin Transformer V2
The Swin Transformer V2 family (reference implementation), implemented in terms of windowed self-attention blocks, patch merging, and relative-position biases.
See the bunsen::kits::bimm::swin::v2
module for the full configuration API.
bunsen::kits::gpts
Full GPT / LLM variants. Where bunsen::blocks
provides reusable transformer sub-modules, gpts is for whole
language-model architectures: end-to-end models, tokenizer wiring, and
the training/inference surface around them.
API: https://docs.rs/bunsen/latest/bunsen/kits/gpts/
Current models
nanochat
A compact GPT in the spirit of the “nano” GPT lineage — small enough to train on modest hardware, opinionated enough to be a useful reference implementation.
The model lives in
bunsen::kits::gpts::nanochat
and is split into:
- the per-layer MLP,
- the transformer block (attention + MLP + norms),
- the full model wrapper that stacks the blocks and adds embedding and head layers.
gpts is a work in progress; further GPT/LLM variants will land here as
the port from upstream reference implementations progresses.
bunsen::kits::sims
Iterative tensor simulations. These are runnable physics / cellular kernels expressed as tensor operations: each step is a pure function from one state tensor to the next, which makes them straightforward to batch, GPU-accelerate, and feed back into ML pipelines.
API: https://docs.rs/bunsen/latest/bunsen/kits/sims/
Current simulations
conway — Conway’s Game of Life
Cellular-automaton implementations expressed as windowed sums over a boolean state tensor:
- life2d: classic 2D Conway’s Game of Life over a 2D board.
- life3d: a 3D generalization over a 3D board, with a configurable spawn/survive ruleset (LifeRules).
Both expose a next_interior_* step kernel that takes the current
state and produces the next interior state, plus padding-aware wrappers
for the boundary.
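The interior-step idea can be illustrated without tensors. In the sketch below, the function name is hypothetical and the board is a nested vector of booleans; each interior cell counts its eight neighbours (the windowed sum) and applies the classic rules, so the result is two cells smaller along each axis than the input.

```rust
/// Next state of the interior cells of a rectangular 2-D board.
/// The output has shape (h - 2) x (w - 2); border cells only act as neighbours.
fn next_interior(board: &[Vec<bool>]) -> Vec<Vec<bool>> {
    let h = board.len();
    let w = board.first().map_or(0, |row| row.len());
    if h < 3 || w < 3 {
        return Vec::new();
    }
    (1..h - 1)
        .map(|y| {
            (1..w - 1)
                .map(|x| {
                    // Sum over the 3x3 window, excluding the centre cell.
                    let mut neighbours = 0;
                    for dy in 0..3 {
                        for dx in 0..3 {
                            if (dy, dx) != (1, 1) && board[y + dy - 1][x + dx - 1] {
                                neighbours += 1;
                            }
                        }
                    }
                    // Survive on 2 or 3 neighbours, spawn on exactly 3.
                    matches!(
                        (board[y][x], neighbours),
                        (true, 2) | (true, 3) | (false, 3)
                    )
                })
                .collect()
        })
        .collect()
}
```

A horizontal blinker in the middle of a 5 x 5 board becomes a vertical one after a single step.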
lbm::d2q9 — Lattice-Boltzmann Fluid
A 2D, 9-velocity (D2Q9) lattice-Boltzmann fluid solver (Wikipedia). The simulation is split into the orthogonal operations that compose a single LBM step:
- streaming — particle distributions propagate to neighbour cells along their velocity directions,
- collision — on-cell relaxation toward equilibrium,
- reflection — boundary handling,
- thermal and relaxation — configurable physics parameters,
- simulation — the driver that runs a sequence of steps.
Each piece is a tensor kernel; the whole thing runs on any burn
backend.
Burner Overview
bunsen::burner is the layer that sits next to burn itself. Where
bunsen::blocks gives you new Modules and
bunsen::kits gives you whole models, burner
gives you the infrastructure for working with burn modules,
optimizers, and records at a level burn’s default surface doesn’t
expose.
API: https://docs.rs/bunsen/latest/bunsen/burner/
The module currently contains:
- descriptors: type-erased descriptors for tensors and parameters. A TensorParamDesc captures the metadata of any Param<Tensor<B, R, K>> (its ParamId, Shape, rank, dtype, kind) without carrying the generics that the underlying tensor type does. This is the lingua franca used by the reflection and optimizer machinery to talk about parameters uniformly.
- module: module-side helpers, namely a type-mapper for Module field re-typing, and (under features = ["reflection"]) the XML/XPath reflection layer documented in Module Introspection.
- optim: optimizer extensions (under features = ["train"]). The headline feature is the GroupOptimizerAdaptor{N} family, which lets you mount multiple optimizers on a single Module, each driving a disjoint group of parameters. Covered in Composite Optimizers.
- record: helpers for working with burn::record types.
- tensor: tensor helpers that don’t fit neatly in bunsen::ops, including a DataView abstraction.
- distribution: distribution-related utilities.
When to reach for burner
Most code that uses bunsen won’t import from burner at all — the
ops, blocks, and contracts are what you write models against. You
reach for burner when:
- you need to introspect a model to drive something else: split parameters into groups, audit shapes, build a parameter manifest;
- you need to compose optimizers (e.g., Muon for matrix parameters, AdamW for everything else; different learning rates per group);
- you need to carry tensor / parameter metadata in non-generic code paths (a TensorParamDesc instead of a Param<Tensor<B, R, K>>);
- you need to manipulate records outside of what the derive macros give you for free.
The two largest user-facing surfaces — the reflection / XPath query machinery and the group-optimizer family — get their own chapters in this section.
Module Introspection
When you want to do something with a subset of a model’s parameters
— group them for different optimizers, apply weight decay to some but
not others, audit which tensors of which shapes a model contains —
burn itself gives you very little to work with. A Module is a Rust
type; it tells the compiler about its sub-modules but doesn’t expose a
queryable structure at runtime.
bunsen::burner::module::reflection fills that gap. It walks a
Module and produces an XML document mirroring its structure, then
hands you an XPath-based query API to select pieces of it. The result
is that “select every rank-2 weight tensor in this submodule” becomes
a one-line query instead of a hand-rolled visitor.
This chapter covers the user-facing surface. The full reference is at https://docs.rs/bunsen/latest/bunsen/burner/module/reflection/ and the module rustdoc has a long, executable walkthrough that exercises every method.
Enable with features = ["reflection"].
Building a tree
XmlModuleTree::build(&module) walks a &impl Module<B> and produces
the XML model:
use bunsen::burner::module::reflection::XmlModuleTree;
let module: Linear<B> = LinearConfig::new(2, 3).init(&device);
let mut mtree = XmlModuleTree::build(&module);
mtree.to_xml(true) dumps the underlying document pretty-printed,
which is the right first step when you’re figuring out what to query
against. For a single Linear it looks like:
<XmlModuleTree version="...">
<Structure>
<Linear id="n:1" class="struct">
<Param id="n:2" name="weight" param_id="..."
class="tensor" kind="Float" dtype="..." shape="2 3" rank="2"/>
<Param id="n:3" name="bias" param_id="..."
class="tensor" kind="Float" dtype="..." shape="3" rank="1"/>
</Linear>
</Structure>
</XmlModuleTree>
A few things to notice:
- The query “context” is always /XmlModuleTree/Structure; that’s the implicit prefix every query starts under.
- Each structural element has the type name as its tag (Linear, Vec, Array, Tuple, …) and a class attribute (struct, builtin, …) distinguishing user-defined modules from the container types burn’s derive uses.
- Each parameter is a <Param> leaf with name, param_id, kind, dtype, shape, and rank attributes.
- Container children (Vec, Array, Tuple) have no @name; they have positional indices instead (XPath indexes from 1).
Querying
mtree.query() returns an XPathModuleQuery you can chain methods
against:
- .select(expr): append "/expr" to the current XPath.
- .filter(expr): append "[expr]".
- .params(): descend to all Param elements (shorthand for descendant-or-self::Param).
Terminators:
- .to_param_ids(): returns Result<Vec<ParamId>, _>.
- .to_param_descs(): returns Result<Vec<TensorParamDesc>, _>.
- .to_fragments(pretty): returns Result<Vec<String>, _>, the matched nodes serialized as XML. Useful for debugging the query itself.
- .expr(): the XPath string accumulated so far.
XmlModuleTree also offers convenience shortcuts that wrap the
common cases:
- mtree.param_ids(): every ParamId in the tree.
- mtree.param_descs(): every TensorParamDesc in the tree.
- mtree.select(expr): equivalent to mtree.query().select(expr).
- mtree.select_params(expr): select(expr) then .params(), which is the right starting point for almost every parameter-selection query.
A small XPath crib
You don’t need to learn all of XPath to use this. The pieces that come up:
| Pattern | Meaning |
|---|---|
| Linear | Children of the current context named Linear. |
| * | All children, regardless of name. |
| *//Linear | All Linear descendants (anywhere below). |
| *[@name='weight'] | Children whose @name attribute is weight. |
| *[@rank=2] | Children whose @rank attribute is 2. |
| *[2] | The second child (1-indexed). |
| descendant-or-self::Param | Every <Param> at or below the current context (what .params() does for you). |
Predicates can be combined: *[@name='weight'][@rank=2] selects
2-D weights.
Worked example
Several Linear modules wrapped in container shapes (a tuple holding a bare Linear, an array, and a Vec):
let module = (
LinearConfig::new(2, 3).init::<B>(&device),
[LinearConfig::new(4, 5).init::<B>(&device)],
vec![
LinearConfig::new(6, 7).init::<B>(&device),
LinearConfig::new(8, 9).init::<B>(&device),
],
);
let mut mtree = XmlModuleTree::build(&module);
// Parameters of every Linear module anywhere in the tree:
let linear_ids = mtree
.select("*//Linear")
.params()
.to_param_ids()?;
// Just the 2-D weight tensors (skips bias, which is rank-1):
let weight_ids = mtree
.query()
.params()
.filter("@rank=2")
.to_param_ids()?;
// Everything under the third top-level child (the Vec) — by position:
let vec_param_ids = mtree
.select("*/*[3]")
.params()
.to_param_ids()?;
When to use it
Reflection is heavier than just calling fields on a module. Reach for it when:
- you need a set of ParamIds defined by structure rather than by the variable names in your code (e.g., “every rank-2 weight under the transformer blocks”), or
- you’re writing tooling that doesn’t know the model’s type up front and needs to walk it generically.
For the typical “give these layers a different learning rate” use
case, this machinery feeds directly into the
GroupOptimizerAdaptor{N} family.
Composite Optimizers
burn’s Optimizer<M, B> trait owns a single optimizer that touches
every parameter in a Module. For real training runs that isn’t
always what you want:
- 2-D matrix parameters benefit from Muon, while embeddings and scalars are better served by AdamW.
- Different parameter groups want different learning-rate schedules (the NanoChat recipe scales lm_head and embedding learning rates differently, and applies a d_model factor on top).
- You may want different weight-decay or beta settings per group.
bunsen::burner::optim provides the GroupOptimizerAdaptor{N}
family: a single Optimizer<M, B> that mounts N kinds of
optimizer, each with one or more parameter groups, dispatching each
parameter’s gradient to the optimizer it belongs to. Pair it with
the Module Introspection machinery and
you can carve a model into parameter groups with XPath queries.
Enable with features = ["train"].
API: https://docs.rs/bunsen/latest/bunsen/burner/optim/
The building blocks
OptimizerGroup<B, O>
One group — a HashSet<ParamId> plus an optimizer of type O
plus an optional per-group LrSelector for learning-rate mapping:
use bunsen::burner::optim::OptimizerGroup;
let group = OptimizerGroup::from_adaptor(
param_ids, // anything IntoIterator<Item = ParamId>
&AdamWConfig::new()
.with_weight_decay(0.01)
.init::<B, MyModel<B>>(),
)
.with_fixed_lr(3e-4); // or .with_lr_selector(closure)
.with_lr_selector(closure) takes any FnMut(LearningRate, &HashMap<String, LearningRate>) -> LearningRate,
so per-group warmup, decay, or scaling factors live inside the group
itself.
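As an illustration of how much can live inside such a closure, here is a sketch of a per-group selector that applies a fixed scale plus a linear warmup. The LearningRate alias mirrors burn’s f64 learning rate; the scaled_warmup helper is hypothetical, it assumes the selector is called once per optimizer step, and it ignores the HashMap argument because this chapter does not describe its contents.

```rust
use std::collections::HashMap;

type LearningRate = f64;

/// Build a selector that scales the incoming learning rate and ramps it
/// linearly over the first `warmup_steps` calls. Hypothetical helper.
fn scaled_warmup(
    scale: f64,
    warmup_steps: u32,
) -> impl FnMut(LearningRate, &HashMap<String, LearningRate>) -> LearningRate {
    let mut step = 0u32;
    move |lr: LearningRate, _named: &HashMap<String, LearningRate>| {
        step = step.saturating_add(1);
        let ramp = if warmup_steps == 0 {
            1.0
        } else {
            (step as f64 / warmup_steps as f64).min(1.0)
        };
        lr * scale * ramp
    }
}
```

Because the closure is FnMut, it can carry its own step counter, so each group keeps independent warmup state.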
GroupOptimizerAdaptor{N}
GroupOptimizerAdaptor2, …3, …4, … (defined up through 6) each
take N Vec<OptimizerGroup<B, O_i>> arguments — one vector per
kind of optimizer:
use bunsen::burner::optim::GroupOptimizerAdaptor2;
let optimizer = GroupOptimizerAdaptor2::new(
/* groups of optimizer kind 1: */ vec![adamw_group_a, adamw_group_b],
/* groups of optimizer kind 2: */ vec![muon_group],
)?;
The adaptor implements Optimizer<M, B>, so it slots into
burn::train::Learner exactly where a single optimizer would.
Constructor validation: each ParamId may appear in at most one
group across all kinds. A duplicate returns
GroupOptimizerError::DuplicateParamId with the conflicting
positions.
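The validation amounts to a single pass that remembers where each id was first seen. The sketch below uses stand-in types: ToyParamId replaces burn’s ParamId, and the DuplicateParam struct and its position fields are hypothetical, chosen only to show the shape of such a check.

```rust
use std::collections::{HashMap, HashSet};

type ToyParamId = u64;

/// Reported when one id is claimed by two groups.
/// Positions are (kind index, group index within that kind).
#[derive(Debug, PartialEq)]
struct DuplicateParam {
    id: ToyParamId,
    first: (usize, usize),
    second: (usize, usize),
}

/// Check that no id appears in more than one group across all kinds.
fn check_disjoint(kinds: &[Vec<HashSet<ToyParamId>>]) -> Result<(), DuplicateParam> {
    let mut seen: HashMap<ToyParamId, (usize, usize)> = HashMap::new();
    for (kind_idx, groups) in kinds.iter().enumerate() {
        for (group_idx, group) in groups.iter().enumerate() {
            for &id in group {
                let here = (kind_idx, group_idx);
                if let Some(&first) = seen.get(&id) {
                    return Err(DuplicateParam { id, first, second: here });
                }
                seen.insert(id, here);
            }
        }
    }
    Ok(())
}
```

Note that a check of this shape only catches ids claimed twice; it says nothing about ids that no group claims.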
The pattern
1. Build an XmlModuleTree over the live module.
2. Use XPath to extract disjoint HashSet<ParamId>s for each group.
3. Wrap each set in an OptimizerGroup with its optimizer + LR selector.
4. Compose with GroupOptimizerAdaptorN::new(...).
5. Hand the result to Learner.
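The disjointness check at step 4 is your guard that nothing is double-counted. As described, it does not detect parameters left out of every group; to catch those, compare the union of your sets against mtree.param_ids(), as the remnant set in the worked example below does.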
Worked example: the NanoChat recipe
The demos/chat/examples/train
example trains a NanoChatGpt with two optimizer kinds and four
groups — three driven by AdamW, one by Muon. Stripped to the
essentials:
use std::collections::HashSet;
use bunsen::{
burner::{
module::reflection::XmlModuleTree,
optim::{GroupOptimizerAdaptor2, OptimizerGroup},
},
public::burn::{module::ParamId, optim::LearningRate},
};
let mut mtree = XmlModuleTree::build(&host);
// 1. Carve the model into disjoint parameter sets using XPath.
// 2-D weight matrices inside the transformer block sequence.
let matrix_params: HashSet<ParamId> = mtree
.select_params("GptHost/GPT/*[@name='h']/Linear/*[@name='weight'][@rank=2]")
.to_param_ids()?
.into_iter()
.collect();
let embedding_params: HashSet<ParamId> = mtree
.select_params("GptHost/GPT/*[@name='wte']")
.to_param_ids()?
.into_iter()
.collect();
let lm_head_params: HashSet<ParamId> = mtree
.select_params("GptHost/GPT/*[@name='lm_head']")
.to_param_ids()?
.into_iter()
.collect();
// Everything left over (norms, biases, scalars, ...).
let remnant_params: HashSet<ParamId> = mtree
.param_ids()?
.into_iter()
.collect::<HashSet<_>>()
.difference(&matrix_params).cloned().collect::<HashSet<_>>()
.difference(&embedding_params).cloned().collect::<HashSet<_>>()
.difference(&lm_head_params).cloned().collect();
// 2. Build groups: AdamW with three flavours, Muon for matrix params.
let optimizer = GroupOptimizerAdaptor2::new(
// Kind 1: AdamW, three groups with different LR scales + betas.
vec![
OptimizerGroup::from_adaptor(
lm_head_params,
&AdamWConfig::new()
.with_beta_1(0.8).with_beta_2(0.96)
.with_weight_decay(0.01)
.init::<B, GptHost<B>>(),
)
.with_lr_selector(move |lr: f64, _| lr * lm_head_lr),
OptimizerGroup::from_adaptor(
embedding_params,
&AdamWConfig::new()
.with_beta_1(0.8).with_beta_2(0.995)
.with_weight_decay(0.001)
.init::<B, GptHost<B>>(),
)
.with_lr_selector(move |lr, _| lr * embedding_lr),
OptimizerGroup::from_adaptor(
remnant_params,
&AdamWConfig::new()
.with_beta_1(0.8).with_beta_2(0.96)
.with_weight_decay(0.01)
.init::<B, GptHost<B>>(),
)
.with_lr_selector(move |lr, _| lr * scalar_lr),
],
// Kind 2: Muon for the 2-D matrices.
vec![
OptimizerGroup::from_adaptor(
matrix_params,
&MuonConfig::new()
.with_weight_decay(Some(WeightDecayConfig { penalty }))
.init::<B, GptHost<B>>(),
)
.with_lr_selector(move |lr, _| lr * matrix_lr),
],
)?;
// 3. Use exactly as a single Optimizer:
let result = training.launch(Learner::new(host, optimizer, warmup_scheduler));
What this buys you over a hand-rolled solution:
- Selection is declarative. Each group’s membership is an XPath expression. Renaming a field elsewhere can’t silently drop parameters from a group: the XPath either still matches or it doesn’t, and the param_ids() cross-check makes the gap obvious.
- Disjointness is verified. GroupOptimizerAdaptor::new returns Err(DuplicateParamId) if two groups claim the same parameter, so you can’t accidentally optimize a tensor twice.
- Per-group LR is a closure, not a separate scheduler tree. The global learning rate flowing in from burn::train::Learner is handed to every group’s LrSelector, and the group decides how to shape it.
Selecting the right N
Use GroupOptimizerAdaptor2 if you need two kinds of optimizer
(AdamW + Muon, AdamW + SGD-with-momentum, …). The family runs up
through GroupOptimizerAdaptor6, so pick the smallest N that fits
your kinds. Adding more groups of the same kind doesn’t increase
N; that just extends the Vec for that kind.
Building Reusable Modules
When you wrap a burn::Module, burn gives you the basic ingredients:
a Config struct, derived via #[derive(Config)], that builds a
Module via init(). This is enough for a single self-contained
module, but it strains once your modules start composing into larger
ones.
Two conventions show up repeatedly in bunsen to manage that strain:
- A {Module}Meta trait: a shared introspection API implemented by both the configs and the built module, so anyone holding any of those forms can ask the same structural questions.
- A {Module}ContractConfig → {Module}StructureConfig split: two configs at two levels of abstraction. The contract describes what the module is for; the structure describes how it’s built.
Neither is required by burn. Both pay off once a module is used
inside something else, or once its parameter surface starts evolving
faster than its callers want.
The {Module}Meta trait
Why
A parent module that owns a child needs to know structural things
about the child at inference time — its embedding dimension, its
number of heads, its sequence length. The naive answer is to copy
those numbers into the parent’s own fields. That works, but now the
same number lives in two places, and updating the child means
remembering to update the parent. Worse, configuration values copied
into a Module’s state are awkward to keep in sync with the actual
tensor shapes that ended up inside.
Solution
Define a trait that exposes the structural questions, and implement it on every form that can answer them: the user-facing config, the lowered structure config (see below), and the built module itself.
pub trait MlpMeta {
/// Input/output embedding dimension.
fn embed_dim(&self) -> usize;
/// Hidden dimension inside the MLP.
fn hidden_dim(&self) -> usize;
}
Now anything holding an &impl MlpMeta can ask the question, and the
answer comes from whichever source actually knows: a field on the
config, or a tensor .dims() on the module.
Toy example
use burn::{prelude::*, nn::{Linear, LinearConfig}};
pub trait MlpMeta {
fn embed_dim(&self) -> usize;
fn hidden_dim(&self) -> usize;
}
#[derive(Config, Debug)]
pub struct MlpConfig {
pub embed_dim: usize,
#[config(default = "4")]
pub expansion_factor: usize,
}
impl MlpMeta for MlpConfig {
fn embed_dim(&self) -> usize { self.embed_dim }
fn hidden_dim(&self) -> usize { self.expansion_factor * self.embed_dim }
}
#[derive(Module, Debug)]
pub struct Mlp<B: Backend> {
in_proj: Linear<B>,
out_proj: Linear<B>,
}
impl<B: Backend> MlpMeta for Mlp<B> {
fn embed_dim(&self) -> usize {
// Derived from the live tensor shape — no cached field.
self.in_proj.weight.dims()[0]
}
fn hidden_dim(&self) -> usize {
self.in_proj.weight.dims()[1]
}
}
A parent that contains an Mlp can now read mlp.embed_dim()
directly from the live module, instead of carrying its own
mlp_embed_dim: usize field. The metadata stays in one place per
form, and the forms agree by construction.
Where this shows up
- NanoChatGptMeta is implemented by three types: NanoChatGptConfig (user-facing knobs), NanoChatGptStructureConfig (lowered per-layer configs), and NanoChatGpt<B> (the built module reading dims from its actual layers). All three answer the same questions about n_embed, n_head, head_dim, n_layer, and so on.
- ResidualBlockMeta is implemented on the structure config and the built block. A ResNet model that holds a Vec<ResidualBlock<B>> can call block.output_resolution([h, w]) to walk the resolution through the network from the live modules, with no separate “shape table” alongside.
Contract → Structure config split
Why
A module’s parameter list grows in two unrelated directions:
- Intent-level. “How do I describe this thing at the level of what it’s for?” Examples: embed_dim, n_layer, vocab_size. These are the knobs a user actually wants to set when they say “give me a 12-layer GPT”. Short list, stable shape, evolves slowly.
- Implementation-level. “What does init need to wire?” Examples: explicit per-layer LinearConfigs, EmbeddingConfigs, RotaryEmbeddingConfigs, normalization choices per sub-block. Long list, evolves with the implementation.
Cramming both into one Config makes it tedious to instantiate (the
user has to fill in fields they don’t care about) and hard to
evolve (every implementation change is an API break for callers who
only wanted to say “12 layers please”).
Solution
Split the config in two:
- {Module}ContractConfig: the intent-level description.
- {Module}StructureConfig: the implementation parameter list, one field per sub-module config.
- A to_structure() / into_structure() method on the contract that produces the structure config.
- The init that builds the module hangs off the structure config.
The contract is small, friendly, and stable. The structure is verbose but maps one-to-one onto the implementation; it’s the natural home for serialization, pretrained-weight loaders, and any code that needs to reason about the actual layers.
Toy example
Continuing the Mlp from above, split the single MlpConfig into a
contract and a structure:
#[derive(Config, Debug)]
pub struct MlpContractConfig {
pub embed_dim: usize,
#[config(default = "4")]
pub expansion_factor: usize,
}
impl MlpMeta for MlpContractConfig {
fn embed_dim(&self) -> usize { self.embed_dim }
fn hidden_dim(&self) -> usize { self.expansion_factor * self.embed_dim }
}
impl MlpContractConfig {
/// Lower the contract into a concrete per-layer structure.
pub fn to_structure(&self) -> MlpStructureConfig {
MlpStructureConfig {
in_proj: LinearConfig::new(self.embed_dim, self.hidden_dim()),
out_proj: LinearConfig::new(self.hidden_dim(), self.embed_dim),
}
}
}
#[derive(Config, Debug)]
pub struct MlpStructureConfig {
pub in_proj: LinearConfig,
pub out_proj: LinearConfig,
}
impl MlpMeta for MlpStructureConfig {
fn embed_dim(&self) -> usize { self.in_proj.d_input }
fn hidden_dim(&self) -> usize { self.in_proj.d_output }
}
impl MlpStructureConfig {
pub fn init<B: Backend>(self, device: &B::Device) -> Mlp<B> {
Mlp {
in_proj: self.in_proj.init(device),
out_proj: self.out_proj.init(device),
}
}
}
A typical caller stays in contract-land:
let mlp: Mlp<B> = MlpContractConfig::new(768)
.with_expansion_factor(4)
.to_structure()
.init(&device);
…but a power user or a pretrained-weight loader that needs to set
per-layer details can drop down to MlpStructureConfig directly.
What the split buys you
- Multiple contracts, shared structure. A GatedMlpContractConfig that adds a SiLU gate can lower to a slightly extended structure; you can also have several “kinds” of contract sharing the same MlpStructureConfig family. The user picks the contract that matches their intent; the implementation only knows about structures.
- Prefabs live on the contract. Named presets (“resnet18”, “resnet50”) are small, intent-level descriptions and naturally fit as ContractConfig constructors. The big, verbose StructureConfig doesn’t need a constructor per preset.
- Stable user API across implementation churn. Adding a new sub-module to the implementation extends StructureConfig without touching ContractConfig. Callers who only ever wrote MlpContractConfig::new(768) don’t notice.
- A natural seam for tooling. Serialization, weight loading, and inspection tools work against StructureConfig, where the layers are spelled out. Documentation and tutorials work against ContractConfig, where the surface stays small.
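The "prefabs live on the contract" point can be illustrated with a std-only sketch. Everything here is hypothetical: LayerSpec stands in for a per-layer config such as LinearConfig, the ToyMlp* types stand in for a contract/structure pair, and the preset names "toy-small" and "toy-wide" are invented for the example.

```rust
/// Hypothetical stand-in for a per-layer config such as `LinearConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct LayerSpec {
    pub d_input: usize,
    pub d_output: usize,
}

/// Implementation-level description: one field per sub-module.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyMlpStructure {
    pub in_proj: LayerSpec,
    pub out_proj: LayerSpec,
}

/// Intent-level description: the short list of knobs a user sets.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyMlpContract {
    pub embed_dim: usize,
    pub expansion_factor: usize,
}

impl ToyMlpContract {
    /// Named presets are small, so they fit naturally on the contract.
    /// The preset names are invented for this sketch.
    pub fn prefab(name: &str) -> Option<Self> {
        match name {
            "toy-small" => Some(Self { embed_dim: 64, expansion_factor: 4 }),
            "toy-wide" => Some(Self { embed_dim: 64, expansion_factor: 8 }),
            _ => None,
        }
    }

    /// Lower the intent into the explicit per-layer structure.
    pub fn to_structure(&self) -> ToyMlpStructure {
        let hidden = self.embed_dim * self.expansion_factor;
        ToyMlpStructure {
            in_proj: LayerSpec { d_input: self.embed_dim, d_output: hidden },
            out_proj: LayerSpec { d_input: hidden, d_output: self.embed_dim },
        }
    }
}
```

The structure type never learns about preset names; adding a preset in this sketch touches only the contract.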
Where this shows up
- ResNetContractConfig / ResNetStructureConfig. The contract says “a ResNet with these block counts, optionally bottlenecked”; the structure spells out the stem, layer blocks, and head. PREFAB_RESNET_MAP ships ContractConfig builders for the standard variants (“resnet18”, “resnet50”, …).
- ResidualBlockContractConfig / ResidualBlockStructureConfig. The contract describes “downsample input, use a bottleneck policy”; the structure is an enum that dispatches to either a BasicBlock or a BottleneckBlock. Two different concrete implementations sit behind one contract.
- NanoChatGptConfig / NanoChatGptStructureConfig. The contract names embedding width, head counts, layer count; the structure spells out the embedding, per-block configs, LM head, rotary embedding, and final norm.
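One contract backed by two implementations, as in the residual-block case, might be sketched as a structure enum. This is a std-only illustration with hypothetical names (ToyBlockContract, ToyBlockStructure); the real ResidualBlockStructureConfig is not reproduced here, and the reduction factor of 4 and the convolution counts follow the usual ResNet convention as an assumption.

```rust
/// Hypothetical intent-level description of a residual block.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBlockContract {
    pub channels: usize,
    pub bottleneck: bool,
}

/// Hypothetical structure: an enum with one variant per implementation.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyBlockStructure {
    Basic { channels: usize },
    Bottleneck { channels: usize, inner: usize },
}

impl ToyBlockContract {
    /// The contract's policy flag picks the concrete implementation.
    pub fn to_structure(&self) -> ToyBlockStructure {
        if self.bottleneck {
            ToyBlockStructure::Bottleneck {
                channels: self.channels,
                // Assumed reduction factor of 4 for the inner width.
                inner: self.channels / 4,
            }
        } else {
            ToyBlockStructure::Basic { channels: self.channels }
        }
    }
}

impl ToyBlockStructure {
    /// A structural question both variants can answer. Assumes two
    /// convolutions in a basic block and three in a bottleneck block.
    pub fn conv_count(&self) -> usize {
        match self {
            ToyBlockStructure::Basic { .. } => 2,
            ToyBlockStructure::Bottleneck { .. } => 3,
        }
    }
}
```

Callers only ever set the bottleneck flag; which variant gets built is decided in one place, inside to_structure.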
When to reach for each
- Always implement {Module}Meta if anything else (a parent module, a builder, a test) is going to ask structural questions about this module at runtime. The cost is one trait and a few one-line methods; the payoff is no duplicated metadata.
- Reach for the Contract → Structure split when:
  - the user-facing knobs differ in number or shape from the implementation parameters,
  - you anticipate multiple intent-level “kinds” of this module sharing one implementation (or one kind backed by multiple implementations, as with ResidualBlock),
  - you want a clean place to land prefab / preset constructors and weight-loader hooks.
- Skip the split for tiny modules whose user-facing config is the implementation. Both conventions exist to manage growth; don’t pay their cost up front.
Contributing
Contributions are welcome — this library exists to be a community standard library, and that means it needs the community.
The short version:
- Open an issue describing what you want to change, especially for new components or breaking API tweaks. For small fixes you can skip straight to a PR.
- Build and test the workspace locally (see Development Setup).
- Match the existing Style and Conventions.
- Open a pull request against main. Opening a PR is taken as agreement to the project’s dual MIT / Apache-2.0 license, as described in CONTRIBUTING.md.
Where to talk
- GitHub issues: https://github.com/zspacelabs/bunsen/issues
- Discord: https://discord.gg/vBgXHWCeah
Code of conduct
Be respectful, assume good faith, and keep technical disagreements technical.
Development Setup
Prerequisites
- The Rust toolchain pinned in rust-toolchain.toml.
- cargo-make for the project’s task runner.
- For working on the book: mdbook, mdbook-mermaid, mdbook-katex, mdbook-linkcheck.
cargo install cargo-make
cargo install mdbook mdbook-mermaid mdbook-katex mdbook-linkcheck
Common tasks
# Default: fix + ci (format, clippy, tests).
cargo make
# Targeted tasks:
cargo make test
cargo make clippy
cargo make format
# Book tasks (see "Add cargo-make tasks" in the project Makefile.toml):
cargo make book # build the book
cargo make book-serve # build + watch + serve on localhost
cargo make book-check # run mdbook-linkcheck
Working on the book
The book lives in book/.
Source files are Markdown under book/src/, and the output is written to
book/book/ (ignored by git).
cd book
mdbook serve --open
Style and Conventions
Rust
- Formatting is enforced by rustfmt (nightly toolchain, settings live in rustfmt.toml). Run cargo make format before committing.
- Lints are enforced by cargo clippy --no-deps with workspace-level warnings = "deny". Run cargo make clippy.
- Public items need rustdoc. Module-level docs should explain why the module exists, not just what it contains.
- Avoid leaking anyhow::Error across crate boundaries; prefer crate-local error types.
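A crate-local error type, as the last point suggests, needs nothing beyond the standard library. The sketch below is illustrative only: ShapeError and check_rank are hypothetical names, not types or functions from bunsen.

```rust
use std::fmt;

/// Hypothetical crate-local error. Callers can match on it, which an
/// opaque `anyhow::Error` would not allow without downcasting.
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    RankMismatch { expected: usize, actual: usize },
}

impl fmt::Display for ShapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ShapeError::RankMismatch { expected, actual } => {
                write!(f, "expected rank {expected}, got {actual}")
            }
        }
    }
}

// Implementing the std trait keeps the type compatible with `?` and with
// downstream code that boxes errors or wraps them in its own error type.
impl std::error::Error for ShapeError {}

/// Hypothetical public function that reports failure with the local type.
pub fn check_rank(dims: &[usize], expected: usize) -> Result<(), ShapeError> {
    if dims.len() == expected {
        Ok(())
    } else {
        Err(ShapeError::RankMismatch { expected, actual: dims.len() })
    }
}
```

Downstream crates remain free to convert ShapeError into anyhow::Error at their own boundary; the point is that the choice stays with them.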
Tests
- Bare assert_eq! on floats is almost always wrong; prefer the approximate-equality helpers in the underlying tensor framework.
- New public APIs ship with at least one test.
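The float rule is easy to demonstrate with plain f64 values. In the sketch below, approx_eq is a hand-rolled, hypothetical helper using an absolute tolerance; it only illustrates the idea and is not the tensor framework's own comparison API, which tests should prefer.

```rust
/// Hypothetical helper for illustration: absolute-tolerance comparison.
pub fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol
}

/// Returns (exact_match, approx_match) for the classic 0.1 + 0.2 case.
pub fn compare_sum() -> (bool, bool) {
    let sum = 0.1_f64 + 0.2_f64;
    // Exact equality fails because the sum carries a rounding error in
    // its last bit; the tolerance-based comparison absorbs it.
    (sum == 0.3_f64, approx_eq(sum, 0.3, 1e-9))
}
```

A bare assert_eq!(0.1 + 0.2, 0.3) on f64 values would therefore fail even though the arithmetic is "right", which is the failure mode the rule guards against.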
Documentation
- The Book uses Markdown with the mermaid and KaTeX preprocessors.
- Prefer prose that explains the motivation and links out to API docs for the signatures. The book is not a substitute for rustdoc.
- Cross-reference within the book using relative links so mdbook-linkcheck can validate them.
Releasing
bunsen’s major/minor version tracks the burn release it builds against.
TODO: document the release checklist (bump version, update changelog, tag, publish to crates.io, deploy book).
License
bunsen is distributed under the terms of both the MIT license and the
Apache License (Version 2.0). See LICENSE-APACHE and LICENSE-MIT for details.
Opening a pull request is assumed to signal agreement with these licensing terms.