Introduction

bunsen is a batteries-included community standard library for the burn tensor framework. It collects reusable modules, tensor operations, shape contracts, and lifecycle utilities that fall outside burn’s core scope but are needed by anyone building real models on top of it.

This book is the long-form companion to the API docs. The API docs answer “what is this type?”; this book answers “why does it exist, when do I reach for it, and how do the pieces fit together?”

What’s in the library?

flowchart LR
    burn[burn core] --> bunsen
    bunsen --> contracts[bunsen::contracts]
    bunsen --> ops[bunsen::ops]
    bunsen --> blocks[bunsen::blocks]
    bunsen --> kits[bunsen::kits]
    kits --> bimm[bimm]
    kits --> gpts[gpts]
    kits --> sims[sims]
    bunsen --> burner[bunsen::burner]

A whirlwind tour:

  • bunsen::contracts — runtime tensor-shape contracts: a small DSL that turns paper-style shape notation into a runtime check, fast enough to stay enabled in release.
  • bunsen::ops — additional Tensor operations as pure functions: range generators, clamp, dropout, noise, RMSNorm, repeat-interleave, and convolution shape arithmetic.
  • bunsen::blocks — reusable Module building blocks: attention and rotary embeddings for transformers, conv composites / patching / pooling / stochastic regularization for image models.
  • bunsen::kits — complete domain implementations built on top of the rest of the crate: image-model families (bimm), GPT/LLM variants (gpts), and iterative tensor simulations (sims).
  • bunsen::burner — burn-adjacent infrastructure: parameter descriptors, module reflection, and the composite optimizer family.

Why a “standard library”?

The burn ecosystem moves quickly, and individual extension crates tend to drift out of sync with each release. bunsen exists to:

  1. Track the burn release cycle tightly, so dependent code doesn’t have to.
  2. Provide a single dependency surface for common building blocks instead of a tangle of single-purpose crates.
  3. Centralize testing, documentation, and contracts so contributed components can be trusted across projects.

Tensor shapes and math

This book uses KaTeX for math. For example, a linear layer computes

See Contracts for how shapes like become first-class, machine-checked constraints.

How to read this book

Installation

bunsen is published to crates.io and tracks the burn release cadence; the major/minor version of bunsen matches the burn version it is built against.

Add the dependency

[dependencies]
bunsen = "0.21"
burn   = "0.21"

bunsen is a workspace with several crates; the bunsen umbrella crate re-exports the commonly used pieces from bunsen::blocks, bunsen::ops, bunsen::contracts, and friends.

Toolchain

bunsen targets the Rust toolchain pinned in rust-toolchain.toml. A reasonably recent stable toolchain is sufficient for downstream users; contributors building the workspace should match the pinned toolchain.

Feature flags

The umbrella crate exposes feature flags that line up with the underlying crates. The most important are:

Feature | Enables
------- | -------
default | The most commonly used modules.
blocks  | The bunsen::blocks module catalog.
ops     | Additional tensor operations.

See each component chapter for crate-specific features.

Overview

This page is a tour through the major pieces of bunsen, in roughly the order you’d encounter them building a model on top of burn. The sections below each have a dedicated chapter elsewhere in the book; this page is the orientation map.

Tensor Contracts

Shape errors in tensor code are hard to diagnose: a bad reshape produces the wrong meaning, not an exception, and the eventual failure points at a symptom three layers downstream. bunsen::contracts lets you write the shape of a tensor the way a paper would,

(B, h_wins * window_size, w_wins * window_size, C)

and then check it at module boundaries, in one step that both asserts the pattern and unpacks the named dimensions you’ll need:

#![allow(unused)]
fn main() {
use bunsen::contracts::unpack_shape_contract;
let tensor = [12, 3 * 4, 5 * 4, 3];
let [b, h_wins, w_wins, c] = unpack_shape_contract!(
    [
        "batch",
        "height" = "h_wins" * "window_size",
        "width"  = "w_wins" * "window_size",
        "channels",
    ],
    &tensor,
    &["batch", "h_wins", "w_wins", "channels"],
    &[("window_size", 4)],
);
}

If the shape doesn’t match, the failure is loud and specific — it names the offending dimension, the pattern it failed against, and the parameter bindings in scope. The check is fast enough (~160 ns per unpack on a four-dimensional shape) to stay enabled in release builds.

Contracts are the shared vocabulary the rest of bunsen is built on; almost every block in the crate uses them at its module boundaries.

See Contracts for the pattern grammar, the full macro surface, the cost-control mechanisms, and the rationale behind the design.

Ops — pure tensor functions

bunsen::ops is the functional layer: pure functions over tensors that extend burn::tensor::Tensor’s surface without owning any trainable parameters. It covers range generators, clamping, dropout, noise, RMSNorm, repeat-interleave, and a substantial collection of convolution-shape arithmetic and functional-conv helpers.

A typical block’s forward is a parametric layer (a Linear, a Conv2d) wrapped in two or three ops calls. Lifting the non-parametric work out makes the block readable and makes the underlying math testable in isolation.

   contracts        validate shapes between layers
       │
       ▼
   ops              pure functions over Tensors
       │
       ▼
   blocks           stateful parameter-owning Modules
       │
       ▼
   kits             whole models built from blocks

The ops surface also includes small value-object types (ClampOp, NoiseConfig) designed to be embedded directly into a #[derive(Config)] struct of a downstream module.

Blocks — reusable Module components

bunsen::blocks is the stateful layer: burn::module::Module building blocks that own parameters and can be trained. Organized by domain:

  • Transformers — multi-head causal self-attention with grouped-query support, a KV cache for autoregressive decoding, scaled-dot-product attention helpers, and rotary positional embeddings.
  • Images — conv composites (ConvNorm2d, CNA2d), ViT-style patch tokenization (PatchEmbed), TF-style same-padding pooling, and stochastic regularization layers (DropBlock, DropPath).

Every block follows the patterns documented in Building Reusable Modules: Meta traits for cross-module introspection, Contract→Structure config splits where the user-facing knobs differ from the implementation parameter list, and inline shape contracts at module boundaries.

Kits — complete domain implementations

bunsen::kits is for whole things you pick up and use end-to-end. The current categories:

  • bimm — Bunsen/Burn Image Models, an incremental port of the timm ecosystem. Currently includes the ResNet family with pretrained-weight loaders and the Swin Transformer V2 family.
  • gpts — full GPT / LLM variants. Currently includes NanoChat, a compact GPT suitable for experimentation and fine-tuning.
  • sims — iterative tensor simulations. Currently includes Conway’s Game of Life in 2D and 3D, and a D2Q9 lattice-Boltzmann fluid solver.

Kits compose the lower layers: each one uses contracts, ops, and blocks (and, where training matters, burner) to deliver a full domain solution rather than a building block. They’re also where to look for worked examples of every convention in real code.

Burner — burn-adjacent infrastructure

bunsen::burner is the infrastructure layer: the pieces that sit next to burn itself, not on top of its tensor surface. Most code that uses bunsen won’t import from burner at all — you reach for it when you need to:

  • introspect a model generically — the reflection layer turns a Module into a queryable XML document with an XPath query API, for “select every rank-2 weight under the transformer blocks” problems;
  • compose optimizers — the GroupOptimizerAdaptor{N} family mounts multiple optimizers on a single module (e.g. Muon for matrix parameters, AdamW for the rest), each driving a disjoint group of parameters, with per-group learning-rate selectors;
  • carry tensor metadata in non-generic code paths (TensorParamDesc);
  • work with burn::record outside what the derive macros provide for free.

The reflection and group-optimizer pieces compose: the canonical pattern is to walk a model with XmlModuleTree, slice it into parameter groups with XPath, and hand the groups to a GroupOptimizerAdaptor. The NanoChat training demo (demos/chat/examples/train) does exactly that.

Contracts Overview

bunsen::contracts is a no_std inline contract programming library for tensor geometry. It is built around three goals:

  • contracts should be easy to read, write, and use,
  • contracts should be fast enough at runtime to always be enabled,
  • contracts should produce verbose, helpful error messages when they fail.

In practice this means contracts are written next to the code they guard, match the way shapes are written in a paper, and stay enabled in release builds.

API: https://docs.rs/bunsen/latest/bunsen/contracts/

Why a contract system?

Shape errors in tensor code are notoriously hard to diagnose. A reshape that almost-works produces a tensor with the wrong meaning, not an exception. An off-by-one in a transpose silently swaps batch and channels. The eventual failure — a matmul panic three layers later, a loss that won’t go down — points at a symptom, not a cause.

A contract answers that by writing the expected shape down, as a pattern, in code:

"batch",
"height" = "h_wins" * "window_size",
"width"  = "w_wins" * "window_size",
"channels",

This pattern is both a piece of documentation (the paper-style shape (B, h_wins * window_size, w_wins * window_size, C)) and a runtime check. Every contract is then used for one or both of two things:

  • Assert. Does this tensor’s shape match the pattern? If not, fail loudly with a useful message.
  • Unpack. Assuming it matches, give back the named dimension sizes as concrete usize values to use in arithmetic and reshapes.

These are not separate features — unpacking implies asserting. The same single pass through the shape both validates the pattern and solves for whatever named parameters the caller asks for.

This unifies two things that are otherwise written separately. In ad-hoc code you tend to see, for the same tensor:

assert_eq!(tensor.dims()[0], batch);
assert!(tensor.dims()[1] % window_size == 0);
let h_wins = tensor.dims()[1] / window_size;
// ...and so on.

With a contract the assertion and the variable bindings come from one declaration, and they cannot drift out of sync.
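
To make the “one declaration” point concrete, here is a minimal std-only sketch of a single pass that both validates and unpacks the windowed pattern above. The name unpack_windowed and its error strings are hypothetical and not part of bunsen::contracts; the real macros handle arbitrary patterns rather than this one hard-coded shape.

```rust
/// Hypothetical helper (not part of bunsen): validates a
/// [batch, h_wins * window_size, w_wins * window_size, channels] shape
/// and returns [batch, h_wins, w_wins, channels] from the same pass.
fn unpack_windowed(dims: &[usize], window_size: usize) -> Result<[usize; 4], String> {
    if window_size == 0 {
        return Err("window_size must be non-zero".to_string());
    }
    let &[b, h, w, c] = dims else {
        return Err(format!("expected rank 4, got rank {}", dims.len()));
    };
    // Solve `name * window_size == size` for `name`, or report which dim failed.
    let solve = |label: &str, size: usize| -> Result<usize, String> {
        if size % window_size == 0 {
            Ok(size / window_size)
        } else {
            Err(format!(
                "{size} !~ {label}: no integer solution for window_size={window_size}"
            ))
        }
    };
    Ok([b, solve("height", h)?, solve("width", w)?, c])
}
```

Because the division that yields h_wins is the same step that checks divisibility, the validation and the binding cannot drift apart, which is the property the contract macros provide for any pattern.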

Where contracts go

The natural home for a contract is a module boundary or function boundary: the place where one piece of code hands shape responsibility to another. The pattern is usually:

  1. Unpack the inputs at the top of the function. You needed the dimensions anyway; the validation comes free.
  2. Do the actual work, expressed in terms of the unpacked names.
  3. Assert (often periodically) the shapes of intermediates or outputs that the function promises but doesn’t otherwise consume.

When the function’s docstring says “Input tensor of shape (B, h_wins * window_size, w_wins * window_size, C)”, the contract at the top is the machine-checked version of that same sentence.

Why it’s enabled in release

A contract that’s too slow to keep on in release is one that only catches bugs the author already hit in tests. The library is designed around the assumption that contracts stay on:

  • the pattern parser is a const-evaluable macro — contracts compile down to a static value, no per-call construction,
  • the runtime path is stack-allocated and allocation-free on the happy path,
  • a full unpack on a four-dimensional shape benches at ~160 ns,
  • and an exponential-backoff variant (periodic asserts) brings the amortized cost on a hot path down to ~4 ns/call.

The user-facing surface

Most code only ever touches three macros:

These wrap a small layer-2 API that you can reach for when you want to hoist work out of a hot path:

Sub-chapters

The rest of this section unpacks the surface above:

  • Pattern Syntax — the contract DSL itself, plus what counts as a shape (ShapeView).
  • Asserting and Unpacking — the unpack_shape_contract! / assert_shape_contract! / define_shape_contract! mechanics, including the shorthand forms.
  • Cost Control — how to keep contracts enabled even on hot paths: periodic asserts, #[cfg(debug_assertions)], and a comparison table.
  • Error Messages — what a failing contract looks like, and the panic-vs-try_* split.

Pattern Syntax

A shape contract is written in a small DSL: a comma-separated list of dimension matchers, each describing one dimension of the expected shape.

Dimension matchers

Each matcher is one of:

  • _ — any dimension size; requires the position to exist but does not constrain its size.
  • ... — an ellipsis matching any number of dimensions. At most one ellipsis may appear in a pattern.
  • a dimension expression — an integer-valued expression over named parameters that must equal the dimension’s size.

Dimension expressions are written in ordinary infix notation with string-literal identifiers, and may be given an optional label:

ShapeContract => <LabeledExpr> { ',' <LabeledExpr> }* ','?
LabeledExpr   => { <Param> '=' }? <Expr>
Expr          => <Term> { <AddOp> <Term> }*
Term          => <Power> { <MulOp> <Power> }*
Power         => <Factor> [ '^' <usize> ]
Factor        => <Param> | '(' <Expr> ')' | <NegOp> <Factor>
Param         => '"' <identifier> '"'
identifier    => { <alpha> | '_' } { <alphanumeric> | '_' }*
NegOp         => '+' | '-'
AddOp         => '+' | '-'
MulOp         => '*'

For example, a (B, H, W, C) image with windowed height and width is:

#![allow(unused)]
fn main() {
use bunsen::contracts::{ShapeContract, shape_contract};
static CONTRACT: ShapeContract = shape_contract![
    "batch",
    "height" = "h_wins" * "window_size",
    "width"  = "w_wins" * "window_size",
    "channels",
];
}

"height" here is a label on the expression "h_wins" * "window_size". The label is what appears in error messages and what you can ask unpack_shape to return.

A few patterns worth knowing:

  • ["batch", _, _, "c"] — a rank-4 shape where you care about batch and channels but not spatial dims.
  • [..., "c"] — any rank, with channels last.
  • ["b", _, "h" * "w" * "c"] — a flattened channel-spatial dimension that must equal h * w * c.
  • ["a", "a"] — a square shape: the same parameter "a" must satisfy both dimensions.
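
The matching rules for _, ..., and repeated parameters can be sketched with a deliberately simplified matcher. This is an illustration only: the Matcher enum and match_shape function are hypothetical, they support bare parameters but no expressions or labels, and they say nothing about how bunsen::contracts is implemented internally.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a dimension matcher (hypothetical; no expressions).
#[derive(Clone, Copy, Debug)]
enum Matcher<'a> {
    Any,
    Ellipsis,
    Param(&'a str),
}

/// Matches `shape` against `pattern` and returns the solved parameters.
fn match_shape<'a>(
    pattern: &[Matcher<'a>],
    shape: &[usize],
) -> Result<HashMap<&'a str, usize>, String> {
    let ellipses = pattern
        .iter()
        .filter(|m| matches!(m, Matcher::Ellipsis))
        .count();
    if ellipses > 1 {
        return Err("at most one ellipsis may appear in a pattern".to_string());
    }
    // Every non-ellipsis matcher consumes exactly one dimension.
    let fixed = pattern.len() - ellipses;
    let rank_ok = if ellipses == 1 {
        shape.len() >= fixed
    } else {
        shape.len() == fixed
    };
    if !rank_ok {
        return Err(format!(
            "rank {} does not fit a pattern with {} fixed dims",
            shape.len(),
            fixed
        ));
    }
    let skipped = shape.len() - fixed; // dims absorbed by the ellipsis
    let mut bound = HashMap::new();
    let mut dim = 0;
    for m in pattern {
        match *m {
            Matcher::Ellipsis => dim += skipped,
            Matcher::Any => dim += 1,
            Matcher::Param(name) => {
                let size = shape[dim];
                // A repeated parameter must agree with its earlier binding.
                if let Some(prev) = bound.insert(name, size) {
                    if prev != size {
                        return Err(format!("\"{name}\" bound to both {prev} and {size}"));
                    }
                }
                dim += 1;
            }
        }
    }
    Ok(bound)
}
```

In this sketch, matching [Param("a"), Param("a")] against [3, 4] fails because "a" cannot be both 3 and 4, which is the same reasoning that makes ["a", "a"] a square-shape contract.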

What counts as a shape: ShapeView

The contract methods accept anything that implements ShapeView. Out-of-the-box that includes:

  • &[usize], &[usize; D]
  • &[u32], &[u32; D]
  • &[i32], &[i32; D]
  • &Vec<usize>, &Vec<u32>, &Vec<i32>

With features = ["burner"] enabled it also includes:

  • burn::prelude::Shape, &burn::prelude::Shape
  • &burn::prelude::Tensor<_, _, _>

The last one is the common case: you hand a &Tensor directly to the macro and the contract reads its shape.

Asserting and Unpacking

The three day-to-day contract macros, plus the hoisting helper for naming a contract once and reusing it.

Unpacking: unpack_shape_contract!

unpack_shape_contract! is the workhorse. It matches a shape against a pattern, solves for unknown parameters, and returns the keys you ask for as a fixed-size array.

#![allow(unused)]
fn main() {
use bunsen::contracts::unpack_shape_contract;
let shape = [12, 3 * 4, 5 * 4, 3];

// In release builds this benchmarks at ~160 ns.
let [b, h_wins, w_wins, c] = unpack_shape_contract!(
    [
        "batch",
        "height" = "h_wins" * "window_size",
        "width"  = "w_wins" * "window_size",
        "channels",
    ],
    &shape,
    &["batch", "h_wins", "w_wins", "channels"],
    &[("window_size", 4)],
);

assert_eq!((b, h_wins, w_wins, c), (12, 3, 5, 3));
}

The arguments, in order, are:

  1. The contract pattern (or the name of a pre-defined contract).
  2. The shape (anything ShapeView).
  3. The keys to unpack — a &[&str] of length K; the macro returns [usize; K] in the same order.
  4. The bindings — a &[(&str, usize)] of parameters whose values are known up front.

Two shorthand forms are accepted:

  • No bindings. When every parameter can be solved from the shape, the bindings slice may be omitted:

    #![allow(unused)]
    fn main() {
    use bunsen::contracts::unpack_shape_contract;
    let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];
    let [h, h_win, w, w_win, ws, c] = unpack_shape_contract!(
        [..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
        &shape,
        &["h", "h_win", "w", "w_win", "ws", "c"],
    );
    }
  • Keys are the pattern. When the pattern is just a list of bare identifiers, you can drop the keys slice as well:

    #![allow(unused)]
    fn main() {
    use bunsen::contracts::unpack_shape_contract;
    let shape = [4, 5, 3];
    let [h, w, c] = unpack_shape_contract!(["h", "w", "c"], &shape);
    assert_eq!((h, w, c), (4, 5, 3));
    }

If the shape does not match, or if the bindings are inconsistent with the shape, the macro panics. See Error Messages for what that panic looks like and the try_* variants that return a Result instead.

Asserting without unpacking: assert_shape_contract!

When you only care about validating a shape and don’t need any dimensions back, use assert_shape_contract!:

#![allow(unused)]
fn main() {
use bunsen::contracts::assert_shape_contract;
let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];

assert_shape_contract!(
    [..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
    &shape,
    &[("ws", 2)],
);
}

Same panic-on-mismatch semantics as unpack_shape_contract!. The non-panicking form is ShapeContract::try_assert_shape.

Hoisting a contract: define_shape_contract!

If the same contract is used in more than one place — or you simply want to read it once at the top of a function — bind it to a static with define_shape_contract! and pass the name into the assert or unpack macros:

#![allow(unused)]
fn main() {
use bunsen::contracts::{
    assert_shape_contract,
    define_shape_contract,
    unpack_shape_contract,
};
let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];
define_shape_contract!(
    CONTRACT,
    [..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
);

assert_shape_contract!(CONTRACT, &shape, &[("ws", 2)]);

let [h, h_win, w, w_win, c] = unpack_shape_contract!(
    CONTRACT,
    &shape,
    &["h", "h_win", "w", "w_win", "c"],
    &[("ws", 2)],
);
}

shape_contract![...] itself is a const-evaluable expression, so the static carries no runtime construction cost.

For the hot-path variants of these macros — assert_shape_contract_periodically! and the #[cfg(debug_assertions)] pattern — see Cost Control.

Cost Control

bunsen::contracts is designed to stay enabled in release builds, but a 160 ns check inside a tight inner loop still adds up. Two mechanisms bring the cost down: an exponential-backoff periodic check, which keeps the contract on in release at a fraction of the cost, and a #[cfg(debug_assertions)] gate, which strips the check from release builds entirely.

Periodic asserts

assert_shape_contract_periodically! wraps an assert_shape_contract! in run_periodically!: the assertion runs the first 10 calls, then on a doubling schedule (every 16, 32, 64, … calls) until it reaches a configurable maximum period (default 1000).
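
The schedule described above can be sketched as a small counter. This is an assumption-laden illustration: the PeriodicGate type and its field names are hypothetical, and the exact counter arithmetic inside run_periodically! may differ. It only shows the shape of the idea: a warm-up of always-on checks, then gaps that double until they hit a cap.

```rust
/// Hypothetical gate illustrating a warm-up followed by doubling periods.
struct PeriodicGate {
    calls: u64,
    warmup: u64,
    period: u64,
    max_period: u64,
    next_check: u64,
}

impl PeriodicGate {
    fn new(warmup: u64, first_period: u64, max_period: u64) -> Self {
        Self {
            calls: 0,
            warmup,
            period: first_period,
            max_period,
            next_check: warmup + first_period,
        }
    }

    /// Returns true when the guarded check should run on this call.
    fn should_run(&mut self) -> bool {
        self.calls += 1;
        if self.calls <= self.warmup {
            return true; // always check during warm-up
        }
        if self.calls < self.next_check {
            return false; // skip: not due yet
        }
        // Due: run now, then double the gap to the next check, up to the cap.
        self.period = self.period.saturating_mul(2).min(self.max_period);
        self.next_check = self.calls + self.period;
        true
    }
}
```

With PeriodicGate::new(10, 16, 1000), the check runs on calls 1 through 10, then on calls 26 and 58, with gaps that keep doubling until they reach 1000. Most calls pay only for a counter increment and a comparison, which is where the amortized saving comes from.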

The amortized cost drops to ~4 ns/call in release builds:

assert_shape_every_nth   time: [4.4057 ns 4.4769 ns 4.5726 ns]

It takes the same arguments as assert_shape_contract!:

#![allow(unused)]
fn main() {
use bunsen::contracts::assert_shape_contract_periodically;
let shape = [1, 2, 3, 4 * 2, 5 * 2, 3];

assert_shape_contract_periodically!(
    [..., "h" = "h_win" * "ws", "w" = "w_win" * "ws", "c"],
    &shape,
    &[("ws", 2)],
);
}

The typical pattern in a module method is:

  1. Unpack the inputs once at the top — you need the dimensions anyway, and the assertion comes free.
  2. Periodically assert intermediate or output shapes you don’t need to unpack, so violations are still caught without paying the full cost on every call.

Debug-only contracts: #[cfg(debug_assertions)]

Periodic asserts amortize cost; sometimes you’d rather pay zero cost in release and accept that the check only runs in development builds. The standard Rust way to do that is #[cfg(debug_assertions)], which gates an item on the same flag that controls debug_assert!. The flag is on for cargo build / cargo test and off for cargo build --release.

Applied to a contract, this strips the entire macro call — pattern parsing, binding solve, everything — from release builds:

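use burn::prelude::{Backend, Bool, Tensor};
use bunsen::contracts::{
    assert_shape_contract_periodically,
    unpack_shape_contract,
};

// `LifeRules` is defined elsewhere in the kit and is not shown here.
pub fn next_interior_3d<B: Backend>(
    state: Tensor<B, 3, Bool>,
    rules: &LifeRules,
) -> Tensor<B, 3, Bool> {
    // Debug only: unpack [H, W, Z] for the output check below.
    #[cfg(debug_assertions)]
    let [h, w, z] = unpack_shape_contract!(["h", "w", "z"], &state.dims());

    // ... do work, producing `update` with shape [H-2, W-2, Z-2] ...
    // Placeholder so the snippet reads end to end; the real kernel computes
    // the interior update, which is the shape the contract below expects.
    let update = state;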

    // Debug only: confirm the interior shrink happened correctly.
    #[cfg(debug_assertions)]
    assert_shape_contract_periodically!(
        ["h" - "pad", "w" - "pad", "z" - "pad"],
        &update.dims(),
        &[("h", h), ("w", w), ("z", z), ("pad", 2)],
    );

    update
}

This is the pattern used by bunsen::kits::sims::conway::life3d for its interior-step kernel.

A few things to note:

  • Bindings produced by a gated unpack are themselves gated. Above, h, w, and z only exist in debug builds. Anything that consumes them — the downstream periodic assert — must be gated too, or release builds won’t compile. This is usually a feature: the compiler tells you when you accidentally depend on a debug-only binding outside its #[cfg].
  • Pair #[cfg(debug_assertions)] with the unpack/assert that produced its bindings. Don’t sprinkle the attribute across only half of a chain.
  • Combine freely with assert_shape_contract_periodically!. Even inside a debug-only branch, a periodic check is still worthwhile if the function is called many times in tests or development runs: the amortization keeps debug builds responsive.
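
The gating rule in the first bullet can be seen in a std-only sketch that needs neither burn nor bunsen. The interior function below is a hypothetical stand-in for an interior-step kernel, operating on a plain slice; the point is only that the binding and the assertion that consumes it carry the same attribute.

```rust
/// Hypothetical stand-in for an "interior" step: drops one element at each end.
fn interior(values: &[i32]) -> &[i32] {
    // Debug only: remember the input length for the check below.
    #[cfg(debug_assertions)]
    let len_in = values.len();

    let out = if values.len() >= 2 {
        &values[1..values.len() - 1]
    } else {
        &values[..0]
    };

    // Debug only: this statement must be gated too, because `len_in`
    // does not exist in release builds.
    #[cfg(debug_assertions)]
    assert_eq!(out.len(), len_in.saturating_sub(2), "interior shrink failed");

    out
}
```

Removing the second attribute while keeping the first makes a release build fail to compile with an unresolved name, which is the compiler feedback the first bullet describes.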

Choosing the cost model

bunsen::contracts gives you three strategies for managing the runtime cost of a check; they’re not exclusive, and a function often uses more than one:

Strategy | Debug cost | Release cost | Use when
-------- | ---------- | ------------ | --------
assert_shape_contract! / unpack_shape_contract! | full (~160 ns) | full (~160 ns) | The check is on a cold or once-per-batch path, or you want production failures.
assert_shape_contract_periodically! | amortized (~4 ns/call) | amortized (~4 ns/call) | The check is on a hot path but you still want some coverage in production.
#[cfg(debug_assertions)] on either of the above | full or amortized | zero | The check is purely a developer aid; the calling code is trusted in release, or the cost is unacceptable on any hot path.

The default should be “no #[cfg]”: contracts are designed to stay on. Reach for #[cfg(debug_assertions)] when you’ve measured a cost you can’t afford or when the check exists only to catch upstream bugs that release builds cannot introduce.

Putting it together

A canonical use site, taken from the window_partition example in the module docs:

use burn::prelude::{Tensor, Backend};
use burn::tensor::BasicOps;
use bunsen::contracts::{
    assert_shape_contract_periodically,
    unpack_shape_contract,
};

/// Window Partition
///
/// ## Parameters
///
/// - `tensor`: Input tensor of shape (B, h_wins * window_size, w_wins * window_size, C).
/// - `window_size`: Window size.
///
/// ## Returns
///
/// Output tensor of shape (B * h_wins * w_wins, window_size, window_size, C).
pub fn window_partition<B: Backend, K>(
    tensor: Tensor<B, 4, K>,
    window_size: usize,
) -> Tensor<B, 4, K>
where
    K: BasicOps<B>,
{
    // Validate the input and pull out what we need. ~160 ns.
    let [b, h_wins, w_wins, c] = unpack_shape_contract!(
        [
            "batch",
            "height" = "h_wins" * "window_size",
            "width"  = "w_wins" * "window_size",
            "channels",
        ],
        &tensor,
        &["batch", "h_wins", "w_wins", "channels"],
        &[("window_size", window_size)],
    );

    let tensor = tensor
        .reshape([b, h_wins, window_size, w_wins, window_size, c])
        .swap_dims(2, 3)
        .reshape([b * h_wins * w_wins, window_size, window_size, c]);

    // Amortized check on the output. ~4 ns averaged.
    assert_shape_contract_periodically!(
        [
            "batch" * "h_wins" * "w_wins",
            "window_size",
            "window_size",
            "channels",
        ],
        &tensor,
        &[
            ("batch", b),
            ("h_wins", h_wins),
            ("w_wins", w_wins),
            ("window_size", window_size),
            ("channels", c),
        ],
    );

    tensor
}

The input contract unpacks the dimensions the function needs; the output contract asserts — periodically — that the reshape sequence produced what the docstring promised.

Error Messages

When a contract fails, the panic message names exactly what didn’t fit. Producing useful diagnostics is the main user-visible reason to write a contract in the first place; the unpacking and the performance work exist to make using contracts cheap enough that you reach for them by default.

Anatomy of a failure

A failing contract reports the shape, the pattern that was expected, the bindings in scope, and the specific dimension that did not solve:

#![allow(unused)]
fn main() {
use bunsen::contracts::{ShapeContract, shape_contract};
use indoc::indoc;
static CONTRACT: ShapeContract = shape_contract![
    ...,
    "height" = "h_wins" * "window",
    "width"  = "w_wins" * "window",
    "color",
];

let h_wins = 2;
let w_wins = 3;
let window = 4;
let color  = 3;

let shape = [1, 2, 3, h_wins * window, w_wins * window, color];

// Passes: window=4 is consistent with the shape.
let [h, w] = CONTRACT.unpack_shape(
    &shape,
    &["h_wins", "w_wins"],
    &[("window", window), ("color", color)],
);
assert_eq!((h, w), (h_wins, w_wins));

// Fails: we lied about window. The error names the offending dim.
assert_eq!(
    CONTRACT
        .try_unpack_shape(
            &shape,
            &["h_wins", "w_wins"],
            &[("window", window + 1), ("color", color)],
        )
        .unwrap_err(),
    indoc! {r#"
        Shape Error:: 8 !~ height=(h_wins*window) :: No integer solution.
         shape:
          [1, 2, 3, 8, 12, 3]
         expected:
          [..., height=(h_wins*window), width=(w_wins*window), color]
          {"window": 5, "color": 3}"#
    },
);
}

Three things to notice in the error:

  • the dimension is named (height=(h_wins*window)), not just indexed,
  • the offending value (8) and the offending pattern are paired up,
  • the active bindings ({"window": 5, "color": 3}) are printed so you can see what the contract was working with.

This is the payoff for writing the contract in the first place: when something goes wrong, the panic message tells you which invariant broke and with what numbers, instead of a bare assertion failed: lhs == rhs from somewhere downstream.

Panic vs Result

The macros (assert_shape_contract!, unpack_shape_contract!, assert_shape_contract_periodically!) panic on mismatch. That’s the right default for the “contract guards a function boundary” use case — a violation indicates a bug, and a panic is the loudest, most-debuggable response.

When you want the failure as a value instead, use the underlying try_* methods on ShapeContract, such as try_assert_shape and try_unpack_shape, which return a Result instead of panicking.

The Err payload is exactly the multi-line message shown above, so you can log it, surface it to users, or run snapshot tests against it (the example here uses indoc! for exactly that purpose).

Ops Overview

bunsen::ops is a functional tensor API: standalone functions and small configuration value-objects that extend the surface of burn::tensor::Tensor itself. It’s where the utilities go that you’d reach for inside a Module::forward body but that don’t have a home on Tensor upstream.

Nothing in ops owns trainable parameters — that’s the job of bunsen::blocks. The two layers compose: a block bundles a parametric layer (a Linear, a Conv2d, …) with a sequence of ops::* calls in its forward.

API: https://docs.rs/bunsen/latest/bunsen/ops/

How ops fits in the stack

   contracts        validate shapes between layers
       │
       ▼
   ops              pure functions over Tensors          ◄── you are here
       │
       ▼
   blocks           stateful parameter-owning Modules
       │
       ▼
   kits             whole models built from blocks

A block’s forward typically pulls in two or three ops calls around its parametric core. Lifting those into named functions keeps the block readable and makes the underlying math testable in isolation: the ops are what get unit-tested, and the block’s own tests only need to show they are wired correctly.

Map of the module

The contents are split across two sub-chapters by character:

  • Tensor Functions — element-wise and shape-level helpers: range generation (arange), clamping (ClampOp), dropout (drop), noise (NoiseConfig), RMS normalization (rms_norm), and repeat_interleave.
  • Convolution Support — the larger conv submodule: shape arithmetic for conv outputs, functional convolution (convolve_func_2d) for kernels that aren’t linear, and helper filters.

Value-objects as configuration

Several of the “options” types in ops (notably ClampOp and NoiseConfig) are designed to be embedded in Config structs elsewhere in the crate. That’s the intentional shape of the seam: when an op has parameters worth naming, it has a value-object, and that value-object is the unit of configuration. A Module::Config can then #[config(default = ...)] one of these directly instead of duplicating its fields.

Tensor Functions

The element-wise, shape-level, and generation helpers in bunsen::ops. Organized by intent rather than alphabet.

Tensor generation

arange — floating-point ranges

arange fills the gap between integer Tensor::arange and the numpy.arange / numpy.linspace family. The functions come in two flavours:

  • Host-side Vec<f64> builders when you want values to feed into other configuration:
    • float_vec_arange(start, end, step)
    • float_vec_linspace(start, end, n)
  • Device-side Tensor builders when you want the values on the same backend as the rest of your computation:
    • float_arange::<B>(start, end, step, &device)
    • float_linspace::<B>(start, end, n, &device)

step is optional on arange; it defaults to 1.0 for ascending ranges and -1.0 for descending.
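
For readers who want the semantics spelled out, here is a host-side sketch in plain Rust. The names sketch_arange and sketch_linspace are hypothetical, and the endpoint conventions are assumptions borrowed from NumPy (half-open for arange, both ends included for linspace); check the API docs for what float_vec_arange and float_vec_linspace actually do at the boundaries.

```rust
/// Hypothetical host-side arange: half-open [start, end), like numpy.arange.
fn sketch_arange(start: f64, end: f64, step: Option<f64>) -> Vec<f64> {
    // Default step follows the direction of the range.
    let step = step.unwrap_or(if end >= start { 1.0 } else { -1.0 });
    assert!(step != 0.0, "step must be non-zero");
    // A step pointing away from `end` yields an empty range.
    let n = ((end - start) / step).ceil().max(0.0) as usize;
    // Compute each value from its index to avoid accumulating rounding error.
    (0..n).map(|i| start + i as f64 * step).collect()
}

/// Hypothetical host-side linspace: n evenly spaced values including both ends.
fn sketch_linspace(start: f64, end: f64, n: usize) -> Vec<f64> {
    match n {
        0 => Vec::new(),
        1 => vec![start],
        _ => {
            let step = (end - start) / (n - 1) as f64;
            (0..n).map(|i| start + i as f64 * step).collect()
        }
    }
}
```

The two differ in what the caller fixes: arange fixes the spacing and lets the count fall out, while linspace fixes the count and lets the spacing fall out.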

noise — distribution + clamp value-objects

NoiseConfig bundles a burn::tensor::Distribution with an optional ClampOp into a single reusable value:

use bunsen::ops::{clamp::ClampOp, noise::NoiseConfig};
use burn::tensor::Distribution;

let cfg = NoiseConfig::default()
    .with_distribution(Distribution::Normal(0.0, 1.0))
    .with_clamp(ClampOp::new(Some(-3.0), Some(3.0)));

// Materialize:
let t  = cfg.noise::<B, _, 3>([batch, channels, length], &device);
let t2 = cfg.noise_like(&existing_tensor);

noise() takes a shape and a device; noise_like(t) takes another tensor and matches its shape and device. The clamp, when present, is applied in the same pass.

Element-wise transforms

clamp — optional min and max

ClampOp captures an optional minimum and an optional maximum as a single serializable value, with .clamp(tensor) to apply it. It’s designed for places clamping shows up as a setting — a knob on a noise generator, an entry in a serialized config — not just a one-shot call:

let op = ClampOp::new(Some(0.0), None);   // ReLU-style: clamp below at 0
let y  = op.clamp(x);

ClampOp is Config-friendly: it implements Serialize, Deserialize, and ModuleDisplay, so it drops cleanly into downstream #[derive(Config)] structs.
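
The “optional min, optional max as one value” idea is easy to see on plain numbers. The ClampSketch type below is a hypothetical stand-in over f64 slices, not the real ClampOp, which operates on tensors and carries the serialization traits described above.

```rust
/// Hypothetical stand-in for a clamp value-object over plain f64 values.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClampSketch {
    min: Option<f64>,
    max: Option<f64>,
}

impl ClampSketch {
    /// Applies whichever bounds are present; a missing bound is a no-op.
    fn clamp(&self, values: &[f64]) -> Vec<f64> {
        values
            .iter()
            .map(|&v| {
                let v = self.min.map_or(v, |lo| v.max(lo));
                self.max.map_or(v, |hi| v.min(hi))
            })
            .collect()
    }
}
```

Because the bounds live in a value rather than in call arguments, the same ClampSketch can be stored in a config, compared for equality, and applied in several places without repeating the numbers.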

drop — functional dropout

dropout(prob, input) is the functional dropout op: at prob == 0.0 it short-circuits to the input unchanged; otherwise it samples a Bernoulli mask and rescales by 1 / (1 - prob). Use this when you need a single call inside a free function; for a trainable layer with its own toggle state, use the modules in bunsen::blocks::images::drop.
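
A slice-based sketch of the same behaviour, under stated assumptions: dropout_sketch is a hypothetical name, the seed parameter and the xorshift generator exist only to keep the example self-contained, and prob is restricted to [0, 1) so the rescale factor stays finite. The real op works on tensors and uses the backend’s random sampling.

```rust
/// Hypothetical functional dropout over a slice (inverted-dropout scaling).
/// `seed` drives a small xorshift generator used only for illustration.
fn dropout_sketch(prob: f64, input: &[f64], seed: u64) -> Vec<f64> {
    assert!((0.0..1.0).contains(&prob), "prob must be in [0, 1)");
    if prob == 0.0 {
        return input.to_vec(); // short-circuit: nothing to drop
    }
    let scale = 1.0 / (1.0 - prob);
    let mut state = seed | 1; // xorshift state must be non-zero
    input
        .iter()
        .map(|&x| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            // Map the top 53 bits to a uniform value in [0, 1).
            let u = (state >> 11) as f64 / (1u64 << 53) as f64;
            // Drop with probability `prob`; rescale survivors so the
            // expected value of each element is unchanged.
            if u < prob { 0.0 } else { x * scale }
        })
        .collect()
}
```

The rescale is what lets the same downstream code run with dropout disabled at inference time: each surviving element is scaled up by exactly the amount needed to offset the dropped ones in expectation.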

Normalization

rms_norm — parameter-free RMS normalization

rms_norm(input, &options) applies RMSNorm without a trainable gain. Configured via RmsNormOptions, which carries the epsilon (with_eps(...)).

This is the right call when you need RMSNorm inside a free function or unit test. The parametric layer with a learned gain lives in burn::nn::norm::RmsNorm; the two share the same numerics.
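For reference, the computation itself is short. The sketch below normalizes a single vector; the name rms_norm_sketch is hypothetical, and adding eps inside the square root is an assumption about the numerics, which the text above does not spell out.

```rust
/// Hypothetical single-vector RMSNorm without a gain:
/// y_i = x_i / sqrt(mean(x^2) + eps).
fn rms_norm_sketch(input: &[f64], eps: f64) -> Vec<f64> {
    if input.is_empty() {
        return Vec::new();
    }
    let mean_sq = input.iter().map(|x| x * x).sum::<f64>() / input.len() as f64;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    input.iter().map(|x| x * inv_rms).collect()
}
```

With eps = 0 the output has a root-mean-square of 1 while the ratios between elements are preserved.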

Shape transforms

repeat_interleave — NumPy-style interleaving

repeat_interleave(input, repeats, dim) repeats elements along a single axis, placing each element’s copies next to each other rather than tiling the whole sequence. Given [a, b, c] and repeats = 2, it returns [a, a, b, b, c, c] (tiling would give [a, b, c, a, b, c]), matching NumPy / PyTorch semantics, including negative indexing for dim.
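The difference between interleaving and tiling, and the way a negative dim resolves, can be shown on a plain 1-D slice. All three function names below are hypothetical and exist only for this illustration.

```rust
/// Interleaved repetition: [a, b] with repeats = 2 gives [a, a, b, b].
fn repeat_interleave_sketch<T: Clone>(input: &[T], repeats: usize) -> Vec<T> {
    input
        .iter()
        .flat_map(|x| std::iter::repeat(x.clone()).take(repeats))
        .collect()
}

/// Tiling, shown for contrast: [a, b] with repeats = 2 gives [a, b, a, b].
fn tile_sketch<T: Clone>(input: &[T], repeats: usize) -> Vec<T> {
    (0..repeats).flat_map(|_| input.iter().cloned()).collect()
}

/// Resolve a possibly negative axis index against a tensor rank,
/// so that -1 means the last axis. Returns None when out of range.
fn normalize_dim_sketch(dim: isize, rank: usize) -> Option<usize> {
    let rank = rank as isize;
    let resolved = if dim < 0 { dim + rank } else { dim };
    if (0..rank).contains(&resolved) {
        Some(resolved as usize)
    } else {
        None
    }
}
```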

Convolution Support

bunsen::ops::conv collects everything around convolution that isn’t itself a trainable layer: shape arithmetic, a functional convolution kernel, and a few helper filters.

Shape arithmetic

Given a Conv*d’s kernel size, stride, padding, and dilation, what output shape does it produce? burn answers this implicitly when you call forward; bunsen::ops::conv gives you the same answer as a pure function, which is what you reach for inside Module::Config methods that need to advertise output sizes before any tensors exist.

General N-D

  • maybe_conv_output_shape::<D>(input_shape, kernel, stride, padding, dilation) — const-generic over the spatial rank D. Returns Option<[usize; D]>, with None if the configuration is impossible (e.g., kernel larger than the padded input).
  • expect_conv_output_shape::<D>(...) — same but panics on impossible configurations, with a message naming the offending dimension.

The _dyn siblings (maybe_conv_output_shape_dyn, expect_conv_output_shape_dyn) take slices instead of const-generic arrays, for cases where the rank isn’t known at compile time.

1-D scalar case

  • maybe_conv1d_output_size(input, kernel, stride, padding, dilation) — the per-axis math broken out as a single scalar function, for when you only care about one dimension.
  • expect_conv1d_output_size(...) — same, panicking.
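The per-axis rule behind these helpers is the usual convolution formula, out = floor((in + 2·padding − dilation·(kernel − 1) − 1) / stride) + 1. The sketch below implements it with hypothetical names; the argument order follows the list above, and treating zero-sized kernel, stride, or dilation as impossible is an assumption.

```rust
/// Hypothetical scalar version of the conv output-size rule.
/// Returns None when the dilated kernel does not fit in the padded input.
fn conv1d_output_size_sketch(
    input: usize,
    kernel: usize,
    stride: usize,
    padding: usize,
    dilation: usize,
) -> Option<usize> {
    if kernel == 0 || stride == 0 || dilation == 0 {
        return None;
    }
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + 2 * padding;
    if padded < effective_kernel {
        return None;
    }
    Some((padded - effective_kernel) / stride + 1)
}

/// The same rule applied independently to each of D spatial axes.
fn conv_output_shape_sketch<const D: usize>(
    input: [usize; D],
    kernel: [usize; D],
    stride: [usize; D],
    padding: [usize; D],
    dilation: [usize; D],
) -> Option<[usize; D]> {
    let mut out = [0usize; D];
    for i in 0..D {
        out[i] =
            conv1d_output_size_sketch(input[i], kernel[i], stride[i], padding[i], dilation[i])?;
    }
    Some(out)
}
```

For example, a 224-wide input with a 7-wide kernel, stride 2, and padding 3 produces 112 outputs, the familiar ResNet stem arithmetic.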

Common shortcuts

  • stride_div_output_resolution(input_resolution, stride) — the “downsample by stride” arithmetic that comes up in ResNet-style blocks, with a check that the input is a multiple of the stride.
  • get_square_conv2d_padding(kernel) — the “same-padding for a square odd kernel” formula.
  • build_square_conv2d_padding_config(kernel) — the same result wrapped in burn::nn::PaddingConfig2d, for handing straight to a Conv2dConfig.
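Both shortcuts reduce to one line of integer arithmetic. The sketch below uses hypothetical names and returns Option for the failure cases; whether the real helpers panic or return an error is not stated here.

```rust
/// Hypothetical "downsample by stride": only defined when the input
/// resolution is an exact multiple of the stride.
fn stride_div_sketch(input_resolution: usize, stride: usize) -> Option<usize> {
    if stride == 0 || input_resolution % stride != 0 {
        None
    } else {
        Some(input_resolution / stride)
    }
}

/// Hypothetical same-padding for a square odd kernel at stride 1:
/// padding (k - 1) / 2 on each side keeps the spatial size unchanged.
fn square_same_padding_sketch(kernel: usize) -> Option<usize> {
    if kernel % 2 == 1 {
        Some((kernel - 1) / 2)
    } else {
        None
    }
}
```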

Functional convolution

convolve_func_2d

convolve_func_2d folds a user-supplied closure across the windows of a 2-D convolution-shaped iteration:

// Pseudocode of the contract.
fn convolve_func_2d<B, KIn, KOut, F>(
    input: Tensor<B, 4, KIn>,
    kernel_size: [usize; 2],
    stride: [usize; 2],
    padding: [usize; 2],
    f: F,                          // window -> per-window output
) -> Tensor<B, 4, KOut>
where
    F: FnMut(Tensor<B, 4, KIn>) -> Tensor<B, 4, KOut>;

The closure receives one conv window at a time and returns the corresponding output. This is the starting point for kernels that aren’t expressible as a linear Conv2d: median filters, mode filters, custom per-window scoring, and so on. If your closure is linear, you almost certainly want burn::nn::conv::Conv2d instead — this function intentionally trades performance for flexibility.
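The window-folding idea is independent of tensors. As a rough illustration, the sketch below folds a closure over the windows of a rectangular grid of f64 values, without padding or batching; fold_windows_2d_sketch and median_sketch are hypothetical names, not part of bunsen.

```rust
/// Hypothetical window fold: call `f` once per kernel-sized window
/// (flattened row-major) and collect the per-window results.
/// Assumes every row of `input` has the same length.
fn fold_windows_2d_sketch<F>(
    input: &[Vec<f64>],
    kernel: [usize; 2],
    stride: [usize; 2],
    mut f: F,
) -> Vec<Vec<f64>>
where
    F: FnMut(&[f64]) -> f64,
{
    let [kh, kw] = kernel;
    let [sh, sw] = stride;
    assert!(kh > 0 && kw > 0 && sh > 0 && sw > 0, "sizes must be non-zero");
    let h = input.len();
    let w = input.first().map_or(0, |row| row.len());
    if h < kh || w < kw {
        return Vec::new();
    }
    let (out_h, out_w) = ((h - kh) / sh + 1, (w - kw) / sw + 1);
    let mut window = Vec::with_capacity(kh * kw);
    let mut out = Vec::with_capacity(out_h);
    for oy in 0..out_h {
        let mut row = Vec::with_capacity(out_w);
        for ox in 0..out_w {
            window.clear();
            for dy in 0..kh {
                let x0 = ox * sw;
                window.extend_from_slice(&input[oy * sh + dy][x0..x0 + kw]);
            }
            row.push(f(&window));
        }
        out.push(row);
    }
    out
}

/// A non-linear per-window function: the median of the window.
fn median_sketch(window: &[f64]) -> f64 {
    let mut sorted = window.to_vec();
    sorted.sort_by(f64::total_cmp);
    sorted[sorted.len() / 2]
}
```

Passing median_sketch gives a median filter; passing |w| w[w.len() / 2] picks the centre sample of an odd-sized window, which is the kind of kernel a linear convolution cannot express as conveniently.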

Filters

conv2d_kernel_midpoint_filter

conv2d_kernel_midpoint_filter is a kernel that picks the spatial mid-point sample from each window — cheap to compute, useful as a stride-aware “downsample to the centre pixel” filter and as a sanity-check baseline for more sophisticated kernels.

Blocks Overview

bunsen::blocks is the library of reusable burn::module::Module components — the parts you compose into larger models. Where bunsen::ops supplies pure functional tensor operations, blocks supplies the stateful layers that own trainable parameters. Where bunsen::kits supplies whole end-to-end models, blocks supplies the sub-modules those kits are assembled from.

A typical user picks a kit; an author building a new model reaches into blocks and stitches the pieces together.

API: https://docs.rs/bunsen/latest/bunsen/blocks/

Map of the module

blocks is organized by domain: each major sub-chapter covers one domain’s worth of building blocks.

  • Transformers — attention (CausalSelfAttention, scaled-dot-product attention helpers, KVCache) and positional embedding (RotaryEmbedding).
  • Images — convolutional composites (ConvNorm2d, CNA2d), patch tokenization (PatchEmbed), same-padding pooling (AvgPool2dSame), and stochastic regularization layers (DropBlock, DropPath).

Conventions used across blocks

Every block follows the patterns described in Building Reusable Modules:

  • {Block}Meta trait when other modules will need to introspect this one at runtime. CausalSelfAttentionMeta is implemented on both CausalSelfAttentionConfig and CausalSelfAttention<B>, so a parent transformer can ask n_head / head_dim of whichever form it’s holding without copying metadata.
  • Contract → Structure config split when the user-facing knobs differ from the implementation parameter list. The ResidualBlock triple in bimm::resnet is the in-tree reference example.
  • Inline shape contracts at module boundaries via bunsen::contracts. Every block’s forward documents its shape in the docstring; the matching unpack_shape_contract! / assert_shape_contract_periodically! calls turn that docstring into a runtime check.

Where blocks come from

Most blocks land here in one of three ways:

  1. Direct ports. Implementations of well-known layers from the timm / torchvision / reference-paper ecosystems, kept in burn form so they’re available across the entire bunsen stack.
  2. Extraction from kits. When a layer in a kits model is reusable beyond that one model, it gets promoted out of the kit and into blocks.
  3. burn-gap fills. Layers that aren’t in burn core today but are needed by everything else (e.g., AvgPool2dSame).

The result is a curated catalog, not a comprehensive one — new blocks land here when a kit or downstream user needs them.

Transformer Blocks

bunsen::blocks::transformers collects the building blocks used by transformer-family models — attention layers, their caching machinery, and positional embeddings.

API: https://docs.rs/bunsen/latest/bunsen/blocks/transformers/

Attention

The attention submodule houses the attention layers themselves and the helpers they’re built from.

CausalSelfAttention

CausalSelfAttention is multi-head causal self-attention with optional KV-grouping. The config carries:

  • n_head — number of query heads,
  • n_kv_head — number of key/value heads (must divide n_head; equals n_head for plain MHA, less for grouped-query attention),
  • n_embed — embedding dimension,
  • a pluggable NormalizationConfig applied inside the block.

The module exposes a CausalSelfAttentionMeta trait, implemented on both the config and the live module. Parents can read n_head, n_kv_head, and head_dim of whichever form they’re holding, so larger transformers don’t need to cache those numbers themselves. This is the pattern documented in Building Reusable Modules.

forward takes the input embedding plus an optional &mut KVCache for autoregressive decoding. When the cache is None the layer runs in training/prefill mode and recomputes K and V each call; when it’s Some, K and V are appended into the cache and read back across the full sequence.
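The relationships between the config fields are simple integer arithmetic. The sketch below uses a hypothetical AttnDimsSketch struct; the divisibility of n_head by n_kv_head comes from the list above, while head_dim = n_embed / n_head is the conventional definition and is assumed here rather than taken from the bunsen source.

```rust
/// Hypothetical mirror of the three size fields on the attention config.
#[derive(Debug, Clone, Copy)]
struct AttnDimsSketch {
    n_head: usize,
    n_kv_head: usize,
    n_embed: usize,
}

impl AttnDimsSketch {
    /// Check the divisibility constraints before deriving anything.
    fn validate(&self) -> Result<(), String> {
        if self.n_head == 0 || self.n_kv_head == 0 {
            return Err("head counts must be non-zero".to_string());
        }
        if self.n_head % self.n_kv_head != 0 {
            return Err(format!(
                "n_kv_head ({}) must divide n_head ({})",
                self.n_kv_head, self.n_head
            ));
        }
        if self.n_embed % self.n_head != 0 {
            return Err(format!(
                "n_head ({}) must divide n_embed ({})",
                self.n_head, self.n_embed
            ));
        }
        Ok(())
    }

    /// Width of one head (assumed definition).
    fn head_dim(&self) -> usize {
        self.n_embed / self.n_head
    }

    /// Number of query heads sharing each key/value head:
    /// 1 for plain MHA, more than 1 for grouped-query attention.
    fn kv_group_size(&self) -> usize {
        self.n_head / self.n_kv_head
    }
}
```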

KVCache

KVCache is the per-layer key/value tensor cache for fast incremental decoding. Built from a KVCacheConfig carrying batch_size, num_heads, seq_len, head_dim, and num_layers, it provides:

  • pos() — the current write head position,
  • prefill(...) — bulk-load K/V from a prompt encode,
  • insert_kv(...) — append a single decoded step’s K/V,
  • reset() — rewind to position 0 without reallocating.

NanoChatGpt uses one shared KVCache across all its layers; see bunsen::kits::gpts::nanochat for the integrated example.
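The write-head bookkeeping can be sketched for a single layer and a single head with flat buffers. Everything in KvCacheSketch is hypothetical; in particular, having prefill start from position 0 and having reset keep the allocation are assumptions that follow the descriptions above, not the actual KVCache code.

```rust
/// Hypothetical single-layer, single-head cache over flat f64 buffers.
struct KvCacheSketch {
    seq_len: usize,
    head_dim: usize,
    keys: Vec<f64>,
    values: Vec<f64>,
    pos: usize,
}

impl KvCacheSketch {
    fn new(seq_len: usize, head_dim: usize) -> Self {
        assert!(head_dim > 0, "head_dim must be non-zero");
        let len = seq_len * head_dim;
        Self { seq_len, head_dim, keys: vec![0.0; len], values: vec![0.0; len], pos: 0 }
    }

    /// Current write-head position, in time steps.
    fn pos(&self) -> usize {
        self.pos
    }

    /// Copy one or more steps of K/V in at the write head and advance it.
    fn write(&mut self, k: &[f64], v: &[f64]) {
        assert_eq!(k.len(), v.len(), "K and V must have the same size");
        assert_eq!(k.len() % self.head_dim, 0, "input must be whole steps");
        let steps = k.len() / self.head_dim;
        assert!(self.pos + steps <= self.seq_len, "cache overflow");
        let start = self.pos * self.head_dim;
        self.keys[start..start + k.len()].copy_from_slice(k);
        self.values[start..start + v.len()].copy_from_slice(v);
        self.pos += steps;
    }

    /// Bulk-load a prompt's K/V, starting from position 0 (assumption).
    fn prefill(&mut self, k: &[f64], v: &[f64]) {
        self.reset();
        self.write(k, v);
    }

    /// Append exactly one decoded step.
    fn insert_kv(&mut self, k: &[f64], v: &[f64]) {
        assert_eq!(k.len(), self.head_dim, "insert_kv takes a single step");
        self.write(k, v);
    }

    /// Rewind without touching the allocation.
    fn reset(&mut self) {
        self.pos = 0;
    }

    /// Keys written so far, i.e. the prefix attention would read back.
    fn cached_keys(&self) -> &[f64] {
        &self.keys[..self.pos * self.head_dim]
    }

    /// Values written so far.
    fn cached_values(&self) -> &[f64] {
        &self.values[..self.pos * self.head_dim]
    }
}
```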

Scaled-dot-product helpers

When you need to wire attention by hand — for a custom block, a fused-kernel experiment, or unit tests — the functional API is available:

  • scaled_dot_product_attention — the full SDPA op given Q, K, V and an optional mask/bias.
  • sdpa_attn_weight — just the softmax-of-scaled-QK^T factor.
  • sdpa_bias — build an additive bias tensor (causal mask, ALiBi, etc.) of the right shape for SDPA.
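To show how an additive bias and the softmax factor fit together, here is a single-head sketch on nested Vecs. The function names are hypothetical; the math is the standard softmax(Q·Kᵀ / sqrt(d) + bias), and a causal bias is 0 where a query may attend and negative infinity elsewhere.

```rust
/// Hypothetical causal bias: position i may attend to positions j <= i.
fn causal_bias_sketch(t: usize) -> Vec<Vec<f64>> {
    (0..t)
        .map(|i| {
            (0..t)
                .map(|j| if j <= i { 0.0 } else { f64::NEG_INFINITY })
                .collect()
        })
        .collect()
}

/// Hypothetical attention-weight factor: row-wise softmax of the
/// scaled dot products plus the additive bias.
fn attn_weight_sketch(q: &[Vec<f64>], k: &[Vec<f64>], bias: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let scale = q.first().map_or(1.0, |row| 1.0 / (row.len() as f64).sqrt());
    q.iter()
        .enumerate()
        .map(|(i, qi)| {
            let scores: Vec<f64> = k
                .iter()
                .enumerate()
                .map(|(j, kj)| {
                    let dot: f64 = qi.iter().zip(kj).map(|(a, b)| a * b).sum();
                    dot * scale + bias[i][j]
                })
                .collect();
            // Subtract the row maximum for numerical stability.
            let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
            let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
            let sum: f64 = exps.iter().sum();
            exps.iter().map(|e| e / sum).collect()
        })
        .collect()
}
```

Multiplying these weights by V gives the full attention output; masked positions receive a weight of exactly zero.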

Embedding

The embedding submodule collects positional embeddings.

RotaryEmbedding

RotaryEmbedding is RoPE with a precomputed frequency table:

  • RotaryEmbeddingConfig::new(seq_len, head_dim) then .init(device) allocates the table once for the maximum sequence length.
  • apply(q, k) rotates query and key tensors.
  • clip_range(t0..t1) returns a sliced view for serving a partial sequence — the natural fit for KV-cache decoding, where each step only needs the rotations for the new positions.
  • cast(dtype) converts the precomputed table between float dtypes without recomputing the trigonometric values.

The free functions inverse_frequency_table and positional_frequency_table are exposed for callers that want to build their own variant of rotary embedding without going through the packaged module.
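For readers who have not built a RoPE table by hand, the sketch below shows the usual construction. The names, the base parameter (commonly 10000), and the table layout are assumptions for illustration; they are not the signatures of the bunsen free functions, and how dimensions are paired for rotation varies between implementations.

```rust
/// Hypothetical inverse-frequency vector: base^(-2i / head_dim)
/// for each of the head_dim / 2 rotation pairs.
fn inverse_frequency_sketch(head_dim: usize, base: f64) -> Vec<f64> {
    assert!(head_dim % 2 == 0, "head_dim must be even");
    (0..head_dim / 2)
        .map(|i| base.powf(-(2.0 * i as f64) / head_dim as f64))
        .collect()
}

/// Hypothetical angle table: row t holds t * inv_freq[i] for each pair i.
/// Slicing rows t0..t1 gives the angles for a partial sequence.
fn angle_table_sketch(seq_len: usize, inv_freq: &[f64]) -> Vec<Vec<f64>> {
    (0..seq_len)
        .map(|t| inv_freq.iter().map(|f| t as f64 * f).collect())
        .collect()
}

/// Rotate one (x0, x1) pair by `angle`.
fn rotate_pair_sketch(x: (f64, f64), angle: f64) -> (f64, f64) {
    let (sin, cos) = angle.sin_cos();
    (x.0 * cos - x.1 * sin, x.0 * sin + x.1 * cos)
}
```

Because the table depends only on position and head_dim, it can be computed once for the maximum sequence length and sliced per decode step.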

Image Blocks

bunsen::blocks::images collects the building blocks used by 2-D vision models — convolutional composites, patch tokenization, same-padding pooling, and stochastic-depth-style regularization layers.

API: https://docs.rs/bunsen/latest/bunsen/blocks/images/

Conv composites

The conv submodule packages the conv-plus-something composites that show up across ResNet, EfficientNet, and friends.

ConvNorm2d

ConvNorm2d is the standard Conv2d + BatchNorm pairing with a single forward. Beyond the convenience of one module instead of two, it carries zero_init_norm() — the “zero-initialize the last batch norm in a residual branch” trick used by ResNet and successors to make residual branches start as identities.

CNA2d

CNA2d is the more general Conv / Norm / Activation block. Beyond the basic forward, it provides:

  • match_norm_features() — adapts a generic NormalizationConfig (BatchNorm::new(0), RmsNorm::new(0), etc.) to the right channel count after the conv. Lets callers pass a norm config without knowing the channel count yet.
  • map_forward(f) — runs the conv and norm, then hands the intermediate tensor to a user closure before the activation. Useful for inserting attention, channel reweighting, or per-residual side-effects without copying out the rest of the block.

Patching

patching holds patch tokenization, the entry point for transformer-style vision models.

PatchEmbed

PatchEmbed is ViT-style patch tokenization: it takes an image, splits it into non-overlapping patches, and projects each patch into an embedding, producing a sequence of tokens of width embed_dim.

Pooling

pool holds the pooling layers that don’t fit burn’s defaults.

AvgPool2dSame

AvgPool2dSame is TensorFlow-style same padding for average pooling — asymmetric where needed to keep the spatial dimensions of the output aligned with ceil(input / stride). Helpers get_same_padding and pad_same are exposed for the underlying arithmetic, useful when you’re matching a Keras / TF-Slim reference implementation.
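The asymmetry comes from splitting an odd total padding between the two sides. The sketch below computes per-axis padding in the TensorFlow convention, where the extra element goes on the trailing side; same_padding_sketch is a hypothetical name and is not the signature of get_same_padding.

```rust
/// Hypothetical TF-style "same" padding for one axis.
/// Returns (before, after) so that the output length is ceil(input / stride).
fn same_padding_sketch(input: usize, kernel: usize, stride: usize) -> (usize, usize) {
    assert!(kernel > 0 && stride > 0, "kernel and stride must be non-zero");
    // Target output length: ceil(input / stride).
    let out = (input + stride - 1) / stride;
    // Input length needed to produce `out` windows of size `kernel`.
    let needed = out.saturating_sub(1) * stride + kernel;
    let total = needed.saturating_sub(input);
    let before = total / 2;
    (before, total - before)
}
```

A length-5 axis with kernel 2 and stride 2 needs one padding element in total, which lands entirely on the trailing side.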

Stochastic regularization

drop collects regularization layers that drop structured pieces of the activations rather than individual scalars.

DropBlock

DropBlock is structured spatial dropout from Ghiasi et al., 2018: instead of dropping independent pixels, it drops contiguous blocks of activations. For convnets this acts as a substantially stronger regularizer than plain dropout, because adjacent pixels are highly correlated and independent dropout barely removes information.

DropPath

DropPath is stochastic depth from Huang et al., 2016: with some probability, the entire residual branch is zeroed for a given sample, so the network sees a shorter effective depth on each training step.

Supporting types

  • progressive_dpr — rate table that linearly ramps drop rates over a network’s depth, matching the SWIN V2 / timm convention of giving deeper blocks higher drop probabilities.
  • SizeConfig — a small enum describing a size as either Default, a Ratio(f64), or a Fixed(usize). Used by the drop layers when the effective region size is relative to a wrapped layer’s spatial dimensions.
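A linear ramp of drop rates is straightforward to sketch. The version below follows the timm habit of spacing rates evenly from 0 to a maximum across all blocks and then splitting them per stage; the function name, its signature, and the per-stage return shape are assumptions, not the progressive_dpr API.

```rust
/// Hypothetical rate table: block b of `total` blocks gets
/// max_rate * b / (total - 1), grouped by stage depth.
fn progressive_dpr_sketch(max_rate: f64, depths: &[usize]) -> Vec<Vec<f64>> {
    let total: usize = depths.iter().sum();
    let mut index = 0usize;
    depths
        .iter()
        .map(|&depth| {
            (0..depth)
                .map(|_| {
                    // A single block has nothing to ramp over.
                    let rate = if total > 1 {
                        max_rate * index as f64 / (total - 1) as f64
                    } else {
                        0.0
                    };
                    index += 1;
                    rate
                })
                .collect()
        })
        .collect()
}
```

The first block is never dropped and the last block is dropped with the full max_rate.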

bunsen::kits::bimm

Bunsen/Burn Image Models — an incremental port of the timm (Torch Image Models) ecosystem to burn.

bimm is the home for full image-recognition model families inside bunsen. Each model lives next to its prefab configurations and pretrained-weight loaders, so picking one up is closer to “pick a prefab, fetch weights, init” than “wire together a stack of blocks.”

API: https://docs.rs/bunsen/latest/bunsen/kits/bimm/

Current models

ResNet

The classic ResNet family (arXiv:1512.03385), with prefab configurations and a pretrained-weight loader that pulls from the torchvision checkpoints.

A representative usage:

use bunsen::{
    cache::DiskCacheConfig,
    kits::bimm::resnet::{PREFAB_RESNET_MAP, ResNet},
};
use burn::backend::Flex;

let device = Default::default();

let prefab = PREFAB_RESNET_MAP.expect_lookup_prefab("resnet18");

let weights = prefab
    .expect_lookup_pretrained_weights("tv_in1k")
    .fetch_weights(&DiskCacheConfig::default())
    .expect("Failed to fetch weights");

let model: ResNet<Flex> = prefab
    .to_config()
    .to_structure()
    .init(&device)
    .load_pytorch_weights(weights)
    .expect("Failed to load weights")
    // Re-head to 10 classes:
    .with_classes(10)
    // Stochastic block drops for training:
    .with_stochastic_drop_block(0.2)
    // Stochastic depth for training:
    .with_stochastic_path_depth(0.1);

Swin Transformer V2

The Swin Transformer V2 family (reference implementation), implemented in terms of windowed self-attention blocks, patch merging, and relative-position biases.

See the bunsen::kits::bimm::swin::v2 module for the full configuration API.

bunsen::kits::gpts

Full GPT / LLM variants. Where bunsen::blocks provides reusable transformer sub-modules, gpts is for whole language-model architectures: end-to-end models, tokenizer wiring, and the training/inference surface around them.

API: https://docs.rs/bunsen/latest/bunsen/kits/gpts/

Current models

nanochat

A compact GPT in the spirit of the “nano” GPT lineage — small enough to train on modest hardware, opinionated enough to be a useful reference implementation.

The model lives in bunsen::kits::gpts::nanochat and is split into:

  • the per-layer MLP,
  • the transformer block (attention + MLP + norms),
  • the full model wrapper that stacks the blocks and adds embedding and head layers.

gpts is a work in progress; further GPT/LLM variants will land here as the port from upstream reference implementations progresses.

bunsen::kits::sims

Iterative tensor simulations. These are runnable physics / cellular kernels expressed as tensor operations: each step is a pure function from one state tensor to the next, which makes them straightforward to batch, GPU-accelerate, and feed back into ML pipelines.

API: https://docs.rs/bunsen/latest/bunsen/kits/sims/

Current simulations

conway — Conway’s Game of Life

Cellular-automaton implementations expressed as windowed sums over a boolean state tensor:

  • life2d — classic 2D Conway’s Game of Life over a board.
  • life3d — a 3D generalization over a board, with a configurable spawn/survive ruleset (LifeRules).

Both expose a next_interior_* step kernel that takes the current state and produces the next interior state, plus padding-aware wrappers for the boundary.
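The "interior" idea is worth seeing in miniature: a cell on the edge has no full 3×3 neighbourhood, so one step over an H×W board naturally yields an (H−2)×(W−2) result. The sketch below does this on nested Vecs with the classic rules; the function name is hypothetical, and it uses a plain loop where the tensor kernels would use windowed sums.

```rust
/// Hypothetical interior step for 2D Life on a boolean grid.
/// Output cell [y][x] corresponds to input cell [y + 1][x + 1].
/// Assumes every row of `state` has the same length.
fn next_interior_2d_sketch(state: &[Vec<bool>]) -> Vec<Vec<bool>> {
    let h = state.len();
    let w = state.first().map_or(0, |row| row.len());
    if h < 3 || w < 3 {
        return Vec::new();
    }
    (1..h - 1)
        .map(|y| {
            (1..w - 1)
                .map(|x| {
                    // Count the eight neighbours of (y, x).
                    let mut neighbours = 0u8;
                    for dy in 0..3 {
                        for dx in 0..3 {
                            if (dy, dx) != (1, 1) && state[y + dy - 1][x + dx - 1] {
                                neighbours += 1;
                            }
                        }
                    }
                    // Survive with 2 or 3 neighbours, spawn with exactly 3.
                    matches!((state[y][x], neighbours), (true, 2 | 3) | (false, 3))
                })
                .collect()
        })
        .collect()
}
```

A padding-aware wrapper would pad the board by one cell on each side first, so the interior of the padded board is the full original board.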

lbm::d2q9 — Lattice-Boltzmann Fluid

A 2D, 9-velocity (D2Q9) lattice-Boltzmann fluid solver (Wikipedia). The simulation is split into the orthogonal operations that compose a single LBM step:

  • streaming — particle distributions propagate to neighbour cells along their velocity directions,
  • collision — on-cell relaxation toward equilibrium,
  • reflection — boundary handling,
  • thermal and relaxation — configurable physics parameters,
  • simulation — the driver that runs a sequence of steps.

Each piece is a tensor kernel; the whole thing runs on any burn backend.

Burner Overview

bunsen::burner is the layer that sits next to burn itself. Where bunsen::blocks gives you new Modules and bunsen::kits gives you whole models, burner gives you the infrastructure for working with burn modules, optimizers, and records at a level burn’s default surface doesn’t expose.

API: https://docs.rs/bunsen/latest/bunsen/burner/

The module currently contains:

  • descriptors — type-erased descriptors for tensors and parameters. A TensorParamDesc captures the metadata of any Param<Tensor<B, R, K>> (its ParamId, Shape, rank, dtype, kind) without carrying the generics that the underlying tensor type does. This is the lingua franca used by the reflection and optimizer machinery to talk about parameters uniformly.

  • module — module-side helpers: a type-mapper for Module field re-typing, and (under features = ["reflection"]) the XML/XPath reflection layer documented in Module Introspection.

  • optim — optimizer extensions (under features = ["train"]). The headline feature is the GroupOptimizerAdaptor{N} family, which lets you mount multiple optimizers on a single Module, each driving a disjoint group of parameters. Covered in Composite Optimizers.

  • record — helpers for working with burn::record types.

  • tensor — tensor helpers that don’t fit neatly in bunsen::ops, including a DataView abstraction.

  • distribution — distribution-related utilities.

When to reach for burner

Most code that uses bunsen won’t import from burner at all — the ops, blocks, and contracts are what you write models against. You reach for burner when:

  • you need to introspect a model to drive something else: split parameters into groups, audit shapes, build a parameter manifest;
  • you need to compose optimizers (e.g., Muon for matrix parameters, AdamW for everything else; different learning rates per group);
  • you need to carry tensor / parameter metadata in non-generic code paths (a TensorParamDesc instead of a Param<Tensor<B, R, K>>);
  • you need to manipulate records outside of what the derive macros give you for free.

The two largest user-facing surfaces — the reflection / XPath query machinery and the group-optimizer family — get their own chapters in this section.

Module Introspection

When you want to do something with a subset of a model’s parameters — group them for different optimizers, apply weight decay to some but not others, audit which tensors of which shapes a model contains — burn itself gives you very little to work with. A Module is a Rust type; it tells the compiler about its sub-modules but doesn’t expose a queryable structure at runtime.

bunsen::burner::module::reflection fills that gap. It walks a Module and produces an XML document mirroring its structure, then hands you an XPath-based query API to select pieces of it. The result is that “select every rank-2 weight tensor in this submodule” becomes a one-line query instead of a hand-rolled visitor.

This chapter covers the user-facing surface. The full reference is at https://docs.rs/bunsen/latest/bunsen/burner/module/reflection/ and the module rustdoc has a long, executable walkthrough that exercises every method.

Enable with features = ["reflection"].

Building a tree

XmlModuleTree::build(&module) walks a &impl Module<B> and produces the XML model:

use bunsen::burner::module::reflection::XmlModuleTree;

let module: Linear<B> = LinearConfig::new(2, 3).init(&device);
let mut mtree = XmlModuleTree::build(&module);

mtree.to_xml(true) dumps the underlying document pretty-printed, which is the right first step when you’re figuring out what to query against. For a single Linear it looks like:

<XmlModuleTree version="...">
  <Structure>
    <Linear id="n:1" class="struct">
      <Param id="n:2" name="weight" param_id="..."
             class="tensor" kind="Float" dtype="..." shape="2 3" rank="2"/>
      <Param id="n:3" name="bias"   param_id="..."
             class="tensor" kind="Float" dtype="..." shape="3"   rank="1"/>
    </Linear>
  </Structure>
</XmlModuleTree>

A few things to notice:

  • The query “context” is always /XmlModuleTree/Structure; that’s the implicit prefix every query starts under.
  • Each structural element has the type name as its tag (Linear, Vec, Array, Tuple, …) and a class attribute (struct, builtin, …) distinguishing user-defined modules from the container types burn’s derive uses.
  • Each parameter is a <Param> leaf with name, param_id, kind, dtype, shape, and rank attributes.
  • Container children (Vec, Array, Tuple) have no @name; they have positional indices instead (XPath indexes from 1).

Querying

mtree.query() returns an XPathModuleQuery you can chain methods against:

  • .select(expr) — append "/expr" to the current XPath.
  • .filter(expr) — append "[expr]".
  • .params() — descend to all Param elements (shorthand for descendant-or-self::Param).

Terminators:

  • .to_param_ids() → Result<Vec<ParamId>, _>.
  • .to_param_descs() → Result<Vec<TensorParamDesc>, _>.
  • .to_fragments(pretty) → Result<Vec<String>, _>, the matched nodes serialized as XML. Useful for debugging the query itself.
  • .expr() — the XPath string accumulated so far.
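Since each builder method just extends an XPath string, the accumulation can be sketched in a few lines. XPathQuerySketch is a hypothetical stand-in, not XPathModuleQuery itself; in particular, whether the real .expr() includes the /XmlModuleTree/Structure prefix is an assumption made here for clarity.

```rust
/// Hypothetical string-only model of the chained query builder.
#[derive(Debug, Clone)]
struct XPathQuerySketch {
    expr: String,
}

impl XPathQuerySketch {
    /// Start at the implicit query context.
    fn new() -> Self {
        Self { expr: "/XmlModuleTree/Structure".to_string() }
    }

    /// Append "/expr" to the current path.
    fn select(mut self, expr: &str) -> Self {
        self.expr.push('/');
        self.expr.push_str(expr);
        self
    }

    /// Append "[expr]" as a predicate on the current step.
    fn filter(mut self, expr: &str) -> Self {
        self.expr.push('[');
        self.expr.push_str(expr);
        self.expr.push(']');
        self
    }

    /// Descend to every Param at or below the current step.
    fn params(self) -> Self {
        self.select("descendant-or-self::Param")
    }

    /// The XPath accumulated so far.
    fn expr(&self) -> &str {
        &self.expr
    }
}
```

Printing expr() before running a query is a cheap way to check that a chain of calls produces the path you intended.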

XmlModuleTree also offers convenience shortcuts that wrap the common cases:

  • mtree.param_ids() — every ParamId in the tree.
  • mtree.param_descs() — every TensorParamDesc in the tree.
  • mtree.select(expr) — equivalent to mtree.query().select(expr).
  • mtree.select_params(expr): equivalent to select(expr) then .params(), which is the right starting point for almost every parameter-selection query.

A small XPath crib

You don’t need to learn all of XPath to use this. The pieces that come up:

| Pattern | Meaning |
| --- | --- |
| Linear | Children of the current context named Linear. |
| * | All children, regardless of name. |
| *//Linear | All Linear descendants (anywhere below). |
| *[@name='weight'] | Children whose @name attribute is weight. |
| *[@rank=2] | Children whose @rank attribute is 2. |
| *[2] | The second child (1-indexed). |
| descendant-or-self::Param | Every <Param> at or below the current context (what .params() does for you). |

Predicates can be combined: *[@name='weight'][@rank=2] selects 2-D weights.

Worked example

Several Linear modules wrapped in container shapes (a tuple holding a bare Linear, an array, and a Vec):

let module = (
    LinearConfig::new(2, 3).init::<B>(&device),
    [LinearConfig::new(4, 5).init::<B>(&device)],
    vec![
        LinearConfig::new(6, 7).init::<B>(&device),
        LinearConfig::new(8, 9).init::<B>(&device),
    ],
);

let mut mtree = XmlModuleTree::build(&module);

// Parameters of every Linear module anywhere in the tree:
let linear_ids = mtree
    .select("*//Linear")
    .params()
    .to_param_ids()?;

// Just the 2-D weight tensors (skips bias, which is rank-1):
let weight_ids = mtree
    .query()
    .params()
    .filter("@rank=2")
    .to_param_ids()?;

// Everything under the third top-level child (the Vec) — by position:
let vec_param_ids = mtree
    .select("*/*[3]")
    .params()
    .to_param_ids()?;

When to use it

Reflection is heavier than just accessing fields on a module. Reach for it when:

  • you need a set of ParamIds defined by structure rather than by the variable names in your code (e.g., “every rank-2 weight under the transformer blocks”), or
  • you’re writing tooling that doesn’t know the model’s type up front and needs to walk it generically.

For the typical “give these layers a different learning rate” use case, this machinery feeds directly into the GroupOptimizerAdaptor{N} family.

Composite Optimizers

burn’s Optimizer<M, B> trait models a single optimizer that touches every parameter in a Module. For real training runs that isn’t always what you want:

  • 2-D matrix parameters benefit from Muon, while embeddings and scalars are better served by AdamW.
  • Different parameter groups want different learning-rate schedules (the NanoChat recipe scales lm_head and embedding learning rates differently, and applies a d_model factor on top).
  • You may want different weight-decay or beta settings per group.

bunsen::burner::optim provides the GroupOptimizerAdaptor{N} family: a single Optimizer<M, B> that mounts N kinds of optimizer, each with one or more parameter groups, dispatching each parameter’s gradient to the optimizer it belongs to. Pair it with the Module Introspection machinery and you can carve a model into parameter groups with XPath queries.

Enable with features = ["train"].

API: https://docs.rs/bunsen/latest/bunsen/burner/optim/

The building blocks

OptimizerGroup<B, O>

One group — a HashSet<ParamId> plus an optimizer of type O plus an optional per-group LrSelector for learning-rate mapping:

use bunsen::burner::optim::OptimizerGroup;

let group = OptimizerGroup::from_adaptor(
    param_ids,                       // anything IntoIterator<Item = ParamId>
    &AdamWConfig::new()
        .with_weight_decay(0.01)
        .init::<B, MyModel<B>>(),
)
.with_fixed_lr(3e-4);                // or .with_lr_selector(closure)

.with_lr_selector(closure) takes any FnMut(LearningRate, &HashMap<String, LearningRate>) -> LearningRate, so per-group warmup, decay, or scaling factors live inside the group itself.
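Because the selector is FnMut, it can carry its own state. As an illustration only, the sketch below builds a closure of the stated shape that combines a fixed scale with a linear warmup; the factory name is hypothetical, LearningRate is assumed to be an f64 alias, and the HashMap argument is ignored because its contents are not described here.

```rust
use std::collections::HashMap;

/// Assumed alias; burn's learning rate is treated as f64 in this sketch.
type LearningRate = f64;

/// Hypothetical factory for a stateful per-group selector:
/// scales the incoming rate and ramps it linearly over `warmup_steps` calls.
fn warmup_scaled_selector_sketch(
    scale: f64,
    warmup_steps: usize,
) -> impl FnMut(LearningRate, &HashMap<String, LearningRate>) -> LearningRate {
    let mut step = 0usize;
    move |lr: LearningRate, _named: &HashMap<String, LearningRate>| -> LearningRate {
        step += 1;
        // Ramp from 1/warmup_steps up to 1.0, then hold.
        let ramp = (step as f64 / warmup_steps as f64).min(1.0);
        lr * scale * ramp
    }
}
```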

GroupOptimizerAdaptor{N}

GroupOptimizerAdaptor2, …3, …4, … (defined up through 6) each take N Vec<OptimizerGroup<B, O_i>> arguments — one vector per kind of optimizer:

use bunsen::burner::optim::GroupOptimizerAdaptor2;

let optimizer = GroupOptimizerAdaptor2::new(
    /* groups of optimizer kind 1: */ vec![adamw_group_a, adamw_group_b],
    /* groups of optimizer kind 2: */ vec![muon_group],
)?;

The adaptor implements Optimizer<M, B>, so it slots into burn::train::Learner exactly where a single optimizer would.

Constructor validation: each ParamId may appear in at most one group across all kinds. A duplicate returns GroupOptimizerError::DuplicateParamId with the conflicting positions.
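The validation amounts to a single pass that remembers where each id was first seen. The sketch below models it with u64 ids and a hypothetical error struct; the (kind, group) position pairs are an assumption about what "conflicting positions" contains.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error: which id clashed, and the (kind, group) positions
/// of its first and second appearance.
#[derive(Debug, PartialEq)]
struct DuplicateParamIdSketch {
    param_id: u64,
    first: (usize, usize),
    second: (usize, usize),
}

/// Hypothetical disjointness check across all kinds and groups.
/// `kinds[k][g]` is the id set of group g under optimizer kind k.
fn check_disjoint_sketch(kinds: &[Vec<HashSet<u64>>]) -> Result<(), DuplicateParamIdSketch> {
    let mut seen: HashMap<u64, (usize, usize)> = HashMap::new();
    for (kind_idx, groups) in kinds.iter().enumerate() {
        for (group_idx, group) in groups.iter().enumerate() {
            for &id in group {
                let here = (kind_idx, group_idx);
                if let Some(&first) = seen.get(&id) {
                    return Err(DuplicateParamIdSketch { param_id: id, first, second: here });
                }
                seen.insert(id, here);
            }
        }
    }
    Ok(())
}
```

Note that a check of this form only detects overlap; it says nothing about ids that appear in no group.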

The pattern

1. Build an XmlModuleTree over the live module.
2. Use XPath to extract disjoint HashSet<ParamId>s for each group.
3. Wrap each set in an OptimizerGroup with its optimizer + LR selector.
4. Compose with GroupOptimizerAdaptorN::new(...).
5. Hand the result to Learner.

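The disjointness check at step 4 is your guard that nothing is double-counted: no parameter can be claimed by two groups. It does not detect parameters that no group claims, so build one catch-all group from the full param_ids() set minus the others, as the worked example below does with its remnant group.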

Worked example: the NanoChat recipe

The demos/chat/examples/train example trains a NanoChatGpt with two optimizer kinds and four groups — three driven by AdamW, one by Muon. Stripped to the essentials:

use std::collections::HashSet;
use bunsen::{
    burner::{
        module::reflection::XmlModuleTree,
        optim::{GroupOptimizerAdaptor2, OptimizerGroup},
    },
    public::burn::{module::ParamId, optim::LearningRate},
};

let mut mtree = XmlModuleTree::build(&host);

// 1. Carve the model into disjoint parameter sets using XPath.

// 2-D weight matrices inside the transformer block sequence.
let matrix_params: HashSet<ParamId> = mtree
    .select_params("GptHost/GPT/*[@name='h']/Linear/*[@name='weight'][@rank=2]")
    .to_param_ids()?
    .into_iter()
    .collect();

let embedding_params: HashSet<ParamId> = mtree
    .select_params("GptHost/GPT/*[@name='wte']")
    .to_param_ids()?
    .into_iter()
    .collect();

let lm_head_params: HashSet<ParamId> = mtree
    .select_params("GptHost/GPT/*[@name='lm_head']")
    .to_param_ids()?
    .into_iter()
    .collect();

// Everything left over (norms, biases, scalars, ...).
let remnant_params: HashSet<ParamId> = mtree
    .param_ids()?
    .into_iter()
    .filter(|id| {
        !matrix_params.contains(id)
            && !embedding_params.contains(id)
            && !lm_head_params.contains(id)
    })
    .collect();

// 2. Build groups: AdamW with three flavours, Muon for matrix params.

let optimizer = GroupOptimizerAdaptor2::new(
    // Kind 1: AdamW, three groups with different LR scales + betas.
    vec![
        OptimizerGroup::from_adaptor(
            lm_head_params,
            &AdamWConfig::new()
                .with_beta_1(0.8).with_beta_2(0.96)
                .with_weight_decay(0.01)
                .init::<B, GptHost<B>>(),
        )
        .with_lr_selector(move |lr: f64, _| lr * lm_head_lr),

        OptimizerGroup::from_adaptor(
            embedding_params,
            &AdamWConfig::new()
                .with_beta_1(0.8).with_beta_2(0.995)
                .with_weight_decay(0.001)
                .init::<B, GptHost<B>>(),
        )
        .with_lr_selector(move |lr, _| lr * embedding_lr),

        OptimizerGroup::from_adaptor(
            remnant_params,
            &AdamWConfig::new()
                .with_beta_1(0.8).with_beta_2(0.96)
                .with_weight_decay(0.01)
                .init::<B, GptHost<B>>(),
        )
        .with_lr_selector(move |lr, _| lr * scalar_lr),
    ],
    // Kind 2: Muon for the 2-D matrices.
    vec![
        OptimizerGroup::from_adaptor(
            matrix_params,
            &MuonConfig::new()
                .with_weight_decay(Some(WeightDecayConfig { penalty }))
                .init::<B, GptHost<B>>(),
        )
        .with_lr_selector(move |lr, _| lr * matrix_lr),
    ],
)?;

// 3. Use exactly as a single Optimizer:
let result = training.launch(Learner::new(host, optimizer, warmup_scheduler));

What this buys you over a hand-rolled solution:

  1. Selection is declarative. Each group’s membership is an XPath expression. Renaming a field elsewhere can’t silently drop parameters from a group — the XPath either still matches or it doesn’t, and the param_ids() cross-check makes the gap obvious.
  2. Disjointness is verified. GroupOptimizerAdaptor{N}::new returns Err(GroupOptimizerError::DuplicateParamId) if two groups claim the same parameter, so you can’t accidentally optimize a tensor twice.
  3. Per-group LR is a closure, not a separate scheduler tree. The global learning rate flowing in from burn::train::Learner is handed to every group’s LrSelector, and the group decides how to shape it.

Selecting the right N

Use GroupOptimizerAdaptor2 if you need two kinds of optimizer (AdamW + Muon, AdamW + SGD-with-momentum, …). The family runs up through GroupOptimizerAdaptor6; pick the smallest N that fits your kinds. Adding more groups of the same kind doesn’t increase N; that just extends the Vec for that kind.

Building Reusable Modules

When you wrap a burn::Module, burn gives you the basic ingredients: a Config struct, derived via #[derive(Config)], that builds a Module via init(). This is enough for a single self-contained module, but it strains once your modules start composing into larger ones.

Two conventions show up repeatedly in bunsen to manage that strain:

  1. A {Module}Meta trait — a shared introspection API implemented by both the configs and the built module, so anyone holding any of those forms can ask the same structural questions.
  2. A {Module}ContractConfig → {Module}StructureConfig split — two configs at two levels of abstraction. The contract describes what the module is for; the structure describes how it’s built.

Neither is required by burn. Both pay off once a module is used inside something else, or once its parameter surface starts evolving faster than its callers want.

The {Module}Meta trait

Why

A parent module that owns a child needs to know structural things about the child at inference time — its embedding dimension, its number of heads, its sequence length. The naive answer is to copy those numbers into the parent’s own fields. That works, but now the same number lives in two places, and updating the child means remembering to update the parent. Worse, configuration values copied into a Module’s state are awkward to keep in sync with the actual tensor shapes that ended up inside.

Solution

Define a trait that exposes the structural questions, and implement it on every form that can answer them: the user-facing config, the lowered structure config (see below), and the built module itself.

pub trait MlpMeta {
    /// Input/output embedding dimension.
    fn embed_dim(&self) -> usize;

    /// Hidden dimension inside the MLP.
    fn hidden_dim(&self) -> usize;
}

Now anything holding an &impl MlpMeta can ask the question, and the answer comes from whichever source actually knows: a field on the config, or a tensor .dims() on the module.

Toy example

use burn::{prelude::*, nn::{Linear, LinearConfig}};

pub trait MlpMeta {
    fn embed_dim(&self) -> usize;
    fn hidden_dim(&self) -> usize;
}

#[derive(Config, Debug)]
pub struct MlpConfig {
    pub embed_dim: usize,
    #[config(default = "4")]
    pub expansion_factor: usize,
}

impl MlpMeta for MlpConfig {
    fn embed_dim(&self) -> usize { self.embed_dim }
    fn hidden_dim(&self) -> usize { self.expansion_factor * self.embed_dim }
}

#[derive(Module, Debug)]
pub struct Mlp<B: Backend> {
    in_proj: Linear<B>,
    out_proj: Linear<B>,
}

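impl<B: Backend> MlpMeta for Mlp<B> {
    // With burn's default layout, a Linear weight has shape
    // [d_input, d_output], so index 0 is the embedding dimension
    // and index 1 is the hidden dimension.
    fn embed_dim(&self) -> usize {
        // Derived from the live tensor shape; no cached field.
        self.in_proj.weight.dims()[0]
    }
    fn hidden_dim(&self) -> usize {
        self.in_proj.weight.dims()[1]
    }
}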

A parent that contains an Mlp can now read mlp.embed_dim() directly from the live module, instead of carrying its own mlp_embed_dim: usize field. The metadata stays in one place per form, and the forms agree by construction.
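The delegation described above can be sketched without burn. In this minimal version, MlpStandIn, Block, and linear_weight_count are hypothetical names, and the stand-in models only the weight shape, not a real tensor.

```rust
/// Same trait as in the toy example.
pub trait MlpMeta {
    fn embed_dim(&self) -> usize;
    fn hidden_dim(&self) -> usize;
}

/// Hypothetical stand-in for the built `Mlp<B>`: only the shape of the
/// input projection weight is modeled, as `[embed_dim, hidden_dim]`.
pub struct MlpStandIn {
    in_proj_weight_dims: [usize; 2],
}

impl MlpMeta for MlpStandIn {
    fn embed_dim(&self) -> usize { self.in_proj_weight_dims[0] }
    fn hidden_dim(&self) -> usize { self.in_proj_weight_dims[1] }
}

/// Hypothetical parent: owns the child and stores no copy of its dimensions.
pub struct Block {
    mlp: MlpStandIn,
}

impl Block {
    pub fn new(embed_dim: usize, hidden_dim: usize) -> Self {
        Self { mlp: MlpStandIn { in_proj_weight_dims: [embed_dim, hidden_dim] } }
    }

    /// Borrow the child so callers can query it through `MlpMeta`.
    pub fn mlp(&self) -> &MlpStandIn { &self.mlp }

    /// Delegates to the child instead of reading a cached field.
    pub fn embed_dim(&self) -> usize { self.mlp.embed_dim() }
}

/// Accepts any form that implements the trait: config or built module.
/// Counts the weights of both projections, ignoring biases.
pub fn linear_weight_count(meta: &impl MlpMeta) -> usize {
    2 * meta.embed_dim() * meta.hidden_dim()
}
```

The parent has no mlp_embed_dim field to fall out of date, and linear_weight_count would work unchanged if handed a config instead of a module.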

Where this shows up

  • NanoChatGptMeta is implemented by three types: NanoChatGptConfig (user-facing knobs), NanoChatGptStructureConfig (lowered per-layer configs), and NanoChatGpt<B> (the built module reading dims from its actual layers). All three answer the same questions about n_embed, n_head, head_dim, n_layer, and so on.
  • ResidualBlockMeta is implemented on the structure config and the built block. A ResNet model that holds a Vec<ResidualBlock<B>> can call block.output_resolution([h, w]) to walk the resolution through the network from the live modules, with no separate “shape table” alongside.
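To make the resolution walk concrete, here is a small burn-free sketch. BlockMeta, StridedBlock, and walk_resolution are hypothetical names. The ceiling-division rule is an assumption chosen for illustration, not the formula ResidualBlockMeta actually uses.

```rust
/// Hypothetical cut-down version of a block-level meta trait.
pub trait BlockMeta {
    /// Map an input `[height, width]` to this block's output resolution.
    fn output_resolution(&self, input: [usize; 2]) -> [usize; 2];
}

/// Hypothetical stand-in block: only its stride is modeled.
pub struct StridedBlock {
    pub stride: usize,
}

impl BlockMeta for StridedBlock {
    fn output_resolution(&self, input: [usize; 2]) -> [usize; 2] {
        // Assumed rule: ceiling division by the stride on both axes.
        let s = self.stride;
        [(input[0] + s - 1) / s, (input[1] + s - 1) / s]
    }
}

/// Thread the resolution through the blocks in order, asking each block
/// for its own answer instead of consulting a separate shape table.
pub fn walk_resolution<M: BlockMeta>(blocks: &[M], input: [usize; 2]) -> [usize; 2] {
    blocks.iter().fold(input, |res, block| block.output_resolution(res))
}
```

With strides 1, 2, 2, 2 and a 56×56 input, the fold passes through 56, 28, and 14 before ending at 7×7.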

Contract → Structure config split

Why

A module’s parameter list grows in two unrelated directions:

  • Intent-level. “How do I describe this thing at the level of what it’s for?” — embed_dim, n_layer, vocab_size. These are the knobs a user actually wants to set when they say “give me a 12-layer GPT”. Short list, stable shape, evolves slowly.
  • Implementation-level. “What does init need to wire?” — explicit per-layer LinearConfigs, EmbeddingConfigs, RotaryEmbeddingConfigs, normalization choices per sub-block. Long list, evolves with the implementation.

Cramming both into one Config makes it tedious to instantiate (the user has to fill in fields they don’t care about) and hard to evolve (every implementation change is an API break for callers who only wanted to say “12 layers please”).

Solution

Split the config in two:

  • {Module}ContractConfig — the intent-level description.
  • {Module}StructureConfig — the implementation parameter list, one field per sub-module config.
  • A to_structure() / into_structure() method on the contract that produces the structure config.
  • The init that builds the module hangs off the structure config.

The contract is small, friendly, and stable. The structure is verbose but maps one-to-one onto the implementation; it’s the natural home for serialization, pretrained-weight loaders, and any code that needs to reason about the actual layers.

Toy example

Continuing the Mlp from above, split the single MlpConfig into a contract and a structure:

#[derive(Config, Debug)]
pub struct MlpContractConfig {
    pub embed_dim: usize,
    #[config(default = "4")]
    pub expansion_factor: usize,
}

impl MlpMeta for MlpContractConfig {
    fn embed_dim(&self) -> usize { self.embed_dim }
    fn hidden_dim(&self) -> usize { self.expansion_factor * self.embed_dim }
}

impl MlpContractConfig {
    /// Lower the contract into a concrete per-layer structure.
    pub fn to_structure(&self) -> MlpStructureConfig {
        MlpStructureConfig {
            in_proj:  LinearConfig::new(self.embed_dim, self.hidden_dim()),
            out_proj: LinearConfig::new(self.hidden_dim(), self.embed_dim),
        }
    }
}

#[derive(Config, Debug)]
pub struct MlpStructureConfig {
    pub in_proj:  LinearConfig,
    pub out_proj: LinearConfig,
}

impl MlpMeta for MlpStructureConfig {
    fn embed_dim(&self) -> usize  { self.in_proj.d_input }
    fn hidden_dim(&self) -> usize { self.in_proj.d_output }
}

impl MlpStructureConfig {
    pub fn init<B: Backend>(self, device: &B::Device) -> Mlp<B> {
        Mlp {
            in_proj:  self.in_proj.init(device),
            out_proj: self.out_proj.init(device),
        }
    }
}

A typical caller stays in contract-land:

let mlp: Mlp<B> = MlpContractConfig::new(768)
    .with_expansion_factor(4)
    .to_structure()
    .init(&device);

…but a power user or a pretrained-weight loader that needs to set per-layer details can drop down to MlpStructureConfig directly.
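A sketch of that power-user path, written without burn so it stands alone. LinearSpec is a hypothetical stand-in for burn’s LinearConfig, and MlpContract, MlpStructure, and structure_without_output_bias are illustrative names. With the real types, the same shape would be: lower the contract, adjust one field, then call init.

```rust
/// Hypothetical stand-in for `burn::nn::LinearConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct LinearSpec {
    pub d_input: usize,
    pub d_output: usize,
    pub bias: bool,
}

impl LinearSpec {
    /// Bias is enabled by default in this stand-in.
    pub fn new(d_input: usize, d_output: usize) -> Self {
        Self { d_input, d_output, bias: true }
    }
}

/// Intent-level description (stand-in for `MlpContractConfig`).
pub struct MlpContract {
    pub embed_dim: usize,
    pub expansion_factor: usize,
}

/// Per-layer description (stand-in for `MlpStructureConfig`).
pub struct MlpStructure {
    pub in_proj: LinearSpec,
    pub out_proj: LinearSpec,
}

impl MlpContract {
    /// Lower the contract into explicit per-layer specs.
    pub fn to_structure(&self) -> MlpStructure {
        let hidden = self.expansion_factor * self.embed_dim;
        MlpStructure {
            in_proj: LinearSpec::new(self.embed_dim, hidden),
            out_proj: LinearSpec::new(hidden, self.embed_dim),
        }
    }
}

/// Power-user path: lower the contract, then override one per-layer detail
/// that the contract deliberately does not expose.
pub fn structure_without_output_bias(contract: &MlpContract) -> MlpStructure {
    let mut structure = contract.to_structure();
    structure.out_proj.bias = false;
    structure
}
```

The contract never grows an out_proj_bias knob; callers who need one reach past it.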

What the split buys you

  1. Multiple contracts, shared structure. A GatedMlpContractConfig that adds a SiLU gate can lower to a slightly extended structure; you can also have several “kinds” of contract sharing the same MlpStructureConfig family. The user picks the contract that matches their intent; the implementation only knows about structures.
  2. Prefabs live on the contract. Named presets (“resnet18”, “resnet50”) are small, intent-level descriptions and naturally fit as ContractConfig constructors. The big, verbose StructureConfig doesn’t need a constructor per preset.
  3. Stable user API across implementation churn. Adding a new sub-module to the implementation extends StructureConfig without touching ContractConfig. Callers who only ever wrote MlpContractConfig::new(768) don’t notice.
  4. A natural seam for tooling. Serialization, weight loading, and inspection tools work against StructureConfig, where the layers are spelled out. Documentation and tutorials work against ContractConfig, where the surface stays small.
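Points 1 and 2 can be sketched with two toy contracts that lower to one shared structure. Everything here is hypothetical and burn-free: NetStructure, UniformContract, PyramidContract, the small() preset, and total_width are illustrative names, not bunsen types.

```rust
/// Hypothetical shared structure: one explicit width per layer.
#[derive(Debug, PartialEq)]
pub struct NetStructure {
    pub layer_widths: Vec<usize>,
}

/// Contract kind 1: every layer has the same width.
pub struct UniformContract {
    pub width: usize,
    pub n_layer: usize,
}

/// Contract kind 2: the width doubles at every layer.
pub struct PyramidContract {
    pub base_width: usize,
    pub n_layer: usize,
}

impl UniformContract {
    /// A named preset lives on the contract, where it stays short.
    pub fn small() -> Self {
        Self { width: 64, n_layer: 2 }
    }

    pub fn to_structure(&self) -> NetStructure {
        NetStructure { layer_widths: vec![self.width; self.n_layer] }
    }
}

impl PyramidContract {
    pub fn to_structure(&self) -> NetStructure {
        NetStructure {
            // Shift left by the layer index to double the width per layer.
            layer_widths: (0..self.n_layer).map(|i| self.base_width << i).collect(),
        }
    }
}

/// Implementation-side code sees only the structure, never a contract.
pub fn total_width(structure: &NetStructure) -> usize {
    structure.layer_widths.iter().sum()
}
```

Adding a third contract kind would not touch total_width, which is the point of keeping the implementation ignorant of contracts.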

Where this shows up

  • ResNetContractConfig / ResNetStructureConfig. The contract says “a ResNet with these block counts, optionally bottlenecked”; the structure spells out the stem, layer blocks, and head. PREFAB_RESNET_MAP ships ContractConfig builders for the standard variants (“resnet18”, “resnet50”, …).
  • ResidualBlockContractConfig / ResidualBlockStructureConfig. The contract describes “downsample input, use a bottleneck policy”; the structure is an enum that dispatches to either a BasicBlock or a BottleneckBlock. Two different concrete implementations sit behind one contract.
  • NanoChatGptConfig / NanoChatGptStructureConfig. The contract names embedding width, head counts, layer count; the structure spells out the embedding, per-block configs, LM head, rotary embedding, and final norm.

When to reach for each

  • Always implement {Module}Meta if anything else (a parent module, a builder, a test) is going to ask structural questions about this module at runtime. The cost is one trait and a few one-line methods; the payoff is no duplicated metadata.
  • Reach for the Contract → Structure split when:
    • the user-facing knobs differ in number or shape from the implementation parameters,
    • you anticipate multiple intent-level “kinds” of this module sharing one implementation (or one kind backed by multiple implementations, as with ResidualBlock),
    • you want a clean place to land prefab / preset constructors and weight-loader hooks.
  • Skip the split for tiny modules whose user-facing config is the implementation. Both conventions exist to manage growth; don’t pay their cost up front.

Contributing

Contributions are welcome — this library exists to be a community standard library, and that means it needs the community.

The short version:

  1. Open an issue describing what you want to change, especially for new components or breaking API tweaks. For small fixes you can skip straight to a PR.
  2. Build and test the workspace locally (see Development Setup).
  3. Match the existing Style and Conventions.
  4. Open a pull request against main. Opening a PR is taken as agreement to the project’s dual MIT / Apache-2.0 license, as described in CONTRIBUTING.md.

Where to talk

Code of conduct

Be respectful, assume good faith, and keep technical disagreements technical.

Development Setup

Prerequisites

  • The Rust toolchain pinned in rust-toolchain.toml.
  • cargo-make for the project’s task runner.
  • For working on the book: mdbook, mdbook-mermaid, mdbook-katex, mdbook-linkcheck.

cargo install cargo-make
cargo install mdbook mdbook-mermaid mdbook-katex mdbook-linkcheck

Common tasks

# Default: fix + ci (format, clippy, tests).
cargo make

# Targeted tasks:
cargo make test
cargo make clippy
cargo make format

# Book tasks (see "Add cargo-make tasks" in the project Makefile.toml):
cargo make book          # build the book
cargo make book-serve    # build + watch + serve on localhost
cargo make book-check    # run mdbook-linkcheck

Working on the book

The book lives in book/. Source files are Markdown under book/src/, and the output is written to book/book/ (ignored by git).

cd book
mdbook serve --open

Style and Conventions

Rust

  • Formatting is enforced by rustfmt (nightly toolchain, settings live in rustfmt.toml). Run cargo make format before committing.
  • Lints are enforced by cargo clippy --no-deps with workspace-level warnings = "deny". Run cargo make clippy.
  • Public items need rustdoc. Module-level docs should explain why the module exists, not just what it contains.
  • Avoid leaking anyhow::Error across crate boundaries; prefer crate-local error types.
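As one possible shape for a crate-local error type, the sketch below uses only the standard library. ShapeError and check_shape are hypothetical names invented for this example, not part of bunsen.

```rust
use std::fmt;

/// Hypothetical crate-local error type: callers can match on the variants
/// instead of downcasting an opaque error.
#[derive(Debug, PartialEq)]
pub enum ShapeError {
    RankMismatch { expected: usize, actual: usize },
    ZeroSizedDim { index: usize },
}

impl fmt::Display for ShapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ShapeError::RankMismatch { expected, actual } => {
                write!(f, "expected rank {expected}, got rank {actual}")
            }
            ShapeError::ZeroSizedDim { index } => {
                write!(f, "dimension {index} has size zero")
            }
        }
    }
}

// Implementing the std trait keeps the type usable with `?` and
// `Box<dyn Error>` in downstream code.
impl std::error::Error for ShapeError {}

/// Check the rank first, then reject any zero-sized dimension.
pub fn check_shape(dims: &[usize], expected_rank: usize) -> Result<(), ShapeError> {
    if dims.len() != expected_rank {
        return Err(ShapeError::RankMismatch { expected: expected_rank, actual: dims.len() });
    }
    if let Some(index) = dims.iter().position(|&d| d == 0) {
        return Err(ShapeError::ZeroSizedDim { index });
    }
    Ok(())
}
```

Downstream crates that prefer anyhow can still wrap ShapeError at their own boundary; the choice stays with them.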

Tests

  • Bare assert_eq! on floats is almost always wrong, because rounding makes mathematically equal results differ in their last bits; prefer the approximate-equality helpers that burn provides.
  • New public APIs ship with at least one test.
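A short standard-library illustration of why exact float comparison misleads. The approx_eq helper is hypothetical and exists only for this sketch; real tests should use the tensor framework’s own helpers.

```rust
/// Hypothetical helper for illustration only: absolute-tolerance comparison.
pub fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol
}

/// Sum in f64; rounding in each addition means the result can differ
/// from the mathematically exact value in its last bits.
pub fn sum(xs: &[f64]) -> f64 {
    xs.iter().sum()
}

/// True when `0.1 + 0.2` compares exactly equal to `0.3` in f64.
pub fn exact_match() -> bool {
    sum(&[0.1, 0.2]) == 0.3
}

/// True when the same comparison passes within a small tolerance.
pub fn tolerant_match() -> bool {
    approx_eq(sum(&[0.1, 0.2]), 0.3, 1e-12)
}
```

Here exact_match() is false while tolerant_match() is true, which is the failure mode a bare assert_eq! would turn into a spurious test failure.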

Documentation

  • The Book uses Markdown with the mermaid and KaTeX preprocessors.
  • Prefer prose that explains the motivation and links out to API docs for the signatures. The book is not a substitute for rustdoc.
  • Cross-reference within the book using relative links so mdbook-linkcheck can validate them.

Releasing

bunsen’s major/minor version tracks the burn release it builds against.

TODO: document the release checklist (bump version, update changelog, tag, publish to crates.io, deploy book).

License

bunsen is distributed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-APACHE and LICENSE-MIT for details.

Opening a pull request is assumed to signal agreement with these licensing terms.