Image Blocks

bunsen::blocks::images collects the building blocks used by 2-D vision models — convolutional composites, patch tokenization, same-padding pooling, and stochastic-depth-style regularization layers.

API: https://docs.rs/bunsen/latest/bunsen/blocks/images/

Conv composites

The conv submodule packages the conv-plus-something composites that show up across ResNet, EfficientNet, and friends.

ConvNorm2d

ConvNorm2d is the standard Conv2d + BatchNorm pairing with a single forward. Beyond the convenience of one module instead of two, it carries zero_init_norm() — the “zero-initialize the last batch norm in a residual branch” trick used by ResNet and successors to make residual branches start as identities.
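The effect of the zero-init trick can be shown numerically without any tensor library. The sketch below is not bunsen's API: `norm_affine` and `residual` are hypothetical helpers operating on plain slices, and it assumes that zero-initializing the norm means setting its scale (gamma) to zero while the bias (beta) keeps its usual zero initialization.

```rust
/// Affine step of a batch norm: y = gamma * x_hat + beta.
/// `x_hat` stands in for activations that are already normalized.
/// (Hypothetical helper for illustration, not part of bunsen.)
fn norm_affine(x_hat: &[f32], gamma: f32, beta: f32) -> Vec<f32> {
    x_hat.iter().map(|v| gamma * v + beta).collect()
}

/// Residual connection: out = input + norm_affine(branch_out).
/// `branch_out` is whatever the residual branch computed before its last norm.
fn residual(input: &[f32], branch_out: &[f32], gamma: f32, beta: f32) -> Vec<f32> {
    let branch = norm_affine(branch_out, gamma, beta);
    input.iter().zip(branch.iter()).map(|(x, r)| x + r).collect()
}
```

With `gamma = 0.0` and `beta = 0.0` the branch contributes nothing, so `residual` returns its input unchanged regardless of what the branch computed; with `gamma = 1.0` the branch output is added as usual.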

CNA2d

CNA2d is the more general Conv / Norm / Activation block. Beyond the basic forward, it provides:

  • match_norm_features() — adapts a generic NormalizationConfig (BatchNorm::new(0), RmsNorm::new(0), etc.) to the right channel count after the conv. Lets callers pass a norm config without knowing the channel count yet.
  • map_forward(f) — runs the conv and norm, then hands the intermediate tensor to a user closure before the activation. Useful for inserting attention, channel reweighting, or per-residual side-effects without copying out the rest of the block.
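The control flow of map_forward can be sketched with a toy block. Everything here is hypothetical: `ToyCna`, its fields, and the closure signature are illustrative stand-ins over flat vectors, and the real method in bunsen may take different argument types.

```rust
/// Hypothetical stand-in for a Conv / Norm / Activation block.
/// The three stages are plain functions here, not real layers.
struct ToyCna {
    conv: fn(&[f32]) -> Vec<f32>,
    norm: fn(&[f32]) -> Vec<f32>,
    act: fn(&[f32]) -> Vec<f32>,
}

impl ToyCna {
    /// Plain forward: conv, then norm, then activation.
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.map_forward(x, |h| h)
    }

    /// Runs conv and norm, passes the intermediate result to `f`,
    /// then applies the activation to whatever `f` returns.
    fn map_forward<F>(&self, x: &[f32], f: F) -> Vec<f32>
    where
        F: FnOnce(Vec<f32>) -> Vec<f32>,
    {
        let conv_out = (self.conv)(x);
        let normed = (self.norm)(&conv_out);
        let hooked = f(normed);
        (self.act)(&hooked)
    }
}

// Toy stages used only to make the sketch concrete.
fn double(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v * 2.0).collect()
}
fn shift(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v - 1.0).collect()
}
fn relu(x: &[f32]) -> Vec<f32> {
    x.iter().map(|v| v.max(0.0)).collect()
}
```

A caller could, for example, add the block input back onto the normalized tensor inside the closure, which places a residual sum before the activation without duplicating the conv and norm calls.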

Patching

patching holds patch tokenization, the entry point for transformer-style vision models.

PatchEmbed

PatchEmbed is ViT-style patch tokenization: it takes an image, splits it into non-overlapping patches, and projects each patch into an embedding, producing a sequence of tokens of width embed_dim.
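As a rough illustration of the split-then-project step, here is a minimal sketch on a single-channel, row-major image. The names `patchify` and `project` are hypothetical and do not reflect PatchEmbed's actual interface; rejecting images whose size is not a multiple of the patch size is also an assumption.

```rust
/// Splits a row-major, single-channel `h` x `w` image into non-overlapping
/// `p` x `p` patches, each flattened row-major. Returns None when the image
/// dimensions are not multiples of `p` (an assumption of this sketch).
fn patchify(image: &[f32], h: usize, w: usize, p: usize) -> Option<Vec<Vec<f32>>> {
    if p == 0 || h % p != 0 || w % p != 0 || image.len() != h * w {
        return None;
    }
    let mut patches = Vec::with_capacity((h / p) * (w / p));
    for py in 0..h / p {
        for px in 0..w / p {
            let mut patch = Vec::with_capacity(p * p);
            for dy in 0..p {
                // Start of this patch's row inside the full image.
                let row_start = (py * p + dy) * w + px * p;
                patch.extend_from_slice(&image[row_start..row_start + p]);
            }
            patches.push(patch);
        }
    }
    Some(patches)
}

/// Projects one flattened patch into an embedding. `weight` has one row per
/// embedding dimension, each of length `patch.len()`.
fn project(patch: &[f32], weight: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    weight
        .iter()
        .zip(bias)
        .map(|(row, b)| row.iter().zip(patch).map(|(w, x)| w * x).sum::<f32>() + b)
        .collect()
}
```

A 4 x 4 image with patch size 2 yields (4 / 2) * (4 / 2) = 4 tokens, and each token has as many entries as `weight` has rows, which plays the role of embed_dim.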

Pooling

pool holds the pooling layers that don’t fit burn’s defaults.

AvgPool2dSame

AvgPool2dSame is average pooling with TensorFlow-style “same” padding: it pads asymmetrically where needed so that each output spatial dimension equals ceil(input / stride). The helpers get_same_padding and pad_same expose the underlying arithmetic, which is useful when you’re matching a Keras / TF-Slim reference implementation.
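The padding arithmetic along one dimension can be sketched as follows. `same_padding_1d` is a hypothetical name chosen to avoid implying anything about the real signatures of get_same_padding or pad_same; the formula is the usual TensorFlow rule, where any odd leftover padding goes on the trailing side.

```rust
/// TensorFlow-style "same" padding along one dimension.
/// Returns (output_size, pad_before, pad_after).
/// (Hypothetical helper for illustration, not bunsen's get_same_padding.)
fn same_padding_1d(input: usize, kernel: usize, stride: usize) -> (usize, usize, usize) {
    assert!(kernel > 0 && stride > 0, "kernel and stride must be positive");
    // ceil(input / stride) without floating point.
    let output = (input + stride - 1) / stride;
    // Input extent needed so the last window fits entirely.
    let needed = output.saturating_sub(1) * stride + kernel;
    let total = needed.saturating_sub(input);
    // The leading side gets the smaller half; the trailing side gets the rest.
    let before = total / 2;
    (output, before, total - before)
}
```

For an input of 4 with kernel 3 and stride 2, the output size is 2 and the single padding element lands after the data, which is the asymmetric case that symmetric padding cannot reproduce.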

Stochastic regularization

drop collects regularization layers that drop structured pieces of the activations rather than individual scalars.

DropBlock

DropBlock is structured spatial dropout from Ghiasi et al., 2018: instead of dropping independent pixels, it drops contiguous blocks of activations. For convnets this acts as a substantially stronger regularizer than plain dropout: neighbouring activations are highly correlated, so when isolated pixels are dropped, most of their information can still be recovered from the surrounding ones.
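Two pieces of the DropBlock recipe are easy to sketch in isolation: the seed probability from the paper, which is chosen so that roughly drop_prob of the activations end up inside a dropped block, and the expansion of each seed into a square block. Both function names are hypothetical, the seeds are passed in explicitly instead of being sampled, and clipping blocks at the border is an assumption of this sketch.

```rust
/// Per-position seed probability from Ghiasi et al., 2018:
/// gamma = drop_prob / block_size^2 * feat_size^2 / (feat_size - block_size + 1)^2
fn drop_block_gamma(drop_prob: f64, block_size: usize, feat_size: usize) -> f64 {
    assert!(block_size >= 1 && block_size <= feat_size);
    let b = block_size as f64;
    let f = feat_size as f64;
    // Side length of the region where a full block still fits.
    let valid = f - b + 1.0;
    (drop_prob / (b * b)) * (f * f) / (valid * valid)
}

/// Builds a keep mask (true = keep) by clearing a `block_size` square
/// around each seed position `(row, col)`, clipped at the borders.
fn block_mask(h: usize, w: usize, block_size: usize, seeds: &[(usize, usize)]) -> Vec<Vec<bool>> {
    let mut keep = vec![vec![true; w]; h];
    let half = block_size / 2;
    for &(cy, cx) in seeds {
        let y_end = (cy + block_size - half).min(h);
        let x_end = (cx + block_size - half).min(w);
        for y in cy.saturating_sub(half)..y_end {
            for x in cx.saturating_sub(half)..x_end {
                keep[y][x] = false;
            }
        }
    }
    keep
}
```

A single seed in the middle of a 5 x 5 map with block size 3 removes nine neighbouring activations at once, whereas plain dropout would need nine independent draws to do the same.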

DropPath

DropPath is stochastic depth from Huang et al., 2016: with some probability, the entire residual branch is zeroed for a given sample, so the network sees a shorter effective depth on each training step.
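A minimal sketch of the per-sample behaviour, with the random draw replaced by an explicit `keep` flag per sample so the result is deterministic. The function name and signature are hypothetical, and rescaling surviving samples by 1 / keep_prob is the common inverted-dropout convention, assumed here and not confirmed for bunsen.

```rust
/// Applies stochastic depth to a batch of residual-branch outputs.
/// `branch[i]` is the branch output for sample i and `keep[i]` says whether
/// that sample's branch survives. (Hypothetical helper for illustration.)
fn drop_path(branch: &[Vec<f32>], keep: &[bool], drop_prob: f32, training: bool) -> Vec<Vec<f32>> {
    assert!((0.0..1.0).contains(&drop_prob), "drop_prob must be in [0, 1)");
    assert_eq!(branch.len(), keep.len());
    // At inference time, or with a zero rate, the layer is the identity.
    if !training || drop_prob == 0.0 {
        return branch.to_vec();
    }
    let keep_prob = 1.0 - drop_prob;
    branch
        .iter()
        .zip(keep)
        .map(|(sample, &k)| {
            // Surviving samples are scaled up so the expected value is unchanged.
            let scale = if k { 1.0 / keep_prob } else { 0.0 };
            sample.iter().map(|v| v * scale).collect()
        })
        .collect()
}
```

The caller adds the result to the skip connection, so a dropped sample passes through the block as a pure identity for that training step.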

Supporting types

  • progressive_dpr — rate table that linearly ramps drop rates over a network’s depth, matching the SWIN V2 / timm convention of giving deeper blocks higher drop probabilities.
  • SizeConfig — a small enum describing a size as either Default, a Ratio(f64), or a Fixed(usize). Used by the drop layers when the effective region size is relative to a wrapped layer’s spatial dimensions.
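Both supporting types can be approximated in a few lines. This is a sketch under stated assumptions: `linear_drop_rates`, `ToySize`, and `resolve` are hypothetical names, the ramp starts at zero as in timm's linspace-based schedule, and the rounding and clamping rules in `resolve` are guesses, not SizeConfig's documented behaviour.

```rust
/// Linearly ramps drop rates from 0.0 at the first block to `max_rate`
/// at the last one. (Hypothetical stand-in for progressive_dpr.)
fn linear_drop_rates(max_rate: f64, depth: usize) -> Vec<f64> {
    match depth {
        0 => Vec::new(),
        1 => vec![0.0],
        _ => (0..depth)
            .map(|i| max_rate * i as f64 / (depth - 1) as f64)
            .collect(),
    }
}

/// Hypothetical mirror of the three size variants described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToySize {
    Default,
    Ratio(f64),
    Fixed(usize),
}

/// Resolves a size against a spatial extent. Rounding to nearest and
/// clamping to 1..=extent are assumptions of this sketch.
fn resolve(size: ToySize, extent: usize, default: usize) -> usize {
    let raw = match size {
        ToySize::Default => default,
        ToySize::Ratio(r) => (r * extent as f64).round() as usize,
        ToySize::Fixed(n) => n,
    };
    raw.clamp(1, extent.max(1))
}
```

With a maximum rate of 0.2 over five blocks, the rates step up by 0.05 per block, so the first block never drops and the last block drops with the full rate.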