fastnn

High-performance Rust neural network runtime with AOT compilation and Python bindings

Completed | 2026 | AI/ML
Commits: 846
Releases: 10
Stars: 4
License: MIT

A Rust neural-network runtime and training library with Python bindings. Provides an eager tensor and autograd API alongside an ahead-of-time (AOT) graph compiler for experiments with compiler passes, arena memory planning, weight quantisation, and backend dispatch.

Eager Mode

Full tensor operations with automatic differentiation, neural network modules (linear, convolution, normalisation, activation), optimisers (AdamW, Muon, Lion), data loaders, and training callbacks. Models can be trained entirely in Rust or through the Python bindings.
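The source does not show fastnn's autograd API, so the following is a generic sketch rather than fastnn code. It shows the bookkeeping behind reverse-mode automatic differentiation using a minimal scalar tape (a Wengert list). `Tape`, `Var`, and the method names are hypothetical. A tensor library would do the same chain-rule accumulation with tensor-valued adjoints instead of scalars.

```rust
/// One recorded operation: the indices of its inputs and the local
/// partial derivative d(node)/d(input) for each of them.
struct Node {
    parents: Vec<(usize, f64)>,
}

/// Values are pushed in evaluation order, so every node's parents
/// always have smaller indices than the node itself.
struct Tape {
    nodes: Vec<Node>,
    values: Vec<f64>,
}

/// Handle to a value on the tape (hypothetical name).
#[derive(Clone, Copy)]
struct Var(usize);

impl Tape {
    fn new() -> Self {
        Tape { nodes: Vec::new(), values: Vec::new() }
    }

    fn push(&mut self, value: f64, parents: Vec<(usize, f64)>) -> Var {
        self.nodes.push(Node { parents });
        self.values.push(value);
        Var(self.values.len() - 1)
    }

    /// An input or parameter with no parents.
    fn leaf(&mut self, value: f64) -> Var {
        self.push(value, vec![])
    }

    fn value(&self, v: Var) -> f64 {
        self.values[v.0]
    }

    fn add(&mut self, a: Var, b: Var) -> Var {
        let v = self.values[a.0] + self.values[b.0];
        self.push(v, vec![(a.0, 1.0), (b.0, 1.0)])
    }

    fn mul(&mut self, a: Var, b: Var) -> Var {
        let (x, y) = (self.values[a.0], self.values[b.0]);
        self.push(x * y, vec![(a.0, y), (b.0, x)])
    }

    fn relu(&mut self, a: Var) -> Var {
        let x = self.values[a.0];
        let local = if x > 0.0 { 1.0 } else { 0.0 };
        self.push(x.max(0.0), vec![(a.0, local)])
    }

    /// Reverse pass: seed d(out)/d(out) = 1, walk the tape backwards and
    /// add each node's adjoint times its local derivative into its parents.
    fn backward(&self, out: Var) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[out.0] = 1.0;
        for i in (0..=out.0).rev() {
            let g = grads[i];
            for &(p, local) in &self.nodes[i].parents {
                grads[p] += g * local;
            }
        }
        grads
    }
}

fn main() {
    // y = relu(w * x + b) with w = 2, x = 3, b = -1, so y = 5.
    let mut t = Tape::new();
    let (w, x, b) = (t.leaf(2.0), t.leaf(3.0), t.leaf(-1.0));
    let wx = t.mul(w, x);
    let z = t.add(wx, b);
    let y = t.relu(z);
    let g = t.backward(y);
    println!("y = {}, dy/dw = {}, dy/dx = {}, dy/db = {}",
             t.value(y), g[w.0], g[x.0], g[b.0]);
}
```

Because nodes are appended in evaluation order, every consumer of a node has a larger index, so a single reverse sweep finishes a node's adjoint before propagating it to that node's inputs.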

AOT Graph Compiler

Models, whether imported via ONNX or built through the Rust API, are converted into a ComputeGraph intermediate representation. The compiler runs shape and type inference, operator fusion (e.g. Conv2D + Add + ReLU into a single kernel), and arena memory planning, which uses tensor lifetimes to assign every tensor an offset in a single static buffer so that tensors that are never live at the same time can share memory. Because the buffer is sized at compile time, inference performs no dynamic allocation.
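The source does not say which heuristic the arena planner uses. As an illustration only, the sketch below assumes a greedy-by-size strategy: tensors are placed largest first, each at the lowest aligned offset that does not collide with an already-placed tensor whose lifetime overlaps its own. `TensorLife`, `plan_arena`, and the 16-byte alignment are hypothetical.

```rust
/// Size and live range of one intermediate tensor (hypothetical type).
struct TensorLife {
    size: usize,  // bytes
    first: usize, // index of the op that produces the tensor
    last: usize,  // index of the last op that reads it
}

/// Two tensors conflict if their live ranges share at least one op.
fn lifetimes_overlap(a: &TensorLife, b: &TensorLife) -> bool {
    a.first <= b.last && b.first <= a.last
}

fn round_up(x: usize, align: usize) -> usize {
    (x + align - 1) / align * align
}

/// Greedy-by-size planner: returns (offset per tensor, total arena bytes).
fn plan_arena(tensors: &[TensorLife], align: usize) -> (Vec<usize>, usize) {
    // Place large tensors first; the stable sort keeps ties in input order.
    let mut order: Vec<usize> = (0..tensors.len()).collect();
    order.sort_by(|&a, &b| tensors[b].size.cmp(&tensors[a].size));

    let mut offsets = vec![0usize; tensors.len()];
    let mut placed: Vec<usize> = Vec::new();
    let mut arena_size = 0;

    for &i in &order {
        let t = &tensors[i];
        // Byte ranges of placed tensors that are live at the same time as `t`.
        let mut conflicts: Vec<(usize, usize)> = placed
            .iter()
            .filter(|&&j| lifetimes_overlap(t, &tensors[j]))
            .map(|&j| (offsets[j], offsets[j] + tensors[j].size))
            .collect();
        conflicts.sort();

        // Slide past conflicting ranges until a large enough gap appears.
        let mut candidate = 0;
        for (start, end) in conflicts {
            if candidate + t.size <= start {
                break;
            }
            candidate = candidate.max(round_up(end, align));
        }
        offsets[i] = candidate;
        arena_size = arena_size.max(candidate + t.size);
        placed.push(i);
    }
    (offsets, arena_size)
}

fn main() {
    // A four-op chain: each intermediate is read only by the next op.
    let tensors = vec![
        TensorLife { size: 64, first: 0, last: 1 },
        TensorLife { size: 128, first: 1, last: 2 },
        TensorLife { size: 64, first: 2, last: 3 },
        TensorLife { size: 32, first: 3, last: 4 },
    ];
    let (offsets, total) = plan_arena(&tensors, 16);
    let naive: usize = tensors.iter().map(|t| t.size).sum();
    // prints "offsets = [128, 0, 128, 0], arena = 192 bytes (naive: 288)"
    println!("offsets = {:?}, arena = {} bytes (naive: {})", offsets, total, naive);
}
```

In this chain, tensors that are never live together reuse the same bytes, so the arena needs 192 bytes instead of the 288 bytes a one-buffer-per-tensor layout would use. The offsets are fixed before inference starts, which is what lets a runtime execute without allocating.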

Quantisation and Backends

Weight quantisation to U4/U8 formats with per-channel scaling factors is integrated into the compilation pass. The CPU backend dispatches to AVX-512, AVX2, and ARM NEON SIMD instruction paths. An optional WGPU backend enables GPU execution across Vulkan, Metal, and DX12. Some quantised inference paths and backend routes remain experimental.
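The source does not specify the quantisation scheme beyond per-channel scaling. The sketch below assumes asymmetric (affine) U8 quantisation with one scale and one zero point per output channel, so that each weight is reconstructed as `scale * (q - zero_point)`. `QuantizedWeights` and `quantize_per_channel_u8` are hypothetical names. A U4 variant would use the range 0..=15 instead of 0..=255 and would typically pack two values per byte.

```rust
/// Row-major weights: `rows` output channels, each with `cols` inputs.
struct QuantizedWeights {
    q: Vec<u8>,
    scales: Vec<f32>,     // one per output channel
    zero_points: Vec<u8>, // one per output channel
    cols: usize,
}

fn quantize_per_channel_u8(w: &[f32], rows: usize, cols: usize) -> QuantizedWeights {
    assert_eq!(w.len(), rows * cols);
    let mut q = Vec::with_capacity(w.len());
    let mut scales = Vec::with_capacity(rows);
    let mut zero_points = Vec::with_capacity(rows);

    for row in w.chunks(cols) {
        // Extend the range to include 0.0 so zero is exactly representable.
        let min = row.iter().cloned().fold(0.0f32, f32::min);
        let max = row.iter().cloned().fold(0.0f32, f32::max);
        let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
        let zp = (-min / scale).round().clamp(0.0, 255.0) as u8;
        for &x in row {
            let v = (x / scale).round() + zp as f32;
            q.push(v.clamp(0.0, 255.0) as u8);
        }
        scales.push(scale);
        zero_points.push(zp);
    }
    QuantizedWeights { q, scales, zero_points, cols }
}

impl QuantizedWeights {
    /// Reconstruct approximate f32 weights channel by channel.
    fn dequantize(&self) -> Vec<f32> {
        self.q
            .chunks(self.cols)
            .enumerate()
            .flat_map(|(c, row)| {
                let (s, zp) = (self.scales[c], self.zero_points[c] as f32);
                row.iter().map(move |&v| s * (v as f32 - zp))
            })
            .collect()
    }
}

fn main() {
    // Two channels with very different ranges: separate scales keep the
    // small-range channel from losing precision to the large-range one.
    let w: [f32; 8] = [-1.0, 0.0, 0.5, 1.0, 0.0, 2.0, 4.0, 8.0];
    let qw = quantize_per_channel_u8(&w, 2, 4);
    let max_err = w
        .iter()
        .zip(qw.dequantize())
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("scales = {:?}, zero points = {:?}, max error = {}",
             qw.scales, qw.zero_points, max_err);
}
```

With a single per-tensor scale, the first channel would share the 8/255 step size of the second; per-channel scaling gives it a step of 2/255 instead.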

Tech Stack
neural-networks, compiler, simd, embedded, wgpu, aot, quantization