
Neural Quadrotor Stabilization

Deep RL-based attitude control for quadrotors deployed on bare-metal MCUs

Research · 2026 · AI/ML · Robotics
Inference Latency: 70–90 μs
MCU Footprint: ~7.5 KB
Simulation: MuJoCo

This project investigates replacing a quadrotor's classical PID attitude controller with a deep reinforcement learning policy deployed on bare-metal microcontrollers.

Teacher-Student Distillation

A privileged teacher trained in MuJoCo with access to hidden physics states (exact mass, wind vectors, centre of mass) is distilled into a compact recurrent student network that relies only on deployable sensor inputs. This bridges the sim-to-real gap while keeping the inference model small enough for MCU deployment.
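The distillation objective is not stated here. One common choice, shown below purely as an illustrative assumption, is to regress the student's motor commands onto the teacher's for the same rollout steps using a mean squared error. The function name `action_matching_loss` and the four-element action layout are hypothetical.

```rust
/// Number of motor commands per action (one per rotor on a quadrotor).
const ACTION_DIM: usize = 4;

/// Mean squared error between teacher and student actions over a batch.
/// Assumption: both policies were evaluated on the same rollout steps, the
/// teacher with privileged state and the student with sensor inputs only.
pub fn action_matching_loss(
    teacher: &[[f32; ACTION_DIM]],
    student: &[[f32; ACTION_DIM]],
) -> f32 {
    assert_eq!(teacher.len(), student.len(), "batches must align step by step");
    if teacher.is_empty() {
        return 0.0;
    }
    let mut sum = 0.0f32;
    for (t, s) in teacher.iter().zip(student) {
        for (a, b) in t.iter().zip(s) {
            let d = a - b;
            sum += d * d;
        }
    }
    sum / (teacher.len() * ACTION_DIM) as f32
}
```

Minimising a loss of this kind pushes the student to reproduce the teacher's behaviour without ever seeing the hidden physics states.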

State Space Design

The student's input vector is engineered for robust disturbance rejection. Linear acceleration from the IMU provides instantaneous payload-drop detection before altitude loss begins. A 200 ms dilated history of past motor commands lets the network model actuator lag and aerodynamic delay. Integral error feedback eliminates steady-state drift from permanent mass changes.
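As a rough sketch of how two of these inputs might be assembled, the code below keeps a ring buffer of past motor commands and samples it at dilated offsets, alongside a clamped integral of the attitude error. Several details are assumptions rather than facts about this project: a 1 kHz control loop (so 200 samples span 200 ms), the specific tap offsets in `TAPS`, the anti-windup clamp, and all type and function names.

```rust
/// One command per rotor.
const MOTORS: usize = 4;
/// Assumed 1 kHz control loop: 200 samples cover the 200 ms window.
const HISTORY_LEN: usize = 200;
/// Hypothetical dilated offsets in samples: dense near the present, sparse further back.
const N_TAPS: usize = 8;
const TAPS: [usize; N_TAPS] = [1, 2, 5, 10, 20, 50, 100, 200];
const FEATURES: usize = N_TAPS * MOTORS;

/// Fixed-size ring buffer of past motor commands (no heap allocation).
pub struct MotorHistory {
    buf: [[f32; MOTORS]; HISTORY_LEN],
    head: usize, // index of the next slot to overwrite (the oldest sample)
}

impl MotorHistory {
    pub const fn new() -> Self {
        Self { buf: [[0.0; MOTORS]; HISTORY_LEN], head: 0 }
    }

    /// Record the command issued on this control tick.
    pub fn push(&mut self, cmd: [f32; MOTORS]) {
        self.buf[self.head] = cmd;
        self.head = (self.head + 1) % HISTORY_LEN;
    }

    /// Gather the commands at each dilated offset; offset 1 is the most recent push.
    pub fn dilated(&self) -> [f32; FEATURES] {
        let mut out = [0.0f32; FEATURES];
        for (i, &k) in TAPS.iter().enumerate() {
            let idx = (self.head + HISTORY_LEN - k) % HISTORY_LEN;
            out[i * MOTORS..(i + 1) * MOTORS].copy_from_slice(&self.buf[idx]);
        }
        out
    }
}

/// Running integral of the roll/pitch/yaw error, clamped to limit windup
/// (the clamp is an assumption, not something the project states).
pub struct IntegralError {
    acc: [f32; 3],
    limit: f32,
}

impl IntegralError {
    pub const fn new(limit: f32) -> Self {
        Self { acc: [0.0; 3], limit }
    }

    /// Accumulate `err * dt` per axis and return the clamped integral.
    pub fn update(&mut self, err: [f32; 3], dt: f32) -> [f32; 3] {
        for i in 0..3 {
            self.acc[i] = (self.acc[i] + err[i] * dt).clamp(-self.limit, self.limit);
        }
        self.acc
    }
}
```

Sampling at dilated offsets keeps the input vector short (32 values here instead of 800) while still covering the full 200 ms window, which matters when the whole model has to fit in a few kilobytes.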

Edge Deployment

The student model is deployed with fastnn, a custom #![no_std] Rust neural network runtime. Q4 weight quantisation and single-cycle MAC instructions on the Cortex-M4F processor yield 70–90 μs inference latency in approximately 7.5 KB of flash memory, demonstrating that neural flight controllers can operate within the constraints of classical embedded hardware.
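To make the idea of 4-bit weights concrete, here is a generic sketch of symmetric Q4 quantisation with two weights packed per byte and an integer multiply-accumulate inner loop. This is not fastnn's actual storage format or API, which are not described here. The single per-tensor scale, the low-nibble-first packing, the int8 activations, and the names `quantize_q4` and `dot_q4` are all assumptions. The quantiser would typically run offline on a host machine; only the dot product would run on the MCU.

```rust
/// Map one weight to a signed 4-bit integer in [-8, 7].
fn quantize_one(w: f32, scale: f32) -> i8 {
    (w / scale).round().clamp(-8.0, 7.0) as i8
}

/// Quantise `weights` into `packed` (two 4-bit values per byte, low nibble
/// first) and return the scale needed to dequantise them.
pub fn quantize_q4(weights: &[f32], packed: &mut [u8]) -> f32 {
    assert_eq!(packed.len(), (weights.len() + 1) / 2);
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    for (i, byte) in packed.iter_mut().enumerate() {
        let lo = quantize_one(weights[2 * i], scale);
        // An odd-length tensor leaves the final high nibble as zero padding.
        let hi = if 2 * i + 1 < weights.len() {
            quantize_one(weights[2 * i + 1], scale)
        } else {
            0
        };
        *byte = ((lo as u8) & 0x0F) | (((hi as u8) & 0x0F) << 4);
    }
    scale
}

/// Dot product of packed Q4 weights with int8 activations.
/// The loop body is a plain integer multiply-accumulate into an i32.
pub fn dot_q4(packed: &[u8], w_scale: f32, x: &[i8], x_scale: f32) -> f32 {
    assert!(packed.len() * 2 >= x.len());
    let mut acc: i32 = 0;
    for (i, &xi) in x.iter().enumerate() {
        let byte = packed[i / 2];
        // Sign-extend the selected nibble with an arithmetic right shift.
        let w = if i % 2 == 0 {
            ((byte << 4) as i8) >> 4
        } else {
            (byte as i8) >> 4
        };
        acc += (w as i32) * (xi as i32);
    }
    acc as f32 * w_scale * x_scale
}
```

Packing two weights per byte halves the storage of an int8 model and cuts it to an eighth of an f32 model, which is what makes a footprint of a few kilobytes plausible for a small recurrent network.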

Tech Stack
reinforcement-learning, quadrotor, control-systems, edge-ai, mujoco, fastnn