Deep RL-based attitude control for quadrotors deployed on bare-metal MCUs
This project investigates replacing a quadrotor's classical PID attitude controller with a deep reinforcement learning policy that runs on bare-metal microcontrollers.
A privileged teacher trained in MuJoCo with access to hidden physics states (exact mass, wind vectors, centre of mass) is distilled into a compact recurrent student network that relies only on deployable sensor inputs. This bridges the sim-to-real gap while keeping the inference model small enough for MCU deployment.
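As a rough illustration of the distillation step, the sketch below computes a behaviour-cloning loss between the student's and the teacher's motor commands, together with its gradient with respect to the student's output. The loss form (mean squared error), the four-element action vector, and the name `distill_loss` are assumptions made for this example; the project's actual training objective is not stated here, and training would normally run off-target rather than on the MCU.

```rust
/// Number of motor outputs (assumption: one command per rotor of a quadrotor).
pub const N_ACT: usize = 4;

/// Mean squared error between student and teacher actions, plus the
/// gradient of that loss with respect to each student output.
/// Hypothetical helper, shown only to make the distillation idea concrete.
pub fn distill_loss(student: &[f32; N_ACT], teacher: &[f32; N_ACT]) -> (f32, [f32; N_ACT]) {
    let n = N_ACT as f32;
    let mut loss = 0.0;
    let mut grad = [0.0; N_ACT];
    for i in 0..N_ACT {
        let diff = student[i] - teacher[i];
        loss += diff * diff / n; // accumulate the mean of squared differences
        grad[i] = 2.0 * diff / n; // d(loss)/d(student[i])
    }
    (loss, grad)
}
```

The teacher sees the hidden physics states while the student sees only sensor inputs, so minimising a loss of this kind pushes the student's recurrent state to infer what the teacher observes directly.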
The student's input vector is engineered for robust disturbance rejection. Linear acceleration from the IMU provides instantaneous payload-drop detection before altitude loss begins. A 200 ms dilated history of past motor commands lets the network model actuator lag and aerodynamic delay. Integral error feedback eliminates steady-state drift from permanent mass changes.
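The two history-based inputs described above can be sketched as a ring buffer sampled at dilated delays and a clamped error integrator. This is a minimal sketch under stated assumptions: a 1 kHz control loop (so 200 samples span 200 ms), the particular tap delays, the anti-windup clamp, and all type and function names are hypothetical and not taken from the project.

```rust
/// Assumption: 1 kHz control loop, so 200 samples cover 200 ms.
pub const HISTORY_LEN: usize = 200;
pub const N_TAPS: usize = 8;
/// Hypothetical tap delays in samples: dense near the present, sparse further back.
pub const TAPS: [usize; N_TAPS] = [1, 2, 5, 10, 20, 50, 100, 200];

/// Fixed-size ring buffer of past motor commands (no heap allocation).
pub struct MotorHistory {
    buf: [[f32; 4]; HISTORY_LEN],
    head: usize, // index of the next slot to write (also the oldest sample)
}

impl MotorHistory {
    pub const fn new() -> Self {
        Self { buf: [[0.0; 4]; HISTORY_LEN], head: 0 }
    }

    /// Store the command sent to the motors on this control step.
    pub fn push(&mut self, cmd: [f32; 4]) {
        self.buf[self.head] = cmd;
        self.head = (self.head + 1) % HISTORY_LEN;
    }

    /// Return the commands issued `TAPS[i]` steps ago (1 = most recent).
    pub fn dilated(&self) -> [[f32; 4]; N_TAPS] {
        let mut out = [[0.0; 4]; N_TAPS];
        for (slot, &delay) in out.iter_mut().zip(TAPS.iter()) {
            *slot = self.buf[(self.head + HISTORY_LEN - delay) % HISTORY_LEN];
        }
        out
    }
}

/// Per-axis integral of attitude error, clamped to limit windup.
pub struct IntegralError {
    acc: [f32; 3],
    limit: f32,
}

impl IntegralError {
    pub const fn new(limit: f32) -> Self {
        Self { acc: [0.0; 3], limit }
    }

    /// Accumulate `err * dt` on each axis and return the clamped integral.
    pub fn update(&mut self, err: [f32; 3], dt: f32) -> [f32; 3] {
        for i in 0..3 {
            self.acc[i] = (self.acc[i] + err[i] * dt).clamp(-self.limit, self.limit);
        }
        self.acc
    }
}
```

On each control step, the outputs of `dilated()` and `update()` would be concatenated with the IMU readings to form the student's observation. Dilated sampling keeps that vector short while still covering the full 200 ms window.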
The student model is deployed using fastnn, a custom #![no_std] Rust neural network runtime. With Q4 weight quantisation and the single-cycle MAC instructions of the Cortex-M4F processor, inference takes 70–90 μs and uses approximately 7.5 KB of flash memory. This demonstrates that neural flight controllers can operate within the constraints of classical embedded hardware.
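To make the Q4 idea concrete, the sketch below packs signed 4-bit weights two per byte and computes a dot product with an integer accumulator. The packing layout (low nibble first), the single per-row `scale`, the 8-bit activations, and the function names are all assumptions for illustration; they are not fastnn's actual format or API.

```rust
/// Pack signed 4-bit weights (each in -8..=7) two per byte, low nibble first.
/// `out` must hold at least `(w.len() + 1) / 2` bytes.
pub fn pack_q4(w: &[i8], out: &mut [u8]) {
    for (byte, pair) in out.iter_mut().zip(w.chunks(2)) {
        let lo = (pair[0] as u8) & 0x0F;
        // An odd-length input leaves the final high nibble as zero.
        let hi = if pair.len() > 1 { (pair[1] as u8) & 0x0F } else { 0 };
        *byte = lo | (hi << 4);
    }
}

/// Sign-extend a 4-bit two's-complement value held in the low nibble.
fn sign_extend(nibble: u8) -> i32 {
    (((nibble << 4) as i8) >> 4) as i32
}

/// Dot product of packed Q4 weights with 8-bit activations.
/// Products are summed in an i32 accumulator; the floating-point
/// scale is applied once at the end.
pub fn dot_q4(packed: &[u8], x: &[i8], scale: f32) -> f32 {
    let mut acc: i32 = 0;
    for (i, &xi) in x.iter().enumerate() {
        let byte = packed[i / 2];
        let nibble = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
        acc += sign_extend(nibble) * xi as i32; // one multiply-accumulate per weight
    }
    acc as f32 * scale
}
```

Packing two weights per byte halves weight storage relative to 8-bit quantisation, which is how a small network can fit in a few kilobytes of flash. The inner loop is a plain integer multiply-accumulate, the kind of operation a compiler can typically map onto the Cortex-M4's MAC instructions.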