Personal Project · N/A · Berlin, Germany

Humanoid Locomotion RL in MuJoCo

A physics-based humanoid simulation and reinforcement-learning training pipeline for bipedal locomotion, built on MuJoCo, Gymnasium, and Stable-Baselines3. It runs on both Apple Silicon and Linux, and spans everything from classical control baselines to deep-RL policies that learn to walk from scratch.

What It Does

Classical control baselines: PD-controlled walking and running gaits (no learning) to validate the model and generate reference motion.
Deep RL training: PPO, SAC, and TD3 policies trained toward stable locomotion, with configurable reward shaping around foot contact, upright posture, and forward velocity.
Curriculum & minimal mocap: staged difficulty and small amounts of motion-capture data to bootstrap a gait before optimizing for speed.
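
As a rough illustration of the configurable reward shaping mentioned above, the sketch below combines the three named signals (forward velocity, upright posture, foot contact) into a single weighted scalar. It is a minimal sketch and not the project's actual reward: the names `RewardWeights` and `shaped_reward`, the weight values, the velocity cap, and the single-support contact rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the project's real values are not stated."""
    forward_velocity: float = 1.0
    upright: float = 0.5
    foot_contact: float = 0.2


def shaped_reward(forward_vel, torso_up_z, left_contact, right_contact,
                  weights=RewardWeights(), target_vel=1.0):
    """Weighted sum of three shaping terms (illustrative only).

    forward_vel   : torso velocity along the walking direction, in m/s
    torso_up_z    : z-component of the torso's up axis (1.0 when vertical)
    left_contact  : True if the left foot touches the ground
    right_contact : True if the right foot touches the ground
    """
    # Velocity term: scales with forward speed and is capped at the target,
    # so speeds beyond the target earn nothing extra. Moving backward
    # yields a negative term.
    vel_term = min(forward_vel, target_vel) / target_vel

    # Upright term: clamp to [0, 1] so a tipped-over torso earns nothing.
    upright_term = max(0.0, min(1.0, torso_up_z))

    # Contact term (assumed rule): reward single support, i.e. exactly one
    # foot on the ground, as a crude proxy for an alternating gait.
    contact_term = 1.0 if left_contact != right_contact else 0.0

    return (weights.forward_velocity * vel_term
            + weights.upright * upright_term
            + weights.foot_contact * contact_term)


# Walking at the target speed, upright, in single support.
walking = shaped_reward(1.0, 1.0, True, False)

# Standing still, upright, both feet down.
standing = shaped_reward(0.0, 1.0, True, True)
```

With these illustrative weights, standing upright without moving still earns a positive reward from the posture term alone, which is the kind of loophole a policy can exploit instead of walking. Keeping the weights in one small config object makes it easy to rebalance the terms between training runs.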

Why It's Interesting

Humanoid locomotion is a proving ground for reward design — and most of the real lessons come from failure. This project documents them honestly: policies that learn to cheat rather than walk, silent normalization bugs that hid an already-working policy, and a negative result on using LLMs to iterate reward functions.


Stack

MuJoCo · Python · Gymnasium · Stable-Baselines3 (PPO / SAC / TD3) · NumPy
