Originally published on Medium in December 2025. This is a re-publish with light voice edits and a minor scrub of an absolute filesystem path that appeared in the original code block. Canonical URL points to the Medium original.
By the end of this project you'll have a simulation environment where a differential-drive robot navigates a warehouse cluttered with obstacles. The robot is equipped with a full sensor suite — lidar, RGB camera, depth camera, and IMU — and you'll have trained an RL agent using Proximal Policy Optimization (PPO) that consistently reaches its target.
The complete code is on GitHub: github.com/padawanabhi/pybullet_sim.
If you'd like a 2D primer covering the same ideas with simpler physics, see the 2D companion post — IR-SIM, classical control, and a smaller PPO setup.
Before building anything, a quick install note. If you're on macOS, you might hit a wall: pip install pybullet may throw build errors because pip tries to compile from source rather than using pre-built wheels.
The fix is conda — more reliable for packages with complex compiled dependencies.
conda create -n pybullet_sim python=3.10
conda activate pybullet_sim
conda install -c conda-forge pybullet
pip install gymnasium stable-baselines3 numpy opencv-python
If you're on Linux or Windows, pip might work fine. If you see errors about "no matching distribution" or failed wheel builds, conda is your friend.
Every robot in PyBullet starts as a URDF file — a universal format for describing robot geometry, joints, and physical properties. Our differential-drive robot is relatively simple: a rectangular body, two powered wheels, and a passive caster wheel for balance.
The key design decisions involve matching real-world proportions. Our robot has a wheelbase of 0.22 m (the distance between left and right wheels) and wheels with a 0.05 m radius. The body weighs 2 kg, with each wheel adding 0.1 kg. These numbers matter because physics simulation relies on realistic mass and inertia.
One important detail: wheel joints are defined as continuous rather than revolute. A revolute joint has angle limits; wheels need to spin freely forever. Continuous joints allow unlimited rotation in both directions.
<?xml version="1.0"?>
<robot name="diff_drive_robot">

  <!-- Base Link (Robot Body) -->
  <link name="base_link">
    <visual>
      <geometry><box size="0.3 0.2 0.1"/></geometry>
      <origin xyz="0 0 0.05" rpy="0 0 0"/>
      <material name="blue"><color rgba="0.2 0.2 0.8 1"/></material>
    </visual>
    <collision>
      <geometry><box size="0.3 0.2 0.1"/></geometry>
      <origin xyz="0 0 0.05" rpy="0 0 0"/>
    </collision>
    <inertial>
      <mass value="2.0"/>
      <origin xyz="0 0 0.05" rpy="0 0 0"/>
      <inertia ixx="0.01" ixy="0" ixz="0" iyy="0.01" iyz="0" izz="0.02"/>
    </inertial>
  </link>

  <!-- Left Wheel -->
  <link name="left_wheel">
    <visual>
      <geometry><cylinder radius="0.05" length="0.02"/></geometry>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
      <material name="black"><color rgba="0.1 0.1 0.1 1"/></material>
    </visual>
    <collision>
      <geometry><cylinder radius="0.05" length="0.02"/></geometry>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
    </collision>
    <inertial>
      <mass value="0.1"/>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
      <inertia ixx="0.0001" ixy="0" ixz="0" iyy="0.0001" iyz="0" izz="0.0001"/>
    </inertial>
  </link>

  <!-- Right Wheel (mirror of left) -->
  <link name="right_wheel">
    <visual>
      <geometry><cylinder radius="0.05" length="0.02"/></geometry>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
      <material name="black"><color rgba="0.1 0.1 0.1 1"/></material>
    </visual>
    <collision>
      <geometry><cylinder radius="0.05" length="0.02"/></geometry>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
    </collision>
    <inertial>
      <mass value="0.1"/>
      <origin xyz="0 0 0" rpy="1.5708 0 0"/>
      <inertia ixx="0.0001" ixy="0" ixz="0" iyy="0.0001" iyz="0" izz="0.0001"/>
    </inertial>
  </link>

  <!-- Caster Wheel (passive) -->
  <link name="caster_wheel">
    <visual>
      <geometry><sphere radius="0.025"/></geometry>
      <material name="gray"><color rgba="0.5 0.5 0.5 1"/></material>
    </visual>
    <collision><geometry><sphere radius="0.025"/></geometry></collision>
    <inertial>
      <mass value="0.05"/>
      <inertia ixx="0.00001" ixy="0" ixz="0" iyy="0.00001" iyz="0" izz="0.00001"/>
    </inertial>
  </link>

  <!-- Wheel joints (continuous = free rotation) -->
  <joint name="left_wheel_joint" type="continuous">
    <parent link="base_link"/><child link="left_wheel"/>
    <origin xyz="0 0.11 0.05" rpy="0 0 0"/>
    <axis xyz="0 1 0"/>
    <dynamics damping="0.1" friction="0.1"/>
  </joint>
  <joint name="right_wheel_joint" type="continuous">
    <parent link="base_link"/><child link="right_wheel"/>
    <origin xyz="0 -0.11 0.05" rpy="0 0 0"/>
    <axis xyz="0 1 0"/>
    <dynamics damping="0.1" friction="0.1"/>
  </joint>

  <!-- Caster joint (fixed for simplicity) -->
  <joint name="caster_joint" type="fixed">
    <parent link="base_link"/><child link="caster_wheel"/>
    <origin xyz="-0.12 0 0.025" rpy="0 0 0"/>
  </joint>

  <!-- Lidar mount -->
  <link name="lidar_link">
    <visual>
      <geometry><cylinder radius="0.03" length="0.02"/></geometry>
      <material name="red"><color rgba="0.8 0.2 0.2 1"/></material>
    </visual>
    <inertial>
      <mass value="0.01"/>
      <inertia ixx="0.00001" ixy="0" ixz="0" iyy="0.00001" iyz="0" izz="0.00001"/>
    </inertial>
  </link>
  <joint name="lidar_joint" type="fixed">
    <parent link="base_link"/><child link="lidar_link"/>
    <origin xyz="0.1 0 0.11" rpy="0 0 0"/>
  </joint>
</robot>
A robot without sensors is just a paperweight with ambition. Our simulation includes four sensor types, each serving a distinct purpose.
Lidar is the primary navigation sensor. Using PyBullet's rayTestBatch(), we cast 36 rays in a 360° sweep around the robot, each with a 5 m range. The batch operation is crucial for performance — individual ray tests would be painfully slow. Each ray returns the distance to the nearest obstacle, giving us a 2D slice of the environment around the robot.
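To make the batch call concrete, here's a minimal sketch of how the ray start and end points might be built before handing them to `rayTestBatch()`. The function name and the exact sensor height are illustrative; only the 36-ray, 5 m geometry comes from the post.

```python
import numpy as np

def lidar_ray_endpoints(x, y, z, num_rays=36, max_range=5.0):
    """Build start/end points for a 360-degree batch ray cast.

    Returns (starts, ends) as (num_rays, 3) arrays, the shape
    pybullet.rayTestBatch(starts, ends) expects.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, num_rays, endpoint=False)
    # All rays originate at the sensor position...
    starts = np.tile([x, y, z], (num_rays, 1))
    # ...and end max_range metres away, spread evenly in the horizontal plane.
    ends = starts + max_range * np.stack(
        [np.cos(angles), np.sin(angles), np.zeros(num_rays)], axis=1
    )
    return starts, ends

starts, ends = lidar_ray_endpoints(1.0, 2.0, 0.11)
```

Each entry in the result that `rayTestBatch` returns includes a hit fraction, so multiplying it by `max_range` recovers the per-ray distance.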
RGB and depth cameras provide visual information. The RGB camera captures what a human would see; the depth camera measures distance per pixel. We use 320×240 resolution with a 60° field of view. Camera positioning took some trial and error — the camera sits above and behind the robot body, looking forward along the robot's heading.
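One detail worth knowing: the depth buffer PyBullet returns is a normalized, nonlinear value in [0, 1], not metres. Converting it to metric distance uses the standard OpenGL projection inversion; the near/far plane values below are illustrative, not taken from the project.

```python
import numpy as np

def linearize_depth(depth_buffer, near=0.1, far=5.0):
    """Convert a nonlinear [0, 1] depth buffer to metric distance.

    Standard OpenGL depth inversion: 0 maps to the near plane,
    1 maps to the far plane. near/far must match the values used
    to build the projection matrix.
    """
    d = np.asarray(depth_buffer, dtype=np.float64)
    return far * near / (far - d * (far - near))

depths = linearize_depth([0.0, 0.5, 1.0])  # endpoints land on near/far planes
```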
IMU (inertial measurement unit) simulates an accelerometer and gyroscope. Real IMU sensors are noisy, so we add Gaussian noise to make the simulation more realistic. This becomes important when training RL agents — if your simulation is too clean, your trained policy might fail on real hardware.
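A minimal sketch of that noise injection, with illustrative standard deviations (the project's actual noise parameters may differ):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def noisy_imu(accel, gyro, accel_std=0.05, gyro_std=0.01):
    """Add zero-mean Gaussian noise to clean simulated IMU readings.

    accel/gyro are 3-vectors; the noise magnitudes here are
    illustrative, not measured from real hardware.
    """
    accel = np.asarray(accel, dtype=np.float64) + rng.normal(0.0, accel_std, size=3)
    gyro = np.asarray(gyro, dtype=np.float64) + rng.normal(0.0, gyro_std, size=3)
    return accel, gyro

a, g = noisy_imu([0.0, 0.0, 9.81], [0.0, 0.0, 0.2])
```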
Odometry computes the robot's velocity from wheel rotations. The differential-drive kinematics are straightforward: linear velocity is the average of both wheel speeds, and angular velocity depends on the difference between them.
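Those kinematics can be written in a few lines, using the wheel radius and wheelbase from the URDF:

```python
WHEEL_RADIUS = 0.05  # m, from the URDF
WHEEL_BASE = 0.22    # m, distance between left and right wheels

def diff_drive_odometry(omega_left, omega_right):
    """Body velocities from wheel angular speeds (rad/s).

    Linear velocity is the average wheel surface speed; angular
    velocity comes from the speed difference across the wheelbase.
    """
    v_left = omega_left * WHEEL_RADIUS
    v_right = omega_right * WHEEL_RADIUS
    v = (v_left + v_right) / 2.0          # m/s
    w = (v_right - v_left) / WHEEL_BASE   # rad/s
    return v, w
```

Equal wheel speeds give pure translation; opposite speeds give a turn in place.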
PyBullet has built-in visualization for camera feeds that I initially overlooked. Instead of managing a separate matplotlib window (which kept stealing focus), you can enable sensor-buffer previews directly in the PyBullet GUI:
p.configureDebugVisualizer(p.COV_ENABLE_RGB_BUFFER_PREVIEW, 1)
p.configureDebugVisualizer(p.COV_ENABLE_DEPTH_BUFFER_PREVIEW, 1)
p.configureDebugVisualizer(p.COV_ENABLE_SEGMENTATION_MARK_PREVIEW, 1)
This gives you RGB, depth, and segmentation views in tabs alongside the main simulation window. Much cleaner than fighting with window managers.
Here's where I made my first major mistake. My initial navigation controller was… ambitious. It had stuck detection, position-history tracking, oscillation prevention, and a state machine with more modes than a Swiss Army knife. The result? The robot would stop, spin in circles, and make no progress toward the goal.
The problem was competing behaviours. When multiple subsystems all have opinions about what the robot should do, you get confusion, not coordination.
The fix was radical simplification. I replaced everything with a basic PID controller focused on two things: point the robot at the goal, and move forward. Obstacle avoidance became a simple modification of the desired heading — when lidar detects something close in front, blend the goal direction with an avoidance direction based on which side has more clearance.
# Simplified obstacle-avoidance logic
if min_front_distance < safety_distance:
    avoidance_weight = 1.0 - (min_front_distance / safety_distance)
    desired_heading = blend(goal_heading, avoidance_heading, avoidance_weight)
The simplified controller reached the goal in 892 steps through a warehouse with 31 obstacles. Sometimes less really is more.
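For concreteness, here is one way the "point at the goal, move forward" core could look. This is a sketch, not the project's actual controller: the gains and speed limits are illustrative, and the obstacle-avoidance blending from above is omitted.

```python
import math

def simple_goal_controller(x, y, theta, goal_x, goal_y,
                           k_heading=2.0, v_max=0.5, w_max=1.5):
    """P-control on heading error; gains are illustrative.

    The forward command shrinks as the heading error grows, so the
    robot effectively turns in place when facing the wrong way.
    """
    goal_heading = math.atan2(goal_y - y, goal_x - x)
    error = goal_heading - theta
    # Wrap the error to [-pi, pi] so the robot turns the short way round.
    error = math.atan2(math.sin(error), math.cos(error))
    w = max(-w_max, min(w_max, k_heading * error))
    v = v_max * max(0.0, math.cos(error))  # forward only
    return v, w
```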
Once basic navigation worked, I added proper path planning. Two algorithms are implemented: A* and RRT.
A* operates on a discretized grid. Lidar scans get converted to an occupancy grid with 0.2 m resolution, and A* finds the optimal path using a Euclidean heuristic. It's fast — about 0.1 ms per planning cycle.
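A compact version of the idea, as a sketch rather than the project's implementation: an 8-connected grid search with the Euclidean heuristic, where `True` cells are occupied.

```python
import heapq, itertools, math

def astar(grid, start, goal):
    """A* on a boolean occupancy grid (True = blocked), 8-connected,
    Euclidean heuristic. Returns a list of (row, col) cells or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: math.dist(c, goal)
    tie = itertools.count()              # tie-breaker keeps heap entries comparable
    open_set = [(h(start), next(tie), start)]
    g_cost = {start: 0.0}
    came_from = {start: None}
    closed = set()
    while open_set:
        _, _, cell = heapq.heappop(open_set)
        if cell in closed:
            continue                     # stale entry; a cheaper one was processed
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nxt = (cell[0] + dr, cell[1] + dc)
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                if grid[nxt[0]][nxt[1]] or nxt in closed:
                    continue
                ng = g_cost[cell] + math.hypot(dr, dc)  # diagonals cost sqrt(2)
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_set, (ng + h(nxt), next(tie), nxt))
    return None
```

In the actual pipeline, the grid cells would come from rasterizing lidar hits at the 0.2 m resolution mentioned above.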
RRT (Rapidly-exploring Random Tree) works in continuous space. It grows a tree by randomly sampling positions and connecting them to the nearest existing node. RRT is slower (around 35 ms) but produces smoother paths in complex environments.
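The core loop fits in a short function. Again a sketch under stated assumptions: the step size, goal bias, and tolerance are illustrative, and `is_free` stands in for whatever collision check the environment provides.

```python
import math, random

def rrt(start, goal, bounds, is_free, step=0.3, goal_tol=0.3,
        max_iters=2000, seed=0):
    """Minimal 2D RRT: sample a point, extend from the nearest tree
    node toward it, keep the new node if collision-free.

    bounds = (x_min, x_max, y_min, y_max); is_free(p) -> bool is a
    user-supplied collision check. Parameters are illustrative.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        if rng.random() < 0.1:           # occasional goal bias speeds convergence
            sample = goal
        else:
            sample = (rng.uniform(bounds[0], bounds[1]),
                      rng.uniform(bounds[2], bounds[3]))
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0.0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```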
Interestingly, while A* plans faster, RRT led to faster overall navigation in my tests. The paths were more direct, requiring less replanning. This kind of counter-intuitive result is exactly why we run experiments rather than assuming.
Reinforcement-learning frameworks like Stable-Baselines3 expect environments to follow the Gymnasium interface. This means implementing reset(), step(), and defining observation and action spaces.
Observation space is a 43-dimensional vector containing normalized lidar readings (36 values), heading error to goal, linear and angular velocity from odometry, IMU readings, and — critically — distance to goal. That last one seems obvious in hindsight, but I initially forgot it. The agent couldn't tell when it was close to success.
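As a sketch, the vector might be assembled like this. The layout is inferred from the indices used later in the reward code (`obs[2]` for heading error, `obs[6:42]` for lidar); the exact placement of the IMU terms is my assumption, not confirmed by the post.

```python
import numpy as np

def build_observation(lidar, heading_error, distance_to_goal,
                      v, w, imu_accel_xy, imu_gyro_z, max_range=5.0):
    """Assemble the 43-dim observation. Index layout is illustrative,
    chosen so heading error lands at obs[2] and lidar at obs[6:42]."""
    obs = np.concatenate([
        [v, w],                          # odometry (2)
        [heading_error],                 # obs[2]
        [distance_to_goal],              # the value I initially forgot
        np.asarray(imu_accel_xy),        # IMU accel, placement assumed (2)
        np.asarray(lidar) / max_range,   # obs[6:42], 36 normalized rays
        [imu_gyro_z],                    # IMU gyro (1)
    ]).astype(np.float32)
    assert obs.shape == (43,)
    return obs
```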
Action space is continuous with two values: linear velocity (0 to 0.5 m/s) and angular velocity (-1.5 to 1.5 rad/s). Linear velocity is clamped to be forward-only. Allowing backward movement would double the complexity without much benefit, and the simpler action space learns faster.
self.action_space = spaces.Box(
    low=np.array([0.0, -1.5]),
    high=np.array([0.5, 1.5]),
    dtype=np.float32,
)
Now for the part that kept me up at night. Initial training results were dismal. The robot would move back and forth, spin in circles, or stop just short of the goal. Success rate: maybe 2 out of 5 episodes if I was lucky.
The culprit was a reward function that tried to do too much. I had bonuses for getting close, penalties for moving away, punishment for zero velocity, and harsh collision penalties. All these signals competed with each other, and the agent couldn't figure out what actually mattered.
The fix came through simplification. I stripped the reward function down to essentials.
def step(self, action):
    v = np.clip(action[0], 0.0, 0.7)    # no backward driving
    w = np.clip(action[1], -2.0, 2.0)
    self.robot.set_velocity(v, w)
    for _ in range(int(self.time_step / (1 / 240))):
        p.stepSimulation()
    self.step_count += 1

    obs = self._get_obs()
    info = self._get_info()
    current_distance = info["distance_to_goal"]
    progress = self.prev_distance - current_distance
    self.prev_distance = current_distance
    self.recent_progress.append(progress)
    if len(self.recent_progress) > 20:
        self.recent_progress.pop(0)

    reward = 0.0

    # 1. PRIMARY: progress toward goal
    reward += progress

    # 1b. heading alignment when far from goal
    dtheta_abs = abs(obs[2])
    if current_distance > 1.0:
        reward += 0.1 * (1.0 - dtheta_abs)

    # 2. goal reached
    if current_distance < self.goal_threshold:
        reward += 50.0

    # 3. collision penalty
    if info["collision"]:
        reward -= 10.0

    # 4. time penalty (efficiency)
    reward -= 0.01

    # 5. stuck penalty (only if truly stuck and far from goal)
    if len(self.recent_progress) >= 20:
        if abs(sum(self.recent_progress[-20:])) < self.stuck_threshold and current_distance > 1.0:
            reward -= 0.1

    # obstacle proximity (lidar < 0.3 m)
    min_lidar = float(np.min(obs[6:42]))
    if min_lidar < 0.3:
        reward -= 1.0 * (0.3 - min_lidar)

    # excessive rotation
    if abs(action[1]) > 2.0:
        reward -= 0.2 * (abs(action[1]) - 2.0)

    terminated = bool(current_distance < self.goal_threshold)
    truncated = bool(self.step_count >= self.max_steps) or bool(info["collision"])
    return obs, reward, terminated, truncated, info
The final training configuration used PPO with 4 parallel environments on Apple Silicon (M2). After 1 million timesteps, the results spoke for themselves.
Evaluation results (5 episodes):
============================================================
Running 5 evaluation episodes
============================================================
Building warehouse environment...
Created warehouse with 31 obstacles
--- Episode 1/5 ---
Initial distance to goal: 3.56m
Step 50: dist=2.75m, reward=4.0
Step 100: dist=1.70m, reward=9.4
Step 150: dist=0.66m, reward=11.9
✓ SUCCESS | Steps: 159 | Reward: 61.9 | Final distance: 0.48m | Distance traveled: 3.10m
--- Episode 2/5 ---
Initial distance to goal: 2.72m
✓ SUCCESS | Steps: 120 | Reward: 58.2 | Final distance: 0.49m | Distance traveled: 2.26m
--- Episode 3/5 ---
Initial distance to goal: 5.10m
✓ SUCCESS | Steps: 275 | Reward: 50.3 | Final distance: 0.49m | Distance traveled: 5.17m
--- Episode 4/5 ---
✓ SUCCESS | Steps: 150 | Reward: 57.8 | Final distance: 0.48m
--- Episode 5/5 ---
✓ SUCCESS | Steps: 116 | Reward: 58.9 | Final distance: 0.49m
============================================================
EVALUATION RESULTS
Success rate: 5/5 (100.0%)
Average reward: 57.43 ± 3.83
Average episode length: 164.0 ± 57.9 steps
Average distance: 3.06m ± 1.09m
============================================================
A simulation that runs slowly is a simulation you won't iterate on. Initial performance was rough — under 1 FPS with all visualizations active. The bottleneck was surprising: camera image capture and debug line drawing.
The fix was to update expensive visualizations less frequently. Camera images update every 10 frames instead of every frame. Lidar rays are drawn every 10 frames, and only every tenth ray is shown. Path visualization updates every 10 frames with longer-lasting debug lines.
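The pattern behind all three changes is the same frame-skip throttle; a minimal sketch, with names and constants of my own choosing:

```python
VIS_EVERY = 10   # refresh expensive visuals every 10 physics frames
RAY_STRIDE = 10  # draw only every tenth lidar ray

def visuals_due(frame):
    """True on frames where camera/debug-line updates should run."""
    return frame % VIS_EVERY == 0

def rays_to_draw(num_rays=36, stride=RAY_STRIDE):
    """Indices of the subset of lidar rays to visualize."""
    return list(range(0, num_rays, stride))
```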
These changes brought the simulation from 0.8 FPS to over 4 FPS — still not blazing fast, but usable for development. When training headless (no GUI), performance is much better.
What I've shown here is a foundation. There's plenty of room to explore.
Looking back, a few principles stand out. We've traveled a long way from the 2D simplicity of IR-SIM. PyBullet brings real physics, real sensors, and real challenges. But the core lessons remain: build incrementally, test constantly, and when things break (they will), simplify before adding complexity.
The complete project, including all training scripts and pre-trained models, is on GitHub. Clone it, break it, improve it.
Next time, maybe sim-to-real transfer — taking the virtual robot and making it control something you can actually touch. But that's a story for another post.