Many robot tasks involve finding and following a long-horizon path
while interacting with an environment whose dynamics are
not known.
For example, a robot arm pushing an object
among obstacles on a table and a mobile robot pushing an
office chair among furniture both face this type of
problem.
We seek an approach that
(1) allows the robot to be trained quickly in the real world, without access to a simulator, as the dynamics present a large sim-to-real gap, and
(2) generalizes to somewhat different environments.
Planner-Ordered Policy (PoPi) learns a short-horizon, diffusion-based
manipulation policy via imitation learning and uses a "global" motion planner
to define waypoints for the local policy.
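A minimal sketch of how this decomposition might be wired together is shown below. The `robot`, `policy`, and `plan_waypoints` interfaces, as well as the `to_robot_frame` helper (sketched after the abstract below), are hypothetical placeholders for illustration, not the authors' released code.

```python
import numpy as np

def run_planner_ordered_policy(robot, policy, plan_waypoints, obstacle_map,
                               goal_pose, waypoint_tol=0.15, max_steps=200):
    """Hypothetical sketch: follow planner waypoints with a short-horizon policy.

    robot          -- exposes observation(), object_pose(), base_pose(), apply(action)
    policy         -- short-horizon diffusion policy with sample(obs, relative_goal)
    plan_waypoints -- obstacle-aware global planner (e.g. a sampling-based planner
                      over object poses that ignores the unknown contact dynamics)
    """
    # High level: plan a collision-free waypoint sequence to the goal.
    waypoints = plan_waypoints(robot.object_pose(), goal_pose, obstacle_map)

    for wp in waypoints:
        for _ in range(max_steps):
            # Low level: condition the diffusion policy on the waypoint expressed
            # relative to the robot, so the policy stays layout-agnostic.
            rel_goal = to_robot_frame(wp, robot.base_pose())
            for action in policy.sample(robot.observation(), rel_goal):
                robot.apply(action)
            if np.linalg.norm(robot.object_pose()[:2] - wp[:2]) < waypoint_tol:
                break  # waypoint reached; move on to the next one
    return robot.object_pose()
```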
Manipulation of large objects over long horizons (such as carts in a warehouse) is an essential skill for deployable robotic systems. Large objects require mobile manipulation, which involves simultaneous manipulation, navigation, and movement with the object in tow. In many real-world situations, object dynamics are incredibly complex, such as the interaction of an office chair (with a rotating base and five caster wheels) and the ground. We present a hierarchical algorithm for long-horizon robot manipulation problems in which the dynamics are partially unknown. We observe that diffusion-based behavior cloning is highly effective for short-horizon problems with unknown dynamics, so we decompose the problem into an abstract high-level, obstacle-aware motion-planning problem that produces a waypoint sequence. We use a short-horizon, relative-motion diffusion policy to achieve the waypoints in sequence. We train mobile manipulation policies on a Spot robot that has to push and pull an office chair. Our hierarchical manipulation policy performs consistently better, especially as the horizon increases, compared to a diffusion policy trained on long-horizon demonstrations or motion planning assuming a rigidly-attached object (success rate of 8 out of 10 runs, versus 0 and 5 respectively). Importantly, our learned policy generalizes to new layouts, grasps, chairs, and flooring that induces more friction, without any further training, showing promise for other complex mobile manipulation problems.
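The "relative-motion" conditioning mentioned in the abstract can be made concrete with a small frame transform: a world-frame SE(2) waypoint is re-expressed in the robot's current frame before being passed to the local policy. The sketch below is only an illustrative guess at that interface; the function name and pose convention are assumptions, not taken from the paper's code.

```python
import numpy as np

def to_robot_frame(waypoint_xyyaw, robot_xyyaw):
    """Transform a world-frame SE(2) waypoint (x, y, yaw) into the robot frame."""
    wx, wy, wyaw = waypoint_xyyaw
    rx, ry, ryaw = robot_xyyaw
    c, s = np.cos(-ryaw), np.sin(-ryaw)          # rotate by the negative robot yaw
    dx, dy = wx - rx, wy - ry
    local_x = c * dx - s * dy
    local_y = s * dx + c * dy
    local_yaw = (wyaw - ryaw + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return np.array([local_x, local_y, local_yaw])

# Example: a waypoint 1 m directly ahead of a robot facing +y
print(to_robot_frame((1.0, 2.0, 0.0), (1.0, 1.0, np.pi / 2)))
# -> approximately [1., 0., -1.5708]
```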
Tasks:
Baselines:
We find that PoPi performs consistently better as the horizon increases, compared to a "global" diffusion policy or motion planning assuming a rigidly-attached object. Importantly, PoPi generalizes to new layouts, grasps, chairs, and even flooring, without any further training.
RRT and PoPi both perform well at short horizons. As the horizon increases from 2 m to 6 m to 10 m, RRT begins failing much more frequently than PoPi. The "global" diffusion policy is unable to capture the dynamics at all.
@misc{ravan2024combiningplanningdiffusionmobility,
  title={Combining Planning and Diffusion for Mobility with Unknown Dynamics},
  author={Yajvan Ravan and Zhutian Yang and Tao Chen and Tomás Lozano-Pérez and Leslie Pack Kaelbling},
  year={2024},
  eprint={2410.06911},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2410.06911},
}