Many robot tasks involve finding and following a long-horizon path
while interacting with an environment whose dynamics are
not known.
For example, a robot arm pushing an object
among obstacles on a table and a mobile robot pushing an
office chair among furniture both face this type of
problem.
We seek an approach that
(1) allows the robot to be trained quickly in the real world, without access to a simulator, as the dynamics present a large sim-to-real gap, and
(2) generalizes to somewhat different environments.
Planner-Ordered Policy (PoPi) learns a short-horizon, diffusion-based
manipulation policy via imitation learning and uses a "global" motion planner
to define waypoints for the local policy.
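A minimal sketch of how this decomposition might be wired together is shown below. The `robot`, `policy`, and `plan_waypoints` interfaces, as well as the `to_robot_frame` helper (sketched after the abstract below), are hypothetical placeholders for illustration, not the authors' released code.

```python
import numpy as np

def run_planner_ordered_policy(robot, policy, plan_waypoints, obstacle_map,
                               goal_pose, waypoint_tol=0.15, max_steps=200):
    """Hypothetical sketch: follow planner waypoints with a short-horizon policy.

    robot          -- exposes observation(), object_pose(), base_pose(), apply(action)
    policy         -- short-horizon diffusion policy with sample(obs, relative_goal)
    plan_waypoints -- obstacle-aware global planner (e.g. a sampling-based planner
                      over object poses that ignores the unknown contact dynamics)
    """
    # High level: plan a collision-free waypoint sequence to the goal.
    waypoints = plan_waypoints(robot.object_pose(), goal_pose, obstacle_map)

    for wp in waypoints:
        for _ in range(max_steps):
            # Low level: condition the diffusion policy on the waypoint expressed
            # relative to the robot, so the policy stays layout-agnostic.
            rel_goal = to_robot_frame(wp, robot.base_pose())
            for action in policy.sample(robot.observation(), rel_goal):
                robot.apply(action)
            if np.linalg.norm(robot.object_pose()[:2] - wp[:2]) < waypoint_tol:
                break  # waypoint reached; move on to the next one
    return robot.object_pose()
```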
Manipulation of large objects over long horizons (such as carts in a warehouse) is an essential skill for deployable robotic systems. Large objects require mobile manipulation, which involves simultaneous manipulation, navigation, and movement with the object in tow. In many real-world situations, object dynamics are incredibly complex, such as the interaction of an office chair (with a rotating base and five caster wheels) and the ground. We present a hierarchical algorithm for long-horizon robot manipulation problems in which the dynamics are partially unknown. We observe that diffusion-based behavior cloning is highly effective for short-horizon problems with unknown dynamics, so we decompose the problem into an abstract high-level, obstacle-aware motion-planning problem that produces a waypoint sequence. We use a short-horizon, relative-motion diffusion policy to achieve the waypoints in sequence. We train mobile manipulation policies on a Spot robot that has to push and pull an office chair. Our hierarchical manipulation policy performs consistently better, especially as the horizon increases, compared to a diffusion policy trained on long-horizon demonstrations or motion planning assuming a rigidly-attached object (success rate of 8 out of 10 runs, versus 0 and 5 respectively). Importantly, our learned policy generalizes to new layouts, grasps, chairs, and flooring that induces more friction, without any further training, showing promise for other complex mobile manipulation problems.
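The "relative-motion" conditioning mentioned in the abstract can be made concrete with a small frame transform: a world-frame SE(2) waypoint is re-expressed in the robot's current frame before being passed to the local policy. The sketch below is only an illustrative guess at that interface; the function name and pose convention are assumptions, not taken from the paper's code.

```python
import numpy as np

def to_robot_frame(waypoint_xyyaw, robot_xyyaw):
    """Transform a world-frame SE(2) waypoint (x, y, yaw) into the robot frame."""
    wx, wy, wyaw = waypoint_xyyaw
    rx, ry, ryaw = robot_xyyaw
    c, s = np.cos(-ryaw), np.sin(-ryaw)          # rotate by the negative robot yaw
    dx, dy = wx - rx, wy - ry
    local_x = c * dx - s * dy
    local_y = s * dx + c * dy
    local_yaw = (wyaw - ryaw + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return np.array([local_x, local_y, local_yaw])

# Example: a waypoint 1 m directly ahead of a robot facing +y
print(to_robot_frame((1.0, 2.0, 0.0), (1.0, 1.0, np.pi / 2)))
# -> approximately [1., 0., -1.5708]
```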
Tasks:
Baselines:
We find that PoPi performs consistently better as the horizon increases, compared to a "global" diffusion policy or motion planning assuming a rigidly-attached object. Importantly, PoPi generalizes to new layouts, grasps, chairs, and even flooring, without any further training.
RRT and PoPi both perform well at short horizons. As the horizon increases from 2 m to 6 m to 10 m, RRT begins failing much more frequently than PoPi. The "global" diffusion policy is unable to capture the dynamics at all.
@misc{ravan2024combiningplanningdiffusionmobility,
  title={Combining Planning and Diffusion for Mobility with Unknown Dynamics},
  author={Yajvan Ravan and Zhutian Yang and Tao Chen and Tomás Lozano-Pérez and Leslie Pack Kaelbling},
  year={2024},
  eprint={2410.06911},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2410.06911},
}