Recent research achievements in Learning from Demonstration (LfD) demonstrate that the reinforcement learning is effective for the robots to improve its movement skills. The current challenge mainly remains in how to generate new robot motions, which have similar preassigned performance indicator but are different from the demonstrated tasks. To deal with the above issue, this paper proposes a framework to represent the policy and conduct imitation learning and optimization for robot intelligent trajectory planning, based on the improved local weighted regression (iLWR) and policy improvement with path integral by dual perturbation (PI2-DP). Besides, the reward guided weight searching and basis function’s adaptive evolving are performed alternately in two spaces, i.e. the basis function space and the weight space, to deal with the above problem. The alternate learning process constructs a sequence of two-tuples which joins the demonstration task and new one together for motor skill transfer. So that the robot skills can be gradually learnt from similar tasks, and those skills can also correspond the demonstrated tasks to dissimilar tasks in different criterion. Classical via-points trajectory planning experiments are performed with the SCARA manipulator, a 10 DOF planar and the UR robot. These results show that the proposed method is not only feasible but also effective.
|Journal||IEEE Transactions on Neural Networks and Learning Systems|
|Early online date||24 Sep 2020|
|Publication status||Early online - 24 Sep 2020|
- motor skill acquisition
- alternate learning in two spaces (ALTS)
- improved local weighted regression (iLWR)