CN111230875B - Double-arm robot humanoid operation planning method based on deep learning

Info

Publication number: CN111230875B
Application number: CN202010081726.8A
Authority: CN
Inventors: Name withheld on request
Original assignee: Beijing Fanchuan Intelligent Robot Technology Co ltd
Current assignee: Beijing Fanchuan Intelligent Robot Technology Co ltd
Priority date: 2020-02-06
Filing date: 2020-02-06
Publication date: 2023-05-12
Anticipated expiration: 2040-02-06
Also published as: CN111230875A

Abstract

The invention provides a deep-learning-based humanoid operation planning method for a double-arm robot, comprising the following steps: step S1, training a policy network, which computes, from the current state and the target state of the mechanical arm, the probability of each state the arm may occupy at the next moment and proposes multiple candidate motion schemes; step S2, determining an evaluation function, which evaluates the motion schemes proposed by the policy network; and step S3, performing a tree search that combines the policy network and the evaluation function to obtain the path planning result for the mechanical arm. The invention uses human motion samples as guidance and selects or adapts them according to the specific environment the robot is currently in, so that the double-arm robot can carry out humanoid operation planning.

Description

Double-arm robot humanoid operation planning method based on deep learning

Technical Field

The invention relates to a robot mechanical arm operation planning method, in particular to a double-arm robot humanoid operation planning method based on deep learning, and belongs to the technical field of robots.

Background

In dangerous and complex working environments such as armed-police riot control, battlefield combat, and areas with harmful gas or nuclear radiation, the main concern of path planning is how a robot can move from an initial state to a target state without collision, and the quality of path planning is an important measure of a robot's degree of intelligence. Humans have a strong ability to learn: through continuous practice and adaptation to different environments, they accumulate a large number of motion samples. When executing a task similar to an earlier one, a human can complete the required motion quickly and effectively by slightly modifying a stored motion sample to suit the actual environment.

Existing path planning methods mainly include graph-based algorithms, such as the A* algorithm and the visibility graph method; the artificial potential field algorithm; and algorithms based on random sampling, such as the probabilistic roadmap (PRM) and the rapidly-exploring random tree (RRT).

A mechanical arm has a higher-dimensional state space than a mobile robot, so traditional path planning algorithms suffer from heavy computation, difficulty in describing obstacles, incompleteness, and non-optimality. Sampling-based planning algorithms are probabilistically complete, but because sampling introduces randomness, the planning result may differ on every run and cannot be predicted in advance.

To address this problem, the art needs a method that uses human motion samples as guidance and, at the same time, selects or adapts those samples according to the specific environment the robot is currently in, thereby achieving humanoid operation planning for a double-arm robot.

Disclosure of Invention

The invention aims to provide a motion planning method that uses human motions as guidance for robot motion planning while taking into account the specific environment the robot is currently in.

The technical scheme of the invention is as follows.

The first aspect of the invention provides a double-arm robot humanoid operation planning method based on deep learning, which comprises the following steps:

step S1, training a policy network; from the current state and the target state of the mechanical arm, the policy network computes the probability of each state the arm may occupy at the next moment and proposes multiple motion schemes;

step S2, determining an evaluation function; the evaluation function is used to evaluate the motion schemes proposed by the policy network;

and step S3, performing a tree search that combines the policy network and the evaluation function to obtain the path planning result for the mechanical arm.
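The input/output contract of the network in step S1 can be illustrated with a minimal Python sketch. The patent does not disclose the network architecture or its training procedure, so the sketch below substitutes a hand-written softmax over distance to the target for the trained network. The names `State`, `candidate_next_states`, `toy_policy`, `step` and `temperature`, and the choice of joint angles as the state, are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Assumption: a state is a tuple of joint angles in radians.
State = Tuple[float, ...]


def candidate_next_states(state: State, step: float = 0.1) -> List[State]:
    """Enumerate next states that move exactly one joint by +/- step."""
    candidates = []
    for i in range(len(state)):
        for delta in (-step, step):
            nxt = list(state)
            nxt[i] += delta
            candidates.append(tuple(nxt))
    return candidates


def toy_policy(state: State, goal: State, step: float = 0.1,
               temperature: float = 0.05) -> List[Tuple[State, float]]:
    """Hypothetical stand-in for the trained network.

    Returns (next_state, probability) pairs. The probabilities come from
    a softmax over negative distance to the goal, not from learned weights.
    """
    candidates = candidate_next_states(state, step)
    scores = [-math.dist(c, goal) / temperature for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [(c, w / total) for c, w in zip(candidates, weights)]


if __name__ == "__main__":
    for nxt, prob in toy_policy((0.0, 0.0), (1.0, 0.0)):
        print(nxt, round(prob, 3))
```

A trained network would replace `toy_policy` while keeping the same contract: current state and target state in, a probability for each candidate next state out.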

Preferably, the policy network is trained on human motion samples.

Preferably, the human motion samples are obtained by a motion capture device.
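One way such samples might be prepared for training is sketched below in Python. The patent does not describe the sample format, so the sketch assumes that a captured motion is a time-ordered list of arm states and that the final state of the motion serves as the target state. `State`, `Sample` and `trajectory_to_samples` are hypothetical names.

```python
from typing import List, Tuple

# Assumption: one captured frame is a tuple of joint angles.
State = Tuple[float, ...]
# (current state, target state, next state actually taken by the human)
Sample = Tuple[State, State, State]


def trajectory_to_samples(trajectory: List[State]) -> List[Sample]:
    """Turn one captured motion into supervised training samples.

    The last frame is treated as the target state, and every pair of
    consecutive frames yields one (current, target, next) sample.
    """
    if len(trajectory) < 2:
        return []
    goal = trajectory[-1]
    return [(trajectory[t], goal, trajectory[t + 1])
            for t in range(len(trajectory) - 1)]


if __name__ == "__main__":
    captured = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]
    for sample in trajectory_to_samples(captured):
        print(sample)
```

Each sample pairs the inputs named in step S1 (current state and target state) with the next state observed in the human motion, which is the quantity the network is asked to predict.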

Preferably, the evaluation function is used to evaluate the cost of moving the mechanical arm from the current state to each candidate state at the next moment.

Preferably, the evaluation function is further used to evaluate whether the mechanical arm collides with itself or with the surrounding environment during the movement.
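The two roles of the evaluation function, motion cost and collision checking, can be combined as in the following Python sketch. The patent gives no formula, so the cost terms, the `goal_weight` parameter, and the caller-supplied `in_collision` predicate are assumptions; a real system would supply a geometric self-collision and environment-collision check.

```python
import math
from typing import Callable, Tuple

# Assumption: a state is a tuple of joint angles.
State = Tuple[float, ...]


def evaluate(current: State, nxt: State, goal: State,
             in_collision: Callable[[State], bool],
             goal_weight: float = 1.0) -> float:
    """Score a candidate next state; higher is better.

    A colliding state scores -inf so that it can never be selected.
    Otherwise the score is the negated sum of the joint-space distance
    moved and the weighted distance still remaining to the goal.
    """
    if in_collision(nxt):
        return -math.inf
    move_cost = math.dist(current, nxt)
    remaining = math.dist(nxt, goal)
    return -(move_cost + goal_weight * remaining)


if __name__ == "__main__":
    def no_obstacles(state: State) -> bool:
        return False

    print(evaluate((0.0, 0.0), (0.1, 0.0), (1.0, 0.0), no_obstacles))
    print(evaluate((0.0, 0.0), (-0.1, 0.0), (1.0, 0.0), no_obstacles))
```

Returning negative infinity for a collision keeps the function a single scalar score, which is convenient when values are later compared during the tree search.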

Preferably, the tree search treats each motion state of the mechanical arm as one node of a tree; the motion of the mechanical arm is expanded over multiple steps through the policy network to obtain the nodes of the tree, the expanded leaf nodes are evaluated with the evaluation function, the values are passed back from the leaf nodes toward the root node and the value of each node along the way is updated, and finally the optimal node is selected according to the node values as the path planning result for the mechanical arm.

Preferably, the step S3 further includes:

step S31, a selection step; from the possible human-like motions proposed by the policy network, selecting an action q with a high probability;

step S32, an expansion step; after the action q, continuing to use the policy network to select child nodes with a high probability so as to expand the tree into a plurality of subtrees, and stopping after a set number of expansion steps;

step S33, an evaluation step; evaluating each node with the evaluation function, starting from the leaf nodes of the fully expanded tree;

step S34, a back-propagation step; passing the results of the evaluation function back from the leaf nodes to the root node of the tree, the evaluation value of each node being updated with the evaluation values returned from its child nodes; after all values have been passed back, the final evaluation value of each initially selected node q is obtained, and the node with the highest evaluation value is selected as the next motion of the mechanical arm.
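Steps S31 to S34 can be sketched in Python as a depth-limited search. This is an illustration under stated assumptions, not the patented implementation: the patent does not say how child values are merged during the back-pass, so the sketch takes the maximum over children, and it keeps the `k` most probable children at each expansion. The integer grid state and the names `top_k`, `backed_up_value`, `plan_next_state`, `grid_policy` and `make_evaluator` are hypothetical stand-ins for the trained network and the real evaluation function.

```python
import math
from typing import Callable, List, Set, Tuple

State = Tuple[int, ...]  # toy grid state; a real arm would use joint angles
Policy = Callable[[State, State], List[Tuple[State, float]]]
Evaluator = Callable[[State, State], float]  # higher is better, -inf = collision


def top_k(policy: Policy, state: State, goal: State, k: int) -> List[State]:
    """Return the k most probable next states proposed by the policy."""
    ranked = sorted(policy(state, goal), key=lambda sp: sp[1], reverse=True)
    return [s for s, _ in ranked[:k]]


def backed_up_value(state: State, goal: State, policy: Policy,
                    evaluate: Evaluator, depth: int, k: int) -> float:
    """Expand `depth` more levels below `state` and return its backed-up value."""
    score = evaluate(state, goal)
    if depth == 0 or score == -math.inf:
        return score  # S33: leaf, or colliding node that is not expanded
    children = top_k(policy, state, goal, k)  # S32: expand likely children
    # S34: back up the best child value (max is an assumption of this sketch)
    return max(backed_up_value(c, goal, policy, evaluate, depth - 1, k)
               for c in children)


def plan_next_state(state: State, goal: State, policy: Policy,
                    evaluate: Evaluator, depth: int = 2, k: int = 2) -> State:
    """S31 plus final choice: pick the candidate q with the best backed-up value."""
    candidates = top_k(policy, state, goal, k)  # S31: likely actions q
    return max(candidates, key=lambda q: backed_up_value(
        q, goal, policy, evaluate, depth, k))


def manhattan(a: State, b: State) -> int:
    return sum(abs(x - y) for x, y in zip(a, b))


def grid_policy(state: State, goal: State) -> List[Tuple[State, float]]:
    """Toy policy: softmax over negative Manhattan distance to the goal."""
    neighbours = []
    for i in range(len(state)):
        for delta in (-1, 1):
            nxt = list(state)
            nxt[i] += delta
            neighbours.append(tuple(nxt))
    weights = [math.exp(-manhattan(n, goal)) for n in neighbours]
    total = sum(weights)
    return [(n, w / total) for n, w in zip(neighbours, weights)]


def make_evaluator(blocked: Set[State]) -> Evaluator:
    """Toy evaluator: -inf inside an obstacle, else negative distance to goal."""
    def evaluate(state: State, goal: State) -> float:
        return -math.inf if state in blocked else -float(manhattan(state, goal))
    return evaluate


if __name__ == "__main__":
    print(plan_next_state((0, 0), (3, 0), grid_policy, make_evaluator(set())))
```

Calling `plan_next_state` repeatedly, each time from the state it returned, yields a path one motion at a time. Raising `k` widens the search around obstacles at the cost of more evaluations.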

A second aspect of the present invention provides a two-arm robot humanoid job planning system, comprising:

a policy network training device; from the current state and the target state of the mechanical arm, the policy network computes the probability of each state the arm may occupy at the next moment and proposes multiple motion schemes;

an evaluation function determination device; the evaluation function is used to evaluate the motion schemes proposed by the policy network;

a tree search device; the tree search device performs a tree search that combines the policy network and the evaluation function to obtain the path planning result for the mechanical arm.

Preferably, the policy network is trained on human motion samples; the evaluation function is used to evaluate the motion schemes proposed by the policy network; the tree search treats each motion state of the mechanical arm as one node of a tree; the motion of the mechanical arm is expanded over multiple steps through the policy network to obtain the nodes of the tree, the expanded leaf nodes are evaluated with the evaluation function, the values are passed back from the leaf nodes toward the root node and the value of each node along the way is updated, and finally the optimal node is selected according to the node values as the path planning result for the mechanical arm.

A third aspect of the present invention provides a double-arm robot comprising two substantially symmetrical mechanical arms, characterized in that the double-arm robot plans the paths for the operation of the two mechanical arms according to the deep-learning-based double-arm robot humanoid operation planning method of the first aspect of the present invention.

With the above technical scheme, the invention achieves the following technical effects.

(1) The invention combines a strategy network based on deep learning, an evaluation function based on a motion environment and tree search to form a novel robot motion planning method.

(2) The policy network is trained with human motions as samples, so the planner draws on the characteristics of human motion and gives the operation planning of the double-arm robot humanoid characteristics.

Drawings

Fig. 1 is a schematic diagram of a two-arm robot humanoid work planning method based on deep learning of the present invention.

Detailed Description

Example 1

As shown in Fig. 1, embodiment 1 of the present invention provides a deep-learning-based humanoid operation planning method for a double-arm robot, which includes the following steps:

In a preferred embodiment, the policy network is trained on human motion samples.

In a preferred embodiment, the human motion samples are obtained by a motion capture device.

In a preferred embodiment, the evaluation function is used to evaluate the cost of moving the mechanical arm from the current state to each candidate state at the next moment.

In a preferred embodiment, the evaluation function is also used to evaluate whether the mechanical arm collides with itself or with the surrounding environment during the movement.

In a preferred embodiment, the tree search treats each motion state of the mechanical arm as one node of a tree; the motion of the mechanical arm is expanded over multiple steps through the policy network to obtain the nodes of the tree, the expanded leaf nodes are evaluated with the evaluation function, the values are passed back from the leaf nodes toward the root node and the value of each node along the way is updated, and finally the optimal node is selected according to the node values as the path planning result for the mechanical arm.

In a preferred embodiment, the step S3 further includes:

Example 2

The embodiment 2 of the invention provides a double-arm robot humanoid operation planning system, which comprises:

In a preferred embodiment, the policy network is trained on human motion samples; the evaluation function is used to evaluate the motion schemes proposed by the policy network; the tree search treats each motion state of the mechanical arm as one node of a tree; the motion of the mechanical arm is expanded over multiple steps through the policy network to obtain the nodes of the tree, the expanded leaf nodes are evaluated with the evaluation function, the values are passed back from the leaf nodes toward the root node and the value of each node along the way is updated, and finally the optimal node is selected according to the node values as the path planning result for the mechanical arm.

Example 3

Embodiment 3 of the present invention provides a double-arm robot comprising two substantially symmetrical mechanical arms, wherein the double-arm robot plans the paths for the operation of the two mechanical arms according to the deep-learning-based double-arm robot humanoid operation planning method of embodiment 1 of the present invention.

Claims

1. A double-arm robot humanoid operation planning method based on deep learning comprises the following steps:

step S1, training a strategy network; the strategy network can calculate the probability of the mechanical arm in each state at the next moment according to the current state and the target state of the mechanical arm, and give out various motion schemes;

step S3, performing tree search by combining the strategy network and the evaluation function to obtain a result of mechanical arm path planning;

the method is characterized in that the strategy network is trained on human motion samples;

the evaluation function is used for evaluating whether the mechanical arm collides with itself or with the surrounding environment during the movement;

the tree search treats each motion state of the mechanical arm as one node of a tree; the motion of the mechanical arm is expanded over multiple steps through the strategy network to obtain the nodes of the tree, the expanded leaf nodes are evaluated with the evaluation function, the values are passed back from the leaf nodes toward the root node and the value of each node along the way is updated, and finally the optimal node is selected according to the node values as the path planning result for the mechanical arm.

2. The deep learning-based two-arm robot humanoid job planning method of claim 1, wherein the human motion samples are acquired through a motion capture device.

3. The deep learning-based two-arm robot humanoid job planning method of claim 1, wherein the evaluation function is used for evaluating the cost of moving the mechanical arm from the current state to each state at the next moment.

4. The method for planning a humanoid job of a double-arm robot based on deep learning according to claim 1, wherein the step S3 further comprises:

5. A two-arm robotic humanoid job planning system comprising:

a policy network training device; the strategy network can calculate the probability of the mechanical arm in each state at the next moment according to the current state and the target state of the mechanical arm, and give out various motion schemes;

an evaluation function determination means; the evaluation function is used for evaluating the motion scheme provided by the strategy network;

tree searching means; the tree searching device can be used for executing tree searching by combining the strategy network and the evaluation function to obtain a result of mechanical arm path planning;

the strategy network is trained on human motion samples; the tree search treats each motion state of the mechanical arm as one node of a tree; the motion of the mechanical arm is expanded over multiple steps through the strategy network to obtain the nodes of the tree, the expanded leaf nodes are evaluated with the evaluation function, the values are passed back from the leaf nodes toward the root node and the value of each node along the way is updated, and finally the optimal node is selected according to the node values as the path planning result for the mechanical arm;

the evaluation function is also used for evaluating whether the mechanical arm collides with itself or with the surrounding environment during the movement.

6. A two-arm robot comprising two substantially symmetrical robotic arms, wherein the two-arm robot performs path planning for the operation of the two substantially symmetrical robotic arms according to the deep learning based two-arm robot humanoid job planning method of any one of claims 1-4.