CN113050640A

CN113050640A - Industrial robot path planning method and system based on generation of countermeasure network

Info

Publication number: CN113050640A
Application number: CN202110289240.8A
Authority: CN
Inventors: 沃天宇; 左易; 郭晓辉; 鲍韦彤; 刘品; 王瑞
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2021-06-29
Anticipated expiration: 2041-03-18
Also published as: CN113050640B

Abstract

The invention provides an industrial robot path planning method based on a generative adversarial network (GAN), using techniques from the field of artificial intelligence processing. The method comprises the following steps: first, information related to the robot model is read and parsed, and the robot model is established; next, a simulation environment for the robot is constructed, and a trajectory planning environment and a data generation environment are built within it; the trajectory planning environment produces a planning result, which is input into the data generation environment and, together with an external scene, forms a data set for neural network training; the data set is input into a machine learning environment that applies a neural network method; and the final output of the machine learning environment is presented through a visualization front end. Compared with the prior art, the method performs better on indexes such as success rate. A system architecture suited to the method is also provided.

Description

Industrial robot path planning method and system based on generation of countermeasure network

Technical Field

The invention relates to the field of artificial intelligence processing, and in particular to an industrial robot path planning method and system based on a generative adversarial network (GAN).

Background

Trajectory planning is a key research topic in robotics. A robot that takes part in production must plan a motion route that is reasonable, obeys the laws of kinematics and is free of collisions. With the development of computer technology, traditional robot path planning algorithms such as A* and RRT have become increasingly mature. However, when the robot has many degrees of freedom and the planning environment is relatively complex, existing trajectory planning algorithms generally take a large amount of time to plan. This is acceptable for repetitive tasks in a factory environment, but traditional algorithms struggle to meet the requirements when the robot is expected to perform more dynamic tasks; tasks such as robot collaboration place higher demands on the planning algorithm. As a result, in real production environments a robot can only repeat a pre-computed set of actions mechanically and cannot cooperate effectively with other robots or with assembly-line workers. Completing real-time trajectory planning and obstacle avoidance within a very short time slice is not achievable with traditional approaches, but deep learning offers a new way to achieve real-time obstacle avoidance for robots.

The invention introduces the generative adversarial network (GAN) structure into the robot trajectory planning task. By learning the distribution of the samples, a GAN can generate samples that conform to the original sample distribution. For the robot trajectory planning task, the invention uses a conditional generative adversarial network (Conditional GAN) so that the planned trajectory changes as the planning environment changes, and uses a spectral normalization generative adversarial network (Spectral Normalization GAN) to ensure stable training.

The first prior art comprises the techniques shown in application No. CN201810602059.6, a motion planning method for a mechanical arm grabbing scattered, stacked pistons based on an improved RRT algorithm; application No. CN201810819848.5, a mobile robot path planning method based on an improved RRT algorithm; and application No. CN201911346837.0, a heuristic RRT mechanical arm motion planning method based on target deviation optimization:

Robot trajectory planning has long been an active problem in robotics. Traditional trajectory planning algorithms are built on different ideas, and many have been proposed. They include basic algorithms such as the sampling-based Rapidly-exploring Random Tree (RRT) algorithm, the Probabilistic Roadmap (PRM) algorithm, which supports multiple queries, and the heuristic A* algorithm, as well as extended algorithms that combine these basic algorithms. Taking the RRT algorithm as an example, its flow is: initialize a spanning tree; sample in free space; use a physics simulation engine to verify whether the sampled point is in collision; if it is not, find the nearest point in the spanning tree and reconnect the nearby points within a certain range; and finally obtain a feasible, relatively short path.
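
The sampling, nearest-point and extension steps of this flow can be illustrated with a minimal sketch. It is not the implementation of any of the cited applications: it assumes a 2D point robot in a 10 x 10 square, circular obstacles, and a collision check on nodes only (edges are not checked, and the reconnection step is omitted). The function name `rrt` and all parameters are hypothetical.

```python
import math
import random


def rrt(start, goal, obstacles, step=0.5, goal_tol=0.5, max_iter=5000, seed=0):
    """Basic RRT for a 2D point robot in the square [0, 10] x [0, 10].

    obstacles is a list of (cx, cy, radius) circles. Only new nodes are
    collision-checked, which is a simplification made for brevity.
    """
    rng = random.Random(seed)
    nodes = [start]      # tree vertices
    parent = {0: None}   # index of each vertex's parent
    for _ in range(max_iter):
        # Sample a random point in the workspace.
        x_rand = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
        # Find the tree vertex nearest to the sample.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x_rand))
        x_near = nodes[i_near]
        d = math.dist(x_near, x_rand)
        if d == 0.0:
            continue
        # Advance at most one step length from x_near toward x_rand.
        t = min(1.0, step / d)
        x_new = (x_near[0] + t * (x_rand[0] - x_near[0]),
                 x_near[1] + t * (x_rand[1] - x_near[1]))
        # Reject the new point if it lies inside any obstacle.
        if any(math.dist(x_new, (cx, cy)) <= r for cx, cy, r in obstacles):
            continue
        nodes.append(x_new)
        parent[len(nodes) - 1] = i_near
        # Stop once the new point is close enough to the goal.
        if math.dist(x_new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


path = rrt((1.0, 1.0), (9.0, 9.0), [(5.0, 5.0, 1.5)])
```

A real manipulator would replace the 2D point with a joint-angle vector and the circle test with a physics-engine collision query, which is where the planning time discussed below comes from.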

The first prior art has the following disadvantages. For modern robots with many degrees of freedom, the RRT algorithm works in the robot's high-dimensional configuration space. In a relatively complex environment, RRT needs tens of seconds to several minutes to obtain a relatively good result. RRT is a probabilistically complete method, and as more points are sampled the planning result approaches the shortest path. However, RRT is not stable: even in the same scene it is affected by the randomness of sampling, so its planning time varies. Because the planning time of traditional algorithms is long, moving obstacles cannot be avoided in real time when the environment changes frequently. In addition, the principle of the RRT algorithm means it cannot be accelerated by parallel methods or the like. Traditional methods therefore cannot plan in real time in a dynamic environment.

The second prior art comprises the techniques shown in application No. CN201910231812.X, a machine-learning-based obstacle avoidance path planning method for industrial robots; application No. CN201810907242.7, a cooperative game path planning method based on a neural network and an artificial potential field; and application No. CN201910800165.X, an intelligent local path planning method for unmanned systems based on a double back-propagation neural network:

Supervised learning has always been the most classical paradigm in machine learning. At the International Conference on Robotics and Automation (ICRA) in 2019, Qureshi et al. proposed the MPNet model, which is based on supervised learning: a large number of trajectories are generated with traditional algorithms such as RRT, and a neural network is then trained by supervised learning with the environment as input and the next action as output. That paper also covers point cloud encoding and related topics. After long training, the neural network can rapidly plan a path in a given scene.

Deep Reinforcement Learning (DRL) is an important approach to solving Markov decision processes (MDPs), and robot control, or indeed the control field as a whole, can be abstracted as an MDP. How to combine deep reinforcement learning with robotics is therefore an important research direction in robotics. Related work covers both direct low-level control and high-level abstract planning. In deep reinforcement learning, the interaction between an agent and the environment forms a Markov decision process: the agent acquires information from the environment, feeds it into a policy network to obtain an action, receives feedback on that action, and a new environment state results. Once a policy that earns high rewards has been trained, the agent can complete the target task. Much work has applied reinforcement learning to the robot trajectory planning task.

The second prior art has disadvantages in two respects. 1. Supervised learning: the greatest difference between the prior-art supervised learning model and the GAN-based model described here is that its randomness is provided only by random discarding (Dropout), so the planning results for the same scene are almost identical. When planning with the supervised learning model fails, a hybrid re-planning method is used directly, that is, the planner falls back to a traditional algorithm. The success rate of this approach cannot be well guaranteed, and planning failures occur frequently in actual production. 2. Reinforcement learning: common reinforcement learning algorithms have several problems with robot trajectory planning. First, the ε-greedy exploration strategy commonly used by reinforcement learning algorithms lets a proportion of actions be explored completely at random, but this randomness can make feasible solutions extremely difficult to obtain when planning a path in a confined environment. Second, if pose information is used as the input during reinforcement learning, its dimensionality is too low, so the neural network has difficulty modeling the scene and is prone to overfitting.

The traditional trajectory planning algorithms are therefore severely limited: they need time to plan in advance and cannot adapt to a dynamically changing environment. A neural-network-based method, once trained, can obtain a feasible path more quickly, which expands the application scenarios of the robot. The robot is then no longer limited to mechanical, repetitive operations and can react quickly in more complex environments, laying a foundation for human-machine collaboration.

Disclosure of Invention

Therefore, the invention first provides an industrial robot path planning method based on a generative adversarial network (GAN), comprising the following steps: first, reading information related to the robot model, then parsing the information and establishing the robot model; next, constructing a simulation environment for the robot and building a trajectory planning environment and a data generation environment within the simulation environment, the trajectory planning environment producing a planning result; finally, inputting the planning result into the data generation environment, where it forms, together with an external scene, a data set for neural network training, inputting the data set into a machine learning environment that applies a neural network method, and outputting the final output of the machine learning environment through a visualization front end;

specifically, the information related to the robot model comprises the parameters of the robot joints and rotating shafts, the Mesh information of the robot body, and information such as mass, center of gravity, inertia matrix and PID control parameters; the information is stored in the standard URDF format, which records the connection relationships and the various parameters of the robot in the form of nested tags;

the simulation environment of the robot performs the following tasks: controlling the pose of the robot, rendering the scene, detecting collisions between the robot and obstacles, and interacting with the neural network; the simulation environment is developed on top of the pyBullet engine, and generating a planning environment comprises: placing obstacles at random positions, designating start and end coordinates within the robot's range of motion, and detecting whether the robot collides with itself;

the data generation environment takes as raw data the output of the trajectory planning environment and random scenes generated by a traditional method, plans with the Informed RRT* algorithm, and preprocesses the results to form the training data set for the neural network in the machine learning environment;

the machine learning environment uses a kinematic network model to process the data, and the output trajectory data are then aggregated to obtain the result of the trajectory planning environment, which serves as the final output result;

the training process of the kinematic network model improves the stability of training by using WGAN, and the optimization objective function of the WGAN is as follows:

and using a gradient penalty method, setting the loss function as:

and using a spectral normalization layer so that the model is more stable while satisfying the Lipschitz condition, wherein the optimization targets of the training process comprise: optimizing the generator parameters according to the loss function of the generator, optimizing the discriminator parameters according to the loss function of the discriminator, and optimizing the generator parameters according to the collision loss function output by the kinematic network.

The Informed RRT* algorithm is implemented as follows. First, the following functions are defined: a sampling function, a collision function, a cost function and an extension function. The process is then: the sampling function randomly generates a robot pose x_rand; the distance function is called to find x_nearest, the point in the spanning tree closest to x_rand; starting from x_nearest, the algorithm advances a fixed step length toward x_rand to obtain the sampling point x_new. After sampling, the collision function provided by the physics engine is called to detect whether a collision occurs. If there is no collision, the distance function is called to compute X_near, the set of points in the spanning tree within the range r_RRT* of x_new. For each point x_near in this set, the extension function computes a route from x_near to x_new, and the route is checked for cost and collision using the cost function; if the requirements are met, the new point is added to the spanning tree, and the spanning tree is then locally optimized within this range. If x_new lies close to the end point, a feasible solution has been found. The cost function of the Informed RRT* algorithm takes the form:

Cost({p₁, q₁}, {p₂, q₂}) = ||p₁ - p₂||₂ + λ * (1 - <q₁, q₂>²).

The kinematic network is a neural network improved on the basis of the GAN. Its input is the rotating shaft (joint) angles of each pose in the planned path; the coordinates of the robot's feature points are obtained through transformation matrices, and the positional relationship between the robot and the obstacles is then calculated. This positional relationship and the WGAN loss function serve as the two optimization targets of the neural network.
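
The idea of turning joint angles into feature-point coordinates and then into a collision term can be sketched as follows. This is an illustration under simplifying assumptions, not the patent's network: the arm is planar, the feature points are taken to be the joint positions and the end effector, obstacles are circles, and the hinge-style penalty with a `margin` parameter is a hypothetical choice. Accumulating the angle along the chain is equivalent to composing the planar link transformation matrices.

```python
import math


def feature_points(joint_angles, link_lengths):
    """Forward kinematics of a planar serial arm.

    Returns the (x, y) position of the base, each joint and the end effector.
    """
    x = y = theta = 0.0
    points = [(x, y)]
    for angle, length in zip(joint_angles, link_lengths):
        theta += angle                  # compose the rotation of this joint
        x += length * math.cos(theta)   # translate along the rotated link
        y += length * math.sin(theta)
        points.append((x, y))
    return points


def collision_loss(path, link_lengths, obstacles, margin=0.05):
    """Sum of penetration depths of all feature points over all poses in a path.

    obstacles is a list of (cx, cy, radius) circles. The loss is zero when every
    feature point is at least `margin` away from every obstacle surface.
    """
    loss = 0.0
    for joint_angles in path:
        for p in feature_points(joint_angles, link_lengths):
            for cx, cy, r in obstacles:
                loss += max(0.0, r + margin - math.dist(p, (cx, cy)))
    return loss


pts = feature_points([0.0, math.pi / 2], [1.0, 1.0])
hit = collision_loss([[0.0, math.pi / 2]], [1.0, 1.0], [(1.0, 1.0, 0.5)], margin=0.0)
```

Every operation here (sine, cosine, addition, a hinge) has a gradient almost everywhere, so the same computation written in an automatic-differentiation framework could pass a collision gradient back to a generator.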

To aggregate the trajectory data, planning is first run several times in parallel to obtain different trajectories, and the trajectories are verified in turn with the physics simulation engine; a trajectory that passes verification is returned as the result. If all attempts fail, a spanning tree is constructed from the sampled points and the search continues with the RRT algorithm, so as to maximize the planning speed and success rate of the GAN-based model.

The visualization front end comprises distribution, processing and 3D display of tasks.

The invention also provides an industrial robot path planning system based on a generative adversarial network (GAN), which has a three-layer architecture suited to the above method: a service environment, a robot environment and a machine learning environment;

specifically, the service environment comprises a visualization front end and a back end connected to the robot environment; the front end exchanges data with the flash engine at the back end through an interaction module, and sends data through the interaction module to a 3D display module for the distribution, processing and 3D display of tasks; the flash engine at the back end receives, through a task management module, the planning result generated by the robot environment and sends it to the front end;

the robot environment comprises a data generation part and a trajectory planning part; the trajectory planning part receives the neural network output of the machine learning environment through a post-processing module, produces the planning result via a path aggregation module and sends it to the back end, takes the data generation part as one of the data sources of the training set, and provides a planning demand unit as the input data of the neural network of the machine learning environment; the data generation part has an RRT trajectory planning unit, receives the planning result, generates data in combination with a scene, and forms the training data set of the machine learning environment after passing through a preprocessing unit;

and the machine learning environment combines the training data set with robot sampling data and sends the training data set to a neural network module for operation processing.

The technical effects to be realized by the invention are as follows:

1. A complete, open-source, multi-platform robot simulation and display system is built; a trajectory planning environment is designed, several classical trajectory planning algorithms are implemented, and improvements are made to the generation speed so that high-quality data can be obtained.

2. A set of GAN models is established and applied to the trajectory planning task, and network structures that handle time series better, together with techniques such as resampling and smoothing, are studied. Network structures such as WGAN-GP and spectral normalization are studied and used to make training more stable. Because the GAN model can output different trajectories, they can be aggregated at the end, further improving indexes such as accuracy. Finally, suitable structures and training parameters are found, and a complete trajectory planning system is formed in combination with modules such as visualization.

3. Within the neural network, a kinematic network is abstracted from the principle of the robot's forward kinematics, and differentiable coordinate transformations are applied between the links of the robot. The robot is sampled and transformed according to the Monte Carlo principle, and its collisions with obstacles are calculated; this collision information can be back-propagated to help optimize the generator network. The specific training process is also studied, and it is proposed that gradient accumulation can be used to train the network better. The kinematic network also provides features for the GAN discriminator, which optimizes the network structure and makes more efficient use of the label information in the CGAN.

Drawings

FIG. 1 is a schematic diagram of the RRT algorithm;

FIG. 2 is a general structure of a GAN network;

FIG. 3 is a diagram of a kinematic network architecture;

FIG. 4 is a trace data aggregation flow diagram;

FIG. 5 is a diagram of simulation environment and visualization environment effects;

FIG. 6 is a system architecture diagram.

Detailed Description

The following is a preferred embodiment of the present invention and is further described with reference to the accompanying drawings, but the present invention is not limited to this embodiment.

The invention provides an industrial robot path planning method based on a generative adversarial network (GAN). The information about the robot comprises the parameters of the robot joints and rotating shafts, the Mesh information of the robot body, and information such as mass, center of gravity, inertia matrix and PID control parameters. A de facto standard format exists for storing this information, the Unified Robot Description Format (URDF); underneath it is an XML file that stores the connection relationships and the various parameters of the robot in the form of nested tags. The invention first parses the content of the robot model file and establishes the robot model.
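
As an illustration of the nested-tag structure, the sketch below parses a tiny URDF document with Python's standard XML parser. The two-link robot, its names and its numeric values are made up for the example, and the helper `parse_urdf` is hypothetical; it reads only a few of the tags a full URDF file contains.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal URDF document: two links connected by one revolute joint.
URDF_TEXT = """<robot name="two_link_arm">
  <link name="base_link"/>
  <link name="upper_arm">
    <inertial>
      <mass value="2.5"/>
    </inertial>
  </link>
  <joint name="shoulder" type="revolute">
    <parent link="base_link"/>
    <child link="upper_arm"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1"/>
  </joint>
</robot>
"""


def parse_urdf(text):
    """Extract the robot name, link names and basic joint data from URDF text."""
    root = ET.fromstring(text)
    links = [link.get("name") for link in root.findall("link")]
    joints = []
    for joint in root.findall("joint"):
        limit = joint.find("limit")
        joints.append({
            "name": joint.get("name"),
            "type": joint.get("type"),
            # The parent/child tags encode the connection relationship.
            "parent": joint.find("parent").get("link"),
            "child": joint.find("child").get("link"),
            # Rotation axis of the joint, as three floats.
            "axis": tuple(float(v) for v in joint.find("axis").get("xyz").split()),
            "lower": float(limit.get("lower")) if limit is not None else None,
            "upper": float(limit.get("upper")) if limit is not None else None,
        })
    return root.get("name"), links, joints


name, links, joints = parse_urdf(URDF_TEXT)
```

The parent and child entries of each joint are enough to rebuild the kinematic chain, which is the "connection relationship" referred to above.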

A simulation environment for the robot is then built. The simulation environment must perform the following tasks: controlling the pose of the robot, rendering the scene, detecting collisions between the robot and obstacles, and interacting with the machine learning module. To meet these requirements, the invention builds on the pyBullet engine through secondary development. The environment is also responsible for generating planning environments, which specifically includes placing obstacles at random positions, designating start and end coordinates within the robot's range of motion, and detecting whether the robot collides with itself.
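
The scene-generation part of this step can be sketched without a physics engine. The example below is an assumption-laden illustration, not the patent's environment: the robot's range of motion is modeled as a sphere of radius `reach` around the origin, obstacles are spheres with radii drawn from an arbitrary range, and the self-collision check, which needs the physics engine, is left out. The function name `random_scene` is hypothetical.

```python
import math
import random


def random_scene(n_obstacles, reach, seed=None):
    """Generate random spherical obstacles plus a start and a goal position.

    Start and goal are sampled inside the reachable sphere and outside every
    obstacle. Self-collision checking is not covered by this sketch.
    """
    rng = random.Random(seed)

    def sample_in_reach():
        # Rejection-sample a point inside the sphere of radius `reach`.
        while True:
            p = tuple(rng.uniform(-reach, reach) for _ in range(3))
            if math.dist(p, (0.0, 0.0, 0.0)) <= reach:
                return p

    # Each obstacle is (center, radius); the radius range is an arbitrary choice.
    obstacles = [(sample_in_reach(), rng.uniform(0.05, 0.15))
                 for _ in range(n_obstacles)]

    def sample_free():
        # Resample until the point is outside all obstacles.
        while True:
            p = sample_in_reach()
            if all(math.dist(p, c) > r for c, r in obstacles):
                return p

    return {"obstacles": obstacles, "start": sample_free(), "goal": sample_free()}


scene = random_scene(5, 1.0, seed=1)
```

In the setting described above, each sampled scene would then be loaded into the simulation engine, which performs the actual collision queries.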

The data generation task is then performed. The machine learning method used in the invention requires results obtained by a traditional method as training data. First, a large number of random scenes are generated. Planning is then carried out in each random scene with the RRT algorithm to obtain a planning result. The planning environments and planning results together serve as the data set for neural network training.

This process uses the Informed RRT* algorithm. First, the following functions are defined: a sampling (Sample) function, a collision (Collision) function, a cost (Cost) function and an extension (Extend) function. The sampling function randomly generates a point x_rand, which in the robot field means randomly generating a pose. The distance function is then called to find x_nearest, the point in the spanning tree closest to x_rand. Starting from x_nearest, the algorithm advances a fixed step length toward x_rand to obtain the sampling point x_new. After sampling, the collision function provided by the physics engine is called first to detect whether a collision occurs. If there is no collision, the distance function is called to compute X_near, the set of points in the spanning tree within the range r_RRT* of x_new. For each point x_near in this set, the extension function first computes a route from x_near to x_new together with the cost of taking that route; if the cost meets the requirements, the new point is added to the spanning tree, and a local optimization (Rewire) of the spanning tree is then performed within this range. If x_new lies close to the end point, a feasible solution has been found. The cost function used in the invention is given below; it represents the cost between two poses, obtained by summing the end position term and the end rotation term in a certain ratio.

Cost({p₁, q₁}, {p₂, q₂}) = ||p₁ - p₂||₂ + λ * (1 - <q₁, q₂>²)
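
A direct transcription of this cost function is shown below. It assumes that p is the end position as a 3D vector and that q is the end rotation expressed as a unit quaternion, which the text does not state explicitly; the function name `pose_cost` and the default value of `lam` are hypothetical.

```python
import math


def pose_cost(p1, q1, p2, q2, lam=1.0):
    """Cost between two end poses: position distance plus a weighted rotation term.

    p1, p2 are positions; q1, q2 are assumed to be unit quaternions. The rotation
    term 1 - <q1, q2>^2 is 0 for identical rotations and 1 for orthogonal
    quaternions.
    """
    position_term = math.dist(p1, p2)
    dot = sum(a * b for a, b in zip(q1, q2))
    return position_term + lam * (1.0 - dot * dot)


same = pose_cost((0, 0, 0), (1, 0, 0, 0), (0, 0, 0), (1, 0, 0, 0))
mixed = pose_cost((0, 0, 0), (1, 0, 0, 0), (3, 4, 0), (0, 1, 0, 0), lam=0.5)
```

Squaring the inner product makes the rotation term insensitive to the sign of the quaternion, which matters because q and -q describe the same rotation.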

A machine learning model is then established. The machine learning model used in the invention is a generative adversarial network model, whose network structure is shown in FIG. 2:

The network is based on a conditional generative adversarial network (CGAN) and comprises two sub-networks, a generator and a discriminator. The generator takes random noise and the planning environment as input and outputs a planning result. The discriminator takes the planning environment and either a WGAN-generated trajectory or an RRT-planned trajectory as input, and outputs a judgment. The network can be trained with the data set generated in the previous step. The invention makes corresponding improvements for the robot trajectory planning task by introducing a new network, the kinematic network, whose structure is shown in FIG. 3:

The input of the kinematic network is the rotating shaft (joint) angles of each pose in the planned path. The coordinates of the robot's feature points are obtained through transformation matrices, from which the positional relationship between the robot and the obstacles is calculated. Feeding the coordinates of the robot's feature points to the discriminator as extracted features effectively improves the generation quality. The positional relationship between the robot and the obstacles and the GAN loss function are the two optimization targets of the neural network. The model uses Wasserstein GAN (WGAN) to improve the stability of training. The optimization objective of WGAN is:
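
The formula itself is not reproduced in this text. For reference, the standard WGAN objective from the literature is given below; the patent's own formula may use different notation, and for a conditional GAN the discriminator and generator would additionally receive the planning environment as input. Here P_r is the distribution of real (RRT-planned) trajectories, P_g is the distribution of generated trajectories, and the maximum is taken over the set of 1-Lipschitz functions.

```latex
\min_{G}\ \max_{D \in \mathcal{D}}\
  \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
  - \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right],
\qquad
\mathcal{D} = \{\, D : \lVert D \rVert_{L} \le 1 \,\}
```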

In WGAN, the discriminator needs to satisfy the Lipschitz condition. To satisfy this condition, the invention uses a gradient penalty method, so the loss function is finally set as:
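
The formula is likewise not reproduced in this text. The standard WGAN-GP discriminator loss from the literature is shown below as a reference; the patent's exact form may differ. P_r and P_g are the real and generated trajectory distributions, the penalty weight is written as λ_GP to keep it distinct from the λ in the pose cost function, and the interpolated sample is drawn uniformly along the line between a real and a generated sample.

```latex
L_D =
  \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
  + \lambda_{\mathrm{GP}}\,
    \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x},\quad \epsilon \sim U[0, 1]
```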

Since the discriminator needs to satisfy the Lipschitz condition, a Batch Normalization layer cannot be used, and without it the weights cannot be kept within a reasonable range. The invention therefore also uses a Spectral Normalization layer, which makes the model more stable while satisfying the Lipschitz condition.
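
The core of spectral normalization is dividing a weight matrix by its largest singular value, which is usually estimated by power iteration. The sketch below shows that computation on a plain nested-list matrix; it is an illustration of the general technique, not the patent's layer, and it assumes a non-zero matrix. The function name `spectral_normalize` and its parameters are hypothetical.

```python
import math
import random


def spectral_normalize(W, n_iter=50, seed=0):
    """Return (W / sigma, sigma), where sigma estimates the largest singular value.

    W is a non-empty, non-zero matrix given as a list of rows. sigma is estimated
    by power iteration on W^T W.
    """
    rng = random.Random(seed)
    rows, cols = len(W), len(W[0])

    def normalize(vec):
        norm = math.sqrt(sum(a * a for a in vec)) or 1.0
        return [a / norm for a in vec]

    # Start from a random direction in the input space.
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(cols)])
    for _ in range(n_iter):
        # u <- W v, v <- W^T u, each renormalized to unit length.
        u = normalize([sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)])
        v = normalize([sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)])
    # sigma = u^T W v approximates the spectral norm of W.
    sigma = sum(u[i] * sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows))
    return [[w / sigma for w in row] for row in W], sigma


W_sn, sigma = spectral_normalize([[3.0, 0.0], [0.0, 1.0]])
```

After normalization the matrix has spectral norm 1, so the linear map it represents is 1-Lipschitz, which is the property the discriminator needs.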

The model is then trained on the training data set obtained in the previous step. There are three optimization targets: the generator parameters are optimized according to the loss function of the generator, the discriminator parameters are optimized according to the loss function of the discriminator, and the generator parameters are optimized according to the collision loss function output by the kinematic network. The training frequency ratio and learning rate of each optimization target are adjusted to obtain a trained generative adversarial network capable of planning trajectories.

Because a generative adversarial network structure is used, the output trajectories need to be aggregated. A neural network trained by supervised learning outputs only the same result for the same scene, whereas the output of a generative adversarial network changes as the random variable changes. If planning is run several times and the results are combined, the final planning success rate improves. The data aggregation flow of FIG. 4 is therefore adopted.

That is, planning is first run several times in parallel to obtain different trajectories. The trajectories are verified in turn with the physics simulation engine, and a trajectory that passes verification is returned as the result. If all attempts fail, a spanning tree is constructed from the sampled points and the search continues with the RRT algorithm, so as to maximize the planning speed and success rate of the GAN-based model.
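
The control flow of this aggregation step can be sketched as below. The generator, the simulation check and the RRT fallback are passed in as callables because their real implementations are outside the scope of the example; all names (`plan_with_aggregation`, `generate`, `is_collision_free`, `rrt_fallback`) and the default of eight candidates are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_with_aggregation(generate, is_collision_free, rrt_fallback, scene,
                          n_candidates=8):
    """Generate several candidate trajectories, return the first valid one,
    and fall back to an RRT search seeded with the generated points otherwise.

    generate(scene, seed) returns a trajectory (a list of poses);
    is_collision_free(scene, trajectory) returns True or False;
    rrt_fallback(scene, seed_points) returns a trajectory or None.
    """
    # Plan several times in parallel, one noise seed per candidate.
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: generate(scene, s),
                                   range(n_candidates)))
    # Verify the candidates in order and return the first that passes.
    for trajectory in candidates:
        if is_collision_free(scene, trajectory):
            return trajectory, "generator"
    # All candidates failed: reuse their poses as seed points for a tree search.
    seed_points = [pose for trajectory in candidates for pose in trajectory]
    return rrt_fallback(scene, seed_points), "rrt_fallback"


# Stub components used only to exercise the control flow.
result = plan_with_aggregation(
    generate=lambda scene, seed: [seed, seed + 1],
    is_collision_free=lambda scene, traj: traj[0] == 3,
    rrt_fallback=lambda scene, pts: len(pts),
    scene=None,
)
```

Returning the source of the result alongside the trajectory makes it easy to measure how often the fallback is needed.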

A browser-based front-end visualization system is built, covering task distribution, processing and 3D display. The generative adversarial network is exposed as a back-end network service, finally yielding an efficient real-time trajectory planning system. FIG. 5 shows the result:

Through the browser front end, the user can interact with the GAN-based trajectory planning model and quickly obtain the planning result.

The final system architecture is shown in FIG. 6.

Claims

1. An industrial robot path planning method based on a generation countermeasure network is characterized in that: the method comprises the following steps: firstly, reading information related to a model of a robot, then analyzing the information content and establishing the robot model, then constructing a simulation environment of the robot, constructing a track planning environment and a data generation environment in the simulation environment, obtaining a planning result by the track planning environment, finally inputting the planning result into the data generation environment, forming a data set for neural network training together with an external scene, inputting the data set into a machine learning environment applying a neural network method, and outputting the final output result of the machine learning environment through a visual front end;

and using a gradient penalty method, setting the loss function as:

2. An industrial robot path planning method based on generation of a countermeasure network according to claim 1, characterized in that: the Informed RRT* algorithm is implemented as follows: first, the following functions are defined: a sampling function, a collision function, a cost function and an extension function; the process is then: the sampling function randomly generates a robot pose x_rand; the distance function is called to find x_nearest, the point in the spanning tree closest to x_rand; starting from x_nearest, a fixed step length is advanced toward x_rand to obtain the sampling point x_new; after sampling, the collision function provided by the physics engine is called to detect whether a collision occurs; if there is no collision, the distance function is called to compute X_near, the set of points in the spanning tree within the range r_RRT* of x_new; for each point x_near in this set, the extension function computes a route from x_near to x_new, and the route is checked for cost and collision using the cost function; if the requirements are met, the new point is added to the spanning tree, and the spanning tree is then locally optimized within this range; if x_new lies close to the end point, a feasible solution has been found; the cost function of the Informed RRT* algorithm takes the form:

Cost({p₁, q₁}, {p₂, q₂}) = ||p₁ - p₂||₂ + λ * (1 - <q₁, q₂>²).

3. An industrial robot path planning method based on generation of a countermeasure network according to claim 2, characterized in that: the kinematic network is a neural network improved on the basis of the GAN; its input is the rotating shaft angles of each pose in the planned path, the coordinates of the feature points of the robot are obtained through transformation matrices, the positional relationship between the robot and the obstacles is then calculated, and this positional relationship and the loss function of the WGAN are used as the two optimization targets of the neural network.

4. An industrial robot path planning method based on generation of a countermeasure network according to claim 3, characterized in that: to aggregate the trajectory data, planning is first run several times in parallel to obtain different trajectories, the trajectories are verified in turn with the physics simulation engine, and a trajectory that passes verification is returned as the result; if all attempts fail, a spanning tree is constructed from the sampled points and the search continues with the RRT algorithm, so as to maximize the planning speed and success rate of the model based on the generation countermeasure network.

5. An industrial robot path planning method based on generation of a countermeasure network according to claim 4, characterized in that: the visualization front end comprises distribution, processing and 3D display of tasks.

6. An industrial robot path planning system based on generation of a countermeasure network, characterized in that: matching an industrial robot path planning method based on generation of a countermeasure network according to any one of claims 1-5, dividing the system into three-layer architecture of a service environment, a robot environment and a machine learning environment;