CN116330283A - Method for grabbing target object by mechanical arm in dense scene - Google Patents

Method for grabbing target object by mechanical arm in dense scene

Info

Publication number
CN116330283A
CN116330283A CN202310301614.2A
Authority
CN
China
Prior art keywords
target object
action
value
network
mechanical arm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310301614.2A
Other languages
Chinese (zh)
Inventor
沈捷
曹恺
李鑫
王莉
张盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN202310301614.2A priority Critical patent/CN116330283A/en
Publication of CN116330283A publication Critical patent/CN116330283A/en
Pending legal-status Critical Current

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1602 - Programme controls characterised by the control system, structure, architecture
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1628 - Programme controls characterised by the control loop
    • B25J9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for grabbing a target object by a mechanical arm in a dense scene, which comprises the following steps: building a scene similar to the real environment in the V-rep dynamic simulation software; constructing, in the simulation environment, a push-grasp collaborative policy network with an encoder-decoder structure, the whole structure consisting of the dense convolutional neural network DenseNet121 and two parallel fully convolutional neural networks; building a real, densely stacked complex scene, transplanting the push-grasp collaborative network trained at the simulation end to the physical platform, and letting the mechanical arm make action decisions to grab the target object in the real environment. The method trains the push-grasp collaborative policy network by deep reinforcement learning, continuously trying actions in the working environment to obtain reward values; the policy network autonomously decides the optimal action according to the occlusion state of the target object and forms an effective action sequence to complete the target-grasping task, which improves the success rate of grabbing the target object in dense scenes.

Description

Method for grabbing target object by mechanical arm in dense scene
Technical Field
The invention relates to the technical field of robotics, and in particular to a method for grabbing a target object by a mechanical arm in a dense scene.
Background
With the industrialization of artificial intelligence, the mechanical arm plays an increasingly important role and is an essential component for completing complex tasks. At its core, the mechanical arm provides one key capability: grasping. Traditional teaching methods are simple and easy to operate, but they restrict the mechanical arm to acting in highly repetitive environments and provide no adaptive grasping ability. Deep-learning object detection methods locate the target object in two-dimensional space and then predict its actual three-dimensional pose with correspondence-point-based, template-based or voting-based methods, after which the mechanical arm grasps the object at an arbitrary position; however, these methods require the object model to be known in advance, and grasping can only be carried out in structured, simple environments. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning to produce a decision network suited to target-grasping tasks: an action close to the optimal decision is computed directly from the input image, which improves the grasping success rate in different scenes to a certain extent. However, in environments where objects are densely stacked, the target object is easily blocked by other objects, and the mechanical arm still cannot find a suitable grasping pose, so the final grasp fails. Push-grasp systems based on deep reinforcement learning separate objects in a dense scene with pushing actions and then grasp the separated objects, solving the grasping problem in dense scenes through the synergy between pushing and grasping. However, such methods grasp arbitrary objects in the dense scene one by one rather than a designated target object, and their reward functions fail to evaluate the effectiveness of the pushing and grasping actions, so the chosen "push" and "grasp" actions are sometimes not very reasonable.
Therefore, how to design a reasonable push-grasp collaborative policy network so that the mechanical arm can successfully grab a designated target object in a dense scene is a problem worth studying.
Disclosure of Invention
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene. A push-grasp collaborative policy network oriented to the target object is constructed on the basis of deep reinforcement learning, and reward functions are designed for the pushing and grasping actions on the target object: they judge whether a pushing action effectively separates the target object from other objects and whether a grasping action successfully grabs the target object. The push-grasp collaborative policy network is trained on the basis of a Q-learning algorithm by continuously trying actions in the working environment and collecting reward values. The method reasonably evaluates the effectiveness of pushing and grasping actions, and completes accurate grasping of the target object in a dense scene through the cooperation of the pushing and grasping actions.
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene, which comprises the following main steps:
Step S1: building a dense, cluttered object environment for the mechanical arm to work in within a V-rep simulation environment;
Step S2: constructing an end-to-end push-grasp collaborative policy network based on deep reinforcement learning in the simulation environment, adopting an encoder-decoder structure;
Step S3: in the simulation environment built in step S1, training the push-grasp collaborative policy network on the basis of a Q-learning algorithm by continuously trying actions in the working environment and obtaining reward values; if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p; if the target object is not severely occluded, training the push and grasp networks of the collaborative policy network in parallel;
Step S4: building a real, densely stacked complex scene, transplanting the push-grasp collaborative network trained at the simulation end to the physical platform, and having the mechanical arm make action decisions to grab the target object in the real environment.
Optionally, the encoder-decoder structure of step S2 specifically includes the following steps:
Step S2-1: constructing the encoder with the dense convolutional neural network DenseNet121 to extract the potential feature maps used to compute the mechanical arm's actions;
Step S2-2: constructing the decoder with two parallel fully convolutional neural networks, namely the pushing network φ_p and the grasping network φ_g.
Optionally, training the push-grasp collaborative policy network in step S3 specifically includes the following steps:
Step S3-1: capturing an RGB image I_c and a depth image I_d of the workspace with a depth camera, segmenting the target to be grasped in the RGB image I_c to form a target object mask image I_m, processing the three kinds of image information and fusing them into a feature map H_cdm;
Step S3-2: rotating the feature map H_cdm generated in step S3-1 clockwise by 2π/N at a time to obtain N rotated feature maps, and passing the N feature maps in turn to the encoder constructed in step S2-1 to extract the N potential feature maps f_t^1, ..., f_t^N at time t;
Step S3-3: if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p in turn, the pushing Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t; if the target object is not severely occluded, training the push and grasp networks in parallel: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p and the grasping network φ_g in turn, the pushing Q-value matrices and the grasping Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t;
Step S3-4: the mechanical arm executes the optimal action A_t and the working scene changes; steps S3-1 and S3-2 are executed again to obtain the N potential feature maps f_{t+1}^1, ..., f_{t+1}^N at time t+1;
Step S3-5: according to the state S_{t+1} after the action, computing the reward value of the mechanical arm's action A_t with a piecewise reward function R_t, thereby evaluating whether action A_t contributes positively or negatively to completing the final task;
Step S3-6: storing the interaction quadruple (S_t, A_t, S_{t+1}, R_t) in an experience pool;
Step S3-7: computing the target value from the reward obtained in step S3-5, y_t = R_t + γ·max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor;
Step S3-8: training the network by minimizing the Huber loss L_t, where L_t = (y_t - Q_t)²/2 if |y_t - Q_t| < 1 and L_t = |y_t - Q_t| - 1/2 otherwise; Q_t is the value predicted by the network at time t, y_t is the true value at time t traced back from the Q value at time t+1, and the difference between the two serves as the loss;
Step S3-9: iteratively updating the push-grasp collaborative policy network with the Huber loss, stopping training once the success-rate curve settles into stable oscillation, and saving the latest parameters of the push-grasp collaborative policy network.
Optionally, generating the feature map H_cdm in step S3-1 specifically includes the following steps:
Step S3-1-1: capturing the predefined workspace with a camera fixed at a set position outside the mechanical arm, and acquiring the RGB color image I_c and the depth image I_d of the space;
Step S3-1-2: applying the pre-trained segmentation network PoseCNN to the RGB color image I_c to segment the target to be grasped, setting the pixels of the target object to 1 and the remaining pixels to 0, and generating the binarized mask image I_m of the target object;
Step S3-1-3: judging the occlusion state of the target object from the binarized mask image I_m generated in step S3-1-2 by counting the number of pixels in the target object region; if the number of pixels is below a threshold c, the target object is currently severely occluded; otherwise, the target object is not severely occluded;
Step S3-1-4: with the extrinsic parameters of the camera known, orthographically projecting the RGB color image I_c, the depth image I_d and the mask image I_m along the gravity direction into the top-down color heightmap H_c, depth heightmap H_d and target object mask heightmap H_m;
Step S3-1-5: convolving the color heightmap H_c, the depth heightmap H_d and the target object mask heightmap H_m separately to extract feature maps of the same resolution, and concatenating them channel-wise to form the feature map H_cdm.
Optionally, selecting the optimal decision action A_t in step S3-3 specifically includes the following steps:
Step S3-3-1: feeding the N potential feature maps f_t^1, ..., f_t^N extracted in step S3-2 into the pushing network φ_p in turn and computing the Q-value matrices of the N pushing actions at time t, Q_p^t, the Q value of a "push" acting at pixel coordinates (x, y) of the i-th potential feature map at time t being Q_p^t(i, x, y); if the target object is not severely occluded, at the same time feeding the N potential feature maps into the grasping network φ_g in turn and computing the Q-value matrices of the N grasping actions, Q_g^t, the Q value of a "grasp" acting at pixel coordinates (x, y) of the i-th potential feature map being Q_g^t(i, x, y);
Step S3-3-2: if the target object is severely occluded, collecting the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max_{i,x,y} Q_p^t(i, x, y), i = 1, 2, ..., N; if the target object is not severely occluded, collecting the pushing and grasping Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max(max_{i,x,y} Q_p^t(i, x, y), max_{i,x,y} Q_g^t(i, x, y));
Step S3-3-3: the action corresponding to the maximum Q value Q_max is the optimal action A_t of the mechanical arm at time t, A = <m, d, θ>, where m denotes the action type of the mechanical arm, m ∈ {p, g}, p denotes the "push" action, g denotes the "grasp" action, d = (x, y, z) denotes the three-dimensional coordinates of the action point of the mechanical arm, and θ denotes the rotation angle of the end of the mechanical arm about the z axis; if Q_max^t equals max_{i,x,y} Q_p^t(i, x, y), the mechanical arm performs a "push", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_p^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N; conversely, if Q_max^t equals max_{i,x,y} Q_g^t(i, x, y), the mechanical arm performs a "grasp", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_g^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N.
Optionally, computing the reward value with the piecewise reward function R_t in step S3-5 includes the following steps:
Step S3-5-1: enlarging the target object mask image I_m by interpolation (to twice its original size) to obtain I_m'; finding the centroid coordinates of the target object region pixels of I_m and I_m' from their image moments, mapping the target object region pixels of I_m' onto I_m so that the two centroid coordinates coincide, and thereby generating the target reward region R_area;
Step S3-5-2: calculating the area ratio of non-target objects within R_area, P_occupy = (R_area - R_no_object - R_target_area) / R_area, where R_no_object is the area within the target reward region R_area where no object is present and R_target_area is the area of the target object within R_area;
Step S3-5-3: presetting the rewards: when the action corresponding to the maximum Q value is a "push" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the non-target area ratio P_occupy after the push is smaller than before the push, the "push" at time t has effectively separated the target object from the other objects and the reward value R_t = 0.5 is given; conversely, if P_occupy after the push is not smaller than before the push, the "push" at time t has failed to effectively separate the target object from the other objects and the reward value R_t = 0.25 is given;
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is a "grasp" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the mechanical arm successfully grasps the target object the reward value R_t = 1 is given; otherwise the mechanical arm has not grasped the target object and the reward value R_t = 0.5 is given;
Step S3-5-5: all other cases are regarded as push failures or grasp failures, and the reward value is set to 0.
Optionally, grasping in the real densely stacked complex scene in step S4 specifically includes the following steps:
Step S4-1: performing hand-eye calibration with a calibration board and the depth camera, obtaining the intrinsic and extrinsic parameters of the camera, and obtaining the transformation matrix M between two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z);
Step S4-2: the trained push-grasp collaborative policy network outputs the two-dimensional pixel coordinates (x*, y*) with the maximum Q value, which, combined with the transformation matrix M, yield the optimal action A_t, namely the action type of the mechanical arm, the three-dimensional coordinates (x*, y*, z*) of the action point in actual space, and the rotation angle θ of the end of the mechanical arm about the z axis;
Step S4-3: controlling the end effector of the mechanical arm to rotate to the angle θ of the feature map with the maximum Q value, solving with the mechanical arm's built-in IK solver the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and driving each joint of the mechanical arm along its trajectory to execute the optimal action A_t;
Step S4-4: repeating steps S4-1 to S4-3 several times to grasp the target object in the dense scene.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of the actual grasping procedure of the method of the present invention;
FIG. 3 is a view of the scene for grasping a target object in a dense scene in the method of the present invention;
FIG. 4 is a diagram of the push-grasp collaborative network architecture for grasping a target object in the method of the present invention;
FIG. 5 shows the color map, depth map and target mask map in the simulation environment of the method of the present invention;
FIG. 6 shows the color heightmap, depth heightmap and target mask heightmap in the simulation environment of the method of the present invention;
FIG. 7 is a heat map over the heightmap for the pushing action of the method of the present invention;
FIG. 8 is a heat map over the heightmap for the grasping action of the method of the present invention;
FIG. 9 is a diagram of the optimal actions in a dense scene of the method of the present invention;
FIG. 10 is a diagram of an example of the reward of the method of the present invention;
FIG. 11 is a graph of the success rate of grasping the target object in a dense scene of the method of the present invention.
Detailed Description
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene. The method is described in detail and completely below with reference to the drawings and to the technical approach in the embodiments of the invention. The specific description is as follows:
Fig. 1 is a flow chart of the method of the present invention and fig. 2 is a flow chart of the actual grasping procedure; figs. 1 and 2 contain all the steps of the method. The steps are specifically as follows:
Step S1: build a dense, cluttered object environment for the mechanical arm to work in within the V-rep simulation environment.
According to the trial-and-error principle of deep reinforcement learning, a large amount of experience data must be collected through interaction with the environment for the policy network to learn from, and collecting such data in a real environment is extremely expensive. To solve this problem, a scene highly similar to reality is built with the simulation software V-rep for the simulation experiments, so that data can be collected efficiently and the noise that simulation-to-reality transfer introduces into the neural network output is reduced as much as possible.
The mechanical arm and its joint configuration file, the simulation objects, the D435i camera, the ground, the table and other .obj instance files are imported into the V-rep simulation software. A three-dimensional region of 0.448 x 0.4 that limits object positions is defined within the predefined workspace of the mechanical arm; small wooden blocks are dropped at random into this space to simulate a densely cluttered object scene, in which the mechanical arm collects data and trains the policy network. The experimental platform environment is shown in fig. 3.
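For illustration only, a minimal Python sketch of such a randomized block drop is given below, using the legacy V-rep/CoppeliaSim remote API bindings; the model path, workspace bounds and block count are assumptions and not part of the original disclosure.

```python
import random
import vrep  # legacy V-rep remote API Python binding (remoteApi library required)

client = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
assert client != -1, 'could not connect to the V-rep remote API server'
vrep.simxStartSimulation(client, vrep.simx_opmode_blocking)

# assumed workspace bounds (in metres) and number of blocks
X_RANGE, Y_RANGE, DROP_HEIGHT, NUM_BLOCKS = (-0.2, 0.2), (-0.2, 0.2), 0.15, 10

for _ in range(NUM_BLOCKS):
    # load a small block model (path is hypothetical) and drop it at a random spot;
    # the dynamics engine lets the blocks settle into a dense pile
    _, block = vrep.simxLoadModel(client, 'models/blocks/cube.ttm', 0,
                                  vrep.simx_opmode_blocking)
    position = [random.uniform(*X_RANGE), random.uniform(*Y_RANGE), DROP_HEIGHT]
    vrep.simxSetObjectPosition(client, block, -1, position,
                               vrep.simx_opmode_blocking)
```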
Step S2: the pushing and grabbing collaborative strategy network based on the deep reinforcement learning from end to end is constructed in a simulation environment, and an encoder-decoder structure is adopted, wherein the pushing and grabbing collaborative strategy network framework is shown in fig. 4, and specifically comprises the following steps:
step S2-1: and constructing an encoder by adopting a dense convolutional neural network DenseNet121, and extracting a potential characteristic diagram for calculating the action of the mechanical arm.
Step S2-2: adopting two parallel full convolution neural networks to construct decoder, respectively push network
φ_p and the grasping network φ_g. The pushing network φ_p and the grasping network φ_g share the same structure, consisting of a Conv2d convolution layer, a ReLU activation layer and a BatchNorm2d normalization layer. Each outputs a Q-value matrix of size 1 × 20 × 20, which is upsampled by bilinear interpolation and mapped to an action Q-value map of size 1 × 224 × 224.
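As a concrete illustration, the following PyTorch sketch shows one way the encoder-decoder described above could be assembled: a DenseNet121 encoder shared by two parallel fully convolutional heads whose coarse Q maps are upsampled back to the input resolution by bilinear interpolation. This is a minimal sketch under assumptions (five fused input channels, head widths, 224 × 224 inputs), not the patent's exact implementation.

```python
import torch.nn as nn
import torchvision

class QHead(nn.Module):
    """One fully convolutional decoder head: Conv2d + BatchNorm2d + ReLU + 1x1 Conv."""
    def __init__(self, in_channels: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),        # one Q value per pixel
        )

    def forward(self, x):
        return self.net(x)

class PushGraspNet(nn.Module):
    """DenseNet121 encoder shared by a pushing head and a grasping head."""
    def __init__(self, in_channels: int = 5):       # channels of the fused heightmap
        super().__init__()
        backbone = torchvision.models.densenet121()
        # widen the stem so the encoder accepts the fused heightmap H_cdm
        backbone.features.conv0 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                            stride=2, padding=3, bias=False)
        self.encoder = backbone.features             # outputs (B, 1024, H/32, W/32)
        self.push_head = QHead()
        self.grasp_head = QHead()

    def forward(self, x):                            # x: (B, in_channels, 224, 224)
        feat = self.encoder(x)
        q_push = self.push_head(feat)
        q_grasp = self.grasp_head(feat)
        # upsample the coarse Q maps back to the input resolution
        size = x.shape[-2:]
        q_push = nn.functional.interpolate(q_push, size=size, mode='bilinear',
                                           align_corners=False)
        q_grasp = nn.functional.interpolate(q_grasp, size=size, mode='bilinear',
                                            align_corners=False)
        return q_push, q_grasp
```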
Step S3: in the simulation environment constructed in the step S1, continuously trying actions in the working environment to obtain a reward value based on a Q learning algorithm to train a pushing and grabbing collaborative strategy network, and shielding the grabbing network if the target object is seriously shielded
φ_g and only the pushing network φ_p is trained; if the target object is not severely occluded, the push and grasp networks of the collaborative policy network are trained in parallel. The specific steps are as follows:
step S3-1: capturing RGB image I within a workspace with a depth camera c And depth image I d I in RGB image d Dividing the target to be grabbed to form a target object mask image I m Processing three kinds of image information, and fusing to generate a characteristic diagram H cdm The method specifically comprises the following steps:
step S3-1-1: capturing a predefined working space by arranging a camera with a fixed point position outside the mechanical arm, and acquiring RGB color images I in the space c And Depth image I d
Step S3-1-2: application of pre-trained segmentation network PoseCNN to RGB color image I c Dividing the target to be grabbed, setting the pixel of the target object to be 1, setting the rest pixels to be 0, and generating a binarization mask image I of the target object m The visual information is shown in fig. 5.
Step S3-1-3: generating a binarized mask image I of the target object according to step S3-1-2 m Judging the shielding state of the target object, counting the number of pixels in the target object area, and if the number of pixels is lower than a threshold value c, indicating that the target object is seriously shielded currently; in contrast, the target object is not severely occluded.
Step S3-1-4: knowing the external parameters of the camera, RGB color image I c Depth image I d And mask image I m Orthographic projection conversion in gravity direction into color height diagram H from top to bottom view angle c Depth-height map H d And target object mask height map H m The visual information is shown in fig. 6.
Step S3-1-5: will color height map H c Depth-height map H d And target object mask height map H m The height maps are respectively convolved to extract feature maps with the same resolution, and the feature maps H are spliced from top to bottom according to the channel layers cdm
Step S3-2: generating a characteristic diagram H by the step S3-1-5 cdm By rotating clockwise in turn
2π/N at a time; N rotated feature maps are obtained and passed in turn to the encoder constructed in step S2-1 to extract the N potential feature maps at time t, f_t^1, ..., f_t^N.
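A short sketch of this rotation step is given below, assuming N = 16 orientations and using torchvision for the image rotation:

```python
import torch
import torchvision.transforms.functional as TF

def rotated_copies(h_cdm: torch.Tensor, n_rotations: int = 16) -> list:
    """h_cdm: fused heightmap of shape (C, H, W). Returns N copies, the i-th rotated
    clockwise by i * 360 / N degrees, ready to be passed through the encoder."""
    copies = []
    for i in range(n_rotations):
        angle = -i * 360.0 / n_rotations      # negative angle = clockwise rotation
        copies.append(TF.rotate(h_cdm, angle))
    return copies
```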
Step S3-3: if the target object is seriously blocked, shielding the grabbing network
φ_g and only the pushing network φ_p is trained: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p in turn, the pushing Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t. If the target object is not severely occluded, the push and grasp networks are trained in parallel: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p and the grasping network φ_g in turn, the pushing Q-value matrices and the grasping Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t. The visual information is shown in fig. 7 and fig. 8. The specific steps are as follows:
step S3-3-1: n potential feature maps extracted in the step S3-2
are fed in turn into the pushing network φ_p and the Q-value matrices of the N pushing actions at time t, Q_p^t, are computed; the Q value of a "push" acting at pixel coordinates (x, y) of the i-th potential feature map at time t is Q_p^t(i, x, y). If the target object is not severely occluded, the N potential feature maps are at the same time fed in turn into the grasping network φ_g and the Q-value matrices of the N grasping actions, Q_g^t, are computed; the Q value of a "grasp" acting at pixel coordinates (x, y) of the i-th potential feature map is Q_g^t(i, x, y).
Step S3-3-2: if the target object is severely occluded, the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 are collected and the maximum Q value at time t is found, Q_max^t = max_{i,x,y} Q_p^t(i, x, y), i = 1, 2, ..., N; if the target object is not severely occluded, the pushing and grasping Q-value matrices at all rotation angles at time t from step S3-3-1 are collected and the maximum Q value at time t is found, Q_max^t = max(max_{i,x,y} Q_p^t(i, x, y), max_{i,x,y} Q_g^t(i, x, y)).
Step S3-3-3: at maximum Q value Q max The corresponding action is the optimal action A of the mechanical arm at the moment t t ,A=<m,d,θ>Where m represents the motion type of the mechanical arm, m e { p, g }, p represents the "pushing" motion, g represents the "grabbing" motion, d= (x, y, z) represents the three-dimensional coordinates of the motion point of the mechanical arm, and θ represents the angle of rotation of the end of the mechanical arm around the z axis. If it is
Q_max^t equals max_{i,x,y} Q_p^t(i, x, y), the mechanical arm performs a "push": the three-dimensional coordinates d of the action point are (x*, y*, z*), where (i*, x*, y*) = argmax_{i,x,y} Q_p^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N. Conversely, if Q_max^t equals max_{i,x,y} Q_g^t(i, x, y), the mechanical arm performs a "grasp": the three-dimensional coordinates d of the action point are (x*, y*, z*), where (i*, x*, y*) = argmax_{i,x,y} Q_g^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N.
The optimal action A_t is defined as follows. Push: the parallel jaws at the end of the mechanical arm are closed and pushed 5 cm from right to left along a straight line in the rotation direction of the heightmap. Grasp: the parallel jaws at the end of the mechanical arm are opened, lowered 3 cm toward the depth of the optimal action point, and closed. The visual information is shown in fig. 9; the circle in the left image is the optimal push action point and the circle in the right image is the optimal grasp action point.
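The selection of the optimal action across the two Q-map stacks (steps S3-3-1 to S3-3-3) can be sketched as follows; the function name, the depth lookup and N = 16 are illustrative assumptions.

```python
from typing import Optional, Tuple
import numpy as np

def select_action(q_push: np.ndarray, q_grasp: Optional[np.ndarray],
                  depth_heightmap: np.ndarray,
                  n_rotations: int = 16) -> Tuple[str, tuple, float]:
    """q_push / q_grasp: (N, H, W) Q-value maps for the N rotations; q_grasp is None
    when the target is severely occluded and the grasping network is masked."""
    candidates = {'p': q_push}
    if q_grasp is not None:
        candidates['g'] = q_grasp

    # action type whose Q-map stack contains the global maximum Q value
    m, q_map = max(candidates.items(), key=lambda kv: kv[1].max())
    i_star, y_star, x_star = np.unravel_index(q_map.argmax(), q_map.shape)

    theta = i_star * 2.0 * np.pi / n_rotations        # end-effector rotation about z
    z_star = float(depth_heightmap[y_star, x_star])   # depth value at the best pixel
    return m, (int(x_star), int(y_star), z_star), theta
```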
Step S3-4: the mechanical arm executes the optimal action A t The working scene changes, and the steps S3-1 and S3-2 are re-executed to acquire N potential feature maps at the time t+1
f_{t+1}^1, ..., f_{t+1}^N.
Step S3-5: according to the state S after the action t+1 Using a piecewise reward function R c Action A of the computing mechanical arm c After the reward value, evaluate action A t The visual information of affirmative or negative prompt to complete the final task is shown in fig. 10, and specifically includes the following steps:
step S3-5-1: masking image I of target object by interpolation m Amplified to one time of the original I m ' image moment finding I m And I m Centroid coordinates of' target object region pixels, will I m ' target object region pixel mapping to I m In the target object area, the barycenter coordinates of the target object area and the barycenter coordinates are ensured to be coincident, and a target rewarding area R is generated area
Step S3-5-2: calculating R area Area ratio of internal non-target object
P_occupy = (R_area - R_no_object - R_target_area) / R_area, where R_no_object is the area within the target reward region R_area where no object is present and R_target_area is the area of the target object within R_area.
Step S3-5-3: preset the rewards. When the action corresponding to the maximum Q value is a "push" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the non-target area ratio P_occupy after the push is smaller than before the push, the "push" at time t has effectively separated the target object from the other objects and the reward value R_t = 0.5 is given; otherwise, if P_occupy after the push is not smaller than before the push, the "push" at time t has failed to effectively separate the target object from the other objects and the reward value R_t = 0.25 is given.
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is a "grasp" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the mechanical arm successfully grasps the target object the reward value R_t = 1 is given; otherwise the mechanical arm has not grasped the target object and the reward value R_t = 0.5 is given.
Step S3-5-5: all other cases are regarded as push failures or grasp failures, and the reward value is set to 0.
Step S3-6: storing interaction data quadruples (S t ,A t ,S t+1 ,R t ) To an experience pool.
Step S3-7: calculating an objective function according to the prize value calculated in step S3-5
y_t = R_t + γ·max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor.
Step S3-8: network training with minimized Huber loss function:
L_t = (y_t - Q_t)²/2 if |y_t - Q_t| < 1, and L_t = |y_t - Q_t| - 1/2 otherwise, where Q_t is the value predicted by the network at time t and y_t is the true value at time t traced back from the Q value at time t+1; the difference between the two serves as the loss. During gradient back-propagation the loss of all other pixels is 0, and only the loss at the pixel of the selected optimal action is computed. When the absolute error is smaller than 1, the MSE form of the loss is used, so that the gradient decreases gradually as the loss approaches its minimum and the minimum is approached more accurately; in all other cases the MAE form is used, whose gradient remains large and speeds up the training of the model; combining MSE with MAE in this way forms the Huber loss. The gradients are applied with a stochastic optimization method with adaptive momentum and weight decay, which accumulates the history of past gradients to accelerate convergence. The specific update is
m_{t+1} = β_1·m_t + (1 - β_1)·g_t, v_{t+1} = β_2·v_t + (1 - β_2)·g_t²,
m'_{t+1} = m_{t+1}/(1 - β_1^{t+1}), v'_{t+1} = v_{t+1}/(1 - β_2^{t+1}),
w_{t+1} = w_t - a·(m'_{t+1}/(√v'_{t+1} + ε) + λ·w_t),
where g_t is the gradient of the network parameters w_t, m_{t+1} is the 1st-order moment estimate of the gradient and β_1 its momentum coefficient, v_{t+1} is the 2nd-order moment estimate of the gradient and β_2 its momentum coefficient, m'_{t+1} and v'_{t+1} are the bias-corrected 1st- and 2nd-order moment estimates, and ε is a small constant for numerical stability. a is the learning rate, which controls the size and speed of the decision network's convergence and defaults to the constant 0.0001. λ is the weight decay coefficient, which prevents the network from over-fitting and defaults to the constant 0.00002.
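A minimal training-step sketch tying steps S3-6 to S3-8 together is given below: transitions go into an experience pool, the target y_t = R_t + γ·max Q at time t+1 is formed, and the Huber loss at the selected action pixel is minimized with Adam plus weight decay (AdamW). The learning rate 0.0001 and weight decay 0.00002 follow the text; the buffer size, discount factor and surrounding names are assumptions.

```python
from collections import deque
import torch

model = PushGraspNet()              # encoder-decoder network sketched under step S2-2 above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=2e-5)
huber = torch.nn.SmoothL1Loss()     # Huber loss with delta = 1 (MSE near 0, MAE far away)
replay_buffer = deque(maxlen=10000) # experience pool of (S_t, A_t, S_t+1, R_t) quadruples

def td_target(reward: float, q_next_max: float, gamma: float = 0.5) -> torch.Tensor:
    """Target value y_t traced back from the maximum Q value at time t+1."""
    return torch.tensor(reward + gamma * q_next_max)

def train_step(q_map: torch.Tensor, best_pixel, target_value: torch.Tensor) -> float:
    """q_map: (H, W) Q map produced by the network for the executed rotation;
    best_pixel: (y*, x*). Only the optimal-action pixel contributes to the loss."""
    predicted = q_map[best_pixel]
    loss = huber(predicted, target_value)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```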
Step S3-9: and (3) iteratively updating the Huber loss function, pushing and grabbing the collaborative strategy network until convergence, observing the steady oscillation state of the success rate curve, and stopping training after a period of time. And storing the latest pushing and grabbing collaborative strategy network parameters, and transplanting the latest pushing and grabbing collaborative strategy network parameters into a real environment.
Over many tests, the trained push-grasp collaborative policy network grasps the target object in dense scenes with a success rate of approximately 85%; the grasping success rate is shown in fig. 11.
Step S4: building a real densely stacked complex scene, transplanting a trained pushing and grabbing cooperative network of a simulation end to a physical platform, and enabling a mechanical arm to make action decisions to grab a target object in a real environment, wherein the method specifically comprises the following steps of:
step S4-1: and (3) performing hand-eye calibration by using the calibration plate and the depth camera, obtaining internal and external parameters of the camera, and obtaining a transformation matrix M between the two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z).
Step S4-2: the trained pushing and grabbing collaborative strategy network outputs the two-dimensional pixel coordinate (x) with the maximum Q value * ,y * ) Combining the transformation matrix M to obtain an optimal action A t I.e. the type of motion of the robotic arm, the three-dimensional coordinates (x * ,y * ,z * ) And the arm tip rotates by an angle θ about the z-axis.
Step S4-3: the end effector of the mechanical arm is controlled to rotate to the angle theta of the characteristic diagram with the maximum Q value, and the IK solver in the mechanical arm is used for solving the motion to the three-dimensional coordinate (x * ,y * ,z * ) The rotation of each degree of freedom is required to drive each joint track of the mechanical arm to move to execute the optimal action A t
Step S4-4: repeating the steps S4-1 to S4-3 for a plurality of times to realize grabbing the target object in the dense scene.
All of the above steps and experimental results were run at the simulation end, and there is a considerable gap between the simulation end and the real environment. Therefore, to confirm the effectiveness of the method, the trained push-grasp collaborative policy network was migrated to the real end, which is of practical significance. A real experimental platform was built, comprising the mechanical arm, the depth camera, the blocks to be grasped, and so on. Over many tests, the real end achieved a success rate of grasping the target object in dense scenes similar to that of the simulation end, which demonstrates that the method is feasible.
The foregoing is a detailed description of the implementation of the invention and is not intended to limit the embodiments to the process described above. Those skilled in the relevant art may make changes, such as substitutions and deletions, within the original scope of the invention, and such changes also fall within the scope of protection of the invention.

Claims (7)

1. A method for grabbing a target object by a mechanical arm in a dense scene, characterized by comprising the following steps:
Step S1: building a dense, cluttered object environment for the mechanical arm to work in within a V-rep simulation environment;
Step S2: constructing an end-to-end push-grasp collaborative policy network based on deep reinforcement learning in the simulation environment, adopting an encoder-decoder structure;
Step S3: in the simulation environment built in step S1, training the push-grasp collaborative policy network on the basis of a Q-learning algorithm by continuously trying actions in the working environment and obtaining reward values; if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p; if the target object is not severely occluded, training the push and grasp networks of the collaborative policy network in parallel;
Step S4: building a real, densely stacked complex scene, transplanting the push-grasp collaborative network trained at the simulation end to the physical platform, and having the mechanical arm make action decisions to grab the target object in the real environment.
2. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, characterized in that the encoder-decoder structure of step S2 specifically comprises the following steps:
Step S2-1: constructing the encoder with the dense convolutional neural network DenseNet121 to extract the potential feature maps used to compute the mechanical arm's actions;
Step S2-2: constructing the decoder with two parallel fully convolutional neural networks, namely the pushing network φ_p and the grasping network φ_g.
3. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, characterized in that training the push-grasp collaborative policy network in step S3 specifically comprises the following steps:
Step S3-1: capturing an RGB image I_c and a depth image I_d of the workspace with a depth camera, segmenting the target to be grasped in the RGB image I_c to form a target object mask image I_m, processing the three kinds of image information and fusing them into a feature map H_cdm;
Step S3-2: rotating the feature map H_cdm generated in step S3-1 clockwise by 2π/N at a time to obtain N rotated feature maps, and passing the N feature maps in turn to the encoder constructed in step S2-1 to extract the N potential feature maps f_t^1, ..., f_t^N at time t;
Step S3-3: if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p in turn, the pushing Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t; if the target object is not severely occluded, training the push and grasp networks in parallel: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p and the grasping network φ_g in turn, the pushing Q-value matrices and the grasping Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t;
Step S3-4: the mechanical arm executes the optimal action A_t and the working scene changes; steps S3-1 and S3-2 are executed again to obtain the N potential feature maps f_{t+1}^1, ..., f_{t+1}^N at time t+1;
Step S3-5: according to the state S_{t+1} after the action, computing the reward value of the mechanical arm's action A_t with a piecewise reward function R_t, thereby evaluating whether action A_t contributes positively or negatively to completing the final task;
Step S3-6: storing the interaction quadruple (S_t, A_t, S_{t+1}, R_t) in an experience pool;
Step S3-7: computing the target value from the reward obtained in step S3-5, y_t = R_t + γ·max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor;
Step S3-8: training the network by minimizing the Huber loss L_t, where L_t = (y_t - Q_t)²/2 if |y_t - Q_t| < 1 and L_t = |y_t - Q_t| - 1/2 otherwise; Q_t is the value predicted by the network at time t, y_t is the true value at time t traced back from the Q value at time t+1, and the difference between the two serves as the loss;
Step S3-9: iteratively updating the push-grasp collaborative policy network with the Huber loss, stopping training once the success-rate curve settles into stable oscillation, and saving the latest parameters of the push-grasp collaborative policy network.
4. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 3, characterized in that generating the feature map H_cdm in step S3-1 specifically comprises the following steps:
Step S3-1-1: capturing the predefined workspace with a camera fixed at a set position outside the mechanical arm, and acquiring the RGB color image I_c and the depth image I_d of the space;
Step S3-1-2: applying the pre-trained segmentation network PoseCNN to the RGB color image I_c to segment the target to be grasped, setting the pixels of the target object to 1 and the remaining pixels to 0, and generating the binarized mask image I_m of the target object;
Step S3-1-3: judging the occlusion state of the target object from the binarized mask image I_m generated in step S3-1-2 by counting the number of pixels in the target object region; if the number of pixels is below a threshold c, the target object is currently severely occluded; otherwise, the target object is not severely occluded;
Step S3-1-4: with the extrinsic parameters of the camera known, orthographically projecting the RGB color image I_c, the depth image I_d and the mask image I_m along the gravity direction into the top-down color heightmap H_c, depth heightmap H_d and target object mask heightmap H_m;
Step S3-1-5: convolving the color heightmap H_c, the depth heightmap H_d and the target object mask heightmap H_m separately to extract feature maps of the same resolution, and concatenating them channel-wise to form the feature map H_cdm.
5. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 3, characterized in that selecting the optimal decision action A_t in step S3-3 specifically comprises the following steps:
Step S3-3-1: feeding the N potential feature maps f_t^1, ..., f_t^N extracted in step S3-2 into the pushing network φ_p in turn and computing the Q-value matrices of the N pushing actions at time t, Q_p^t, the Q value of a "push" acting at pixel coordinates (x, y) of the i-th potential feature map at time t being Q_p^t(i, x, y); if the target object is not severely occluded, at the same time feeding the N potential feature maps into the grasping network φ_g in turn and computing the Q-value matrices of the N grasping actions, Q_g^t, the Q value of a "grasp" acting at pixel coordinates (x, y) of the i-th potential feature map being Q_g^t(i, x, y);
Step S3-3-2: if the target object is severely occluded, collecting the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max_{i,x,y} Q_p^t(i, x, y), i = 1, 2, ..., N; if the target object is not severely occluded, collecting the pushing and grasping Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max(max_{i,x,y} Q_p^t(i, x, y), max_{i,x,y} Q_g^t(i, x, y));
Step S3-3-3: the action corresponding to the maximum Q value Q_max is the optimal action A_t of the mechanical arm at time t, A = <m, d, θ>, where m denotes the action type of the mechanical arm, m ∈ {p, g}, p denotes the "push" action, g denotes the "grasp" action, d = (x, y, z) denotes the three-dimensional coordinates of the action point of the mechanical arm, and θ denotes the rotation angle of the end of the mechanical arm about the z axis; if Q_max^t equals max_{i,x,y} Q_p^t(i, x, y), the mechanical arm performs a "push", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_p^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N; conversely, if Q_max^t equals max_{i,x,y} Q_g^t(i, x, y), the mechanical arm performs a "grasp", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_g^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N.
6. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 3, characterized in that computing the reward value with the piecewise reward function R_t in step S3-5 comprises the following steps:
Step S3-5-1: enlarging the target object mask image I_m by interpolation (to twice its original size) to obtain I_m'; finding the centroid coordinates of the target object region pixels of I_m and I_m' from their image moments, mapping the target object region pixels of I_m' onto I_m so that the two centroid coordinates coincide, and thereby generating the target reward region R_area;
Step S3-5-2: calculating the area ratio of non-target objects within R_area, P_occupy = (R_area - R_no_object - R_target_area) / R_area, where R_no_object is the area within the target reward region R_area where no object is present and R_target_area is the area of the target object within R_area;
Step S3-5-3: presetting the rewards: when the action corresponding to the maximum Q value is a "push" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the non-target area ratio P_occupy after the push is smaller than before the push, the "push" at time t has effectively separated the target object from the other objects and the reward value R_t = 0.5 is given; conversely, if P_occupy after the push is not smaller than before the push, the "push" at time t has failed to effectively separate the target object from the other objects and the reward value R_t = 0.25 is given;
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is a "grasp" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the mechanical arm successfully grasps the target object the reward value R_t = 1 is given; otherwise the mechanical arm has not grasped the target object and the reward value R_t = 0.5 is given;
Step S3-5-5: all other cases are regarded as push failures or grasp failures, and the reward value is set to 0.
7. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, characterized in that grasping in the real densely stacked complex scene in step S4 specifically comprises the following steps:
Step S4-1: performing hand-eye calibration with a calibration board and the depth camera, obtaining the intrinsic and extrinsic parameters of the camera, and obtaining the transformation matrix M between two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z);
Step S4-2: the trained push-grasp collaborative policy network outputs the two-dimensional pixel coordinates (x*, y*) with the maximum Q value, which, combined with the transformation matrix M, yield the optimal action A_t, namely the action type of the mechanical arm, the three-dimensional coordinates (x*, y*, z*) of the action point in actual space, and the rotation angle θ of the end of the mechanical arm about the z axis;
Step S4-3: controlling the end effector of the mechanical arm to rotate to the angle θ of the feature map with the maximum Q value, solving with the mechanical arm's built-in IK solver the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and driving each joint of the mechanical arm along its trajectory to execute the optimal action A_t;
Step S4-4: repeating steps S4-1 to S4-3 several times to grasp the target object in the dense scene.
CN202310301614.2A 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene Pending CN116330283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310301614.2A CN116330283A (en) 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310301614.2A CN116330283A (en) 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene

Publications (1)

Publication Number Publication Date
CN116330283A true CN116330283A (en) 2023-06-27

Family

ID=86883502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310301614.2A Pending CN116330283A (en) 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene

Country Status (1)

Country Link
CN (1) CN116330283A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116922403A (en) * 2023-09-19 2023-10-24 上海摩马智能科技有限公司 Visual feedback intelligent track implementation method based on simulation


Similar Documents

Publication Publication Date Title
CN111906784B (en) Pharyngeal swab double-arm sampling robot based on machine vision guidance and sampling method
CN107263449B (en) Robot remote teaching system based on virtual reality
CN110400345B (en) Deep reinforcement learning-based radioactive waste push-grab cooperative sorting method
Schenck et al. Learning robotic manipulation of granular media
CN111203878B (en) Robot sequence task learning method based on visual simulation
Cao et al. Suctionnet-1billion: A large-scale benchmark for suction grasping
JP6671694B1 (en) Machine learning device, machine learning system, data processing system, and machine learning method
CN108052004A (en) Industrial machinery arm autocontrol method based on depth enhancing study
Pashevich et al. Learning to augment synthetic images for sim2real policy transfer
CN112605983B (en) Mechanical arm pushing and grabbing system suitable for intensive environment
CN108284436B (en) Remote mechanical double-arm system with simulation learning mechanism and method
CN106737673A (en) A kind of method of the control of mechanical arm end to end based on deep learning
JP2013193202A (en) Method and system for training robot using human assisted task demonstration
CN108196453A (en) A kind of manipulator motion planning Swarm Intelligent Computation method
CN116330283A (en) Method for grabbing target object by mechanical arm in dense scene
CN112347900B (en) Monocular vision underwater target automatic grabbing method based on distance estimation
CN114851201B (en) Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
WO2020180697A1 (en) Robotic manipulation using domain-invariant 3d representations predicted from 2.5d vision data
CN111152227A (en) Mechanical arm control method based on guided DQN control
Liu et al. Sim-and-real reinforcement learning for manipulation: A consensus-based approach
CN116852353A (en) Method for capturing multi-target object by dense scene mechanical arm based on deep reinforcement learning
Cho et al. Development of VR visualization system including deep learning architecture for improving teleoperability
Qi et al. Reinforcement learning control for robot arm grasping based on improved DDPG
Fornas et al. Fitting primitive shapes in point clouds: a practical approach to improve autonomous underwater grasp specification of unknown objects
Wang et al. Self-supervised learning for joint pushing and grasping policies in highly cluttered environments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination