CN116330283A - Method for grabbing target object by mechanical arm in dense scene - Google Patents

Method for grabbing target object by mechanical arm in dense scene

Info

Publication number
CN116330283A
CN116330283A CN202310301614.2A
Authority
CN
China
Prior art keywords
target object
action
value
network
mechanical arm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310301614.2A
Other languages
Chinese (zh)
Inventor
沈捷
曹恺
李鑫
王莉
张盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN202310301614.2A priority Critical patent/CN116330283A/en
Publication of CN116330283A publication Critical patent/CN116330283A/en
Pending legal-status Critical Current

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1602 - Programme controls characterised by the control system, structure, architecture
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1628 - Programme controls characterised by the control loop
    • B25J9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for grabbing a target object by a mechanical arm in a dense scene, which comprises the following steps: building a scene similar to the real environment in the V-rep dynamic simulation software; constructing, in the simulation environment, a push-grasp collaborative policy network with an encoder-decoder structure, the whole structure consisting of the dense convolutional neural network DenseNet121 and two parallel fully convolutional neural networks; building a real, densely stacked complex scene, transplanting the push-grasp collaborative network trained at the simulation end to the physical platform, and letting the mechanical arm make action decisions to grab the target object in the real environment. The method trains the push-grasp collaborative policy network by deep reinforcement learning, continuously trying actions in the working environment to obtain reward values; the policy network autonomously decides the optimal action according to the occlusion state of the target object and forms an effective action sequence to complete the target-grasping task, which improves the success rate of grabbing the target object in dense scenes.

Description

Method for grabbing target object by mechanical arm in dense scene
Technical Field
The invention relates to the technical field of robotics, and in particular to a method for grabbing a target object by a mechanical arm in a dense scene.
Background
With the industrialization of artificial intelligence, the mechanical arm plays an increasingly important role and is an essential component for completing complex tasks. At its core, the mechanical arm provides one key capability: grasping. Traditional teaching methods are simple and easy to operate, but they restrict the mechanical arm to acting in highly repetitive environments and provide no adaptive grasping ability. Deep-learning object detection methods locate the target object in two-dimensional space and then predict its actual three-dimensional pose with correspondence-point-based, template-based or voting-based methods, after which the mechanical arm grasps the object at an arbitrary position; however, these methods require the object model to be known in advance, and grasping can only be carried out in structured, simple environments. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning to produce a decision network suited to target-grasping tasks: an action close to the optimal decision is computed directly from the input image, which improves the grasping success rate in different scenes to a certain extent. However, in environments where objects are densely stacked, the target object is easily blocked by other objects, and the mechanical arm still cannot find a suitable grasping pose, so the final grasp fails. Push-grasp systems based on deep reinforcement learning separate objects in a dense scene with pushing actions and then grasp the separated objects, solving the grasping problem in dense scenes through the synergy between pushing and grasping. However, such methods grasp arbitrary objects in the dense scene one by one rather than a designated target object, and their reward functions fail to evaluate the effectiveness of the pushing and grasping actions, so the chosen "push" and "grasp" actions are sometimes not very reasonable.
Therefore, how to design a reasonable push-grasp collaborative policy network so that the mechanical arm can successfully grab a designated target object in a dense scene is a problem worth studying.
Disclosure of Invention
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene. A push-grasp collaborative policy network oriented to the target object is constructed on the basis of deep reinforcement learning, and reward functions are designed for the pushing and grasping actions on the target object: they judge whether a pushing action effectively separates the target object from other objects and whether a grasping action successfully grabs the target object. The push-grasp collaborative policy network is trained on the basis of a Q-learning algorithm by continuously trying actions in the working environment and collecting reward values. The method reasonably evaluates the effectiveness of pushing and grasping actions, and completes accurate grasping of the target object in a dense scene through the cooperation of the pushing and grasping actions.
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene, which comprises the following main steps:
Step S1: building a dense, cluttered object environment for the mechanical arm to work in within a V-rep simulation environment;
Step S2: constructing an end-to-end push-grasp collaborative policy network based on deep reinforcement learning in the simulation environment, adopting an encoder-decoder structure;
Step S3: in the simulation environment built in step S1, training the push-grasp collaborative policy network on the basis of a Q-learning algorithm by continuously trying actions in the working environment and obtaining reward values; if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p; if the target object is not severely occluded, training the push and grasp networks of the collaborative policy network in parallel;
Step S4: building a real, densely stacked complex scene, transplanting the push-grasp collaborative network trained at the simulation end to the physical platform, and having the mechanical arm make action decisions to grab the target object in the real environment.
Optionally, the encoder-decoder structure of step S2 specifically includes the following steps:
Step S2-1: constructing the encoder with the dense convolutional neural network DenseNet121 to extract the potential feature maps used to compute the mechanical arm's actions;
Step S2-2: constructing the decoder with two parallel fully convolutional neural networks, namely the pushing network φ_p and the grasping network φ_g.
Optionally, training the push-grasp collaborative policy network in step S3 specifically includes the following steps:
Step S3-1: capturing an RGB image I_c and a depth image I_d of the workspace with a depth camera, segmenting the target to be grasped in the RGB image I_c to form a target object mask image I_m, processing the three kinds of image information and fusing them into a feature map H_cdm;
Step S3-2: rotating the feature map H_cdm generated in step S3-1 clockwise by 2π/N at a time to obtain N rotated feature maps, and passing the N feature maps in turn to the encoder constructed in step S2-1 to extract the N potential feature maps f_t^1, ..., f_t^N at time t;
Step S3-3: if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p in turn, the pushing Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t; if the target object is not severely occluded, training the push and grasp networks in parallel: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p and the grasping network φ_g in turn, the pushing Q-value matrices and the grasping Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t;
Step S3-4: the mechanical arm executes the optimal action A_t and the working scene changes; steps S3-1 and S3-2 are executed again to obtain the N potential feature maps f_{t+1}^1, ..., f_{t+1}^N at time t+1;
Step S3-5: according to the state S_{t+1} after the action, computing the reward value of the mechanical arm's action A_t with a piecewise reward function R_t, thereby evaluating whether action A_t contributes positively or negatively to completing the final task;
Step S3-6: storing the interaction quadruple (S_t, A_t, S_{t+1}, R_t) in an experience pool;
Step S3-7: computing the target value from the reward obtained in step S3-5, y_t = R_t + γ·max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor;
Step S3-8: training the network by minimizing the Huber loss L_t, where L_t = (y_t - Q_t)²/2 if |y_t - Q_t| < 1 and L_t = |y_t - Q_t| - 1/2 otherwise; Q_t is the value predicted by the network at time t, y_t is the true value at time t traced back from the Q value at time t+1, and the difference between the two serves as the loss;
Step S3-9: iteratively updating the push-grasp collaborative policy network with the Huber loss, stopping training once the success-rate curve settles into stable oscillation, and saving the latest parameters of the push-grasp collaborative policy network.
Optionally, generating the feature map H_cdm in step S3-1 specifically includes the following steps:
Step S3-1-1: capturing the predefined workspace with a camera fixed at a set position outside the mechanical arm, and acquiring the RGB color image I_c and the depth image I_d of the space;
Step S3-1-2: applying the pre-trained segmentation network PoseCNN to the RGB color image I_c to segment the target to be grasped, setting the pixels of the target object to 1 and the remaining pixels to 0, and generating the binarized mask image I_m of the target object;
Step S3-1-3: judging the occlusion state of the target object from the binarized mask image I_m generated in step S3-1-2 by counting the number of pixels in the target object region; if the number of pixels is below a threshold c, the target object is currently severely occluded; otherwise, the target object is not severely occluded;
Step S3-1-4: with the extrinsic parameters of the camera known, orthographically projecting the RGB color image I_c, the depth image I_d and the mask image I_m along the gravity direction into the top-down color heightmap H_c, depth heightmap H_d and target object mask heightmap H_m;
Step S3-1-5: convolving the color heightmap H_c, the depth heightmap H_d and the target object mask heightmap H_m separately to extract feature maps of the same resolution, and concatenating them channel-wise to form the feature map H_cdm.
Optionally, selecting the optimal decision action A_t in step S3-3 specifically includes the following steps:
Step S3-3-1: feeding the N potential feature maps f_t^1, ..., f_t^N extracted in step S3-2 into the pushing network φ_p in turn and computing the Q-value matrices of the N pushing actions at time t, Q_p^t, the Q value of a "push" acting at pixel coordinates (x, y) of the i-th potential feature map at time t being Q_p^t(i, x, y); if the target object is not severely occluded, at the same time feeding the N potential feature maps into the grasping network φ_g in turn and computing the Q-value matrices of the N grasping actions, Q_g^t, the Q value of a "grasp" acting at pixel coordinates (x, y) of the i-th potential feature map being Q_g^t(i, x, y);
Step S3-3-2: if the target object is severely occluded, collecting the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max_{i,x,y} Q_p^t(i, x, y), i = 1, 2, ..., N; if the target object is not severely occluded, collecting the pushing and grasping Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max(max_{i,x,y} Q_p^t(i, x, y), max_{i,x,y} Q_g^t(i, x, y));
Step S3-3-3: the action corresponding to the maximum Q value Q_max is the optimal action A_t of the mechanical arm at time t, A = <m, d, θ>, where m denotes the action type of the mechanical arm, m ∈ {p, g}, p denotes the "push" action, g denotes the "grasp" action, d = (x, y, z) denotes the three-dimensional coordinates of the action point of the mechanical arm, and θ denotes the rotation angle of the end of the mechanical arm about the z axis; if Q_max^t equals max_{i,x,y} Q_p^t(i, x, y), the mechanical arm performs a "push", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_p^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N; conversely, if Q_max^t equals max_{i,x,y} Q_g^t(i, x, y), the mechanical arm performs a "grasp", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_g^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N.
Optionally, computing the reward value with the piecewise reward function R_t in step S3-5 includes the following steps:
Step S3-5-1: enlarging the target object mask image I_m by interpolation (to twice its original size) to obtain I_m'; finding the centroid coordinates of the target object region pixels of I_m and I_m' from their image moments, mapping the target object region pixels of I_m' onto I_m so that the two centroid coordinates coincide, and thereby generating the target reward region R_area;
Step S3-5-2: calculating the area ratio of non-target objects within R_area, P_occupy = (R_area - R_no_object - R_target_area) / R_area, where R_no_object is the area within the target reward region R_area where no object is present and R_target_area is the area of the target object within R_area;
Step S3-5-3: presetting the rewards: when the action corresponding to the maximum Q value is a "push" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the non-target area ratio P_occupy after the push is smaller than before the push, the "push" at time t has effectively separated the target object from the other objects and the reward value R_t = 0.5 is given; conversely, if P_occupy after the push is not smaller than before the push, the "push" at time t has failed to effectively separate the target object from the other objects and the reward value R_t = 0.25 is given;
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is a "grasp" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the mechanical arm successfully grasps the target object the reward value R_t = 1 is given; otherwise the mechanical arm has not grasped the target object and the reward value R_t = 0.5 is given;
Step S3-5-5: all other cases are regarded as push failures or grasp failures, and the reward value is set to 0.
Optionally, grasping in the real densely stacked complex scene in step S4 specifically includes the following steps:
Step S4-1: performing hand-eye calibration with a calibration board and the depth camera, obtaining the intrinsic and extrinsic parameters of the camera, and obtaining the transformation matrix M between two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z);
Step S4-2: the trained push-grasp collaborative policy network outputs the two-dimensional pixel coordinates (x*, y*) with the maximum Q value, which, combined with the transformation matrix M, yield the optimal action A_t, namely the action type of the mechanical arm, the three-dimensional coordinates (x*, y*, z*) of the action point in actual space, and the rotation angle θ of the end of the mechanical arm about the z axis;
Step S4-3: controlling the end effector of the mechanical arm to rotate to the angle θ of the feature map with the maximum Q value, solving with the mechanical arm's built-in IK solver the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and driving each joint of the mechanical arm along its trajectory to execute the optimal action A_t;
Step S4-4: repeating steps S4-1 to S4-3 several times to grasp the target object in the dense scene.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of the actual grasping procedure of the method of the present invention;
FIG. 3 is a view of the scene for grasping a target object in a dense scene in the method of the present invention;
FIG. 4 is a diagram of the push-grasp collaborative network architecture for grasping a target object in the method of the present invention;
FIG. 5 shows the color map, depth map and target mask map in the simulation environment of the method of the present invention;
FIG. 6 shows the color heightmap, depth heightmap and target mask heightmap in the simulation environment of the method of the present invention;
FIG. 7 is a heat map over the heightmap for the pushing action of the method of the present invention;
FIG. 8 is a heat map over the heightmap for the grasping action of the method of the present invention;
FIG. 9 is a diagram of the optimal actions in a dense scene of the method of the present invention;
FIG. 10 is a diagram of an example of the reward of the method of the present invention;
FIG. 11 is a graph of the success rate of grasping the target object in a dense scene of the method of the present invention.
Detailed Description
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene. The method is described in detail and completely below with reference to the drawings and to the technical approach in the embodiments of the invention. The specific description is as follows:
Fig. 1 is a flow chart of the method of the present invention and fig. 2 is a flow chart of the actual grasping procedure; figs. 1 and 2 contain all the steps of the method. The steps are specifically as follows:
Step S1: build a dense, cluttered object environment for the mechanical arm to work in within the V-rep simulation environment.
According to the trial-and-error principle of deep reinforcement learning, a large amount of experience data must be collected through interaction with the environment for the policy network to learn from, and collecting such data in a real environment is extremely expensive. To solve this problem, a scene highly similar to reality is built with the simulation software V-rep for the simulation experiments, so that data can be collected efficiently and the noise that simulation-to-reality transfer introduces into the neural network output is reduced as much as possible.
The mechanical arm and its joint configuration file, the simulation objects, the D435i camera, the ground, the table and other .obj instance files are imported into the V-rep simulation software. A three-dimensional region of 0.448 x 0.4 that limits object positions is defined within the predefined workspace of the mechanical arm; small wooden blocks are dropped at random into this space to simulate a densely cluttered object scene, in which the mechanical arm collects data and trains the policy network. The experimental platform environment is shown in fig. 3.
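For illustration only, a minimal Python sketch of such a randomized block drop is given below, using the legacy V-rep/CoppeliaSim remote API bindings; the model path, workspace bounds and block count are assumptions and not part of the original disclosure.

```python
import random
import vrep  # legacy V-rep remote API Python binding (remoteApi library required)

client = vrep.simxStart('127.0.0.1', 19997, True, True, 5000, 5)
assert client != -1, 'could not connect to the V-rep remote API server'
vrep.simxStartSimulation(client, vrep.simx_opmode_blocking)

# assumed workspace bounds (in metres) and number of blocks
X_RANGE, Y_RANGE, DROP_HEIGHT, NUM_BLOCKS = (-0.2, 0.2), (-0.2, 0.2), 0.15, 10

for _ in range(NUM_BLOCKS):
    # load a small block model (path is hypothetical) and drop it at a random spot;
    # the dynamics engine lets the blocks settle into a dense pile
    _, block = vrep.simxLoadModel(client, 'models/blocks/cube.ttm', 0,
                                  vrep.simx_opmode_blocking)
    position = [random.uniform(*X_RANGE), random.uniform(*Y_RANGE), DROP_HEIGHT]
    vrep.simxSetObjectPosition(client, block, -1, position,
                               vrep.simx_opmode_blocking)
```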
Step S2: the pushing and grabbing collaborative strategy network based on the deep reinforcement learning from end to end is constructed in a simulation environment, and an encoder-decoder structure is adopted, wherein the pushing and grabbing collaborative strategy network framework is shown in fig. 4, and specifically comprises the following steps:
step S2-1: and constructing an encoder by adopting a dense convolutional neural network DenseNet121, and extracting a potential characteristic diagram for calculating the action of the mechanical arm.
Step S2-2: adopting two parallel full convolution neural networks to construct decoder, respectively push network
φ_p and the grasping network φ_g. The pushing network φ_p and the grasping network φ_g share the same structure, consisting of a Conv2d convolution layer, a ReLU activation layer and a BatchNorm2d normalization layer. Each outputs a Q-value matrix of size 1 × 20 × 20, which is upsampled by bilinear interpolation and mapped to an action Q-value map of size 1 × 224 × 224.
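As a concrete illustration, the following PyTorch sketch shows one way the encoder-decoder described above could be assembled: a DenseNet121 encoder shared by two parallel fully convolutional heads whose coarse Q maps are upsampled back to the input resolution by bilinear interpolation. This is a minimal sketch under assumptions (five fused input channels, head widths, 224 × 224 inputs), not the patent's exact implementation.

```python
import torch.nn as nn
import torchvision

class QHead(nn.Module):
    """One fully convolutional decoder head: Conv2d + BatchNorm2d + ReLU + 1x1 Conv."""
    def __init__(self, in_channels: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),        # one Q value per pixel
        )

    def forward(self, x):
        return self.net(x)

class PushGraspNet(nn.Module):
    """DenseNet121 encoder shared by a pushing head and a grasping head."""
    def __init__(self, in_channels: int = 5):       # channels of the fused heightmap
        super().__init__()
        backbone = torchvision.models.densenet121()
        # widen the stem so the encoder accepts the fused heightmap H_cdm
        backbone.features.conv0 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                            stride=2, padding=3, bias=False)
        self.encoder = backbone.features             # outputs (B, 1024, H/32, W/32)
        self.push_head = QHead()
        self.grasp_head = QHead()

    def forward(self, x):                            # x: (B, in_channels, 224, 224)
        feat = self.encoder(x)
        q_push = self.push_head(feat)
        q_grasp = self.grasp_head(feat)
        # upsample the coarse Q maps back to the input resolution
        size = x.shape[-2:]
        q_push = nn.functional.interpolate(q_push, size=size, mode='bilinear',
                                           align_corners=False)
        q_grasp = nn.functional.interpolate(q_grasp, size=size, mode='bilinear',
                                            align_corners=False)
        return q_push, q_grasp
```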
Step S3: in the simulation environment constructed in the step S1, continuously trying actions in the working environment to obtain a reward value based on a Q learning algorithm to train a pushing and grabbing collaborative strategy network, and shielding the grabbing network if the target object is seriously shielded
φ_g and only the pushing network φ_p is trained; if the target object is not severely occluded, the push and grasp networks of the collaborative policy network are trained in parallel. The specific steps are as follows:
step S3-1: capturing RGB image I within a workspace with a depth camera c And depth image I d I in RGB image d Dividing the target to be grabbed to form a target object mask image I m Processing three kinds of image information, and fusing to generate a characteristic diagram H cdm The method specifically comprises the following steps:
step S3-1-1: capturing a predefined working space by arranging a camera with a fixed point position outside the mechanical arm, and acquiring RGB color images I in the space c And Depth image I d
Step S3-1-2: application of pre-trained segmentation network PoseCNN to RGB color image I c Dividing the target to be grabbed, setting the pixel of the target object to be 1, setting the rest pixels to be 0, and generating a binarization mask image I of the target object m The visual information is shown in fig. 5.
Step S3-1-3: generating a binarized mask image I of the target object according to step S3-1-2 m Judging the shielding state of the target object, counting the number of pixels in the target object area, and if the number of pixels is lower than a threshold value c, indicating that the target object is seriously shielded currently; in contrast, the target object is not severely occluded.
Step S3-1-4: knowing the external parameters of the camera, RGB color image I c Depth image I d And mask image I m Orthographic projection conversion in gravity direction into color height diagram H from top to bottom view angle c Depth-height map H d And target object mask height map H m The visual information is shown in fig. 6.
Step S3-1-5: will color height map H c Depth-height map H d And target object mask height map H m The height maps are respectively convolved to extract feature maps with the same resolution, and the feature maps H are spliced from top to bottom according to the channel layers cdm
Step S3-2: generating a characteristic diagram H by the step S3-1-5 cdm By rotating clockwise in turn
2π/N at a time; N rotated feature maps are obtained and passed in turn to the encoder constructed in step S2-1 to extract the N potential feature maps at time t, f_t^1, ..., f_t^N.
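A short sketch of this rotation step is given below, assuming N = 16 orientations and using torchvision for the image rotation:

```python
import torch
import torchvision.transforms.functional as TF

def rotated_copies(h_cdm: torch.Tensor, n_rotations: int = 16) -> list:
    """h_cdm: fused heightmap of shape (C, H, W). Returns N copies, the i-th rotated
    clockwise by i * 360 / N degrees, ready to be passed through the encoder."""
    copies = []
    for i in range(n_rotations):
        angle = -i * 360.0 / n_rotations      # negative angle = clockwise rotation
        copies.append(TF.rotate(h_cdm, angle))
    return copies
```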
Step S3-3: if the target object is seriously blocked, shielding the grabbing network
φ_g and only the pushing network φ_p is trained: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p in turn, the pushing Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t. If the target object is not severely occluded, the push and grasp networks are trained in parallel: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p and the grasping network φ_g in turn, the pushing Q-value matrices and the grasping Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t. The visual information is shown in fig. 7 and fig. 8. The specific steps are as follows:
step S3-3-1: n potential feature maps extracted in the step S3-2
are fed in turn into the pushing network φ_p and the Q-value matrices of the N pushing actions at time t, Q_p^t, are computed; the Q value of a "push" acting at pixel coordinates (x, y) of the i-th potential feature map at time t is Q_p^t(i, x, y). If the target object is not severely occluded, the N potential feature maps are at the same time fed in turn into the grasping network φ_g and the Q-value matrices of the N grasping actions, Q_g^t, are computed; the Q value of a "grasp" acting at pixel coordinates (x, y) of the i-th potential feature map is Q_g^t(i, x, y).
Step S3-3-2: if the target object is severely occluded, the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 are collected and the maximum Q value at time t is found, Q_max^t = max_{i,x,y} Q_p^t(i, x, y), i = 1, 2, ..., N; if the target object is not severely occluded, the pushing and grasping Q-value matrices at all rotation angles at time t from step S3-3-1 are collected and the maximum Q value at time t is found, Q_max^t = max(max_{i,x,y} Q_p^t(i, x, y), max_{i,x,y} Q_g^t(i, x, y)).
Step S3-3-3: at maximum Q value Q max The corresponding action is the optimal action A of the mechanical arm at the moment t t ,A=<m,d,θ>Where m represents the motion type of the mechanical arm, m e { p, g }, p represents the "pushing" motion, g represents the "grabbing" motion, d= (x, y, z) represents the three-dimensional coordinates of the motion point of the mechanical arm, and θ represents the angle of rotation of the end of the mechanical arm around the z axis. If it is
Q_max^t equals max_{i,x,y} Q_p^t(i, x, y), the mechanical arm performs a "push": the three-dimensional coordinates d of the action point are (x*, y*, z*), where (i*, x*, y*) = argmax_{i,x,y} Q_p^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N. Conversely, if Q_max^t equals max_{i,x,y} Q_g^t(i, x, y), the mechanical arm performs a "grasp": the three-dimensional coordinates d of the action point are (x*, y*, z*), where (i*, x*, y*) = argmax_{i,x,y} Q_g^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N.
The optimal action A_t is defined as follows. Push: the parallel jaws at the end of the mechanical arm are closed and pushed 5 cm from right to left along a straight line in the rotation direction of the heightmap. Grasp: the parallel jaws at the end of the mechanical arm are opened, lowered 3 cm toward the depth of the optimal action point, and closed. The visual information is shown in fig. 9; the circle in the left image is the optimal push action point and the circle in the right image is the optimal grasp action point.
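The selection of the optimal action across the two Q-map stacks (steps S3-3-1 to S3-3-3) can be sketched as follows; the function name, the depth lookup and N = 16 are illustrative assumptions.

```python
from typing import Optional, Tuple
import numpy as np

def select_action(q_push: np.ndarray, q_grasp: Optional[np.ndarray],
                  depth_heightmap: np.ndarray,
                  n_rotations: int = 16) -> Tuple[str, tuple, float]:
    """q_push / q_grasp: (N, H, W) Q-value maps for the N rotations; q_grasp is None
    when the target is severely occluded and the grasping network is masked."""
    candidates = {'p': q_push}
    if q_grasp is not None:
        candidates['g'] = q_grasp

    # action type whose Q-map stack contains the global maximum Q value
    m, q_map = max(candidates.items(), key=lambda kv: kv[1].max())
    i_star, y_star, x_star = np.unravel_index(q_map.argmax(), q_map.shape)

    theta = i_star * 2.0 * np.pi / n_rotations        # end-effector rotation about z
    z_star = float(depth_heightmap[y_star, x_star])   # depth value at the best pixel
    return m, (int(x_star), int(y_star), z_star), theta
```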
Step S3-4: the mechanical arm executes the optimal action A t The working scene changes, and the steps S3-1 and S3-2 are re-executed to acquire N potential feature maps at the time t+1
f_{t+1}^1, ..., f_{t+1}^N.
Step S3-5: according to the state S after the action t+1 Using a piecewise reward function R c Action A of the computing mechanical arm c After the reward value, evaluate action A t The visual information of affirmative or negative prompt to complete the final task is shown in fig. 10, and specifically includes the following steps:
step S3-5-1: masking image I of target object by interpolation m Amplified to one time of the original I m ' image moment finding I m And I m Centroid coordinates of' target object region pixels, will I m ' target object region pixel mapping to I m In the target object area, the barycenter coordinates of the target object area and the barycenter coordinates are ensured to be coincident, and a target rewarding area R is generated area
Step S3-5-2: calculating R area Area ratio of internal non-target object
P_occupy = (R_area - R_no_object - R_target_area) / R_area, where R_no_object is the area within the target reward region R_area where no object is present and R_target_area is the area of the target object within R_area.
Step S3-5-3: preset the rewards. When the action corresponding to the maximum Q value is a "push" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the non-target area ratio P_occupy after the push is smaller than before the push, the "push" at time t has effectively separated the target object from the other objects and the reward value R_t = 0.5 is given; otherwise, if P_occupy after the push is not smaller than before the push, the "push" at time t has failed to effectively separate the target object from the other objects and the reward value R_t = 0.25 is given.
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is a "grasp" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the mechanical arm successfully grasps the target object the reward value R_t = 1 is given; otherwise the mechanical arm has not grasped the target object and the reward value R_t = 0.5 is given.
Step S3-5-5: all other cases are regarded as push failures or grasp failures, and the reward value is set to 0.
Step S3-6: storing interaction data quadruples (S t ,A t ,S t+1 ,R t ) To an experience pool.
Step S3-7: calculating an objective function according to the prize value calculated in step S3-5
y_t = R_t + γ·max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor.
Step S3-8: network training with minimized Huber loss function:
L_t = (y_t - Q_t)²/2 if |y_t - Q_t| < 1, and L_t = |y_t - Q_t| - 1/2 otherwise, where Q_t is the value predicted by the network at time t and y_t is the true value at time t traced back from the Q value at time t+1; the difference between the two serves as the loss. During gradient back-propagation the loss of all other pixels is 0, and only the loss at the pixel of the selected optimal action is computed. When the absolute error is smaller than 1, the MSE form of the loss is used, so that the gradient decreases gradually as the loss approaches its minimum and the minimum is approached more accurately; in all other cases the MAE form is used, whose gradient remains large and speeds up the training of the model; combining MSE with MAE in this way forms the Huber loss. The gradients are applied with a stochastic optimization method with adaptive momentum and weight decay, which accumulates the history of past gradients to accelerate convergence. The specific update is
m_{t+1} = β_1·m_t + (1 - β_1)·g_t, v_{t+1} = β_2·v_t + (1 - β_2)·g_t²,
m'_{t+1} = m_{t+1}/(1 - β_1^{t+1}), v'_{t+1} = v_{t+1}/(1 - β_2^{t+1}),
w_{t+1} = w_t - a·(m'_{t+1}/(√v'_{t+1} + ε) + λ·w_t),
where g_t is the gradient of the network parameters w_t, m_{t+1} is the 1st-order moment estimate of the gradient and β_1 its momentum coefficient, v_{t+1} is the 2nd-order moment estimate of the gradient and β_2 its momentum coefficient, m'_{t+1} and v'_{t+1} are the bias-corrected 1st- and 2nd-order moment estimates, and ε is a small constant for numerical stability. a is the learning rate, which controls the size and speed of the decision network's convergence and defaults to the constant 0.0001. λ is the weight decay coefficient, which prevents the network from over-fitting and defaults to the constant 0.00002.
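A minimal training-step sketch tying steps S3-6 to S3-8 together is given below: transitions go into an experience pool, the target y_t = R_t + γ·max Q at time t+1 is formed, and the Huber loss at the selected action pixel is minimized with Adam plus weight decay (AdamW). The learning rate 0.0001 and weight decay 0.00002 follow the text; the buffer size, discount factor and surrounding names are assumptions.

```python
from collections import deque
import torch

model = PushGraspNet()              # encoder-decoder network sketched under step S2-2 above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=2e-5)
huber = torch.nn.SmoothL1Loss()     # Huber loss with delta = 1 (MSE near 0, MAE far away)
replay_buffer = deque(maxlen=10000) # experience pool of (S_t, A_t, S_t+1, R_t) quadruples

def td_target(reward: float, q_next_max: float, gamma: float = 0.5) -> torch.Tensor:
    """Target value y_t traced back from the maximum Q value at time t+1."""
    return torch.tensor(reward + gamma * q_next_max)

def train_step(q_map: torch.Tensor, best_pixel, target_value: torch.Tensor) -> float:
    """q_map: (H, W) Q map produced by the network for the executed rotation;
    best_pixel: (y*, x*). Only the optimal-action pixel contributes to the loss."""
    predicted = q_map[best_pixel]
    loss = huber(predicted, target_value)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```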
Step S3-9: and (3) iteratively updating the Huber loss function, pushing and grabbing the collaborative strategy network until convergence, observing the steady oscillation state of the success rate curve, and stopping training after a period of time. And storing the latest pushing and grabbing collaborative strategy network parameters, and transplanting the latest pushing and grabbing collaborative strategy network parameters into a real environment.
Over many tests, the trained push-grasp collaborative policy network grasps the target object in dense scenes with a success rate of approximately 85%; the grasping success rate is shown in fig. 11.
Step S4: building a real densely stacked complex scene, transplanting a trained pushing and grabbing cooperative network of a simulation end to a physical platform, and enabling a mechanical arm to make action decisions to grab a target object in a real environment, wherein the method specifically comprises the following steps of:
step S4-1: and (3) performing hand-eye calibration by using the calibration plate and the depth camera, obtaining internal and external parameters of the camera, and obtaining a transformation matrix M between the two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z).
Step S4-2: the trained pushing and grabbing collaborative strategy network outputs the two-dimensional pixel coordinate (x) with the maximum Q value * ,y * ) Combining the transformation matrix M to obtain an optimal action A t I.e. the type of motion of the robotic arm, the three-dimensional coordinates (x * ,y * ,z * ) And the arm tip rotates by an angle θ about the z-axis.
Step S4-3: the end effector of the mechanical arm is controlled to rotate to the angle theta of the characteristic diagram with the maximum Q value, and the IK solver in the mechanical arm is used for solving the motion to the three-dimensional coordinate (x * ,y * ,z * ) The rotation of each degree of freedom is required to drive each joint track of the mechanical arm to move to execute the optimal action A t
Step S4-4: repeating the steps S4-1 to S4-3 for a plurality of times to realize grabbing the target object in the dense scene.
All of the above steps and experimental results were run at the simulation end, and there is a considerable gap between the simulation end and the real environment. Therefore, to confirm the effectiveness of the method, the trained push-grasp collaborative policy network was migrated to the real end, which is of practical significance. A real experimental platform was built, comprising the mechanical arm, the depth camera, the blocks to be grasped, and so on. Over many tests, the real end achieved a success rate of grasping the target object in dense scenes similar to that of the simulation end, which demonstrates that the method is feasible.
The foregoing is a detailed description of the implementation of the invention and is not intended to limit the embodiments to the process described above. Those skilled in the relevant art may make changes, such as substitutions and deletions, within the original scope of the invention, and such changes also fall within the scope of protection of the invention.

Claims (7)

1. A method for grabbing a target object by a mechanical arm in a dense scene, characterized by comprising the following steps:
Step S1: building a dense, cluttered object environment for the mechanical arm to work in within a V-rep simulation environment;
Step S2: constructing an end-to-end push-grasp collaborative policy network based on deep reinforcement learning in the simulation environment, adopting an encoder-decoder structure;
Step S3: in the simulation environment built in step S1, training the push-grasp collaborative policy network on the basis of a Q-learning algorithm by continuously trying actions in the working environment and obtaining reward values; if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p; if the target object is not severely occluded, training the push and grasp networks of the collaborative policy network in parallel;
Step S4: building a real, densely stacked complex scene, transplanting the push-grasp collaborative network trained at the simulation end to the physical platform, and having the mechanical arm make action decisions to grab the target object in the real environment.
2. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, characterized in that the encoder-decoder structure of step S2 specifically comprises the following steps:
Step S2-1: constructing the encoder with the dense convolutional neural network DenseNet121 to extract the potential feature maps used to compute the mechanical arm's actions;
Step S2-2: constructing the decoder with two parallel fully convolutional neural networks, namely the pushing network φ_p and the grasping network φ_g.
3. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, characterized in that training the push-grasp collaborative policy network in step S3 specifically comprises the following steps:
Step S3-1: capturing an RGB image I_c and a depth image I_d of the workspace with a depth camera, segmenting the target to be grasped in the RGB image I_c to form a target object mask image I_m, processing the three kinds of image information and fusing them into a feature map H_cdm;
Step S3-2: rotating the feature map H_cdm generated in step S3-1 clockwise by 2π/N at a time to obtain N rotated feature maps, and passing the N feature maps in turn to the encoder constructed in step S2-1 to extract the N potential feature maps f_t^1, ..., f_t^N at time t;
Step S3-3: if the target object is severely occluded, masking the grasping network φ_g and training only the pushing network φ_p: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p in turn, the pushing Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t; if the target object is not severely occluded, training the push and grasp networks in parallel: the N potential feature maps extracted in step S3-2 are fed into the pushing network φ_p and the grasping network φ_g in turn, the pushing Q-value matrices and the grasping Q-value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t;
Step S3-4: the mechanical arm executes the optimal action A_t and the working scene changes; steps S3-1 and S3-2 are executed again to obtain the N potential feature maps f_{t+1}^1, ..., f_{t+1}^N at time t+1;
Step S3-5: according to the state S_{t+1} after the action, computing the reward value of the mechanical arm's action A_t with a piecewise reward function R_t, thereby evaluating whether action A_t contributes positively or negatively to completing the final task;
Step S3-6: storing the interaction quadruple (S_t, A_t, S_{t+1}, R_t) in an experience pool;
Step S3-7: computing the target value from the reward obtained in step S3-5, y_t = R_t + γ·max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor;
Step S3-8: training the network by minimizing the Huber loss L_t, where L_t = (y_t - Q_t)²/2 if |y_t - Q_t| < 1 and L_t = |y_t - Q_t| - 1/2 otherwise; Q_t is the value predicted by the network at time t, y_t is the true value at time t traced back from the Q value at time t+1, and the difference between the two serves as the loss;
Step S3-9: iteratively updating the push-grasp collaborative policy network with the Huber loss, stopping training once the success-rate curve settles into stable oscillation, and saving the latest parameters of the push-grasp collaborative policy network.
4. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 3, characterized in that generating the feature map H_cdm in step S3-1 specifically comprises the following steps:
Step S3-1-1: capturing the predefined workspace with a camera fixed at a set position outside the mechanical arm, and acquiring the RGB color image I_c and the depth image I_d of the space;
Step S3-1-2: applying the pre-trained segmentation network PoseCNN to the RGB color image I_c to segment the target to be grasped, setting the pixels of the target object to 1 and the remaining pixels to 0, and generating the binarized mask image I_m of the target object;
Step S3-1-3: judging the occlusion state of the target object from the binarized mask image I_m generated in step S3-1-2 by counting the number of pixels in the target object region; if the number of pixels is below a threshold c, the target object is currently severely occluded; otherwise, the target object is not severely occluded;
Step S3-1-4: with the extrinsic parameters of the camera known, orthographically projecting the RGB color image I_c, the depth image I_d and the mask image I_m along the gravity direction into the top-down color heightmap H_c, depth heightmap H_d and target object mask heightmap H_m;
Step S3-1-5: convolving the color heightmap H_c, the depth heightmap H_d and the target object mask heightmap H_m separately to extract feature maps of the same resolution, and concatenating them channel-wise to form the feature map H_cdm.
5. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 3, characterized in that selecting the optimal decision action A_t in step S3-3 specifically comprises the following steps:
Step S3-3-1: feeding the N potential feature maps f_t^1, ..., f_t^N extracted in step S3-2 into the pushing network φ_p in turn and computing the Q-value matrices of the N pushing actions at time t, Q_p^t, the Q value of a "push" acting at pixel coordinates (x, y) of the i-th potential feature map at time t being Q_p^t(i, x, y); if the target object is not severely occluded, at the same time feeding the N potential feature maps into the grasping network φ_g in turn and computing the Q-value matrices of the N grasping actions, Q_g^t, the Q value of a "grasp" acting at pixel coordinates (x, y) of the i-th potential feature map being Q_g^t(i, x, y);
Step S3-3-2: if the target object is severely occluded, collecting the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max_{i,x,y} Q_p^t(i, x, y), i = 1, 2, ..., N; if the target object is not severely occluded, collecting the pushing and grasping Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max^t = max(max_{i,x,y} Q_p^t(i, x, y), max_{i,x,y} Q_g^t(i, x, y));
Step S3-3-3: the action corresponding to the maximum Q value Q_max is the optimal action A_t of the mechanical arm at time t, A = <m, d, θ>, where m denotes the action type of the mechanical arm, m ∈ {p, g}, p denotes the "push" action, g denotes the "grasp" action, d = (x, y, z) denotes the three-dimensional coordinates of the action point of the mechanical arm, and θ denotes the rotation angle of the end of the mechanical arm about the z axis; if Q_max^t equals max_{i,x,y} Q_p^t(i, x, y), the mechanical arm performs a "push", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_p^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N; conversely, if Q_max^t equals max_{i,x,y} Q_g^t(i, x, y), the mechanical arm performs a "grasp", the three-dimensional coordinates d of the action point are (x*, y*, z*) with (i*, x*, y*) = argmax_{i,x,y} Q_g^t(i, x, y), z* is the depth value at pixel (x*, y*), and the rotation angle of the end of the mechanical arm about the z axis is θ = i*·2π/N.
6. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 3, characterized in that computing the reward value with the piecewise reward function R_t in step S3-5 comprises the following steps:
Step S3-5-1: enlarging the target object mask image I_m by interpolation (to twice its original size) to obtain I_m'; finding the centroid coordinates of the target object region pixels of I_m and I_m' from their image moments, mapping the target object region pixels of I_m' onto I_m so that the two centroid coordinates coincide, and thereby generating the target reward region R_area;
Step S3-5-2: calculating the area ratio of non-target objects within R_area, P_occupy = (R_area - R_no_object - R_target_area) / R_area, where R_no_object is the area within the target reward region R_area where no object is present and R_target_area is the area of the target object within R_area;
Step S3-5-3: presetting the rewards: when the action corresponding to the maximum Q value is a "push" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the non-target area ratio P_occupy after the push is smaller than before the push, the "push" at time t has effectively separated the target object from the other objects and the reward value R_t = 0.5 is given; conversely, if P_occupy after the push is not smaller than before the push, the "push" at time t has failed to effectively separate the target object from the other objects and the reward value R_t = 0.25 is given;
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is a "grasp" and the two-dimensional coordinates of the action point lie in the target reward region R_area, if the mechanical arm successfully grasps the target object the reward value R_t = 1 is given; otherwise the mechanical arm has not grasped the target object and the reward value R_t = 0.5 is given;
Step S3-5-5: all other cases are regarded as push failures or grasp failures, and the reward value is set to 0.
7. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, characterized in that grasping in the real densely stacked complex scene in step S4 specifically comprises the following steps:
Step S4-1: performing hand-eye calibration with a calibration board and the depth camera, obtaining the intrinsic and extrinsic parameters of the camera, and obtaining the transformation matrix M between two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z);
Step S4-2: the trained push-grasp collaborative policy network outputs the two-dimensional pixel coordinates (x*, y*) with the maximum Q value, which, combined with the transformation matrix M, yield the optimal action A_t, namely the action type of the mechanical arm, the three-dimensional coordinates (x*, y*, z*) of the action point in actual space, and the rotation angle θ of the end of the mechanical arm about the z axis;
Step S4-3: controlling the end effector of the mechanical arm to rotate to the angle θ of the feature map with the maximum Q value, solving with the mechanical arm's built-in IK solver the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and driving each joint of the mechanical arm along its trajectory to execute the optimal action A_t;
Step S4-4: repeating steps S4-1 to S4-3 several times to grasp the target object in the dense scene.
CN202310301614.2A 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene Pending CN116330283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310301614.2A CN116330283A (en) 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310301614.2A CN116330283A (en) 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene

Publications (1)

Publication Number Publication Date
CN116330283A true CN116330283A (en) 2023-06-27

Family

ID=86883502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310301614.2A Pending CN116330283A (en) 2023-03-24 2023-03-24 Method for grabbing target object by mechanical arm in dense scene

Country Status (1)

Country Link
CN (1) CN116330283A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116922403A (en) * 2023-09-19 2023-10-24 上海摩马智能科技有限公司 Visual feedback intelligent track implementation method based on simulation


Similar Documents

Publication Publication Date Title
CN111906784B (en) Pharyngeal swab double-arm sampling robot based on machine vision guidance and sampling method
CN107263449B (en) Robot remote teaching system based on virtual reality
CN110400345B (en) Deep reinforcement learning-based radioactive waste push-grab cooperative sorting method
Schenck et al. Learning robotic manipulation of granular media
CN111203878B (en) Robot sequence task learning method based on visual simulation
Cao et al. Suctionnet-1billion: A large-scale benchmark for suction grasping
JP6671694B1 (en) Machine learning device, machine learning system, data processing system, and machine learning method
CN108052004A (en) Industrial machinery arm autocontrol method based on depth enhancing study
Pashevich et al. Learning to augment synthetic images for sim2real policy transfer
CN112605983B (en) Mechanical arm pushing and grabbing system suitable for intensive environment
CN108284436B (en) Remote mechanical double-arm system with simulation learning mechanism and method
CN106737673A (en) A kind of method of the control of mechanical arm end to end based on deep learning
JP2013193202A (en) Method and system for training robot using human assisted task demonstration
CN108196453A (en) A kind of manipulator motion planning Swarm Intelligent Computation method
CN116330283A (en) Method for grabbing target object by mechanical arm in dense scene
CN112347900B (en) Monocular vision underwater target automatic grabbing method based on distance estimation
CN114851201B (en) Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
WO2020180697A1 (en) Robotic manipulation using domain-invariant 3d representations predicted from 2.5d vision data
CN111152227A (en) Mechanical arm control method based on guided DQN control
Liu et al. Sim-and-real reinforcement learning for manipulation: A consensus-based approach
CN116852353A (en) Method for capturing multi-target object by dense scene mechanical arm based on deep reinforcement learning
Cho et al. Development of VR visualization system including deep learning architecture for improving teleoperability
Qi et al. Reinforcement learning control for robot arm grasping based on improved DDPG
Fornas et al. Fitting primitive shapes in point clouds: a practical approach to improve autonomous underwater grasp specification of unknown objects
Wang et al. Self-supervised learning for joint pushing and grasping policies in highly cluttered environments

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination