CN116330283A - Method for grabbing target object by mechanical arm in dense scene - Google Patents
- Publication number
- CN116330283A (application number CN202310301614.2A)
- Authority
- CN
- China
- Prior art keywords
- target object
- action
- value
- network
- mechanical arm
- Prior art date
- Legal status: Pending (status assumed; not a legal conclusion)
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention discloses a method for grabbing a target object with a mechanical arm in a dense scene, comprising the following steps: building a scene similar to the real environment in the V-rep dynamic simulation software; constructing, in the simulation environment, a pushing-and-grabbing collaborative policy network with an encoder-decoder structure, in which the encoder is the dense convolutional neural network DenseNet121 and the decoder consists of two parallel fully convolutional neural networks; and building a real densely stacked complex scene, transplanting the pushing-and-grabbing collaborative network trained at the simulation end onto a physical platform, where the mechanical arm makes action decisions and grabs the target object in the real environment. The method trains the pushing-and-grabbing collaborative policy network by deep reinforcement learning, continuously trying actions in the working environment to obtain reward values; the policy network autonomously decides the optimal action according to the occlusion state of the target object and forms an effective action sequence to complete the target-grabbing task, improving the success rate of grabbing target objects in dense scenes.
Description
Technical Field
The invention relates to the technical field of application of robot technology, in particular to a method for grabbing a target object by a mechanical arm in a dense scene.
Background
With the industrialization of artificial intelligence, the mechanical arm plays an increasingly important role and has become an essential component for completing complex tasks. At its core, the mechanical arm provides one key function: grasping. Traditional teaching-based methods are simple and easy to operate, but they confine the mechanical arm to highly repetitive environments and provide no adaptive grabbing capability. Deep-learning object-detection methods locate the target object in two-dimensional space and then predict its actual pose in three-dimensional space using correspondence-point-based, template-based, or voting-based methods, after which the mechanical arm grabs the object at an arbitrary position; however, these methods require the object model to be known in advance and can only grab objects in structured, simple environments. Deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning to produce a decision network suited to target-grabbing tasks, computing an action close to the optimal decision directly from the input image, which improves the grabbing success rate in different scenes to a certain extent. However, in environments where objects are densely stacked, the target object is easily occluded by other objects, and the mechanical arm still cannot find a proper grabbing pose, so the final grab fails. Pushing-and-grabbing systems based on deep reinforcement learning separate objects in a dense scene with pushing actions and then grab the separated objects, solving the grabbing problem in dense scenes through the synergy between pushing and grabbing.
However, such methods grab arbitrary objects in the dense scene one by one rather than a designated target object, and their reward functions fail to effectively evaluate the effectiveness of pushing and grabbing actions, so the resulting "pushing" and "grabbing" actions are sometimes unreasonable.
Therefore, how to design a reasonable pushing-and-grabbing collaborative policy network, so that the mechanical arm can successfully grab a designated target object in a dense scene, is a problem worth considering.
Disclosure of Invention
The invention provides a method for grabbing a target object with a mechanical arm in a dense scene. It constructs a target-oriented pushing-and-grabbing collaborative policy network based on deep reinforcement learning and designs reward functions for the pushing and grabbing actions of the target object that judge whether a pushing action effectively separates the target object from other objects and whether a grabbing action successfully grabs the target object. Based on the Q-learning algorithm, the pushing-and-grabbing collaborative policy network is trained by continuously trying actions in the working environment to obtain reward values. The method reasonably evaluates the effectiveness of pushing and grabbing actions and accomplishes accurate grabbing of the target object in a dense scene through the cooperation of pushing and grabbing actions.
The invention provides a method for grabbing a target object by a mechanical arm in a dense scene, which comprises the following main steps:
step S1: building an object dense complex environment for the mechanical arm to work in a V-rep simulation environment;
step S2: constructing a pushing and grabbing collaborative strategy network based on deep reinforcement learning from end to end in a simulation environment, and adopting an encoder-decoder structure;
step S3: in the simulation environment built in step S1, training the pushing-and-grabbing collaborative policy network based on the Q-learning algorithm by continuously trying actions in the working environment to obtain reward values; if the target object is severely occluded, masking the grabbing network Q_g and training only the pushing network Q_p; if the target object is not severely occluded, training the pushing and grabbing collaborative networks in parallel;
step S4: building a real densely stacked complex scene, transplanting a trained pushing and grabbing cooperative network of a simulation end to a physical platform, and performing action decision grabbing target objects by a mechanical arm in a real environment.
2. Optionally, the encoder-decoder structure of step S2 specifically includes the following steps:
step S2-1: constructing an encoder with the dense convolutional neural network DenseNet121 to extract the latent feature maps used to compute the mechanical arm's actions;
step S2-2: constructing the decoder with two parallel fully convolutional neural networks, namely the pushing network Q_p and the grabbing network Q_g.
3. Optionally, the training pushing and grabbing collaborative policy network in step S3 specifically includes the following steps:
step S3-1: capturing an RGB image I_c and a depth image I_d of the workspace with a depth camera, segmenting the target to be grabbed in the RGB image I_c to form a target object mask image I_m, processing the three kinds of image information, and fusing them to generate a feature map H_cdm;
step S3-2: rotating the feature map H_cdm generated in step S3-1 clockwise in steps of 360°/N to obtain N rotated feature maps, and feeding the N feature maps in turn to the encoder built in step S2-1 to extract the N latent feature maps h_t^i (i = 1, 2, ..., N) at time t;
Step S3-3: if the target object is seriously blocked, shielding the grabbing networkTraining only push network +.>N potential feature maps extracted in step S3-2 +.>Sequentially input to push network->Outputting a push network Q value matrix, and selecting an action corresponding to the maximum Q value as an optimal decision action A t The method comprises the steps of carrying out a first treatment on the surface of the If the purpose isIf the target object is not seriously shielded, pushing and grabbing a collaborative strategy network through parallel training, and enabling N potential feature graphs extracted in the step S3-2 to be +.>Sequentially input to push network->And grab network->Outputting a Q value matrix of the pushing network and a Q value matrix of the grabbing network, and selecting an action corresponding to the maximum Q value as an optimal decision action A t ;
step S3-4: the mechanical arm executes the optimal action A_t and the working scene changes; steps S3-1 and S3-2 are re-executed to acquire the N latent feature maps h_{t+1}^i at time t+1;
step S3-5: according to the state S_{t+1} after the action, using a piecewise reward function R_t to compute the reward value of the mechanical arm's action A_t, thereby evaluating whether A_t contributes positively or negatively to completing the final task;
step S3-6: storing the interaction data quadruple (S_t, A_t, S_{t+1}, R_t) in an experience pool;
step S3-7: computing the target value from the reward obtained in step S3-5: y_t = R_t + γ · max_a Q(S_{t+1}, a), where γ ∈ [0, 1] is the discount factor;
step S3-8: training the network by minimizing the Huber loss function:
L_t = (1/2) · δ_t², if |δ_t| < 1; L_t = |δ_t| − 1/2, otherwise;
where δ_t = Q_t − y_t, Q_t is the value predicted by the network at time t, and y_t is the target value at time t backed up from the Q value at time t+1; their difference serves as the loss function;
step S3-9: iteratively updating the pushing-and-grabbing collaborative policy network by minimizing the Huber loss, stopping training when the success-rate curve reaches a state of stable oscillation, and saving the latest parameters of the pushing-and-grabbing collaborative policy network.
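The Q-learning target of step S3-7 and the Huber loss of step S3-8 can be sketched as follows (a minimal sketch; the numeric values in the example are illustrative assumptions, not figures from the patent):

```python
def q_target(reward, q_next_max, gamma=0.9):
    """Step S3-7: y_t = R_t + gamma * max_a Q(S_{t+1}, a)."""
    return reward + gamma * q_next_max

def huber_loss(q_pred, y_t):
    """Step S3-8: 0.5*d^2 when |d| < 1, |d| - 0.5 otherwise, with d = Q_t - y_t."""
    d = q_pred - y_t
    return 0.5 * d ** 2 if abs(d) < 1.0 else abs(d) - 0.5

# Toy example: reward 0.5 for an effective push, best next-state Q value 0.8
y = q_target(0.5, 0.8, gamma=0.9)   # 0.5 + 0.9 * 0.8 = 1.22
loss_small = huber_loss(1.0, y)     # |d| = 0.22 < 1 -> quadratic branch
loss_large = huber_loss(3.0, y)     # |d| = 1.78 >= 1 -> linear branch
```

The piecewise form keeps gradients bounded for large temporal-difference errors, which matches the stated goal of stable convergence of the success-rate curve.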
4. Optionally, generating the feature map H_cdm described in step S3-1 specifically comprises the following steps:
step S3-1-1: capturing the predefined workspace with a camera fixed at a point outside the mechanical arm, and acquiring the RGB color image I_c and depth image I_d of the space;
step S3-1-2: applying the pre-trained segmentation network PoseCNN to the RGB color image I_c to segment the target to be grabbed, setting the target object's pixels to 1 and the remaining pixels to 0, thereby generating the binarized mask image I_m of the target object;
Step S3-1-3: generating a binarized mask image I of the target object according to step S3-1-2 m Judging the shielding state of the target object, counting the number of pixels in the target object area, and if the number of pixels is lower than a threshold value c, indicating that the target object is seriously shielded currently; in contrast, the target object is not occluded;
step S3-1-4: with the camera's extrinsic parameters known, orthographically projecting the RGB color image I_c, depth image I_d, and mask image I_m along the gravity direction into a top-down color height map H_c, depth height map H_d, and target object mask height map H_m;
Step S3-1-5: will color height map H c Depth-height map H d And target object mask height map H m The height maps are respectively convolved to extract feature maps with the same resolution, and the feature maps H are spliced from top to bottom according to the channel layers cdm 。
5. Optionally, selecting the optimal decision action A_t described in step S3-3 specifically comprises the following steps:
step S3-3-1: the N latent feature maps h_t^i extracted in step S3-2 are input in turn to the pushing network Q_p, which computes the Q-value matrices of the N pushing actions at time t; the Q value of a "push" action at pixel coordinates (x, y) of the i-th latent feature map at time t is Q_p^i(x, y); if the target object is not severely occluded, the N latent feature maps are simultaneously input in turn to the grabbing network Q_g, which computes the Q-value matrices of the N grabbing actions; the Q value of a "grab" action at pixel coordinates (x, y) of the i-th latent feature map is Q_g^i(x, y);
step S3-3-2: if the target object is severely occluded, collecting the "push" Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max = max_{i,x,y} Q_p^i(x, y), i = 1, 2, ..., N; if the target object is not severely occluded, collecting the pushing and grabbing Q-value matrices at all rotation angles at time t from step S3-3-1 and finding the maximum Q value at time t, Q_max = max_{i,x,y} { Q_p^i(x, y), Q_g^i(x, y) }, i = 1, 2, ..., N;
step S3-3-3: the action corresponding to the maximum Q value Q_max is the optimal action A_t of the mechanical arm at time t, A = <m, d, θ>, where m denotes the action type of the mechanical arm, m ∈ {p, g}, p denotes the "push" action, g denotes the "grab" action, d = (x, y, z) denotes the three-dimensional coordinates of the mechanical arm's action point, and θ denotes the rotation angle of the arm's end around the z axis; if Q_max equals Q_p^{i*}(x*, y*), the mechanical arm performs a "push" action, the three-dimensional coordinates d of the action point are (x*, y*, z*), where z* is the depth value at pixel (x*, y*), and the rotation angle of the arm's end around the z axis is θ = (i* − 1) · 360°/N; conversely, if Q_max equals Q_g^{i*}(x*, y*), the mechanical arm performs a "grab" action, the three-dimensional coordinates d of the action point are (x*, y*, z*), where z* is the depth value at pixel (x*, y*), and the rotation angle of the arm's end around the z axis is θ = (i* − 1) · 360°/N.
6. Optionally, computing the reward value with the piecewise reward function R_t described in step S3-5 comprises the following steps:
step S3-5-1: enlarging the target object mask image I_m by interpolation to twice its original size to obtain I_m'; finding the centroid coordinates of the target object region pixels in I_m and I_m' by image moments; mapping the target object region pixels of I_m' into I_m so that the two centroid coordinates coincide, generating the target reward region R_area;
step S3-5-2: computing the area ratio of non-target objects within R_area, P_occupy = (R_area − R_no_object − R_target_area) / R_area, where R_no_object is the size of the region within the target reward region R_area containing no object and R_target_area is the size of the region within R_area occupied by the target object;
step S3-5-3: presetting the rewards: when the action corresponding to the maximum Q value is "push" and the two-dimensional coordinates of the action point lie within the target reward region R_area, if the non-target area ratio P_occupy after pushing is smaller than before pushing, the "push" action at time t effectively separated the target object from other objects and the reward value R_t = 0.5 is given; conversely, if the non-target area ratio P_occupy after pushing is not smaller than before pushing, the "push" action at time t failed to effectively separate the target object from other objects and the reward value R_t = 0.25 is given;
step S3-5-4: similarly, when the action corresponding to the maximum Q value is "grab" and the two-dimensional coordinates of the action point lie within the target reward region R_area, if the mechanical arm successfully grabs the target object the reward value R_t = 1 is given; otherwise, the mechanical arm did not successfully grab the target object and the reward value R_t = 0.5 is given;
step S3-5-5: all other cases are regarded as push failures or grab failures, and the reward value is set to 0.
7. Optionally, grabbing the target object in the real densely stacked complex scene described in step S4 specifically comprises the following steps:
step S4-1: performing hand-eye calibration with a calibration board and the depth camera to obtain the camera's intrinsic and extrinsic parameters and the transformation matrix M between two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z);
step S4-2: the trained pushing-and-grabbing collaborative policy network outputs the two-dimensional pixel coordinates (x*, y*) of the maximum Q value; combining them with the transformation matrix M yields the optimal action A_t, namely the action type, the three-dimensional coordinates (x*, y*, z*) of the mechanical arm's action point in actual space, and the rotation angle θ of the arm's end around the z axis;
step S4-3: controlling the end effector of the mechanical arm to rotate to the angle θ of the feature map with the maximum Q value, solving with the arm's IK solver the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and driving each joint trajectory of the mechanical arm to execute the optimal action A_t;
step S4-4: repeating steps S4-1 to S4-3 several times to grab the target object in the dense scene.
Drawings
FIG. 1 is a process flow diagram of the process of the present invention;
FIG. 2 is a flow chart of an actual grabbing of the method of the present invention;
FIG. 3 is a view of a scene of a captured object in a dense scene of the method of the present invention;
FIG. 4 is a diagram of a collaborative pushing and grabbing network architecture for grabbing a target object according to the method of the present invention;
FIG. 5 is a color map, depth map, target mask map under a simulation environment of the method of the present invention;
FIG. 6 is a color height map, depth height map, target mask height map under a simulation environment of the method of the present invention;
FIG. 7 is a pushing-action Q-value heatmap over the height map of the method of the present invention;
FIG. 8 is a grabbing-action Q-value heatmap over the height map of the method of the present invention;
fig. 9 is a diagram of the optimal actions in a dense scene of the method of the present invention;
FIG. 10 is a diagram of an example of rewards of the method of the invention;
fig. 11 is a graph of the success rate of capturing a target object in a dense scene of the method of the present invention.
Detailed Description
The invention provides a method for grabbing a target object with a mechanical arm in a dense scene, which is described in detail and completely below with reference to the drawings and the technical method in the embodiments of the invention. The specific explanation is as follows:
Fig. 1 is a flow chart of the method of the present invention and fig. 2 is a flow chart of its actual grabbing; figs. 1 and 2 contain all the steps of the method, which are as follows:
step S1: and building an object intensive and complex environment for the mechanical arm to work in the V-rep simulation environment.
According to the trial-and-error principle of deep reinforcement learning, a great deal of experience data must be collected through interaction with the environment to feed policy-network learning, and data collection in a real environment is extremely expensive. To solve this problem, a scene highly similar to reality is built with the simulation software V-rep to carry out simulation experiments, collect data efficiently, and reduce as much as possible the noise interference that simulation-to-real scene migration causes in the neural network output.
The mechanical arm with its URDF joint configuration file, simulated objects as .obj instance files, a D435i camera, and the ground, table, and other .obj instance files are imported into the V-rep simulation software. A position-limiting three-dimensional space of 0.448 x 0.4 is defined within the mechanical arm's predefined working area; small wooden blocks are dropped randomly into this space to simulate a scene with densely packed objects, so that the mechanical arm can collect data and train the policy network. The experimental platform environment is shown in fig. 3.
Step S2: the end-to-end pushing-and-grabbing collaborative policy network based on deep reinforcement learning is constructed in the simulation environment with an encoder-decoder structure; the framework of the pushing-and-grabbing collaborative policy network is shown in fig. 4. The construction specifically comprises the following steps:
Step S2-1: an encoder is constructed with the dense convolutional neural network DenseNet121 to extract the latent feature maps used to compute the mechanical arm's actions.
Step S2-2: the decoder is constructed with two parallel fully convolutional neural networks, namely the pushing network Q_p and the grabbing network Q_g.
The pushing network Q_p and the grabbing network Q_g have identical structures, each comprising Conv2d convolution layers, ReLU activation function layers, and BatchNorm2d normalization layers. Each outputs a Q-value matrix of size 1 x 20 x 20, which is upsampled by bilinear interpolation and mapped to an action Q-value map of size 1 x 224 x 224.
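The decoder's bilinear upsampling step can be sketched in NumPy as follows (a simplified stand-in for the interpolation in the text; reading the garbled sizes as 20 x 20 upsampled to 224 x 224, and the align-corners convention, are assumptions):

```python
import numpy as np

def bilinear_upsample(q, out_h, out_w):
    """Upsample a 2-D Q-value matrix with bilinear interpolation
    (align-corners convention, assumed here for simplicity)."""
    in_h, in_w = q.shape
    ys = np.linspace(0, in_h - 1, out_h)       # fractional source rows
    xs = np.linspace(0, in_w - 1, out_w)       # fractional source columns
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = q[np.ix_(y0, x0)] * (1 - wx) + q[np.ix_(y0, x1)] * wx
    bot = q[np.ix_(y1, x0)] * (1 - wx) + q[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

q_small = np.random.rand(20, 20)               # decoder head output
q_map = bilinear_upsample(q_small, 224, 224)   # dense per-pixel action Q map
```

Every output value is a convex combination of four neighbours, so the upsampled map stays within the range of the original Q matrix while giving a Q value at every pixel of the 224 x 224 workspace image.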
Step S3: in the simulation environment constructed in step S1, the pushing-and-grabbing collaborative policy network is trained based on the Q-learning algorithm by continuously trying actions in the working environment to obtain reward values; if the target object is severely occluded, the grabbing network Q_g is masked and only the pushing network Q_p is trained; if the target object is not severely occluded, the pushing and grabbing collaborative networks are trained in parallel. The training specifically comprises the following steps:
Step S3-1: an RGB image I_c and a depth image I_d of the workspace are captured with a depth camera; the target to be grabbed is segmented in the RGB image I_c to form the target object mask image I_m; the three kinds of image information are processed and fused to generate the feature map H_cdm. This specifically comprises the following steps:
Step S3-1-1: the predefined workspace is captured by a camera fixed at a point outside the mechanical arm, acquiring the RGB color image I_c and depth image I_d of the space.
Step S3-1-2: the pre-trained segmentation network PoseCNN is applied to the RGB color image I_c to segment the target to be grabbed; the target object's pixels are set to 1 and the remaining pixels to 0, generating the binarized mask image I_m of the target object. The visual information is shown in fig. 5.
Step S3-1-3: the occlusion state of the target object is judged from the binarized mask image I_m generated in step S3-1-2 by counting the number of pixels in the target object region; if the number of pixels is below a threshold c, the target object is currently severely occluded; otherwise, the target object is not severely occluded.
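The occlusion test of step S3-1-3 reduces to a pixel count on the binary mask I_m; a minimal sketch (the threshold value c = 300 is an illustrative assumption, not taken from the patent):

```python
import numpy as np

def severely_occluded(mask, c=300):
    """Step S3-1-3: the target counts as severely occluded when its
    visible region in the binary mask I_m falls below c pixels."""
    return int(np.count_nonzero(mask)) < c

mask = np.zeros((224, 224), dtype=np.uint8)
mask[100:110, 100:120] = 1           # 10 x 20 = 200 visible target pixels
occluded = severely_occluded(mask)   # 200 < 300 -> severely occluded
```

This boolean is what switches training between "pushing network only" and "pushing and grabbing in parallel" in step S3-3.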
Step S3-1-4: with the camera's extrinsic parameters known, the RGB color image I_c, depth image I_d, and mask image I_m are orthographically projected along the gravity direction into a top-down color height map H_c, depth height map H_d, and target object mask height map H_m. The visual information is shown in fig. 6.
Step S3-1-5: the color height map H_c, depth height map H_d, and target object mask height map H_m are convolved separately to extract feature maps of the same resolution, which are concatenated along the channel dimension to form the feature map H_cdm.
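The fusion of step S3-1-5 can be sketched as per-map filtering followed by channel concatenation (a hedged sketch: a shared 3 x 3 mean filter stands in for the per-map convolutions, whose kernels the patent does not specify):

```python
import numpy as np

def mean_filter3(x):
    """Toy 3 x 3 mean filter (stand-in for the per-map convolutions)."""
    p = np.pad(x, 1, mode="edge")
    return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def fuse(h_c, h_d, h_m):
    """Step S3-1-5: filter each height map, then concatenate on channels."""
    feats = [mean_filter3(ch) for ch in (*np.moveaxis(h_c, -1, 0), h_d, h_m)]
    return np.stack(feats, axis=0)   # H_cdm, shape (C, H, W)

h_c = np.random.rand(224, 224, 3)                      # color height map H_c
h_d = np.random.rand(224, 224)                         # depth height map H_d
h_m = (np.random.rand(224, 224) > 0.9).astype(float)   # mask height map H_m
h_cdm = fuse(h_c, h_d, h_m)                            # shape (5, 224, 224)
```

Because all three height maps share the same top-down pixel grid, channel concatenation keeps color, geometry, and target-mask information spatially aligned for the encoder.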
Step S3-2: the feature map H_cdm generated in step S3-1-5 is rotated clockwise in steps of 360°/N to obtain N rotated feature maps, which are fed in turn to the encoder constructed in step S2-1 to extract the N latent feature maps h_t^i (i = 1, 2, ..., N) at time t.
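The rotation stack of step S3-2 can be sketched with a nearest-neighbour rotation in NumPy (a hedged sketch: N = 16 is an assumed rotation count and nearest-neighbour resampling is a simplification of whatever resampling the original pipeline uses):

```python
import numpy as np

def rotate_cw(img, angle_deg):
    """Rotate a 2-D map clockwise about its centre, nearest-neighbour."""
    a = np.deg2rad(angle_deg)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: sample the source pixel that lands on (yy, xx)
    ys = cy + (yy - cy) * np.cos(a) - (xx - cx) * np.sin(a)
    xs = cx + (yy - cy) * np.sin(a) + (xx - cx) * np.cos(a)
    ys = np.clip(np.rint(ys), 0, h - 1).astype(int)
    xs = np.clip(np.rint(xs), 0, w - 1).astype(int)
    return img[ys, xs]

N = 16                                    # assumed number of rotations
h_cdm = np.random.rand(224, 224)          # one channel of the fused map
rotations = [rotate_cw(h_cdm, k * 360.0 / N) for k in range(N)]
```

Rotating the input rather than the gripper lets a single fully convolutional head score one fixed action direction, with the map index i encoding the end-effector angle θ.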
Step S3-3: if the target object is severely occluded, the grabbing network Q_g is masked and only the pushing network Q_p is trained: the N latent feature maps h_t^i extracted in step S3-2 are input in turn to the pushing network Q_p, which outputs pushing Q-value matrices, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t; if the target object is not severely occluded, the pushing and grabbing collaborative networks are trained in parallel: the N latent feature maps h_t^i extracted in step S3-2 are input in turn to the pushing network Q_p and the grabbing network Q_g, which output pushing and grabbing Q-value matrices, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t. The visual information is shown in figs. 7 and 8. This specifically comprises the following steps:
step S3-3-1: n potential feature maps extracted in the step S3-2Sequentially input to a push networkCalculating Q value matrix of N pushing actions at t moment->Ith potential feature map at time t->Q value of action "push" action at pixel coordinates (x, y) is +.>If the target object is not seriously coveredThe N potential feature images are sequentially input into a pushing network>Calculating Q value matrix of N grabbing actions>Ith latent feature map->The value of the action "grabbing" action at the coordinates (x, y) of the latent feature image element is +.>
Step S3-3-2: if the target object is seriously blocked, the 'pushing' Q value matrix under all rotation angles at the moment t in the step S3-3-1 is assembledFind out the maximum Q value at time t +.> i=1, 2..n; if the target object is not seriously blocked, collecting the Q value matrix of pushing and grabbing actions under all rotation angles at the moment t in the step S3-3-1>Find out the maximum Q value at time t +.> Wherein (1)>
Step S3-3-3: at maximum Q value Q max The corresponding action is the optimal action A of the mechanical arm at the moment t t ,A=<m,d,θ>Where m represents the motion type of the mechanical arm, m e { p, g }, p represents the "pushing" motion, g represents the "grabbing" motion, d= (x, y, z) represents the three-dimensional coordinates of the motion point of the mechanical arm, and θ represents the angle of rotation of the end of the mechanical arm around the z axis. If it isEqual to->The mechanical arm executes a pushing action, and the three-dimensional coordinate d of the action point is (x * ,y * ,z * ),/> z * Is a pixel (x) * ,y * ) The depth value of the position is that the rotation angle theta of the tail end of the mechanical arm around the z axis isOn the contrary, if->Equal to->The mechanical arm performs a "grabbing" action; the three-dimensional coordinates d of the action point are (x * ,y * ,z * ),/>z * Is a pixel (x) * ,y * ) The position of the partIs about the z-axis by an angle θ>
The optimal action A_t is defined as follows. Pushing: the parallel jaws at the end of the mechanical arm close, and the arm pushes 5 centimeters in a straight line from right to left with respect to the rotation direction of the height map. Grabbing: the parallel jaws at the end of the mechanical arm open, and the arm descends to 3 centimeters below the depth of the optimal action point to close the jaws. The visualized information is shown in fig. 9; the circle in the left image is the optimal pushing action point and the circle in the right image is the optimal grabbing action point.
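The maximum-Q action selection of steps S3-3-1 to S3-3-3 can be sketched as follows (a minimal NumPy sketch; N = 16 rotations and the placeholder Q tensors are illustrative assumptions):

```python
import numpy as np

def best_action(q_push, q_grasp=None, n_rot=16):
    """Pick <m, (x*, y*), theta> from stacked (N, H, W) Q maps.
    q_grasp is None when the target is severely occluded (grab net masked)."""
    candidates = {"p": q_push}
    if q_grasp is not None:
        candidates["g"] = q_grasp
    best = None
    for kind, q in candidates.items():
        idx = np.unravel_index(np.argmax(q), q.shape)   # (i, y, x)
        if best is None or q[idx] > best[2]:
            best = (kind, idx, float(q[idx]))
    m, (i, y, x), q_max = best
    theta = i * 360.0 / n_rot      # end-effector angle encoded by map index i
    return m, (x, y), theta, q_max

q_p = np.random.rand(16, 224, 224)    # pushing Q maps (placeholder)
q_g = np.random.rand(16, 224, 224)    # grabbing Q maps (placeholder)
m, (x, y), theta, q_max = best_action(q_p, q_g)
```

Taking one global argmax over both stacked tensors is what makes the push/grab choice, the action point, and the end rotation fall out of a single maximization.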
Step S3-4: the mechanical arm executes the optimal action A_t, the working scene changes, and steps S3-1 and S3-2 are re-executed to acquire the N potential feature maps at time t+1.
Step S3-5: according to the state S_{t+1} after the action, the piecewise reward function R_t is used to calculate the reward value of the action A_t of the mechanical arm, and action A_t is then evaluated as positive or negative for completing the final task; the visualized information is shown in fig. 10. The procedure specifically includes the following steps:
step S3-5-1: the target object mask image I_m is enlarged by interpolation to twice its original size to obtain I_m'; the centroid coordinates of the target object region pixels of I_m and I_m' are found through image moments; the target object region pixels of I_m' are mapped onto I_m so that the two centroid coordinates coincide, generating the target reward region R_area.
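A minimal sketch of how the reward region of step S3-5-1 could be generated, assuming nearest-neighbour interpolation and a doubling scale. `centroid` and `reward_region` are our names; a real implementation would use the image moments of an imaging library on the mask I_m.

```python
def centroid(mask):
    """Centroid (x, y) of the set pixels of a non-empty binary mask (nested lists)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def reward_region(mask, scale=2.0):
    """Enlarge the mask about its own centroid by `scale` (nearest neighbour),
    so the enlarged footprint and the original share the same centroid."""
    cx, cy = centroid(mask)
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # inverse mapping: sample the original mask at the shrunk coordinate
            sx = round(cx + (x - cx) / scale)
            sy = round(cy + (y - cy) / scale)
            if 0 <= sx < w and 0 <= sy < h and mask[sy][sx]:
                out[y][x] = 1
    return out
```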
Step S3-5-2: calculate P_occupy, the area ratio of non-target objects inside R_area, from R_no_object and R_target_area, where R_no_object is the size of the area inside the target reward region R_area in which no target object is present, and R_target_area is the size of the area of the target object inside the target reward region R_area.
Step S3-5-3: a reward is preset. When the action corresponding to the maximum Q value is "pushing" and the two-dimensional coordinates of the action point lie inside the target reward region R_area: if the area ratio P_occupy of non-target objects after the push is smaller than before the push, the "pushing" action at time t has effectively separated the target object from the other objects, and the reward value R_t = 0.5 is given; otherwise, if the area ratio P_occupy of non-target objects after the push is not smaller than before the push, the "pushing" action at time t has failed to effectively separate the target object from the other objects, and the reward value R_t = 0.25 is given.
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is "grabbing" and the two-dimensional coordinates of the action point lie inside the target reward region R_area: if the mechanical arm successfully grabs the target object, the reward value R_t = 1 is given; otherwise, the mechanical arm has not successfully grabbed the target object, and the reward value R_t = 0.5 is given.
Step S3-5-5: all other cases are regarded as push failures or grab failures, and the reward value is set to 0.
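The piecewise reward of steps S3-5-3 to S3-5-5 can be read as one small function. The function name and argument encoding are ours; 'p'/'g' follow the action types of step S3-3-3, and the ratio arguments stand for P_occupy measured before and after the push.

```python
def reward(action, point_in_region, grasp_success=False,
           ratio_before=None, ratio_after=None):
    """action: 'p' (pushing) or 'g' (grabbing).
    point_in_region: action point lies inside the reward region R_area.
    ratio_before / ratio_after: non-target area ratio P_occupy (pushes only)."""
    if action == "g" and point_in_region:
        return 1.0 if grasp_success else 0.5
    if action == "p" and point_in_region:
        # a push that shrinks P_occupy has separated the target from clutter
        return 0.5 if ratio_after < ratio_before else 0.25
    return 0.0   # all other cases count as push/grab failure
```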
Step S3-6: the interaction data quadruple (S_t, A_t, S_{t+1}, R_t) is stored into the experience pool.
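A minimal experience pool for the quadruples of step S3-6, assuming a fixed capacity and uniform random sampling (neither detail is specified in the text):

```python
from collections import deque
import random

class ExperiencePool:
    """Ring-buffer experience pool for (S_t, A_t, S_{t+1}, R_t) quadruples."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)   # oldest quadruples drop out first

    def store(self, s_t, a_t, s_next, r_t):
        self.buf.append((s_t, a_t, s_next, r_t))

    def sample(self, batch_size):
        """Uniformly sample a minibatch (smaller if the pool is not full yet)."""
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))
```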
Step S3-7: the objective function y_t = R_t + γ · max_a Q(S_{t+1}, a) is calculated according to the reward value obtained in step S3-5, where γ ∈ [0, 1] is the discount factor.
Step S3-8: the network is trained by minimizing the Huber loss function:

L_t = 1/2 · (y_t − Q_t)^2 if |y_t − Q_t| < 1, and L_t = |y_t − Q_t| − 1/2 otherwise,

where Q_t is the predicted value output by the network at time t, and y_t is the true value at time t traced back from the Q value at time t+1; the difference between the two serves as the loss function. During gradient back-propagation through the network, the loss values of all other pixels are set to 0, and only the loss value of the selected optimal-action pixel is calculated. When the absolute value of the error is smaller than 1, the MSE-form loss is used: the gradient gradually decreases as the loss value approaches the minimum of the loss function, so the minimum is approached more accurately; in all other cases the MAE-form loss is used, whose gradient stays large, which conveniently accelerates network training; combining MSE with MAE forms the Huber loss function. The gradient update adopts a stochastic optimization method with adaptive momentum and weight decay, which summarizes the history of past gradients to accelerate convergence; the specific formulas are:
m_{t+1} = β1 · m_t + (1 − β1) · g_t
v_{t+1} = β2 · v_t + (1 − β2) · g_t^2
m̂_{t+1} = m_{t+1} / (1 − β1^{t+1}), v̂_{t+1} = v_{t+1} / (1 − β2^{t+1})
w_{t+1} = w_t − α · (m̂_{t+1} / (√(v̂_{t+1}) + ε) + λ · w_t)

In the above, g_t is the network parameter gradient; m_{t+1} is the 1st-order moment estimate of the gradient and β1 is the 1st-order momentum coefficient; v_{t+1} is the 2nd-order moment estimate of the gradient and β2 is the 2nd-order momentum coefficient; m̂_{t+1} is the unbiased estimate of the 1st-order moment and v̂_{t+1} is the unbiased estimate of the 2nd-order moment; ε is a small constant that prevents division by zero; α is the learning rate, which controls the magnitude and speed of network convergence and is set to the constant 0.0001 by default; λ is the weight decay coefficient, which prevents the network from over-fitting and is set to the constant 0.00002 by default.
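Steps S3-7 and S3-8 reduce to three small formulas. The sketch below writes them as scalar functions for a single optimal-action pixel; the per-pixel loss masking and the network itself are omitted, and the β/ε defaults are the common Adam values, which the text does not state.

```python
def td_target(reward, q_next_max, gamma):
    """Step S3-7: y_t = R_t + gamma * max_a Q(S_{t+1}, a)."""
    return reward + gamma * q_next_max

def huber(pred, target, delta=1.0):
    """Step S3-8 loss: quadratic (MSE-like) inside |error| < delta,
    linear (MAE-like) outside; delta = 1 matches the text."""
    err = abs(pred - target)
    if err < delta:
        return 0.5 * err * err           # near the minimum: shrinking gradient
    return delta * (err - 0.5 * delta)   # far away: constant, large gradient

def adamw_step(w, g, m, v, t, alpha=1e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, lam=2e-5):
    """One decoupled-weight-decay Adam update for a scalar parameter w;
    alpha and lam follow the text, the rest are common defaults."""
    m = beta1 * m + (1 - beta1) * g          # 1st-order moment estimate
    v = beta2 * v + (1 - beta2) * g * g      # 2nd-order moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias-corrected (unbiased) estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * (m_hat / (v_hat ** 0.5 + eps) + lam * w)  # decoupled decay
    return w, m, v
```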
Step S3-9: the Huber loss function is iteratively minimized and the pushing and grabbing collaborative strategy network is updated until convergence; when the success rate curve is observed to reach a state of steady oscillation, training is stopped after a further period of time. The latest pushing and grabbing collaborative strategy network parameters are saved and transplanted to the real environment.
Through multiple tests, the trained pushing and grabbing collaborative strategy network grabs the target object in a dense scene with a success rate of approximately 85%; the grabbing success rate is shown in fig. 11.
Step S4: a real densely stacked complex scene is built, the pushing and grabbing collaborative network trained at the simulation end is transplanted to the physical platform, and the mechanical arm makes action decisions to grab the target object in the real environment, specifically including the following steps:
step S4-1: hand-eye calibration is performed using the calibration plate and the depth camera to obtain the intrinsic and extrinsic parameters of the camera and the transformation matrix M between the two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z).
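What the matrix M of step S4-1 encodes can be illustrated with a pinhole back-projection: a pixel plus its depth value yields a camera-frame 3D point. Here `fx, fy, cx, cy` stand for the calibrated intrinsics, and the extrinsic camera-to-arm transform (the other half of hand-eye calibration) is omitted.

```python
def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z* into camera coordinates
    using pinhole intrinsics: focal lengths fx, fy and principal point cx, cy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```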
Step S4-2: the trained pushing and grabbing collaborative strategy network outputs the two-dimensional pixel coordinates (x*, y*) with the maximum Q value; combined with the transformation matrix M, the optimal action A_t is obtained, i.e. the action type of the mechanical arm, the three-dimensional coordinates (x*, y*, z*) of the action point, and the rotation angle θ of the end of the mechanical arm around the z-axis.
Step S4-3: the end effector of the mechanical arm is controlled to rotate to the angle θ of the feature map with the maximum Q value, the IK solver built into the mechanical arm is used to solve for the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and each joint of the mechanical arm is driven along its trajectory to execute the optimal action A_t.
Step S4-4: repeating the steps S4-1 to S4-3 for a plurality of times to realize grabbing the target object in the dense scene.
All the above steps and experimental results run at the simulation end, and a large gap exists between the simulation end and the real environment. Therefore, to ensure the effectiveness of the method, it is of practical significance to migrate the well-trained pushing and grabbing collaborative strategy network to the real end. A real experimental platform is built, comprising a mechanical arm, a depth camera, object blocks to be grabbed, and so on. Through multiple tests, the real end achieves a success rate of grabbing the target object in a dense scene similar to that of the simulation end, which proves that the method is feasible.
The foregoing is a detailed description of a method of practicing the invention and is not intended to limit the embodiments of the invention to the process described above. Those skilled in the relevant technical field may make changes, such as substitutions and deletions, within the original scope of the invention, and such changes also fall within the protection scope of the invention.
Claims (7)
1. The method for grabbing the target object by the mechanical arm in the dense scene is characterized by comprising the following steps of:
step S1: building an object dense complex environment for the mechanical arm to work in a V-rep simulation environment;
step S2: constructing an end-to-end pushing and grabbing collaborative strategy network based on deep reinforcement learning in the simulation environment, adopting an encoder-decoder structure;
step S3: in the simulation environment built in step S1, continuously trying actions in the working environment to obtain reward values based on the Q learning algorithm so as to train the pushing and grabbing collaborative strategy network; if the target object is seriously occluded, masking the grabbing network and training only the pushing network; if the target object is not seriously occluded, training the pushing and grabbing collaborative strategy network in parallel;
step S4: building a real densely stacked complex scene, transplanting the trained pushing and grabbing collaborative network of the simulation end to the physical platform, and making action decisions with the mechanical arm to grab the target object in the real environment.
2. The method for capturing a target object by a mechanical arm in a dense scene according to claim 1, wherein the encoder-decoder structure in step S2 specifically comprises the following steps:
step S2-1: constructing an encoder by adopting a dense convolutional neural network DenseNet121, and extracting a potential feature map for calculating the action of the mechanical arm;
3. The method for capturing a target object by a mechanical arm in a dense scene according to claim 1, wherein the training pushing and capturing collaborative strategy network in step S3 specifically comprises the following steps:
step S3-1: capturing the RGB image I_c and depth image I_d within the workspace with a depth camera, segmenting the target to be grabbed from the RGB image I_c to form the target object mask image I_m, processing the three kinds of image information, and fusing them to generate the feature map H_cdm;
Step S3-2: rotating the feature map H_cdm generated in step S3-1 clockwise in turn to obtain N rotated feature maps, which are sequentially transmitted to the encoder constructed in step S2-1 to extract the N potential feature maps at time t;
Step S3-3: if the target object is seriously occluded, masking the grabbing network and training only the pushing network: the N potential feature maps extracted in step S3-2 are sequentially input into the pushing network, the pushing network Q value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t; if the target object is not seriously occluded, training the pushing and grabbing collaborative strategy network in parallel: the N potential feature maps extracted in step S3-2 are sequentially input into the pushing network and the grabbing network, the pushing network Q value matrices and the grabbing network Q value matrices are output, and the action corresponding to the maximum Q value is selected as the optimal decision action A_t;
Step S3-4: the mechanical arm executes the optimal action A_t, the working scene changes, and steps S3-1 and S3-2 are re-executed to acquire the N potential feature maps at time t+1;
Step S3-5: according to the state S_{t+1} after the action, using the piecewise reward function R_t to calculate the reward value of the action A_t of the mechanical arm, and then evaluating action A_t as positive or negative for completing the final task;
step S3-6: storing interaction data quadruples (S t ,A t ,S t+1 ,R t ) To an experience pool;
step S3-7: calculating the objective function y_t = R_t + γ · max_a Q(S_{t+1}, a) according to the reward value calculated in step S3-5, where γ ∈ [0, 1] is the discount factor;
step S3-8: training the network by minimizing the Huber loss function:

L_t = 1/2 · (y_t − Q_t)^2 if |y_t − Q_t| < 1, and L_t = |y_t − Q_t| − 1/2 otherwise,

where Q_t is the predicted value output by the network at time t, y_t is the true value at time t traced back from the Q value at time t+1, and the difference between the two serves as the loss function;
step S3-9: iteratively minimizing the Huber loss function and updating the pushing and grabbing collaborative strategy network, stopping training when the success rate curve reaches a state of steady oscillation, and saving the latest parameters of the pushing and grabbing collaborative strategy network.
4. The method for grabbing a target object by a mechanical arm in a dense scene as claimed in claim 3, wherein generating the feature map H_cdm in step S3-1 specifically comprises the following steps:
step S3-1-1: capturing a predefined workspace with a camera arranged at a fixed position outside the mechanical arm, and acquiring the RGB color image I_c and depth image I_d in the space;
Step S3-1-2: application of pre-trained segmentation network PoseCNN to RGB color image I c Dividing the target to be grabbed, setting the pixel of the target object to be 1, setting the rest pixels to be 0, and generating a binarization mask image I of the target object m ;
Step S3-1-3: according toGenerating a binarized mask image I of the target object in step S3-1-2 m Judging the shielding state of the target object, counting the number of pixels in the target object area, and if the number of pixels is lower than a threshold value c, indicating that the target object is seriously shielded currently; in contrast, the target object is not occluded;
step S3-1-4: with the extrinsic parameters of the camera known, orthographically projecting the RGB color image I_c, depth image I_d, and mask image I_m along the gravity direction into the color height map H_c, depth height map H_d, and target object mask height map H_m seen from a top-down viewing angle;
Step S3-1-5: will color height map H c Depth-height map H d And target object mask height map H m The height maps are respectively convolved to extract feature maps with the same resolution, and the feature maps H are spliced from top to bottom according to the channel layers cdm 。
5. The method for grabbing a target object by a mechanical arm in a dense scene as claimed in claim 3, wherein selecting the optimal decision action A_t in step S3-3 specifically comprises the following steps:
step S3-3-1: the N potential feature maps extracted in step S3-2 are sequentially input into the pushing network, and the Q value matrices of the N "pushing" actions at time t are calculated; the Q value of the "pushing" action at the pixel coordinates (x, y) of the i-th potential feature map at time t is Q_p^i(x, y); if the target object is not seriously occluded, the N potential feature maps are simultaneously input in sequence into the grabbing network, and the Q value matrices of the N "grabbing" actions are calculated; the Q value of the "grabbing" action at the pixel coordinates (x, y) of the i-th potential feature map is Q_g^i(x, y);
Step S3-3-2: if the target object is seriously occluded, collect the "pushing" Q value matrices Q_p^i(x, y) at all rotation angles at time t from step S3-3-1, and find the maximum Q value at time t, Q_max = max_{i, x, y} Q_p^i(x, y), i = 1, 2, ..., N; if the target object is not seriously occluded, collect the "pushing" and "grabbing" Q value matrices Q_p^i(x, y) and Q_g^i(x, y) at all rotation angles at time t from step S3-3-1, and find the maximum Q value at time t, Q_max = max_{i, x, y} {Q_p^i(x, y), Q_g^i(x, y)}, i = 1, 2, ..., N;
Step S3-3-3: the action corresponding to the maximum Q value Q_max is the optimal action A_t of the mechanical arm at time t, A = <m, d, θ>, where m represents the action type of the mechanical arm, m ∈ {p, g}, p represents the "pushing" action, g represents the "grabbing" action, d = (x, y, z) represents the three-dimensional coordinates of the action point of the mechanical arm, and θ represents the rotation angle of the end of the mechanical arm around the z-axis; if Q_max equals max_{i, x, y} Q_p^i(x, y), the mechanical arm executes a "pushing" action, the three-dimensional coordinates d of the action point are (x*, y*, z*), where (x*, y*) = argmax_{x, y} Q_p^i(x, y), z* is the depth value at pixel (x*, y*), and the rotation angle θ of the end of the mechanical arm around the z-axis is the rotation angle of the i-th feature map; conversely, if Q_max equals max_{i, x, y} Q_g^i(x, y), the mechanical arm executes a "grabbing" action, the three-dimensional coordinates d of the action point are (x*, y*, z*), where (x*, y*) = argmax_{x, y} Q_g^i(x, y), z* is the depth value at pixel (x*, y*), and the rotation angle θ of the end of the mechanical arm around the z-axis is the rotation angle of the i-th feature map.
6. The method for grabbing a target object by a mechanical arm in a dense scene as claimed in claim 3, wherein calculating the reward value with the piecewise reward function R_t in step S3-5 comprises the following steps:
step S3-5-1: the target object mask image I_m is enlarged by interpolation to twice its original size to obtain I_m'; the centroid coordinates of the target object region pixels of I_m and I_m' are found through image moments; the target object region pixels of I_m' are mapped onto I_m so that the two centroid coordinates coincide, generating the target reward region R_area;
Step S3-5-2: calculate P_occupy, the area ratio of non-target objects inside R_area, from R_no_object and R_target_area, where R_no_object is the size of the area inside the target reward region R_area in which no target object is present, and R_target_area is the size of the area of the target object inside the target reward region R_area;
step S3-5-3: presetting rewards; when the action corresponding to the maximum Q value is "pushing" and the two-dimensional coordinates of the action point lie inside the target reward region R_area: if the area ratio P_occupy of non-target objects after the push is smaller than before the push, the "pushing" action at time t has effectively separated the target object from the other objects, and the reward value R_t = 0.5 is given; conversely, if the area ratio P_occupy of non-target objects after the push is not smaller than before the push, the "pushing" action at time t has failed to effectively separate the target object from the other objects, and the reward value R_t = 0.25 is given;
Step S3-5-4: similarly, when the action corresponding to the maximum Q value is "grabbing" and the two-dimensional coordinates of the action point lie inside the target reward region R_area: if the mechanical arm successfully grabs the target object, the reward value R_t = 1 is given; otherwise, the mechanical arm has not successfully grabbed the target object, and the reward value R_t = 0.5 is given;
Step S3-5-5: all other cases are regarded as push failures or grab failures, and the reward value is set to 0.
7. The method for grabbing a target object by a mechanical arm in a dense scene according to claim 1, wherein grabbing in the densely stacked complex scene in step S4 specifically comprises the following steps:
step S4-1: performing hand-eye calibration using the calibration plate and the depth camera to obtain the intrinsic and extrinsic parameters of the camera and the transformation matrix M between the two-dimensional pixel coordinates (x, y) and the corresponding three-dimensional space coordinates (x, y, z);
step S4-2: the trained pushing and grabbing collaborative strategy network outputs the two-dimensional pixel coordinates (x*, y*) with the maximum Q value; combined with the transformation matrix M, the optimal action A_t is obtained, i.e. the action type of the mechanical arm, the three-dimensional coordinates (x*, y*, z*) of the action point, and the rotation angle θ of the end of the mechanical arm around the z-axis;
step S4-3: controlling the end effector of the mechanical arm to rotate to the angle θ of the feature map with the maximum Q value, using the IK solver built into the mechanical arm to solve for the rotation of each degree of freedom required to move to the three-dimensional coordinates (x*, y*, z*), and driving each joint of the mechanical arm along its trajectory to execute the optimal action A_t;
Step S4-4: repeating the steps S4-1 to S4-3 for a plurality of times to realize grabbing the target object in the dense scene.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310301614.2A CN116330283A (en) | 2023-03-24 | 2023-03-24 | Method for grabbing target object by mechanical arm in dense scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116330283A true CN116330283A (en) | 2023-06-27 |
Family
ID=86883502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310301614.2A Pending CN116330283A (en) | 2023-03-24 | 2023-03-24 | Method for grabbing target object by mechanical arm in dense scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116330283A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116922403A (en) * | 2023-09-19 | 2023-10-24 | 上海摩马智能科技有限公司 | Visual feedback intelligent track implementation method based on simulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||