CN114782347A - Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network - Google Patents

Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network

Info

Publication number
CN114782347A
Authority
CN
China
Prior art keywords
grabbing
image
dimensional
mechanical arm
attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210387024.1A
Other languages
Chinese (zh)
Inventor
杨宇翔
邢玉虎
全嘉勉
高明裕
何志伟
董哲康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210387024.1A
Publication of CN114782347A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1664 Programme controls characterised by motion, path, trajectory planning
    • B25J 9/1679 Programme controls characterised by the tasks executed
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/10021 Stereoscopic video; Stereoscopic image sequence
    • G06T 2207/10024 Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a mechanical arm grabbing parameter estimation method based on an attention-mechanism generative network. In the method, an RGB-D camera captures images of the work scene, which are input into a trained attention-based generative network to obtain mechanical arm grabbing parameters such as grabbing quality, grabbing angle, grabbing width and grabbing priority; the remaining grabbing parameters are then filtered by the grabbing priority, yielding better mechanical arm grabbing parameters in complex multi-object environments. The method not only gives the mechanical arm good autonomous grabbing capability in complex stacking environments, but the grabbing-priority estimation also deepens the vision system's perception of relevant information in such environments and strengthens the overall system's handling of multi-dimensional data, thereby improving the grabbing precision of the mechanical arm in complex stacking environments.

Description

Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network
Technical Field
The invention belongs to the field of mechanical arm grabbing control, and particularly relates to a mechanical arm grabbing parameter estimation method based on an attention mechanism generating network.
Background
At present, in research on autonomous grasping by mechanical arms, the grasping technology for a single object in a simple scene is relatively mature; in practice, however, multiple objects are often stacked in a disordered manner in complex environments, which poses a greater challenge to autonomous grasping technology. The invention provides a mechanical arm grabbing parameter estimation method based on an attention-mechanism generative network, which effectively estimates the mechanical arm grabbing parameters through the attention-mechanism generative network, deepens the vision system's perception of relevant information in complex environments, improves the fusion of multi-channel information, realizes grasping tasks for various objects in complex environments, and thereby addresses the problem of autonomous grasping by the mechanical arm in multi-object stacking environments.
Disclosure of Invention
Aiming at the problem of autonomous grabbing of the mechanical arm in a complex stacking scene, the invention provides a mechanical arm grabbing parameter estimation method based on an attention mechanism generating network, so that the grabbing precision of autonomous grabbing of the mechanical arm in a complex stacking environment is improved.
In order to achieve the above purpose, the invention adopts the following main technical solution:
S1: use an RGB-D camera to obtain the work scene image of the mechanical arm in the current state, including an RGB image I_rgb, a depth image I_depth, and a work scene reference coordinate system;
S2: input each work scene image I into the trained attention-based generative neural network to generate a predicted two-dimensional image group containing the motion instruction vector, where the predicted two-dimensional image group contains at least one two-dimensional grabbing quality image G_θ, one two-dimensional grabbing angle image A_θ, one two-dimensional grabbing width image W_θ and one two-dimensional grabbing priority image O_θ, which respectively carry the grabbing success rate, grabbing angle, jaw opening width and grabbing order information for the mechanical arm when grabbing an object;
S3: sort the pixel values of the two-dimensional grabbing quality image G_θ in the predicted two-dimensional image group and select the n pixel points with the largest pixel values; these pixel points give the predictions with the highest grabbing success rate. According to the pixel coordinates of these predictions, the corresponding grabbing angle predictions, grabbing width predictions and grabbing order predictions are read from the two-dimensional grabbing angle image A_θ, the two-dimensional grabbing width image W_θ and the two-dimensional grabbing priority image O_θ, where p_n denotes the pixel coordinate of the n-th pixel when the pixel values of G_θ are ranked from largest to smallest;
S4: sort the grabbing order predictions and select the prediction with the highest grabbing priority; the grabbing information corresponding to its pixel coordinate, i.e. the grabbing quality, angle, width and order at that point, is the optimal motion instruction vector;
And S5, analyzing the obtained optimal motion instruction vector to obtain the grabbing coordinate, the grabbing angle and the grabbing width of the target object to be grabbed under the base coordinate system of the mechanical arm, namely the grabbing parameters of the mechanical arm.
Preferably, step S2 includes:
S21: train the attention-based generative neural network on an existing data set;
S22: preprocess each work scene image to obtain a 300 × 300 pixel work scene image;
S23: input the preprocessed image into the trained attention-based generative neural network;
S24: the attention-based generative neural network outputs the predicted two-dimensional image group containing the motion instruction vector, i.e. the predicted grabbing parameters including the grabbing success probability.
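By way of illustration only, a minimal Python sketch of the preprocessing in step S22 is given below; the center-crop location, the normalization constants and the helper name preprocess are assumptions, since the disclosure only states that the work scene images are preprocessed to 300 × 300 pixels:

```python
import numpy as np

def preprocess(rgb: np.ndarray, depth: np.ndarray, size: int = 300) -> np.ndarray:
    """Center-crop RGB and depth to size x size and fuse them into one 4-channel array.

    The crop position and normalization scheme are illustrative assumptions.
    """
    h, w = depth.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    rgb_c = rgb[top:top + size, left:left + size].astype(np.float32) / 255.0
    depth_c = depth[top:top + size, left:left + size].astype(np.float32)
    depth_c = (depth_c - depth_c.mean()) / (depth_c.std() + 1e-6)
    fused = np.concatenate([rgb_c, depth_c[..., None]], axis=-1)  # (size, size, 4)
    # Transpose to channels-first (4, size, size) before feeding a channels-first network.
    return fused.transpose(2, 0, 1)
```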
Preferably, step S3 includes:
S31: sort the pixel values of the two-dimensional grabbing quality image in the predicted two-dimensional image group; each pixel value in the two-dimensional grabbing quality image represents the success rate of a grasp by the mechanical arm centered on that point;
S32: select the coordinates of the n largest grabbing success rate predictions as candidate grabbing center coordinates;
S33: from the predicted two-dimensional image group, obtain the grabbing angle pixel value, grabbing width pixel value and grabbing priority pixel value corresponding to each grabbing center coordinate;
S34: parse the grabbing angle, grabbing width and grabbing order information from the grabbing angle, grabbing width and grabbing priority pixel values.
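For illustration, a NumPy sketch of steps S31 to S34 is given below, assuming the network outputs are 300 × 300 arrays named quality, angle, width and priority, and that a larger priority value means the object should be grasped earlier; these names and the sign convention are assumptions, not part of the disclosure:

```python
import numpy as np

def select_grasp(quality, angle, width, priority, n: int = 10):
    """Pick the n highest-quality pixels, then keep the candidate with the best priority."""
    flat_idx = np.argpartition(quality.ravel(), -n)[-n:]        # n largest quality values
    ys, xs = np.unravel_index(flat_idx, quality.shape)
    candidates = [
        (priority[y, x], quality[y, x], angle[y, x], width[y, x], (y, x))
        for y, x in zip(ys, xs)
    ]
    best = max(candidates, key=lambda c: c[0])                  # highest grabbing priority
    prio, q, a, w, (y, x) = best
    return {"center_px": (int(x), int(y)), "quality": float(q),
            "angle": float(a), "width": float(w), "priority": float(prio)}
```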
Preferably, step S5 includes:
S51: parse the obtained optimal motion instruction vector;
S52: convert the parsed data into the wrist camera coordinate system;
S53: transform the coordinates from the camera coordinate system into the mechanical arm base coordinate system;
S54: input the coordinates obtained in the mechanical arm base coordinate system, together with the grabbing width and grabbing angle information, into the mechanical arm control system to perform the grasp.
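A hedged sketch of the coordinate chain in steps S51 to S54 follows: the selected grasp pixel is back-projected into the wrist camera frame using the camera intrinsics and the measured depth, then mapped into the mechanical arm base frame by a hand-eye transform. The intrinsic matrix K and the 4 × 4 transform T_base_cam come from the user's own calibration and are not specified in the disclosure:

```python
import numpy as np

def pixel_to_base(u: float, v: float, depth_m: float,
                  K: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Back-project pixel (u, v) with depth depth_m (metres) and express it in the base frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # 3-D point in the wrist camera coordinate system (homogeneous coordinates).
    p_cam = np.array([(u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m, 1.0])
    # Homogeneous transform from the camera frame to the mechanical arm base frame.
    p_base = T_base_cam @ p_cam
    return p_base[:3]
```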
Preferably, the training method of the attention-based generative neural network comprises the following steps:
S01: based on an existing data set, create a data set G_train for training the network; the G_train data set comprises work scene images, valid grasp box information and segmentation images containing only the topmost object, where the G_train segmentation image containing only the topmost object is used as the two-dimensional grabbing priority image;
S02: map the valid grasp box information onto 300 × 300 two-dimensional images to obtain the two-dimensional grabbing quality image, two-dimensional grabbing angle image and two-dimensional grabbing width image, and combine them with the G_train segmentation image containing only the topmost object to construct the two-dimensional image group;
S03: construct the attention-based generative neural network by building the attention mechanism module;
S04: train the attention-based generative neural network with the data set G_train and the two-dimensional image groups, taking RGB-D images without grasp information as input and outputting two-dimensional image groups containing grasp information, to obtain the trained attention-based generative neural network.
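By way of illustration, a minimal PyTorch training-step sketch for S04 is shown below, assuming the network returns the four predicted maps and that a pixel-wise mean-squared-error loss against the constructed two-dimensional image group is used; the loss choice and variable names are assumptions, as the disclosure does not specify them:

```python
import torch.nn.functional as F

def train_step(model, optimizer, rgbd, q_gt, angle_gt, width_gt, prio_gt):
    """One optimisation step: predict the four 2-D maps and regress them toward the targets."""
    model.train()
    optimizer.zero_grad()
    q, angle, width, prio = model(rgbd)              # each tensor: (B, 1, 300, 300)
    loss = (F.mse_loss(q, q_gt) + F.mse_loss(angle, angle_gt)
            + F.mse_loss(width, width_gt) + F.mse_loss(prio, prio_gt))
    loss.backward()
    optimizer.step()
    return loss.item()
```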
Preferably, the attention-based generative neural network comprises:
a feature extraction part, an attention mechanism part and a generation network part;
a feature extraction section:
The feature extraction network consists of one convolution layer with a 9 × 9 kernel and two convolution layers with 4 × 4 kernels; at this stage, each convolution layer is followed by a Batch Normalization layer and a Rectified Linear Unit activation layer;
The cropped 300 × 300 RGB image I_rgb and depth image I_depth are fused to obtain the fused feature map I_fusion; I_fusion is input into the feature extraction network, and feature extraction yields the feature map I_output1;
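A PyTorch sketch of this feature extraction part is given below for illustration; the channel counts, strides and padding are assumptions, since the disclosure only specifies the kernel sizes and the Batch Normalization / ReLU layers:

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """One 9x9 convolution followed by two 4x4 convolutions, each with BatchNorm and ReLU."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=9, stride=1, padding=4),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)  # e.g. (B, 4, 300, 300) -> (B, 128, 75, 75)
```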
Attention mechanism part:
The attention mechanism network consists of five attention modules, where each module consists of a residual part, a Squeeze part and an Excitation part;
The residual part is divided into a direct mapping and a residual mapping. The direct mapping applies a 1 × 1 convolution kernel to I_output1 to obtain the direct mapping result h(I_output1). The residual mapping consists of two convolution layers with 3 × 3 kernels, each followed by a Batch Normalization layer, with a Rectified Linear Unit activation layer after the first Batch Normalization layer; I_output1 yields R(I_output1) after the residual mapping;
The Squeeze part is realized by introducing Global Average Pooling, whose role is to obtain the global information embedding, i.e. the feature vector, of each channel of the feature map. Let u_c be a feature map of size W × H with C channels; the feature map after Squeeze is z_c:

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} u_c(i, j)    (1)

The Excitation part learns the weight of each channel from z_c and is formed by a gate mechanism of two fully connected layers. The gating unit s_c is a feature vector of size 1 × 1 with C channels, and s_c is computed as:

s_c = F_ex(z_c, w) = σ(g(z, w)) = σ(w_2 δ(w_1 z_c))    (2)

where σ is the sigmoid activation function, δ is the ReLU activation function, w_1 and w_2 are the weights of the two fully connected layers, and γ is the number of nodes of the hidden layer;
The obtained s_c is multiplied channel-wise with u_c to obtain the recalibrated feature x̃_c:

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (3)

R(I_output1) is passed through the Squeeze part and the Excitation part in turn to obtain the recalibrated feature, which is then spliced with the direct-mapping result h(I_output1) from the residual part to obtain the output I_output2 of the attention module;
Passing I_output1 through the five serially connected attention modules yields the output of the attention mechanism part.
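For illustration, a PyTorch sketch of one attention module is given below: a 1 × 1 direct mapping h(·), a residual mapping R(·) of two 3 × 3 convolutions with Batch Normalization and a ReLU after the first, and a Squeeze-and-Excitation gate implementing formulas (1) to (3). The hidden size γ and the use of addition to combine the gated feature with h(·) are assumptions:

```python
import torch.nn as nn

class AttentionModule(nn.Module):
    """Residual mapping + 1x1 direct mapping + Squeeze-and-Excitation channel gating."""
    def __init__(self, channels: int = 128, gamma: int = 16):
        super().__init__()
        self.direct = nn.Conv2d(channels, channels, kernel_size=1)      # h(.)
        self.residual = nn.Sequential(                                   # R(.)
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.squeeze = nn.AdaptiveAvgPool2d(1)                           # formula (1)
        self.excite = nn.Sequential(                                      # formula (2)
            nn.Linear(channels, gamma), nn.ReLU(inplace=True),
            nn.Linear(gamma, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        r = self.residual(x)
        b, c, _, _ = r.shape
        s = self.excite(self.squeeze(r).view(b, c)).view(b, c, 1, 1)
        return self.direct(x) + r * s                                     # formula (3) + h(x)
```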
Generation network part:
The generation network consists of two deconvolution layers with 4 × 4 kernels and one deconvolution layer with a 9 × 9 kernel, where each of the two 4 × 4 deconvolution layers is followed by a Batch Normalization layer and a Rectified Linear Unit activation layer;
The output of the attention mechanism part is input into the generation network to obtain the predicted two-dimensional image group containing the motion instruction vector.
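A PyTorch sketch of the generation part is shown below for illustration; producing the four output maps from a single 4-channel deconvolution and the intermediate channel counts are assumptions:

```python
import torch.nn as nn

class GenerationHead(nn.Module):
    """Two 4x4 deconvolutions (BatchNorm + ReLU) followed by one 9x9 deconvolution."""
    def __init__(self, in_channels: int = 128):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 4, kernel_size=9, stride=1, padding=4),
        )

    def forward(self, x):
        maps = self.up(x)                       # (B, 4, 300, 300)
        # Split into grabbing quality, angle, width and priority maps.
        return maps[:, 0:1], maps[:, 1:2], maps[:, 2:3], maps[:, 3:4]
```

Under these assumptions, the full generative network would simply chain the feature extraction part, the five attention modules and this generation head in sequence.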
The invention has the following beneficial effects:
the invention provides a mechanical arm grabbing parameter estimation method based on an attention mechanism generating network, which can grab various unknown objects in a complex unstructured environment. The sensing capability of the mechanical arm in a complex environment is improved by fusing multiple information channels such as a color image, a depth image and the like; the construction of the lightweight attention generation network ensures the real-time property of the mechanical arm when the mechanical arm grabs the object; the establishment of the grabbing priority improves the effective grabbing precision of the mechanical arm.
Drawings
FIG. 1 is a diagram of the structure of the attention-based generative neural network according to an embodiment of the present invention;
fig. 2 is a frame diagram of a robot grasping parameter learning system based on an attention mechanism generating network according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, the invention is now described in detail by way of example with reference to the accompanying drawing, FIG. 1. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. The present invention is described in further detail through the following embodiments:
S1: acquire the work scene image of the mechanical arm in the current state using the RGB-D camera, where the work scene image comprises an RGB image I_rgb, a depth image I_depth, and a work scene reference coordinate system;
S2: input each work scene image I in turn into the trained attention-based generative neural network to generate the predicted two-dimensional image group containing the motion instruction vector, where the two-dimensional image group contains at least one two-dimensional grabbing quality image G_θ, one two-dimensional grabbing angle image A_θ, one two-dimensional grabbing width image W_θ and one two-dimensional grabbing priority image O_θ, which respectively carry the grabbing success rate, grabbing angle, jaw opening width and grabbing order information for the mechanical arm when grabbing an object;
the attention mechanism-based generating neural network comprises a feature extraction part, an attention mechanism part and a generating network part;
a feature extraction section:
The feature extraction network consists of one convolution layer with a 9 × 9 kernel and two convolution layers with 4 × 4 kernels; at this stage, each convolution layer is followed by a Batch Normalization layer and a Rectified Linear Unit activation layer;
The cropped 300 × 300 RGB image I_rgb and depth image I_depth are fused to obtain the fused feature map I_fusion; I_fusion is input into the feature extraction network, and feature extraction yields the feature map I_output1;
Attention mechanism part:
The attention mechanism network consists of five attention modules, where each module consists of a residual part, a Squeeze part and an Excitation part;
The residual part is divided into a direct mapping and a residual mapping. The direct mapping applies a 1 × 1 convolution kernel to I_output1 to obtain the direct mapping result h(I_output1). The residual mapping consists of two convolution layers with 3 × 3 kernels, each followed by a Batch Normalization layer, with a Rectified Linear Unit activation layer after the first Batch Normalization layer; I_output1 yields R(I_output1) after the residual mapping;
The Squeeze part is realized by introducing Global Average Pooling (GAP), whose role is to obtain the global information embedding, i.e. the feature vector, of each channel of the feature map. Let u_c be a feature map of size W × H with C channels; the feature map after Squeeze is z_c:

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} u_c(i, j)    (1)

The Excitation part learns the weight of each channel from z_c and is formed by a gate mechanism of two fully connected layers. The gating unit s_c is a feature vector of size 1 × 1 with C channels, and s_c is computed as:

s_c = F_ex(z_c, w) = σ(g(z, w)) = σ(w_2 δ(w_1 z_c))    (2)

where σ is the sigmoid activation function, δ is the ReLU activation function, w_1 and w_2 are the weights of the two fully connected layers, and γ is the number of nodes of the hidden layer;
The obtained s_c is multiplied channel-wise with u_c to obtain the recalibrated feature x̃_c:

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (3)

R(I_output1) is passed through the Squeeze part and the Excitation part in turn to obtain the recalibrated feature, which is then spliced with the direct-mapping result h(I_output1) from the residual part to obtain the output I_output2 of the attention module;
Passing I_output1 through the five serially connected attention modules yields the output of the attention mechanism part.
Generation network part:
The generation network consists of two deconvolution layers with 4 × 4 kernels and one deconvolution layer with a 9 × 9 kernel, where each of the two 4 × 4 deconvolution layers is followed by a Batch Normalization layer and a Rectified Linear Unit activation layer;
The output of the attention mechanism part is input into the generation network to obtain the predicted two-dimensional image group containing the motion instruction vector.
S3: sort the pixel values of the two-dimensional grabbing quality image G_θ in the predicted two-dimensional image group and select the ten pixel points with the largest pixel values; these pixel points give the predictions with the highest grabbing success rate. According to the pixel coordinates of these predictions, the corresponding grabbing angle predictions, grabbing width predictions and grabbing order predictions are read from the two-dimensional grabbing angle image A_θ, the two-dimensional grabbing width image W_θ and the two-dimensional grabbing priority image O_θ.
S4: sort the grabbing order predictions and select the prediction with the highest grabbing priority; the grabbing information corresponding to its pixel coordinate, i.e. the grabbing quality, angle, width and order at that point, is the optimal motion instruction vector.
And S5, analyzing the obtained optimal motion instruction vector, and obtaining the grabbing coordinate, the grabbing angle and the grabbing width of the target object to be grabbed in the mechanical arm base coordinate system, namely the mechanical arm grabbing parameters, through coordinate transformation of the wrist camera and coordinate transformation between the mechanical arm wrist and the base after analysis.
S6 repeats steps S1-S5 until all objects are grabbed.
As a specific preference for implementing the technical solution of the present invention, before step S2 the method comprises:
S01: based on an existing data set, create a data set G_train for training the network; the G_train data set comprises work scene images, valid grasp box information and segmentation images containing only the topmost object;
S02: map the valid grasp box information onto 300 × 300 two-dimensional images to obtain the two-dimensional grabbing quality image, two-dimensional grabbing angle image and two-dimensional grabbing width image, and combine them with the G_train segmentation image containing only the topmost object to construct the two-dimensional image group;
S03: construct the attention-based generative neural network by building the attention mechanism module;
S04: train the attention-based generative neural network with the data set G_train and the two-dimensional image groups, taking RGB-D images without grasp information as input and outputting two-dimensional image groups containing grasp information, to obtain the trained attention-based generative neural network.
As a specific preferred implementation of the technical solution of the present invention, as shown in fig. 2, a robot grasping parameter learning system based on an attention mechanism generative network includes:
Offline learning: the attention-based generative neural network is trained continuously on the data set G_train, thereby obtaining a mechanical arm grabbing parameter prediction model;
Online learning: the work scene image under actual conditions is acquired through on-site perception of the actual work scene and input into the mechanical arm grabbing parameter prediction model to obtain the grabbing parameters of the mechanical arm in the actual scene, thereby realizing grasping by the mechanical arm in the actual scene.
It should be understood that the above description of specific embodiments of the present invention is only intended to illustrate the technical route and features of the present invention and to enable those skilled in the art to understand and implement it, but the present invention is not limited to the above specific embodiments. All changes and modifications falling within the scope of the appended claims are intended to be embraced therein.

Claims (6)

1. The mechanical arm grabbing parameter estimation method based on the attention mechanism generation type network is characterized by comprising the following steps:
S1: acquire the work scene image of the mechanical arm in the current state using the RGB-D camera, where the work scene image comprises an RGB image I_rgb, a depth image I_depth, and a work scene reference coordinate system;
S2: input each work scene image I into the trained attention-based generative neural network to generate a predicted two-dimensional image group containing the motion instruction vector, where the predicted two-dimensional image group contains at least one two-dimensional grabbing quality image G_θ, one two-dimensional grabbing angle image A_θ, one two-dimensional grabbing width image W_θ and one two-dimensional grabbing priority image O_θ, which respectively carry the grabbing success rate, grabbing angle, jaw opening width and grabbing order information for the mechanical arm when grabbing an object;
S3: sort the pixel values of the two-dimensional grabbing quality image G_θ in the predicted two-dimensional image group and select the n pixel points with the largest pixel values; these pixel points give the predictions with the highest grabbing success rate. According to the pixel coordinates of these predictions, the corresponding grabbing angle predictions, grabbing width predictions and grabbing order predictions are read from the two-dimensional grabbing angle image A_θ, the two-dimensional grabbing width image W_θ and the two-dimensional grabbing priority image O_θ, where p_n denotes the pixel coordinate of the n-th pixel when the pixel values of G_θ are ranked from largest to smallest;
S4: sort the grabbing order predictions and select the prediction with the highest grabbing priority; the grabbing information corresponding to its pixel coordinate, i.e. the grabbing quality, angle, width and order at that point, is the optimal motion instruction vector;
And S5, analyzing the obtained optimal motion instruction vector to obtain the grabbing coordinates, the grabbing angle and the grabbing width of the target object to be grabbed under the mechanical arm base coordinate system, namely the grabbing parameters of the mechanical arm.
2. The method for estimating manipulator grasping parameters based on attention mechanism generating network according to claim 1, wherein step S2 comprises:
S21: training the attention-based generative neural network on an existing data set;
S22: preprocessing each work scene image to obtain a 300 × 300 pixel work scene image;
S23: inputting the preprocessed image into the trained attention-based generative neural network;
S24: the attention-based generative neural network outputting the predicted two-dimensional image group containing the motion instruction vector, i.e. the predicted grabbing parameters including the grabbing success probability.
3. The method for estimating manipulator grasping parameters based on an attention mechanism generating network according to claim 1 or 2, wherein step S3 includes:
S31: sorting the pixel values of the two-dimensional grabbing quality image in the predicted two-dimensional image group, where each pixel value in the two-dimensional grabbing quality image represents the success rate of a grasp by the mechanical arm centered on that point;
S32: selecting the coordinates of the n largest grabbing success rate predictions as grabbing center coordinates;
S33: obtaining, from the predicted two-dimensional image group, the grabbing angle pixel value, grabbing width pixel value and grabbing priority pixel value corresponding to each grabbing center coordinate;
S34: parsing the grabbing angle, grabbing width and grabbing order information from the grabbing angle, grabbing width and grabbing priority pixel values.
4. The method for estimating grabbing parameters of a mechanical arm based on an attention mechanism generating network as claimed in any one of claims 1 to 3, wherein step S5 includes:
S51: parsing the obtained optimal motion instruction vector;
S52: converting the parsed data into the wrist camera coordinate system;
S53: transforming the coordinates from the camera coordinate system into the mechanical arm base coordinate system;
S54: inputting the coordinates obtained in the mechanical arm base coordinate system, together with the grabbing width and grabbing angle information, into the mechanical arm control system to perform the grasp.
5. The mechanical arm grabbing parameter estimation method based on the attention mechanism generation network as claimed in claim 2, wherein: the attention mechanism-based training method of the generative neural network comprises the following steps:
S01: based on an existing data set, creating a data set G_train for training the network; the G_train data set comprises work scene images, valid grasp box information and segmentation images containing only the topmost object, where the G_train segmentation image containing only the topmost object serves as the two-dimensional grabbing priority image;
S02: mapping the valid grasp box information onto 300 × 300 two-dimensional images to obtain the two-dimensional grabbing quality image, two-dimensional grabbing angle image and two-dimensional grabbing width image, and combining them with the G_train segmentation image containing only the topmost object to construct the two-dimensional image group;
S03: constructing the attention-based generative neural network by building the attention mechanism module;
S04: training the attention-based generative neural network with the data set G_train and the two-dimensional image groups, taking RGB-D images without grasp information as input and outputting two-dimensional image groups containing grasp information, to obtain the trained attention-based generative neural network.
6. The method for estimating manipulator grasping parameters based on attention mechanism generating network according to claim 5, wherein the attention mechanism generating neural network comprises:
a feature extraction part, an attention mechanism part and a generation network part;
a feature extraction part:
The feature extraction network consists of one convolution layer with a 9 × 9 kernel and two convolution layers with 4 × 4 kernels; at this stage, each convolution layer is followed by a Batch Normalization layer and a Rectified Linear Unit activation layer;
The cropped 300 × 300 RGB image I_rgb and depth image I_depth are fused to obtain the fused feature map I_fusion; I_fusion is input into the feature extraction network, and feature extraction yields the feature map I_output1;
Attention mechanism part:
The attention mechanism network consists of five attention modules, where each module consists of a residual part, a Squeeze part and an Excitation part;
The residual part is divided into a direct mapping and a residual mapping. The direct mapping applies a 1 × 1 convolution kernel to I_output1 to obtain the direct mapping result h(I_output1). The residual mapping consists of two convolution layers with 3 × 3 kernels, each followed by a Batch Normalization layer, with a Rectified Linear Unit activation layer after the first Batch Normalization layer; I_output1 yields R(I_output1) after the residual mapping;
The Squeeze part is realized by introducing Global Average Pooling, whose role is to obtain the global information embedding, i.e. the feature vector, of each channel of the feature map. Let u_c be a feature map of size W × H with C channels; the feature map after Squeeze is z_c:

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} u_c(i, j)    (1)

The Excitation part learns the weight of each channel from z_c, formed by a gate mechanism of two fully connected layers. The gating unit s_c is a feature vector of size 1 × 1 with C channels, and s_c is computed as:

s_c = F_ex(z_c, w) = σ(g(z, w)) = σ(w_2 δ(w_1 z_c))    (2)

where σ is the sigmoid activation function, δ is the ReLU activation function, w_1 and w_2 are the weights of the two fully connected layers, and γ is the number of nodes of the hidden layer;
The obtained s_c is multiplied channel-wise with u_c to obtain the recalibrated feature x̃_c:

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (3)

R(I_output1) is passed through the Squeeze part and the Excitation part in turn to obtain the recalibrated feature, which is then spliced with the direct-mapping result h(I_output1) from the residual part to obtain the output I_output2 of the attention module;
Passing I_output1 through the five serially connected attention modules yields the output of the attention mechanism part;
Generation network part:
The generation network consists of two deconvolution layers with 4 × 4 kernels and one deconvolution layer with a 9 × 9 kernel, where each of the two 4 × 4 deconvolution layers is followed by a Batch Normalization layer and a Rectified Linear Unit activation layer;
The output of the attention mechanism part is input into the generation network to obtain the predicted two-dimensional image group containing the motion instruction vector.
Application CN202210387024.1A, filed 2022-04-13 (priority date 2022-04-13): Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network. Status: Pending. Publication: CN114782347A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210387024.1A CN114782347A (en) 2022-04-13 2022-04-13 Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210387024.1A CN114782347A (en) 2022-04-13 2022-04-13 Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network

Publications (1)

Publication Number Publication Date
CN114782347A true CN114782347A (en) 2022-07-22

Family

ID=82429469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210387024.1A Pending CN114782347A (en) 2022-04-13 2022-04-13 Mechanical arm grabbing parameter estimation method based on attention mechanism generation type network

Country Status (1)

Country Link
CN (1) CN114782347A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115972198A (en) * 2022-12-05 2023-04-18 无锡宇辉信息技术有限公司 Mechanical arm visual grabbing method and device under incomplete information condition
CN115972198B (en) * 2022-12-05 2023-10-10 无锡宇辉信息技术有限公司 Mechanical arm visual grabbing method and device under incomplete information condition
CN117549307A (en) * 2023-12-15 2024-02-13 安徽大学 Robot vision grabbing method and system in unstructured environment
CN117549307B (en) * 2023-12-15 2024-04-16 安徽大学 Robot vision grabbing method and system in unstructured environment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination