CN109934864B - Residual error network deep learning method for mechanical arm grabbing pose estimation - Google Patents


Info

Publication number
CN109934864B
Authority
CN
China
Prior art keywords
grabbing
residual
convolution
mechanical arm
filters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910192296.4A
Other languages
Chinese (zh)
Other versions
CN109934864A (en)
Inventor
白帆
姚仁杰
陈懋宁
崔哲新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910192296.4A priority Critical patent/CN109934864B/en
Publication of CN109934864A publication Critical patent/CN109934864A/en
Application granted granted Critical
Publication of CN109934864B publication Critical patent/CN109934864B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a residual network deep learning method for estimating the grabbing pose of a mechanical arm, which comprises the following steps: initializing the mechanical arm and adjusting its wrist camera so that the camera is located at a known height vertically above the XOY plane; acquiring a depth image of the object to be grabbed by the mechanical arm; mapping the depth image with a pre-trained improved GG-CNN model and outputting four 300 × 300 pixel grabbing information images, comprising the grabbing success rate, the grabbing angle cosine value, the grabbing angle sine value and the grabbing width; further acquiring the grabbing angle and width information at the position with the highest success rate; and obtaining, through coordinate transformation of the grabbing information taken from the grabbing success rate image, the grabbing angle and grabbing width of the target object in the mechanical arm base coordinate system. In the improved GG-CNN model, a residual network is built from residual building modules, which enhances the fitting ability and learning capacity of the convolutional neural network, so that the generated grabbing pose is more precise.

Description

Residual error network deep learning method for mechanical arm grabbing pose estimation
Technical Field
The invention belongs to the field of information control technology, and particularly relates to a residual network deep learning method for estimating the grabbing pose of a mechanical arm.
Background
In recent years, vision-based mechanical arm grabbing has become a hotspot of current research. Generally, accurate target detection and positioning must be achieved before a grabbing action can be performed. Traditional target detection is usually static detection of a single target, and it is affected by changes in object shape, size and viewing angle and by changes in external illumination, so the extracted features generalize poorly and the detection is not robust. The development of deep learning algorithms has advanced the task of target detection and localization. It is generally accepted in the research community that deep networks work better than shallow networks, but a network cannot be made usefully deeper simply by stacking layers: deep networks are difficult to train because of the vanishing gradient problem. In 2015, the concept of the residual network (ResNet) was proposed to solve this accuracy degradation problem, and extremely deep residual networks obtained very good results on the ImageNet classification dataset.
The combination of mechanical arm visual grabbing and deep learning is the main direction of current research on mechanical arm grabbing. Recently, researchers have proposed studying the optimal grabbing pose of objects by constructing a grabbing-generation convolutional neural network (GG-CNN), in which the pixels of the input depth image correspond to the pixels of the output grabbing information images, so as to predict the optimal grabbing pose of complex objects. However, GG-CNN pursues recognition and grabbing speed to such an extent that the recognition accuracy of the neural network is reduced, which limits the application of this network model to mechanical arm grabbing.
Therefore, how to improve the recognition accuracy of the GG-CNN applied to the mechanical arm grabbing pose estimation becomes a problem to be solved at present.
Disclosure of Invention
The invention aims to provide a residual error network deep learning method for mechanical arm grabbing pose estimation, which can effectively improve the generation precision of the optimal grabbing pose of a mechanical arm and enables a GG-CNN model to have higher practicability in the field of high-precision grabbing.
In order to achieve the purpose, the invention adopts the main technical scheme that:
the invention provides a residual error network deep learning method for estimating a grabbing pose of a mechanical arm, which comprises the following steps:
S1, initializing the mechanical arm, and adjusting the mechanical arm so that the wrist camera is located at a known height vertically above the XOY plane;
s2, obtaining a depth image of an object to be grabbed by the mechanical arm;
S3, cutting out the central part of the depth image to obtain an object depth image of 300 × 300 pixels;
s4, mapping the object depth image by adopting a pre-trained improved GG-CNN model, and outputting four grabbing information images of 300 x 300 pixels, wherein the grabbing information images comprise a grabbing success rate, a grabbing angle cosine value, a grabbing angle sine value and a grabbing width;
S5, selecting the pixel point with the highest value in the grabbing success rate image, and reading the corresponding pixel points in the grabbing angle cosine value, grabbing angle sine value and grabbing width information images, so as to obtain the grabbing angle and width information with the highest grabbing success rate as the grabbing information;
s5, acquiring the grabbing angle and the grabbing width of the target object under a mechanical arm base coordinate system (Cartesian coordinate system) through coordinate transformation of a wrist camera, a mechanical arm wrist and a mechanical arm base according to the grabbing information acquired from the grabbing success rate image;
s6, inputting grabbing information, and controlling the mechanical arm to grab (namely outputting the grabbing position, angle and width of the target object to be grabbed after coordinate transformation so as to control the mechanical arm to grab the target object);
the improved GG-CNN model is characterized in that a residual error network is built in the existing GG-CNN model by building a residual error module, and the fitting effect and the learning capacity of a convolutional neural network are enhanced, so that the grabbing accuracy of the grabbing pose generated by the improved GG-CNN model is higher, the improved GG-CNN model is more sensitive to the change of the position and the shape of an object, and the improved GG-CNN model has practical application value.
Optionally, before step S1, the method comprises:
S0-1, creating, based on an existing data set, a first data set G_train for training the inputs and outputs of the improved GG-CNN model; the first data set comprises images marked with positive grabbing information and images marked with negative grabbing information, and the images in the first data set carry a plurality of marked grabbing frames;
s0-2, improving the existing GG-CNN model by constructing a residual error module and constructing a residual error network so as to construct an improved GG-CNN model and ensure that the sizes of input and output images of the improved GG-CNN model are unchanged;
S0-3, training the residual-improved GG-CNN model using the first data set G_train to obtain the trained improved GG-CNN model.
Optionally, the GG-CNN model refined by residual error comprises:
a convolution part, a deconvolution part and an output part;
the convolution portion includes ten residual modules,
wherein the first residual module comprises: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 4 filters with step size of 3 × 3;
the second residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 4 filters with step size of 1 × 1;
the third residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 8 filters with step size of 2 × 2;
the fourth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 8 filters with step size of 1 × 1;
the fifth residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 16 filters with step size of 2 × 2;
the sixth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 16 filters with step size of 1 × 1;
the seventh residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 32 filters with step size of 5 × 5;
the eighth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 32 filters with step size of 1 × 1;
the ninth residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 64 filters with step size of 1 × 1;
the tenth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 64 filters with step size of 1 × 1;
the deconvolution part comprises 5 deconvolution layers with different parameters;
the number of filters of the first deconvolution layer is 64, the size of each filter is 3 × 3, and the step size is 1 × 1;
the number of filters of the second deconvolution layer is 32, the size of each filter is 5 × 5, and the step size is 5 × 5;
the number of filters of the third deconvolution layer is 16, the size of each filter is 5 × 5, and the step size is 2 × 2;
the number of filters of the fourth deconvolution layer is 8, the size of each filter is 7 × 7, and the step size is 2 × 2;
the number of the fifth deconvolution layer filters is 4, the size of each filter is 9 × 9, and the step size is 3 × 3;
the output part comprises four linearly mapped convolution layers, each convolution layer comprises 1 filter, and the four linearly mapped convolution layers sequentially and respectively map and output the grabbing success rate, the cosine value of the grabbing angle, the sine value of the grabbing angle and the grabbing width.
Optionally, in step S0-3, the following cross-over ratio formula is used to measure the capture accuracy of the GG-CNN network improved by the residual error;
intersection-over-union formula:
IoU(C, G) = |C ∩ G| / |C ∪ G|
wherein C and G respectively represent two known areas, and the intersection-over-union is computed as the ratio of the intersection to the union of the two areas.
Optionally, the depth image in S2 is I ∈ R^(H×W), where H is the height and W is the width, and the grabbing description in the depth image is:
g̃ = (o, φ̃, ω̃, q)
wherein o = (u, v) is the position coordinate of the pixel with the highest grabbing success rate, φ̃ is the rotation angle in the camera reference frame, and ω̃ is the grabbing width in image coordinates; through the coordinate transformation of the mechanical arm, the grabbing g̃ in image space is converted into the grabbing g in world coordinates:
g = T_RC(T_CI(g̃))
where T_RC is the coordinate transformation from the camera coordinate system to the mechanical arm coordinate system, and T_CI is the calibration transformation based on the camera internal parameters and the hand-eye position between the mechanical arm and the camera;
the output image in S4 is represented as: G = (Φ, W, Q) ∈ R^(3×H×W)
where Φ, W and Q are each ∈ R^(H×W), representing the grabbing angle, the grabbing width and the grabbing success rate respectively; the grabbing angle Φ is split into a grabbing angle cosine value and a grabbing angle sine value, and at the coordinate o with the highest grabbing success rate, Φ, W and Q contain the corresponding values of φ̃, ω̃ and q;
in step S4, the pre-trained improved GG-CNN model performs the mapping on the object depth image, specifically: G = M(I);
the optimal grabbing pose in image space is determined from G:
g̃_best = max_Q G
specifically, from the output grabbing information G, the pixel with the largest grabbing success rate q in the Q image is selected, and its coordinate o is used to look up Φ and W in the output G, giving the position, angle and width of the optimal grabbing pose;
further, the optimal grabbing pose g_best in world coordinates is calculated through
g_best = T_RC(T_CI(g̃_best)).
Optionally, the processing procedure of each residual module of the convolution part includes:
each residual module comprises a main path and an auxiliary path;
the auxiliary path consists of two alternatives: a path that applies pooling and convolution operations, and a shortcut path with no operation;
specifically, the main path comprises:
1) the input data X is first regularized, then passes through an activation layer using the ReLU activation function, and is finally output to the next layer through a filter and convolution layer;
2) the output of the previous layer is regularized, passes through an activation layer using the ReLU activation function, and finally passes through a filter and convolution layer to output F(X);
the auxiliary path comprises:
1) when the module's pooling parameter is true: the input data X passes through a maximum pooling layer, then through a convolution layer with a filter size of 5 × 5, the given number of filters and a step size of 1 × 1, and W(X) is output;
2) when the module's pooling parameter is false: X is output directly without any operation;
the outputs of the main path and the selected auxiliary path are added to form the overall output H(X) of the residual module function.
The invention has the beneficial effects that:
compared with the prior art, the method provided by the invention can improve the generation precision of the optimal grabbing pose of the mechanical arm, so that the improved GG-CNN model in the method provided by the invention has higher practicability in the field of high-precision grabbing.
That is to say, the method first constructs a convolution residual module, builds a residual network by stacking multiple layers of the residual module, thereby deepening the convolutional neural network, and uses this residual structure as the main part of the improved GG-CNN. The invention improves the GG-CNN model, increases the accuracy of generating the optimal grabbing pose of the mechanical arm, and makes the network model more practical in the field of high-accuracy grabbing.
Drawings
FIG. 1 is a flow chart of the residual network deep learning method for mechanical arm grabbing pose estimation of the invention;
FIG. 2 is a schematic diagram of a Cartesian space and image space depiction of the present application;
FIG. 3 is a schematic diagram of the prior-art Cornell University grabbing dataset;
FIG. 4 is a schematic diagram of a training data set generation process in the present application;
FIG. 5 is a schematic diagram of a GG-CNN structure in the prior art;
FIG. 6 is a schematic diagram of a portion of the structure used in constructing the residual module of the present application;
FIG. 7 is a schematic diagram of constructing an identity residual block in the present application;
FIG. 8 is a diagram of a convolutional residual block in the present application;
FIG. 9 is a diagram of residual block functions in the present application;
FIG. 10 is a block diagram of the GG-CNN model improved by residual error in the present application;
FIG. 11 compares the accuracy curves of the models of FIG. 5 and FIG. 10;
FIG. 12 compares the output effects of the models before and after the improvement, i.e., of the models of FIG. 5 and FIG. 10.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
The problem of autonomous grasping of mechanical arms is an important problem in the field of robot research. Aiming at the problem of optimal grabbing pose, the method and the system endow the mechanical arm with vision and combine a deep learning algorithm to realize the intellectualization of mechanical arm grabbing.
In the application, the grabbing-generation convolutional neural network (GG-CNN) is improved by adopting the idea of residual networks: a convolution residual module (shown in FIG. 9) is first built, a residual network is built by stacking multiple layers of the residual module, the depth of the convolutional neural network is deepened, and this residual network is used as the main part of the improved GG-CNN. In this way the GG-CNN is improved through a deep residual network, and the accuracy of the optimal grabbing pose generation model of the mechanical arm is improved. Experimental results show that the accuracy of the GG-CNN model improved with the residual network reaches 88%, far higher than the 72% accuracy of the original model; the accuracy of predicting the optimal grabbing pose of the mechanical arm with this model is therefore greatly improved, and the model has scientific research significance and application value in the field of mechanical arm visual grabbing.
Fig. 1 illustrates a method provided by an embodiment of the present invention, which may include the following steps:
S1, initializing the mechanical arm, and adjusting the mechanical arm so that the wrist camera is located at a known height vertically above the XOY plane.
In the present embodiment, the description is made with reference to the wrist camera of the robot arm, but in practical application, the wrist camera is not limited thereto, and any camera located at the upper portion of the robot arm and used in cooperation with the robot arm may be used.
S2, obtaining a depth image of the object to be grabbed by the mechanical arm.
And S3, cutting out the central part of the depth image to obtain an object depth image of 300 × 300 pixels.
The present embodiment does not limit the manner of cropping the depth image, but needs to retain a major portion of the target object of the depth image.
S4, mapping the object depth image by adopting a pre-trained improved GG-CNN model, and outputting four grabbing information images of 300 x 300 pixels, wherein the grabbing information images comprise a grabbing success rate, a grabbing angle cosine value, a grabbing angle sine value and a grabbing width;
S5, selecting the pixel point with the highest value in the grabbing success rate image, and reading the corresponding pixel points in the grabbing angle cosine value, grabbing angle sine value and grabbing width information images, so as to obtain the grabbing angle and width information with the highest grabbing success rate as the grabbing information;
s5, acquiring the grabbing angle and the grabbing width of the target object under a mechanical arm base coordinate system (Cartesian coordinate system) through coordinate transformation of a wrist camera, a mechanical arm wrist and a mechanical arm base according to the grabbing information acquired from the grabbing success rate image;
s6, inputting grabbing information, and controlling the mechanical arm to grab (namely outputting grabbing positions, angles and widths of the target object to be grabbed after coordinate transformation so as to control the mechanical arm to grab the target object);
the improved GG-CNN model is characterized in that a residual error network is built in the existing GG-CNN model through building a residual error module, and the fitting function and the learning capability of a convolutional neural network are enhanced, so that the grabbing accuracy of the grabbing pose generated by the improved GG-CNN model is higher, the grabbing pose is more sensitive to the change of the position and the shape of an object, and the improved GG-CNN model has practical application value.
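Purely for illustration, the steps S2–S6 above can be chained together along the following lines in Python; the helper names select_best_grasp and image_grasp_to_robot are not from the patent and are sketched further below, the assumed output ordering of the model (Q, cos 2Φ, sin 2Φ, W) is likewise an assumption, and calib stands for the camera intrinsics and hand-eye calibration.

```python
# Illustrative sketch of steps S2-S6; not the patent's own implementation.
# Assumptions: `model` is a trained improved GG-CNN (Keras) whose four
# 300x300 outputs are ordered (Q, cos2phi, sin2phi, W); `select_best_grasp`
# and `image_grasp_to_robot` are hypothetical helpers sketched further below;
# `calib` bundles the camera intrinsics and hand-eye calibration.

def plan_grasp(model, depth_image, calib):
    # S3: crop the central 300x300 region of the depth image
    h, w = depth_image.shape
    top, left = (h - 300) // 2, (w - 300) // 2
    crop = depth_image[top:top + 300, left:left + 300]

    # S4: map the cropped depth image to the four grasp-information images
    q, cos_2phi, sin_2phi, width = model.predict(crop[None, :, :, None])

    # S5: pixel with the highest grasp success rate, plus its angle and width
    (v, u), angle, grasp_width = select_best_grasp(
        q[0, ..., 0], cos_2phi[0, ..., 0], sin_2phi[0, ..., 0], width[0, ..., 0])

    # S5/S6: transform the image-space grasp into the mechanical arm base frame
    position = image_grasp_to_robot(u + left, v + top, crop[v, u], calib)
    return position, angle, grasp_width
```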
For a better understanding of the aspects of the present application, the following description is made with reference to the accompanying drawings.
1. Gripping scheme based on GG-CNN
1.1 defining grab parameters and transformations
The present application studies the problem of detecting and grabbing unknown objects perpendicular to a plane, as shown in FIG. 2, with a depth camera acquiring depth images of the given scene.
The grabbing is performed perpendicular to the XOY plane (i.e. the plane of the mechanical arm base coordinate system, referred to as the robot coordinate system), and in this embodiment a grab may be defined as:
g = (p, φ, ω, q)
Using these pose parameters a grabbing action can be determined: the position p = (x, y, z) is the center of the gripper in Cartesian coordinates, the pose comprises the rotation angle φ of the end effector about the z-axis and the required width ω, and the grabbing success rate q represents the probability of a successful grab.
The internal parameters of the camera used in the present application are known, whereby a depth image I ∈ R^(H×W) of height H and width W is acquired, and grabbing is detected on the depth image I. The grabbing description in the image I is:
g̃ = (o, φ̃, ω̃, q)
wherein o = (u, v) is the position coordinate of the pixel with the highest grabbing success rate, φ̃ is the rotation angle in the camera reference frame (i.e. the aforementioned wrist camera of the mechanical arm), and ω̃ is the grabbing width in image coordinates. Through the coordinate transformation of the mechanical arm, the grabbing g̃ in image space can be converted into the grabbing g in world coordinates:
g = T_RC(T_CI(g̃))
T_RC is the coordinate transformation from the camera coordinate system to the mechanical arm coordinate system, and T_CI is the calibration transformation, based on the camera internal parameters and the hand-eye position between the mechanical arm and the camera, that converts from 2D image coordinates to the 3D camera coordinate system.
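As a concrete illustration of g = T_RC(T_CI(g̃)), the following sketch back-projects the pixel o = (u, v) at the measured depth into the camera frame using a pinhole model and then applies a 4 × 4 camera-to-robot matrix obtained from hand-eye calibration; the pinhole back-projection and the variable names are assumptions made for illustration, not details taken from the patent.

```python
# Minimal sketch of g = T_RC(T_CI(g_tilde)): pixel + measured depth -> camera
# frame -> mechanical arm base frame. Assumes pinhole intrinsics
# (fx, fy, cx, cy) and a 4x4 homogeneous camera-to-robot transform T_RC from
# hand-eye calibration; these names are illustrative, not from the patent.
import numpy as np

def image_grasp_to_robot(u, v, depth, calib):
    fx, fy, cx, cy, T_RC = calib
    # T_CI: back-project the pixel (u, v) at the measured depth into the
    # 3D camera coordinate system
    p_cam = np.array([(u - cx) * depth / fx,
                      (v - cy) * depth / fy,
                      depth,
                      1.0])
    # T_RC: express the grasp point in the mechanical arm base coordinate system
    p_robot = T_RC.dot(p_cam)
    return p_robot[:3]
```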
In addition, a group of grabbings in image space is referred to as a grabbing map, which is represented as
G = (Φ, W, Q) ∈ R^(3×H×W)
where Φ, W and Q are each ∈ R^(H×W), representing the grabbing angle, the grabbing width and the grabbing success rate respectively; the grabbing angle Φ is split into a grabbing angle cosine value and a grabbing angle sine value, and at the coordinate o with the highest grabbing success rate, Φ, W and Q contain the corresponding values of φ̃, ω̃ and q.
In an ideal case, the grab value of each pixel in the depth image I may be directly calculated, instead of randomly sampling the input image. To this end, the function M in the depth image (or called mapping M/mapping function M) is defined as the transformation from the input depth image to the capture information image:
G=M(I)
The optimal grabbing pose in image space can be calculated from G:
g̃_best = max_Q G
Specifically, from the output grabbing information G, the pixel with the largest grabbing success rate q in the Q image is selected, and its coordinate o is used to look up Φ and W in the output G, giving the position, angle and width of the optimal grabbing pose. The optimal grabbing pose g_best in world coordinates is then calculated by the equation
g_best = T_RC(T_CI(g̃_best)).
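The selection of g̃_best can be written compactly with numpy, as in the sketch below (the select_best_grasp helper assumed earlier); decoding the angle from the sine/cosine images and rescaling the width by 150 follow the output definitions and the 1/150 training-time scaling described later in this text, and the code itself is an illustration rather than the patent's implementation.

```python
# Sketch of selecting the best image-space grasp from the grasp map G.
# Inputs are the four output images; the x150 rescaling assumes the 1/150
# width encoding used when the training targets are built (see below).
import numpy as np

def select_best_grasp(q, cos_2phi, sin_2phi, width):
    # coordinate o = (u, v) of the pixel with the highest success rate q
    v, u = np.unravel_index(np.argmax(q), q.shape)
    # grasp angle: the network predicts cos(2*phi) and sin(2*phi)
    angle = 0.5 * np.arctan2(sin_2phi[v, u], cos_2phi[v, u])
    # grasp width in image pixels (undo the [0, 1] scaling)
    grasp_width = 150.0 * width[v, u]
    return (v, u), angle, grasp_width
```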
1.2 neural network approximate mapping relationship
It is to be understood that the following detailed description describes how the improved GG-CNN is used to determine the mapping function M.
The functional mapping M: I → G is approximated using the grabbing-generation convolutional neural network (GG-CNN). The neural network is denoted M_λ, where λ are the weights obtained by training.
It can be shown that M_λ(I) = (Q_λ, Φ_λ, W_λ) ≈ M(I). Using an L2 loss function L, the network is learned and trained on the training-set inputs I_train and the corresponding outputs G_train as follows:
λ = argmin_λ L(G_train, M_λ(I_train))
where G_train is the set of grabbing parameters at the Cartesian points p corresponding to each pixel o.
The grab graph G represents a triplet of images: Φ, W, and Q. These parameters are expressed as follows:
q is an image describing the success rate of the grabbing performed at each point (u, v). This value is a scalar in the range of [0,1], where values close to 1 indicate a higher success rate of grabbing.
Φ is an image describing the angle of the grab performed at each point. Since an antipodal grab is symmetric around ±π/2 radians, the angle lies in the range [−π/2, π/2].
W is an image describing the end-effector width of the grab performed at each point. To maintain depth invariance, the value of W is kept in the range [0, 150] pixels; it can be converted to a physical measurement using the depth camera parameters and the measured depth.
1.3 Construction and training of GG-CNN
None of the existing datasets meets the training requirements of GG-CNN; in order to train the GG-CNN model, a dataset conforming to the inputs and outputs of GG-CNN is created from the Cornell University grabbing dataset (shown in FIG. 3). The Cornell University grabbing dataset contains 885 RGB-D images of real objects, with 5110 grabs labeled "positive" and 2909 labeled "negative". Although this is a relatively small grabbing dataset compared to some newer synthetic datasets, it best meets the pixel-wise grabbing requirements of the present application, since each image provides multiple labeled grabbing frames.
Random cropping, zooming and rotation are used to augment the Cornell University grabbing dataset, creating a set G_train of 8840 depth images and associated grabbing maps that effectively incorporates 51,100 grabbing examples.
The Cornell University grabbing dataset represents the object to be grabbed as a grabbing rectangle in pixel coordinates, thereby calibrating the position and rotation angle of the end effector. To move from the grabbing-rectangle representation to the image-based representation G, the central third of each grabbing rectangle is selected as the graspable image region, which corresponds to the position of the center of the end effector; every other region is assumed not to be a valid grab. The dataset generation process is shown in FIG. 4.
Grabbing success rate Q: whether each pixel in the Cornell University grabbing dataset belongs to a valid grab is treated as a binary label; the graspable regions of Q_train are set to 1 and all other pixels to 0.
Rotation angle Φ: the angle of each grabbing rectangle in the range [−π/2, π/2] is computed and the corresponding region of Φ_train is set. Using the raw angle directly would cause discontinuities and overly large values near ±π/2, so the angle is decomposed into two vector components on the unit circle, yielding values in the range [−1, 1]; since the antipodal grab is symmetric around ±π/2 radians, the two components sin(2Φ_train) and cos(2Φ_train) are used, which provide a unique value for Φ_train ∈ [−π/2, π/2].
Grabbing width W: similarly to the angle, the width (up to a maximum value) of each grabbing rectangle, representing the width of the gripper, is computed and the corresponding region of W_train is set. During training, W_train is scaled down by a factor of 1/150 so that its values lie in [0, 1]. The physical width of the end effector can be computed using the camera parameters and the measured depth.
Input depth image: since the Cornell University grabbing dataset was captured with a real camera, it already contains real sensor noise, so no noise needs to be added. The depth image is inpainted (repaired) using OpenCV to remove invalid values, and the mean value of each depth image is subtracted, centering its values at 0 to provide depth invariance.
Through the above definitions and operations, a dataset for training the GG-CNN model is generated from the Cornell University grabbing dataset.
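For illustration, one training sample could be encoded roughly as follows, following the Q, Φ and W definitions above; the grasp-rectangle objects (with angle and width attributes) and the center_third_mask helper are hypothetical, and OpenCV inpainting stands in for the invalid-value repair mentioned in the text.

```python
# Sketch of building one training sample (depth input plus Q, cos2phi,
# sin2phi, W targets) from labelled grasp rectangles, following the
# encodings described above. The rectangle objects (with .angle and .width)
# and center_third_mask() are hypothetical helpers, not from the patent.
import numpy as np
import cv2

def encode_sample(depth, grasp_rects, shape=(300, 300)):
    q = np.zeros(shape, dtype=np.float32)
    cos_2phi = np.zeros(shape, dtype=np.float32)
    sin_2phi = np.zeros(shape, dtype=np.float32)
    width = np.zeros(shape, dtype=np.float32)

    for rect in grasp_rects:                       # labelled "positive" rectangles
        mask = center_third_mask(rect, shape)      # graspable region: central third
        q[mask] = 1.0                              # success-rate label
        cos_2phi[mask] = np.cos(2.0 * rect.angle)  # angle encoded on the unit circle
        sin_2phi[mask] = np.sin(2.0 * rect.angle)
        width[mask] = rect.width / 150.0           # width scaled into [0, 1]

    # depth input: inpaint invalid values, then zero-centre for depth invariance
    invalid = (depth == 0).astype(np.uint8)
    depth = cv2.inpaint(depth.astype(np.float32), invalid, 3, cv2.INPAINT_NS)
    depth = depth - depth.mean()
    return depth, (q, cos_2phi, sin_2phi, width)
```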
The prior-art GG-CNN uses the function model M_λ(I) = (Q_λ, Φ_λ, W_λ) to generate an approximate grabbing information image G_λ directly from an input depth image I: a 300 × 300 depth image is taken as input, and the grabbing information images are finally obtained through three convolution layers and three deconvolution layers. The complete GG-CNN structure is shown in FIG. 5.
Since the GG-CNN shown in FIG. 5 has limited recognition and grabbing accuracy, the present application improves the GG-CNN structure of FIG. 5, as shown in FIG. 10; the improvement process is as follows.
2 improved GG-CNN model based on residual error network
Firstly, the idea of the residual error network is introduced, secondly, two basic modules (such as an identity residual block and a convolution residual block) are explained, and finally, the residual error module is constructed by combining the two basic modules, and the residual error network is constructed by utilizing the residual error module, wherein the structure is shown in fig. 10.
2.1 residual error network
The residual network borrows the cross-layer connection idea of the Highway Network and improves on it. By constructing shortcut connections in a residual block, the input X is transmitted directly to the output as an initial result, and the output becomes
H(X) = F(X) + X
When F(X) = 0, then H(X) = X, i.e. an identity mapping. ResNet thus changes the learning objective: instead of learning the complete output, the network learns the difference between the target value H(X) and X, the so-called residual:
F(X) = H(X) − X
The objective of subsequent training is therefore to drive the residual towards 0, so that the accuracy does not decrease as the network deepens.
The residual skip structure breaks the convention of traditional neural networks that the output of layer n−1 can only be used as the input of layer n, allowing the output of a layer to skip several layers and serve directly as the input of a later layer. Its significance is that it offers a new direction for the difficult problem that stacking many layers can cause the accuracy of the whole learning model to degrade rather than improve.
In ResNet (residual network), the shortcut links allow the gradient to propagate back to the layers further ahead, fig. 6 (a) shows the main path of the neural network, fig. 6 (b) adds a shortcut link to the main path, and by stacking these ResNet modules, a very deep neural network can be constructed.
Two main types of modules are used in ResNet (i.e., identity and convolution blocks), the choice of identity and convolution blocks depending largely on whether the input/output sizes are the same or different. If they are the same, then the identity residual block is used, otherwise the convolution residual block is used.
(1) Identity residual block
The identity residual block is a standard block used in ResNet, corresponding to the case where the input and output have the same dimensions.
The auxiliary path is a shortcut connection, and the convolutional layers constitute the main path. In FIG. 7, convolution and ReLU activation operations are also performed, and Batch Normalization is added to increase the training speed and prevent overfitting.
(2) Convolution residual block
The convolutional residual block of ResNet is another type of residual block that can be used when the input and output sizes do not match, as shown in fig. 8.
The convolutional layer in the shortcut path is used to resize the input X to different sizes in order to match the output sizes of the shortcut path and the main path.
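For reference, a generic ResNet-style convolutional residual block of the kind shown in FIG. 8 can be written in a few lines of Keras; the filter counts and kernel sizes below are illustrative only and are not the parameters of the module actually used in this application, which is described in section 2.2.

```python
# Generic ResNet-style convolutional residual block (cf. FIG. 8) in tf.keras.
# Filter counts and kernel sizes are illustrative; the module actually used
# in this application is the one sketched in section 2.2.
from tensorflow.keras import layers

def conv_residual_block(x, filters, strides=2):
    # main path: Conv -> BN -> ReLU -> Conv -> BN
    y = layers.Conv2D(filters, 3, strides=strides, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)

    # shortcut path: a 1x1 convolution resizes X so the two paths match
    shortcut = layers.Conv2D(filters, 1, strides=strides, padding='same')(x)
    shortcut = layers.BatchNormalization()(shortcut)

    # add the two paths and apply the final activation
    return layers.Activation('relu')(layers.Add()([y, shortcut]))
```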
2.2 improving GG-CNN by introducing residual network
In the application, the idea of the residual error network is introduced into the GG-CNN, and the construction of a deeper neural network model is performed by constructing the residual error module, so that the accuracy of the gripping pose generated by the GG-CNN model is improved, and a better mechanical arm optimal gripping pose generation network is obtained. The constructed residual module structure is shown in fig. 9.
In the present application, the constructed residual module is divided into two major paths, namely a main path and an auxiliary path, wherein the auxiliary path is composed of two paths, namely a path adopting pooling and convolution operation and a shortcut path without operation.
For better illustration, assume that the input is X, and to distinguish the outputs of each path, they are named F (X), W (X), and H (X), respectively, and this section is mainly to explain a single residual block.
The operations on the main path include:
1) Referring to FIG. 9, the input X is first regularized, then passes through an activation layer using the ReLU activation function, and finally through a convolution layer whose number of filters is filters/2 (where filters is the input parameter of the module function) with a step size of 1 × 1, and is output to the next layer; the filter size is 3 × 3;
2) The output of the previous layer is regularized, passes through an activation layer using the ReLU activation function, and finally through a convolution layer whose number of filters is filters and whose step size is strides (where filters and strides are input parameters of the module function), and F(X) is output; the filter size is 5 × 5.
The operations on the auxiliary path include:
1) When the module function's pooling parameter is true: the input X passes through a maximum pooling layer of size strides (where strides is the input parameter of the module function), then through a convolution layer with filters filters and a step size of 1 × 1, and W(X) is output; the filter size is 5 × 5;
2) When the module function's pooling parameter is false: X is output directly without any operation.
The outputs of the main path and the selected auxiliary path are added to form the overall output H(X) of the residual module function.
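Read literally, the module described above could be interpreted in Keras roughly as follows; the pre-activation ordering (regularization → ReLU → convolution), the filters/2 first convolution, the 5 × 5 kernels and the pooled auxiliary path all follow the text, but this remains an interpretation of the description rather than the inventors' own code.

```python
# Rough Keras interpretation of the residual module described above.
# Main path: BN -> ReLU -> Conv(filters/2, 3x3, stride 1), then
# BN -> ReLU -> Conv(filters, 5x5, stride `strides`).
# Auxiliary path: MaxPool(strides) -> Conv(filters, 5x5, stride 1) when
# pooling is true, otherwise the unmodified input X.
# This is an interpretation of the text, not the inventors' own code.
from tensorflow.keras import layers

def residual_module(x, filters, strides=1, pooling=False):
    # main path
    y = layers.BatchNormalization()(x)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters // 2, 3, strides=1, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    f_x = layers.Conv2D(filters, 5, strides=strides, padding='same')(y)

    # auxiliary path
    if pooling:
        w_x = layers.MaxPooling2D(pool_size=strides, strides=strides,
                                  padding='same')(x)
        w_x = layers.Conv2D(filters, 5, strides=1, padding='same')(w_x)
    else:
        w_x = x  # shortcut: pass X through unchanged

    # overall output H(X) = F(X) + W(X) (or F(X) + X)
    return layers.Add()([f_x, w_x])
```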
In the application, the GG-CNN model is improved using the residual modules constructed above; on the premise that the original input and output sizes remain unchanged, an intermediate structure is built by stacking the residual modules. The model structure is shown in FIG. 10.
Specifically, the GG-CNN network improved by residual shown in fig. 10 includes: a convolution part, a deconvolution part and an output part;
the convolution portion includes ten residual modules,
wherein the first residual module comprises: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 4 filters with step size of 3 × 3;
the second residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 4 filters with step size of 1 × 1;
the third residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 8 filters with step size of 2 × 2;
the fourth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 8 filters with step size of 1 × 1;
the fifth residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 16 filters with step size of 2 × 2;
the sixth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 16 filters with step size of 1 × 1;
the seventh residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 32 filters with step size of 5 × 5;
the eighth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 32 filters with step size of 1 × 1;
the ninth residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 64 filters with step size of 1 × 1;
the tenth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 64 filters with step size of 1 × 1;
the deconvolution part comprises 5 deconvolution layers with different parameters;
the number of filters of the first deconvolution layer is 64, the size of each filter is 3 × 3, and the step size is 1 × 1;
the number of filters of the second deconvolution layer is 32, the size of each filter is 5 × 5, and the step size is 5 × 5;
the number of filters of the third deconvolution layer is 16, the size of each filter is 5 × 5, and the step size is 2 × 2;
the number of filters of the fourth deconvolution layer is 8, the size of each filter is 7 × 7, and the step size is 2 × 2;
the number of the fifth deconvolution layer filters is 4, the size of each filter is 9 × 9, and the step size is 3 × 3;
the output part comprises four linearly mapped convolution layers, each convolution layer comprises 1 filter, and the four linearly mapped convolution layers sequentially and respectively map and output the grabbing success rate, the cosine value of the grabbing angle, the sine value of the grabbing angle and the grabbing width.
That is, in this embodiment the output of the residual portion is transformed by the deconvolution layers to obtain the grabbing map G required in this application; the deconvolution output is linearly activated and mapped to the output-layer grabbing position (success rate) image, the grabbing angle map Φ formed by the sine and cosine images of the angle, and the grabbing width image W, forming the residual-improved GG-CNN network of the present application.
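Putting the pieces together, the structure of FIG. 10 could be assembled along the following lines using the residual_module sketch above; the ordering of the four output heads and the use of 'same' padding are assumptions made for illustration, while the filter counts, kernel sizes and strides follow the parameter lists given in the text.

```python
# Sketch of the residual-improved GG-CNN of FIG. 10, assembled from the
# residual_module sketch above. Filter counts, kernel sizes and strides
# follow the parameter lists in the text; output-head ordering and 'same'
# padding are illustrative assumptions.
from tensorflow.keras import layers, Model

def build_improved_ggcnn(input_shape=(300, 300, 1)):
    inputs = layers.Input(shape=input_shape)
    x = inputs

    # convolution part: ten residual stages
    # (filters, stride of the pooled convolutional module)
    for filters, strides in [(4, 3), (8, 2), (16, 2), (32, 5), (64, 1)]:
        x = residual_module(x, filters, strides=strides, pooling=True)
        for _ in range(5):                       # five identity residual modules
            x = residual_module(x, filters, strides=1, pooling=False)

    # deconvolution part: five transposed convolutions
    for filters, kernel, strides in [(64, 3, 1), (32, 5, 5), (16, 5, 2),
                                     (8, 7, 2), (4, 9, 3)]:
        x = layers.Conv2DTranspose(filters, kernel, strides=strides,
                                   padding='same')(x)

    # output part: four linearly mapped single-filter convolution layers
    q = layers.Conv2D(1, 1, activation='linear', name='q')(x)
    cos_2phi = layers.Conv2D(1, 1, activation='linear', name='cos_2phi')(x)
    sin_2phi = layers.Conv2D(1, 1, activation='linear', name='sin_2phi')(x)
    width = layers.Conv2D(1, 1, activation='linear', name='width')(x)
    return Model(inputs, [q, cos_2phi, sin_2phi, width])
```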
3 results and analysis of the experiment
In the application, the residual-network-improved GG-CNN model is used in the mechanical arm grabbing simulation experiments. The experimental environment is an Ubuntu 16.04 system, the pose generation algorithm and the grabbing algorithm are programmed in Python 2, a laboratory server graphics card (GTX 1080) is used to accelerate the training process, and multiple improvement experiments are carried out.
In the training and testing of the network model, the accuracy of the model is measured using the concept of the intersection-over-union (IoU) of target detection areas. The intersection-over-union is defined as
IoU(C, G) = |C ∩ G| / |C ∪ G|
and the ratio of the intersection to the union of the grabbing frame generated by the network and the marked grabbing frame is taken as the accuracy of the grabbing generated by the network in this application.
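When the two grabbing frames are represented as binary masks of equal size, the intersection-over-union above reduces to a few lines of numpy, as in the minimal sketch below.

```python
# Minimal IoU between two grasp regions given as boolean masks of equal shape.
import numpy as np

def iou(region_c, region_g):
    intersection = np.logical_and(region_c, region_g).sum()
    union = np.logical_or(region_c, region_g).sum()
    return float(intersection) / union if union > 0 else 0.0
```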
Experiments, improvement and optimization are carried out on the original GG-CNN network parameters. The accuracy of the GG-CNN network is improved by adjusting the optimizer type, the learning rate, the regularization parameters, the batch size, the loss function, the activation function and the number of layers of the neural network. After multiple experiments, the Adam optimizer is selected with learning-rate decay, the batch size is set to 32, MSE (mean squared error) is adopted as the loss function and ReLU as the activation function, and a deep residual network is constructed by stacking multiple layers of the constructed residual modules.
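Under those choices, the training setup could look roughly as follows in Keras; the initial learning rate and decay schedule are illustrative assumptions, while the Adam optimizer, learning-rate decay, batch size of 32 and MSE loss come from the text.

```python
# Training-configuration sketch reflecting the choices above: Adam optimizer
# with learning-rate decay, batch size 32, MSE loss on the four output images.
# The initial learning rate and decay schedule are illustrative assumptions.
import tensorflow as tf

model = build_improved_ggcnn()   # from the sketch above
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.95)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
              loss='mse')        # the same MSE loss on all four outputs

# depth_train: (N, 300, 300, 1); targets: four (N, 300, 300, 1) arrays
# model.fit(depth_train, [q_train, cos_train, sin_train, w_train],
#           batch_size=32, epochs=100)
```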
As shown in FIG. 11, the accuracy curves of the residual-network-improved GG-CNN model of FIG. 10 and of the original GG-CNN model of FIG. 5 both improve gradually as the number of epochs increases, and after 100 epochs of training the accuracy of the models before and after the improvement is essentially stable. Comparing the two accuracy curves, it can clearly be seen that the accuracy of the model before the improvement stabilizes at about 71%, while the accuracy of the improved model finally stabilizes at about 88%.
The GG-CNN model improved by the residual error network improves the pose generation accuracy by 17%, which shows that the deep residual error network is built by utilizing the multilayer residual error modules, a deeper grasping generation convolutional neural network model is built, the accuracy of the GG-CNN model can be effectively improved, and the more accurate optimal grasping pose of the mechanical arm is obtained.
In order to test the effect of the grabbing pose generation network before and after the improvement, a dataset conforming to the GG-CNN inputs and outputs was created from the Cornell University grabbing dataset. The RGB-D images of real objects in the Cornell dataset are displayed together with their "positive grab" and "negative grab" marks, where the marked graspable poses are represented by rectangular frames, and the whole RGB image is placed at the upper-left position; the depth image corresponding to the dataset is taken as output, with a light grey rectangular frame representing a marked graspable pose and a dark grey rectangular frame representing the graspable pose generated by the trained neural network, and the whole depth image is placed at the upper-right position; each pixel in the grabbing width and grabbing angle images output by the trained network has a corresponding grabbing parameter value, the grabbing width image is placed at the lower-left position and the grabbing angle image at the lower-right position. The effects before and after the network improvement are thus displayed as groups of four images, using two objects (object 1 and object 2), as shown in FIG. 12. In FIG. 12, (a) and (b) are the output poses generated for object 1 and object 2 before the improvement, and (c) and (d) are the output poses generated for object 1 and object 2 after the improvement.
Comparing the outputs of the network models before and after the improvement, first observe the dark grey rectangular frames generated by the grabbing-generation convolutional neural network in the depth images. For object 1, the grabbing frame generated before the improvement is too narrow to allow an actual grab, whereas the grabbing frame generated by the improved GG-CNN model has a suitable width and a position that meets the grabbing requirement; for object 2, the positions of the grabbing frames generated before and after the improvement both meet the actual requirement, with good effect. Observing the grabbing width and angle images output by the network, for both object 1 and object 2 the distribution of graspable pixels in the images generated by the improved model is more consistent with the pixel distribution of the actual object's depth image, and the grabbing width and angle values are closer to reality. The colors of the output grabbing information images are more distinct, indicating that the improved network model is more sensitive to differences in object size, shape and position and better reflects changes in the grabbing information.
The GG-CNN model is improved by constructing the residual error network, so that the accuracy of model generation and grabbing pose is obviously improved, and the grabbing effect is obviously improved.
In the prior art, the GG-CNN model pursues calculation speed, adopts an over-simple neural network structure, reduces the magnitude of neural network parameters, and sacrifices the capture accuracy of a part of network models. According to the method, a residual block function suitable for a network model is constructed by adopting the idea of a residual network, the structure of the GG-CNN model is reconstructed, the accuracy of predicting the optimal grabbing pose of the mechanical arm by the model is greatly improved, although the deeper network means the increase of the calculation time, the grabbing with high quality and high precision is still an important requirement in actual grabbing, and the method has a certain application value in certain fields with higher precision requirements.
It should be understood that the above description of specific embodiments of the present invention is only intended to illustrate the technical solutions and features of the present invention and to enable those skilled in the art to understand and implement the invention, but the invention is not limited to the above specific embodiments. All changes and modifications that fall within the scope of the appended claims are intended to be embraced therein.

Claims (7)

1. A residual error network deep learning method for estimating a grabbing pose of a mechanical arm is characterized by comprising the following steps:
S2, acquiring a depth image, collected by a wrist camera of the initialized mechanical arm, of the target object to be grabbed, wherein the end of the mechanical arm is adjusted so that the wrist camera is located at a preset height vertically above the XOY plane;
s3, preprocessing the acquired depth image to obtain an object depth image with 300 x 300 pixels;
s4, mapping the object depth image by adopting a pre-trained improved GG-CNN model, and outputting four captured information images with the pixels of 300 x 300, wherein the captured information images comprise a capturing success rate, a capturing angle cosine value, a capturing angle sine value and a capturing width;
S5, selecting the pixel point with the highest value in the grabbing success rate image, and reading the corresponding pixel points in the grabbing angle cosine value, grabbing angle sine value and grabbing width information images, so as to obtain the grabbing angle and width information with the highest grabbing success rate as the grabbing information;
s5, the obtained grabbing information is subjected to coordinate transformation of a wrist camera and then coordinate transformation between a wrist of the mechanical arm and the base, and finally the grabbing angle and the grabbing width of the target object to be grabbed under a mechanical arm base coordinate system are obtained;
the improved GG-CNN model is characterized in that a residual error network is built in the existing GG-CNN model by building a residual error module, and the fitting effect and the learning capacity of a convolutional neural network are enhanced.
2. Method according to claim 1, characterized in that, before step S2, it comprises:
S0-1, creating, based on an existing data set, a first data set G_train for training the inputs and outputs of the improved GG-CNN model; the first data set comprises images marked with positive grabbing information and images marked with negative grabbing information, and the images in the first data set carry a plurality of marked grabbing frames;
s0-2, improving the existing GG-CNN model by constructing a residual error module and constructing a residual error network so as to construct an improved GG-CNN model and ensure that the sizes of input and output images of the improved GG-CNN model are unchanged;
S0-3, training the residual-improved GG-CNN model using the first data set G_train to obtain the trained improved GG-CNN model.
3. The method of claim 2, wherein the GG-CNN model refined by residuals comprises:
a convolution part, a deconvolution part and an output part;
the convolution portion includes ten residual modules,
wherein the first residual module comprises: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 4 filters with step size of 3 × 3;
the second residual module includes: 5 identity residual modules, wherein the parameters in the identity residual modules comprise: 4 filters with step size of 1 × 1;
the third residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 8 filters with step size of 2 × 2;
the fourth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 8 filters with step size of 1 × 1;
the fifth residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 16 filters with step size of 2 × 2;
the sixth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 16 filters with step size of 1 × 1;
the seventh residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 32 filters with step size of 5 × 5;
the eighth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 32 filters with step size of 1 × 1;
the ninth residual module includes: 1 convolution residual module with pooling layer, the parameters in the convolution residual module including: 64 filters with step size of 1 × 1;
the tenth residual module includes: 5 identity residual modules, wherein the parameters of the identity residual modules comprise: 64 filters with step size of 1 × 1;
the deconvolution part comprises 5 deconvolution layers with different parameters;
the number of filters of the first deconvolution layer is 64, the size of each filter is 3 × 3, and the step size is 1 × 1;
the number of filters of the second deconvolution layer is 32, the size of each filter is 5 × 5, and the step size is 5 × 5;
the number of filters of the third deconvolution layer is 16, the size of each filter is 5 × 5, and the step size is 2 × 2;
the number of filters of the fourth deconvolution layer is 8, the size of each filter is 7 × 7, and the step size is 2 × 2;
the number of the fifth deconvolution layer filters is 4, the size of each filter is 9 × 9, and the step size is 3 × 3;
the output part comprises four linearly mapped convolution layers, each convolution layer comprises 1 filter, and the four linearly mapped convolution layers sequentially and respectively map and output the grabbing success rate, the cosine value of the grabbing angle, the sine value of the grabbing angle and the grabbing width.
4. A method according to claim 3, characterized in that in step S0-3, the following intersection-over-union formula is used to measure the grabbing accuracy of the residual-improved GG-CNN network;
intersection-over-union formula:
IoU(C, G) = |C ∩ G| / |C ∪ G|
wherein C and G respectively represent two known areas, and the intersection-over-union is computed as the ratio of the intersection to the union of the two areas.
5. The method of claim 1,
the depth image in S2 is I ∈ R^(H×W), where H is the height and W is the width, and the grabbing description in the depth image is:
g̃ = (o, φ̃, ω̃, q)
wherein o = (u, v) is the position coordinate of the pixel with the highest grabbing success rate, φ̃ is the rotation angle in the camera reference frame, and ω̃ is the grabbing width in image coordinates; through the coordinate transformation of the mechanical arm, the grabbing g̃ in image space is converted into the grabbing g in world coordinates:
g = T_RC(T_CI(g̃))
where T_RC is the coordinate transformation from the camera coordinate system to the mechanical arm coordinate system, and T_CI is the calibration transformation based on the camera internal parameters and the hand-eye position between the mechanical arm and the camera;
the output image in S4 is represented as: G = (Φ, W, Q) ∈ R^(3×H×W)
where Φ, W and Q are each ∈ R^(H×W), representing the grabbing angle, the grabbing width and the grabbing success rate respectively, wherein the grabbing angle Φ is split into a grabbing angle cosine value and a grabbing angle sine value, and at the coordinate o with the highest grabbing success rate, Φ, W and Q contain the corresponding values of φ̃, ω̃ and q;
in step S4, the pre-trained improved GG-CNN model performs the mapping on the object depth image, specifically: G = M(I);
the optimal grabbing pose in image space is determined from G:
g̃_best = max_Q G
specifically, from the output grabbing information G, the pixel with the largest grabbing success rate q in the Q image is selected, and its coordinate o is used to look up Φ and W in the output G, giving the position, angle and width of the optimal grabbing pose;
further, the optimal grabbing pose g_best in world coordinates is calculated through
g_best = T_RC(T_CI(g̃_best)).
6. The method of claim 3,
the processing procedure of each residual module of the convolution part comprises the following steps:
each residual error module comprises a main path and an auxiliary path;
the auxiliary path is one of two alternatives: a path that applies pooling and convolution operations, or a shortcut path that applies no operation;
specifically, the main path includes:
1) the input data X is first subjected to a regularization operation, then passes through an activation layer using a ReLU activation function, and is finally passed through a convolution layer of filters to the next layer;
2) the output of the previous layer is again regularized, passed through an activation layer using a ReLU activation function, and finally passed through a convolution layer of filters to output F(X);
the auxiliary path includes:
1) when the pooling parameter of the module is true: the input data X passes through a maximum pooling layer, then through a convolution layer with a filter size of 5 × 5, a given number of filters and a step size of 1 × 1, and W(X) is output;
2) when the pooling parameter of the module is false: X is output directly without any operation;
the output of the main path and the output of the selected auxiliary path are added to form the overall output H(X) of the residual module.
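An illustrative sketch of one such residual building module, interpreting the regularization operation as batch normalization; the main-path kernel sizes, the channel counts and the pooling window are assumptions, and the sketch keeps all feature maps at the same resolution so the two paths can be added directly:

```python
import torch
import torch.nn as nn


class ResidualModule(nn.Module):
    """Sketch of a residual building module: main path F(X) plus an auxiliary
    path that is either max-pool + 5x5 convolution (W(X)) or an identity shortcut."""

    def __init__(self, channels: int, pooling: bool):
        super().__init__()
        # main path: (regularization -> ReLU -> convolution) twice, producing F(X);
        # 3x3 kernels and equal channel counts are assumptions
        self.main = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        if pooling:
            # auxiliary path when the pooling parameter is true:
            # max pooling, then a 5x5 convolution with stride 1 (pooling window assumed)
            self.aux = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                nn.Conv2d(channels, channels, kernel_size=5, stride=1, padding=2),
            )
        else:
            # auxiliary path when the pooling parameter is false: identity shortcut
            self.aux = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.main(x) + self.aux(x)  # H(X) = F(X) + W(X), or F(X) + X
```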
7. The method according to any one of claims 1 to 6, characterized in that before the step S2, it further comprises the following step S1:
S1, initializing the mechanical arm, and adjusting the mechanical arm so that the wrist camera is located at a preset height above the vertical X0Y plane;
correspondingly, after step S5, step S6 is also included:
and S6, outputting the grabbing position, angle and width information of the target object after coordinate transformation, so as to control the mechanical arm to grab the target object.
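Finally, an illustrative sketch of how steps S1 to S6 fit together; `camera`, `model`, `robot` and `image_to_base` are hypothetical interfaces standing in for the wrist camera, the trained improved GG-CNN, the arm controller and the coordinate transformation, and none of these names are defined by the patent:

```python
import numpy as np


def run_grasp_cycle(camera, model, robot, image_to_base):
    """Hypothetical end-to-end grabbing cycle following steps S1-S6."""
    robot.move_camera_to_start_height()             # S1: wrist camera above the X0Y plane (hypothetical call)
    depth = camera.read_depth()                     # S2: depth image of the object to be grabbed
    q, cos_m, sin_m, width = model(depth)           # S3/S4: four grabbing-information maps, G = M(I)
    v, u = np.unravel_index(np.argmax(q), q.shape)  # S5: pixel with the highest grabbing success rate
    phi = np.arctan2(sin_m[v, u], cos_m[v, u])      # grabbing angle at that pixel
    pos, ang, w = image_to_base((u, v), phi, width[v, u])  # S6: transform to the base coordinate system
    robot.grasp(position=pos, angle=ang, width=w)   # S6: execute the grab (hypothetical call)
```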
CN201910192296.4A 2019-03-14 2019-03-14 Residual error network deep learning method for mechanical arm grabbing pose estimation Active CN109934864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910192296.4A CN109934864B (en) 2019-03-14 2019-03-14 Residual error network deep learning method for mechanical arm grabbing pose estimation

Publications (2)

Publication Number Publication Date
CN109934864A CN109934864A (en) 2019-06-25
CN109934864B (en) 2023-01-20

Family

ID=66987254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910192296.4A Active CN109934864B (en) 2019-03-14 2019-03-14 Residual error network deep learning method for mechanical arm grabbing pose estimation

Country Status (1)

Country Link
CN (1) CN109934864B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633738B (en) * 2019-08-30 2021-12-10 杭州电子科技大学 Rapid classification method for industrial part images
CN111015676B (en) * 2019-12-16 2023-04-28 中国科学院深圳先进技术研究院 Grabbing learning control method, system, robot and medium based on hand-free eye calibration
CN113021333A (en) * 2019-12-25 2021-06-25 沈阳新松机器人自动化股份有限公司 Object grabbing method and system and terminal equipment
CN111127548B (en) * 2019-12-25 2023-11-24 深圳市商汤科技有限公司 Grabbing position detection model training method, grabbing position detection method and grabbing position detection device
CN113362353A (en) * 2020-03-04 2021-09-07 上海分众软件技术有限公司 Method for identifying advertising player frame by utilizing synthesis training picture
CN111444917A (en) * 2020-03-30 2020-07-24 合肥京东方显示技术有限公司 License plate character recognition method and device, electronic equipment and storage medium
CN112437349B (en) * 2020-11-10 2022-09-23 杭州时趣信息技术有限公司 Video stream recommendation method and related device
CN112734727A (en) * 2021-01-11 2021-04-30 安徽理工大学 Apple picking method based on improved deep neural network
CN113269112A (en) * 2021-06-03 2021-08-17 梅卡曼德(北京)机器人科技有限公司 Method and device for identifying capture area, electronic equipment and storage medium
CN113327295A (en) * 2021-06-18 2021-08-31 华南理工大学 Robot rapid grabbing method based on cascade full convolution neural network
CN113799138A (en) * 2021-10-09 2021-12-17 中山大学 Mechanical arm grabbing method for generating convolutional neural network based on grabbing
CN114155294A (en) * 2021-10-25 2022-03-08 东北大学 Engineering machinery working device pose estimation method based on deep learning
CN114241247B (en) * 2021-12-28 2023-03-07 国网浙江省电力有限公司电力科学研究院 Transformer substation safety helmet identification method and system based on deep residual error network
CN115026836B (en) * 2022-07-21 2023-03-24 深圳市华成工业控制股份有限公司 Control method, device and equipment of five-axis manipulator and storage medium
CN115319739A (en) * 2022-08-02 2022-11-11 中国科学院沈阳自动化研究所 Workpiece grabbing method based on visual mechanical arm
CN115070781B (en) * 2022-08-24 2022-12-13 绿盛环保材料(集团)有限公司 Object grabbing method and two-mechanical-arm cooperation system
CN117732827A (en) * 2024-01-10 2024-03-22 深圳市林科超声波洗净设备有限公司 Battery shell cleaning line feeding and discharging control system and method based on robot

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015153739A1 (en) * 2014-04-01 2015-10-08 University Of South Florida Systems and methods for planning a robot grasp based upon a demonstrated grasp
US10089575B1 (en) * 2015-05-27 2018-10-02 X Development Llc Determining grasping parameters for grasping of an object by a robot grasping end effector
CN106874914A (en) * 2017-01-12 2017-06-20 华南理工大学 A kind of industrial machinery arm visual spatial attention method based on depth convolutional neural networks
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant