CN111652928A - Method for detecting object grabbing pose in three-dimensional point cloud - Google Patents
- Publication number
- CN111652928A CN111652928A CN202010390619.3A CN202010390619A CN111652928A CN 111652928 A CN111652928 A CN 111652928A CN 202010390619 A CN202010390619 A CN 202010390619A CN 111652928 A CN111652928 A CN 111652928A
- Authority
- CN
- China
- Prior art keywords
- grabbing
- point cloud
- pose
- layer
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/70—Determining position or orientation of objects or cameras (G06T—Image data processing; G06T7/00—Image analysis)
- B25J19/04—Viewing devices (B25J—Manipulators; B25J19/02—Sensing devices)
- B25J9/1612—Programme controls characterised by the hand, wrist, grip control (B25J9/16—Programme controls)
- B25J9/1697—Vision controlled systems (B25J9/1694—Perception control, multi-sensor controlled systems, sensor fusion)
- G06N3/045—Combinations of networks (G06N3/02—Neural networks; G06N3/04—Architecture)
- G06V20/64—Three-dimensional objects (G06V20/60—Type of objects; G06V20/00—Scenes)
- G06T2207/10028—Range image; Depth image; 3D point clouds (G06T2207/10—Image acquisition modality)
Abstract
A method for detecting object grabbing poses in a three-dimensional point cloud: object grabbing poses arranged in sample images serve as the training set for an end-to-end grabbing pose detection model, which then identifies the three-dimensional point cloud data to be detected and produces candidate grabbing pose scores, thereby achieving object grabbing pose detection. Through end-to-end training and testing on full scenes, the invention closely links the global and local feature relations within the point cloud, improving detection accuracy while optimizing running speed.
Description
Technical Field
The invention relates to a technology in the field of image processing, in particular to a method for detecting an object grabbing pose in three-dimensional point cloud.
Background
Object grabbing is a basic problem in the field of robots, and has wide application prospects in the industries of manufacturing, building, service and the like. The most critical step in capture is the capture pose (pose of the capture device in space) detection for a given visual scene (e.g., a picture or a point cloud).
Existing grabbing pose technology follows two technical routes. The first generates the grabbing pose indirectly by estimating the pose of the object in space; however, the prediction is very sensitive to the accuracy of the object pose estimate, so prediction accuracy drops sharply. The second estimates the grabbing pose directly in the scene without knowing the object's pose information, and can be realized by reinforcement learning or deep learning. Reinforcement-learning methods iteratively correct and evaluate the current pose of the robot arm, gradually approaching the object until a reliable grabbing pose is produced. Deep-learning methods obtain a large number of grabbing pose candidates by grid sampling over the point cloud, encode each pose into a 2D image, and use a CNN to judge whether it is graspable. A further improvement encodes the grabbing pose into the three-dimensional point cloud; although this raises classification accuracy, the candidate poses are still produced by grid sampling, and the computation cost is huge.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a method for detecting object grabbing poses in a three-dimensional point cloud that closely links the global and local feature relations within the point cloud through end-to-end full-scene training and testing, thereby optimizing running speed while improving detection accuracy.
The invention is realized by the following technical scheme:
the invention relates to a method for detecting object grabbing poses in a three-dimensional point cloud: an end-to-end object grabbing pose detection model is trained with object grabbing poses arranged in sample images as the training set, and the three-dimensional point cloud data to be detected is then identified to obtain candidate grabbing pose scores, realizing object grabbing pose detection.
The arrangement is as follows: RGB-D images for object grabbing pose detection, with different object combinations and different placements, are obtained from an existing image library as sample images, and a point cloud scene with corresponding training labels is synthesized.
The end-to-end object grabbing pose detection model comprises: a candidate grabbing point prediction module that processes encoded spatial information, a spatial transformation module that generates candidate grabbing pose features, a grabbing parameter prediction module and a grabbing affinity prediction module, wherein: the candidate grabbing point prediction module contains a PointNet++ model that processes the point cloud scene and predicts candidate grabbing point locations and principal-axis directions; the spatial transformation module crops the point cloud near the candidate grabbing poses and converts it into the jaw coordinate system; the grabbing parameter prediction module predicts the grabbing pivoting angle, gripping width and grabbing score; and the grabbing affinity prediction module judges the robustness of the grabbing poses.
The object grabbing pose detection is preferably further realized by performing threshold judgment on candidate grabbing pose scores.
Technical effects
The invention integrally solves the problem that the three-dimensional point cloud scene information cannot be effectively utilized in the object grabbing pose detection.
Compared with the prior art, the method encodes the grabbing pose directly in the three-dimensional point cloud, carrying far more information than two-dimensional encodings. It provides an end-to-end full-scene training and testing procedure that avoids undifferentiated grid sampling of candidate poses and closely links the global and local feature relations within the point cloud, improving detection accuracy while optimizing running speed, and achieves the best performance on the large general grasping dataset GraspNet.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of a jaw coordinate system;
FIG. 3 is a schematic diagram illustrating the effect of the present invention.
Detailed Description
As shown in fig. 1, the present embodiment relates to a method for detecting an object capture pose in a three-dimensional point cloud, which includes the following steps:
step 1: data preprocessing: the method comprises the following steps of obtaining RGB-D images containing different object combinations and different placing modes and used for object capturing pose detection from an existing image library, and synthesizing a point cloud scene and corresponding training labels, wherein the method specifically comprises the following steps:
step 1.1: synthesizing a point cloud using the camera intrinsic parameters, wherein: the camera intrinsics are the scale factors f_x and f_y of the camera along the u and v axes of the image coordinate system, the principal point coordinates (c_x, c_y) of the image coordinate system, and the image depth value scaling s. With (u, v) denoting the coordinates of a point in the image coordinate system, d its depth value, and (x, y, z) the three-dimensional coordinates in the camera coordinate system, the standard pinhole back-projection z = d / s, x = (u - c_x) * z / f_x, y = (v - c_y) * z / f_y converts the RGB-D image into a point cloud scene in the camera coordinate system.
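The pinhole back-projection described in step 1.1 can be sketched in a few lines of NumPy. The row/column-to-(u, v) convention and the masking of zero-depth pixels are assumptions, since the patent gives only the intrinsic parameters:

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy, s):
    """Back-project a depth image into a camera-frame point cloud.

    depth : (H, W) array of raw depth values; zeros are treated as invalid.
    fx, fy: focal scale factors along the image u and v axes.
    cx, cy: principal point coordinates. s: depth value scaling.
    Returns an (N, 3) array of (x, y, z) points in the camera frame.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # v indexes rows, u indexes columns
    z = depth / s
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels
```

For a pixel at the principal point with raw depth equal to s, this yields the point (0, 0, 1), as expected from the formulas above.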
Step 1.2: adding a plurality of training labels for the point cloud scene, specifically comprising: and obtaining 25600 point cloud scenes with capture pose labels by using the object type labels, the object poses in the point cloud, the capture poses on each object and the scores of the capture poses.
As shown in fig. 2, the grabbing pose includes a translation amount, a main shaft direction, a pivoting angle and a clamping width, the main shaft direction is along the z axis in the clamping jaw coordinate system, the pivoting angle refers to a rotation amount from the z axis to the y axis, and the clamping width refers to a distance between two fingers of the clamping jaw on the y axis.
Step 2: constructing an end-to-end object grabbing pose detection model, which comprises: a candidate grabbing point prediction module for processing encoded spatial information, a spatial transformation module for generating candidate grabbing pose features, a grabbing parameter prediction module and a grabbing affinity prediction module, wherein: the candidate grabbing point prediction module predicts candidate grabbing point locations and principal-axis directions, the spatial transformation module crops the point cloud near the candidate grabbing poses and converts it into the jaw coordinate system, the grabbing parameter prediction module predicts the pivoting angle, gripping width and grabbing score, and the grabbing affinity prediction module judges the robustness of the grabbing poses.
The candidate grabbing point prediction module comprises: a base network unit and a prediction unit, wherein: the base network unit encodes the 20000 × 3 point cloud, and the prediction unit uses the encoded point cloud to predict suitable grabbing point locations and jaw principal-axis directions in the scene.
The base network unit implements the PointNet++ three-dimensional deep learning network following Charles R. Qi et al., "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space" (NIPS 2017). The network's input is a scene point cloud of size N × C1 containing several objects (N is the number of points, C1 the number of feature channels per point); after several rounds of downsampling, 1 × 1 convolution, neighborhood max pooling and skip feature concatenation it yields features of size N × C2, which are further reduced by farthest point sampling to a point set of size M × C2, wherein:
N = 20000, M = 1024, C1 = 3, C2 = 256.
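Farthest point sampling, used above to reduce the N-point features to M = 1024 points, can be sketched greedily; the choice of seed point is an assumption (implementations commonly pick it at random or take the first point):

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Greedy farthest point sampling: pick m points that cover the cloud.

    points : (N, 3) array; m : number of samples (M = 1024 in the model).
    Returns the indices of the m selected points.
    """
    n = points.shape[0]
    chosen = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)          # distance to the nearest chosen point
    chosen[0] = 0                      # arbitrary (here: first) seed point
    for i in range(1, m):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))   # farthest from all chosen so far
    return chosen
```

On ten collinear points sampled down to three, this selects the two endpoints and a mid point, illustrating the coverage property.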
the prediction unit takes the point set output by the base network unit as input and applies prediction layers of size (C2, V + 2, V + 2): for each sampling point, the first 2 output dimensions are a two-way classification of whether the point can be grabbed, and the last V dimensions predict the grabbing principal-axis direction. V viewing angles are uniformly sampled on a unit sphere centred on the point, and the view with the highest score is selected as the candidate principal-axis direction. The model uses V = 300.
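Claim 2 describes discretizing a unit sphere into 300 uniformly distributed viewpoints whose directions toward the centre are the candidate principal axes. The patent does not fix the sampling scheme; a Fibonacci sphere lattice, used here as a stand-in, is one common way to generate such near-uniform directions:

```python
import numpy as np

def sample_views(v=300):
    """Approximately uniform unit directions via a Fibonacci sphere lattice.

    v is the number of candidate jaw principal-axis directions (V = 300 in
    the model). Returns a (v, 3) array of unit vectors.
    """
    i = np.arange(v)
    phi = (1 + 5 ** 0.5) / 2                # golden ratio
    z = 1 - 2 * (i + 0.5) / v               # uniform spacing in z
    theta = 2 * np.pi * i / phi             # golden-angle spiral in azimuth
    r = np.sqrt(1 - z * z)
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
```

Scoring each of these V directions per point turns principal-axis regression into the viewpoint classification problem the claim describes.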
The cutting is as follows: the spatial transformation module crops a point cloud block inside a cylindrical region determined by the candidate grabbing point location and principal-axis direction; the cylinder's axis lies along the jaw principal-axis direction, and the centre of its base circle is the candidate grabbing point.
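A minimal sketch of that cylindrical crop, using the base radius (0.05 m) and maximum jaw depth (0.04 m) stated in the text; treating the grabbing point as the centre of the cylinder's base circle along a unit principal axis follows the stated geometry:

```python
import numpy as np

def crop_cylinder(points, center, axis, radius=0.05, depth=0.04):
    """Keep the points inside a cylinder anchored at a candidate grasp point.

    axis must be a unit vector (the jaw principal axis). A point is kept if
    its signed distance along the axis lies in [0, depth] and its radial
    distance from the axis is at most radius.
    """
    rel = points - center
    along = rel @ axis                                    # axial component
    radial = np.linalg.norm(rel - np.outer(along, axis), axis=1)
    mask = (along >= 0) & (along <= depth) & (radial <= radius)
    return points[mask]
```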
The conversion is as follows: the cropped point cloud block is transformed into the jaw coordinate system. With the predicted principal-axis direction v = [v1, v2, v3], the transformation matrix of the point cloud coordinates is O = [o1, o2, o3], an orthonormal basis whose column o3 coincides with v. Each candidate crops one cylindrical point cloud per jaw depth, wherein: the cylinder base radius is 0.05 m and the depths are 0.01 m, 0.02 m, 0.03 m and 0.04 m in turn. Processing all candidate grabbing poses gives 4M cropped point clouds; each point cloud is sampled to Ns points, finally yielding a 4M × Ns × C1 point cloud array, wherein: Ns = 64.
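The exact construction of O = [o1, o2, o3] appears in the patent only as a formula image; the sketch below is one standard reconstruction (Gram-Schmidt against a fixed reference vector, an assumption here) that makes the third basis column the jaw principal axis:

```python
import numpy as np

def jaw_rotation(v):
    """Orthonormal basis O = [o1, o2, o3] with o3 equal to the principal axis v.

    Points are mapped into the jaw frame by pts @ O; the construction of the
    first two columns is an assumed Gram-Schmidt choice, not the patent's
    (unreproduced) formula.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    ref = np.array([1.0, 0.0, 0.0])
    if abs(v @ ref) > 0.9:                 # avoid a near-parallel reference
        ref = np.array([0.0, 1.0, 0.0])
    o1 = ref - (ref @ v) * v               # project out the v component
    o1 /= np.linalg.norm(o1)
    o2 = np.cross(v, o1)                   # completes a right-handed frame
    return np.stack([o1, o2, v], axis=1)   # columns o1, o2, o3 = v
```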
The grabbing parameter prediction module comprises: a multilayer perceptron, a max pooling layer, a first fully-connected layer F1, a first batch normalization layer B1, a first activation function layer R1, a second fully-connected layer F2, a second batch normalization layer B2, a second activation function layer R2 and a third fully-connected layer F3.
The grabbing parameters are predicted by the grabbing parameter prediction module from the cropped and transformed point cloud: a multilayer perceptron with channel sizes (C3, C4, C5) produces a feature point cloud array of size 4M × Ns × C5, which passes through the max pooling layer P1 to give a point cloud global feature array of size 4M × C5; this then passes in turn through the first fully-connected layer F1, first batch normalization layer B1, first activation function layer R1, second fully-connected layer F2, second batch normalization layer B2, second activation function layer R2 and third fully-connected layer F3, yielding pose parameter predictions of size 4M × C8.
The multilayer perceptron is three groups of (1 × 1 convolution layer, batch normalization layer, activation function layer) in sequence; each group's activation layer uses the ReLU function. The convolution output channels and batch normalization unit counts are C3 = 64 in the first group, C4 = 128 in the second group and C5 = 256 in the third group.
The first fully-connected layer F1 has C6 = 128 output channels, the second fully-connected layer F2 has C7 = 128, and the third fully-connected layer F3 has C8 = 36.
The first and second activation function layers R1 and R2 use the ReLU function.
The first and second batch normalization layers B1 and B2 each have 128 units.
The grabbing parameter prediction module outputs predicted grabbing parameter values as follows: the pivoting angle range (0–180°) is divided into 12 classes at 15° intervals; the 36 output channels hold, for each of the 12 angles, the score of that angle, the corresponding gripping width, and the predicted grabbing pose score at that angle; the angle with the highest score, together with its gripping width and grabbing score, forms the final prediction.
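Decoding one 36-channel prediction can be sketched as below. The channel ordering (12 angle bins × 3 values) and the reporting of bin centres are assumptions; the text fixes only the 12 angle classes at 15° intervals and the three quantities per angle:

```python
import numpy as np

def decode_grasp_head(out36):
    """Decode one 36-channel head output into (angle, width, score).

    The 36 channels are read here as 12 angle bins x 3 values
    (angle score, gripping width, grasp-pose score) -- one plausible
    layout consistent with the 12 x 3 = 36 channel count.
    """
    per_bin = out36.reshape(12, 3)
    best = int(np.argmax(per_bin[:, 0]))       # highest-scoring angle bin
    angle_deg = best * 15.0 + 7.5              # bin centre within 0-180 deg
    width = float(per_bin[best, 1])
    score = float(per_bin[best, 2])
    return angle_deg, width, score
```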
The grasping affinity prediction module comprises: a multilayer perceptron, a second max pooling layer, a fourth fully-connected layer F4, a fourth batch normalization layer B4, a fourth activation function layer R4, a fifth fully-connected layer F5, a fifth batch normalization layer B5, a fifth activation function layer R5 and a sixth fully-connected layer F6.
The grasping affinity is predicted by the grasping affinity prediction module from the transformed point cloud: a multilayer perceptron with channel sizes (C10, C11, C12) produces a feature point cloud array of size 4M × Ns × C12, which passes through the max pooling layer P2 to give a point cloud global feature array of size 4M × C12; this then passes in turn through the fourth fully-connected layer F4, fourth batch normalization layer B4, fourth activation function layer R4, fifth fully-connected layer F5, fifth batch normalization layer B5, fifth activation function layer R5 and sixth fully-connected layer F6, yielding grasping affinity predictions of size 4M × C15.
The multilayer perceptron is three groups of (1 × 1 convolution layer, batch normalization layer, activation function layer) in sequence; each group's activation layer uses the ReLU function. The convolution output channels and batch normalization unit counts are C10 = 64 in the first group, C11 = 128 in the second group and C12 = 256 in the third group.
The fourth fully-connected layer F4 has C13 = 128 output channels, the fifth fully-connected layer F5 has C14 = 64, and the sixth fully-connected layer F6 has C15 = 12.
The fourth and fifth activation function layers R4 and R5 use the ReLU function, and the fourth and fifth batch normalization layers B4 and B5 each have 128 units.
The grasping affinity output by the grasping affinity prediction module is the grasping affinity at each of the 12 angles; the affinity at the predicted angle is taken as the final prediction, wherein: the grasping affinity represents the maximum perturbation of the parameters under which the jaw can still grasp the object at the current grabbing pose.
And step 3: training an end-to-end object grabbing pose detection model, and specifically comprises the following steps:
step 3.1: and initializing the parameters to be trained in the model by using Gaussian distribution sampling with the average value of 0 and the standard deviation of 0.01.
Step 3.2: inputting 25600 point cloud scenes with object grabbing pose labels obtained in the step 1 into the model as training samples for training, transforming the training samples in two stages in the step 2.1 and the step 2.2, and transmitting the transformed training samples to an output layer to obtain a sampling point graspable confidence coefficient predicted value { ciThe predicted values of the grabbing fractions(s) corresponding to the directions of the main shafts of the clamping jaws at different visual angles are (i is the serial number of a sampling point)ijH (i is sampling point serial number, j is view angle serial number), and a predicted value of the fraction of the rotation angle around the axis R is RijI is a point cloud block number, j is a depth number), and a clamping width predicted value WijCapturing pose score predicted values (S) by using a point cloud block serial number i and a depth serial number jijAnd (i is a point cloud block serial number, j is a depth serial number), capturing the affinity predicted value (T)ijAnd (i is the point cloud block serial number and j is the depth serial number).
The training sample comprises: the scene point cloud P, sampling-point graspable-confidence labels {ĉ_i} (i is the sampling point index), grabbing score labels {ŝ_ij} for the jaw principal-axis directions at different viewing angles (i is the sampling point index, j the viewing angle index), pivoting-angle score labels {R̂_ij}, gripping width labels {Ŵ_ij}, grabbing pose score labels {Ŝ_ij} and grabbing affinity labels {T̂_ij} (for each of these, i is the point cloud block index and j the viewing angle index).
Step 3.3: adjusting model parameters by using a multi-target joint loss function in combination with a Back Propagation (BP) algorithm, wherein the multi-target joint loss function comprises: candidate grab point loss function LAGrabbing pose parameter loss function LRAnd grasping the affinity loss function LF。
The candidate grabbing point loss function is L_A({c_i}, {s_ij}) = L_cls({c_i}) + λ1 · L_dir({s_ij}), wherein: L_cls is the softmax loss over the two-way graspable classification of the sampling points, L_dir is the smooth-L1 regression loss between the predicted principal-axis direction and its label on the graspable points, a direction label counts as matching if and only if its angle to the predicted principal-axis direction is less than 5°, and λ1 = 0.5.
The grabbing pose parameter loss function is L_R({R_ij}, {S_ij}, {W_ij}) = Σ_d ( L_angle^d + λ2 · L_score^d + λ3 · L_width^d ), wherein: L_angle^d is the sigmoid cross-entropy loss of the angle classification when the grabbing depth is d, L_score^d is the smooth-L1 regression loss of the grabbing pose score at depth d, L_width^d is the smooth-L1 regression loss of the gripping width at depth d, λ2 = 1.0 and λ3 = 0.2.
The grabbing affinity loss function L_F({T_ij}) is the smooth-L1 regression loss between the predicted and labelled grabbing affinities.
The objective function of the back propagation BP algorithm is L = L_A({c_i}, {s_ij}) + α · L_R({R_ij}, {S_ij}, {W_ij}) + β · L_F({T_ij}), with α = 0.5 and β = 0.1.
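The smooth-L1 term shared by the regression losses and the weighted joint objective can be sketched as follows; the smooth-L1 threshold delta = 1 is an assumption, as the patent does not state it:

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth-L1 (Huber-style, delta = 1) loss used by the score, width and
    affinity regression terms: quadratic near zero, linear beyond delta."""
    d = np.abs(pred - target)
    return float(np.where(d < 1.0, 0.5 * d * d, d - 0.5).mean())

def joint_loss(l_a, l_r, l_f, alpha=0.5, beta=0.1):
    """Multi-target objective L = L_A + alpha * L_R + beta * L_F with the
    weights alpha = 0.5 and beta = 0.1 given in the text."""
    return l_a + alpha * l_r + beta * l_f
```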
In this embodiment, the learning rate of the back propagation BP algorithm starts at 0.001; the whole training data set is iterated over 90 times, and the learning rate becomes 0.0001, 0.00001 and 0.000001 after 40, 60 and 80 iterations respectively.
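The schedule described above amounts to a tenfold step decay; a small helper (epoch-indexed, assuming "iterations" here means passes over the training set):

```python
def learning_rate(epoch):
    """Step learning-rate schedule from the embodiment: 1e-3 initially,
    decayed tenfold after epochs 40, 60 and 80 over 90 total epochs."""
    lr = 1e-3
    for boundary in (40, 60, 80):
        if epoch >= boundary:
            lr *= 0.1
    return lr
```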
And 4, step 4: performing object grabbing pose detection with the trained end-to-end object grabbing pose detection model: 7680 RGB-D images are used; each image to be detected is synthesized into a scene point cloud by the method of step 1.1 and input into the model, and the object grabbing pose predictions are obtained by layer-by-layer transformation and computation.
The model evaluation criterion used in this example is: the predicted grabbing poses are screened by non-maximum suppression and assigned to the nearest object; given a surface friction coefficient μ, a complete scanned model of the object is used to test whether the grasp would succeed, and the corresponding average precision AP_μ is computed; μ is increased stepwise from 0.2 to 1.0 at intervals of 0.2, the corresponding AP_μ values are computed, and their mean gives the final score AP.
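Given the per-μ average precisions, the final score is simply their mean over μ ∈ {0.2, 0.4, 0.6, 0.8, 1.0}; how each AP_μ is computed from grasp-success tests against the scanned model is outside this sketch:

```python
import numpy as np

def final_ap(ap_by_mu):
    """Mean of the per-friction-coefficient average precisions AP_mu.

    ap_by_mu maps each friction coefficient mu in {0.2, ..., 1.0} to its
    AP value; the final score is their mean, as described in the text.
    """
    mus = [0.2, 0.4, 0.6, 0.8, 1.0]
    return float(np.mean([ap_by_mu[m] for m in mus]))
```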
According to the model test standard used by the method, the best results are achieved on the large-scale general grasping dataset GraspNet, as detailed below:
through a practical experiment, the system was run on the PyTorch computing framework with a single NVIDIA RTX 2080 GPU and tested on the GraspNet data, with the following experimental results: the best performance is achieved on test data of all three difficulties, with the AP on the Seen split reaching 27.56/29.88 (results on data collected from RealSense/Kinect cameras respectively, likewise below), the AP on the Unseen split reaching 26.11/27.84, and the AP on the Novel split reaching 10.55/11.51.
Compared with the prior art, the method does not need to rely on the pose information of the object, improves the prediction speed, greatly improves the accuracy of the detection of the grabbing pose through the steps of the direction prediction of the main shaft of the clamping jaw, the grabbing affinity prediction and the like, and achieves the best performance on a large-scale general grabbing data set GraspNet.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (11)
1. A method for detecting object grabbing pose in three-dimensional point cloud is characterized in that an end-to-end object grabbing pose detection model is trained by arranging object grabbing pose in a sample image as a training set, and then three-dimensional point cloud data to be detected are identified to obtain candidate grabbing pose scores so as to realize object grabbing pose detection;
the end-to-end object grabbing pose detection model comprises the following steps: the device comprises a candidate grabbing point prediction module for processing coded spatial information, a spatial transformation module for generating candidate grabbing pose characteristics, a grabbing parameter prediction module and a grabbing affinity prediction module.
2. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1, wherein when the candidate grabbing point prediction module predicts the main shaft direction of the clamping jaw, a unit sphere is generated by taking a grabbing point as a sphere center, the sphere is discretized into 300 uniformly distributed viewpoints, and the direction of the viewpoint pointing to the sphere center represents the main shaft direction of the clamping jaw, so that the prediction of the main shaft direction is converted into the classification problem of the viewpoint; the candidate grabbing point prediction module is internally provided with a PointNet + + model for processing a point cloud scene and carries out candidate grabbing point location and main shaft direction prediction.
3. The method for detecting the object grabbing pose in the three-dimensional point cloud of claim 1, wherein the space transformation module is used for clipping the point cloud near the candidate grabbing pose and converting the point cloud into a clamping jaw coordinate system, and the grabbing parameter prediction module is used for predicting the grabbing pivoting angle, the clamping width and the grabbing score.
4. The method for detecting the object grabbing pose in the three-dimensional point cloud of claim 1, wherein the grabbing affinity prediction module outputs the affinity of candidate grabbing poses, namely the maximum disturbance range when the object can be grabbed after the predicted pose is disturbed, the grabbing affinity represents the robustness of the grabbing pose, and the stronger the grabbing affinity, the stronger the robustness of the predicted grabbing pose.
5. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1, wherein the sorting is as follows: RGB-D images of object capture pose detection with different object combinations and different placing modes are obtained from an existing image library serving as sample images, and a point cloud scene and corresponding training labels are synthesized.
6. The method for detecting the object capture pose in the three-dimensional point cloud of claim 1, wherein the object capture pose detection is further realized by performing threshold judgment on candidate capture pose scores.
7. The method for detecting the object capture pose in the three-dimensional point cloud according to claim 5, wherein the sorting comprises:
step 1.1: synthesizing a point cloud using the camera intrinsic parameters, wherein: the camera intrinsics are the scale factors f_x and f_y of the camera along the u and v axes of the image coordinate system, the principal point coordinates (c_x, c_y) of the image coordinate system, and the image depth value scaling s; with (u, v) denoting the coordinates of a point in the image coordinate system, d the corresponding depth value, and (x, y, z) the three-dimensional coordinates in the camera coordinate system, the pinhole back-projection z = d / s, x = (u - c_x) * z / f_x, y = (v - c_y) * z / f_y converts the RGB-D image into a point cloud scene under the camera coordinate system;
step 1.2: adding a plurality of training labels to the point cloud scene, specifically: using the object category labels, the object poses in the point cloud, the capture poses on each object and the scores of those capture poses, 25600 point cloud scenes with capture pose labels are obtained.
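The back-projection in step 1.1 is the standard pinhole camera model. A minimal sketch of the conversion from a depth image to a camera-frame point cloud; the intrinsic values passed in at the bottom are hypothetical, for illustration only:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, s):
    """Back-project a depth image into a camera-frame point cloud
    using the pinhole model of step 1.1: z = d/s,
    x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth / s                  # rescale raw depth values
    x = (u - cx) * z / fx          # u-axis back-projection
    y = (v - cy) * z / fy          # v-axis back-projection
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Hypothetical intrinsics: a 640x480 camera, depth stored in millimetres
cloud = depth_to_point_cloud(np.full((480, 640), 1000.0),
                             fx=600.0, fy=600.0, cx=320.0, cy=240.0,
                             s=1000.0)
print(cloud.shape)  # (307200, 3)
```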
8. The method for detecting the object capture pose in the three-dimensional point cloud according to claim 1 or 2, wherein the candidate capture point prediction module comprises a base network unit and a prediction unit, wherein: the base network unit encodes the 20000 × 3 point cloud array, and the prediction unit predicts suitable grabbing point locations and clamping jaw main axis directions in the scene from the encoded point cloud.
9. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 3, wherein the cutting is as follows: the space transformation module cuts out a point cloud block within a cylindrical space according to the candidate grabbing point position and the main axis direction, where the cylinder's main axis lies along the main axis direction of the clamping jaw and the center of its bottom face is the candidate grabbing point;
the conversion is as follows: the clipped point cloud block is transformed into the clamping jaw coordinate system; letting the jaw main axis direction be v = [v1, v2, v3], the transformation matrix of the point cloud coordinates is O = [o1, o2, o3], an orthonormal basis constructed from v; for each candidate, cylinder point clouds corresponding to different jaw depths are sheared out, wherein: the radius of the cylinder bottom face is 0.05 m and the depths are 0.01 m, 0.02 m, 0.03 m and 0.04 m in turn; processing all candidate grabbing poses yields 4M sheared point clouds, each of which is sampled to Ns points, finally giving a 4M × Ns × C1 point cloud array, where Ns = 64.
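The cylindrical clipping and jaw-frame transformation of claim 9 can be sketched as below. Note that the patent's exact formula for the basis O = [o1, o2, o3] is not reproduced in this excerpt, so `to_jaw_frame` uses one arbitrary orthonormal completion of the axis direction; the helper-vector construction is an assumption of this sketch:

```python
import numpy as np

def clip_cylinder(points, center, axis, radius=0.05, depth=0.04):
    """Keep points inside a cylinder whose bottom-face center is the
    candidate grasp point and whose main axis follows the unit jaw axis."""
    rel = points - center
    along = rel @ axis                                   # signed distance along axis
    radial = np.linalg.norm(rel - np.outer(along, axis), axis=1)
    mask = (along >= 0) & (along <= depth) & (radial <= radius)
    return points[mask]

def to_jaw_frame(points, center, axis):
    """Express points in a jaw frame whose first basis vector is the jaw
    axis; o2, o3 are one arbitrary orthonormal completion (the patent's
    exact construction is not given in this excerpt)."""
    o1 = axis / np.linalg.norm(axis)
    helper = np.array([0.0, 0.0, 1.0]) if abs(o1[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    o2 = np.cross(o1, helper); o2 /= np.linalg.norm(o2)
    o3 = np.cross(o1, o2)
    O = np.stack([o1, o2, o3], axis=1)                   # columns are the new basis
    return (points - center) @ O

pts = np.random.default_rng(0).uniform(-0.1, 0.1, size=(2000, 3))
clipped = clip_cylinder(pts, center=np.zeros(3), axis=np.array([1.0, 0.0, 0.0]))
local = to_jaw_frame(clipped, np.zeros(3), np.array([1.0, 0.0, 0.0]))
print(local.shape[1])  # 3
```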
10. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1 or 3, wherein the grabbing parameter prediction module comprises: a multilayer perceptron, a first max pooling layer P1, a first fully-connected layer F1, a first batch normalization layer B1, a first activation function layer R1, a second fully-connected layer F2, a second batch normalization layer B2, a second activation function layer R2 and a third fully-connected layer F3;
the grabbing parameter prediction module takes the sheared and transformed point cloud as input; passing it through the multilayer perceptron with layer sizes (C3, C4, C5) yields a feature point cloud array of size 4M × Ns × C5; the max pooling layer P1 then produces a point cloud global feature array of size 4M × C5, which passes sequentially through the first fully-connected layer F1, the first batch normalization layer B1, the first activation function layer R1, the second fully-connected layer F2, the second batch normalization layer B2, the second activation function layer R2 and the third fully-connected layer F3, giving a pose parameter prediction of size 4M × C8.
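The shape flow through such a prediction head (a per-point shared MLP, max pooling over points, then a fully-connected stack) can be illustrated with random weights. All channel sizes below are placeholders for the unspecified constants C3..C8, batch normalization is omitted for brevity, and the weights are random, so this demonstrates only array shapes, not a trained model:

```python
import numpy as np

# Placeholder sizes; C3..C8 stand in for the claim's unspecified constants.
M, Ns = 10, 64                          # 4M candidate clouds of Ns points each
C3, C4, C5, C8 = 64, 128, 256, 3
rng = np.random.default_rng(0)

def shared_mlp(x, dims):
    """Per-point MLP: the same weights are applied to every point
    (last axis is the channel axis), with ReLU activations."""
    for d in dims:
        w = rng.normal(size=(x.shape[-1], d)) * 0.1
        x = np.maximum(x @ w, 0.0)
    return x

cloud = rng.normal(size=(4 * M, Ns, 3))        # 4M x Ns x 3 sheared clouds
feat = shared_mlp(cloud, [C3, C4, C5])         # 4M x Ns x C5 point features
global_feat = feat.max(axis=1)                 # max pool over points: 4M x C5
head = shared_mlp(global_feat, [128, 64])      # stand-ins for F1..F2 (+BN+ReLU)
w_out = rng.normal(size=(head.shape[-1], C8)) * 0.1
pred = head @ w_out                            # final FC layer F3: 4M x C8
print(pred.shape)  # (40, 3)
```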
11. The method for detecting the object capture pose in the three-dimensional point cloud according to claim 1 or 4, wherein the capture affinity prediction module comprises: a multilayer perceptron, a second max pooling layer P2, a fourth fully-connected layer F4, a fourth batch normalization layer B4, a fourth activation function layer R4, a fifth fully-connected layer F5, a fifth batch normalization layer B5, a fifth activation function layer R5 and a sixth fully-connected layer F6;
the capture affinity prediction module takes the transformed point cloud as input; passing it through the multilayer perceptron with layer sizes (C10, C11, C12) yields a feature point cloud array of size 4M × Ns × C12; the max pooling layer P2 then produces a point cloud global feature array of size 4M × C12, which passes sequentially through the fourth fully-connected layer F4, the fourth batch normalization layer B4, the fourth activation function layer R4, the fifth fully-connected layer F5, the fifth batch normalization layer B5, the fifth activation function layer R5 and the sixth fully-connected layer F6, giving a capture affinity prediction of size 4M × C15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010390619.3A CN111652928B (en) | 2020-05-11 | 2020-05-11 | Object grabbing pose detection method in three-dimensional point cloud |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111652928A true CN111652928A (en) | 2020-09-11 |
CN111652928B CN111652928B (en) | 2023-12-15 |
Family
ID=72349479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010390619.3A Active CN111652928B (en) | 2020-05-11 | 2020-05-11 | Object grabbing pose detection method in three-dimensional point cloud |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111652928B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106737692A (en) * | 2017-02-10 | 2017-05-31 | 杭州迦智科技有限公司 | A kind of mechanical paw Grasp Planning method and control device based on depth projection |
CN108171748A (en) * | 2018-01-23 | 2018-06-15 | 哈工大机器人(合肥)国际创新研究院 | A kind of visual identity of object manipulator intelligent grabbing application and localization method |
CN109102547A (en) * | 2018-07-20 | 2018-12-28 | 上海节卡机器人科技有限公司 | Robot based on object identification deep learning model grabs position and orientation estimation method |
CN110363815A (en) * | 2019-05-05 | 2019-10-22 | 东南大学 | The robot that Case-based Reasoning is divided under a kind of haplopia angle point cloud grabs detection method |
CN110796700A (en) * | 2019-10-21 | 2020-02-14 | 上海大学 | Multi-object grabbing area positioning method based on convolutional neural network |
CN110909644A (en) * | 2019-11-14 | 2020-03-24 | 南京理工大学 | Method and system for adjusting grabbing posture of mechanical arm end effector based on reinforcement learning |
CN110969660A (en) * | 2019-12-17 | 2020-04-07 | 浙江大学 | Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning |
Non-Patent Citations (1)
Title |
---|
He Ruotao: "Three-Dimensional Object Detection and Pose Estimation for Robot Manipulation", Master's Thesis, Guangdong University of Technology (Information Science and Technology series), no. 02, pages 19-47 *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112720459A (en) * | 2020-12-02 | 2021-04-30 | 达闼机器人有限公司 | Target object grabbing method and device, storage medium and electronic equipment |
CN112489117A (en) * | 2020-12-07 | 2021-03-12 | 东南大学 | Robot grabbing pose detection method based on domain migration under single-view-point cloud |
CN112489126B (en) * | 2020-12-10 | 2023-09-19 | 浙江商汤科技开发有限公司 | Vehicle key point information detection method, vehicle control method and device and vehicle |
CN112489126A (en) * | 2020-12-10 | 2021-03-12 | 浙江商汤科技开发有限公司 | Vehicle key point information detection method, vehicle control method and device and vehicle |
WO2022156749A1 (en) * | 2021-01-22 | 2022-07-28 | 熵智科技(深圳)有限公司 | Workpiece grabbing method and apparatus, and computer device and storage medium |
CN112894815A (en) * | 2021-01-25 | 2021-06-04 | 西安工业大学 | Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm |
CN112894815B (en) * | 2021-01-25 | 2022-09-27 | 西安工业大学 | Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm |
CN112801988A (en) * | 2021-02-02 | 2021-05-14 | 上海交通大学 | Object grabbing pose detection method based on RGBD and deep neural network |
CN113345100A (en) * | 2021-05-19 | 2021-09-03 | 上海非夕机器人科技有限公司 | Prediction method, apparatus, device, and medium for target grasp posture of object |
CN113674348A (en) * | 2021-05-28 | 2021-11-19 | 中国科学院自动化研究所 | Object grabbing method, device and system |
CN113674348B (en) * | 2021-05-28 | 2024-03-15 | 中国科学院自动化研究所 | Object grabbing method, device and system |
CN114211490B (en) * | 2021-12-17 | 2024-01-05 | 中山大学 | Method for predicting pose of manipulator gripper based on transducer model |
CN114211490A (en) * | 2021-12-17 | 2022-03-22 | 中山大学 | Robot arm gripper pose prediction method based on Transformer model |
CN115082795A (en) * | 2022-07-04 | 2022-09-20 | 梅卡曼德(北京)机器人科技有限公司 | Virtual image generation method, device, equipment, medium and product |
CN115213721A (en) * | 2022-09-21 | 2022-10-21 | 江苏友邦精工实业有限公司 | A turnover positioning manipulator for automobile frame machining |
CN116494253A (en) * | 2023-06-27 | 2023-07-28 | 北京迁移科技有限公司 | Target object grabbing pose acquisition method and robot grabbing system |
CN116494253B (en) * | 2023-06-27 | 2023-09-19 | 北京迁移科技有限公司 | Target object grabbing pose acquisition method and robot grabbing system |
CN116758122B (en) * | 2023-08-14 | 2023-11-14 | 国网智能电网研究院有限公司 | Power transmission line iron tower pose registration method and device based on cross-source point cloud |
CN116758122A (en) * | 2023-08-14 | 2023-09-15 | 国网智能电网研究院有限公司 | Power transmission line iron tower pose registration method and device based on cross-source point cloud |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111652928B (en) | Object grabbing pose detection method in three-dimensional point cloud | |
CN111007073A (en) | Method and system for online detection of part defects in additive manufacturing process | |
CN110223345B (en) | Point cloud-based distribution line operation object pose estimation method | |
CN112836734A (en) | Heterogeneous data fusion method and device and storage medium | |
CN110795990B (en) | Gesture recognition method for underwater equipment | |
CN111899301A (en) | Workpiece 6D pose estimation method based on deep learning | |
CN111667535B (en) | Six-degree-of-freedom pose estimation method for occlusion scene | |
CN111768388A (en) | Product surface defect detection method and system based on positive sample reference | |
CN108305278B (en) | Image matching correlation improvement method in ORB-SLAM algorithm | |
CN111639571B (en) | Video action recognition method based on contour convolution neural network | |
CN107220601B (en) | Target capture point prediction method based on online confidence degree discrimination | |
CN114266891A (en) | Railway operation environment abnormity identification method based on image and laser data fusion | |
CN114170174A (en) | CLANet steel rail surface defect detection system and method based on RGB-D image | |
CN114581782B (en) | Fine defect detection method based on coarse-to-fine detection strategy | |
CN115115859A (en) | Long linear engineering construction progress intelligent identification and analysis method based on unmanned aerial vehicle aerial photography | |
CN117315025A (en) | Mechanical arm 6D pose grabbing method based on neural network | |
CN116205907A (en) | Decorative plate defect detection method based on machine vision | |
CN113269830B (en) | 6D pose estimation method and device based on geometric constraint cooperative attention network | |
Luo et al. | Grasp detection based on faster region cnn | |
CN114548253A (en) | Digital twin model construction system based on image recognition and dynamic matching | |
CN113752255A (en) | Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning | |
CN113034575A (en) | Model construction method, pose estimation method and object picking device | |
CN113177969B (en) | Point cloud single-target tracking method of candidate seeds based on motion direction change | |
Lin et al. | Robotic grasp detection by rotation region CNN | |
CN115018910A (en) | Method and device for detecting target in point cloud data and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||