CN111652928A - Method for detecting object grabbing pose in three-dimensional point cloud - Google Patents

Method for detecting object grabbing pose in three-dimensional point cloud

Info

Publication number
CN111652928A
CN111652928A (application CN202010390619.3A)
Authority
CN
China
Prior art keywords
grabbing
point cloud
pose
layer
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010390619.3A
Other languages
Chinese (zh)
Other versions
CN111652928B (en)
Inventor
王晨曦 (Wang Chenxi)
方浩树 (Fang Haoshu)
苟铭浩 (Gou Minghao)
卢策吾 (Lu Cewu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202010390619.3A priority Critical patent/CN111652928B/en
Publication of CN111652928A publication Critical patent/CN111652928A/en
Application granted granted Critical
Publication of CN111652928B publication Critical patent/CN111652928B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 19/00 Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J 19/02 Sensing devices
    • B25J 19/04 Viewing devices
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1612 Programme controls characterised by the hand, wrist, grip control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J 9/1697 Vision controlled systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Robotics (AREA)
  • General Physics & Mathematics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method for detecting object grabbing poses in a three-dimensional point cloud: object grabbing poses arranged from sample images serve as a training set for training an end-to-end object grabbing-pose detection model, which then identifies the three-dimensional point cloud data to be detected and outputs candidate grabbing-pose scores, thereby achieving object grabbing-pose detection. Through end-to-end full-scene training and testing, the invention closely couples the global and local feature relations in the point cloud, improving detection accuracy while optimizing running speed.

Description

Method for detecting object grabbing pose in three-dimensional point cloud
Technical Field
The invention relates to a technology in the field of image processing, in particular to a method for detecting an object grabbing pose in three-dimensional point cloud.
Background
Object grabbing is a fundamental problem in robotics, with broad application prospects in manufacturing, construction, services and other industries. The most critical step in grabbing is detecting the grabbing pose (the pose of the gripper in space) for a given visual scene (e.g., an image or a point cloud).
Existing grabbing-pose technology follows two technical routes. The first generates the grabbing pose indirectly by estimating the pose of the object in space; however, the prediction is very sensitive to the accuracy of the object-pose estimate, so prediction accuracy drops sharply. The second estimates the grabbing pose directly in the scene without knowing the object's pose information, and can be realized by reinforcement learning or deep learning. Reinforcement-learning-based methods iteratively correct and evaluate the current pose of the manipulator so that it gradually approaches the object and finally produces a reliable grabbing pose. Deep-learning-based methods obtain a large number of grabbing-pose candidates by grid sampling on the point cloud, encode each grabbing pose into a 2D image, and use a CNN to judge whether grabbing is possible. A further improved technique encodes the grabbing pose into a three-dimensional point cloud; although this raises classification accuracy, the candidate poses are still produced by grid sampling, so the computational cost is huge.
Disclosure of Invention
Aiming at the above shortcomings of the prior art, the invention provides a method for detecting the object grabbing pose in a three-dimensional point cloud. Through end-to-end full-scene training and testing, it closely couples the global and local feature relations in the point cloud, thereby optimizing running speed while improving detection accuracy.
The invention is realized by the following technical scheme:
The invention relates to a method for detecting the object grabbing pose in a three-dimensional point cloud: object grabbing poses arranged from sample images serve as a training set for training an end-to-end object grabbing-pose detection model, which then identifies the three-dimensional point cloud data to be detected and outputs candidate grabbing-pose scores, thereby realizing object grabbing-pose detection.
The arranging is as follows: RGB-D images containing different object combinations and different placements, used for object grabbing-pose detection, are obtained from an existing image library as sample images, and a point-cloud scene with corresponding training labels is synthesized.
The end-to-end object grabbing-pose detection model comprises: a candidate grabbing point prediction module that processes encoded spatial information, a spatial transformation module that generates candidate grabbing-pose features, a grabbing-parameter prediction module and a grabbing-affinity prediction module, wherein: the candidate grabbing point prediction module contains a PointNet++ model that processes the point-cloud scene and predicts candidate grabbing point locations and principal-axis directions; the spatial transformation module crops the point cloud near each candidate grabbing pose and converts it into the jaw coordinate system; the grabbing-parameter prediction module predicts the in-plane (pivoting) rotation angle, the gripping width and the grabbing score; and the grabbing-affinity prediction module judges the robustness of the grabbing pose.
The object grabbing pose detection is preferably further realized by performing threshold judgment on candidate grabbing pose scores.
Technical effects
The invention solves the overall problem that three-dimensional point-cloud scene information cannot be effectively utilized in object grabbing-pose detection.
Compared with the prior art, the method encodes the grabbing pose in the form of a three-dimensional point cloud, which carries far more information than a two-dimensional encoding; it provides an end-to-end full-scene training and testing process that avoids undifferentiated grid sampling of candidate poses and closely couples the global and local feature relations in the point cloud, improving detection accuracy while optimizing running speed; and it achieves the best performance on the large general-purpose grabbing dataset GraspNet.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of a jaw coordinate system;
FIG. 3 is a schematic diagram illustrating the effect of the present invention.
Detailed Description
As shown in fig. 1, the present embodiment relates to a method for detecting an object capture pose in a three-dimensional point cloud, which includes the following steps:
Step 1: data preprocessing: RGB-D images containing different object combinations and different placements, used for object grabbing-pose detection, are obtained from an existing image library, and a point-cloud scene with corresponding training labels is synthesized, specifically as follows:
Step 1.1: synthesize a point cloud using the camera intrinsic parameters, wherein: the camera intrinsics are the scale factors fx and fy of the camera along the u and v axes of the image coordinate system, the principal-point coordinates (cx, cy) of the image coordinate system, and the image depth-value scaling s. Denoting the coordinates of a point in the image coordinate system by (u, v), its depth value by d, and its three-dimensional coordinates in the camera coordinate system by (x, y, z), the RGB-D image is converted into a point-cloud scene in the camera coordinate system by the pinhole back-projection
z = d / s, x = (u - cx) · z / fx, y = (v - cy) · z / fy.
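For illustration, the back-projection of step 1.1 can be sketched as follows in NumPy; the function name and the assumption that invalid pixels carry zero depth are illustrative, not part of the filing.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, s):
    """Back-project a depth image (H x W raw depth values) into an N x 3
    point cloud in the camera coordinate system, assuming the standard
    pinhole model: z = d / s, x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float32) / s                 # depth scaling
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # drop zero-depth (invalid) pixels
```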
Step 1.2: adding a plurality of training labels for the point cloud scene, specifically comprising: and obtaining 25600 point cloud scenes with capture pose labels by using the object type labels, the object poses in the point cloud, the capture poses on each object and the scores of the capture poses.
As shown in fig. 2, a grabbing pose consists of a translation, a principal-axis direction, an in-plane (pivoting) rotation angle and a gripping width; the principal axis is along the z axis of the jaw coordinate system, the pivoting angle is the in-plane rotation about the principal axis, and the gripping width is the distance between the two jaw fingers along the y axis.
Step 2: constructing an end-to-end object grabbing pose detection model, wherein the model comprises the following steps: the end-to-end object grabbing pose detection model comprises the following steps: the device comprises a candidate grabbing point prediction module for processing coded spatial information, a spatial transformation module for generating candidate grabbing pose characteristics, a grabbing parameter prediction module and a grabbing affinity prediction module, wherein: the candidate grabbing point prediction module predicts candidate grabbing point positions and main shaft directions, the space transformation module cuts point clouds near candidate grabbing poses and converts the point clouds into a clamping jaw coordinate system, the grabbing parameter prediction module predicts an axial rotation angle, a clamping width and grabbing scores of grabbing, and the grabbing affinity prediction module judges robustness of the grabbing poses.
The candidate grabbing point prediction module comprises: a base network unit and a prediction unit, wherein: the base network unit encodes the 20000 multiplied by 3 point cloud, and the prediction unit predicts the appropriate grabbing point location and clamping jaw main shaft direction in the scene by using the encoded point cloud.
The base network unit adopts a mode recorded in a document 'PointNet + +: deep hierarchical Feature Learning on Point set in a Metric Space' (NIPS 2017) by Charles R.Qi and the like to realize a PointNet + + three-dimensional deep Learning network, and the input of the network is a network which comprises a plurality of objects and has the size of N × C1Scene point cloud (N represents the number of points, C)1Number of characteristic channels representing points), and obtaining the size of N × C after a plurality of times of downsampling, 1 × 1 convolution, maximum neighborhood pooling and characteristic jump splicing2And further processed to size M × C using the farthest point sampling2The point set of (2). Wherein:
N=20000,M=1024,C1=3,C2=256。
the prediction unit takes the point set output by the base network unit as input and has the size of (C)2V +2, V +2), the first 2 dimensions of each sampling point are two categories whether the point can be grabbed, the last V dimensions represent the prediction of the grabbing principal axis direction, the process is to uniformly sample V viewing angles from a unit circle with the point as the center, and select the viewing angle with the highest score as the candidate principal axis direction. The model was taken as V300.
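The description only fixes V = 300 approximately uniformly distributed view directions on a unit sphere; the Fibonacci-spiral construction sketched below is one common way to obtain such a set and is an assumption, as are the function and variable names.

```python
import numpy as np

def sample_view_directions(num_views=300):
    """Approximately uniform directions on the unit sphere via a Fibonacci
    spiral; the exact sampling scheme is not specified in the description."""
    idx = np.arange(num_views, dtype=np.float32)
    phi = np.arccos(1.0 - 2.0 * (idx + 0.5) / num_views)   # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * idx                # golden-angle azimuth
    views = np.stack([np.sin(phi) * np.cos(theta),
                      np.sin(phi) * np.sin(theta),
                      np.cos(phi)], axis=1)
    return views  # (V, 3); the candidate principal axis is the top-scoring view

# The prediction head then outputs (2 + V) channels per sampled point:
# 2 graspability logits followed by V per-view scores.
```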
The cutting is as follows: and the space transformation module cuts out a point cloud block in a cylindrical space according to the candidate grabbing point position and the main shaft direction, the cylindrical main shaft is along the main shaft direction of the clamping jaw, and the center of the bottom surface circle is the candidate grabbing point.
The rotary shaftThe method comprises the following steps: converting the point cloud block after cutting into a clamping jaw coordinate system, and enabling v to be [ v ]1,v2,v3]Then the transformation matrix of the point cloud coordinates is represented as o ═ o1,o2,o3]Wherein:
Figure BDA0002485602430000031
each candidate shears a cylinder point cloud corresponding to a different jaw depth, wherein: the radius of the bottom surface of the cylinder is 0.05m, and the depth is 0.01m, 0.02m, 0.03m and 0.04m in sequence. Processing all candidate grabbing poses to obtain 4M pieces of shearing point clouds, and sampling each piece of point cloud to NsPoint, finally 4M × N is obtaineds×C1The point cloud array of (1), wherein: n is a radical ofs=64。
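A minimal sketch of the cylinder cropping and jaw-frame conversion follows, assuming the rotation matrix may be any orthonormal basis whose z axis follows the jaw principal axis (the filing gives the matrix only as a formula image); all function and variable names are illustrative.

```python
import numpy as np

def crop_and_transform(cloud, grasp_point, axis, radius=0.05, depth=0.04, num_sample=64):
    """Crop the points inside a cylinder whose axis follows the jaw principal
    axis and whose base circle is centred at the grasp point, then express
    them in the jaw coordinate system and resample to a fixed size."""
    axis = axis / np.linalg.norm(axis)
    rel = cloud - grasp_point
    along = rel @ axis                                    # signed distance along the axis
    radial = np.linalg.norm(rel - np.outer(along, axis), axis=1)
    mask = (along >= 0) & (along <= depth) & (radial <= radius)
    block = rel[mask]

    # Orthonormal basis whose third column is the principal axis (one valid choice).
    helper = np.array([0.0, 0.0, 1.0]) if abs(axis[2]) < 0.99 else np.array([1.0, 0.0, 0.0])
    x_axis = np.cross(helper, axis); x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(axis, x_axis)
    rot = np.stack([x_axis, y_axis, axis], axis=1)        # columns o1, o2, o3
    block = block @ rot                                   # coordinates in the jaw frame

    if len(block) == 0:
        return np.zeros((num_sample, 3), dtype=np.float32)
    idx = np.random.choice(len(block), num_sample, replace=len(block) < num_sample)
    return block[idx].astype(np.float32)
```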
The grabbing-parameter prediction module comprises: a multilayer perceptron, a max-pooling layer, a first fully connected layer F1, a first batch-normalization layer B1, a first activation layer R1, a second fully connected layer F2, a second batch-normalization layer B2, a second activation layer R2 and a third fully connected layer F3.
The grabbing-parameter prediction module takes the cropped and converted point cloud as input. The multilayer perceptron, with channel sizes (C3, C4, C5), produces a feature point-cloud array of size 4M × Ns × C5; the max-pooling layer P1 reduces it to a point-cloud global feature array of size 4M × C5; this then passes in turn through the first fully connected layer F1, the first batch-normalization layer B1, the first activation layer R1, the second fully connected layer F2, the second batch-normalization layer B2, the second activation layer R2 and the third fully connected layer F3, giving pose-parameter predictions of size 4M × C8.
The multilayer perceptron is three groups of 1 × 1 convolution, batch-normalization and activation layers; the activation layer of each group uses the ReLU function. The numbers of convolution output channels and batch-normalization units are C3 = 64 in the first group, C4 = 128 in the second group and C5 = 256 in the third group.
The first fully connected layer F1 has C6 = 128 output channels, the second fully connected layer F2 has C7 = 128, and the third fully connected layer F3 has C8 = 36.
The first and second activation layers R1 and R2 use the ReLU function; the first and second batch-normalization layers B1 and B2 each have 128 units.
The grabbing-parameter prediction module outputs predicted values of the grabbing parameters, specifically: the in-plane (pivoting) rotation angle range (0° to 180°) is divided into 12 classes at 15° intervals; the 36 output channels represent, for each angle, its score, the corresponding gripping width and the predicted grabbing-pose score at that angle; the angle with the highest score, together with its gripping width and grabbing score, forms the final prediction.
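A PyTorch sketch of this head under the stated channel sizes (C3 to C8) follows; layer ordering matches the description, while anything the text does not specify (e.g., the absence of dropout, the exact input feature channels) is an assumption.

```python
import torch
import torch.nn as nn

class GraspParameterHead(nn.Module):
    """Grasp-parameter prediction head: a shared per-point MLP (64, 128, 256)
    realised as 1x1 convolutions, max pooling over the Ns points of each
    cropped cylinder, then fully connected layers of 128, 128 and 36 channels."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(in_channels, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Linear(128, 128), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Linear(128, 36),            # 12 angle bins x (angle score, width, grasp score)
        )

    def forward(self, x):                  # x: (B, C1, Ns) cropped, converted points
        feat = self.mlp(x)                 # (B, 256, Ns) per-point features
        feat = torch.max(feat, dim=2).values   # max pool over points -> (B, 256)
        return self.fc(feat)               # (B, 36) pose-parameter predictions
```

The grabbing-affinity prediction module described next has the same structure, with its last three fully connected layers sized 128, 64 and 12.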
The grabbing-affinity prediction module comprises: a multilayer perceptron, a second max-pooling layer, a fourth fully connected layer F4, a fourth batch-normalization layer B4, a fourth activation layer R4, a fifth fully connected layer F5, a fifth batch-normalization layer B5, a fifth activation layer R5 and a sixth fully connected layer F6.
The grabbing-affinity prediction module takes the converted point cloud as input. The multilayer perceptron, with channel sizes (C10, C11, C12), produces a feature point-cloud array of size 4M × Ns × C12; the max-pooling layer P2 reduces it to a point-cloud global feature array of size 4M × C12; this then passes in turn through the fourth fully connected layer F4, the fourth batch-normalization layer B4, the fourth activation layer R4, the fifth fully connected layer F5, the fifth batch-normalization layer B5, the fifth activation layer R5 and the sixth fully connected layer F6, giving grabbing-affinity predictions of size 4M × C15.
The multilayer perceptron is three groups of 1 × 1 convolution, batch-normalization and activation layers; the activation layer of each group uses the ReLU function. The numbers of convolution output channels and batch-normalization units are C10 in the first group, C11 = 128 in the second group and C12 = 256 in the third group.
The fourth fully connected layer F4 has C13 = 128 output channels, the fifth fully connected layer F5 has C14 = 64, and the sixth fully connected layer F6 has C15 = 12.
The fourth and fifth activation layers R4 and R5 use the ReLU function; the fourth and fifth batch-normalization layers B4 and B5 each have 128 units.
The output of the grabbing-affinity prediction module is the grabbing affinity at each of the 12 angles, and the affinity at the predicted angle is taken as the final prediction, wherein: the grabbing affinity represents the maximum perturbation of the parameters under which the jaw can still grab the object at the current grabbing pose.
And step 3: training an end-to-end object grabbing pose detection model, and specifically comprises the following steps:
step 3.1: and initializing the parameters to be trained in the model by using Gaussian distribution sampling with the average value of 0 and the standard deviation of 0.01.
Step 3.2: inputting 25600 point cloud scenes with object grabbing pose labels obtained in the step 1 into the model as training samples for training, transforming the training samples in two stages in the step 2.1 and the step 2.2, and transmitting the transformed training samples to an output layer to obtain a sampling point graspable confidence coefficient predicted value { ciThe predicted values of the grabbing fractions(s) corresponding to the directions of the main shafts of the clamping jaws at different visual angles are (i is the serial number of a sampling point)ijH (i is sampling point serial number, j is view angle serial number), and a predicted value of the fraction of the rotation angle around the axis R is RijI is a point cloud block number, j is a depth number), and a clamping width predicted value WijCapturing pose score predicted values (S) by using a point cloud block serial number i and a depth serial number jijAnd (i is a point cloud block serial number, j is a depth serial number), capturing the affinity predicted value (T)ijAnd (i is the point cloud block serial number and j is the depth serial number).
The training sample comprises: scene point cloud P, sampling point snatchable confidence label
Figure BDA00024856024300000518
(i is the number of sampling points), and grabbing fraction labels corresponding to the directions of the main shafts of the clamping jaws at different visual angles
Figure BDA00024856024300000516
(i is sampling point serial number, j is visual angle serial number), and the label is divided according to the rotating angle around the shaft
Figure BDA00024856024300000513
(i is point cloud block serial number, j is visual angle serial number), and clamping width label
Figure BDA00024856024300000515
(i is point cloud block serial number, j is visual angle serial number), and grabbing pose score labels
Figure BDA00024856024300000514
(i is point cloud block serial number, j is visual angle serial number), and grabbing affinity labels
Figure BDA00024856024300000517
(i is the point cloud block number, j is the view angle number).
Step 3.3: adjusting model parameters by using a multi-target joint loss function in combination with a Back Propagation (BP) algorithm, wherein the multi-target joint loss function comprises: candidate grab point loss function LAGrabbing pose parameter loss function LRAnd grasping the affinity loss function LF
The candidate grabbing point loss function is
Figure BDA0002485602430000051
Wherein:
Figure BDA0002485602430000052
Figure BDA0002485602430000053
if and only if there is a label with an angle less than 5 deg. to the predicted principal axis direction is 1,
Figure BDA0002485602430000054
softmax loss function representing the confidence that a sample point can grab,
Figure BDA0002485602430000055
smooth L representing the principal axis direction and label on the graspable point1Regression loss function, λ1=0.5。
The grabbing pose parameter loss function is
Figure BDA0002485602430000056
Figure BDA0002485602430000057
Wherein:
Figure BDA0002485602430000058
sigmoid cross entropy loss function representing angle classification when the grab depth is d,
Figure BDA0002485602430000059
smooth L representing grabbing pose score when grabbing depth is d1The function of the regression loss is used,
Figure BDA00024856024300000510
smooth L representing grip width at grip depth d1Regression loss function, λ2=1.0,λ3=0.2。
The grabbing affinity loss function is
Figure BDA00024856024300000511
Wherein:
Figure BDA00024856024300000512
smooth L representing grab affinity1A regression loss function.
The target function of the back propagation BP algorithm is L ═ LA({ci},{si})+αLR({Rij},{Sij},{Wij})+βLF({TijH) α ═ 0.5 and β ═ 0.1.
In this embodiment, the learning rate of the back propagation BP algorithm is 0.001 at the initial value, the whole training data set is iterated 90 times, and the learning rates sequentially become 0.0001, 0.00001, and 0.000001 after 40, 60, and 80 iterations.
Step 4: perform object grabbing-pose detection with the trained end-to-end object grabbing-pose detection model: 7680 RGB-D images are used; for each image to be detected, a scene point cloud is synthesized by the method of step 1.1 and input into the model, and the predicted object grabbing poses are obtained through layer-by-layer computation.
The model evaluation criterion used in this embodiment is: the predicted grabbing poses are screened by non-maximum suppression and assigned to the closest object; given a surface friction coefficient μ, the complete object scan model is used to check whether the grab would succeed, and the corresponding average precision APμ is computed; μ is increased stepwise from 0.2 to 1.0 in intervals of 0.2, the corresponding APμ values are computed, and their mean is taken as the final AP score.
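A sketch of the final-score computation follows, covering only the averaging step described above; how each APμ itself is obtained from the grasp-success check against the full object scan is left abstract, and the numbers in the usage comment are illustrative only.

```python
import numpy as np

def final_ap(ap_at_mu):
    """Average per-friction-coefficient AP values into the final score.
    `ap_at_mu` maps a friction coefficient mu to the AP measured with that mu."""
    mus = [0.2, 0.4, 0.6, 0.8, 1.0]   # mu increased from 0.2 to 1.0 in steps of 0.2
    return float(np.mean([ap_at_mu[mu] for mu in mus]))

# Example (illustrative numbers only):
# final_ap({0.2: 0.10, 0.4: 0.20, 0.6: 0.28, 0.8: 0.33, 1.0: 0.36}) -> 0.254
```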
By the model evaluation criterion described above, the method achieves the best results on the large-scale general-purpose grabbing dataset GraspNet. In a practical experiment the system was run with the PyTorch computing framework on a single NVIDIA RTX 2080 GPU and tested on the GraspNet data, with the following results: the best performance is achieved on the test data of all three difficulty levels, the AP reaching 27.56/29.88 on the Seen difficulty (results on data collected with RealSense/Kinect cameras respectively, likewise below), 26.11/27.84 on the Unseen difficulty, and 10.55/11.51 on the Novel difficulty.
Compared with the prior art, the method does not rely on object pose information, which speeds up prediction; through steps such as jaw principal-axis direction prediction and grabbing-affinity prediction it greatly improves the accuracy of grabbing-pose detection, achieving the best performance on the large-scale general-purpose grabbing dataset GraspNet.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (11)

1. A method for detecting the object grabbing pose in a three-dimensional point cloud, characterized in that an end-to-end object grabbing-pose detection model is trained on a training set obtained by arranging object grabbing poses in sample images, and the three-dimensional point cloud data to be detected is then identified to obtain candidate grabbing-pose scores, thereby realizing object grabbing-pose detection;
the end-to-end object grabbing-pose detection model comprises: a candidate grabbing point prediction module for processing encoded spatial information, a spatial transformation module for generating candidate grabbing-pose features, a grabbing-parameter prediction module and a grabbing-affinity prediction module.
2. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1, wherein when the candidate grabbing point prediction module predicts the main shaft direction of the clamping jaw, a unit sphere is generated by taking a grabbing point as a sphere center, the sphere is discretized into 300 uniformly distributed viewpoints, and the direction of the viewpoint pointing to the sphere center represents the main shaft direction of the clamping jaw, so that the prediction of the main shaft direction is converted into the classification problem of the viewpoint; the candidate grabbing point prediction module is internally provided with a PointNet + + model for processing a point cloud scene and carries out candidate grabbing point location and main shaft direction prediction.
3. The method for detecting the object grabbing pose in the three-dimensional point cloud of claim 1, wherein the space transformation module is used for clipping the point cloud near the candidate grabbing pose and converting the point cloud into a clamping jaw coordinate system, and the grabbing parameter prediction module is used for predicting the grabbing pivoting angle, the clamping width and the grabbing score.
4. The method for detecting the object grabbing pose in the three-dimensional point cloud of claim 1, wherein the grabbing affinity prediction module outputs the affinity of candidate grabbing poses, namely the maximum disturbance range when the object can be grabbed after the predicted pose is disturbed, the grabbing affinity represents the robustness of the grabbing pose, and the stronger the grabbing affinity, the stronger the robustness of the predicted grabbing pose.
5. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1, characterized in that the arranging is as follows: RGB-D images for object grabbing-pose detection, containing different object combinations and different placements, are obtained from an existing image library as sample images, and a point-cloud scene with corresponding training labels is synthesized.
6. The method for detecting the object capture pose in the three-dimensional point cloud of claim 1, wherein the object capture pose detection is further realized by performing threshold judgment on candidate capture pose scores.
7. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 5, characterized in that the arranging comprises:
step 1.1: synthesizing a point cloud using the camera intrinsic parameters, wherein: the camera intrinsics are the scale factors fx and fy of the camera along the u and v axes of the image coordinate system, the principal-point coordinates (cx, cy) of the image coordinate system, and the image depth-value scaling s; denoting the coordinates of a point in the image coordinate system by (u, v), its depth value by d, and its three-dimensional coordinates in the camera coordinate system by (x, y, z), the RGB-D image is converted into a point-cloud scene in the camera coordinate system by the pinhole back-projection z = d / s, x = (u - cx) · z / fx, y = (v - cy) · z / fy;
step 1.2: adding training labels to the point-cloud scene, specifically: using the object category labels, the object poses in the point cloud, the grabbing poses on each object and the scores of those grabbing poses, 25600 point-cloud scenes with grabbing-pose labels are obtained.
8. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1 or 2, characterized in that the candidate grabbing point prediction module comprises a base network unit and a prediction unit, wherein: the base network unit encodes the 20000 × 3 point cloud, and the prediction unit uses the encoded point cloud to predict suitable grabbing point locations and jaw principal-axis directions in the scene.
9. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 3, characterized in that the cropping is as follows: the spatial transformation module crops a point-cloud block inside a cylindrical space according to the candidate grabbing point location and principal-axis direction; the cylinder axis is along the jaw principal-axis direction and the centre of its base circle is the candidate grabbing point;
the conversion is as follows: the cropped point-cloud block is converted into the jaw coordinate system; let the principal-axis direction be v = [v1, v2, v3], then the rotation matrix of the point-cloud coordinates is written as O = [o1, o2, o3], an orthonormal matrix constructed from v so that the jaw principal axis becomes the z axis of the jaw coordinate system;
for each candidate, a cylindrical point cloud is cropped at each of several jaw depths, wherein: the cylinder base radius is 0.05 m and the depths are 0.01 m, 0.02 m, 0.03 m and 0.04 m in turn; processing all candidate grabbing poses yields 4M cropped point clouds, each point cloud is sampled to Ns points, finally giving a point-cloud array of size 4M × Ns × C1, where: Ns = 64.
10. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1 or 3, characterized in that the grabbing-parameter prediction module comprises: a multilayer perceptron, a max-pooling layer, a first fully connected layer F1, a first batch-normalization layer B1, a first activation layer R1, a second fully connected layer F2, a second batch-normalization layer B2, a second activation layer R2 and a third fully connected layer F3;
the grabbing-parameter prediction module takes the cropped and converted point cloud as input; the multilayer perceptron, with channel sizes (C3, C4, C5), produces a feature point-cloud array of size 4M × Ns × C5; the max-pooling layer P1 reduces it to a point-cloud global feature array of size 4M × C5, which then passes in turn through the first fully connected layer F1, the first batch-normalization layer B1, the first activation layer R1, the second fully connected layer F2, the second batch-normalization layer B2, the second activation layer R2 and the third fully connected layer F3, giving pose-parameter predictions of size 4M × C8.
11. The method for detecting the object grabbing pose in the three-dimensional point cloud according to claim 1 or 4, characterized in that the grabbing-affinity prediction module comprises: a multilayer perceptron, a second max-pooling layer, a fourth fully connected layer F4, a fourth batch-normalization layer B4, a fourth activation layer R4, a fifth fully connected layer F5, a fifth batch-normalization layer B5, a fifth activation layer R5 and a sixth fully connected layer F6;
the grabbing-affinity prediction module takes the converted point cloud as input; the multilayer perceptron, with channel sizes (C10, C11, C12), produces a feature point-cloud array of size 4M × Ns × C12; the max-pooling layer P2 reduces it to a point-cloud global feature array of size 4M × C12, which then passes in turn through the fourth fully connected layer F4, the fourth batch-normalization layer B4, the fourth activation layer R4, the fifth fully connected layer F5, the fifth batch-normalization layer B5, the fifth activation layer R5 and the sixth fully connected layer F6, giving grabbing-affinity predictions of size 4M × C15.
CN202010390619.3A 2020-05-11 2020-05-11 Object grabbing pose detection method in three-dimensional point cloud Active CN111652928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010390619.3A CN111652928B (en) 2020-05-11 2020-05-11 Object grabbing pose detection method in three-dimensional point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010390619.3A CN111652928B (en) 2020-05-11 2020-05-11 Object grabbing pose detection method in three-dimensional point cloud

Publications (2)

Publication Number Publication Date
CN111652928A true CN111652928A (en) 2020-09-11
CN111652928B CN111652928B (en) 2023-12-15

Family

ID=72349479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010390619.3A Active CN111652928B (en) 2020-05-11 2020-05-11 Object grabbing pose detection method in three-dimensional point cloud

Country Status (1)

Country Link
CN (1) CN111652928B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489126A (en) * 2020-12-10 2021-03-12 浙江商汤科技开发有限公司 Vehicle key point information detection method, vehicle control method and device and vehicle
CN112489117A (en) * 2020-12-07 2021-03-12 东南大学 Robot grabbing pose detection method based on domain migration under single-view-point cloud
CN112720459A (en) * 2020-12-02 2021-04-30 达闼机器人有限公司 Target object grabbing method and device, storage medium and electronic equipment
CN112801988A (en) * 2021-02-02 2021-05-14 上海交通大学 Object grabbing pose detection method based on RGBD and deep neural network
CN112894815A (en) * 2021-01-25 2021-06-04 西安工业大学 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm
CN113345100A (en) * 2021-05-19 2021-09-03 上海非夕机器人科技有限公司 Prediction method, apparatus, device, and medium for target grasp posture of object
CN113674348A (en) * 2021-05-28 2021-11-19 中国科学院自动化研究所 Object grabbing method, device and system
CN114211490A (en) * 2021-12-17 2022-03-22 中山大学 Robot arm gripper pose prediction method based on Transformer model
WO2022156749A1 (en) * 2021-01-22 2022-07-28 熵智科技(深圳)有限公司 Workpiece grabbing method and apparatus, and computer device and storage medium
CN115082795A (en) * 2022-07-04 2022-09-20 梅卡曼德(北京)机器人科技有限公司 Virtual image generation method, device, equipment, medium and product
CN115213721A (en) * 2022-09-21 2022-10-21 江苏友邦精工实业有限公司 A turnover positioning manipulator for automobile frame machining
CN116494253A (en) * 2023-06-27 2023-07-28 北京迁移科技有限公司 Target object grabbing pose acquisition method and robot grabbing system
CN116758122A (en) * 2023-08-14 2023-09-15 国网智能电网研究院有限公司 Power transmission line iron tower pose registration method and device based on cross-source point cloud


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106737692A (en) * 2017-02-10 2017-05-31 杭州迦智科技有限公司 A kind of mechanical paw Grasp Planning method and control device based on depth projection
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN109102547A (en) * 2018-07-20 2018-12-28 上海节卡机器人科技有限公司 Robot based on object identification deep learning model grabs position and orientation estimation method
CN110363815A (en) * 2019-05-05 2019-10-22 东南大学 The robot that Case-based Reasoning is divided under a kind of haplopia angle point cloud grabs detection method
CN110796700A (en) * 2019-10-21 2020-02-14 上海大学 Multi-object grabbing area positioning method based on convolutional neural network
CN110909644A (en) * 2019-11-14 2020-03-24 南京理工大学 Method and system for adjusting grabbing posture of mechanical arm end effector based on reinforcement learning
CN110969660A (en) * 2019-12-17 2020-04-07 浙江大学 Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
何若涛 (He Ruotao): "3D object detection and pose estimation for robot manipulation" (面向机器人操作的三维物体检测与位姿估计), Master's thesis, Guangdong University of Technology, Information Science and Technology series, no. 02, pages 19-47 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112720459A (en) * 2020-12-02 2021-04-30 达闼机器人有限公司 Target object grabbing method and device, storage medium and electronic equipment
CN112489117A (en) * 2020-12-07 2021-03-12 东南大学 Robot grabbing pose detection method based on domain migration under single-view-point cloud
CN112489126B (en) * 2020-12-10 2023-09-19 浙江商汤科技开发有限公司 Vehicle key point information detection method, vehicle control method and device and vehicle
CN112489126A (en) * 2020-12-10 2021-03-12 浙江商汤科技开发有限公司 Vehicle key point information detection method, vehicle control method and device and vehicle
WO2022156749A1 (en) * 2021-01-22 2022-07-28 熵智科技(深圳)有限公司 Workpiece grabbing method and apparatus, and computer device and storage medium
CN112894815A (en) * 2021-01-25 2021-06-04 西安工业大学 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm
CN112894815B (en) * 2021-01-25 2022-09-27 西安工业大学 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm
CN112801988A (en) * 2021-02-02 2021-05-14 上海交通大学 Object grabbing pose detection method based on RGBD and deep neural network
CN113345100A (en) * 2021-05-19 2021-09-03 上海非夕机器人科技有限公司 Prediction method, apparatus, device, and medium for target grasp posture of object
CN113674348A (en) * 2021-05-28 2021-11-19 中国科学院自动化研究所 Object grabbing method, device and system
CN113674348B (en) * 2021-05-28 2024-03-15 中国科学院自动化研究所 Object grabbing method, device and system
CN114211490B (en) * 2021-12-17 2024-01-05 中山大学 Method for predicting pose of manipulator gripper based on transducer model
CN114211490A (en) * 2021-12-17 2022-03-22 中山大学 Robot arm gripper pose prediction method based on Transformer model
CN115082795A (en) * 2022-07-04 2022-09-20 梅卡曼德(北京)机器人科技有限公司 Virtual image generation method, device, equipment, medium and product
CN115213721A (en) * 2022-09-21 2022-10-21 江苏友邦精工实业有限公司 A turnover positioning manipulator for automobile frame machining
CN116494253A (en) * 2023-06-27 2023-07-28 北京迁移科技有限公司 Target object grabbing pose acquisition method and robot grabbing system
CN116494253B (en) * 2023-06-27 2023-09-19 北京迁移科技有限公司 Target object grabbing pose acquisition method and robot grabbing system
CN116758122B (en) * 2023-08-14 2023-11-14 国网智能电网研究院有限公司 Power transmission line iron tower pose registration method and device based on cross-source point cloud
CN116758122A (en) * 2023-08-14 2023-09-15 国网智能电网研究院有限公司 Power transmission line iron tower pose registration method and device based on cross-source point cloud

Also Published As

Publication number Publication date
CN111652928B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN111652928B (en) Object grabbing pose detection method in three-dimensional point cloud
CN111007073A (en) Method and system for online detection of part defects in additive manufacturing process
CN110223345B (en) Point cloud-based distribution line operation object pose estimation method
CN112836734A (en) Heterogeneous data fusion method and device and storage medium
CN110795990B (en) Gesture recognition method for underwater equipment
CN111899301A (en) Workpiece 6D pose estimation method based on deep learning
CN111667535B (en) Six-degree-of-freedom pose estimation method for occlusion scene
CN111768388A (en) Product surface defect detection method and system based on positive sample reference
CN108305278B (en) Image matching correlation improvement method in ORB-SLAM algorithm
CN111639571B (en) Video action recognition method based on contour convolution neural network
CN107220601B (en) Target capture point prediction method based on online confidence degree discrimination
CN114266891A (en) Railway operation environment abnormity identification method based on image and laser data fusion
CN114170174A (en) CLANet steel rail surface defect detection system and method based on RGB-D image
CN114581782B (en) Fine defect detection method based on coarse-to-fine detection strategy
CN115115859A (en) Long linear engineering construction progress intelligent identification and analysis method based on unmanned aerial vehicle aerial photography
CN117315025A (en) Mechanical arm 6D pose grabbing method based on neural network
CN116205907A (en) Decorative plate defect detection method based on machine vision
CN113269830B (en) 6D pose estimation method and device based on geometric constraint cooperative attention network
Luo et al. Grasp detection based on faster region cnn
CN114548253A (en) Digital twin model construction system based on image recognition and dynamic matching
CN113752255A (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
CN113034575A (en) Model construction method, pose estimation method and object picking device
CN113177969B (en) Point cloud single-target tracking method of candidate seeds based on motion direction change
Lin et al. Robotic grasp detection by rotation region CNN
CN115018910A (en) Method and device for detecting target in point cloud data and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant