CN116342698A - Industrial part 6D pose estimation method based on incomplete geometric completion - Google Patents

Industrial part 6D pose estimation method based on incomplete geometric completion

Info

Publication number
CN116342698A
Authority
CN
China
Prior art keywords
rgb
point cloud
feature
original
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310064596.0A
Other languages
Chinese (zh)
Inventor
刘达新
王祺德
刘振宇
许嘉通
谭建荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202310064596.0A
Publication of CN116342698A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection
    • G06T2207/30164 Workpiece; Machine component
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an industrial part 6D pose estimation method based on incomplete geometric completion. An original RGB-D image containing a target industrial part is acquired, comprising an RGB image and an original point cloud converted from the depth map, and the target is segmented and localized. A point cloud completion network recovers the complete geometric structure from the original target point cloud containing an incomplete geometric structure, yielding a completed target point cloud. Features are extracted from the ROI cropped image, the original target point cloud and the completed target point cloud, respectively, and fused by multi-scale concatenation to obtain enhanced RGB-D features. The 6D pose of the target industrial part is then regressed via confidence scoring. Aiming at the geometric incompleteness of depth information caused by the various complicating factors that industrial parts face in real scenes, the invention achieves accurate and robust 6D pose estimation by completing the incomplete geometry and obtaining enhanced RGB-D features, and has good engineering practical value.

Description

Industrial part 6D pose estimation method based on incomplete geometric completion
Technical Field
The invention relates to a pose estimation method in the technical fields of computer vision and robotics, in particular to a 6D pose estimation method for industrial parts based on incomplete geometric completion.
Background
Estimating the 6-degree-of-freedom (6D) pose of a target industrial part in three-dimensional space with a vision sensor, comprising a three-degree-of-freedom translation vector and a three-degree-of-freedom rotation matrix, is a fundamental computer vision technique. It is widely used in downstream robot operation tasks in actual industrial application scenes, such as robot assembly, obstacle avoidance planning and robot digital twins.
Traditional methods rely on various hand-crafted image features and solve the pose from feature correspondences or voting, with very limited accuracy, robustness and flexibility. With the rapid development of deep learning and RGB-D cameras, data-driven RGB-D based methods have gradually achieved better pose estimation performance and are attracting attention from industry. Because common industrial part targets often lack effective surface texture, such methods perform better than RGB-only methods, since the depth map in the RGB-D image provides complementary geometric information.
However, in practical applications the three-dimensional data obtained from depth sensors is often incomplete and noisy, and occlusion may further exacerbate this problem, especially for the reflective metal surfaces of industrial parts. In this case, some regions visible in the image may be missing from the corresponding depth map, and the incomplete three-dimensional data cannot provide effective geometric information to match the two-dimensional image for accurate prediction, degrading the performance of industrial part 6D pose estimation.
Disclosure of Invention
To solve the above problems, the invention provides an industrial part 6D pose estimation method based on incomplete geometric completion, which uses deep learning to estimate the 6D pose of industrial parts from RGB-D images under incomplete geometric information and effectively improves the accuracy and robustness of pose estimation.
The specific technical scheme of the invention is as follows:
s1: shooting an original RGB-D image containing a target industrial part by using an RGB-D camera, wherein the original RGB-D image comprises an RGB image and a depth map, and converting the depth map into a corresponding original point cloud;
s2: dividing and positioning a target industrial part, dividing an instance Mask and an ROI clipping image of the target industrial part from an RGB image by adopting a Mask R-CNN method, and dividing an original target point cloud corresponding to the instance Mask of the target industrial part from the original point cloud according to the alignment relation between pixels of the RGB image and pixels of a depth image;
s3: recovering the complete geometric structure of the target industrial part from the original target point cloud containing the incomplete geometric structure through a point cloud complement network to obtain a corresponding complement target point cloud;
s4: RGB features are extracted from the ROI clipping image obtained in the step S2 respectively, original geometric features are extracted from the original target point cloud obtained in the step S2, and multi-scale geometric features are extracted from the complement target point cloud obtained in the step S3;
s5: performing multi-scale splicing and fusion on all the features obtained in the step S4 to obtain enhanced RGB-D features so as to enhance the original information of the RGB-D image;
s6: and (3) carrying out regression evaluation on the accurate 6D pose of the target industrial part from the enhanced RGB-D characteristics in a confidence scoring mode through supervision training.
In step S1, the target industrial part region in the original point cloud may be incomplete and contain substantial noise due to factors such as the reflective metal surface of the target industrial part, partial occlusion, and depth sensor noise of the RGB-D camera.
In step S3, the point cloud completion network mainly comprises an encoder and a decoder and achieves point cloud completion by predicting a mapping from the incomplete geometric space to the complete geometric space, specifically:
The encoder comprises a two-layer stacked PointNet network, and the decoder comprises an MLP network;
First, the encoder takes the original target point cloud containing the incomplete geometric structure obtained in step S2 as input and extracts an intermediate feature vector from it using the two-layer stacked PointNet network;
Then, the decoder takes the intermediate feature vector as input and, combined with the MLP network, generates a dense completed point cloud in stages, from coarse to fine;
Finally, the dense completed point cloud is adjusted by a shape protection network layer and the final completed point cloud is output, so that local shape distortion is avoided.
The adjustment process of the shape protection network layer is defined by three equations (reproduced only as figures in the original publication), whose symbols are defined as follows: p^(i) is a point in the dense completed point cloud; two auxiliary points are the points of the original target point cloud nearest to p^(i); a virtual target point is used to guide the adjustment direction of p^(i); d is the distance between the virtual target point and p^(i); the output is the finally adjusted completed target point cloud point; σ is a learnable adjustment parameter, e denotes the natural base, ‖·‖₂ denotes the 2-norm, and i indexes an arbitrary point.
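Since the three equations themselves appear only as figures in the published text, the following LaTeX reconstruction is purely illustrative: it assumes the virtual target point is the midpoint of the two nearest original points and that the adjustment magnitude decays with a Gaussian weight, and the symbol names q_1^{(i)}, q_2^{(i)}, \bar{q}^{(i)}, \hat{p}^{(i)} are placeholders rather than the patent's notation:

    \bar{q}^{(i)} = \tfrac{1}{2}\left(q_1^{(i)} + q_2^{(i)}\right), \qquad
    d = \left\| \bar{q}^{(i)} - p^{(i)} \right\|_2, \qquad
    \hat{p}^{(i)} = p^{(i)} + e^{-d^{2}/\sigma^{2}} \left(\bar{q}^{(i)} - p^{(i)}\right)

Under this reading, points that already lie close to the original partial scan are barely moved, while points far from it are pulled toward the local surface, which is consistent with the stated goal of avoiding local shape distortion.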
Step S4 specifically comprises the following steps:
S41: Extract RGB features F_rgb ∈ ℝ^(H×W×d_rgb) from the ROI cropped image using a CNN network, where H and W are the height and width of the ROI cropped image, d_rgb is the dimension of the RGB features, and ℝ denotes the set of real numbers;
S42: Extract original geometric features F_p ∈ ℝ^(N×d_p) from the original target point cloud using an MLP network, where N is the number of points in the original target point cloud and d_p is the dimension of the original geometric features;
S43: The multi-scale geometric features extracted from the completed target point cloud comprise region-level features F_r and a global feature F_g:
The region-level features F_r ∈ ℝ^(N×d_r) are obtained by aggregating, through an MLP network and average pooling, the completed target point cloud points lying within a neighborhood of each original target point cloud point, where d_r is the dimension of the region-level features;
The global feature F_g ∈ ℝ^(1×d_g) is extracted from the completed target point cloud by a PointNet++ network, where d_g is the dimension of the global feature.
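As a concrete illustration of S43, the following minimal PyTorch sketch aggregates region-level features by average-pooling, for each original point, the per-point MLP features of its k nearest completed points. The use of k-nearest-neighbor grouping and the value of k are assumptions; the patent only specifies an MLP plus average pooling over completed points near each original point.

    import torch

    def region_level_features(orig_pts, comp_pts, comp_feats, k=16):
        # orig_pts: (N, 3) original target point cloud
        # comp_pts: (M, 3) completed target point cloud
        # comp_feats: (M, d_r) per-point features from a shared MLP
        dists = torch.cdist(orig_pts, comp_pts)            # (N, M) pairwise distances
        idx = dists.topk(k, largest=False).indices         # (N, k) indices of nearest completed points
        return comp_feats[idx].mean(dim=1)                 # (N, d_r) region-level features F_r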
Step S5 specifically comprises the following steps:
S51: According to the alignment between the pixels of the RGB image and the pixels of the depth map, downsample the RGB features F_rgb and concatenate them with the original geometric features F_p at the corresponding positions;
S52: Concatenate the original geometric features F_p with the corresponding region-level features F_r; since the original geometric features F_p and the region-level features F_r correspond one-to-one, the two are concatenated position-wise.
S53: Copy the global feature F_g N-1 times and fuse the copies with the global feature F_g itself, expanding it into an expanded global feature, then concatenate the expanded global feature with the region-level features F_r;
S54: Through the concatenation operations of steps S51-S53, the multi-scale concatenation and fusion of the RGB features F_rgb, the original geometric features F_p, the region-level features F_r and the global feature F_g is achieved, yielding fused initial RGB-D features F_rgb-d ∈ ℝ^(N×d_f) with d_f = d_rgb + d_p + d_r + d_g, where d_f is the dimension of the initial RGB-D features, d_rgb is the dimension of the RGB features, d_p is the dimension of the original geometric features, d_r is the dimension of the region-level features, and d_g is the dimension of the global feature;
Finally, the initial RGB-D features F_rgb-d are enhanced by an MLP network to obtain the final enhanced RGB-D features F'_rgb-d ∈ ℝ^(N×d'_f), where d'_f is the dimension of the enhanced RGB-D features.
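A minimal sketch of this multi-scale concatenation, assuming F_rgb has already been sampled at the N pixels aligned with the original target point cloud and that enhance_mlp is a stand-in for the MLP enhancement network:

    import torch
    import torch.nn as nn

    def fuse_rgbd_features(f_rgb, f_p, f_r, f_g, enhance_mlp):
        # f_rgb: (N, d_rgb), f_p: (N, d_p), f_r: (N, d_r), f_g: (1, d_g)
        f_g_exp = f_g.expand(f_p.size(0), -1)                   # replicate global feature to all N points
        f_init = torch.cat([f_rgb, f_p, f_r, f_g_exp], dim=1)   # (N, d_rgb + d_p + d_r + d_g)
        return enhance_mlp(f_init)                              # (N, d'_f) enhanced RGB-D features

    # Example with the dimensions used in the embodiment (d_rgb=128, d_p=128, d_r=256, d_g=512, d'_f=128):
    # enhance_mlp = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 128))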
In step S6, N 6D poses are predicted from the enhanced RGB-D features by regression, the confidence of each predicted pose is computed, and the predicted pose with the highest confidence is taken as the final 6D pose estimation result for the target industrial part.
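As an illustration of this confidence-based selection (the head architecture itself is not detailed in the patent, which only specifies per-point pose regression with confidence scores), a minimal sketch:

    import torch

    def select_best_pose(rotations, translations, confidences):
        # rotations: (N, 3, 3), translations: (N, 3), confidences: (N,)
        # One pose hypothesis is regressed from each of the N enhanced per-point features;
        # the hypothesis with the highest confidence score is the final estimate.
        best = torch.argmax(confidences)
        return rotations[best], translations[best]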
The invention has the beneficial effects that:
(1) The invention achieves accurate 6D pose estimation of industrial parts, benefiting from the supplementary geometric information provided by the RGB-D image, and overcomes the degraded pose estimation performance of traditional methods on weakly textured industrial parts;
(2) By completing the incomplete geometric information in the original target point cloud, the invention solves the geometric incompleteness of depth information caused by complex factors such as sensor noise, metal surface reflection and occlusion, further improving the robustness of industrial part pose estimation in actual industrial application scenes;
(3) The proposed pose estimation method can be directly deployed in various downstream robot application tasks, effectively improving the flexibility and intelligence of robot systems.
In summary, the proposed industrial part pose estimation method addresses the low-texture characteristics of industrial parts and the geometric incompleteness of depth sensor information caused by the various complex factors of actual industrial scenes; it achieves accurate and robust 6D pose estimation by completing the incomplete geometry and obtaining enhanced RGB-D features, and has good engineering practical value.
Drawings
FIG. 1 is a flow chart of 6D pose estimation of an industrial part based on incomplete geometric completion in the present invention.
FIG. 2 is a schematic diagram of a target industrial part for method testing in an embodiment of the invention.
Fig. 3 is a schematic diagram of an adjustment process of the complement point cloud in the present invention.
FIG. 4 is a schematic diagram of a fusion process for enhancing RGB-D characteristics in the present invention.
Fig. 5 is a visual result diagram of pose estimation in an embodiment of the present invention.
FIG. 6 is a visual effect of incomplete geometric completion of a target industrial part in an embodiment of the invention.
Detailed Description
The invention is further illustrated by a self-built industrial part pose estimation dataset:
In this embodiment, the method of the invention was trained and tested on a self-built industrial part dataset. The self-built industrial part dataset includes about 30,000 data samples in total, covering 8 different industrial parts (see FIG. 2) and a variety of challenging conditions such as occlusion, illumination variation and depth loss. Each data sample consists of an original RGB image, a depth map, and the mask, category and pose label of each target industrial part in the image. The 6D pose estimation flow for industrial parts is shown in FIG. 1. For this dataset, the proposed pose estimation method comprises the following specific steps:
s1: and shooting an original RGB-D image containing the target industrial part by using an RGB-D camera, wherein the original RGB-D image comprises an RGB image and a depth map, and converting the depth map into a corresponding original point cloud.
In the self-built industrial part dataset of the present embodiment, the target industrial part area in the original point cloud may be incomplete and contain a lot of noise due to the reflective metal surface, occlusion, and depth sensor noise of the RGB-D camera, etc. that may be present in the target industrial part.
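The patent does not spell out the depth-to-point-cloud conversion; a standard pinhole back-projection sketch, assuming known camera intrinsics fx, fy, cx, cy and a millimetre depth scale, would look like this:

    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=1000.0):
        # depth: (H, W) depth map in sensor units; returns an organized (H, W, 3) XYZ cloud in metres.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth.astype(np.float32) / depth_scale      # zero where depth is missing
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1)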
S2: and (3) dividing and positioning the target industrial part, dividing an instance Mask and an ROI clipping image of the target industrial part from the RGB image by adopting a Mask R-CNN method, and dividing an original target point cloud corresponding to the instance Mask of the target industrial part from the original point cloud according to the alignment relation between RGB pixels and depth pixels.
S3: and recovering the complete geometric structure of the target industrial part from the original target point cloud containing the incomplete geometric structure through a point cloud complement network to obtain a corresponding complement target point cloud.
The point cloud completion network consists of an encoder-decoder structure and achieves point cloud completion by predicting a mapping from the incomplete geometric space to the complete geometric space. Specifically, the encoder first takes the original target point cloud containing the incomplete geometry from S2 as input and extracts an intermediate feature vector from it using a two-layer stacked PointNet network. The decoder then takes this intermediate feature vector as input and, combined with the MLP network, generates a dense completed point cloud in stages, from coarse to fine (a minimal sketch of such an encoder-decoder is given after the adjustment notation below). Finally, to avoid local shape distortion, the dense completed point cloud is adjusted by a shape protection network layer (see FIG. 3) to obtain the final completed point cloud; the adjustment process is as follows:
The adjustment is defined by three equations (reproduced only as figures in the original publication), with the same notation as above: p^(i) is a point in the dense completed point cloud; two auxiliary points are the points of the original target point cloud nearest to p^(i); a virtual target point guides the adjustment direction of p^(i); d is the distance between the virtual target point and p^(i); the output is the finally adjusted completed target point cloud point; σ is a learnable adjustment parameter, e denotes the natural base, ‖·‖₂ denotes the 2-norm, and i indexes an arbitrary point.
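A minimal PyTorch sketch of such a coarse-to-fine completion network, assuming a single simplified PointNet-style encoder block (the patent uses two stacked PointNet stages) and an MLP decoder that predicts a coarse point set and then refines it; all layer sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class CompletionNet(nn.Module):
        def __init__(self, feat_dim=1024, n_coarse=256, up_ratio=8):
            super().__init__()
            # PointNet-style shared MLP encoder followed by a global max pool
            self.encoder = nn.Sequential(nn.Conv1d(3, 128, 1), nn.ReLU(),
                                         nn.Conv1d(128, feat_dim, 1))
            # MLP decoder: coarse point set, then per-coarse-point refinement offsets
            self.coarse_head = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                             nn.Linear(1024, n_coarse * 3))
            self.refine_head = nn.Sequential(nn.Linear(feat_dim + 3, 256), nn.ReLU(),
                                             nn.Linear(256, up_ratio * 3))
            self.n_coarse, self.up_ratio = n_coarse, up_ratio

        def forward(self, pts):                                         # pts: (B, N, 3) incomplete cloud
            feat = self.encoder(pts.transpose(1, 2)).max(dim=2).values  # (B, feat_dim) intermediate feature vector
            coarse = self.coarse_head(feat).view(-1, self.n_coarse, 3)  # (B, n_coarse, 3) coarse completion
            ctx = feat.unsqueeze(1).expand(-1, self.n_coarse, -1)
            offs = self.refine_head(torch.cat([ctx, coarse], dim=2))    # (B, n_coarse, up_ratio * 3)
            dense = coarse.unsqueeze(2) + offs.view(-1, self.n_coarse, self.up_ratio, 3)
            return coarse, dense.reshape(-1, self.n_coarse * self.up_ratio, 3)  # dense completed cloud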
In this embodiment, depth point clouds of incomplete target industrial parts are pre-generated from different viewpoints, and random occlusion and noise are added to them, providing synthetic training data for the point cloud completion network.
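One way such occluded, noisy partial clouds could be synthesized from a complete model cloud; the drop-a-local-region strategy and the parameter values below are assumptions, not taken from the patent:

    import numpy as np

    def synthesize_incomplete_cloud(full_pts, drop_ratio=0.3, noise_std=0.002):
        # full_pts: (n, 3) complete model point cloud.
        # Remove a random local region (simulating occlusion / depth loss) and add Gaussian sensor noise.
        seed = full_pts[np.random.randint(full_pts.shape[0])]
        dists = np.linalg.norm(full_pts - seed, axis=1)
        keep = dists > np.quantile(dists, drop_ratio)        # drop the fraction of points closest to the seed
        return full_pts[keep] + np.random.normal(0.0, noise_std, (keep.sum(), 3))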
S4: Extract RGB features from the ROI cropped image obtained in S2, original geometric features from the original target point cloud obtained in S2, and multi-scale geometric features from the completed target point cloud obtained in S3.
Specifically, RGB features F_rgb ∈ ℝ^(H×W×d_rgb) are extracted from the ROI cropped image using a CNN network, where H and W are the height and width of the ROI cropped image, d_rgb is the dimension of the RGB features, and ℝ denotes the set of real numbers. Original geometric features F_p ∈ ℝ^(N×d_p) are extracted from the original target point cloud using an MLP network, where N is the number of points in the original target point cloud and d_p is the dimension of the original geometric features. The multi-scale geometric features extracted from the completed target point cloud comprise region-level features and a global feature: the region-level features F_r ∈ ℝ^(N×d_r) are obtained by aggregating, through an MLP network and average pooling, the completed target point cloud points within a neighborhood of each original target point cloud point, where d_r is the dimension of the region-level features; the global feature F_g ∈ ℝ^(1×d_g) is extracted from the completed target point cloud by a PointNet++ network, where d_g is the dimension of the global feature.
In this embodiment, d_rgb = 128, d_p = 128, d_r = 256 and d_g = 512.
S5: and (3) performing multi-scale splicing and fusion on all the features obtained in the step (S4) to enhance the original information of the RGB-D image and obtain enhanced RGB-D features.
As shown in fig. 4, first, according to the alignment relationship between RGB pixels and depth pixels, the pixel pairsRGB feature F rgb Downsampling is performed and then the original geometric feature F is obtained p And splicing at the corresponding positions. Due to the original geometrical features F p And region level feature F r Is in one-to-one correspondence, so that the two are spliced correspondingly. Then, the global feature F g Copying for N-1 times and then with F g Fusing the two to make the global feature F g Expanded to
Figure SMS_29
And associate it with regional level feature F r And (5) splicing. Through the splicing operation, RGB feature F is realized rgb Original geometric feature F p Regional level feature F r Global feature F g Is fused by multi-scale splicing to obtain the initial RGB-D characteristic after fusion>
Figure SMS_30
Wherein d is f =d rgb +d p +d r +d g Is the dimension of the initial RGB-D feature. Finally, the initial RGB-D characteristic is enhanced through an MLP network to obtain the final enhanced RGB-D characteristic>
Figure SMS_31
Wherein d' f To enhance the dimension of the RGB-D feature.
In this example, d 'is taken' f =128。
S6: and (3) regressing the accurate 6D pose of the target industrial part from the enhanced RGB-D characteristics in a confidence scoring mode through supervision training. Firstly, predicting N6D poses from the enhanced RGB-D features in a regression way, solving the confidence coefficient of each predicted pose, and taking the predicted pose with the highest confidence coefficient score as the final 6D pose estimation result of the target industrial part.
In this embodiment, 70% of the data from the self-built industrial part dataset was used for model training, carried out on an NVIDIA A100 graphics processor with the PyTorch framework.
After model training was completed, the remaining 30% of the data from the self-built industrial part dataset was used for testing.
The quantitative test results are shown in Table 1. Accuracy is measured by the proportion of estimates whose average model point distance is no greater than 0.02 m (ADD ≤ 0.02 m). The average pose estimation success rate of the method exceeds 90%.
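For reference, a standard computation of the ADD metric used here (the 0.02 m threshold follows the text; the model_pts argument is the set of 3D model points of the part):

    import numpy as np

    def add_metric(model_pts, R_gt, t_gt, R_pred, t_pred):
        # Average distance between model points transformed by the ground-truth and the predicted pose.
        gt = model_pts @ R_gt.T + t_gt
        pred = model_pts @ R_pred.T + t_pred
        return np.linalg.norm(gt - pred, axis=1).mean()

    # A prediction counts as successful when add_metric(...) <= 0.02 (metres).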
TABLE 1 Success rate of pose estimation of the inventive method (%)

    Target part     Part 1  Part 2  Part 3  Part 4  Part 5  Part 6  Part 7  Part 8  Average
    Success rate     98.1    93.0    84.8    97.6    97.4    78.9    98.7    99.9    93.6
The qualitative test results are shown in FIG. 5. The target industrial part is projected onto the image plane using the estimated pose, giving an approximate visual measure of the accuracy of the estimation result. The method successfully estimates the 6D pose of the target industrial parts, further verifying its effectiveness. In addition, some incomplete geometric completion results are shown in FIG. 6.
The above embodiments do not limit the present invention; they are merely intended to help understand the method and core idea of the invention. It should be noted that those skilled in the art may make various improvements and modifications to the present application without departing from the principles of the invention, and such improvements and modifications fall within the scope of the claims of the present application. Matters not described in detail in this specification belong to the prior art known to those skilled in the art.

Claims (7)

1. An industrial part 6D pose estimation method based on incomplete geometric completion, characterized by comprising the following steps:
S1: capturing an original RGB-D image containing a target industrial part with an RGB-D camera, the original RGB-D image comprising an RGB image and a depth map, and converting the depth map into a corresponding original point cloud;
S2: segmenting and localizing the target industrial part: segmenting an instance mask and an ROI cropped image of the target industrial part from the RGB image using a Mask R-CNN method, and segmenting, from the original point cloud, the original target point cloud corresponding to the instance mask of the target industrial part according to the alignment between the pixels of the RGB image and the pixels of the depth map;
S3: recovering the complete geometric structure of the target industrial part from the original target point cloud containing an incomplete geometric structure through a point cloud completion network to obtain a corresponding completed target point cloud;
S4: extracting RGB features from the ROI cropped image obtained in step S2, original geometric features from the original target point cloud obtained in step S2, and multi-scale geometric features from the completed target point cloud obtained in step S3;
S5: fusing all the features obtained in step S4 by multi-scale concatenation to obtain enhanced RGB-D features that enhance the original information of the RGB-D image;
S6: through supervised training, regressing the 6D pose of the target industrial part from the enhanced RGB-D features via confidence scoring.
2. The method for estimating 6D pose of industrial part based on incomplete geometric completion according to claim 1, wherein in step S1, the target industrial part area in the original point cloud may be incomplete and contain noise.
3. The industrial part 6D pose estimation method based on incomplete geometric completion according to claim 1, characterized in that in step S3, the point cloud completion network mainly comprises an encoder and a decoder and achieves point cloud completion by predicting a mapping from the incomplete geometric space to the complete geometric space, specifically:
the encoder comprises a two-layer stacked PointNet network, and the decoder comprises an MLP network;
first, the encoder takes the original target point cloud containing the incomplete geometric structure obtained in step S2 as input and extracts an intermediate feature vector from it using the two-layer stacked PointNet network;
then, the decoder takes the intermediate feature vector as input and, combined with the MLP network, generates a dense completed point cloud in stages, from coarse to fine;
finally, the dense completed point cloud is adjusted by a shape protection network layer and the final completed point cloud is output.
4. The industrial part 6D pose estimation method based on incomplete geometric completion according to claim 3, characterized in that:
the adjustment process of the shape protection network layer is defined by three equations (reproduced only as figures in the original publication), whose symbols are defined as follows: p^(i) is a point in the dense completed point cloud; two auxiliary points are the points of the original target point cloud nearest to p^(i); a virtual target point is used to guide the adjustment direction of p^(i); d is the distance between the virtual target point and p^(i); the output is the finally adjusted completed target point cloud point; σ is a learnable adjustment parameter, e denotes the natural base, ‖·‖₂ denotes the 2-norm, and i indexes an arbitrary point.
5. The method for estimating the 6D pose of industrial parts based on incomplete geometric completion according to claim 1, characterized in that step S4 specifically comprises the following steps:
S41: extracting RGB features F_rgb ∈ ℝ^(H×W×d_rgb) from the ROI cropped image using a CNN network, where H and W are the height and width of the ROI cropped image, d_rgb is the dimension of the RGB features, and ℝ denotes the set of real numbers;
S42: extracting original geometric features F_p ∈ ℝ^(N×d_p) from the original target point cloud using an MLP network, where N is the number of points in the original target point cloud and d_p is the dimension of the original geometric features;
S43: the multi-scale geometric features extracted from the completed target point cloud comprise region-level features F_r and a global feature F_g:
the region-level features F_r ∈ ℝ^(N×d_r) are obtained by aggregating, through an MLP network and average pooling, the completed target point cloud points within a neighborhood of each original target point cloud point, where d_r is the dimension of the region-level features;
the global feature F_g ∈ ℝ^(1×d_g) is extracted from the completed target point cloud by a PointNet++ network, where d_g is the dimension of the global feature.
6. The method for estimating the 6D pose of industrial parts based on incomplete geometric completion according to claim 1, characterized in that step S5 specifically comprises the following steps:
S51: according to the alignment between the pixels of the RGB image and the pixels of the depth map, downsampling the RGB features F_rgb and concatenating them with the original geometric features F_p at the corresponding positions;
S52: concatenating the original geometric features F_p with the corresponding region-level features F_r;
S53: copying the global feature F_g N-1 times, fusing the copies with the global feature F_g itself to expand it into an expanded global feature, and concatenating the expanded global feature with the region-level features F_r;
S54: through the concatenation operations of steps S51-S53, achieving the multi-scale concatenation and fusion of the RGB features F_rgb, the original geometric features F_p, the region-level features F_r and the global feature F_g to obtain fused initial RGB-D features F_rgb-d ∈ ℝ^(N×d_f) with d_f = d_rgb + d_p + d_r + d_g, where d_f is the dimension of the initial RGB-D features, d_rgb is the dimension of the RGB features, d_p is the dimension of the original geometric features, d_r is the dimension of the region-level features, and d_g is the dimension of the global feature;
finally, the initial RGB-D features F_rgb-d are enhanced by an MLP network to obtain the final enhanced RGB-D features F'_rgb-d ∈ ℝ^(N×d'_f), where d'_f is the dimension of the enhanced RGB-D features.
7. The method for estimating the 6D pose of industrial parts based on incomplete geometric completion according to claim 1, characterized in that in step S6, N 6D poses are predicted by regression from the enhanced RGB-D features, the confidence of each predicted pose is computed, and the predicted pose with the highest confidence is taken as the final 6D pose estimation result of the target industrial part.
CN202310064596.0A 2023-02-06 2023-02-06 Industrial part 6D pose estimation method based on incomplete geometric completion Pending CN116342698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310064596.0A CN116342698A (en) 2023-02-06 2023-02-06 Industrial part 6D pose estimation method based on incomplete geometric completion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310064596.0A CN116342698A (en) 2023-02-06 2023-02-06 Industrial part 6D pose estimation method based on incomplete geometric completion

Publications (1)

Publication Number Publication Date
CN116342698A true CN116342698A (en) 2023-06-27

Family

ID=86890521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310064596.0A Pending CN116342698A (en) 2023-02-06 2023-02-06 Industrial part 6D pose estimation method based on incomplete geometric completion

Country Status (1)

Country Link
CN (1) CN116342698A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455837A (en) * 2023-09-22 2024-01-26 苏州诺克汽车工程装备有限公司 High-reflection automobile part identification feeding method and system based on deep learning


Similar Documents

Publication Publication Date Title
CN112270249B (en) Target pose estimation method integrating RGB-D visual characteristics
US11763485B1 (en) Deep learning based robot target recognition and motion detection method, storage medium and apparatus
CN109784333B (en) Three-dimensional target detection method and system based on point cloud weighted channel characteristics
US11763433B2 (en) Depth image generation method and device
CN110599537A (en) Mask R-CNN-based unmanned aerial vehicle image building area calculation method and system
CN108648194B (en) Three-dimensional target identification segmentation and pose measurement method and device based on CAD model
CN112750133A (en) Computer vision training system and method for training a computer vision system
JP7439153B2 (en) Lifted semantic graph embedding for omnidirectional location recognition
CN111768415A (en) Image instance segmentation method without quantization pooling
CN111429533A (en) Camera lens distortion parameter estimation device and method
CN112258658A (en) Augmented reality visualization method based on depth camera and application
CN113963117B (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN116342698A (en) Industrial part 6D pose estimation method based on incomplete geometric completion
CN117218343A (en) Semantic component attitude estimation method based on deep learning
CN114782417A (en) Real-time detection method for digital twin characteristics of fan based on edge enhanced image segmentation
CN116310128A (en) Dynamic environment monocular multi-object SLAM method based on instance segmentation and three-dimensional reconstruction
CN115546273A (en) Scene structure depth estimation method for indoor fisheye image
Yuan et al. Presim: A 3d photo-realistic environment simulator for visual ai
CN111179271B (en) Object angle information labeling method based on retrieval matching and electronic equipment
CN116958434A (en) Multi-view three-dimensional reconstruction method, measurement method and system
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN111160372B (en) Large target identification method based on high-speed convolutional neural network
CN116363168A (en) Remote sensing video target tracking method and system based on super-resolution network
CN107392936B (en) Target tracking method based on meanshift
CN115410014A (en) Self-supervision characteristic point matching method of fisheye image and storage medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination