CN114821263A - Weak texture target pose estimation method based on feature fusion

- Publication number: CN114821263A
- Application number: CN202210623453.4A
- Authority: CN (China)
- Prior art keywords: features, point, target, feature, fusion
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06V10/267: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/56: Extraction of image or video features relating to colour
- G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
Description
Technical Field
The invention belongs to the technical field of target pose estimation, and in particular relates to a weak-texture target pose estimation method based on feature fusion.
Background Art
Target pose estimation can load computer-generated virtual objects into a real image sequence and obtain the pose of an object so that a robotic arm can grip it; it is widely applied in fields such as robotic grasping and augmented reality.
In the prior art, feature descriptors are used to extract target features for pose estimation. For example, "A Pose Estimation Method Combining SURF Descriptors with an Autoencoder" (CN114037742A) extracts SURF feature points from a color image, combines features extracted by a convolutional autoencoder with the corresponding pose information in rendered data to build an offline feature template, and votes with the K feature vectors of smallest distance in the template to obtain the 6D pose of the target. That invention reduces manual labeling, lowers environmental complexity, and reduces the amount of computation. However, feature descriptors such as SURF require the target to have rich texture patterns, so the method has difficulty extracting features from weak-texture targets and performs poorly on them. "A Pose Estimation Method for Weak-Texture Objects" (CN113223181A) is optimized for weak-texture targets: it obtains color embedding features and geometric embedding features from the color image and the point cloud respectively, then uses a self-attention mechanism to extract position-dependent feature maps and performs pose estimation after pixel-wise fusion. That invention enriches the information of each pixel feature and adaptively adjusts the weights of different features to improve the recognition accuracy of each pixel. However, it uses a dense-fusion approach and ignores the influence of the local features that exist among points of the point cloud on pose estimation, which limits the accuracy of the pose estimation.
Summary of the Invention
The technical problem to be solved by the present invention is to overcome the above deficiencies of the prior art by providing a weak-texture target pose estimation method based on feature fusion. The method has simple steps, a reasonable design, and is convenient to implement; it can be effectively applied to weak-texture target pose estimation, solves the problem that pose estimation gives insufficient consideration to local features during feature fusion, improves the accuracy of pose estimation, offers stronger model adaptability, works well in practice, and is easy to popularize.
To solve the above technical problem, the technical solution adopted by the present invention is a weak-texture target pose estimation method based on feature fusion, comprising the following steps:
Step 1: semantically segment the RGB image containing the target to obtain the target pixel mask and the minimum bounding box of the target;
Step 2: crop the RGB image with the minimum bounding box to obtain the cropped RGB image;
Step 3: use a convolutional neural network to extract the color features of the target in the cropped RGB image;
Step 4: obtain the depth image corresponding to the cropped RGB image;
Step 5: segment the depth image with the mask and convert it into point cloud data;
Step 6: obtain the point features, local geometric features, and global geometric features of the point cloud data;
Step 7: fuse the color features with the point features, local geometric features, and global geometric features to obtain the target fusion features;
Step 8: input the target fusion features into the pose estimation network and output the pose estimation result.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of using a convolutional neural network to extract the color features of the target in the cropped RGB image in step 3 comprises:
Step 301: downsample the cropped RGB image with 18 convolutional layers to obtain 512-dimensional features;
Step 302: upsample the features with four upsampling layers to obtain 32-dimensional color features.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of obtaining the point features, local geometric features, and global geometric features of the point cloud data in step 6 comprises:
Step 601: extract the point features of the point cloud data with a PointNet network;
Step 602: randomly select 256 position points and fuse the features of these points to reduce the influence of target occlusion and segmentation noise;
Step 603: use farthest point sampling to find 128 points evenly distributed in space;
Step 604: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 605: within each local region, extract features with PointNet from the point clouds at the three spatial scales of 0.05 cm, 0.1 cm, and 0.2 cm, and concatenate and aggregate them to form the local geometric features;
Step 606: use farthest point sampling again to find 64 points evenly distributed in space;
Step 607: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 608: within each local region, extract and aggregate features from the point clouds at the two spatial scales of 0.2 cm and 0.3 cm;
Step 609: use an MLP to extract the global geometric features of the target on the basis of the local geometric features.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of fusing the color features with the point features, local geometric features, and global geometric features in step 7 to obtain the target fusion features comprises:
Step 701: apply a 1d convolution to the color features and fuse them with the point features to form the point fusion features;
Step 702: apply another 1d convolution to the point fusion features and fuse them with the local geometric features to form the local fusion features;
Step 703: fuse the global geometric features of step 609, the point fusion features of step 701, and the local fusion features of step 702 to form the final target fusion features.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of inputting the target fusion features into the pose estimation network and outputting the pose estimation result in step 8 comprises:
Step 801: use the target fusion features as the training set to train the pose estimation network;
Step 802: the pose estimation network predicts the rotation and translation of the target and the confidence of each pose prediction;
Step 803: take the pose predicted at the position point with the highest confidence as the initial pose;
Step 804: refine the initial pose with a four-layer fully connected network to obtain the final pose estimation result.
In the above weak-texture target pose estimation method based on feature fusion, the pose estimation network in step 8 includes a loss function that weights the pose loss by the confidence. The loss function Loss is:

Loss = \frac{1}{N}\sum_{i=1}^{N}\left(L_i^{p}\,c_i-\omega\log c_i\right)

where i denotes the i-th of the N position points, L_i^p denotes the pose loss of the i-th point, c_i denotes the confidence of the pose predicted at the i-th point, and ω denotes a balance parameter.
Compared with the prior art, the present invention has the following advantages:
1. The method of the present invention has simple steps, a reasonable design, and is convenient to implement.
2. The present invention obtains higher-quality target features by attending to the fine-grained geometric features that exist among local parts of the point cloud data.
3. The present invention solves the problem of insufficient consideration of local features during feature fusion in pose estimation and improves the accuracy of pose estimation.
4. The present invention can be effectively applied to weak-texture target pose estimation, with high accuracy, stronger model adaptability, good practical performance, and easy popularization.
In summary, the method of the present invention has simple steps, a reasonable design, and convenient implementation; it can be effectively applied to weak-texture target pose estimation, solves the problem of insufficient consideration of local features during feature fusion in pose estimation, improves the accuracy of pose estimation, offers stronger model adaptability, works well in practice, and is easy to popularize.
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a diagram of the network structure of the present invention;
Fig. 3 is a visualization of the pose estimation results of the present invention for each target.
Detailed Description of the Embodiments
As shown in Fig. 1 and Fig. 2, the feature-fusion-based weak-texture target pose estimation method of the present invention comprises the following steps:
Step 1: semantically segment the RGB image containing the target to obtain the target pixel mask and the minimum bounding box of the target;
Step 2: crop the RGB image with the minimum bounding box to obtain the cropped RGB image;
Step 3: use a convolutional neural network to extract the color features of the target in the cropped RGB image;
Step 4: obtain the depth image corresponding to the cropped RGB image;
Step 5: segment the depth image with the mask and convert it into point cloud data (a back-projection sketch follows this list of steps);
Step 6: obtain the point features, local geometric features, and global geometric features of the point cloud data;
Step 7: fuse the color features with the point features, local geometric features, and global geometric features to obtain the target fusion features;
Step 8: input the target fusion features into the pose estimation network and output the pose estimation result.
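Step 5 above back-projects the masked depth pixels into a point cloud; a minimal sketch of that conversion is given below. The pinhole camera model, the intrinsics fx, fy, cx, cy, the depth_scale factor, and the function name are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project masked depth pixels into a 3D point cloud in the camera frame.

    depth: (H, W) depth image aligned with the cropped RGB image
    mask:  (H, W) boolean target mask from the semantic segmentation
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known)
    """
    v, u = np.nonzero(mask & (depth > 0))             # pixel coordinates inside the mask
    z = depth[v, u].astype(np.float32) / depth_scale  # convert to metres (scale is an assumption)
    x = (u - cx) * z / fx                             # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)                # (N, 3) point cloud data
```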
In this embodiment, the specific process of using a convolutional neural network to extract the color features of the target in the cropped RGB image in step 3 comprises:
Step 301: downsample the cropped RGB image with 18 convolutional layers to obtain 512-dimensional features;
Step 302: upsample the features with four upsampling layers to obtain 32-dimensional color features.
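The following PyTorch sketch illustrates the encoder-decoder idea of steps 301-302: an 18-convolution encoder ending in 512-dimensional features and four upsampling stages ending in 32-dimensional color features. Using a ResNet-18 backbone as the 18-layer encoder, the bilinear upsampling, and the decoder channel widths are assumptions; the patent only specifies the layer counts and the 512/32 feature dimensions.

```python
import torch.nn as nn
import torchvision

class ColorFeatureNet(nn.Module):
    """Map a cropped RGB image to dense 32-dimensional color features."""
    def __init__(self):
        super().__init__()
        # Encoder: ResNet-18 supplies 18 convolutional layers and a 512-channel output.
        backbone = torchvision.models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H/32, W/32)

        def up(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
        # Decoder: four upsampling layers that reduce 512 channels to 32.
        self.decoder = nn.Sequential(up(512, 256), up(256, 128), up(128, 64), up(64, 32))

    def forward(self, rgb):              # rgb: (B, 3, H, W) cropped image
        feat = self.encoder(rgb)         # 512-dimensional downsampled features
        return self.decoder(feat)        # (B, 32, H/2, W/2) per-pixel color features
```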
In this embodiment, the specific process of obtaining the point features, local geometric features, and global geometric features of the point cloud data in step 6 comprises:
Step 601: extract the point features of the point cloud data with a PointNet network;
Step 602: randomly select 256 position points and fuse the features of these points to reduce the influence of target occlusion and segmentation noise;
Step 603: use farthest point sampling to find 128 points evenly distributed in space;
Step 604: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 605: within each local region, extract features with PointNet from the point clouds at the three spatial scales of 0.05 cm, 0.1 cm, and 0.2 cm, and concatenate and aggregate them to form the local geometric features;
Step 606: use farthest point sampling again to find 64 points evenly distributed in space;
Step 607: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 608: within each local region, extract and aggregate features from the point clouds at the two spatial scales of 0.2 cm and 0.3 cm;
Step 609: use an MLP to extract the global geometric features of the target on the basis of the local geometric features.
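Steps 603 and 606 rely on farthest point sampling to pick spatially uniform centers, and steps 604-605 and 607-608 group neighbors within fixed radii around those centers (multi-scale grouping in the spirit of PointNet++). A minimal NumPy sketch of these two primitives is shown below; the function names and the random seed choice are assumptions, and the per-radius PointNet feature extraction and concatenation of step 605 is only indicated in the closing comment.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Select n_samples indices whose points are approximately evenly spread in space."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)                     # arbitrary starting seed
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)                 # distance to the nearest chosen point
        selected[i] = int(np.argmax(min_dist))             # next center: the farthest remaining point
    return selected

def ball_query(points, center, radius):
    """Indices of the points lying inside the fixed-radius sphere around one center."""
    return np.nonzero(np.linalg.norm(points - center, axis=1) <= radius)[0]

# Steps 603-605 in outline: pick 128 centers with farthest_point_sampling, query the
# 0.05 cm / 0.1 cm / 0.2 cm spheres around each center with ball_query, run a shared
# PointNet over each sphere, and concatenate the three results into the local geometric
# feature of that center. Steps 606-608 repeat this with 64 centers and the
# 0.2 cm / 0.3 cm radii.
```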
In this embodiment, the specific process of fusing the color features with the point features, local geometric features, and global geometric features in step 7 to obtain the target fusion features comprises:
Step 701: apply a 1d convolution to the color features and fuse them with the point features to form the point fusion features;
Step 702: apply another 1d convolution to the point fusion features and fuse them with the local geometric features to form the local fusion features;
Step 703: fuse the global geometric features of step 609, the point fusion features of step 701, and the local fusion features of step 702 to form the final target fusion features.
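A compact PyTorch sketch of the fusion order of steps 701-703 follows. The per-point feature dimensions and the use of channel-wise concatenation as the fusion operation are assumptions for illustration; the patent specifies only the 1d convolutions and the order in which the features are combined.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuse per-point color, point, local and global geometric features (steps 701-703)."""
    def __init__(self, c_color=32, c_point=64, c_local=128, c_global=512):
        super().__init__()
        self.conv_color = nn.Conv1d(c_color, 64, 1)              # 1d conv on color features (step 701)
        self.conv_point_fused = nn.Conv1d(64 + c_point, 128, 1)  # 1d conv on point fusion features (step 702)

    def forward(self, f_color, f_point, f_local, f_global):
        # All inputs are (B, C, N) per-point maps; f_global is (B, c_global, 1), broadcast over N.
        n = f_color.shape[2]
        point_fused = torch.cat([self.conv_color(f_color), f_point], dim=1)           # step 701
        local_fused = torch.cat([self.conv_point_fused(point_fused), f_local], dim=1)  # step 702
        return torch.cat([point_fused, local_fused,
                          f_global.expand(-1, -1, n)], dim=1)                          # step 703
```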
In this embodiment, the specific process of inputting the target fusion features into the pose estimation network and outputting the pose estimation result in step 8 comprises:
Step 801: use the target fusion features as the training set to train the pose estimation network;
Step 802: the pose estimation network predicts the rotation and translation of the target and the confidence of each pose prediction;
Step 803: take the pose predicted at the position point with the highest confidence as the initial pose;
Step 804: refine the initial pose with a four-layer fully connected network to obtain the final pose estimation result.
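The per-point prediction and confidence-based selection of steps 802-803 can be sketched as below. The quaternion parameterization of the rotation, the input channel width, and the head design are assumptions; the four-layer fully connected refinement network of step 804 is not shown.

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Predict a rotation (quaternion), translation and confidence at every position point."""
    def __init__(self, c_in=896):
        super().__init__()
        self.rot = nn.Conv1d(c_in, 4, 1)    # per-point rotation (quaternion)
        self.trans = nn.Conv1d(c_in, 3, 1)  # per-point translation
        self.conf = nn.Conv1d(c_in, 1, 1)   # per-point confidence

    def forward(self, fused):               # fused: (B, C, N) target fusion features
        r = self.rot(fused)
        t = self.trans(fused)
        c = torch.sigmoid(self.conf(fused))
        # Step 803: the prediction at the most confident position point is the initial pose.
        idx = c.squeeze(1).argmax(dim=1).view(-1, 1, 1)
        r_init = torch.gather(r, 2, idx.expand(-1, 4, -1)).squeeze(2)  # (B, 4)
        t_init = torch.gather(t, 2, idx.expand(-1, 3, -1)).squeeze(2)  # (B, 3)
        return r_init, t_init, c
```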
In this embodiment, the pose estimation network in step 8 includes a loss function that weights the pose loss by the confidence. The loss function Loss is:

Loss = \frac{1}{N}\sum_{i=1}^{N}\left(L_i^{p}\,c_i-\omega\log c_i\right)

where i denotes the i-th of the N position points, L_i^p denotes the pose loss of the i-th point, c_i denotes the confidence of the pose predicted at the i-th point, and ω denotes a balance parameter.
In specific implementation, L_i^p is the average distance between corresponding coordinates of the sampling points of the target 3D model after transformation by the ground-truth pose matrix [R|t] and by the pose matrix [\hat{R}_i|\hat{t}_i] estimated at the i-th point. The calculation formula is:

L_i^{p} = \frac{1}{|M|}\sum_{x_j\in M}\left\|(Rx_j+t)-(\hat{R}_i x_j+\hat{t}_i)\right\|

where M denotes the set of sampling points of the target 3D model, i denotes the i-th of the N position points, the superscript p marks this as the pose loss, x_j denotes the j-th point of the sampling point set, (Rx_j + t) denotes the coordinates of the sampling point after the ground-truth pose transformation, and (\hat{R}_i x_j + \hat{t}_i) denotes its coordinates after transformation by the pose estimated at the i-th point.
For targets with rotational symmetry, the pose loss is defined as the average distance between the coordinates of the sampling points of the target 3D model transformed by the estimated pose matrix [\hat{R}_i|\hat{t}_i] and the coordinates of the nearest points after transformation by the ground-truth pose matrix [R|t]. The calculation formula is:

L_i^{p} = \frac{1}{|M|}\sum_{x_j\in M}\min_{x_k\in M}\left\|(Rx_k+t)-(\hat{R}_i x_j+\hat{t}_i)\right\|

where x_k denotes the point of the sampling point set closest to x_j.
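A PyTorch reading of the two pose-loss definitions and of the confidence weighting above is sketched below. Rotations are assumed to already be 3x3 matrices, and the value of the balance parameter omega is illustrative; this is not the patent's reference implementation.

```python
import torch

def pose_loss(R_gt, t_gt, R_pred, t_pred, model_points, symmetric=False):
    """Per-point pose loss L_i^p for poses predicted at N position points.

    R_pred: (N, 3, 3), t_pred: (N, 3), model_points: (M, 3) 3D model sampling points.
    """
    gt = model_points @ R_gt.T + t_gt                                               # (M, 3)
    pred = torch.einsum('nij,mj->nmi', R_pred, model_points) + t_pred[:, None, :]   # (N, M, 3)
    if symmetric:
        # Rotationally symmetric targets: distance to the closest ground-truth point.
        dist = torch.cdist(pred, gt.expand(pred.shape[0], -1, -1)).min(dim=2).values
    else:
        # Asymmetric targets: distance between corresponding points.
        dist = torch.norm(pred - gt[None], dim=2)
    return dist.mean(dim=1)                                                          # (N,) values of L_i^p

def total_loss(L_p, conf, omega=0.015):
    """Loss = (1/N) * sum_i (L_i^p * c_i - omega * log(c_i)); omega value is illustrative."""
    return (L_p * conf - omega * torch.log(conf)).mean()
```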
The present invention was developed on an Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50 GHz processor with an NVIDIA GeForce RTX 2080 graphics card, under Ubuntu 16.04, using PyTorch 0.4.1 and Python 3.6, with CUDA 9.0 and cuDNN 7.6.4 for accelerated network training.
Table 1 shows the initial pose results and the pose refinement results of the proposed method on the LineMOD dataset. The dataset contains 13 weak-texture target sequences with complex backgrounds, and each sequence contains 1100 to 1300 RGB-D images. The targets are Ape, Benchvise, Camera, Can, Cat, Driller, Duck, Eggbox, Glue, Holepuncher, Iron, Lamp, and Phone; Eggbox and Glue are rotationally symmetric, and the sizes of the targets all differ. In the experiments, 15% of the RGB-D images of each target were used as the training set and the rest as the test set: the training set contains 2372 RGB-D images of the 13 targets in total, and the test set contains 13406 RGB-D images of the 13 targets in total.
The comparison metric is ADD accuracy, calculated as:

\mathrm{Accuracy} = \frac{Num_{pre}}{Num_{GT}}

where Num_pre denotes the number of correct pose estimates and Num_GT denotes the total number of ground-truth poses. A pose estimate is considered correct if its ADD is less than 10% of the maximum diameter of the target, where ADD is calculated as:

\mathrm{ADD} = \frac{1}{|M|}\sum_{x\in M}\left\|(Rx+t)-(\hat{R}x+\hat{t})\right\|
Higher accuracy indicates a better pose estimation method.
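The ADD-based accuracy above could be computed along the following lines; the function names and the per-image result layout are assumptions, while the 10% diameter threshold follows the text.

```python
import numpy as np

def add_distance(R_gt, t_gt, R_pred, t_pred, model_points):
    """Average distance between model sampling points under the GT and estimated poses."""
    gt = model_points @ R_gt.T + t_gt
    pred = model_points @ R_pred.T + t_pred
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def add_accuracy(results, diameter):
    """Fraction of poses whose ADD is below 10% of the target's maximum diameter.

    results: iterable of (R_gt, t_gt, R_pred, t_pred, model_points), one per test image.
    """
    num_gt = 0
    num_pre = 0
    for r in results:
        num_gt += 1
        if add_distance(*r) < 0.1 * diameter:   # correct pose estimate
            num_pre += 1
    return num_pre / num_gt
```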
Table 1. Pose optimization results
When extracting the local geometric features of the point cloud, the method of the present invention extracts point cloud features separately for the different radius scales within each local region. Because different targets are trained simultaneously, the same multi-scale radii are chosen for all of them. As a result, the local geometric feature extraction is not fine enough for small targets, which affects the experimental accuracy; for medium and large targets, the pose estimation accuracy is better.
To further verify the effect of the method of the present invention, the pose estimation results for the targets were visualized, as shown in Fig. 3.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any way. Any simple modification, change, or equivalent structural variation made to the above embodiment according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.
Claims (6)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210623453.4A (CN114821263B) | 2022-06-01 | 2022-06-01 | A pose estimation method for weakly textured targets based on feature fusion |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210623453.4A (CN114821263B) | 2022-06-01 | 2022-06-01 | A pose estimation method for weakly textured targets based on feature fusion |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114821263A | 2022-07-29 |
| CN114821263B | 2025-01-14 |
Family ID: 82520157
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210623453.4A (Active, CN114821263B) | A pose estimation method for weakly textured targets based on feature fusion | 2022-06-01 | 2022-06-01 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114821263B (en) |
- 2022-06-01: application CN202210623453.4A filed in CN, granted as CN114821263B (Active)
Patent Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111179324A | 2019-12-30 | 2020-05-19 | 同济大学 | Object six-degree-of-freedom pose estimation method based on color and depth information fusion |
| CN113065546A | 2021-02-25 | 2021-07-02 | 湖南大学 | A target pose estimation method and system based on attention mechanism and Hough voting |
| CN113221647A | 2021-04-08 | 2021-08-06 | 湖南大学 | 6D pose estimation method fusing point cloud local features |
| CN113284184A | 2021-05-24 | 2021-08-20 | 湖南大学 | Robot RGBD visual perception oriented 6D pose estimation method and system |
| CN114299150A | 2021-12-31 | 2022-04-08 | 河北工业大学 | A deep 6D pose estimation network model and workpiece pose estimation method |
Cited By (9)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115331263A | 2022-09-19 | 2022-11-11 | 北京航空航天大学 | Robust attitude estimation method and application thereof in orientation judgment and related method |
| CN115331263B | 2022-09-19 | 2023-11-07 | 北京航空航天大学 | Robust attitude estimation method and its application in orientation judgment and related methods |
| CN115661929A | 2022-10-28 | 2023-01-31 | 北京此刻启动科技有限公司 | Time sequence feature coding method and device, electronic equipment and storage medium |
| CN115880717A | 2022-10-28 | 2023-03-31 | 北京此刻启动科技有限公司 | Heatmap key point prediction method and device, electronic equipment and storage medium |
| CN115661929B | 2022-10-28 | 2023-11-17 | 北京此刻启动科技有限公司 | Time sequence feature coding method and device, electronic equipment and storage medium |
| CN115880717B | 2022-10-28 | 2023-11-17 | 北京此刻启动科技有限公司 | Heat map key point prediction method and device, electronic equipment and storage medium |
| WO2025051040A1 | 2023-09-04 | 2025-03-13 | 华为技术有限公司 | Pose determination method and apparatus, and electronic device |
| CN118470114A | 2024-05-28 | 2024-08-09 | 广州维希尔智能技术有限公司 | A 6D pose estimation method for robot grasping tasks |
| CN118470114B | 2024-05-28 | 2025-02-11 | 广州维希尔智能技术有限公司 | 6D pose estimation method applied to robot grabbing task |
Also Published As

| Publication number | Publication date |
|---|---|
| CN114821263B | 2025-01-14 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |