CN114821263A - Weak texture target pose estimation method based on feature fusion

- Publication number: CN114821263A
- Application number: CN202210623453.4A
- Authority: CN (China)
- Prior art keywords: features, point, target, feature, fusion
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06V10/267: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/56: Extraction of image or video features relating to colour
- G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
Description
Technical Field
The invention belongs to the technical field of target pose estimation, and in particular relates to a weak-texture target pose estimation method based on feature fusion.
Background Art
Target pose estimation can load computer-generated virtual objects into a real image sequence and obtain the pose of an object so that a robotic arm can grip it; it is widely applied in fields such as robotic grasping and augmented reality.
In the prior art, feature descriptors are used to extract target features for pose estimation. For example, "A Pose Estimation Method Combining SURF Descriptors with an Autoencoder" (CN114037742A) extracts SURF feature points from a color image, combines features extracted by a convolutional autoencoder with the corresponding pose information in rendered data to build an offline feature template, and votes with the K feature vectors of smallest distance in the template to obtain the 6D pose of the target. That invention reduces manual labeling, lowers environmental complexity, and reduces the amount of computation. However, feature descriptors such as SURF require the target to have rich texture patterns, so the method has difficulty extracting features from weak-texture targets and performs poorly on them. "A Pose Estimation Method for Weak-Texture Objects" (CN113223181A) is optimized for weak-texture targets: it obtains color embedding features and geometric embedding features from the color image and the point cloud respectively, then uses a self-attention mechanism to extract position-dependent feature maps and performs pose estimation after pixel-wise fusion. That invention enriches the information of each pixel feature and adaptively adjusts the weights of different features to improve the recognition accuracy of each pixel. However, it uses a dense-fusion approach and ignores the influence of the local features that exist among points of the point cloud on pose estimation, which limits the accuracy of the pose estimation.
Summary of the Invention
The technical problem to be solved by the present invention is to overcome the above deficiencies of the prior art by providing a weak-texture target pose estimation method based on feature fusion. The method has simple steps, a reasonable design, and is convenient to implement; it can be effectively applied to weak-texture target pose estimation, solves the problem that pose estimation gives insufficient consideration to local features during feature fusion, improves the accuracy of pose estimation, offers stronger model adaptability, works well in practice, and is easy to popularize.
To solve the above technical problem, the technical solution adopted by the present invention is a weak-texture target pose estimation method based on feature fusion, comprising the following steps:
Step 1: semantically segment the RGB image containing the target to obtain the target pixel mask and the minimum bounding box of the target;
Step 2: crop the RGB image with the minimum bounding box to obtain the cropped RGB image;
Step 3: use a convolutional neural network to extract the color features of the target in the cropped RGB image;
Step 4: obtain the depth image corresponding to the cropped RGB image;
Step 5: segment the depth image with the mask and convert it into point cloud data;
Step 6: obtain the point features, local geometric features, and global geometric features of the point cloud data;
Step 7: fuse the color features with the point features, local geometric features, and global geometric features to obtain the target fusion features;
Step 8: input the target fusion features into the pose estimation network and output the pose estimation result.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of using a convolutional neural network to extract the color features of the target in the cropped RGB image in step 3 comprises:
Step 301: downsample the cropped RGB image with 18 convolutional layers to obtain 512-dimensional features;
Step 302: upsample the features with four upsampling layers to obtain 32-dimensional color features.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of obtaining the point features, local geometric features, and global geometric features of the point cloud data in step 6 comprises:
Step 601: extract the point features of the point cloud data with a PointNet network;
Step 602: randomly select 256 position points and fuse the features of these points to reduce the influence of target occlusion and segmentation noise;
Step 603: use farthest point sampling to find 128 points evenly distributed in space;
Step 604: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 605: within each local region, extract features with PointNet from the point clouds at the three spatial scales of 0.05 cm, 0.1 cm, and 0.2 cm, and concatenate and aggregate them to form the local geometric features;
Step 606: use farthest point sampling again to find 64 points evenly distributed in space;
Step 607: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 608: within each local region, extract and aggregate features from the point clouds at the two spatial scales of 0.2 cm and 0.3 cm;
Step 609: use an MLP to extract the global geometric features of the target on the basis of the local geometric features.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of fusing the color features with the point features, local geometric features, and global geometric features in step 7 to obtain the target fusion features comprises:
Step 701: apply a 1d convolution to the color features and fuse them with the point features to form the point fusion features;
Step 702: apply another 1d convolution to the point fusion features and fuse them with the local geometric features to form the local fusion features;
Step 703: fuse the global geometric features of step 609, the point fusion features of step 701, and the local fusion features of step 702 to form the final target fusion features.
In the above weak-texture target pose estimation method based on feature fusion, the specific process of inputting the target fusion features into the pose estimation network and outputting the pose estimation result in step 8 comprises:
Step 801: use the target fusion features as the training set to train the pose estimation network;
Step 802: the pose estimation network predicts the rotation and translation of the target and the confidence of each pose prediction;
Step 803: take the pose predicted at the position point with the highest confidence as the initial pose;
Step 804: refine the initial pose with a four-layer fully connected network to obtain the final pose estimation result.
In the above weak-texture target pose estimation method based on feature fusion, the pose estimation network in step 8 includes a loss function that weights the pose loss by the confidence. The loss function Loss is:

Loss = \frac{1}{N}\sum_{i=1}^{N}\left(L_i^{p}\,c_i-\omega\log c_i\right)

where i denotes the i-th of the N position points, L_i^p denotes the pose loss of the i-th point, c_i denotes the confidence of the pose predicted at the i-th point, and ω denotes a balance parameter.
Compared with the prior art, the present invention has the following advantages:
1. The method of the present invention has simple steps, a reasonable design, and is convenient to implement.
2. The present invention obtains higher-quality target features by attending to the fine-grained geometric features that exist among local parts of the point cloud data.
3. The present invention solves the problem of insufficient consideration of local features during feature fusion in pose estimation and improves the accuracy of pose estimation.
4. The present invention can be effectively applied to weak-texture target pose estimation, with high accuracy, stronger model adaptability, good practical performance, and easy popularization.
In summary, the method of the present invention has simple steps, a reasonable design, and convenient implementation; it can be effectively applied to weak-texture target pose estimation, solves the problem of insufficient consideration of local features during feature fusion in pose estimation, improves the accuracy of pose estimation, offers stronger model adaptability, works well in practice, and is easy to popularize.
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a diagram of the network structure of the present invention;
Fig. 3 is a visualization of the pose estimation results of the present invention for each target.
Detailed Description of the Embodiments
As shown in Fig. 1 and Fig. 2, the feature-fusion-based weak-texture target pose estimation method of the present invention comprises the following steps:
Step 1: semantically segment the RGB image containing the target to obtain the target pixel mask and the minimum bounding box of the target;
Step 2: crop the RGB image with the minimum bounding box to obtain the cropped RGB image;
Step 3: use a convolutional neural network to extract the color features of the target in the cropped RGB image;
Step 4: obtain the depth image corresponding to the cropped RGB image;
Step 5: segment the depth image with the mask and convert it into point cloud data (a back-projection sketch follows this list of steps);
Step 6: obtain the point features, local geometric features, and global geometric features of the point cloud data;
Step 7: fuse the color features with the point features, local geometric features, and global geometric features to obtain the target fusion features;
Step 8: input the target fusion features into the pose estimation network and output the pose estimation result.
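Step 5 above back-projects the masked depth pixels into a point cloud; a minimal sketch of that conversion is given below. The pinhole camera model, the intrinsics fx, fy, cx, cy, the depth_scale factor, and the function name are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project masked depth pixels into a 3D point cloud in the camera frame.

    depth: (H, W) depth image aligned with the cropped RGB image
    mask:  (H, W) boolean target mask from the semantic segmentation
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known)
    """
    v, u = np.nonzero(mask & (depth > 0))             # pixel coordinates inside the mask
    z = depth[v, u].astype(np.float32) / depth_scale  # convert to metres (scale is an assumption)
    x = (u - cx) * z / fx                             # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)                # (N, 3) point cloud data
```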
In this embodiment, the specific process of using a convolutional neural network to extract the color features of the target in the cropped RGB image in step 3 comprises:
Step 301: downsample the cropped RGB image with 18 convolutional layers to obtain 512-dimensional features;
Step 302: upsample the features with four upsampling layers to obtain 32-dimensional color features.
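The following PyTorch sketch illustrates the encoder-decoder idea of steps 301-302: an 18-convolution encoder ending in 512-dimensional features and four upsampling stages ending in 32-dimensional color features. Using a ResNet-18 backbone as the 18-layer encoder, the bilinear upsampling, and the decoder channel widths are assumptions; the patent only specifies the layer counts and the 512/32 feature dimensions.

```python
import torch.nn as nn
import torchvision

class ColorFeatureNet(nn.Module):
    """Map a cropped RGB image to dense 32-dimensional color features."""
    def __init__(self):
        super().__init__()
        # Encoder: ResNet-18 supplies 18 convolutional layers and a 512-channel output.
        backbone = torchvision.models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H/32, W/32)

        def up(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )
        # Decoder: four upsampling layers that reduce 512 channels to 32.
        self.decoder = nn.Sequential(up(512, 256), up(256, 128), up(128, 64), up(64, 32))

    def forward(self, rgb):              # rgb: (B, 3, H, W) cropped image
        feat = self.encoder(rgb)         # 512-dimensional downsampled features
        return self.decoder(feat)        # (B, 32, H/2, W/2) per-pixel color features
```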
In this embodiment, the specific process of obtaining the point features, local geometric features, and global geometric features of the point cloud data in step 6 comprises:
Step 601: extract the point features of the point cloud data with a PointNet network;
Step 602: randomly select 256 position points and fuse the features of these points to reduce the influence of target occlusion and segmentation noise;
Step 603: use farthest point sampling to find 128 points evenly distributed in space;
Step 604: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 605: within each local region, extract features with PointNet from the point clouds at the three spatial scales of 0.05 cm, 0.1 cm, and 0.2 cm, and concatenate and aggregate them to form the local geometric features;
Step 606: use farthest point sampling again to find 64 points evenly distributed in space;
Step 607: take each evenly distributed point as a center point and treat the sphere of fixed radius around it as a local region;
Step 608: within each local region, extract and aggregate features from the point clouds at the two spatial scales of 0.2 cm and 0.3 cm;
Step 609: use an MLP to extract the global geometric features of the target on the basis of the local geometric features.
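Steps 603 and 606 rely on farthest point sampling to pick spatially uniform centers, and steps 604-605 and 607-608 group neighbors within fixed radii around those centers (multi-scale grouping in the spirit of PointNet++). A minimal NumPy sketch of these two primitives is shown below; the function names and the random seed choice are assumptions, and the per-radius PointNet feature extraction and concatenation of step 605 is only indicated in the closing comment.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Select n_samples indices whose points are approximately evenly spread in space."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)                     # arbitrary starting seed
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)                 # distance to the nearest chosen point
        selected[i] = int(np.argmax(min_dist))             # next center: the farthest remaining point
    return selected

def ball_query(points, center, radius):
    """Indices of the points lying inside the fixed-radius sphere around one center."""
    return np.nonzero(np.linalg.norm(points - center, axis=1) <= radius)[0]

# Steps 603-605 in outline: pick 128 centers with farthest_point_sampling, query the
# 0.05 cm / 0.1 cm / 0.2 cm spheres around each center with ball_query, run a shared
# PointNet over each sphere, and concatenate the three results into the local geometric
# feature of that center. Steps 606-608 repeat this with 64 centers and the
# 0.2 cm / 0.3 cm radii.
```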
In this embodiment, the specific process of fusing the color features with the point features, local geometric features, and global geometric features in step 7 to obtain the target fusion features comprises:
Step 701: apply a 1d convolution to the color features and fuse them with the point features to form the point fusion features;
Step 702: apply another 1d convolution to the point fusion features and fuse them with the local geometric features to form the local fusion features;
Step 703: fuse the global geometric features of step 609, the point fusion features of step 701, and the local fusion features of step 702 to form the final target fusion features.
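A compact PyTorch sketch of the fusion order of steps 701-703 follows. The per-point feature dimensions and the use of channel-wise concatenation as the fusion operation are assumptions for illustration; the patent specifies only the 1d convolutions and the order in which the features are combined.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuse per-point color, point, local and global geometric features (steps 701-703)."""
    def __init__(self, c_color=32, c_point=64, c_local=128, c_global=512):
        super().__init__()
        self.conv_color = nn.Conv1d(c_color, 64, 1)              # 1d conv on color features (step 701)
        self.conv_point_fused = nn.Conv1d(64 + c_point, 128, 1)  # 1d conv on point fusion features (step 702)

    def forward(self, f_color, f_point, f_local, f_global):
        # All inputs are (B, C, N) per-point maps; f_global is (B, c_global, 1), broadcast over N.
        n = f_color.shape[2]
        point_fused = torch.cat([self.conv_color(f_color), f_point], dim=1)           # step 701
        local_fused = torch.cat([self.conv_point_fused(point_fused), f_local], dim=1)  # step 702
        return torch.cat([point_fused, local_fused,
                          f_global.expand(-1, -1, n)], dim=1)                          # step 703
```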
In this embodiment, the specific process of inputting the target fusion features into the pose estimation network and outputting the pose estimation result in step 8 comprises:
Step 801: use the target fusion features as the training set to train the pose estimation network;
Step 802: the pose estimation network predicts the rotation and translation of the target and the confidence of each pose prediction;
Step 803: take the pose predicted at the position point with the highest confidence as the initial pose;
Step 804: refine the initial pose with a four-layer fully connected network to obtain the final pose estimation result.
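The per-point prediction and confidence-based selection of steps 802-803 can be sketched as below. The quaternion parameterization of the rotation, the input channel width, and the head design are assumptions; the four-layer fully connected refinement network of step 804 is not shown.

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Predict a rotation (quaternion), translation and confidence at every position point."""
    def __init__(self, c_in=896):
        super().__init__()
        self.rot = nn.Conv1d(c_in, 4, 1)    # per-point rotation (quaternion)
        self.trans = nn.Conv1d(c_in, 3, 1)  # per-point translation
        self.conf = nn.Conv1d(c_in, 1, 1)   # per-point confidence

    def forward(self, fused):               # fused: (B, C, N) target fusion features
        r = self.rot(fused)
        t = self.trans(fused)
        c = torch.sigmoid(self.conf(fused))
        # Step 803: the prediction at the most confident position point is the initial pose.
        idx = c.squeeze(1).argmax(dim=1).view(-1, 1, 1)
        r_init = torch.gather(r, 2, idx.expand(-1, 4, -1)).squeeze(2)  # (B, 4)
        t_init = torch.gather(t, 2, idx.expand(-1, 3, -1)).squeeze(2)  # (B, 3)
        return r_init, t_init, c
```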
In this embodiment, the pose estimation network in step 8 includes a loss function that weights the pose loss by the confidence. The loss function Loss is:

Loss = \frac{1}{N}\sum_{i=1}^{N}\left(L_i^{p}\,c_i-\omega\log c_i\right)

where i denotes the i-th of the N position points, L_i^p denotes the pose loss of the i-th point, c_i denotes the confidence of the pose predicted at the i-th point, and ω denotes a balance parameter.
In specific implementation, L_i^p is the average distance between corresponding coordinates of the sampling points of the target 3D model after transformation by the ground-truth pose matrix [R|t] and by the pose matrix [\hat{R}_i|\hat{t}_i] estimated at the i-th point. The calculation formula is:

L_i^{p} = \frac{1}{|M|}\sum_{x_j\in M}\left\|(Rx_j+t)-(\hat{R}_i x_j+\hat{t}_i)\right\|

where M denotes the set of sampling points of the target 3D model, i denotes the i-th of the N position points, the superscript p marks this as the pose loss, x_j denotes the j-th point of the sampling point set, (Rx_j + t) denotes the coordinates of the sampling point after the ground-truth pose transformation, and (\hat{R}_i x_j + \hat{t}_i) denotes its coordinates after transformation by the pose estimated at the i-th point.
For targets with rotational symmetry, the pose loss is defined as the average distance between the coordinates of the sampling points of the target 3D model transformed by the estimated pose matrix [\hat{R}_i|\hat{t}_i] and the coordinates of the nearest points after transformation by the ground-truth pose matrix [R|t]. The calculation formula is:

L_i^{p} = \frac{1}{|M|}\sum_{x_j\in M}\min_{x_k\in M}\left\|(Rx_k+t)-(\hat{R}_i x_j+\hat{t}_i)\right\|

where x_k denotes the point of the sampling point set closest to x_j.
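A PyTorch reading of the two pose-loss definitions and of the confidence weighting above is sketched below. Rotations are assumed to already be 3x3 matrices, and the value of the balance parameter omega is illustrative; this is not the patent's reference implementation.

```python
import torch

def pose_loss(R_gt, t_gt, R_pred, t_pred, model_points, symmetric=False):
    """Per-point pose loss L_i^p for poses predicted at N position points.

    R_pred: (N, 3, 3), t_pred: (N, 3), model_points: (M, 3) 3D model sampling points.
    """
    gt = model_points @ R_gt.T + t_gt                                               # (M, 3)
    pred = torch.einsum('nij,mj->nmi', R_pred, model_points) + t_pred[:, None, :]   # (N, M, 3)
    if symmetric:
        # Rotationally symmetric targets: distance to the closest ground-truth point.
        dist = torch.cdist(pred, gt.expand(pred.shape[0], -1, -1)).min(dim=2).values
    else:
        # Asymmetric targets: distance between corresponding points.
        dist = torch.norm(pred - gt[None], dim=2)
    return dist.mean(dim=1)                                                          # (N,) values of L_i^p

def total_loss(L_p, conf, omega=0.015):
    """Loss = (1/N) * sum_i (L_i^p * c_i - omega * log(c_i)); omega value is illustrative."""
    return (L_p * conf - omega * torch.log(conf)).mean()
```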
The present invention was developed on an Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50 GHz processor with an NVIDIA GeForce RTX 2080 graphics card, under Ubuntu 16.04, using PyTorch 0.4.1 and Python 3.6, with CUDA 9.0 and cuDNN 7.6.4 for accelerated network training.
Table 1 shows the initial pose results and the pose refinement results of the proposed method on the LineMOD dataset. The dataset contains 13 weak-texture target sequences with complex backgrounds, and each sequence contains 1100 to 1300 RGB-D images. The targets are Ape, Benchvise, Camera, Can, Cat, Driller, Duck, Eggbox, Glue, Holepuncher, Iron, Lamp, and Phone; Eggbox and Glue are rotationally symmetric, and the sizes of the targets all differ. In the experiments, 15% of the RGB-D images of each target were used as the training set and the rest as the test set: the training set contains 2372 RGB-D images of the 13 targets in total, and the test set contains 13406 RGB-D images of the 13 targets in total.
The comparison metric is ADD accuracy, calculated as:

\mathrm{Accuracy} = \frac{Num_{pre}}{Num_{GT}}

where Num_pre denotes the number of correct pose estimates and Num_GT denotes the total number of ground-truth poses. A pose estimate is considered correct if its ADD is less than 10% of the maximum diameter of the target, where ADD is calculated as:

\mathrm{ADD} = \frac{1}{|M|}\sum_{x\in M}\left\|(Rx+t)-(\hat{R}x+\hat{t})\right\|
Higher accuracy indicates a better pose estimation method.
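The ADD-based accuracy above could be computed along the following lines; the function names and the per-image result layout are assumptions, while the 10% diameter threshold follows the text.

```python
import numpy as np

def add_distance(R_gt, t_gt, R_pred, t_pred, model_points):
    """Average distance between model sampling points under the GT and estimated poses."""
    gt = model_points @ R_gt.T + t_gt
    pred = model_points @ R_pred.T + t_pred
    return float(np.linalg.norm(gt - pred, axis=1).mean())

def add_accuracy(results, diameter):
    """Fraction of poses whose ADD is below 10% of the target's maximum diameter.

    results: iterable of (R_gt, t_gt, R_pred, t_pred, model_points), one per test image.
    """
    num_gt = 0
    num_pre = 0
    for r in results:
        num_gt += 1
        if add_distance(*r) < 0.1 * diameter:   # correct pose estimate
            num_pre += 1
    return num_pre / num_gt
```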
Table 1. Pose optimization results
When extracting the local geometric features of the point cloud, the method of the present invention extracts point cloud features separately for the different radius scales within each local region. Because different targets are trained simultaneously, the same multi-scale radii are chosen for all of them. As a result, the local geometric feature extraction is not fine enough for small targets, which affects the experimental accuracy; for medium and large targets, the pose estimation accuracy is better.
To further verify the effect of the method of the present invention, the pose estimation results for the targets were visualized, as shown in Fig. 3.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any way. Any simple modification, change, or equivalent structural variation made to the above embodiment according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.
Claims (6)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210623453.4A (CN114821263B) | 2022-06-01 | 2022-06-01 | A pose estimation method for weakly textured targets based on feature fusion |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210623453.4A (CN114821263B) | 2022-06-01 | 2022-06-01 | A pose estimation method for weakly textured targets based on feature fusion |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114821263A | 2022-07-29 |
| CN114821263B | 2025-01-14 |
Family ID: 82520157
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210623453.4A (Active, CN114821263B) | A pose estimation method for weakly textured targets based on feature fusion | 2022-06-01 | 2022-06-01 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114821263B (en) |
- 2022-06-01: application CN202210623453.4A filed in CN, granted as CN114821263B (Active)
Patent Citations (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111179324A | 2019-12-30 | 2020-05-19 | 同济大学 | Object six-degree-of-freedom pose estimation method based on color and depth information fusion |
| CN113065546A | 2021-02-25 | 2021-07-02 | 湖南大学 | A target pose estimation method and system based on attention mechanism and Hough voting |
| CN113221647A | 2021-04-08 | 2021-08-06 | 湖南大学 | 6D pose estimation method fusing point cloud local features |
| CN113284184A | 2021-05-24 | 2021-08-20 | 湖南大学 | Robot RGBD visual perception oriented 6D pose estimation method and system |
| CN114299150A | 2021-12-31 | 2022-04-08 | 河北工业大学 | A deep 6D pose estimation network model and workpiece pose estimation method |
Cited By (9)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115331263A | 2022-09-19 | 2022-11-11 | 北京航空航天大学 | Robust attitude estimation method and application thereof in orientation judgment and related method |
| CN115331263B | 2022-09-19 | 2023-11-07 | 北京航空航天大学 | Robust attitude estimation method and its application in orientation judgment and related methods |
| CN115661929A | 2022-10-28 | 2023-01-31 | 北京此刻启动科技有限公司 | Time sequence feature coding method and device, electronic equipment and storage medium |
| CN115880717A | 2022-10-28 | 2023-03-31 | 北京此刻启动科技有限公司 | Heatmap key point prediction method and device, electronic equipment and storage medium |
| CN115661929B | 2022-10-28 | 2023-11-17 | 北京此刻启动科技有限公司 | Time sequence feature coding method and device, electronic equipment and storage medium |
| CN115880717B | 2022-10-28 | 2023-11-17 | 北京此刻启动科技有限公司 | Heat map key point prediction method and device, electronic equipment and storage medium |
| WO2025051040A1 | 2023-09-04 | 2025-03-13 | 华为技术有限公司 | Pose determination method and apparatus, and electronic device |
| CN118470114A | 2024-05-28 | 2024-08-09 | 广州维希尔智能技术有限公司 | A 6D pose estimation method for robot grasping tasks |
| CN118470114B | 2024-05-28 | 2025-02-11 | 广州维希尔智能技术有限公司 | 6D pose estimation method applied to robot grabbing task |
Also Published As

| Publication number | Publication date |
|---|---|
| CN114821263B | 2025-01-14 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |