CN114637880A - Cross-dimensional data retrieval method based on multi-view projection - Google Patents

Cross-dimensional data retrieval method based on multi-view projection

Info

Publication number
CN114637880A
CN114637880A (application CN202210151825.8A)
Authority
CN
China
Prior art keywords
dimensional
point cloud
image
feature
dimensional image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210151825.8A
Other languages
Chinese (zh)
Other versions
CN114637880B (en)
Inventor
刘伟权
王程
赖柏锜
臧彧
沈思淇
温程璐
程明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202210151825.8A priority Critical patent/CN114637880B/en
Publication of CN114637880A publication Critical patent/CN114637880A/en
Application granted granted Critical
Publication of CN114637880B publication Critical patent/CN114637880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/5854 Retrieval characterised by using metadata automatically derived from the content using shape and object relationship
    • G06F 16/5862 Retrieval characterised by using metadata automatically derived from the content using texture
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a cross-dimensional data retrieval method based on multi-view projection, comprising the following steps: acquiring two-dimensional image data and the correspondingly matched original three-dimensional point clouds; voxelizing each correspondingly matched original three-dimensional point cloud to obtain corresponding voxels; projecting the corresponding voxels into a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image; constructing a deep learning model based on a twin network and training it on the two-dimensional image data and the correspondingly matched point cloud multi-view projection images; and acquiring a plurality of two-dimensional images to be retrieved and three-dimensional point clouds, then retrieving, for each two-dimensional image to be retrieved and based on the trained deep learning model, the best-matching three-dimensional point cloud among all the three-dimensional point clouds. In this way, the data gap between point cloud data and two-dimensional images in cross-dimensional matching can be reduced, and the accuracy of retrieving three-dimensional point clouds from two-dimensional images is improved.

Description

Cross-dimensional data retrieval method based on multi-view projection
Technical Field
The invention relates to the technical field of augmented reality, in particular to a multi-view projection-based cross-dimensional data retrieval method, a computer-readable storage medium and computer equipment.
Background
In the related art, retrieval-based pose estimation methods fall into two categories: retrieval from a two-dimensional image to two-dimensional images, and retrieval from a two-dimensional image to a three-dimensional model. In two-dimensional-to-two-dimensional retrieval, massive two-dimensional images are first reconstructed into a three-dimensional space, the target image is retrieved and matched against all images in that space, and the camera pose is then estimated via PnP; owing to the limitations of two-dimensional images, this approach is easily affected by viewing angle and is difficult to apply to complex scenes. Retrieval from a two-dimensional image to a three-dimensional model matches the two-dimensional image against a pre-built three-dimensional model; because the three-dimensional model is rotation-invariant, it offers higher robustness. However, research on cross-modal matching between two-dimensional images and three-dimensional point clouds is still lacking, and the differences in data dimension and data structure between the two make cross-dimensional matching difficult to complete, so the accuracy of cross-dimensional data retrieval is low.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art described above. Therefore, an object of the present invention is to provide a cross-dimensional data retrieval method based on multi-view projection, which uses a multi-view projection point cloud processing method to reduce the data gap between point cloud data and two-dimensional images in cross-dimensional matching, thereby improving the accuracy of retrieving three-dimensional point clouds from two-dimensional images.
A second object of the invention is to propose a computer-readable storage medium.
A third object of the invention is to propose a computer device.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a cross-dimensional data retrieval method based on multi-view projection, including the following steps: acquiring two-dimensional image data and an original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data; performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels; projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image; constructing a deep learning model according to a twin network, and inputting the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data into the deep learning model for training; the method comprises the steps of obtaining a plurality of two-dimensional images to be retrieved and three-dimensional point clouds, inputting the three-dimensional point clouds to be retrieved into a trained deep learning model to obtain point cloud feature description, inputting the two-dimensional images to be retrieved into the trained deep learning model to obtain image feature description, and retrieving the three-dimensional point clouds, which are most matched in all the three-dimensional point clouds, of each two-dimensional image to be retrieved according to the point cloud feature description and the image feature description.
According to the cross-dimensional data retrieval method based on multi-view projection of the embodiment of the invention, two-dimensional image data and the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data are first obtained; the original three-dimensional point cloud correspondingly matched with each two-dimensional image is then voxelized to obtain the corresponding voxels; the corresponding voxels are projected into a two-dimensional space to generate the point cloud multi-view projection image correspondingly matched with each two-dimensional image; a deep learning model is then constructed from a twin network, and the two-dimensional image data together with the correspondingly matched point cloud multi-view projection images are input into the deep learning model for training; finally, a plurality of two-dimensional images to be retrieved and three-dimensional point clouds are obtained, and for each two-dimensional image to be retrieved the best-matching three-dimensional point cloud among all the three-dimensional point clouds is retrieved based on the trained deep learning model; in this way, the data gap between point cloud data and two-dimensional images in cross-dimensional matching is reduced through a multi-view projection point cloud processing method, and the accuracy of retrieval from two-dimensional images to three-dimensional point clouds is improved.
In addition, the multi-view projection-based cross-dimensional data retrieval method proposed according to the above embodiment of the present invention may further have the following additional technical features:
optionally, performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain a corresponding voxel, including: performing voxelization space division by taking a cubic boundary frame of the original three-dimensional point cloud as a boundary; uniformly dividing a plurality of cubes in the divided voxelized space, and taking each cube as a voxel; the voxel value of each voxel is defined as the mean of the point cloud RGB values contained in each cube space.
Optionally, before performing the voxelization processing on the original three-dimensional point cloud corresponding to and matching with each two-dimensional image in the two-dimensional image data, the method further includes: and carrying out angle random rotation on the original three-dimensional point cloud so as to enable the obtained three-dimensional voxel to have rotational randomness.
Optionally, constructing a deep learning model from the twin network comprises:
constructing the model on a deep learning twin network framework, and designing a two-branch network of asymmetric structure with an image branch and a point cloud branch;
wherein the image branch comprises an image feature extraction network based on a convolutional network, so that two-dimensional image data are processed by the image feature extraction network to obtain the corresponding image feature descriptions; the point cloud branch comprises a point cloud feature extraction network based on point cloud multi-view projection and a fusion network that fuses point cloud texture features and point cloud structure features, the point cloud feature extraction network comprising a texture sensor and a structure sensor, so that point cloud data are processed by the texture sensor and the structure sensor to obtain the point cloud texture features and point cloud structure features; the fusion network receives the point cloud texture features and the point cloud structure features simultaneously and fuses them to obtain the corresponding point cloud feature descriptions; and designing a local feature loss function which, based on the two-dimensional image negative samples and three-dimensional point cloud negative samples sampled during training, pulls the image feature description and the point cloud feature description closer together and pushes apart the distance between the image feature description and the three-dimensional point cloud negative sample and between the point cloud feature description and the two-dimensional image negative sample.
Optionally, the texture sensor is configured to process the point cloud multi-view projection image and sense texture information contained in the point cloud; the texture sensor comprises a convolution network and a feature fusion function.
Optionally, the convolutional network is configured to process multi-angle projection to obtain n one-dimensional feature descriptions with fixed lengths d; the feature fusion function is a symmetric function insensitive to an input sequence, n one-dimensional feature descriptions with fixed length d are converted into one-dimensional feature description t with fixed length d through a feature fusion function, and the one-dimensional feature description t with fixed length d is used as a point cloud texture feature.
Optionally, the structure sensor is configured to process original point cloud data and sense structure information included in the original point cloud data, wherein the structure sensor obtains a one-dimensional feature description s with a fixed length d by using a feature extraction method based on a PointNet network structure, and the one-dimensional feature description s with the fixed length d is used as a point cloud structure feature.
In order to achieve the above object, a second embodiment of the present invention provides a computer-readable storage medium, on which a multi-view projection-based cross-dimensional data retrieval program is stored, which when executed by a processor implements the multi-view projection-based cross-dimensional data retrieval method as described above.
According to the computer-readable storage medium of the embodiment of the invention, the cross-dimensional data retrieval program based on multi-view projection is stored, so that the processor realizes the cross-dimensional data retrieval method based on multi-view projection when executing the cross-dimensional data retrieval program based on multi-view projection, thereby improving the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud.
In order to achieve the above object, a third embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the multi-view projection-based cross-dimensional data retrieval method as described above.
According to the computer equipment provided by the embodiment of the invention, the computer program capable of running on the processor is stored through the memory, so that the processor can realize the cross-dimensional data retrieval method based on multi-view projection when executing the computer program, and the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud is improved.
Drawings
FIG. 1 is a flow chart of a cross-dimensional data retrieval method based on multi-view projection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a two-dimensional three-dimensional common feature description network architecture according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a network model structure of point cloud branching according to one embodiment of the invention;
FIG. 4 is a schematic view of a planar point cloud visualization and its projection according to one embodiment of the present invention;
FIG. 5 is a two-dimensional image-three-dimensional point cloud matched in pairs according to one embodiment of the invention;
FIG. 6 is a schematic diagram of a negative sample sampling strategy based on difficult samples according to one embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the search result from two-dimensional image to three-dimensional point cloud according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a two-dimensional three-dimensional common feature descriptor visualization according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the technical scheme, the technical scheme is described in detail in the following with reference to the figures of the specification and the specific embodiments.
Fig. 1 is a schematic flowchart of a cross-dimensional data retrieval method based on multi-view projection according to an embodiment of the present invention, and as shown in fig. 1, the cross-dimensional data retrieval method based on multi-view projection according to an embodiment of the present invention includes the following steps:
step 101, acquiring two-dimensional image data and an original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data.
As an embodiment, the two-dimensional image data may be acquired by a camera, and the original three-dimensional point cloud may be acquired by a laser radar, which is not particularly limited in the present invention.
Step 102, performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels.
As an embodiment, performing voxelization processing on an original three-dimensional point cloud corresponding to and matching each two-dimensional image in two-dimensional image data to obtain a corresponding voxel includes: performing voxelization space division by taking a cubic boundary frame of the original three-dimensional point cloud as a boundary; uniformly dividing a plurality of cubes in the divided voxelized space, and taking each cube as a voxel; the voxel value of each voxel is defined as the mean of the point cloud RGB values contained in each cube space.
It should be noted that the three-dimensional point cloud data is voxelized in preparation for the subsequent projection step.
As a specific embodiment, the method comprises the following steps:
and S21, dividing the voxelized space by taking the cubic boundary frame of the original three-dimensional point cloud as a boundary.
S22, uniformly dividing the voxelized space obtained in S21 into 32x32x32 cubes, each small cube serving as a voxel; each cube is denoted V_{i,j,k}, where i, j and k index the cubes along three mutually perpendicular directions.
S23, for a point cloud containing 1024 points in the data set, defined as P = {p_0, p_1, ..., p_1023}, constructing a zero matrix M of size 32x32x32x1024 to record whether each point of the cloud falls in V_{i,j,k}; the voxel value of each voxel is defined as the mean of the RGB values of the points contained in that small cube's space:
M_{i,j,k} = mean{ p_v : p_v falls in V_{i,j,k} }
where p_v denotes the RGB value of a point of the point cloud and M_{i,j,k} denotes the voxel value at that position of the voxel space.
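For illustration only, the voxelization of S21 to S23 can be sketched in a few lines of NumPy; the function name, the (N, 6) point layout (XYZ followed by RGB) and the zero value assigned to empty voxels are assumptions of this sketch rather than requirements of the method.

```python
import numpy as np

def voxelize_point_cloud(points, resolution=32):
    """Voxelize an (N, 6) point cloud (XYZ + RGB) into a resolution^3 grid
    whose voxel values are the mean RGB of the points falling inside each
    cube, as described in steps S21-S23 (empty voxels stay zero)."""
    xyz, rgb = points[:, :3], points[:, 3:6]

    # S21: the cubic bounding box of the cloud delimits the voxel space.
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    extent = np.maximum(hi - lo, 1e-9)                 # avoid division by zero

    # S22: uniformly divide the box into resolution^3 cubes, one voxel each.
    idx = np.clip((xyz - lo) / extent * resolution, 0, resolution - 1).astype(int)

    # S23: voxel value = mean RGB of the points falling in that cube.
    grid_sum = np.zeros((resolution,) * 3 + (3,))
    grid_cnt = np.zeros((resolution,) * 3 + (1,))
    np.add.at(grid_sum, (idx[:, 0], idx[:, 1], idx[:, 2]), rgb)
    np.add.at(grid_cnt, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid_sum / np.maximum(grid_cnt, 1.0)
```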
As an embodiment, before performing the voxel processing on the original three-dimensional point cloud corresponding to and matching with each two-dimensional image in the two-dimensional image data, the method further includes: and carrying out angle random rotation on the original three-dimensional point cloud so as to enable the obtained three-dimensional voxel to have rotational randomness.
As a specific embodiment, the original three-dimensional point cloud is randomly rotated by an angle (α, β, γ), so that the three-dimensional voxel obtained in step 102 has rotation randomness:
Rx(α) = [[1, 0, 0], [0, cos α, -sin α], [0, sin α, cos α]]
Ry(β) = [[cos β, 0, sin β], [0, 1, 0], [-sin β, 0, cos β]]
Rz(γ) = [[cos γ, -sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
M=Rx(α)Ry(β)Rz(γ)
P′=MP
wherein R denotes a rotation about the corresponding coordinate axis, and applying the complete transformation M to the point cloud P yields the rotated point cloud P'.
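As a simple illustration of this augmentation step, the rotation matrices above can be assembled and applied as follows; the uniform sampling range of (α, β, γ) and the row-wise point layout are assumptions of this sketch.

```python
import numpy as np

def random_rotation(points):
    """Apply a random rotation M = Rx(a) Ry(b) Rz(g) to an (N, 3) point cloud."""
    a, b, g = np.random.uniform(0.0, 2.0 * np.pi, size=3)

    rx = np.array([[1, 0, 0],
                   [0, np.cos(a), -np.sin(a)],
                   [0, np.sin(a),  np.cos(a)]])
    ry = np.array([[ np.cos(b), 0, np.sin(b)],
                   [ 0, 1, 0],
                   [-np.sin(b), 0, np.cos(b)]])
    rz = np.array([[np.cos(g), -np.sin(g), 0],
                   [np.sin(g),  np.cos(g), 0],
                   [0, 0, 1]])

    m = rx @ ry @ rz              # complete transformation M = Rx(a)Ry(b)Rz(g)
    return points @ m.T           # P' = M P, applied to row vectors
```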
And 103, projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image.
That is, the voxels obtained in step 102 are projected onto the three mutually perpendicular planes xOy, yOz and xOz of the two-dimensional space, and the three projections are stored as 64x64 images, as shown in fig. 4.
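The projection rule itself is not spelled out beyond the three target planes and the 64x64 image size, so the following NumPy sketch makes two assumptions: each view averages the occupied voxels along the axis perpendicular to its plane, and the 32x32 result is upsampled to 64x64 by pixel repetition.

```python
import numpy as np

def project_voxels(grid, out_size=64):
    """Project an (R, R, R, 3) RGB voxel grid onto the xOy, yOz and xOz planes.

    Each view is the mean RGB over occupied voxels along the perpendicular
    axis, upsampled to out_size x out_size. Returns an array (3, out, out, 3).
    """
    occupied = (grid.sum(axis=-1, keepdims=True) > 0).astype(float)

    views = []
    for axis in (2, 0, 1):                             # xOy, yOz, xOz planes
        mean_rgb = grid.sum(axis=axis) / np.maximum(occupied.sum(axis=axis), 1.0)
        scale = out_size // mean_rgb.shape[0]          # e.g. 32 -> 64
        views.append(np.kron(mean_rgb, np.ones((scale, scale, 1))))
    return np.stack(views)
```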
And 104, constructing a deep learning model according to the twin network, and inputting the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data into the deep learning model for training.
That is, two-dimensional picture blocks and three-dimensional point cloud blocks collected in a scene are input into the constructed deep learning model 2D3D-MVPNet to extract common feature descriptors; the deep learning model consists of a two-dimensional picture branch and a three-dimensional point cloud branch, the point cloud branch being a novel network model that fuses multi-view projections, and the 2D3D-MVPNet network structure is shown in FIG. 3.
As an embodiment, constructing a deep learning model from a twin network includes: building on a deep learning twin network framework and designing a two-branch network of asymmetric structure with an image branch and a point cloud branch; the image branch comprises an image feature extraction network based on a convolutional network, so that two-dimensional image data are processed by the image feature extraction network to obtain the corresponding image feature descriptions; the point cloud branch comprises a point cloud feature extraction network based on point cloud multi-view projection and a fusion network that fuses point cloud texture features and point cloud structure features, the point cloud feature extraction network comprising a texture sensor and a structure sensor, so that point cloud data are processed by the texture sensor and the structure sensor to obtain the point cloud texture features and point cloud structure features; the fusion network receives the point cloud texture features and the point cloud structure features simultaneously and fuses them to obtain the corresponding point cloud feature descriptions; and designing a local feature loss function which, based on the two-dimensional image negative samples and three-dimensional point cloud negative samples sampled during training, pulls the image feature description and the point cloud feature description closer together and pushes apart the distance between the image feature description and the three-dimensional point cloud negative sample and between the point cloud feature description and the two-dimensional image negative sample.
As one embodiment, the texture perceptron is used for processing the point cloud multi-view projection image and perceiving texture information contained in the point cloud; the texture perceptron includes a convolutional network and a feature fusion function.
As an embodiment, the convolutional network is used for processing multi-angle projection to obtain n one-dimensional feature descriptions with fixed lengths d; the feature fusion function is a symmetric function insensitive to an input sequence, n one-dimensional feature descriptions with fixed length d are converted into a one-dimensional feature description t with fixed length d through the feature fusion function, and the one-dimensional feature description t with fixed length d is used as a point cloud texture feature.
As an embodiment, the structure sensor is used for processing original point cloud data and sensing structure information contained in the original point cloud data, wherein the structure sensor obtains a one-dimensional feature description s with a fixed length d by adopting a feature extraction method based on a PointNet network structure, and the one-dimensional feature description s with the fixed length d is used as a point cloud structure feature.
That is, as shown in fig. 2, the building of the deep learning model includes the following steps:
s11, building a deep learning twin network structure framework, and designing an asymmetric structure double-branch network with image branches and point cloud branches.
S12, designing an image feature extraction network structure based on a convolutional network and processing the two-dimensional image data through this network; the network parameters are C(32,4,2)-BN-ReLU-C(64,4,2)-BN-ReLU-C(128,4,2)-BN-ReLU-C(256,4,2)-BN-ReLU-C(256,4,4), where C(n,k,s) denotes a convolutional layer with n filters, convolution kernel size k and stride s, BN denotes batch normalization, and ReLU denotes the activation function. The network finally outputs a one-dimensional feature description p of fixed length 256, namely the image feature description;
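A minimal PyTorch sketch of this image branch is given below; the 64x64 input size, the padding of 1 on the stride-2 convolutions and the class name are assumptions, since only the C(n,k,s)-BN-ReLU sequence and the 256-dimensional output are fixed here.

```python
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    """C(32,4,2)-BN-ReLU-C(64,4,2)-BN-ReLU-C(128,4,2)-BN-ReLU-
    C(256,4,2)-BN-ReLU-C(256,4,4): maps a 64x64 patch to a 256-d descriptor p."""

    def __init__(self, in_channels=3):
        super().__init__()
        layers, channels = [], in_channels
        for out_channels in (32, 64, 128, 256):
            layers += [nn.Conv2d(channels, out_channels, 4, stride=2, padding=1),
                       nn.BatchNorm2d(out_channels),
                       nn.ReLU(inplace=True)]
            channels = out_channels
        layers.append(nn.Conv2d(channels, 256, 4, stride=4))   # 4x4 -> 1x1
        self.net = nn.Sequential(*layers)

    def forward(self, x):                 # x: (B, 3, 64, 64)
        return self.net(x).flatten(1)     # (B, 256) image feature description p

# ImageBranch()(torch.randn(8, 3, 64, 64)).shape  ->  torch.Size([8, 256])
```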
s13, designing a point cloud feature extraction network structure based on point cloud multi-view projection, wherein the point cloud feature extraction network comprises a texture sensor and a structure sensor: the texture perceptron is used for processing the point cloud multi-view projection view and perceiving texture information contained in the point cloud; the method consists of a convolution network and a feature fusion function, wherein the parameters of the convolution network are the same as those of the convolution network in S12, and the convolution network processes multi-angle projection to obtain n one-dimensional feature descriptions { f } with fixed length of 2561,f2,…,fn}; the feature fusion function is essentially a symmetric function insensitive to the input sequence, in this example, the method selects sum function as the feature fusion function, so that n one-dimensional feature descriptions of fixed length 256 are transformed from the feature fusion function into one-dimensional feature description t of fixed length 256:
sum{f1,f2,...,fn}=t
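Reusing the ImageBranch sketch above as the shared view network, the texture perceptron then reduces to a per-view encoding followed by the symmetric sum fusion; the (B, n, 3, 64, 64) tensor layout is an assumption of this sketch.

```python
import torch.nn as nn

class TexturePerceptron(nn.Module):
    """Encode the n projection views with a shared CNN, then fuse the n
    per-view 256-d features with the order-insensitive sum: t = sum{f1..fn}."""

    def __init__(self):
        super().__init__()
        self.view_cnn = ImageBranch()            # same parameters as the image branch

    def forward(self, views):                    # views: (B, n, 3, 64, 64)
        b, n = views.shape[:2]
        f = self.view_cnn(views.flatten(0, 1))   # (B*n, 256) per-view features
        return f.view(b, n, -1).sum(dim=1)       # (B, 256) texture feature t
```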
the structure sensor is used for processing the original point cloud data and sensing the structure information contained in the point cloud data; specifically, the structure sensor adopts a feature extraction method based on a PointNet network structure to obtain a one-dimensional feature description s with the fixed length of 256.
S14, designing the fusion network structure that fuses the point cloud texture features and structure features; in S13 the point cloud data are input into the texture sensor and the structure sensor respectively, yielding a texture feature t and a structure feature s of length 256; the fusion network is designed as a 2-layer fully connected network with parameters FC(512,256)-ReLU-FC(256,256), where FC(p,q) denotes mapping a one-dimensional vector of length p to a one-dimensional vector of length q through a neural network layer; the fusion network receives the texture feature t and the structure feature s simultaneously, concatenates t and s into a vector of length 512 and feeds it into the fully connected layers to obtain a one-dimensional point cloud feature description v of fixed length 256, namely the point cloud feature description;
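The structure sensor is only described as a PointNet-style extractor, so the per-point layer widths in the sketch below are assumptions; the fusion network follows the stated FC(512,256)-ReLU-FC(256,256) layout.

```python
import torch
import torch.nn as nn

class StructurePerceptron(nn.Module):
    """PointNet-style sketch: a shared per-point MLP followed by a symmetric
    max-pool, giving a 256-d structure feature s (layer widths assumed)."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU(inplace=True),
                                 nn.Conv1d(64, 128, 1), nn.ReLU(inplace=True),
                                 nn.Conv1d(128, 256, 1))

    def forward(self, xyz):                      # xyz: (B, 3, 1024)
        return self.mlp(xyz).max(dim=2).values   # (B, 256) structure feature s

class FusionNetwork(nn.Module):
    """FC(512,256)-ReLU-FC(256,256): concatenate t and s (length 512) and map
    them to the 256-d point cloud feature description v."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(inplace=True),
                                nn.Linear(256, 256))

    def forward(self, t, s):                     # t, s: (B, 256) each
        return self.fc(torch.cat([t, s], dim=1))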
and S15, designing a local feature loss function. S12 and S14 respectively obtain an image feature description p and a point cloud feature description v, and a loss function is according to a negative sample p sampled in a training processnAnd vnThe distance between the features p and v is reducednAnd features v and pnDistance (2) is sparse:
Figure BDA0003510700670000071
in the formula, d (p)i,vi) Representing image features piAnd point cloud characteristics viThe euclidean distance between them,
Figure BDA0003510700670000072
and
Figure BDA0003510700670000073
respectively represent and piNearest non-matching point cloud feature sum and viThe closest non-matching image features.
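The exact loss formula appears in the original only as an image; the sketch below therefore uses a conventional triplet-margin form consistent with the description (pull d(p_i, v_i) closer, push d(p_i, v_i^-) and d(v_i, p_i^-) apart), and the margin value is an assumption.

```python
import torch

def local_feature_loss(p, v, margin=1.0):
    """Triplet-style loss over a batch of matched 256-d descriptors.

    p, v: (B, 256) image and point cloud features, row i of p matching row i
    of v. The hardest in-batch negatives are v_i^- (non-matching point cloud
    feature closest to p_i) and p_i^- (non-matching image feature closest
    to v_i)."""
    dist = torch.cdist(p, v)                          # (B, B) Euclidean distances
    pos = dist.diagonal()                             # d(p_i, v_i)

    masked = dist + torch.eye(len(p), device=p.device) * 1e6   # hide the positives
    neg_v = masked.min(dim=1).values                  # d(p_i, v_i^-)
    neg_p = masked.min(dim=0).values                  # d(v_i, p_i^-)

    loss = torch.relu(margin + pos - neg_v) + torch.relu(margin + pos - neg_p)
    return loss.mean()
```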
As an example, the data set shown in fig. 5 is used as training data, and 280,000 two-dimensional image / three-dimensional point cloud matching pairs are used during training; on this basis, the network parameters for the common feature description are trained and extracted. The steps are specifically realized as follows:
and S41, setting training parameters, wherein the size of each batch is defined to be 64, and 64 pairs of two-dimensional image-three-dimensional point cloud matching pairs are input into the network each time.
And S42, negative sample sampling strategy. Given the output image feature description p and point cloud feature description v, negative samples are constructed with a hard-sample strategy: each time, the two-dimensional image and the three-dimensional point cloud that are hardest for the network to distinguish are selected as negative samples, and a triplet loss is built with the input two-dimensional image and three-dimensional point cloud as the positive pair. Hard-sample mining is shown in FIG. 6: within a batch of training data, for a matching pair (p_i, v_i), find v_i^- and p_i^-, which denote, respectively, the non-matching point cloud feature closest to the image feature p_i and the non-matching image feature closest to v_i; the distances d(p_i, v_i^-) and d(v_i, p_i^-) are then compared, and the sample giving the smaller distance is selected as the negative sample of the matching pair (p_i, v_i).
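The selection rule of S42 (take the closer of v_i^- and p_i^- as the single negative of the pair) can be expressed directly on the in-batch distance matrix; the sketch below only returns the mined distances and leaves the triplet construction to the loss sketch above.

```python
import torch

def mine_hard_negatives(p, v):
    """For each matching pair (p_i, v_i) in a batch, return the positive
    distance d(p_i, v_i), the distance to the single hardest negative
    (the closer of v_i^- and p_i^-), and whether that negative is a point
    cloud feature (True) or an image feature (False)."""
    dist = torch.cdist(p, v)                                    # (B, B)
    pos = dist.diagonal()

    masked = dist + torch.eye(len(p), device=p.device) * 1e6    # ignore positives
    neg_v = masked.min(dim=1).values                            # d(p_i, v_i^-)
    neg_p = masked.min(dim=0).values                            # d(v_i, p_i^-)

    return pos, torch.minimum(neg_v, neg_p), neg_v <= neg_p
```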
And 105, acquiring a plurality of two-dimensional images and three-dimensional point clouds to be retrieved, inputting the plurality of three-dimensional point clouds to be retrieved into a trained deep learning model to obtain point cloud feature description, inputting the plurality of two-dimensional images to be retrieved into the trained deep learning model to obtain image feature description, and retrieving the three-dimensional point clouds, which are most matched in all the three-dimensional point clouds, of each two-dimensional image to be retrieved according to the point cloud feature description and the image feature description.
That is, the trained network has the capability of extracting common feature descriptors, and the quality of the feature descriptors is quantified through a retrieval task; the search result is shown in fig. 7, and the specific steps are as follows:
and S51, selecting the network model with the best effect and the structure S1 as an engineering model.
S52, all two-dimensional pictures {P_1, P_2, P_3, ..., P_n} are input into the picture branch of the model to obtain the two-dimensional picture features {p_1, p_2, p_3, ..., p_n}.
S53, all three-dimensional point clouds {V_1, V_2, V_3, ..., V_n} are input into the point cloud branch of the model to obtain the three-dimensional point cloud features {v_1, v_2, v_3, ..., v_n}.
S54, for a single two-dimensional picture feature p_i, the nearest feature description v_j is retrieved from all three-dimensional point cloud features {v_1, v_2, v_3, ..., v_n}. If i equals j, the query counts as a successful two-dimensional picture to three-dimensional point cloud retrieval for the TOP1 accuracy; if j is among the five nearest neighbors retrieved for i, it counts as a successful retrieval for the TOP5 accuracy. The feature descriptors of the successful TOP1 retrieval results are visualized as shown in fig. 8. Taking the TOP1 and TOP5 accuracies as the evaluation criteria, the formula is as follows:
Accuracy = TP / (TP + TN) = TP / n
wherein TP represents the number of successfully searched samples, TN represents the number of failed searched samples, and n is the total number of samples.
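A small NumPy sketch of the retrieval evaluation of S52 to S54 is given below; it assumes the matched picture and point cloud features are stored row-aligned, which is how the accuracy TP / n is counted here.

```python
import numpy as np

def retrieval_accuracy(image_feats, cloud_feats, top_k=(1, 5)):
    """TOP1/TOP5 accuracy for two-dimensional picture to three-dimensional
    point cloud retrieval: query i succeeds at TOP-k if its own point cloud
    feature is among its k nearest neighbours; Accuracy = TP / n."""
    diff = image_feats[:, None, :] - cloud_feats[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # (n, n) Euclidean distances

    ranking = np.argsort(dist, axis=1)              # nearest point clouds first
    n = len(image_feats)
    return {f"TOP{k}": sum(i in ranking[i, :k] for i in range(n)) / n
            for k in top_k}
```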
The invention provides a common feature description network framework for two-dimensional images and three-dimensional point clouds and offers a methodological idea for pose estimation; the data gap between point cloud data and two-dimensional pictures in cross-dimensional matching is reduced by a point cloud feature extractor based on point cloud multi-angle projection, thereby improving the accuracy of retrieval from two-dimensional pictures to three-dimensional point clouds; the feature fusion problem of the point cloud multi-projection technique is solved by fusing the unordered feature inputs with a symmetric function; and a large amount of data is used to train the two-dimensional/three-dimensional common feature description, replacing hand-crafted feature descriptors with an automated deep learning method, which saves labor cost and improves the efficiency and accuracy of machine pose estimation.
In summary, according to the cross-dimensional data retrieval method based on multi-view projection of the embodiment of the present invention, first, two-dimensional image data and an original three-dimensional point cloud corresponding to each two-dimensional image in the two-dimensional image data are obtained; performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels; then projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image which is correspondingly matched with each two-dimensional image; then, a deep learning model is constructed according to the twin network, and the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data are input into the deep learning model for training; finally, a plurality of two-dimensional images and three-dimensional point clouds to be retrieved are obtained, and each two-dimensional image to be retrieved is retrieved based on the trained deep learning model so as to obtain the three-dimensional point cloud which is the best match of each two-dimensional image to be retrieved in all the three-dimensional point clouds; therefore, the data difference between the point cloud data and the two-dimensional image in the cross-dimensional matching process is reduced through a multi-view projection point cloud processing method, and the retrieval accuracy rate from the two-dimensional image to the three-dimensional point cloud is improved.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a cross-dimensional data retrieval program based on multi-view projection is stored, and when executed by a processor, the cross-dimensional data retrieval program based on multi-view projection implements the cross-dimensional data retrieval method based on multi-view projection as described above.
According to the computer-readable storage medium of the embodiment of the invention, the cross-dimensional data retrieval program based on multi-view projection is stored, so that the processor realizes the cross-dimensional data retrieval method based on multi-view projection when executing the cross-dimensional data retrieval program based on multi-view projection, thereby improving the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud.
In addition, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the cross-dimensional data retrieval method based on multi-view projection as described above is implemented.
According to the computer equipment provided by the embodiment of the invention, the computer program capable of running on the processor is stored through the memory, so that the processor can realize the cross-dimensional data retrieval method based on multi-view projection when executing the computer program, and the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected or detachably connected, or integrated; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through the interconnection of two elements or through the interaction of two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply mean that the first feature is at a higher level than the second feature. A first feature "under," "beneath," and "under" a second feature may be directly under or obliquely under the second feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the above-described terms are not to be understood as necessarily referring to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples described in this specification, as well as features of various embodiments or examples, may be combined and combined by those skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (9)

1. A cross-dimensional data retrieval method based on multi-view projection is characterized by comprising the following steps:
acquiring two-dimensional image data and an original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data;
performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels;
projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image;
constructing a deep learning model according to a twin network, and inputting the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data into the deep learning model for training;
the method comprises the steps of obtaining a plurality of two-dimensional images to be retrieved and three-dimensional point clouds, inputting the three-dimensional point clouds to be retrieved into a trained deep learning model to obtain point cloud feature description, inputting the two-dimensional images to be retrieved into the trained deep learning model to obtain image feature description, and retrieving the three-dimensional point clouds, which are most matched in all the three-dimensional point clouds, of each two-dimensional image to be retrieved according to the point cloud feature description and the image feature description.
2. The multi-view projection-based cross-dimensional data retrieval method of claim 1, wherein performing voxelization processing on the original three-dimensional point cloud corresponding to each two-dimensional image in the two-dimensional image data to obtain corresponding voxels comprises:
performing voxelization space division by taking a cubic boundary frame of the original three-dimensional point cloud as a boundary;
uniformly dividing a plurality of cubes in the divided voxelized space, and taking each cube as a voxel;
the voxel value of each voxel is defined as the mean of the point cloud RGB values contained in each cube space.
3. The multi-view projection-based cross-dimensional data retrieval method of claim 2, wherein before performing the voxelization processing on the original three-dimensional point cloud corresponding to each two-dimensional image in the two-dimensional image data, the method further comprises: and carrying out angle random rotation on the original three-dimensional point cloud so as to enable the obtained three-dimensional voxel to have rotation randomness.
4. The multi-view projection-based cross-dimensional data retrieval method of claim 2, wherein constructing a deep learning model from a twin network comprises:
constructing by adopting a deep learning twin network structure framework, and designing an asymmetric structure double-branch network with image branches and point cloud branches;
wherein the image branch comprises an image feature extraction network based on a convolution network, so that two-dimensional image data is processed through the image feature extraction network to obtain corresponding image feature description;
the point cloud branch comprises a point cloud characteristic extraction network based on point cloud multi-view projection and a fusion network fusing point cloud texture characteristics and point cloud structural characteristics, wherein the point cloud characteristic extraction network comprises a texture sensor and a structure sensor so as to process point cloud data through the texture sensor and the structure sensor to obtain point cloud texture characteristics and point cloud structural characteristics; fusing a fusion network of the point cloud texture features and the point cloud structural features, receiving the point cloud texture features and the point cloud structural features at the same time, and fusing to obtain corresponding point cloud characteristic descriptions;
and designing a local characteristic loss function, wherein the loss function enables the distance between the image characteristic description and the point cloud characteristic description to be shortened according to the two-dimensional image negative sample and the three-dimensional point cloud negative sample sampled in the training process, and enables the distance between the image characteristic description and the three-dimensional point cloud negative sample and the distance between the point cloud characteristic description and the two-dimensional image negative sample to be distant.
5. The multi-view projection-based cross-dimensional data retrieval method of claim 4, wherein the texture perceptron is configured to process the point cloud multi-view projection image and to perceive texture information contained in the point cloud; the texture perceptron includes a convolutional network and a feature fusion function.
6. The multi-view projection-based cross-dimensional data retrieval method of claim 5, wherein the convolutional network is used for processing multi-angle projection to obtain n one-dimensional feature descriptions with fixed length d; the feature fusion function is a symmetric function insensitive to an input sequence, the one-dimensional feature description of n fixed lengths d is converted into one-dimensional feature description t of one fixed length d by the feature fusion function, and the one-dimensional feature description t of one fixed length d is used as a point cloud texture feature.
7. The method as claimed in claim 5, wherein the structure sensor is configured to process original point cloud data and sense structure information contained in the original point cloud data, wherein the structure sensor employs a feature extraction method based on a PointNet network structure to obtain a one-dimensional feature description s with a fixed length d, and the one-dimensional feature description s with the fixed length d is used as the point cloud structure feature.
8. A computer-readable storage medium, on which a multi-view projection-based cross-dimensional data retrieval program is stored, which, when executed by a processor, implements the multi-view projection-based cross-dimensional data retrieval method according to any one of claims 1 to 7.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the multi-perspective projection based cross-dimensional data retrieval method according to any one of claims 1-7.
CN202210151825.8A 2022-02-18 2022-02-18 Cross-dimension data retrieval method based on multi-view projection Active CN114637880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151825.8A CN114637880B (en) 2022-02-18 2022-02-18 Cross-dimension data retrieval method based on multi-view projection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210151825.8A CN114637880B (en) 2022-02-18 2022-02-18 Cross-dimension data retrieval method based on multi-view projection

Publications (2)

Publication Number Publication Date
CN114637880A true CN114637880A (en) 2022-06-17
CN114637880B CN114637880B (en) 2024-07-19

Family

ID=81946603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151825.8A Active CN114637880B (en) 2022-02-18 2022-02-18 Cross-dimension data retrieval method based on multi-view projection

Country Status (1)

Country Link
CN (1) CN114637880B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197063A (en) * 2023-08-30 2023-12-08 深圳职业技术学院 Point cloud quality evaluation method based on multi-view projection and Transformer model and related products

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170213093A1 (en) * 2016-01-27 2017-07-27 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting vehicle contour based on point cloud data
CN108765475A (en) * 2018-05-25 2018-11-06 厦门大学 A kind of building three-dimensional point cloud method for registering based on deep learning
CN111243094A (en) * 2020-01-09 2020-06-05 南京理工大学 Three-dimensional model accurate voxelization method based on lighting method
CN112037138A (en) * 2020-07-29 2020-12-04 大连理工大学 Method for completing cloud scene semantics of single depth map point
CN112766229A (en) * 2021-02-08 2021-05-07 南京林业大学 Human face point cloud image intelligent identification system and method based on attention mechanism
CN113052066A (en) * 2021-03-24 2021-06-29 中国科学技术大学 Multi-mode fusion method based on multi-view and image segmentation in three-dimensional target detection
CN113628329A (en) * 2021-08-20 2021-11-09 天津大学 Zero-sample sketch three-dimensional point cloud retrieval method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
华顺刚; 李春泽: "Similarity calculation of three-dimensional models based on deep learning methods" (基于深度学习方法的三维模型相似度计算), Mechanical & Electrical Engineering Technology (机电工程技术), no. 09, 20 September 2020 (2020-09-20) *
梁振斌; 熊风光; 韩燮; 陶谦: "Point cloud matching based on deep learning" (基于深度学习的点云匹配), Computer Engineering and Design (计算机工程与设计), no. 06, 15 June 2020 (2020-06-15) *

Also Published As

Publication number Publication date
CN114637880B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
Li et al. Dual-resolution correspondence networks
Laskar et al. Camera relocalization by computing pairwise relative poses using convolutional neural network
EP3279803B1 (en) Picture display method and device
Snavely Scene reconstruction and visualization from internet photo collections: A survey
Agarwal et al. Building rome in a day
Hartmann et al. Recent developments in large-scale tie-point matching
Agarwal et al. Reconstructing rome
Sun et al. A dataset for benchmarking image-based localization
CN111260794B (en) Outdoor augmented reality application method based on cross-source image matching
Santos et al. 3D plant modeling: localization, mapping and segmentation for plant phenotyping using a single hand-held camera
CN109842811B (en) Method and device for implanting push information into video and electronic equipment
WO2022126529A1 (en) Positioning method and device, and unmanned aerial vehicle and storage medium
Avraham et al. Nerfels: renderable neural codes for improved camera pose estimation
CN108734773A (en) A kind of three-dimensional rebuilding method and system for mixing picture
CN114241141B (en) Smooth object three-dimensional reconstruction method and device, computer equipment and storage medium
Nousias et al. A saliency aware CNN-based 3D model simplification and compression framework for remote inspection of heritage sites
CN114637880B (en) Cross-dimension data retrieval method based on multi-view projection
CN111340889A (en) Method for automatically acquiring matched image block and point cloud ball based on vehicle-mounted laser scanning
CN113298871B (en) Map generation method, positioning method, system thereof, and computer-readable storage medium
Pultar Improving the hardnet descriptor
Imre et al. Calibration of nodal and free-moving cameras in dynamic scenes for post-production
Boin et al. Efficient panorama database indexing for indoor localization
Skuratovskyi et al. Outdoor mapping framework: from images to 3d model
Munoz-Silva et al. A Survey on Point Cloud Generation for 3D Scene Reconstruction
Vincent et al. RECONSTRUCTION OF 3D MODEL FROM 2D SURVEILLANCE IMAGES

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant