CN114637880A - Cross-dimensional data retrieval method based on multi-view projection - Google Patents
Cross-dimensional data retrieval method based on multi-view projection
- Publication number
- CN114637880A CN114637880A CN202210151825.8A CN202210151825A CN114637880A CN 114637880 A CN114637880 A CN 114637880A CN 202210151825 A CN202210151825 A CN 202210151825A CN 114637880 A CN114637880 A CN 114637880A
- Authority
- CN
- China
- Prior art keywords
- dimensional
- point cloud
- image
- feature
- dimensional image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 61
- 238000013136 deep learning model Methods 0.000 claims abstract description 27
- 238000012545 processing Methods 0.000 claims abstract description 23
- 238000012549 training Methods 0.000 claims abstract description 17
- 230000008569 process Effects 0.000 claims abstract description 16
- 230000006870 function Effects 0.000 claims description 33
- 230000004927 fusion Effects 0.000 claims description 21
- 238000000605 extraction Methods 0.000 claims description 20
- 238000004590 computer program Methods 0.000 claims description 17
- 238000013135 deep learning Methods 0.000 claims description 5
- 238000010586 diagram Methods 0.000 description 14
- 238000012986 modification Methods 0.000 description 6
- 230000004048 modification Effects 0.000 description 6
- 239000013598 vector Substances 0.000 description 4
- 238000003672 processing method Methods 0.000 description 3
- 238000005070 sampling Methods 0.000 description 3
- 230000004075 alteration Effects 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 238000012800 visualization Methods 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5854—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5862—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/06—Topological mapping of higher dimensional structures onto lower dimensional surfaces
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Library & Information Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a cross-dimensional data retrieval method based on multi-view projection, which comprises the following steps: acquiring two-dimensional image data and the correspondingly matched original three-dimensional point clouds; performing voxelization processing on the correspondingly matched original three-dimensional point clouds to obtain corresponding voxels; projecting the corresponding voxels into a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image; constructing a deep learning model according to a twin network, and inputting the two-dimensional image data and the correspondingly matched point cloud multi-view projection images into the deep learning model for training; and acquiring a plurality of two-dimensional images to be retrieved and a plurality of three-dimensional point clouds, and performing two-dimensional-image-to-three-dimensional-point-cloud retrieval with the trained deep learning model to obtain, for each two-dimensional image to be retrieved, the best-matching three-dimensional point cloud among all the three-dimensional point clouds. The method thereby reduces the data difference between point cloud data and two-dimensional images in cross-dimensional matching and improves the retrieval accuracy from two-dimensional images to three-dimensional point clouds.
Description
Technical Field
The invention relates to the technical field of augmented reality, in particular to a multi-view projection-based cross-dimensional data retrieval method, a computer-readable storage medium and computer equipment.
Background
In the related art, retrieval-based pose estimation methods fall into two categories: retrieval from a two-dimensional image to two-dimensional images, and retrieval from a two-dimensional image to a three-dimensional model. In retrieval from a two-dimensional image to two-dimensional images, massive two-dimensional images are first reconstructed into a three-dimensional space, the target image is retrieved and matched against all images in that space, and the camera pose is then estimated by PnP; owing to the limitations of two-dimensional images, this approach is easily affected by viewing angle and is difficult to apply to complex scenes. Retrieval from a two-dimensional image to a three-dimensional model matches the two-dimensional image with a pre-established three-dimensional model; because the three-dimensional model has rotation invariance, this approach is more robust. However, research on cross-modal matching between two-dimensional images and three-dimensional point clouds remains insufficient, and differences in data dimension and data structure between the two make cross-dimensional matching difficult to complete, so the accuracy of cross-dimensional data retrieval is low.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, an object of the present invention is to provide a cross-dimensional data retrieval method based on multi-view projection, which reduces the data difference between point cloud data and two-dimensional images in cross-dimensional matching by means of a multi-view projection point cloud processing method, thereby improving the accuracy of retrieving three-dimensional point clouds from two-dimensional images.
A second object of the invention is to propose a computer-readable storage medium.
A third object of the invention is to propose a computer device.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a cross-dimensional data retrieval method based on multi-view projection, including the following steps: acquiring two-dimensional image data and an original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data; performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels; projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image; constructing a deep learning model according to a twin network, and inputting the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data into the deep learning model for training; the method comprises the steps of obtaining a plurality of two-dimensional images to be retrieved and three-dimensional point clouds, inputting the three-dimensional point clouds to be retrieved into a trained deep learning model to obtain point cloud feature description, inputting the two-dimensional images to be retrieved into the trained deep learning model to obtain image feature description, and retrieving the three-dimensional point clouds, which are most matched in all the three-dimensional point clouds, of each two-dimensional image to be retrieved according to the point cloud feature description and the image feature description.
According to the cross-dimensional data retrieval method based on multi-view projection, two-dimensional image data and an original three-dimensional point cloud which is correspondingly matched with each two-dimensional image in the two-dimensional image data are obtained; then carrying out voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels; then projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image which is correspondingly matched with each two-dimensional image; then, a deep learning model is built according to the twin network, and the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data are input into the deep learning model for training; finally, acquiring a plurality of two-dimensional images to be retrieved and three-dimensional point clouds, and retrieving each two-dimensional image to be retrieved based on the trained deep learning model to obtain the three-dimensional point cloud of each two-dimensional image to be retrieved, which is most matched with the three-dimensional point clouds; therefore, the data difference between the point cloud data and the two-dimensional image in the cross-dimensional matching is reduced through a multi-view projection point cloud processing method, and the retrieval accuracy rate from the two-dimensional image to the three-dimensional point cloud is improved.
In addition, the multi-view projection-based cross-dimensional data retrieval method proposed according to the above embodiment of the present invention may further have the following additional technical features:
optionally, performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain a corresponding voxel, including: performing voxelization space division by taking a cubic boundary frame of the original three-dimensional point cloud as a boundary; uniformly dividing a plurality of cubes in the divided voxelized space, and taking each cube as a voxel; the voxel value of each voxel is defined as the mean of the point cloud RGB values contained in each cube space.
Optionally, before performing the voxelization processing on the original three-dimensional point cloud corresponding to and matching with each two-dimensional image in the two-dimensional image data, the method further includes: and carrying out angle random rotation on the original three-dimensional point cloud so as to enable the obtained three-dimensional voxel to have rotational randomness.
Optionally, constructing a deep learning model from the twin network comprises:
constructing by adopting a deep learning twin network structure framework, and designing an asymmetric structure double-branch network with image branches and point cloud branches;
wherein the image branch comprises an image feature extraction network based on a convolutional network, so that the two-dimensional image data are processed by the image feature extraction network to obtain the corresponding image feature descriptions; the point cloud branch comprises a point cloud feature extraction network based on point cloud multi-view projection and a fusion network fusing point cloud texture features and point cloud structure features, wherein the point cloud feature extraction network comprises a texture sensor and a structure sensor, so that the point cloud data are processed by the texture sensor and the structure sensor to obtain point cloud texture features and point cloud structure features, and the fusion network receives the point cloud texture features and the point cloud structure features simultaneously and fuses them to obtain the corresponding point cloud feature descriptions; and designing a local feature loss function which, according to the two-dimensional image negative samples and the three-dimensional point cloud negative samples sampled during training, draws the image feature descriptions and the matching point cloud feature descriptions closer while pushing the image feature descriptions away from the three-dimensional point cloud negative samples and the point cloud feature descriptions away from the two-dimensional image negative samples.
Optionally, the texture sensor is configured to process the point cloud multi-view projection image and sense texture information contained in the point cloud; the texture sensor comprises a convolution network and a feature fusion function.
Optionally, the convolutional network is configured to process multi-angle projection to obtain n one-dimensional feature descriptions with fixed lengths d; the feature fusion function is a symmetric function insensitive to an input sequence, n one-dimensional feature descriptions with fixed length d are converted into one-dimensional feature description t with fixed length d through a feature fusion function, and the one-dimensional feature description t with fixed length d is used as a point cloud texture feature.
Optionally, the structure sensor is configured to process original point cloud data and sense structure information included in the original point cloud data, wherein the structure sensor obtains a one-dimensional feature description s with a fixed length d by using a feature extraction method based on a PointNet network structure, and the one-dimensional feature description s with the fixed length d is used as a point cloud structure feature.
In order to achieve the above object, a second embodiment of the present invention provides a computer-readable storage medium, on which a multi-view projection-based cross-dimensional data retrieval program is stored, which when executed by a processor implements the multi-view projection-based cross-dimensional data retrieval method as described above.
According to the computer-readable storage medium of the embodiment of the invention, the cross-dimensional data retrieval program based on multi-view projection is stored, so that the processor realizes the cross-dimensional data retrieval method based on multi-view projection when executing the cross-dimensional data retrieval program based on multi-view projection, thereby improving the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud.
In order to achieve the above object, a third embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the multi-view projection-based cross-dimensional data retrieval method as described above.
According to the computer equipment provided by the embodiment of the invention, the computer program capable of running on the processor is stored through the memory, so that the processor can realize the cross-dimensional data retrieval method based on multi-view projection when executing the computer program, and the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud is improved.
Drawings
FIG. 1 is a flow chart of a cross-dimensional data retrieval method based on multi-view projection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a two-dimensional three-dimensional common feature description network architecture according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a network model structure of point cloud branching according to one embodiment of the invention;
FIG. 4 is a schematic view of a planar point cloud visualization and its projection according to one embodiment of the present invention;
FIG. 5 is a two-dimensional image-three-dimensional point cloud matched in pairs according to one embodiment of the invention;
FIG. 6 is a schematic diagram of a negative sample sampling strategy based on difficult samples according to one embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the search result from two-dimensional image to three-dimensional point cloud according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a two-dimensional three-dimensional common feature descriptor visualization according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the technical scheme, the technical scheme is described in detail in the following with reference to the figures of the specification and the specific embodiments.
Fig. 1 is a schematic flowchart of a cross-dimensional data retrieval method based on multi-view projection according to an embodiment of the present invention. As shown in Fig. 1, the cross-dimensional data retrieval method based on multi-view projection according to the embodiment of the present invention includes the following steps:
Step 101, acquiring two-dimensional image data and an original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data.
As an embodiment, the two-dimensional image data may be acquired by a camera, and the original three-dimensional point cloud may be acquired by a laser radar, which is not particularly limited by the present invention.
Step 102, performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels.
As an embodiment, performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels includes: performing voxelization space division by taking the cubic bounding box of the original three-dimensional point cloud as the boundary; uniformly dividing the divided voxelization space into a plurality of cubes, each cube serving as one voxel; and defining the voxel value of each voxel as the mean of the RGB values of the points contained in each cube space.
It should be noted that the three-dimensional point cloud data are voxelized as a preparatory step for the subsequent projection.
As a specific embodiment, the method comprises the following steps:
S21, performing voxelization space division by taking the cubic bounding box of the original three-dimensional point cloud as the boundary.
S22, uniformly dividing the voxelization space obtained in S21 into 32×32×32 cubes, each small cube serving as one voxel; each cube is denoted V_{i,j,k}, where i, j and k are the cube indices along three mutually perpendicular directions.
S23, for a point cloud containing 1024 points in the data set, defining P = {p_0, p_1, …, p_1023} and constructing a zero matrix M_{32×32×32×1024} that records whether each spatial point falls in V_{i,j,k}; the voxel value of each voxel is defined as the mean of the RGB values of the points contained in the corresponding small cube space:

M_{i,j,k} = (1 / N_{i,j,k}) · Σ_{p_v ∈ V_{i,j,k}} p_v

where p_v denotes the RGB value of point v in the point cloud, N_{i,j,k} denotes the number of points falling in cube V_{i,j,k}, and M_{i,j,k} denotes the voxel value in the voxel space.
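By way of illustration, the voxelization of S21–S23 can be sketched as follows (a minimal NumPy example; the (N, 6) XYZ+RGB input layout and the treatment of empty voxels as zeros are assumptions not specified above):

```python
import numpy as np

def voxelize(points, resolution=32):
    """Voxelize a colored point cloud into a (res, res, res, 3) RGB grid.

    points: (N, 6) array with columns x, y, z, r, g, b (layout assumed).
    The voxel value is the mean RGB of the points falling in each cube,
    mirroring S21-S23; empty voxels stay zero.
    """
    xyz, rgb = points[:, :3], points[:, 3:6]
    # S21: use the cubic bounding box of the point cloud as the boundary.
    lo = xyz.min(axis=0)
    extent = (xyz.max(axis=0) - lo).max() + 1e-9   # side length of the cubic box
    # S22: uniformly divide the box into resolution^3 cubes (voxels).
    idx = np.clip((xyz - lo) / extent * resolution, 0, resolution - 1).astype(int)
    # S23: voxel value = mean RGB of the points contained in the cube.
    grid = np.zeros((resolution, resolution, resolution, 3))
    count = np.zeros((resolution, resolution, resolution, 1))
    for (i, j, k), color in zip(idx, rgb):
        grid[i, j, k] += color
        count[i, j, k] += 1
    return grid / np.maximum(count, 1)

# Example: a point cloud of 1024 colored points, as in the embodiment above.
voxels = voxelize(np.random.rand(1024, 6))   # shape (32, 32, 32, 3)
```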
As an embodiment, before performing the voxel processing on the original three-dimensional point cloud corresponding to and matching with each two-dimensional image in the two-dimensional image data, the method further includes: and carrying out angle random rotation on the original three-dimensional point cloud so as to enable the obtained three-dimensional voxel to have rotational randomness.
As a specific embodiment, the original three-dimensional point cloud is randomly rotated by angles (α, β, γ), so that the three-dimensional voxels obtained in step 102 have rotation randomness:

M = R_x(α) R_y(β) R_z(γ)

P′ = M P

where R denotes a rotation transformation about the corresponding coordinate axis, and applying the complete transformation M to the point cloud P yields the rotated point cloud P′.
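A brief sketch of this data-augmentation step (the rotation matrices R_x, R_y, R_z follow their standard definitions; drawing the angles uniformly from [0, 2π) is an assumption):

```python
import numpy as np

def random_rotation():
    """Compose M = Rx(alpha) Ry(beta) Rz(gamma) with randomly drawn angles."""
    a, b, g = np.random.uniform(0.0, 2.0 * np.pi, size=3)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(a), -np.sin(a)],
                   [0, np.sin(a),  np.cos(a)]])
    ry = np.array([[ np.cos(b), 0, np.sin(b)],
                   [0, 1, 0],
                   [-np.sin(b), 0, np.cos(b)]])
    rz = np.array([[np.cos(g), -np.sin(g), 0],
                   [np.sin(g),  np.cos(g), 0],
                   [0, 0, 1]])
    return rx @ ry @ rz

# P' = M P: rotate an (N, 3) point cloud before voxelization.
P = np.random.rand(1024, 3)
P_rotated = P @ random_rotation().T
```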
Step 103, projecting the corresponding voxels into a two-dimensional space to generate the point cloud multi-view projection image corresponding to each two-dimensional image.
That is, the voxels obtained in step 102 are projected onto the three mutually perpendicular planes xOy, yOz and xOz of the two-dimensional space, and each projection is stored as a 64×64 image, as shown in Fig. 4.
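A minimal sketch of this projection step is given below; how the third dimension is collapsed and how the views are resampled to 64×64 are not specified above, so averaging the non-empty voxels along each axis and omitting the resizing are assumptions:

```python
import numpy as np

def project_views(voxels):
    """Project a (R, R, R, 3) RGB voxel grid onto the xOy, xOz and yOz planes.

    The third axis is collapsed by averaging over the non-empty voxels,
    which is one simple choice; each view would then be resized to 64x64.
    """
    occupied = (voxels.sum(axis=-1, keepdims=True) > 0).astype(float)
    views = []
    for axis in (2, 1, 0):      # drop z -> xOy, drop y -> xOz, drop x -> yOz
        n = np.maximum(occupied.sum(axis=axis), 1.0)
        views.append(voxels.sum(axis=axis) / n)
    return views                # three (R, R, 3) projection images

xoy, xoz, yoz = project_views(np.random.rand(32, 32, 32, 3))
```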
Step 104, constructing a deep learning model according to a twin network, and inputting the two-dimensional image data and the point cloud multi-view projection image correspondingly matched with each two-dimensional image in the two-dimensional image data into the deep learning model for training.
That is, two-dimensional image blocks and three-dimensional point cloud blocks are collected in a scene and then input into the constructed deep learning model 2D3D-MVPNet to extract common feature descriptors. The deep learning model consists of a two-dimensional image branch and a three-dimensional point cloud branch, the point cloud branch being a novel network model that fuses multi-view projections; the 2D3D-MVPNet network structure is shown in Fig. 3.
As an embodiment, constructing a deep learning model according to a twin network includes: adopting a deep learning twin network structure framework, and designing an asymmetric dual-branch network with an image branch and a point cloud branch; the image branch comprises an image feature extraction network based on a convolutional network, so that the two-dimensional image data are processed by the image feature extraction network to obtain the corresponding image feature descriptions; the point cloud branch comprises a point cloud feature extraction network based on point cloud multi-view projection and a fusion network fusing point cloud texture features and point cloud structure features, wherein the point cloud feature extraction network comprises a texture perceptron and a structure perceptron, so that the point cloud data are processed by the texture perceptron and the structure perceptron to obtain point cloud texture features and point cloud structure features, and the fusion network receives the point cloud texture features and the point cloud structure features simultaneously and fuses them to obtain the corresponding point cloud feature descriptions; and designing a local feature loss function which, according to the two-dimensional image negative samples and the three-dimensional point cloud negative samples sampled during training, draws the image feature descriptions and the matching point cloud feature descriptions closer while pushing the image feature descriptions away from the three-dimensional point cloud negative samples and the point cloud feature descriptions away from the two-dimensional image negative samples.
As one embodiment, the texture perceptron is used for processing the point cloud multi-view projection image and perceiving texture information contained in the point cloud; the texture perceptron includes a convolutional network and a feature fusion function.
As an embodiment, the convolutional network is used for processing multi-angle projection to obtain n one-dimensional feature descriptions with fixed lengths d; the feature fusion function is a symmetric function insensitive to an input sequence, n one-dimensional feature descriptions with fixed length d are converted into a one-dimensional feature description t with fixed length d through the feature fusion function, and the one-dimensional feature description t with fixed length d is used as a point cloud texture feature.
As an embodiment, the structure perceptron is used for processing the original point cloud data and perceiving the structure information contained in the original point cloud data, wherein the structure perceptron adopts a feature extraction method based on the PointNet network structure to obtain a one-dimensional feature description s of fixed length d, and this one-dimensional feature description s of fixed length d is used as the point cloud structure feature.
That is, as shown in fig. 2, the building of the deep learning model includes the following steps:
S11, building a deep learning twin network structure framework, and designing an asymmetric dual-branch network with an image branch and a point cloud branch.
S12, designing an image feature extraction network structure based on a convolutional network, and processing the two-dimensional image data through the image feature extraction network. The network parameters of the image feature extraction network are C(32,4,2)-BN-ReLU-C(64,4,2)-BN-ReLU-C(128,4,2)-BN-ReLU-C(256,4,2)-BN-ReLU-C(256,4,4), where C(n, k, s) denotes a convolutional layer with n filters, convolution kernel size k and stride s, BN denotes batch normalization, and ReLU denotes the activation function. The network finally outputs a one-dimensional feature description p of fixed length 256, namely the image feature description;
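The image branch of S12 can be sketched as follows (a non-authoritative PyTorch reading of the listed parameters; padding is not specified above and is assumed to be 1 so that a 64×64 input, matching the size of the projection views, reduces to a 256-dimensional descriptor):

```python
import torch
import torch.nn as nn

def conv_block(c_in, n, k, s, pad=1):
    """C(n, k, s)-BN-ReLU block from S12 (the padding value is an assumption)."""
    return nn.Sequential(
        nn.Conv2d(c_in, n, kernel_size=k, stride=s, padding=pad),
        nn.BatchNorm2d(n),
        nn.ReLU(inplace=True),
    )

class ImageBranch(nn.Module):
    """Image feature extraction network: 64x64 patch -> 256-d descriptor p."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, 32, 4, 2),
            conv_block(32, 64, 4, 2),
            conv_block(64, 128, 4, 2),
            conv_block(128, 256, 4, 2),
            nn.Conv2d(256, 256, kernel_size=4, stride=4),  # final C(256,4,4)
        )

    def forward(self, x):                    # x: (B, 3, 64, 64)
        return self.features(x).flatten(1)   # -> (B, 256)

p = ImageBranch()(torch.randn(2, 3, 64, 64))  # p.shape == torch.Size([2, 256])
```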
S13, designing a point cloud feature extraction network structure based on point cloud multi-view projection, the point cloud feature extraction network comprising a texture perceptron and a structure perceptron. The texture perceptron is used for processing the point cloud multi-view projection views and perceiving the texture information contained in the point cloud; it consists of a convolutional network and a feature fusion function, where the parameters of the convolutional network are the same as those of the convolutional network in S12, and the convolutional network processes the multi-angle projections to obtain n one-dimensional feature descriptions {f_1, f_2, …, f_n} of fixed length 256. The feature fusion function is essentially a symmetric function insensitive to the input order; in this embodiment the sum function is selected as the feature fusion function, so that the n one-dimensional feature descriptions of fixed length 256 are transformed by the feature fusion function into a single one-dimensional feature description t of fixed length 256:

sum{f_1, f_2, …, f_n} = t

The structure perceptron is used for processing the original point cloud data and perceiving the structure information contained in the point cloud data; specifically, the structure perceptron adopts a feature extraction method based on the PointNet network structure to obtain a one-dimensional feature description s of fixed length 256.
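A sketch of the texture perceptron's view fusion is given below (the description above states that the convolutional backbone shares the parameters of S12, so the ImageBranch sketch may be reused; the sum is the order-insensitive fusion function):

```python
import torch

def texture_features(view_images, backbone):
    """Texture perceptron: shared CNN over n views + symmetric sum fusion.

    view_images: (B, n, 3, 64, 64) point cloud multi-view projections.
    backbone:    any module mapping (B, 3, 64, 64) -> (B, 256), e.g. ImageBranch().
    Returns t:   (B, 256) texture feature, invariant to the order of the views.
    """
    b, n = view_images.shape[:2]
    flat = view_images.reshape(b * n, *view_images.shape[2:])
    f = backbone(flat).reshape(b, n, -1)   # per-view descriptors f_1 ... f_n
    return f.sum(dim=1)                    # sum{f_1, ..., f_n} = t
```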
S14, designing a fusion network structure that fuses the point cloud texture features and structure features. In S13, the point cloud data are respectively input into the texture perceptron and the structure perceptron to obtain the texture feature t and the structure feature s, each of length 256. The fusion network is designed as a 2-layer fully connected network whose parameters are FC(512,256)-ReLU-FC(256,256), where FC(p, q) denotes mapping a one-dimensional vector of length p to a one-dimensional vector of length q through a neural network layer. The fusion network receives the texture feature t and the structure feature s simultaneously, concatenates t and s into a vector of length 512, and inputs it into the fully connected layers to obtain a one-dimensional point cloud feature description v of fixed length 256, namely the point cloud feature description;
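The fusion network of S14 can be sketched as follows (the structure feature s is assumed to come from a PointNet-style encoder, which is not shown here):

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """FC(512,256)-ReLU-FC(256,256): fuses texture feature t and structure feature s."""

    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, dim),
        )

    def forward(self, t, s):
        # Concatenate t and s into a length-512 vector, then map it to the
        # 256-d point cloud feature description v.
        return self.mlp(torch.cat([t, s], dim=1))

v = FusionNet()(torch.randn(2, 256), torch.randn(2, 256))  # (2, 256)
```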
and S15, designing a local feature loss function. S12 and S14 respectively obtain an image feature description p and a point cloud feature description v, and a loss function is according to a negative sample p sampled in a training processnAnd vnThe distance between the features p and v is reducednAnd features v and pnDistance (2) is sparse:
in the formula, d (p)i,vi) Representing image features piAnd point cloud characteristics viThe euclidean distance between them,andrespectively represent and piNearest non-matching point cloud feature sum and viThe closest non-matching image features.
As an example, the data set shown in Fig. 5 is used as the training data, and 280,000 two-dimensional image / three-dimensional point cloud matching pairs are used in the training process; on this basis, the network parameters for the common feature description are trained. This is specifically realized through the following steps:
and S41, setting training parameters, wherein the size of each batch is defined to be 64, and 64 pairs of two-dimensional image-three-dimensional point cloud matching pairs are input into the network each time.
S42, negative sample sampling strategy. Given the output image feature description p and point cloud feature description v, negative samples are constructed with a hard-sample strategy: each time, the two-dimensional image and the three-dimensional point cloud that the network finds hardest to distinguish are selected as the negative samples, and a triplet loss is constructed with the input two-dimensional image and three-dimensional point cloud as the positive sample. Hard-sample sampling is shown in Fig. 6: within a batch of training data, for a matching pair (p_i, v_j), the non-matching point cloud feature closest to p_i and the non-matching image feature closest to v_j are found; the distances to these two candidates are then compared, and the one with the smaller distance is selected as the negative sample of the matching pair (p_i, v_j).
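The hard-sample mining of S42 can be sketched as follows (assuming Euclidean distances within a batch of matched pairs; the function returns both hardest cross-modal negatives, from which the closer one can be taken as described above):

```python
import torch

def hardest_negatives(p, v):
    """For each matched pair (p_i, v_i) in a batch, find the hardest negatives.

    Returns the nearest non-matching point cloud feature for every p_i and
    the nearest non-matching image feature for every v_i.
    """
    dist = torch.cdist(p, v)                       # (B, B) pairwise distances
    eye = torch.eye(len(p), dtype=torch.bool)
    dist = dist.masked_fill(eye, float("inf"))     # exclude the matching pair itself
    v_neg = v[dist.argmin(dim=1)]                  # hardest point cloud negative per p_i
    p_neg = p[dist.argmin(dim=0)]                  # hardest image negative per v_i
    return v_neg, p_neg

# Usage with the loss sketch above:
#   v_neg, p_neg = hardest_negatives(p, v)
#   loss = local_feature_loss(p, v, v_neg, p_neg)
```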
Step 105, acquiring a plurality of two-dimensional images to be retrieved and a plurality of three-dimensional point clouds, inputting the plurality of three-dimensional point clouds into the trained deep learning model to obtain point cloud feature descriptions, inputting the plurality of two-dimensional images to be retrieved into the trained deep learning model to obtain image feature descriptions, and retrieving, according to the point cloud feature descriptions and the image feature descriptions, the three-dimensional point cloud that best matches each two-dimensional image to be retrieved among all the three-dimensional point clouds.
That is, the trained network has the capability of extracting common feature descriptors, and the quality of the feature descriptors is quantified through a retrieval task; the retrieval results are shown in Fig. 7, and the specific steps are as follows:
and S51, selecting the network model with the best effect and the structure S1 as an engineering model.
S52, inputting all two-dimensional images {P_1, P_2, P_3, …, P_n} into the image branch of the model to obtain the two-dimensional image features {p_1, p_2, p_3, …, p_n}.
S53, inputting all three-dimensional point clouds {V_1, V_2, V_3, …, V_n} into the point cloud branch of the model to obtain the three-dimensional point cloud features {v_1, v_2, v_3, …, v_n}.
S54, for a single two-dimensional image feature p_i, retrieving the nearest feature description v_j from all three-dimensional point cloud features {v_1, v_2, v_3, …, v_n}. If i equals j, the retrieval counts as a successful two-dimensional-image-to-three-dimensional-point-cloud retrieval for the TOP1 accuracy; if j is among the five nearest neighbours retrieved for i, it counts as successful for the TOP5 accuracy. The TOP1 feature descriptors of successful retrievals are visualized as shown in Fig. 8. Taking the TOP1 and TOP5 accuracy as the evaluation criteria, the formula is as follows:

Accuracy = TP / (TP + TN) = TP / n

where TP denotes the number of successfully retrieved samples, TN denotes the number of samples for which retrieval failed, and n is the total number of samples.
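The retrieval evaluation of S52–S54 can be sketched as follows (a minimal example assuming that matched image and point cloud descriptors share the same index):

```python
import torch

def topk_accuracy(img_feats, pc_feats, ks=(1, 5)):
    """Retrieve the nearest point cloud for every image and score TOP-k accuracy.

    img_feats, pc_feats: (n, 256) descriptors; index i of each forms a matched pair.
    Accuracy = TP / n, where TP counts queries whose match appears in the top k.
    """
    dist = torch.cdist(img_feats, pc_feats)            # (n, n) distances
    ranks = dist.argsort(dim=1)                         # nearest point clouds first
    target = torch.arange(len(img_feats)).unsqueeze(1)  # the matching index per query
    return {f"TOP{k}": (ranks[:, :k] == target).any(dim=1).float().mean().item()
            for k in ks}

scores = topk_accuracy(torch.randn(100, 256), torch.randn(100, 256))
```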
The invention provides a common feature description network framework for two-dimensional images and three-dimensional point clouds, and provides a methodological basis for pose estimation. The data difference between point cloud data and two-dimensional images in cross-dimensional matching is reduced through a point cloud feature extractor based on multi-angle point cloud projection, thereby improving the retrieval accuracy from two-dimensional images to three-dimensional point clouds; the feature fusion problem of the point cloud multi-projection technique is solved by fusing the unordered feature inputs through a symmetric function; and a large amount of data is used to train the two-dimensional-three-dimensional common feature description, replacing manually designed feature descriptions with an automated deep learning method, which saves labor cost and improves the efficiency and accuracy of machine pose estimation.
In summary, according to the cross-dimensional data retrieval method based on multi-view projection of the embodiment of the present invention, first, two-dimensional image data and an original three-dimensional point cloud corresponding to each two-dimensional image in the two-dimensional image data are obtained; performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels; then projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image which is correspondingly matched with each two-dimensional image; then, a deep learning model is constructed according to the twin network, and the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data are input into the deep learning model for training; finally, a plurality of two-dimensional images and three-dimensional point clouds to be retrieved are obtained, and each two-dimensional image to be retrieved is retrieved based on the trained deep learning model so as to obtain the three-dimensional point cloud which is the best match of each two-dimensional image to be retrieved in all the three-dimensional point clouds; therefore, the data difference between the point cloud data and the two-dimensional image in the cross-dimensional matching process is reduced through a multi-view projection point cloud processing method, and the retrieval accuracy rate from the two-dimensional image to the three-dimensional point cloud is improved.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a cross-dimensional data retrieval program based on multi-view projection is stored, and when executed by a processor, the cross-dimensional data retrieval program based on multi-view projection implements the cross-dimensional data retrieval method based on multi-view projection as described above.
According to the computer-readable storage medium of the embodiment of the invention, the cross-dimensional data retrieval program based on multi-view projection is stored, so that the processor realizes the cross-dimensional data retrieval method based on multi-view projection when executing the cross-dimensional data retrieval program based on multi-view projection, thereby improving the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud.
In addition, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the cross-dimensional data retrieval method based on multi-view projection as described above is implemented.
According to the computer equipment provided by the embodiment of the invention, the computer program capable of running on the processor is stored through the memory, so that the processor can realize the cross-dimensional data retrieval method based on multi-view projection when executing the computer program, and the retrieval accuracy of the two-dimensional image to the three-dimensional point cloud is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically limited otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected or detachably connected, or integrated; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through the interconnection of two elements or through the interaction of two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply mean that the first feature is at a higher level than the second feature. A first feature "under," "beneath," and "under" a second feature may be directly under or obliquely under the second feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the above-described terms are not to be understood as necessarily referring to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples described in this specification, as well as features of various embodiments or examples, may be combined and combined by those skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (9)
1. A cross-dimensional data retrieval method based on multi-view projection is characterized by comprising the following steps:
acquiring two-dimensional image data and an original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data;
performing voxelization processing on the original three-dimensional point cloud correspondingly matched with each two-dimensional image in the two-dimensional image data to obtain corresponding voxels;
projecting the corresponding voxels to a two-dimensional space to generate a point cloud multi-view projection image corresponding to each two-dimensional image;
constructing a deep learning model according to a twin network, and inputting the two-dimensional image data and the point cloud multi-view projection image which is correspondingly matched with each two-dimensional image in the two-dimensional image data into the deep learning model for training;
the method comprises the steps of obtaining a plurality of two-dimensional images to be retrieved and three-dimensional point clouds, inputting the three-dimensional point clouds to be retrieved into a trained deep learning model to obtain point cloud feature description, inputting the two-dimensional images to be retrieved into the trained deep learning model to obtain image feature description, and retrieving the three-dimensional point clouds, which are most matched in all the three-dimensional point clouds, of each two-dimensional image to be retrieved according to the point cloud feature description and the image feature description.
2. The multi-view projection-based cross-dimensional data retrieval method of claim 1, wherein performing voxelization processing on the original three-dimensional point cloud corresponding to each two-dimensional image in the two-dimensional image data to obtain corresponding voxels comprises:
performing voxelization space division by taking a cubic boundary frame of the original three-dimensional point cloud as a boundary;
uniformly dividing a plurality of cubes in the divided voxelized space, and taking each cube as a voxel;
the voxel value of each voxel is defined as the mean of the point cloud RGB values contained in each cube space.
3. The multi-view projection-based cross-dimensional data retrieval method of claim 2, wherein before performing the voxelization processing on the original three-dimensional point cloud corresponding to each two-dimensional image in the two-dimensional image data, the method further comprises: and carrying out angle random rotation on the original three-dimensional point cloud so as to enable the obtained three-dimensional voxel to have rotation randomness.
4. The multi-view projection-based cross-dimensional data retrieval method of claim 2, wherein constructing a deep learning model from a twin network comprises:
constructing by adopting a deep learning twin network structure framework, and designing an asymmetric structure double-branch network with image branches and point cloud branches;
wherein the image branch comprises an image feature extraction network based on a convolution network, so that two-dimensional image data is processed through the image feature extraction network to obtain corresponding image feature description;
the point cloud branch comprises a point cloud characteristic extraction network based on point cloud multi-view projection and a fusion network fusing point cloud texture characteristics and point cloud structural characteristics, wherein the point cloud characteristic extraction network comprises a texture sensor and a structure sensor so as to process point cloud data through the texture sensor and the structure sensor to obtain point cloud texture characteristics and point cloud structural characteristics; fusing a fusion network of the point cloud texture features and the point cloud structural features, receiving the point cloud texture features and the point cloud structural features at the same time, and fusing to obtain corresponding point cloud characteristic descriptions;
and designing a local characteristic loss function, wherein the loss function enables the distance between the image characteristic description and the point cloud characteristic description to be shortened according to the two-dimensional image negative sample and the three-dimensional point cloud negative sample sampled in the training process, and enables the distance between the image characteristic description and the three-dimensional point cloud negative sample and the distance between the point cloud characteristic description and the two-dimensional image negative sample to be distant.
5. The multi-view projection-based cross-dimensional data retrieval method of claim 4, wherein the texture perceptron is configured to process the point cloud multi-view projection image and to perceive texture information contained in the point cloud; the texture perceptron includes a convolutional network and a feature fusion function.
6. The multi-view projection-based cross-dimensional data retrieval method of claim 5, wherein the convolutional network is used for processing the multi-angle projections to obtain n one-dimensional feature descriptions of fixed length d; the feature fusion function is a symmetric function insensitive to the input order, the n one-dimensional feature descriptions of fixed length d are converted by the feature fusion function into one one-dimensional feature description t of fixed length d, and the one-dimensional feature description t of fixed length d is used as the point cloud texture feature.
7. The method as claimed in claim 5, wherein the structure sensor is configured to process original point cloud data and sense structure information contained in the original point cloud data, wherein the structure sensor employs a feature extraction method based on a PointNet network structure to obtain a one-dimensional feature description s with a fixed length d, and the one-dimensional feature description s with the fixed length d is used as the point cloud structure feature.
8. A computer-readable storage medium, on which a multi-view projection-based cross-dimensional data retrieval program is stored, which, when executed by a processor, implements the multi-view projection-based cross-dimensional data retrieval method according to any one of claims 1 to 7.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the multi-perspective projection based cross-dimensional data retrieval method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151825.8A CN114637880B (en) | 2022-02-18 | 2022-02-18 | Cross-dimension data retrieval method based on multi-view projection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151825.8A CN114637880B (en) | 2022-02-18 | 2022-02-18 | Cross-dimension data retrieval method based on multi-view projection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114637880A true CN114637880A (en) | 2022-06-17 |
CN114637880B CN114637880B (en) | 2024-07-19 |
Family
ID=81946603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210151825.8A Active CN114637880B (en) | 2022-02-18 | 2022-02-18 | Cross-dimension data retrieval method based on multi-view projection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114637880B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170213093A1 (en) * | 2016-01-27 | 2017-07-27 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for detecting vehicle contour based on point cloud data |
CN108765475A (en) * | 2018-05-25 | 2018-11-06 | 厦门大学 | A kind of building three-dimensional point cloud method for registering based on deep learning |
CN111243094A (en) * | 2020-01-09 | 2020-06-05 | 南京理工大学 | Three-dimensional model accurate voxelization method based on lighting method |
CN112037138A (en) * | 2020-07-29 | 2020-12-04 | 大连理工大学 | Method for completing cloud scene semantics of single depth map point |
CN112766229A (en) * | 2021-02-08 | 2021-05-07 | 南京林业大学 | Human face point cloud image intelligent identification system and method based on attention mechanism |
CN113052066A (en) * | 2021-03-24 | 2021-06-29 | 中国科学技术大学 | Multi-mode fusion method based on multi-view and image segmentation in three-dimensional target detection |
CN113628329A (en) * | 2021-08-20 | 2021-11-09 | 天津大学 | Zero-sample sketch three-dimensional point cloud retrieval method |
Non-Patent Citations (2)
Title |
---|
HUA SHUNGANG; LI CHUNZE: "3D model similarity calculation based on a deep learning method", Mechanical & Electrical Engineering Technology, no. 09, 20 September 2020 (2020-09-20) *
LIANG ZHENBIN; XIONG FENGGUANG; HAN XIE; TAO QIAN: "Point cloud matching based on deep learning", Computer Engineering and Design, no. 06, 15 June 2020 (2020-06-15) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117197063A (en) * | 2023-08-30 | 2023-12-08 | 深圳职业技术学院 | Point cloud quality evaluation method based on multi-view projection and transducer model and related products |
Also Published As
Publication number | Publication date |
---|---|
CN114637880B (en) | 2024-07-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Dual-resolution correspondence networks | |
Laskar et al. | Camera relocalization by computing pairwise relative poses using convolutional neural network | |
EP3279803B1 (en) | Picture display method and device | |
Snavely | Scene reconstruction and visualization from internet photo collections: A survey | |
Agarwal et al. | Building rome in a day | |
Hartmann et al. | Recent developments in large-scale tie-point matching | |
Agarwal et al. | Reconstructing rome | |
Sun et al. | A dataset for benchmarking image-based localization | |
CN111260794B (en) | Outdoor augmented reality application method based on cross-source image matching | |
Santos et al. | 3D plant modeling: localization, mapping and segmentation for plant phenotyping using a single hand-held camera | |
CN109842811B (en) | Method and device for implanting push information into video and electronic equipment | |
WO2022126529A1 (en) | Positioning method and device, and unmanned aerial vehicle and storage medium | |
Avraham et al. | Nerfels: renderable neural codes for improved camera pose estimation | |
CN108734773A (en) | A kind of three-dimensional rebuilding method and system for mixing picture | |
CN114241141B (en) | Smooth object three-dimensional reconstruction method and device, computer equipment and storage medium | |
Nousias et al. | A saliency aware CNN-based 3D model simplification and compression framework for remote inspection of heritage sites | |
CN114637880B (en) | Cross-dimension data retrieval method based on multi-view projection | |
CN111340889A (en) | Method for automatically acquiring matched image block and point cloud ball based on vehicle-mounted laser scanning | |
CN113298871B (en) | Map generation method, positioning method, system thereof, and computer-readable storage medium | |
Pultar | Improving the hardnet descriptor | |
Imre et al. | Calibration of nodal and free-moving cameras in dynamic scenes for post-production | |
Boin et al. | Efficient panorama database indexing for indoor localization | |
Skuratovskyi et al. | Outdoor mapping framework: from images to 3d model | |
Munoz-Silva et al. | A Survey on Point Cloud Generation for 3D Scene Reconstruction | |
Vincent et al. | RECONSTRUCTION OF 3D MODEL FROM 2D SURVEILLANCE IMAGES |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |