CN115409931A - Three-dimensional reconstruction method based on image and point cloud data fusion

Three-dimensional reconstruction method based on image and point cloud data fusion

Info

Publication number
CN115409931A
CN115409931A (application CN202211342750.8A)
Authority
CN
China
Prior art keywords
point
observed
point cloud
vector
feature map
Prior art date
Legal status
Granted
Application number
CN202211342750.8A
Other languages
Chinese (zh)
Other versions
CN115409931B (en)
Inventor
李骏
李想
杨苏
周方明
Current Assignee
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Original Assignee
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Lichuang Zhiheng Electronic Technology Co., Ltd.
Priority to CN202211342750.8A
Publication of CN115409931A
Application granted
Publication of CN115409931B
Status: Active
Anticipated expiration

Classifications

    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/005: General purpose rendering architectures
    • G06T 15/10: Geometric effects
    • G06T 15/20: Perspective computation
    • G06T 15/205: Image-based rendering
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Image registration using feature-based methods
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06T 2207/10016: Video; image sequence

Abstract

The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, and relates to the fields of computer vision and computer graphics. The three-dimensional reconstruction method obtains a panoramic point cloud of a measured object through point cloud registration and fusion; then, according to the corresponding image data, saliency feature extraction and multi-scale aggregation feature extraction are performed to obtain a salient feature vector and a multi-scale aggregation feature vector for each point in the panoramic point cloud, and point-based volume rendering is performed with a neural radiance field to obtain a three-dimensional model with near-real color and texture information.

Description

Three-dimensional reconstruction method based on image and point cloud data fusion
Technical Field
The application relates to the field of computer vision and computer graphics, in particular to a three-dimensional reconstruction method based on image and point cloud data fusion.
Background
The cost of manually building a three-dimensional model is high: the work not only requires considerable expertise but is also time-consuming. Virtual reality requires large numbers of three-dimensional models of characters, objects, scenes and the like with high geometric accuracy and complex colors and textures, so three-dimensional reconstruction technology plays a very important role in AR, VR and the metaverse. How to reconstruct or generate three-dimensional models quickly and with high quality is therefore a key problem in computer vision and computer graphics.
A point cloud is a set of data points obtained by measuring the surface of an inspected object with a three-dimensional measuring device. As point cloud data becomes more and more convenient to acquire, the point cloud has become a very important form of three-dimensional data. By performing multi-view point cloud registration and fusion with deep learning techniques, a geometric model of a scene can be reconstructed quickly and accurately.
Current three-dimensional reconstruction technology based on point cloud data focuses on reconstructing the three-dimensional geometric structure and usually comprises the following steps: point cloud data acquisition, point cloud preprocessing, point cloud registration and fusion, and three-dimensional surface generation. After point cloud registration and fusion, an original three-dimensional model is obtained; at this stage the model consists of a batch of discrete points, and three-dimensional surface generation turns the surface of the three-dimensional object into a set of planes, i.e., into a continuous surface. These steps achieve geometric reconstruction of the three-dimensional object or scene, but the reconstructed three-dimensional model lacks texture and color information, so the reconstruction result is not realistic enough.
Disclosure of Invention
In order to solve the problem that a three-dimensional model obtained by the existing three-dimensional reconstruction method lacks texture and color information, so that the reconstruction result is not realistic enough, the application provides a three-dimensional reconstruction method based on image and point cloud data fusion, a terminal device, and a computer-readable storage medium.
The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps:
acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one;
registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object;
respectively extracting salient features and describing multi-scale aggregation features for a plurality of image data in an image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud;
calculating by using a first full-connection network according to the position information of the target point, the position information of the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain an observation characteristic vector of the target point relative to the point to be observed, wherein the target point is any point except the point to be observed in the panoramic point cloud;
performing aggregation calculation according to the observation feature vectors, relative to the point to be observed, of the k points closest to the point to be observed and the corresponding salient feature vectors to obtain an appearance description vector of the point to be observed;
calculating the observation characteristic vector of the target point relative to the point to be observed by using a second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed;
performing aggregation calculation according to the observation density vectors, relative to the point to be observed, of the k points closest to the point to be observed and the corresponding salient feature vectors to obtain volume density information of the point to be observed;
performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point;
and calculating by using a third full-connection network according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed to obtain the radiation information of the point to be observed relative to the observation sampling point.
In some embodiments, calculating, using the first fully-connected network, according to the position information of the target point, the position information of the point to be observed, and the multi-scale aggregated feature vector of the target point, to obtain an observed feature vector of the target point relative to the point to be observed, includes:
subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed;
splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector;
and calculating the spliced multi-scale aggregation characteristic vector by using a first full-connection network to obtain an observation characteristic vector of the target point relative to the point to be observed.
In some embodiments, the appearance description vector of the point to be observed is obtained by an aggregation calculation according to the following formula:

$$ f_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, F_{p_i,x}, \qquad w_i = \frac{1}{\lVert p_i - x \rVert} $$

where f_x represents the appearance description vector of the point to be observed, i denotes the i-th target point, A_i denotes the salient feature vector corresponding to the i-th target point, w_i is the inverse distance weight, p_i is the position information of the i-th target point, x is the position information of the point to be observed, and F_{p_i,x} represents the observation feature vector of the i-th target point relative to the point to be observed.

The volume density information of the point to be observed is obtained by an aggregation calculation according to the following formula:

$$ \sigma_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, \sigma_{p_i,x} $$

where σ_x represents the volume density information of the point to be observed and σ_{p_i,x} represents the observation density vector of the i-th target point relative to the point to be observed.
In some embodiments, performing a position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point includes:
subtracting the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point;
and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
In some embodiments, saliency extraction is performed on image data, including:
carrying out multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32;
processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map correspondingly, wherein the number of output channels of the saliency extraction network is 1;
and multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by the corresponding saliency weights respectively, and then summing the results to obtain the saliency feature map of the image data.
In some embodiments, multi-scale aggregation feature description of the image data includes:
multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map;
and stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to the channel dimension to obtain a multi-scale aggregation feature map of the image data.
In some embodiments, a plurality of point cloud data in the point cloud sequence are registered and fused to obtain a panoramic point cloud of the measured object; the method comprises the following steps:
sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data;
sequentially fusing two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence;
taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1;
and obtaining the panoramic point cloud of the measured object.
In some embodiments, registering two adjacent point cloud data in the point cloud sequence in sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data includes:
obtaining a first initial geometric feature and a second initial geometric feature by using a point cloud encoder based on FCGF, wherein the first initial geometric feature corresponds to one of two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data;
obtaining a first target geometric feature corresponding to the first initial geometric feature and a second target geometric feature corresponding to the second initial geometric feature by using a point cloud decoder based on FCGF;
and obtaining a rotation matrix and a translation vector from the first target geometric feature and the second target geometric feature by using the RANSAC algorithm.
A second aspect of the present application provides a terminal apparatus, comprising: at least one processor and memory;
a memory for storing program instructions;
and a processor for calling and executing the program instructions stored in the memory to make the terminal device execute the three-dimensional reconstruction method provided by the first aspect of the present application.
A third aspect of the present application provides a computer-readable storage medium.
the computer-readable storage medium has stored therein instructions, which when run on a computer, cause the computer to perform the three-dimensional reconstruction method provided in the first aspect of the present application.
The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps: acquiring a point cloud sequence and an image sequence of a measured object; registering and fusing the three-dimensional point cloud data to obtain a panoramic point cloud of the measured object; obtaining a salient feature vector and a multi-scale aggregation feature vector corresponding to each point according to the two-dimensional image data; obtaining an observation feature vector of the target point relative to the point to be observed according to the position information of the point to be observed, the position information of the target point and the multi-scale aggregation feature vector; aggregating the observation feature vectors and salient feature vectors of the nearest k points relative to the point to be observed to obtain an appearance description vector and volume density information of the point to be observed; and obtaining radiation information of the point to be observed relative to the observation sampling point according to the appearance description vector and position information of the point to be observed and the position information of the observation sampling point. According to this three-dimensional reconstruction method, an initial three-dimensional model is generated through point cloud registration and fusion; salient feature extraction and multi-scale aggregation feature extraction are then carried out according to the image data to obtain the salient feature vectors and multi-scale aggregation feature vectors of all points in the panoramic point cloud, and point-based volume rendering is carried out with a neural radiance field to obtain a three-dimensional model with near-real color and texture information.
Drawings
Fig. 1 is a schematic work flow diagram of a three-dimensional reconstruction method based on image and point cloud data fusion according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a process of obtaining a coordinate transformation relationship between two adjacent point clouds;
fig. 3 is a schematic diagram illustrating a process of acquiring salient feature images and multi-scale aggregation feature images of image data.
Detailed Description
In order to solve the problem that a three-dimensional model obtained by the conventional three-dimensional reconstruction method lacks texture and color information, so that the reconstruction result is not realistic enough, the application provides a three-dimensional reconstruction method based on image and point cloud data fusion through the following embodiments.
Referring to fig. 1, a three-dimensional reconstruction method based on image and point cloud data fusion provided by the embodiment of the present application includes steps 101 to 109.
101, acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one.
A measured object (an object or a scene) to be modeled is selected, and multi-view sequential point cloud data and color image data of the measured object are acquired from the same positions and orientations using structured light or other methods. The collected data are required to cover the entire surface of the object or scene.
And 102, registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object.
Here, the image data and the point cloud data correspond one to one, that is, each image and its corresponding point cloud are obtained from the same viewing angle, and their position information corresponds to each other: back-projecting the position information of the image data yields the position information of the corresponding point cloud data and, correspondingly, projecting the position information of the point cloud data yields the two-dimensional position information of the corresponding image data. Therefore, while the point cloud data are registered and fused to obtain the panoramic point cloud of the measured object, the coordinate registration and fusion relationships among the image data can be determined at the same time.
Point cloud registration refers to finding the rotation matrix and translation vector between two point clouds, and point cloud fusion refers to fusing the two point clouds into a new point cloud according to that rotation matrix and translation vector. In some embodiments, step 102 includes steps 201 to 204.
Referring to fig. 2, a schematic diagram of a process of obtaining a coordinate transformation relationship between two adjacent point clouds is shown as an example.
Step 201, registering two adjacent point cloud data in the point cloud sequence in sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data.
And step 202, sequentially fusing the two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence.
And 203, taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1.
And 204, obtaining a panoramic point cloud of the measured object.
Illustratively, for n (n >2, which is exemplified by n = 6) point clouds from multiple perspectives of an object or scene to be modeled, two adjacent point clouds are continuously registered and merged using the method provided in the above steps 201-204. The specific operation is as follows: inputting the 1 st point cloud and the 2 nd point cloud to a pairwise point cloud registration network to obtain a coordinate transformation relation (namely a rotation matrix and a translation vector) between the point clouds, merging the point clouds into the 1 st point cloud of a new point cloud sequence by using the relation, registering and merging the 3 rd point cloud and the 4 th point cloud into the 2 nd point cloud of the new point cloud sequence, and so on until all the point clouds are merged into 3 new point clouds. And continuously performing the registration and fusion of the two point clouds on the 3 point clouds until all the point clouds are registered and merged into a complete panoramic point cloud, and finishing the registration and fusion. The resulting panoramic point cloud is an initial three-dimensional model composed of discrete points.
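This coarse-to-fine merging strategy can be sketched as follows. This is a minimal illustration rather than code from the patent: the pairwise point cloud registration network is abstracted as a caller-supplied `register_pair` function, and all names and interfaces are illustrative assumptions.

```python
from typing import Callable, List, Tuple
import numpy as np

# `register_pair(src, dst)` stands in for the pairwise point cloud registration network;
# it must return a rotation matrix R (3x3) and translation vector t (3,) mapping src onto dst.
RegisterFn = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]

def fuse_pair(first: np.ndarray, second: np.ndarray, register_pair: RegisterFn) -> np.ndarray:
    """Register `second` onto `first`, transform it into first's frame, and merge the two clouds."""
    R, t = register_pair(second, first)
    return np.vstack([first, second @ R.T + t])

def merge_sequence(clouds: List[np.ndarray], register_pair: RegisterFn) -> np.ndarray:
    """Repeatedly fuse adjacent clouds (1st with 2nd, 3rd with 4th, ...) until a single
    panoramic point cloud remains; an odd leftover cloud is carried to the next round."""
    while len(clouds) > 1:
        merged = [fuse_pair(clouds[i], clouds[i + 1], register_pair)
                  for i in range(0, len(clouds) - 1, 2)]
        if len(clouds) % 2 == 1:
            merged.append(clouds[-1])
        clouds = merged
    return clouds[0]
```

With six input clouds this reproduces the 6 → 3 → 2 → 1 merging pattern of the example above.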
In order to ensure registration accuracy, a deep learning approach can be adopted to obtain the rotation matrix and translation vector corresponding to two adjacent point cloud data. As such, in some embodiments, step 201 includes steps 301 to 303.
Step 301, using an FCGF (Fully Convolutional Geometric Features)-based point cloud encoder, obtaining a first initial geometric feature and a second initial geometric feature, where the first initial geometric feature corresponds to one of the two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data.
Step 302, using a FCGF-based point cloud decoder, obtaining the first target geometric feature corresponding to the first initial geometric feature and the second target geometric feature corresponding to the second initial geometric feature.
Step 303, obtaining a rotation matrix and a translation vector of the first target geometric feature and the second target geometric feature by using the RANSAC algorithm.
In order to clearly understand the method for acquiring the rotation matrix and the translation vector corresponding to the two point cloud data provided in these embodiments, the method provided in steps 301 to 303 in these embodiments is described below by way of an example.
A first point cloud X and a second point cloud Y are acquired. The number of points in point cloud X is n, and the number of points in point cloud Y is m. Corresponding to step 201 above, the first point cloud X is one of the two adjacent point cloud data, and the second point cloud Y is the other of the two adjacent point cloud data.
A 3D convolution layer with a 7 × 7 kernel contained in the FCGF-based point cloud encoder is used to extract large-scale local context information from the input point clouds X and Y, giving the point cloud features F_X^0 and F_Y^0. Richer local context information is then aggregated by three strided convolution layers with residual blocks; the specific process is as follows:

For the first level, the point cloud features F_X^0 and F_Y^0 pass through two 3D convolution layers with 3 × 3 kernels, strides of 1 and 2 respectively, and channel numbers of 32 and 64 respectively, giving the features F_X^1 and F_Y^1, whose point numbers are n/2 and m/2 respectively and whose number of feature channels is 64. After the residual-block convolution layer of the first level, the features R_X^1 and R_Y^1 are obtained.
For the second level, R_X^1 and R_Y^1 are input into the second layer of the FCGF-based point cloud encoder and pass through a 3D convolution layer with a 3 × 3 kernel, a stride of 2 and 128 channels, giving the features F_X^2 and F_Y^2, whose point numbers are n/4 and m/4 respectively and whose number of feature channels is 128. After the residual-block convolution layer of the second level, the features R_X^2 and R_Y^2 are obtained.
For the third level, R_X^2 and R_Y^2 are input into the third layer of the FCGF-based point cloud encoder and pass through a 3D convolution layer with a 3 × 3 kernel, a stride of 2 and 256 channels, giving the features F_X^3 and F_Y^3, whose point numbers are n/8 and m/8 respectively and whose number of feature channels is 256. After the residual-block convolution layer of the third level, the first initial geometric feature E_X and the second initial geometric feature E_Y are obtained.

After the FCGF-based point cloud encoder, the point cloud features of the first point cloud X and the second point cloud Y are therefore the first initial geometric feature E_X and the second initial geometric feature E_Y, respectively. In this embodiment, an FCGF-based point cloud decoder is then used to perform feature upsampling; it likewise comprises three layers, and the specific process is as follows:
For the first level, the first enhanced self-attention feature S_X and the second enhanced self-attention feature S_Y are respectively input into a 3D upsampling convolution layer with a 3 × 3 kernel, a stride of 2 and 128 output channels, and are then processed by the residual-block convolution layer of the first level with 128 output channels, giving the features D_X^1 and D_Y^1.
For the second level, D_X^1 is spliced with F_X^2 and D_Y^1 is spliced with F_Y^2; the spliced features are respectively input into the second layer of the point cloud decoder, pass through a 3D upsampling convolution layer with a 3 × 3 kernel, a stride of 2 and 64 output channels, and then pass through the residual-block convolution layer of the second level, giving the features D_X^2 and D_Y^2.
For the third level, D_X^2 is spliced with F_X^3 and D_Y^2 is spliced with F_Y^3; the spliced features are respectively input into the third layer of the point cloud decoder and pass through a 3D upsampling convolution layer with a 3 × 3 kernel, a stride of 2 and 64 output channels, giving the features D_X^3 and D_Y^3. Finally, D_X^3 and D_Y^3 each pass through a 3D convolution layer with a 1 × 1 kernel and 32 output channels, giving the final first target geometric feature T_X and second target geometric feature T_Y of the point clouds X and Y.
In this embodiment, the RANSAC algorithm is used to find the coordinate transformation relationship between the point clouds, namely the rotation matrix and the translation vector, so as to complete the subsequent point cloud registration and fusion. The process of finding the coordinate transformation relationship between the point clouds with the RANSAC algorithm is as follows:

The first target geometric feature T_X, the second target geometric feature T_Y, and the first and second point clouds X and Y are taken as input. According to the descriptors (the 32-dimensional description vector of any point x in T_X and the 32-dimensional description vector of any point y in T_Y), the coordinate correspondences of the points whose descriptors match are obtained, and an initial rotation matrix and an initial translation vector are calculated. The projection error is then minimized to obtain the final coordinate transformation relationship, namely the rotation matrix and the translation vector.
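The descriptor-matching step can be illustrated with the compact sketch below. It substitutes mutual nearest-neighbour matching of the 32-dimensional descriptors and a closed-form least-squares (SVD) fit for the full RANSAC loop with projection-error minimisation, so it should be read as a simplified stand-in rather than the patent's exact procedure; all function names are illustrative.

```python
import numpy as np

def match_descriptors(feat_x: np.ndarray, feat_y: np.ndarray):
    """Mutual nearest-neighbour matching of 32-D descriptors, shapes (N, 32) and (M, 32).
    Uses a dense distance matrix for clarity; a KD-tree would be used in practice."""
    d = np.linalg.norm(feat_x[:, None, :] - feat_y[None, :, :], axis=-1)
    nn_xy = d.argmin(axis=1)                 # best match in Y for each point of X
    nn_yx = d.argmin(axis=0)                 # best match in X for each point of Y
    idx_x = np.where(nn_yx[nn_xy] == np.arange(len(feat_x)))[0]
    return idx_x, nn_xy[idx_x]               # mutually consistent correspondences

def estimate_rigid_transform(px: np.ndarray, py: np.ndarray):
    """Least-squares rigid transform (R, t) with py ≈ px @ R.T + t (Kabsch, no scale)."""
    cx, cy = px.mean(axis=0), py.mean(axis=0)
    H = (px - cx).T @ (py - cy)              # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                       # enforce a proper rotation (det = +1)
    t = cy - R @ cx
    return R, t
```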
Because ground-truth coordinate transformations for point cloud registration are very difficult to collect, in some embodiments the coordinate transformations in the training data sets are generated with existing methods. First, the point cloud data of each scene are downsampled and denoised; specifically, each original point cloud is uniformly downsampled and its outliers are deleted. An initial transformation between every pair of point clouds in each scene is then obtained in sequence with a RANSAC-based method, and a more refined transformation is finally generated with a point-to-plane ICP algorithm. The refined transformations are used as the coordinate transformation labels for point cloud registration and fusion, and the FCGF-based point cloud encoder and decoder are trained with them so that precise coordinate transformations can be obtained to construct the initial point cloud three-dimensional model. The same applies to the coordinate transformations between the images corresponding to the point clouds.
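Assuming the Open3D library, the data-preparation steps just described (uniform downsampling, outlier removal, and point-to-plane ICP refinement of a coarse RANSAC initialisation) might look roughly like the sketch below; every parameter value is an illustrative assumption rather than a figure from the patent.

```python
import numpy as np
import open3d as o3d

def refine_pairwise_transform(src_path: str, dst_path: str,
                              init_T: np.ndarray, voxel: float = 0.01) -> np.ndarray:
    """Downsample, denoise, and refine an initial alignment with point-to-plane ICP.
    `init_T` is the coarse 4x4 transform (e.g. from a RANSAC-based initialisation)."""
    src = o3d.io.read_point_cloud(src_path)
    dst = o3d.io.read_point_cloud(dst_path)

    def preprocess(pcd):
        pcd = pcd.voxel_down_sample(voxel_size=voxel)              # uniform downsampling
        pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
        pcd.estimate_normals(                                       # normals for point-to-plane ICP
            o3d.geometry.KDTreeSearchParamHybrid(radius=4 * voxel, max_nn=30))
        return pcd

    src, dst = preprocess(src), preprocess(dst)
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=2 * voxel, init=init_T,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation                                    # refined 4x4 transform
```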
103, respectively performing salient feature extraction and multi-scale aggregation feature description on the plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud.
In some embodiments, salient feature extraction and multi-scale aggregation feature description are respectively performed on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud, including steps 401 to 405.
Step 401, performing multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32.
Step 402, processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map correspondingly, wherein the number of output channels of the saliency extraction network is 1.
Step 403, multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by corresponding significance weights respectively, and then adding the result to obtain a significance feature map of the image data.
And 404, multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map.
Step 405, stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to a channel dimension to obtain the multi-scale aggregation feature map of the image data.
Referring to the process of performing registration and fusion processing on the point cloud data in the foregoing step 201 to obtain a panoramic point cloud of the object to be measured, a coordinate transformation relation required by the registration of the corresponding image data can be obtained, and a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud are obtained by combining the obtained salient feature map and the multi-scale aggregation feature map of the image data.
In order for those skilled in the art to clearly understand the method for acquiring the saliency map and the multi-scale aggregation map of image data provided in these embodiments, the method provided in steps 401-405 in these embodiments is described below by way of an example.
Referring to fig. 3, an example of a process for acquiring a saliency map and a multi-scale aggregation map of image data provided in these embodiments is shown.
In the first part, multi-level feature extraction by the backbone is performed. A picture of size h × w × 3 is input and passes through the backbone multi-scale feature extraction convolution network to obtain the first-level, second-level and third-level feature maps. Specifically, for the first level, the image first passes through 3 convolution layers (conv 1/2/3 in fig. 3) with kernel sizes of 3 × 3 × 8 pixels and a stride of 1 pixel, giving a first-level feature map of h × w × 8 pixels.
For the second level, the first-level feature map then passes through 1 convolution layer (conv 4 in fig. 3) with a kernel size of 3 × 3 × 16 pixels and a stride of 2 pixels, and through 2 convolution layers (conv 5/6 in fig. 3) with kernel sizes of 3 × 3 × 16 pixels and a stride of 1 pixel, giving a second-level feature map of size h/2 × w/2 × 16 pixels.
For the third level, it passes through 1 convolution layer (conv 7 in fig. 3) with a kernel size of 3 × 3 × 32 pixels and a stride of 2 pixels, and through 2 convolution layers (conv 8/9 in fig. 3) with kernel sizes of 3 × 3 × 32 pixels and a stride of 1 pixel, giving a third-level feature map of size h/4 × w/4 × 32.
All three feature maps are then brought to the resolution of the original image by bilinear interpolation, i.e., the feature maps of the last two levels are upsampled by factors of 2 and 4 respectively, finally giving three feature maps: s1 (first-level feature map) of h × w × 8, s2 (second-level feature map) of h × w × 16, and s3 (third-level feature map) of h × w × 32.
In the second part, saliency extraction is performed. For the saliency extraction part, the three feature maps each pass through 1 convolution layer with a 3 × 3 kernel, 1 output channel and a stride of 1, giving three feature maps of size h × w × 1. Then, considering that shallow features are easily affected by noise, in order to reduce the influence of noise the three feature maps from shallow to deep are multiplied by the coefficients 0.17, 0.33 and 0.5 respectively and then summed, giving a saliency feature map of size h × w × 1.
The value A at position (x, y) in the saliency feature map represents the saliency of that point: a point with a larger saliency A is more distinct from its surrounding points, typically a point with a significant color change or a drastic structural change. Naturally, the reconstruction of the spatial points of the three-dimensional model that correspond to these image points has a large influence on the quality of the final three-dimensional reconstruction.
In the third part, multi-scale aggregation feature description is performed. For the feature description part, the three feature maps from shallow to deep produced by the backbone multi-scale feature extraction network are multiplied by the weight coefficients 1, 2 and 3 respectively and then stacked along the channel dimension, giving a multi-scale aggregation feature map of size h × w × 32.
Compared with directly using the (R, G, B) values as the color information of the point cloud, multi-level feature fusion maps a single color value to a high-dimensional vector; as the number of channels increases, the differences between points become larger and the neural network can learn them better.
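A rough PyTorch sketch of the three parts above is given below. Activations and training details are omitted, a separate 1-channel convolution head per level is assumed for the saliency extraction network, and all module and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSaliencyNet(nn.Module):
    """Sketch of the backbone (8/16/32-channel levels), the 1-channel saliency heads,
    and the weighted saliency / multi-scale aggregation maps described above."""
    def __init__(self):
        super().__init__()
        self.level1 = nn.Sequential(*[nn.Conv2d(c_in, 8, 3, stride=1, padding=1)
                                      for c_in in (3, 8, 8)])                   # conv1-3
        self.level2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1),   # conv4
                                    nn.Conv2d(16, 16, 3, stride=1, padding=1),  # conv5
                                    nn.Conv2d(16, 16, 3, stride=1, padding=1))  # conv6
        self.level3 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1),  # conv7
                                    nn.Conv2d(32, 32, 3, stride=1, padding=1),  # conv8
                                    nn.Conv2d(32, 32, 3, stride=1, padding=1))  # conv9
        # one 3x3 convolution head with a single output channel per level
        self.sal = nn.ModuleList([nn.Conv2d(c, 1, 3, padding=1) for c in (8, 16, 32)])

    def forward(self, img):                       # img: (B, 3, h, w)
        s1 = self.level1(img)                     # (B, 8,  h,   w)
        s2 = self.level2(s1)                      # (B, 16, ~h/2, ~w/2)
        s3 = self.level3(s2)                      # (B, 32, ~h/4, ~w/4)
        size = img.shape[-2:]
        s2 = F.interpolate(s2, size=size, mode='bilinear', align_corners=False)
        s3 = F.interpolate(s3, size=size, mode='bilinear', align_corners=False)
        # saliency map: weighted sum of the three 1-channel intermediate maps
        saliency = sum(w * head(s) for w, head, s
                       in zip((0.17, 0.33, 0.5), self.sal, (s1, s2, s3)))
        # multi-scale aggregation map: weighted levels stacked along the channel dim
        aggregated = torch.cat([1 * s1, 2 * s2, 3 * s3], dim=1)
        # NOTE: plain concatenation yields 8 + 16 + 32 = 56 channels; the text states a
        # 32-channel aggregation map, so an additional projection may be implied there.
        return saliency, aggregated
```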
And 104, calculating by using a first full-connection network according to the position information of a target point, the position information of the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain an observation characteristic vector of the target point relative to the point to be observed, wherein the target point is any point except the point to be observed in the panoramic point cloud. Wherein the point to be observed can be any point in the panoramic point cloud.
And 105, performing aggregation calculation according to the observation feature vector of the k points closest to the point to be observed relative to the point to be observed and the saliency feature vector to obtain an appearance description vector of the point to be observed.
Because the multi-scale aggregated feature vector used to describe the appearance information of a point in the panoramic point cloud is obtained from a specific viewing position, the appearance features of the same point observed from different viewing positions are not necessarily the same. To regress this difference, the appearance description vector of the point to be observed is obtained using the methods of step 104 and step 105.
Further, subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed; splicing the relative position information of the target point relative to the point to be observed with the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector; and calculating the spliced multi-scale aggregation characteristic vector by using the first fully-connected network to obtain an observation characteristic vector of the target point relative to the point to be observed.
Illustratively, for a point to be observed at x, the observation feature vector of a target point at p relative to the point to be observed is

$$ F_{p,x} = W\big(f_p,\; p - x\big) $$

where f_p is the multi-scale aggregated feature vector of the target point, p is the position information of the target point (represented as a three-dimensional vector), and x is the position information of the point to be observed (represented as a three-dimensional vector). The function W splices f_p and the relative position p − x and inputs the spliced vector into the first fully connected network, which comprises three fully connected layers of sizes 35 × 128, 128 × 256 and 256 × 128, so that the observation feature vector F_{p,x} of the target point at p relative to the point to be observed at x is obtained. Using the relative position p − x keeps the network invariant to translations of the point pair, resulting in better generalization.
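A possible reading of the function W as a small PyTorch module is sketched below; the ReLU activations between the 35 × 128, 128 × 256 and 256 × 128 layers are an assumption, since the patent only lists the layer sizes.

```python
import torch
import torch.nn as nn

class ObservationFeatureNet(nn.Module):
    """W(f_p, p - x): sketch of the first fully connected network (35 -> 128 -> 256 -> 128).
    35 = 32 (multi-scale aggregated feature of the target point) + 3 (relative position p - x)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(35, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 128))

    def forward(self, f_p, p, x):
        # f_p: (k, 32) features of the k target points, p: (k, 3), x: (3,) point to be observed
        rel = p - x.unsqueeze(0)               # relative positions keep the net translation-invariant
        return self.mlp(torch.cat([f_p, rel], dim=-1))   # (k, 128) observation feature vectors
```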
In the embodiment of the application, the observation feature vectors, relative to the point to be observed, of the k nearest target points around the point to be observed are combined. Illustratively, let the k target points nearest to the point to be observed at x be located at p_1, p_2, …, p_k, where i denotes the i-th target point. The appearance description vector f_x of the point to be observed at x is obtained by the aggregation calculation

$$ f_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, F_{p_i,x}, \qquad w_i = \frac{1}{\lVert p_i - x \rVert} $$

where A_i denotes the salient feature vector corresponding to the i-th target point, p_i is the position information of the i-th target point, and F_{p_i,x} is the observation feature vector of the i-th target point relative to the point to be observed. The inverse distance weight w_i is used to aggregate the neural features so that target points closer to the point to be observed contribute more to the calculation of the appearance description vector; at the same time, target points with a larger saliency A_i are points that differ from their surroundings, usually points with an obvious color change or a drastic structural change, and such target points likewise contribute more to the calculation of the appearance description vector of the point to be observed.
And 106, calculating by using a second fully-connected network according to the observation characteristic vector of the target point relative to the point to be observed to obtain an observation density vector of the target point relative to the point to be observed.
And 107, performing aggregation calculation according to the observation density vectors of the k points closest to the point to be observed relative to the point to be observed and the corresponding salient feature vectors to obtain the volume density information of the point to be observed.
In this embodiment, the second fully connected network comprises three fully connected layers with sizes of 160 × 256, 256 × 128 and 128 × 1. The volume density information of the point to be observed is obtained by aggregating the observation density vectors, relative to the point to be observed, of its k nearest target points, as shown by the following two formulas:

$$ \sigma_{p_i,x} = D\big(F_{p_i,x}\big) $$

$$ \sigma_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, \sigma_{p_i,x} $$

where the function D denotes inputting the observation feature vector F_{p_i,x} of the i-th target point relative to the point to be observed into the second fully connected network for calculation, σ_{p_i,x} is the resulting observation density vector of the i-th target point relative to the point to be observed, and σ_x is the volume density information of the point to be observed.
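A sketch of the density branch follows. The 160-dimensional input of the second fully connected network is taken at face value even though only a 128-dimensional observation feature vector is described, so the remaining 32 dimensions are left as an unexplained placeholder; the ReLU activations are likewise assumed.

```python
import torch
import torch.nn as nn

class DensityHead(nn.Module):
    """D(.): sketch of the second fully connected network (160 -> 256 -> 128 -> 1)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(160, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, feat_160):              # (k, 160) per-target-point input features
        return self.mlp(feat_160).squeeze(-1) # (k,) observation densities sigma_{p_i,x}

def aggregate_density(sigma_i, A, p, x, eps=1e-8):
    """Volume density of the point to be observed, aggregated with the same
    saliency-modulated inverse-distance weights as the appearance vector."""
    w = 1.0 / (torch.norm(p - x.unsqueeze(0), dim=-1) + eps)
    wa = w * A
    return (wa * sigma_i).sum() / (wa.sum() + eps)
```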
And 108, performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
Because the radiation information of the point to be observed is related to the observation direction, the embodiment of the application subtracts the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point; and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
Illustratively, the coordinate difference between the position information x of the point to be observed and the position information s of the observation sampling point is regarded as the observation direction d = x − s. As the number of channels increases, a single position value mapped to a high-dimensional vector shows larger differences, and the neural network can learn it better. In this embodiment, the observation direction d is therefore mapped into a 32-dimensional high-dimensional position vector γ(d).
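The text does not state how the relative position is mapped into the 32-dimensional space, so the sketch below simply uses a learned linear lift as a stand-in (a NeRF-style sinusoidal positional encoding would be another natural choice).

```python
import torch
import torch.nn as nn

# Stand-in for the unspecified 3 -> 32 mapping of the observation direction.
position_encoder = nn.Linear(3, 32)

def encode_direction(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """High-dimensional position vector of the point to be observed (at x) with respect
    to the observation sampling point (at s); both inputs are (3,) tensors."""
    return position_encoder(x - s)            # (32,) encoded observation direction
```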
And step 109, calculating by using a third full-connection network according to the high-dimensional position vector of the point to be observed relative to the observation sampling point and the appearance description vector of the point to be observed, so as to obtain the radiation information of the point to be observed relative to the observation sampling point.
The high-dimensional position vector γ(d) of the point to be observed relative to the observation sampling point is spliced with the appearance description vector f_x of the point to be observed. The spliced vector is then processed by the third fully connected network to obtain the radiation (color) information c_x of the point to be observed relative to the observation sampling point. The third fully connected network comprises three fully connected layers with sizes of 160 × 256, 256 × 128 and 128 × 3.
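A sketch of the third fully connected network as a radiance head is given below; the ReLU activations and the final sigmoid that keeps colours in [0, 1] are assumptions not stated in the patent.

```python
import torch
import torch.nn as nn

class RadianceHead(nn.Module):
    """Sketch of the third fully connected network (160 -> 256 -> 128 -> 3): it takes the
    concatenation of the 32-dim position vector and the 128-dim appearance description
    vector and outputs per-point radiance (RGB) for the given observation direction."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(160, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 3))

    def forward(self, gamma_d, f_x):
        # gamma_d: (32,) high-dimensional position vector, f_x: (128,) appearance vector
        rgb = self.mlp(torch.cat([gamma_d, f_x], dim=-1))
        return torch.sigmoid(rgb)             # keep colours in [0, 1] (an added convention)
```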
Step 107 and step 109 thus respectively yield the volume density information of the point to be observed and its radiation information relative to the observation sampling point, which completes the reconstruction of the 3D model.
The second fully connected network and the third fully connected network can be regarded as a NeRF (Neural Radiance Fields) network model. During training, the NeRF network model is optimized by minimizing the error between each observed image and the corresponding view rendered from the reconstructed model.
The embodiment of the application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps: acquiring a point cloud sequence and an image sequence of a measured object; registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object; respectively performing salient feature extraction and multi-scale aggregation feature description on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud; calculating, with a first fully connected network, according to the position information of the target point, the position information of the point to be observed and the multi-scale aggregation feature vector of the target point, to obtain an observation feature vector of the target point relative to the point to be observed; performing aggregation calculation according to the observation feature vectors of the k points closest to the point to be observed relative to the point to be observed and the corresponding salient feature vectors to obtain an appearance description vector of the point to be observed; calculating the observation feature vector of the target point relative to the point to be observed with a second fully connected network to obtain an observation density vector of the target point relative to the point to be observed; performing aggregation calculation according to the observation density vectors of the k points closest to the point to be observed relative to the point to be observed and the corresponding salient feature vectors to obtain the volume density information of the point to be observed; performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point; and calculating, with a third fully connected network, according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed, to obtain the radiation information of the point to be observed relative to the observation sampling point. According to this three-dimensional reconstruction method, an initial three-dimensional model is generated through point cloud registration and fusion; salient feature extraction and multi-scale aggregation feature extraction are then carried out according to the image data to obtain the salient feature vectors and multi-scale aggregation feature vectors of all points in the panoramic point cloud, and point-based volume rendering is carried out with a neural radiance field to obtain a three-dimensional model with near-real color and texture information.
An embodiment of the present application further provides a terminal device, including: at least one processor and memory; the memory to store program instructions; the processor is configured to call and execute the program instructions stored in the memory, so as to enable the terminal device to execute the three-dimensional reconstruction method provided in the foregoing embodiment.
The embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium has stored therein instructions, which, when run on a computer, cause the computer to perform the three-dimensional reconstruction method as provided in the previous embodiments.
The steps of a method described in an embodiment of the present application may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a UE. In the alternative, the processor and the storage medium may reside in different components in the UE.
It should be understood that, in the various embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present application may be implemented as software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present application may be embodied essentially, or in part, in the form of a software product, which may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.
The above-described embodiments of the present application do not limit the scope of the present application.

Claims (10)

1. A three-dimensional reconstruction method based on image and point cloud data fusion is characterized by comprising the following steps:
acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one;
registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object;
respectively performing salient feature extraction and multi-scale aggregation feature description on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud;
calculating by using a first fully-connected network, according to the position information of a target point, the position information of a point to be observed and the multi-scale aggregation feature vector of the target point, to obtain an observation feature vector of the target point relative to the point to be observed, wherein the target point is any point in the panoramic point cloud other than the point to be observed;
performing aggregation calculation according to the observation feature vectors, relative to the point to be observed, of the k points closest to the point to be observed and the salient feature vectors, to obtain an appearance description vector of the point to be observed;
calculating the observation feature vector of the target point relative to the point to be observed by using a second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed;
performing aggregation calculation according to the observation density vectors, relative to the point to be observed, of the k points closest to the point to be observed and the salient feature vectors, to obtain the volume density information of the point to be observed;
performing position coding calculation according to the position information of an observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point;
and calculating by using a third fully-connected network, according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed, to obtain the radiance information of the point to be observed relative to the observation sampling point.
2. The three-dimensional reconstruction method according to claim 1, wherein the obtaining of the observation feature vector of the target point relative to the point to be observed by performing a calculation using a first fully-connected network according to the position information of the target point, the position information of the point to be observed, and the multi-scale aggregation feature vector of the target point comprises:
subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed;
splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector;
and calculating the spliced multi-scale aggregation characteristic vector by using the first fully-connected network to obtain an observation characteristic vector of the target point relative to the point to be observed.
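A minimal sketch of the computation in claim 2, assuming PyTorch, a 32-dimensional multi-scale aggregation feature vector and a two-layer first fully-connected network; the layer widths and names are assumptions for illustration, not details taken from the patent:

import torch
import torch.nn as nn

FEAT_DIM = 32   # assumed size of the multi-scale aggregation feature vector
first_fc = nn.Sequential(nn.Linear(3 + FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 64))

def observation_feature(p_target, p_observed, target_feat):
    rel = p_target - p_observed                        # relative position of the target point
    spliced = torch.cat([rel, target_feat], dim=-1)    # spliced multi-scale aggregation feature vector
    return first_fc(spliced)                           # observation feature vector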
3. The three-dimensional reconstruction method according to claim 1, wherein the appearance description vector of the point to be observed is obtained by performing an aggregation calculation according to the following formula:
f_x = Σ_{i=1}^{k} A_i · (w_i / Σ_{j=1}^{k} w_j) · f_{i,x}, with w_i = 1 / ‖p_i − x‖,
wherein f_x represents the appearance description vector of the point to be observed, i denotes the i-th target point, A_i denotes the salient feature vector corresponding to the i-th target point, p_i is the position information of the i-th target point, x is the position information of the point to be observed, and f_{i,x} denotes the observation feature vector of the i-th target point relative to the point to be observed;
and the volume density information of the point to be observed is obtained by performing an aggregation calculation according to the following formula:
σ_x = Σ_{i=1}^{k} A_i · (w_i / Σ_{j=1}^{k} w_j) · σ_{i,x},
wherein σ_x represents the volume density information of the point to be observed, and σ_{i,x} denotes the observation density vector of the i-th target point relative to the point to be observed.
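An illustrative sketch of the two aggregations in claim 3, assuming inverse-distance weights w_i = 1/‖p_i − x‖ normalized over the k nearest points, and treating the saliency of each point as a scalar (consistent with the single-channel saliency map of claim 5); this is one plausible reading of the formulas, not the patented implementation:

import torch

def aggregate(p, x, saliency, obs_feat, obs_density, eps=1e-8):
    # p:           (k, 3) positions p_i of the k nearest target points
    # x:           (3,)   position of the point to be observed
    # saliency:    (k,)   saliency value A_i of each target point
    # obs_feat:    (k, d) observation feature vectors f_{i,x}
    # obs_density: (k, 1) observation density vectors sigma_{i,x}
    w = 1.0 / (torch.norm(p - x, dim=-1) + eps)                # inverse-distance weights w_i
    coef = saliency * w / w.sum()                              # A_i * w_i / sum_j w_j
    f_x = (coef.unsqueeze(-1) * obs_feat).sum(dim=0)           # appearance description vector
    sigma_x = (coef.unsqueeze(-1) * obs_density).sum(dim=0)    # volume density information
    return f_x, sigma_x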
4. The three-dimensional reconstruction method according to claim 1, wherein performing a position coding calculation according to position information of an observation sampling point and position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point comprises:
subtracting the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point;
and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
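Claim 4 only states that the relative position is mapped into a 32-dimensional space; the following minimal sketch assumes this mapping is a learned linear projection (the choice of nn.Linear is an assumption, and a frequency-based positional encoding would be an equally valid reading):

import torch
import torch.nn as nn

pos_map = nn.Linear(3, 32)   # assumed learnable mapping into the 32-dimensional space

def high_dim_position(p_observed, p_sample):
    rel = p_observed - p_sample    # relative position with respect to the observation sampling point
    return pos_map(rel)            # 32-dimensional high-dimensional position vector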
5. The three-dimensional reconstruction method of claim 1, wherein performing salient feature extraction on the image data comprises:
performing multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32;
processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to correspondingly obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map, wherein the number of output channels of the saliency extraction network is 1;
and multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by corresponding significance weights respectively and then adding the results to obtain the significance feature map of the image data.
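A compact sketch of the saliency extraction in claim 5, assuming PyTorch; the convolution strides, the bilinear upsampling used to bring the three intermediate maps to a common resolution, and the saliency weight values are assumptions, while the 8/16/32 feature channels and the single output channel follow the claim:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBackbone(nn.Module):
    # three-level feature extractor with 8, 16 and 32 channels, as in claim 5
    def __init__(self):
        super().__init__()
        self.l1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.l2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.l3 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, img):
        f1 = self.l1(img)
        f2 = self.l2(f1)
        f3 = self.l3(f2)
        return f1, f2, f3

class SaliencyExtractor(nn.Module):
    # single-output-channel heads plus a weighted sum of the intermediate maps
    def __init__(self, weights=(0.5, 0.3, 0.2)):   # saliency weights: assumed values
        super().__init__()
        self.h1, self.h2, self.h3 = nn.Conv2d(8, 1, 1), nn.Conv2d(16, 1, 1), nn.Conv2d(32, 1, 1)
        self.w = weights

    def forward(self, f1, f2, f3):
        size = f1.shape[-2:]
        m1 = self.h1(f1)
        m2 = F.interpolate(self.h2(f2), size=size, mode="bilinear", align_corners=False)
        m3 = F.interpolate(self.h3(f3), size=size, mode="bilinear", align_corners=False)
        return self.w[0] * m1 + self.w[1] * m2 + self.w[2] * m3   # saliency feature map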
6. The three-dimensional reconstruction method of claim 5, wherein performing multi-scale aggregation feature description on the image data comprises:
multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map;
stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to channel dimensions to obtain the multi-scale aggregation feature map of the image data.
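The multi-scale aggregation feature map of claim 6 can be sketched in the same setting; the upsampling of the coarser levels to a common resolution before stacking and the aggregation weight values are assumptions:

import torch
import torch.nn.functional as F

def multiscale_aggregation(f1, f2, f3, weights=(1.0, 1.0, 1.0)):
    # f1: (B, 8, H, W), f2: (B, 16, H/2, W/2), f3: (B, 32, H/4, W/4)
    size = f1.shape[-2:]
    g1 = weights[0] * f1
    g2 = F.interpolate(weights[1] * f2, size=size, mode="bilinear", align_corners=False)
    g3 = F.interpolate(weights[2] * f3, size=size, mode="bilinear", align_corners=False)
    return torch.cat([g1, g2, g3], dim=1)   # stacked along the channel dimension: 8 + 16 + 32 channels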
7. The three-dimensional reconstruction method according to claim 1, wherein registering and fusing the plurality of point cloud data in the point cloud sequence to obtain the panoramic point cloud of the measured object comprises:
sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data;
sequentially fusing the two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence;
taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1;
and obtaining the panoramic point cloud of the measured object.
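A minimal sketch of the pairwise registration-and-fusion loop of claim 7, assuming NumPy arrays and a caller-supplied register_pair function that returns the rotation matrix and translation vector for two adjacent point clouds; the pairing strategy shown here is one plausible reading of the sequential fusion:

import numpy as np

def fuse_sequence(clouds, register_pair):
    # clouds: list of (N_i, 3) arrays forming the point cloud sequence
    while len(clouds) > 1:
        merged = []
        for i in range(0, len(clouds) - 1, 2):
            R, t = register_pair(clouds[i], clouds[i + 1])   # rotation matrix and translation vector
            aligned = clouds[i + 1] @ R.T + t                # bring cloud i+1 into the frame of cloud i
            merged.append(np.vstack([clouds[i], aligned]))
        if len(clouds) % 2 == 1:                             # carry an unpaired cloud to the next round
            merged.append(clouds[-1])
        clouds = merged                                      # the new point cloud sequence
    return clouds[0]                                         # panoramic point cloud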
8. The three-dimensional reconstruction method of claim 7, wherein registering two adjacent point cloud data in the point cloud sequence in sequence to obtain the rotation matrix and the translation vector corresponding to the two adjacent point cloud data comprises:
obtaining a first initial geometric feature and a second initial geometric feature by using a point cloud encoder based on FCGF, wherein the first initial geometric feature corresponds to one of the two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data;
obtaining a first target geometric feature corresponding to the first initial geometric feature and a second target geometric feature corresponding to the second initial geometric feature by using a point cloud decoder based on the FCGF;
and obtaining the rotation matrix and the translation vector between the first target geometric feature and the second target geometric feature by using the RANSAC algorithm.
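Claim 8 relies on FCGF geometric features and the RANSAC algorithm; the following self-contained sketch assumes the per-point features have already been produced by an FCGF-style encoder/decoder and estimates the rigid transform with a generic RANSAC loop over SVD (Kabsch) fits. It is a simplification of typical feature-based registration rather than the patented procedure; in practice a library routine such as Open3D's RANSAC-based registration would usually replace this loop.

import numpy as np

def kabsch(A, B):
    # least-squares rigid transform (R, t) mapping point set A onto B
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cb - R @ ca

def ransac_registration(src, dst, src_feat, dst_feat, iters=1000, thresh=0.05, seed=0):
    # src, dst: (N, 3) and (M, 3) points; src_feat, dst_feat: (N, d) and (M, d) geometric features
    rng = np.random.default_rng(seed)
    d2 = ((src_feat[:, None, :] - dst_feat[None, :, :]) ** 2).sum(axis=-1)   # brute-force feature matching
    corr = d2.argmin(axis=1)                                                 # nearest-feature correspondence
    best = (np.eye(3), np.zeros(3), -1)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)                    # minimal sample of 3 matches
        R, t = kabsch(src[idx], dst[corr[idx]])
        resid = np.linalg.norm(src @ R.T + t - dst[corr], axis=1)
        inliers = int((resid < thresh).sum())
        if inliers > best[2]:
            best = (R, t, inliers)
    return best[0], best[1]    # rotation matrix and translation vector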
9. A terminal device, comprising: at least one processor and a memory;
the memory to store program instructions;
the processor is configured to call and execute the program instructions stored in the memory to cause the terminal device to perform the three-dimensional reconstruction method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that,
the computer-readable storage medium has stored therein instructions which, when run on a computer, cause the computer to perform the three-dimensional reconstruction method according to any one of claims 1 to 8.
CN202211342750.8A 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion Active CN115409931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342750.8A CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Publications (2)

Publication Number Publication Date
CN115409931A true CN115409931A (en) 2022-11-29
CN115409931B CN115409931B (en) 2023-03-31

Family

ID=84168933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342750.8A Active CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Country Status (1)

Country Link
CN (1) CN115409931B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389671A (en) * 2018-09-25 2019-02-26 南京大学 A kind of single image three-dimensional rebuilding method based on multistage neural network
CN111242138A (en) * 2020-01-11 2020-06-05 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
US20220327773A1 (en) * 2021-04-09 2022-10-13 Georgetown University Facial recognition using 3d model
CN114898028A (en) * 2022-04-29 2022-08-12 厦门大学 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment
CN115018989A (en) * 2022-06-21 2022-09-06 中国科学技术大学 Three-dimensional dynamic reconstruction method based on RGB-D sequence, training device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kong Demian et al.: "RGB-D Saliency Detection Based on Multi-Scale Feature Fusion", Microelectronics & Computer *
Jiang Rong et al.: "Acquisition of Three-Dimensional Road Surface Texture Information Based on a Binocular Vision Algorithm", Laser & Optoelectronics Progress *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631221A (en) * 2022-11-30 2023-01-20 北京航空航天大学 Low-overlapping-degree point cloud registration method based on consistency sampling
CN115631341A (en) * 2022-12-21 2023-01-20 北京航空航天大学 Point cloud registration method and system based on multi-scale feature voting
CN115690332A (en) * 2022-12-30 2023-02-03 华东交通大学 Point cloud data processing method and device, readable storage medium and electronic equipment
CN116843808A (en) * 2023-06-30 2023-10-03 北京百度网讯科技有限公司 Rendering, model training and virtual image generating method and device based on point cloud
CN117173693A (en) * 2023-11-02 2023-12-05 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN117173693B (en) * 2023-11-02 2024-02-27 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device

Also Published As

Publication number Publication date
CN115409931B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN115409931B (en) Three-dimensional reconstruction method based on image and point cloud data fusion
Chen et al. Cross parallax attention network for stereo image super-resolution
WO2023138062A1 (en) Image processing method and apparatus
CN115439694A (en) High-precision point cloud completion method and device based on deep learning
TW201839665A (en) Object recognition method and object recognition system
CN114723884A (en) Three-dimensional face reconstruction method and device, computer equipment and storage medium
CN115761258A (en) Image direction prediction method based on multi-scale fusion and attention mechanism
CN116993826A (en) Scene new view generation method based on local space aggregation nerve radiation field
JP2024507727A (en) Rendering a new image of a scene using a geometric shape recognition neural network conditioned on latent variables
US20220319055A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN115731336A (en) Image rendering method, image rendering model generation method and related device
CN115797561A (en) Three-dimensional reconstruction method, device and readable storage medium
US11880913B2 (en) Generation of stylized drawing of three-dimensional shapes using neural networks
EP4292059A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN114863007A (en) Image rendering method and device for three-dimensional object and electronic equipment
CN114627244A (en) Three-dimensional reconstruction method and device, electronic equipment and computer readable medium
CN116993926B (en) Single-view human body three-dimensional reconstruction method
CN115082322B (en) Image processing method and device, and training method and device of image reconstruction model
Jin et al. Light field reconstruction via deep adaptive fusion of hybrid lenses
CN113065521B (en) Object identification method, device, equipment and medium
US20210390772A1 (en) System and method to reconstruct a surface from partially oriented 3-d points
CN114638866A (en) Point cloud registration method and system based on local feature learning
Jee et al. Hologram Super-Resolution Using Dual-Generator GAN
Rasmuson et al. Addressing the shape-radiance ambiguity in view-dependent radiance fields
CN115063459B (en) Point cloud registration method and device and panoramic point cloud fusion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant