CN115409931A - Three-dimensional reconstruction method based on image and point cloud data fusion

Three-dimensional reconstruction method based on image and point cloud data fusion

Info

Publication number
CN115409931A
CN115409931A (application CN202211342750.8A)
Authority
CN
China
Prior art keywords
point
observed
point cloud
vector
feature map
Prior art date
Legal status
Granted
Application number
CN202211342750.8A
Other languages
Chinese (zh)
Other versions
CN115409931B (en)
Inventor
李骏
李想
杨苏
周方明
Current Assignee
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Original Assignee
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Lichuang Zhiheng Electronic Technology Co., Ltd.
Priority to CN202211342750.8A
Publication of CN115409931A
Application granted
Publication of CN115409931B
Status: Active
Anticipated expiration

Classifications

    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 15/005: General purpose rendering architectures
    • G06T 15/10: Geometric effects
    • G06T 15/20: Perspective computation
    • G06T 15/205: Image-based rendering
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Image registration using feature-based methods
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06T 2207/10016: Video; image sequence

Abstract

The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, and relates to the fields of computer vision and computer graphics. The three-dimensional reconstruction method obtains a panoramic point cloud of a measured object through point cloud registration and fusion; then, according to the corresponding image data, saliency feature extraction and multi-scale aggregation feature extraction are performed to obtain a salient feature vector and a multi-scale aggregation feature vector for each point in the panoramic point cloud, and point-based volume rendering is performed with a neural radiance field to obtain a three-dimensional model with near-real color and texture information.

Description

Three-dimensional reconstruction method based on image and point cloud data fusion
Technical Field
The application relates to the field of computer vision and computer graphics, in particular to a three-dimensional reconstruction method based on image and point cloud data fusion.
Background
The cost of manually building a three-dimensional model is high: the work not only requires considerable expertise but is also time-consuming. Virtual reality requires large numbers of three-dimensional models of characters, objects, scenes and the like with high geometric accuracy and complex colors and textures, so three-dimensional reconstruction technology plays a very important role in AR, VR and the metaverse. How to reconstruct or generate three-dimensional models quickly and with high quality is therefore a key problem in computer vision and computer graphics.
A point cloud is a set of data points obtained by measuring the surface of an inspected object with a three-dimensional measuring device. As point cloud data becomes more and more convenient to acquire, the point cloud has become a very important form of three-dimensional data. By performing multi-view point cloud registration and fusion with deep learning techniques, a geometric model of a scene can be reconstructed quickly and accurately.
Current three-dimensional reconstruction technology based on point cloud data focuses on reconstructing the three-dimensional geometric structure and usually comprises the following steps: point cloud data acquisition, point cloud preprocessing, point cloud registration and fusion, and three-dimensional surface generation. After point cloud registration and fusion, an original three-dimensional model is obtained; at this stage the model consists of a batch of discrete points, and three-dimensional surface generation turns the surface of the three-dimensional object into a set of planes, i.e., into a continuous surface. These steps achieve geometric reconstruction of the three-dimensional object or scene, but the reconstructed three-dimensional model lacks texture and color information, so the reconstruction result is not realistic enough.
Disclosure of Invention
In order to solve the problem that a three-dimensional model obtained by the existing three-dimensional reconstruction method lacks texture and color information, so that the reconstruction result is not realistic enough, the application provides a three-dimensional reconstruction method based on image and point cloud data fusion, a terminal device, and a computer-readable storage medium.
The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps:
acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one;
registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object;
respectively extracting salient features and describing multi-scale aggregation features for a plurality of image data in an image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud;
calculating by using a first full-connection network according to the position information of the target point, the position information of the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain an observation characteristic vector of the target point relative to the point to be observed, wherein the target point is any point except the point to be observed in the panoramic point cloud;
performing aggregation calculation according to the observation feature vectors, relative to the point to be observed, of the k points closest to the point to be observed and the corresponding salient feature vectors to obtain an appearance description vector of the point to be observed;
calculating the observation characteristic vector of the target point relative to the point to be observed by using a second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed;
performing aggregation calculation according to the observation density vectors, relative to the point to be observed, of the k points closest to the point to be observed and the corresponding salient feature vectors to obtain volume density information of the point to be observed;
performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point;
and calculating by using a third full-connection network according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed to obtain the radiation information of the point to be observed relative to the observation sampling point.
In some embodiments, calculating, using the first fully-connected network, according to the position information of the target point, the position information of the point to be observed, and the multi-scale aggregated feature vector of the target point, to obtain an observed feature vector of the target point relative to the point to be observed, includes:
subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed;
splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector;
and calculating the spliced multi-scale aggregation characteristic vector by using a first full-connection network to obtain an observation characteristic vector of the target point relative to the point to be observed.
In some embodiments, the appearance description vector of the point to be observed is obtained by an aggregation calculation according to the following formula:

$$ f_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, F_{p_i,x}, \qquad w_i = \frac{1}{\lVert p_i - x \rVert} $$

where f_x represents the appearance description vector of the point to be observed, i denotes the i-th target point, A_i denotes the salient feature vector corresponding to the i-th target point, w_i is the inverse distance weight, p_i is the position information of the i-th target point, x is the position information of the point to be observed, and F_{p_i,x} represents the observation feature vector of the i-th target point relative to the point to be observed.

The volume density information of the point to be observed is obtained by an aggregation calculation according to the following formula:

$$ \sigma_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, \sigma_{p_i,x} $$

where σ_x represents the volume density information of the point to be observed and σ_{p_i,x} represents the observation density vector of the i-th target point relative to the point to be observed.
In some embodiments, performing a position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point includes:
subtracting the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point;
and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
In some embodiments, saliency extraction is performed on image data, including:
carrying out multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32;
processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map correspondingly, wherein the number of output channels of the saliency extraction network is 1;
and multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by the corresponding saliency weights respectively, and then summing the results to obtain the saliency feature map of the image data.
In some embodiments, multi-scale aggregation feature description of the image data includes:
multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map;
and stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to the channel dimension to obtain a multi-scale aggregation feature map of the image data.
In some embodiments, a plurality of point cloud data in the point cloud sequence are registered and fused to obtain a panoramic point cloud of the measured object; the method comprises the following steps:
sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data;
sequentially fusing two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence;
taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1;
and obtaining the panoramic point cloud of the measured object.
In some embodiments, registering two adjacent point cloud data in the point cloud sequence in sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data includes:
obtaining a first initial geometric feature and a second initial geometric feature by using a point cloud encoder based on FCGF, wherein the first initial geometric feature corresponds to one of two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data;
obtaining a first target geometric feature corresponding to the first initial geometric feature and a second target geometric feature corresponding to the second initial geometric feature by using a point cloud decoder based on FCGF;
and obtaining a rotation matrix and a translation vector from the first target geometric feature and the second target geometric feature by using the RANSAC algorithm.
A second aspect of the present application provides a terminal apparatus, comprising: at least one processor and memory;
a memory for storing program instructions;
and a processor for calling and executing the program instructions stored in the memory to make the terminal device execute the three-dimensional reconstruction method provided by the first aspect of the present application.
A third aspect of the present application provides a computer-readable storage medium.
the computer-readable storage medium has stored therein instructions, which when run on a computer, cause the computer to perform the three-dimensional reconstruction method provided in the first aspect of the present application.
The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps: acquiring a point cloud sequence and an image sequence of a measured object; registering and fusing the three-dimensional point cloud data to obtain a panoramic point cloud of the measured object; obtaining a salient feature vector and a multi-scale aggregation feature vector corresponding to each point according to the two-dimensional image data; obtaining an observation feature vector of the target point relative to the point to be observed according to the position information of the point to be observed, the position information of the target point and the multi-scale aggregation feature vector; aggregating the observation feature vectors and salient feature vectors of the nearest k points relative to the point to be observed to obtain an appearance description vector and volume density information of the point to be observed; and obtaining radiation information of the point to be observed relative to the observation sampling point according to the appearance description vector and position information of the point to be observed and the position information of the observation sampling point. According to this three-dimensional reconstruction method, an initial three-dimensional model is generated through point cloud registration and fusion; salient feature extraction and multi-scale aggregation feature extraction are then carried out according to the image data to obtain the salient feature vectors and multi-scale aggregation feature vectors of all points in the panoramic point cloud, and point-based volume rendering is carried out with a neural radiance field to obtain a three-dimensional model with near-real color and texture information.
Drawings
Fig. 1 is a schematic work flow diagram of a three-dimensional reconstruction method based on image and point cloud data fusion according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a process of obtaining a coordinate transformation relationship between two adjacent point clouds;
fig. 3 is a schematic diagram illustrating a process of acquiring salient feature images and multi-scale aggregation feature images of image data.
Detailed Description
In order to solve the problem that a three-dimensional model obtained by the conventional three-dimensional reconstruction method lacks texture and color information, so that the reconstruction result is not realistic enough, the application provides a three-dimensional reconstruction method based on image and point cloud data fusion through the following embodiments.
Referring to fig. 1, a three-dimensional reconstruction method based on image and point cloud data fusion provided by the embodiment of the present application includes steps 101 to 109.
101, acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one.
A measured object (an object or a scene) to be modeled is selected, and multi-view sequential point cloud data and color image data of the measured object are acquired from the same positions and orientations using structured light or other methods. The collected data are required to cover the entire surface of the object or scene.
And 102, registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object.
Here, the image data and the point cloud data correspond one to one, that is, each image and its corresponding point cloud are obtained from the same viewing angle, and their position information corresponds to each other: back-projecting the position information of the image data yields the position information of the corresponding point cloud data and, correspondingly, projecting the position information of the point cloud data yields the two-dimensional position information of the corresponding image data. Therefore, while the point cloud data are registered and fused to obtain the panoramic point cloud of the measured object, the coordinate registration and fusion relationships among the image data can be determined at the same time.
Point cloud registration refers to finding the rotation matrix and translation vector between two point clouds, and point cloud fusion refers to fusing the two point clouds into a new point cloud according to that rotation matrix and translation vector. In some embodiments, step 102 includes steps 201 to 204.
Referring to fig. 2, a schematic diagram of a process of obtaining a coordinate transformation relationship between two adjacent point clouds is shown as an example.
Step 201, registering two adjacent point cloud data in the point cloud sequence in sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data.
And step 202, sequentially fusing the two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence.
And 203, taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1.
And 204, obtaining a panoramic point cloud of the measured object.
Illustratively, for n (n >2, which is exemplified by n = 6) point clouds from multiple perspectives of an object or scene to be modeled, two adjacent point clouds are continuously registered and merged using the method provided in the above steps 201-204. The specific operation is as follows: inputting the 1 st point cloud and the 2 nd point cloud to a pairwise point cloud registration network to obtain a coordinate transformation relation (namely a rotation matrix and a translation vector) between the point clouds, merging the point clouds into the 1 st point cloud of a new point cloud sequence by using the relation, registering and merging the 3 rd point cloud and the 4 th point cloud into the 2 nd point cloud of the new point cloud sequence, and so on until all the point clouds are merged into 3 new point clouds. And continuously performing the registration and fusion of the two point clouds on the 3 point clouds until all the point clouds are registered and merged into a complete panoramic point cloud, and finishing the registration and fusion. The resulting panoramic point cloud is an initial three-dimensional model composed of discrete points.
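This coarse-to-fine merging strategy can be sketched as follows. This is a minimal illustration rather than code from the patent: the pairwise point cloud registration network is abstracted as a caller-supplied `register_pair` function, and all names and interfaces are illustrative assumptions.

```python
from typing import Callable, List, Tuple
import numpy as np

# `register_pair(src, dst)` stands in for the pairwise point cloud registration network;
# it must return a rotation matrix R (3x3) and translation vector t (3,) mapping src onto dst.
RegisterFn = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]

def fuse_pair(first: np.ndarray, second: np.ndarray, register_pair: RegisterFn) -> np.ndarray:
    """Register `second` onto `first`, transform it into first's frame, and merge the two clouds."""
    R, t = register_pair(second, first)
    return np.vstack([first, second @ R.T + t])

def merge_sequence(clouds: List[np.ndarray], register_pair: RegisterFn) -> np.ndarray:
    """Repeatedly fuse adjacent clouds (1st with 2nd, 3rd with 4th, ...) until a single
    panoramic point cloud remains; an odd leftover cloud is carried to the next round."""
    while len(clouds) > 1:
        merged = [fuse_pair(clouds[i], clouds[i + 1], register_pair)
                  for i in range(0, len(clouds) - 1, 2)]
        if len(clouds) % 2 == 1:
            merged.append(clouds[-1])
        clouds = merged
    return clouds[0]
```

With six input clouds this reproduces the 6 → 3 → 2 → 1 merging pattern of the example above.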
In order to ensure registration accuracy, a deep learning approach can be adopted to obtain the rotation matrix and translation vector corresponding to two adjacent point cloud data. As such, in some embodiments, step 201 includes steps 301 to 303.
Step 301, using an FCGF (Fully Convolutional Geometric Features)-based point cloud encoder, obtaining a first initial geometric feature and a second initial geometric feature, where the first initial geometric feature corresponds to one of the two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data.
Step 302, using a FCGF-based point cloud decoder, obtaining the first target geometric feature corresponding to the first initial geometric feature and the second target geometric feature corresponding to the second initial geometric feature.
Step 303, obtaining a rotation matrix and a translation vector of the first target geometric feature and the second target geometric feature by using the RANSAC algorithm.
In order to clearly understand the method for acquiring the rotation matrix and the translation vector corresponding to the two point cloud data provided in these embodiments, the method provided in steps 301 to 303 in these embodiments is described below by way of an example.
A first point cloud X and a second point cloud Y are acquired. The number of points in point cloud X is n, and the number of points in point cloud Y is m. Corresponding to step 201 above, the first point cloud X is one of the two adjacent point cloud data, and the second point cloud Y is the other of the two adjacent point cloud data.
A 3D convolution layer with a 7 × 7 kernel contained in the FCGF-based point cloud encoder is used to extract large-scale local context information from the input point clouds X and Y, giving the point cloud features F_X^0 and F_Y^0. Richer local context information is then aggregated by three strided convolution layers with residual blocks; the specific process is as follows:

For the first level, the point cloud features F_X^0 and F_Y^0 pass through two 3D convolution layers with 3 × 3 kernels, strides of 1 and 2 respectively, and channel numbers of 32 and 64 respectively, giving the features F_X^1 and F_Y^1, whose point numbers are n/2 and m/2 respectively and whose number of feature channels is 64. After the residual-block convolution layer of the first level, the features R_X^1 and R_Y^1 are obtained.
For the second level, R_X^1 and R_Y^1 are input into the second layer of the FCGF-based point cloud encoder and pass through a 3D convolution layer with a 3 × 3 kernel, a stride of 2 and 128 channels, giving the features F_X^2 and F_Y^2, whose point numbers are n/4 and m/4 respectively and whose number of feature channels is 128. After the residual-block convolution layer of the second level, the features R_X^2 and R_Y^2 are obtained.
For the third level, R_X^2 and R_Y^2 are input into the third layer of the FCGF-based point cloud encoder and pass through a 3D convolution layer with a 3 × 3 kernel, a stride of 2 and 256 channels, giving the features F_X^3 and F_Y^3, whose point numbers are n/8 and m/8 respectively and whose number of feature channels is 256. After the residual-block convolution layer of the third level, the first initial geometric feature E_X and the second initial geometric feature E_Y are obtained.

After the FCGF-based point cloud encoder, the point cloud features of the first point cloud X and the second point cloud Y are therefore the first initial geometric feature E_X and the second initial geometric feature E_Y, respectively. In this embodiment, an FCGF-based point cloud decoder is then used to perform feature upsampling; it likewise comprises three layers, and the specific process is as follows:
For the first level, the first enhanced self-attention feature S_X and the second enhanced self-attention feature S_Y are respectively input into a 3D upsampling convolution layer with a 3 × 3 kernel, a stride of 2 and 128 output channels, and are then processed by the residual-block convolution layer of the first level with 128 output channels, giving the features D_X^1 and D_Y^1.
For the second level, D_X^1 is spliced with F_X^2 and D_Y^1 is spliced with F_Y^2; the spliced features are respectively input into the second layer of the point cloud decoder, pass through a 3D upsampling convolution layer with a 3 × 3 kernel, a stride of 2 and 64 output channels, and then pass through the residual-block convolution layer of the second level, giving the features D_X^2 and D_Y^2.
For the third level, D_X^2 is spliced with F_X^3 and D_Y^2 is spliced with F_Y^3; the spliced features are respectively input into the third layer of the point cloud decoder and pass through a 3D upsampling convolution layer with a 3 × 3 kernel, a stride of 2 and 64 output channels, giving the features D_X^3 and D_Y^3. Finally, D_X^3 and D_Y^3 each pass through a 3D convolution layer with a 1 × 1 kernel and 32 output channels, giving the final first target geometric feature T_X and second target geometric feature T_Y of the point clouds X and Y.
In this embodiment, the RANSAC algorithm is used to find the coordinate transformation relationship between the point clouds, namely the rotation matrix and the translation vector, so as to complete the subsequent point cloud registration and fusion. The process of finding the coordinate transformation relationship between the point clouds with the RANSAC algorithm is as follows:

The first target geometric feature T_X, the second target geometric feature T_Y, and the first and second point clouds X and Y are taken as input. According to the descriptors (the 32-dimensional description vector of any point x in T_X and the 32-dimensional description vector of any point y in T_Y), the coordinate correspondences of the points whose descriptors match are obtained, and an initial rotation matrix and an initial translation vector are calculated. The projection error is then minimized to obtain the final coordinate transformation relationship, namely the rotation matrix and the translation vector.
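The descriptor-matching step can be illustrated with the compact sketch below. It substitutes mutual nearest-neighbour matching of the 32-dimensional descriptors and a closed-form least-squares (SVD) fit for the full RANSAC loop with projection-error minimisation, so it should be read as a simplified stand-in rather than the patent's exact procedure; all function names are illustrative.

```python
import numpy as np

def match_descriptors(feat_x: np.ndarray, feat_y: np.ndarray):
    """Mutual nearest-neighbour matching of 32-D descriptors, shapes (N, 32) and (M, 32).
    Uses a dense distance matrix for clarity; a KD-tree would be used in practice."""
    d = np.linalg.norm(feat_x[:, None, :] - feat_y[None, :, :], axis=-1)
    nn_xy = d.argmin(axis=1)                 # best match in Y for each point of X
    nn_yx = d.argmin(axis=0)                 # best match in X for each point of Y
    idx_x = np.where(nn_yx[nn_xy] == np.arange(len(feat_x)))[0]
    return idx_x, nn_xy[idx_x]               # mutually consistent correspondences

def estimate_rigid_transform(px: np.ndarray, py: np.ndarray):
    """Least-squares rigid transform (R, t) with py ≈ px @ R.T + t (Kabsch, no scale)."""
    cx, cy = px.mean(axis=0), py.mean(axis=0)
    H = (px - cx).T @ (py - cy)              # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                       # enforce a proper rotation (det = +1)
    t = cy - R @ cx
    return R, t
```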
Because ground-truth coordinate transformations for point cloud registration are very difficult to collect, in some embodiments the coordinate transformations in the training data sets are generated with existing methods. First, the point cloud data of each scene are downsampled and denoised; specifically, each original point cloud is uniformly downsampled and its outliers are deleted. An initial transformation between every pair of point clouds in each scene is then obtained in sequence with a RANSAC-based method, and a more refined transformation is finally generated with a point-to-plane ICP algorithm. The refined transformations are used as the coordinate transformation labels for point cloud registration and fusion, and the FCGF-based point cloud encoder and decoder are trained with them so that precise coordinate transformations can be obtained to construct the initial point cloud three-dimensional model. The same applies to the coordinate transformations between the images corresponding to the point clouds.
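Assuming the Open3D library, the data-preparation steps just described (uniform downsampling, outlier removal, and point-to-plane ICP refinement of a coarse RANSAC initialisation) might look roughly like the sketch below; every parameter value is an illustrative assumption rather than a figure from the patent.

```python
import numpy as np
import open3d as o3d

def refine_pairwise_transform(src_path: str, dst_path: str,
                              init_T: np.ndarray, voxel: float = 0.01) -> np.ndarray:
    """Downsample, denoise, and refine an initial alignment with point-to-plane ICP.
    `init_T` is the coarse 4x4 transform (e.g. from a RANSAC-based initialisation)."""
    src = o3d.io.read_point_cloud(src_path)
    dst = o3d.io.read_point_cloud(dst_path)

    def preprocess(pcd):
        pcd = pcd.voxel_down_sample(voxel_size=voxel)              # uniform downsampling
        pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
        pcd.estimate_normals(                                       # normals for point-to-plane ICP
            o3d.geometry.KDTreeSearchParamHybrid(radius=4 * voxel, max_nn=30))
        return pcd

    src, dst = preprocess(src), preprocess(dst)
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=2 * voxel, init=init_T,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation                                    # refined 4x4 transform
```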
103, respectively performing salient feature extraction and multi-scale aggregation feature description on the plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud.
In some embodiments, salient feature extraction and multi-scale aggregation feature description are respectively performed on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud, including steps 401 to 405.
Step 401, performing multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32.
Step 402, processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map correspondingly, wherein the number of output channels of the saliency extraction network is 1.
Step 403, multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by corresponding significance weights respectively, and then adding the result to obtain a significance feature map of the image data.
And 404, multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map.
Step 405, stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to a channel dimension to obtain the multi-scale aggregation feature map of the image data.
Referring to the process of performing registration and fusion processing on the point cloud data in the foregoing step 201 to obtain a panoramic point cloud of the object to be measured, a coordinate transformation relation required by the registration of the corresponding image data can be obtained, and a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud are obtained by combining the obtained salient feature map and the multi-scale aggregation feature map of the image data.
In order for those skilled in the art to clearly understand the method for acquiring the saliency map and the multi-scale aggregation map of image data provided in these embodiments, the method provided in steps 401-405 in these embodiments is described below by way of an example.
Referring to fig. 3, an example of a process for acquiring a saliency map and a multi-scale aggregation map of image data provided in these embodiments is shown.
In the first part, multi-level feature extraction by the backbone is performed. A picture of size h × w × 3 is input and passes through the backbone multi-scale feature extraction convolution network to obtain the first-level, second-level and third-level feature maps. Specifically, for the first level, the image first passes through 3 convolution layers (conv 1/2/3 in fig. 3) with kernel sizes of 3 × 3 × 8 pixels and a stride of 1 pixel, giving a first-level feature map of h × w × 8 pixels.
For the second level, the first-level feature map then passes through 1 convolution layer (conv 4 in fig. 3) with a kernel size of 3 × 3 × 16 pixels and a stride of 2 pixels, and through 2 convolution layers (conv 5/6 in fig. 3) with kernel sizes of 3 × 3 × 16 pixels and a stride of 1 pixel, giving a second-level feature map of size h/2 × w/2 × 16 pixels.
For the third level, it passes through 1 convolution layer (conv 7 in fig. 3) with a kernel size of 3 × 3 × 32 pixels and a stride of 2 pixels, and through 2 convolution layers (conv 8/9 in fig. 3) with kernel sizes of 3 × 3 × 32 pixels and a stride of 1 pixel, giving a third-level feature map of size h/4 × w/4 × 32.
All three feature maps are then brought to the resolution of the original image by bilinear interpolation, i.e., the feature maps of the last two levels are upsampled by factors of 2 and 4 respectively, finally giving three feature maps: s1 (first-level feature map) of h × w × 8, s2 (second-level feature map) of h × w × 16, and s3 (third-level feature map) of h × w × 32.
In the second part, saliency extraction is performed. For the saliency extraction part, the three feature maps each pass through 1 convolution layer with a 3 × 3 kernel, 1 output channel and a stride of 1, giving three feature maps of size h × w × 1. Then, considering that shallow features are easily affected by noise, in order to reduce the influence of noise the three feature maps from shallow to deep are multiplied by the coefficients 0.17, 0.33 and 0.5 respectively and then summed, giving a saliency feature map of size h × w × 1.
The value A at position (x, y) in the saliency feature map represents the saliency of that point: a point with a larger saliency A is more distinct from its surrounding points, typically a point with a significant color change or a drastic structural change. Naturally, the reconstruction of the spatial points of the three-dimensional model that correspond to these image points has a large influence on the quality of the final three-dimensional reconstruction.
In the third part, multi-scale aggregation feature description is performed. For the feature description part, the three feature maps from shallow to deep produced by the backbone multi-scale feature extraction network are multiplied by the weight coefficients 1, 2 and 3 respectively and then stacked along the channel dimension, giving a multi-scale aggregation feature map of size h × w × 32.
Compared with directly using the (R, G, B) values as the color information of the point cloud, multi-level feature fusion maps a single color value to a high-dimensional vector; as the number of channels increases, the differences between points become larger and the neural network can learn them better.
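A rough PyTorch sketch of the three parts above is given below. Activations and training details are omitted, a separate 1-channel convolution head per level is assumed for the saliency extraction network, and all module and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSaliencyNet(nn.Module):
    """Sketch of the backbone (8/16/32-channel levels), the 1-channel saliency heads,
    and the weighted saliency / multi-scale aggregation maps described above."""
    def __init__(self):
        super().__init__()
        self.level1 = nn.Sequential(*[nn.Conv2d(c_in, 8, 3, stride=1, padding=1)
                                      for c_in in (3, 8, 8)])                   # conv1-3
        self.level2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1),   # conv4
                                    nn.Conv2d(16, 16, 3, stride=1, padding=1),  # conv5
                                    nn.Conv2d(16, 16, 3, stride=1, padding=1))  # conv6
        self.level3 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1),  # conv7
                                    nn.Conv2d(32, 32, 3, stride=1, padding=1),  # conv8
                                    nn.Conv2d(32, 32, 3, stride=1, padding=1))  # conv9
        # one 3x3 convolution head with a single output channel per level
        self.sal = nn.ModuleList([nn.Conv2d(c, 1, 3, padding=1) for c in (8, 16, 32)])

    def forward(self, img):                       # img: (B, 3, h, w)
        s1 = self.level1(img)                     # (B, 8,  h,   w)
        s2 = self.level2(s1)                      # (B, 16, ~h/2, ~w/2)
        s3 = self.level3(s2)                      # (B, 32, ~h/4, ~w/4)
        size = img.shape[-2:]
        s2 = F.interpolate(s2, size=size, mode='bilinear', align_corners=False)
        s3 = F.interpolate(s3, size=size, mode='bilinear', align_corners=False)
        # saliency map: weighted sum of the three 1-channel intermediate maps
        saliency = sum(w * head(s) for w, head, s
                       in zip((0.17, 0.33, 0.5), self.sal, (s1, s2, s3)))
        # multi-scale aggregation map: weighted levels stacked along the channel dim
        aggregated = torch.cat([1 * s1, 2 * s2, 3 * s3], dim=1)
        # NOTE: plain concatenation yields 8 + 16 + 32 = 56 channels; the text states a
        # 32-channel aggregation map, so an additional projection may be implied there.
        return saliency, aggregated
```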
And 104, calculating by using a first full-connection network according to the position information of a target point, the position information of the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain an observation characteristic vector of the target point relative to the point to be observed, wherein the target point is any point except the point to be observed in the panoramic point cloud. Wherein the point to be observed can be any point in the panoramic point cloud.
And 105, performing aggregation calculation according to the observation feature vector of the k points closest to the point to be observed relative to the point to be observed and the saliency feature vector to obtain an appearance description vector of the point to be observed.
Because the multi-scale aggregated feature vector used to describe the appearance information of a point in the panoramic point cloud is obtained from a specific viewing position, the appearance features of the same point observed from different viewing positions are not necessarily the same. To regress this difference, the appearance description vector of the point to be observed is obtained using the methods of step 104 and step 105.
Further, subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed; splicing the relative position information of the target point relative to the point to be observed with the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector; and calculating the spliced multi-scale aggregation characteristic vector by using the first fully-connected network to obtain an observation characteristic vector of the target point relative to the point to be observed.
Illustratively, for a point to be observed at x, the observation feature vector of a target point at p relative to the point to be observed is

$$ F_{p,x} = W\big(f_p,\; p - x\big) $$

where f_p is the multi-scale aggregated feature vector of the target point, p is the position information of the target point (represented as a three-dimensional vector), and x is the position information of the point to be observed (represented as a three-dimensional vector). The function W splices f_p and the relative position p − x and inputs the spliced vector into the first fully connected network, which comprises three fully connected layers of sizes 35 × 128, 128 × 256 and 256 × 128, so that the observation feature vector F_{p,x} of the target point at p relative to the point to be observed at x is obtained. Using the relative position p − x keeps the network invariant to translations of the point pair, resulting in better generalization.
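A possible reading of the function W as a small PyTorch module is sketched below; the ReLU activations between the 35 × 128, 128 × 256 and 256 × 128 layers are an assumption, since the patent only lists the layer sizes.

```python
import torch
import torch.nn as nn

class ObservationFeatureNet(nn.Module):
    """W(f_p, p - x): sketch of the first fully connected network (35 -> 128 -> 256 -> 128).
    35 = 32 (multi-scale aggregated feature of the target point) + 3 (relative position p - x)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(35, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 128))

    def forward(self, f_p, p, x):
        # f_p: (k, 32) features of the k target points, p: (k, 3), x: (3,) point to be observed
        rel = p - x.unsqueeze(0)               # relative positions keep the net translation-invariant
        return self.mlp(torch.cat([f_p, rel], dim=-1))   # (k, 128) observation feature vectors
```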
In the embodiment of the application, the observation feature vectors, relative to the point to be observed, of the k nearest target points around the point to be observed are combined. Illustratively, let the k target points nearest to the point to be observed at x be located at p_1, p_2, …, p_k, where i denotes the i-th target point. The appearance description vector f_x of the point to be observed at x is obtained by the aggregation calculation

$$ f_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, F_{p_i,x}, \qquad w_i = \frac{1}{\lVert p_i - x \rVert} $$

where A_i denotes the salient feature vector corresponding to the i-th target point, p_i is the position information of the i-th target point, and F_{p_i,x} is the observation feature vector of the i-th target point relative to the point to be observed. The inverse distance weight w_i is used to aggregate the neural features so that target points closer to the point to be observed contribute more to the calculation of the appearance description vector; at the same time, target points with a larger saliency A_i are points that differ from their surroundings, usually points with an obvious color change or a drastic structural change, and such target points likewise contribute more to the calculation of the appearance description vector of the point to be observed.
And 106, calculating by using a second fully-connected network according to the observation characteristic vector of the target point relative to the point to be observed to obtain an observation density vector of the target point relative to the point to be observed.
And 107, performing aggregation calculation according to the observation density vectors of the k points closest to the point to be observed relative to the point to be observed and the corresponding salient feature vectors to obtain the volume density information of the point to be observed.
In this embodiment, the second fully connected network comprises three fully connected layers with sizes of 160 × 256, 256 × 128 and 128 × 1. The volume density information of the point to be observed is obtained by aggregating the observation density vectors, relative to the point to be observed, of its k nearest target points, as shown by the following two formulas:

$$ \sigma_{p_i,x} = D\big(F_{p_i,x}\big) $$

$$ \sigma_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j}\, \sigma_{p_i,x} $$

where the function D denotes inputting the observation feature vector F_{p_i,x} of the i-th target point relative to the point to be observed into the second fully connected network for calculation, σ_{p_i,x} is the resulting observation density vector of the i-th target point relative to the point to be observed, and σ_x is the volume density information of the point to be observed.
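A sketch of the density branch follows. The 160-dimensional input of the second fully connected network is taken at face value even though only a 128-dimensional observation feature vector is described, so the remaining 32 dimensions are left as an unexplained placeholder; the ReLU activations are likewise assumed.

```python
import torch
import torch.nn as nn

class DensityHead(nn.Module):
    """D(.): sketch of the second fully connected network (160 -> 256 -> 128 -> 1)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(160, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, feat_160):              # (k, 160) per-target-point input features
        return self.mlp(feat_160).squeeze(-1) # (k,) observation densities sigma_{p_i,x}

def aggregate_density(sigma_i, A, p, x, eps=1e-8):
    """Volume density of the point to be observed, aggregated with the same
    saliency-modulated inverse-distance weights as the appearance vector."""
    w = 1.0 / (torch.norm(p - x.unsqueeze(0), dim=-1) + eps)
    wa = w * A
    return (wa * sigma_i).sum() / (wa.sum() + eps)
```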
And 108, performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
Because the radiation information of the point to be observed is related to the observation direction, the embodiment of the application subtracts the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point; and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
Illustratively, the coordinate difference between the position information x of the point to be observed and the position information s of the observation sampling point is regarded as the observation direction d = x − s. As the number of channels increases, a single position value mapped to a high-dimensional vector shows larger differences, and the neural network can learn it better. In this embodiment, the observation direction d is therefore mapped into a 32-dimensional high-dimensional position vector γ(d).
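The text does not state how the relative position is mapped into the 32-dimensional space, so the sketch below simply uses a learned linear lift as a stand-in (a NeRF-style sinusoidal positional encoding would be another natural choice).

```python
import torch
import torch.nn as nn

# Stand-in for the unspecified 3 -> 32 mapping of the observation direction.
position_encoder = nn.Linear(3, 32)

def encode_direction(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """High-dimensional position vector of the point to be observed (at x) with respect
    to the observation sampling point (at s); both inputs are (3,) tensors."""
    return position_encoder(x - s)            # (32,) encoded observation direction
```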
And step 109, calculating by using a third full-connection network according to the high-dimensional position vector of the point to be observed relative to the observation sampling point and the appearance description vector of the point to be observed, so as to obtain the radiation information of the point to be observed relative to the observation sampling point.
The high-dimensional position vector γ(d) of the point to be observed relative to the observation sampling point is spliced with the appearance description vector f_x of the point to be observed. The spliced vector is then processed by the third fully connected network to obtain the radiation (color) information c_x of the point to be observed relative to the observation sampling point. The third fully connected network comprises three fully connected layers with sizes of 160 × 256, 256 × 128 and 128 × 3.
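A sketch of the third fully connected network as a radiance head is given below; the ReLU activations and the final sigmoid that keeps colours in [0, 1] are assumptions not stated in the patent.

```python
import torch
import torch.nn as nn

class RadianceHead(nn.Module):
    """Sketch of the third fully connected network (160 -> 256 -> 128 -> 3): it takes the
    concatenation of the 32-dim position vector and the 128-dim appearance description
    vector and outputs per-point radiance (RGB) for the given observation direction."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(160, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 3))

    def forward(self, gamma_d, f_x):
        # gamma_d: (32,) high-dimensional position vector, f_x: (128,) appearance vector
        rgb = self.mlp(torch.cat([gamma_d, f_x], dim=-1))
        return torch.sigmoid(rgb)             # keep colours in [0, 1] (an added convention)
```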
Step 107 and step 109 thus respectively yield the volume density information of the point to be observed and its radiation information relative to the observation sampling point, which completes the reconstruction of the 3D model.
The second fully connected network and the third fully connected network can be regarded as a NeRF (Neural Radiance Fields) network model. During training, the NeRF network model is optimized by minimizing the error between each observed image and the corresponding view rendered from the reconstructed model.
The embodiment of the application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps: acquiring a point cloud sequence and an image sequence of a measured object; registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object; respectively performing salient feature extraction and multi-scale aggregation feature description on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud; calculating, with a first fully connected network, according to the position information of the target point, the position information of the point to be observed and the multi-scale aggregation feature vector of the target point, to obtain an observation feature vector of the target point relative to the point to be observed; performing aggregation calculation according to the observation feature vectors of the k points closest to the point to be observed relative to the point to be observed and the corresponding salient feature vectors to obtain an appearance description vector of the point to be observed; calculating the observation feature vector of the target point relative to the point to be observed with a second fully connected network to obtain an observation density vector of the target point relative to the point to be observed; performing aggregation calculation according to the observation density vectors of the k points closest to the point to be observed relative to the point to be observed and the corresponding salient feature vectors to obtain the volume density information of the point to be observed; performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point; and calculating, with a third fully connected network, according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed, to obtain the radiation information of the point to be observed relative to the observation sampling point. According to this three-dimensional reconstruction method, an initial three-dimensional model is generated through point cloud registration and fusion; salient feature extraction and multi-scale aggregation feature extraction are then carried out according to the image data to obtain the salient feature vectors and multi-scale aggregation feature vectors of all points in the panoramic point cloud, and point-based volume rendering is carried out with a neural radiance field to obtain a three-dimensional model with near-real color and texture information.
An embodiment of the present application further provides a terminal device, including: at least one processor and memory; the memory to store program instructions; the processor is configured to call and execute the program instructions stored in the memory, so as to enable the terminal device to execute the three-dimensional reconstruction method provided in the foregoing embodiment.
The embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium has stored therein instructions, which, when run on a computer, cause the computer to perform the three-dimensional reconstruction method as provided in the previous embodiments.
The steps of a method described in an embodiment of the present application may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a UE. In the alternative, the processor and the storage medium may reside in different components in the UE.
It should be understood that, in the various embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present application may be implemented as software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present application may be embodied essentially, or in part, in the form of a software product, which may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.
The above-described embodiments of the present application do not limit the scope of the present application.

Claims (10)

1. A three-dimensional reconstruction method based on image and point cloud data fusion is characterized by comprising the following steps:
acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one;
registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object;
respectively performing salient feature extraction and multi-scale aggregation feature description on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud;
calculating by using a first fully-connected network, according to the position information of a target point, the position information of a point to be observed and the multi-scale aggregation feature vector of the target point, to obtain an observation feature vector of the target point relative to the point to be observed, wherein the target point is any point in the panoramic point cloud other than the point to be observed;
performing aggregation calculation according to the observation feature vectors, relative to the point to be observed, of the k points closest to the point to be observed and the salient feature vectors, to obtain an appearance description vector of the point to be observed;
calculating the observation feature vector of the target point relative to the point to be observed by using a second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed;
performing aggregation calculation according to the observation density vectors, relative to the point to be observed, of the k points closest to the point to be observed and the salient feature vectors, to obtain the volume density information of the point to be observed;
performing position coding calculation according to the position information of an observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point;
and calculating by using a third fully-connected network, according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed, to obtain the radiance information of the point to be observed relative to the observation sampling point.
2. The three-dimensional reconstruction method according to claim 1, wherein the obtaining of the observation feature vector of the target point relative to the point to be observed by performing a calculation using a first fully-connected network according to the position information of the target point, the position information of the point to be observed, and the multi-scale aggregation feature vector of the target point comprises:
subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed;
splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector;
and calculating the spliced multi-scale aggregation characteristic vector by using the first fully-connected network to obtain an observation characteristic vector of the target point relative to the point to be observed.
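A minimal sketch of the computation in claim 2, assuming PyTorch, a 32-dimensional multi-scale aggregation feature vector and a two-layer first fully-connected network; the layer widths and names are assumptions for illustration, not details taken from the patent:

import torch
import torch.nn as nn

FEAT_DIM = 32   # assumed size of the multi-scale aggregation feature vector
first_fc = nn.Sequential(nn.Linear(3 + FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 64))

def observation_feature(p_target, p_observed, target_feat):
    rel = p_target - p_observed                        # relative position of the target point
    spliced = torch.cat([rel, target_feat], dim=-1)    # spliced multi-scale aggregation feature vector
    return first_fc(spliced)                           # observation feature vector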
3. The three-dimensional reconstruction method according to claim 1, wherein the appearance description vector of the point to be observed is obtained by performing an aggregation calculation according to the following formula:
f_x = Σ_{i=1}^{k} A_i · (w_i / Σ_{j=1}^{k} w_j) · f_{i,x}, with w_i = 1 / ‖p_i − x‖,
wherein f_x represents the appearance description vector of the point to be observed, i denotes the i-th target point, A_i denotes the salient feature vector corresponding to the i-th target point, p_i is the position information of the i-th target point, x is the position information of the point to be observed, and f_{i,x} denotes the observation feature vector of the i-th target point relative to the point to be observed;
and the volume density information of the point to be observed is obtained by performing an aggregation calculation according to the following formula:
σ_x = Σ_{i=1}^{k} A_i · (w_i / Σ_{j=1}^{k} w_j) · σ_{i,x},
wherein σ_x represents the volume density information of the point to be observed, and σ_{i,x} denotes the observation density vector of the i-th target point relative to the point to be observed.
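An illustrative sketch of the two aggregations in claim 3, assuming inverse-distance weights w_i = 1/‖p_i − x‖ normalized over the k nearest points, and treating the saliency of each point as a scalar (consistent with the single-channel saliency map of claim 5); this is one plausible reading of the formulas, not the patented implementation:

import torch

def aggregate(p, x, saliency, obs_feat, obs_density, eps=1e-8):
    # p:           (k, 3) positions p_i of the k nearest target points
    # x:           (3,)   position of the point to be observed
    # saliency:    (k,)   saliency value A_i of each target point
    # obs_feat:    (k, d) observation feature vectors f_{i,x}
    # obs_density: (k, 1) observation density vectors sigma_{i,x}
    w = 1.0 / (torch.norm(p - x, dim=-1) + eps)                # inverse-distance weights w_i
    coef = saliency * w / w.sum()                              # A_i * w_i / sum_j w_j
    f_x = (coef.unsqueeze(-1) * obs_feat).sum(dim=0)           # appearance description vector
    sigma_x = (coef.unsqueeze(-1) * obs_density).sum(dim=0)    # volume density information
    return f_x, sigma_x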
4. The three-dimensional reconstruction method according to claim 1, wherein performing a position coding calculation according to position information of an observation sampling point and position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point comprises:
subtracting the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point;
and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
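Claim 4 only states that the relative position is mapped into a 32-dimensional space; the following minimal sketch assumes this mapping is a learned linear projection (the choice of nn.Linear is an assumption, and a frequency-based positional encoding would be an equally valid reading):

import torch
import torch.nn as nn

pos_map = nn.Linear(3, 32)   # assumed learnable mapping into the 32-dimensional space

def high_dim_position(p_observed, p_sample):
    rel = p_observed - p_sample    # relative position with respect to the observation sampling point
    return pos_map(rel)            # 32-dimensional high-dimensional position vector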
5. The three-dimensional reconstruction method of claim 1, wherein performing salient feature extraction on the image data comprises:
performing multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32;
processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to correspondingly obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map, wherein the number of output channels of the saliency extraction network is 1;
and multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by corresponding significance weights respectively and then adding the results to obtain the significance feature map of the image data.
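A compact sketch of the saliency extraction in claim 5, assuming PyTorch; the convolution strides, the bilinear upsampling used to bring the three intermediate maps to a common resolution, and the saliency weight values are assumptions, while the 8/16/32 feature channels and the single output channel follow the claim:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBackbone(nn.Module):
    # three-level feature extractor with 8, 16 and 32 channels, as in claim 5
    def __init__(self):
        super().__init__()
        self.l1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.l2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.l3 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, img):
        f1 = self.l1(img)
        f2 = self.l2(f1)
        f3 = self.l3(f2)
        return f1, f2, f3

class SaliencyExtractor(nn.Module):
    # single-output-channel heads plus a weighted sum of the intermediate maps
    def __init__(self, weights=(0.5, 0.3, 0.2)):   # saliency weights: assumed values
        super().__init__()
        self.h1, self.h2, self.h3 = nn.Conv2d(8, 1, 1), nn.Conv2d(16, 1, 1), nn.Conv2d(32, 1, 1)
        self.w = weights

    def forward(self, f1, f2, f3):
        size = f1.shape[-2:]
        m1 = self.h1(f1)
        m2 = F.interpolate(self.h2(f2), size=size, mode="bilinear", align_corners=False)
        m3 = F.interpolate(self.h3(f3), size=size, mode="bilinear", align_corners=False)
        return self.w[0] * m1 + self.w[1] * m2 + self.w[2] * m3   # saliency feature map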
6. The three-dimensional reconstruction method of claim 5, wherein performing multi-scale aggregation feature description on the image data comprises:
multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map;
stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to channel dimensions to obtain the multi-scale aggregation feature map of the image data.
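The multi-scale aggregation feature map of claim 6 can be sketched in the same setting; the upsampling of the coarser levels to a common resolution before stacking and the aggregation weight values are assumptions:

import torch
import torch.nn.functional as F

def multiscale_aggregation(f1, f2, f3, weights=(1.0, 1.0, 1.0)):
    # f1: (B, 8, H, W), f2: (B, 16, H/2, W/2), f3: (B, 32, H/4, W/4)
    size = f1.shape[-2:]
    g1 = weights[0] * f1
    g2 = F.interpolate(weights[1] * f2, size=size, mode="bilinear", align_corners=False)
    g3 = F.interpolate(weights[2] * f3, size=size, mode="bilinear", align_corners=False)
    return torch.cat([g1, g2, g3], dim=1)   # stacked along the channel dimension: 8 + 16 + 32 channels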
7. The three-dimensional reconstruction method according to claim 1, wherein registering and fusing the plurality of point cloud data in the point cloud sequence to obtain the panoramic point cloud of the measured object comprises:
sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data;
sequentially fusing the two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence;
taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1;
and obtaining the panoramic point cloud of the measured object.
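A minimal sketch of the pairwise registration-and-fusion loop of claim 7, assuming NumPy arrays and a caller-supplied register_pair function that returns the rotation matrix and translation vector for two adjacent point clouds; the pairing strategy shown here is one plausible reading of the sequential fusion:

import numpy as np

def fuse_sequence(clouds, register_pair):
    # clouds: list of (N_i, 3) arrays forming the point cloud sequence
    while len(clouds) > 1:
        merged = []
        for i in range(0, len(clouds) - 1, 2):
            R, t = register_pair(clouds[i], clouds[i + 1])   # rotation matrix and translation vector
            aligned = clouds[i + 1] @ R.T + t                # bring cloud i+1 into the frame of cloud i
            merged.append(np.vstack([clouds[i], aligned]))
        if len(clouds) % 2 == 1:                             # carry an unpaired cloud to the next round
            merged.append(clouds[-1])
        clouds = merged                                      # the new point cloud sequence
    return clouds[0]                                         # panoramic point cloud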
8. The three-dimensional reconstruction method of claim 7, wherein registering two adjacent point cloud data in the point cloud sequence in sequence to obtain the rotation matrix and the translation vector corresponding to the two adjacent point cloud data comprises:
obtaining a first initial geometric feature and a second initial geometric feature by using a point cloud encoder based on FCGF, wherein the first initial geometric feature corresponds to one of the two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data;
obtaining a first target geometric feature corresponding to the first initial geometric feature and a second target geometric feature corresponding to the second initial geometric feature by using a point cloud decoder based on the FCGF;
and obtaining the rotation matrix and the translation vector between the first target geometric feature and the second target geometric feature by using the RANSAC algorithm.
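Claim 8 relies on FCGF geometric features and the RANSAC algorithm; the following self-contained sketch assumes the per-point features have already been produced by an FCGF-style encoder/decoder and estimates the rigid transform with a generic RANSAC loop over SVD (Kabsch) fits. It is a simplification of typical feature-based registration rather than the patented procedure; in practice a library routine such as Open3D's RANSAC-based registration would usually replace this loop.

import numpy as np

def kabsch(A, B):
    # least-squares rigid transform (R, t) mapping point set A onto B
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cb - R @ ca

def ransac_registration(src, dst, src_feat, dst_feat, iters=1000, thresh=0.05, seed=0):
    # src, dst: (N, 3) and (M, 3) points; src_feat, dst_feat: (N, d) and (M, d) geometric features
    rng = np.random.default_rng(seed)
    d2 = ((src_feat[:, None, :] - dst_feat[None, :, :]) ** 2).sum(axis=-1)   # brute-force feature matching
    corr = d2.argmin(axis=1)                                                 # nearest-feature correspondence
    best = (np.eye(3), np.zeros(3), -1)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)                    # minimal sample of 3 matches
        R, t = kabsch(src[idx], dst[corr[idx]])
        resid = np.linalg.norm(src @ R.T + t - dst[corr], axis=1)
        inliers = int((resid < thresh).sum())
        if inliers > best[2]:
            best = (R, t, inliers)
    return best[0], best[1]    # rotation matrix and translation vector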
9. A terminal device, comprising: at least one processor and a memory;
the memory to store program instructions;
the processor is configured to call and execute the program instructions stored in the memory to cause the terminal device to perform the three-dimensional reconstruction method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that,
the computer-readable storage medium has stored therein instructions which, when run on a computer, cause the computer to perform the three-dimensional reconstruction method according to any one of claims 1 to 8.
CN202211342750.8A 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion Active CN115409931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342750.8A CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Publications (2)

Publication Number Publication Date
CN115409931A true CN115409931A (en) 2022-11-29
CN115409931B CN115409931B (en) 2023-03-31

Family

ID=84168933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342750.8A Active CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Country Status (1)

Country Link
CN (1) CN115409931B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389671A (en) * 2018-09-25 2019-02-26 南京大学 A kind of single image three-dimensional rebuilding method based on multistage neural network
CN111242138A (en) * 2020-01-11 2020-06-05 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
US20220327773A1 (en) * 2021-04-09 2022-10-13 Georgetown University Facial recognition using 3d model
CN114898028A (en) * 2022-04-29 2022-08-12 厦门大学 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment
CN115018989A (en) * 2022-06-21 2022-09-06 中国科学技术大学 Three-dimensional dynamic reconstruction method based on RGB-D sequence, training device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kong Demian et al.: "RGB-D Saliency Detection Based on Multi-Scale Feature Fusion", Microelectronics & Computer *
Jiang Rong et al.: "Acquisition of Three-Dimensional Road Surface Texture Information Based on a Binocular Vision Algorithm", Laser & Optoelectronics Progress *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631221A (en) * 2022-11-30 2023-01-20 北京航空航天大学 Low-overlapping-degree point cloud registration method based on consistency sampling
CN115631341A (en) * 2022-12-21 2023-01-20 北京航空航天大学 Point cloud registration method and system based on multi-scale feature voting
CN115690332A (en) * 2022-12-30 2023-02-03 华东交通大学 Point cloud data processing method and device, readable storage medium and electronic equipment
CN116843808A (en) * 2023-06-30 2023-10-03 北京百度网讯科技有限公司 Rendering, model training and virtual image generating method and device based on point cloud
CN117173693A (en) * 2023-11-02 2023-12-05 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN117173693B (en) * 2023-11-02 2024-02-27 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device

Also Published As

Publication number Publication date
CN115409931B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN115409931B (en) Three-dimensional reconstruction method based on image and point cloud data fusion
Chen et al. Cross parallax attention network for stereo image super-resolution
WO2023138062A1 (en) Image processing method and apparatus
CN115439694A (en) High-precision point cloud completion method and device based on deep learning
TW201839665A (en) Object recognition method and object recognition system
CN114723884A (en) Three-dimensional face reconstruction method and device, computer equipment and storage medium
CN115761258A (en) Image direction prediction method based on multi-scale fusion and attention mechanism
CN116993826A (en) Scene new view generation method based on local space aggregation nerve radiation field
JP2024507727A (en) Rendering a new image of a scene using a geometric shape recognition neural network conditioned on latent variables
US20220319055A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN115731336A (en) Image rendering method, image rendering model generation method and related device
CN115797561A (en) Three-dimensional reconstruction method, device and readable storage medium
US11880913B2 (en) Generation of stylized drawing of three-dimensional shapes using neural networks
EP4292059A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN114863007A (en) Image rendering method and device for three-dimensional object and electronic equipment
CN114627244A (en) Three-dimensional reconstruction method and device, electronic equipment and computer readable medium
CN116993926B (en) Single-view human body three-dimensional reconstruction method
CN115082322B (en) Image processing method and device, and training method and device of image reconstruction model
Jin et al. Light field reconstruction via deep adaptive fusion of hybrid lenses
CN113065521B (en) Object identification method, device, equipment and medium
US20210390772A1 (en) System and method to reconstruct a surface from partially oriented 3-d points
CN114638866A (en) Point cloud registration method and system based on local feature learning
Jee et al. Hologram Super-Resolution Using Dual-Generator GAN
Rasmuson et al. Addressing the shape-radiance ambiguity in view-dependent radiance fields
CN115063459B (en) Point cloud registration method and device and panoramic point cloud fusion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant