CN115409931B - Three-dimensional reconstruction method based on image and point cloud data fusion - Google Patents

Three-dimensional reconstruction method based on image and point cloud data fusion

Info

Publication number
CN115409931B
CN115409931B (application CN202211342750.8A)
Authority
CN
China
Prior art keywords
point
observed
point cloud
vector
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211342750.8A
Other languages
Chinese (zh)
Other versions
CN115409931A (en)
Inventor
李骏
李想
杨苏
周方明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Original Assignee
Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Lichuang Zhiheng Electronic Technology Co ltd
Priority to CN202211342750.8A
Publication of CN115409931A
Application granted
Publication of CN115409931B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Architecture (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, and relates to the field of computer vision and computer graphics. The three-dimensional reconstruction method obtains a panoramic point cloud of a measured object through point cloud registration and fusion; then, according to the corresponding image data, saliency feature extraction and multi-scale aggregation feature extraction are performed to obtain a saliency feature vector and a multi-scale aggregation feature vector for each point in the panoramic point cloud, and point-based volume rendering is performed using a neural radiance field to obtain a three-dimensional model with near-real color and texture information.

Description

Three-dimensional reconstruction method based on image and point cloud data fusion
Technical Field
The application relates to the field of computer vision and computer graphics, in particular to a three-dimensional reconstruction method based on image and point cloud data fusion.
Background
The cost of manually building a three-dimensional model is high: the work not only requires considerable expertise but is also time-consuming. Virtual reality requires a large number of three-dimensional models of characters, objects, scenes and the like with high geometric accuracy and complex colors and textures, so three-dimensional reconstruction technology plays a very critical role in AR, VR and the metaverse. How to reconstruct or generate three-dimensional models quickly and with high quality is therefore a key problem in computer vision and computer graphics.
A point cloud is a set of data points obtained by measuring the surface of the inspected object with a three-dimensional measuring device. As point cloud data becomes more and more convenient to acquire, the point cloud has become a very important form of three-dimensional data. By performing multi-view point cloud registration and fusion with deep learning techniques, a geometric model of a scene can be reconstructed quickly and accurately.
Current three-dimensional reconstruction technology based on point cloud data focuses on reconstructing the three-dimensional geometric structure and usually comprises the following steps: acquiring point cloud data, preprocessing the point cloud, registering and fusing the point cloud, and generating a three-dimensional surface. After point cloud registration and fusion, an original three-dimensional model is obtained; the three-dimensional model at this stage is composed of a batch of discrete points, and three-dimensional surface generation makes the surface of the three-dimensional object consist of planes, i.e. become continuous at the surface. These steps realize the geometric reconstruction of a three-dimensional object or scene, but the reconstructed three-dimensional model lacks texture and color information, so the reconstruction result is not realistic.
Disclosure of Invention
In order to solve the problem that the three-dimensional model obtained by conventional three-dimensional reconstruction methods lacks texture and color information, so that the reconstruction result is not realistic enough, the application provides a three-dimensional reconstruction method based on image and point cloud data fusion, a terminal device and a computer-readable storage medium.
The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps:
acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data corresponds to the point cloud data one by one;
registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object;
respectively extracting salient features and describing multi-scale aggregation features of a plurality of image data in an image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud;
calculating by using a first full-connection network according to the position information of the target point, the position information of the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain an observation characteristic vector of the target point relative to the point to be observed, wherein the target point is any point except the point to be observed in the panoramic point cloud;
performing aggregation calculation according to observation characteristic vectors of k points closest to the point to be observed relative to the point to be observed and the saliency feature vectors to obtain an appearance description vector of the point to be observed;
calculating the observation characteristic vector of the target point relative to the point to be observed by using a second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed;
performing aggregation calculation according to observation density vectors of k points closest to the point to be observed relative to the point to be observed and the saliency feature vectors to obtain the volume density information of the point to be observed;
performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point;
and calculating by using a third full-connection network according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed to obtain the radiation information of the point to be observed relative to the observation sampling point.
In some embodiments, the obtaining the observation feature vector of the target point relative to the point to be observed by using the first fully-connected network to perform calculation according to the position information of the target point, the position information of the point to be observed, and the multi-scale aggregation feature vector of the target point includes:
subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed;
splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector;
and calculating the spliced multi-scale aggregation characteristic vector by using a first full-connection network to obtain an observation characteristic vector of the target point relative to the point to be observed.
In some embodiments, the appearance description vector of the point to be observed is obtained by performing an aggregation calculation according to the following formula:

f_x = Σ_i (w_i · A_i · f_{p_i,x}) / Σ_i (w_i · A_i),  with  w_i = 1 / ||p_i − x||

where f_x represents the appearance description vector of the point to be observed, i denotes the i-th target point, A_i denotes the saliency feature vector corresponding to the i-th target point, w_i is the inverse distance weight, p_i is the position information of the i-th target point, x is the position information of the point to be observed, and f_{p_i,x} represents the observation feature vector of the i-th target point relative to the point to be observed.

The volume density information of the point to be observed is obtained by performing an aggregation calculation according to the following formula:

σ_x = Σ_i (w_i · A_i · σ_{p_i,x}) / Σ_i (w_i · A_i)

where σ_x represents the volume density information of the point to be observed and σ_{p_i,x} represents the observation density vector of the i-th target point relative to the point to be observed.
In some embodiments, performing a position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point includes:
subtracting the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point;
and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
In some embodiments, saliency extraction is performed on image data, including:
carrying out multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32;
processing the first-level feature map, the second-level feature map and the third-level feature map by using a significance extraction network to correspondingly obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map, wherein the number of output channels of the significance extraction network is 1;
and multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by the corresponding saliency weights respectively, and then summing the results to obtain the saliency feature map of the image data.
In some embodiments, multi-scale aggregate characterization of image data includes:
multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map;
and stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to the channel dimension to obtain a multi-scale aggregation feature map of the image data.
In some embodiments, a plurality of point cloud data in the point cloud sequence are registered and fused to obtain a panoramic point cloud of the measured object; the method comprises the following steps:
sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data;
sequentially fusing two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence;
taking the new point cloud sequence as the point cloud sequence of the object to be measured and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1;
and obtaining the panoramic point cloud of the measured object.
In some embodiments, sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data includes:
obtaining a first initial geometric feature and a second initial geometric feature by using a point cloud encoder based on the FCGF, wherein the first initial geometric feature corresponds to one of two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data;
obtaining a first target geometric feature corresponding to the first initial geometric feature and a second target geometric feature corresponding to the second initial geometric feature by using a point cloud decoder based on FCGF;
and obtaining a rotation matrix and a translation vector of the first target geometric feature and the second target geometric feature by using a Ransac algorithm.
A second aspect of the present application provides a terminal apparatus, comprising: at least one processor and memory;
a memory for storing program instructions;
and a processor for calling and executing the program instructions stored in the memory to make the terminal device execute the three-dimensional reconstruction method provided by the first aspect of the present application.
A third aspect of the present application is a computer-readable storage medium,
the computer-readable storage medium has stored therein instructions, which when run on a computer, cause the computer to perform the three-dimensional reconstruction method provided in the first aspect of the present application.
The application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps: acquiring a point cloud sequence and an image sequence of a measured object; registering and fusing the three-dimensional point cloud data to obtain a panoramic point cloud of the measured object; obtaining a saliency feature vector and a multi-scale aggregation feature vector corresponding to each point according to the two-dimensional image data; obtaining an observation feature vector of the target point relative to the point to be observed according to the position information of the point to be observed, the position information of the target point and the multi-scale aggregation feature vector; aggregating the observation feature vectors and saliency feature vectors, relative to the point to be observed, of the nearest k points to obtain an appearance description vector and volume density information of the point to be observed; and obtaining radiation information of the point to be observed relative to the observation sampling point according to the appearance description vector and the position information of the point to be observed and the position information of the observation sampling point. According to this three-dimensional reconstruction method, an initial three-dimensional model is generated through point cloud registration and fusion, then saliency feature extraction and multi-scale aggregation feature extraction are carried out according to the image data to obtain the saliency feature vector and multi-scale aggregation feature vector of each point in the panoramic point cloud, and point-based volume rendering is carried out using a neural radiance field to obtain a three-dimensional model with near-real color and texture information.
Drawings
Fig. 1 is a schematic workflow diagram of a three-dimensional reconstruction method based on image and point cloud data fusion according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a process of obtaining a coordinate transformation relationship between two adjacent point clouds;
fig. 3 is a schematic diagram illustrating a process of acquiring salient feature images and multi-scale aggregation feature images of image data.
Detailed Description
In order to solve the problem that the three-dimensional model obtained by existing three-dimensional reconstruction methods lacks texture and color information, so that the reconstruction result is not realistic enough, the application provides a three-dimensional reconstruction method based on the fusion of image and point cloud data through the following embodiments.
Referring to fig. 1, a three-dimensional reconstruction method based on image and point cloud data fusion provided by the embodiment of the present application includes steps 101 to 109.
101, acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data respectively correspond to the point cloud data one by one.
Selecting a measured object (an object or a scene) to be modeled, and acquiring multi-view sequence point cloud data and color image data at the same position and direction of the measured object by adopting structured light or other methods. The data collected is required to cover the entire surface of the object or scene.
And 102, registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object.
Here, the image data and the point cloud data correspond one to one, that is, each image and its corresponding point cloud are obtained under the same viewing angle, and their position information corresponds to each other: back-projecting the position information of the image data yields the position information of the corresponding point cloud data, and correspondingly, projecting the position information of the point cloud data yields the two-dimensional position information of the corresponding image data. Therefore, while the point cloud data are registered and fused to obtain the panoramic point cloud of the measured object, the coordinate registration and fusion relation between the image data can be determined at the same time.
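As a rough illustration of this image/point-cloud correspondence, the sketch below projects a 3D point into its source image with a pinhole model and looks up a per-pixel feature map at the resulting pixel. The intrinsic matrix K, the pose (R, t) and all function names are assumptions for illustration only; the patent does not specify the camera model.

```python
# Hypothetical projection of a panoramic-point-cloud point into its source
# image so that per-pixel feature maps can be sampled for that point.
import numpy as np

def project_point(p_world, K, R, t):
    """Project a 3D point in world coordinates to pixel coordinates (u, v)."""
    p_cam = R @ p_world + t          # world frame -> camera frame
    x, y, z = K @ p_cam              # camera frame -> homogeneous image coordinates
    return x / z, y / z

def sample_feature(feature_map, u, v):
    """Nearest-neighbour lookup of an (H, W, C) feature map at pixel (u, v)."""
    h, w = feature_map.shape[:2]
    iu = min(max(int(round(u)), 0), w - 1)
    iv = min(max(int(round(v)), 0), h - 1)
    return feature_map[iv, iu]

# Example: one point, identity pose, a dummy 480 x 640 map with 32 channels.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
u, v = project_point(np.array([0.1, -0.05, 2.0]), K, R, t)
feature = sample_feature(np.random.rand(480, 640, 32), u, v)   # 32-D vector
```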
Point cloud registration finds the rotation matrix and translation vector between two point clouds, and point cloud fusion refers to fusing the two point clouds into a new point cloud according to the rotation matrix and the translation vector. In some embodiments, step 102 includes steps 201-204.
Referring to fig. 2, a schematic diagram of a process of obtaining a coordinate transformation relationship between two adjacent point clouds is shown.
Step 201, registering two adjacent point cloud data in the point cloud sequence in sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data.
And step 202, sequentially fusing the two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence.
And 203, taking the new point cloud sequence as the point cloud sequence of the measured object and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1.
And 204, obtaining a panoramic point cloud of the measured object.
Illustratively, for n multi-view point clouds (n > 2; n = 6 is taken as an example here) of the object or scene to be modeled, two adjacent point clouds are continuously registered and merged using the method provided in steps 201-204 above. The specific operation is as follows: the 1st point cloud and the 2nd point cloud are input to a pairwise point cloud registration network to obtain the coordinate transformation relation between them (namely a rotation matrix and a translation vector), and this relation is used to combine them into the 1st point cloud of a new point cloud sequence; the 3rd point cloud and the 4th point cloud are registered and combined into the 2nd point cloud of the new point cloud sequence, and so on until all the point clouds have been combined into 3 new point clouds. The pairwise registration and fusion is then continued on these 3 point clouds until all the point clouds are registered and merged into one complete panoramic point cloud, at which point registration and fusion are finished. The resulting panoramic point cloud is an initial three-dimensional model made up of discrete points.
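A minimal sketch of this hierarchical pairwise merging is given below, assuming each point cloud is an (N, 3) NumPy array; register_pair() is only a stub standing in for the FCGF-based registration of steps 301-303, and all names are illustrative.

```python
# Hierarchical pairwise merging of a point cloud sequence into one panorama.
import numpy as np

def apply_transform(points, R, t):
    """Apply rotation matrix R and translation vector t to (N, 3) points."""
    return points @ R.T + t

def register_pair(src, dst):
    # Placeholder: a real implementation would return the rotation and
    # translation estimated from learned FCGF features (see steps 301-303).
    return np.eye(3), np.zeros(3)

def fuse_sequence(clouds):
    """Repeatedly merge adjacent point clouds until one panoramic cloud remains."""
    while len(clouds) > 1:
        merged = []
        for i in range(0, len(clouds) - 1, 2):
            R, t = register_pair(clouds[i + 1], clouds[i])
            merged.append(np.vstack([clouds[i], apply_transform(clouds[i + 1], R, t)]))
        if len(clouds) % 2 == 1:          # an odd cloud is carried to the next round
            merged.append(clouds[-1])
        clouds = merged
    return clouds[0]

panorama = fuse_sequence([np.random.rand(100, 3) for _ in range(6)])
```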
To ensure registration accuracy, a deep learning approach can be adopted to obtain the rotation matrix and translation vector corresponding to two adjacent point cloud data. As such, in some embodiments, step 201 includes steps 301-303.
Step 301, using an FCGF (Fully Convolutional Geometric Features) based point cloud encoder, obtaining a first initial geometric feature and a second initial geometric feature, wherein the first initial geometric feature corresponds to one of the two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data.
Step 302, using a FCGF-based point cloud decoder, obtaining the first target geometric feature corresponding to the first initial geometric feature and the second target geometric feature corresponding to the second initial geometric feature.
Step 303, obtaining a rotation matrix and a translation vector of the first target geometric feature and the second target geometric feature by using a Ransac algorithm.
In order to clearly understand the method for acquiring the rotation matrix and the translation vector corresponding to the two point cloud data provided in these embodiments, the method provided in steps 301 to 303 in these embodiments is described below by way of an example.
And acquiring a first point cloud X and a second point cloud Y. The point number of the point cloud X is n, and the point number of the point cloud Y is m. Corresponding to the previous step 201, the first point cloud X is one of two adjacent point cloud data, and the second point cloud Y is the other of the two adjacent point cloud data.
A 3D convolution layer with a 7 × 7 convolution kernel, contained in the FCGF-based point cloud encoder, is used to extract large local context information from the input point clouds X and Y, giving point cloud features E_X^0 and E_Y^0. Richer local context information is then aggregated by three levels of strided convolution layers with residual blocks; the specific process is as follows:

For the first level, the point cloud features E_X^0 and E_Y^0 pass through 3D convolution layers with 3 × 3 kernels, strides of 1 and 2, and channel numbers of 32 and 64 respectively, giving features E_X^1 and E_Y^1, which contain n/2 and m/2 points respectively with 64 feature channels. After the residual-block convolution layer of the first level, features F_X^1 and F_Y^1 are obtained.
For the second level, F_X^1 and F_Y^1 are input to the second layer of the FCGF-based point cloud encoder; after a 3D convolution layer with a 3 × 3 convolution kernel, a stride of 2 and 128 channels, features E_X^2 and E_Y^2 are obtained, which contain n/4 and m/4 points respectively with 128 feature channels. After the residual-block convolution layer of the second level, features F_X^2 and F_Y^2 are obtained.
For the third level, F_X^2 and F_Y^2 are input to the third layer of the FCGF-based point cloud encoder; after a 3D convolution layer with a 3 × 3 convolution kernel, a stride of 2 and 256 channels, features E_X^3 and E_Y^3 are obtained, which contain n/8 and m/8 points respectively with 256 feature channels. After the residual-block convolution layer of the third level, the first initial geometric feature F_X^3 and the second initial geometric feature F_Y^3 are obtained.

After the FCGF-based point cloud encoder, the point cloud features of the first point cloud X and the second point cloud Y are therefore the first initial geometric feature F_X^3 and the second initial geometric feature F_Y^3 respectively. In this embodiment, an FCGF-based point cloud decoder is used to perform feature up-sampling, which is divided into three layers in total; the specific process is as follows:
For the first level, the first enhanced self-attention feature of point cloud X and the second enhanced self-attention feature of point cloud Y are respectively input to the first layer of the point cloud decoder; after a 3D up-sampling convolution layer with a 3 × 3 convolution kernel, a stride of 2 and 128 output channels, followed by the residual-block convolution layer of the first level with 128 output channels, features D_X^1 and D_Y^1 are obtained.

For the second level, D_X^1 is spliced with E_X^2 and D_Y^1 is spliced with E_Y^2; the spliced features are respectively input to the second layer of the point cloud decoder and, after a 3D up-sampling convolution layer with a 3 × 3 convolution kernel, a stride of 2 and 64 output channels, followed by the residual-block convolution layer of the second level, features D_X^2 and D_Y^2 are obtained.
For the third level, D_X^2 and D_Y^2 are each spliced with the corresponding encoder features of the first and second point clouds respectively; the spliced features are input to the third layer of the point cloud decoder and, after a 3D up-sampling convolution layer with a 3 × 3 convolution kernel, a stride of 2 and 64 output channels, features D_X^3 and D_Y^3 are obtained.

Finally, D_X^3 and D_Y^3 each pass through one 3D convolution layer with a 1 × 1 convolution kernel and 32 output channels, giving the final first target geometric feature H_X of point cloud X and the final second target geometric feature H_Y of point cloud Y.
In this embodiment, the Ransac algorithm is used to find the coordinate transformation relation between the point clouds, namely a rotation matrix and a translation vector, so as to complete the subsequent point cloud registration and fusion. The process of finding the coordinate transformation relation between the point clouds using the Ransac algorithm is as follows:
The first target geometric feature H_X, the second target geometric feature H_Y, the first point cloud X and the second point cloud Y are input. According to the descriptors (the 32-dimensional description vector of any point x in H_X and the 32-dimensional description vector of any point y in H_Y), the coordinate correspondence between points whose descriptors match is obtained, and an initial rotation matrix and an initial translation vector are calculated. The projection error is then minimized to obtain the final coordinate transformation relation, namely a rotation matrix and a translation vector.
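The sketch below illustrates, under stated assumptions, the last stage of this step: matching points by nearest-neighbour search on their 32-dimensional descriptors and estimating the rotation matrix and translation vector in closed form with an SVD. The Ransac hypothesis-and-inlier loop and the FCGF network itself are omitted, and all names and the random inputs are illustrative.

```python
# Estimate a rigid transform from descriptor correspondences (Kabsch / SVD).
import numpy as np

def match_descriptors(desc_x, desc_y):
    """For every descriptor in desc_x, the index of its nearest neighbour in desc_y."""
    d = np.linalg.norm(desc_x[:, None, :] - desc_y[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

def rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t approximates dst_i."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

X, Y = np.random.rand(200, 3), np.random.rand(180, 3)
desc_X, desc_Y = np.random.rand(200, 32), np.random.rand(180, 32)
idx = match_descriptors(desc_X, desc_Y)     # correspondences X[i] <-> Y[idx[i]]
R, t = rigid_transform(X, Y[idx])
```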
Because collecting coordinate transformation relations for point cloud registration is very difficult, in some embodiments the coordinate transformation relations in the data sets used for training are generated using existing methods. First, the point cloud data of each scene is down-sampled and denoised; specifically, each original point cloud is uniformly down-sampled and its outliers are deleted. Then, an initial transformation relation between every two point cloud data in each scene is obtained in turn using a RANSAC-based method, and finally a more refined transformation relation is generated using a point-to-plane ICP algorithm. The refined transformation relation is then used as the coordinate transformation relation for point cloud registration and fusion, and the FCGF-based point cloud encoder and point cloud decoder are trained with it so that accurate coordinate transformation relations can be obtained to construct the initial point cloud three-dimensional model. Meanwhile, the coordinate transformation relation between the images corresponding to the point clouds is the same.
103, respectively performing salient feature extraction and multi-scale aggregation feature description on the plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud.
In some embodiments, salient feature extraction and multi-scale aggregation feature description are respectively performed on a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud, including steps 401 to 405.
Step 401, performing multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32.
And 402, processing the first-level feature map, the second-level feature map and the third-level feature map by using a significance extraction network to correspondingly obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map, wherein the number of output channels of the significance extraction network is 1.
Step 403, multiplying the first intermediate feature map, the second intermediate feature map, and the third intermediate feature map by the corresponding significance weights respectively, and then adding the results to obtain a significance feature map of the image data.
And 404, multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map.
Step 405, stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to a channel dimension to obtain the multi-scale aggregation feature map of the image data.
Referring to the process of performing registration and fusion processing on the point cloud data in the foregoing step 201 to obtain a panoramic point cloud of the object to be measured, a coordinate transformation relation required by the registration of the corresponding image data can be obtained, and a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud are obtained by combining the obtained salient feature map and the multi-scale aggregation feature map of the image data.
In order for those skilled in the art to clearly understand the method for acquiring the saliency map and the multi-scale aggregation map of image data provided in these embodiments, the method provided in steps 401-405 in these embodiments is described below by way of an example.
Referring to fig. 3, an example of a process for acquiring a saliency map and a multi-scale aggregation map of image data provided in these embodiments is shown.
In the first part, the backbone performs multi-level feature extraction. A picture of size h × w × 3 is input and passed through the backbone multi-scale feature extraction convolution network to obtain the first-level, second-level and third-level feature maps. Specifically, for the first level, the image first passes through 3 convolution layers (conv1/2/3 in fig. 3) with kernels of size 3 × 3 × 8 and a stride of 1 pixel, giving a first-level feature map of h × w × 8 pixels.
For the second level, the feature map then passes through 1 convolution layer (conv4 in fig. 3) with a kernel of size 3 × 3 × 16 and a stride of 2 pixels, followed by 2 convolution layers (conv5/6 in fig. 3) with kernels of size 3 × 3 × 16 and a stride of 1 pixel, giving a second-level feature map of size h/2 × w/2 × 16.
For the third level, the feature map passes through 1 convolution layer (conv7 in fig. 3) with a kernel of size 3 × 3 × 32 and a stride of 2 pixels, followed by 2 convolution layers (conv8/9 in fig. 3) with kernels of size 3 × 3 × 32 and a stride of 1 pixel, giving a third-level feature map of size h/4 × w/4 × 32.
All three feature maps are then up-sampled to the resolution of the original image by bilinear interpolation, i.e. the feature maps of the last two levels are up-sampled by factors of 2 and 4 respectively, finally giving three feature maps: s1 (first-level feature map) of h × w × 8, s2 (second-level feature map) of h × w × 16, and s3 (third-level feature map) of h × w × 32.
In the second part, saliency extraction. For the saliency extraction part, the three feature maps are each passed through 1 convolution layer with a kernel of size 3 × 3 × 1 and a stride of 1, giving three feature maps of size h × w × 1; then, considering that shallow features are easily affected by noise, the three feature maps from shallow to deep are multiplied by the coefficients 0.17, 0.33 and 0.5 respectively and summed, giving a saliency feature map of size h × w × 1.
The value at (x, y) in the saliency feature map represents the saliency A of that point: a point with a larger saliency A is a point that is more distinct from its surrounding points, typically a point with an obvious color change or a drastic structural change. Naturally, the reconstruction of the spatial points of the three-dimensional model corresponding to these points has a large influence on the quality of the final three-dimensional reconstruction.
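A minimal PyTorch sketch of this saliency-extraction head is shown below: each level is reduced to one channel by a 3 × 3 convolution and the three single-channel maps are blended with the fixed weights 0.17, 0.33 and 0.5. The class name and the random inputs are illustrative assumptions.

```python
# Saliency head: per-level 1-channel reduction followed by a fixed weighted sum.
import torch
import torch.nn as nn

class SaliencyHead(nn.Module):
    def __init__(self):
        super().__init__()
        # One 3x3 conv per level, reducing 8/16/32 channels to a single channel.
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=3, stride=1, padding=1) for c in (8, 16, 32)]
        )
        self.weights = (0.17, 0.33, 0.50)   # shallow -> deep

    def forward(self, s1, s2, s3):
        maps = [conv(s) for conv, s in zip(self.reduce, (s1, s2, s3))]
        return sum(w * m for w, m in zip(self.weights, maps))   # (B, 1, H, W)

h, w = 64, 96
s1, s2, s3 = torch.rand(1, 8, h, w), torch.rand(1, 16, h, w), torch.rand(1, 32, h, w)
saliency = SaliencyHead()(s1, s2, s3)       # -> torch.Size([1, 1, 64, 96])
```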
In the third part, multi-scale aggregation feature description. For the feature description part, the three feature maps from shallow to deep obtained by the backbone multi-scale feature extraction network are multiplied by the weight coefficients (1, 2, 3) respectively and then stacked along the channel dimension to obtain a multi-scale aggregation feature map of size h × w × 32.
Compared with directly using the (R, G, B) values as the color information of the point cloud, mapping a single color value to a high-dimensional vector through multi-level feature fusion increases the number of channels, makes the differences between points larger, and allows the neural network to learn better.
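A hedged sketch of this multi-scale aggregation follows: each level is scaled by its weight (1, 2, 3 from shallow to deep) and the results are stacked along the channel dimension. Note that plain concatenation of the 8-, 16- and 32-channel maps yields 56 channels, whereas the text states a 32-channel result, so an additional reduction step may be implied but is not shown here; the function name and inputs are illustrative.

```python
# Weighted stacking of the three levels along the channel dimension.
import torch

def multi_scale_aggregate(s1, s2, s3, weights=(1.0, 2.0, 3.0)):
    """s1/s2/s3: (B, C, H, W) feature maps already upsampled to the same resolution."""
    return torch.cat([w * s for w, s in zip(weights, (s1, s2, s3))], dim=1)

h, w = 64, 96
agg = multi_scale_aggregate(torch.rand(1, 8, h, w),
                            torch.rand(1, 16, h, w),
                            torch.rand(1, 32, h, w))    # -> (1, 56, 64, 96)
```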
And 104, calculating by using a first full-connection network according to the position information of a target point, the position information of the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain an observation characteristic vector of the target point relative to the point to be observed, wherein the target point is any point except the point to be observed in the panoramic point cloud. Wherein the point to be observed can be any point in the panoramic point cloud.
And 105, performing aggregation calculation according to the observation feature vector of the k points closest to the point to be observed relative to the point to be observed and the saliency feature vector to obtain an appearance description vector of the point to be observed.
Because the multi-scale aggregation feature vector used to describe the appearance of a point in the panoramic point cloud is obtained from a specific viewing position, the appearance observed for the same point from different viewing positions is not necessarily the same. To regress this difference, the appearance description vector of the point to be observed is obtained using the methods of step 104 and step 105.
Further, subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed; splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector; and calculating the spliced multi-scale aggregation characteristic vector by using the first fully-connected network to obtain an observation characteristic vector of the target point relative to the point to be observed.
Illustratively, for a point to be observed, the observation feature vector of a target point relative to the point to be observed is

f_{p,x} = W(f_p, p − x)

where f_p is the multi-scale aggregation feature vector of the target point, p is the position information of the target point (represented as a three-dimensional vector), x is the position information of the point to be observed (represented as a three-dimensional vector), and the function W denotes splicing f_p and p − x and inputting the spliced vector into the first fully-connected network, which comprises three fully-connected layers with sizes of 35 × 128, 128 × 256 and 256 × 128 respectively, to obtain the observation feature vector f_{p,x} of the target point at p relative to the point to be observed at x. Using the relative position p − x keeps the network invariant to point translation, resulting in better generalization.
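A minimal PyTorch sketch of this first fully-connected network W is given below, assuming ReLU activations (the text does not specify the non-linearity); the class name and the random inputs are illustrative.

```python
# First fully-connected network W: (f_p, p - x) -> observation feature vector.
import torch
import torch.nn as nn

class ObservationFeatureNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layer sizes 35x128, 128x256, 256x128 as stated in the text.
        self.mlp = nn.Sequential(
            nn.Linear(35, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 128),
        )

    def forward(self, p, x, f_p):
        """p, x: (N, 3) positions; f_p: (N, 32) multi-scale aggregation features."""
        return self.mlp(torch.cat([f_p, p - x], dim=-1))    # (N, 128)

net = ObservationFeatureNet()
obs = net(torch.rand(8, 3), torch.rand(1, 3).expand(8, 3), torch.rand(8, 32))
```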
In the embodiment of the application, the observation feature vectors of the k nearest target points around the point to be observed, relative to the point to be observed, are combined. Illustratively, let the k nearest neighbours of the point to be observed at x be p_1, …, p_k, and let i denote the i-th target point. The appearance description vector f_x of the point to be observed at x is obtained by the following aggregation calculation:

f_x = Σ_i (w_i · A_i · f_{p_i,x}) / Σ_i (w_i · A_i),  with  w_i = 1 / ||p_i − x||

where A_i represents the saliency feature vector corresponding to the i-th target point and p_i is the position information of the i-th target point. The inverse distance weight w_i is used when aggregating the neural features so that target points closer to the point to be observed contribute more to the appearance description vector; at the same time the saliency A_i is taken into account, since a target point with a larger A_i is one that differs from its surrounding points, usually a point with an obvious color change or a drastic structural change, so that such target points also contribute more to the appearance description vector of the point to be observed.
And 106, calculating by using a second fully-connected network according to the observation characteristic vector of the target point relative to the point to be observed to obtain an observation density vector of the target point relative to the point to be observed.
And 107, performing aggregation calculation according to the observation density vector of the k points closest to the point to be observed relative to the point to be observed and the significant characteristic vector to obtain the bulk density information of the point to be observed.
In this embodiment, the second fully-connected network includes three fully-connected layers with sizes of 160 × 256, 256 × 128 and 128 × 1 respectively. To obtain the volume density information of the point to be observed, the observation density vectors, relative to the point to be observed, of its k nearest target points are aggregated, as shown in the following two equations:

σ_{p_i,x} = D(f_{p_i,x})

σ_x = Σ_i (w_i · A_i · σ_{p_i,x}) / Σ_i (w_i · A_i)

where the function D represents inputting the observation feature vector f_{p_i,x} of the i-th target point relative to the point to be observed into the second fully-connected network for calculation, and σ_{p_i,x} is the resulting observation density vector.
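A minimal sketch of this density branch follows. The text lists layer sizes of 160 × 256, 256 × 128 and 128 × 1, although the stated input (a 128-dimensional observation feature vector) would give a first-layer width of 128, so the input width is left as a parameter here; the activations, names and normalised weighting are assumptions.

```python
# Second fully-connected network D and the density aggregation.
import torch
import torch.nn as nn

class DensityNet(nn.Module):
    def __init__(self, in_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, obs_feats):                   # (k, in_dim) -> (k, 1)
        return self.mlp(obs_feats)

def aggregate_density(densities, saliency, positions, x, eps=1e-8):
    """Weight per-point densities by saliency times inverse distance."""
    inv_dist = 1.0 / (torch.norm(positions - x, dim=-1) + eps)
    w = saliency * inv_dist
    w = w / (w.sum() + eps)
    return (w * densities.squeeze(-1)).sum()        # scalar volume density

sigma = aggregate_density(DensityNet()(torch.rand(8, 128)), torch.rand(8),
                          torch.rand(8, 3), torch.rand(3))
```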
And 108, performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
Because the radiation information of the point to be observed is related to the observation direction, the embodiment of the application subtracts the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point; and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
Illustratively, the coordinate difference between the position information s of the observation sampling point and the position information x of the point to be observed is regarded as the observation direction d. As the number of channels increases, the single piece of position information is mapped to a high-dimensional vector, the differences between directions become larger, and the neural network can learn better. In this embodiment, the observation direction d is mapped into a high-dimensional position vector γ(d).
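A hedged sketch of this position-encoding step is given below. A NeRF-style sinusoidal encoding is used, but the text only states that the relative direction is mapped into a 32-dimensional space, so the number of frequency bands, the final linear projection to 32 dimensions and the sign convention d = x − s are all assumptions.

```python
# Map the relative observation direction to a 32-D high-dimensional vector.
import torch
import torch.nn as nn

class DirectionEncoder(nn.Module):
    def __init__(self, num_freqs=5, out_dim=32):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(num_freqs).float()     # 1, 2, 4, 8, 16
        self.proj = nn.Linear(3 + 6 * num_freqs, out_dim)       # 33 -> 32

    def forward(self, x, s):
        d = x - s                                   # relative direction (sign assumed)
        enc = [d] + [f(d * w) for w in self.freqs for f in (torch.sin, torch.cos)]
        return self.proj(torch.cat(enc, dim=-1))    # (N, 32)

gamma = DirectionEncoder()(torch.rand(4, 3), torch.rand(4, 3))
```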
And step 109, calculating by using a third full-connection network according to the high-dimensional position vector of the point to be observed relative to the observation sampling point and the appearance description vector of the point to be observed, so as to obtain the radiation information of the point to be observed relative to the observation sampling point.
The high-dimensional position vector γ(d) of the point to be observed relative to the observation sampling point and the appearance description vector f_x of the point to be observed are spliced to obtain the vector (γ(d), f_x). A third fully-connected network is applied to (γ(d), f_x) to calculate the radiation (color) information c of the point to be observed relative to the observation sampling point. The third fully-connected network comprises three fully-connected layers with sizes of 160 × 256, 256 × 128 and 128 × 3 respectively.
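A minimal sketch of this radiance head follows: the 32-dimensional encoded direction and the 128-dimensional appearance description vector are concatenated into a 160-dimensional input and passed through layers of size 160 × 256, 256 × 128 and 128 × 3; the ReLU activations and the final sigmoid are assumptions.

```python
# Third fully-connected network: (gamma(d), f_x) -> RGB radiance.
import torch
import torch.nn as nn

class RadianceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(160, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),   # RGB in [0, 1]
        )

    def forward(self, gamma_d, f_x):
        return self.mlp(torch.cat([gamma_d, f_x], dim=-1))

rgb = RadianceNet()(torch.rand(4, 32), torch.rand(4, 128))
```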
Step 107 and step 109 respectively obtain the volume density information of the point to be observed and its radiation information relative to the observation sampling point, which completes the reconstruction of the 3D model.
The second fully-connected network and the third fully-connected network can be regarded as a NeRF (Neural Radiance Fields) network model. During training, the NeRF network model is optimized by minimizing the error between each observed image and the corresponding view rendered from the reconstructed model.
The embodiment of the application provides a three-dimensional reconstruction method based on image and point cloud data fusion, which comprises the following steps: acquiring a point cloud sequence and an image sequence of a measured object; registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object; respectively performing salient feature extraction and multi-scale aggregation feature description on the plurality of image data in the image sequence to obtain a saliency feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud; calculating by using the first fully-connected network according to the position information of the target point, the position information of the point to be observed and the multi-scale aggregation feature vector of the target point to obtain an observation feature vector of the target point relative to the point to be observed; performing aggregation calculation according to the observation feature vectors, relative to the point to be observed, of the k points closest to the point to be observed and the saliency feature vectors to obtain an appearance description vector of the point to be observed; calculating the observation feature vector of the target point relative to the point to be observed by using the second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed; performing aggregation calculation according to the observation density vectors, relative to the point to be observed, of the k points closest to the point to be observed and the saliency feature vectors to obtain the volume density information of the point to be observed; performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point; and calculating by using the third fully-connected network according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed to obtain the radiation information of the point to be observed relative to the observation sampling point. According to this three-dimensional reconstruction method, an initial three-dimensional model is generated through point cloud registration and fusion, then salient feature extraction and multi-scale aggregation feature extraction are carried out according to the image data to obtain the saliency feature vectors and multi-scale aggregation feature vectors of all points in the panoramic point cloud, and point-based volume rendering is carried out using a neural radiance field to obtain a three-dimensional model with near-real color and texture information.
An embodiment of the present application further provides a terminal device, including: at least one processor and a memory; the memory to store program instructions; the processor is configured to call and execute the program instructions stored in the memory, so as to enable the terminal device to execute the three-dimensional reconstruction method provided in the foregoing embodiment.
The embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium has stored therein instructions, which, when run on a computer, cause the computer to perform the three-dimensional reconstruction method as provided in the previous embodiments.
The steps of a method described in an embodiment of the present application may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a UE. In the alternative, the processor and the storage medium may reside in different components in the UE.
It should be understood that, in the various embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present application may essentially, or in the part contributing to the prior art, be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments.
The above-described embodiments of the present application do not limit the scope of the present application.

Claims (10)

1. A three-dimensional reconstruction method based on image and point cloud data fusion is characterized by comprising the following steps:
acquiring a point cloud sequence and an image sequence of a measured object, wherein the point cloud sequence of the measured object comprises a plurality of sequentially adjacent point cloud data of the measured object, and the point cloud sequence covers a panoramic area of the measured object; the image sequence comprises a plurality of image data, and the image data correspond to the point cloud data in a one-to-one manner;
registering and fusing a plurality of point cloud data in the point cloud sequence to obtain a panoramic point cloud of the measured object; the panoramic point cloud is an initial three-dimensional model composed of discrete points;
respectively extracting salient features and describing multi-scale aggregation features for a plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud;
calculating by using a first fully-connected network according to the position information of a target point, the position information of a point to be observed and the multi-scale aggregation feature vector of the target point to obtain an observation feature vector of the target point relative to the point to be observed, wherein the target point is any one point except the point to be observed in the panoramic point cloud; the point to be observed is any point in the panoramic point cloud;
performing aggregation calculation according to the observation feature vectors, relative to the point to be observed, of the k points closest to the point to be observed and the salient feature vectors to obtain an appearance description vector of the point to be observed;
calculating the observation characteristic vector of the target point relative to the point to be observed by using a second fully-connected network to obtain an observation density vector of the target point relative to the point to be observed;
performing aggregation calculation according to the observation density vectors, relative to the point to be observed, of the k points closest to the point to be observed and the salient feature vectors to obtain the volume density information of the point to be observed;
performing position coding calculation according to the position information of the observation sampling point and the position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point;
calculating by using a third fully-connected network according to the high-dimensional position vector of the point to be observed and the appearance description vector of the point to be observed to obtain radiance information of the point to be observed relative to the observation sampling point;
performing point-based volume rendering according to the neural radiance field to obtain a reconstructed three-dimensional model with color and texture information;
wherein, the step of respectively performing salient feature extraction and multi-scale aggregation feature description on the plurality of image data in the image sequence to obtain a salient feature vector and a multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud comprises the steps of:
respectively performing salient feature extraction and multi-scale aggregation feature description on a plurality of image data in the image sequence to obtain a salient feature map and a multi-scale aggregation feature map of the image data;
based on the process of registering and fusing the plurality of point cloud data in the point cloud sequence to obtain the panoramic point cloud of the measured object, obtaining the coordinate transformation relation required for registering the corresponding image data, and obtaining the salient feature vector and the multi-scale aggregation feature vector corresponding to each point in the panoramic point cloud by combining the obtained salient feature map and multi-scale aggregation feature map of the image data.
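The last step of claim 1, attaching a salient feature vector and a multi-scale aggregation feature vector to every point of the panoramic point cloud, can be pictured with the following hedged NumPy sketch. It assumes a pinhole camera with known intrinsics K, uses the rotation R and translation t recovered during point-cloud registration to project each point into its source image, and uses nearest-pixel lookup instead of interpolation purely for brevity; these modelling choices are illustrative, not mandated by the claim.

import numpy as np

def lift_features_to_points(points, K, R, t, saliency_map, agg_feat_map):
    """Assign each panoramic-cloud point a salient feature value and a multi-scale
    aggregation feature vector by projecting it into the image that observed it.

    points        : (N, 3) points of the panoramic point cloud (world frame)
    K             : (3, 3) camera intrinsics (assumed known)
    R, t          : rotation / translation mapping world points into the camera frame,
                    derived from the coordinate transformation of the registration step
    saliency_map  : (H, W) salient feature map of the corresponding image
    agg_feat_map  : (C, H, W) multi-scale aggregation feature map of the same image
    """
    cam = (R @ points.T).T + t                          # world frame -> camera frame
    uvz = (K @ cam.T).T                                 # perspective projection (points behind
    u = np.round(uvz[:, 0] / uvz[:, 2]).astype(int)     # the camera are not filtered here)
    v = np.round(uvz[:, 1] / uvz[:, 2]).astype(int)

    H, W = saliency_map.shape
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)

    saliency = saliency_map[v, u]                       # (N,)   per-point salient feature
    agg_feat = agg_feat_map[:, v, u].T                  # (N, C) per-point multi-scale aggregation feature
    return saliency, agg_feat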
2. The three-dimensional reconstruction method according to claim 1, wherein the step of calculating using a first fully-connected network according to the position information of the target point, the position information of the point to be observed, and the multi-scale aggregation feature vector of the target point to obtain the observation feature vector of the target point relative to the point to be observed comprises:
subtracting the position information of the target point from the position information of the point to be observed to obtain the relative position information of the target point relative to the point to be observed;
splicing the relative position information of the target point relative to the point to be observed and the multi-scale aggregation characteristic vector of the target point to obtain a spliced multi-scale aggregation characteristic vector;
and calculating the spliced multi-scale aggregation characteristic vector by using the first fully-connected network to obtain an observation characteristic vector of the target point relative to the point to be observed.
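A possible PyTorch sketch of claim 2 is given below. The input width 3 + 56 follows from stacking the 8-, 16- and 32-channel feature maps of claims 5 and 6 along the channel dimension; the hidden width, depth and output dimension of the network are illustrative assumptions.

import torch
import torch.nn as nn

class ObservationFeatureNet(nn.Module):
    """Sketch of the 'first fully-connected network': the relative position is
    concatenated with the multi-scale aggregation feature vector and mapped by an MLP."""

    def __init__(self, feat_dim: int = 56, out_dim: int = 64):   # widths are assumptions
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, target_pos, observed_pos, agg_feat):
        rel = target_pos - observed_pos                # relative position of the target point
        x = torch.cat([rel, agg_feat], dim=-1)         # spliced multi-scale aggregation feature vector
        return self.mlp(x)                             # observation feature vector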
3. The three-dimensional reconstruction method according to claim 1, wherein the appearance description vector of the point to be observed is obtained by performing an aggregation calculation according to the following formula:

f_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j} f_{i,x}

wherein f_x represents the appearance description vector of the point to be observed, i denotes the i-th target point, A_i denotes the salient feature vector corresponding to the i-th target point, and w_i = 1 / \lVert p_i - x \rVert, wherein p_i is the position information of the i-th target point, x is the position information of the point to be observed, and f_{i,x} denotes the observation feature vector of the i-th target point relative to the point to be observed;

and the volume density information of the point to be observed is obtained by performing an aggregation calculation according to the following formula:

\sigma_x = \sum_{i=1}^{k} \frac{w_i A_i}{\sum_{j=1}^{k} w_j A_j} \sigma_{i,x}

wherein \sigma_x represents the volume density information of the point to be observed, and \sigma_{i,x} denotes the observation density vector of the i-th target point relative to the point to be observed.
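A compact NumPy sketch of the two aggregations in claim 3 follows. It assumes a Point-NeRF-style weighting in which each of the k nearest target points contributes in proportion to its salient feature value A_i and the inverse of its distance to the point to be observed; if the weighting actually used differs, only the line computing w changes. The observation density is reduced to a scalar per point for brevity.

import numpy as np

def aggregate(x, positions, saliency, obs_feats, obs_dens):
    """Aggregate the k nearest target points into the appearance description vector
    and the volume density information of the point to be observed x.

    x          : (3,)   position of the point to be observed
    positions  : (k, 3) positions p_i of the k nearest target points
    saliency   : (k,)   salient feature values A_i
    obs_feats  : (k, F) observation feature vectors f_{i,x}
    obs_dens   : (k,)   observation densities sigma_{i,x} (scalar per point here)
    """
    w = saliency / (np.linalg.norm(positions - x, axis=1) + 1e-8)  # assumed weight A_i / ||p_i - x||
    w = w / w.sum()                                                # normalized over the k neighbours
    f_x = (w[:, None] * obs_feats).sum(axis=0)                     # appearance description vector
    sigma_x = float((w * obs_dens).sum())                          # volume density information
    return f_x, sigma_x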
4. The three-dimensional reconstruction method according to claim 1, wherein performing a position coding calculation according to position information of an observation sampling point and position information of the point to be observed to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point comprises:
subtracting the position information of the point to be observed from the position information of the observation sampling point to obtain the relative position information of the point to be observed relative to the observation sampling point;
and mapping the relative position information of the point to be observed relative to the observation sampling point into a 32-dimensional space to obtain a high-dimensional position vector of the point to be observed relative to the observation sampling point.
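Claim 4 fixes only the dimensionality (32) of the encoded vector. One plausible realisation, sketched below in PyTorch, is a NeRF-style sinusoidal encoding of the relative position followed by a learned linear projection to 32 dimensions; the frequency count and the projection layer are assumptions made for illustration.

import torch
import torch.nn as nn

class PositionEncoder(nn.Module):
    """Map the relative position of the point to be observed with respect to the
    observation sampling point into a 32-dimensional vector."""

    def __init__(self, n_freqs: int = 6, out_dim: int = 32):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs).float())  # 1, 2, 4, ...
        self.proj = nn.Linear(3 * 2 * n_freqs, out_dim)

    def forward(self, rel_pos):                          # rel_pos: (..., 3)
        ang = rel_pos.unsqueeze(-1) * self.freqs         # (..., 3, n_freqs)
        enc = torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)  # (..., 6 * n_freqs)
        return self.proj(enc)                            # (..., 32) high-dimensional position vector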
5. The three-dimensional reconstruction method according to claim 1, wherein performing salient feature extraction on the image data comprises:
performing multi-scale feature extraction on the image data by using a multi-scale feature extraction convolution network to obtain a first-level feature map, a second-level feature map and a third-level feature map, wherein the number of channels of the first-level feature map is 8, the number of channels of the second-level feature map is 16, and the number of channels of the third-level feature map is 32;
processing the first-level feature map, the second-level feature map and the third-level feature map by using a saliency extraction network to correspondingly obtain a first intermediate feature map, a second intermediate feature map and a third intermediate feature map, wherein the number of output channels of the saliency extraction network is 1;
and multiplying the first intermediate feature map, the second intermediate feature map and the third intermediate feature map by corresponding significance weights respectively and then adding the results to obtain the significance feature map of the image data.
6. The three-dimensional reconstruction method of claim 5, wherein performing multi-scale aggregation feature description on the image data comprises:
multiplying the first-level feature map, the second-level feature map and the third-level feature map by corresponding aggregation weights respectively to obtain a first multi-scale feature map, a second multi-scale feature map and a third multi-scale feature map;
stacking the first multi-scale feature map, the second multi-scale feature map and the third multi-scale feature map according to channel dimensions to obtain the multi-scale aggregation feature map of the image data.
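Claims 5 and 6 can be read together as one small convolutional module; a hedged PyTorch sketch is given below. The claims fix only the per-level channel counts (8, 16, 32), the 1-channel saliency output, the weighted summation into the salient feature map, and the weighted channel-wise stacking into the multi-scale aggregation feature map; the kernel sizes, strides, per-level 1x1 saliency heads, upsampling, and learnable weights are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSaliency(nn.Module):
    """Sketch of multi-scale feature extraction with construction of the salient
    feature map and the multi-scale aggregation feature map."""

    def __init__(self):
        super().__init__()
        self.level1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.level2 = nn.Sequential(nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.level3 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.sal_heads = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in (8, 16, 32)])  # 1-channel outputs
        self.sal_w = nn.Parameter(torch.ones(3) / 3)   # significance weights
        self.agg_w = nn.Parameter(torch.ones(3))       # aggregation weights

    def forward(self, img):                            # img: (B, 3, H, W)
        f1 = self.level1(img)                          # first-level feature map, 8 channels
        f2 = self.level2(f1)                           # second-level feature map, 16 channels
        f3 = self.level3(f2)                           # third-level feature map, 32 channels
        size = img.shape[-2:]
        # bring every level back to full resolution so the maps can be added / stacked
        feats = [f1,
                 F.interpolate(f2, size=size, mode="bilinear", align_corners=False),
                 F.interpolate(f3, size=size, mode="bilinear", align_corners=False)]
        sal = sum(w * head(f) for w, head, f in zip(self.sal_w, self.sal_heads, feats))
        agg = torch.cat([w * f for w, f in zip(self.agg_w, feats)], dim=1)   # 8 + 16 + 32 = 56 channels
        return sal, agg    # salient feature map, multi-scale aggregation feature map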
7. The three-dimensional reconstruction method according to claim 1, wherein registering and fusing the plurality of point cloud data in the point cloud sequence to obtain the panoramic point cloud of the measured object comprises:
sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data;
sequentially fusing the two adjacent point cloud data according to the rotation matrix and the translation vector corresponding to the two adjacent point cloud data to obtain a new point cloud sequence;
taking the new point cloud sequence as the point cloud sequence of the measured object, and repeating the process of obtaining a new point cloud sequence until the number of point cloud data contained in the new point cloud sequence is 1;
and obtaining the panoramic point cloud of the measured object.
8. The three-dimensional reconstruction method of claim 7, wherein sequentially registering two adjacent point cloud data in the point cloud sequence to obtain a rotation matrix and a translation vector corresponding to the two adjacent point cloud data comprises:
obtaining a first initial geometric feature and a second initial geometric feature by using a point cloud encoder based on FCGF, wherein the first initial geometric feature corresponds to one of the two adjacent point cloud data, and the second initial geometric feature corresponds to the other of the two adjacent point cloud data;
obtaining a first target geometric feature corresponding to the first initial geometric feature and a second target geometric feature corresponding to the second initial geometric feature by using a point cloud decoder based on FCGF;
and obtaining a rotation matrix and a translation vector between the first target geometric feature and the second target geometric feature by using a RANSAC algorithm.
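Claim 8 treats the FCGF encoder/decoder as an external feature extractor; the sketch below therefore assumes the per-point geometric features of the two adjacent point clouds have already been computed and shows only the feature matching and the RANSAC estimation of the rotation matrix and translation vector, in plain NumPy. The brute-force matching, the iteration count and the inlier threshold are illustrative choices, not the patented settings.

import numpy as np

def kabsch(P, Q):
    """Least-squares rotation R and translation t mapping points P onto points Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

def ransac_register(src_pts, dst_pts, src_feat, dst_feat,
                    n_iter=5000, inlier_thresh=0.05, seed=0):
    """Estimate the rigid transform between two adjacent point clouds from their
    per-point geometric features (e.g. FCGF descriptors)."""
    # nearest-neighbour matching in feature space (brute force, for clarity only)
    d = np.linalg.norm(src_feat[:, None, :] - dst_feat[None, :, :], axis=-1)
    src_m, dst_m = src_pts, dst_pts[d.argmin(axis=1)]

    rng = np.random.default_rng(seed)
    best_R, best_t, best_inliers = np.eye(3), np.zeros(3), -1
    for _ in range(n_iter):
        idx = rng.choice(len(src_m), size=3, replace=False)       # minimal 3-point sample
        R, t = kabsch(src_m[idx], dst_m[idx])
        residual = np.linalg.norm((R @ src_m.T).T + t - dst_m, axis=1)
        inliers = int((residual < inlier_thresh).sum())
        if inliers > best_inliers:                                # keep the best-scoring hypothesis
            best_R, best_t, best_inliers = R, t, inliers
    return best_R, best_t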
9. A terminal device, comprising: at least one processor and memory;
the memory is configured to store program instructions;
the processor is configured to call and execute the program instructions stored in the memory to cause the terminal device to perform the three-dimensional reconstruction method according to any one of claims 1 to 8.
10. A computer-readable storage medium, wherein
the computer-readable storage medium has stored therein instructions which, when run on a computer, cause the computer to perform the three-dimensional reconstruction method according to any one of claims 1 to 8.
CN202211342750.8A 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion Active CN115409931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211342750.8A CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211342750.8A CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Publications (2)

Publication Number Publication Date
CN115409931A CN115409931A (en) 2022-11-29
CN115409931B true CN115409931B (en) 2023-03-31

Family

ID=84168933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211342750.8A Active CN115409931B (en) 2022-10-31 2022-10-31 Three-dimensional reconstruction method based on image and point cloud data fusion

Country Status (1)

Country Link
CN (1) CN115409931B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631221B (en) * 2022-11-30 2023-04-28 北京航空航天大学 Low-overlapping-degree point cloud registration method based on consistency sampling
CN115631341A (en) * 2022-12-21 2023-01-20 北京航空航天大学 Point cloud registration method and system based on multi-scale feature voting
CN115690332B (en) * 2022-12-30 2023-03-31 华东交通大学 Point cloud data processing method and device, readable storage medium and electronic equipment
CN116843808A (en) * 2023-06-30 2023-10-03 北京百度网讯科技有限公司 Rendering, model training and virtual image generating method and device based on point cloud
CN117173693B (en) * 2023-11-02 2024-02-27 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389671A (en) * 2018-09-25 2019-02-26 南京大学 A kind of single image three-dimensional rebuilding method based on multistage neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242138B (en) * 2020-01-11 2022-04-01 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
US20220327851A1 (en) * 2021-04-09 2022-10-13 Georgetown University Document search for document retrieval using 3d model
CN114898028A (en) * 2022-04-29 2022-08-12 厦门大学 Scene reconstruction and rendering method based on point cloud, storage medium and electronic equipment
CN115018989B (en) * 2022-06-21 2024-03-29 中国科学技术大学 Three-dimensional dynamic reconstruction method based on RGB-D sequence, training device and electronic equipment

Also Published As

Publication number Publication date
CN115409931A (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN115409931B (en) Three-dimensional reconstruction method based on image and point cloud data fusion
Chen et al. Cross parallax attention network for stereo image super-resolution
CN114511778A (en) Image processing method and device
CN112562001B (en) Object 6D pose estimation method, device, equipment and medium
US20240161355A1 (en) Generation of stylized drawing of three-dimensional shapes using neural networks
CN114863007A (en) Image rendering method and device for three-dimensional object and electronic equipment
CN115439694A (en) High-precision point cloud completion method and device based on deep learning
TW201839665A (en) Object recognition method and object recognition system
CN115761258A (en) Image direction prediction method based on multi-scale fusion and attention mechanism
CN115082322B (en) Image processing method and device, and training method and device of image reconstruction model
CN116993826A (en) Scene new view generation method based on local space aggregation nerve radiation field
CN115937552A (en) Image matching method based on fusion of manual features and depth features
CN114723884A (en) Three-dimensional face reconstruction method and device, computer equipment and storage medium
CN116051719A (en) Image rendering method and device based on nerve radiation field model
US20220319055A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN115731336A (en) Image rendering method, image rendering model generation method and related device
WO2022208440A1 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
CN117372604B (en) 3D face model generation method, device, equipment and readable storage medium
CN114627244A (en) Three-dimensional reconstruction method and device, electronic equipment and computer readable medium
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
Shrestha et al. A real world dataset for multi-view 3d reconstruction
CN116912675A (en) Underwater target detection method and system based on feature migration
CN113065521B (en) Object identification method, device, equipment and medium
US20210390772A1 (en) System and method to reconstruct a surface from partially oriented 3-d points
CN115205112A (en) Model training method and device for super-resolution of real complex scene image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant