CN115439388A - Free viewpoint image synthesis method based on multilayer neural surface expression - Google Patents
Abstract
The invention discloses a free viewpoint image synthesis method based on multilayer neural surface expression, relating to the field of computer vision. The method comprises the following steps: S1, acquiring image data collected from sparse viewpoints and estimating the poses of the sparse viewpoints; S2, designing a sparse-viewpoint free viewpoint image synthesis network based on the multilayer neural surface expression; S3, training this network with a large-scale multi-viewpoint data set; and S4, once the network model parameters are obtained, applying them to the free viewpoint synthesis task on the sparse multi-viewpoint data acquired in step S1. By designing a multilayer neural surface expression model and fully exploiting the features of sparse multi-viewpoint images, the method realizes a high-quality, generalizable free viewpoint image synthesis algorithm suitable for the free viewpoint image synthesis task of a multi-viewpoint acquisition system.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a free viewpoint image synthesis method based on multilayer neural surface expression.
Background
Free viewpoint image synthesis is a key problem in the field of computer vision. With the arrival of the 5G era and the development and popularization of virtual reality and augmented reality technology, digital imagery is inevitably trending toward interactivity and immersion.
Free viewpoint synthesis offers strong three-dimensional immersion, great viewing freedom and a rich interactive experience, and is widely applied in fields such as virtual reality, film and television production, live sports, and cultural and social applications.
However, current free viewpoint systems still require hundreds of cameras and are structurally complex and expensive; meanwhile, most deployed domestic systems adopt fixed imaging tracks, so the viewing viewpoints are limited, the sense of immersion is insufficient, and both practicability and economy still need to be improved.
Disclosure of Invention
To this end, the invention aims to study free viewpoint image synthesis under sparse viewpoints and to overcome two drawbacks of existing free viewpoint generation algorithms: each group of multi-viewpoint images must be trained for a long time, and poor geometric estimation under sparse viewpoints degrades the final viewpoint synthesis result. The invention provides a framework based on multilayer neural surface expression that realizes, end to end, the scene geometry estimation and texture mapping synthesis of the viewpoint to be synthesized, achieves high-quality and efficient free viewpoint image generation, and thereby solves the problems in the background art.
In the method, based on the novel multilayer neural surface expression, a workflow is designed for synthesizing free viewpoint images from sparse viewpoint inputs; the network fully learns the scene structure information of the new viewpoint to be synthesized as well as accurate texture migration and fusion, completing high-quality and efficient free viewpoint image synthesis.
The technical scheme is as follows: the free viewpoint image synthesis method based on the multilayer neural surface expression comprises the following steps:
s1, acquiring multi-view synchronous or static scene image data acquired by sparse views, and estimating the pose of the sparse views;
s2, designing a sparse viewpoint free viewpoint image synthesis network based on multilayer neural surface expression;
S3, training the sparse viewpoint free viewpoint image synthesis network based on the multilayer neural surface expression with a large-scale multi-viewpoint data set, so that it can generalize to various multi-viewpoint data;
And S4, after the trained sparse viewpoint free viewpoint image synthesis network model parameters based on the multilayer neural surface expression are obtained, applying the trained sparse viewpoint free viewpoint image synthesis network model parameters to the free viewpoint synthesis task of the sparse multi-viewpoint data obtained in the first step.
Further, after step S4, there are: and S5, when the trained network has certain generalization on data which does not appear in the training set, directly utilizing the trained network model in the step S3 to perform forward prediction, and realizing high-quality free viewpoint image synthesis of the sparse multi-viewpoint data to be tested.
Further, in step S1, the acquisition method is a Structure-from-Motion method or a multi-view calibration method with a given calibration object scale.
Further, the free viewpoint synthesis network comprises a multi-scale image feature extraction module, an MVS (Multi-View Stereo) module for target-oriented, multi-scale, refinable scene depth estimation, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module.
Further, in step S3, the training data is multi-viewpoint image data with a camera pose, and is divided into a training set, a validation set, and a test set, and the training makes the network converge on the validation set.
Further, in step S3, let N be the number of input viewpoints; estimating the poses of the sparse viewpoints yields the pose {K_i, R_i, T_i} of each viewpoint i, where K_i is the intrinsic matrix of the viewpoint and R_i, T_i are its extrinsic parameters (rotation matrix and translation matrix).
Further, in step S3, the pose of the target viewpoint is defined as {K_t, R_t, T_t}; according to the position and orientation of the target viewpoint, the images {I_m} and camera poses {K_m, R_m, T_m} of the M source viewpoints closest to the target viewpoint are selected from the input viewpoints as the input of the network.
Further, the multi-scale image feature extraction module is composed of convolution layers and skip connection layers, and is represented as:
F_i^(1), F_i^(2), F_i^(3) = f_feat(I_i)
where f_feat denotes the network of this module, I_i is any image input to the module, and the output is the image features at three scales.
Further, the MVS module implements scene geometry estimation at any viewpoint by modifying a learning-based MVS network, and the implementation method includes the following steps:
the M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M x 3 image features;
at each scale, the source viewpoint features are warped to each depth hypothesis of the target viewpoint; a variance-based cost volume is constructed and, after regularization by 3D convolutions, the probability of each pixel of the target image at each depth is output;
optimization proceeds gradually from small scale to large scale, with the depth sampling updated according to the depth probabilities of the previous level; finally, at the original image resolution, the depth probabilities of the target points corresponding to the multilayer surfaces (curved surfaces determined by the finally sampled depth values) are output.
Further, the multilayer neural surface density estimation module recovers, from the sampled depth probability volume P_t output by the MVS module, the density values {σ_k} at the multilayer surface points; these correspond to the opacities of the multilayer surfaces and prepare for the volume rendering that produces the final output image;
the reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained by the MVS module to access the source viewpoint features {F_m} in reverse, and fuses and decodes the corresponding feature values into the color values of the multilayer surfaces;
the multilayer neural surface voxel rendering module performs voxel rendering after obtaining the densities corresponding to the multilayer neural surfaces through the multilayer neural surface density estimation module and the colors through the reverse feature fusion and multilayer neural surface color decoding module, thereby completing the synthesis of the final target image.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the invention, because the trained network has certain generalization on the data which does not appear in the training set, the free viewpoint image synthesis task of the sparse multi-viewpoint data to be tested can be completed by directly utilizing the forward prediction of the network;
(2) In the invention, by designing a multilayer neural surface expression model and fully utilizing the characteristics of sparse multi-viewpoint images, a high-quality and generalized free viewpoint image synthesis algorithm is completed, and the method is suitable for a free viewpoint image synthesis task of a multi-viewpoint acquisition system;
(3) In the invention, a network designs multilayer neural surface expression, aims to realize reconstruction of multilayer surfaces of a scene in an end-to-end new viewpoint synthesis framework, and completes high-quality new viewpoint texture fusion and generation based on the multilayer surface expression.
Drawings
Fig. 1 is a work flow chart of a free viewpoint image synthesis method based on multi-layer neural surface expression in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
In the method, based on the novel multilayer neural surface expression, a workflow is designed for synthesizing free viewpoint images from sparse viewpoint inputs; the network fully learns the scene structure information of the new viewpoint to be synthesized as well as accurate texture migration and fusion, completing high-quality and efficient free viewpoint image synthesis.
Examples
As shown in fig. 1, the method for synthesizing a free viewpoint image based on multi-layer neural surface expression in this embodiment includes the following steps:
s1, acquiring multi-view synchronous or static scene image data acquired by sparse views, and estimating the pose of the sparse views;
wherein, the acquisition method is a Structure-from-Motion method or a multi-viewpoint calibration method for giving a calibration object scale;
that is, this step acquires multi-viewpoint data of the same static scene, or of a dynamic scene at the same moment, and the viewpoints can be sparse (i.e., the viewpoint poses may vary greatly);
the purpose of the acquisition is to capture different viewpoints of the scene to be observed with limited acquisition equipment, in the expectation that a free viewpoint image of the scene can be recovered by the algorithm.
S2, designing a sparse viewpoint free viewpoint image synthesis network based on multilayer neural surface expression;
when the method is used, the network designs multilayer neural surface expression, aims to realize reconstruction of multilayer surfaces of scenes in an end-to-end new viewpoint synthesis framework, and completes high-quality new viewpoint texture fusion and generation based on the multilayer surface expression.
Wherein the free viewpoint synthesis network includes: a multi-scale image feature extraction module, an MVS (Multi-View Stereo) module for target-oriented, multi-scale, refinable scene depth estimation, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module.
In use, the deep neural network design of this part is the core of the method.
And S3, training the sparse viewpoint free viewpoint image synthesis network based on the multilayer neural surface expression with a large-scale multi-viewpoint data set, so that it can generalize to various multi-viewpoint data.
The training data is multi-viewpoint image data with a camera pose; the training data is divided into a training set, a validation set, and a test set, and the training converges the network on the validation set.
The step S3 comprises the following steps:
S31, inputting the N viewpoints most similar to the viewpoint to be synthesized into the network, and outputting a predicted image at the viewpoint to be synthesized;
S32, supervising with a pixel-level loss function such as L1 or L2, or with a perceptual loss function.
and S4, after the trained sparse viewpoint free viewpoint image synthesis network model parameters based on the multilayer neural surface expression are obtained, applying the trained sparse viewpoint free viewpoint image synthesis network model parameters to the free viewpoint synthesis task of the sparse multi-viewpoint data obtained in the first step.
As shown in fig. 1, the method for synthesizing a free viewpoint image based on multi-layer neural surface expression in this embodiment includes the following specific steps:
acquiring multi-viewpoint synchronous (or static scene) image data collected from sparse viewpoints;
wherein N is the number of input viewpoints; estimating the poses of the sparse viewpoints yields the pose {K_i, R_i, T_i} of each viewpoint i, where K_i is the intrinsic matrix of the viewpoint and R_i, T_i are its extrinsic parameters (rotation matrix and translation matrix).
Specifically, this step acquires multi-viewpoint data of the same static scene, or of a dynamic scene at the same moment, and the viewpoints can be sparse (i.e., the viewpoint poses may vary greatly); the purpose of the acquisition is to capture different viewpoints of the scene to be observed with limited acquisition equipment, in the expectation that a free viewpoint image of the scene can be recovered by the algorithm.
The pose of the target viewpoint is defined as {K_t, R_t, T_t}; according to the position and orientation of the target viewpoint, the images {I_m} and camera poses {K_m, R_m, T_m} of the M source viewpoints closest to the target viewpoint are selected from the input viewpoints as the input of the network.
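The nearest-source-view selection described above can be sketched as follows. This is a minimal illustration; the function name, the use of camera centers C = -Rᵀ T as the proximity criterion, and the toy poses are assumptions — the text also mentions orientation, which a fuller version would weigh in.

```python
import numpy as np

def select_source_views(target_pose, input_poses, M=3):
    """Pick the M input viewpoints closest to the target viewpoint.

    Each pose is a hypothetical (R, T) pair; the camera center is C = -R^T @ T.
    Proximity here is the Euclidean distance between camera centers.
    """
    def center(pose):
        R, T = pose
        return -R.T @ T

    c_t = center(target_pose)
    dists = [np.linalg.norm(center(p) - c_t) for p in input_poses]
    return np.argsort(dists)[:M]

# toy example: identity rotations, cameras spread along the x-axis
R = np.eye(3)
poses = [(R, np.array([float(x), 0.0, 0.0])) for x in (0, 1, 2, 5, 9)]
target = (R, np.array([1.2, 0.0, 0.0]))
idx = select_source_views(target, poses, M=2)
```

The selected indices then index both the source images {I_m} and their poses before they are fed to the network.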
The method specifically comprises the following steps:
The multi-scale image feature extraction module is based on a U-Net model and consists of multi-scale convolution layers and skip connection layers; it can be expressed as:
F_i^(1), F_i^(2), F_i^(3) = f_feat(I_i)
where f_feat denotes the network of this module, I_i is any image input to the module, and the output is the image features at three scales.
In use, a three-channel image is passed through the feature extraction module to obtain multi-scale features of different resolutions and channel counts extracted at different depths of the network; these comprise image features corresponding to different receptive fields and are used for the subsequent neural surface localization and reverse feature fusion.
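As an illustration of the three-scale output only: the real module is a learned U-Net with convolutions and skip connections, whereas this toy pyramid merely produces features at full, 1/2 and 1/4 resolution via 2x average pooling.

```python
import numpy as np

def feature_pyramid(img):
    """Toy stand-in for the extractor f_feat: returns "features" at three
    scales (full, 1/2, 1/4 resolution) via 2x average pooling. Only the
    three-scale output structure matches the module described in the text."""
    def pool2(x):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w]
        # average each non-overlapping 2x2 block
        return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

    f1 = img
    f2 = pool2(f1)
    f3 = pool2(f2)
    return f1, f2, f3

feats = feature_pyramid(np.ones((8, 8)))
shapes = [f.shape for f in feats]
```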
The MVS module for target-oriented, multi-scale, refinable scene depth estimation realizes scene geometry estimation at an arbitrary viewpoint by modifying a learning-based MVS network; the implementation comprises the following steps:
(1) The M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M x 3 image features;
(2) At each scale, the source viewpoint features are warped to each depth hypothesis of the target viewpoint; a variance-based cost volume is constructed and, after regularization by 3D convolutions, the probability of each pixel of the target image at each depth is output;
(3) Optimization proceeds gradually from small scale to large scale, with the depth sampling updated according to the depth probabilities of the previous level; finally, at the original image resolution, the depth probabilities of the target points corresponding to the multilayer surfaces (curved surfaces determined by the finally sampled depth values) are output.
The MVS module can be expressed as:
{X_k}, P_t = f_MVS({F_m}, {K_m, R_m, T_m}, {K_t, R_t, T_t})
where f_MVS is the MVS module for target-oriented, multi-scale, refinable scene depth estimation; its outputs are the multilayer surface sampling point set {X_k} of the target viewpoint and the corresponding sampled depth probability volume P_t.
The multilayer neural surface density estimation module recovers, from the sampled depth probability volume P_t output by the MVS module, the density values {σ_k} at the multilayer surface points; these correspond to the opacities of the multilayer surfaces and prepare for the volume rendering that produces the final output image.
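A minimal sketch of the probability-to-opacity step follows. The actual density head is learned; the monotone mapping sigma = 1 - exp(-k * p) and the constant k are assumptions used only for illustration.

```python
import numpy as np

def layer_density(depth_prob, k=5.0):
    """Hypothetical density head: squash each layer's depth probability into
    an opacity in [0, 1). Higher probability -> higher opacity; the exact
    mapping in the described module is learned, not fixed like this one."""
    return 1.0 - np.exp(-k * np.asarray(depth_prob))

sigma = layer_density([0.0, 0.1, 0.8])
```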
The reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained by the MVS module to access the source viewpoint features {F_m} in reverse, and fuses and decodes the corresponding features into the color values of the multilayer surfaces.
For convenience of explanation, the processing procedure of the inverse feature fusion and multi-layer neural surface color decoding module is described as follows:
(1) For a given depth d_k and pixel p, the corresponding source viewpoint features are located through the camera poses {K_m, R_m, T_m} and the sampling points, and the features of the M source viewpoints are collected into a feature set {F_m(p, d_k)};
(2) The M feature vectors are each encoded by an MLP (multi-layer perceptron), and an averaging operation is applied to obtain the fused feature;
(3) Feature fusion is performed in reverse for every pixel p and depth d_k, yielding the multilayer features;
It is noted that other feature fusion forms may also be used;
(4) After decoding by a decoder, the multilayer colors are obtained:
c_k(p) = f_dec(feature), k = 1, ..., D
where D is the number of layers of the multilayer neural surface and f_dec is the image decoder; the multilayer colors {c_k} are finally output.
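The MLP-encode-then-average fusion of step (2) can be sketched as follows; the one-layer ReLU MLP and its random weights are placeholders for the learned encoder.

```python
import numpy as np

def fuse_features(src_feats, W, b):
    """Average-fusion step of the reverse feature fusion module: each of the
    M per-view feature vectors is encoded by a shared (here, one-layer) MLP,
    then the M encodings are averaged into a single fused feature."""
    encoded = np.maximum(0.0, src_feats @ W + b)   # shared MLP + ReLU, (M, F')
    return encoded.mean(axis=0)                    # average over the M views

M, F, F_out = 3, 4, 2
rng = np.random.default_rng(1)
W, b = rng.standard_normal((F, F_out)), np.zeros(F_out)
feats = rng.standard_normal((M, F))
fused = fuse_features(feats, W, b)
shuffled = fuse_features(feats[::-1], W, b)   # same views, different order
```

A design note: because the encoder is shared and the aggregation is a mean, the fusion is invariant to the order of the source views, and with minor changes also to their number.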
The multilayer neural surface voxel rendering module performs voxel rendering after obtaining the densities corresponding to the multilayer neural surfaces through the multilayer neural surface density estimation module and the colors through the reverse feature fusion and multilayer neural surface color decoding module, completing the synthesis of the final target image; this can be expressed as:
C(p) = Σ_{k=1}^{D} c_k(p) σ_k(p) Π_{j<k} (1 − σ_j(p))
where C(p) is the rendered color of pixel p in the target image.
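The compositing over the D surface layers can be sketched as front-to-back alpha compositing, reading each σ_k as the opacity of layer k (as in the density estimation module). The standard alpha-compositing form is assumed here as the concrete instance of the voxel rendering step.

```python
import numpy as np

def composite(colors, sigmas):
    """Front-to-back compositing over D neural surface layers:
    C = sum_k c_k * sigma_k * prod_{j<k} (1 - sigma_j),
    where sigma_k in [0, 1] is the opacity of layer k."""
    colors = np.asarray(colors, dtype=float)   # (D, 3)
    sigmas = np.asarray(sigmas, dtype=float)   # (D,)
    # transmittance reaching each layer (product of earlier transparencies)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - sigmas[:-1])))
    weights = sigmas * trans
    return (weights[:, None] * colors).sum(axis=0)

# a fully opaque first layer hides everything behind it
out = composite([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 1.0])
```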
the sparse viewpoint free viewpoint image synthesis network based on the multilayer neural surface expression is trained by utilizing a large-scale multi-viewpoint data set, so that the sparse viewpoint free viewpoint image synthesis network can be generalized to various multi-viewpoint data.
In particular, the training data may be multi-view image data with camera poses.
The input of the network is the M viewpoint images {I_m} most similar to the viewpoint to be synthesized and the corresponding camera poses {K_m, R_m, T_m}; the output is the predicted image Î_t at the viewpoint to be synthesized. Supervision may be a pixel-level loss function such as L1 or L2, or a perceptual loss function; for example, the loss may be the L2 loss:
L = || Î_t − I_t ||_2^2
where I_t is the ground-truth image at the target viewpoint.
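A minimal sketch of the L2 reconstruction loss named above; the mean reduction over pixels is an assumption (a sum reduction is equally common).

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared (L2) error between the predicted image at the viewpoint
    to be synthesized and the ground-truth image at that viewpoint."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return ((pred - target) ** 2).mean()

loss = l2_loss([[0.0, 1.0]], [[0.0, 0.0]])
```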
after the trained sparse viewpoint free viewpoint image synthesis network model parameters based on the multilayer neural surface expression are obtained, the method can be applied to the free viewpoint synthesis task of the sparse multi-viewpoint data obtained in the first step. Because the trained network has certain generalization on data which does not appear in a training set, forward prediction can be directly carried out by using the trained network model, and high-quality free viewpoint image synthesis of sparse multi-viewpoint data to be tested is realized.
It is noted that, in this document, relational terms such as first and second, if any, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention, and as long as the technical problems to be solved remain consistent with the present invention, they should be included within its scope of protection.
Claims (10)
1. The free viewpoint image synthesis method based on the multilayer neural surface expression is characterized by comprising the following steps of:
s1, acquiring multi-view synchronous or static scene image data acquired by sparse views, and estimating the pose of the sparse views;
s2, designing a sparse viewpoint free viewpoint image synthesis network based on multilayer neural surface expression;
s3, training the sparse viewpoint free viewpoint image synthetic network based on the multilayer neural surface expression by using a large-scale multi-viewpoint data set, so that the sparse viewpoint free viewpoint image synthetic network can be generalized to various multi-viewpoint data;
and S4, after the trained sparse viewpoint free viewpoint image synthesis network model parameters based on the multilayer neural surface expression are obtained, applying the trained sparse viewpoint free viewpoint image synthesis network model parameters to the free viewpoint synthesis task of the sparse multi-viewpoint data obtained in the S1.
2. The free viewpoint image synthesis method based on multilayer neural surface expression as set forth in claim 1, wherein after step S4, there are further:
and S5, when the trained network has certain generalization on the data which does not appear in the training set, directly utilizing the network model trained in the step S3 to carry out forward prediction, and realizing the high-quality free viewpoint image synthesis of the sparse multi-viewpoint data to be tested.
3. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S1, the collection method is Structure-from-Motion method or multi-viewpoint calibration method with given calibration object dimension.
4. The method of claim 1, wherein the free viewpoint image synthesis network comprises a multi-scale image feature extraction module, an MVS module for object-oriented multi-scale refineable scene depth estimation, a multi-layer neural surface density estimation module, a reverse feature fusion and multi-layer neural surface color decoding module, and a multi-layer neural surface voxel rendering module.
5. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S3 the training data are multi-viewpoint image data with camera poses, divided into a training set, a validation set and a test set, and training proceeds until the network converges on the validation set.
6. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S3,
7. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S3, given the pose of the target viewpoint, the images and camera poses of the M source viewpoints closest to the target viewpoint among the input viewpoints are selected according to the position and orientation of the target viewpoint and taken as the input to the network.
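The selection of the M nearest source viewpoints described in claim 7 can be sketched as follows. This is a minimal NumPy illustration, not the patented method: the combined position-plus-orientation score and the weight `w_orient` are assumptions made for the sketch.

```python
import numpy as np

def select_source_views(target_pos, target_dir, cam_positions, cam_dirs, M=3,
                        w_orient=1.0):
    """Score candidate cameras by distance to the target viewpoint and by
    viewing-direction similarity, and return the indices of the M best.

    target_pos: (3,) target camera center; target_dir: (3,) unit view direction.
    cam_positions: (N, 3); cam_dirs: (N, 3) unit view directions.
    """
    dists = np.linalg.norm(cam_positions - target_pos, axis=1)
    # Orientation penalty: 0 when a camera looks the same way as the target,
    # growing as the directions diverge.
    orient = 1.0 - cam_dirs @ target_dir
    scores = dists + w_orient * orient
    return np.argsort(scores)[:M]
```

The chosen indices would then be used to gather the source images and camera poses fed to the network.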
8. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein the multi-scale image feature extraction module is composed of convolutional layers and skip-connection layers, and is expressed as:
9. The free viewpoint image synthesis method based on multilayer neural surface expression as claimed in claim 4, wherein the MVS module realizes scene geometry estimation at arbitrary viewpoints by modifying a learning-based MVS network, the realization comprising the following steps:
passing the M source viewpoint images through the multi-scale image feature extraction module to obtain M × 3 image feature maps;
warping the source viewpoint features at each scale to a given depth of the target viewpoint, constructing a variance-based cost volume, and outputting, after regularization by 3D convolutions, the probability of each pixel of the target image at each depth;
refining progressively from the smallest to the largest scale, updating the depth sampling according to the depth-value probabilities of the previous level, and finally outputting, at the original image resolution, the depth probabilities of the target points corresponding to the multi-layer surfaces, the surfaces being determined by the finally sampled depth values.
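The variance cost volume and depth-probability steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the learned 3D-convolution regularizer of the claim is replaced here by a plain softmax over the negated variance, and the feature warping is taken as already done.

```python
import numpy as np

def depth_probability(warped_feats):
    """warped_feats: (M, D, C, H, W) source-view features already warped to the
    target view at D depth hypotheses. Returns (D, H, W) per-pixel depth
    probabilities.
    """
    # Variance across the M source views, averaged over channels: low variance
    # means the views are photo-consistent at that depth hypothesis.
    cost = warped_feats.var(axis=0).mean(axis=1)          # (D, H, W)
    # Stand-in for the learned 3D-conv regularization: a numerically stable
    # softmax over the depth axis of the negated cost, so consistent depths
    # receive high probability.
    logits = -cost
    logits -= logits.max(axis=0, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=0, keepdims=True)
```

In a coarse-to-fine scheme, the expectation of these probabilities at one scale would guide where the next scale resamples its depth hypotheses.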
10. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 4, wherein the multi-layer neural surface density estimation module samples the depth probability volume output by the MVS module to recover the density values at the multi-layer surface points, these densities representing the opacity of the multi-layer surfaces and preparing for the volume rendering that yields the final output image;
the reverse feature fusion and multi-layer neural surface color decoding module uses the set of multi-layer surface sampling points obtained by the MVS module to access the source viewpoint features in reverse, fusing and decoding the corresponding feature values into the color values of the multi-layer surfaces;
and the multi-layer neural surface volume rendering module, after obtaining the densities of the multi-layer neural surfaces from the multi-layer neural surface density estimation module and their colors from the reverse feature fusion and multi-layer neural surface color decoding module, performs volume rendering to complete the synthesis of the final target image.
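The volume rendering over the multi-layer surfaces can be illustrated with standard front-to-back alpha compositing. This minimal NumPy sketch handles a single ray; the density-to-opacity mapping `1 - exp(-density * delta)` is the conventional volume-rendering formula, assumed here rather than taken from the claims.

```python
import numpy as np

def render_layers(density, color, deltas):
    """Composite K layered surface samples front to back along one ray.

    density: (K,) density per layer; color: (K, 3) RGB per layer;
    deltas: (K,) spacing between consecutive layers along the ray.
    Returns the composited (3,) RGB value.
    """
    # Per-layer opacity from density and spacing.
    alpha = 1.0 - np.exp(-density * deltas)
    # Transmittance: fraction of light surviving all closer layers.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * color).sum(axis=0)
```

A fully opaque front layer therefore dominates the output, while semi-transparent layers blend their colors according to the accumulated transmittance.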
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211391996.4A CN115439388B (en) | 2022-11-08 | 2022-11-08 | Free viewpoint image synthesis method based on multilayer nerve surface expression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115439388A true CN115439388A (en) | 2022-12-06 |
CN115439388B CN115439388B (en) | 2024-02-06 |
Family
ID=84252759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211391996.4A Active CN115439388B (en) | 2022-11-08 | 2022-11-08 | Free viewpoint image synthesis method based on multilayer nerve surface expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115439388B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105141956A (en) * | 2015-08-03 | 2015-12-09 | 西安电子科技大学 | Incremental rate distortion optimization method based on free viewpoint video depth map coding |
CN105247862A (en) * | 2013-04-09 | 2016-01-13 | 联发科技股份有限公司 | Method and apparatus of view synthesis prediction in three-dimensional video coding |
JP2019159840A (en) * | 2018-03-13 | 2019-09-19 | 萩原電気ホールディングス株式会社 | Image synthesizing apparatus and image synthesizing method |
CN111028273A (en) * | 2019-11-27 | 2020-04-17 | 山东大学 | Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof |
CN111144214A (en) * | 2019-11-27 | 2020-05-12 | 中国石油大学(华东) | Hyperspectral image unmixing method based on multilayer stack type automatic encoder |
CN111951203A (en) * | 2020-07-01 | 2020-11-17 | 北京大学深圳研究生院 | Viewpoint synthesis method, apparatus, device and computer readable storage medium |
US20210012561A1 (en) * | 2019-07-12 | 2021-01-14 | Adobe Inc. | Deep novel view and lighting synthesis from sparse images |
CN112637582A (en) * | 2020-12-09 | 2021-04-09 | 吉林大学 | Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge |
CN114463408A (en) * | 2021-12-20 | 2022-05-10 | 北京邮电大学 | Free viewpoint image generation method, device, equipment and storage medium |
CN114627223A (en) * | 2022-03-04 | 2022-06-14 | 华南师范大学 | Free viewpoint video synthesis method and device, electronic equipment and storage medium |
CN114666564A (en) * | 2022-03-23 | 2022-06-24 | 南京邮电大学 | Method for synthesizing virtual viewpoint image based on implicit neural scene representation |
CN114663543A (en) * | 2022-03-31 | 2022-06-24 | 西安交通大学 | Virtual view synthesis method based on deep learning and multi-view geometry |
CN114820901A (en) * | 2022-04-08 | 2022-07-29 | 浙江大学 | Large-scene free viewpoint interpolation method based on neural network |
CN114820945A (en) * | 2022-05-07 | 2022-07-29 | 北京影数科技有限公司 | Sparse sampling-based method and system for generating image from ring shot image to any viewpoint image |
Non-Patent Citations (12)
Title |
---|
KATJA SCHWARZ et al.: "VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids", arXiv:2206.07695v2, 30 June 2022, pages 1-20 |
LINGJIE LIU et al.: "Neural Sparse Voxel Fields", arXiv:2007.11571v2, 31 January 2021, pages 1-22 |
TONY TUNG et al.: "Complete Multi-View Reconstruction of Dynamic Scenes from Probabilistic Fusion of Narrow and Wide Baseline Stereo", 2009 IEEE 12th International Conference on Computer Vision, 31 December 2009, pages 1709-1716 |
LI MINGHAO: "Research on Image-Based Free Viewpoint Synthesis Methods", China Masters' Theses Full-text Database, Information Science and Technology, vol. 2022, no. 01, 15 January 2022 |
WANG YANRU: "Research on 3D Face Reconstruction and Free Viewpoint Video Generation", China Masters' Theses Full-text Database, Information Science and Technology, vol. 2021, no. 04, 15 April 2021 |
WANG SHUO et al.: "Light Field Image Depth Estimation Based on Multi-Stream Epipolar Convolutional Neural Network", Computer Applications and Software, vol. 37, no. 08, 12 August 2020, pages 194-201 |
Also Published As
Publication number | Publication date |
---|---|
CN115439388B (en) | 2024-02-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | From big to small: Multi-scale local planar guidance for monocular depth estimation | |
CN110738697A (en) | Monocular depth estimation method based on deep learning | |
Song et al. | Starenhancer: Learning real-time and style-aware image enhancement | |
Chen et al. | Cross parallax attention network for stereo image super-resolution | |
Gu et al. | Coupled real-synthetic domain adaptation for real-world deep depth enhancement | |
Li et al. | Deep sketch-guided cartoon video inbetweening | |
CN112288788A (en) | Monocular image depth estimation method | |
Li et al. | Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data | |
CN110598537A (en) | Video significance detection method based on deep convolutional network | |
Zhang et al. | Removing Foreground Occlusions in Light Field using Micro-lens Dynamic Filter. | |
Wang et al. | Neural opacity point cloud | |
Xiao et al. | Image hazing algorithm based on generative adversarial networks | |
Mu et al. | Neural 3D reconstruction from sparse views using geometric priors | |
Zhu et al. | Occlusion-free scene recovery via neural radiance fields | |
Nie et al. | Context and detail interaction network for stereo rain streak and raindrop removal | |
CN116402908A (en) | Dense light field image reconstruction method based on heterogeneous imaging | |
CN115439388B (en) | Free viewpoint image synthesis method based on multilayer nerve surface expression | |
Jung et al. | Depth image interpolation using confidence-based Markov random field | |
CN114820323A (en) | Multi-scale residual binocular image super-resolution method based on stereo attention mechanism | |
Guo et al. | Stereo cross-attention network for unregistered hyperspectral and multispectral image fusion | |
CN113673567A (en) | Panorama emotion recognition method and system based on multi-angle subregion self-adaption | |
Xue et al. | An end-to-end multi-resolution feature fusion defogging network | |
Zhu et al. | Fused network for view synthesis | |
Li et al. | Delving Deeper Into Image Dehazing: A Survey | |
CN117058049B (en) | New view image synthesis method, synthesis model training method and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||