CN115439388B - Free viewpoint image synthesis method based on multilayer neural surface representation - Google Patents
- Publication number: CN115439388B (application CN202211391996.4A)
- Authority: CN (China)
- Prior art keywords: viewpoint, image, layer, module, scale
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/02, G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
- G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning with neural networks
- G06T2207/20221 — Image fusion; image merging (indexing scheme for image analysis or image enhancement)
- Y02T10/40 — Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a free viewpoint image synthesis method based on multilayer neural surface representation, relating to the field of computer vision. The method comprises the following steps: S1, acquiring image data captured from sparse viewpoints and estimating the poses of the sparse viewpoints; S2, designing a sparse-viewpoint free viewpoint image synthesis network based on the multilayer neural surface representation; S3, training the sparse-viewpoint free viewpoint image synthesis network on a large-scale multi-viewpoint dataset; and S4, after the image synthesis network model parameters are obtained, applying them to the free viewpoint synthesis task on the sparse multi-viewpoint data acquired in the first step. By designing a multilayer neural surface representation model, the invention fully exploits the features of sparse multi-viewpoint images, achieves a high-quality and generalizable free viewpoint image synthesis algorithm, and is suitable for free viewpoint image synthesis tasks of multi-viewpoint acquisition systems.
Description
Technical Field
The invention relates to the field of computer vision, and in particular to a free viewpoint image synthesis method based on multilayer neural surface representation.
Background
Free viewpoint image synthesis is a major problem in the field of computer vision. With the arrival of the 5G era and the development and popularization of virtual reality and augmented reality technology, interactivity and immersion have become an inevitable trend for digital imagery.
Free viewpoint synthesis is widely applied in virtual reality, film and television production, live sports broadcasting, cultural and social interaction, and other fields, owing to its strong sense of three-dimensional immersion, large viewing freedom, and rich interactive experience.
However, current free viewpoint systems still require hundreds of cameras and are structurally complex and expensive; meanwhile, most deployed domestic systems adopt fixed imaging tracks, so the viewing viewpoints are limited, the sense of immersion is insufficient, and practicality and economy remain to be improved.
Disclosure of Invention
To achieve the above purpose, the invention studies the free viewpoint image synthesis problem under sparse viewpoints. It addresses two shortcomings of existing free viewpoint generation algorithms: either each group of multi-viewpoint images requires lengthy per-scene training, or geometric estimation errors under sparse viewpoints degrade the final viewpoint synthesis result. A framework based on multilayer neural surface representation is proposed that realizes, end to end, scene geometry estimation and texture synthesis for the viewpoint to be synthesized, achieving high-quality and efficient free viewpoint image generation and solving the problems noted in the background.
In this application, based on the novel multilayer neural surface representation, a workflow for free viewpoint image synthesis from sparse viewpoint input is designed. The network fully learns the scene structure information of the new viewpoint to be synthesized as well as accurate texture transfer and fusion, completing high-quality and efficient free viewpoint image synthesis.
The technical scheme adopted is as follows: the free viewpoint image synthesis method based on multilayer neural surface representation comprises the following steps:
S1, acquiring multi-viewpoint synchronized or static-scene image data captured from sparse viewpoints, and estimating the sparse viewpoint poses;
S2, designing a free viewpoint image synthesis network;
S3, training the free viewpoint image synthesis network on a large-scale multi-viewpoint dataset so that it generalizes to a variety of multi-viewpoint data;
and S4, after the trained free viewpoint image synthesis network model parameters are obtained, applying them to the free viewpoint synthesis task on the sparse multi-viewpoint data acquired in step S1.
Further, after step S4, there is also: S5, since the trained network has a certain generalization ability for data that did not appear in the training set, directly using the network model trained in step S3 for forward prediction, realizing high-quality free viewpoint image synthesis on the sparse multi-viewpoint data to be tested.
Further, in step S1, the acquisition method is a Structure-from-Motion method or a multi-viewpoint calibration method with a calibration object of known scale.
Further, the free viewpoint synthesis network comprises a multi-scale image feature extraction module, a target-oriented multi-scale refinable scene depth estimation MVS (Multi-View Stereo) module, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module;
the multi-scale image feature extraction module consists of convolution layers and skip connections, and is expressed as $\{F^{(1)}, F^{(2)}, F^{(3)}\} = f_{2D}(I)$, where $f_{2D}$ denotes the network of this module, $I$ is an arbitrary image input to the module, and the output of the module is the set of image features at three scales $\{F^{(1)}, F^{(2)}, F^{(3)}\}$;
the MVS module realizes scene geometry estimation for an arbitrary viewpoint by modifying a learning-based MVS network; the implementation is as follows:
the M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M × 3 image features;
at each scale, the source viewpoint features are warped to each depth hypothesis of the target viewpoint; a variance-based cost volume is constructed and, after 3D convolution regularization, the probability of each pixel of the target image at each depth is output;
optimization proceeds gradually from the small scale to the large scale, the depth sampling is updated according to the depth probabilities of the previous level, and the module finally outputs a multilayer surface for the target viewpoint at the original image resolution, where the probability over depth is determined at the finally sampled depth values;
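The coarse-to-fine depth update described above can be sketched as follows; the helper name, the uniform probabilities, and the shrink factor are illustrative assumptions, not the patent's actual procedure:

```python
import numpy as np

def refine_depth_hypotheses(prev_depths, prev_probs, n_new, shrink=0.5):
    """Resample depth hypotheses around the expected depth of the previous
    (coarser) scale, narrowing the search range (hypothetical helper)."""
    # Expected depth under the previous probability distribution
    center = np.sum(prev_depths * prev_probs)
    # Shrink the search interval around that center
    half_range = shrink * (prev_depths.max() - prev_depths.min()) / 2.0
    return np.linspace(center - half_range, center + half_range, n_new)

depths = np.linspace(1.0, 10.0, 8)   # coarse hypotheses (illustrative range)
probs = np.full(8, 1.0 / 8)          # uniform probability for illustration
refined = refine_depth_hypotheses(depths, probs, n_new=16)
```

With a uniform distribution the new hypotheses stay centered on the old mean depth but cover half the range at twice the sample density, which is the intent of the update-and-resample step.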
the multi-layer nerve surface density estimation module samples depth probability bodies from the output of the MVS moduleIn recovering the density value +.>Providing for volume rendering to obtain a final output image corresponding to the opacity representing the multi-layer surface;
the reverse feature fusion and multi-layer nerve surface color decoding module uses the multi-layer surface sampling point set obtained in the MVS module to access the source viewpoint feature in a reverse wayCorresponding characteristic values in the code, and fusing and decoding the corresponding characteristics to formColor values of the multi-layer surface;
and the multi-layer nerve surface voxel rendering module performs voxel rendering after the density corresponding to the multi-layer nerve surface obtained by the multi-layer nerve surface density estimation module and the color corresponding to the multi-layer nerve surface obtained by the reverse feature fusion and multi-layer nerve surface color decoding module, so as to complete the synthesis of a final target image.
Further, in step S3, the training data are multi-viewpoint image data with camera poses, divided into a training set, a validation set, and a test set; training proceeds until the network converges and performance on the validation set stops improving.
Further, in step S3, the multi-viewpoint synchronized (or static-scene) image data captured from sparse viewpoints are denoted $\{I_i\}_{i=1}^{N}$, where $N$ is the number of input viewpoints; estimating the sparse viewpoint poses yields the pose of each viewpoint, $\{P_i\}_{i=1}^{N}$, where each $P_i$ contains the intrinsics $K_i$ and the extrinsics $[R_i \mid t_i]$ (rotation matrix and translation vector).
Further, in step S3, the pose of the target viewpoint is defined as $P_t$; according to the position and orientation of the target viewpoint, the images of the M source viewpoints closest to the target viewpoint among the input viewpoints, $\{I_j\}_{j=1}^{M}$, together with their camera poses $\{P_j\}_{j=1}^{M}$, are found and used as the input of the network.
Compared with the prior art, the invention has the following beneficial effects:
(1) Since the trained network has a certain generalization ability for data that did not appear in the training set, forward prediction with the network can directly complete the free viewpoint image synthesis task on the sparse multi-viewpoint data to be tested;
(2) By designing a multilayer neural surface representation model, the features of sparse multi-viewpoint images are fully exploited to achieve a high-quality, generalizable free viewpoint image synthesis algorithm, suitable for free viewpoint image synthesis tasks of multi-viewpoint acquisition systems;
(3) A network is designed for the multilayer neural surface representation, the reconstruction of the multilayer surfaces of a scene is realized within an end-to-end novel view synthesis framework, and high-quality fusion and generation of novel-view textures is completed based on the multilayer surface representation.
Drawings
Fig. 1 is a flowchart of the free viewpoint image synthesis method based on multilayer neural surface representation in the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Examples
As shown in fig. 1, the free viewpoint image synthesis method based on multilayer neural surface representation described in this embodiment comprises the following steps:
S1, acquiring multi-viewpoint synchronized or static-scene image data captured from sparse viewpoints, and estimating the sparse viewpoint poses;
the acquisition method is a Structure-from-Motion method or a multi-viewpoint calibration method with a calibration object of known scale;
that is, this step collects multi-viewpoint data of the same static scene, or of a dynamic scene at the same moment; the viewpoints may be sparse (i.e., the viewpoint poses may vary greatly);
in use, the purpose of the acquisition is to capture different viewpoints of the scene to be observed with limited acquisition equipment, with the expectation that free viewpoint images of the scene can be recovered by the algorithm.
S2, designing a free viewpoint image synthesis network;
In use, this network is designed around the multilayer neural surface representation: the reconstruction of the multilayer surfaces of the scene is realized within the end-to-end novel view synthesis framework, and high-quality fusion and generation of novel-view textures is completed based on the multilayer surface representation.
The free viewpoint synthesis network comprises: a multi-scale image feature extraction module, a target-oriented multi-scale refinable scene depth estimation MVS (Multi-View Stereo) module, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module.
In use, the design of this deep neural network is the core part.
S3, training the free viewpoint image synthesis network on a large-scale multi-viewpoint dataset so that it generalizes to a variety of multi-viewpoint data;
the training data are multi-viewpoint image data with camera poses, divided into a training set, a validation set, and a test set; training proceeds until the network converges and validation performance stops improving.
Step S3 comprises the following steps:
S31, inputting the M viewpoints most similar to the viewpoint to be synthesized into the network, and outputting the predicted image at the viewpoint to be synthesized;
S32, supervising with a pixel-level loss function such as L1 or L2, or a perceptual loss function.
And S4, after the trained free viewpoint image synthesis network model parameters are obtained, applying them to the free viewpoint synthesis task on the sparse multi-viewpoint data acquired in step S1.
As shown in fig. 1, the free viewpoint image synthesis method based on multilayer neural surface representation in this embodiment specifically comprises the following steps:
Acquiring multi-viewpoint synchronized (or static-scene) image data captured from sparse viewpoints;
the data are denoted $\{I_i\}_{i=1}^{N}$, where $N$ is the number of input viewpoints; estimating the sparse viewpoint poses yields the pose of each viewpoint, $\{P_i\}_{i=1}^{N}$, where each $P_i$ contains the intrinsics $K_i$ and the extrinsics $[R_i \mid t_i]$ (rotation matrix and translation vector).
Specifically, this step acquires multi-viewpoint data of the same static scene, or of a dynamic scene at the same moment; the viewpoints may be sparse (i.e., the viewpoint poses may vary greatly). The purpose of the acquisition is to capture different viewpoints of the scene to be observed with limited acquisition equipment, with the expectation that free viewpoint images of the scene can be recovered by the algorithm.
The pose of the target viewpoint is defined as $P_t$; according to its position and orientation, the images of the M source viewpoints closest to the target viewpoint among the input viewpoints, $\{I_j\}_{j=1}^{M}$, together with their camera poses $\{P_j\}_{j=1}^{M}$, are taken as the input of the network.
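As a minimal sketch of the pose convention assumed above (intrinsics $K$, extrinsics $[R \mid t]$), the following projects a world point into one view; the numeric camera parameters are illustrative, not from the patent:

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X into pixel coordinates for a camera with
    intrinsics K and extrinsics [R | t] (assumed pinhole convention)."""
    x_cam = R @ X + t          # world -> camera coordinates
    x_img = K @ x_cam          # camera -> homogeneous image coordinates
    return x_img[:2] / x_img[2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # camera aligned with the world frame
t = np.zeros(3)
uv = project(K, R, t, np.array([0.0, 0.0, 2.0]))  # point on the optical axis
```

A point on the optical axis projects to the principal point, which is a quick sanity check of the convention.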
The network specifically comprises the following modules:
The multi-scale image feature extraction module is based on a U-Net model and consists of multi-scale convolution layers and skip connections; it can be expressed as $\{F^{(1)}, F^{(2)}, F^{(3)}\} = f_{2D}(I)$, where $f_{2D}$ denotes the network of this module, $I$ is an arbitrary image input to the module, and the output of the module is the set of image features at three scales $\{F^{(1)}, F^{(2)}, F^{(3)}\}$.
In use, a three-channel image passed through the feature extraction module yields multi-scale features of different resolutions and channel counts, extracted at different depths of the network; these cover image features of different receptive fields and are used for subsequent neural surface localization and reverse feature fusion.
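A toy illustration of the three-scale output, using plain average pooling as a stand-in for the learned encoder (the real module uses convolution layers and skip connections; this sketch only shows the resolution pyramid):

```python
import numpy as np

def avg_pool2(img):
    """Halve spatial resolution by 2x2 average pooling (stand-in for one
    downsampling stage of the U-Net-style extractor)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def three_scale_features(img):
    f1 = img            # full resolution
    f2 = avg_pool2(f1)  # 1/2 resolution
    f3 = avg_pool2(f2)  # 1/4 resolution
    return [f1, f2, f3]

feats = three_scale_features(np.ones((64, 48)))
```

The learned extractor would additionally change the channel count at each scale; the pyramid of spatial resolutions is the structural point here.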
The target-oriented multi-scale refinable scene depth estimation MVS module realizes scene geometry estimation for an arbitrary viewpoint by modifying a learning-based MVS network; the implementation is as follows:
(1) The M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M × 3 image features;
(2) At each scale, the source viewpoint features are warped to each depth hypothesis of the target viewpoint; a variance-based cost volume is constructed and, after 3D convolution regularization, the probability of each pixel of the target image at each depth is output;
(3) Optimization proceeds gradually from the small scale to the large scale, the depth sampling is updated according to the depth probabilities of the previous level, and the module outputs a multilayer surface for the target viewpoint at the original image resolution, where the probability over depth is determined at the finally sampled depth values.
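The variance cost of step (2) can be sketched as follows; the tensor shapes and the softmax conversion to a per-pixel depth probability are illustrative assumptions (the patent regularizes the cost volume with 3D convolutions before producing probabilities):

```python
import numpy as np

def variance_cost(warped_feats):
    """Variance across M source-view features warped to the target view,
    one value per (depth, pixel); low variance suggests photo-consistency."""
    return np.var(warped_feats, axis=0)   # (M, D, H, W) -> (D, H, W)

def depth_probability(cost):
    """Turn negated cost into a per-pixel probability over depths via softmax
    (illustrative stand-in for the learned regularization)."""
    logits = -cost
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

M, D, H, W = 3, 4, 2, 2
# Identical features across views -> zero cost at every depth and pixel
consistent = np.ones((M, D, H, W))
cost = variance_cost(consistent)
probs = depth_probability(cost)
```

When all views agree, the cost is zero everywhere and the depth distribution is uniform; real features would concentrate probability at the photo-consistent depth.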
The MVS module can be expressed as $(\{x_k\}, P_d) = f_{MVS}(\{F_j\}, \{P_j\}, P_t)$, where $f_{MVS}$ is the target-oriented multi-scale refinable scene depth estimation MVS module, whose output is the sampling point set $\{x_k\}$ of the target viewpoint on the multilayer surfaces, together with the corresponding sampled depth probability volume $P_d$.
The multilayer neural surface density estimation module recovers density values $\sigma$ from the sampled depth probability volume $P_d$ output by the MVS module, corresponding to the opacity of the multilayer surfaces, in preparation for volume rendering of the final output image.
The reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained in the MVS module to look up, in reverse, the corresponding feature values in the source viewpoint features $\{F_j\}$, and fuses and decodes them into the color values of the multilayer surfaces.
For convenience of explanation, the processing of the reverse feature fusion and multilayer neural surface color decoding module is described:
(1) For a given depth $d$ and pixel $p$, the corresponding location in each source view is found through the camera poses, and the feature set corresponding to the M source views is taken out as $\{f_j(p, d)\}_{j=1}^{M}$;
(2) The M feature vectors are each encoded by an MLP (multi-layer perceptron), and the fused feature is obtained by a mean operation;
(3) Reverse feature fusion is performed for every pixel $p$ and depth $d$, obtaining the multilayer features $\{h_l\}$;
It may be noted that the feature fusion method may also take other forms;
(4) Decoding by a decoder yields the multilayer colors, $\{c_l\}_{l=1}^{L} = f_{dec}(\{h_l\})$, where $L$ is the number of layers of the multilayer neural surface and $f_{dec}$ is the image decoder, finally outputting the multilayer colors $\{c_l\}$.
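The per-view encoding and mean fusion of steps (2)–(3) can be sketched as below; the single linear-plus-tanh layer stands in for the per-view MLP, and all weights and shapes are illustrative:

```python
import numpy as np

def fuse_features(per_view_feats, W_enc):
    """Mean-fuse M per-view feature vectors after a shared encoding
    (one tanh layer as a stand-in for the MLP; W_enc is illustrative)."""
    encoded = np.tanh(per_view_feats @ W_enc)   # (M, C) -> (M, C') per view
    return encoded.mean(axis=0)                 # order-invariant fusion

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))                 # assumed C=8 -> C'=4
feats = rng.standard_normal((3, 8))             # M=3 views, C=8 channels
fused = fuse_features(feats, W)
```

Because the encoder is shared and the fusion is a mean, the result does not depend on the order of the source views, which is the property that makes this fusion applicable to an arbitrary set of input viewpoints.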
The multilayer neural surface voxel rendering module performs volume rendering with the densities of the multilayer neural surfaces obtained by the density estimation module and the colors obtained by the reverse feature fusion and color decoding module, completing the synthesis of the final target image; this can be expressed as $\hat{I}_t(p) = \sum_{l=1}^{L} T_l \alpha_l c_l$, with transmittance $T_l = \prod_{m<l} (1 - \alpha_m)$, where $\alpha_l$ is the opacity derived from the density of layer $l$.
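A minimal sketch of this layered volume rendering (front-to-back alpha compositing); the two-layer colors and opacities are illustrative values:

```python
import numpy as np

def composite(alphas, colors):
    """Front-to-back alpha compositing over the layered surfaces:
    C = sum_l T_l * a_l * c_l, with transmittance T_l = prod_{m<l} (1 - a_m)."""
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = T * alphas
    return (weights[:, None] * colors).sum(axis=0)

alphas = np.array([0.5, 1.0])          # two layers; the back layer is opaque
colors = np.array([[1.0, 0.0, 0.0],    # front layer: red
                   [0.0, 0.0, 1.0]])   # back layer: blue
pixel = composite(alphas, colors)
```

A half-transparent front layer over an opaque back layer blends the two colors equally, which matches the intuition for the opacity values of the multilayer surfaces.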
the sparse view free view image synthesis network based on the multi-layer nerve surface expression is trained by utilizing a large-scale multi-view data set, so that the sparse view free view image synthesis network can be generalized to various multi-view data.
In particular, the training data may be multi-view image data with camera pose.
The input of the network is M viewpoint images most similar to the viewpoint to be synthesizedAnd corresponding camera poseOutput as predicted image +.>The supervision may be a pixel level loss function such as L1, L2 or a perceptual loss function, etc.; the loss function may be an L2 loss:
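A sketch of the L2 pixel loss on a toy prediction; shapes and values are illustrative:

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared (L2) pixel loss between predicted and ground-truth images."""
    return np.mean((pred - target) ** 2)

pred = np.zeros((2, 2, 3))    # toy predicted image (H, W, RGB)
target = np.ones((2, 2, 3))   # toy ground-truth image
loss = l2_loss(pred, target)  # every pixel channel is off by 1 -> loss = 1.0
```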
it is noted that relational terms such as first and second, and the like, if any, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Technical solutions that still solve the technical problems addressed by the invention remain consistent with the invention and are included in its protection scope.
Claims (6)
1. A free viewpoint image synthesis method based on multilayer neural surface representation, characterized by comprising the following steps:
S1, acquiring multi-viewpoint synchronized or static-scene image data captured from sparse viewpoints, and estimating the sparse viewpoint poses;
S2, designing a free viewpoint image synthesis network;
S3, training the free viewpoint image synthesis network on a large-scale multi-viewpoint dataset so that it generalizes to a variety of multi-viewpoint data;
S4, after the trained free viewpoint image synthesis network model parameters are obtained, applying them to the free viewpoint synthesis task on the sparse multi-viewpoint data acquired in step S1;
wherein the free viewpoint image synthesis network comprises a multi-scale image feature extraction module, a target-oriented multi-scale refinable scene depth estimation MVS module, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module;
the multi-scale image feature extraction module consists of convolution layers and skip connections and is expressed as $\{F^{(1)}, F^{(2)}, F^{(3)}\} = f_{2D}(I)$, where $f_{2D}$ denotes the network of this module, $I$ is an arbitrary image input to the module, and the output of the module is the set of image features at three scales $\{F^{(1)}, F^{(2)}, F^{(3)}\}$;
the target-oriented multi-scale refinable scene depth estimation MVS module realizes scene geometry estimation for an arbitrary viewpoint by modifying a learning-based MVS network, wherein:
the M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M × 3 image features;
at each scale, the source viewpoint features are warped to each depth hypothesis of the target viewpoint; a variance-based cost volume is constructed and, after 3D convolution regularization, the probability of each pixel of the target image at each depth is output;
optimization proceeds gradually from the small scale to the large scale, the depth sampling is updated according to the depth probabilities of the previous level, and the module finally outputs a multilayer surface for the target viewpoint at the original image resolution, where the probability over depth is determined at the finally sampled depth values;
the multilayer neural surface density estimation module recovers density values $\sigma$ from the sampled depth probability volume $P_d$ output by the target-oriented multi-scale refinable scene depth estimation MVS module, corresponding to the opacity of the multilayer surfaces, in preparation for volume rendering of the final output image;
the reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained in the MVS module to look up, in reverse, the corresponding feature values in the source viewpoint features, and fuses and decodes them into the color values of the multilayer surfaces;
and the multilayer neural surface voxel rendering module performs volume rendering with the densities of the multilayer neural surfaces obtained by the density estimation module and the colors obtained by the reverse feature fusion and multilayer neural surface color decoding module, completing the synthesis of the final target image.
2. The free viewpoint image synthesis method based on multilayer neural surface representation according to claim 1, further comprising, after step S4:
S5, since the trained network has a certain generalization ability for data that did not appear in the training set, directly using the network model trained in step S3 for forward prediction, realizing high-quality free viewpoint image synthesis on the sparse multi-viewpoint data to be tested.
3. The method for synthesizing a free viewpoint image based on multi-layer neural surface expression according to claim 1, wherein the acquisition method in step S1 is a Structure-from-Motion method or a multi-viewpoint calibration method with given calibration.
4. The method of claim 1, wherein in step S3 the large-scale multi-view dataset consists of multi-view image data with camera poses and is divided into a training set, a validation set, and a test set.
5. The method for synthesizing a free viewpoint image based on multi-layer neural surface expression according to claim 1, wherein in step S3 the multi-viewpoint synchronized or static-scene image data acquired from sparse viewpoints is denoted {I_i}, i = 1, …, N, where N is the number of input views; sparse-viewpoint pose estimation yields the pose of each viewpoint {P_i}, i = 1, …, N, where P_i = K_i [R_i | t_i] contains the intrinsic parameters K_i and extrinsic parameters [R_i | t_i] of each view.
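The pose decomposition in this claim, intrinsics K_i and extrinsics [R_i | t_i], composes into the usual 3x4 projection matrix. A small sketch; the symbols follow the claim, while the helper names are illustrative:

```python
import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | t]: 3x4 projection from intrinsics K and extrinsics (R, t)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D world point X (3,) to pixel coordinates (2,)."""
    x = P @ np.append(X, 1.0)   # homogeneous projection
    return x[:2] / x[2]         # perspective divide
```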
6. The free viewpoint image synthesis method based on multi-layer neural surface expression according to claim 1, wherein in step S3 the pose of the target viewpoint is defined as P_t; according to the position and orientation of the target viewpoint, the images {I_j}, j = 1, …, M, of the M source viewpoints closest to the target viewpoint among the input viewpoints, together with their camera poses {P_j}, are taken as the input to the network.
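Selecting the M nearest source views can be done from the camera centres implied by the extrinsics. A position-only sketch (the claim also considers orientation, which could be added as an angular term; all names are illustrative):

```python
import numpy as np

def camera_center(R, t):
    """Optical centre C = -R^T t of a view with extrinsics [R | t]."""
    return -R.T @ t

def nearest_source_views(target_pose, source_poses, M):
    """Indices of the M source views whose camera centres lie closest
    to the target view's centre."""
    ct = camera_center(*target_pose)
    dists = [np.linalg.norm(camera_center(R, t) - ct)
             for R, t in source_poses]
    return list(np.argsort(dists)[:M])
```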
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211391996.4A CN115439388B (en) | 2022-11-08 | 2022-11-08 | Free viewpoint image synthesis method based on multilayer nerve surface expression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115439388A CN115439388A (en) | 2022-12-06 |
CN115439388B true CN115439388B (en) | 2024-02-06 |
Family
ID=84252759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211391996.4A Active CN115439388B (en) | 2022-11-08 | 2022-11-08 | Free viewpoint image synthesis method based on multilayer nerve surface expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115439388B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105141956A (en) * | 2015-08-03 | 2015-12-09 | 西安电子科技大学 | Incremental rate distortion optimization method based on free viewpoint video depth map coding |
CN105247862A (en) * | 2013-04-09 | 2016-01-13 | 联发科技股份有限公司 | Method and apparatus of view synthesis prediction in three-dimensional video coding |
JP2019159840A (en) * | 2018-03-13 | 2019-09-19 | 萩原電気ホールディングス株式会社 | Image synthesizing apparatus and image synthesizing method |
CN111028273A (en) * | 2019-11-27 | 2020-04-17 | 山东大学 | Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof |
CN111144214A (en) * | 2019-11-27 | 2020-05-12 | 中国石油大学(华东) | Hyperspectral image unmixing method based on multilayer stack type automatic encoder |
CN111951203A (en) * | 2020-07-01 | 2020-11-17 | 北京大学深圳研究生院 | Viewpoint synthesis method, apparatus, device and computer readable storage medium |
CN112637582A (en) * | 2020-12-09 | 2021-04-09 | 吉林大学 | Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge |
CN114463408A (en) * | 2021-12-20 | 2022-05-10 | 北京邮电大学 | Free viewpoint image generation method, device, equipment and storage medium |
CN114627223A (en) * | 2022-03-04 | 2022-06-14 | 华南师范大学 | Free viewpoint video synthesis method and device, electronic equipment and storage medium |
CN114663543A (en) * | 2022-03-31 | 2022-06-24 | 西安交通大学 | Virtual view synthesis method based on deep learning and multi-view geometry |
CN114666564A (en) * | 2022-03-23 | 2022-06-24 | 南京邮电大学 | Method for synthesizing virtual viewpoint image based on implicit neural scene representation |
CN114820901A (en) * | 2022-04-08 | 2022-07-29 | 浙江大学 | Large-scene free viewpoint interpolation method based on neural network |
CN114820945A (en) * | 2022-05-07 | 2022-07-29 | 北京影数科技有限公司 | Sparse sampling-based method and system for generating image from ring shot image to any viewpoint image |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10950037B2 (en) * | 2019-07-12 | 2021-03-16 | Adobe Inc. | Deep novel view and lighting synthesis from sparse images |
Non-Patent Citations (6)
Title |
---|
Complete Multi-View Reconstruction of Dynamic Scenes from Probabilistic Fusion of Narrow and Wide Baseline Stereo; Tony Tung et al.; 2009 IEEE 12th International Conference on Computer Vision; 20091231; pp. 1709-1716 *
Neural Sparse Voxel Fields; Lingjie Liu et al.; arXiv:2007.11571v2; 20210131; pp. 1-22 *
VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids; Katja Schwarz et al.; arXiv:2206.07695v2; 20220630; pp. 1-20 *
Research on 3D Face Reconstruction and Free-Viewpoint Video Generation; Wang Yanru; China Master's Theses Full-text Database (Information Science and Technology); 20210415; vol. 2021, no. 04; pp. I138-552 *
Research on Image-Based Free-Viewpoint Synthesis Methods; Li Minghao; China Master's Theses Full-text Database (Information Science and Technology); 20220115; vol. 2022, no. 01; pp. I138-2246 *
Light-Field Image Depth Estimation Based on Multi-Stream Epipolar Convolutional Neural Networks; Wang Shuo et al.; Computer Applications and Software; 20200812; vol. 37, no. 08; pp. 194-201 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | From big to small: Multi-scale local planar guidance for monocular depth estimation | |
CN110782490B (en) | Video depth map estimation method and device with space-time consistency | |
Wang et al. | Deep learning for hdr imaging: State-of-the-art and future trends | |
WO2021048607A1 (en) | Motion deblurring using neural network architectures | |
CN111901598B (en) | Video decoding and encoding method, device, medium and electronic equipment | |
CN114936605A (en) | Knowledge distillation-based neural network training method, device and storage medium | |
Gu et al. | Coupled real-synthetic domain adaptation for real-world deep depth enhancement | |
CN112288788A (en) | Monocular image depth estimation method | |
JP2022536381A (en) | MOTION TRANSITION METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM | |
Zhao et al. | Vcgan: Video colorization with hybrid generative adversarial network | |
Li et al. | Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data | |
CN115002379B (en) | Video frame inserting method, training device, electronic equipment and storage medium | |
CN114996814A (en) | Furniture design system based on deep learning and three-dimensional reconstruction | |
CN115496663A (en) | Video super-resolution reconstruction method based on D3D convolution intra-group fusion network | |
CN115170388A (en) | Character line draft generation method, device, equipment and medium | |
CN116091955A (en) | Segmentation method, segmentation device, segmentation equipment and computer readable storage medium | |
CN113379606A (en) | Face super-resolution method based on pre-training generation model | |
CN116342675B (en) | Real-time monocular depth estimation method, system, electronic equipment and storage medium | |
CN115439388B (en) | Free viewpoint image synthesis method based on multilayer nerve surface expression | |
Xiao et al. | Progressive motion boosting for video frame interpolation | |
Nie et al. | Context and detail interaction network for stereo rain streak and raindrop removal | |
CN116402908A (en) | Dense light field image reconstruction method based on heterogeneous imaging | |
CN116486009A (en) | Monocular three-dimensional human body reconstruction method and device and electronic equipment | |
Jung et al. | Depth image interpolation using confidence-based Markov random field | |
CN115830094A (en) | Unsupervised stereo matching method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||