CN115439388A - Free viewpoint image synthesis method based on multilayer neural surface expression - Google Patents

Free viewpoint image synthesis method based on multilayer neural surface expression Download PDF

Info

Publication number
CN115439388A
CN115439388A
Authority
CN
China
Prior art keywords
viewpoint
sparse
module
image synthesis
free viewpoint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211391996.4A
Other languages
Chinese (zh)
Other versions
CN115439388B (en)
Inventor
戴翘楚
吴翼天
曹静萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yilan Technology Co ltd
Original Assignee
Hangzhou Yilan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yilan Technology Co ltd filed Critical Hangzhou Yilan Technology Co ltd
Priority to CN202211391996.4A priority Critical patent/CN115439388B/en
Publication of CN115439388A publication Critical patent/CN115439388A/en
Application granted granted Critical
Publication of CN115439388B publication Critical patent/CN115439388B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a free viewpoint image synthesis method based on multilayer neural surface expression, which relates to the field of computer vision and comprises the following steps: S1, acquiring image data collected from sparse viewpoints, and estimating the poses of the sparse viewpoints; S2, designing a sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression; S3, training the sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression with a large-scale multi-viewpoint data set; and S4, after the image synthesis network model parameters are obtained, applying them to the free viewpoint synthesis task for the sparse multi-viewpoint data obtained in step S1. By designing a multilayer neural surface expression model and fully exploiting the characteristics of sparse multi-viewpoint images, the method realizes a high-quality, generalizable free viewpoint image synthesis algorithm suitable for the free viewpoint image synthesis task of a multi-viewpoint acquisition system.

Description

Free viewpoint image synthesis method based on multilayer neural surface expression
Technical Field
The invention relates to the field of computer vision, in particular to a free viewpoint image synthesis method based on multilayer neural surface expression.
Background
Free viewpoint image synthesis is a key problem in the field of computer vision. With the advent of the 5G era and the development and popularization of virtual reality and augmented reality technology, digital imagery is inevitably developing toward interactivity and immersion.
Free viewpoint synthesis offers strong three-dimensional immersion, large viewing freedom, and rich interactive experience, and is widely applied in fields such as virtual reality, film and television production, live sports, and cultural and social applications.
However, current free viewpoint systems still require hundreds of cameras and are structurally complex and expensive; meanwhile, most deployed systems adopt fixed imaging tracks, so the available viewing viewpoints are limited, the sense of immersion is insufficient, and practicality and economy remain to be improved.
Disclosure of Invention
In order to achieve this purpose, the invention studies the problem of free viewpoint image synthesis under sparse viewpoints and overcomes two defects of existing free viewpoint generation algorithms: that each group of multi-viewpoint images must be trained for a long time, and that geometric estimation under sparse viewpoints degrades the final viewpoint synthesis result. The invention provides a framework based on multilayer neural surface expression that realizes, end to end, scene geometry estimation and texture mapping synthesis for the viewpoint to be synthesized, achieves high-quality and efficient free viewpoint image generation, and solves the problems described in the background art.
In the method, based on the novel multilayer neural surface expression, a workflow for free viewpoint image synthesis from sparse viewpoint inputs is designed; the scene structure information of the new viewpoint to be synthesized and the accurate texture migration and fusion processes are fully learned within the network, completing high-quality and efficient free viewpoint image synthesis.
The technical scheme is as follows: the free viewpoint image synthesis method based on the multilayer neural surface expression comprises the following steps:
S1, acquiring synchronized multi-viewpoint or static-scene image data collected from sparse viewpoints, and estimating the poses of the sparse viewpoints;
S2, designing a sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression;
S3, training the sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression with a large-scale multi-viewpoint data set, so that it can generalize to various multi-viewpoint data;
and S4, after the trained model parameters of the sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression are obtained, applying them to the free viewpoint synthesis task for the sparse multi-viewpoint data obtained in step S1.
Further, after step S4, the method comprises: S5, since the trained network has a certain degree of generalization to data that does not appear in the training set, directly performing forward prediction with the network model trained in step S3, realizing high-quality free viewpoint image synthesis for the sparse multi-viewpoint data to be tested.
Further, in step S1, the poses of the sparse viewpoints are estimated by a Structure-from-Motion method or by a multi-viewpoint calibration method with a calibration object of given scale.
Further, the free viewpoint synthesis network comprises a multi-scale image feature extraction module, an MVS (Multi-View Stereo) module for target-oriented, multi-scale, refinable scene depth estimation, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module.
Further, in step S3, the training data is multi-viewpoint image data with camera poses, divided into a training set, a validation set, and a test set; training is continued until the network converges on the validation set.
Further, in step S3, let the input viewpoint images be $\{I_i\}_{i=1}^{N}$, where $N$ is the number of input viewpoints; the poses of the sparse viewpoints are estimated to obtain each viewpoint's pose $P_i$, where $P_i$ comprises each viewpoint's intrinsics $K_i$ and extrinsics $[R_i \mid t_i]$ (rotation matrix and translation matrix).
Further, in step S3, the pose of the target viewpoint is defined as $P_t$; according to the position and orientation of the target viewpoint, the images $\{I_{s_j}\}_{j=1}^{M}$ of the $M$ source viewpoints closest to the target viewpoint are found among the input viewpoints and, together with their camera poses $\{P_{s_j}\}_{j=1}^{M}$, are used as the input to the network.
Further, the multi-scale image feature extraction module is composed of convolution layers and skip connection layers and is expressed as $\{F^{(1)}, F^{(2)}, F^{(3)}\} = \Phi_{\mathrm{feat}}(I)$, where $\Phi_{\mathrm{feat}}$ denotes the network of this module and $I$ is any image input to the module; the output of the module is the image features at three scales $\{F^{(1)}, F^{(2)}, F^{(3)}\}$.
Further, the MVS module implements scene geometry estimation at any viewpoint by modifying a learning-based MVS network; the implementation includes the following steps:
The M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M × 3 image features;
For each scale, the source viewpoint features are warped to candidate depths of the target viewpoint, a variance-based cost volume is constructed, and, after regularization by 3D convolutions, the probability of each pixel of the target image at each depth is output;
Optimization proceeds progressively from the small scale to the large scale, with the depth sampling updated according to the depth probability of the previous level; finally, at the original image resolution, the depth probabilities of the target points corresponding to the multilayer surfaces (curved surfaces determined by the finally sampled depth values) are output.
Further, the multi-layer neural surface density estimation module takes the sampled depth probability volume $V$ from the output of the MVS module and recovers the density values $\sigma$ on the multi-layer surface points, which correspondingly represent the opacity of the multilayer surface, in preparation for volume rendering of the final output image;
the reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained by the MVS module to reversely access the source viewpoint features $\{F_j\}_{j=1}^{M}$, and fuses and decodes the corresponding feature values into the color values of the multilayer surface;
and the multilayer neural surface voxel rendering module, after acquiring the density corresponding to the multilayer neural surface through the multilayer neural surface density estimation module and the color corresponding to the multilayer neural surface through the reverse feature fusion and multilayer neural surface color decoding module, performs voxel rendering to complete the synthesis of the final target image.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the invention, because the trained network has certain generalization on the data which does not appear in the training set, the free viewpoint image synthesis task of the sparse multi-viewpoint data to be tested can be completed by directly utilizing the forward prediction of the network;
(2) In the invention, by designing a multilayer neural surface expression model and fully utilizing the characteristics of sparse multi-viewpoint images, a high-quality and generalized free viewpoint image synthesis algorithm is completed, and the method is suitable for a free viewpoint image synthesis task of a multi-viewpoint acquisition system;
(3) In the invention, the network adopts a multilayer neural surface expression, aiming to reconstruct the multilayer surfaces of the scene within an end-to-end new viewpoint synthesis framework and to complete high-quality new viewpoint texture fusion and generation based on the multilayer surface expression.
Drawings
Fig. 1 is a work flow chart of a free viewpoint image synthesis method based on multi-layer neural surface expression in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
In the method, based on the novel multilayer neural surface expression, a workflow for free viewpoint image synthesis from sparse viewpoint inputs is designed; the scene structure information of the new viewpoint to be synthesized and the accurate texture migration and fusion processes are fully learned within the network, completing high-quality and efficient free viewpoint image synthesis.
Examples
As shown in fig. 1, the method for synthesizing a free viewpoint image based on multi-layer neural surface expression in this embodiment includes the following steps:
S1, acquiring synchronized multi-viewpoint or static-scene image data collected from sparse viewpoints, and estimating the poses of the sparse viewpoints;
wherein the poses are estimated by a Structure-from-Motion method or by a multi-viewpoint calibration method with a calibration object of given scale;
that is, the step acquires multi-view data of the same static scene or dynamic scene at the same time, and the multi-view data can be sparse (namely, the change of the viewpoint pose is large);
when the system is used, the purpose of the acquisition is to acquire different viewpoints of a scene to be observed by using limited acquisition equipment, and a free viewpoint image of the scene is expected to be recovered through an algorithm.
S2, designing a sparse viewpoint free viewpoint image synthesis network based on multilayer neural surface expression;
In use, the network adopts a multilayer neural surface expression, aiming to reconstruct the multilayer surfaces of the scene within an end-to-end new viewpoint synthesis framework and to complete high-quality new viewpoint texture fusion and generation based on the multilayer surface expression.
Wherein the free viewpoint synthesis network includes: a multi-scale image feature extraction module, an MVS (Multi-View Stereo) module for target-oriented, multi-scale, refinable scene depth estimation, a multilayer neural surface density estimation module, a reverse feature fusion and multilayer neural surface color decoding module, and a multilayer neural surface voxel rendering module.
In use, the deep neural network design of this part is the core component.
S3, training the sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression with a large-scale multi-viewpoint data set, so that it can generalize to various multi-viewpoint data.
The training data is multi-viewpoint image data with camera poses; it is divided into a training set, a validation set, and a test set, and training is continued until the network converges on the validation set.
Step S3 comprises the following steps:
S31, inputting the M viewpoints most similar to the viewpoint to be synthesized into the network, and outputting the predicted image at the viewpoint to be synthesized;
S32, supervising with a pixel-level loss function such as L1 or L2, or with a perceptual loss function.
And S4, after the trained model parameters of the sparse-viewpoint free viewpoint image synthesis network based on multilayer neural surface expression are obtained, applying them to the free viewpoint synthesis task for the sparse multi-viewpoint data obtained in step S1.
As shown in fig. 1, the method for synthesizing a free viewpoint image based on multi-layer neural surface expression in this embodiment includes the following specific steps:
Acquiring synchronized multi-viewpoint (or static scene) image data collected from sparse viewpoints;
wherein the input viewpoint images are $\{I_i\}_{i=1}^{N}$ and $N$ is the number of input viewpoints; the poses of the sparse viewpoints are estimated to obtain each viewpoint's pose $P_i$, where $P_i$ comprises each viewpoint's intrinsics $K_i$ and extrinsics $[R_i \mid t_i]$ (rotation matrix and translation matrix).
Specifically, the step acquires multi-view data of the same static scene or dynamic scene at the same time, and the multi-view data can be sparse (namely, the change of the viewpoint pose is large); the purpose of the acquisition is to acquire different viewpoints of a scene to be observed by using limited acquisition equipment, and it is expected that a free viewpoint image of the scene can be recovered through an algorithm.
The pose of the target viewpoint is defined as $P_t$; according to the position and orientation of the target viewpoint, the images $\{I_{s_j}\}_{j=1}^{M}$ of the $M$ source viewpoints closest to the target viewpoint are found among the input viewpoints and, together with their camera poses $\{P_{s_j}\}_{j=1}^{M}$, are used as the input to the network.
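The exact nearest-view criterion is not spelled out above, so the following is a hedged sketch that ranks candidate source views by a weighted combination of camera-center distance and viewing-direction similarity; the weight w_dir and the world-to-camera pose convention are illustrative assumptions.

```python
# Minimal sketch (criterion and weighting are assumptions): pick the M source
# viewpoints nearest to the target viewpoint by position and orientation.
# `poses` are 3x4 world-to-camera matrices [R | t].
import numpy as np

def select_source_views(target_pose, poses, M=4, w_dir=0.5):
    def center_and_dir(pose):
        R, t = pose[:, :3], pose[:, 3]
        center = -R.T @ t                                # camera center in world coordinates
        view_dir = R.T @ np.array([0.0, 0.0, 1.0])       # optical axis in world coordinates
        return center, view_dir

    c_t, d_t = center_and_dir(target_pose)
    scores = []
    for i, pose in enumerate(poses):
        c_i, d_i = center_and_dir(pose)
        dist = np.linalg.norm(c_i - c_t)                 # positional distance
        ang = 1.0 - float(d_i @ d_t)                     # orientation dissimilarity
        scores.append((dist + w_dir * ang, i))
    return [i for _, i in sorted(scores)[:M]]            # indices of the M best source views
```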
The method specifically comprises the following steps:
the multi-scale image feature extraction module is based on a U-Net model and consists of a multi-scale convolution layer and a jump connection layer, and the multi-scale image feature extraction module can be expressed as follows:
Figure 860265DEST_PATH_IMAGE024
wherein the content of the first and second substances,
Figure 743908DEST_PATH_IMAGE025
on behalf of the network of the present module,
Figure 420877DEST_PATH_IMAGE012
for any image input to the module, the output of the module can be three-scale image features
Figure 406281DEST_PATH_IMAGE026
In use, the three-channel image passes through the feature extraction module to obtain multi-scale features of different resolutions and channel counts, extracted at different depths of the network; these contain image features corresponding to different receptive fields and are used for the subsequent neural surface localization and reverse feature fusion.
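A minimal PyTorch sketch of such a three-scale extractor with convolution layers and skip connections follows; the channel counts, depth, and upsampling scheme are assumptions standing in for the actual U-Net used.

```python
# Minimal sketch (illustrative stand-in, not the patented network): a small
# U-Net-style extractor that returns image features at three scales.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class MultiScaleFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)                     # full resolution
        self.enc2 = conv_block(16, 32, stride=2)          # 1/2 resolution
        self.enc3 = conv_block(32, 64, stride=2)          # 1/4 resolution
        self.up2 = nn.Conv2d(64 + 32, 32, 3, padding=1)   # skip-connection fusion
        self.up1 = nn.Conv2d(32 + 16, 16, 3, padding=1)

    def forward(self, img):
        f1 = self.enc1(img)
        f2 = self.enc2(f1)
        f3 = self.enc3(f2)                                # coarsest feature
        u2 = F.interpolate(f3, scale_factor=2, mode="bilinear", align_corners=False)
        f2 = self.up2(torch.cat([u2, f2], dim=1))         # fused 1/2-scale feature
        u1 = F.interpolate(f2, scale_factor=2, mode="bilinear", align_corners=False)
        f1 = self.up1(torch.cat([u1, f1], dim=1))         # fused full-scale feature
        return f1, f2, f3                                 # three-scale image features
```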
The MVS module for target-oriented, multi-scale, refinable scene depth estimation realizes scene geometry estimation for an arbitrary viewpoint by adapting a learning-based MVS network; the implementation comprises the following steps:
(1) The M source viewpoint images are passed through the multi-scale image feature extraction module to obtain M × 3 image features;
(2) For each scale, the source viewpoint features are warped to candidate depths of the target viewpoint, a variance-based cost volume is constructed, and, after regularization by 3D convolutions, the probability of each pixel of the target image at each depth is output;
(3) Optimization proceeds progressively from the small scale to the large scale, with the depth sampling updated according to the depth probability of the previous level; finally, at the original image resolution, the depth probabilities of the target points corresponding to the multilayer surfaces (curved surfaces determined by the finally sampled depth values) are output.
The MVS module can be expressed as $(X_t, V) = \Phi_{\mathrm{MVS}}\big(\{F_j\}_{j=1}^{M}, \{P_{s_j}\}_{j=1}^{M}, P_t\big)$, where $\Phi_{\mathrm{MVS}}$ is the MVS module for target-oriented, multi-scale, refinable scene depth estimation; its outputs are the multi-layer surface sampling point set $X_t$ of the target viewpoint and the corresponding sampled depth probability volume $V$.
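To make the cost-volume step concrete, the sketch below builds a variance-based cost volume at a single scale by warping source-view features onto candidate depths of the target view; the pose convention, sampling scheme, and tensor shapes are assumptions, and the subsequent 3D-convolution regularization and coarse-to-fine resampling are omitted.

```python
# Minimal sketch (assumptions throughout): warp M source-view feature maps to each
# candidate depth of the target view and take the per-depth variance across views
# as the matching cost; a 3D CNN would then regularize this into depth probabilities.
import torch
import torch.nn.functional as F

def variance_cost_volume(src_feats, src_Ks, src_Rts, tgt_K, tgt_Rt, depths):
    """src_feats: (M, C, H, W); src_Ks / tgt_K: 3x3; src_Rts / tgt_Rt: 3x4 [R | t]; depths: (D,)."""
    M, C, H, W = src_feats.shape
    device = src_feats.device
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)  # (3, H*W)

    costs = []
    for d in depths:
        # back-project target pixels to 3D at depth d, then into world coordinates
        cam_pts = torch.linalg.inv(tgt_K) @ pix * d
        R_t, t_t = tgt_Rt[:, :3], tgt_Rt[:, 3:]
        world = R_t.T @ (cam_pts - t_t)
        per_view = []
        for j in range(M):
            R_s, t_s = src_Rts[j][:, :3], src_Rts[j][:, 3:]
            proj = src_Ks[j] @ (R_s @ world + t_s)                  # project into source view j
            uv = proj[:2] / proj[2:].clamp(min=1e-6)
            grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                                uv[1] / (H - 1) * 2 - 1], dim=-1).reshape(1, H, W, 2)
            per_view.append(F.grid_sample(src_feats[j:j + 1], grid, align_corners=True))
        feats_d = torch.cat(per_view, dim=0)                         # (M, C, H, W) at this depth
        costs.append(feats_d.var(dim=0, unbiased=False))             # variance over views
    return torch.stack(costs, dim=1)                                 # (C, D, H, W) cost volume
```

In a coarse-to-fine implementation this construction would be repeated at each scale, with the candidate depths resampled around the most probable depths of the previous level, as described in step (3) above.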
The multi-layer neural surface density estimation module takes the sampled depth probability volume $V$ from the output of the MVS module and recovers the density values $\sigma$ on the multi-layer surface points, which correspondingly represent the opacity of the multilayer surface, in preparation for volume rendering of the final output image.
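As one plausible realization (an architectural assumption, since the exact layer is not detailed here), the density estimation can be a small MLP mapping each surface point's sampled depth probability to a non-negative opacity:

```python
# Minimal sketch (assumed architecture): map the sampled depth probability of each
# multi-layer surface point to a non-negative density, i.e. the per-layer opacity.
import torch.nn as nn

class SurfaceDensityHead(nn.Module):
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1), nn.Softplus(),       # keep the density non-negative
        )

    def forward(self, depth_prob):
        # depth_prob: (..., L, 1) sampled probability for each of the L surface layers
        return self.mlp(depth_prob)                    # (..., L, 1) density per layer
```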
The reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained by the MVS module to reversely access the source viewpoint features $\{F_j\}_{j=1}^{M}$, and fuses and decodes the corresponding features into the color values of the multilayer surface.
For convenience of explanation, the processing procedure of the inverse feature fusion and multi-layer neural surface color decoding module is described as follows:
(1) For a given depth $d$ and a given pixel $p$, the corresponding source viewpoint features are located by projecting through the source camera poses and intrinsics, and the features corresponding to the M source viewpoints are gathered into a feature set $\{f_j(p, d)\}_{j=1}^{M}$;
(2) The M feature vectors are each encoded by an MLP (multi-layer perceptron) and then averaged to obtain the fused feature $\bar{f}(p, d)$;
(3) This reverse feature fusion is performed for every pixel $p$ and every depth $d$, yielding the multi-layer features $\{\bar{f}_k\}_{k=1}^{L}$;
It is noted that other forms of feature fusion may also be used;
(4) Through decoding by a decoder, the multi-layer colors are obtained as $c_k = \Phi_{\mathrm{dec}}(\bar{f}_k)$ for $k = 1, \dots, L$, where $L$ is the number of layers of the multi-layer neural surface and $\Phi_{\mathrm{dec}}$ is the image decoder; the multi-layer colors $\{c_k\}_{k=1}^{L}$ are finally output.
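A compact sketch of steps (1) to (4) is given below: a shared MLP encodes the features gathered from the M source views, the encodings are averaged, and a decoder produces an RGB color for every surface layer; the dimensions and layer sizes are illustrative assumptions.

```python
# Minimal sketch (dimensions assumed): reverse feature fusion and color decoding.
import torch
import torch.nn as nn

class FusionColorDecoder(nn.Module):
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3), nn.Sigmoid(),        # RGB in [0, 1]
        )

    def forward(self, src_feats):
        # src_feats: (M, L, H, W, feat_dim) features gathered from the M source views
        encoded = self.encoder(src_feats)              # encode each view independently
        fused = encoded.mean(dim=0)                    # average over the M views
        return self.decoder(fused)                     # (L, H, W, 3) multi-layer colors
```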
The multilayer neural surface voxel rendering module, after obtaining the density of the multilayer neural surface from the multilayer neural surface density estimation module and the color of the multilayer neural surface from the reverse feature fusion and multilayer neural surface color decoding module, performs voxel rendering to complete the synthesis of the final target image, which can be expressed as
$$\hat{I}_t(p) = \sum_{k=1}^{L} \Big(\prod_{l<k}\big(1-\sigma_l(p)\big)\Big)\,\sigma_k(p)\,c_k(p),$$
where $\sigma_k(p)$ and $c_k(p)$ are the opacity and color of the $k$-th neural surface layer at pixel $p$.
the sparse viewpoint free viewpoint image synthesis network based on the multilayer neural surface expression is trained by utilizing a large-scale multi-viewpoint data set, so that the sparse viewpoint free viewpoint image synthesis network can be generalized to various multi-viewpoint data.
In particular, the training data may be multi-view image data with camera poses.
The input of the network consists of the M viewpoint images $\{I_{s_j}\}_{j=1}^{M}$ most similar to the viewpoint to be synthesized and the corresponding camera poses $\{P_{s_j}\}_{j=1}^{M}$; the output is the predicted image $\hat{I}_t$ at the viewpoint to be synthesized.
Supervision may be a pixel-level loss function such as L1 or L2, or a perceptual loss function, etc.; the loss function may be the L2 loss
$$\mathcal{L} = \big\| \hat{I}_t - I_t \big\|_2^2,$$
where $I_t$ is the ground-truth image at the viewpoint to be synthesized.
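The following is a minimal sketch of one training iteration under the L2 supervision described above; the model interface and batch layout are placeholders for the full synthesis network and data loader.

```python
# Minimal sketch (placeholder interfaces): one training step with the L2 (MSE) loss.
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    src_imgs, src_poses, tgt_pose, tgt_img = batch       # one multi-view training sample
    pred = model(src_imgs, src_poses, tgt_pose)           # predicted image at the target viewpoint
    loss = F.mse_loss(pred, tgt_img)                      # pixel-level L2 supervision
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```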
after the trained sparse viewpoint free viewpoint image synthesis network model parameters based on the multilayer neural surface expression are obtained, the method can be applied to the free viewpoint synthesis task of the sparse multi-viewpoint data obtained in the first step. Because the trained network has certain generalization on data which does not appear in a training set, forward prediction can be directly carried out by using the trained network model, and high-quality free viewpoint image synthesis of sparse multi-viewpoint data to be tested is realized.
It is noted that, in this document, relational terms such as first and second, and the like, if any, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" or "comprising" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention; solutions that still address the technical problems solved by the present invention shall be included within the scope of protection of the present invention.

Claims (10)

1. The free viewpoint image synthesis method based on the multilayer neural surface expression is characterized by comprising the following steps of:
S1, acquiring synchronized multi-viewpoint or static-scene image data collected from sparse viewpoints, and estimating the poses of the sparse viewpoints;
S2, designing a sparse-viewpoint free viewpoint image synthesis network based on multi-layer neural surface expression;
S3, training the sparse-viewpoint free viewpoint image synthesis network based on multi-layer neural surface expression with a large-scale multi-viewpoint data set, so that it can generalize to various multi-viewpoint data;
and S4, after the trained model parameters of the sparse-viewpoint free viewpoint image synthesis network based on multi-layer neural surface expression are obtained, applying them to the free viewpoint synthesis task for the sparse multi-viewpoint data obtained in step S1.
2. The free viewpoint image synthesis method based on multilayer neural surface expression as set forth in claim 1, wherein after step S4, the method further comprises:
and S5, when the trained network has certain generalization on the data which does not appear in the training set, directly utilizing the network model trained in the step S3 to carry out forward prediction, and realizing the high-quality free viewpoint image synthesis of the sparse multi-viewpoint data to be tested.
3. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S1, the poses are estimated by a Structure-from-Motion method or by a multi-viewpoint calibration method with a calibration object of given scale.
4. The method of claim 1, wherein the free viewpoint image synthesis network comprises a multi-scale image feature extraction module, an MVS module for target-oriented, multi-scale, refinable scene depth estimation, a multi-layer neural surface density estimation module, a reverse feature fusion and multi-layer neural surface color decoding module, and a multi-layer neural surface voxel rendering module.
5. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S3, the training data is multi-viewpoint image data with camera poses, divided into a training set, a validation set, and a test set, and training is continued until the network converges on the validation set.
6. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S3,
setting up
Figure 313376DEST_PATH_IMAGE001
Figure 752448DEST_PATH_IMAGE002
Is the number of viewpoints entered;
estimating the pose of the sparse viewpoints to obtain the pose of each viewpoint
Figure 294288DEST_PATH_IMAGE003
Wherein
Figure 996665DEST_PATH_IMAGE004
Respectively including the internal reference of each viewpoint
Figure 623955DEST_PATH_IMAGE005
Root of external ginseng
Figure 617450DEST_PATH_IMAGE006
7. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein in step S3, the pose of the target viewpoint is defined as $P_t$; according to the position and orientation of the target viewpoint, the images $\{I_{s_j}\}_{j=1}^{M}$ of the M source viewpoints closest to the target viewpoint are found among the input viewpoints and, together with their camera poses $\{P_{s_j}\}_{j=1}^{M}$, are used as the input to the network.
8. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 1, wherein the multi-scale image feature extraction module is composed of convolution layers and skip connection layers and is expressed as $\{F^{(1)}, F^{(2)}, F^{(3)}\} = \Phi_{\mathrm{feat}}(I)$, where $\Phi_{\mathrm{feat}}$ denotes the network of this module and $I$ is any image input to the module; the output of the module is the image features at three scales $\{F^{(1)}, F^{(2)}, F^{(3)}\}$.
9. The free viewpoint image synthesis method based on multilayer neural surface expression as claimed in claim 4, wherein the MVS module realizes scene geometry estimation of any viewpoint by modifying the learning-based MVS network, and the realization comprises the following steps:
passing the M source viewpoint images through the multi-scale image feature extraction module to obtain M × 3 image features;
for each scale, warping the source viewpoint features to candidate depths of the target viewpoint, constructing a variance-based cost volume, and, after regularization by 3D convolutions, outputting the probability of each pixel of the target image at each depth;
optimizing progressively from the small scale to the large scale, updating the depth sampling according to the depth probability of the previous level, and finally outputting, at the original image resolution, the depth probabilities of the target points corresponding to the multilayer surfaces (curved surfaces determined by the finally sampled depth values).
10. The free viewpoint image synthesis method based on multi-layer neural surface expression as claimed in claim 4, wherein the multi-layer neural surface density estimation module takes the sampled depth probability volume $V$ from the output of the MVS module and recovers the density values $\sigma$ on the multi-layer surface points, which correspondingly represent the opacity of the multilayer surface, in preparation for volume rendering of the final output image;
the reverse feature fusion and multilayer neural surface color decoding module uses the multilayer surface sampling point set obtained by the MVS module to reversely access the source viewpoint features $\{F_j\}_{j=1}^{M}$, and fuses and decodes the corresponding feature values into the color values of the multilayer surface;
and the multilayer neural surface voxel rendering module, after acquiring the density corresponding to the multilayer neural surface through the multi-layer neural surface density estimation module and the color corresponding to the multilayer neural surface through the reverse feature fusion and multilayer neural surface color decoding module, performs voxel rendering to complete the synthesis of the final target image.
CN202211391996.4A 2022-11-08 2022-11-08 Free viewpoint image synthesis method based on multilayer nerve surface expression Active CN115439388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211391996.4A CN115439388B (en) 2022-11-08 2022-11-08 Free viewpoint image synthesis method based on multilayer nerve surface expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211391996.4A CN115439388B (en) 2022-11-08 2022-11-08 Free viewpoint image synthesis method based on multilayer nerve surface expression

Publications (2)

Publication Number Publication Date
CN115439388A (en) 2022-12-06
CN115439388B CN115439388B (en) 2024-02-06

Family

ID=84252759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211391996.4A Active CN115439388B (en) 2022-11-08 2022-11-08 Free viewpoint image synthesis method based on multilayer nerve surface expression

Country Status (1)

Country Link
CN (1) CN115439388B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105141956A (en) * 2015-08-03 2015-12-09 西安电子科技大学 Incremental rate distortion optimization method based on free viewpoint video depth map coding
CN105247862A (en) * 2013-04-09 2016-01-13 联发科技股份有限公司 Method and apparatus of view synthesis prediction in three-dimensional video coding
JP2019159840A (en) * 2018-03-13 2019-09-19 萩原電気ホールディングス株式会社 Image synthesizing apparatus and image synthesizing method
CN111028273A (en) * 2019-11-27 2020-04-17 山东大学 Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
CN111144214A (en) * 2019-11-27 2020-05-12 中国石油大学(华东) Hyperspectral image unmixing method based on multilayer stack type automatic encoder
CN111951203A (en) * 2020-07-01 2020-11-17 北京大学深圳研究生院 Viewpoint synthesis method, apparatus, device and computer readable storage medium
US20210012561A1 (en) * 2019-07-12 2021-01-14 Adobe Inc. Deep novel view and lighting synthesis from sparse images
CN112637582A (en) * 2020-12-09 2021-04-09 吉林大学 Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge
CN114463408A (en) * 2021-12-20 2022-05-10 北京邮电大学 Free viewpoint image generation method, device, equipment and storage medium
CN114627223A (en) * 2022-03-04 2022-06-14 华南师范大学 Free viewpoint video synthesis method and device, electronic equipment and storage medium
CN114666564A (en) * 2022-03-23 2022-06-24 南京邮电大学 Method for synthesizing virtual viewpoint image based on implicit neural scene representation
CN114663543A (en) * 2022-03-31 2022-06-24 西安交通大学 Virtual view synthesis method based on deep learning and multi-view geometry
CN114820901A (en) * 2022-04-08 2022-07-29 浙江大学 Large-scene free viewpoint interpolation method based on neural network
CN114820945A (en) * 2022-05-07 2022-07-29 北京影数科技有限公司 Sparse sampling-based method and system for generating image from ring shot image to any viewpoint image

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105247862A (en) * 2013-04-09 2016-01-13 联发科技股份有限公司 Method and apparatus of view synthesis prediction in three-dimensional video coding
CN105141956A (en) * 2015-08-03 2015-12-09 西安电子科技大学 Incremental rate distortion optimization method based on free viewpoint video depth map coding
JP2019159840A (en) * 2018-03-13 2019-09-19 萩原電気ホールディングス株式会社 Image synthesizing apparatus and image synthesizing method
US20210012561A1 (en) * 2019-07-12 2021-01-14 Adobe Inc. Deep novel view and lighting synthesis from sparse images
CN111028273A (en) * 2019-11-27 2020-04-17 山东大学 Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
CN111144214A (en) * 2019-11-27 2020-05-12 中国石油大学(华东) Hyperspectral image unmixing method based on multilayer stack type automatic encoder
CN111951203A (en) * 2020-07-01 2020-11-17 北京大学深圳研究生院 Viewpoint synthesis method, apparatus, device and computer readable storage medium
CN112637582A (en) * 2020-12-09 2021-04-09 吉林大学 Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge
CN114463408A (en) * 2021-12-20 2022-05-10 北京邮电大学 Free viewpoint image generation method, device, equipment and storage medium
CN114627223A (en) * 2022-03-04 2022-06-14 华南师范大学 Free viewpoint video synthesis method and device, electronic equipment and storage medium
CN114666564A (en) * 2022-03-23 2022-06-24 南京邮电大学 Method for synthesizing virtual viewpoint image based on implicit neural scene representation
CN114663543A (en) * 2022-03-31 2022-06-24 西安交通大学 Virtual view synthesis method based on deep learning and multi-view geometry
CN114820901A (en) * 2022-04-08 2022-07-29 浙江大学 Large-scene free viewpoint interpolation method based on neural network
CN114820945A (en) * 2022-05-07 2022-07-29 北京影数科技有限公司 Sparse sampling-based method and system for generating image from ring shot image to any viewpoint image

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
KATJA SCHWARZ et al.: "VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids", arXiv:2206.07695v2 *
KATJA SCHWARZ et al.: "VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids", arXiv:2206.07695v2, 30 June 2022 (2022-06-30), pages 1-20 *
LINGJIE LIU et al.: "Neural Sparse Voxel Fields", arXiv:2007.11571v2 *
LINGJIE LIU et al.: "Neural Sparse Voxel Fields", arXiv:2007.11571v2, 31 January 2021 (2021-01-31), pages 1-22 *
TONY TUNG et al.: "Complete Multi-View Reconstruction of Dynamic Scenes from Probabilistic Fusion of Narrow and Wide Baseline Stereo", 2009 IEEE 12th International Conference on Computer Vision *
TONY TUNG et al.: "Complete Multi-View Reconstruction of Dynamic Scenes from Probabilistic Fusion of Narrow and Wide Baseline Stereo", 2009 IEEE 12th International Conference on Computer Vision, 31 December 2009 (2009-12-31), pages 1709-1716 *
李明豪: "Research on Image-Based Free Viewpoint Synthesis Methods" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
李明豪: "Research on Image-Based Free Viewpoint Synthesis Methods" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, vol. 2022, no. 01, 15 January 2022 (2022-01-15), pages 138-2246 *
汪晏如: "Research on 3D Face Reconstruction and Free Viewpoint Video Generation" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
汪晏如: "Research on 3D Face Reconstruction and Free Viewpoint Video Generation" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology, vol. 2021, no. 04, 15 April 2021 (2021-04-15), pages 138-552 *
王硕 et al.: "Light Field Image Depth Estimation Based on a Multi-Stream Epipolar Convolutional Neural Network" (in Chinese), Computer Applications and Software *
王硕 et al.: "Light Field Image Depth Estimation Based on a Multi-Stream Epipolar Convolutional Neural Network" (in Chinese), Computer Applications and Software, vol. 37, no. 08, 12 August 2020 (2020-08-12), pages 194-201 *

Also Published As

Publication number Publication date
CN115439388B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
Lee et al. From big to small: Multi-scale local planar guidance for monocular depth estimation
CN110738697A (en) Monocular depth estimation method based on deep learning
Song et al. Starenhancer: Learning real-time and style-aware image enhancement
Chen et al. Cross parallax attention network for stereo image super-resolution
Gu et al. Coupled real-synthetic domain adaptation for real-world deep depth enhancement
Li et al. Deep sketch-guided cartoon video inbetweening
CN112288788A (en) Monocular image depth estimation method
Li et al. Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data
CN110598537A (en) Video significance detection method based on deep convolutional network
Zhang et al. Removing Foreground Occlusions in Light Field using Micro-lens Dynamic Filter.
Wang et al. Neural opacity point cloud
Xiao et al. Image hazing algorithm based on generative adversarial networks
Mu et al. Neural 3D reconstruction from sparse views using geometric priors
Zhu et al. Occlusion-free scene recovery via neural radiance fields
Nie et al. Context and detail interaction network for stereo rain streak and raindrop removal
CN116402908A (en) Dense light field image reconstruction method based on heterogeneous imaging
CN115439388B (en) Free viewpoint image synthesis method based on multilayer nerve surface expression
Jung et al. Depth image interpolation using confidence-based Markov random field
CN114820323A (en) Multi-scale residual binocular image super-resolution method based on stereo attention mechanism
Guo et al. Stereo cross-attention network for unregistered hyperspectral and multispectral image fusion
CN113673567A (en) Panorama emotion recognition method and system based on multi-angle subregion self-adaption
Xue et al. An end-to-end multi-resolution feature fusion defogging network
Zhu et al. Fused network for view synthesis
Li et al. Delving Deeper Into Image Dehazing: A Survey
CN117058049B (en) New view image synthesis method, synthesis model training method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant