CN115457182A - Interactive viewpoint image synthesis method based on multi-plane image scene representation - Google Patents

Interactive viewpoint image synthesis method based on multi-plane image scene representation

Info

Publication number
CN115457182A
CN115457182A
Authority
CN
China
Prior art keywords
image
network
coding
dimensional
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211191210.4A
Other languages
Chinese (zh)
Inventor
霍智勇
魏俊宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202211191210.4A
Publication of CN115457182A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 9/00 - Image coding
    • G06T 9/002 - Image coding using neural networks
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/005 - General purpose rendering architectures
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30244 - Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Graphics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

To eliminate distortion and artifacts in new viewpoint images and improve their synthesis quality, a three-dimensional convolutional neural network is used to capture spatial features spanning multiple depth planes, while establishing the ability to predict occluded regions on the depth planes. The homography transformation module encodes the position information of the input reference image pair and re-projects each input image into the target camera through a homography transformation matrix; the network framework uses an encoder-decoder architecture based on three-dimensional convolution, in which the encoder and decoder of each layer are connected into a U-shaped network through skip connections, strengthening the ability to capture context information; the network output module generates a multi-plane image scene representation by blending the network-predicted weights with the alpha images, and the image of the target viewpoint is rendered by compositing the scene representation from back to front. The method can improve the accuracy of new viewpoint image synthesis.

Description

Interactive viewpoint image synthesis method based on multi-plane image scene representation
Technical Field
The invention relates to the field of image processing, in particular to an interactive viewpoint image synthesis method based on multi-plane image scene representation.
Background
With the development of deep learning, view synthesis has gained much attention in computer vision and computer graphics research because of its wide range of applications, such as providing users with real-time interactive experiences in immersive displays and augmented and virtual reality (AR/VR). The challenge of view synthesis with an explicit three-dimensional representation is that it requires inferring accurate scene geometry from the existing viewpoints. The MPI (multi-plane image) has proven to be a convenient volumetric representation that can effectively estimate the geometry of a scene in order to synthesize a new viewpoint image. An MPI is a set of semi-transparent images distributed at different depths that can encode diffuse surfaces as well as non-Lambertian effects, such as transparent and reflective areas. Given an MPI scene representation, a new viewpoint image of the target view can be rendered simply by applying an inverse homography transform and back-to-front alpha compositing.
Currently, much work focuses on learning MPI scene representations from single or multiple inputs. Tucker et al. (Tucker R, Snavely N. Single-view view synthesis with multiplane images.) predict an MPI representation of a scene from a single input image, and the MINE method extends the MPI to a continuous depth range by introducing a planar neural radiance field, i.e., given a single image as input, a plane at any depth of the MPI can be predicted. Although single-input methods achieve good results, they need point clouds as supervision to resolve the scale ambiguity of monocular depth estimation, even though the process of predicting an MPI is itself a form of depth estimation; MPI prediction methods based on multiple inputs have no scale ambiguity problem. Given an input stereo image pair taken by a narrow-baseline stereo camera, Zhou et al. (Zhou T, Tucker R, Flynn J, et al. Stereo magnification: learning view synthesis using multiplane images.) use an end-to-end two-dimensional deep learning network to infer the MPI scene representation and a differentiable rendering module to generate new viewpoint images, a problem known as stereo magnification. Building on this work, Srinivasan et al. (Srinivasan P P, Tucker R, Barron J T, et al. Pushing the boundaries of view extrapolation with multiplane images.) provide a theoretical analysis showing that the range of views that can be rendered from an MPI grows linearly with the MPI disparity sampling frequency, and propose a two-stage framework based on a three-dimensional convolutional neural network that uses optical flow to predict occluded content. Although increasing the number of MPI layers can effectively extend the boundary of view extrapolation, the number of layers cannot be increased indefinitely owing to GPU limitations. Flynn et al. (Flynn J, Broxton M, Debevec P, et al. DeepView: view synthesis with learned gradient descent.) treat the MPI prediction process as an inverse problem that tends to overfit and therefore propose a learned gradient descent method to solve it, but its computational cost is large.
Disclosure of Invention
Aiming at the problem that existing methods cannot capture the feature connections of an MPI across depth planes, so that the synthesized new view images often show obvious distortion and artifacts, the invention provides a new view synthesis method based on MPI scene representation. Verification is carried out on the Spaces dataset and the RealEstate10K dataset, and experiments show that the method can effectively synthesize new viewpoint images and performs better than existing methods.
An interactive viewpoint image synthesis method based on multi-plane image scene representation comprises the following steps:
step 1, acquiring training data and preprocessing;
step 2, inputting the training image data obtained in the step 1 into a built three-dimensional convolution neural network based on multi-plane image scene representation for training;
the three-dimensional convolution neural network comprises a homography transformation module, a three-dimensional convolution coding and decoding framework and a network output module;
the homography transformation module encodes the position information of the reference image pair acquired by the input reference cameras and re-projects each input image into the target camera through the homography transformation matrix;
the three-dimensional convolutional coding and decoding architecture comprises a preprocessing block and a four-layer encoding-decoding structure, and the encoder and decoder of each layer are connected into a U-shaped network through skip connections;
the network output module generates a multi-plane image scene representation by blending the network-predicted weights with the alpha images, and the image of the target viewpoint is rendered by compositing the multi-plane image from back to front;
step 3, inputting a test image into the trained network for testing to obtain the final new viewpoint image synthesis result.
Further, in step 1, the data preprocessing includes data augmentation of the training images.
Further, in step 2, the homography transformation uses the same set of depths for all inputs, and the geometry of the scene is inferred by comparing the different input images.
Further, in the homography transformation module, the plane sweep volumes (PSV, Plane Sweep Volume) corresponding to the input reference image pair I_1 and I_2 are computed based on the reference camera parameters; the reference images are respectively re-projected into the target camera at a set of fixed depths D, where the projected points in the reference view and the target view are linked by a homography matrix.
Further, in step 2, the three-dimensional convolutional coding and decoding architecture captures spatial features spanning multiple depth planes and predicts the occluded regions of the multi-plane image on the depth planes.
Further, in the three-dimensional convolutional coding and decoding architecture, the convolution operation is performed on the 3N color channels of D consecutive images with resolution H × W (where N is the number of input images); the first convolution layer in the preprocessing block uses convolution kernels of size 7 × 7 × 7, and the remaining three-dimensional convolutions use kernels of size 3 × 3 × 3.
Further, each layer of the encoding-decoding structure comprises one coding block and one decoding block; each coding block is composed of four three-dimensional convolutions with a skip connection between every two convolutions, and, except for the first-layer coding block, the input tensor is down-sampled; each decoding block is composed of two three-dimensional convolution layers with convolution kernels of size 3 × 3 × 3 and one up-sampling layer.
The invention achieves the following beneficial effects:
1) A three-dimensional convolutional neural network with an encoder-decoder architecture is built to capture spatial features spanning multiple depth planes, eliminating distortion and artifacts in the new viewpoint image and improving its synthesis quality;
2) The ability to predict occluded regions on the depth planes is established;
3) The accuracy of new viewpoint image synthesis is improved.
Drawings
Fig. 1 is a flowchart of a view synthesis algorithm using a three-dimensional convolutional neural network based on MPI scene representation in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a three-dimensional convolutional neural network architecture in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the coding block 2 architecture in an embodiment of the present invention.
Fig. 4 is a schematic diagram of the decoding block 2 architecture in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the drawings of the specification.
The overall structure of the invention, a viewpoint synthesis framework based on MPI scene representation, is shown in Fig. 1. The method specifically comprises the following steps:
step 1, acquiring training image data and preprocessing.
Because the network needs to be trained iteratively many times and adapted to various application scenarios, the prepared training data must reach a certain scale. The Spaces dataset and the RealEstate10K dataset are adopted as training image data. The RealEstate10K dataset comprises about 7500 indoor and outdoor scenes extracted from YouTube videos, with the camera intrinsics and relative poses calibrated; the Spaces dataset contains 100 indoor and outdoor scenes captured with 16 cameras spaced about 10 cm apart, with intrinsics and extrinsics calibrated using structure-from-motion (SfM). Training is performed on 90 scenes of the dataset and evaluation on the remaining 10 scenes, with the image resolution set to 800 × 480. This allows deep learning methods to train their architectures on large-scale data.
Step 2, the flow of the view synthesis algorithm using a three-dimensional convolutional neural network based on MPI scene representation is shown in Fig. 1 and comprises three parts: the homography transformation module, the network framework, and the network output module. The homography transformation module encodes the position information of the input reference image pair and re-projects each input image into the target camera through a homography transformation matrix; the network framework uses an encoder-decoder architecture based on three-dimensional convolution, in which the encoder and decoder of each layer are connected into a U-shaped network through skip connections, strengthening the ability to capture context information; the network output module generates the MPI scene representation by blending the network-predicted weights with the alpha images, and the image of the target viewpoint is rendered by compositing the MPI from back to front.
Step 21, two input images I_1 and I_2 are given, with known camera parameters C_1 = (A_1, [R_1, t_1]) and C_2 = (A_2, [R_2, t_2]), where A_i denotes the intrinsic parameters of camera i and [R_i, t_i] (i = 1, 2) denotes its extrinsic parameters (i.e., the rotation matrix and translation vector). As shown in Fig. 1, to encode the position information of the input reference image pair I_1 and I_2, a pair of plane sweep volumes (PSVs) is computed, i.e., the reference images are respectively re-projected into the target camera at a set of fixed depths D. Consider a pixel p_i(u_i, v_i, 1) in reference view I_i (i = 1, 2), where (u_i, v_i, 1) are the homogeneous coordinates of p_i, and the voxel corresponding to it lying at depth z_i in the reference camera coordinate system. The depth of this voxel in the target camera coordinate system is denoted z_v. Then the projection of pixel p_i(u_i, v_i, 1) onto the pixel p_v(u_v, v_v, 1) in the target view I_t, where (u_v, v_v, 1) are the homogeneous coordinates of p_v, can be expressed as:
[Equation (1) appears as an image in the original publication and is not reproduced here.]
where A_v denotes the intrinsic parameters of the target camera and [R_v, t_v] its extrinsic parameters (i.e., the rotation matrix and translation vector). A three-dimensional scene can be partitioned into planes lying at the same distance (i.e., disparity value) from the reference camera. For points on such a depth plane, the projected point p_i in the reference view and the projected point p_v in the target view are linked by the homography matrix H_{vi,z}, which can be obtained by simplifying equation (1):
[Equation (2) appears as an image in the original publication and is not reproduced here.]
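For reference, the standard plane-induced homography used in plane sweep warping, to which equation (2) presumably corresponds, has the form H_{vi,z} = A_v (R_{vi} - t_{vi} n^T / z) A_i^{-1}, where R_{vi} and t_{vi} denote the relative rotation and translation from reference camera i to the target camera v, n = (0, 0, 1)^T is the fronto-parallel plane normal, and z is the depth of the plane in the reference camera coordinate system; these symbols follow the conventional formulation and are an assumption, not taken from the original equation image.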
Applying a series of homography matrices H_{vi,z} to a reference view yields a set of homography-warped views P_i (i.e., the plane sweep volume, PSV), that is, the results of re-projection at the different depth planes. The size of each PSV tensor is [3, D, H, W]; concatenating the two PSVs along the color channel gives a [3N, D, H, W] tensor as the input of the three-dimensional convolutional neural network, where H and W are the height and width of the image, D is the number of depth planes, and N is the number of input images. The three-dimensional convolutional neural network learns to infer the geometry of the scene by comparing the PSVs of the two different views, as sketched below.
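As an illustration of how the plane sweep volumes P_i might be assembled (a minimal sketch, not the patent's implementation; the helper names and the use of NumPy/OpenCV are assumptions), each reference image is warped by the per-depth homography and the warped planes are stacked along the depth axis, after which the two PSVs are concatenated along the color channel to form the [3N, D, H, W] network input:

import numpy as np
import cv2

def plane_homography(A_v, R, t, A_i, z, n=np.array([0.0, 0.0, 1.0])):
    """Plane-induced homography mapping reference-view pixels to the target view
    for a fronto-parallel plane at depth z (standard formulation, assumed)."""
    return A_v @ (R - np.outer(t, n) / z) @ np.linalg.inv(A_i)

def build_psv(image, A_i, A_v, R, t, depths, out_hw):
    """Warp one reference image onto D depth planes -> tensor of shape [3, D, H, W]."""
    H_out, W_out = out_hw
    planes = []
    for z in depths:
        H = plane_homography(A_v, R, t, A_i, z)
        # warpPerspective expects the homography that maps source pixels to destination pixels
        warped = cv2.warpPerspective(image, H, (W_out, H_out))
        planes.append(warped.transpose(2, 0, 1))           # HWC -> CHW
    return np.stack(planes, axis=1).astype(np.float32)     # [3, D, H, W]

# Concatenating the two PSVs along the color channel gives the [3N, D, H, W] input (N = 2):
# psv1 = build_psv(I1, A1, A_v, R_v1, t_v1, depths, (480, 800))
# psv2 = build_psv(I2, A2, A_v, R_v2, t_v2, depths, (480, 800))
# net_input = np.concatenate([psv1, psv2], axis=0)         # [6, D, 480, 800]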
Step 22, as shown in Fig. 2, the three-dimensional convolutional neural network architecture consists of two parts: a preprocessing block and a four-layer encoding-decoding structure. Three-dimensional convolutions are used during training to extract spatial features across depths, so the spatial relationships between the planes can be learned effectively. The three-dimensional convolution operation is a convolution performed over the 3N color channels of D consecutive images with resolution H × W; taking D = 32 depth planes and N = 2 input images with resolution 480 × 800 as an example, the input of the three-dimensional convolutional neural network is denoted 6@32,480,800. The preprocessing block downsamples the input tensor to 32@16,240,400 while extracting spatial features across the 32 depth planes. All three-dimensional convolutions use kernels of size 3 × 3 × 3, except for the first convolution layer in the preprocessing block, which uses kernels of size 7 × 7 × 7. The coding block and decoding block in each layer are connected into a U-shaped network through skip connections, which strengthens the ability to capture context information.
Each layer of the encoder-decoder architecture comprises one coding block and one decoding block; coding block 2 and decoding block 2 are taken as examples, as shown in Fig. 3 and Fig. 4. Coding block 2 is composed of four three-dimensional convolutions with a skip connection between every two convolutions, and a three-dimensional convolution with a 1 × 1 × 1 kernel is applied on the first skip connection of coding block 2 to downsample the input tensor. Note that only coding block 1 does not downsample its input feature tensor (the structure of the remaining coding blocks is essentially similar to that of coding block 2 shown in Fig. 3; in addition, the input of coding block 1 in Fig. 2 is not downsampled, the downsampling being applied at its output). Decoding block 2 is composed of two three-dimensional convolution layers with 3 × 3 × 3 kernels and one upsampling layer, and the other decoding blocks are the same as decoding block 2. In Fig. 2, the parameters of each module represent the change in size of the tensor 3N@D,H,W, where "upsampling 2X" denotes a doubling of the resolution and depth dimensions.
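As an illustration of how coding block 2 and decoding block 2 might be realized, the following sketch is provided (written with PyTorch, which is an assumption; the channel counts, the activation function, and the absence of normalization layers are illustrative choices not taken from the patent):

import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodingBlock(nn.Module):
    """Four 3-D convolutions with a skip connection every two convolutions.
    If downsample=True, a stride-2 convolution halves D, H and W, and a 1x1x1
    convolution on the first skip path matches the shape (as in coding block 2)."""
    def __init__(self, in_ch, out_ch, downsample=True):
        super().__init__()
        stride = 2 if downsample else 1
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1)
        self.conv3 = nn.Conv3d(out_ch, out_ch, 3, padding=1)
        self.conv4 = nn.Conv3d(out_ch, out_ch, 3, padding=1)
        self.skip1 = nn.Conv3d(in_ch, out_ch, 1, stride=stride)  # 1x1x1 skip convolution

    def forward(self, x):
        y = F.relu(self.conv2(F.relu(self.conv1(x))) + self.skip1(x))
        return F.relu(self.conv4(F.relu(self.conv3(y))) + y)

class DecodingBlock(nn.Module):
    """Two 3x3x3 convolutions followed by an upsampling layer; "upsampling 2X"
    doubles the depth and resolution dimensions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1)

    def forward(self, x):
        x = F.relu(self.conv2(F.relu(self.conv1(x))))
        return F.interpolate(x, scale_factor=2, mode='trilinear', align_corners=False)

# Example with illustrative channel counts: after the preprocessing block maps the
# 6@32,480,800 input to 32 channels at 16,240,400, a second encoder stage could be
# enc2 = EncodingBlock(32, 64, downsample=True)
# y = enc2(torch.randn(1, 32, 16, 240, 400))   # -> [1, 64, 8, 120, 200]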
Step 23, as shown in Fig. 1, the network output module directly predicts, with the three-dimensional convolutional neural network, the opacity (alpha) of the MPI and two blending weights w_i (i = 1, 2), while the RGB values of the MPI are well modeled by blending these weights with P_i, where P_i are the plane sweep volumes obtained in step 21 through the homography matrices. Thus, for each plane of the MPI, its RGB image c is computed as:
c = Σ_i w_i ⊙ P_i, i = 1, 2    (3)
Finally, the target image I_t can be rendered by alpha compositing from the MPI scene representation M = {c_i, α_i} (i = 1, 2, ..., D, the number of MPI planes). The rendering process is differentiable, and the synthesized target image I_t is defined as:
[Equation (4) appears as an image in the original publication and is not reproduced here.]
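Equations (3) and (4) appear only as images in the published text. The sketch below illustrates the blending and back-to-front compositing steps (a minimal sketch in PyTorch, which is an assumption; it uses the standard MPI "over" operation, I_t = Σ_d c_d α_d Π_{d'>d} (1 - α_{d'}) with planes ordered from far to near, which may differ in detail from the patent's exact formulation):

import torch

def blend_colors(weights, psvs):
    """Equation (3): per-plane RGB image c = sum_i of w_i * P_i (element-wise) over the N inputs.
    weights: [N, 1, D, H, W] (assumed normalized over N), psvs: [N, 3, D, H, W]."""
    return (weights * psvs).sum(dim=0)          # -> [3, D, H, W]

def composite_mpi(colors, alphas):
    """Back-to-front alpha compositing of an MPI (assumed standard form).
    colors: [3, D, H, W], alphas: [1, D, H, W] in [0, 1], plane 0 = farthest."""
    D = alphas.shape[1]
    image = torch.zeros_like(colors[:, 0])      # [3, H, W]
    for d in range(D):                          # far -> near
        a = alphas[:, d]
        image = image * (1.0 - a) + colors[:, d] * a
    return image

# Example with illustrative shapes: two PSVs stacked as [2, 3, D, H, W], predicted
# weights [2, 1, D, H, W] and alphas [1, D, H, W]:
# c = blend_colors(weights, psvs)
# I_t = composite_mpi(c, alphas)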
the invention trains MPI prediction networks using view synthesis as a supervise, using unit normalized VGG-19 perceptual loss in channel dimensions as a loss function:
Figure BDA0003869502060000091
wherein
Figure BDA0003869502060000092
Is a target image I t True value of (phi) l Is a set of layers of VGG-19, the weight exceeds the parameter lambda l Is set to VGG-19 (the VGG-19 is a two-dimensional convolutional neural network used in calculating loss using phi therein l Layer extraction
Figure BDA0003869502060000093
And I t And calculating the loss of both) of the number of neurons.
Step 3, a test image is input into the trained network for testing to obtain the final new viewpoint image synthesis result.
To evaluate the model's ability to infer a target view, experiments were performed on the RealEstate10K dataset and the Spaces dataset, respectively. For a fair comparison, the method uses the same number of depth planes as the other methods. In the experiments on the Spaces dataset, the number of input views was set to 4. The experimental results show that the framework can capture spatial features across multiple depth planes to infer the correct scene geometry and the content of occluded areas, thereby synthesizing a high-quality target view; at the same time, the method can handle thin and complex structures and produces clearer object edges than previous methods.
In the ablation experiments, a range of depth-plane counts (D = 8, 16, 24, 32, 40) was tested to verify their importance for the method. Two different baseline configurations were trained on the Spaces dataset, with input-view baseline distances of about 20 cm and about 40 cm, and 2 views were used as input. The dataset was expanded to 16 times its original size through data augmentation. The experimental results show that the model's performance in synthesizing new viewpoint images improves as the number of depth planes increases, for both baseline distances. The improvement may be due to the denser sampling of depth during training, which lets the model learn to infer scene geometry more accurately; with the same number of depth planes, the performance of synthesizing new viewpoint images may decrease as the baseline distance increases, because it becomes harder for the network to propagate volumetric visibility as the baseline grows, which affects the performance of the MPI.
In summary, the invention provides a viewpoint synthesis method based on multi-plane scene representation. To eliminate distortion and artifacts in new viewpoint images and improve their synthesis quality, the framework uses a three-dimensional convolutional neural network to capture spatial features across multiple depth planes and establishes the ability to predict occluded regions on the depth planes. The method also achieves high-quality synthesis results in special areas, such as specular reflection regions in a scene. Experimental results on both datasets show that the quality of new view synthesis is superior to previous algorithms. In the ablation experiments, the rendering quality of the new viewpoint image improves as the number of depth planes increases.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment, but equivalent modifications or changes made by those skilled in the art according to the present disclosure should be included in the scope of the present invention as set forth in the appended claims.

Claims (7)

1. An interactive viewpoint image synthesis method based on multi-plane image scene representation is characterized in that: the method comprises the following steps:
step 1, acquiring training data and preprocessing;
step 2, inputting the training image data obtained in the step 1 into a built three-dimensional convolution neural network based on multi-plane image scene representation for training;
the three-dimensional convolution neural network comprises a homography transformation module, a three-dimensional convolution coding and decoding framework and a network output module;
the homography transformation module encodes the position information of the reference image pair acquired by the input reference cameras and re-projects each input image into the target camera through the homography transformation matrix;
the three-dimensional convolutional coding and decoding architecture comprises a preprocessing block and a four-layer encoding-decoding structure, and the encoder and decoder of each layer are connected into a U-shaped network through skip connections;
the network output module generates a multi-plane image scene representation by blending the network-predicted weights with the alpha images, and the image of the target viewpoint is rendered by compositing the multi-plane image from back to front;
step 3, inputting a test image into the trained network for testing to obtain the final new viewpoint image synthesis result.
2. The method of claim 1, characterized in that: in step 1, the data preprocessing includes data augmentation of the training images.
3. The method of claim 1, characterized in that: in step 2, the homography transformation uses the same set of depths for all inputs, and the geometry of the scene is inferred by comparing the different input images.
4. The method of claim 1, characterized in that: in the homography transformation module, the plane sweep volumes PSV corresponding to the input reference image pair I_1 and I_2 are computed based on the reference camera parameters; the reference images are respectively re-projected into the target camera at a set of fixed depths D, where the projected points in the reference view and the target view are linked by a homography matrix.
5. The method of claim 1, characterized in that: in step 2, the three-dimensional convolutional coding and decoding architecture captures spatial features spanning multiple depth planes, and the occluded regions of the multi-plane image on the depth planes are predicted.
6. The method of claim 1, characterized in that: in the three-dimensional convolutional coding and decoding architecture, the convolution operation is performed on the 3N color channels of D consecutive images with resolution H × W, where N is the number of input images; the first convolution layer in the preprocessing block uses convolution kernels of size 7 × 7 × 7, and the remaining three-dimensional convolutions use kernels of size 3 × 3 × 3.
7. The method of claim 1, characterized in that: each layer of the encoding-decoding structure comprises one coding block and one decoding block; each coding block is composed of four three-dimensional convolutions with a skip connection between every two convolutions, and, except for the first-layer coding block, the input tensor is down-sampled; each decoding block is composed of two three-dimensional convolution layers with convolution kernels of size 3 × 3 × 3 and one up-sampling layer.
CN202211191210.4A 2022-09-28 2022-09-28 Interactive viewpoint image synthesis method based on multi-plane image scene representation Pending CN115457182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211191210.4A CN115457182A (en) 2022-09-28 2022-09-28 Interactive viewpoint image synthesis method based on multi-plane image scene representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211191210.4A CN115457182A (en) 2022-09-28 2022-09-28 Interactive viewpoint image synthesis method based on multi-plane image scene representation

Publications (1)

Publication Number Publication Date
CN115457182A 2022-12-09

Family

ID=84307489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211191210.4A Pending CN115457182A (en) 2022-09-28 2022-09-28 Interactive viewpoint image synthesis method based on multi-plane image scene representation

Country Status (1)

Country Link
CN (1) CN115457182A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452758A (en) * 2023-06-20 2023-07-18 擎翌(上海)智能科技有限公司 Neural radiation field model acceleration training method, device, equipment and medium
CN116452758B (en) * 2023-06-20 2023-10-20 擎翌(上海)智能科技有限公司 Neural radiation field model acceleration training method, device, equipment and medium
CN118413675A (en) * 2024-07-02 2024-07-30 中国矿业大学 Context-based progressive three-plane coding image compression algorithm and terminal equipment
CN118413675B (en) * 2024-07-02 2024-09-24 中国矿业大学 Context-based progressive three-plane coding image compression algorithm and terminal equipment

Similar Documents

Publication Publication Date Title
Mildenhall et al. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines
Yuan et al. Star: Self-supervised tracking and reconstruction of rigid objects in motion with neural rendering
JP7387193B2 (en) Layered scene decomposition codec system and method
US20220335636A1 (en) Scene reconstruction using geometry and reflectance volume representation of scene
CN115457182A (en) Interactive viewpoint image synthesis method based on multi-plane image scene representation
WO2022156626A1 (en) Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
US11961266B2 (en) Multiview neural human prediction using implicit differentiable renderer for facial expression, body pose shape and clothes performance capture
US11727628B2 (en) Neural opacity point cloud
TWI813098B (en) Neural blending for novel view synthesis
CN111105432A (en) Unsupervised end-to-end driving environment perception method based on deep learning
CN112233165B (en) Baseline expansion implementation method based on multi-plane image learning visual angle synthesis
CN115035171B (en) Self-supervision monocular depth estimation method based on self-attention guide feature fusion
US11704853B2 (en) Techniques for feature-based neural rendering
CN108924528B (en) Binocular stylized real-time rendering method based on deep learning
Han et al. PIINET: A 360-degree panoramic image inpainting network using a cube map
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
Lochmann et al. Real-time Reflective and Refractive Novel-view Synthesis.
CN117990088A (en) Dense visual SLAM method and system using three-dimensional Gaussian back end representation
Liu et al. Deep view synthesis via self-consistent generative network
CN117635801A (en) New view synthesis method and system based on real-time rendering generalizable nerve radiation field
WO2023217867A1 (en) Variable resolution variable frame rate video coding using neural networks
CN117372644A (en) Three-dimensional content generation method based on period implicit representation
CN115297316B (en) Virtual viewpoint synthetic image cavity filling method with context feature fusion
Li et al. DGNR: Density-Guided Neural Point Rendering of Large Driving Scenes
CN117036586A (en) Global feature modeling-based MPI new viewpoint synthesis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination