CN114463408A - Free viewpoint image generation method, device, equipment and storage medium

Free viewpoint image generation method, device, equipment and storage medium

Info

Publication number
CN114463408A
CN114463408A
Authority
CN
China
Prior art keywords
depth
viewpoint
target
map
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111564607.9A
Other languages
Chinese (zh)
Inventor
桑新柱
齐帅
陈铎
王鹏
王华春
叶晓倩
颜玢玢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202111564607.9A priority Critical patent/CN114463408A/en
Publication of CN114463408A publication Critical patent/CN114463408A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • G06T7/596Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/06Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a free viewpoint image generation method, device, equipment and storage medium, wherein the method includes: extracting features of the multi-viewpoint images through a feature extraction network in combination with the internal and external parameters of the multi-viewpoint map, and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network; performing feature extraction on the multi-viewpoint map through a convolutional neural network to obtain a plurality of depth coding maps to be processed; projecting each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain a plurality of target depth coding maps; and fusing all the target depth coding maps through a preset aggregation module to obtain a target viewpoint coding map, and decoding the target viewpoint coding map through a full convolution network to obtain a target free viewpoint image. According to the free viewpoint image generation method, a virtual target free viewpoint image is generated from the multi-viewpoint images through depth estimation, so that the target free viewpoint image has high accuracy.

Description

Free viewpoint image generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing and computer vision technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a free viewpoint image.
Background
The traditional two-dimensional image acquisition and display technology cannot meet the increasing viewing requirements of viewers. With the innovation of display technology and the improvement of computing power, methods for accurately and efficiently reproducing three-dimensional light field information have attracted wide attention. To reproduce the light field information of a real scene on a true three-dimensional light field display, a camera array needs to be built to capture the real scene, and the three-dimensional light field information is recovered in the form of dense viewpoints. In this process, limited by the physical constraints of the cameras, dense viewpoint acquisition is difficult, and only sparse viewpoint acquisition is feasible. To generate dense-viewpoint content, researchers have proposed many virtual viewpoint generation methods, which take sparse viewpoints as input to generate dense viewpoints.
The existing multi-plane image-based methods work well when the input viewpoints are arranged horizontally, but when the input viewpoint positions involve rotation, such methods struggle to produce accurate results. Although existing methods based on stereoscopic mesh scaffolds can effectively handle input viewpoint positions that include rotation, they require a large number of scene images (more than 50) as input in advance and are very time-consuming (on the order of hours).
That is, when the acquired input viewpoint images are sparse, the camera positions involve large rotations, and the arrangement follows no obvious pattern, existing virtual viewpoint generation methods struggle to generate an accurate target free viewpoint.
Disclosure of Invention
The application provides a free viewpoint image generation method, a device, equipment and a storage medium, aiming at generating a target free viewpoint image with high accuracy.
In a first aspect, the present application provides a free viewpoint image generation method, including:
extracting the characteristics of the multi-viewpoint by combining the internal and external parameters of the multi-viewpoint through a characteristic extraction network, and obtaining a final target viewpoint depth map by combining an unsupervised stereo matching network;
performing feature extraction on the multi-view map through a convolutional neural network to obtain a plurality of depth coding maps to be processed of the multi-view map;
projecting each depth coding image to be processed by combining the final target viewpoint depth image through a DIBR method to obtain a plurality of target depth coding images;
and fusing the target depth coding images through a preset aggregation module to obtain a target viewpoint coding image, and decoding the target viewpoint coding image through a full convolution network to obtain a target free viewpoint image.
In one embodiment, the extracting the features of the multi-view map by combining the internal and external parameters of the multi-view map through a feature extraction network and obtaining the final target viewpoint depth map by combining an unsupervised stereo matching network comprises:
determining a multi-view geometric relationship between the multi-view points according to internal and external parameters of the multi-view points;
extracting the features of the multi-view points by combining the multi-view geometrical relationship through the feature extraction network to obtain preset groups of scale feature maps with different resolutions of the multiple view points;
uniformly sampling in the depth direction according to the depth distribution range of the multi-view point, and generating a plurality of depth planes;
and transforming the scale feature maps with different resolutions of the preset groups of the multiple viewpoints to each depth plane to obtain each target depth plane, and obtaining the final target viewpoint depth map according to each target depth plane.
The scale feature maps of the preset set of different resolutions for the plurality of viewpoints comprise a quarter resolution scale feature map,
the transforming the scale feature maps of the preset group of the multiple viewpoints with different resolutions to each depth plane to obtain each target depth plane, and obtaining the final target viewpoint depth map according to each target depth plane, includes:
constructing first homographic transformation matrixes in each depth plane, and transforming the scale characteristic maps of quarter-fold resolution of the multiple viewpoints to each depth plane through each first homographic transformation matrix to obtain each first target depth plane;
combining the first target depth planes according to the depth sequence to construct a first matching cost body;
matching the first matching cost body through a stereo convolution neural network to obtain first probability values that each region in the cost body space belongs to the three-dimensional object;
normalizing the first probability value through a preset function, and performing weighted superposition on the normalized first probability value and the depth value thereof to obtain a first target viewpoint depth map;
and if the resolution of the first target viewpoint depth map is equal to the resolution of the multi-viewpoint, determining the first target viewpoint depth map as the final target viewpoint depth map.
The preset set of different resolution scale feature maps for the plurality of viewpoints comprises a half resolution scale feature map,
after the normalizing the first probability value by a preset function and the weighting and stacking the normalized first probability value and the depth value thereof to obtain the first target viewpoint depth map, the method further includes:
if the resolution of the first target viewpoint depth map is smaller than the resolution of the multi-viewpoint, determining a first depth search range by taking the first target viewpoint depth map as a first initial value according to the first initial value and a first preset initial difference value;
constructing a second homography transformation matrix in each depth plane according to the first depth search range and the internal and external parameters of the plurality of viewpoints;
transforming the scale feature maps with half times of resolution of the multiple viewpoints to each depth plane through each second homography transformation matrix to obtain each second target depth plane;
combining the second target depth planes according to the depth sequence to construct a second matching cost body;
matching the second matching cost body through the stereo convolution neural network to obtain second probability values of all regions belonging to the three-dimensional object in the cost body space;
normalizing the second probability value through the preset function, and performing weighted superposition on the normalized second probability value and the depth value thereof to obtain a second target viewpoint depth map;
and if the resolution of the second target viewpoint depth map is equal to the resolution of the multi-viewpoint, determining the second target viewpoint depth map as the final target viewpoint depth map.
The scale feature maps of the preset set of different resolutions for the plurality of viewpoints comprise a scale feature map of one resolution,
after the second probability value is normalized through the preset function and the normalized second probability value and the depth value thereof are weighted and superimposed to obtain a second target viewpoint depth map, the method further includes:
if the resolution of the second target viewpoint depth map is smaller than the resolution of the multi-viewpoint, taking the second target viewpoint depth map as a second initial value, and determining a second depth search range according to the second initial value and a second preset initial difference value;
constructing a third homography transformation matrix in each depth plane according to the second depth search range and the internal and external parameters of the plurality of viewpoints;
transforming the scale feature maps with one time resolution of the multiple viewpoints to each depth plane through each third homography transformation matrix to obtain each third target depth plane;
combining the third target depth planes according to the depth sequence to construct a third matching cost body;
matching the third matching cost body through the stereo convolution neural network to obtain third probability values of all regions belonging to the three-dimensional object in the cost body space;
normalizing the third probability value through the preset function, and performing weighted superposition on the normalized third probability value and the depth value thereof to obtain a third target viewpoint depth map;
and if the resolution of the third target viewpoint depth map is equal to the resolution of the multi-viewpoint, determining the third target viewpoint depth map as the final target viewpoint depth map.
The obtaining of the multiple depth coding maps to be processed of the multi-view map by performing feature extraction on the multi-view map through the convolutional neural network comprises:
extracting features of the multi-view map through a VGG convolutional neural network, and outputting feature maps of the first three stages, wherein the sizes of the feature maps of the first three stages are the original size, the half size and the quarter size of the multi-view map respectively;
performing nearest neighbor interpolation on the feature maps with the size of one half and one quarter in the feature maps of the first three stages, and upsampling until the resolution of the feature maps is consistent with that of the multi-view map;
and splicing the processed feature map with the half size and the processed feature map with the quarter size with the feature map with the original size on the feature dimension to generate each depth coding map to be processed.
The projecting each depth coding image to be processed by combining the final target viewpoint depth image through a DIBR method to obtain a plurality of target depth coding images, including:
and projecting each depth coding image to be processed to a target viewpoint by the DIBR method in combination with the final target viewpoint depth image and the corresponding virtual camera parameters thereof to obtain each target depth coding image.
In a second aspect, the present application provides a free viewpoint image generation apparatus, including:
the first extraction module is used for extracting the characteristics of the multi-viewpoint by combining the internal and external parameters of the multi-viewpoint through a characteristic extraction network and obtaining a final target viewpoint depth map by combining an unsupervised stereo matching network;
the second extraction module is used for extracting the features of the multi-view map through a convolutional neural network to obtain a plurality of depth coding maps to be processed of the multi-view map;
the projection module is used for projecting each depth coding image to be processed by combining the final target viewpoint depth image through a DIBR method to obtain a plurality of target depth coding images;
and the fusion decoding module is used for fusing the target depth coding images through the preset aggregation module to obtain a target viewpoint coding image, and decoding the target viewpoint coding image through a full convolution network to obtain a target free viewpoint image.
In a third aspect, the present application further provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the free viewpoint map generation method according to the first aspect when executing the program.
In a fourth aspect, the present application further provides a computer-readable storage medium comprising a computer program which, when executed by a processor, implements the steps of the free viewpoint map generation method of the first aspect.
In a fifth aspect, the present application further provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, implement the steps of the free viewpoint map generation method of the first aspect.
According to the free viewpoint image generation method, device, equipment and storage medium provided by the present application, in the free viewpoint image generation process, depth estimation is performed on the multi-viewpoint images through the feature extraction network and the internal and external parameters of the multi-viewpoint map; the depth map obtained by depth estimation is then combined with the convolutional neural network, the DIBR method, the preset aggregation module and the full convolution network, so that the sparse multi-viewpoint images are combined with multiple neural networks via depth estimation to generate a virtual dense-viewpoint target free viewpoint image. That is, depth estimation and viewpoint generation are realized by neural networks and then jointly optimized to generate a target free viewpoint image with high accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart of a free viewpoint image generation method provided in the present application;
FIG. 2 is a schematic diagram of an unsupervised learning network structure for depth estimation of the free viewpoint image generation method provided in the present application;
fig. 3 is a second schematic flow chart of the free viewpoint image generation method provided in the present application;
fig. 4 is a third schematic flowchart of a free viewpoint image generation method provided in the present application;
fig. 5 is a fourth schematic flowchart of a free viewpoint image generation method provided in the present application;
fig. 6 is a fifth schematic flowchart of a free viewpoint image generation method provided in the present application;
fig. 7 is a schematic structural diagram of a free viewpoint image generation apparatus provided in the present application;
fig. 8 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The following describes a free viewpoint image generation method, apparatus, device, and storage medium provided by the present application with reference to fig. 1 to 8.
Referring to fig. 1 to 8, fig. 1 is a schematic flow chart of a free viewpoint map generation method provided in the present application; FIG. 2 is a schematic diagram of an unsupervised learning network structure for depth estimation of the free viewpoint image generation method provided in the present application; fig. 3 is a second schematic flow chart of the free viewpoint image generation method provided in the present application; fig. 4 is a third schematic flowchart of a free viewpoint image generation method provided in the present application; fig. 5 is a fourth schematic flowchart of a free viewpoint image generation method provided in the present application; fig. 6 is a fifth schematic flowchart of a free viewpoint image generation method provided in the present application; fig. 7 is a schematic structural diagram of a free viewpoint image generation apparatus provided in the present application; fig. 8 is a schematic structural diagram of an electronic device provided in the present application.
The embodiments of the present application provide an embodiment of a free viewpoint image generation method, and it should be noted that, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be completed in an order different from that in the flowchart.
Definitions of terms:
Application scenario: in a sparse viewpoint acquisition scene (5 input viewpoints by default), virtual viewpoint images at positions adjacent to the input camera positions are generated.
Target viewpoint: the virtual viewpoint to be generated, determined by the three parameters K, R and T.
Depth estimation: taking the sparse viewpoints as input, estimating the depth map of the target viewpoint.
DIBR: for example, the color image of viewpoint 1 is projected to viewpoint 2 according to the depth image of viewpoint 1 and the spatial geometric and pose relationships between viewpoints 1 and 2, so as to generate a color image of viewpoint 2.
The embodiments of the present application take an electronic device as the execution subject and use a free viewpoint generating system as one form of the electronic device, without limiting the electronic device.
Specifically, referring to fig. 1, fig. 1 is a schematic flow diagram of a free viewpoint map generation method provided in the present application. The method for generating the free viewpoint image provided by the embodiment of the application comprises the following steps:
and step S10, extracting the characteristics of the multi-viewpoint through a characteristic extraction network in combination with the internal and external parameters of the multi-viewpoint and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network.
When a free viewpoint map of dense viewpoints needs to be generated, a multi-viewpoint map composed of sparse viewpoints must be acquired and input to the free viewpoint generating system. Note that the multi-viewpoint map does not refer to a single image, but to a plurality of different images of one object captured from multiple perspectives, each of which is a 2D (two-dimensional) plane image.
The free viewpoint generating system reads the input multi-viewpoint map and the internal and external parameters of the multi-viewpoint map. The external parameters include the R parameter and the T parameter: the external parameter ([R|T]) describes the transformation between the world coordinate system and the camera coordinate system, where R represents the rotation and T the translation. The internal parameter (K) describes the transformation between the camera coordinate system, the image coordinate system and the pixel coordinate system; the K parameter includes, but is not limited to, the principal point coordinates, the focal length, the unit pixel width and the unit pixel height. Then, the free viewpoint generating system performs feature extraction on the multi-viewpoint map through a feature extraction network in combination with the internal and external parameters of the multi-viewpoint map, and obtains a final target viewpoint depth map through an unsupervised stereo matching network. The feature extraction network is a 2D multi-scale feature extraction network; that is, the free viewpoint generating system performs feature extraction on the 2D multi-viewpoint map through the 2D multi-scale feature extraction network in combination with the K, R and T parameters of the multi-viewpoint map to obtain the final target viewpoint depth map, as specifically described in steps S101 to S104.
Further, the description of steps S101 to S104 is as follows:
step S101, determining a multi-view geometric relationship between the multi-view points according to internal and external parameters of the multi-view points;
step S102, combining the multi-view geometrical relationship through the feature extraction network to extract features of the multi-view points to obtain scale feature maps of preset groups of the multi-view points with different resolutions;
step S103, carrying out uniform sampling in the depth direction according to the depth distribution range of the multi-view point, and generating a plurality of depth planes;
and step S104, converting the scale feature maps of the preset groups of the multiple viewpoints with different resolutions to each depth plane to obtain each target depth plane, and obtaining the final target viewpoint depth map according to each target depth plane.
Specifically, the free viewpoint generating system first determines the multi-view geometric relationship between the multi-viewpoint images according to the internal and external parameters of the multi-viewpoint map. Then, the free viewpoint generating system performs feature extraction on the input multi-viewpoint images through the 2D multi-scale feature extraction network in combination with the multi-view geometric relationships between them, and outputs a preset group of scale feature maps of different resolutions for the multiple viewpoints, where the minimum resolution can be 1/2^(n-1) times and n is the preset group number determined by the number of images in the input multi-viewpoint map. The resolution of a scale feature map is relative to the resolution of the original images of the multiple viewpoints; for example, a 1/4-resolution scale feature map has 1/4 of the original image resolution of the multiple viewpoints. Further, in this embodiment, if the number of images of the multi-viewpoint map is 5, the preset group is 5, and the 5 groups of different resolutions are 1/16, 1/8, 1/4, 1/2 and 1 times, where the 1x scale feature map has the original image resolution of the multiple viewpoints; likewise, if the number of images of the multi-viewpoint map is 3, the preset group is 3, and the 3 groups of different resolutions are 1/4, 1/2 and 1 times.
further, the free viewpoint generating system determines a minimum depth value D of the multi-viewpoint from the captured scene of the multi-viewpointminAnd a maximum depth value DmaxAccording to the minimum depth value DminAnd a maximum depth value DmaxDetermining a depth distribution Range of a Multi-View map [ Dmin,Dmax]. Then, the free viewpoint generating system generates a free viewpoint in accordance with the depth distribution range [ D ]min,Dmax]Uniform sampling in the depth direction generates a series of depth planes. Then, the free viewpoint generating system transforms preset groups of scale feature maps with different resolutions of multiple viewpoints into each depth plane to obtain each transformed target depth plane, and splices each target depth plane to obtain a final target viewpoint depth map, which is specifically analyzed in combination with fig. 2, where fig. 2 is a schematic diagram of an unsupervised learning network structure for depth estimation of the free viewpoint map generating method provided by the present application.
In this embodiment, the number of images in the multi-viewpoint map is taken as 3 for example. Specifically: the free viewpoint generating system reads the input pictures, i.e., multi-viewpoint maps I_1, I_2 and I_3, each of width H and height W, and performs feature extraction through the 2D multi-scale feature extraction network to obtain, for each of I_1, I_2 and I_3: the 1/4-resolution scale feature map (width H/4, height W/4), the 1/2-resolution scale feature map (width H/2, height W/2), and the 1x-resolution scale feature map (width H, height W).
Further, the free viewpoint generating system transforms the preset group of scale feature maps of different resolutions of the multiple viewpoints into the respective depth planes in order from the minimum resolution to the maximum resolution. The free viewpoint generating system therefore first transforms the feature map with the lowest resolution to each depth plane; in this embodiment, this means transforming the scale feature map with width H/4 and height W/4 to each depth plane. It then constructs the 3D matching cost body of the first stage from the transformed target depth planes and matches it through a 3D CNN (Three-Dimensional Convolutional Neural Network) to generate the depth map of the first stage. Finally, the free viewpoint generating system determines whether the first-stage depth map meets the requirement; if so, it determines the first-stage depth map as the final target viewpoint depth map, as specifically described in steps S1041 to S1045.
Further, if it is determined that the first-stage depth map does not meet the requirement, the free viewpoint generating system transforms the feature map of intermediate resolution to each depth plane; in this embodiment, this means transforming the scale feature map with width H/2 and height W/2 to each depth plane. It then constructs the 3D matching cost body of the second stage from the transformed target depth planes and matches it through the 3D CNN to generate the second-stage depth map. Finally, the free viewpoint generating system determines whether the second-stage depth map meets the requirement; if so, it determines the second-stage depth map as the final target viewpoint depth map, as specifically described in steps S1046 to S10412.
Further, if it is determined that the second-stage depth map does not meet the requirement, the free viewpoint generating system transforms the feature map with the highest resolution to each depth plane; in this embodiment, this means transforming the scale feature map with width H and height W to each depth plane. It then constructs the 3D matching cost body of the third stage from the transformed target depth planes and matches it through the 3D CNN to generate the third-stage depth map. Finally, the free viewpoint generating system determines whether the third-stage depth map meets the requirement; if so, it determines the third-stage depth map as the final target viewpoint depth map, as specifically described in steps S10413 to S10419.
Further, the embodiment of the application minimizes an error function and optimizes the network parameters in an unsupervised learning manner, thereby reducing the dependence on depth label data and enhancing model generalization.
According to the depth estimation method and device, the depth estimation and the neural network are combined, so that the accuracy of the depth estimation is guaranteed.
And step S20, performing feature extraction on the multi-view map through a convolutional neural network to obtain a plurality of depth coding maps to be processed of the multi-view map.
It should be noted that the convolutional neural network in this embodiment may be a VGG-16 convolutional neural network, or may be a VGG-19 convolutional neural network.
Therefore, the free viewpoint generating system performs feature extraction on the input sparse multi-viewpoint images through the VGG-16 or VGG-19 convolutional neural network and outputs a plurality of to-be-processed depth coding maps E_i of the multi-viewpoint map, as specifically described in steps S201 to S203.
And step S30, projecting each depth coding image to be processed by combining the final target viewpoint depth image through a DIBR method to obtain a plurality of target depth coding images.
The free viewpoint generating system performs DIBR (Depth-Image-Based Rendering) on each depth coding map to be processed in combination with the final target viewpoint depth map, projecting the sparse viewpoints to the target viewpoint to obtain each target depth coding map, as specifically described in step A; the DIBR method is a geometric calculation method.
Further, in step A, each depth coding map to be processed is projected to the target viewpoint by the DIBR method in combination with the final target viewpoint depth map and its corresponding virtual camera parameters, obtaining each target depth coding map.
Specifically, the free viewpoint generating system determines the final target viewpoint depth map and its corresponding virtual camera parameters, including but not limited to the K_tgt, R_tgt and T_tgt parameters. Then, according to the final target viewpoint depth map and its K_tgt, R_tgt and T_tgt parameters, the free viewpoint generating system performs the DIBR transformation on each depth coding map to be processed, projecting the sparse viewpoints to the target viewpoint to obtain each target depth coding map E_{i→tgt}. The specific transformation process is described in Equation 1 and Equation 2.
Equation 1: p'_i = K_i · (R_i · R_tgt^(-1) · (D · K_tgt^(-1) · p − T_tgt) + T_i)

Equation 2: E_{i→tgt}(p) = E_i(p'_i)

where p is any pixel coordinate (in homogeneous form) in the target viewpoint image coordinate system, D is the depth value of the current pixel, K, R and T are the position parameters of the multi-viewpoint map, p'_i is the coordinate value obtained by projecting the pixel p of the target viewpoint coordinate system into the coordinate system of multi-viewpoint i (after homogeneous normalization), and E_i is the depth coding map to be processed.
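For illustration, a minimal PyTorch sketch of Equations 1 and 2 follows; the tensor shapes, the bilinear grid_sample resampling, and the world-to-camera convention X_cam = R·X_world + T are assumptions made here, not the patent's reference implementation:

```python
import torch
import torch.nn.functional as F

def dibr_warp(E_i, depth_tgt, K_i, R_i, T_i, K_tgt, R_tgt, T_tgt):
    """Warp the to-be-processed depth coding map E_i of input view i
    to the target viewpoint (Equations 1 and 2).

    E_i:       (C, H, W) encoding map of input view i
    depth_tgt: (H, W) final target viewpoint depth map D
    K, R, T:   3x3 intrinsics, 3x3 rotation, (3, 1) translation per camera,
               assuming the convention X_cam = R @ X_world + T.
    Returns E_{i->tgt} of shape (C, H, W).
    """
    C, H, W = E_i.shape
    # Homogeneous pixel grid p = (u, v, 1) of the target view.
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    p = torch.stack([u, v, torch.ones_like(u)]).reshape(3, -1)        # (3, H*W)

    # Equation 1: back-project with depth D, go to world, re-project to view i.
    cam_tgt = torch.linalg.inv(K_tgt) @ p * depth_tgt.reshape(1, -1)  # D * K_tgt^-1 * p
    world = R_tgt.T @ (cam_tgt - T_tgt)                               # target camera -> world
    p_i = K_i @ (R_i @ world + T_i)                                   # world -> pixels of view i
    p_i = p_i[:2] / p_i[2:].clamp(min=1e-6)                           # homogeneous normalization

    # Equation 2: E_{i->tgt}(p) = E_i(p'_i), realized by sampling E_i at p'_i.
    grid = torch.stack([2.0 * p_i[0] / (W - 1) - 1.0,
                        2.0 * p_i[1] / (H - 1) - 1.0], dim=-1).reshape(1, H, W, 2)
    return F.grid_sample(E_i.unsqueeze(0), grid, mode="bilinear",
                         align_corners=True).squeeze(0)
```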
According to the method and the device, the generation of the target depth coding graph is combined with the neural network, so that the accuracy of the target depth coding graph is guaranteed.
And step S40, fusing the target depth coding images through a preset aggregation module to obtain a target viewpoint coding image, and decoding the target viewpoint coding image through a full convolution network to obtain a target free viewpoint image.
It should be noted that, in this embodiment, the preset aggregation module is a self-attention aggregation module. The free viewpoint generating system therefore fuses the target depth coding maps located at the same pixel coordinate position into one feature vector through the self-attention aggregation module to obtain the target viewpoint coding map. Then, the free viewpoint generating system decodes the target viewpoint coding map through a plurality of cascaded U-Net full convolution networks to generate the target free viewpoint image carrying dense free viewpoints, where the target free viewpoint image is a 2D color image. Finally, the free viewpoint generating system also computes a loss between the generated target free viewpoint image and the real target free viewpoint image, trains the neural network with this loss, and updates the network parameters.
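As a hedged sketch of this fusion step, the module below scores each warped encoding per pixel and combines them with softmax attention weights; the 1x1-convolution score head and the channel count are illustrative assumptions, and the cascaded U-Net decoder is only indicated in a comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionAggregation(nn.Module):
    """Fuse N warped target depth coding maps into one target viewpoint coding map."""

    def __init__(self, channels: int):
        super().__init__()
        # Per-pixel, per-view attention score (an assumed, minimal design).
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, encodings: torch.Tensor) -> torch.Tensor:
        # encodings: (N_views, C, H, W), one warped encoding per input viewpoint.
        weights = F.softmax(self.score(encodings), dim=0)  # attention over the views
        return (weights * encodings).sum(dim=0)            # (C, H, W) fused coding map

# The fused target viewpoint coding map is then decoded by the cascaded U-Net
# full convolution networks (not shown) into the 2D color target free viewpoint image.
```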
In the process of generating the free viewpoint image, depth estimation is first performed on the multi-viewpoint images through the feature extraction network and the internal and external parameters of the multi-viewpoint map; the depth map obtained by depth estimation is then combined with the convolutional neural network, the DIBR method, the preset aggregation module and the full convolution network, so that the sparse multi-viewpoint images are combined with multiple neural networks via depth estimation to generate a virtual dense-viewpoint target free viewpoint image. That is, depth estimation and viewpoint generation are both realized by neural networks and jointly optimized, thereby generating a target free viewpoint image with high accuracy.
Referring to fig. 3, fig. 3 is a second schematic flow chart of the free viewpoint image generating method provided in the present application, and step S104 includes:
step S1041, constructing a first homography transformation matrix in each depth plane, and transforming the scale feature maps with quarter-times resolution of the multiple viewpoints to each depth plane through each first homography transformation matrix to obtain each first target depth plane;
step S1042, combining the first target depth planes according to the depth sequence to construct a first matching cost body;
step S1043, matching the first matching cost body through a stereo convolution neural network to obtain a first probability value of each area belonging to the three-dimensional object in the cost body space;
step S1044, normalizing the first probability value through a preset function, and performing weighted superposition on the normalized first probability value and the depth value thereof to obtain the first target viewpoint depth map;
step S1045, if the resolution of the first target viewpoint depth map is equal to the resolution of the multi-viewpoint, determining the first target viewpoint depth map as the final target viewpoint depth map.
Specifically, the scale feature maps of the preset group of different resolutions for the multiple viewpoints of this embodiment include a quarter-resolution scale feature map, i.e., a scale feature map with width H/4 and height W/4. Therefore, the free viewpoint generating system constructs a first homography transformation matrix for each depth plane and transforms the scale feature maps with width H/4 and height W/4 into each depth plane through the corresponding first homography transformation matrix, obtaining each transformed first target depth plane. Then, the free viewpoint generating system combines the first target depth planes in depth order (which may be from high to low or from low to high) to construct the first 3D matching cost body. Next, the free viewpoint generating system matches the first 3D matching cost body through the 3D stereo convolution neural network to obtain first probability values that each region in the cost body space belongs to the three-dimensional object. Then, the free viewpoint generating system normalizes the first probability values through the preset Softmax function and performs weighted superposition of the normalized first probability values and their corresponding depth values to obtain the first target viewpoint depth map. Finally, the free viewpoint generating system determines whether the resolution of the first target viewpoint depth map equals the resolution of the multi-viewpoint map; if so, it determines the first target viewpoint depth map as the final target viewpoint depth map.
According to the method and the device, the depth estimation and the neural network are combined, so that the accuracy of the depth estimation is guaranteed, and the accuracy of the target free viewpoint diagram is further guaranteed.
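The normalization and weighted superposition of steps S1043 and S1044 amount to a soft depth regression over the hypothesis planes; a minimal sketch, assuming the stereo convolution neural network outputs a raw score volume of shape (num_planes, H, W):

```python
import torch
import torch.nn.functional as F

def regress_depth(score_volume: torch.Tensor, depth_planes: torch.Tensor) -> torch.Tensor:
    """Normalize matching scores over depth (Softmax) and weight by plane depths.

    score_volume: (num_planes, H, W) raw output of the 3D stereo CNN
    depth_planes: (num_planes,) depth value of each hypothesis plane
    Returns the (H, W) target viewpoint depth map.
    """
    prob = F.softmax(score_volume, dim=0)                   # normalized probability values
    return (prob * depth_planes.view(-1, 1, 1)).sum(dim=0)  # weighted superposition
```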
Referring to fig. 4, fig. 4 is a third schematic flow chart of the free viewpoint map generating method provided in the present application, and after step S1044, the method further includes:
step S1046, if the resolution of the first target viewpoint depth map is smaller than the resolution of the multi-viewpoint, determining a first depth search range by using the first target viewpoint depth map as a first initial value and according to the first initial value and a first preset initial difference value;
step S1047, constructing a second homography transformation matrix in each depth plane according to the first depth search range and the internal and external parameters of the plurality of viewpoints;
step S1048, transforming the scale feature maps of half times the resolution of the multiple viewpoints to each depth plane through each second homography transformation matrix, to obtain each second target depth plane;
step S1049, combining each second target depth plane according to the depth sequence to construct a second matching cost body;
step S10410, matching the second matching cost body through the stereo convolution neural network to obtain second probability values of all areas belonging to the three-dimensional object in the cost body space;
step S10411, normalizing the second probability value through the preset function, and weighting and superposing the normalized second probability value and the depth value thereof to obtain a second target viewpoint depth map;
step S10412, if the resolution of the second target viewpoint depth map is equal to the resolution of the multi-viewpoint, determining the second target viewpoint depth map as the final target viewpoint depth map.
Specifically, the scale feature maps of the preset group of different resolutions for the multiple viewpoints of this embodiment include a half-resolution scale feature map, i.e., a scale feature map with width H/2 and height W/2. Therefore, if it is determined that the resolution of the first target viewpoint depth map is less than the resolution of the multi-viewpoint map, the free viewpoint generating system takes the first target viewpoint depth map as a first initial value D_1 and determines a first preset initial difference δ_1. The minimum search value is obtained as the difference D_1 − δ_1 and the maximum search value as the sum D_1 + δ_1, so the first depth search range is [D_1 − δ_1, D_1 + δ_1]. Next, the free viewpoint generating system constructs a second homography transformation matrix in each depth plane according to the first depth search range [D_1 − δ_1, D_1 + δ_1] and the K, R and T parameters of the plurality of viewpoints. The system then transforms the scale feature maps with width H/2 and height W/2 into the respective depth planes through the second homography transformation matrices to obtain the transformed second target depth planes. Then, the system combines the second target depth planes in depth order (which may be from high to low or from low to high) to construct the second 3D matching cost body. Next, the system matches the second 3D matching cost body through the 3D stereo convolution neural network to obtain second probability values that each region in the cost body space belongs to the three-dimensional object. Then, the system normalizes the second probability values through the preset Softmax function and performs weighted superposition of the normalized second probability values and their corresponding depth values to obtain the second target viewpoint depth map. Finally, the free viewpoint generating system determines whether the resolution of the second target viewpoint depth map equals the resolution of the multi-viewpoint map; if so, it determines the second target viewpoint depth map as the final target viewpoint depth map.
According to the method and the device, the depth estimation and the neural network are combined, so that the accuracy of the depth estimation is guaranteed, and the accuracy of the target free viewpoint diagram is further guaranteed.
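A short sketch of this per-pixel search-range refinement follows (the same interval arithmetic is reused with D_2 and δ_2 in the third stage); the hypothesis count is an assumption:

```python
import torch

def refine_search_range(depth_init: torch.Tensor, delta: float, num_planes: int = 16) -> torch.Tensor:
    """Build per-pixel depth hypotheses in [D - delta, D + delta].

    depth_init: (H, W) depth map from the previous, coarser stage (the initial value D)
    delta:      preset initial difference defining the search interval
    Returns a (num_planes, H, W) tensor of refined depth hypotheses.
    """
    steps = torch.linspace(-delta, delta, num_planes).view(-1, 1, 1)
    return depth_init.unsqueeze(0) + steps
```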
Referring to fig. 5, fig. 5 is a fourth flowchart of the free viewpoint image generating method provided in the present application, and after step S10411, the method further includes:
step S10413, if the resolution of the second target viewpoint depth map is smaller than the resolution of the multi-viewpoint, determining a second depth search range by using the second target viewpoint depth map as a second initial value and according to the second initial value and a second preset initial difference value;
step S10414, constructing a third homography transformation matrix in each depth plane according to the second depth search range and the internal and external parameters of the plurality of viewpoints;
step S10415, transforming the scale feature maps with one time resolution of the multiple viewpoints to each depth plane through each third homography transformation matrix, to obtain each third target depth plane;
step S10416, combining each of the third target depth planes according to a depth order, and constructing a third matching cost body;
step S10417, matching the third matching cost volume through the stereo convolutional neural network to obtain third probability values of each region belonging to the three-dimensional object in the cost volume space;
step S10418, normalizing the third probability value by the preset function, and performing weighted stacking on the normalized third probability value and a depth value thereof to obtain a third target viewpoint depth map;
in step S10419, if the resolution of the third target viewpoint depth map is equal to the resolution of the multi-viewpoint, determining the third target viewpoint depth map as the final target viewpoint depth map.
Specifically, the scale feature maps of different resolutions in the preset group for the multiple viewpoints of this embodiment include a 1x-resolution scale feature map, i.e., a scale feature map with width H and height W. Therefore, if it is determined that the resolution of the second target viewpoint depth map is less than the resolution of the multi-viewpoint map, the free viewpoint generating system takes the second target viewpoint depth map as a second initial value D_2 and determines a second preset initial difference δ_2. The minimum search value is obtained as the difference D_2 − δ_2 and the maximum search value as the sum D_2 + δ_2, so the second depth search range is [D_2 − δ_2, D_2 + δ_2]. Next, the free viewpoint generating system constructs a third homography transformation matrix in each depth plane according to the second depth search range [D_2 − δ_2, D_2 + δ_2] and the K, R and T parameters of the plurality of viewpoints. The system then transforms the scale feature maps with width H and height W into the respective depth planes through the third homography transformation matrices to obtain the transformed third target depth planes. Then, the system combines the third target depth planes in depth order (which may be from high to low or from low to high) to construct the third 3D matching cost body. Next, the system matches the third 3D matching cost body through the 3D stereo convolution neural network to obtain third probability values that each region in the cost body space belongs to the three-dimensional object. Then, the system normalizes the third probability values through the preset Softmax function and performs weighted superposition of the normalized third probability values and their corresponding depth values to obtain the third target viewpoint depth map. Finally, the free viewpoint generating system determines whether the resolution of the third target viewpoint depth map equals the resolution of the multi-viewpoint map; if so, it determines the third target viewpoint depth map as the final target viewpoint depth map.
According to the method and the device, the depth estimation and the neural network are combined, so that the accuracy of the depth estimation is guaranteed, and the accuracy of the target free viewpoint diagram is further guaranteed.
Referring to fig. 6, fig. 6 is a fifth flowchart of the free viewpoint image generating method provided in the present application, and step S20 includes:
step S201, extracting features of the multi-view map through a VGG convolutional neural network, and outputting feature maps of the first three stages, wherein the sizes of the feature maps of the first three stages are the original size, the half size and the quarter size of the multi-view map respectively;
step S202, carrying out nearest neighbor interpolation on feature maps with one-half size and one-fourth size in the feature maps of the first three stages, and upsampling until the resolution is consistent with that of the multi-view map;
and step S203, splicing the processed feature map with the size of one half and the processed feature map with the size of one quarter with the feature map with the original size in the feature dimension to generate each depth coding map to be processed.
Specifically, the free viewpoint generating system performs feature extraction on the multi-viewpoint map through the VGG-16 or VGG-19 convolutional neural network and outputs the feature maps of the first three stages, whose sizes are respectively the original size, one half, and one quarter of the multi-viewpoint map. Then, the free viewpoint generating system performs nearest-neighbor interpolation on the half-size and quarter-size feature maps, upsampling each until its resolution is identical to that of the multi-viewpoint map. Finally, the free viewpoint generating system splices the processed half-size and quarter-size feature maps with the original-size feature map along the feature dimension to generate each depth coding map to be processed.
According to the method and the device, the generation process of the depth coding graph to be processed is combined with the neural network, so that the accuracy of the depth coding graph to be processed is guaranteed, and the accuracy of the target free viewpoint graph is further guaranteed.
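A sketch of steps S201 to S203 using torchvision's VGG-16 follows; the stage slice indices and the pretrained-weight choice are assumptions for illustration:

```python
import torch
import torch.nn.functional as F
import torchvision

def encode_view(image: torch.Tensor) -> torch.Tensor:
    """Produce the to-be-processed depth coding map of one input viewpoint image.

    image: (1, 3, H, W) input view.
    Returns a (1, 64 + 128 + 256, H, W) encoding: the first three VGG-16 stages,
    with the 1/2- and 1/4-size maps nearest-neighbor upsampled and spliced.
    """
    vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
    with torch.no_grad():
        f1 = vgg[:4](image)    # stage 1: original size
        f2 = vgg[4:9](f1)      # stage 2: one-half size
        f3 = vgg[9:16](f2)     # stage 3: one-quarter size
    size = image.shape[-2:]
    f2 = F.interpolate(f2, size=size, mode="nearest")  # nearest-neighbor upsampling
    f3 = F.interpolate(f3, size=size, mode="nearest")
    return torch.cat([f1, f2, f3], dim=1)              # splice along the feature dimension
```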
Further, the following describes the free viewpoint image generating apparatus provided in the present application, and the free viewpoint image generating apparatus described below and the free viewpoint image generating method described above may be referred to in correspondence with each other.
As shown in fig. 7, fig. 7 is a schematic configuration diagram of a free viewpoint image generating apparatus according to the present application, the free viewpoint image generating apparatus including:
a first extraction module 701, configured to extract features of a multi-view through a feature extraction network in combination with internal and external parameters of the multi-view, and obtain a final target view depth map in combination with an unsupervised stereo matching network;
a second extraction module 702, configured to perform feature extraction on the multi-view through a convolutional neural network to obtain multiple to-be-processed depth coding maps of the multi-view;
a projection module 703, configured to project each to-be-processed depth code map by combining the final target viewpoint depth map through a DIBR method, so as to obtain multiple target depth code maps;
and the fusion decoding module 704 is configured to fuse the target depth coding maps through the preset aggregation module to obtain a target viewpoint coding map, and decode the target viewpoint coding map through a full convolution network to obtain a target free viewpoint image.
Further, the first extraction module 701 is further configured to:
determining a multi-view geometric relationship between the multiple viewpoints according to the intrinsic and extrinsic parameters of the multiple viewpoints;
extracting the features of the multiple viewpoints through the feature extraction network in combination with the multi-view geometric relationship, to obtain a preset number of groups of scale feature maps of the multiple viewpoints at different resolutions;
uniformly sampling in the depth direction according to the depth distribution range of the multiple viewpoints to generate a plurality of depth planes;
and transforming the groups of scale feature maps of the multiple viewpoints at different resolutions onto each depth plane to obtain each target depth plane, and obtaining the final target viewpoint depth map from the target depth planes (the depth plane sampling and the per-plane homography are sketched below).
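As an illustration of the uniform sampling and the per-plane warping, here is a minimal NumPy sketch; the fronto-parallel plane normal, the world-to-camera convention, and the standard plane-sweep form of the matrix are assumptions, since the patent does not spell out the exact construction:

```python
import numpy as np

def uniform_depth_planes(d_min, d_max, num_planes):
    # Uniform sampling along the depth direction of the scene's
    # depth distribution range: one fronto-parallel plane per sample.
    return np.linspace(d_min, d_max, num_planes)

def plane_sweep_homography(K_src, R_src, t_src, K_ref, R_ref, t_ref, depth):
    """Homography mapping reference-view pixels to a source view for points
    on the plane at `depth` (standard plane-sweep form; a sketch, not the
    patent's exact matrix). Cameras follow x_cam = R x_world + t."""
    n = np.array([[0.0, 0.0, 1.0]])            # plane normal in the reference frame
    R = R_src @ R_ref.T                        # relative rotation
    t = (t_src - R @ t_ref).reshape(3, 1)      # relative translation
    return K_src @ (R - t @ n / depth) @ np.linalg.inv(K_ref)
```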
Further, the first extraction module 701 is further configured to:
constructing a first homography transformation matrix for each depth plane, and transforming the quarter-resolution scale feature maps of the multiple viewpoints onto each depth plane through the first homography transformation matrices to obtain each first target depth plane;
combining the first target depth planes in depth order to construct a first matching cost volume;
matching the first matching cost volume through a three-dimensional convolutional neural network to obtain first probability values that each region of the cost volume space belongs to the three-dimensional object;
normalizing the first probability values through a preset function, and performing a weighted superposition of the normalized first probability values with their corresponding depth values to obtain a first target viewpoint depth map (see the regression sketch below);
and if the resolution of the first target viewpoint depth map is equal to the resolution of the multi-viewpoint map, determining the first target viewpoint depth map as the final target viewpoint depth map.
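The normalization and weighted superposition amount to a soft-argmin style depth regression. A minimal PyTorch sketch follows, assuming softmax as the preset function and a (B, D, H, W) output from the three-dimensional matching network:

```python
import torch
import torch.nn.functional as F

def regress_depth(cost_volume, depth_values):
    """cost_volume: (B, D, H, W) scores from the 3D matching CNN;
    depth_values: tensor of the D sampled plane depths. Softmax here
    plays the role of the preset normalization function (assumption)."""
    prob = F.softmax(cost_volume, dim=1)         # normalize to per-plane probabilities
    d = depth_values.view(1, -1, 1, 1)           # (1, D, 1, 1)
    return (prob * d).sum(dim=1)                 # weighted superposition -> (B, H, W)
```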
Further, the first extraction module 701 is further configured to:
if the resolution of the first target viewpoint depth map is smaller than the resolution of the multi-viewpoint map, taking the first target viewpoint depth map as a first initial value, and determining a first depth search range according to the first initial value and a first preset initial difference value;
constructing a second homography transformation matrix for each depth plane according to the first depth search range and the intrinsic and extrinsic parameters of the multiple viewpoints;
transforming the half-resolution scale feature maps of the multiple viewpoints onto each depth plane through the second homography transformation matrices to obtain each second target depth plane;
combining the second target depth planes in depth order to construct a second matching cost volume;
matching the second matching cost volume through the three-dimensional convolutional neural network to obtain second probability values that each region of the cost volume space belongs to the three-dimensional object;
normalizing the second probability values through the preset function, and performing a weighted superposition of the normalized second probability values with their corresponding depth values to obtain a second target viewpoint depth map;
and if the resolution of the second target viewpoint depth map is equal to the resolution of the multi-viewpoint map, determining the second target viewpoint depth map as the final target viewpoint depth map.
Further, the first extraction module 701 is further configured to:
if the resolution of the second target viewpoint depth map is smaller than the resolution of the multi-viewpoint map, taking the second target viewpoint depth map as a second initial value, and determining a second depth search range according to the second initial value and a second preset initial difference value;
constructing a third homography transformation matrix for each depth plane according to the second depth search range and the intrinsic and extrinsic parameters of the multiple viewpoints;
transforming the full-resolution (one-times resolution) scale feature maps of the multiple viewpoints onto each depth plane through the third homography transformation matrices to obtain each third target depth plane;
combining the third target depth planes in depth order to construct a third matching cost volume;
matching the third matching cost volume through the three-dimensional convolutional neural network to obtain third probability values that each region of the cost volume space belongs to the three-dimensional object;
normalizing the third probability values through the preset function, and performing a weighted superposition of the normalized third probability values with their corresponding depth values to obtain a third target viewpoint depth map;
and if the resolution of the third target viewpoint depth map is equal to the resolution of the multi-viewpoint map, determining the third target viewpoint depth map as the final target viewpoint depth map (the coarse-to-fine narrowing of the search range is sketched below).
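The three stages thus form a coarse-to-fine cascade: each stage takes the previous depth map as its initial value and searches a narrower range at the next resolution. A minimal PyTorch sketch of that range construction follows; the bilinear upsampling and the halving of the interval per stage are assumptions:

```python
import torch
import torch.nn.functional as F

def next_search_range(prev_depth, init_interval, num_planes, scale=2):
    """prev_depth: (B, H, W) depth map from the previous stage. Returns
    per-pixel depth hypotheses (B, D, scale*H, scale*W) centred on the
    upsampled previous estimate (a sketch of the cascade refinement)."""
    up = F.interpolate(prev_depth.unsqueeze(1), scale_factor=scale,
                       mode="bilinear", align_corners=False).squeeze(1)
    offsets = torch.arange(num_planes, device=up.device) - num_planes // 2
    interval = init_interval / scale                 # finer sampling interval per stage
    return up.unsqueeze(1) + offsets.view(1, -1, 1, 1) * interval
```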
Further, the second extraction module 702 is further configured to:
extracting features of the multi-viewpoint map through a VGG convolutional neural network, and outputting the feature maps of the first three stages, wherein the sizes of these feature maps are respectively the original size, one-half size, and one-quarter size of the multi-viewpoint map;
performing nearest neighbor interpolation on the one-half-size and one-quarter-size feature maps among the feature maps of the first three stages, upsampling them until their resolution matches that of the multi-viewpoint map;
and concatenating the processed one-half-size feature map and the processed one-quarter-size feature map with the original-size feature map along the feature dimension to generate each depth coding map to be processed.
Further, the projection module 703 is further configured to:
projecting each depth coding map to be processed to the target viewpoint through the DIBR method, in combination with the final target viewpoint depth map and the corresponding virtual camera parameters, to obtain each target depth coding map, as sketched below.
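Here is a minimal backward-mapping DIBR sketch in NumPy; it assumes source and target views share one resolution, uses nearest-neighbour lookup, omits occlusion handling and hole filling, and its camera conventions (R, t map target-camera coordinates to source-camera coordinates) are assumptions:

```python
import numpy as np

def dibr_backward_warp(code_map, target_depth, K_src, K_tgt, R, t):
    """Warp a (C, H, W) source-view depth coding map to the target
    (virtual) viewpoint using the target-view depth map and camera
    parameters. A sketch, not the patent's exact procedure."""
    C = code_map.shape[0]
    H, W = target_depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(float)
    # back-project every target pixel to 3D with the target depth map
    pts = np.linalg.inv(K_tgt) @ pix * target_depth.reshape(1, -1)
    pts = R @ pts + t.reshape(3, 1)                  # into the source camera frame
    proj = K_src @ pts
    uv = np.round(proj[:2] / np.clip(proj[2:], 1e-6, None)).astype(int)
    ok = (uv[0] >= 0) & (uv[0] < W) & (uv[1] >= 0) & (uv[1] < H)
    out = np.zeros((C, H, W), dtype=code_map.dtype)
    out[:, v.reshape(-1)[ok], u.reshape(-1)[ok]] = code_map[:, uv[1, ok], uv[0, ok]]
    return out
```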
The specific embodiment of the free viewpoint image generation apparatus provided in the present application is substantially the same as the embodiments of the free viewpoint image generation method described above, and details thereof are not described herein.
Fig. 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 8, the electronic device may include: a processor 810, a communication interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with each other via the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a free viewpoint map generation method comprising:
extracting features of the multi-viewpoint map through a feature extraction network in combination with the intrinsic and extrinsic parameters of the multi-viewpoint map, and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network;
performing feature extraction on the multi-viewpoint map through a convolutional neural network to obtain multiple depth coding maps to be processed of the multi-viewpoint map;
projecting each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain multiple target depth coding maps;
and fusing the target depth coding maps through a preset aggregation module to obtain a target viewpoint coding map, and decoding the target viewpoint coding map through a full convolution network to obtain a target free viewpoint map.
In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present application also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the free viewpoint map generation method provided by the methods above, the method comprising:
extracting features of the multi-viewpoint map through a feature extraction network in combination with the intrinsic and extrinsic parameters of the multi-viewpoint map, and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network;
performing feature extraction on the multi-viewpoint map through a convolutional neural network to obtain multiple depth coding maps to be processed of the multi-viewpoint map;
projecting each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain multiple target depth coding maps;
and fusing the target depth coding maps through a preset aggregation module to obtain a target viewpoint coding map, and decoding the target viewpoint coding map through a full convolution network to obtain a target free viewpoint map.
In yet another aspect, the present application further provides a non-transitory computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, performing the free viewpoint map generation method provided above, the method comprising:
extracting features of the multi-viewpoint map through a feature extraction network in combination with the intrinsic and extrinsic parameters of the multi-viewpoint map, and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network;
performing feature extraction on the multi-viewpoint map through a convolutional neural network to obtain multiple depth coding maps to be processed of the multi-viewpoint map;
projecting each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain multiple target depth coding maps;
and fusing the target depth coding maps through a preset aggregation module to obtain a target viewpoint coding map, and decoding the target viewpoint coding map through a full convolution network to obtain a target free viewpoint map.
The above-described apparatus embodiments are merely illustrative; the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (personal computer, server, network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A free viewpoint image generation method, comprising:
extracting features of a multi-viewpoint map through a feature extraction network in combination with intrinsic and extrinsic parameters of the multi-viewpoint map, and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network;
performing feature extraction on the multi-viewpoint map through a convolutional neural network to obtain multiple depth coding maps to be processed of the multi-viewpoint map;
projecting each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain multiple target depth coding maps;
and fusing the target depth coding maps through a preset aggregation module to obtain a target viewpoint coding map, and decoding the target viewpoint coding map through a full convolution network to obtain a target free viewpoint map.
2. The free viewpoint image generation method according to claim 1, wherein the extracting features of the multi-viewpoint map through a feature extraction network in combination with the intrinsic and extrinsic parameters of the multi-viewpoint map, and obtaining a final target viewpoint depth map in combination with an unsupervised stereo matching network, comprises:
determining a multi-view geometric relationship between the multiple viewpoints according to the intrinsic and extrinsic parameters of the multiple viewpoints;
extracting the features of the multiple viewpoints through the feature extraction network in combination with the multi-view geometric relationship, to obtain a preset number of groups of scale feature maps of the multiple viewpoints at different resolutions;
uniformly sampling in the depth direction according to the depth distribution range of the multiple viewpoints to generate a plurality of depth planes;
and transforming the groups of scale feature maps of the multiple viewpoints at different resolutions onto each depth plane to obtain each target depth plane, and obtaining the final target viewpoint depth map from the target depth planes.
3. The free viewpoint image generation method according to claim 2, wherein the groups of scale feature maps of the multiple viewpoints at different resolutions include quarter-resolution scale feature maps, and
wherein the transforming the groups of scale feature maps of the multiple viewpoints at different resolutions onto each depth plane to obtain each target depth plane, and obtaining the final target viewpoint depth map from the target depth planes, comprises:
constructing a first homography transformation matrix for each depth plane, and transforming the quarter-resolution scale feature maps of the multiple viewpoints onto each depth plane through the first homography transformation matrices to obtain each first target depth plane;
combining the first target depth planes in depth order to construct a first matching cost volume;
matching the first matching cost volume through a three-dimensional convolutional neural network to obtain first probability values that each region of the cost volume space belongs to the three-dimensional object;
normalizing the first probability values through a preset function, and performing a weighted superposition of the normalized first probability values with their corresponding depth values to obtain a first target viewpoint depth map;
and if the resolution of the first target viewpoint depth map is equal to the resolution of the multi-viewpoint map, determining the first target viewpoint depth map as the final target viewpoint depth map.
4. The free viewpoint image generation method according to claim 3, wherein the groups of scale feature maps of the multiple viewpoints at different resolutions include half-resolution scale feature maps, and
wherein, after the normalizing of the first probability values through the preset function and the weighted superposition of the normalized first probability values with their corresponding depth values to obtain the first target viewpoint depth map, the method further comprises:
if the resolution of the first target viewpoint depth map is smaller than the resolution of the multi-viewpoint map, taking the first target viewpoint depth map as a first initial value, and determining a first depth search range according to the first initial value and a first preset initial difference value;
constructing a second homography transformation matrix for each depth plane according to the first depth search range and the intrinsic and extrinsic parameters of the multiple viewpoints;
transforming the half-resolution scale feature maps of the multiple viewpoints onto each depth plane through the second homography transformation matrices to obtain each second target depth plane;
combining the second target depth planes in depth order to construct a second matching cost volume;
matching the second matching cost volume through the three-dimensional convolutional neural network to obtain second probability values that each region of the cost volume space belongs to the three-dimensional object;
normalizing the second probability values through the preset function, and performing a weighted superposition of the normalized second probability values with their corresponding depth values to obtain a second target viewpoint depth map;
and if the resolution of the second target viewpoint depth map is equal to the resolution of the multi-viewpoint map, determining the second target viewpoint depth map as the final target viewpoint depth map.
5. The free viewpoint image generation method according to claim 4, wherein the groups of scale feature maps of the multiple viewpoints at different resolutions include full-resolution (one-times resolution) scale feature maps, and
wherein, after the normalizing of the second probability values through the preset function and the weighted superposition of the normalized second probability values with their corresponding depth values to obtain the second target viewpoint depth map, the method further comprises:
if the resolution of the second target viewpoint depth map is smaller than the resolution of the multi-viewpoint map, taking the second target viewpoint depth map as a second initial value, and determining a second depth search range according to the second initial value and a second preset initial difference value;
constructing a third homography transformation matrix for each depth plane according to the second depth search range and the intrinsic and extrinsic parameters of the multiple viewpoints;
transforming the full-resolution scale feature maps of the multiple viewpoints onto each depth plane through the third homography transformation matrices to obtain each third target depth plane;
combining the third target depth planes in depth order to construct a third matching cost volume;
matching the third matching cost volume through the three-dimensional convolutional neural network to obtain third probability values that each region of the cost volume space belongs to the three-dimensional object;
normalizing the third probability values through the preset function, and performing a weighted superposition of the normalized third probability values with their corresponding depth values to obtain a third target viewpoint depth map;
and if the resolution of the third target viewpoint depth map is equal to the resolution of the multi-viewpoint map, determining the third target viewpoint depth map as the final target viewpoint depth map.
6. The free viewpoint image generation method according to claim 1, wherein the performing feature extraction on the multi-viewpoint map through a convolutional neural network to obtain multiple depth coding maps to be processed of the multi-viewpoint map comprises:
extracting features of the multi-viewpoint map through a VGG convolutional neural network, and outputting the feature maps of the first three stages, wherein the sizes of these feature maps are respectively the original size, one-half size, and one-quarter size of the multi-viewpoint map;
performing nearest neighbor interpolation on the one-half-size and one-quarter-size feature maps among the feature maps of the first three stages, upsampling them until their resolution matches that of the multi-viewpoint map;
and concatenating the processed one-half-size feature map and the processed one-quarter-size feature map with the original-size feature map along the feature dimension to generate each depth coding map to be processed.
7. The free viewpoint image generation method according to claim 1, wherein the projecting each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain multiple target depth coding maps comprises:
projecting each depth coding map to be processed to the target viewpoint through the DIBR method, in combination with the final target viewpoint depth map and the corresponding virtual camera parameters, to obtain each target depth coding map.
8. A free viewpoint image generation apparatus, comprising:
a first extraction module, configured to extract features of a multi-viewpoint map through a feature extraction network in combination with the intrinsic and extrinsic parameters of the multi-viewpoint map, and to obtain a final target viewpoint depth map in combination with an unsupervised stereo matching network;
a second extraction module, configured to perform feature extraction on the multi-viewpoint map through a convolutional neural network to obtain multiple depth coding maps to be processed of the multi-viewpoint map;
a projection module, configured to project each depth coding map to be processed through a DIBR method in combination with the final target viewpoint depth map to obtain multiple target depth coding maps;
and a fusion decoding module, configured to fuse the target depth coding maps through a preset aggregation module to obtain a target viewpoint coding map, and to decode the target viewpoint coding map through a full convolution network to obtain a target free viewpoint map.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the free viewpoint map generation method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the free viewpoint map generation method according to any one of claims 1 to 7.
CN202111564607.9A 2021-12-20 2021-12-20 Free viewpoint image generation method, device, equipment and storage medium Pending CN114463408A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111564607.9A CN114463408A (en) 2021-12-20 2021-12-20 Free viewpoint image generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111564607.9A CN114463408A (en) 2021-12-20 2021-12-20 Free viewpoint image generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114463408A true CN114463408A (en) 2022-05-10

Family

ID=81405028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111564607.9A Pending CN114463408A (en) 2021-12-20 2021-12-20 Free viewpoint image generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114463408A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023235273A1 (en) * 2022-06-02 2023-12-07 Leia Inc. Layered view synthesis system and method
CN115439388A (en) * 2022-11-08 2022-12-06 杭州倚澜科技有限公司 Free viewpoint image synthesis method based on multilayer neural surface expression
CN115439388B (en) * 2022-11-08 2024-02-06 杭州倚澜科技有限公司 Free viewpoint image synthesis method based on multilayer neural surface expression

Similar Documents

Publication Publication Date Title
Ji et al. Deep view morphing
Josephson et al. Pose estimation with radial distortion and unknown focal length
Quan Image-based modeling
US10726612B2 (en) Method and apparatus for reconstructing three-dimensional model of object
CN114463408A (en) Free viewpoint image generation method, device, equipment and storage medium
CN109979013B (en) Three-dimensional face mapping method and terminal equipment
CN115035235A (en) Three-dimensional reconstruction method and device
CN112435193A (en) Method and device for denoising point cloud data, storage medium and electronic equipment
CN113643414A (en) Three-dimensional image generation method and device, electronic equipment and storage medium
CN113781659A (en) Three-dimensional reconstruction method and device, electronic equipment and readable storage medium
Pan et al. Multi-stage feature pyramid stereo network-based disparity estimation approach for two to three-dimensional video conversion
CN114627244A (en) Three-dimensional reconstruction method and device, electronic equipment and computer readable medium
CN115375838A (en) Binocular gray image three-dimensional reconstruction method based on unmanned aerial vehicle
JPH11504452A (en) Apparatus and method for reproducing and handling a three-dimensional object based on a two-dimensional projection
CN114429531A (en) Virtual viewpoint image generation method and device
CN117216591A (en) Training method and device for three-dimensional model matching and multi-modal feature mapping model
CN112562067A (en) Method for generating large-batch point cloud data sets
CN113066165B (en) Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment
CN115861515A (en) Three-dimensional face reconstruction method, computer program product and electronic device
CN111986246B (en) Three-dimensional model reconstruction method, device and storage medium based on image processing
Liao et al. Decoupled and reparameterized compound attention-based light field depth estimation network
CN116168137B (en) New view angle synthesis method, device and memory based on nerve radiation field
CN117496091B (en) Single-view three-dimensional reconstruction method based on local texture
CN113034671B (en) Traffic sign three-dimensional reconstruction method based on binocular vision
CN114143528B (en) Multi-video stream fusion method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination