CN111325693A - Large-scale panoramic viewpoint synthesis method based on single-viewpoint RGB-D image - Google Patents
Classifications
- G06T5/77
- G06N3/045 (Combinations of networks)
- G06N3/08 (Learning methods)
- G06T7/50 (Depth or shape recovery)
- G06T2207/10004 (Still image; photographic image)
- G06T2207/10024 (Color image)
- G06T2207/10028 (Range image; depth image; 3D point clouds)
- G06T2207/20081 (Training; learning)
- G06T2207/20084 (Artificial neural networks [ANN])
Abstract
The invention discloses a large-scale panoramic viewpoint synthesis method based on a single-viewpoint RGB-D image. First, 3D-Warping is performed on an input single-frame color image and its corresponding depth map to obtain a virtual viewpoint image with holes and a hole mask image. The virtual viewpoint image with holes and the mask image are then used as the input of a CNN neural network, whose output is the image with the virtual-viewpoint holes filled. Taking the virtual viewpoints newly generated in the horizontal direction as new inputs, virtual viewpoint images at any position in space are obtained through 3D-Warping and CNN hole filling. The invention improves the overall quality of the synthesized virtual viewpoints and provides important guidance for improving large-parallax virtual viewpoint synthesis.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a large-scale panoramic viewpoint synthesis method based on a single-viewpoint RGB-D image.
Background
With the development of science and technology and the improvement of living standards, the visual experience offered by traditional two-dimensional video increasingly fails to meet viewers' demands. Many research institutions and companies have therefore paid growing attention to making video more lifelike, and omnidirectional video, free-viewpoint video and interactive stereoscopic video have emerged. Free-viewpoint video, which lets a user watch a scene from any viewpoint, is widely applied. In recent years, rapid progress in image acquisition, network communication, computing and stereoscopic display technologies has further promoted the development of free-viewpoint video. However, the viewing freedom and immersion of free-viewpoint video come at the cost of huge amounts of video data, which poses serious challenges for acquisition, storage and transmission. Moreover, owing to bandwidth and equipment limitations, it is impossible to place a camera at every viewpoint; the virtual viewpoint required by the user must instead be obtained from known reference viewpoints, so virtual viewpoint rendering has become a key technology for free-viewpoint video.
Existing virtual viewpoint rendering methods fall into two basic categories: Model-Based Rendering (MBR) and Image-Based Rendering (IBR). Among image-based methods, Depth-Image-Based Rendering (DIBR) makes full use of the depth information in the reference image, effectively combining the color image with three-dimensional depth information and accelerating the rendering of natural scenes; compared with other IBR techniques it has better development prospects and has therefore attracted wide attention. DIBR methods can be further divided, by the number of reference viewpoints, into virtual viewpoint rendering based on single-viewpoint mapping and rendering based on dual-viewpoint mapping. Dual-viewpoint rendering interpolates the new viewpoint position from the color images and corresponding depth images of two reference viewpoints; it can fill hole regions with information about occluded areas taken from the different reference images. Single-viewpoint rendering obtains the virtual viewpoint image through 3D-Warping from the color image and corresponding depth image of a single reference viewpoint. Its limitation is that the occluded areas produce many large hole regions in the virtual viewpoint, and these holes are difficult to fill for lack of information, which poses a great challenge for rendering high-quality virtual viewpoints.
At present, many problems remain to be solved in single-viewpoint-based rendering of large-parallax virtual viewpoints:
firstly, an inaccurate depth map causes distortions in the rendered virtual viewpoint image, such as fine cracks, artifacts and deformation;
secondly, areas occluded in the reference viewpoint image may become visible in the new virtual viewpoint image; since the information of these occluded areas is unknown, large hole regions form, and the edges of foreground objects in the synthesized image show black borders, degrading the quality of the virtual viewpoint. A dual-viewpoint setup can fill the holes with occluded-area information taken from the other reference image and thus obtain a high-quality virtual viewpoint image, which a single viewpoint cannot do;
finally, problems such as large-area hole filling, overlapping structures in the synthesized viewpoint, and deformation distortion arise in large-scale viewpoint synthesis from a single-viewpoint RGB-D image. Existing methods still have many shortcomings with respect to these problems.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method for synthesizing a large-scale panoramic viewpoint based on a single-viewpoint RGB-D image, which can improve the image quality of a synthesized virtual viewpoint, thereby obtaining a large-parallax three-dimensional stereoscopic display effect.
The invention adopts the following technical scheme:
a large-scale panoramic viewpoint synthesis method based on single viewpoint RGB-D images maps images of known reference viewpoints onto an imaging plane of a virtual camera by using depth information and camera parameters, and performs large-scale panoramic viewpoint synthesis on input single-frame color images RGB0And corresponding Depth map Depth0Performing 3D-Warping to obtain a virtual viewpoint image with a cavity and a mask image; then using the virtual viewpoint image with the hole and the mask image as the input of the CNN neural network, and using the network modelThe output result of the model is used as an image after the virtual viewpoint hole is filled; single frame color image RGB based on initial input0And corresponding Depth map Depth0Synthesizing a horizontal large-parallax virtual viewpoint Pi(RGBi,Depthi) (ii) a Then the virtual viewpoint P generated in the horizontal direction is usedi(RGBi,Depthi) As a new input, obtaining a virtual viewpoint image P at any position in space through 3D-Warping and CNN neural network hole fillingi,j(RGBi,j,Depthi,j) And completing the synthesis of the large-scale panoramic viewpoint.
Specifically, filling the virtual viewpoint holes with the CNN convolutional neural network comprises:
S201, building a generative adversarial neural network, comprising a generation network that produces images, and a global discriminator and a local discriminator that judge whether a generated image is consistent with the original image;
S202, training the adversarial network model;
S203, inputting the virtual viewpoint image with holes and the hole mask image into the trained network model; the output of the network model is the image with the virtual-viewpoint holes filled.
Further, in step S202, network training comprises three parts: the first part trains the generation network alone without updating the discriminator networks; the second part trains the discriminator networks alone without updating the generation network; the third part uses a joint loss to train the generation network and the discriminator networks together, updating them in turn.
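As a minimal sketch, the three-part schedule of step S202 can be expressed as a function of the global training step. The phase lengths t_g and t_d are hypothetical parameters for illustration, not values given in the patent:

```python
def training_phase(step, t_g, t_d):
    """Return which networks are updated at a given global training step.

    Assumed schedule: the first t_g steps train the generation network
    alone, the next t_d steps train the discriminators alone, and all
    later steps update both in alternation under the joint loss.
    """
    if step < t_g:
        return {"generator"}
    if step < t_g + t_d:
        return {"discriminator"}
    return {"generator", "discriminator"}
```

Pre-training each side separately before joint adversarial updates is a common way to stabilize this kind of training.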
Specifically, synthesizing the horizontal large-parallax virtual viewpoints P_i(RGB_i, Depth_i) comprises the following steps:
S301, with P_0(RGB_0, Depth_0) as the reference viewpoint, modify the translation vector in the camera parameters and, through 3D-Warping and CNN hole filling, obtain in the horizontal direction k virtual viewpoints P_1(RGB_1, Depth_1), P_2(RGB_2, Depth_2), …, P_k(RGB_k, Depth_k) that take P_0(RGB_0, Depth_0) as reference; these k virtual viewpoints form a group, k > 0, 1 < 2 < … < k;
S302, with P_k(RGB_k, Depth_k) as the initial input of the next stage in the horizontal direction, modify the translation vector in the camera parameters and, through 3D-Warping and CNN hole filling, obtain k virtual viewpoints P_{k+1}(RGB_{k+1}, Depth_{k+1}), P_{k+2}(RGB_{k+2}, Depth_{k+2}), …, P_{2k}(RGB_{2k}, Depth_{2k}) that take P_k(RGB_k, Depth_k) as reference; these k virtual viewpoints form a new group, k > 0, k+1 < k+2 < … < 2k;
S303, with P_{2k}(RGB_{2k}, Depth_{2k}) as the initial input of the next stage in the horizontal direction, repeat the above steps several times in the same way to obtain new groups of virtual viewpoints in the horizontal direction.
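The staged procedure of steps S301-S303 can be sketched as a loop in which the last view of each stage seeds the next stage. Here warp and inpaint are placeholder callables standing in for 3D-Warping and CNN hole filling, so the sketch illustrates only the cascading control flow, not the rendering itself:

```python
def synthesize_horizontal(p0, k, n_stages, warp, inpaint):
    """Staged horizontal synthesis in the spirit of S301-S303.

    warp(ref, step) stands for 3D-Warping of reference view `ref` with a
    modified translation vector, inpaint(...) for CNN hole filling; both
    are supplied by the caller. Each stage renders k views from the
    current reference, and the k-th view becomes the next reference, so
    parallax grows stage by stage.
    """
    views, ref = [p0], p0
    for _ in range(n_stages):
        for step in range(1, k + 1):
            views.append(inpaint(warp(ref, step)))
        ref = views[-1]  # P_k, P_2k, ... become the new references
    return views
```

With toy stand-ins (warp adding the step offset, inpaint as identity), two stages with k = 3 produce the index chain P_0 through P_6, matching the grouping in the text.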
Further, before rendering the virtual viewpoints, each of P_0(RGB_0, Depth_0), P_k(RGB_k, Depth_k), P_{2k}(RGB_{2k}, Depth_{2k}), …, P_i(RGB_i, Depth_i), …, P_N(RGB_N, Depth_N), 0 < k < 2k < … < i < … < N, being the first element of its group of virtual viewpoints, is first repaired based on the color-map-guided depth map.
Further, the inconsistent areas are detected first: the edges of the input depth map are detected and dilated, and the dilated area is marked as a potential structure-distortion area. Each pixel in the potential structure-distortion area is then examined and a structure-distortion measurement index is generated; for distorted pixels, a restoration weight is constructed as the product of the Gaussian color-image weight and the structure-distortion index, guided restoration is performed by weighted median filtering, and the distorted area is then guided-filtered. The resulting depth map is iterated through the above steps until the set termination condition is met, at which point the depth map is output and the calculation ends; otherwise iteration continues until the maximum number of iterations is reached.
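The repair step above combines a Gaussian color-affinity weight with the structure-distortion index and applies a weighted median. A minimal sketch of those two ingredients (the Gaussian sigma is an assumed parameter, and the full edge-detection and guided-filtering loop is omitted):

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: the smallest value whose cumulative weight
    reaches half of the total weight."""
    values = np.asarray(values, float)
    order = np.argsort(values)
    cdf = np.cumsum(np.asarray(weights, float)[order])
    return values[order][np.searchsorted(cdf, 0.5 * cdf[-1])]

def restoration_weight(color_diff, distortion_index, sigma=10.0):
    """Restoration weight for a distorted depth pixel: product of a
    Gaussian color weight and the structure-distortion measurement
    index, as described in the text (sigma is an assumption)."""
    return np.exp(-(color_diff ** 2) / (2.0 * sigma ** 2)) * distortion_index
```

In the full algorithm, each distorted depth pixel would be replaced by the weighted median of its neighborhood depths, with neighbor weights computed by restoration_weight from the guiding color image.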
Specifically, synthesizing the virtual viewpoints P_{i,j}(RGB_{i,j}, Depth_{i,j}) at any position in space comprises the following steps:
S401, with P_i(RGB_i, Depth_i) as the initial input in the vertical direction, modify the translation vector in the camera parameters and, through 3D-Warping and CNN hole filling, obtain in the vertical direction s virtual viewpoints P_{i,1}(RGB_{i,1}, Depth_{i,1}), P_{i,2}(RGB_{i,2}, Depth_{i,2}), …, P_{i,s}(RGB_{i,s}, Depth_{i,s}) that take P_i(RGB_i, Depth_i) as reference; these s virtual viewpoints form a new group in the vertical direction, s > 0, 1 < 2 < … < s;
S402, with P_{i,s}(RGB_{i,s}, Depth_{i,s}) as the new initial input in the vertical direction, modify the translation vector in the camera parameters and, through 3D-Warping and CNN hole filling, obtain s virtual viewpoints P_{i,s+1}(RGB_{i,s+1}, Depth_{i,s+1}), P_{i,s+2}(RGB_{i,s+2}, Depth_{i,s+2}), …, P_{i,2s}(RGB_{i,2s}, Depth_{i,2s}) that take P_{i,s}(RGB_{i,s}, Depth_{i,s}) as reference; these s virtual viewpoints form a new group in the vertical direction, s > 0, s+1 < s+2 < … < 2s;
S403, with P_{i,2s}(RGB_{i,2s}, Depth_{i,2s}) as the new initial input in the vertical direction of the next stage, repeat the above steps several times in the same way to obtain several new groups of virtual viewpoints.
Further, in step S403, before rendering the virtual viewpoints, each of P_i(RGB_i, Depth_i), P_{i,s}(RGB_{i,s}, Depth_{i,s}), P_{i,2s}(RGB_{i,2s}, Depth_{i,2s}), …, P_{i,j}(RGB_{i,j}, Depth_{i,j}), …, P_{i,M}(RGB_{i,M}, Depth_{i,M}), 0 < s < 2s < … < j < … < M, being the first element of its group of virtual viewpoints in the vertical direction, is repaired based on the color-map-guided depth map using the depth map restoration method.
Compared with the prior art, the invention has at least the following beneficial effects:
the invention relates to a large-scale panoramic viewpoint synthesis method based on single viewpoint RGB-D images, which can obtain a large parallax three-dimensional stereo effect at any position in space through a single frame RGB-D image based on DIBR technology without three-dimensional modeling; the CNN neural network is used for filling the virtual viewpoint large-area cavity, and the generated virtual viewpoint and the depth map are used as the next-stage input to synthesize a larger parallax virtual viewpoint, so that the virtual viewpoint has good adaptability to the problems of overlapping, cavities, artifacts, fine cracks and the like in the process of drawing a serious single-viewpoint virtual view, the reconstructed large parallax virtual viewpoint has a complete structure and a good visual perception effect, and a satisfactory visual perception effect is achieved.
Furthermore, when rendering virtual viewpoints with DIBR, moving from the reference viewpoint to the virtual viewpoint position exposes regions that are visible in the virtual viewpoint but invisible in the original reference viewpoint, so a large amount of information is missing and these hole regions are difficult to fill with traditional methods; the generative adversarial neural network is therefore adopted to fill the large hole regions, achieving a better hole-filling effect.
Further, the generative adversarial neural network used for large-area hole filling consists of three networks: a generation network, a global discriminator and a local discriminator. The generation network produces the image content for the large hole areas, while the global and local discriminators judge the consistency of the newly generated image with the real image. In each training iteration, the discriminators are updated first so that they correctly distinguish real images from newly generated ones; the generation network is then updated so that it fully fills the large missing hole areas and fools the discriminator networks as far as possible; finally, the generation and discriminator networks are trained together. This training scheme can fill hole regions of any position and any shape.
Further, for the input single-frame color image and corresponding depth map, P_0(RGB_0, Depth_0) denotes the initial input, where RGB_0 denotes the single-frame color image and Depth_0 the corresponding depth map; 3D-Warping and CNN hole filling are performed to obtain virtual viewpoint images P_i(RGB_i, Depth_i) at any horizontal position in space, completing the synthesis of the horizontal large-parallax virtual viewpoints.
Further, for virtual view synthesis, the quality of the reference viewpoint's depth map strongly affects the quality of the synthesized view. The texture map is used to guide depth map restoration, and the repaired high-precision depth map is used to synthesize the virtual viewpoint image, improving the consistency between depth map and texture map. A synthesized image obtained from the repaired depth map preserves object boundary structures well, filters noise effectively, and protects the background texture structure of the synthesized view. A depth map processed in this way yields a synthesized view of higher quality, effectively improving the quality of the synthesized view.
Further, when rendering a large-parallax virtual viewpoint image P_{i,j}(RGB_{i,j}, Depth_{i,j}) at an arbitrary position in space, directly synthesizing large-scale virtual viewpoints with VSRS produces severe viewpoint overlap and large hole areas. Because the CNN neural network is limited in repairing very large hole areas, the large-parallax virtual viewpoint is generated step by step: first the virtual viewpoint P_i(RGB_i, Depth_i) is generated, and then P_{i,j}(RGB_{i,j}, Depth_{i,j}) is further synthesized on the basis of P_i(RGB_i, Depth_i). The quality of the virtual viewpoint image synthesized in this way is markedly improved, the hole areas are clearly reduced, and the structural information of the virtual viewpoint is complete, raising the quality of the virtual viewpoint as a whole.
In conclusion, the invention can effectively repair depth maps with serious structural distortion and improve depth map quality; it fills holes with a generative adversarial neural network, achieving a better hole-filling effect; and when rendering large-parallax virtual viewpoint images it generates the large-parallax viewpoint step by step, so that the quality of the virtual viewpoint image is markedly improved, the hole areas are clearly reduced, the structural information of the virtual viewpoint is complete, and the overall quality of the virtual viewpoint is raised.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a problem definition diagram and an overall framework diagram of the present invention, wherein (a) is the problem definition diagram and (b) is the overall implementation framework diagram;
FIG. 2 is a schematic diagram of the horizontal virtual viewpoint synthesis process of the present invention, in which (a) is a horizontal viewpoint synthesis diagram and (b) is a horizontal viewpoint synthesis process diagram;
FIG. 3 is a schematic diagram of a vertical virtual viewpoint synthesis process according to the present invention, wherein (a) is a vertical viewpoint synthesis diagram, and (b) is a vertical viewpoint synthesis process diagram;
FIG. 4 is a flow chart of synthesizing virtual viewpoints in the horizontal direction and the vertical direction according to the present invention, wherein (a) is a flow chart of synthesizing virtual viewpoints in the horizontal direction, and (b) is a flow chart of synthesizing virtual viewpoints in the vertical direction;
FIG. 5 is a hole filling architecture diagram of the CNN neural network of the present invention.
FIG. 6 is an input diagram of the present invention, wherein (a) is a single frame color image and (b) is a corresponding depth image;
FIG. 7 is a graph of the color map based guided depth map repair result of the present invention;
FIG. 8 is a virtual viewpoint color image with holes and a hole mask image generated by DIBR technology according to the present invention, wherein (a) is the virtual viewpoint color image with holes, and (b) is the hole mask image;
FIG. 9 is a graph of results for different multiples k of horizontal parallax according to the present invention;
FIG. 10 is a graph of enlarged synthesized-viewpoint results for different multiples s in the vertical direction according to the present invention;
fig. 11 is a comparison graph of the result of the large-parallax virtual viewpoint synthesized in the horizontal direction according to the present invention and the results of other schemes, where from left to right (a) is a virtual left viewpoint with a hole, (b) is a VSRS hole filling graph, (c) is a graph of directly repairing a hole for generating a countermeasure network, and (d) is a graph of the final result of the overall scheme according to the present invention;
fig. 12 is a comparison graph of different magnification scale results of a large parallax virtual viewpoint synthesized by the present invention in a vertical direction with results of other schemes, and is (a) a virtual viewpoint graph with holes, (b) a VSRS hole filling graph, (c) a graph for directly repairing holes for generating a countermeasure network, and (d) a final result graph of the overall scheme of the present invention from left to right.
Detailed Description
The invention provides a large-scale panoramic viewpoint synthesis method based on a single-viewpoint RGB-D image. First, a single-frame color image and its corresponding depth map are input; P_0(RGB_0, Depth_0) denotes this initial input, where RGB_0 is the single-frame color image and Depth_0 the corresponding depth map. 3D-Warping is performed to obtain a virtual viewpoint image with holes and a mask image;
then, the virtual viewpoint image with holes and the mask image are used as the input of the CNN neural network, whose output is the image with the virtual-viewpoint holes filled, namely P_i(RGB_i, Depth_i);
then, the virtual viewpoints P_i(RGB_i, Depth_i) newly generated in the horizontal direction are taken as new inputs, and virtual viewpoint images P_{i,j}(RGB_{i,j}, Depth_{i,j}) at any position in space are obtained through 3D-Warping and CNN hole filling, completing the large-scale panoramic viewpoint synthesis.
Referring to fig. 1, the present invention provides a method for synthesizing a large-scale panorama viewpoint based on a single viewpoint RGB-D image, comprising the following steps:
s1, respectively carrying out 3D-Warping on the single-frame color image and the corresponding depth image;
firstly, 3D-Warping is performed on the single-frame RGB-D image P_0(RGB_0, Depth_0). The principle is to map the image of the known reference viewpoint onto the imaging plane of the virtual camera using the depth information and camera parameters; 3D-Warping yields the virtual viewpoint with holes and the mask image.
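A minimal NumPy sketch of this 3D-Warping step, under assumed conventions (pinhole intrinsics K, identity rotation, translation-only virtual camera t; all function and variable names are illustrative, not from the patent):

```python
import numpy as np

def warp_3d(color, depth, K, t):
    """Forward-map a reference view to a translated virtual view.

    Back-project each pixel with its depth, shift into the virtual
    camera frame, reproject, and resolve collisions with a z-buffer.
    Unmapped destination pixels form the hole mask that would later be
    handed to the CNN for filling.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    Z = depth
    X = (u - K[0, 2]) * Z / K[0, 0]              # back-projection
    Y = (v - K[1, 2]) * Z / K[1, 1]
    Xv, Yv, Zv = X - t[0], Y - t[1], Z - t[2]    # virtual camera frame
    uv = np.round(K[0, 0] * Xv / Zv + K[0, 2]).astype(int)
    vv = np.round(K[1, 1] * Yv / Zv + K[1, 2]).astype(int)
    out = np.zeros_like(color)
    zbuf = np.full((H, W), np.inf)
    hole = np.ones((H, W), dtype=bool)
    ok = (uv >= 0) & (uv < W) & (vv >= 0) & (vv < H) & (Zv > 0)
    for su, sv, du, dv, z in zip(u[ok], v[ok], uv[ok], vv[ok], Zv[ok]):
        if z < zbuf[dv, du]:                     # keep the nearest surface
            zbuf[dv, du] = z
            out[dv, du] = color[sv, su]
            hole[dv, du] = False
    return out, hole
```

With a constant depth Z and a horizontal translation t_x, every pixel shifts by the disparity f·t_x/Z and a disoccluded strip of holes appears at one image border, which is the kind of mask the CNN stage is meant to fill.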
S2, filling virtual viewpoint holes based on the CNN convolutional neural network;
when the viewpoint is converted from a reference viewpoint to a virtual viewpoint position, a part of the region which is visible in the virtual viewpoint and invisible in the original reference viewpoint is exposed, so that a large amount of information is lost, and the hole regions are difficult to fill.
S201, construct a generative adversarial neural network, which is divided into two parts: one part generates images, namely the generation network; the other part judges whether the generated image is consistent with the original image, namely the global discriminator and the local discriminator. The global discriminator looks at the whole image to judge global consistency, while the local discriminator looks only at a small area centred on the completed region to ensure that the generated patch is locally consistent.
S202, train the adversarial network model. Training is divided into three parts: the first part trains the generation network alone without updating the discriminator networks; the second part trains the discriminator networks alone without updating the generation network; the third part uses a joint loss to train the generation and discriminator networks together, with the two updated in turn in the usual generative-adversarial fashion.
S203, input the virtual viewpoint image with holes and the hole mask image into the trained network model; the output of the network model is the image with the virtual-viewpoint holes filled.
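At inference time the network output is typically composited back so that only the hole pixels change, and during training the local discriminator sees a window around the completed region. A sketch of both operations; the composition rule and the window-centring heuristic are assumptions for illustration, not details stated in the patent:

```python
import numpy as np

def composite(warped, generated, hole_mask):
    """Keep pixels that survived 3D-Warping and take the network output
    only inside the holes (a standard inpainting composition)."""
    m = hole_mask[..., None].astype(float)
    return (generated * m + warped * (1.0 - m)).astype(warped.dtype)

def local_patch(image, hole_mask, size):
    """Crop a size x size window centred on the hole region, i.e. the
    input the local discriminator would inspect."""
    ys, xs = np.nonzero(hole_mask)
    half = size // 2
    cy = int(np.clip(ys.mean(), half, image.shape[0] - half))
    cx = int(np.clip(xs.mean(), half, image.shape[1] - half))
    return image[cy - half:cy + half, cx - half:cx + half]
```

Compositing rather than taking the raw network output preserves every pixel the warping stage already rendered correctly.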
S3, based on P_0(RGB_0, Depth_0), synthesize the horizontal large-parallax virtual viewpoints P_i(RGB_i, Depth_i);
a schematic diagram of the process of synthesizing the horizontal large-parallax virtual viewpoints P_i(RGB_i, Depth_i) is shown in fig. 2, and the algorithm flowchart in fig. 4(a); the steps are as follows:
S301, with P_0(RGB_0, Depth_0) as the reference viewpoint, modify the translation vector in the camera parameters and, through the 3D-Warping of step S1 and the CNN hole filling of step S2, obtain in the horizontal direction k (k > 0) virtual viewpoints {P_1(RGB_1, Depth_1), P_2(RGB_2, Depth_2), …, P_k(RGB_k, Depth_k)} that take P_0(RGB_0, Depth_0) as reference; these k virtual viewpoints form a group, where 1 < 2 < … < k.
S302, with P_k(RGB_k, Depth_k) as the initial input of the next stage in the horizontal direction, modify the translation vector in the camera parameters and, through 3D-Warping, CNN hole filling and related operations, obtain k (k > 0) virtual viewpoints {P_{k+1}(RGB_{k+1}, Depth_{k+1}), P_{k+2}(RGB_{k+2}, Depth_{k+2}), …, P_{2k}(RGB_{2k}, Depth_{2k})} that take P_k(RGB_k, Depth_k) as reference; these k virtual viewpoints form a new group, where k+1 < k+2 < … < 2k.
S303, with P_{2k}(RGB_{2k}, Depth_{2k}) as the initial input of the next stage in the horizontal direction, repeat the above steps several times in the same way to obtain new groups of virtual viewpoints in the horizontal direction.
It should be noted that when rendering virtual viewpoints with DIBR, the completeness and accuracy of the depth map information directly affect the rendering quality of the virtual viewpoint image. Thus, before rendering the virtual viewpoints, each of P_0(RGB_0, Depth_0), P_k(RGB_k, Depth_k), P_{2k}(RGB_{2k}, Depth_{2k}), …, P_i(RGB_i, Depth_i), …, P_N(RGB_N, Depth_N), 0 < k < 2k < … < i < … < N, being the first element of its group of virtual viewpoints, is first repaired based on the color-map-guided depth map.
The method for restoring the depth map under color image guidance is as follows: first, the inconsistent areas are detected by detecting and dilating the edges of the input depth map, and the dilated area is marked as a potential structure-distortion area; each pixel in this area is then examined and a structure-distortion measurement index is generated; for distorted pixels, a restoration weight is constructed as the product of the Gaussian color weight and the structure-distortion index, guided restoration is performed by weighted median filtering, and the distorted area is then guided-filtered; the result is iterated through these steps until the set termination condition is met, at which point the depth map is output and the calculation ends, and otherwise iteration continues until the maximum number of iterations is reached.
S4, based on the horizontal large-parallax virtual viewpoint Pi(RGBi, Depthi), synthesize the virtual viewpoint Pi,j(RGBi,j, Depthi,j) at an arbitrary position in space;
Please refer to fig. 3, which is a schematic diagram of the process of synthesizing the virtual viewpoint Pi,j(RGBi,j, Depthi,j) at an arbitrary position in space from the horizontal large-parallax virtual viewpoint Pi(RGBi, Depthi); the corresponding algorithm flow chart is shown in fig. 4(b). The specific steps are as follows:
S401, with Pi(RGBi, Depthi) as the initial input in the vertical direction, the translation vector in the camera parameters is modified, and through 3D-Warping, CNN neural network hole filling, and related operations, s (s > 0) virtual viewpoints {Pi,1(RGBi,1, Depthi,1), Pi,2(RGBi,2, Depthi,2), …, Pi,s(RGBi,s, Depthi,s)} are obtained in the vertical direction with Pi(RGBi, Depthi) as the reference viewpoint; these s virtual viewpoints form a new group in the vertical direction, where 1 < 2 < … < s.
S402, with Pi,s(RGBi,s, Depthi,s) as the new initial input in the vertical direction, the translation vector in the camera parameters is modified, and through 3D-Warping, CNN neural network hole filling, and related operations, s (s > 0) virtual viewpoints {Pi,s+1(RGBi,s+1, Depthi,s+1), Pi,s+2(RGBi,s+2, Depthi,s+2), …, Pi,2s(RGBi,2s, Depthi,2s)} are obtained with Pi,s(RGBi,s, Depthi,s) as the reference viewpoint; these s virtual viewpoints form a new group in the vertical direction, where s+1 < s+2 < … < 2s.
S403, with Pi,2s(RGBi,2s, Depthi,2s) as the new initial input in the vertical direction for the next stage, the steps are repeated in the same way multiple times to obtain several new groups of virtual viewpoints. Likewise, before rendering the virtual viewpoints, Pi(RGBi, Depthi), Pi,s(RGBi,s, Depthi,s), Pi,2s(RGBi,2s, Depthi,2s), …, Pi,j(RGBi,j, Depthi,j), …, Pi,M(RGBi,M, Depthi,M), where 0 < s < 2s < … < j < … < M, each serving as the first element of its group of virtual viewpoints in the vertical direction, the depth maps are repaired based on the color-map-guided depth map according to the depth map repair method in step S3.
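In both the horizontal (S302, S303) and vertical (S401–S403) stages, "modifying the translation vector in the camera parameters" amounts to offsetting the reference camera translation by an accumulated number of horizontal and vertical baseline steps. A sketch, where dx and dy are illustrative per-step baselines, not values from the patent:

```python
import numpy as np

def viewpoint_translation(t_ref, dx, dy, i, j):
    """Camera translation for virtual viewpoint P_{i,j}: the reference
    translation t_ref shifted by i horizontal steps of size dx and
    j vertical steps of size dy."""
    t = np.asarray(t_ref, dtype=float).copy()
    t[0] += i * dx  # horizontal baseline accumulated over the stages
    t[1] += j * dy  # vertical baseline accumulated over the stages
    return t
```

The stage-wise scheme reaches the same final translation as one large warp would, but passes through intermediate viewpoints whose holes remain small enough for the CNN filler.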
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The advantages of the invention are illustrated by comparing partial result images; its main contributions are reflected in the following three aspects:
1. The depth map structure is recovered accurately, with high precision.
Comparing the original depth map in fig. 6(b) with the color map in fig. 6(a), the structural distortion between the two is clearly visible. As fig. 7 shows, the structural distortion of the resulting map is greatly reduced compared with the original depth map.
2. Virtual viewpoint hole filling based on a generative adversarial network: the virtual viewpoint image with holes, fig. 8(a), and the hole mask image, fig. 8(b), are input into the neural network; one part of the network generates images, and the other part discriminates whether the generated images are consistent with the original images.
Here the global discriminator examines the entire image to judge global consistency, while the local discriminator examines only a small area centered on the repaired region to ensure that the generated patch is locally consistent. The third column of figs. 11 and 12 shows the hole-filling results obtained with the generative adversarial network of the present invention; a good hole-filling effect is achieved.
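The input plumbing for the three sub-networks described above can be sketched as follows. The 32-pixel patch size and the hole-centroid cropping rule are illustrative assumptions; the actual generator and discriminators are convolutional networks trained as described in step S202.

```python
import numpy as np

def make_gan_inputs(rgb, hole_mask, patch=32):
    """Assemble inputs: the generator sees the holed image concatenated with
    its mask; the global discriminator sees the full image; the local
    discriminator sees a patch centered on the hole region."""
    holed = rgb * (1 - hole_mask[..., None])        # zero out hole pixels
    gen_in = np.concatenate([holed, hole_mask[..., None]], axis=-1)
    ys, xs = np.nonzero(hole_mask)
    cy, cx = int(ys.mean()), int(xs.mean())         # hole centroid
    h, w = hole_mask.shape
    half = patch // 2
    cy = min(max(cy, half), h - half)               # clamp crop inside image
    cx = min(max(cx, half), w - half)
    local = rgb[cy - half:cy + half, cx - half:cx + half]
    return gen_in, rgb, local                       # generator / global-D / local-D
```

Feeding the full frame and a hole-centered crop to separate discriminators is what lets the filled patch stay consistent both with the whole scene and with its immediate surroundings.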
3. The effectiveness of the complete scheme for large-parallax virtual viewpoint synthesis.
Referring to figs. 9, 10, 11 and 12: fig. 9 shows the horizontal parallax k synthesized at different multiples. Compared with the original input image at k = 0, the results at k = -5 and k = 10 (-k denotes left parallax, +k denotes right parallax) exhibit a pronounced large left-right parallax effect.
Fig. 10 shows the synthesized viewpoint magnified by different multiples of s in the vertical direction; all results are based on virtual camera No. 2, P2(RGB2, Depth2). Compared with the original input image at s = 0, the vertical magnification results at 2s = 1000, 4s = 2000 and 6s = 3000 (+s denotes magnification) show complete virtual viewpoint structure information, achieving a satisfactory visual perception effect.
Fig. 11 compares the horizontal large-parallax virtual viewpoint results of the present invention with other methods. Fig. 11(a) is the virtual viewpoint with holes rendered by the DIBR technique; fig. 11(b) is the direct synthesis result of the MPEG viewpoint synthesis reference software VSRS; fig. 11(c) is the result of repairing the large-area holes in fig. 11(a) directly with the generative adversarial network; and fig. 11(d) is the horizontal large-parallax virtual viewpoint obtained by the step-by-step synthesis method of the present invention. The proposed method handles overlap, holes, artifacts and cracks in large-parallax viewpoint synthesis better than the other schemes.
Fig. 12 compares results at different magnification scales for vertical large-parallax virtual viewpoint synthesis between the present invention and other methods. Fig. 12(a) is the virtual viewpoint with holes rendered by the DIBR technique; fig. 12(b) is the direct synthesis result of the MPEG viewpoint synthesis reference software VSRS; fig. 12(c) is the result of repairing the large-area holes in fig. 12(a) directly with the generative adversarial network; and fig. 12(d) is the large-parallax virtual viewpoint obtained by the step-by-step synthesis method of the present invention. Compared with the other methods, the image quality of the invention is clearly improved, the virtual viewpoint structure information is complete, and a satisfactory visual perception effect is achieved.
In summary, the large-scale panoramic viewpoint synthesis method based on a single-viewpoint RGB-D image of the present invention renders the reference viewpoint image into virtual viewpoint images according to the camera parameters of the reference position and the corresponding depth information, greatly reducing the complexity of virtual viewpoint rendering. In the process of rendering virtual viewpoints with the DIBR technique, the completeness and accuracy of the depth information directly affect the rendering quality of the virtual viewpoint image; therefore, before a virtual viewpoint is drawn, the depth map is first repaired under color-map guidance, which improves the quality of the rendered virtual viewpoint image to a certain extent. When the viewpoint moves from the reference viewpoint to the virtual viewpoint position, regions that are visible in the virtual viewpoint but invisible in the original reference viewpoint are exposed, so a large amount of information is missing and the hole regions are difficult to fill; a generative adversarial neural network is used to fill these holes, and a good hole-filling effect is obtained. When rendering a large-parallax virtual viewpoint image Pi,j(RGBi,j, Depthi,j) at an arbitrary position in space, VSRS synthesizes large-scale virtual viewpoints directly and therefore suffers severe viewpoint overlap and large-area holes; because CNN-based repair of large hole areas is limited, large-parallax virtual viewpoints are instead generated step by step: first the virtual viewpoint Pi(RGBi, Depthi) is generated, and then Pi,j(RGBi,j, Depthi,j) is further synthesized on the basis of Pi(RGBi, Depthi).
The quality of the virtual viewpoint images synthesized by this method is significantly improved: hole areas are clearly reduced, the virtual viewpoint structure information is complete, and the overall virtual viewpoint quality is raised.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (8)
1. A large-scale panoramic viewpoint synthesis method based on a single-viewpoint RGB-D image, characterized in that the image of a known reference viewpoint is mapped onto the imaging plane of a virtual camera using depth information and camera parameters; 3D-Warping is performed on the input single-frame color image RGB0 and the corresponding depth map Depth0 to obtain a virtual viewpoint image with holes and a mask image; the virtual viewpoint image with holes and the mask image are then taken as the input of a CNN neural network, and the output of the network model is the image with the virtual viewpoint holes filled; based on the initially input single-frame color image RGB0 and corresponding depth map Depth0, a horizontal large-parallax virtual viewpoint Pi(RGBi, Depthi) is synthesized; the virtual viewpoint Pi(RGBi, Depthi) generated in the horizontal direction is then taken as a new input, and the virtual viewpoint image Pi,j(RGBi,j, Depthi,j) at an arbitrary position in space is obtained through 3D-Warping and CNN neural network hole filling, completing the synthesis of the large-scale panoramic viewpoint.
2. The method for synthesizing a large-scale panoramic viewpoint based on a single-viewpoint RGB-D image as claimed in claim 1, wherein the filling of virtual viewpoint holes based on a CNN convolutional neural network specifically comprises:
S201, establishing a generative adversarial neural network, comprising a generator network for generating images, and a global discriminator and a local discriminator for judging whether a generated image is consistent with the original image;
S202, training the adversarial network model;
S203, inputting the virtual viewpoint image with holes and the hole mask image into the trained network model, the output of the network model being the image with the virtual viewpoint holes filled.
3. The method for synthesizing a large-scale panoramic viewpoint based on a single-viewpoint RGB-D image according to claim 2, wherein in step S202 the network training comprises three parts: in the first part, the generator network is trained alone without updating the discriminator network; in the second part, the discriminator network is trained alone without updating the generator network; in the third part, using a joint loss, the generator and discriminator networks are trained together and updated in turn.
4. The method for synthesizing a large-scale panoramic viewpoint based on a single-viewpoint RGB-D image according to claim 1, wherein synthesizing the horizontal large-parallax virtual viewpoint Pi(RGBi, Depthi) specifically comprises the following steps:
S301, with P0(RGB0, Depth0) as the reference viewpoint, modifying the translation vector in the camera parameters and, through 3D-Warping and CNN neural network hole filling, obtaining in the horizontal direction k virtual viewpoints {P1(RGB1, Depth1), P2(RGB2, Depth2), …, Pk(RGBk, Depthk)} with P0(RGB0, Depth0) as the reference viewpoint; the k virtual viewpoints form a group, k > 0, 1 < 2 < … < k;
S302, with Pk(RGBk, Depthk) as the initial input for the next stage in the horizontal direction, modifying the translation vector in the camera parameters and, through 3D-Warping and CNN neural network hole filling, obtaining k virtual viewpoints {Pk+1(RGBk+1, Depthk+1), Pk+2(RGBk+2, Depthk+2), …, P2k(RGB2k, Depth2k)} with Pk(RGBk, Depthk) as the reference viewpoint; the k virtual viewpoints form a new group, k > 0, k+1 < k+2 < … < 2k;
S303, with P2k(RGB2k, Depth2k) as the initial input for the next stage in the horizontal direction, repeating the process in the same way multiple times to obtain new groups of virtual viewpoints in the horizontal direction.
5. The method for synthesizing a large-scale panoramic viewpoint based on a single-viewpoint RGB-D image according to claim 4, wherein before rendering the virtual viewpoints, P0(RGB0, Depth0), Pk(RGBk, Depthk), P2k(RGB2k, Depth2k), …, Pi(RGBi, Depthi), …, PN(RGBN, DepthN), where 0 < k < 2k < … < i < … < N, each serving as the first element of its group of virtual viewpoints, are first repaired based on the color-map-guided depth map.
6. The method according to claim 5, wherein the inconsistent region is first detected: the edges of the input depth map are detected and dilated, and the dilated region is marked as a potential structural-distortion region; each pixel in the potential structural-distortion region is then examined and a structural-distortion metric is generated; for distorted pixels, a repair weight is constructed as the product of the color-image Gaussian weight and the structural-distortion metric, guided repair is performed by weighted median filtering, and guided filtering is applied to the distorted region; the resulting map is iterated over these steps until the set termination condition is met, at which point the depth map is output and the computation ends, otherwise iteration continues until the maximum number of iterations is reached.
7. The method according to claim 1, wherein synthesizing the virtual viewpoint Pi,j(RGBi,j, Depthi,j) at an arbitrary position in space specifically comprises:
S401, with Pi(RGBi, Depthi) as the initial input in the vertical direction, modifying the translation vector in the camera parameters and, through 3D-Warping and CNN neural network hole filling, obtaining in the vertical direction s virtual viewpoints {Pi,1(RGBi,1, Depthi,1), Pi,2(RGBi,2, Depthi,2), …, Pi,s(RGBi,s, Depthi,s)} with Pi(RGBi, Depthi) as the reference viewpoint; the s virtual viewpoints form a new group in the vertical direction, s > 0, 1 < 2 < … < s;
S402, with Pi,s(RGBi,s, Depthi,s) as the new initial input in the vertical direction, modifying the translation vector in the camera parameters and, through 3D-Warping and CNN neural network hole filling, obtaining s virtual viewpoints {Pi,s+1(RGBi,s+1, Depthi,s+1), Pi,s+2(RGBi,s+2, Depthi,s+2), …, Pi,2s(RGBi,2s, Depthi,2s)} with Pi,s(RGBi,s, Depthi,s) as the reference viewpoint; the s virtual viewpoints form a new group in the vertical direction, s > 0, s+1 < s+2 < … < 2s;
S403, with Pi,2s(RGBi,2s, Depthi,2s) as the new initial input in the vertical direction for the next stage, repeating the steps in the same way multiple times to obtain several new groups of virtual viewpoints.
8. The method for synthesizing a large-scale panoramic viewpoint based on a single-viewpoint RGB-D image according to claim 7, wherein before rendering the virtual viewpoints in step S403, Pi(RGBi, Depthi), Pi,s(RGBi,s, Depthi,s), Pi,2s(RGBi,2s, Depthi,2s), …, Pi,j(RGBi,j, Depthi,j), …, Pi,M(RGBi,M, Depthi,M), where 0 < s < 2s < … < j < … < M, each serving as the first element of its group of virtual viewpoints in the vertical direction, are repaired based on the color-map-guided depth map using the depth map repair method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010113813.7A CN111325693B (en) | 2020-02-24 | 2020-02-24 | Large-scale panoramic viewpoint synthesis method based on single viewpoint RGB-D image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111325693A true CN111325693A (en) | 2020-06-23 |
CN111325693B CN111325693B (en) | 2022-07-12 |
Family
ID=71165231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010113813.7A Active CN111325693B (en) | 2020-02-24 | 2020-02-24 | Large-scale panoramic viewpoint synthesis method based on single viewpoint RGB-D image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111325693B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111988596A (en) * | 2020-08-23 | 2020-11-24 | 咪咕视讯科技有限公司 | Virtual viewpoint synthesis method and device, electronic equipment and readable storage medium |
CN112019828A (en) * | 2020-08-14 | 2020-12-01 | 上海网达软件股份有限公司 | Method for converting 2D (two-dimensional) video into 3D video |
CN112508821A (en) * | 2020-12-21 | 2021-03-16 | 南阳师范学院 | Stereoscopic vision virtual image hole filling method based on directional regression loss function |
CN112637582A (en) * | 2020-12-09 | 2021-04-09 | 吉林大学 | Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge |
CN112927175A (en) * | 2021-01-27 | 2021-06-08 | 天津大学 | Single-viewpoint synthesis method based on deep learning |
CN113132706A (en) * | 2021-03-05 | 2021-07-16 | 北京邮电大学 | Controllable position virtual viewpoint generation method and device based on reverse mapping |
WO2022126333A1 (en) * | 2020-12-14 | 2022-06-23 | 浙江大学 | Image filling method and apparatus, decoding method and apparatus, electronic device, and medium |
US11544884B2 (en) * | 2020-12-11 | 2023-01-03 | Snap Inc. | Virtual clothing try-on |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102625127A (en) * | 2012-03-24 | 2012-08-01 | 山东大学 | Optimization method suitable for virtual viewpoint generation of 3D television |
CN102932662A (en) * | 2012-12-05 | 2013-02-13 | 青岛海信信芯科技有限公司 | Single-view-to-multi-view stereoscopic video generation method and method for solving depth information graph and generating disparity map |
CN105611271A (en) * | 2015-12-18 | 2016-05-25 | 华中科技大学 | Real-time stereo image generating system |
AU2016273984A1 (en) * | 2016-12-16 | 2018-07-05 | Canon Kabushiki Kaisha | Modifying a perceptual attribute of an image using an inaccurate depth map |
CN109462747A (en) * | 2018-12-11 | 2019-03-12 | 成都美律科技有限公司 | Based on the DIBR system gap filling method for generating confrontation network |
CN110322002A (en) * | 2019-04-30 | 2019-10-11 | 深圳市商汤科技有限公司 | The training of image generation network and image processing method and device, electronic equipment |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102625127A (en) * | 2012-03-24 | 2012-08-01 | 山东大学 | Optimization method suitable for virtual viewpoint generation of 3D television |
CN102932662A (en) * | 2012-12-05 | 2013-02-13 | 青岛海信信芯科技有限公司 | Single-view-to-multi-view stereoscopic video generation method and method for solving depth information graph and generating disparity map |
CN105611271A (en) * | 2015-12-18 | 2016-05-25 | 华中科技大学 | Real-time stereo image generating system |
AU2016273984A1 (en) * | 2016-12-16 | 2018-07-05 | Canon Kabushiki Kaisha | Modifying a perceptual attribute of an image using an inaccurate depth map |
CN109462747A (en) * | 2018-12-11 | 2019-03-12 | 成都美律科技有限公司 | Based on the DIBR system gap filling method for generating confrontation network |
CN110322002A (en) * | 2019-04-30 | 2019-10-11 | 深圳市商汤科技有限公司 | The training of image generation network and image processing method and device, electronic equipment |
Non-Patent Citations (2)
Title |
---|
NENGWEN WANG et al., "Real-time free-viewpoint DIBR for large-size 3D LED", AOPC 2017: Optical Storage and Display Technology, vol. 2017, 24 January 2017, pages 1-8 *
LIU Yang et al., "An Algorithm for Converting 2D Video of Badminton Matches to 3D Video", Computer Science, vol. 45, no. 8, 31 August 2018, pages 63-69 *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112019828B (en) * | 2020-08-14 | 2022-07-19 | 上海网达软件股份有限公司 | Method for converting 2D (two-dimensional) video into 3D video |
CN112019828A (en) * | 2020-08-14 | 2020-12-01 | 上海网达软件股份有限公司 | Method for converting 2D (two-dimensional) video into 3D video |
CN111988596B (en) * | 2020-08-23 | 2022-07-26 | 咪咕视讯科技有限公司 | Virtual viewpoint synthesis method and device, electronic equipment and readable storage medium |
CN111988596A (en) * | 2020-08-23 | 2020-11-24 | 咪咕视讯科技有限公司 | Virtual viewpoint synthesis method and device, electronic equipment and readable storage medium |
CN112637582A (en) * | 2020-12-09 | 2021-04-09 | 吉林大学 | Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge |
CN112637582B (en) * | 2020-12-09 | 2021-10-08 | 吉林大学 | Three-dimensional fuzzy surface synthesis method for monocular video virtual view driven by fuzzy edge |
US11544884B2 (en) * | 2020-12-11 | 2023-01-03 | Snap Inc. | Virtual clothing try-on |
US11830118B2 (en) | 2020-12-11 | 2023-11-28 | Snap Inc. | Virtual clothing try-on |
WO2022126333A1 (en) * | 2020-12-14 | 2022-06-23 | 浙江大学 | Image filling method and apparatus, decoding method and apparatus, electronic device, and medium |
CN112508821A (en) * | 2020-12-21 | 2021-03-16 | 南阳师范学院 | Stereoscopic vision virtual image hole filling method based on directional regression loss function |
CN112508821B (en) * | 2020-12-21 | 2023-02-24 | 南阳师范学院 | Stereoscopic vision virtual image hole filling method based on directional regression loss function |
CN112927175A (en) * | 2021-01-27 | 2021-06-08 | 天津大学 | Single-viewpoint synthesis method based on deep learning |
CN112927175B (en) * | 2021-01-27 | 2022-08-26 | 天津大学 | Single viewpoint synthesis method based on deep learning |
CN113132706A (en) * | 2021-03-05 | 2021-07-16 | 北京邮电大学 | Controllable position virtual viewpoint generation method and device based on reverse mapping |
Also Published As
Publication number | Publication date |
---|---|
CN111325693B (en) | 2022-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111325693B (en) | Large-scale panoramic viewpoint synthesis method based on single viewpoint RGB-D image | |
US8471898B2 (en) | Medial axis decomposition of 2D objects to synthesize binocular depth | |
JP5011168B2 (en) | Virtual viewpoint image generation method, virtual viewpoint image generation apparatus, virtual viewpoint image generation program, and computer-readable recording medium recording the program | |
KR101168384B1 (en) | Method of generating a depth map, depth map generating unit, image processing apparatus and computer program product | |
CN109462747B (en) | DIBR system cavity filling method based on generation countermeasure network | |
US20130069942A1 (en) | Method and device for converting three-dimensional image using depth map information | |
CN106162137A (en) | Virtual visual point synthesizing method and device | |
CN104735435B (en) | Image processing method and electronic device | |
CN111047709B (en) | Binocular vision naked eye 3D image generation method | |
Luo et al. | A disocclusion inpainting framework for depth-based view synthesis | |
CN101754042A (en) | Image reconstruction method and image reconstruction system | |
Zhu et al. | An improved depth image based virtual view synthesis method for interactive 3D video | |
Bleyer et al. | Temporally consistent disparity maps from uncalibrated stereo videos | |
CN112712487A (en) | Scene video fusion method and system, electronic equipment and storage medium | |
JP2023172882A (en) | Three-dimensional representation method and representation apparatus | |
Jantet et al. | Joint projection filling method for occlusion handling in depth-image-based rendering | |
CN114119424A (en) | Video restoration method based on optical flow method and multi-view scene | |
KR102091860B1 (en) | Method and apparatus for image encoding | |
Lu et al. | Depth-based view synthesis using pixel-level image inpainting | |
Qiao et al. | Color correction and depth-based hierarchical hole filling in free viewpoint generation | |
CN113450274B (en) | Self-adaptive viewpoint fusion method and system based on deep learning | |
CN110149508A (en) | A kind of array of figure generation and complementing method based on one-dimensional integrated imaging system | |
Seitner et al. | Trifocal system for high-quality inter-camera mapping and virtual view synthesis | |
CN117501313A (en) | Hair rendering system based on deep neural network | |
Gu et al. | Enhanced DIBR framework for free viewpoint video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |