WO2022116397A1 - Virtual viewpoint depth map processing method, device, and apparatus, and storage medium - Google Patents


Info

Publication number
WO2022116397A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual viewpoint
viewpoint
depth map
virtual
map
Application number
PCT/CN2021/076924
Other languages
French (fr)
Chinese (zh)
Inventor
王荣刚
刘香凝
王振宇
蔡砚刚
Original Assignee
北京大学深圳研究生院
Application filed by 北京大学深圳研究生院 (Peking University Shenzhen Graduate School)
Publication of WO2022116397A1 publication Critical patent/WO2022116397A1/en

Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20028Bilateral filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • The present application relates to the field of image processing, and in particular, to a virtual viewpoint depth map processing method, device, apparatus, and storage medium.
  • Depth map-based image rendering (DIBR) is an important technology in the field of virtual viewpoint synthesis.
  • This technology uses the texture map and depth map of the reference viewpoint to obtain the view of any virtual viewpoint through 3D coordinate transformation.
  • In the process of synthesizing the virtual viewpoint texture map, part of the background is invisible in the reference viewpoint because it is occluded by a foreground object, but is visible in the virtual viewpoint.
  • In this case, holes may appear in the virtual viewpoint texture map; depth discontinuities in the depth map are the cause of the holes. When the holes are filled using depth map smoothing methods based on Gaussian filtering, mean filtering, median filtering, and the like, the edge information of the depth map is not preserved, so the edge regions of the synthesized texture map exhibit ghosting and the quality of the virtual viewpoint depth map is poor.
  • The main purpose of this application is to provide a virtual viewpoint depth map processing method, device, apparatus, and storage medium, aiming to solve the technical problem that existing depth map smoothing methods cannot retain the edge information of the depth map, resulting in ghosting in the edge regions of the synthesized texture map and poor quality of the virtual viewpoint depth map.
  • the present application provides a virtual viewpoint depth map processing method
  • the virtual viewpoint depth map processing method includes:
  • The present application also provides a virtual viewpoint depth map processing device, the virtual viewpoint depth map processing device comprising:
  • a forward mapping module configured to forward map the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map
  • a post-processing module for performing bilateral filtering and smoothing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map
  • a reverse mapping module configured to establish a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
  • the fusion processing module is configured to perform weighted fusion on the virtual viewpoint texture map, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain the target virtual viewpoint depth map.
  • The virtual viewpoint depth map processing method, device, apparatus, and storage medium proposed by the embodiments of the present application differ from the prior art, in which smoothing the depth map and filling holes with existing algorithms cannot preserve the edge information of the depth map, resulting in ghosting in the edge regions of the synthesized texture map and, in turn, poor quality of the virtual viewpoint depth map.
  • FIG. 1 is a schematic structural diagram of the terminal/device for the hardware operating environment involved in the solutions of the embodiments of the present application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for processing a virtual viewpoint depth map of the present application
  • FIG. 3 is a schematic diagram of the refinement process of step S40 in FIG. 2;
  • FIG. 4 is a schematic diagram of an apparatus for processing a virtual viewpoint depth map of the present application.
  • The axis action configuration terminal (also called terminal, device, or terminal device) in the embodiments of the present application may be a PC, or a mobile terminal device with a display function, such as a smartphone, tablet computer, or portable computer.
  • the terminal may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (eg, a WI-FI interface).
  • the memory 1005 may be high-speed RAM memory, or may be non-volatile memory, such as disk memory.
  • The memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the terminal may further include a camera, an RF (Radio Frequency, radio frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like.
  • sensors such as light sensors, motion sensors and other sensors.
  • The light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display screen according to the brightness of the ambient light, and the proximity sensor may turn off the display screen and/or the backlight when the mobile terminal is moved to the ear.
  • The gravitational acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the posture of the mobile terminal (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). Of course, the mobile terminal may also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described in detail here.
  • The terminal structure shown in FIG. 1 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • the memory 1005 which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions of a virtual viewpoint depth map processing method.
  • the network interface 1004 is mainly used to connect to the background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect to the client (client) and perform data communication with the client;
  • The processor 1001 can be used to call the computer-readable instructions of the virtual viewpoint depth map processing method stored in the memory 1005; when these computer-readable instructions are executed by the processor, the operations in the virtual viewpoint depth map processing method provided by the following embodiments are implemented.
  • the virtual viewpoint depth map processing method includes:
  • Step S10 forward mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map;
  • Step S20 performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map
  • Step S30 establishing a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
  • Step S40 Perform weighted fusion on the virtual viewpoint texture map, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
  • step S10 the depth map of the reference viewpoint corresponding to the preset reference viewpoint is forwardly mapped to the position of the virtual viewpoint to obtain the depth map of the virtual viewpoint;
  • the preset reference viewpoint may be the actual shooting viewpoint of the real camera
  • the virtual viewpoint may be the shooting viewpoint of the virtual camera
  • The virtual camera may be obtained by transforming the shooting viewpoint of the real camera. The image actually captured by the real camera at the preset reference viewpoint is the reference viewpoint image, and the depth map of the reference viewpoint image is the reference viewpoint depth map, which may be the depth map of the original image captured by the real camera.
  • The mappings between images and the pixels in them are based on coordinate-system mappings: plane coordinate systems are established with the reference viewpoint depth map and the virtual viewpoint depth map as references, three-dimensional spatial coordinate systems are established with the virtual camera and the reference camera as reference frames, and the mapping between images and pixels can be regarded as coordinate transformations based on these coordinate systems.
  • the depth map of the reference viewpoint is forwardly mapped to the position of the virtual viewpoint to obtain the depth map of the virtual viewpoint.
  • The forward mapping refers to mapping the pixels on the reference viewpoint depth map from the two-dimensional image coordinate system to the three-dimensional coordinate system of the real camera, then, through translation and rotation, from the coordinate system of the real camera to the coordinate system of the virtual camera, and finally performing an inverse transformation from the three-dimensional spatial coordinate system back to the two-dimensional coordinate system, that is, from the coordinate system of the virtual camera to the position of the virtual viewpoint.
  • the reference viewpoint depth map corresponding to the preset reference viewpoint is forwardly mapped to the position of the virtual viewpoint to obtain the virtual viewpoint depth map, including steps A1-A4:
  • Step A1, establishing a first spatial coordinate system according to the internal parameters of the reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map into the first spatial coordinate system to obtain first spatial coordinates of the reference viewpoint depth map;
  • The internal parameters of the reference camera include the focal length and optical center coordinates. According to these internal parameters, a first spatial coordinate system is established; it is a three-dimensional coordinate system that contains the position of the reference camera and may be established with the position of the reference camera as its origin. A plane coordinate system may be established on the reference viewpoint depth map, thereby obtaining its two-dimensional coordinates, and the image coordinates of the reference viewpoint depth map are mapped into the first spatial coordinate system to obtain the first spatial coordinates of the reference viewpoint depth map.
  • the mapping transformation formula can be as follows (Formula 1):
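  (The formula image is not reproduced in this extract; given the symbol definitions in the next item, Formula 1 is presumably the standard pinhole projection relation:)

  $$Z\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x&0&c_x\\ 0&f_y&c_y\\ 0&0&1\end{bmatrix}\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}\qquad(1)$$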
  • where [u, v, 1]^T is the homogeneous coordinate of a pixel of the reference viewpoint depth map; Z is the depth value of the pixel, which can be determined from the internal parameters and position of the real camera; [X, Y, Z]^T is the coordinate of the real object corresponding to (u, v) in the first spatial coordinate system; and f_x, f_y, c_x, c_y are the internal parameters of the real camera, f_x and f_y being the focal lengths in the x and y directions and c_x and c_y the optical center coordinates in the x and y directions.
  • Step A2, establishing a second spatial coordinate system according to the internal parameters of the virtual camera corresponding to the virtual viewpoint, and mapping the first spatial coordinates into the second spatial coordinate system to obtain second spatial coordinates of the reference viewpoint depth map;
  • The internal parameters of the virtual camera may be consistent with those of the real camera. With reference to the first spatial coordinate system, a second spatial coordinate system is established according to the internal parameters of the virtual camera; the second spatial coordinate system contains the position of the virtual camera. The first spatial coordinates are mapped into the second spatial coordinate system to obtain the second spatial coordinates of the reference viewpoint depth map.
  • The mapping transformation formulas are as follows (Formulas 2-4):
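  (The formula images are not reproduced here. Formula 4 appears verbatim later in the description; Formulas 2 and 3 below are hedged reconstructions consistent with it and with the symbol definitions in the next item:)

  $$\begin{bmatrix}X_1\\ Y_1\\ Z_1\end{bmatrix}=R\begin{bmatrix}X\\ Y\\ Z\end{bmatrix}+T\qquad(2)$$

  $$R=R_v\times R_c^{-1}\qquad(3)$$

  $$T=R_v\times(T_c-T_v)\qquad(4)$$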
  • where R is a 3×3 rotation matrix, T is a translation vector, [X, Y, Z]^T is the coordinate of the pixel in the first spatial coordinate system (that is, the first spatial coordinate), [X_1, Y_1, Z_1]^T is the second spatial coordinate after the first spatial coordinate is mapped into the second spatial coordinate system, R_c and T_c are the rotation matrix and translation vector of the preset reference viewpoint, and R_v and T_v are the rotation matrix and translation vector of the virtual viewpoint.
  • Step A3 performing inverse transformation on the second space coordinate to obtain a first mapping relationship between the second space coordinate system and the image coordinates of the virtual viewpoint;
  • the mapping transformation formula of the inverse transformation is as follows (Formula 5):
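  (Presumably the pinhole projection with the virtual camera's intrinsics, mirroring Formula 1; a hedged reconstruction:)

  $$Z_1\begin{bmatrix}u_1\\ v_1\\ 1\end{bmatrix}=\begin{bmatrix}f_x&0&c_x\\ 0&f_y&c_y\\ 0&0&1\end{bmatrix}\begin{bmatrix}X_1\\ Y_1\\ Z_1\end{bmatrix}\qquad(5)$$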
  • where Z_1 is the depth value of the pixel, which can be determined from the internal parameters and position of the virtual camera; f_x, f_y, c_x, and c_y are the internal parameters of the virtual camera, which may be consistent with those of the real camera; and [u_1, v_1, 1]^T is the homogeneous coordinate of the pixel (u_1, v_1) in the image coordinate system of the virtual viewpoint.
  • Step A4 Map the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relationship to obtain the virtual viewpoint depth map.
  • Through the above steps, the forward mapping relationship between the reference viewpoint depth map and the virtual viewpoint, that is, the mapping relationship between the point (u, v) and the point (u_1, v_1), can be obtained. The points on the reference viewpoint depth map are mapped to the position of the virtual viewpoint according to this forward mapping relationship, yielding the virtual viewpoint depth map.
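  To make the chain of transformations in steps A1-A4 concrete, here is a minimal illustrative sketch in Python with NumPy. It is not the patent's code: the nearest-depth (z-buffer) splatting policy, the equal image sizes, and the forms of R and T (which follow the hedged reconstruction of Formulas 2-4 above) are assumptions.

```python
import numpy as np

def forward_map_depth(depth_ref, K_ref, R_c, T_c, K_virt, R_v, T_v):
    """Forward-map a reference-view depth map to the virtual viewpoint.

    Steps A1-A4: back-project each pixel with the reference intrinsics
    (Formula 1), move it into the virtual camera frame with
    R = R_v R_c^-1 and T = R_v (T_c - T_v) (Formulas 2-4, as
    reconstructed above), then re-project with the virtual intrinsics
    (Formula 5). The virtual image is assumed to have the same size.
    """
    h, w = depth_ref.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    Z = depth_ref.ravel().astype(np.float64)

    # Step A1: back-project into the reference camera's 3-D frame.
    X_ref = np.linalg.inv(K_ref) @ (pix * Z)                # 3 x N

    # Step A2: rotate/translate into the virtual camera's frame.
    R = R_v @ np.linalg.inv(R_c)
    T = R_v @ (T_c - T_v)
    X_virt = R @ X_ref + T.reshape(3, 1)

    # Step A3: project back to 2-D virtual-view pixel coordinates.
    proj = K_virt @ X_virt
    z1 = proj[2]
    ok = (Z > 0) & (z1 > 1e-6)                              # skip holes / points behind camera
    u1 = np.round(proj[0, ok] / z1[ok]).astype(int)
    v1 = np.round(proj[1, ok] / z1[ok]).astype(int)
    z1 = z1[ok]

    # Step A4: splat depths into the virtual view; keep the nearest
    # surface (z-buffer) when several source pixels collide.
    depth_virt = np.zeros((h, w))
    best = np.full((h, w), np.inf)
    inside = (u1 >= 0) & (u1 < w) & (v1 >= 0) & (v1 < h)
    for x, y, z in zip(u1[inside], v1[inside], z1[inside]):
        if z < best[y, x]:
            best[y, x] = z
            depth_virt[y, x] = z                            # pixels left at 0 are holes
    return depth_virt
```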
  • Step S20 performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map
  • A preset algorithm is used to perform bilateral filtering and smoothing on the virtual viewpoint depth map. If part of the background texture is occluded by a foreground object in the reference viewpoint, it is not visible in the reference viewpoint texture map; in this case, hole areas may appear in the virtual viewpoint texture map, and depth discontinuity areas in the depth map are the cause of the holes, so the depth map should be smoothed to reduce the hole areas.
  • A preferred choice of the preset algorithm is the bilateral filtering algorithm, whose formula is as follows (Formula 6):
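  (The formula image is not reproduced; the standard bilateral filter consistent with the following description would be:)

  $$d_{\text{out}}(x,y)=\frac{\sum_{(i,j)}w(i,j)\,d(i,j)}{\sum_{(i,j)}w(i,j)},\quad w(i,j)=\exp\!\left(-\frac{(i-x)^2+(j-y)^2}{2\sigma_s^2}-\frac{\left(d(i,j)-d(x,y)\right)^2}{2\sigma_r^2}\right)\qquad(6)$$

  where σ_s and σ_r are the spatial and range standard deviations mentioned below.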
  • where d(i, j) is the depth value of a pixel in the neighborhood of the pixel (x, y); the range of (i, j) is determined by the filter radius, which can be determined through experiments or set manually. For example, if the filter radius is set to 7 pixels, then (i, j) ranges over the pixels within a radius of 7 pixels centered on (x, y). The standard deviations of the bilateral filter can likewise be set manually.
  • The bilateral filtering method adopts a weighted average: the weighted average of the depth values d(i, j) in the neighborhood determines the depth value d(x, y) of the center pixel (x, y), with weights based on a Gaussian distribution. The weight of the bilateral filter considers not only the Euclidean distance of a pixel, that is, the influence of position on the center pixel, but also the difference in depth values. In areas where the depth value changes little, the spatial-domain weight dominates, which is equivalent to Gaussian smoothing; in edge areas of the image, the depth value changes greatly and the range weight of the pixel becomes dominant, so the edge information is preserved and the generation of holes is reduced.
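  As an illustration of this weighting scheme, here is a minimal, unoptimized sketch of bilateral filtering applied to a depth map. The default radius and standard deviations are placeholders rather than values from the patent; in practice, a library routine such as OpenCV's cv2.bilateralFilter would normally replace the explicit double loop.

```python
import numpy as np

def bilateral_filter_depth(depth, radius=7, sigma_space=5.0, sigma_range=10.0):
    """Bilateral smoothing of a depth map (standard form of Formula 6).

    The output d(x, y) is a weighted average of the neighborhood depths
    d(i, j); the weight is the product of a spatial Gaussian (distance to
    the center) and a range Gaussian (depth difference), so flat regions
    are smoothed while depth edges are preserved.
    """
    h, w = depth.shape
    d = depth.astype(np.float64)
    out = d.copy()
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = d[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w_space = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_space ** 2))
            w_range = np.exp(-((patch - d[y, x]) ** 2) / (2 * sigma_range ** 2))
            weights = w_space * w_range
            out[y, x] = np.sum(weights * patch) / np.sum(weights)
    return out
```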
  • Step S30 establishing a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
  • The reverse mapping refers to mapping the pixels on the virtual viewpoint depth map to the coordinate system of the virtual camera according to the virtual viewpoint depth, then, through translation and rotation, from the coordinate system of the virtual camera to the coordinate system of the real camera, and finally, through an inverse transformation, from the three-dimensional spatial coordinate system to the two-dimensional coordinate system, that is, from the coordinate system of the real camera to the image coordinate system of the reference viewpoint.
  • The process of reverse mapping is described by the following formulas (Formulas 7-9):
  • the pixels on the depth map of the virtual viewpoint are mapped to the coordinate system of the virtual camera according to the depth of the virtual viewpoint:
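  (Hedged reconstructions from the symbol definitions in the following items; K_c below denotes the real camera's intrinsic matrix, a notation introduced here for convenience:)

  $$Z_1\begin{bmatrix}u_1\\ v_1\\ 1\end{bmatrix}=\begin{bmatrix}f_x&0&c_x\\ 0&f_y&c_y\\ 0&0&1\end{bmatrix}\begin{bmatrix}X_1\\ Y_1\\ Z_1\end{bmatrix}\qquad(7)$$

  $$\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}=R'\begin{bmatrix}X_1\\ Y_1\\ Z_1\end{bmatrix}+T'\qquad(8)$$

  $$Z'\begin{bmatrix}u'\\ v'\\ 1\end{bmatrix}=K_c\begin{bmatrix}X'\\ Y'\\ Z'\end{bmatrix}\qquad(9)$$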
  • Formula 7 is the same as Formula 5: [u_1, v_1, 1]^T is the homogeneous coordinate of the pixel (u_1, v_1), Z_1 is the depth value of the pixel, [X_1, Y_1, Z_1]^T is the coordinate of the real object corresponding to the pixel (u_1, v_1) in the virtual camera coordinate system, and f_x, f_y, c_x, c_y are the internal parameters of the virtual camera, namely the focal lengths and optical center coordinates in the x and y directions.
  • In Formula 8, R' is a 3×3 rotation matrix, T' is a translation vector, [X_1, Y_1, Z_1]^T is the coordinate of the pixel in the virtual camera coordinate system, and [X', Y', Z']^T is the coordinate of the pixel in the real camera coordinate system.
  • In Formula 9, Z' is the depth value under the real camera, and (u', v') is the coordinate of the pixel in the reference viewpoint coordinate system.
  • Establishing the reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, to synthesize virtual viewpoint texture maps based on different preset reference viewpoints, includes steps B1-B2:
  • Step B1 performing reverse mapping on the smooth virtual viewpoint depth map to obtain a second mapping relationship between the spatial coordinate system of the virtual viewpoint and the spatial coordinate system of the preset reference viewpoint;
  • Step B2 Map the reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relationship to obtain the virtual viewpoint texture map.
  • Through the above process, the reverse mapping relationship from the point (u_1, v_1) on the virtual viewpoint image to the point (u', v') on the reference viewpoint can be obtained, and according to this reverse mapping relationship, the reference viewpoint texture map is mapped to the position of the virtual viewpoint, thereby obtaining the virtual viewpoint texture map.
  • Step S40 Perform weighted fusion on the virtual viewpoint texture map, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
  • The virtual viewpoint texture maps are weighted and fused. The quality of the texture maps obtained from different reference viewpoints and the shooting orientations of the objects differ, and the parts occluded by foreground objects also differ, so the texture maps obtained from the different preset reference viewpoints are given different weights and then fused; the resulting target virtual viewpoint depth map retains the edge information of the depth map. The fused texture map is subjected to hole filling processing and foreground edge filtering processing, to fill the holes existing in the depth map and to filter and smooth the foreground edges, ensuring the integrity of the edge information.
  • In this embodiment, the reference viewpoint depth map corresponding to the preset reference viewpoint is forward-mapped to the position of the virtual viewpoint to obtain a virtual viewpoint depth map, and a preset filtering method is used to smooth the virtual viewpoint depth map so as to obtain a smooth virtual viewpoint depth map while preserving the edge information of the depth map. According to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint is established, and virtual viewpoint texture maps based on different preset reference viewpoints are synthesized. Weighted fusion is performed on the virtual viewpoint texture maps, and hole filling processing and foreground edge filtering processing are performed on the fused virtual viewpoint texture map, which reduces the ghosting phenomenon in the edge areas of the synthesized texture map and thereby improves the quality of the virtual viewpoint texture map.
  • A second embodiment of the present application provides a virtual viewpoint depth map processing method. Based on the embodiment shown in FIG. 2, this embodiment is a refinement of step S40 in the first embodiment.
  • In step S40, performing weighted fusion on the virtual viewpoint texture maps includes steps S41-S44:
  • Step S41 determining the first position of the reference camera corresponding to the preset reference viewpoint, and the second position of the virtual camera corresponding to the virtual viewpoint;
  • Here, too, the mapping between images and the pixels in them is based on coordinate-system mappings: plane coordinate systems are established with the reference viewpoint depth map and the virtual viewpoint depth map as references, three-dimensional spatial coordinate systems are established with the virtual camera and the reference camera as reference frames, and the mapping between images and pixels can be regarded as coordinate transformations based on these coordinate systems.
  • The weight of a virtual viewpoint texture map consists of two parts. The first part is determined by the positions of the virtual camera and the reference camera; therefore, the position of the reference camera, that is, the first position, and the position of the virtual camera, that is, the second position, need to be determined.
  • Step S42 determining a first weight according to the positional relationship between the first position and the second position
  • The distance d_i between the reference camera and the virtual camera is determined, and the first weight is determined by this distance; for example, the first weight may be inversely proportional to d_i.
  • Step S43 determining the depth value confidence of the virtual camera position, and determining a second weight according to the depth value confidence
  • The weight of the virtual viewpoint texture map further includes a second weight, which is determined by the depth value confidence of the virtual camera position: the depth value confidence of the virtual camera position is determined first, and then the second weight is determined according to the depth value confidence.
  • the determining the depth value confidence of the virtual camera position includes steps C1-C3:
  • Step C1 determining a first reference point from the virtual viewpoint depth map, and mapping the first reference point to the space coordinate system where the first position is located to obtain a second reference point;
  • A pixel is determined from the virtual viewpoint depth map as the first reference point; taking the virtual viewpoint depth map as a plane coordinate system, the first reference point may be (x, y). The first reference point is mapped to the spatial coordinate system where the first position is located, that is, the first spatial coordinate system: the reference point (x, y) is mapped to the position of the reference camera to obtain the second reference point, which may be (u, v).
  • Step C2 according to the position depth value of the reference camera, map the second reference point to the space coordinate system where the second position is located to obtain a third reference point;
  • The second reference point is mapped to the spatial coordinate system where the second position is located; the second position is the position of the virtual camera, and the spatial coordinate system where it is located is the second spatial coordinate system. The second reference point (u, v) is mapped into the second spatial coordinate system to obtain the third reference point, which may be (x_1, y_1).
  • Step C3 Determine the confidence level of the depth value by using a first preset algorithm according to the coordinates of the first reference point and the coordinates of the third reference point.
  • the depth value confidence is obtained from the first reference point and the third reference point, and the first preset algorithm may be the following formula (Formula 10):
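  (The formula image is missing. Since dist is computed from the first reference point (x, y) and the third reference point (x_1, y_1), one natural — though unverified — reading is the Euclidean round-trip reprojection distance:)

  $$\mathrm{dist}=\sqrt{(x-x_1)^2+(y-y_1)^2}\qquad(10)$$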
  • dist is the confidence level of the depth value.
  • According to the depth value confidence, the second weight can be determined by the following formula (Formula 11):
  • Step S44 according to the first weight and the second weight, fuse the virtual viewpoint texture maps based on different preset reference viewpoints.
  • the virtual viewpoint texture maps based on different preset reference viewpoints are weighted and fused, and the specific weighted fusion process can be the following formula (Formula 12):
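  (A hedged reconstruction from the symbol definitions in the next item — a normalized weighted sum over the texture maps from the different reference viewpoints:)

  $$f(x,y)=\frac{\sum_i \mathrm{conf\_cam}_i(x,y)\,\mathrm{conf\_depth}_i(x,y)\,f_i(x,y)}{\sum_i \mathrm{conf\_cam}_i(x,y)\,\mathrm{conf\_depth}_i(x,y)}\qquad(12)$$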
  • where f_i(x, y) is the value of the pixel in the virtual viewpoint texture map based on the i-th preset reference viewpoint, f(x, y) is the pixel corresponding to f_i(x, y) in the virtual viewpoint texture map after weighted fusion, conf_cam_i(x, y) is the first weight, determined by the distance between the reference camera and the virtual camera, and conf_depth_i(x, y) is the second weight, determined by the depth value confidence; here i is the camera index.
  • the calculation method of the first weight can be the following formula (Formula 14):
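  (Consistent with the earlier statement that the first weight may be inversely proportional to d_i; the normalization over the cameras j is an assumption:)

  $$\mathrm{conf\_cam}_i=\frac{1/d_i}{\sum_j 1/d_j}\qquad(14)$$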
  • where d_i is the distance between the i-th reference camera and the virtual camera.
  • In step S40, performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map includes steps S45-S49:
  • Step S45 select the first center pixel point from the virtual viewpoint texture map, and create the first window of preset size
  • A preferred hole filling method is joint bilateral filtering. A pixel is selected as the first center pixel, also called the current pixel, and a window of preset size is established with the current pixel as the center. For example, a first window of N×N size is created, containing N×N pixels, where N may be 30 or 40; apart from the first center pixel, the other pixels in the first window are the neighborhood pixels of the first center pixel.
  • Step S46 using the second preset algorithm to perform filling calculation on the values of the pixels in the first window, so as to perform hole filling processing
  • The filling method may use the average value, or the weighted average value, of the pixels in the neighborhood of the hole to fill in the depth value of the hole pixel; the second preset algorithm can be the following formulas (Formulas 15-18):
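  (The formula images are missing; from the symbols defined in the next item, at least the standard disparity-depth conversion can be inferred — the exact forms and numbering of Formulas 15-18 may differ:)

  $$\mathrm{disp}(i,j)=\frac{fB}{\mathrm{depth}(i,j)},\qquad \mathrm{drange}=\mathrm{disp}_{\max}-\mathrm{disp}_{\min}$$

  with the hole pixel's value img(x, y) taken as the (weighted) average of the valid disparities in the first window, as described above.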
  • where img(x, y) is the pixel value in the first window, disp(i, j) is the disparity value of the pixel (i, j), and depth(i, j) is the depth value of the pixel (i, j). The value of fB is determined by the internal parameters and position of the reference camera. drange is the range between the maximum and minimum disparity values; the maximum and minimum disparity values are determined by the fB value of the corresponding reference camera and by the maximum and minimum depth values, which are set according to the shooting scene. dtr is a parameter used to reduce the disparity range, and its value can be customized; for example, dtr may be set to 0.01666667.
  • Step S47 determining the foreground edge area of the virtual viewpoint texture map, and marking the foreground edge area
  • The edge area of the virtual viewpoint texture map is determined and marked; the edge area is marked so that it can be filtered and smoothed, preventing ghosting or defects in the edge area. Determining and marking the foreground edge area includes steps D1-D3:
  • Step D1 select a reference pixel, and determine the absolute value of the gradient of the depth value of the reference pixel;
  • The absolute value of the gradient can be obtained by differentiating over the edge area; if the virtual viewpoint texture map is discontinuous, the absolute value of the gradient can be obtained by the forward difference quotient, backward difference quotient, or central difference quotient.
  • Step D2 if the absolute value of the gradient is greater than a preset threshold, then determine that the reference pixel is an edge pixel;
  • If the absolute value of the gradient is greater than the preset threshold, the reference pixel is determined to be an edge pixel. The preset threshold is determined by the disparity range drange and the parameter dtr, and can be calculated by the following formula (Formula 19):
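  (A hedged reading, scaling the disparity range by dtr:)

  $$\mathrm{dthresh}=\mathrm{drange}\times\mathrm{dtr}\qquad(19)$$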
  • dthresh is the preset threshold.
  • Step D3 Extend a preset number of pixels around the edge pixels to determine the foreground edge area, and mark the foreground edge area.
  • After the reference pixel is determined to be an edge pixel, the area is expanded by a preset number of pixels around it, for example, by 4 pixels in the horizontal and vertical directions; the expanded area is the foreground edge area, which is then marked.
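  A minimal sketch of steps D1-D3, assuming the gradient is taken as the larger of the horizontal and vertical central difference quotients and that the threshold follows the hedged reading of Formula 19 above (dthresh = drange × dtr):

```python
import numpy as np

def mark_foreground_edges(depth, drange, dtr=0.01666667, expand=4):
    """Mark the foreground edge area of a view (steps D1-D3).

    D1: absolute depth gradient per pixel (central difference quotients);
    D2: compare against the preset threshold dthresh;
    D3: expand edge pixels by `expand` pixels horizontally and vertically.
    """
    # Step D1: absolute gradient via central difference quotients.
    gy, gx = np.gradient(depth.astype(np.float64))
    grad = np.maximum(np.abs(gx), np.abs(gy))

    # Step D2: threshold (assumed dthresh = drange * dtr, cf. Formula 19).
    edges = grad > drange * dtr

    # Step D3: dilate the edge pixels. np.roll wraps at image borders,
    # which is acceptable for a sketch; a real implementation would pad.
    mask = edges.copy()
    for shift in range(1, expand + 1):
        for axis in (0, 1):
            mask |= np.roll(edges, shift, axis=axis)
            mask |= np.roll(edges, -shift, axis=axis)
    return mask
```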
  • Step S48 select the second center pixel point from the foreground edge area, and create a second window of preset size
  • A second center pixel point is selected from the foreground edge area, and a second window of preset size is created with the second center pixel point as the center; for example, taking the second center pixel point as the center and expanding outward by 4 pixel points, a 5×5 window is created as the second window. The second window is created in order to perform filtering processing on the foreground edge area.
  • Step S49 using a third preset algorithm to perform filtering calculation on the values of the pixels in the second window, so as to perform filtering processing on the foreground edge region.
  • The third preset algorithm is used to calculate the values of the pixels in the second window so as to filter the foreground edge region; for example, the value of a pixel may be replaced by the average or weighted average of the pixels in its neighborhood window to achieve the filtering.
  • the formula of the third preset algorithm can be (Formula 20):
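  (Presumably the neighborhood average described above; in the unweighted case, with W the set of pixels in the second window:)

  $$f(x,y)=\frac{1}{|W|}\sum_{(i,j)\in W}f(i,j)\qquad(20)$$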
  • where (x, y) is the selected pixel whose value is to be replaced, f(x, y) is the pixel value of the pixel to be replaced, and f(i, j) is the pixel value of a pixel in the neighborhood of the point (x, y).
  • In this embodiment, the first weight is determined from the positional relationship between the virtual camera and the reference camera, and the second weight is determined from the depth value confidence of the virtual camera. Weighted fusion is performed on the virtual viewpoint texture maps according to the first weight and the second weight; center pixels are selected from the fused virtual viewpoint texture map to create the first window and the second window of preset sizes, thereby determining the neighborhood pixels of the center pixels; and preset algorithms are used to calculate the values of the neighborhood pixels so as to perform hole filling and foreground edge filtering on the fused virtual viewpoint texture map. This reduces holes and ghosting in the virtual viewpoint texture map and improves its subjective and objective quality.
  • the first embodiment of the present application provides a virtual viewpoint depth map processing apparatus, and the virtual viewpoint depth map processing apparatus includes:
  • the forward mapping module 10 is configured to forward map the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map;
  • a post-processing module 20 configured to perform bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map
  • a reverse mapping module 30 configured to establish a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
  • the fusion processing module 40 is configured to perform weighted fusion on the virtual viewpoint texture map, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed in the present application are a virtual viewpoint depth map processing method, device, and apparatus, and a storage medium. In the present application, a virtual viewpoint depth map is obtained by forwardly mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint; the virtual viewpoint depth map is subjected to smoothing processing by using a preset filter method, such that edge information of the depth map is retained; the smoothed virtual viewpoint depth map is inversely mapped to the position of the preset reference viewpoint to synthesize virtual viewpoint texture maps based on different preset reference viewpoints; the virtual viewpoint texture maps are subjected to weighted fusion, and the fused virtual viewpoint texture map is subjected to hole filling processing and foreground edge filtering processing, thereby improving the quality of the virtual viewpoint depth map.

Description

Virtual viewpoint depth map processing method, device, apparatus, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on December 04, 2020, with application number 202011419908.8 and entitled "Virtual Viewpoint Depth Map Processing Method, Device, Apparatus and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing, and in particular, to a virtual viewpoint depth map processing method, device, apparatus, and storage medium.
Background Art
Depth map-based image rendering (DIBR) is an important technology in the field of virtual viewpoint synthesis. This technology uses the texture map and depth map of a reference viewpoint to obtain the view of any virtual viewpoint through three-dimensional coordinate transformation. In the process of synthesizing a virtual viewpoint texture map with DIBR, part of the background is invisible in the reference viewpoint because it is occluded by a foreground object, but is visible in the virtual viewpoint. In this case, holes may appear in the virtual viewpoint texture map, and depth discontinuities in the depth map are the cause of these holes. When the holes are filled using depth map smoothing methods based on Gaussian filtering, mean filtering, median filtering, and the like, the edge information of the depth map is not preserved, the edge regions of the synthesized texture map exhibit ghosting, and the quality of the virtual viewpoint depth map is therefore poor.
The above content is only used to assist in understanding the technical solutions of the present application, and does not constitute an admission that it is prior art.
SUMMARY OF THE INVENTION
The main purpose of this application is to provide a virtual viewpoint depth map processing method, device, apparatus, and storage medium, aiming to solve the technical problem that existing depth map smoothing methods cannot retain the edge information of the depth map, resulting in ghosting in the edge regions of the synthesized texture map and poor quality of the virtual viewpoint depth map.
To achieve the above objective, the present application provides a virtual viewpoint depth map processing method. The virtual viewpoint depth map processing method includes:
forwardly mapping the reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint to obtain a virtual viewpoint depth map;
performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
establishing a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
In addition, to achieve the above objective, the present application further provides a virtual viewpoint depth map processing apparatus. The virtual viewpoint depth map processing apparatus includes:
a forward mapping module, configured to forwardly map the reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint to obtain a virtual viewpoint depth map;
a post-processing module, configured to perform bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
a reverse mapping module, configured to establish a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
a fusion processing module, configured to perform weighted fusion on the virtual viewpoint texture maps, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Compared with the prior art, in which smoothing the depth map and filling holes with existing algorithms cannot preserve the edge information of the depth map, resulting in ghosting in the edge regions of the synthesized texture map and, in turn, poor quality of the virtual viewpoint depth map, the virtual viewpoint depth map processing method, device, apparatus, and storage medium proposed by the embodiments of the present application forwardly map the reference viewpoint depth map corresponding to a preset reference viewpoint to the position of the virtual viewpoint to obtain a virtual viewpoint depth map, and perform bilateral filtering and smoothing on the virtual viewpoint depth map, which effectively smooths the virtual viewpoint depth map while retaining edge information. According to the smoothed virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint is established to synthesize virtual viewpoint texture maps based on different preset reference viewpoints; weighted fusion is performed on the virtual viewpoint texture maps, and hole filling processing and foreground edge filtering processing are performed on the fused virtual viewpoint texture map. This not only effectively smooths the virtual viewpoint depth map and retains the edge information of the depth map, but also, after the holes in the virtual viewpoint texture map are filled, filters the foreground edges of the texture map, improving the quality of the virtual viewpoint texture map and, in turn, the quality of the virtual viewpoint depth map.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the terminal/device for the hardware operating environment involved in the solutions of the embodiments of the present application;
FIG. 2 is a schematic flowchart of a first embodiment of the virtual viewpoint depth map processing method of the present application;
FIG. 3 is a schematic diagram of the refinement process of step S40 in FIG. 2;
FIG. 4 is a schematic diagram of an apparatus for the virtual viewpoint depth map processing method of the present application.
The realization of the objectives, functional characteristics, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description of the Embodiments
It should be understood that the specific embodiments described herein are only used to explain the present application, and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are only intended to facilitate the description of the present application and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
The axis action configuration terminal (also called terminal, device, or terminal device) in the embodiments of the present application may be a PC, or a mobile terminal device with a display function, such as a smartphone, tablet computer, or portable computer.
As shown in FIG. 1, the terminal may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. Among them, the communication bus 1002 is used to realize the connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. Optionally, the network interface 1004 may include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be high-speed RAM memory, or may be stable memory (non-volatile memory), such as disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the terminal may further include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors include, for example, light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display screen according to the brightness of the ambient light, and the proximity sensor may turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As a kind of motion sensor, the gravitational acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes) and can detect the magnitude and direction of gravity when stationary; it can be used for applications that recognize the posture of the mobile terminal (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). Of course, the mobile terminal may also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described in detail here.
Those skilled in the art can understand that the terminal structure shown in FIG. 1 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine some components, or use a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions of a virtual viewpoint depth map processing method.
In the terminal shown in FIG. 1, the network interface 1004 is mainly used to connect to the background server and perform data communication with it; the user interface 1003 is mainly used to connect to the client and perform data communication with it; and the processor 1001 can be used to call the computer-readable instructions of the virtual viewpoint depth map processing method stored in the memory 1005. When these computer-readable instructions are executed by the processor, the operations in the virtual viewpoint depth map processing method provided by the following embodiments are implemented.
Based on the above device hardware structure, embodiments of the virtual viewpoint depth map processing method of the present application are proposed.
Referring to FIG. 2, a first embodiment of the present application provides a virtual viewpoint depth map processing method. The virtual viewpoint depth map processing method includes:
Step S10, forwardly mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain a virtual viewpoint depth map;
Step S20, performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
Step S30, establishing a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
Step S40, performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Specifically, in step S10, the reference viewpoint depth map corresponding to the preset reference viewpoint is forwardly mapped to the position of the virtual viewpoint to obtain the virtual viewpoint depth map.
The preset reference viewpoint may be the actual shooting viewpoint of a real camera, and the virtual viewpoint may be the shooting viewpoint of a virtual camera, which may be obtained by transforming the shooting viewpoint of the real camera. The image actually captured by the real camera at the preset reference viewpoint is the reference viewpoint image, and the depth map of the reference viewpoint image is the reference viewpoint depth map, which may be the depth map of the original image captured by the real camera. In this embodiment, the mappings between images and the pixels in them are based on coordinate-system mappings: plane coordinate systems are established with the reference viewpoint depth map and the virtual viewpoint depth map as references, and three-dimensional spatial coordinate systems are established with the virtual camera and the reference camera as reference frames, so that the mapping between images and pixels can be regarded as coordinate transformations based on these coordinate systems. The reference viewpoint depth map is forwardly mapped to the position of the virtual viewpoint to obtain the virtual viewpoint depth map. The forward mapping refers to mapping the pixels on the reference viewpoint depth map from the two-dimensional image coordinate system to the three-dimensional coordinate system of the real camera, then, through translation and rotation, from the coordinate system of the real camera to the coordinate system of the virtual camera, and finally performing an inverse transformation from the three-dimensional spatial coordinate system back to the two-dimensional coordinate system, that is, from the coordinate system of the virtual camera to the position of the virtual viewpoint.
The forward mapping of the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map includes steps A1-A4:
Step A1, establishing a first spatial coordinate system according to the internal parameters of the reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map into the first spatial coordinate system to obtain first spatial coordinates of the reference viewpoint depth map;
The internal parameters of the reference camera include the focal length and optical center coordinates. According to the internal parameters of the reference camera, a first spatial coordinate system is established; the first spatial coordinate system is a three-dimensional coordinate system that contains the position of the reference camera and may be established with the position of the reference camera as its origin. A plane coordinate system may be established on the reference viewpoint depth map, thereby obtaining the two-dimensional coordinates of the reference viewpoint depth map, and the image coordinates of the reference viewpoint depth map are mapped into the first spatial coordinate system to obtain the first spatial coordinates of the reference viewpoint depth map. For example, the image coordinates (u, v) of the reference viewpoint depth map are transformed into the coordinates [X, Y, Z]^T in the reference camera coordinate system; the mapping transformation formula can be as follows (Formula 1):
Z·[u, v, 1]^T = [f_x 0 c_x; 0 f_y c_y; 0 0 1]·[X, Y, Z]^T, i.e. X = (u − c_x)·Z/f_x, Y = (v − c_y)·Z/f_y  (1)
where [u, v, 1]^T is the homogeneous coordinate of a pixel of the reference viewpoint depth map, Z is the depth value of the pixel, which can be determined from the internal parameters and position of the real camera, [X, Y, Z]^T is the coordinate, in the first spatial coordinate system, of the real object corresponding to (u, v), and f_x, f_y, c_x, c_y are the internal parameters of the real camera: f_x and f_y are the focal lengths in the x and y directions, and c_x and c_y are the optical center coordinates in the x and y directions.
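For illustration only, a minimal NumPy sketch of Formula 1 is given below; the function name and calling convention are assumptions of this sketch, not part of the application.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Formula 1: lift every pixel (u, v) with depth Z to [X, Y, Z] in camera space."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grids, shape (h, w)
    X = (u - cx) * depth / fx                       # X = (u - c_x) * Z / f_x
    Y = (v - cy) * depth / fy                       # Y = (v - c_y) * Z / f_y
    return np.stack([X, Y, depth], axis=-1)         # (h, w, 3) camera-space points
```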
Step A2: establish a second spatial coordinate system according to the internal parameters of the virtual camera corresponding to the virtual viewpoint, and map the first spatial coordinates into the second spatial coordinate system to obtain the second spatial coordinates of the reference viewpoint depth map;
The internal parameters of the virtual camera may be consistent with those of the real camera. With reference to the first spatial coordinate system, a second spatial coordinate system is established according to the internal parameters of the virtual camera; the second spatial coordinate system contains the position of the virtual camera. The first spatial coordinates are mapped into the second spatial coordinate system to obtain the second spatial coordinates of the reference viewpoint depth map; the mapping transformations are as follows (Formulas 2-4):
[X_1, Y_1, Z_1]^T = R·[X, Y, Z]^T + T  (2)
R = R_v × R_c^(-1)  (3)
T = R_v × (T_c − T_v)  (4)
where R is a 3x3 rotation matrix and T is a translation vector, [X, Y, Z]^T is the coordinate of the pixel in the first spatial coordinate system, that is, the first spatial coordinate, [X_1, Y_1, Z_1]^T is the second spatial coordinate obtained after the first spatial coordinate is mapped into the second spatial coordinate system, R_c and T_c are the rotation matrix and translation of the preset reference viewpoint, and R_v and T_v are the rotation matrix and translation of the virtual viewpoint.
Step A3: inverse transform the second spatial coordinates to obtain a first mapping relationship between the second spatial coordinate system and the image coordinates of the virtual viewpoint;
The second spatial coordinates are inverse transformed, that is, mapped from the three-dimensional space coordinate system to a two-dimensional coordinate system: the second spatial coordinates are mapped onto the virtual viewpoint depth map, giving the first mapping relationship between the second spatial coordinate system and the image coordinates of the virtual viewpoint, i.e., the mapping relationship between the image of the virtual viewpoint and the virtual camera. The inverse transformation is as follows (Formula 5):
Z_1·[u_1, v_1, 1]^T = [f_x 0 c_x; 0 f_y c_y; 0 0 1]·[X_1, Y_1, Z_1]^T  (5)
where Z_1 is the depth value of the pixel, which can be determined from the internal parameters and position of the virtual camera, f_x, f_y, c_x, c_y are the internal parameters of the virtual camera, which may be consistent with those of the real camera, and [u_1, v_1, 1]^T is the homogeneous coordinate of the pixel (u_1, v_1) in the image coordinate system of the virtual viewpoint.
Step A4: map the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relationship to obtain the virtual viewpoint depth map.
From the first mapping relationship, the forward mapping relationship from the reference viewpoint depth map to the virtual viewpoint, that is, the mapping from point (u, v) to point (u_1, v_1), can be obtained, so that the points of the reference viewpoint depth map can be mapped to the position of the virtual viewpoint. In other words, according to this forward mapping relationship, the reference viewpoint depth map is mapped to the position of the virtual viewpoint, yielding the virtual viewpoint depth map.
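Combining steps A1-A4, the following hedged NumPy sketch (reusing the backproject helper above) performs the complete forward warp; the z-buffer policy for pixels that collide in the target view is an assumption of this sketch, not something the application specifies.

```python
def forward_map_depth(ref_depth, K, R, T):
    """Steps A1-A4: warp a reference depth map into the virtual view (Formulas 1-5).
    K is the shared 3x3 intrinsic matrix; R, T map reference-camera coordinates
    to virtual-camera coordinates as in Formulas 2-4."""
    h, w = ref_depth.shape
    pts = backproject(ref_depth, K[0, 0], K[1, 1], K[0, 2], K[1, 2]).reshape(-1, 3)
    pts_v = pts @ R.T + T                       # [X1, Y1, Z1]^T = R [X, Y, Z]^T + T
    virt = np.full((h, w), np.inf)
    ok = pts_v[:, 2] > 0                        # keep points in front of the virtual camera
    u1 = np.round(K[0, 0] * pts_v[ok, 0] / pts_v[ok, 2] + K[0, 2]).astype(int)
    v1 = np.round(K[1, 1] * pts_v[ok, 1] / pts_v[ok, 2] + K[1, 2]).astype(int)
    z1 = pts_v[ok, 2]
    inb = (u1 >= 0) & (u1 < w) & (v1 >= 0) & (v1 < h)
    for x, y, z in zip(u1[inb], v1[inb], z1[inb]):
        virt[y, x] = min(virt[y, x], z)         # z-buffer: nearest surface wins
    virt[np.isinf(virt)] = 0                    # unmapped pixels stay as holes
    return virt
```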
Step S20: perform bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
A preset algorithm is used to smooth the virtual viewpoint depth map with bilateral filtering. If part of the background texture in the reference viewpoint is occluded by a foreground object, it is invisible in the reference viewpoint texture map but visible in the virtual viewpoint texture map; in this case, hole regions may appear in the virtual viewpoint texture map. Depth discontinuities in the depth map are the cause of such holes, so the depth map is smoothed to reduce the hole regions. A preferred choice among the preset algorithms is the bilateral filtering algorithm, whose formula is as follows (Formula 6):
d(x, y) = Σ_(i,j) w(i, j)·d(i, j) / Σ_(i,j) w(i, j), with w(i, j) = exp(−((x−i)^2 + (y−j)^2)/(2σ_s^2))·exp(−(d(x, y) − d(i, j))^2/(2σ_r^2))  (6)
where d(i, j) is the depth value of a pixel in the neighborhood of pixel (x, y); the extent of (i, j) is determined by the filter radius, which can be determined experimentally and set in a user-defined way. For example, if the filter radius is set to 7 pixels, (i, j) ranges over the pixels within a radius of 7 pixels centered on (x, y). σ_s and σ_r are the standard deviations of the bilateral filter and can likewise be set in a user-defined way, for example to fixed values chosen to match a filter radius of 7 pixels.
The bilateral filtering method uses a weighted average: the depth value d(x, y) of the center pixel (x, y) is determined by the weighted average of the depth values d(i, j) within the neighborhood, with weights based on a Gaussian distribution. The weights of the bilateral filter take into account not only the Euclidean distance of a pixel, i.e., the influence of position on the center pixel, but also the distance between depth values. In regions where the depth value changes little, the spatial-domain weight plays the main role, which is equivalent to Gaussian smoothing; in edge regions of the image, where the depth value changes greatly, the range-domain weight of the pixels becomes larger, so that edge information is preserved and the generation of holes is reduced.
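As a hedged sketch, Formula 6 can be implemented directly as follows; the default radius and standard deviations are illustrative placeholders, not values fixed by the application.

```python
def bilateral_smooth(depth, radius=7, sigma_s=5.0, sigma_r=10.0):
    """Formula 6: edge-preserving smoothing of the virtual viewpoint depth map.
    sigma_s / sigma_r are the spatial and range standard deviations."""
    d = depth.astype(float)
    h, w = d.shape
    out = d.copy()
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = d[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w_s = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma_s ** 2))  # spatial weight
            w_r = np.exp(-((patch - d[y, x]) ** 2) / (2 * sigma_r ** 2))         # range weight
            out[y, x] = np.sum(w_s * w_r * patch) / np.sum(w_s * w_r)
    return out
```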
Step S30: according to the smooth virtual viewpoint depth map, establish a reverse mapping relationship between the virtual viewpoint and the reference viewpoint, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
Reverse mapping means that, according to the depth of the virtual viewpoint, the pixels of the virtual viewpoint depth map are mapped into the coordinate system of the virtual camera, then transformed by translation and rotation from the coordinate system of the virtual camera into the coordinate system of the real camera, and finally inverse transformed from the three-dimensional space coordinate system to the two-dimensional coordinate system, that is, mapped from the coordinate system of the real camera into the coordinate system of the reference viewpoint. The reverse mapping process is described by the following formulas (Formulas 7-9):
The pixels of the virtual viewpoint depth map are mapped into the coordinate system of the virtual camera according to the depth of the virtual viewpoint:
Z_1·[u_1, v_1, 1]^T = [f_x 0 c_x; 0 f_y c_y; 0 0 1]·[X_1, Y_1, Z_1]^T  (7)
where Formula 7 is the same as Formula 5: [u_1, v_1, 1]^T is the homogeneous coordinate of the pixel (u_1, v_1), Z_1 is the depth value of the pixel, [X_1, Y_1, Z_1]^T is the coordinate, in the virtual camera coordinate system, of the real object corresponding to the pixel (u_1, v_1), and f_x, f_y, c_x, c_y are the internal parameters of the virtual camera, namely the focal lengths and optical center coordinates in the x and y directions.
The coordinates are mapped from the coordinate system of the virtual camera into the coordinate system of the real camera:
[X', Y', Z']^T = R'·[X_1, Y_1, Z_1]^T + T'  (8)
where R' is a 3x3 rotation matrix, T' is a translation vector, [X_1, Y_1, Z_1]^T is the coordinate of the pixel in the virtual camera coordinate system, and [X', Y', Z']^T is the coordinate of the pixel in the real camera coordinate system.
The coordinates are mapped from the coordinate system of the real camera into the coordinate system of the reference viewpoint:
Z'·[u', v', 1]^T = [f_x 0 c_x; 0 f_y c_y; 0 0 1]·[X', Y', Z']^T  (9)
where Z' is the depth value at the real camera, and (u', v') is the coordinate of the pixel in the reference viewpoint coordinate system.
Establishing the reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints, includes steps B1-B2:
Step B1: reverse map the smooth virtual viewpoint depth map to obtain a second mapping relationship between the spatial coordinate system of the virtual viewpoint and the spatial coordinate system of the preset reference viewpoint;
By reverse mapping the smooth virtual viewpoint depth map, the second mapping relationship between the spatial coordinate system of the virtual viewpoint and that of the preset reference viewpoint can be obtained; the reference camera is the real camera. Through reverse mapping, the second mapping relationship between the spatial coordinate systems of the reference viewpoint and the virtual viewpoint is obtained: spatial coordinate systems are established for the virtual viewpoint and the preset reference viewpoint respectively, and reverse mapping finds the mapping relationship between the spatial coordinate system of the preset reference viewpoint and that of the virtual viewpoint.
Step B2: map the reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relationship to obtain the virtual viewpoint texture map.
From the second mapping relationship, the reverse mapping from a point (u_1, v_1) on the virtual viewpoint image to a point (u', v') on the reference viewpoint can be obtained; according to this reverse mapping relationship, the reference viewpoint texture map is mapped to the position of the virtual viewpoint, yielding the virtual viewpoint texture map. There may be multiple reference viewpoints and reference cameras; since each reference viewpoint corresponds to one reference viewpoint texture map, the reference viewpoint texture map of each reference viewpoint can be reverse mapped to obtain a virtual viewpoint texture map, thereby obtaining virtual viewpoint texture maps based on different preset reference viewpoints.
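The reverse mapping of steps B1-B2 admits a similar hedged sketch; here R_b and T_b denote the assumed virtual-to-real rigid transform (R' and T' of Formula 8), and nearest-neighbour texture sampling is an illustrative choice rather than a requirement of the application.

```python
def backward_map_texture(virt_depth, ref_texture, K, R_b, T_b):
    """Steps B1-B2 (Formulas 7-9): for each virtual-view pixel, follow the reverse
    mapping into the reference view and sample the reference texture there."""
    h, w = virt_depth.shape
    pts = backproject(virt_depth, K[0, 0], K[1, 1], K[0, 2], K[1, 2]).reshape(-1, 3)  # Formula 7
    pts_r = pts @ R_b.T + T_b                                                          # Formula 8
    tex = np.zeros((h, w) + ref_texture.shape[2:], dtype=ref_texture.dtype)
    ok = pts_r[:, 2] > 0
    u = np.round(K[0, 0] * pts_r[ok, 0] / pts_r[ok, 2] + K[0, 2]).astype(int)          # Formula 9
    v = np.round(K[1, 1] * pts_r[ok, 1] / pts_r[ok, 2] + K[1, 2]).astype(int)
    inb = (u >= 0) & (u < ref_texture.shape[1]) & (v >= 0) & (v < ref_texture.shape[0])
    ys, xs = np.divmod(np.flatnonzero(ok)[inb], w)
    tex[ys, xs] = ref_texture[v[inb], u[inb]]   # nearest-neighbour sampling (an assumption)
    return tex
```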
Step S40: perform weighted fusion on the virtual viewpoint texture maps, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
The virtual viewpoint texture maps are fused with weights. Texture maps obtained from different reference viewpoints differ in quality, in the orientation from which the object is captured, and in the parts occluded by foreground objects; therefore, the texture maps obtained from different preset reference viewpoints are given different weights before being fused, and the resulting target virtual viewpoint depth map retains the edge information of the depth map. After the weighted fusion of the virtual viewpoint texture maps, hole filling processing and foreground edge filtering processing are performed on the fused texture map: the holes in the depth map are filled, and the foreground edges are filtered to smooth them, ensuring the integrity of the edge information.
In this embodiment, the reference viewpoint depth map corresponding to the preset reference viewpoint is forward mapped to the position of the virtual viewpoint to obtain the virtual viewpoint depth map; the virtual viewpoint depth map is smoothed by a preset filtering method to obtain a smooth virtual viewpoint depth map that retains the edge information of the depth map; according to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint is established and virtual viewpoint texture maps based on different preset reference viewpoints are synthesized; the virtual viewpoint texture maps are fused with weights, and hole filling processing and foreground edge filtering processing are performed on the fused virtual viewpoint texture map. This reduces the ghosting phenomenon in the edge regions of the synthesized texture map and thus improves the quality of the virtual viewpoint texture map.
Further, referring to FIG. 3, a second embodiment of the present application provides a virtual viewpoint depth map processing method. Based on the embodiment shown in FIG. 2 above, this embodiment is a refinement of step S40 of the first embodiment.
Specifically, in step S40, performing weighted fusion on the virtual viewpoint texture maps includes steps S41-S44:
Step S41: determine a first position of the reference camera corresponding to the preset reference viewpoint and a second position of the virtual camera corresponding to the virtual viewpoint;
Likewise, in this embodiment the mappings between images and the pixels within them are coordinate-system mappings: plane coordinate systems are established with the reference viewpoint depth map and the virtual viewpoint depth map as references, and three-dimensional space coordinate systems are established with the virtual camera and the reference camera as reference frames, so that the mapping of images and pixels can be regarded as coordinate transformations between these systems. The weight of a virtual viewpoint texture map consists of two parts, of which the first part is determined by the positions of the virtual camera and the reference camera; therefore, the position of the reference camera, i.e., the first position, and the position of the virtual camera, i.e., the second position, need to be determined.
Step S42: determine a first weight according to the positional relationship between the first position and the second position;
According to the positional relationship between the first position and the second position, the distance between the reference camera and the virtual camera is determined, and the first weight is determined by this distance. For example, when the distance between the reference camera and the virtual camera is d_i, the first weight can be made inversely proportional to d_i.
Step S43: determine a depth value confidence at the virtual camera position, and determine a second weight according to the depth value confidence;
The weight of the virtual viewpoint texture map also includes a second weight, which is determined by the depth value confidence at the virtual camera position: the depth value confidence at the virtual camera position is determined first, and the second weight is then determined from it.
Determining the depth value confidence at the virtual camera position includes steps C1-C3:
Step C1: determine a first reference point on the virtual viewpoint depth map, and map the first reference point into the spatial coordinate system where the first position is located to obtain a second reference point;
A pixel is selected from the virtual viewpoint depth map as the first reference point; the virtual viewpoint depth map can be treated as a plane coordinate system, and the first reference point may be (x, y). The first reference point is mapped into the spatial coordinate system where the first position is located, which is the first spatial coordinate system: according to the depth value of the virtual camera, the first reference point (x, y) is mapped to the position of the reference camera, giving the second reference point, which may be (u, v).
Step C2: map the second reference point into the spatial coordinate system where the second position is located according to the position depth value of the reference camera, to obtain a third reference point;
According to the position depth value of the reference camera, the second reference point is mapped into the spatial coordinate system where the second position is located; the second position is the position of the virtual camera, and the spatial coordinate system where the second position is located is the second spatial coordinate system. According to the position depth value of the reference camera, the second reference point (u, v) is mapped into the second spatial coordinate system, giving the third reference point, which may be (x_1, y_1).
Step C3: determine the depth value confidence by a first preset algorithm according to the coordinates of the first reference point and the coordinates of the third reference point.
Using the first preset algorithm, the depth value confidence is obtained from the first reference point and the third reference point; the first preset algorithm may be the following formula (Formula 10):
dist = sqrt((x − x_1)^2 + (y − y_1)^2)  (10)
where dist is the depth value confidence measure. After the depth value confidence is determined, the second weight can be determined by the following formula (Formula 11):
conf_depth(x, y) = e^(−dist/5)  (11)
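For illustration, Formulas 10 and 11 together amount to the small routine below; the Euclidean form of dist follows the reconstruction of Formula 10 above and is an assumption of this sketch.

```python
def depth_confidence(x, y, x1, y1):
    """Formulas 10-11: the round-trip error between the first reference point (x, y)
    and the third reference point (x_1, y_1) yields the second weight."""
    dist = np.hypot(x - x1, y - y1)   # Formula 10 (Euclidean distance, as reconstructed)
    return np.exp(-dist / 5.0)        # Formula 11: conf_depth = e^(-dist/5)
```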
Step S44: fuse the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
After the two parts of the weight are determined, the virtual viewpoint texture maps based on different preset reference viewpoints are weighted and fused; the specific weighted fusion process may follow the formula below (Formula 12):
f(x, y) = Σ_i conf_i(x, y)·f_i(x, y) / Σ_i conf_i(x, y)  (12)
where f_i(x, y) is the value of a pixel of the i-th virtual viewpoint texture map, f(x, y) is the value of the pixel that corresponds to f_i(x, y) after weighted fusion, and conf_i(x, y) is the corresponding weight, which consists of two parts, so that i takes the values i = 1, 2. The weight may be calculated by the following formula (Formula 13):
conf_i(x, y) = conf_cam_i(x, y) · conf_depth_i(x, y)  (13)
where conf_cam_i(x, y) is the first weight, determined by the distance between the reference camera and the virtual camera, and conf_depth_i(x, y) is the second weight, determined by the depth value confidence; here i may run up to the number of cameras involved. The first weight may be calculated by the following formula (Formula 14):
conf_cam_i(x, y) = (1/d_i) / Σ_j (1/d_j)  (14)
where d_i is the distance between the i-th virtual camera and the reference camera.
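A compact sketch of the fusion of Formulas 12-14 follows; the normalized inverse-distance form of conf_cam_i is an assumption consistent with the inverse proportionality described in step S42, and the argument names are illustrative.

```python
def fuse_textures(textures, cam_dists, depth_confs):
    """Formulas 12-14: weighted fusion of virtual viewpoint texture maps.
    textures: per-view (h, w) arrays; cam_dists: distances d_i between each
    reference camera and the virtual camera; depth_confs: per-view conf_depth maps."""
    inv = 1.0 / np.asarray(cam_dists, dtype=float)
    conf_cam = inv / inv.sum()                       # Formula 14: weight ~ 1/d_i (normalized)
    num = np.zeros_like(textures[0], dtype=float)
    den = np.zeros_like(textures[0], dtype=float)
    for tex, c_cam, c_dep in zip(textures, conf_cam, depth_confs):
        conf = c_cam * c_dep                         # Formula 13
        num += conf * tex
        den += conf
    return num / np.maximum(den, 1e-8)               # Formula 12
```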
In step S40, performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map includes steps S45-S49:
Step S45: select a first center pixel from the virtual viewpoint texture map, and create a first window of a preset size;
Hole filling processing is performed on the fused virtual viewpoint texture map; a preferred hole filling method is joint bilateral filtering. A pixel is selected as the first center pixel, which may also be called the current pixel, and a window of a preset size is created centered on the current pixel, for example a first window of size N*N containing N*N pixels, where N may be 30 or 40. Apart from the first center pixel, the other pixels in the first window are the neighborhood pixels of the first center pixel.
Step S46: perform filling calculation on the values of the pixels in the first window using a second preset algorithm, so as to carry out the hole filling processing;
The pixels in the first window are processed with the second preset algorithm to fill holes: the calculation finds the discontinuous points in the depth map and then fills in the affected pixels. The filling may use the average of the pixels in the neighborhood of a hole, or the weighted average of the neighborhood pixels, as the depth value of the hole pixel. The second preset algorithm may be given by the following formulas (Formulas 15-18):
img(x, y) = Σ_(i,j)∈W w(i, j)·img(i, j) / Σ_(i,j)∈W w(i, j)  (15)
disp(i, j) = fB / depth(i, j)  (16)
drange = maxdisp − mindisp + 1  (17)
maxdisp = fB / depth_min, mindisp = fB / depth_max  (18)
where img(x, y) is a pixel in the first window, disp(i, j) is the disparity value of pixel (i, j), depth(i, j) is the depth value of pixel (i, j), and the value of fB is determined from the internal parameters and position of the reference camera. drange is the range between the maximum and minimum disparity values and is determined by them; the maximum and minimum disparity values are determined from the fB value of the corresponding reference camera, and the maximum and minimum depth values are determined by the shooting scene. dtr is a parameter used to narrow the disparity range, whose value can be set in a user-defined way, for example dtr = 0.01666667.
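The disparity-domain bookkeeping of Formulas 16-19 is small enough to state directly as a hedged sketch; depth_min and depth_max stand for the scene's minimum and maximum depths, and the function name is illustrative.

```python
def disparity_quantities(fB, depth_min, depth_max, dtr=0.01666667):
    """Formulas 16-19: disparity-domain quantities used by the hole-filling window
    and, later, by the edge threshold of step D2."""
    maxdisp = fB / depth_min                # Formula 18: nearest depth -> largest disparity
    mindisp = fB / depth_max
    drange = maxdisp - mindisp + 1          # Formula 17
    dthresh = dtr * drange                  # Formula 19
    return drange, dthresh
```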
Step S47: determine the foreground edge region of the virtual viewpoint texture map, and mark the foreground edge region;
After the holes in the virtual viewpoint texture map are filled, the edge region of the virtual viewpoint texture map is determined and marked; the edge region is marked so that it can be filtered and smoothed, preventing ghosting or defects in the edge region.
Determining the foreground edge region of the virtual viewpoint texture map and marking the foreground edge region includes steps D1-D3:
Step D1: select a reference pixel, and determine the absolute gradient value of the depth value of the reference pixel;
A reference pixel is selected at an edge position of the virtual viewpoint depth map, and its gradient value is determined. This can be done in both the horizontal and vertical directions: the horizontal and vertical gradient values of the reference pixel are determined and their absolute values calculated. For example, if the virtual viewpoint depth map is continuous, the absolute gradient value can be obtained by differentiating the edge region; if the virtual viewpoint depth map is discontinuous, the absolute gradient value can be obtained by forward, backward, or central difference quotients.
Step D2: if the absolute gradient value is greater than a preset threshold, determine that the reference pixel is an edge pixel;
If the absolute gradient value is greater than the preset threshold, the reference pixel is determined to be an edge pixel. The preset threshold is determined by the disparity range drange and the parameter dtr, and can be calculated by the following formula (Formula 19):
dthresh = dtr * drange  (19)
where dthresh is the preset threshold.
Step D3: extend a preset number of pixels around the edge pixel to determine the foreground edge region, and mark the foreground edge region.
After the reference pixel is determined to be an edge pixel, a preset number of pixels around it are included, for example 4 pixels in each of the horizontal and vertical directions; the extended region is the foreground edge region, which is then marked.
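Steps D1-D3 can be sketched as follows; difference quotients stand in for the derivative, and the 4-pixel expansion matches the example above. The function name and defaults are assumptions of this sketch.

```python
def mark_foreground_edges(depth, dthresh, expand=4):
    """Steps D1-D3: flag pixels whose horizontal or vertical depth difference
    quotient exceeds dthresh, then dilate the flags by `expand` pixels."""
    d = depth.astype(float)
    gx = np.abs(np.diff(d, axis=1, prepend=d[:, :1]))   # horizontal gradient magnitude
    gy = np.abs(np.diff(d, axis=0, prepend=d[:1, :]))   # vertical gradient magnitude
    edges = (gx > dthresh) | (gy > dthresh)
    mask = np.zeros_like(edges)
    h, w = d.shape
    for y, x in zip(*np.nonzero(edges)):                # expand around every edge pixel
        mask[max(0, y - expand):min(h, y + expand + 1),
             max(0, x - expand):min(w, x + expand + 1)] = True
    return mask
```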
Step S48: select a second center pixel from the foreground edge region, and create a second window of a preset size;
A second center pixel is selected from the foreground edge region, and a second window of a preset size is created centered on it; for example, extending 2 pixels in each direction around the second center pixel creates a 5*5 window as the second window. The second window is created in order to perform filtering processing on the foreground edge region.
Step S49: perform filtering calculation on the values of the pixels in the second window using a third preset algorithm, so as to filter the foreground edge region.
The pixels in the second window are processed with the third preset algorithm to filter the foreground edge region; this may replace the value of a pixel with the average or the weighted average of the pixels in its neighborhood window, achieving the filtering effect. Taking the weighted average of the pixels in the neighborhood window as the replacement value as an example, the formula of the third preset algorithm may be (Formula 20):
f(x, y) = Σ_(i,j) w(i, j)·f(i, j) / Σ_(i,j) w(i, j)  (20)
where (x, y) is the selected pixel to be replaced, f(x, y) is the pixel value of the replaced pixel, and f(i, j) are the pixel values of the pixels in the neighborhood of point (x, y).
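One hedged reading of Formula 20 is sketched below; the Gaussian weights and the single-channel texture are illustrative assumptions made for brevity.

```python
def filter_edge_region(texture, mask, radius=2, sigma=2.0):
    """Formula 20 (one reading): replace each marked pixel with a weighted average
    of its neighbourhood window."""
    t = texture.astype(float)
    out = t.copy()
    h, w = t.shape
    for y, x in zip(*np.nonzero(mask)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        yy, xx = np.mgrid[y0:y1, x0:x1]
        wgt = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))  # w(i, j)
        out[y, x] = np.sum(wgt * t[y0:y1, x0:x1]) / np.sum(wgt)
    return out
```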
In this embodiment, the first weight is determined from the positional relationship between the virtual camera and the reference camera, and the second weight is determined from the depth value confidence of the virtual camera; the virtual viewpoint texture maps are weighted and fused accordingly. Center pixels are selected from the fused virtual viewpoint texture map to create a first window and a second window of preset sizes, thereby determining the neighborhood pixels of each center pixel, and preset algorithms are used to calculate the values of these neighborhood pixels, performing hole filling and foreground edge filtering on the fused virtual viewpoint texture map. This reduces holes and ghosting in the virtual viewpoint texture map and improves its subjective and objective quality.
Referring to FIG. 4, a first embodiment of the present application provides a virtual viewpoint depth map processing apparatus, which includes:
a forward mapping module 10, configured to forward map a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint to obtain a virtual viewpoint depth map;
a post-processing module 20, configured to perform bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
a reverse mapping module 30, configured to establish, according to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
a fusion processing module 40, configured to perform weighted fusion on the virtual viewpoint texture maps, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
It should be noted that, as used herein, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or system that comprises the element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A virtual viewpoint depth map processing method, wherein the virtual viewpoint depth map processing method comprises the following steps:
    forward mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to a position of a virtual viewpoint to obtain a virtual viewpoint depth map;
    performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
    establishing, according to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
    performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
  2. The virtual viewpoint depth map processing method according to claim 1, wherein the step of forward mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map comprises:
    establishing a first spatial coordinate system according to internal parameters of a reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map into the first spatial coordinate system to obtain first spatial coordinates of the reference viewpoint depth map;
    establishing a second spatial coordinate system according to internal parameters of a virtual camera corresponding to the virtual viewpoint, and mapping the first spatial coordinates into the second spatial coordinate system to obtain second spatial coordinates of the reference viewpoint depth map;
    inverse transforming the second spatial coordinates to obtain a first mapping relationship between the second spatial coordinate system and image coordinates of the virtual viewpoint;
    mapping the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relationship to obtain the virtual viewpoint depth map.
  3. The virtual viewpoint depth map processing method according to claim 1, wherein the step of establishing, according to the smooth virtual viewpoint depth map, the reverse mapping relationship between the virtual viewpoint and the reference viewpoint so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints comprises:
    reverse mapping the smooth virtual viewpoint depth map to obtain a second mapping relationship between a spatial coordinate system of the virtual viewpoint and a spatial coordinate system of the preset reference viewpoint;
    mapping a reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relationship to obtain the virtual viewpoint texture map.
  4. The virtual viewpoint depth map processing method according to claim 1, wherein the step of performing weighted fusion on the virtual viewpoint texture maps comprises:
    determining a first position of a reference camera corresponding to the preset reference viewpoint and a second position of a virtual camera corresponding to the virtual viewpoint;
    determining a first weight according to a positional relationship between the first position and the second position;
    determining a depth value confidence of the virtual camera position, and determining a second weight according to the depth value confidence;
    fusing the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
  5. The virtual viewpoint depth map processing method according to claim 4, wherein the step of determining the depth value confidence of the virtual camera position comprises:
    determining a first reference point on the virtual viewpoint depth map, and mapping the first reference point into a spatial coordinate system where the first position is located to obtain a second reference point;
    mapping the second reference point into a spatial coordinate system where the second position is located according to a position depth value of the reference camera to obtain a third reference point;
    determining the depth value confidence by a first preset algorithm according to coordinates of the first reference point and coordinates of the third reference point.
  6. The virtual viewpoint depth map processing method according to claim 1, wherein the step of performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map comprises:
    selecting a first center pixel from the virtual viewpoint texture map, and creating a first window of a preset size;
    performing filling calculation on values of pixels in the first window by a second preset algorithm, so as to carry out the hole filling processing;
    determining a foreground edge region of the virtual viewpoint texture map, and marking the foreground edge region;
    selecting a second center pixel from the foreground edge region, and creating a second window of a preset size;
    performing filtering calculation on values of pixels in the second window by a third preset algorithm, so as to filter the foreground edge region.
  7. The virtual viewpoint depth map processing method according to claim 6, wherein the step of determining the foreground edge region of the virtual viewpoint texture map and marking the foreground edge region comprises:
    selecting a reference pixel, and determining an absolute gradient value of a depth value of the reference pixel;
    if the absolute gradient value is greater than a preset threshold, determining that the reference pixel is an edge pixel;
    extending a preset number of pixels around the edge pixel to determine the foreground edge region, and marking the foreground edge region.
  8. A virtual viewpoint depth map processing apparatus, wherein the virtual viewpoint depth map processing apparatus comprises:
    a forward mapping module, configured to forward map a reference viewpoint depth map corresponding to a preset reference viewpoint to a position of a virtual viewpoint to obtain a virtual viewpoint depth map;
    a post-processing module, configured to perform bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
    a reverse mapping module, configured to establish, according to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
    a fusion processing module, configured to perform weighted fusion on the virtual viewpoint texture maps, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
  9. A virtual viewpoint depth map processing device, wherein the virtual viewpoint depth map processing device comprises a memory, a processor, and computer-readable instructions stored on the memory for implementing a virtual viewpoint depth map processing method, the processor being configured to execute the computer-readable instructions to implement the following steps:
    forward mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to a position of a virtual viewpoint to obtain a virtual viewpoint depth map;
    performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
    establishing, according to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
    performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
  10. The virtual viewpoint depth map processing device according to claim 9, wherein the step of forward mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map comprises:
    establishing a first spatial coordinate system according to internal parameters of a reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map into the first spatial coordinate system to obtain first spatial coordinates of the reference viewpoint depth map;
    establishing a second spatial coordinate system according to internal parameters of a virtual camera corresponding to the virtual viewpoint, and mapping the first spatial coordinates into the second spatial coordinate system to obtain second spatial coordinates of the reference viewpoint depth map;
    inverse transforming the second spatial coordinates to obtain a first mapping relationship between the second spatial coordinate system and image coordinates of the virtual viewpoint;
    mapping the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relationship to obtain the virtual viewpoint depth map.
  11. The virtual viewpoint depth map processing device according to claim 9, wherein the step of establishing, according to the smooth virtual viewpoint depth map, the reverse mapping relationship between the virtual viewpoint and the reference viewpoint so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints comprises:
    reverse mapping the smooth virtual viewpoint depth map to obtain a second mapping relationship between a spatial coordinate system of the virtual viewpoint and a spatial coordinate system of the preset reference viewpoint;
    mapping a reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relationship to obtain the virtual viewpoint texture map.
  12. The virtual viewpoint depth map processing device according to claim 9, wherein the step of performing weighted fusion on the virtual viewpoint texture maps comprises:
    determining a first position of a reference camera corresponding to the preset reference viewpoint and a second position of a virtual camera corresponding to the virtual viewpoint;
    determining a first weight according to a positional relationship between the first position and the second position;
    determining a depth value confidence of the virtual camera position, and determining a second weight according to the depth value confidence;
    fusing the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
  13. The virtual viewpoint depth map processing device according to claim 12, wherein the step of determining the depth value confidence of the virtual camera position comprises:
    determining a first reference point on the virtual viewpoint depth map, and mapping the first reference point into a spatial coordinate system where the first position is located to obtain a second reference point;
    mapping the second reference point into a spatial coordinate system where the second position is located according to a position depth value of the reference camera to obtain a third reference point;
    determining the depth value confidence by a first preset algorithm according to coordinates of the first reference point and coordinates of the third reference point.
  14. A storage medium, wherein computer-readable instructions for implementing a virtual viewpoint depth map processing method are stored on the storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps of the virtual viewpoint depth map processing method:
    forward mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to a position of a virtual viewpoint to obtain a virtual viewpoint depth map;
    performing bilateral filtering and smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
    establishing, according to the smooth virtual viewpoint depth map, a reverse mapping relationship between the virtual viewpoint and the reference viewpoint, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
    performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
  15. The storage medium according to claim 14, wherein the step of forward-mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map comprises:
    establishing a first spatial coordinate system according to internal parameters of a reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map into the first spatial coordinate system to obtain first spatial coordinates of the reference viewpoint depth map;
    establishing a second spatial coordinate system according to internal parameters of a virtual camera corresponding to the virtual viewpoint, and mapping the first spatial coordinates into the second spatial coordinate system to obtain second spatial coordinates of the reference viewpoint depth map;
    performing an inverse transformation on the second spatial coordinates to obtain a first mapping relationship between the second spatial coordinate system and the image coordinates of the virtual viewpoint;
    mapping the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relationship, to obtain the virtual viewpoint depth map.
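
    A plausible NumPy rendering of this forward mapping, under the assumption that the two coordinate systems are additionally related by a relative pose (R, t) alongside the recited internal parameters K_r and K_v; a z-buffer resolves source pixels that land on the same target pixel.

        import numpy as np

        def forward_map_depth(depth_ref, K_r, K_v, R, t):
            h, w = depth_ref.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T      # 3 x N homogeneous pixels
            cam_r = np.linalg.inv(K_r) @ pix * depth_ref.reshape(1, -1)            # first spatial coordinates
            cam_v = R @ cam_r + t.reshape(3, 1)                                    # second spatial coordinates
            uvw = K_v @ cam_v                                                      # back toward image coordinates
            z = uvw[2]
            u2 = np.round(uvw[0] / z).astype(int)
            v2 = np.round(uvw[1] / z).astype(int)
            ok = (z > 0) & (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
            depth_virt = np.full((h, w), np.inf, dtype=np.float32)
            np.minimum.at(depth_virt, (v2[ok], u2[ok]), z[ok].astype(np.float32))  # keep the nearest surface
            depth_virt[np.isinf(depth_virt)] = 0.0                                 # unmapped pixels become holes
            return depth_virt

    Rounding to integer target pixels leaves cracks and holes in the warped depth map, which is exactly what the bilateral smoothing of the next step is there to repair.
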
  16. The storage medium according to claim 14, wherein the step of establishing, according to the smoothed virtual viewpoint depth map, the reverse mapping relationship between the virtual viewpoint and the reference viewpoint so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints comprises:
    reverse-mapping the smoothed virtual viewpoint depth map to obtain a second mapping relationship between the spatial coordinate system of the virtual viewpoint and the spatial coordinate system of the preset reference viewpoint;
    mapping a reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relationship, to obtain the virtual viewpoint texture map.
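
    A sketch of the reverse (backward) warp, assuming both views share one resolution and that R_vr, t_vr (hypothetical names) carry virtual-camera coordinates into the reference camera's frame; nearest-neighbor sampling keeps it short, where bilinear sampling would be the more careful choice.

        import numpy as np

        def backward_warp_texture(tex_ref, depth_virt, K_v, K_r, R_vr, t_vr):
            h, w = depth_virt.shape
            hr, wr = tex_ref.shape[:2]
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
            cam_v = np.linalg.inv(K_v) @ pix * depth_virt.reshape(1, -1)  # virtual-view 3D points
            cam_r = R_vr @ cam_v + t_vr.reshape(3, 1)                     # second mapping relationship
            uvw = K_r @ cam_r
            z = np.where(uvw[2] != 0, uvw[2], 1e-9)
            u2 = np.clip(np.round(uvw[0] / z).astype(int), 0, wr - 1)
            v2 = np.clip(np.round(uvw[1] / z).astype(int), 0, hr - 1)
            tex_virt = tex_ref[v2, u2].reshape(h, w, -1)                  # sample the reference texture
            tex_virt[depth_virt == 0] = 0                                 # holes stay empty for claim 19
            return tex_virt
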
  17. The storage medium according to claim 14, wherein the step of performing weighted fusion on the virtual viewpoint texture maps comprises:
    determining a first position of the reference camera corresponding to the preset reference viewpoint, and a second position of the virtual camera corresponding to the virtual viewpoint;
    determining a first weight according to the positional relationship between the first position and the second position;
    determining a depth value confidence of the virtual camera position, and determining a second weight according to the depth value confidence;
    fusing, according to the first weight and the second weight, the virtual viewpoint texture maps based on different preset reference viewpoints.
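
    The claim leaves both weighting functions open; one common reading — inverse camera-baseline distance for the first weight, the claim-18 confidence for the second, combined multiplicatively — is sketched below. All names and the 1e-6 guard are assumptions.

        import numpy as np

        def fusion_weights(ref_positions, virt_position, confidences):
            w = []
            for pos, conf in zip(ref_positions, confidences):
                dist = np.linalg.norm(np.asarray(pos, float) - np.asarray(virt_position, float))
                w.append(conf / (dist + 1e-6))  # first weight x second weight
            w = np.asarray(w)
            return w / w.sum()                  # normalize so the fused texture keeps its range

        # e.g. weights = fusion_weights([pos_left, pos_right], pos_virtual, [0.9, 0.7])
        #      fused   = weights[0] * tex_left + weights[1] * tex_right
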
  18. The storage medium according to claim 17, wherein the step of determining the depth value confidence of the virtual camera position comprises:
    determining a first reference point from the virtual viewpoint depth map, and mapping the first reference point to the spatial coordinate system of the first position to obtain a second reference point;
    mapping the second reference point to the spatial coordinate system of the second position according to the position depth value of the reference camera, to obtain a third reference point;
    determining the depth value confidence by using a first preset algorithm, according to the coordinates of the first reference point and the coordinates of the third reference point.
  19. The storage medium according to claim 14, wherein the step of performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map comprises:
    selecting a first center pixel from the virtual viewpoint texture map, and creating a first window of a preset size;
    performing a filling calculation on the values of the pixels in the first window by using a second preset algorithm, so as to perform hole filling processing;
    determining a foreground edge region of the virtual viewpoint texture map, and marking the foreground edge region;
    selecting a second center pixel from the foreground edge region, and creating a second window of a preset size;
    performing a filtering calculation on the values of the pixels in the second window by using a third preset algorithm, so as to filter the foreground edge region.
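
    Since the second and third preset algorithms are not pinned down, the sketch below fills each hole pixel with the mean of the valid pixels in its window and filters each marked edge pixel with the median of its window — two stand-in kernels, not the disclosed ones.

        import numpy as np

        def window_fill_and_filter(tex, hole_mask, edge_mask, win=5):
            out = tex.copy()
            h, w = hole_mask.shape
            r = win // 2
            for mask, reducer in ((hole_mask, np.mean), (edge_mask, np.median)):
                for y, x in zip(*np.nonzero(mask)):    # each center pixel gets its own window
                    y0, y1 = max(0, y - r), min(h, y + r + 1)
                    x0, x1 = max(0, x - r), min(w, x + r + 1)
                    patch = tex[y0:y1, x0:x1]
                    valid = ~hole_mask[y0:y1, x0:x1]   # never average holes into the result
                    if valid.any():
                        out[y, x] = reducer(patch[valid], axis=0)
            return out
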
  20. The storage medium according to claim 19, wherein the step of determining the foreground edge region of the virtual viewpoint texture map and marking the foreground edge region comprises:
    selecting a reference pixel, and determining the absolute value of the gradient of the depth value of the reference pixel;
    determining the reference pixel to be an edge pixel if the absolute value of the gradient is greater than a preset threshold;
    extending a preset number of pixels around the edge pixel to determine the foreground edge region, and marking the foreground edge region.
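
    A compact version of this marking step, assuming the gradient magnitude serves as the "gradient absolute value" and morphological dilation implements the pixel expansion; grad_thresh and expand are illustrative parameters.

        import numpy as np
        from scipy import ndimage

        def mark_foreground_edges(depth, grad_thresh, expand=3):
            gy, gx = np.gradient(depth.astype(np.float32))  # depth-value gradients
            edges = np.hypot(gx, gy) > grad_thresh          # threshold on gradient magnitude
            # Extend the mark a preset number of pixels around each edge pixel.
            return ndimage.binary_dilation(edges, iterations=expand)
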
PCT/CN2021/076924 2020-12-04 2021-02-19 Virtual viewpoint depth map processing method, device, and apparatus, and storage medium WO2022116397A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011419908.8A CN112581389A (en) 2020-12-04 2020-12-04 Virtual viewpoint depth map processing method, equipment, device and storage medium
CN202011419908.8 2020-12-04

Publications (1)

Publication Number Publication Date
WO2022116397A1 (en)

Family

ID=75127661

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076924 WO2022116397A1 (en) 2020-12-04 2021-02-19 Virtual viewpoint depth map processing method, device, and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN112581389A (en)
WO (1) WO2022116397A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538316B (en) * 2021-08-24 2023-08-22 北京奇艺世纪科技有限公司 Image processing method, device, terminal equipment and readable storage medium
CN113837978B (en) * 2021-09-28 2024-04-05 北京奇艺世纪科技有限公司 Image synthesis method, device, terminal equipment and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130242043A1 (en) * 2012-03-19 2013-09-19 Gwangju Institute Of Science And Technology Depth video filtering method and apparatus
CN103927717A (en) * 2014-03-28 2014-07-16 上海交通大学 Depth image recovery method based on improved bilateral filters
CN106791774A (en) * 2017-01-17 2017-05-31 湖南优象科技有限公司 Virtual visual point image generating method based on depth map
CN106998460A (en) * 2017-05-16 2017-08-01 合肥工业大学 A kind of hole-filling algorithm based on depth transition and depth total variational
CN111385554A (en) * 2020-03-28 2020-07-07 浙江工业大学 High-image-quality virtual viewpoint drawing method of free viewpoint video

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908162A (en) * 2022-10-28 2023-04-04 中山职业技术学院 Virtual viewpoint generation method and system based on background texture recognition
CN115908162B (en) * 2022-10-28 2023-07-04 中山职业技术学院 Virtual viewpoint generation method and system based on background texture recognition

Also Published As

Publication number Publication date
CN112581389A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
WO2022116397A1 (en) Virtual viewpoint depth map processing method, device, and apparatus, and storage medium
US11012620B2 (en) Panoramic image generation method and device
US10789671B2 (en) Apparatus, system, and method of controlling display, and recording medium
CN111083380B (en) Video processing method, electronic equipment and storage medium
TWI712918B (en) Method, device and equipment for displaying images of augmented reality
US8854359B2 (en) Image processing apparatus, image processing method, storage medium, and image processing system
WO2020192706A1 (en) Object three-dimensional model reconstruction method and device
WO2020098530A1 (en) Picture rendering method and apparatus, and storage medium and electronic apparatus
WO2021098544A1 (en) Image processing method and apparatus, storage medium and electronic device
US20110158509A1 (en) Image stitching method and apparatus
US10863077B2 (en) Image photographing method, apparatus, and terminal
CN106981078B (en) Sight line correction method and device, intelligent conference terminal and storage medium
GB2567530A (en) Virtual reality parallax correction
KR102461232B1 (en) Image processing method and apparatus, electronic device, and storage medium
US20220327674A1 (en) Image fusion method and apparatus, storage medium, and terminal
CN113301320B (en) Image information processing method and device and electronic equipment
WO2021035485A1 (en) Shooting anti-shake method and apparatus, terminal and storage medium
TW202117384A (en) Method of providing dolly zoom effect and electronic device
CN107077719B (en) Perspective correction based on depth map in digital photos
CN114640833A (en) Projection picture adjusting method and device, electronic equipment and storage medium
US20210218948A1 (en) Depth image obtaining method, image capture device, and terminal
US20220070426A1 (en) Restoration of the fov of images for stereoscopic rendering
CN115174878A (en) Projection picture correction method, apparatus and storage medium
CN113724141B (en) Image correction method and device and electronic equipment
CN114119701A (en) Image processing method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21899441

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21899441

Country of ref document: EP

Kind code of ref document: A1