CN112581389A - Virtual viewpoint depth map processing method, equipment, device and storage medium


Info

Publication number
CN112581389A
Authority
CN
China
Prior art keywords
virtual viewpoint
depth map
viewpoint
virtual
mapping
Prior art date
Legal status
Pending
Application number
CN202011419908.8A
Other languages
Chinese (zh)
Inventor
王荣刚
刘香凝
王振宇
蔡砚刚
Current Assignee
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Application filed by Peking University Shenzhen Graduate School
Priority to CN202011419908.8A
Priority to PCT/CN2021/076924 (published as WO2022116397A1)
Publication of CN112581389A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20028 Bilateral filtering
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a virtual viewpoint depth map processing method, equipment, device and storage medium. The method maps a reference viewpoint depth map corresponding to a preset reference viewpoint forward to the position of the virtual viewpoint to obtain a virtual viewpoint depth map, smooths the virtual viewpoint depth map with a preset filtering method while retaining the edge information of the depth map, and maps the smoothed virtual viewpoint depth map backward to the positions of the preset reference viewpoints to synthesize virtual viewpoint texture maps based on different preset reference viewpoints. The virtual viewpoint texture maps are then weighted and fused, and hole filling processing and foreground edge filtering processing are performed on the fused virtual viewpoint texture map, thereby improving the quality of the virtual viewpoint depth map.

Description

Virtual viewpoint depth map processing method, equipment, device and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, an apparatus, a device, and a storage medium for processing a depth map of a virtual viewpoint.
Background
Depth Image Based Rendering (DIBR) is an important technology in the field of virtual viewpoint synthesis: a view from any virtual viewpoint can be obtained from the texture map and depth map of a reference viewpoint through three-dimensional coordinate transformations. When the texture map of a virtual viewpoint is synthesized with DIBR, part of the background is occluded by foreground objects, so it is invisible in the reference viewpoint but visible in the virtual viewpoint. In this case hole regions may appear in the texture map of the virtual viewpoint, and the depth-discontinuous regions in the depth map are the cause of these holes. When the holes are handled by smoothing the depth map with Gaussian filtering, mean filtering, median filtering and the like, the edge information of the depth map is not retained, so ghost artifacts appear in the edge regions of the synthesized texture map, and the quality of the virtual viewpoint depth map is therefore poor.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a virtual viewpoint depth map processing method, equipment, device and storage medium, so as to solve the technical problem that existing depth map smoothing methods cannot retain the edge information of the depth map, which causes ghost artifacts in the edge regions of the synthesized texture map and poor quality of the virtual viewpoint depth map.
In order to achieve the above object, the present invention provides a virtual viewpoint depth map processing method, including:
mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint in a forward direction to obtain a virtual viewpoint depth map;
carrying out bilateral filtering smoothing treatment on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
establishing a reverse mapping relation between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
and performing weighted fusion on the virtual viewpoint texture map, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Optionally, the step of mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint in the forward direction to obtain the virtual viewpoint depth map includes:
establishing a first space coordinate system according to the internal parameters of a reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map to the first space coordinate system to obtain a first space coordinate of the reference viewpoint depth map;
establishing a second space coordinate system according to the internal parameters of the virtual camera corresponding to the virtual viewpoint, and mapping the first space coordinate to the second space coordinate system to obtain a second space coordinate of the reference viewpoint depth map;
performing inverse transformation on the second space coordinate to obtain a first mapping relation between the second space coordinate system and the image coordinate of the virtual viewpoint;
and mapping the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relation to obtain the virtual viewpoint depth map.
Optionally, the step of establishing a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smoothed virtual viewpoint depth map to synthesize a virtual viewpoint texture map based on different preset reference viewpoints includes:
performing reverse mapping on the smoothed virtual viewpoint depth map to obtain a second mapping relation between the space coordinate system of the virtual viewpoint and the space coordinate system of the preset reference viewpoint;
and mapping the reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relation to obtain the virtual viewpoint texture map.
Optionally, the step of performing weighted fusion on the virtual viewpoint texture map includes:
determining a first position of a reference camera corresponding to the preset reference viewpoint and a second position of a virtual camera corresponding to the virtual viewpoint;
determining a first weight according to the position relation between the first position and the second position;
determining a depth value confidence of the virtual camera position, and determining a second weight according to the depth value confidence;
and fusing the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
Optionally, the step of determining a depth value confidence of the virtual camera position includes:
determining a first reference point from the virtual viewpoint depth map, and mapping the first reference point to a space coordinate system where the first position is located to obtain a second reference point;
mapping the second reference point to a space coordinate system where the second position is located according to the position depth value of the reference camera to obtain a third reference point;
and determining the confidence coefficient of the depth value by utilizing a first preset algorithm according to the coordinates of the first reference point and the coordinates of the third reference point.
Optionally, the step of performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map includes:
selecting a first central pixel point from the virtual viewpoint texture map, and creating a first window with a preset size;
filling calculation is carried out on the values of the pixel points in the first window by utilizing a second preset algorithm so as to carry out hole filling processing;
determining a foreground edge area of the virtual viewpoint texture map, and marking the foreground edge area;
selecting a second center pixel point from the foreground edge area, and creating a second window with a preset size;
and performing filtering calculation on the values of the pixel points in the second window by using a third preset algorithm so as to perform filtering processing on the foreground edge area.
Optionally, the step of determining a foreground edge region of the virtual viewpoint texture map and marking the foreground edge region includes:
selecting a reference pixel point, and determining the gradient absolute value of the depth value of the reference pixel point;
if the absolute value of the gradient is larger than a preset threshold value, determining the reference pixel point as an edge pixel point;
and expanding a preset number of pixel points to the periphery of the edge pixel points to determine the foreground edge area, and marking the foreground edge area.
In addition, to achieve the above object, the present invention further provides a virtual viewpoint depth map processing apparatus, including:
the forward mapping module is used for mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint in a forward direction to obtain a virtual viewpoint depth map;
the post-processing module is used for carrying out bilateral filtering smoothing processing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
the inverse mapping module is used for establishing an inverse mapping relation between the virtual viewpoint and the reference viewpoint according to the smooth virtual viewpoint depth map so as to synthesize a virtual viewpoint texture map based on different preset reference viewpoints;
and the fusion processing module is used for performing weighted fusion on the virtual viewpoint texture map, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Compared with the prior art, in which existing algorithms cannot retain the edge information of the depth map when smoothing it and filling holes, so that ghost artifacts appear in the edge regions of the synthesized texture map and the quality of the virtual viewpoint depth map is poor, the present invention forward maps the reference viewpoint depth map corresponding to a preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint depth map, and performs bilateral filtering smoothing on the virtual viewpoint depth map, which effectively smooths the virtual viewpoint depth map while retaining its edge information. A reverse mapping relation between the virtual viewpoint and the reference viewpoint is then established from the smoothed virtual viewpoint depth map to synthesize virtual viewpoint texture maps based on different preset reference viewpoints, the virtual viewpoint texture maps are weighted and fused, and hole filling processing and foreground edge filtering processing are performed on the fused virtual viewpoint texture map. In this way the virtual viewpoint depth map is effectively smoothed while the edge information of the depth map is retained, and after the holes in the virtual viewpoint texture map are filled, the foreground edges of the texture map are filtered, which improves the quality of the virtual viewpoint texture map and in turn the quality of the virtual viewpoint depth map.
Drawings
FIG. 1 is a schematic diagram of a terminal/device structure of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a virtual viewpoint depth map processing method according to a first embodiment of the present invention;
FIG. 3 is a detailed flowchart of step S40 in FIG. 2;
FIG. 4 is a schematic diagram of a device for processing a virtual viewpoint depth map according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted only to facilitate the description of the present invention and have no specific meaning in themselves; "module", "component", and "unit" may therefore be used interchangeably.
The terminal (also called the device or terminal equipment) in the embodiments of the present invention may be a PC, or a mobile terminal device with a display function, such as a smart phone, a tablet computer, or a portable computer.
As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Such as light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display screen according to the brightness of ambient light, and a proximity sensor that may turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when the mobile terminal is stationary, and can be used for applications (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration), vibration recognition related functions (such as pedometer and tapping) and the like for recognizing the attitude of the mobile terminal; of course, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a program of an operating system, a network communication module, a user interface module, and a virtual viewpoint depth map processing method may be included in a memory 1005, which is a kind of computer storage medium.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a program of a virtual viewpoint depth map processing method stored in the memory 1005, which when executed by the processor, implements an operation in the virtual viewpoint depth map processing method provided by the following embodiments.
Based on the hardware structure of the equipment, the embodiment of the virtual viewpoint depth map processing method is provided.
Referring to fig. 2, a first embodiment of the present invention provides a virtual viewpoint depth map processing method, including:
step S10, forward mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint to obtain a virtual viewpoint depth map;
step S20, carrying out bilateral filtering smoothing treatment on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
step S30, according to the smoothed virtual viewpoint depth map, establishing a reverse mapping relation between the virtual viewpoint and the reference viewpoint to synthesize a virtual viewpoint texture map based on different preset reference viewpoints;
and step S40, performing weighted fusion on the virtual viewpoint texture map, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Specifically, step S10, forward map a reference viewpoint depth map corresponding to a preset reference viewpoint to a position of a virtual viewpoint, to obtain a virtual viewpoint depth map;
the preset reference viewpoint may be an actual shooting viewpoint of a real camera, the virtual viewpoint may be a shooting viewpoint of a virtual camera, the virtual camera may be converted through a shooting angle of the real camera, an image actually shot by the real camera at the preset reference viewpoint is a reference viewpoint image, a depth map of the reference viewpoint image is the reference viewpoint depth map, and the reference viewpoint image may be a depth map of an original image shot by the real camera. In this embodiment, the mapping between the image and the pixel in the image is based on a coordinate system, a planar coordinate system is established with the reference viewpoint depth map and the virtual viewpoint depth map as references, a three-dimensional space coordinate system is established with the virtual camera and the reference camera as reference systems, and the mapping between the image and the pixel can be regarded as coordinate transformation based on the coordinate system. And mapping the reference viewpoint depth map to the position of the virtual viewpoint in the forward direction to obtain the virtual viewpoint depth map. The forward mapping refers to mapping pixel points on the reference viewpoint depth map from a two-dimensional coordinate system to a three-dimensional coordinate system of the real camera, then performing translation and rotation, mapping from the coordinate system of the real camera to a coordinate system of the virtual camera, and finally performing inverse transformation, and mapping from a three-dimensional space coordinate system to a two-dimensional coordinate system, namely mapping from the coordinate system of the virtual camera to the position of the virtual viewpoint.
The step of mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint in the forward direction to obtain the virtual viewpoint depth map includes steps A1-A4:
step A1, establishing a first spatial coordinate system according to the internal parameters of the reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map to the first spatial coordinate system to obtain a first spatial coordinate of the reference viewpoint depth map;
the internal parameters of the reference camera comprise focal length, optical center coordinates and the like, a first space coordinate system is established according to the internal parameters of the reference camera, the first space coordinate system is a three-dimensional space coordinate system and comprises the position of the reference camera,the first spatial coordinate system may be established with the position of the reference camera as an origin, the reference viewpoint depth map may establish a planar coordinate system, thereby obtaining two-dimensional coordinates of the reference viewpoint depth map, and image coordinates of the reference viewpoint depth map are mapped into the first spatial coordinate system to obtain first spatial coordinates of the reference viewpoint depth map, for example, transforming the image coordinates (u, v) of the reference viewpoint depth map to coordinates [ X, Y, Z ] under a reference camera coordinate system]TThe mapping transformation formula can be as follows (formula 1):
Figure BDA0002817463460000071
wherein [ u, v, 1 ]]TIs the homogeneous coordinates of the pixel points of the reference viewpoint depth map, Z is the depth value of the pixel points, which can be determined by the internal parameters and positions of the real camera, [ X, Y, Z]TIs the coordinate of the corresponding real object in the first space coordinate system, fx、fy、cx、cyIs an internal parameter of the real camera, fx、fyFocal lengths in x, y directions, c, respectivelyx、cyThe optical center coordinates in the x and y directions, respectively.
Step A2, establishing a second spatial coordinate system according to the internal parameters of the virtual camera corresponding to the virtual viewpoint, and mapping the first spatial coordinate to the second spatial coordinate system to obtain a second spatial coordinate of the reference viewpoint depth map;
the internal parameters of the virtual camera may be consistent with the internal parameters of the real camera, a second spatial coordinate system is established according to the internal parameters of the virtual camera by referring to the first spatial coordinate system, the second spatial coordinate system includes the position of the virtual camera, the first spatial coordinate is mapped to the second spatial coordinate system to obtain a second spatial coordinate of the reference viewpoint depth map, and a mapping transformation formula is as follows (formulas 2 to 4):
Figure BDA0002817463460000081
Figure BDA0002817463460000082
T=Rv×(Tc-Tv) (4)
wherein R is a rotation matrix of 3X3, T is a translation vector, [ X, Y, Z]TIs the coordinate of a pixel point in the first space coordinate system, i.e. the first space coordinate, [ X ]1,Y1,Z1]TIs a second spatial coordinate, R, after said first spatial coordinate is mapped to said second spatial coordinate systemcAnd TcIs a rotation matrix and a translation matrix, R, of the preset reference viewpointvAnd TvAre the rotation matrix and the translation matrix of the virtual viewpoint.
Step A3, inverse transforming the second space coordinate to obtain a first mapping relation between the second space coordinate system and the image coordinate of the virtual viewpoint;
The second spatial coordinates are inverse transformed, that is, mapped from the three-dimensional spatial coordinate system back to a two-dimensional coordinate system, onto the virtual viewpoint depth map, to obtain the first mapping relation between the second spatial coordinate system and the image coordinates of the virtual viewpoint, i.e. the mapping relation between the image of the virtual viewpoint and the virtual camera. The inverse-transform mapping is as follows (formula 5):

Z_1·[u_1, v_1, 1]^T = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]·[X_1, Y_1, Z_1]^T   (5)

where Z_1 is the depth value of the pixel point, which can be determined from the internal parameters and position of the virtual camera, f_x, f_y, c_x, c_y are the internal parameters of the virtual camera, which may be consistent with those of the real camera, and [u_1, v_1, 1]^T are the homogeneous coordinates of the pixel point (u_1, v_1) in the image coordinate system of the virtual viewpoint.
Step A4, according to the first mapping relation, mapping the reference viewpoint depth map to the position of the virtual viewpoint to obtain the virtual viewpoint depth map.
According to the first mapping relation, the forward mapping relation from the reference viewpoint depth map to the virtual viewpoint, i.e. from point (u, v) to point (u_1, v_1), is obtained, so a point on the reference viewpoint depth map can be mapped to the position of the virtual viewpoint; that is, according to the forward mapping relation, the reference viewpoint depth map is mapped to the position of the virtual viewpoint to obtain the virtual viewpoint depth map.
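The forward mapping of steps A1-A4 (formulas 1-5) can be sketched with NumPy as follows. This is a minimal illustration rather than the patent's reference implementation: it assumes both cameras share the intrinsic matrix K (as the description allows), represents T_c and T_v as 3-vectors, and resolves collisions with a simple z-buffer, a detail the text does not specify.

```python
import numpy as np

def forward_map_depth(ref_depth, K, R_c, T_c, R_v, T_v):
    """Forward-map a reference-viewpoint depth map to the virtual viewpoint
    (formulas 1-5). K is the 3x3 intrinsic matrix (assumed shared); (R_c, T_c)
    and (R_v, T_v) are the rotation matrices and translation vectors of the
    reference and virtual viewpoints."""
    h, w = ref_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    Z = ref_depth.ravel()
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])   # homogeneous pixel coords
    # formula 1: back-project pixels into the reference camera's space coordinates
    P_ref = np.linalg.inv(K) @ (pix * Z)
    # formulas 2-4: rotate/translate into the virtual camera's space coordinates
    R = R_v @ np.linalg.inv(R_c)
    T = R_v @ (T_c - T_v)
    P_virt = R @ P_ref + T.reshape(3, 1)
    # formula 5: project into the virtual viewpoint's image plane
    proj = K @ P_virt
    Z1 = proj[2]
    valid = (Z > 0) & (Z1 > 0)
    u1 = np.round(proj[0, valid] / Z1[valid]).astype(int)
    v1 = np.round(proj[1, valid] / Z1[valid]).astype(int)
    z1 = Z1[valid]
    inside = (u1 >= 0) & (u1 < w) & (v1 >= 0) & (v1 < h)
    virt_depth = np.full((h, w), np.inf)
    # keep the nearest surface where several reference pixels land on one virtual pixel
    np.minimum.at(virt_depth, (v1[inside], u1[inside]), z1[inside])
    virt_depth[np.isinf(virt_depth)] = 0.0                   # unmapped pixels remain as holes
    return virt_depth
```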
Step S20, carrying out bilateral filtering smoothing treatment on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
and performing bilateral filtering smoothing processing on the virtual viewpoint depth map by using a preset algorithm, wherein if a part of background texture map in the reference viewpoint is shielded by a foreground object, the background texture map is invisible in the reference viewpoint texture map and is visible in the virtual viewpoint texture map. One preferred method in the preset algorithm is a bilateral filtering algorithm, and a formula of the bilateral filtering algorithm is shown as (formula 6):
Figure BDA0002817463460000091
wherein d (i, j) is the depth value of the pixel point in the neighborhood of the pixel point (x, y), and the size of (i, j) is determined by the filtering radius, which can be determined by experiment and set by self-definition, for example, the filtering radius is set to 7 pixel points, then (i, j) is set to (x, y)The circle center, the pixel points in the range determined by taking 7 pixel points as the radius,
Figure BDA0002817463460000092
for the standard deviation of bilateral filtering, it can be set by user, for example, when the filtering radius is set to 7 pixel points
Figure BDA0002817463460000093
The bilateral filtering method adopts a weighted average method, the depth value d (x, y) of a central pixel point (x, y) in a range is determined by the weighted average of the depth values d (i, j) in a neighborhood range, the weighted average method is based on Gaussian distribution, the weight of the bilateral filtering method not only considers the Euclidean distance of the pixel point, namely the influence of the position on the central pixel, but also considers the distance of the depth value, in an area with small depth value change, the weight of a space domain plays a main role, namely Gaussian smoothing is performed, in an edge area of an image, the depth value is changed greatly, the weight of the range area of the pixel point is increased, so that the information of the edge can be reserved, and the generation of holes is reduced.
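As one possible realization of the bilateral smoothing in step S20 (formula 6), OpenCV's built-in bilateral filter can be applied to the warped depth map; the radius and standard deviations below are illustrative values, since the patent leaves them to be chosen experimentally.

```python
import cv2
import numpy as np

def bilateral_smooth(virt_depth, radius=7, sigma_space=3.0, sigma_range=10.0):
    """Edge-preserving smoothing of the virtual viewpoint depth map (formula 6).
    Neighbours are weighted by both spatial distance and depth difference, so
    flat regions are smoothed while depth edges are kept."""
    depth32 = virt_depth.astype(np.float32)
    return cv2.bilateralFilter(depth32, d=2 * radius + 1,
                               sigmaColor=sigma_range, sigmaSpace=sigma_space)
```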
Step S30, according to the smoothed virtual viewpoint depth map, establishing a reverse mapping relation between the virtual viewpoint and the reference viewpoint to synthesize a virtual viewpoint texture map based on different preset reference viewpoints;
the inverse mapping refers to mapping pixel points on the virtual viewpoint depth map to a coordinate system of the virtual camera according to the depth of the virtual viewpoint, then performing translation and rotation, mapping from the coordinate system of the virtual camera to a coordinate system of the real camera, and mapping from a three-dimensional coordinate system to a two-dimensional coordinate system through inverse transformation, that is, mapping from the coordinate system of the real camera to a coordinate system of the reference viewpoint, wherein the inverse mapping process is as follows (formula 7-9):
mapping pixel points on the virtual viewpoint depth map to the coordinate system of the virtual camera according to the depth of the virtual viewpoint:
Z_1·[u_1, v_1, 1]^T = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]·[X_1, Y_1, Z_1]^T   (7)

where formula 7 is the same as formula 5: [u_1, v_1, 1]^T are the homogeneous coordinates of the pixel point (u_1, v_1), Z_1 is the depth value of the pixel point, [X_1, Y_1, Z_1]^T are the coordinates of the real point corresponding to the pixel point (u_1, v_1) in the virtual camera coordinate system, and f_x, f_y, c_x, c_y are the internal parameters of the virtual camera, namely the focal lengths and the optical center coordinates in the x and y directions.
Mapping from under the coordinate system of the virtual camera to under the coordinate system of the real camera:
[X', Y', Z']^T = R'·[X_1, Y_1, Z_1]^T + T'   (8)

where R' is a 3x3 rotation matrix and T' is a translation vector, [X_1, Y_1, Z_1]^T are the coordinates of the pixel point in the virtual camera coordinate system, and [X', Y', Z']^T are the coordinates of the pixel point in the real camera coordinate system.
Mapping from the coordinate system of the real camera to the coordinate system of the reference viewpoint:
Z'·[u', v', 1]^T = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]·[X', Y', Z']^T   (9)

where Z' is the depth value at the real camera and (u', v') are the coordinates of the pixel point in the reference viewpoint coordinate system.
The establishing of the inverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smoothed virtual viewpoint depth map to synthesize a virtual viewpoint texture map based on different preset reference viewpoints includes steps B1-B2:
step B1, inverse mapping the smoothed virtual viewpoint depth map to obtain a second mapping relation between the space coordinate system of the virtual viewpoint and the space coordinate system of the preset reference viewpoint;
and performing reverse mapping on the smoothed virtual viewpoint depth map to obtain a second mapping relation between the space coordinate system of the virtual viewpoint and the space coordinate system of the preset reference viewpoint, wherein the reference camera is the real camera, and the second mapping relation between the reference viewpoint and the space coordinate system of the virtual viewpoint is obtained through reverse mapping, that is, space coordinate systems are respectively established according to the virtual viewpoint and the preset reference viewpoint, and the mapping relation between the space coordinate system of the preset reference viewpoint and the space coordinate system of the virtual viewpoint is found through reverse mapping.
Step B2, according to the second mapping relation, mapping the reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint to obtain the virtual viewpoint texture map.
According to the second mapping relation, the reverse mapping relation from a point (u_1, v_1) on the virtual viewpoint image to a point (u', v') on the reference viewpoint is obtained, and the reference viewpoint texture map is mapped to the position of the virtual viewpoint according to this reverse mapping relation, thereby obtaining the virtual viewpoint texture map. There may be multiple reference viewpoints and reference cameras, each reference viewpoint corresponding to one reference viewpoint texture map, so the reference viewpoint texture map corresponding to each reference viewpoint can be reverse mapped to obtain a virtual viewpoint texture map, giving virtual viewpoint texture maps based on different preset reference viewpoints.
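A sketch of the reverse mapping of steps B1-B2 (formulas 7-9) follows, under the same assumptions as the forward-mapping sketch above (shared intrinsics K, translations as 3-vectors). R' and T' are taken as the inverse of the forward transform, and nearest-neighbour sampling stands in for whatever interpolation an actual implementation would use.

```python
import numpy as np

def backward_map_texture(virt_depth, ref_texture, K, R_c, T_c, R_v, T_v):
    """Reverse mapping (formulas 7-9): each virtual-viewpoint pixel with a valid
    smoothed depth fetches its colour from the corresponding position in the
    reference viewpoint texture map."""
    h, w = virt_depth.shape
    u1, v1 = np.meshgrid(np.arange(w), np.arange(h))
    Z1 = virt_depth.ravel()
    pix = np.stack([u1.ravel(), v1.ravel(), np.ones(h * w)])
    # formula 7: back-project virtual pixels into the virtual camera's space coordinates
    P_virt = np.linalg.inv(K) @ (pix * Z1)
    # formula 8: transform into the real (reference) camera's space coordinates
    # (R', T' assumed to be the inverse of the forward transform R, T)
    R_p = R_c @ np.linalg.inv(R_v)
    T_p = R_c @ (T_v - T_c)
    P_ref = R_p @ P_virt + T_p.reshape(3, 1)
    # formula 9: project into the reference viewpoint's image plane
    proj = K @ P_ref
    Zr = proj[2]
    Zr_safe = np.where(Zr > 0, Zr, 1.0)
    u_ref = np.round(proj[0] / Zr_safe).astype(int)
    v_ref = np.round(proj[1] / Zr_safe).astype(int)
    ok = (Z1 > 0) & (Zr > 0) & (u_ref >= 0) & (u_ref < ref_texture.shape[1]) \
         & (v_ref >= 0) & (v_ref < ref_texture.shape[0])
    virt_texture = np.zeros((h * w,) + ref_texture.shape[2:], dtype=ref_texture.dtype)
    virt_texture[ok] = ref_texture[v_ref[ok], u_ref[ok]]
    return virt_texture.reshape((h, w) + ref_texture.shape[2:])
```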
And step S40, performing weighted fusion on the virtual viewpoint texture map, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Weighted fusion is performed on the texture maps of the virtual viewpoints. Texture maps obtained from different reference viewpoints differ in quality, in the orientation emphasized when shooting the object, and in which parts are occluded by foreground objects, so the texture maps obtained from different preset reference viewpoints must be given different weights before they are fused, so that the resulting target virtual viewpoint depth map retains the edge information of the depth map. After the virtual viewpoint texture maps are weighted and fused, hole filling processing and foreground edge filtering processing are performed on the fused texture map: the holes present in the depth map are filled, and the foreground edges are filtered so that they are smooth and the edge information remains complete.
In this embodiment, the reference viewpoint depth map corresponding to a preset reference viewpoint is forward mapped to the position of the virtual viewpoint to obtain the virtual viewpoint depth map, and the virtual viewpoint depth map is smoothed with a preset filtering method, which yields a smooth virtual viewpoint depth map while retaining the edge information of the depth map. A reverse mapping relation between the virtual viewpoint and the reference viewpoint is then established from the smoothed virtual viewpoint depth map to synthesize virtual viewpoint texture maps based on different preset reference viewpoints, the virtual viewpoint texture maps are weighted and fused, and hole filling processing and foreground edge filtering processing are performed on the fused virtual viewpoint texture map, which reduces the ghosting in the edge regions of the synthesized texture map and improves the quality of the virtual viewpoint texture map.
Further, referring to fig. 3, a second embodiment of the present invention provides a virtual viewpoint depth map processing method, which is based on the above-mentioned embodiment shown in fig. 2, and this embodiment is a refinement of step S40 in the first embodiment.
Specifically, in step S40, the weighted fusion of the virtual viewpoint texture maps includes steps S41-S44:
step S41, determining a first position of the reference camera corresponding to the preset reference viewpoint and a second position of the virtual camera corresponding to the virtual viewpoint;
similarly, in this embodiment, the mapping between the image and the pixel in the image is also based on a coordinate system, a planar coordinate system is established with the reference viewpoint depth map and the virtual viewpoint depth map as references, a three-dimensional space coordinate system is established with the virtual camera and the reference camera as reference systems, and the mapping between the image and the pixel can be regarded as coordinate transformation based on the coordinate system. The weight of the virtual viewpoint texture map comprises two parts, wherein the first part of the weight is determined by the positions of the virtual camera and the reference camera, and therefore, the position of the reference camera, i.e. the first position, and the position of the virtual camera, i.e. the second position, need to be determined.
Step S42, determining a first weight according to the position relation between the first position and the second position;
A distance between the reference camera and the virtual camera is determined according to the position relationship between the first position and the second position, and the first weight is determined from this distance; for example, when the distance between the reference camera and the virtual camera is d_i, the first weight may be inversely proportional to d_i.
Step S43, determining a depth value confidence coefficient of the virtual camera position, and determining a second weight according to the depth value confidence coefficient;
the weight of the virtual viewpoint texture map further includes a second weight, the second weight is determined by the depth value confidence of the virtual camera position, the depth value confidence of the virtual camera position is determined first, and the second weight is determined according to the depth value confidence.
The determining a depth value confidence for the virtual camera position, comprising steps C1-C3:
step C1, determining a first reference point from the virtual viewpoint depth map, and mapping the first reference point to a space coordinate system where the first position is located to obtain a second reference point;
A pixel point is determined from the virtual viewpoint depth map as the first reference point; since the virtual viewpoint depth map lies in a planar coordinate system, the first reference point may be written (x, y). The first reference point is mapped into the spatial coordinate system where the first position is located, namely the first spatial coordinate system: according to the depth value of the virtual camera, the first reference point (x, y) is mapped to the position of the reference camera to obtain the second reference point, which may be written (u, v).
Step C2, according to the position depth value of the reference camera, mapping the second reference point to a space coordinate system where the second position is located to obtain a third reference point;
The second reference point is mapped into the spatial coordinate system where the second position is located according to the depth value at the position of the reference camera. The second position is the position of the virtual camera, and the spatial coordinate system where it is located is the second spatial coordinate system; the second reference point (u, v) is mapped into the second spatial coordinate system according to the depth value at the reference camera to obtain the third reference point, which may be written (x_1, y_1).
And step C3, determining the confidence of the depth value by using a first preset algorithm according to the coordinates of the first reference point and the coordinates of the third reference point.
The depth value confidence is obtained from the first reference point and the third reference point with a first preset algorithm, which may be the following formula (formula 10):

dist = sqrt((x - x_1)^2 + (y - y_1)^2)   (10)

where dist is the depth value confidence, i.e. the distance between the first reference point (x, y) and the third reference point (x_1, y_1). After the depth value confidence is determined, the second weight may be determined by the following formula (formula 11):

conf_depth(x, y) = e^(-dist/5)   (11)
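A single-pixel sketch of the confidence computation in steps C1-C3 (formulas 10-11) is given below. The round trip virtual viewpoint → reference viewpoint → virtual viewpoint reuses the mapping formulas above; the Euclidean form of dist and the shared intrinsic matrix K are assumptions, since the text only states that dist is computed from the coordinates of the first and third reference points.

```python
import numpy as np

def depth_confidence(x, y, virt_depth, ref_depth, K, R_c, T_c, R_v, T_v):
    """Depth value confidence at virtual-viewpoint pixel (x, y): warp it to the
    reference viewpoint, warp back with the reference depth, and convert the
    reprojection distance into a confidence (formulas 10-11)."""
    K_inv = np.linalg.inv(K)
    # first reference point (x, y) -> reference camera space -> second reference point (u, v)
    P_v = K_inv @ (virt_depth[y, x] * np.array([x, y, 1.0]))
    P_c = R_c @ np.linalg.inv(R_v) @ P_v + R_c @ (T_v - T_c)
    uvw = K @ P_c
    u, v = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
    # second reference point (u, v), with the reference depth, back to the virtual view -> (x1, y1)
    P_c2 = K_inv @ (ref_depth[v, u] * np.array([u, v, 1.0]))
    P_v2 = R_v @ np.linalg.inv(R_c) @ P_c2 + R_v @ (T_c - T_v)
    uvw2 = K @ P_v2
    x1, y1 = uvw2[0] / uvw2[2], uvw2[1] / uvw2[2]
    dist = np.hypot(x - x1, y - y1)     # formula 10, assumed Euclidean reprojection distance
    return np.exp(-dist / 5.0)          # formula 11: conf_depth = e^(-dist/5)
```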
step S44, fusing the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
After determining the weights of the two parts, weighting and fusing the virtual viewpoint texture maps based on different preset reference viewpoints, wherein the specific weighting and fusing process can be the following formula (formula 12):
f(x, y) = Σ_i conf_i(x, y)·f_i(x, y) / Σ_i conf_i(x, y)   (12)

where f_i(x, y) is the value of a pixel point of the i-th virtual viewpoint texture map and f(x, y) is the value of the corresponding pixel point after weighted fusion; conf_i(x, y) is the corresponding weight, composed of the two parts above (e.g. i = 1, 2 when two reference viewpoints are used), and it may be calculated by the following formula (formula 13):

conf_i(x, y) = conf_cam_i(x, y) * conf_depth_i(x, y)   (13)

where conf_cam_i(x, y) is the first weight, determined by the distance between the reference camera and the virtual camera, conf_depth_i(x, y) is the second weight, determined by the depth value confidence, and i indexes the reference viewpoints. The first weight may be calculated by the following formula (formula 14):

conf_cam_i(x, y) ∝ 1/d_i   (14)

where d_i is the distance between the i-th reference camera and the virtual camera, i.e. the first weight is inversely proportional to that distance.
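The weighted fusion of formulas 12-14 might then be written as below. Since formula 14 is only characterized as inversely proportional to the camera distance d_i, the sketch uses 1/d_i; the textures, distances, and per-pixel confidence maps are assumed to be supplied per reference viewpoint.

```python
import numpy as np

def weighted_fusion(textures, cam_dists, conf_depths):
    """Fuse the virtual viewpoint texture maps from several reference viewpoints
    (formulas 12-14). textures: list of HxWx3 arrays; cam_dists: distances d_i
    between each reference camera and the virtual camera; conf_depths: per-pixel
    depth value confidence maps conf_depth_i(x, y)."""
    num = np.zeros(textures[0].shape, dtype=np.float64)
    den = np.zeros(textures[0].shape[:2], dtype=np.float64)
    for tex, d_i, conf_depth in zip(textures, cam_dists, conf_depths):
        conf_cam = 1.0 / d_i                        # formula 14 (assumed 1/d_i form)
        conf = conf_cam * conf_depth                # formula 13
        num += conf[..., None] * tex
        den += conf
    return num / np.maximum(den, 1e-12)[..., None]  # formula 12
```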
In step S40, the performing of the hole filling process and the foreground edge filtering process on the fused virtual viewpoint texture map includes steps S45-S49:
step S45, selecting a first central pixel point from the virtual viewpoint texture map, and creating a first window with a preset size;
Hole filling processing is performed on the fused virtual viewpoint texture map. A preferred hole filling method is joint bilateral filtering: one pixel point is selected as the first central pixel point, and a window of preset size is created with it as the center, for example a first window of size N × N containing N × N pixel points, where N may be 30 or 40; apart from the first central pixel point, the other pixel points in the first window are its neighborhood pixel points.
Step S46, utilizing a second preset algorithm to perform filling calculation on the values of the pixel points in the first window so as to perform hole filling processing;
The pixel points in the first window are calculated with a second preset algorithm to fill the holes: discontinuous points in the depth map are found by calculation and then filled, where the filling may take the average value, or the weighted average value, of the pixel points in the neighborhood of the hole as the value of the hole pixel. The second preset algorithm may use the following formulas (formulas 15-18):

img(x, y) = Σ_(i,j) w(i, j)·img(i, j) / Σ_(i,j) w(i, j)   (15)
disp(i, j) = fB/depth(i, j)   (16)
drange = maxdisp - mindisp + 1   (17)
maxdisp = fB/mindepth,  mindisp = fB/maxdepth   (18)

where img(x, y) is the pixel point being filled in the first window, img(i, j) are the pixel points in its neighborhood and w(i, j) their joint bilateral weights; disp(i, j) is the disparity value of the pixel point (i, j) and depth(i, j) its depth value; the fB value is determined from the internal parameters and position of the reference camera; drange is the disparity range determined by the maximum disparity value maxdisp and the minimum disparity value mindisp, which are in turn determined from the fB value of the corresponding reference camera and from the maximum and minimum depth values of the shooting scene; and dtr is a parameter for narrowing the disparity range, whose value may be set by the user, for example dtr = 0.01666667.
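A simplified sketch of the hole-filling step follows: pixels marked as holes take the mean of the valid pixels inside the first window. The full joint bilateral weighting and the disparity-derived parameters (fB, dtr, drange) of formulas 15-18 are omitted, and the hole marker value and window size are illustrative.

```python
import numpy as np

def fill_holes(texture, depth, window=31, hole_value=0):
    """Fill hole pixels (those whose depth equals hole_value) in the fused
    virtual viewpoint texture map with the average of the valid pixels in an
    N x N first window centred on each hole."""
    h, w = depth.shape
    r = window // 2
    filled = texture.copy()
    for y, x in np.argwhere(depth == hole_value):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        patch = texture[y0:y1, x0:x1]
        ok = depth[y0:y1, x0:x1] != hole_value
        if ok.any():
            filled[y, x] = patch[ok].mean(axis=0)
    return filled
```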
Step S47, determining the foreground edge area of the virtual viewpoint texture map and marking the foreground edge area;
after filling the holes in the virtual viewpoint texture map, determining an edge area of the virtual viewpoint texture map, and marking the edge area, wherein the marking of the edge area is to perform filtering smoothing processing on the edge area so as to prevent the edge area from having ghosts or defects.
The determining the foreground edge area of the virtual viewpoint texture map and marking the foreground edge area comprises steps D1-D3:
step D1, selecting a reference pixel point, and determining the gradient absolute value of the depth value of the reference pixel point;
Reference pixel points are selected at the edge positions of the virtual viewpoint depth map and their gradient values are determined: the horizontal and vertical gradient values of each reference pixel point are computed and their absolute values taken. For example, where the depth values are continuous the gradient absolute value can be obtained by differentiation over the edge area, and where they are discontinuous it can be obtained with the forward difference quotient, backward difference quotient, central difference quotient, or the like.
Step D2, if the absolute value of the gradient is greater than a preset threshold, determining the reference pixel point as an edge pixel point;
If the absolute value of the gradient is greater than a preset threshold, the reference pixel point is determined to be an edge pixel point. The preset threshold is determined by the disparity range drange and the dtr parameter, and can be calculated by the following formula (formula 19):
dthresh=dtr*drange (19)
wherein, dthresh is the preset threshold.
And D3, expanding a preset number of pixel points to the periphery of the edge pixel points to determine the foreground edge area, and marking the foreground edge area.
After the reference pixel points are determined to be edge pixel points, extending a preset number of pixel points to the periphery of the reference pixel points, for example, extending 4 pixel points to the horizontal direction and the vertical direction respectively, wherein the extended areas are foreground edge areas, and marking the foreground edge areas.
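The edge marking of steps D1-D3 can be sketched as follows: depth gradients are thresholded with dthresh = dtr * drange (formula 19) and the resulting mask is grown by a preset number of pixels horizontally and vertically. This is a minimal illustration; the expansion amount and gradient operator are the ones named in the text.

```python
import numpy as np

def mark_foreground_edges(depth, dthresh, expand=4):
    """Mark the foreground edge region: pixels whose horizontal or vertical
    depth gradient magnitude exceeds dthresh are edge pixels, and the marked
    region is expanded by `expand` pixels in each direction."""
    gy, gx = np.gradient(depth.astype(np.float64))
    edges = (np.abs(gx) > dthresh) | (np.abs(gy) > dthresh)
    mask = edges.copy()
    for k in range(1, expand + 1):
        mask[k:, :] |= edges[:-k, :]     # grow downwards
        mask[:-k, :] |= edges[k:, :]     # grow upwards
        mask[:, k:] |= edges[:, :-k]     # grow to the right
        mask[:, :-k] |= edges[:, k:]     # grow to the left
    return mask
```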
Step S48, selecting a second center pixel point from the foreground edge area, and creating a second window with a preset size;
selecting a second center pixel point from the foreground edge region, and creating a second window with a preset size by taking the second center pixel point as a center, for example, extending 4 pixel points to the periphery by taking the second center pixel point as a center, and creating a 5 × 5 window as the second window, where the second window is created to perform filtering processing on the foreground edge region.
Step S49, performing filtering calculation on the values of the pixel points in the second window by using a third preset algorithm, so as to perform filtering processing on the foreground edge area.
The pixel points in the second window are calculated with a third preset algorithm so as to filter the foreground edge region: the value of a pixel point may be replaced by the average, or the weighted average, of the pixel points in its neighborhood window. Taking the weighted average of the pixel points in the neighborhood window as the example, the formula of the third preset algorithm may be (formula 20):

f(x, y) = Σ_(i,j) w(i, j)·f(i, j) / Σ_(i,j) w(i, j)   (20)

where (x, y) is the selected pixel point to be replaced, f(x, y) is the pixel value after replacement, f(i, j) are the pixel values of the pixel points in the neighborhood of point (x, y), and w(i, j) are their weights.
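Finally, the foreground edge filtering of formula 20 can be sketched as a neighbourhood average over the marked region. For brevity the weights w(i, j) are uniform here, where an actual implementation might use Gaussian or bilateral weights.

```python
import numpy as np

def filter_foreground_edges(texture, edge_mask, radius=2):
    """Smooth the marked foreground edge region (formula 20): each marked pixel
    is replaced by the mean of the pixels in the (2*radius+1) x (2*radius+1)
    second window around it (5 x 5 for radius=2)."""
    h, w = edge_mask.shape
    out = texture.copy()
    for y, x in np.argwhere(edge_mask):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        out[y, x] = texture[y0:y1, x0:x1].reshape(-1, texture.shape[-1]).mean(axis=0)
    return out
```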
In this embodiment, the first weight is determined from the position relationship between the virtual camera and the reference camera, and the second weight from the depth value confidence of the virtual camera; the virtual viewpoint texture maps are weighted and fused according to the first weight and the second weight. Central pixel points are then selected from the fused virtual viewpoint texture map to create the first window and the second window of preset size, which determine the neighborhood pixel points of each central pixel, and the preset algorithms are applied to the values of those neighborhood pixel points to perform hole filling and foreground edge filtering on the fused virtual viewpoint texture map, thereby reducing the holes and ghosting in the virtual viewpoint texture map and improving its subjective and objective quality.
Referring to fig. 4, a first embodiment of the present invention provides a virtual viewpoint depth map processing apparatus, including:
a forward mapping module 10, configured to forward map a reference viewpoint depth map corresponding to a preset reference viewpoint to a position of a virtual viewpoint to obtain a virtual viewpoint depth map;
the post-processing module 20 is configured to perform bilateral filtering smoothing on the virtual viewpoint depth map to obtain a smooth virtual viewpoint depth map;
a reverse mapping module 30, configured to establish a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smoothed virtual viewpoint depth map, so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
and the fusion processing module 40 is configured to perform weighted fusion on the virtual viewpoint texture map, and perform hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
Optionally, the forward mapping module 10 includes:
the first mapping unit is used for establishing a first space coordinate system according to the internal parameters of the reference camera corresponding to the preset reference viewpoint and mapping the reference viewpoint depth map to the first space coordinate system to obtain a first space coordinate of the reference viewpoint depth map;
the second mapping unit is used for establishing a second space coordinate system according to the internal parameters of the virtual camera corresponding to the virtual viewpoint, and mapping the first space coordinate to the second space coordinate system so as to obtain a second space coordinate of the reference viewpoint depth map;
the inverse transformation unit is used for performing inverse transformation on the second space coordinate to obtain a first mapping relation between the second space coordinate system and the image coordinate of the virtual viewpoint;
and the third mapping unit is used for mapping the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relation to obtain the virtual viewpoint depth map.
Optionally, the reverse mapping module 30 includes:
a reverse mapping unit, configured to perform reverse mapping on the smoothed virtual viewpoint depth map to obtain a second mapping relationship between the spatial coordinate system of the virtual viewpoint and the spatial coordinate system of the preset reference viewpoint;
and the fourth mapping unit is used for mapping the reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relation to obtain the virtual viewpoint texture map.
Optionally, the fusion processing module 40 includes:
a position determining unit, configured to determine a first position of a reference camera corresponding to the preset reference viewpoint and a second position of a virtual camera corresponding to the virtual viewpoint;
a first determining unit, configured to determine a first weight according to a position relationship between the first position and the second position;
a second determining unit, configured to determine a depth value confidence of the virtual camera position, and determine a second weight according to the depth value confidence;
and the fusion unit is used for fusing the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
Optionally, the second determining unit includes:
the first mapping subunit is configured to determine a first reference point from the virtual viewpoint depth map, and map the first reference point to a spatial coordinate system where the first position is located, so as to obtain a second reference point;
the second mapping subunit is configured to map the second reference point to a spatial coordinate system where the second position is located according to the position depth value of the reference camera, so as to obtain a third reference point;
and the calculating subunit is used for determining the depth value confidence coefficient by using a first preset algorithm according to the coordinates of the first reference point and the coordinates of the third reference point.
Optionally, the fusion processing module 40 further includes:
the creating unit is used for selecting a first central pixel point from the virtual viewpoint texture map and creating a first window with a preset size;
the calculation unit is used for performing filling calculation on the values of the pixel points in the first window by using a second preset algorithm so as to perform hole filling processing;
the marking unit is used for determining a foreground edge area of the virtual viewpoint texture map and marking the foreground edge area;
the second creating unit is used for selecting a second central pixel point from the foreground edge area and creating a second window with a preset size;
and the filtering unit is used for performing filtering calculation on the values of the pixel points in the second window by using a third preset algorithm so as to perform filtering processing on the foreground edge area.
Optionally, the marking unit includes:
the first determining subunit is used for selecting a reference pixel point and determining the gradient absolute value of the depth value of the reference pixel point;
the second determining subunit is configured to determine, if the absolute value of the gradient is greater than a preset threshold, that the reference pixel is an edge pixel;
and the marking subunit is used for expanding a preset number of pixel points to the periphery of the edge pixel points so as to determine the foreground edge area and marking the foreground edge area.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structural or process transformations made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.

Claims (10)

1. A virtual viewpoint depth map processing method is characterized by comprising the following steps:
mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint in a forward direction to obtain a virtual viewpoint depth map;
performing bilateral filtering smoothing processing on the virtual viewpoint depth map to obtain a smoothed virtual viewpoint depth map;
establishing a reverse mapping relation between the virtual viewpoint and the reference viewpoint according to the smoothed virtual viewpoint depth map so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
and performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
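For illustration only (not part of the claims): the distinctive step of claim 1 is the edge-preserving bilateral smoothing applied to the forward-warped depth map before back-mapping. The sketch below is a plain Python/NumPy bilateral filter; kernel size and sigmas are assumptions, and hole pixels would need to be masked out in practice.

```python
import numpy as np

def bilateral_smooth_depth(depth, win=5, sigma_s=2.0, sigma_r=10.0):
    """Bilateral smoothing of a depth map: weights combine spatial
    closeness (sigma_s) and depth similarity (sigma_r), so depth
    discontinuities at object edges are preserved."""
    depth = depth.astype(np.float32)
    half = win // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    padded = np.pad(depth, half, mode="edge")
    out = np.zeros_like(depth)
    h, w = depth.shape
    for y in range(h):          # slow reference loops, clarity over speed
        for x in range(w):
            patch = padded[y:y + win, x:x + win]
            rng = np.exp(-((patch - depth[y, x]) ** 2) / (2 * sigma_r ** 2))
            wgt = spatial * rng
            out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out
```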
2. The method of claim 1, wherein the step of mapping the reference viewpoint depth map corresponding to the preset reference viewpoint to the position of the virtual viewpoint in the forward direction to obtain the virtual viewpoint depth map comprises:
establishing a first space coordinate system according to the internal parameters of a reference camera corresponding to the preset reference viewpoint, and mapping the reference viewpoint depth map to the first space coordinate system to obtain a first space coordinate of the reference viewpoint depth map;
establishing a second space coordinate system according to the internal parameters of the virtual camera corresponding to the virtual viewpoint, and mapping the first space coordinate to the second space coordinate system to obtain a second space coordinate of the reference viewpoint depth map;
performing inverse transformation on the second space coordinate to obtain a first mapping relation between the second space coordinate system and the image coordinate of the virtual viewpoint;
and mapping the reference viewpoint depth map to the position of the virtual viewpoint according to the first mapping relation to obtain the virtual viewpoint depth map.
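For illustration only (not part of the claims): claim 2 describes a 3D forward warp of the reference depth map into the virtual view. A minimal Python/NumPy sketch under a pinhole model follows; R and t are assumed to take reference-camera coordinates to virtual-camera coordinates, and zero in the output marks unmapped (hole) pixels.

```python
import numpy as np

def forward_warp_depth(ref_depth, K_ref, K_virt, R, t):
    """Forward-map a reference-view depth map to the virtual viewpoint,
    keeping the nearest surface when several pixels collide (z-buffer)."""
    h, w = ref_depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T
    # Back-project into the reference camera's space (first coordinates).
    X_ref = np.linalg.inv(K_ref) @ pix * ref_depth.reshape(1, -1)
    # Transform into the virtual camera's space (second coordinates).
    X_virt = R @ X_ref + t.reshape(3, 1)
    # Project to virtual-view image coordinates (inverse transformation).
    uvw = K_virt @ X_virt
    z = X_virt[2]
    valid = z > 1e-6
    u = np.round(uvw[0, valid] / uvw[2, valid]).astype(int)
    v = np.round(uvw[1, valid] / uvw[2, valid]).astype(int)
    z = z[valid]
    out = np.zeros_like(ref_depth, dtype=np.float32)   # 0 marks holes
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
        if out[vi, ui] == 0 or zi < out[vi, ui]:
            out[vi, ui] = zi
    return out
```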
3. The method of claim 1, wherein the step of establishing a reverse mapping relationship between the virtual viewpoint and the reference viewpoint according to the smoothed virtual viewpoint depth map to synthesize a virtual viewpoint texture map based on different preset reference viewpoints comprises:
performing reverse mapping on the smoothed virtual viewpoint depth map to obtain a second mapping relation between the space coordinate system of the virtual viewpoint and the space coordinate system of the preset reference viewpoint;
and mapping the reference viewpoint texture map corresponding to the preset reference viewpoint to the position of the virtual viewpoint according to the second mapping relation to obtain the virtual viewpoint texture map.
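For illustration only (not part of the claims): claim 3 back-maps through the smoothed virtual-view depth map and samples the reference texture at the resulting positions. A nearest-neighbour Python/NumPy sketch follows; here R and t take virtual-camera coordinates to reference-camera coordinates, and all names are assumptions.

```python
import numpy as np

def backward_warp_texture(virt_depth, ref_texture, K_virt, K_ref, R, t):
    """Synthesise the virtual-view texture by sampling the reference
    texture at locations given by the smoothed virtual depth map."""
    h, w = virt_depth.shape
    rh, rw = ref_texture.shape[:2]
    out = np.zeros((h, w) + ref_texture.shape[2:], dtype=ref_texture.dtype)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T
    X_virt = np.linalg.inv(K_virt) @ pix * virt_depth.reshape(1, -1)
    X_ref = R @ X_virt + t.reshape(3, 1)
    uvw = K_ref @ X_ref
    z = np.where(np.abs(uvw[2]) < 1e-9, 1e-9, uvw[2])
    u = np.round(uvw[0] / z).astype(int).reshape(h, w)
    v = np.round(uvw[1] / z).astype(int).reshape(h, w)
    ok = (virt_depth > 0) & (u >= 0) & (u < rw) & (v >= 0) & (v < rh)
    out[ok] = ref_texture[v[ok], u[ok]]   # nearest-neighbour sampling
    return out
```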
4. The method of claim 1, wherein the step of performing weighted fusion on the virtual viewpoint texture map comprises:
determining a first position of a reference camera corresponding to the preset reference viewpoint and a second position of a virtual camera corresponding to the virtual viewpoint;
determining a first weight according to the positional relation between the first position and the second position;
determining a depth value confidence of the virtual camera position, and determining a second weight according to the depth value confidence;
and fusing the virtual viewpoint texture maps based on different preset reference viewpoints according to the first weight and the second weight.
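For illustration only (not part of the claims): a Python/NumPy sketch of the weighted fusion in claim 4, combining a baseline-distance weight with the per-pixel depth-value confidence. Multiplying the two weights is an assumption; the claim leaves the exact combination open.

```python
import numpy as np

def fuse_virtual_textures(textures, cam_positions, virt_position, confidences):
    """Fuse per-reference-view virtual-viewpoint textures.

    textures     : list of HxWx3 arrays, one per reference viewpoint
    cam_positions: list of 3-vectors (reference camera centres)
    virt_position: 3-vector (virtual camera centre)
    confidences  : list of HxW per-pixel depth-value confidences
    """
    num = np.zeros_like(textures[0], dtype=np.float64)
    den = np.zeros(textures[0].shape[:2], dtype=np.float64)
    for tex, pos, conf in zip(textures, cam_positions, confidences):
        # First weight: closer reference cameras contribute more.
        w_dist = 1.0 / (np.linalg.norm(np.asarray(pos, float) -
                                       np.asarray(virt_position, float)) + 1e-6)
        # Second weight: per-pixel depth-value confidence.
        w = w_dist * conf
        num += tex * w[..., None]
        den += w
    den = np.where(den == 0, 1.0, den)     # uncovered pixels stay zero
    return (num / den[..., None]).astype(textures[0].dtype)
```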
5. The method of claim 4, wherein the step of determining a depth value confidence of the virtual camera position comprises:
determining a first reference point from the virtual viewpoint depth map, and mapping the first reference point to a space coordinate system where the first position is located to obtain a second reference point;
mapping the second reference point to a space coordinate system where the second position is located according to the position depth value of the reference camera to obtain a third reference point;
and determining the depth value confidence by using a first preset algorithm according to the coordinates of the first reference point and the coordinates of the third reference point.
6. The method of claim 1, wherein the step of performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map comprises:
selecting a first central pixel point from the virtual viewpoint texture map, and creating a first window with a preset size;
performing filling calculation on the values of the pixel points in the first window by using a second preset algorithm to perform hole filling processing;
determining a foreground edge area of the virtual viewpoint texture map, and marking the foreground edge area;
selecting a second central pixel point from the foreground edge area, and creating a second window with a preset size;
and performing filtering calculation on the values of the pixel points in the second window by using a third preset algorithm to perform filtering processing on the foreground edge area.
7. The method of claim 6, wherein the step of determining a foreground edge region of the virtual viewpoint texture map and marking the foreground edge region comprises:
selecting a reference pixel point, and determining the gradient absolute value of the depth value of the reference pixel point;
if the absolute value of the gradient is greater than a preset threshold, determining the reference pixel point as an edge pixel point;
and expanding a preset number of pixel points to the periphery of the edge pixel points to determine the foreground edge area, and marking the foreground edge area.
8. A virtual viewpoint depth map processing apparatus, characterized in that the virtual viewpoint depth map processing apparatus comprises:
the forward mapping module is used for mapping a reference viewpoint depth map corresponding to a preset reference viewpoint to the position of a virtual viewpoint in a forward direction to obtain a virtual viewpoint depth map;
the post-processing module is used for performing bilateral filtering smoothing processing on the virtual viewpoint depth map to obtain a smoothed virtual viewpoint depth map;
the inverse mapping module is used for establishing an inverse mapping relation between the virtual viewpoint and the reference viewpoint according to the smoothed virtual viewpoint depth map so as to synthesize virtual viewpoint texture maps based on different preset reference viewpoints;
and the fusion processing module is used for performing weighted fusion on the virtual viewpoint texture maps, and performing hole filling processing and foreground edge filtering processing on the fused virtual viewpoint texture map to obtain a target virtual viewpoint depth map.
9. A virtual viewpoint depth map processing device, characterized by comprising: a memory, a processor, and a program stored on the memory for implementing the virtual viewpoint depth map processing method, wherein the processor is configured to execute the program to implement the steps of the virtual viewpoint depth map processing method according to any one of claims 1 to 7.
10. A storage medium having stored thereon a program for implementing a virtual viewpoint depth map processing method, wherein the program, when executed by a processor, implements the steps of the virtual viewpoint depth map processing method according to any one of claims 1 to 7.
CN202011419908.8A 2020-12-04 2020-12-04 Virtual viewpoint depth map processing method, equipment, device and storage medium Pending CN112581389A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011419908.8A CN112581389A (en) 2020-12-04 2020-12-04 Virtual viewpoint depth map processing method, equipment, device and storage medium
PCT/CN2021/076924 WO2022116397A1 (en) 2020-12-04 2021-02-19 Virtual viewpoint depth map processing method, device, and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011419908.8A CN112581389A (en) 2020-12-04 2020-12-04 Virtual viewpoint depth map processing method, equipment, device and storage medium

Publications (1)

Publication Number Publication Date
CN112581389A (en) 2021-03-30

Family

ID=75127661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011419908.8A Pending CN112581389A (en) 2020-12-04 2020-12-04 Virtual viewpoint depth map processing method, equipment, device and storage medium

Country Status (2)

Country Link
CN (1) CN112581389A (en)
WO (1) WO2022116397A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908162B (en) * 2022-10-28 2023-07-04 中山职业技术学院 Virtual viewpoint generation method and system based on background texture recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013034101A1 (en) * 2011-09-09 2013-03-14 海信集团有限公司 Virtual viewpoint combination method and device in multi-viewpoint video
WO2015013851A1 (en) * 2013-07-29 2015-02-05 北京大学深圳研究生院 Virtual viewpoint synthesis method and system
CN104822059A (en) * 2015-04-23 2015-08-05 东南大学 Virtual viewpoint synthesis method based on GPU acceleration
CN109712067A (en) * 2018-12-03 2019-05-03 北京航空航天大学 A kind of virtual viewpoint rendering method based on depth image
CN111385554A (en) * 2020-03-28 2020-07-07 浙江工业大学 High-image-quality virtual viewpoint drawing method of free viewpoint video

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101938205B1 (en) * 2012-03-19 2019-01-14 한국전자통신연구원 Method for depth video filtering and apparatus thereof
CN103927717B (en) * 2014-03-28 2017-03-15 上海交通大学 Depth image restoration methods based on modified model bilateral filtering
CN106791774A (en) * 2017-01-17 2017-05-31 湖南优象科技有限公司 Virtual visual point image generating method based on depth map
CN106998460B (en) * 2017-05-16 2019-06-21 合肥工业大学 A kind of hole-filling algorithm based on depth transition and depth item total variational

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538316A (en) * 2021-08-24 2021-10-22 北京奇艺世纪科技有限公司 Image processing method, image processing device, terminal device and readable storage medium
CN113538316B (en) * 2021-08-24 2023-08-22 北京奇艺世纪科技有限公司 Image processing method, device, terminal equipment and readable storage medium
CN113837978A (en) * 2021-09-28 2021-12-24 北京奇艺世纪科技有限公司 Image synthesis method, device, terminal equipment and readable storage medium
CN113837978B (en) * 2021-09-28 2024-04-05 北京奇艺世纪科技有限公司 Image synthesis method, device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
WO2022116397A1 (en) 2022-06-09

Similar Documents

Publication Publication Date Title
JP4512584B2 (en) Panorama video providing method and apparatus with improved image matching speed and blending method
CN112581389A (en) Virtual viewpoint depth map processing method, equipment, device and storage medium
CN109683699B (en) Method and device for realizing augmented reality based on deep learning and mobile terminal
CN109978936B (en) Disparity map acquisition method and device, storage medium and equipment
EP2477152B1 (en) Image processing device, image processing method, image processing program, and recording medium
CN110599593B (en) Data synthesis method, device, equipment and storage medium
CN110072046B (en) Image synthesis method and device
CN109691080B (en) Image shooting method and device and terminal
WO2022022449A1 (en) Method and apparatus for spatial positioning
JP6096634B2 (en) 3D map display system using virtual reality
WO2015156149A1 (en) Image processing apparatus and image processing method
CN114881863B (en) Image splicing method, electronic equipment and computer readable storage medium
CN112083403A (en) Positioning tracking error correction method and system for virtual scene
CN114549718A (en) Rendering method and device of virtual information, augmented reality device and storage medium
CN113643414A (en) Three-dimensional image generation method and device, electronic equipment and storage medium
CN107516099B (en) Method and device for detecting marked picture and computer readable storage medium
US20170289516A1 (en) Depth map based perspective correction in digital photos
US9959672B2 (en) Color-based dynamic sub-division to generate 3D mesh
US10460503B2 (en) Texturing of a three-dimensional (3D) model by UV map in-painting
US20210209347A1 (en) Texture map generation using multi-viewpoint color images
CN110378948B (en) 3D model reconstruction method and device and electronic equipment
KR20200114348A (en) Apparatus for sharing contents using spatial map of augmented reality and method thereof
CN114119701A (en) Image processing method and device
CN112615993A (en) Depth information acquisition method, binocular camera module, storage medium and electronic equipment
CN112258435A (en) Image processing method and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination