WO2022001865A1 - Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium


Info

Publication number
WO2022001865A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth map
depth
quantized
map
estimated
Prior art date
Application number
PCT/CN2021/102335
Other languages
French (fr)
Chinese (zh)
Inventor
盛骁杰
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2022001865A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G06T15/04 Texture mapping
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/122 Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/243 Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Definitions

  • The embodiments of the present specification relate to the technical field of video processing, and in particular to depth map and video processing and reconstruction methods, apparatuses, devices, and storage media.
  • Free viewpoint video is a technology that provides a high-degree-of-freedom viewing experience: users can adjust the viewing angle through interactive operations during playback and watch from whichever viewpoint they choose, which can greatly improve the viewing experience.
  • In a typical pipeline, depth values are quantized into 8-bit binary data and expressed as a depth map. The synchronized texture maps of multiple viewing angles and the depth maps of the corresponding viewing angles are spliced to obtain a spliced image; the spliced image and its corresponding parameter data are then compressed according to frame timing to obtain a free-viewpoint video for transmission, so that the terminal device can reconstruct free-viewpoint images based on the obtained free-viewpoint video stream.
  • The inventors have found through research that the quality of the reconstructed free-viewpoint image is limited by the current depth map quantization processing method.
  • The embodiments of this specification provide depth map and video processing and reconstruction methods, apparatuses, devices, and storage media, which can improve the image quality of reconstructed free-viewpoint video.
  • the embodiments of this specification provide a depth map processing method, including:
  • Based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, a corresponding quantization formula is used to quantize the depth values of the corresponding pixels in the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • In some embodiments, using a corresponding quantization formula to quantize the depth value of the corresponding pixel in the estimated depth map to obtain the quantized depth value of the corresponding pixel in the quantized depth map includes:
  • the depth value of the corresponding pixel in the estimated depth map is quantized using the following quantization formula:
  • M is the number of quantization bits of the corresponding pixel in the estimated depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the estimated depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N.
  • the method further includes:
  • the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles are spliced according to a preset splicing method to obtain a spliced image.
  • the embodiments of this specification also provide a free-view video reconstruction method, the method includes:
  • the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the parameter data corresponding to the spliced image includes: quantization parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • the image of the virtual viewpoint is reconstructed according to the position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • obtaining the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and performing inverse quantization processing on the quantized depth values in the quantized depth map based on that data to obtain the estimated depth map of the corresponding viewing angle, includes:
  • based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, the corresponding inverse quantization formula is used to perform inverse quantization processing on the quantized depth values in the quantized depth map, to obtain the depth values of the corresponding pixels in the estimated depth map of the corresponding viewing angle.
  • In some embodiments, using the corresponding inverse quantization formula to perform inverse quantization processing on the quantized depth values in the quantized depth map to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle includes:
  • the following inverse quantization formula is used to inverse quantize the quantized depth values in the quantized depth map to obtain corresponding pixel values in the estimated depth map:
  • M is the number of quantization bits of the corresponding pixel in the quantized depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the quantized depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • maxdisp is the maximum value of the quantized depth distance corresponding to the viewing angle N, and mindisp is the minimum value of the quantized depth distance corresponding to the viewing angle N.
  • the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle; before reconstructing to obtain the image of the virtual viewpoint, the method further includes:
  • performing up-sampling on the estimated depth map of the corresponding viewing angle to obtain a second depth map for reconstructing the virtual viewpoint image, which includes:
  • for the depth values of pixels in even rows and odd columns in the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining the depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the left pixel and the right pixel of the intermediate pixel;
  • for the depth values of pixels in odd rows in the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining the depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the pixel above and the pixel below the intermediate pixel.
  • reconstructing the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, includes:
  • the target texture map and the target depth map are combined and rendered to obtain the image of the virtual viewpoint.
  • the embodiment of this specification also provides a free-viewpoint video processing method, the method includes:
  • the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the parameter data corresponding to the spliced image includes: quantization parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • the image of the virtual viewpoint is reconstructed according to the position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • the determining the position information of the virtual viewpoint in response to the user interaction behavior includes: determining the corresponding virtual viewpoint path information in response to the user's gesture interaction operation;
  • the texture map in the spliced image at the corresponding frame moment and the estimated depth map of the corresponding viewing angle are selected as the target texture map and the target depth map;
  • the target texture map and the target depth map are combined and rendered to obtain the image of the virtual viewpoint.
  • the method further includes:
  • the virtual information image and the image of the virtual viewpoint are synthesized and displayed.
  • the acquiring the virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object includes:
  • obtaining the virtual rendering target object in the image of the virtual viewpoint includes:
  • in response to an interactive control instruction for special effect generation, the virtual rendering target object in the image of the virtual viewpoint is acquired.
  • An embodiment of this specification provides a depth map processing device, the device comprising:
  • An estimated depth map obtaining unit, adapted to obtain an estimated depth map generated based on multiple frame-synchronized texture maps, where the viewing angles of the multiple texture maps are different;
  • a depth value obtaining unit, adapted to obtain the depth values of the pixels in the estimated depth map;
  • a quantization parameter data acquisition unit, adapted to acquire the quantization parameter data corresponding to the viewing angle of the estimated depth map;
  • a quantization processing unit, adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • Embodiments of this specification also provide a free-view video reconstruction device, the device comprising:
  • the first video acquisition unit is adapted to acquire a free-view video
  • the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images
  • the spliced images include synchronized texture maps of multiple viewing angles and a quantized depth map corresponding to a viewing angle
  • the parameter data corresponding to the spliced image includes: quantized parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • a first quantized depth value acquisition unit, adapted to acquire the quantized depth values of the pixels in the quantized depth map
  • a first quantization parameter data acquisition unit adapted to acquire quantization parameter data corresponding to the perspective of the quantized depth map
  • a first depth map inverse quantization processing unit adapted to perform inverse quantization processing on the quantized depth map of the corresponding view angle based on the quantization parameter data corresponding to the view angle of the quantized depth map to obtain a corresponding estimated depth map
  • the first image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • the embodiments of this specification also provide a free-viewpoint video processing device, the device comprising:
  • the second video acquisition unit is adapted to acquire a free-view video, where the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and a quantized depth map corresponding to a viewing angle, and the parameter data corresponding to the spliced image includes: quantized parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • a second quantized depth value obtaining unit adapted to obtain the quantized depth value of the pixel in the quantized depth map
  • the second depth map inverse quantization processing unit, adapted to acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and to perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
  • a virtual viewpoint position determination unit adapted to determine the location information of the virtual viewpoint in response to user interaction
  • the second image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • Embodiments of the present specification further provide an electronic device, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor performs the steps of the methods described in the foregoing embodiments when executing the computer instructions.
  • the embodiments of this specification also provide a server device, including a processor and a communication component, wherein:
  • the processor is adapted to perform the steps of the depth map processing method described in any of the foregoing embodiments to obtain a quantized depth map, to splice the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset splicing method to obtain a spliced image, and to package the spliced images of multiple frames together with the corresponding parameter data to obtain a free-view video;
  • the communication component is adapted to transmit the free-view video.
  • the embodiments of this specification also provide a terminal device, including a communication component, a processor and a display component, wherein:
  • the communication component is adapted to obtain the free-view video;
  • the processor is adapted to perform the steps of the free-view video reconstruction method or the free-view video processing method described in any of the foregoing embodiments;
  • the display component is adapted to display the reconstructed image obtained by the processor.
  • The embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, where, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments are performed.
  • With the above scheme, the depth values of the pixels in the estimated depth map are quantized using quantization parameters that match the actual situation of the corresponding viewing angle, so that the depth map of each viewing angle can make full use of the expression space of the depth quantization bits, which can improve the image quality of the reconstructed free-view video.
  • By down-sampling to obtain a first depth map and splicing the first depth map with the texture maps of the corresponding viewing angles according to a preset splicing method, the overall size of the spliced image can be reduced, and the storage and transmission resources for the spliced image can therefore be saved.
  • When the decoding resolution of the overall stitched image is limited, setting the resolution of the quantized depth map to be smaller than the resolution of the texture map of the corresponding viewing angle allows a higher-resolution texture map to be transmitted. The estimated depth map of the corresponding viewing angle is then up-sampled to obtain a second depth map, and free-view video reconstruction is performed based on the synchronized texture maps of multiple viewing angles in the spliced image and the second depth maps of the corresponding viewing angles, so that free-view images of higher clarity can be obtained, improving the user experience.
  • FIG. 1 is a schematic diagram of a specific application system of a free-view video display in an embodiment of this specification
  • FIG. 2 is a schematic diagram of an interactive interface of a terminal device in an embodiment of this specification
  • FIG. 3 is a schematic diagram of a setting mode of a collection device in an embodiment of the present specification
  • FIG. 4 is a schematic diagram of another terminal device interaction interface in the embodiment of this specification.
  • FIG. 5 is a schematic diagram of a field of view application scenario in the embodiment of the present specification.
  • FIG. 6 is a flowchart of a depth map processing method in an embodiment of the present specification.
  • FIG. 7 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of the present specification.
  • FIG. 8 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification.
  • FIG. 9 is a schematic structural diagram of a data header file in an embodiment of the present specification.
  • FIG. 10 is a schematic diagram of a user side processing 6DoF video data in an embodiment of this specification.
  • FIG. 11 is a schematic structural diagram of a stitched image in the embodiment of this specification.
  • FIG. 13 is a flowchart of a combined rendering method in an embodiment of the present specification.
  • FIG. 16 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 17 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 18 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 19 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 20 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 21 is a schematic structural diagram of a depth map processing apparatus in an embodiment of the present specification.
  • FIG. 22 is a schematic structural diagram of a free-view video reconstruction apparatus in an embodiment of the present specification.
  • FIG. 23 is a schematic structural diagram of a free-viewpoint video processing apparatus in an embodiment of the present specification.
  • FIG. 24 is a schematic structural diagram of an electronic device in an embodiment of this specification.
  • FIG. 25 is a schematic structural diagram of a server device in an embodiment of this specification.
  • FIG. 26 is a schematic structural diagram of a terminal device in an embodiment of this specification.
  • As shown in FIG. 1, a specific application system for free-view video display in an embodiment of the present specification may include a collection system 11 composed of multiple collection devices, a server 12, and a display device 13, where the collection system 11 can collect images of the area to be viewed.
  • the acquisition system 11 or the server 12 can process the acquired synchronized multiple texture maps to generate multi-angle free viewing angle data that can support the display device 13 to perform virtual viewpoint switching.
  • the display device 13 can display reconstructed images generated based on multi-angle free viewing angle data, the reconstructed images correspond to virtual viewpoints, and can display reconstructed images corresponding to different virtual viewpoints according to user instructions, and switch the viewing position and viewing angle.
  • the process of performing image reconstruction to obtain a reconstructed image may be implemented by the display device 13, or may be implemented by a device located in a content delivery network (Content Delivery Network, CDN) by means of edge computing.
  • the user can view the area to be viewed through the display device 13 , and in this embodiment, the area to be viewed is a basketball court. As mentioned earlier, the viewing position and viewing angle can be switched.
  • users can swipe across the screen to switch virtual viewpoints.
  • the virtual viewpoint for viewing can be switched.
  • the position of the virtual viewpoint before sliding may be VP1, and the position of the virtual viewpoint after sliding may be VP2.
  • the reconstructed image displayed on the screen may be as shown in FIG. 4 .
  • the reconstructed image may be obtained by performing image reconstruction based on multi-angle free viewing angle data generated from images collected by multiple collection devices in an actual collection situation.
  • the image viewed before switching may also be a reconstructed image.
  • the reconstructed images may be frame images in the video stream.
  • the manner of switching the virtual viewpoint according to the user's instruction may be various, which is not limited here.
  • the virtual viewpoint can be represented by coordinates of 6 degrees of freedom (DoF), wherein the spatial position of the virtual viewpoint can be represented as (x, y, z), and the viewing angle can be represented as three rotation directions
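  • As an illustration, such a 6DoF virtual viewpoint can be held in a small structure like the sketch below; the rotation-axis names are an assumption, since the text only speaks of three rotation directions.

```python
from dataclasses import dataclass

@dataclass
class VirtualViewpoint:
    # Spatial position (x, y, z) plus three rotations; the axis names
    # are assumptions -- the text only says "three rotation directions".
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float

vp = VirtualViewpoint(x=0.0, y=1.6, z=5.0, yaw=0.0, pitch=0.0, roll=0.0)
```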
  • the multi-angle free viewing angle data may include depth map data, which is used to provide third-dimensional information outside the plane image. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of the depth map data is smaller.
  • the switching of the virtual viewpoints may be performed within a certain range, which is a multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the virtual viewpoint position and the viewing angle can be switched arbitrarily.
  • the multi-angle free viewing angle range is related to the arrangement of the acquisition device.
  • the wider the shooting coverage of the collection devices, the larger the multi-angle free viewing angle range.
  • the quality of the picture displayed by the terminal device is related to the number of collection devices. Generally, the more collection devices are set, the fewer empty areas in the displayed picture.
  • the range of multi-angle free viewing angles is related to the spatial distribution of the acquisition devices.
  • the range of multi-angle free viewing angles and the interaction mode with the display device on the terminal side can be set based on the spatial distribution relationship of the collection devices.
  • collection devices are arranged along a certain path.
  • 6 collection devices may be arranged along an arc, that is, collection devices CJ 1 to CJ 6 . It can be understood that the location, quantity and support manner of the collection devices can be various, which are not limited here.
  • the embodiments of this specification provide a corresponding depth map processing method and a free-view video reconstruction method, which are described in detail below through specific embodiments.
  • A sampled image value is represented by a number, and the process of converting the continuous values of an image function to their digital equivalents is quantization.
  • Image quantization assigns an integer number to each continuous sample value.
  • After the server device (such as the server 12) estimates the depth maps of the scene and objects based on the texture maps, it quantizes the depth values into 8-bit binary data and expresses them as a depth map.
  • For convenience of description, this depth map is referred to here as the estimated depth map.
  • the texture maps of the synchronized multiple viewing angles and the obtained estimated depth maps of the corresponding viewing angles are spliced to obtain a spliced image.
  • the spliced image and the corresponding parameter data are compressed according to the frame timing to obtain a free-view video, and the terminal device can reconstruct free-view images based on the obtained free-view video.
  • the inventor found through research that the current depth map quantization processing method quantizes each depth map in the spliced image based on the same set of quantization parameter data, and the quality of the reconstructed free-viewpoint image is limited by this quantization processing method.
  • Specifically, a quantized depth value in 8-bit binary in the range 0 to 255 can be obtained by quantizing with a formula in which:
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the estimated depth map
  • depth_range_near is the minimum depth distance from the optical center in the preset field of view
  • depth_range_far is the maximum depth distance from the optical center in the preset field of view.
  • FIG. 5 is a schematic diagram of the field of view of an application scenario, showing a scene region 50 that contains an object R and is provided with a plurality of collection devices P1, P2 ... Pn ... PN. The collection devices P1 to PN are arranged in an arc, with corresponding optical centers C1, C2 ... Cn ... CN and corresponding optical axes L1, L2 ... Ln ... LN.
  • From FIG. 5, the spatial relationship between the optical centers C1 to CN of the collection devices P1 to PN can be seen intuitively: the minimum and maximum distances between the object R and the optical center differ for each collection device, and therefore the minimum and maximum depth distances from the optical center vary among the estimated depth maps obtained from the texture maps captured by the collection devices P1 to PN.
  • quantization parameters that match the actual situation of the corresponding viewing angle are used to quantize the depth values of the pixels in the estimated depth map, so that for the depth map of each viewing angle, the expression space of the depth quantization bits can be fully utilized, and the image quality of the reconstructed free-view video can be improved.
  • the embodiment of this specification may specifically include the following quantization processing steps:
  • S61 Acquire an estimated depth map generated based on multiple frame-synchronized texture maps, where the multiple texture maps have different viewing angles.
  • an acquisition system composed of a plurality of acquisition devices may acquire images synchronously, and obtain the texture maps synchronized with the multiple frames.
  • the origin of the coordinate system of the acquisition device (such as a camera) can be used as the optical center, and the depth value can be the distance from each point in the field of view to the optical center along the optical axis.
  • an estimated depth map corresponding to each texture map may be obtained based on the multiple texture maps synchronized in the frame.
  • S63 Acquire the quantization parameter data corresponding to the viewing angle of the estimated depth map, and perform quantization processing on the depth values of the pixels in the estimated depth map based on that data, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • The quantization parameter data may include: the minimum depth distance from the optical center and the maximum depth distance from the optical center of the viewing angle corresponding to the estimated depth map, which are used to quantize the depth values of the pixels in the estimated depth map.
  • Specifically, the minimum depth distance from the optical center and the maximum depth distance from the optical center of the viewing angle corresponding to the estimated depth map can be obtained first; then, based on these distances, a corresponding quantization formula is used to quantize the depth value of the corresponding pixel in the estimated depth map to obtain the quantized depth value of the corresponding pixel in the quantized depth map.
  • the following quantization formula is used to quantize the depth value of the corresponding pixel in the estimated depth map:
  • M is the number of quantization bits of the corresponding pixel in the estimated depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the estimated depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N.
  • With per-view quantization parameters, the object closest to the camera (optical center) in the quantized depth map corresponding to each viewing angle can be quantized to a depth value closer to 2^M − 1.
  • M can be 8 bits, 16 bits, and so on. If M is 8 bits, the depth value of the object closest to the optical center in the quantized depth map corresponding to each viewing angle is closer to 255 after quantization.
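  • The quantization formula itself is not reproduced in this text, but the parameters above pin down its shape. The sketch below assumes the common inverse-depth mapping used in free-viewpoint systems, with a per-view near/far pair in place of a single global one; the function name and toy values are illustrative, not the patent's exact formula.

```python
import numpy as np

def quantize_depth_map(depth, bits, near, far):
    """Quantize metric depth (distance from the optical center along the
    optical axis) into integer codes. Assumes an inverse-depth mapping,
    so depth == near maps to 2**bits - 1 and depth == far maps to 0."""
    max_code = 2 ** bits - 1
    inv = (1.0 / depth - 1.0 / far) / (1.0 / near - 1.0 / far)
    dtype = np.uint8 if bits <= 8 else np.uint16
    return np.round(max_code * inv).astype(dtype)

# Per-view quantization: each viewing angle N uses its own
# (depth_range_near_N, depth_range_far_N) instead of one global pair.
view_ranges = {0: (2.5, 40.0), 1: (3.1, 42.0)}   # hypothetical ranges in meters
estimated_depth_v0 = np.full((4, 4), 2.5)        # toy estimated depth map, view 0
near_0, far_0 = view_ranges[0]
codes = quantize_depth_map(estimated_depth_v0, bits=8, near=near_0, far=far_0)
# The nearest object (depth == near_0) quantizes to 255, as the text describes.
```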
  • the synchronized texture maps of multiple viewing angles and the quantized depth maps of the corresponding viewing angles can be spliced to obtain a spliced image, and a free-view video can then be obtained based on the spliced images at multiple frame moments and the parameter data corresponding to the spliced images.
  • the free-view video may be compressed and then transmitted to a terminal device for image reconstruction of the free-view video.
  • Before free-viewpoint video can be generated, texture map acquisition and depth map calculation are required; these comprise three main steps, namely Multi-camera Video Capturing, camera internal and external parameter calculation (Camera Parameter Estimation), and Depth Map Calculation.
  • For Multi-camera Video Capturing, it is required that the video captured by each camera can be aligned at the frame level.
  • The texture maps can be obtained through the video acquisition of multiple cameras;
  • the camera parameters can be obtained through the calculation of the internal and external parameters of the cameras, which can include the internal parameter data and the external parameter data of the cameras;
  • the depth maps of the corresponding viewing angles can be obtained through Depth Map Calculation. The multiple synchronized texture maps, the depth maps of the corresponding viewing angles, and the camera parameters together form the 6DoF video data.
  • Through the above steps, the texture maps collected by the multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained.
  • These three parts of data can be referred to as the data files in the multi-angle free-view video data, and can also be referred to as 6DoF video data. With these data, the client can generate a virtual viewpoint according to a virtual 6-degrees-of-freedom (DoF) position, thereby providing a 6DoF video experience.
  • The 6DoF video data and the indicative data can be compressed and transmitted to the user side, and the user side can obtain the user-side 6DoF expression according to the received data, that is, the aforementioned 6DoF video data and metadata.
  • indicative data can also be called metadata (Metadata)
  • Metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), used to indicate the storage rules for the pixel data of the multiple texture maps and the quantized depth map data in the stitched image; edge padding pattern metadata (Padding pattern metadata), used to indicate the method of edge protection in the stitched image; the quantization parameter metadata of the corresponding viewing angles; and other metadata (Other metadata).
  • the metadata can be stored in the data header file, and the specific storage sequence can be as shown in FIG. 9 , or stored in other sequences.
  • the user side obtains 6DoF video data, which includes camera parameters, texture maps, quantized depth maps, and metadata, as well as user-side interaction behavior data.
  • the user side can use Depth Image-Based Rendering (DIBR) for 6DoF rendering, so as to generate a virtual viewpoint image at a specific 6DoF position determined by the user interaction behavior; that is, according to the user's instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined.
  • any video frame in the free-view video data is generally expressed as a stitched image formed by a texture map collected by multiple cameras and a corresponding depth map.
  • Figure 11 is a schematic diagram of the structure of the spliced image: the upper half of the spliced image is the texture map area, which is divided into 8 texture map sub-regions that respectively store the pixel data of 8 synchronized texture maps, taken from different angles, that is, from different viewing angles.
  • the lower half of the spliced image is the depth map area, which is divided into 8 depth map sub-regions, and the corresponding quantized depth maps of the above 8 texture maps are respectively stored.
  • the texture map of viewing angle N and the quantized depth map of viewing angle N are in one-to-one pixel correspondence, and the spliced image is compressed and transmitted to the terminal for decoding and DIBR, so that an image can be interpolated at the viewpoint of user interaction.
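  • As a concrete reading of this layout, the sketch below splits a decoded frame into per-view texture and quantized depth sub-regions; the 2x4 grid within each half is an assumption, since the text fixes only the top/bottom halves and the count of eight sub-regions.

```python
import numpy as np

def split_stitched_frame(frame, n_views=8, cols=4):
    """Split a stitched frame laid out as in FIG. 11: texture maps in the
    top half, quantized depth maps in the bottom half, each half tiled
    into n_views sub-regions (grid shape assumed here)."""
    h = frame.shape[0]
    rows = n_views // cols
    def tiles(half):
        sub_h, sub_w = half.shape[0] // rows, half.shape[1] // cols
        return [half[r * sub_h:(r + 1) * sub_h, c * sub_w:(c + 1) * sub_w]
                for r in range(rows) for c in range(cols)]
    textures = tiles(frame[: h // 2])    # 8 per-view texture maps
    depth_maps = tiles(frame[h // 2:])   # 8 per-view quantized depth maps
    return textures, depth_maps
```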
  • In some embodiments, the quantized depth map may be down-sampled first to obtain a first depth map, and the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles may then be stitched according to a preset stitching method to obtain a stitched image.
  • One is to perform sampling processing on the pixels in the quantized depth map to obtain the first depth map. For example, every other pixel can be extracted from the quantized depth map, so that the resolution of the obtained first depth map is 50% of the original depth map.
  • the other is to perform filtering based on the corresponding texture map on the pixels in the quantized depth map to obtain the first depth map.
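  • A minimal sketch of the first, sampling-based option, assuming the quantized depth map is a NumPy array (the texture-guided filtering option is not shown):

```python
def decimate_depth_map(quantized_depth, step=2):
    # Keep every other pixel along each axis; step=2 halves the resolution
    # per axis, consistent with the 1/4-downsampled first depth map
    # mentioned later in the text.
    return quantized_depth[::step, ::step]
```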
  • the stitched image can be a rectangle.
  • S121 Obtain a free-view video, where the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles; the parameter data corresponding to the spliced image includes: the quantization parameter data and camera parameter data of the estimated depth maps of the corresponding viewing angles.
  • the free-view video may be in the form of a video compressed file, or may be transmitted in the form of a video stream.
  • the parameter data of the spliced image may be stored in the header file of the free-view video data, and the specific form may refer to the introduction in the foregoing embodiments.
  • the quantization parameter data of the estimated depth map of the corresponding viewing angle may be stored in the form of an array.
  • the quantization parameter data can be expressed as:
  • Array Z [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
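  • For illustration, such a per-view array might look like the following; the field names and values are hypothetical:

```python
# One near/far pair per viewing angle, indexed by view number.
quant_params = [
    {"view": n, "depth_range_near": 2.5 + 0.1 * n, "depth_range_far": 40.0}
    for n in range(16)
]
near_3 = quant_params[3]["depth_range_near"]  # parameters for viewing angle 3
```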
  • S122 Acquire quantized depth values of pixels in the quantized depth map.
  • S123 Acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map.
  • In some embodiments of this specification, step S123 may be implemented in the following manner:
  • based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, the corresponding inverse quantization formula is used to perform inverse quantization processing on the quantized depth values in the quantized depth map, to obtain the depth values of the corresponding pixels in the estimated depth map of the corresponding viewing angle.
  • the following inverse quantization formula is used to perform inverse quantization processing on the quantized depth values in the quantized depth map to obtain corresponding pixel values in the estimated depth map:
  • M is the number of quantization bits of the corresponding pixel in the quantized depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the quantized depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • maxdisp is the maximum value of the quantized depth distance corresponding to the viewing angle N, and mindisp is the minimum value of the quantized depth distance corresponding to the viewing angle N.
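  • A matching inverse-quantization sketch, under the same inverse-depth assumption as the quantization sketch earlier; the per-view maxdisp/mindisp bounds of the patent's exact formula are not reproduced here.

```python
import numpy as np

def dequantize_depth_map(codes, bits, near, far):
    """Map integer codes back to metric depth using the near/far range of
    the viewing angle, inverting the assumed quantization mapping."""
    max_code = 2 ** bits - 1
    inv = codes.astype(np.float64) / max_code * (1.0 / near - 1.0 / far) + 1.0 / far
    return 1.0 / inv  # code 2**bits - 1 -> near, code 0 -> far
```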
  • If the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle (for example, the quantized depth map has been down-sampled on the server side), then on the terminal device side, the estimated depth map of the corresponding viewing angle may be up-sampled to obtain a second depth map, and the virtual viewpoint image is then reconstructed using the second depth map.
  • A first example is to perform up-sampling processing on an estimated depth map that has been down-sampled by 1/4, to obtain a second depth map with the same resolution as the texture map. Depending on the rows and columns, this specifically includes the following different processing:
  • For pixels in odd rows, the corresponding pixels in the corresponding texture map are determined as intermediate pixels, and the depth values are determined based on the relationship between the luminance channel values of the intermediate pixels in the corresponding texture map and the luminance channel values of the pixels above and below the intermediate pixels.
  • If the absolute value of the difference between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel value of the left pixel of the intermediate pixel is less than the quotient of the absolute value of the difference between the luminance channel value of the intermediate pixel and the luminance channel value of the right pixel and a preset threshold, the depth value corresponding to the left pixel is selected as the depth value of the corresponding pixel in the even row and odd column of the second depth map; otherwise, the depth value corresponding to the right pixel is selected.
  • Similarly, if the absolute value of the difference between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel value of the pixel above the intermediate pixel is less than the quotient of the absolute value of the difference between the luminance channel value of the intermediate pixel and the luminance channel value of the pixel below and the preset threshold, the depth value corresponding to the upper pixel is selected as the depth value of the corresponding pixel in the odd row of the second depth map; otherwise, the depth value corresponding to the lower pixel is selected.
  • pix_C is the luminance channel value (Y value) of the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • pix_L is the luminance channel value of the left pixel of pix_C
  • pix_R is the luminance channel value of the right pixel of pix_C, and pix_U is the luminance channel value of the pixel above pix_C
  • Dep_R is the depth value corresponding to the right pixel of the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • Dep_L is the depth value corresponding to the left pixel of the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • Dep_D is the depth value corresponding to the pixel below the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • Dep_U is the depth value corresponding to the pixel above the intermediate pixel.
  • In another up-sampling manner, the depth value of each pixel in the estimated depth map is taken as the pixel value of the corresponding row and column in the second depth map; for the pixels in the second depth map that have no corresponding relationship with the estimated depth map, the values are obtained by filtering based on the differences between the pixel value of the corresponding pixel in the corresponding texture map and the pixel values of the surrounding pixels of that corresponding pixel.
  • Specifically, the pixel value of the corresponding pixel in the texture map may be compared with the pixel values of the four diagonal pixels around the corresponding pixel, the pixel whose pixel value is closest to that of the corresponding pixel may be obtained, and the depth value in the estimated depth map corresponding to that closest pixel may be used as the depth value, in the second depth map, of the pixel corresponding to the position in the texture map.
  • Alternatively, the corresponding pixel in the texture map may be compared with its surrounding pixels, and, according to the similarity of the pixel values, the depth values in the estimated depth map corresponding to the surrounding pixels may be weighted to obtain the depth value of the corresponding pixel in the second depth map.
  • the above shows some methods for upsampling the estimated depth map to obtain the second depth map. It can be understood that the above are only examples, and specific upsampling methods are not limited in the embodiments of this specification. Moreover, the method of up-sampling the estimated depth map in any video frame may correspond to the method of down-sampling the quantized depth map to obtain the first depth map, or there may be no corresponding relationship. In addition, the ratio of upsampling and downsampling can be the same or different.
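  • The sketch below is a simplified reading of the luminance-guided selection rules above for 2x up-sampling; it is not the patent's full procedure, and border pixels are left unfilled.

```python
import numpy as np

def upsample_depth_guided(depth_lo, luma, thr=1.0):
    """depth_lo: low-resolution depth (H/2 x W/2); luma: full-resolution
    Y channel. Known samples are copied to even rows/columns; a missing
    pixel takes the depth of whichever horizontal (then vertical)
    neighbor is closer in luminance, with threshold thr."""
    h, w = luma.shape
    out = np.zeros((h, w), dtype=depth_lo.dtype)
    out[::2, ::2] = depth_lo                       # copy known samples
    for r in range(0, h, 2):                       # even rows, odd columns
        for c in range(1, w - 1, 2):
            d_left = abs(float(luma[r, c]) - float(luma[r, c - 1]))
            d_right = abs(float(luma[r, c]) - float(luma[r, c + 1]))
            out[r, c] = out[r, c - 1] if d_left < d_right / thr else out[r, c + 1]
    for r in range(1, h - 1, 2):                   # odd rows
        for c in range(w):
            d_up = abs(float(luma[r, c]) - float(luma[r - 1, c]))
            d_down = abs(float(luma[r, c]) - float(luma[r + 1, c]))
            out[r, c] = out[r - 1, c] if d_up < d_down / thr else out[r + 1, c]
    return out
```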
  • Next, a specific implementation of step S124 is given.
  • Only part of the texture maps in the stitched image and the estimated depth maps of the corresponding viewing angles may be selected as the target texture maps and target depth maps for reconstruction of the virtual viewpoint image.
  • Multiple target texture maps and target depth maps can be selected. After that, the target texture maps and target depth maps can be combined and rendered to obtain the image of the virtual viewpoint.
  • the location information of the virtual viewpoint may be determined according to user interaction behavior or preset. If it is determined based on the user interaction behavior, the virtual viewpoint position at the corresponding interaction moment can be determined by acquiring trajectory data corresponding to the user interaction operation.
  • the position information of the virtual viewpoint corresponding to the corresponding video frame may also be preset on the server side (such as a server or the cloud), and the position information of the set virtual viewpoint is transmitted in the header file of the free-viewpoint video.
  • Although the spatial positional relationship between each texture map (with the estimated depth map of the corresponding viewing angle) and the virtual viewpoint position may be determined based on the virtual viewpoint position and the parameter data corresponding to the spliced image, in order to save data processing resources, texture maps and estimated depth maps that satisfy a preset positional relationship and/or quantitative relationship with the virtual viewpoint position may be selected, according to the position information of the virtual viewpoint and the parameter data corresponding to the spliced image, from the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, as the target texture maps and target depth maps.
  • For example, the texture maps and estimated depth maps corresponding to the 2 to N viewpoints closest to the virtual viewpoint position may be selected, where N is the number of texture maps in the spliced image, that is, the number of acquisition devices corresponding to the texture maps.
  • the quantitative relationship value may be fixed or variable.
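  • A distance-based sketch of such a selection rule, assuming the capture-device positions are available from the camera parameter data; the actual preset positional and/or quantitative relationship is left open by the text.

```python
import numpy as np

def select_target_views(virtual_pos, camera_positions, k=4):
    """Pick the k capture views closest to the virtual viewpoint; their
    texture maps and estimated depth maps become the target maps for
    rendering. k may be fixed or variable, as the text notes."""
    d = np.linalg.norm(np.asarray(camera_positions) - np.asarray(virtual_pos), axis=1)
    return np.argsort(d)[:k]  # indices of the k nearest viewing angles
```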
  • There may be various post-processing methods.
  • at least one of the following methods may be used to perform post-processing on the target depth map:
  • holes may also be filled in the fused texture map to obtain a reconstructed image corresponding to the position of the virtual viewpoint at the moment of user interaction. Through hole filling, the quality of the reconstructed image can be improved.
  • Before image reconstruction is performed, the target depth maps may be preprocessed (for example, up-sampled), or the estimated depth maps obtained after inverse quantization of all depth maps in the stitched image may be preprocessed (for example, up-sampled), and image reconstruction is then performed based on the virtual viewpoint.
  • S141 Obtain a free-view video, where the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles; the parameter data corresponding to the spliced image includes: the quantization parameter data and camera parameter data of the estimated depth maps of the corresponding viewing angles.
  • the spliced images at the multiple frame moments and the parameter data corresponding to the spliced images can be obtained.
  • the specific form of the free-view video may be a multi-angle free-view video, such as a 6DoF video, as exemplified in the foregoing embodiment.
  • each video frame can include a spliced image formed by the synchronized texture maps of multiple viewing angles and the first depth map of the corresponding viewing angle
  • A structure of a stitched image is shown in Figure 11. It can be understood that other stitched-image structures can be adopted; for example, different splicing methods can be adopted according to the ratio of the resolutions of the texture map and the first depth map of the corresponding viewing angle, and one texture map can correspond to multiple first depth maps (e.g., where the first depth map is a depth map processed by 25% downsampling).
  • the free-viewpoint video data file may also include metadata describing the stitched image.
  • the parameter data of the spliced image can be obtained from the metadata, for example, one or more types of information such as the camera parameters of the spliced image, the splicing rule of the spliced image, and the resolution information of the spliced image.
  • the parameter information of the spliced image may be transmitted in combination with the spliced image, for example, may be stored in a video file header.
  • the embodiments of this specification do not limit the specific format of the spliced image, nor the specific type and storage location of the parameter information of the spliced image, as long as the reconstructed image of the corresponding virtual viewpoint position can be obtained based on the free-viewpoint video.
  • the free-view video may be in the form of a video compressed file, or may be transmitted in the form of a video stream.
  • the parameter data of the spliced image may be stored in the header file of the free-view video data, and the specific form may refer to the introduction in the foregoing embodiments.
  • the quantization parameter data of the estimated depth map of the corresponding viewing angle may be stored in the form of an array.
  • the quantization parameter data can be expressed as:
  • Array Z [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
  • S142 Acquire quantized depth values of pixels in the quantized depth map.
  • S143 Acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map.
  • the virtual viewpoint position information based on user interaction can be expressed as coordinates
  • The virtual viewpoint position information can be generated in one or more preset user interaction manners. For example, coordinates can be determined from user operations such as manual clicks or gesture paths, or from a virtual position determined by voice input; alternatively, users can be provided with customized virtual viewpoints (for example, the user can select a position or perspective in the scene, such as below the basket, from the sidelines, from the referee's perspective, or from the coach's perspective).
  • a specific user interaction behavior mode is not limited in the embodiment of the present invention, as long as virtual viewpoint position information based on user interaction can be obtained.
  • the corresponding virtual viewpoint path information may be determined.
  • the corresponding virtual viewpoint path can be planned based on different forms of gesture interaction, so that the path information of the corresponding virtual viewpoint can be determined based on the user's specific gesture operation.
  • the user's finger sliding left and right on the touch screen corresponds to the left and right movement of the viewpoint position;
  • the user's finger sliding up and down on the touch screen corresponds to the up and down movement of the viewpoint position;
  • a finger zoom (pinch) operation corresponds to moving the viewpoint position closer in or farther out.
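  • a minimal sketch of such a gesture-to-viewpoint mapping, assuming a simple Cartesian viewpoint position; the gesture names and step sizes are illustrative assumptions:

```python
# Illustrative mapping from gesture events to virtual-viewpoint motion,
# following the correspondences listed above; names and step sizes are
# assumptions, not part of the specification.
def apply_gesture(position, gesture, amount):
    x, y, z = position
    if gesture == "slide_horizontal":   # finger slides left/right
        x += amount
    elif gesture == "slide_vertical":   # finger slides up/down
        y += amount
    elif gesture == "pinch_zoom":       # two-finger zoom in/out
        z += amount
    return (x, y, z)

# Planning a short virtual-viewpoint path from a horizontal swipe:
path = [apply_gesture((0.0, 2.0, 8.0), "slide_horizontal", 0.1 * i) for i in range(5)]
```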
  • the virtual viewpoint paths planned from the gesture shapes above are only exemplary; virtual viewpoint paths based on other gesture shapes may be predefined, or user-defined settings may be allowed, to enhance the user experience.
  • the texture maps at the corresponding frame moments and the estimated depth maps of the corresponding viewing angles can be selected as the target texture maps and target depth maps, and combined rendering can be performed on the target texture maps and target depth maps to obtain the image of the virtual viewpoint.
  • some of the texture maps and the second depth maps of corresponding viewing angles in one frame or in consecutive multi-frame stitched images can be selected in time order as the target texture maps and target depth maps used to reconstruct the corresponding image of the virtual viewpoint.
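  • one plausible way to make this selection is to rank the capture viewpoints by proximity to the virtual viewpoint, as in the following sketch; the distance criterion and the choice of four views are assumptions, since the embodiments do not fix a selection rule:

```python
import numpy as np

# Illustrative selection of target views: keep the k capture cameras whose
# positions are closest to the virtual viewpoint. The proximity criterion
# and k=4 are assumptions; the specification leaves the selection rule open.
def select_target_views(virtual_pos, camera_positions, k=4):
    distances = np.linalg.norm(camera_positions - virtual_pos, axis=1)
    return np.argsort(distances)[:k]

camera_positions = np.array(
    [[np.cos(a), 0.0, np.sin(a)] for a in np.linspace(0.0, np.pi, 16)])
target_views = select_target_views(np.array([0.3, 0.0, 0.9]), camera_positions)
```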
  • AR (Augmented Reality) special effects may also be implanted. Certain objects in the image of the free-viewpoint video may be determined as virtual rendering target objects based on indication information, and the indication information may be generated based on user interaction, or may be obtained based on preset trigger conditions or third-party instructions.
  • the virtual rendering target object in the image of the virtual viewpoint may be acquired in response to a special effect generation interactive control instruction.
  • S152 Acquire a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object.
  • the implanted AR special effects are presented in the form of virtual information images.
  • the virtual information image may be generated based on augmented reality special effect input data of the target object.
  • a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object can be acquired.
  • the virtual information image corresponding to the virtual rendering target object may be generated in advance, or may be generated immediately in response to the special effect generation instruction.
  • a virtual information image matching the position of the virtual rendering target object can be obtained based on the position of the virtual rendering target object in the reconstructed image as obtained by three-dimensional calibration, so that the obtained virtual information image better matches the position of the virtual rendering target object in three-dimensional space; the displayed virtual information image is then more consistent with the real state of the three-dimensional space, the displayed composite image is more realistic and vivid, and the user's visual experience is enhanced.
  • a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
  • the augmented reality special effect input data of the target object may be input into a preset three-dimensional model, which outputs a virtual information image matching the position of the virtual rendering target object in the image as obtained by three-dimensional calibration;
  • alternatively, the augmented reality special effect input data of the virtual rendering target object may be input into a preset machine learning model, which outputs a virtual information image matching the position of the virtual rendering target object in the image as obtained by three-dimensional calibration.
  • the virtual information image and the image of the virtual viewpoint can be synthesized and displayed in various ways. Two specific implementation examples are given below:
  • Example 1 Perform fusion processing on the virtual information image and the corresponding image to obtain a fusion image, and display the fusion image;
  • Example 2 The virtual information image is superimposed on the corresponding image to obtain a superimposed composite image, and the superimposed composite image is displayed.
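  • as a minimal sketch of Example 2, an RGBA virtual information image can be superimposed on the RGB image of the virtual viewpoint by alpha blending; the blending rule is an assumption, since the embodiments do not fix a specific compositing method:

```python
import numpy as np

# Illustrative superimposition: alpha-blend an RGBA virtual information
# image over the RGB image of the virtual viewpoint.
def superimpose(viewpoint_rgb, virtual_rgba):
    alpha = virtual_rgba[..., 3:4].astype(np.float64) / 255.0
    blended = alpha * virtual_rgba[..., :3] + (1.0 - alpha) * viewpoint_rgb
    return blended.astype(np.uint8)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)    # reconstructed image
effect = np.zeros((720, 1280, 4), dtype=np.uint8)   # virtual information image
composite = superimpose(frame, effect)               # superimposed composite image
```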
  • the obtained composite image may be displayed directly, or the obtained composite image may be inserted into a video stream to be played for playback and display.
  • the fused image can be inserted into the video stream to be played for playback display.
  • the free viewpoint video may include a special effect display identifier.
  • the superimposition position of the virtual information image in the image of the virtual viewpoint may be determined based on the special effect display identifier, and the virtual information image may then be superimposed and displayed at the determined position in the image of the virtual viewpoint.
  • the interactive terminal T1 plays the video in real time.
  • the video frame P1 is displayed.
  • the video frame P2 displayed by the interactive terminal includes a plurality of special effect display identifiers, such as the special effect display identifier I1.
  • in the video frame P2, each special effect display identifier is represented by an inverted triangle symbol pointing to its target object, as shown in Figure 17. It can be understood that the special effect display identifier may also be displayed in other manners.
  • when the terminal user touches and clicks the special effect display identifier I1, the system automatically acquires the virtual information image corresponding to the special effect display identifier I1 and superimposes it for display in the video frame P3, as shown in the figure.
  • taking the position on the court where Q1 stands as the center, a three-dimensional ring R1 is rendered.
  • the end user touches and clicks the special effect display identifier I2 in the video frame P3, and the system automatically acquires the virtual information image corresponding to the special effect display identifier I2, and displays the virtual information image in a superimposed manner.
  • the hit rate information display board M0 displays the jersey number, position, name, and hit rate information of the target object, namely the athlete Q2.
  • the end user can continue to click other special effect display logos displayed in the video frame to watch the video showing the AR special effect corresponding to each special effect display logo.
  • the depth map processing apparatus 210 may include: an estimated depth map obtaining unit 211, a depth value obtaining unit 212, a quantization parameter data obtaining unit 213, and a quantization processing unit 214, specifically:
  • the estimated depth map acquiring unit 211 is adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, and the multiple texture maps have different viewing angles;
  • the depth value obtaining unit 212 is adapted to obtain the depth values of the pixels in the depth map;
  • the quantization processing unit 214 is adapted to perform quantization processing on the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • the quantization parameter data obtaining unit 213 is adapted to obtain the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map; correspondingly, the quantization processing unit 214 may, based on this minimum depth distance and maximum depth distance, use the corresponding quantization formula to quantize the depth values of the corresponding pixels in the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • the depth map processing apparatus 210 may further include: a downsampling processing unit 215 and a stitching unit 216, wherein:
  • the downsampling processing unit 215 is adapted to perform downsampling processing on the quantized depth map to obtain a first depth map;
  • the stitching unit 216 is adapted to stitch the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method, to obtain a stitched image.
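  • a minimal sketch of one possible stitching rule, assuming 16 views, full-resolution texture maps in a 4x4 grid, and half-resolution first depth maps in an 8x2 grid below so that the region widths match; the layout is an assumption, since the stitching method is configurable:

```python
import numpy as np

# Illustrative stitching: 16 texture maps (4x4 grid) stacked above the 16
# half-resolution first depth maps (8x2 grid). Image sizes are examples.
H, W = 180, 320
textures = [np.zeros((H, W, 3), dtype=np.uint8) for _ in range(16)]
depth_maps = [np.zeros((H // 2, W // 2), dtype=np.uint8) for _ in range(16)]

tex_region = np.vstack([np.hstack(textures[r * 4:(r + 1) * 4]) for r in range(4)])

# Replicate each single-channel depth map to three channels so the two
# regions can be stacked into one image.
depth_rgb = [np.stack([d, d, d], axis=-1) for d in depth_maps]
dep_region = np.vstack([np.hstack(depth_rgb[r * 8:(r + 1) * 8]) for r in range(2)])

stitched = np.vstack([tex_region, dep_region])  # 900 x 1280 x 3 stitched image
```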
  • the free-viewpoint video reconstruction apparatus 220 may include: a first video acquisition unit 221 , a first quantized depth value acquisition unit 222 , and a first quantization parameter data acquisition unit 223 , the first depth map inverse quantization processing unit 224 and the first image reconstruction unit 225, specifically:
  • the first video acquisition unit 221 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, and the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the first quantized depth value obtaining unit 222 is adapted to acquire the quantized depth values of the pixels in the quantized depth map;
  • the first quantization parameter data obtaining unit 223 is adapted to obtain the quantization parameter data corresponding to the viewing angle of the quantized depth map;
  • the first depth map inverse quantization processing unit 224 is adapted to perform inverse quantization processing on the quantized depth map of the corresponding view angle based on the quantization parameter data corresponding to the view angle of the quantized depth map to obtain a corresponding estimated depth map;
  • the first image reconstruction unit 225 is adapted to reconstruct the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  • the first quantization parameter data acquisition unit 223 is adapted to acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center of the viewing angle corresponding to the estimated depth map;
  • the first depth map inverse quantization processing unit 224 is adapted to, based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, use the corresponding inverse quantization formula to perform inverse quantization processing on the quantized depth values in the quantized depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
  • the specific inverse quantization formula used by the first depth map inverse quantization processing unit 224 may refer to the foregoing embodiments, and details are not repeated here.
  • the free-viewpoint video processing apparatus 230 may include: a second video obtaining unit 231, a second quantized depth value obtaining unit 232, a second depth map inverse quantization processing unit 233, a virtual viewpoint position determination unit 234, and a second image reconstruction unit 235, wherein:
  • the second video acquisition unit 231 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, and the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the second quantized depth value obtaining unit 232 is adapted to obtain the quantized depth values of the pixels in the quantized depth map;
  • the second depth map inverse quantization processing unit 233 is adapted to acquire the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and to perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
  • the virtual viewpoint location determining unit 234 is adapted to determine location information of the virtual viewpoint in response to user interaction;
  • the second image reconstruction unit 235 is adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  • the electronic device 240 may include a memory 241 and a processor 242, the memory 241 storing computer instructions capable of running on the processor 242; when the processor 242 runs the computer instructions, the steps of the methods described in any of the foregoing embodiments may be executed. For the specific steps, principles, etc., refer to the foregoing corresponding method embodiments, which are not repeated here.
  • the electronic device can be set on the service side as a server or cloud device, or can be set on the user side as a terminal device.
  • the server device 250 may include a processor 251 and a communication component 252, wherein:
  • the processor 251 is adapted to perform the steps of the depth map processing method described in any of the foregoing embodiments to obtain a quantized depth map, to stitch the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain a stitched image, and to package the stitched images of multiple frames and the corresponding parameter data to obtain a free-viewpoint video;
  • the communication component 252 is adapted to transmit the free-view video.
  • the terminal device 260 may include a communication component 261, a processor 262, and a display component 263, wherein:
  • the communication component 261 is adapted to acquire a free-viewpoint video;
  • the processor 262 is adapted to execute the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method described in any of the foregoing embodiments; for the specific steps, refer to the foregoing descriptions of those method embodiments, which are not repeated here.
  • the display component 263 is adapted to display the reconstructed image obtained by the processor 262 .
  • the terminal device may be a mobile terminal such as a mobile phone, a tablet computer, a personal computer, a television, or a combination of any terminal device and an external display device.
  • the embodiments of the present specification also provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are run, the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method described in any of the foregoing embodiments are executed; for the specific steps, reference may be made to the foregoing embodiments, which are not repeated here.
  • the computer-readable storage medium may be various suitable readable storage mediums such as an optical disc, a mechanical hard disk, and a solid-state hard disk.


Abstract

Depth map and video processing and reconstruction methods and apparatuses, a device, and a storage medium, wherein the depth map processing method comprises: acquiring an estimated depth map generated on the basis of a plurality of frame-synchronized texture maps, the plurality of texture maps having different viewing angles (S61); acquiring depth values of pixels in the estimated depth map (S62); and acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map, and quantizing the depth values of the pixels in the estimated depth map on the basis of the quantization parameter data, to obtain quantized depth values of corresponding pixels in a quantized depth map (S63). By using this method, the image quality of a free viewpoint video obtained by reconstruction can be improved.

Description

Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
This application claims priority to Chinese Patent Application No. 202010630749.X, filed on July 3, 2020 and entitled "Depth Map and Video Processing and Reconstruction Methods and Apparatuses, Device, and Storage Medium", the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of this specification relate to the technical field of video processing, and in particular to depth map and video processing and reconstruction methods, apparatuses, devices, and storage media.
Background
Free-viewpoint video is a technology that can provide a viewing experience with a high degree of freedom. During viewing, users can adjust the viewing angle through interactive operations and watch from whichever free viewpoint they want, which can greatly improve the viewing experience.
In large-scale scenes such as sports games, achieving high-degree-of-freedom viewing through Depth Image Based Rendering (DIBR) technology is a solution with great potential and feasibility. A free-viewpoint video is generally expressed by stitching the texture maps collected by multiple cameras with the corresponding depth maps to form a stitched image.
At present, after the server estimates the depth maps of the scene and objects based on the texture maps, each depth value is quantized into 8-bit binary data and expressed as a depth map. The synchronized texture maps of multiple viewing angles and the obtained depth maps of the corresponding viewing angles are stitched to obtain stitched images, and the stitched images and the corresponding parameter data are then compressed in frame order to obtain a free-viewpoint video for transmission, so that a terminal device can reconstruct free-viewpoint images based on the received free-viewpoint video stream.
The inventors have found through research that, with the current depth map quantization processing method, the quality of the reconstructed free-viewpoint images is limited by that quantization method.
Summary of the Invention
In view of this, the embodiments of this specification provide depth map and video processing and reconstruction methods, apparatuses, devices, and storage media, which can improve the image quality of the reconstructed free-viewpoint video.
The embodiments of this specification provide a depth map processing method, including:
acquiring an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
acquiring depth values of pixels in the estimated depth map;
acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map, and quantizing the depth values of the pixels in the estimated depth map based on the quantization parameter data, to obtain quantized depth values of corresponding pixels in a quantized depth map.
Optionally, the acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map and quantizing the depth values of the corresponding pixels in the estimated depth map based on the quantization parameter data, to obtain the quantized depth values of the corresponding pixels in the quantized depth map, includes:
acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
Optionally, the quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map, includes:
quantizing the depth values of the corresponding pixels in the estimated depth map using the following quantization formula:
$$\mathrm{Depth}=\operatorname{Round}\!\left((2^{M}-1)\cdot\frac{\dfrac{1}{\mathrm{range}}-\dfrac{1}{\mathrm{depth\_range\_far\_}N}}{\dfrac{1}{\mathrm{depth\_range\_near\_}N}-\dfrac{1}{\mathrm{depth\_range\_far\_}N}}\right)$$
where M is the number of quantization bits for the corresponding pixel of the estimated depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the estimated depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, and depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N.
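For illustration, the following Python sketch directly implements the quantization formula above; the function name, the clipping to the code range, and the example values are assumptions added for the sketch.

```python
import numpy as np

# Direct implementation of the per-view quantization formula above.
def quantize_depth(range_m, depth_range_near_n, depth_range_far_n, bits=8):
    max_code = (1 << bits) - 1                      # 2^M - 1
    disparity = 1.0 / np.asarray(range_m, dtype=np.float64)
    mindisp = 1.0 / depth_range_far_n
    maxdisp = 1.0 / depth_range_near_n
    code = max_code * (disparity - mindisp) / (maxdisp - mindisp)
    return np.clip(np.rint(code), 0, max_code).astype(np.uint16)

# Depths spanning the view's own near/far range use the full code space:
print(quantize_depth(np.array([2.0, 5.0, 30.0]), 2.0, 30.0))  # [255  91   0]
```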
Optionally, the method further includes:
performing downsampling processing on the quantized depth map, to obtain a first depth map;
stitching the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method, to obtain a stitched image.
The embodiments of this specification also provide a free-viewpoint video reconstruction method, the method including:
acquiring a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
acquiring the quantized depth values of the pixels in the quantized depth map;
acquiring the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and performing inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
reconstructing the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
Optionally, the acquiring the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map and performing inverse quantization processing on the quantized depth values in the quantized depth map based on that data, to obtain the estimated depth map of the corresponding viewing angle, includes:
acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, performing inverse quantization processing on the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
Optionally, the performing inverse quantization processing on the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle, includes:
performing inverse quantization processing on the quantized depth values in the quantized depth map using the following inverse quantization formulas, to obtain the corresponding pixel values in the estimated depth map:
$$\mathrm{maxdisp}=\frac{1}{\mathrm{depth\_range\_near\_}N}$$

$$\mathrm{mindisp}=\frac{1}{\mathrm{depth\_range\_far\_}N}$$

$$\mathrm{range}=\cfrac{1}{\cfrac{\mathrm{Depth}}{2^{M}-1}\cdot(\mathrm{maxdisp}-\mathrm{mindisp})+\mathrm{mindisp}}$$
where M is the number of quantization bits for the corresponding pixel of the quantized depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the quantized depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, maxdisp is the maximum quantized depth distance corresponding to viewing angle N, and mindisp is the minimum quantized depth distance corresponding to viewing angle N.
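Correspondingly, the following is a minimal Python sketch of the inverse quantization formulas above; it inverts the quantization sketch given earlier, up to the rounding of the code value.

```python
import numpy as np

# Direct implementation of the inverse quantization formulas above.
def dequantize_depth(depth_code, depth_range_near_n, depth_range_far_n, bits=8):
    max_code = (1 << bits) - 1                      # 2^M - 1
    maxdisp = 1.0 / depth_range_near_n
    mindisp = 1.0 / depth_range_far_n
    disparity = depth_code.astype(np.float64) / max_code * (maxdisp - mindisp) + mindisp
    return 1.0 / disparity

codes = np.array([255, 91, 0])
print(dequantize_depth(codes, 2.0, 30.0))  # approximately [2.0, 5.0, 30.0]
```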
Optionally, the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle; before reconstructing the image of the virtual viewpoint, the method further includes:
upsampling the estimated depth map of the corresponding viewing angle, to obtain a second depth map for reconstructing the virtual viewpoint image.
Optionally, the upsampling the estimated depth map of the corresponding viewing angle to obtain a second depth map for reconstructing the virtual viewpoint image includes:
taking the depth values of the pixels in the estimated depth map as the pixel values of the corresponding even rows and even columns in the second depth map;
for the depth value of a pixel in an even row and odd column of the second depth map, determining the corresponding pixel in the corresponding texture map as a middle pixel, and determining the depth value based on the relationship between the luminance channel value of the middle pixel and the luminance channel values of the pixels to the left and right of the middle pixel in the corresponding texture map;
for the depth value of a pixel in an odd row of the second depth map, determining the corresponding pixel in the corresponding texture map as a middle pixel, and determining the depth value based on the relationship between the luminance channel value of the middle pixel and the luminance channel values of the pixels above and below the middle pixel in the corresponding texture map.
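As a concrete illustration of the upsampling rule just described, the following is a minimal Python sketch. The specific decision, copying the depth of whichever neighbouring pixel is closer in luminance to the middle texture pixel, is an assumption; the text above only states that the relationship between the luminance channel values determines the result.

```python
import numpy as np

# Illustrative texture-guided 2x depth upsampling. Even rows/columns copy
# the estimated depth map; the remaining pixels copy the depth of whichever
# neighbour is closer in luminance to the middle texture pixel (assumed rule).
def upsample_depth(depth, luma):
    h, w = depth.shape
    out = np.zeros((2 * h, 2 * w), depth.dtype)
    out[0::2, 0::2] = depth
    for r in range(0, 2 * h, 2):            # even rows, odd columns
        for c in range(1, 2 * w, 2):
            cl, cr = c - 1, min(c + 1, 2 * w - 2)
            if abs(int(luma[r, c]) - int(luma[r, cl])) <= abs(int(luma[r, c]) - int(luma[r, cr])):
                out[r, c] = out[r, cl]
            else:
                out[r, c] = out[r, cr]
    for r in range(1, 2 * h, 2):            # odd rows, every column
        ru, rd = r - 1, min(r + 1, 2 * h - 2)
        for c in range(2 * w):
            if abs(int(luma[r, c]) - int(luma[ru, c])) <= abs(int(luma[r, c]) - int(luma[rd, c])):
                out[r, c] = out[ru, c]
            else:
                out[r, c] = out[rd, c]
    return out

second_depth = upsample_depth(np.arange(4, dtype=np.uint8).reshape(2, 2),
                              np.zeros((4, 4), dtype=np.uint8))
```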
Optionally, the reconstructing the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, includes:
selecting multiple target texture maps and target depth maps from the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image;
performing combined rendering on the target texture maps and the target depth maps, to obtain the image of the virtual viewpoint.
The embodiments of this specification also provide a free-viewpoint video processing method, the method including:
acquiring a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
acquiring the quantized depth values of the pixels in the quantized depth map;
acquiring the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and performing inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
determining position information of a virtual viewpoint in response to user interaction behavior;
reconstructing the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
Optionally, the determining the position information of the virtual viewpoint in response to user interaction behavior includes: determining corresponding virtual viewpoint path information in response to a gesture interaction operation of the user;
and the reconstructing the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, includes:
selecting, according to the virtual viewpoint path information, the texture maps in the stitched images at the corresponding frame moments and the estimated depth maps of the corresponding viewing angles, as the target texture maps and target depth maps;
performing combined rendering on the target texture maps and the target depth maps, to obtain the image of the virtual viewpoint.
Optionally, the method further includes:
acquiring a virtual rendering target object in the image of the virtual viewpoint;
acquiring a virtual information image generated based on augmented reality special effect input data of the virtual rendering target object;
synthesizing the virtual information image with the image of the virtual viewpoint and displaying the result.
Optionally, the acquiring the virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object includes:
obtaining a virtual information image matching the position of the virtual rendering target object, according to the position of the virtual rendering target object in the image of the virtual viewpoint obtained by three-dimensional calibration.
Optionally, the acquiring the virtual rendering target object in the image of the virtual viewpoint includes:
acquiring the virtual rendering target object in the image of the virtual viewpoint in response to a special effect generation interactive control instruction.
The embodiments of this specification provide a depth map processing apparatus, the apparatus including:
an estimated depth map acquiring unit, adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
a depth value acquiring unit, adapted to acquire depth values of pixels in the depth map;
a quantization parameter data acquiring unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the estimated depth map;
a quantization processing unit, adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, to obtain quantized depth values of corresponding pixels in a quantized depth map.
The embodiments of this specification also provide a free-viewpoint video reconstruction apparatus, the apparatus including:
a first video acquiring unit, adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
a first quantized depth value acquiring unit, adapted to acquire the quantized depth values of the pixels in the quantized depth map;
a first quantization parameter data acquiring unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the quantized depth map;
a first depth map inverse quantization processing unit, adapted to perform inverse quantization processing on the quantized depth map of the corresponding viewing angle based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, to obtain the corresponding estimated depth map;
a first image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
The embodiments of this specification also provide a free-viewpoint video processing apparatus, the apparatus including:
a second video acquiring unit, adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
a second quantized depth value acquiring unit, adapted to acquire the quantized depth values of the pixels in the quantized depth map;
a second depth map inverse quantization processing unit, adapted to acquire the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and to perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
a virtual viewpoint position determining unit, adapted to determine position information of a virtual viewpoint in response to user interaction behavior;
a second image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
The embodiments of this specification also provide an electronic device, including a memory and a processor, the memory storing computer instructions capable of running on the processor, wherein the processor, when running the computer instructions, executes the steps of the method described in any of the foregoing embodiments.
The embodiments of this specification also provide a server device, including a processor and a communication component, wherein:
the processor is adapted to execute the steps of the depth map processing method described in any of the foregoing embodiments to obtain a quantized depth map, to stitch the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain stitched images, and to package the stitched images of multiple frames and the corresponding parameter data to obtain a free-viewpoint video;
the communication component is adapted to transmit the free-viewpoint video.
The embodiments of this specification also provide a terminal device, including a communication component, a processor, and a display component, wherein:
the communication component is adapted to acquire a free-viewpoint video;
the processor is adapted to execute the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method described in any of the foregoing embodiments;
the display component is adapted to display the reconstructed image obtained by the processor.
The embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are run, the steps of the method described in any of the foregoing embodiments are executed.
Compared with the prior art, the technical solutions of the embodiments of this specification have the following beneficial effects:
With the depth map processing method in the embodiments of this specification, during depth map quantization, the depth values of the pixels in the estimated depth map are quantized using quantization parameters matching the actual situation of the corresponding viewing angle, so that the depth map of every viewing angle can make full use of the expression space of the depth quantization bits, which in turn can improve the image quality of the reconstructed free-viewpoint video.
Further, by downsampling the quantized depth map to obtain the first depth map and stitching the first depth map with the texture map of the corresponding viewing angle according to a preset stitching method, the overall data volume of the stitched image can be reduced, thereby saving storage and transmission resources for the stitched image.
Further, when the decoding resolution of the overall stitched image is limited, setting the resolution of the quantized depth map to be smaller than the resolution of the texture map of the corresponding viewing angle allows a higher-resolution texture map to be transmitted; then, by upsampling the estimated depth map of the corresponding viewing angle to obtain the second depth map and performing free-viewpoint video reconstruction based on the synchronized texture maps of multiple viewing angles in the stitched image and the second depth maps of the corresponding viewing angles, a free-viewpoint image with higher definition can be obtained, improving the user experience.
Description of Drawings
FIG. 1 is a schematic diagram of a specific application system for free-viewpoint video display in an embodiment of this specification;
FIG. 2 is a schematic diagram of a terminal device interaction interface in an embodiment of this specification;
FIG. 3 is a schematic diagram of an arrangement of collection devices in an embodiment of this specification;
FIG. 4 is a schematic diagram of another terminal device interaction interface in an embodiment of this specification;
FIG. 5 is a schematic diagram of a field-of-view application scenario in an embodiment of this specification;
FIG. 6 is a flowchart of a depth map processing method in an embodiment of this specification;
FIG. 7 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of this specification;
FIG. 8 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification;
FIG. 9 is a schematic structural diagram of a data header file in an embodiment of this specification;
FIG. 10 is a schematic diagram of user-side processing of 6DoF video data in an embodiment of this specification;
FIG. 11 is a schematic structural diagram of a stitched image in an embodiment of this specification;
FIG. 12 is a flowchart of a free-viewpoint video reconstruction method in an embodiment of this specification;
FIG. 13 is a flowchart of a combined rendering method in an embodiment of this specification;
FIG. 14 is a flowchart of a free-viewpoint video processing method in an embodiment of this specification;
FIG. 15 is a flowchart of another free-viewpoint video processing method in an embodiment of this specification;
FIG. 16 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 17 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 18 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 19 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 20 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 21 is a schematic structural diagram of a depth map processing apparatus in an embodiment of this specification;
FIG. 22 is a schematic structural diagram of a free-viewpoint video reconstruction apparatus in an embodiment of this specification;
FIG. 23 is a schematic structural diagram of a free-viewpoint video processing apparatus in an embodiment of this specification;
FIG. 24 is a schematic structural diagram of an electronic device in an embodiment of this specification;
FIG. 25 is a schematic structural diagram of a server device in an embodiment of this specification;
FIG. 26 is a schematic structural diagram of a terminal device in an embodiment of this specification.
Detailed Description
To enable those skilled in the art to better understand and implement the embodiments in this specification, the following first gives an exemplary introduction to the implementation of free-viewpoint video with reference to the accompanying drawings and specific application scenarios.
Referring to FIG. 1, a specific application system for free-viewpoint video display in an embodiment of the present invention may include a collection system 11 with multiple collection devices, a server 12, and a display device 13. The collection system 11 can collect images of the area to be viewed, and the collection system 11 or the server 12 can process the acquired synchronized texture maps to generate multi-angle free-viewing-angle data capable of supporting virtual viewpoint switching on the display device 13. The display device 13 can present reconstructed images generated from the multi-angle free-viewing-angle data; each reconstructed image corresponds to a virtual viewpoint, and reconstructed images corresponding to different virtual viewpoints can be presented according to user instructions, switching the viewing position and viewing angle.
In a specific implementation, the process of performing image reconstruction to obtain reconstructed images may be carried out by the display device 13, or by a device located in a Content Delivery Network (CDN) by means of edge computing. It can be understood that FIG. 1 is only an example and is not a limitation on the collection system, the server, the terminal device, or the specific implementation.
Continuing to refer to FIG. 1, the user can view the area to be viewed through the display device 13; in this embodiment, the area to be viewed is a basketball court. As mentioned above, the viewing position and viewing angle can be switched.
For example, the user can slide on the screen to switch the virtual viewpoint. In an embodiment of the present invention, with reference to FIG. 2, when the user's finger slides the screen along direction D22, the virtual viewpoint for viewing can be switched. Continuing with FIG. 3, the position of the virtual viewpoint before the slide may be VP1; after sliding the screen to switch the virtual viewpoint, the position of the virtual viewpoint may be VP2. With reference to FIG. 4, after the slide, the reconstructed image presented on the screen may be as shown in FIG. 4. The reconstructed image may be obtained by performing image reconstruction based on multi-angle free-viewing-angle data generated from images collected by multiple collection devices in the actual collection scenario.
It can be understood that the image viewed before the switch may also be a reconstructed image, and reconstructed images may be frame images in a video stream. The manner of switching the virtual viewpoint according to user instructions may also vary, which is not limited here.
In a specific implementation, the virtual viewpoint can be represented by coordinates with 6 degrees of freedom (DoF): the spatial position of the virtual viewpoint can be expressed as (x, y, z), and the viewing angle can be expressed as three rotation directions, denoted here as (θ, φ, ψ), so that a virtual viewpoint corresponds to a 6DoF coordinate (x, y, z, θ, φ, ψ).
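As a minimal illustration, such a 6DoF coordinate could be held in a small data structure; the rotation-angle field names below are assumptions.

```python
from dataclasses import dataclass

# Illustrative container for a 6DoF virtual viewpoint: a spatial position
# (x, y, z) plus three rotation directions (angle names are assumptions).
@dataclass
class VirtualViewpoint:
    x: float
    y: float
    z: float
    theta: float
    phi: float
    psi: float

vp = VirtualViewpoint(x=0.0, y=3.5, z=8.0, theta=0.0, phi=-0.2, psi=0.0)
```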
A virtual viewpoint is a three-dimensional concept, and generating a reconstructed image requires three-dimensional information. In a specific implementation, the multi-angle free-viewing-angle data may include depth map data, which provides the third-dimensional information beyond the planar images. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of depth map data is small.
In the embodiments of the present invention, the switching of virtual viewpoints can be performed within a certain range, namely the multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the virtual viewpoint position and viewing angle can be switched arbitrarily.
The multi-angle free viewing angle range is related to the arrangement of the collection devices: the wider the shooting coverage of the collection devices, the larger the multi-angle free viewing angle range. The picture quality displayed by the terminal device is related to the number of collection devices; generally, the more collection devices are deployed, the fewer hole regions appear in the displayed picture.
In addition, the multi-angle free viewing angle range is related to the spatial distribution of the collection devices. The range of the multi-angle free viewing angles, and the interaction manner with the display device on the terminal side, can be set based on the spatial distribution relationship of the collection devices.
As shown in FIG. 1 and FIG. 3, several collection devices are arranged along a certain path at a height HLK above the basket; for example, six collection devices, namely collection devices CJ1 to CJ6, may be arranged along an arc. It can be understood that the positions, number, and support manner of the collection devices may vary, which is not limited here.
It can be understood that the above specific application scenario examples are provided for a better understanding of the embodiments of this specification; however, the embodiments of this specification are not limited to these scenarios. The inventors have found through research that the current depth map processing method still has some limitations, which affect the quality of the images in the reconstructed free-viewpoint video.
In view of the above problems, the embodiments of this specification provide a corresponding depth map processing method and free-viewpoint video reconstruction method. To make the purposes, solutions, principles, and effects of the embodiments of this specification clearer, they are described in detail below with reference to the accompanying drawings and through specific embodiments.
在图像处理中,采样的图像数值用一个数字表示,将图像函数的连续数值转变为其数字等价量的过程是量化。图像量化给每个连续的样本数值一个整数数字。In image processing, a sampled image value is represented by a number, and the process of converting a continuous value of an image function to its digital equivalent is quantization. Image quantization assigns each successive sample value an integer number.
目前,服务端设备(如服务器12)基于纹理图估计出场景和物体的深度图之后,通过对深度值进行8比特二进制数据的量化,表达为一个深度图,为描述方便,这里称为估计深度图。将同步的多个视角的纹理图和所得到的对应视角的估计深度图进行拼接, 得到拼接图像。按照帧时序将拼接图像及对应的参数数据进行压缩,可以得到自由视点视频,终端设备基于自由获取的自由视点视频可以进行自由视点图像的重建。At present, after the server device (such as the server 12) estimates the depth map of the scene and the object based on the texture map, it performs 8-bit binary data quantization on the depth value, and expresses it as a depth map. For the convenience of description, it is referred to as the estimated depth here. picture. The texture maps of the synchronized multiple viewing angles and the obtained estimated depth maps of the corresponding viewing angles are spliced to obtain a spliced image. The spliced image and corresponding parameter data are compressed according to the frame timing to obtain a free-view video, and the terminal device can reconstruct the free-view image based on the freely obtained free-view video.
然而,发明人经研究发现,目前的深度图量化处理方法,对于拼接图像中的每一个深度图,均基于一套同样的量化参数数据进行量化,重建得到的自由视点图像的质量受到目前的深度图量化处理方法的限制。However, the inventor found through research that the current depth map quantization processing method is based on the same set of quantization parameter data for each depth map in the spliced image, and the quality of the reconstructed free viewpoint image is affected by the current depth. Limitations of graph quantization processing methods.
Specifically, based on the predefined maximum and minimum depth values in the field of view, and the depth value of each pixel in the estimated depth map obtained by the server, an 8-bit binary quantized depth value in the range 0-255 is currently obtained through the following formula:
Depth = round( 255 × (1/range − 1/depth_range_far) / (1/depth_range_near − 1/depth_range_far) )
where range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel, depth_range_near is the preset minimum depth distance from the optical center in the field of view, and depth_range_far is the preset maximum depth distance from the optical center in the field of view.
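As a concrete illustration, the following is a minimal numpy sketch of this fixed-parameter quantization, assuming the inverse-disparity form of the formula as reconstructed above; the function name and the clipping of out-of-range depths are illustrative assumptions, not part of the original method.

```python
import numpy as np

def quantize_depth_8bit(depth, depth_range_near, depth_range_far):
    """Quantize metric depth (distance along the optical axis) to 8 bits.

    Nearer surfaces map to values close to 255, farther surfaces to values
    close to 0, following the inverse-disparity mapping described above.
    """
    max_disp = 1.0 / depth_range_near   # disparity of the nearest allowed depth
    min_disp = 1.0 / depth_range_far    # disparity of the farthest allowed depth
    disp = 1.0 / np.clip(depth, depth_range_near, depth_range_far)
    q = 255.0 * (disp - min_disp) / (max_disp - min_disp)
    return np.round(q).astype(np.uint8)
```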
However, in specific application scenarios, using the same fixed set of quantization parameter data to quantize the depth values of the pixels in all estimated depth maps of the stitched image may fail to make full use of the expression space of the depth map. For example, the depth_range_near of the estimated depth maps corresponding to some views, that is, the distance to the closest object, is larger than in the estimated depth maps corresponding to other views. Therefore, after quantization with the same quantization parameter data, the entire 8-bit binary expression space is not fully utilized: the maximum quantized depth value of the pixels in some views stays far below 255, and the minimum quantized depth value of the pixels in some views stays far above 0.
FIG. 5 is a schematic diagram of a field-of-view application scenario. A scene region 50 contains an object R and is provided with multiple acquisition devices P1, P2 ... Pn ... PN arranged along an arc, whose optical centers are C1, C2 ... Cn ... CN and whose optical axes are L1, L2 ... Ln ... LN. From the spatial relationship between the object R and the optical centers C1 to CN of the acquisition devices P1 to PN in FIG. 5, it can be seen intuitively that the minimum and maximum distances between the object R and the optical center of each acquisition device differ. Therefore, the minimum and maximum depth distances from the optical center also differ among the estimated depth maps derived from the texture maps captured by the acquisition devices P1 to PN.
Based on this, during depth map quantization, the embodiments of this specification quantize the depth values of the pixels in each estimated depth map with quantization parameters matched to the actual situation of the corresponding view, so that the depth map of every view can make full use of the expression space of the depth quantization bits, which in turn improves the image quality of the reconstructed free-viewpoint video.
Referring to the flowchart of the depth map processing method shown in FIG. 6, the embodiments of this specification may specifically include the following quantization steps:
S61: Acquire estimated depth maps generated based on multiple frame-synchronized texture maps, where the multiple texture maps have different views.
In a specific implementation, as shown in FIG. 1, an acquisition system composed of multiple acquisition devices may capture images synchronously to obtain the multiple frame-synchronized texture maps.
The origin of the coordinate system of an acquisition device (such as a camera) can serve as the optical center, and the depth value can be the distance from each point in the field of view to the optical center along the optical axis. In a specific implementation, the estimated depth map corresponding to each texture map can be obtained based on the multiple frame-synchronized texture maps.
S62: Acquire the depth values of the pixels in the estimated depth map.
S63: Acquire the quantization parameter data corresponding to the view of the estimated depth map and, based on it, quantize the depth values of the pixels in the estimated depth map to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
In a specific implementation, the quantization parameter data may include the minimum depth distance from the optical center and the maximum depth distance from the optical center for the view corresponding to the estimated depth map. To quantize the depth values of the pixels in the estimated depth map, the minimum and maximum depth distances from the optical center for the corresponding view can be acquired first; then, based on these values, a corresponding quantization formula can be applied to the depth values of the corresponding pixels in the estimated depth map to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
In some embodiments of this specification, the following quantization formula is used to quantize the depth value of the corresponding pixel in the estimated depth map:
Depth = round( (2^M − 1) × (1/range − 1/depth_range_far_N) / (1/depth_range_near_N − 1/depth_range_far_N) )
where M is the number of quantization bits for the corresponding pixel in the estimated depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel, N is the view corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map of view N, and depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map of view N.
After the depth values in the estimated depth map are quantized according to the above embodiment, the object closest to the camera (optical center) in the quantized depth map of each view is quantized to a depth value closer to 2^M − 1. In a specific implementation, M can be 8 bits, 16 bits, and so on. If M is 8 bits, the quantized depth value of the object closest to the optical center in each view's quantized depth map is closer to 255.
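Building on the previous sketch, the per-view variant below parameterizes the near/far range and the bit depth. Deriving near_n and far_n from the minimum and maximum of each view's own estimated depth map, shown in the usage comment, is an illustrative assumption; in practice they come from the per-view quantization parameter data.

```python
import numpy as np

def quantize_depth_per_view(depth, near_n, far_n, m_bits=8):
    """Quantize one view's estimated depth map with its own range.

    near_n / far_n play the role of depth_range_near_N / depth_range_far_N,
    so each view spans the full [0, 2^M - 1] expression space.
    """
    max_disp = 1.0 / near_n
    min_disp = 1.0 / far_n
    disp = 1.0 / np.clip(depth, near_n, far_n)
    levels = (1 << m_bits) - 1            # 2^M - 1, e.g. 255 when M = 8
    q = levels * (disp - min_disp) / (max_disp - min_disp)
    return np.round(q).astype(np.uint8 if m_bits <= 8 else np.uint16)

# e.g. per-view parameters taken from the view's own estimated depth map:
# q_n = quantize_depth_per_view(est_depth_n, est_depth_n.min(), est_depth_n.max())
```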
In a specific implementation, the synchronized texture maps of multiple views and the quantized depth maps of the corresponding views can be stitched to obtain a stitched image, and a free-viewpoint video can then be obtained based on the stitched images at multiple frame moments and the parameter data corresponding to the stitched images. Considering the limits of transmission bandwidth, the free-viewpoint video may be compressed before being transmitted to a terminal device for image reconstruction.
Referring to FIG. 7, free-viewpoint video reconstruction requires texture map acquisition and depth map calculation, which involves three main steps: Multi-camera Video Capturing, Camera Parameter Estimation (of intrinsic and extrinsic parameters), and Depth Map Calculation. For multi-camera capture, the video captured by each camera must be frame-level aligned. Texture images are obtained through multi-camera video capture; camera parameters, which may include intrinsic and extrinsic parameter data, are obtained through camera parameter estimation; and depth maps are obtained through depth map calculation. The multiple synchronized texture maps, the depth maps of the corresponding views, and the camera parameters together form 6DoF video data.
In the solutions of the embodiments of this specification, no special camera, such as a light field camera, is required for video capture. Likewise, no complex camera calibration is needed before capture. The positions of the multiple cameras can be laid out and arranged so as to better capture the objects or scenes to be photographed.
After the above three steps, the texture maps captured by the multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained. These three parts of data can be referred to as the data files of the multi-angle free-view video data, and also as 6DoF video data (6 degrees of freedom video data). With these data, the user side can generate virtual viewpoints according to a virtual 6 degrees of freedom (DoF) position, thereby providing a 6DoF video experience.
Referring to FIG. 8, the 6DoF video data and the indicative data can be compressed and transmitted to the user side, and the user side can obtain the user-side 6DoF expression, that is, the aforementioned 6DoF video data and metadata, from the received data. The indicative data may also be called metadata (Metadata).
The metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), which indicates the storage rules for the pixel data of the multiple texture maps and the quantized depth map data in the stitched image; padding pattern metadata (Padding pattern metadata), which can indicate the manner of edge protection applied to the stitched image; the quantization parameter metadata of the corresponding views; and other metadata (Other metadata). The metadata can be stored in a data header file, in the storage order shown in FIG. 9 or in another order.
Referring to FIG. 10, the user side obtains the 6DoF video data, which includes the camera parameters, the texture maps, the quantized depth maps, and the metadata, as well as the interaction behavior data of the user side. With these data, the user side can perform 6DoF rendering by means of depth image-based rendering (DIBR, Depth Image-Based Rendering), thereby generating an image of a virtual viewpoint at the specific 6DoF position produced by the user's interaction behavior; that is, according to a user instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined.
At present, any video frame in free-viewpoint video data is generally expressed as a stitched image formed by the texture maps captured by multiple cameras and the corresponding depth maps. FIG. 11 is a schematic diagram of the structure of such a stitched image: the upper half is the texture map region, divided into 8 texture map sub-regions that respectively store the pixel data of 8 synchronized texture maps, each taken from a different shooting angle, that is, a different view. The lower half is the depth map region, divided into 8 depth map sub-regions that respectively store the quantized depth maps corresponding to the above 8 texture maps. The texture map of view N and the quantized depth map of view N correspond pixel by pixel. The stitched image is compressed and transmitted to the terminal for decoding and DIBR, so that an image can be interpolated at the viewpoint of the user interaction.
The inventor found through research that, as shown in FIG. 11, every texture map has a quantized depth map of the same resolution corresponding to it, so the resolution of the overall stitched image is twice that of the texture map set. Since the video decoding resolution of a terminal (such as a mobile terminal) is generally limited, this expression of free-viewpoint video data can only be realized by reducing the resolution of the texture maps, which reduces the definition of the reconstructed image perceived by the user on the terminal side.
In view of the above problem, in some embodiments of this specification, the quantized depth map may first be down-sampled to obtain a first depth map, and the synchronized texture maps of multiple views and the first depth maps of the corresponding views may be stitched in a preset stitching manner to obtain the stitched image.
To enable those skilled in the art to better understand and implement the embodiments of this specification, two specific examples of down-sampling are given below:
One is to decimate the pixels of the quantized depth map to obtain the first depth map. For example, every other pixel of the quantized depth map may be extracted to obtain the first depth map, whose resolution is then 50% of that of the quantized depth map in each dimension.
The other is to filter the pixels of the quantized depth map based on the corresponding texture map to obtain the first depth map.
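As a rough illustration of the first method, the one-liner below decimates every other pixel in both dimensions (50% resolution per dimension, i.e. the 1/4 down-sampling by area referred to later); the second, texture-guided method would instead weight each output sample by the similarity of neighbouring texture pixels.

```python
def downsample_depth(quant_depth):
    """Method one: keep every other pixel in each dimension of the
    quantized depth map (a simple decimation, no filtering)."""
    return quant_depth[::2, ::2]
```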
To save data storage resources and data transmission resources, the stitched image may be rectangular.
To enable those skilled in the art to better understand and implement the embodiments of this specification, the free-viewpoint video reconstruction method on the terminal side after the above depth map processing is introduced below through specific embodiments.
Referring to the flowchart of the free-viewpoint video reconstruction method shown in FIG. 12, in the embodiments of this specification, the following steps may specifically be adopted for free-viewpoint video reconstruction:
S121: Acquire a free-viewpoint video, where the free-viewpoint video includes stitched images at multiple frame moments and parameter data corresponding to the stitched images; the stitched images include synchronized texture maps of multiple views and quantized depth maps of the corresponding views; and the parameter data corresponding to the stitched images includes the quantization parameter data of the estimated depth maps of the corresponding views and the camera parameter data.
In a specific implementation, the free-viewpoint video may take the form of a compressed video file, or may be transmitted as a video stream. The parameter data of the stitched images can be stored in the header file of the free-viewpoint video data; for the specific form, refer to the foregoing embodiments.
In some embodiments of this specification, the quantization parameter data of the estimated depth maps of the corresponding views may be stored in the form of an array. For example, for a free-viewpoint video whose stitched image contains 16 pairs of texture maps and quantized depth maps, the quantization parameter data may be expressed in order as:
Array Z = [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
S122: Acquire the quantized depth values of the pixels in the quantized depth map.
S123: Acquire the quantization parameter data of the estimated depth map of the view corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map.
S124: Based on the synchronized texture maps of multiple views and the estimated depth maps of the corresponding views, reconstruct the image of the virtual viewpoint according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
For step S123, in some embodiments of this specification, it is implemented as follows:
acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center for the view corresponding to the estimated depth map;
based on the minimum and maximum depth distances from the optical center for the corresponding view, apply a corresponding inverse quantization formula to the quantized depth values in the quantized depth map to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding view.
In a specific embodiment of this specification, the following inverse quantization formulas are used to inverse-quantize the quantized depth values in the quantized depth map to obtain the corresponding pixel values of the estimated depth map:
maxdisp = 1 / depth_range_near_N
mindisp = 1 / depth_range_far_N
range = 1 / ( Depth / (2^M − 1) × (maxdisp − mindisp) + mindisp )
where M is the number of quantization bits for the corresponding pixel in the quantized depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the quantized depth map, N is the view corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map of view N, depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map of view N, maxdisp is the maximum quantized depth distance corresponding to view N, and mindisp is the minimum quantized depth distance corresponding to view N.
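A minimal sketch of this inverse quantization, mirroring the quantization sketch above (same assumption about the inverse-disparity form; the function name is illustrative):

```python
import numpy as np

def dequantize_depth_per_view(q, near_n, far_n, m_bits=8):
    """Recover metric depth from a view's quantized depth map using that
    view's depth_range_near_N / depth_range_far_N."""
    max_disp = 1.0 / near_n
    min_disp = 1.0 / far_n
    levels = (1 << m_bits) - 1
    disp = q.astype(np.float64) / levels * (max_disp - min_disp) + min_disp
    return 1.0 / disp   # = range in the formulas above
```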
In some embodiments of this specification, corresponding to the foregoing embodiments, if the resolution of the quantized depth map is smaller than that of the texture map of the corresponding view, for example because the quantized depth map was down-sampled on the server side, then on the terminal device side the estimated depth map of the corresponding view can be up-sampled to obtain a second depth map, and the second depth map is then used for the reconstruction of the virtual viewpoint image.
In a specific implementation, various up-sampling methods are possible; some example methods are given below:
Method one example: up-sample an estimated depth map that was 1/4-down-sampled to obtain a second depth map with the same resolution as the texture map. Depending on the row and column of the pixel, this specifically involves the following processing:
(1) Take the depth values of the pixels in the estimated depth map as the pixel values at the corresponding even rows and even columns of the second depth map.
(2) For the depth value of a pixel at an even row and odd column of the second depth map, take the corresponding pixel in the corresponding texture map as the middle pixel, and determine the depth value based on the relationship between the luminance channel value of this middle pixel and the luminance channel values of the pixels to its left and right in the texture map.
Specifically, based on the relationship between the luminance channel value of the middle pixel in the corresponding texture map and the luminance channel values of the pixels to its left and right, there are three cases:
a1. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its right is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its left divided by a preset threshold, select the depth value corresponding to the right pixel as the depth value of the corresponding even-row, odd-column pixel of the second depth map;
a2. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its left is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its right divided by the preset threshold, select the depth value corresponding to the left pixel as the depth value of the corresponding even-row, odd-column pixel of the second depth map;
a3. Otherwise, select the maximum of the depth values corresponding to the left and right pixels as the depth value of the corresponding even-row, odd-column pixel of the second depth map.
(3) For the depth value of a pixel at an odd row of the second depth map, take the corresponding pixel in the corresponding texture map as the middle pixel, and determine the depth value based on the relationship between the luminance channel value of this middle pixel and the luminance channel values of the pixels above and below it in the texture map.
b1. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel below it is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel above it divided by the preset threshold, select the depth value corresponding to the lower pixel as the depth value of the corresponding odd-row pixel of the second depth map;
b2. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel above it is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel below it divided by the preset threshold, select the depth value corresponding to the upper pixel as the depth value of the corresponding odd-row pixel of the second depth map;
b3. Otherwise, select the maximum of the depth values corresponding to the upper and lower pixels as the depth value of the corresponding odd-row pixel of the second depth map.
The three cases a1 to a3 in step (2) above can be expressed by the formulas:
if abs(pix_C-pix_R) < abs(pix_C-pix_L)/THR, select Dep_R;
if abs(pix_C-pix_L) < abs(pix_C-pix_R)/THR, select Dep_L;
otherwise, select Max(Dep_R, Dep_L).
The three cases b1 to b3 in step (3) above can be expressed by the formulas:
if abs(pix_C-pix_D) < abs(pix_C-pix_U)/THR, select Dep_D;
if abs(pix_C-pix_U) < abs(pix_C-pix_D)/THR, select Dep_U;
otherwise, select Max(Dep_D, Dep_U).
In the above formulas, pix_C is the luminance channel value (Y value) of the middle pixel in the texture map at the position corresponding to the depth value in the second depth map; pix_L is the luminance channel value of the pixel to the left of pix_C; pix_R is the luminance channel value of the pixel to the right of pix_C; pix_U is the luminance channel value of the pixel above pix_C; pix_D is the luminance channel value of the pixel below pix_C; Dep_R is the depth value corresponding to the pixel to the right of the middle pixel; Dep_L is the depth value corresponding to the pixel to the left of the middle pixel; Dep_D is the depth value corresponding to the pixel below the middle pixel; and Dep_U is the depth value corresponding to the pixel above the middle pixel. abs denotes the absolute value, and THR is a settable threshold; in one embodiment of this specification, THR is set to 2.
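The following is a straightforward, deliberately unoptimized numpy sketch of method one, assuming even image dimensions, a single-channel luminance plane for the texture, and clamping at the image borders; these handling details are illustrative assumptions not prescribed by the text above.

```python
import numpy as np

def upsample_depth_thr(dep_small, texture_y, thr=2.0):
    """Up-sample a 1/4-down-sampled depth map to texture resolution,
    guided by the texture's luminance (Y) plane as in steps (1)-(3)."""
    h, w = texture_y.shape
    dep = np.zeros((h, w), dtype=dep_small.dtype)
    dep[0::2, 0::2] = dep_small                   # step (1): copy known samples

    # step (2): even rows, odd columns: choose between left/right neighbours
    for r in range(0, h, 2):
        for c in range(1, w, 2):
            cl, cr = c - 1, min(c + 1, w - 2)     # clamp at the right border
            pix_c = float(texture_y[r, c])
            pix_l = float(texture_y[r, cl])
            pix_r = float(texture_y[r, cr])
            dl, dr = dep[r, cl], dep[r, cr]
            if abs(pix_c - pix_r) < abs(pix_c - pix_l) / thr:
                dep[r, c] = dr
            elif abs(pix_c - pix_l) < abs(pix_c - pix_r) / thr:
                dep[r, c] = dl
            else:
                dep[r, c] = max(dl, dr)

    # step (3): odd rows: same rule with the pixels above and below
    for r in range(1, h, 2):
        ru, rd = r - 1, min(r + 1, h - 2)         # clamp at the bottom border
        for c in range(w):
            pix_c = float(texture_y[r, c])
            pix_u = float(texture_y[ru, c])
            pix_d = float(texture_y[rd, c])
            du, dd = dep[ru, c], dep[rd, c]
            if abs(pix_c - pix_d) < abs(pix_c - pix_u) / thr:
                dep[r, c] = dd
            elif abs(pix_c - pix_u) < abs(pix_c - pix_d) / thr:
                dep[r, c] = du
            else:
                dep[r, c] = max(du, dd)
    return dep
```

Note that the even rows are fully filled by steps (1) and (2) before step (3) reads them, so the vertical rule always compares against valid depth values.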
Method two example:
Take the depth values of the pixels in the estimated depth map as the pixel values at the corresponding rows and columns of the second depth map; for the pixels of the second depth map that have no counterpart among the pixels of the estimated depth map, obtain the depth values by filtering based on the differences between the pixel values of the corresponding pixel in the corresponding texture map and its surrounding pixels.
There are many possible filtering methods; two specific embodiments are given below.
Specific embodiment 1: nearest-neighbor filtering
Specifically, for a pixel of the second depth map that has no counterpart among the pixels of the estimated depth map, the corresponding pixel in the texture map can be compared with the pixels at the four diagonal positions around it, the pixel whose value is closest to that of the corresponding pixel is found, and the depth value in the estimated depth map corresponding to that closest pixel is taken as the depth value, in the second depth map, of the pixel corresponding to the corresponding pixel in the texture map.
Specific embodiment 2: weighted filtering
Specifically, the corresponding pixel in the texture map can be compared with the pixels around it, and, according to the similarity of the pixel values, the depth values in the estimated depth map corresponding to the surrounding pixels are weighted to obtain the depth value, in the second depth map, of the pixel corresponding to the corresponding pixel in the texture map.
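A compact sketch of specific embodiment 1, under simplifying assumptions: the known samples sit at even rows and columns (as in method one), each diagonal neighbour is snapped back to the nearest known sample, and borders are clamped. None of these details are prescribed by the text above; they only make the sketch self-contained.

```python
import numpy as np

def fill_missing_nearest(dep, texture_y):
    """Nearest-neighbor filtering: for a pixel with no counterpart in the
    estimated depth map, compare the texture pixel with its four diagonal
    neighbours and copy the depth of the most similar one.

    Assumes dep holds valid depth at even rows/columns and zeros elsewhere.
    """
    h, w = texture_y.shape
    out = dep.copy()
    for r in range(h):
        for c in range(w):
            if r % 2 == 0 and c % 2 == 0:
                continue                               # already known
            best, best_diff = 0, None
            for dr, dc in ((-1, -1), (-1, 1), (1, -1), (1, 1)):
                rr = min(max(r + dr, 0), h - 1) & ~1   # snap to even row
                cc = min(max(c + dc, 0), w - 1) & ~1   # snap to even column
                diff = abs(float(texture_y[r, c]) - float(texture_y[rr, cc]))
                if best_diff is None or diff < best_diff:
                    best_diff, best = diff, dep[rr, cc]
            out[r, c] = best
    return out
```

Specific embodiment 2 would replace the argmin by a similarity-weighted average of the neighbours' depth values.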
The above shows some methods of up-sampling the estimated depth map to obtain the second depth map. It can be understood that these are only examples, and the embodiments of this specification do not limit the specific up-sampling method. Moreover, the method of up-sampling the estimated depth map of any video frame may correspond to the method by which the quantized depth map was down-sampled to obtain the first depth map, or there may be no such correspondence. In addition, the up-sampling ratio and the down-sampling ratio may be the same or different.
Next, some specific examples of step S124 are given.
In a specific implementation, in order to save data processing resources and improve image reconstruction efficiency while ensuring image reconstruction quality, only some of the texture maps in the stitched image and the estimated depth maps of the corresponding views may be selected, as the target texture maps and target depth maps, for the reconstruction of the virtual viewpoint image. Specifically:
according to the position information of the virtual viewpoint and the parameter data corresponding to the stitched image, multiple target texture maps and target depth maps can be selected from the synchronized texture maps of multiple views and the estimated depth maps of the corresponding views. Afterwards, the target texture maps and target depth maps can be combined and rendered to obtain the image of the virtual viewpoint.
In a specific implementation, the position information of the virtual viewpoint may be determined according to user interaction behavior, or according to a preset. If it is determined based on user interaction behavior, the virtual viewpoint position at the corresponding interaction moment can be determined by acquiring the trajectory data corresponding to the user's interactive operation. In some embodiments of this specification, the position information of the virtual viewpoints corresponding to the respective video frames may also be preset on the server side (such as a server or the cloud) and transmitted in the header file of the free-viewpoint video.
In a specific implementation, the spatial positional relationship between each texture map (with the estimated depth map of the corresponding view) and the virtual viewpoint position can be determined based on the virtual viewpoint position and the parameter data corresponding to the stitched image. To save data processing resources, texture maps and estimated depth maps that satisfy a preset positional relationship and/or quantitative relationship with the virtual viewpoint position can be selected as the target texture maps and target depth maps, according to the position information of the virtual viewpoint and the parameter data corresponding to the stitched image.
For example, the texture maps and estimated depth maps corresponding to the 2 to N viewpoints closest to the virtual viewpoint position may be selected, where N is the number of texture maps in the stitched image, that is, the number of acquisition devices corresponding to the texture maps. In a specific implementation, the quantitative relationship value may be fixed or variable.
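A minimal sketch of such a selection rule, assuming the per-view camera positions have been recovered from the camera parameter data; Euclidean distance is one possible criterion, and a practical selector could also weigh viewing direction.

```python
import numpy as np

def select_target_views(virtual_pos, camera_positions, k):
    """Pick the k capture views whose optical centers are closest to the
    virtual viewpoint position."""
    virtual_pos = np.asarray(virtual_pos, dtype=np.float64)
    cams = np.asarray(camera_positions, dtype=np.float64)
    dists = np.linalg.norm(cams - virtual_pos, axis=1)
    return np.argsort(dists)[:k]          # indices of the selected views
```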
Referring to the flowchart of the combined rendering method shown in FIG. 13, some embodiments of this specification may specifically include the following steps:
S131: Forward-map each selected target depth map of the stitched image to the virtual viewpoint position.
S132: Post-process each forward-mapped target depth map.
In a specific implementation, various post-processing methods are possible. In some embodiments of this specification, at least one of the following methods may be used to post-process the target depth maps:
1) apply foreground edge protection to each forward-mapped target depth map;
2) apply pixel-level filtering to each forward-mapped target depth map.
S133: Backward-map each selected target texture map of the stitched image.
S134: Fuse the virtual texture maps generated by the backward mapping to obtain a fused texture map.
Through the above steps S131 to S134, a reconstructed image can be obtained.
In a specific implementation, hole filling may further be performed on the fused texture map to obtain the reconstructed image corresponding to the virtual viewpoint position at the moment of user interaction. Hole filling can improve the quality of the reconstructed image.
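To make step S134 and the hole-filling step concrete, the sketch below fuses single-channel virtual texture maps with per-view weights and then fills the pixels that every view left empty with a simple neighbour-averaging pass. Treating pixel value 0 as a warp hole and the choice of weights (e.g. inverse distance to the virtual viewpoint) are illustrative assumptions, not part of the method as described.

```python
import numpy as np

def fuse_and_fill(virtual_textures, weights):
    """Weighted fusion of backward-mapped (single-channel) virtual texture
    maps, followed by naive hole filling."""
    stack = np.stack([t.astype(np.float64) for t in virtual_textures])
    valid = stack > 0                                   # assumption: 0 marks a hole
    w = np.asarray(weights, dtype=np.float64).reshape(-1, 1, 1) * valid
    wsum = w.sum(axis=0)
    fused = np.where(wsum > 0,
                     (stack * w).sum(axis=0) / np.maximum(wsum, 1e-9), 0.0)

    # fill pixels that every view left empty with the mean of their valid
    # 4-neighbours, repeating until no fillable pixel remains
    holes = wsum == 0
    while holes.any():
        fpad = np.pad(fused, 1)
        vpad = np.pad(~holes, 1).astype(np.float64)
        neigh = (fpad[:-2, 1:-1] * vpad[:-2, 1:-1] + fpad[2:, 1:-1] * vpad[2:, 1:-1]
                 + fpad[1:-1, :-2] * vpad[1:-1, :-2] + fpad[1:-1, 2:] * vpad[1:-1, 2:])
        count = (vpad[:-2, 1:-1] + vpad[2:, 1:-1]
                 + vpad[1:-1, :-2] + vpad[1:-1, 2:])
        fillable = holes & (count > 0)
        if not fillable.any():
            break                                       # fully empty image
        fused[fillable] = neigh[fillable] / count[fillable]
        holes &= ~fillable
    return fused
```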
In a specific implementation, before image reconstruction, the target depth maps may first be preprocessed (for example, up-sampled); alternatively, all the estimated depth maps obtained by inverse quantization of the stitched image may first be preprocessed (for example, up-sampled), before the virtual-viewpoint-based image reconstruction is performed.
For better understanding and implementation by those skilled in the art, the embodiments of this specification also provide specific embodiments such as apparatuses and devices corresponding to the foregoing method embodiments, which are described below with reference to the accompanying drawings.
The embodiments of this specification also provide a corresponding free-viewpoint video processing method which, referring to FIG. 14, may specifically include the following steps:
S141: Acquire a free-viewpoint video, where the free-viewpoint video includes stitched images at multiple frame moments and parameter data corresponding to the stitched images; the stitched images include synchronized texture maps of multiple views and quantized depth maps of the corresponding views; and the parameter data corresponding to the stitched images includes the quantization parameter data of the estimated depth maps of the corresponding views and the camera parameter data.
In a specific implementation, by acquiring the free-viewpoint video and decoding it, the stitched images at the multiple frame moments and the parameter data corresponding to the stitched images can be obtained.
The specific form of the free-viewpoint video may be the multi-angle free-view video exemplified in the foregoing embodiments, such as a 6DoF video.
By downloading a free-viewpoint video stream or acquiring a stored free-viewpoint video data file, a video frame sequence can be obtained; each video frame may include a stitched image formed by the synchronized texture maps of multiple views and the first depth maps of the corresponding views. One structure of such a stitched image is shown in FIG. 11. It can be understood that other stitched image structures may be adopted; for example, different stitching manners may be used depending on the resolution ratio between the texture map and the first depth map of the corresponding view. For instance, one texture map may correspond to multiple first depth maps (e.g., when the first depth map is a depth map obtained by 25% down-sampling).
In addition to the stitched images, the free-viewpoint video data file may also include metadata describing the stitched images. In a specific implementation, the parameter data of the stitched images can be obtained from the metadata; for example, one or more kinds of information such as the camera parameters of the stitched image, the stitching rule of the stitched image, and the resolution information of the stitched image can be obtained.
In a specific implementation, the parameter information of the stitched image can be transmitted in combination with the stitched image, for example stored in the video file header. The embodiments of this specification limit neither the specific format of the stitched image nor the specific type and storage location of its parameter information, as long as the reconstructed image at the corresponding virtual viewpoint position can be obtained based on the free-viewpoint video.
In a specific implementation, the free-viewpoint video may take the form of a compressed video file, or may be transmitted as a video stream. The parameter data of the stitched images can be stored in the header file of the free-viewpoint video data; for the specific form, refer to the foregoing embodiments.
In some embodiments of this specification, the quantization parameter data of the estimated depth maps of the corresponding views may be stored in the form of an array. For example, for a free-viewpoint video whose stitched image contains 16 pairs of texture maps and quantized depth maps, the quantization parameter data may be expressed in order as:
Array Z = [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
S142: Acquire the quantized depth values of the pixels in the quantized depth map.
S143: Acquire the quantization parameter data of the estimated depth map of the view corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map.
For the specific quantization parameter data used in the inverse quantization of step S143, as well as the specific inverse quantization method, reference may be made to the foregoing embodiments; the description is not repeated here.
S144: Determine the position information of the virtual viewpoint in response to user interaction behavior.
In a specific implementation, if the free-viewpoint video adopts a 6DoF expression, the virtual viewpoint position information based on user interaction can be expressed as a 6DoF coordinate, that is, a spatial position together with a viewing orientation. The virtual viewpoint position information can be generated through one or more preset user interaction modes. For example, it may be coordinates entered by a user operation, such as a manual click or a gesture path, or a virtual position determined by voice input; alternatively, the user may be provided with customizable virtual viewpoints (for example, the user may specify a position or perspective in the scene, such as under the basket, courtside, the referee's perspective, or the coach's perspective), or the viewpoint may be based on a specific object (for example, a player on the court, or an actor, guest, or host in the image; after the user clicks on the corresponding object, the view can switch to that object's perspective). It can be understood that the embodiments of the present invention do not limit the specific user interaction mode, as long as the virtual viewpoint position information based on user interaction can be obtained.
As an optional example, the corresponding virtual viewpoint path information can be determined in response to a user's gesture interaction. For gesture interaction, corresponding virtual viewpoint paths can be planned for different forms of gestures, so that the path information of the corresponding virtual viewpoint can be determined from the user's specific gesture operation. For example, it can be planned in advance that sliding a finger left or right on the touch screen corresponds to moving the viewing angle left or right; sliding a finger up or down corresponds to moving the viewpoint position up or down; and a pinch/zoom gesture corresponds to moving the viewpoint position closer or farther.
It can be understood that the above virtual viewpoint paths planned for gesture forms are only exemplary; virtual viewpoint paths based on other gesture forms can be predefined, or user-defined settings can be allowed, thereby enhancing the user experience.
S145: Based on the synchronized texture maps of multiple views and the estimated depth maps of the corresponding views, reconstruct the image of the virtual viewpoint according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
In a specific implementation, according to the virtual viewpoint path information, the texture maps of the corresponding frame moments and the estimated depth maps of the corresponding views can be selected as the target texture maps and target depth maps, and the target texture maps and target depth maps can be combined and rendered to obtain the image of the virtual viewpoint.
For the specific selection method, reference may be made to the foregoing embodiments; it is not detailed here.
It should be noted that, based on the virtual viewpoint path information, some of the texture maps and the second depth maps of the corresponding views in the stitched images of one frame or multiple consecutive frames can be selected in time order as the target texture maps and target depth maps for reconstructing the images of the corresponding virtual viewpoints.
In a specific implementation, further processing may be performed on the reconstructed free-viewpoint image. An exemplary extension is given below.
为丰富用户视觉体验,可以在重建得到的自由视点图像中植入增强现实(Augmented Reality,AR)特效。在本说明一些实施例中,参照图15所示的自由视点视频处理方法的流程图,采用如下方式实现AR特效的植入:In order to enrich the user's visual experience, Augmented Reality (AR) special effects can be implanted in the reconstructed free-viewpoint images. In some embodiments of this description, referring to the flowchart of the free-viewpoint video processing method shown in FIG. 15 , the implantation of AR special effects is implemented in the following manner:
S151,获取所述虚拟视点的图像中的虚拟渲染目标对象。S151. Acquire a virtual rendering target object in the image of the virtual viewpoint.
在具体实施中,可以基于某些指示信息确定自由视点视频的图像中的某些对象作为 虚拟渲染目标对象,所述指示信息可以基于用户交互生成,也可以基于某些预设触发条件或第三方指令得到。在本说明书一可选实施例中,响应于特效生成交互控制指令,可以获取所述虚拟视点的图像中的虚拟渲染目标对象。In a specific implementation, certain objects in the image of the free-view video may be determined as virtual rendering target objects based on certain indication information, and the indication information may be generated based on user interaction, or may be based on certain preset trigger conditions or a third party. command is obtained. In an optional embodiment of the present specification, the virtual rendering target object in the image of the virtual viewpoint may be acquired in response to the interactive control instruction generated by the special effect.
S152,获取基于所述虚拟渲染目标对象的增强现实特效输入数据所生成的虚拟信息图像。S152: Acquire a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object.
在本说明书实施例中,所植入的AR特效以虚拟信息图像的形式呈现。所述虚拟信息图像可以基于所述目标对象的增强现实特效输入数据生成。在确定虚拟渲染目标对象后,可以获取基于所述虚拟渲染目标对象的增强现实特效输入数据所生成的虚拟信息图像。In the embodiments of this specification, the implanted AR special effects are presented in the form of virtual information images. The virtual information image may be generated based on augmented reality special effect input data of the target object. After the virtual rendering target object is determined, a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object can be acquired.
在本说明书实施例中,所述虚拟渲染目标对象对应的虚拟信息图像可以预先生成,也可以响应于特效生成指令即时生成。In the embodiment of this specification, the virtual information image corresponding to the virtual rendering target object may be generated in advance, or may be generated immediately in response to the special effect generation instruction.
在具体实施中,可以基于三维标定得到的所述虚拟渲染目标对象在重建得到的图像中的位置,得到与所述虚拟渲染目标对象位置匹配的虚拟信息图像,从而可以使得到的虚拟信息图像与所述虚拟渲染目标对象在三维空间中的位置更加匹配,进而所展示的虚拟信息图像更加符合三维空间中的真实状态,因而所展示的合成图像更加真实生动,增强用户的视觉体验。In a specific implementation, a virtual information image matching the position of the virtual rendering target object can be obtained based on the position of the virtual rendering target object obtained by three-dimensional calibration in the reconstructed image, so that the obtained virtual information image can be made to match the position of the virtual rendering target object. The position of the virtual rendering target object in the three-dimensional space is more matched, and the displayed virtual information image is more in line with the real state in the three-dimensional space, so the displayed composite image is more realistic and vivid, and the user's visual experience is enhanced.
在具体实施中,可以基于虚拟渲染目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。In a specific implementation, a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
在具体实施中,可以采用多种特效生成方式。In a specific implementation, a variety of special effect generation methods can be adopted.
例如,可以将所述目标对象的增强现实特效输入数据输入至预设的三维模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像;For example, the augmented reality special effect input data of the target object may be input into a preset three-dimensional model, and the output matches the virtual rendering target object based on the position of the virtual rendering target object obtained by the three-dimensional calibration in the image. virtual information images;
又如,可以将所述虚拟渲染目标对象的增强现实特效输入数据,输入至预设的机器学习模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像。For another example, the augmented reality special effects input data of the virtual rendering target object can be input into a preset machine learning model, and the position of the virtual rendering target object in the image obtained based on the three-dimensional calibration can be output and the same as that of the virtual rendering target object. A virtual information image that matches the virtual render target object.
S153: Synthesize the virtual information image with the image of the virtual viewpoint and display the result.
In a specific implementation, the virtual information image and the image of the virtual viewpoint can be synthesized and displayed in various ways. Two concrete examples are given below, followed by an illustrative compositing sketch:
Example 1: fuse the virtual information image with the corresponding image to obtain a fused image, and display the fused image.
Example 2: superimpose the virtual information image on the corresponding image to obtain a superimposed composite image, and display the superimposed composite image.
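Purely for illustration, the following is a minimal Python sketch of Example 2, assuming the virtual information image carries an alpha channel marking where the special effect is drawn; the function name, array layouts, and placement arguments are illustrative assumptions, not part of the embodiments above.

```python
import numpy as np

def overlay_virtual_info(viewpoint_img: np.ndarray, info_img_rgba: np.ndarray,
                         top: int, left: int) -> np.ndarray:
    """Superimpose an RGBA virtual information image onto an RGB virtual-viewpoint
    image at the given position (hypothetical helper; alpha-blended overlay)."""
    out = viewpoint_img.astype(np.float32).copy()
    h, w = info_img_rgba.shape[:2]
    region = out[top:top + h, left:left + w]          # assumes the overlay fits in frame
    rgb = info_img_rgba[..., :3].astype(np.float32)
    alpha = info_img_rgba[..., 3:4].astype(np.float32) / 255.0  # 0 = fully transparent
    region[:] = alpha * rgb + (1.0 - alpha) * region
    return out.astype(np.uint8)
```

Under this reading, Example 1 differs only in that the fusion rule may mix the two images globally rather than pasting an alpha-masked patch at one position.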
In a specific implementation, the obtained composite image may be displayed directly, or it may be inserted into a video stream to be played for playback and display. For example, the fused image can be inserted into the video stream to be played.
The free-viewpoint video may include special-effect display identifiers. In a specific implementation, the superimposition position of the virtual information image in the image of the virtual viewpoint may be determined based on a special-effect display identifier, and the virtual information image may then be superimposed and displayed at the determined position.
To help those skilled in the art better understand and implement the solution, an image display process on an interactive terminal is described in detail below. Referring to the schematic diagrams of video playback screens of an interactive terminal shown in FIG. 16 to FIG. 20, the interactive terminal T1 plays video in real time. Referring to FIG. 16, a video frame P1 is displayed. Next, the video frame P2 displayed by the interactive terminal contains multiple special-effect display identifiers, including identifier I1; in video frame P2 these are represented by inverted triangle symbols pointing at the target objects, as shown in FIG. 17. It can be understood that the special-effect display identifiers may also be presented in other ways. When the end user touches and taps the special-effect display identifier I1, the system automatically acquires the virtual information image corresponding to identifier I1 and superimposes it on video frame P3; as shown in FIG. 18, a three-dimensional ring R1 is rendered centered on the court position where athlete Q1 stands. Next, as shown in FIG. 19 and FIG. 20, the end user touches and taps the special-effect display identifier I2 in video frame P3; the system automatically acquires the virtual information image corresponding to identifier I2 and superimposes it on video frame P3, obtaining a superimposed image, namely video frame P4, in which a hit-rate information board M0 is displayed. The hit-rate information board M0 shows the jersey number, name, and hit-rate information of the target object, athlete Q2.
As shown in FIG. 16 to FIG. 20, the end user can continue tapping other special-effect display identifiers shown in the video frames to watch video presenting the AR special effect corresponding to each identifier.
It can be understood that different types of implanted special effects can be distinguished by different types of special-effect display identifiers.
Referring to the schematic structural diagram of the depth map processing apparatus shown in FIG. 21, the depth map processing apparatus 210 may include: an estimated depth map acquisition unit 211, a depth value acquisition unit 212, a quantization parameter data acquisition unit 213, and a quantization processing unit 214. Specifically:
The estimated depth map acquisition unit 211 is adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
The depth value acquisition unit 212 is adapted to acquire the depth values of the pixels in the estimated depth map;
The quantization parameter data acquisition unit 213 is adapted to acquire quantization parameter data corresponding to the viewing angle of the estimated depth map;
The quantization processing unit 214 is adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, obtaining the quantized depth values of the corresponding pixels in the quantized depth map.
In a specific implementation, the quantization parameter data acquisition unit 213 is adapted to acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map. Correspondingly, the quantization processing unit 214 may apply the corresponding quantization formula to the depth values of the corresponding pixels in the estimated depth map, based on these minimum and maximum depth distances, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
For the specific quantization principle of the quantization processing unit 214 and the specific quantization formulas that can be used, reference may be made to the descriptions in the foregoing embodiments.
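Purely for illustration, the following is a minimal sketch of an inverse-depth quantization of this kind, assuming the formula of claim 3 (linearly quantizing the reciprocal of the depth distance between the near and far planes); the function name and the 16-bit setting are illustrative assumptions.

```python
import numpy as np

def quantize_depth(depth: np.ndarray, near: float, far: float, m_bits: int = 16) -> np.ndarray:
    """Map metric depth values (distance from the optical center) to M-bit
    quantized values, assuming the inverse-depth mapping of claim 3 and M <= 16."""
    inv = 1.0 / depth                                   # per-pixel reciprocal depth
    scale = (2 ** m_bits - 1) / (1.0 / near - 1.0 / far)
    q = np.rint((inv - 1.0 / far) * scale)              # near plane -> 2^M - 1, far plane -> 0
    return np.clip(q, 0, 2 ** m_bits - 1).astype(np.uint16)
```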
As an optional example, with continued reference to FIG. 21, the depth map processing apparatus 210 may further include a downsampling processing unit 215 and a stitching unit 216, wherein:
The downsampling processing unit 215 is adapted to downsample the quantized depth map to obtain a first depth map;
The stitching unit 216 is adapted to stitch the synchronized texture maps of the multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method, obtaining a stitched image.
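As an illustration only, a minimal sketch of these two units follows, assuming half-resolution downsampling by decimation and a top/bottom "textures above, depth maps below" layout; both choices stand in for the unspecified "preset stitching method" and are not limitations of the embodiments.

```python
import numpy as np

def downsample_half(depth_q: np.ndarray) -> np.ndarray:
    """Halve both dimensions of a quantized depth map by keeping every other
    pixel; one simple downsampling choice among many."""
    return depth_q[::2, ::2]

def stitch(textures: list, first_depths: list) -> np.ndarray:
    """Stitch synchronized texture maps (H x W x 3, uint8) and their first depth
    maps (H/2 x W/2, uint16) into one image: textures in a top row, 8-bit
    grayscale renderings of the depth maps in a bottom row (assumed layout)."""
    tex_row = np.concatenate(textures, axis=1)
    dep8 = [np.repeat((d >> 8).astype(np.uint8)[..., None], 3, axis=-1) for d in first_depths]
    dep_row = np.concatenate(dep8, axis=1)
    canvas = np.zeros((dep_row.shape[0], tex_row.shape[1], 3), dtype=np.uint8)
    canvas[:, :dep_row.shape[1]] = dep_row              # pad the shorter depth row
    return np.concatenate([tex_row, canvas], axis=0)
```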
Referring to the schematic structural diagram of the free-viewpoint video reconstruction apparatus shown in FIG. 22, the free-viewpoint video reconstruction apparatus 220 may include: a first video acquisition unit 221, a first quantized depth value acquisition unit 222, a first quantization parameter data acquisition unit 223, a first depth map inverse quantization processing unit 224, and a first image reconstruction unit 225. Specifically:
The first video acquisition unit 221 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image including the quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
The first quantized depth value acquisition unit 222 is adapted to acquire the quantized depth values of the pixels in the quantized depth map;
The first quantization parameter data acquisition unit 223 is adapted to acquire the quantization parameter data corresponding to the viewing angle of the quantized depth map;
The first depth map inverse quantization processing unit 224 is adapted to inverse-quantize the quantized depth map of the corresponding viewing angle based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, obtaining the corresponding estimated depth map;
The first image reconstruction unit 225 is adapted to reconstruct the image of the virtual viewpoint based on the texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
In a specific implementation, the first quantization parameter data acquisition unit 223 is adapted to acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map. Correspondingly, the first depth map inverse quantization processing unit 224 is adapted to apply the corresponding inverse quantization formula to the quantized depth values in the quantized depth map, based on these minimum and maximum depth distances, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
In some embodiments of this specification, for the specific inverse quantization formulas used by the first depth map inverse quantization processing unit 224, reference may be made to the foregoing embodiments; details are not repeated here.
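Purely for illustration, the following is a minimal sketch of such an inverse quantization, the inverse of the quantization sketch above, assuming the formulas of claim 7; the function name and bit depth are illustrative assumptions.

```python
import numpy as np

def dequantize_depth(depth_q: np.ndarray, near: float, far: float, m_bits: int = 16) -> np.ndarray:
    """Recover metric depth from M-bit quantized values, assuming claim 7:
    maxdisp/mindisp are the reciprocals of the near/far depth distances."""
    maxdisp = 1.0 / near                                # maximum quantized depth distance
    mindisp = 1.0 / far                                 # minimum quantized depth distance
    inv = depth_q.astype(np.float64) / (2 ** m_bits - 1) * (maxdisp - mindisp) + mindisp
    return 1.0 / inv                                    # back to distance from the optical center
```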
Referring to FIG. 23, an embodiment of this specification further provides a free-viewpoint video processing apparatus. As shown in FIG. 23, the free-viewpoint video processing apparatus 230 may include: a second video acquisition unit 231, a second quantized depth value acquisition unit 232, a second depth map inverse quantization processing unit 233, a virtual viewpoint position determination unit 234, and a second image reconstruction unit 235, wherein:
The second video acquisition unit 231 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image including the quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
The second quantized depth value acquisition unit 232 is adapted to acquire the quantized depth values of the pixels in the quantized depth map;
The second depth map inverse quantization processing unit 233 is adapted to acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map, obtaining the corresponding estimated depth map;
The virtual viewpoint position determination unit 234 is adapted to determine the position information of the virtual viewpoint in response to user interaction behavior;
The second image reconstruction unit 235 is adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
For the specific implementation of the free-viewpoint video processing apparatus in the embodiments of this specification, reference may be made to the aforementioned free-viewpoint video processing method; details are not repeated here.
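As a rough illustration of the reference-view selection that precedes combined rendering (cf. claim 10), the following hedged sketch picks the texture and depth maps of the cameras nearest to the virtual viewpoint; representing cameras by their positions and ranking by Euclidean distance are assumptions, one plausible selection rule among several.

```python
import numpy as np

def select_reference_views(virtual_pos: np.ndarray, camera_positions: list, k: int = 2) -> list:
    """Return the indices of the k capture cameras closest to the virtual
    viewpoint; their texture maps and estimated depth maps become the target
    maps handed to the combined-rendering step."""
    dists = [np.linalg.norm(np.asarray(c, dtype=np.float64) - virtual_pos) for c in camera_positions]
    return list(np.argsort(dists)[:k])
```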
This specification further provides an electronic device. Referring to the schematic structural diagram of the electronic device shown in FIG. 24, the electronic device 240 may include a memory 241 and a processor 242, the memory 241 storing computer instructions executable on the processor 242. When the processor 242 runs the computer instructions, the steps of the method of any of the foregoing embodiments may be performed; for the specific steps, principles, and the like, reference may be made to the corresponding method embodiments described above, which are not repeated here.
In a specific implementation, depending on the specific solution, the electronic device may be deployed on the service side as a server or cloud device, or on the user side as a terminal device.
An embodiment of this specification further provides a corresponding server device. Referring to the schematic structural diagram of the server device shown in FIG. 25, in a specific implementation, as shown in FIG. 25, the server device 250 may include a processor 251 and a communication component 252, wherein:
The processor 251 is adapted to perform the steps of the depth map processing method of any of the foregoing embodiments to obtain quantized depth maps, stitch the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain stitched images, and encapsulate the stitched images of multiple frames together with the corresponding parameter data to obtain a free-viewpoint video;
The communication component 252 is adapted to transmit the free-viewpoint video.
An embodiment of this specification further provides a terminal device. Referring to the schematic structural diagram of the terminal device shown in FIG. 26, in a specific implementation, as shown in FIG. 26, the terminal device 260 may include a communication component 261, a processor 262, and a display component 263, wherein:
The communication component 261 is adapted to acquire a free-viewpoint video;
The processor 262 is adapted to perform the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method of any of the foregoing embodiments; for the specific steps, reference may be made to the descriptions in the foregoing embodiments of the free-viewpoint video reconstruction method and the free-viewpoint video processing method, which are not repeated here.
The display component 263 is adapted to display the reconstructed image obtained by the processor 262.
In the embodiments of this specification, the terminal device may be a mobile terminal such as a mobile phone, a tablet computer, a personal computer, a television, or a combination of any terminal device with an external display apparatus.
An embodiment of this specification further provides a computer-readable storage medium having computer instructions stored thereon, wherein, when the computer instructions are run, the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method of any of the foregoing embodiments are performed; for details, reference may be made to the foregoing specific embodiments, which are not repeated here.
In a specific implementation, the computer-readable storage medium may be any of various suitable readable storage media, such as an optical disc, a mechanical hard disk, or a solid-state drive.
Although the embodiments of this specification are disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the embodiments of this specification; therefore, the protection scope of the present invention shall be subject to the scope defined by the claims.

Claims (22)

  1. A depth map processing method, comprising:
    acquiring an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
    acquiring depth values of pixels in the estimated depth map;
    acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map and, based on it, quantizing the depth values of the pixels in the estimated depth map to obtain quantized depth values of the corresponding pixels in a quantized depth map.
  2. The method according to claim 1, wherein acquiring the quantization parameter data corresponding to the viewing angle of the estimated depth map and, based on it, quantizing the depth values of the corresponding pixels in the estimated depth map to obtain the quantized depth values of the corresponding pixels in the quantized depth map comprises:
    acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
    quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  3. The method according to claim 2, wherein quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map, comprises:
    quantizing the depth values of the corresponding pixels in the estimated depth map using the following quantization formula:
    $$\mathrm{Depth} = (2^{M}-1)\cdot\frac{\dfrac{1}{\mathrm{range}}-\dfrac{1}{\mathrm{depth\_range\_far\_N}}}{\dfrac{1}{\mathrm{depth\_range\_near\_N}}-\dfrac{1}{\mathrm{depth\_range\_far\_N}}}$$
    wherein M is the number of quantization bits of the corresponding pixel of the estimated depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel of the estimated depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, and depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N.
  4. The method according to claim 1, further comprising:
    downsampling the quantized depth map to obtain a first depth map;
    stitching the frame-synchronized texture maps of the multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain a stitched image.
  5. A free-viewpoint video reconstruction method, comprising:
    acquiring a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    acquiring quantized depth values of pixels in the quantized depth map;
    acquiring the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantizing the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map;
    reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  6. The method according to claim 5, wherein acquiring the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantizing the quantized depth values in the quantized depth map to obtain the estimated depth map of the corresponding viewing angle comprises:
    acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
    inverse-quantizing the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
  7. The method according to claim 6, wherein inverse-quantizing the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle, comprises:
    inverse-quantizing the quantized depth values in the quantized depth map using the following inverse quantization formulas to obtain the corresponding pixel values in the estimated depth map:
    $$\mathrm{maxdisp} = \frac{1}{\mathrm{depth\_range\_near\_N}}$$
    $$\mathrm{mindisp} = \frac{1}{\mathrm{depth\_range\_far\_N}}$$
    $$\mathrm{range} = 1\Big/\left(\frac{\mathrm{Depth}}{2^{M}-1}\cdot(\mathrm{maxdisp}-\mathrm{mindisp})+\mathrm{mindisp}\right)$$
    wherein M is the number of quantization bits of the corresponding pixel of the quantized depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the quantized depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, maxdisp is the maximum quantized depth distance corresponding to viewing angle N, and mindisp is the minimum quantized depth distance corresponding to viewing angle N.
  8. The method according to any one of claims 5 to 7, wherein the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle; and before reconstructing the image of the virtual viewpoint, the method further comprises:
    upsampling the estimated depth map of the corresponding viewing angle to obtain a second depth map used for reconstructing the image of the virtual viewpoint.
  9. The method according to claim 8, wherein upsampling the estimated depth map of the corresponding viewing angle to obtain the second depth map used for reconstructing the image of the virtual viewpoint comprises:
    taking the depth values of the pixels in the estimated depth map as the pixel values of the corresponding even rows and even columns in the second depth map;
    for the depth values of the pixels in even rows and odd columns of the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining each depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the pixels to the left and right of the intermediate pixel;
    for the depth values of the pixels in odd rows of the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining each depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the pixels above and below the intermediate pixel.
  10. The method according to any one of claims 5 to 7, wherein reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, comprises:
    selecting multiple target texture maps and target depth maps from the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image;
    rendering the target texture maps and target depth maps in combination to obtain the image of the virtual viewpoint.
  11. A free-viewpoint video processing method, comprising:
    acquiring a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    acquiring quantized depth values of pixels in the quantized depth map;
    acquiring the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantizing the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map;
    determining position information of a virtual viewpoint in response to user interaction behavior;
    reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  12. The method according to claim 11, wherein determining the position information of the virtual viewpoint in response to user interaction behavior comprises: determining corresponding virtual viewpoint path information in response to a gesture interaction operation of the user;
    and wherein reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, comprises:
    selecting, according to the virtual viewpoint path information, the texture maps in the stitched images at the corresponding frame moments and the estimated depth maps of the corresponding viewing angles as target texture maps and target depth maps;
    rendering the target texture maps and target depth maps in combination to obtain the image of the virtual viewpoint.
  13. The method according to claim 11 or 12, further comprising:
    acquiring a virtual rendering target object in the image of the virtual viewpoint;
    acquiring a virtual information image generated based on augmented reality special-effect input data of the virtual rendering target object;
    synthesizing the virtual information image with the image of the virtual viewpoint and displaying the result.
  14. The method according to claim 13, wherein acquiring the virtual information image generated based on the augmented reality special-effect input data of the virtual rendering target object comprises:
    obtaining a virtual information image matching the position of the virtual rendering target object, according to the position of the virtual rendering target object in the image of the virtual viewpoint obtained through three-dimensional calibration.
  15. The method according to claim 13, wherein acquiring the virtual rendering target object in the image of the virtual viewpoint comprises:
    acquiring the virtual rendering target object in the image of the virtual viewpoint in response to a special-effect generation interaction control instruction.
  16. A depth map processing apparatus, comprising:
    an estimated depth map acquisition unit, adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
    a depth value acquisition unit, adapted to acquire depth values of pixels in the estimated depth map;
    a quantization parameter data acquisition unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the estimated depth map;
    a quantization processing unit, adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, obtaining quantized depth values of the corresponding pixels in a quantized depth map.
  17. A free-viewpoint video reconstruction apparatus, comprising:
    a first video acquisition unit, adapted to acquire a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    a first quantized depth value acquisition unit, adapted to acquire quantized depth values of pixels in the quantized depth map;
    a first quantization parameter data acquisition unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the quantized depth map;
    a first depth map inverse quantization processing unit, adapted to inverse-quantize the quantized depth map of the corresponding viewing angle based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, obtaining the corresponding estimated depth map;
    a first image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  18. A free-viewpoint video processing apparatus, comprising:
    a second video acquisition unit, adapted to acquire a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    a second quantized depth value acquisition unit, adapted to acquire quantized depth values of pixels in the quantized depth map;
    a second depth map inverse quantization processing unit, adapted to acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map, obtaining the corresponding estimated depth map;
    a virtual viewpoint position determination unit, adapted to determine position information of a virtual viewpoint in response to user interaction behavior;
    a second image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  19. An electronic device, comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein, when running the computer instructions, the processor performs the steps of the method of any one of claims 1 to 4, claims 5 to 10, or claims 11 to 15.
  20. A server device, comprising a processor and a communication component, wherein:
    the processor is adapted to perform the steps of the method of any one of claims 1 to 4 to obtain a quantized depth map, stitch the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain stitched images, and encapsulate the stitched images of multiple frames together with the corresponding parameter data to obtain a free-viewpoint video;
    the communication component is adapted to transmit the free-viewpoint video.
  21. A terminal device, comprising a communication component, a processor, and a display component, wherein:
    the communication component is adapted to acquire a free-viewpoint video;
    the processor is adapted to perform the steps of the method of any one of claims 5 to 10 or claims 11 to 15;
    the display component is adapted to display the reconstructed image obtained by the processor.
  22. A computer-readable storage medium having computer instructions stored thereon, wherein, when run, the computer instructions perform the steps of the method of any one of claims 1 to 4, claims 5 to 10, or claims 11 to 15.
PCT/CN2021/102335 2020-07-03 2021-06-25 Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium WO2022001865A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010630749.X 2020-07-03
CN202010630749.XA CN113963094A (en) 2020-07-03 2020-07-03 Depth map and video processing and reconstruction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022001865A1 true WO2022001865A1 (en) 2022-01-06

Family

ID=79317406

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102335 WO2022001865A1 (en) 2020-07-03 2021-06-25 Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113963094A (en)
WO (1) WO2022001865A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014010583A1 (en) * 2012-07-09 2014-01-16 日本電信電話株式会社 Video image encoding/decoding method, device, program, recording medium
CN103905812A (en) * 2014-03-27 2014-07-02 北京工业大学 Texture/depth combination up-sampling method
CN105049866A (en) * 2015-07-10 2015-11-11 郑州轻工业学院 Rendering distortion model-based code rate allocation method of multi-viewpoint plus depth coding
CN110495178A (en) * 2016-12-02 2019-11-22 华为技术有限公司 The device and method of 3D Video coding
JP2019184308A (en) * 2018-04-04 2019-10-24 日本放送協会 Depth estimation device and program, as well as virtual viewpoint video generator and its program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230033177A1 (en) * 2021-07-30 2023-02-02 Zoox, Inc. Three-dimensional point clouds based on images and depth data
CN117197319A (en) * 2023-11-07 2023-12-08 腾讯科技(深圳)有限公司 Image generation method, device, electronic equipment and storage medium
CN117197319B (en) * 2023-11-07 2024-03-22 腾讯科技(深圳)有限公司 Image generation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113963094A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
US11037365B2 (en) Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
WO2022002181A1 (en) Free viewpoint video reconstruction method and playing processing method, and device and storage medium
US10650590B1 (en) Method and system for fully immersive virtual reality
CN111669567B (en) Multi-angle free view video data generation method and device, medium and server
WO2022001865A1 (en) Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
CN111669564B (en) Image reconstruction method, system, device and computer readable storage medium
CN111669561B (en) Multi-angle free view image data processing method and device, medium and equipment
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
TW202029742A (en) Image synthesis
US11348252B1 (en) Method and apparatus for supporting augmented and/or virtual reality playback using tracked objects
CN111669569A (en) Video generation method and device, medium and terminal
CN111669604A (en) Acquisition equipment setting method and device, terminal, acquisition system and equipment
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN111669603B (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment
CN111669568B (en) Multi-angle free view angle interaction method and device, medium, terminal and equipment
WO2022022548A1 (en) Free viewpoint video reconstruction and playing processing method, device, and storage medium
CN111669571B (en) Multi-angle free view image data generation method and device, medium and equipment
CN114881898A (en) Multi-angle free visual angle image data generation method and device, medium and equipment
CN114007058A (en) Depth map correction method, video processing method, video reconstruction method and related devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834038

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834038

Country of ref document: EP

Kind code of ref document: A1