WO2022001865A1 - Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium


Info

Publication number
WO2022001865A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth map
depth
quantized
map
estimated
Prior art date
Application number
PCT/CN2021/102335
Other languages
French (fr)
Chinese (zh)
Inventor
盛骁杰
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of WO2022001865A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G06T15/04 Texture mapping
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/122 Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/243 Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Definitions

  • The embodiments of the present specification relate to the technical field of video processing, and in particular to depth map and video processing and reconstruction methods, apparatuses, devices, and storage media.
  • Free viewpoint video is a technology that provides a high-degree-of-freedom viewing experience: users can adjust the viewing angle through interactive operations during playback and watch from whichever viewpoint they choose, which can greatly improve the viewing experience.
  • In a typical pipeline, depth values are quantized into 8-bit binary data and expressed as a depth map. The synchronized texture maps of multiple viewing angles and the depth maps of the corresponding viewing angles are spliced to obtain a spliced image; the spliced image and its corresponding parameter data are then compressed according to frame timing to obtain a free-viewpoint video for transmission, so that the terminal device can reconstruct free-viewpoint images based on the obtained free-viewpoint video stream.
  • The inventors have found through research that the quality of the reconstructed free-viewpoint image is limited by the current depth map quantization processing method.
  • The embodiments of this specification provide depth map and video processing and reconstruction methods, apparatuses, devices, and storage media, which can improve the image quality of reconstructed free-viewpoint video.
  • the embodiments of this specification provide a depth map processing method, including:
  • Based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, a corresponding quantization formula is used to quantize the depth values of the corresponding pixels in the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • In some embodiments, using a corresponding quantization formula to quantize the depth value of the corresponding pixel in the estimated depth map to obtain the quantized depth value of the corresponding pixel in the quantized depth map includes:
  • the depth value of the corresponding pixel in the estimated depth map is quantized using the following quantization formula:
  • M is the number of quantization bits of the corresponding pixel in the estimated depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the estimated depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N.
  • the method further includes:
  • the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles are spliced according to a preset splicing method to obtain a spliced image.
  • the embodiments of this specification also provide a free-view video reconstruction method, the method includes:
  • the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the parameter data corresponding to the spliced image includes: quantization parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • the image of the virtual viewpoint is reconstructed according to the position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • obtaining the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and performing inverse quantization processing on the quantized depth values in the quantized depth map based on that data to obtain the estimated depth map of the corresponding viewing angle, includes:
  • based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, the corresponding inverse quantization formula is used to perform inverse quantization processing on the quantized depth values in the quantized depth map, to obtain the depth values of the corresponding pixels in the estimated depth map of the corresponding viewing angle.
  • In some embodiments, using the corresponding inverse quantization formula to perform inverse quantization processing on the quantized depth values in the quantized depth map to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle includes:
  • the following inverse quantization formula is used to inverse quantize the quantized depth values in the quantized depth map to obtain corresponding pixel values in the estimated depth map:
  • M is the number of quantization bits of the corresponding pixel in the quantized depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the quantized depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • maxdisp is the maximum value of the quantized depth distance corresponding to the viewing angle N, and mindisp is the minimum value of the quantized depth distance corresponding to the viewing angle N.
  • the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle; before reconstructing to obtain the image of the virtual viewpoint, the method further includes:
  • performing up-sampling on the estimated depth map of the corresponding viewing angle to obtain a second depth map for reconstructing the virtual viewpoint image, which includes:
  • for the depth values of pixels in even rows and odd columns in the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining the depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the left pixel and the right pixel of the intermediate pixel;
  • for the depth values of pixels in odd rows in the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining the depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the pixel above and the pixel below the intermediate pixel.
  • reconstructing the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, includes:
  • the target texture map and the target depth map are combined and rendered to obtain the image of the virtual viewpoint.
  • the embodiment of this specification also provides a free-viewpoint video processing method, the method includes:
  • the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the parameter data corresponding to the spliced image includes: quantization parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • the image of the virtual viewpoint is reconstructed according to the position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • the determining the position information of the virtual viewpoint in response to the user interaction behavior includes: determining the corresponding virtual viewpoint path information in response to the user's gesture interaction operation;
  • the texture map in the spliced image at the corresponding frame moment and the estimated depth map of the corresponding viewing angle are selected as the target texture map and the target depth map;
  • the target texture map and the target depth map are combined and rendered to obtain the image of the virtual viewpoint.
  • the method further includes:
  • the virtual information image and the image of the virtual viewpoint are synthesized and displayed.
  • the acquiring the virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object includes:
  • obtaining the virtual rendering target object in the image of the virtual viewpoint includes:
  • in response to an interactive control instruction for special effect generation, the virtual rendering target object in the image of the virtual viewpoint is acquired.
  • An embodiment of this specification provides a depth map processing device, the device comprising:
  • An estimated depth map obtaining unit, adapted to obtain an estimated depth map generated based on multiple frame-synchronized texture maps, where the viewing angles of the multiple texture maps are different;
  • a depth value obtaining unit, adapted to obtain the depth values of the pixels in the estimated depth map;
  • a quantization parameter data acquisition unit, adapted to acquire the quantization parameter data corresponding to the viewing angle of the estimated depth map;
  • a quantization processing unit, adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • Embodiments of this specification also provide a free-view video reconstruction device, the device comprising:
  • the first video acquisition unit is adapted to acquire a free-view video
  • the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images
  • the spliced images include synchronized texture maps of multiple viewing angles and a quantized depth map corresponding to a viewing angle
  • the parameter data corresponding to the spliced image includes: quantized parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • a first quantized depth value acquisition unit, adapted to acquire the quantized depth values of the pixels in the quantized depth map
  • a first quantization parameter data acquisition unit adapted to acquire quantization parameter data corresponding to the perspective of the quantized depth map
  • a first depth map inverse quantization processing unit adapted to perform inverse quantization processing on the quantized depth map of the corresponding view angle based on the quantization parameter data corresponding to the view angle of the quantized depth map to obtain a corresponding estimated depth map
  • the first image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • the embodiments of this specification also provide a free-viewpoint video processing device, the device comprising:
  • the second video acquisition unit is adapted to acquire a free-view video, where the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and a quantized depth map corresponding to a viewing angle, and the parameter data corresponding to the spliced image includes: quantized parameter data and camera parameter data of the estimated depth map corresponding to the viewing angle;
  • a second quantized depth value obtaining unit adapted to obtain the quantized depth value of the pixel in the quantized depth map
  • the second depth map inverse quantization processing unit, adapted to acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and to perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
  • a virtual viewpoint position determination unit adapted to determine the location information of the virtual viewpoint in response to user interaction
  • the second image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the spliced image.
  • Embodiments of the present specification further provide an electronic device, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor performs the steps of the methods described in the foregoing embodiments when executing the computer instructions.
  • the embodiments of this specification also provide a server device, including a processor and a communication component, wherein:
  • the processor is adapted to perform the steps of the depth map processing method described in any of the foregoing embodiments to obtain a quantized depth map, to splice the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset splicing method to obtain a spliced image, and to package the spliced images of multiple frames together with the corresponding parameter data to obtain a free-view video;
  • the communication component is adapted to transmit the free-view video.
  • the embodiments of this specification also provide a terminal device, including a communication component, a processor and a display component, wherein:
  • the communication component is adapted to obtain the free-view video;
  • the processor is adapted to perform the steps of the free-view video reconstruction method or the free-view video processing method described in any of the foregoing embodiments;
  • the display component is adapted to display the reconstructed image obtained by the processor.
  • The embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, where, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments are performed.
  • With the above scheme, the depth values of the pixels in the estimated depth map are quantized using quantization parameters that match the actual situation of the corresponding viewing angle, so that the depth map of each viewing angle can make full use of the expression space of the depth quantization bits, which can improve the image quality of the reconstructed free-view video.
  • By down-sampling to obtain a first depth map and splicing the first depth map with the texture maps of the corresponding viewing angles according to a preset splicing method, the overall size of the spliced image can be reduced, and the storage and transmission resources for the spliced image can therefore be saved.
  • When the decoding resolution of the overall stitched image is limited, setting the resolution of the quantized depth map to be smaller than the resolution of the texture map of the corresponding viewing angle allows a higher-resolution texture map to be transmitted. The estimated depth map of the corresponding viewing angle is then up-sampled to obtain a second depth map, and free-view video reconstruction is performed based on the synchronized texture maps of multiple viewing angles in the spliced image and the second depth maps of the corresponding viewing angles, so that free-view images of higher clarity can be obtained, improving the user experience.
  • FIG. 1 is a schematic diagram of a specific application system of a free-view video display in an embodiment of this specification
  • FIG. 2 is a schematic diagram of an interactive interface of a terminal device in an embodiment of this specification
  • FIG. 3 is a schematic diagram of a setting mode of a collection device in an embodiment of the present specification
  • FIG. 4 is a schematic diagram of another terminal device interaction interface in the embodiment of this specification.
  • FIG. 5 is a schematic diagram of a field of view application scenario in the embodiment of the present specification.
  • FIG. 6 is a flowchart of a depth map processing method in an embodiment of the present specification.
  • FIG. 7 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of the present specification.
  • FIG. 8 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification.
  • FIG. 9 is a schematic structural diagram of a data header file in an embodiment of the present specification.
  • FIG. 10 is a schematic diagram of a user side processing 6DoF video data in an embodiment of this specification.
  • FIG. 11 is a schematic structural diagram of a stitched image in the embodiment of this specification.
  • FIG. 13 is a flowchart of a combined rendering method in an embodiment of the present specification.
  • FIG. 16 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 17 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 18 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 19 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 20 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification.
  • FIG. 21 is a schematic structural diagram of a depth map processing apparatus in an embodiment of the present specification.
  • FIG. 22 is a schematic structural diagram of a free-view video reconstruction apparatus in an embodiment of the present specification.
  • FIG. 23 is a schematic structural diagram of a free-viewpoint video processing apparatus in an embodiment of the present specification.
  • FIG. 24 is a schematic structural diagram of an electronic device in an embodiment of this specification.
  • FIG. 25 is a schematic structural diagram of a server device in an embodiment of this specification.
  • FIG. 26 is a schematic structural diagram of a terminal device in an embodiment of this specification.
  • As shown in FIG. 1, a specific application system for free-view video display in an embodiment of the present specification may include a collection system 11 composed of multiple collection devices, a server 12, and a display device 13, where the collection system 11 can collect images of the area to be viewed.
  • the acquisition system 11 or the server 12 can process the acquired synchronized multiple texture maps to generate multi-angle free viewing angle data that can support the display device 13 to perform virtual viewpoint switching.
  • the display device 13 can display reconstructed images generated based on multi-angle free viewing angle data, the reconstructed images correspond to virtual viewpoints, and can display reconstructed images corresponding to different virtual viewpoints according to user instructions, and switch the viewing position and viewing angle.
  • the process of performing image reconstruction to obtain a reconstructed image may be implemented by the display device 13, or may be implemented by a device located in a content delivery network (Content Delivery Network, CDN) by means of edge computing.
  • the user can view the area to be viewed through the display device 13 , and in this embodiment, the area to be viewed is a basketball court. As mentioned earlier, the viewing position and viewing angle can be switched.
  • users can swipe across the screen to switch virtual viewpoints.
  • the virtual viewpoint for viewing can be switched.
  • the position of the virtual viewpoint before sliding may be VP1, and the position of the virtual viewpoint after sliding may be VP2.
  • the reconstructed image displayed on the screen may be as shown in FIG. 4 .
  • the reconstructed image may be obtained by performing image reconstruction based on multi-angle free viewing angle data generated from images collected by multiple collection devices in an actual collection situation.
  • the image viewed before switching may also be a reconstructed image.
  • the reconstructed images may be frame images in the video stream.
  • the manner of switching the virtual viewpoint according to the user's instruction may be various, which is not limited here.
  • the virtual viewpoint can be represented by coordinates of 6 degrees of freedom (DoF), wherein the spatial position of the virtual viewpoint can be represented as (x, y, z), and the viewing angle can be represented as three rotation directions
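  • As an illustration, such a 6DoF virtual viewpoint can be held in a small structure like the sketch below; the rotation-axis names are an assumption, since the text only speaks of three rotation directions.

```python
from dataclasses import dataclass

@dataclass
class VirtualViewpoint:
    # Spatial position (x, y, z) plus three rotations; the axis names
    # are assumptions -- the text only says "three rotation directions".
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float

vp = VirtualViewpoint(x=0.0, y=1.6, z=5.0, yaw=0.0, pitch=0.0, roll=0.0)
```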
  • the multi-angle free viewing angle data may include depth map data, which is used to provide third-dimensional information outside the plane image. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of the depth map data is smaller.
  • the switching of the virtual viewpoints may be performed within a certain range, which is a multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the virtual viewpoint position and the viewing angle can be switched arbitrarily.
  • the multi-angle free viewing angle range is related to the arrangement of the acquisition device.
  • the wider the shooting coverage of the collection devices, the larger the multi-angle free viewing angle range.
  • the quality of the picture displayed by the terminal device is related to the number of collection devices. Generally, the more collection devices are set, the fewer empty areas in the displayed picture.
  • the range of multi-angle free viewing angles is related to the spatial distribution of the acquisition devices.
  • the range of multi-angle free viewing angles and the interaction mode with the display device on the terminal side can be set based on the spatial distribution relationship of the collection devices.
  • collection devices are arranged along a certain path.
  • 6 collection devices may be arranged along an arc, that is, collection devices CJ 1 to CJ 6 . It can be understood that the location, quantity and support manner of the collection devices can be various, which are not limited here.
  • the embodiments of this specification provide a corresponding depth map processing method and a free-view video reconstruction method, which are described in detail below through specific embodiments.
  • A sampled image value is represented by a number, and the process of converting the continuous values of an image function to their digital equivalents is quantization.
  • Image quantization assigns an integer number to each continuous sample value.
  • After the server device (such as the server 12) estimates the depth maps of the scene and objects based on the texture maps, it quantizes the depth values into 8-bit binary data and expresses them as a depth map.
  • For convenience of description, this depth map is referred to here as the estimated depth map.
  • the texture maps of the synchronized multiple viewing angles and the obtained estimated depth maps of the corresponding viewing angles are spliced to obtain a spliced image.
  • the spliced image and the corresponding parameter data are compressed according to the frame timing to obtain a free-view video, and the terminal device can reconstruct free-view images based on the obtained free-view video.
  • the inventor found through research that the current depth map quantization processing method quantizes each depth map in the spliced image based on the same set of quantization parameter data, and the quality of the reconstructed free-viewpoint image is limited by this quantization processing method.
  • Specifically, a quantized depth value in 8-bit binary in the range 0 to 255 can be obtained by quantizing with a formula in which:
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the estimated depth map
  • depth_range_near is the minimum depth distance from the optical center in the preset field of view
  • depth_range_far is the maximum depth distance from the optical center in the preset field of view.
  • FIG. 5 is a schematic diagram of the field of view of an application scenario, showing a scene region 50 that contains an object R and is provided with a plurality of collection devices P1, P2 ... Pn ... PN. The collection devices P1 to PN are arranged in an arc, with corresponding optical centers C1, C2 ... Cn ... CN and corresponding optical axes L1, L2 ... Ln ... LN.
  • From FIG. 5, the spatial relationship between the optical centers C1 to CN of the collection devices P1 to PN can be seen intuitively: the minimum and maximum distances between the object R and the optical center differ for each collection device, and therefore the minimum and maximum depth distances from the optical center vary among the estimated depth maps obtained from the texture maps captured by the collection devices P1 to PN.
  • quantization parameters that match the actual situation of the corresponding viewing angle are used to quantize the depth values of the pixels in the estimated depth map, so that for the depth map of each viewing angle, the expression space of the depth quantization bits can be fully utilized, and the image quality of the reconstructed free-view video can be improved.
  • the embodiment of this specification may specifically include the following quantization processing steps:
  • S61 Acquire an estimated depth map generated based on multiple frame-synchronized texture maps, where the multiple texture maps have different viewing angles.
  • an acquisition system composed of a plurality of acquisition devices may acquire images synchronously, and obtain the texture maps synchronized with the multiple frames.
  • the origin of the coordinate system of the acquisition device (such as a camera) can be used as the optical center, and the depth value can be the distance from each point in the field of view to the optical center along the optical axis.
  • an estimated depth map corresponding to each texture map may be obtained based on the multiple texture maps synchronized in the frame.
  • S63 Acquire the quantization parameter data corresponding to the viewing angle of the estimated depth map, and perform quantization processing on the depth values of the pixels in the estimated depth map based on that data, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • The quantization parameter data may include: the minimum depth distance from the optical center and the maximum depth distance from the optical center of the viewing angle corresponding to the estimated depth map, which are used to quantize the depth values of the pixels in the estimated depth map.
  • Specifically, the minimum depth distance from the optical center and the maximum depth distance from the optical center of the viewing angle corresponding to the estimated depth map can be obtained first; then, based on these distances, a corresponding quantization formula is used to quantize the depth value of the corresponding pixel in the estimated depth map to obtain the quantized depth value of the corresponding pixel in the quantized depth map.
  • the following quantization formula is used to quantize the depth value of the corresponding pixel in the estimated depth map:
  • M is the number of quantization bits of the corresponding pixel in the estimated depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the estimated depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N.
  • With per-view quantization parameters, the object closest to the camera (optical center) in the quantized depth map corresponding to each viewing angle can be quantized to a depth value closer to 2^M − 1.
  • M can be 8 bits, 16 bits, and so on. If M is 8 bits, the depth value of the object closest to the optical center in the quantized depth map corresponding to each viewing angle is closer to 255 after quantization.
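  • The quantization formula itself is not reproduced in this text, but the parameters above pin down its shape. The sketch below assumes the common inverse-depth mapping used in free-viewpoint systems, with a per-view near/far pair in place of a single global one; the function name and toy values are illustrative, not the patent's exact formula.

```python
import numpy as np

def quantize_depth_map(depth, bits, near, far):
    """Quantize metric depth (distance from the optical center along the
    optical axis) into integer codes. Assumes an inverse-depth mapping,
    so depth == near maps to 2**bits - 1 and depth == far maps to 0."""
    max_code = 2 ** bits - 1
    inv = (1.0 / depth - 1.0 / far) / (1.0 / near - 1.0 / far)
    dtype = np.uint8 if bits <= 8 else np.uint16
    return np.round(max_code * inv).astype(dtype)

# Per-view quantization: each viewing angle N uses its own
# (depth_range_near_N, depth_range_far_N) instead of one global pair.
view_ranges = {0: (2.5, 40.0), 1: (3.1, 42.0)}   # hypothetical ranges in meters
estimated_depth_v0 = np.full((4, 4), 2.5)        # toy estimated depth map, view 0
near_0, far_0 = view_ranges[0]
codes = quantize_depth_map(estimated_depth_v0, bits=8, near=near_0, far=far_0)
# The nearest object (depth == near_0) quantizes to 255, as the text describes.
```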
  • the synchronized texture maps of multiple viewing angles and the quantized depth maps of the corresponding viewing angles can be spliced to obtain a spliced image, and a free-view video can then be obtained based on the spliced images at multiple frame moments and the parameter data corresponding to the spliced images.
  • the free-view video may be compressed and then transmitted to a terminal device for image reconstruction of the free-view video.
  • Before free-viewpoint video can be generated, texture map acquisition and depth map calculation are required; these comprise three main steps, namely Multi-camera Video Capturing, camera internal and external parameter calculation (Camera Parameter Estimation), and Depth Map Calculation.
  • For Multi-camera Video Capturing, it is required that the video captured by each camera can be aligned at the frame level.
  • The texture maps can be obtained through the video acquisition of multiple cameras;
  • the camera parameters can be obtained through the calculation of the internal and external parameters of the cameras, which can include the internal parameter data and the external parameter data of the cameras;
  • the depth maps of the corresponding viewing angles can be obtained through Depth Map Calculation. The multiple synchronized texture maps, the depth maps of the corresponding viewing angles, and the camera parameters together form the 6DoF video data.
  • Through the above steps, the texture maps collected by the multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained.
  • These three parts of data can be referred to as the data files in the multi-angle free-view video data, and can also be referred to as 6DoF video data. With these data, the client can generate a virtual viewpoint according to a virtual 6-degrees-of-freedom (DoF) position, thereby providing a 6DoF video experience.
  • The 6DoF video data and the indicative data can be compressed and transmitted to the user side, and the user side can obtain the user-side 6DoF expression according to the received data, that is, the aforementioned 6DoF video data and metadata.
  • indicative data can also be called metadata (Metadata)
  • Metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), used to indicate the storage rules for the pixel data of the multiple texture maps and the quantized depth map data in the stitched image; edge padding pattern metadata (Padding pattern metadata), used to indicate the method of edge protection in the stitched image; the quantization parameter metadata of the corresponding viewing angles; and other metadata (Other metadata).
  • the metadata can be stored in the data header file, and the specific storage sequence can be as shown in FIG. 9 , or stored in other sequences.
  • the user side obtains 6DoF video data, which includes camera parameters, texture maps, quantized depth maps, and metadata, as well as user-side interaction behavior data.
  • the user side can use Depth Image-Based Rendering (DIBR) for 6DoF rendering, so as to generate a virtual viewpoint image at a specific 6DoF position determined by the user interaction behavior; that is, according to the user's instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined.
  • any video frame in the free-view video data is generally expressed as a stitched image formed by a texture map collected by multiple cameras and a corresponding depth map.
  • Figure 11 is a schematic diagram of the structure of the spliced image: the upper half of the spliced image is the texture map area, which is divided into 8 texture map sub-regions that respectively store the pixel data of 8 synchronized texture maps, taken from different angles, that is, from different viewing angles.
  • the lower half of the spliced image is the depth map area, which is divided into 8 depth map sub-regions, and the corresponding quantized depth maps of the above 8 texture maps are respectively stored.
  • the texture map of viewing angle N and the quantized depth map of viewing angle N are in one-to-one pixel correspondence, and the spliced image is compressed and transmitted to the terminal for decoding and DIBR, so that an image can be interpolated at the viewpoint of user interaction.
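  • As a concrete reading of this layout, the sketch below splits a decoded frame into per-view texture and quantized depth sub-regions; the 2x4 grid within each half is an assumption, since the text fixes only the top/bottom halves and the count of eight sub-regions.

```python
import numpy as np

def split_stitched_frame(frame, n_views=8, cols=4):
    """Split a stitched frame laid out as in FIG. 11: texture maps in the
    top half, quantized depth maps in the bottom half, each half tiled
    into n_views sub-regions (grid shape assumed here)."""
    h = frame.shape[0]
    rows = n_views // cols
    def tiles(half):
        sub_h, sub_w = half.shape[0] // rows, half.shape[1] // cols
        return [half[r * sub_h:(r + 1) * sub_h, c * sub_w:(c + 1) * sub_w]
                for r in range(rows) for c in range(cols)]
    textures = tiles(frame[: h // 2])    # 8 per-view texture maps
    depth_maps = tiles(frame[h // 2:])   # 8 per-view quantized depth maps
    return textures, depth_maps
```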
  • In some embodiments, the quantized depth map may be down-sampled first to obtain a first depth map, and the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles may then be stitched according to a preset stitching method to obtain a stitched image.
  • One is to perform sampling processing on the pixels in the quantized depth map to obtain the first depth map. For example, every other pixel can be extracted from the quantized depth map, so that the resolution of the obtained first depth map is 50% of the original depth map.
  • the other is to perform filtering based on the corresponding texture map on the pixels in the quantized depth map to obtain the first depth map.
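  • A minimal sketch of the first, sampling-based option, assuming the quantized depth map is a NumPy array (the texture-guided filtering option is not shown):

```python
def decimate_depth_map(quantized_depth, step=2):
    # Keep every other pixel along each axis; step=2 halves the resolution
    # per axis, consistent with the 1/4-downsampled first depth map
    # mentioned later in the text.
    return quantized_depth[::step, ::step]
```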
  • the stitched image can be a rectangle.
  • S121 Obtain a free-view video, where the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles; the parameter data corresponding to the spliced image includes: the quantization parameter data and camera parameter data of the estimated depth maps of the corresponding viewing angles.
  • the free-view video may be in the form of a video compressed file, or may be transmitted in the form of a video stream.
  • the parameter data of the spliced image may be stored in the header file of the free-view video data, and the specific form may refer to the introduction in the foregoing embodiments.
  • the quantization parameter data of the estimated depth map of the corresponding viewing angle may be stored in the form of an array.
  • the quantization parameter data can be expressed as:
  • Array Z [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
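  • For illustration, such a per-view array might look like the following; the field names and values are hypothetical:

```python
# One near/far pair per viewing angle, indexed by view number.
quant_params = [
    {"view": n, "depth_range_near": 2.5 + 0.1 * n, "depth_range_far": 40.0}
    for n in range(16)
]
near_3 = quant_params[3]["depth_range_near"]  # parameters for viewing angle 3
```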
  • S122 Acquire quantized depth values of pixels in the quantized depth map.
  • S123 Acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map.
  • In some embodiments of this specification, step S123 may be implemented in the following manner:
  • based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, the corresponding inverse quantization formula is used to perform inverse quantization processing on the quantized depth values in the quantized depth map, to obtain the depth values of the corresponding pixels in the estimated depth map of the corresponding viewing angle.
  • the following inverse quantization formula is used to perform inverse quantization processing on the quantized depth values in the quantized depth map to obtain corresponding pixel values in the estimated depth map:
  • M is the number of quantization bits of the corresponding pixel in the quantized depth map
  • range is the depth value of the corresponding pixel in the estimated depth map
  • Depth is the quantized depth value of the corresponding pixel in the quantized depth map
  • N is the viewing angle corresponding to the estimated depth map
  • depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to the viewing angle N
  • maxdisp is the maximum value of the quantized depth distance corresponding to the viewing angle N, and mindisp is the minimum value of the quantized depth distance corresponding to the viewing angle N.
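  • A matching inverse-quantization sketch, under the same inverse-depth assumption as the quantization sketch earlier; the per-view maxdisp/mindisp bounds of the patent's exact formula are not reproduced here.

```python
import numpy as np

def dequantize_depth_map(codes, bits, near, far):
    """Map integer codes back to metric depth using the near/far range of
    the viewing angle, inverting the assumed quantization mapping."""
    max_code = 2 ** bits - 1
    inv = codes.astype(np.float64) / max_code * (1.0 / near - 1.0 / far) + 1.0 / far
    return 1.0 / inv  # code 2**bits - 1 -> near, code 0 -> far
```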
  • If the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle (for example, the quantized depth map has been down-sampled on the server side), then on the terminal device side, the estimated depth map of the corresponding viewing angle may be up-sampled to obtain a second depth map, and the virtual viewpoint image is then reconstructed using the second depth map.
  • A first example is to perform up-sampling processing on an estimated depth map that has been down-sampled by 1/4, to obtain a second depth map with the same resolution as the texture map. Depending on the rows and columns, this specifically includes the following different processing:
  • For pixels in odd rows, the corresponding pixels in the corresponding texture map are determined as intermediate pixels, and the depth values are determined based on the relationship between the luminance channel values of the intermediate pixels in the corresponding texture map and the luminance channel values of the pixels above and below the intermediate pixels.
  • If the absolute value of the difference between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel value of the left pixel of the intermediate pixel is less than the quotient of the absolute value of the difference between the luminance channel value of the intermediate pixel and the luminance channel value of the right pixel and a preset threshold, the depth value corresponding to the left pixel is selected as the depth value of the corresponding pixel in the even row and odd column of the second depth map; otherwise, the depth value corresponding to the right pixel is selected.
  • Similarly, if the absolute value of the difference between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel value of the pixel above the intermediate pixel is less than the quotient of the absolute value of the difference between the luminance channel value of the intermediate pixel and the luminance channel value of the pixel below and the preset threshold, the depth value corresponding to the upper pixel is selected as the depth value of the corresponding pixel in the odd row of the second depth map; otherwise, the depth value corresponding to the lower pixel is selected.
  • pix_C is the luminance channel value (Y value) of the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • pix_L is the luminance channel value of the left pixel of pix_C
  • pix_R is the luminance channel value of the right pixel of pix_C, and pix_U is the luminance channel value of the pixel above pix_C
  • Dep_R is the depth value corresponding to the right pixel of the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • Dep_L is the depth value corresponding to the left pixel of the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • Dep_D is the depth value corresponding to the pixel below the intermediate pixel in the texture map at the position corresponding to the depth value in the second depth map
  • Dep_U is the depth value corresponding to the pixel above the intermediate pixel.
  • In another up-sampling manner, the depth value of each pixel in the estimated depth map is taken as the pixel value of the corresponding row and column in the second depth map; for the pixels in the second depth map that have no corresponding relationship with the estimated depth map, the values are obtained by filtering based on the differences between the pixel value of the corresponding pixel in the corresponding texture map and the pixel values of the surrounding pixels of that corresponding pixel.
  • Specifically, the pixel value of the corresponding pixel in the texture map may be compared with the pixel values of the four diagonal pixels around the corresponding pixel, the pixel whose pixel value is closest to that of the corresponding pixel may be obtained, and the depth value in the estimated depth map corresponding to that closest pixel may be used as the depth value, in the second depth map, of the pixel corresponding to the position in the texture map.
  • Alternatively, the corresponding pixel in the texture map may be compared with its surrounding pixels, and, according to the similarity of the pixel values, the depth values in the estimated depth map corresponding to the surrounding pixels may be weighted to obtain the depth value of the corresponding pixel in the second depth map.
  • the above shows some methods for upsampling the estimated depth map to obtain the second depth map. It can be understood that the above are only examples, and specific upsampling methods are not limited in the embodiments of this specification. Moreover, the method of up-sampling the estimated depth map in any video frame may correspond to the method of down-sampling the quantized depth map to obtain the first depth map, or there may be no corresponding relationship. In addition, the ratio of upsampling and downsampling can be the same or different.
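  • The sketch below is a simplified reading of the luminance-guided selection rules above for 2x up-sampling; it is not the patent's full procedure, and border pixels are left unfilled.

```python
import numpy as np

def upsample_depth_guided(depth_lo, luma, thr=1.0):
    """depth_lo: low-resolution depth (H/2 x W/2); luma: full-resolution
    Y channel. Known samples are copied to even rows/columns; a missing
    pixel takes the depth of whichever horizontal (then vertical)
    neighbor is closer in luminance, with threshold thr."""
    h, w = luma.shape
    out = np.zeros((h, w), dtype=depth_lo.dtype)
    out[::2, ::2] = depth_lo                       # copy known samples
    for r in range(0, h, 2):                       # even rows, odd columns
        for c in range(1, w - 1, 2):
            d_left = abs(float(luma[r, c]) - float(luma[r, c - 1]))
            d_right = abs(float(luma[r, c]) - float(luma[r, c + 1]))
            out[r, c] = out[r, c - 1] if d_left < d_right / thr else out[r, c + 1]
    for r in range(1, h - 1, 2):                   # odd rows
        for c in range(w):
            d_up = abs(float(luma[r, c]) - float(luma[r - 1, c]))
            d_down = abs(float(luma[r, c]) - float(luma[r + 1, c]))
            out[r, c] = out[r - 1, c] if d_up < d_down / thr else out[r + 1, c]
    return out
```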
  • Next, a specific implementation of step S124 is given.
  • Only part of the texture maps in the stitched image and the estimated depth maps of the corresponding viewing angles may be selected as the target texture maps and target depth maps for reconstruction of the virtual viewpoint image.
  • Multiple target texture maps and target depth maps can be selected. After that, the target texture maps and target depth maps can be combined and rendered to obtain the image of the virtual viewpoint.
  • the location information of the virtual viewpoint may be determined according to user interaction behavior or preset. If it is determined based on the user interaction behavior, the virtual viewpoint position at the corresponding interaction moment can be determined by acquiring trajectory data corresponding to the user interaction operation.
  • the position information of the virtual viewpoint corresponding to the corresponding video frame may also be preset on the server side (such as a server or the cloud), and the position information of the set virtual viewpoint is transmitted in the header file of the free-viewpoint video.
  • Although the spatial positional relationship between each texture map (with the estimated depth map of the corresponding viewing angle) and the virtual viewpoint position may be determined based on the virtual viewpoint position and the parameter data corresponding to the spliced image, in order to save data processing resources, texture maps and estimated depth maps that satisfy a preset positional relationship and/or quantitative relationship with the virtual viewpoint position may be selected, according to the position information of the virtual viewpoint and the parameter data corresponding to the spliced image, from the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, as the target texture maps and target depth maps.
  • For example, the texture maps and estimated depth maps corresponding to the 2 to N viewpoints closest to the virtual viewpoint position may be selected, where N is the number of texture maps in the spliced image, that is, the number of acquisition devices corresponding to the texture maps.
  • the quantitative relationship value may be fixed or variable.
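  • A distance-based sketch of such a selection rule, assuming the capture-device positions are available from the camera parameter data; the actual preset positional and/or quantitative relationship is left open by the text.

```python
import numpy as np

def select_target_views(virtual_pos, camera_positions, k=4):
    """Pick the k capture views closest to the virtual viewpoint; their
    texture maps and estimated depth maps become the target maps for
    rendering. k may be fixed or variable, as the text notes."""
    d = np.linalg.norm(np.asarray(camera_positions) - np.asarray(virtual_pos), axis=1)
    return np.argsort(d)[:k]  # indices of the k nearest viewing angles
```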
  • There may be various post-processing methods.
  • at least one of the following methods may be used to perform post-processing on the target depth map:
  • holes may also be filled in the fused texture map to obtain a reconstructed image corresponding to the position of the virtual viewpoint at the moment of user interaction. Through hole filling, the quality of the reconstructed image can be improved.
  • Before image reconstruction is performed, the target depth maps may be preprocessed (for example, up-sampled), or the estimated depth maps obtained after inverse quantization of all depth maps in the stitched image may be preprocessed (for example, up-sampled), and image reconstruction is then performed based on the virtual viewpoint.
  • S141 Obtain a free-view video, where the free-view video includes spliced images at multiple frame moments and parameter data corresponding to the spliced images, and the spliced images include synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles; the parameter data corresponding to the spliced image includes: the quantization parameter data and camera parameter data of the estimated depth maps of the corresponding viewing angles.
  • the spliced images at the multiple frame moments and the parameter data corresponding to the spliced images can be obtained.
  • the specific form of the free-view video may be a multi-angle free-view video, such as a 6DoF video, as exemplified in the foregoing embodiment.
  • each video frame can include a spliced image formed by the synchronized texture maps of multiple viewing angles and the first depth map of the corresponding viewing angle
  • A structure of a stitched image is shown in Figure 11. It can be understood that other stitched-image structures can be adopted; for example, different splicing methods can be adopted according to the ratio of the resolutions of the texture map and the first depth map of the corresponding viewing angle, and one texture map can correspond to multiple first depth maps (e.g., where the first depth map is a depth map processed by 25% downsampling).
  • the free-viewpoint video data file may also include metadata describing the stitched image.
  • the parameter data of the spliced image can be obtained from the metadata, for example, one or more types of information such as the camera parameters of the spliced image, the splicing rule of the spliced image, and the resolution information of the spliced image.
  • the parameter information of the spliced image may be transmitted in combination with the spliced image, for example, may be stored in a video file header.
  • the embodiments of this specification do not limit the specific format of the spliced image, nor the specific type and storage location of the parameter information of the spliced image, as long as the reconstructed image of the corresponding virtual viewpoint position can be obtained based on the free-viewpoint video.
  • the free-view video may be in the form of a video compressed file, or may be transmitted in the form of a video stream.
  • the parameter data of the spliced image may be stored in the header file of the free-view video data, and the specific form may refer to the introduction in the foregoing embodiments.
  • the quantization parameter data of the estimated depth map of the corresponding viewing angle may be stored in the form of an array.
  • the quantization parameter data can be expressed as:
  • Array Z [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
  • S142 Acquire quantized depth values of pixels in the quantized depth map.
  • S143 Acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map, and perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map.
  • the virtual viewpoint position information based on user interaction can be expressed as coordinates
  • The virtual viewpoint position information can be generated in one or more preset user interaction manners. For example, coordinates can be determined from user operations such as manual clicks or gesture paths, or from a virtual position determined by voice input; alternatively, users can be provided with customized virtual viewpoints (for example, the user can select a position or perspective in the scene, such as below the basket, from the sidelines, from the referee's perspective, or from the coach's perspective).
  • a specific user interaction behavior mode is not limited in the embodiment of the present invention, as long as virtual viewpoint position information based on user interaction can be obtained.
  • the corresponding virtual viewpoint path information may be determined.
  • the corresponding virtual viewpoint path can be planned based on different forms of gesture interaction, so that the path information of the corresponding virtual viewpoint can be determined based on the user's specific gesture operation.
  • the user's finger sliding left and right on the touch screen corresponds to the left and right movement of the viewpoint position;
  • the user's finger sliding up and down on the touch screen corresponds to the up and down movement of the viewpoint position;
  • a finger zoom (pinch) operation corresponds to moving the viewpoint position closer in or farther out.
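  • a minimal sketch of such a gesture-to-viewpoint mapping, assuming a simple Cartesian viewpoint position; the gesture names and step sizes are illustrative assumptions:

```python
# Illustrative mapping from gesture events to virtual-viewpoint motion,
# following the correspondences listed above; names and step sizes are
# assumptions, not part of the specification.
def apply_gesture(position, gesture, amount):
    x, y, z = position
    if gesture == "slide_horizontal":   # finger slides left/right
        x += amount
    elif gesture == "slide_vertical":   # finger slides up/down
        y += amount
    elif gesture == "pinch_zoom":       # two-finger zoom in/out
        z += amount
    return (x, y, z)

# Planning a short virtual-viewpoint path from a horizontal swipe:
path = [apply_gesture((0.0, 2.0, 8.0), "slide_horizontal", 0.1 * i) for i in range(5)]
```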
  • the virtual viewpoint paths planned from the gesture shapes above are only exemplary; virtual viewpoint paths based on other gesture shapes may be predefined, or user-defined settings may be allowed, to enhance the user experience.
  • the texture maps at the corresponding frame moments and the estimated depth maps of the corresponding viewing angles can be selected as the target texture maps and target depth maps, and combined rendering can be performed on the target texture maps and target depth maps to obtain the image of the virtual viewpoint.
  • some of the texture maps and the second depth maps of corresponding viewing angles in one frame or in consecutive multi-frame stitched images can be selected in time order as the target texture maps and target depth maps used to reconstruct the corresponding image of the virtual viewpoint.
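  • one plausible way to make this selection is to rank the capture viewpoints by proximity to the virtual viewpoint, as in the following sketch; the distance criterion and the choice of four views are assumptions, since the embodiments do not fix a selection rule:

```python
import numpy as np

# Illustrative selection of target views: keep the k capture cameras whose
# positions are closest to the virtual viewpoint. The proximity criterion
# and k=4 are assumptions; the specification leaves the selection rule open.
def select_target_views(virtual_pos, camera_positions, k=4):
    distances = np.linalg.norm(camera_positions - virtual_pos, axis=1)
    return np.argsort(distances)[:k]

camera_positions = np.array(
    [[np.cos(a), 0.0, np.sin(a)] for a in np.linspace(0.0, np.pi, 16)])
target_views = select_target_views(np.array([0.3, 0.0, 0.9]), camera_positions)
```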
  • AR (Augmented Reality) special effects may also be implanted. Certain objects in the image of the free-viewpoint video may be determined as virtual rendering target objects based on indication information, and the indication information may be generated based on user interaction, or may be obtained based on preset trigger conditions or third-party instructions.
  • the virtual rendering target object in the image of the virtual viewpoint may be acquired in response to a special effect generation interactive control instruction.
  • S152 Acquire a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object.
  • the implanted AR special effects are presented in the form of virtual information images.
  • the virtual information image may be generated based on augmented reality special effect input data of the target object.
  • a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object can be acquired.
  • the virtual information image corresponding to the virtual rendering target object may be generated in advance, or may be generated immediately in response to the special effect generation instruction.
  • a virtual information image matching the position of the virtual rendering target object can be obtained based on the position of the virtual rendering target object in the reconstructed image as obtained by three-dimensional calibration, so that the obtained virtual information image better matches the position of the virtual rendering target object in three-dimensional space; the displayed virtual information image is then more consistent with the real state of the three-dimensional space, the displayed composite image is more realistic and vivid, and the user's visual experience is enhanced.
  • a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
  • the augmented reality special effect input data of the target object may be input into a preset three-dimensional model, which outputs a virtual information image matching the position of the virtual rendering target object in the image as obtained by three-dimensional calibration;
  • alternatively, the augmented reality special effect input data of the virtual rendering target object may be input into a preset machine learning model, which outputs a virtual information image matching the position of the virtual rendering target object in the image as obtained by three-dimensional calibration.
  • the virtual information image and the image of the virtual viewpoint can be synthesized and displayed in various ways. Two specific implementation examples are given below:
  • Example 1 Perform fusion processing on the virtual information image and the corresponding image to obtain a fusion image, and display the fusion image;
  • Example 2 The virtual information image is superimposed on the corresponding image to obtain a superimposed composite image, and the superimposed composite image is displayed.
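  • as a minimal sketch of Example 2, an RGBA virtual information image can be superimposed on the RGB image of the virtual viewpoint by alpha blending; the blending rule is an assumption, since the embodiments do not fix a specific compositing method:

```python
import numpy as np

# Illustrative superimposition: alpha-blend an RGBA virtual information
# image over the RGB image of the virtual viewpoint.
def superimpose(viewpoint_rgb, virtual_rgba):
    alpha = virtual_rgba[..., 3:4].astype(np.float64) / 255.0
    blended = alpha * virtual_rgba[..., :3] + (1.0 - alpha) * viewpoint_rgb
    return blended.astype(np.uint8)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)    # reconstructed image
effect = np.zeros((720, 1280, 4), dtype=np.uint8)   # virtual information image
composite = superimpose(frame, effect)               # superimposed composite image
```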
  • the obtained composite image may be displayed directly, or the obtained composite image may be inserted into a video stream to be played for playback and display.
  • the fused image can be inserted into the video stream to be played for playback display.
  • the free viewpoint video may include a special effect display identifier.
  • the superimposition position of the virtual information image in the image of the virtual viewpoint may be determined based on the special effect display identifier, and the virtual information image may then be superimposed and displayed at the determined position in the image of the virtual viewpoint.
  • the interactive terminal T1 plays the video in real time.
  • the video frame P1 is displayed.
  • the video frame P2 displayed by the interactive terminal includes a plurality of special effect display identifiers, such as the special effect display identifier I1.
  • in the video frame P2, each special effect display identifier is represented by an inverted triangle symbol pointing to its target object, as shown in Figure 17. It can be understood that the special effect display identifier may also be displayed in other manners.
  • when the terminal user touches and clicks the special effect display identifier I1, the system automatically acquires the virtual information image corresponding to the special effect display identifier I1 and superimposes it for display in the video frame P3, as shown in the figure.
  • taking the position on the court where Q1 stands as the center, a three-dimensional ring R1 is rendered.
  • the end user touches and clicks the special effect display identifier I2 in the video frame P3, and the system automatically acquires the virtual information image corresponding to the special effect display identifier I2, and displays the virtual information image in a superimposed manner.
  • the hit rate information display board M0 displays the jersey number, position, name, and hit rate information of the target object, namely the athlete Q2.
  • the end user can continue to click other special effect display logos displayed in the video frame to watch the video showing the AR special effect corresponding to each special effect display logo.
  • the depth map processing apparatus 210 may include: an estimated depth map obtaining unit 211, a depth value obtaining unit 212, a quantization parameter data obtaining unit 213, and a quantization processing unit 214, specifically:
  • the estimated depth map acquiring unit 211 is adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, and the multiple texture maps have different viewing angles;
  • the depth value obtaining unit 212 is adapted to obtain the depth values of the pixels in the depth map;
  • the quantization processing unit 214 is adapted to perform quantization processing on the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • the quantization parameter data obtaining unit 213 is adapted to obtain the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map; correspondingly, the quantization processing unit 214 may, based on this minimum depth distance and maximum depth distance, use the corresponding quantization formula to quantize the depth values of the corresponding pixels in the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  • the depth map processing apparatus 210 may further include: a downsampling processing unit 215 and a stitching unit 216, wherein:
  • the downsampling processing unit 215 is adapted to perform downsampling processing on the quantized depth map to obtain a first depth map;
  • the stitching unit 216 is adapted to stitch the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method, to obtain a stitched image.
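  • a minimal sketch of one possible stitching rule, assuming 16 views, full-resolution texture maps in a 4x4 grid, and half-resolution first depth maps in an 8x2 grid below so that the region widths match; the layout is an assumption, since the stitching method is configurable:

```python
import numpy as np

# Illustrative stitching: 16 texture maps (4x4 grid) stacked above the 16
# half-resolution first depth maps (8x2 grid). Image sizes are examples.
H, W = 180, 320
textures = [np.zeros((H, W, 3), dtype=np.uint8) for _ in range(16)]
depth_maps = [np.zeros((H // 2, W // 2), dtype=np.uint8) for _ in range(16)]

tex_region = np.vstack([np.hstack(textures[r * 4:(r + 1) * 4]) for r in range(4)])

# Replicate each single-channel depth map to three channels so the two
# regions can be stacked into one image.
depth_rgb = [np.stack([d, d, d], axis=-1) for d in depth_maps]
dep_region = np.vstack([np.hstack(depth_rgb[r * 8:(r + 1) * 8]) for r in range(2)])

stitched = np.vstack([tex_region, dep_region])  # 900 x 1280 x 3 stitched image
```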
  • the free-viewpoint video reconstruction apparatus 220 may include: a first video acquisition unit 221 , a first quantized depth value acquisition unit 222 , and a first quantization parameter data acquisition unit 223 , the first depth map inverse quantization processing unit 224 and the first image reconstruction unit 225, specifically:
  • the first video acquisition unit 221 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, and the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the first quantized depth value obtaining unit 222 is adapted to acquire the quantized depth values of the pixels in the quantized depth map;
  • the first quantization parameter data obtaining unit 223 is adapted to obtain the quantization parameter data corresponding to the viewing angle of the quantized depth map;
  • the first depth map inverse quantization processing unit 224 is adapted to perform inverse quantization processing on the quantized depth map of the corresponding view angle based on the quantization parameter data corresponding to the view angle of the quantized depth map to obtain a corresponding estimated depth map;
  • the first image reconstruction unit 225 is adapted to reconstruct the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  • the first quantization parameter data acquisition unit 223 is adapted to acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center of the viewing angle corresponding to the estimated depth map;
  • the first depth map inverse quantization processing unit 224 is adapted to, based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, use the corresponding inverse quantization formula to perform inverse quantization processing on the quantized depth values in the quantized depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
  • the specific inverse quantization formula used by the first depth map inverse quantization processing unit 224 may refer to the foregoing embodiments, and details are not repeated here.
  • the free-viewpoint video processing apparatus 230 may include: a second video obtaining unit 231, a second quantized depth value obtaining unit 232, a second depth map inverse quantization processing unit 233, a virtual viewpoint position determination unit 234, and a second image reconstruction unit 235, wherein:
  • the second video acquisition unit 231 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, and the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles;
  • the second quantized depth value obtaining unit 232 is adapted to obtain the quantized depth values of the pixels in the quantized depth map;
  • the second depth map inverse quantization processing unit 233 is adapted to acquire the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and to perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
  • the virtual viewpoint location determining unit 234 is adapted to determine location information of the virtual viewpoint in response to user interaction;
  • the second image reconstruction unit 235 is adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  • the electronic device 240 may include a memory 241 and a processor 242, the memory 241 storing computer instructions capable of running on the processor 242; when the processor 242 runs the computer instructions, the steps of the methods described in any of the foregoing embodiments may be executed. For the specific steps, principles, etc., refer to the foregoing corresponding method embodiments, which are not repeated here.
  • the electronic device can be set on the service side as a server or cloud device, or can be set on the user side as a terminal device.
  • the server device 250 may include a processor 251 and a communication component 252, wherein:
  • the processor 251 is adapted to perform the steps of the depth map processing method described in any of the foregoing embodiments to obtain a quantized depth map, to stitch the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain a stitched image, and to package the stitched images of multiple frames and the corresponding parameter data to obtain a free-viewpoint video;
  • the communication component 252 is adapted to transmit the free-view video.
  • the terminal device 260 may include a communication component 261, a processor 262, and a display component 263, wherein:
  • the communication component 261 is adapted to acquire a free-viewpoint video;
  • the processor 262 is adapted to execute the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method described in any of the foregoing embodiments; for the specific steps, refer to the foregoing descriptions of those method embodiments, which are not repeated here.
  • the display component 263 is adapted to display the reconstructed image obtained by the processor 262 .
  • the terminal device may be a mobile terminal such as a mobile phone, a tablet computer, a personal computer, a television, or a combination of any terminal device and an external display device.
  • the embodiments of the present specification also provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are run, the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method described in any of the foregoing embodiments are executed; for the specific steps, reference may be made to the foregoing embodiments, which are not repeated here.
  • the computer-readable storage medium may be various suitable readable storage mediums such as an optical disc, a mechanical hard disk, and a solid-state hard disk.


Abstract

Depth map and video processing and reconstruction methods and apparatuses, a device, and a storage medium, wherein the depth map processing method comprises: acquiring an estimated depth map generated on the basis of a plurality of frame-synchronized texture maps, the plurality of texture maps having different viewing angles (S61); acquiring depth values of pixels in the estimated depth map (S62); and acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map, and quantizing the depth values of the pixels in the estimated depth map on the basis of the quantization parameter data, to obtain quantized depth values of corresponding pixels in a quantized depth map (S63). By using this method, the image quality of a free viewpoint video obtained by reconstruction can be improved.

Description

Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
This application claims priority to Chinese Patent Application No. 202010630749.X, filed on July 3, 2020 and entitled "Depth Map and Video Processing and Reconstruction Methods and Apparatuses, Device, and Storage Medium", the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of this specification relate to the technical field of video processing, and in particular to depth map and video processing and reconstruction methods, apparatuses, devices, and storage media.
Background
Free-viewpoint video is a technology that can provide a viewing experience with a high degree of freedom. During viewing, users can adjust the viewing angle through interactive operations and watch from whichever free viewpoint they want, which can greatly improve the viewing experience.
In large-scale scenes such as sports games, achieving high-degree-of-freedom viewing through Depth Image Based Rendering (DIBR) technology is a solution with great potential and feasibility. A free-viewpoint video is generally expressed by stitching the texture maps collected by multiple cameras with the corresponding depth maps to form a stitched image.
At present, after the server estimates the depth maps of the scene and objects based on the texture maps, each depth value is quantized into 8-bit binary data and expressed as a depth map. The synchronized texture maps of multiple viewing angles and the obtained depth maps of the corresponding viewing angles are stitched to obtain stitched images, and the stitched images and the corresponding parameter data are then compressed in frame order to obtain a free-viewpoint video for transmission, so that a terminal device can reconstruct free-viewpoint images based on the received free-viewpoint video stream.
The inventors have found through research that, with the current depth map quantization processing method, the quality of the reconstructed free-viewpoint images is limited by that quantization method.
Summary of the Invention
In view of this, the embodiments of this specification provide depth map and video processing and reconstruction methods, apparatuses, devices, and storage media, which can improve the image quality of the reconstructed free-viewpoint video.
The embodiments of this specification provide a depth map processing method, including:
acquiring an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
acquiring depth values of pixels in the estimated depth map;
acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map, and quantizing the depth values of the pixels in the estimated depth map based on the quantization parameter data, to obtain quantized depth values of corresponding pixels in a quantized depth map.
Optionally, the acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map and quantizing the depth values of the corresponding pixels in the estimated depth map based on the quantization parameter data, to obtain the quantized depth values of the corresponding pixels in the quantized depth map, includes:
acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
Optionally, the quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map, includes:
quantizing the depth values of the corresponding pixels in the estimated depth map using the following quantization formula:
$$\mathrm{Depth}=\operatorname{Round}\!\left((2^{M}-1)\cdot\frac{\dfrac{1}{\mathrm{range}}-\dfrac{1}{\mathrm{depth\_range\_far\_}N}}{\dfrac{1}{\mathrm{depth\_range\_near\_}N}-\dfrac{1}{\mathrm{depth\_range\_far\_}N}}\right)$$
where M is the number of quantization bits for the corresponding pixel of the estimated depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the estimated depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, and depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N.
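For illustration, the following Python sketch directly implements the quantization formula above; the function name, the clipping to the code range, and the example values are assumptions added for the sketch.

```python
import numpy as np

# Direct implementation of the per-view quantization formula above.
def quantize_depth(range_m, depth_range_near_n, depth_range_far_n, bits=8):
    max_code = (1 << bits) - 1                      # 2^M - 1
    disparity = 1.0 / np.asarray(range_m, dtype=np.float64)
    mindisp = 1.0 / depth_range_far_n
    maxdisp = 1.0 / depth_range_near_n
    code = max_code * (disparity - mindisp) / (maxdisp - mindisp)
    return np.clip(np.rint(code), 0, max_code).astype(np.uint16)

# Depths spanning the view's own near/far range use the full code space:
print(quantize_depth(np.array([2.0, 5.0, 30.0]), 2.0, 30.0))  # [255  91   0]
```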
Optionally, the method further includes:
performing downsampling processing on the quantized depth map, to obtain a first depth map;
stitching the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method, to obtain a stitched image.
The embodiments of this specification also provide a free-viewpoint video reconstruction method, the method including:
acquiring a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
acquiring the quantized depth values of the pixels in the quantized depth map;
acquiring the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and performing inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
reconstructing the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
Optionally, the acquiring the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map and performing inverse quantization processing on the quantized depth values in the quantized depth map based on that data, to obtain the estimated depth map of the corresponding viewing angle, includes:
acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, performing inverse quantization processing on the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
Optionally, the performing inverse quantization processing on the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, based on the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle, includes:
performing inverse quantization processing on the quantized depth values in the quantized depth map using the following inverse quantization formulas, to obtain the corresponding pixel values in the estimated depth map:
$$\mathrm{maxdisp}=\frac{1}{\mathrm{depth\_range\_near\_}N}$$

$$\mathrm{mindisp}=\frac{1}{\mathrm{depth\_range\_far\_}N}$$

$$\mathrm{range}=\cfrac{1}{\cfrac{\mathrm{Depth}}{2^{M}-1}\cdot(\mathrm{maxdisp}-\mathrm{mindisp})+\mathrm{mindisp}}$$
where M is the number of quantization bits for the corresponding pixel of the quantized depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the quantized depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, maxdisp is the maximum quantized depth distance corresponding to viewing angle N, and mindisp is the minimum quantized depth distance corresponding to viewing angle N.
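Correspondingly, the following is a minimal Python sketch of the inverse quantization formulas above; it inverts the quantization sketch given earlier, up to the rounding of the code value.

```python
import numpy as np

# Direct implementation of the inverse quantization formulas above.
def dequantize_depth(depth_code, depth_range_near_n, depth_range_far_n, bits=8):
    max_code = (1 << bits) - 1                      # 2^M - 1
    maxdisp = 1.0 / depth_range_near_n
    mindisp = 1.0 / depth_range_far_n
    disparity = depth_code.astype(np.float64) / max_code * (maxdisp - mindisp) + mindisp
    return 1.0 / disparity

codes = np.array([255, 91, 0])
print(dequantize_depth(codes, 2.0, 30.0))  # approximately [2.0, 5.0, 30.0]
```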
Optionally, the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle; before reconstructing the image of the virtual viewpoint, the method further includes:
upsampling the estimated depth map of the corresponding viewing angle, to obtain a second depth map for reconstructing the virtual viewpoint image.
Optionally, the upsampling the estimated depth map of the corresponding viewing angle to obtain a second depth map for reconstructing the virtual viewpoint image includes:
taking the depth values of the pixels in the estimated depth map as the pixel values of the corresponding even rows and even columns in the second depth map;
for the depth value of a pixel in an even row and odd column of the second depth map, determining the corresponding pixel in the corresponding texture map as a middle pixel, and determining the depth value based on the relationship between the luminance channel value of the middle pixel and the luminance channel values of the pixels to the left and right of the middle pixel in the corresponding texture map;
for the depth value of a pixel in an odd row of the second depth map, determining the corresponding pixel in the corresponding texture map as a middle pixel, and determining the depth value based on the relationship between the luminance channel value of the middle pixel and the luminance channel values of the pixels above and below the middle pixel in the corresponding texture map.
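As a concrete illustration of the upsampling rule just described, the following is a minimal Python sketch. The specific decision, copying the depth of whichever neighbouring pixel is closer in luminance to the middle texture pixel, is an assumption; the text above only states that the relationship between the luminance channel values determines the result.

```python
import numpy as np

# Illustrative texture-guided 2x depth upsampling. Even rows/columns copy
# the estimated depth map; the remaining pixels copy the depth of whichever
# neighbour is closer in luminance to the middle texture pixel (assumed rule).
def upsample_depth(depth, luma):
    h, w = depth.shape
    out = np.zeros((2 * h, 2 * w), depth.dtype)
    out[0::2, 0::2] = depth
    for r in range(0, 2 * h, 2):            # even rows, odd columns
        for c in range(1, 2 * w, 2):
            cl, cr = c - 1, min(c + 1, 2 * w - 2)
            if abs(int(luma[r, c]) - int(luma[r, cl])) <= abs(int(luma[r, c]) - int(luma[r, cr])):
                out[r, c] = out[r, cl]
            else:
                out[r, c] = out[r, cr]
    for r in range(1, 2 * h, 2):            # odd rows, every column
        ru, rd = r - 1, min(r + 1, 2 * h - 2)
        for c in range(2 * w):
            if abs(int(luma[r, c]) - int(luma[ru, c])) <= abs(int(luma[r, c]) - int(luma[rd, c])):
                out[r, c] = out[ru, c]
            else:
                out[r, c] = out[rd, c]
    return out

second_depth = upsample_depth(np.arange(4, dtype=np.uint8).reshape(2, 2),
                              np.zeros((4, 4), dtype=np.uint8))
```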
Optionally, the reconstructing the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, includes:
selecting multiple target texture maps and target depth maps from the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image;
performing combined rendering on the target texture maps and the target depth maps, to obtain the image of the virtual viewpoint.
The embodiments of this specification also provide a free-viewpoint video processing method, the method including:
acquiring a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
acquiring the quantized depth values of the pixels in the quantized depth map;
acquiring the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and performing inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
determining position information of a virtual viewpoint in response to user interaction behavior;
reconstructing the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
Optionally, the determining the position information of the virtual viewpoint in response to user interaction behavior includes: determining corresponding virtual viewpoint path information in response to a gesture interaction operation of the user;
and the reconstructing the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, includes:
selecting, according to the virtual viewpoint path information, the texture maps in the stitched images at the corresponding frame moments and the estimated depth maps of the corresponding viewing angles, as the target texture maps and target depth maps;
performing combined rendering on the target texture maps and the target depth maps, to obtain the image of the virtual viewpoint.
Optionally, the method further includes:
acquiring a virtual rendering target object in the image of the virtual viewpoint;
acquiring a virtual information image generated based on augmented reality special effect input data of the virtual rendering target object;
synthesizing the virtual information image with the image of the virtual viewpoint and displaying the result.
Optionally, the acquiring the virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object includes:
obtaining a virtual information image matching the position of the virtual rendering target object, according to the position of the virtual rendering target object in the image of the virtual viewpoint obtained by three-dimensional calibration.
Optionally, the acquiring the virtual rendering target object in the image of the virtual viewpoint includes:
acquiring the virtual rendering target object in the image of the virtual viewpoint in response to a special effect generation interactive control instruction.
The embodiments of this specification provide a depth map processing apparatus, the apparatus including:
an estimated depth map acquiring unit, adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
a depth value acquiring unit, adapted to acquire depth values of pixels in the depth map;
a quantization parameter data acquiring unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the estimated depth map;
a quantization processing unit, adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, to obtain quantized depth values of corresponding pixels in a quantized depth map.
The embodiments of this specification also provide a free-viewpoint video reconstruction apparatus, the apparatus including:
a first video acquiring unit, adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
a first quantized depth value acquiring unit, adapted to acquire the quantized depth values of the pixels in the quantized depth map;
a first quantization parameter data acquiring unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the quantized depth map;
a first depth map inverse quantization processing unit, adapted to perform inverse quantization processing on the quantized depth map of the corresponding viewing angle based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, to obtain the corresponding estimated depth map;
a first image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the obtained position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
The embodiments of this specification also provide a free-viewpoint video processing apparatus, the apparatus including:
a second video acquiring unit, adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, the stitched images including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched images including quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
a second quantized depth value acquiring unit, adapted to acquire the quantized depth values of the pixels in the quantized depth map;
a second depth map inverse quantization processing unit, adapted to acquire the quantization parameter data of the estimated depth map for the viewing angle corresponding to the quantized depth map, and to perform inverse quantization processing on the quantized depth values of the pixels in the quantized depth map based on that data, to obtain the corresponding estimated depth map;
a virtual viewpoint position determining unit, adapted to determine position information of a virtual viewpoint in response to user interaction behavior;
a second image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
The embodiments of this specification also provide an electronic device, including a memory and a processor, the memory storing computer instructions capable of running on the processor, wherein the processor, when running the computer instructions, executes the steps of the method described in any of the foregoing embodiments.
The embodiments of this specification also provide a server device, including a processor and a communication component, wherein:
the processor is adapted to execute the steps of the depth map processing method described in any of the foregoing embodiments to obtain a quantized depth map, to stitch the frame-synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain stitched images, and to package the stitched images of multiple frames and the corresponding parameter data to obtain a free-viewpoint video;
the communication component is adapted to transmit the free-viewpoint video.
The embodiments of this specification also provide a terminal device, including a communication component, a processor, and a display component, wherein:
the communication component is adapted to acquire a free-viewpoint video;
the processor is adapted to execute the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method described in any of the foregoing embodiments;
the display component is adapted to display the reconstructed image obtained by the processor.
The embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are run, the steps of the method described in any of the foregoing embodiments are executed.
Compared with the prior art, the technical solutions of the embodiments of this specification have the following beneficial effects:
With the depth map processing method in the embodiments of this specification, during depth map quantization, the depth values of the pixels in the estimated depth map are quantized using quantization parameters matching the actual situation of the corresponding viewing angle, so that the depth map of every viewing angle can make full use of the expression space of the depth quantization bits, which in turn can improve the image quality of the reconstructed free-viewpoint video.
Further, by downsampling the quantized depth map to obtain the first depth map and stitching the first depth map with the texture map of the corresponding viewing angle according to a preset stitching method, the overall data volume of the stitched image can be reduced, thereby saving storage and transmission resources for the stitched image.
Further, when the decoding resolution of the overall stitched image is limited, setting the resolution of the quantized depth map to be smaller than the resolution of the texture map of the corresponding viewing angle allows a higher-resolution texture map to be transmitted; then, by upsampling the estimated depth map of the corresponding viewing angle to obtain the second depth map and performing free-viewpoint video reconstruction based on the synchronized texture maps of multiple viewing angles in the stitched image and the second depth maps of the corresponding viewing angles, a free-viewpoint image with higher definition can be obtained, improving the user experience.
Description of Drawings
FIG. 1 is a schematic diagram of a specific application system for free-viewpoint video display in an embodiment of this specification;
FIG. 2 is a schematic diagram of a terminal device interaction interface in an embodiment of this specification;
FIG. 3 is a schematic diagram of an arrangement of collection devices in an embodiment of this specification;
FIG. 4 is a schematic diagram of another terminal device interaction interface in an embodiment of this specification;
FIG. 5 is a schematic diagram of a field-of-view application scenario in an embodiment of this specification;
FIG. 6 is a flowchart of a depth map processing method in an embodiment of this specification;
FIG. 7 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of this specification;
FIG. 8 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification;
FIG. 9 is a schematic structural diagram of a data header file in an embodiment of this specification;
FIG. 10 is a schematic diagram of user-side processing of 6DoF video data in an embodiment of this specification;
FIG. 11 is a schematic structural diagram of a stitched image in an embodiment of this specification;
FIG. 12 is a flowchart of a free-viewpoint video reconstruction method in an embodiment of this specification;
FIG. 13 is a flowchart of a combined rendering method in an embodiment of this specification;
FIG. 14 is a flowchart of a free-viewpoint video processing method in an embodiment of this specification;
FIG. 15 is a flowchart of another free-viewpoint video processing method in an embodiment of this specification;
FIG. 16 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 17 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 18 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 19 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 20 is a schematic diagram of a display interface of an interactive terminal in an embodiment of this specification;
FIG. 21 is a schematic structural diagram of a depth map processing apparatus in an embodiment of this specification;
FIG. 22 is a schematic structural diagram of a free-viewpoint video reconstruction apparatus in an embodiment of this specification;
FIG. 23 is a schematic structural diagram of a free-viewpoint video processing apparatus in an embodiment of this specification;
FIG. 24 is a schematic structural diagram of an electronic device in an embodiment of this specification;
FIG. 25 is a schematic structural diagram of a server device in an embodiment of this specification;
FIG. 26 is a schematic structural diagram of a terminal device in an embodiment of this specification.
Detailed Description
To enable those skilled in the art to better understand and implement the embodiments in this specification, the following first gives an exemplary introduction to the implementation of free-viewpoint video with reference to the accompanying drawings and specific application scenarios.
Referring to FIG. 1, a specific application system for free-viewpoint video display in an embodiment of the present invention may include a collection system 11 with multiple collection devices, a server 12, and a display device 13. The collection system 11 can collect images of the area to be viewed, and the collection system 11 or the server 12 can process the acquired synchronized texture maps to generate multi-angle free-viewing-angle data capable of supporting virtual viewpoint switching on the display device 13. The display device 13 can present reconstructed images generated from the multi-angle free-viewing-angle data; each reconstructed image corresponds to a virtual viewpoint, and reconstructed images corresponding to different virtual viewpoints can be presented according to user instructions, switching the viewing position and viewing angle.
In a specific implementation, the process of performing image reconstruction to obtain reconstructed images may be carried out by the display device 13, or by a device located in a Content Delivery Network (CDN) by means of edge computing. It can be understood that FIG. 1 is only an example and is not a limitation on the collection system, the server, the terminal device, or the specific implementation.
Continuing to refer to FIG. 1, the user can view the area to be viewed through the display device 13; in this embodiment, the area to be viewed is a basketball court. As mentioned above, the viewing position and viewing angle can be switched.
For example, the user can slide on the screen to switch the virtual viewpoint. In an embodiment of the present invention, with reference to FIG. 2, when the user's finger slides the screen along direction D22, the virtual viewpoint for viewing can be switched. Continuing with FIG. 3, the position of the virtual viewpoint before the slide may be VP1; after sliding the screen to switch the virtual viewpoint, the position of the virtual viewpoint may be VP2. With reference to FIG. 4, after the slide, the reconstructed image presented on the screen may be as shown in FIG. 4. The reconstructed image may be obtained by performing image reconstruction based on multi-angle free-viewing-angle data generated from images collected by multiple collection devices in the actual collection scenario.
It can be understood that the image viewed before the switch may also be a reconstructed image, and reconstructed images may be frame images in a video stream. The manner of switching the virtual viewpoint according to user instructions may also vary, which is not limited here.
In a specific implementation, the virtual viewpoint can be represented by coordinates with 6 degrees of freedom (DoF): the spatial position of the virtual viewpoint can be expressed as (x, y, z), and the viewing angle can be expressed as three rotation directions, denoted here as (θ, φ, ψ), so that a virtual viewpoint corresponds to a 6DoF coordinate (x, y, z, θ, φ, ψ).
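As a minimal illustration, such a 6DoF coordinate could be held in a small data structure; the rotation-angle field names below are assumptions.

```python
from dataclasses import dataclass

# Illustrative container for a 6DoF virtual viewpoint: a spatial position
# (x, y, z) plus three rotation directions (angle names are assumptions).
@dataclass
class VirtualViewpoint:
    x: float
    y: float
    z: float
    theta: float
    phi: float
    psi: float

vp = VirtualViewpoint(x=0.0, y=3.5, z=8.0, theta=0.0, phi=-0.2, psi=0.0)
```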
A virtual viewpoint is a three-dimensional concept, and generating a reconstructed image requires three-dimensional information. In a specific implementation, the multi-angle free-viewing-angle data may include depth map data, which provides the third-dimensional information beyond the planar images. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of depth map data is small.
In the embodiments of the present invention, the switching of virtual viewpoints can be performed within a certain range, namely the multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the virtual viewpoint position and viewing angle can be switched arbitrarily.
The multi-angle free viewing angle range is related to the arrangement of the collection devices: the wider the shooting coverage of the collection devices, the larger the multi-angle free viewing angle range. The picture quality displayed by the terminal device is related to the number of collection devices; generally, the more collection devices are deployed, the fewer hole regions appear in the displayed picture.
In addition, the multi-angle free viewing angle range is related to the spatial distribution of the collection devices. The range of the multi-angle free viewing angles, and the interaction manner with the display device on the terminal side, can be set based on the spatial distribution relationship of the collection devices.
As shown in FIG. 1 and FIG. 3, several collection devices are arranged along a certain path at a height HLK above the basket; for example, six collection devices, namely collection devices CJ1 to CJ6, may be arranged along an arc. It can be understood that the positions, number, and support manner of the collection devices may vary, which is not limited here.
It can be understood that the above specific application scenario examples are provided for a better understanding of the embodiments of this specification; however, the embodiments of this specification are not limited to these scenarios. The inventors have found through research that the current depth map processing method still has some limitations, which affect the quality of the images in the reconstructed free-viewpoint video.
In view of the above problems, the embodiments of this specification provide a corresponding depth map processing method and free-viewpoint video reconstruction method. To make the purposes, solutions, principles, and effects of the embodiments of this specification clearer, they are described in detail below with reference to the accompanying drawings and through specific embodiments.
在图像处理中,采样的图像数值用一个数字表示,将图像函数的连续数值转变为其数字等价量的过程是量化。图像量化给每个连续的样本数值一个整数数字。In image processing, a sampled image value is represented by a number, and the process of converting a continuous value of an image function to its digital equivalent is quantization. Image quantization assigns each successive sample value an integer number.
目前,服务端设备(如服务器12)基于纹理图估计出场景和物体的深度图之后,通过对深度值进行8比特二进制数据的量化,表达为一个深度图,为描述方便,这里称为估计深度图。将同步的多个视角的纹理图和所得到的对应视角的估计深度图进行拼接, 得到拼接图像。按照帧时序将拼接图像及对应的参数数据进行压缩,可以得到自由视点视频,终端设备基于自由获取的自由视点视频可以进行自由视点图像的重建。At present, after the server device (such as the server 12) estimates the depth map of the scene and the object based on the texture map, it performs 8-bit binary data quantization on the depth value, and expresses it as a depth map. For the convenience of description, it is referred to as the estimated depth here. picture. The texture maps of the synchronized multiple viewing angles and the obtained estimated depth maps of the corresponding viewing angles are spliced to obtain a spliced image. The spliced image and corresponding parameter data are compressed according to the frame timing to obtain a free-view video, and the terminal device can reconstruct the free-view image based on the freely obtained free-view video.
然而,发明人经研究发现,目前的深度图量化处理方法,对于拼接图像中的每一个深度图,均基于一套同样的量化参数数据进行量化,重建得到的自由视点图像的质量受到目前的深度图量化处理方法的限制。However, the inventor found through research that the current depth map quantization processing method is based on the same set of quantization parameter data for each depth map in the spliced image, and the quality of the reconstructed free viewpoint image is affected by the current depth. Limitations of graph quantization processing methods.
Specifically, based on the predefined maximum and minimum depth values in the field of view, and the depth value of each pixel in the estimated depth map obtained by the server, an 8-bit binary quantized depth value in the range 0-255 is currently obtained through the following formula:
Depth = round( 255 × (1/range − 1/depth_range_far) / (1/depth_range_near − 1/depth_range_far) )
where range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel, depth_range_near is the preset minimum depth distance from the optical center in the field of view, and depth_range_far is the preset maximum depth distance from the optical center in the field of view.
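As a concrete illustration, the following is a minimal numpy sketch of this fixed-parameter quantization, assuming the inverse-disparity form of the formula as reconstructed above; the function name and the clipping of out-of-range depths are illustrative assumptions, not part of the original method.

```python
import numpy as np

def quantize_depth_8bit(depth, depth_range_near, depth_range_far):
    """Quantize metric depth (distance along the optical axis) to 8 bits.

    Nearer surfaces map to values close to 255, farther surfaces to values
    close to 0, following the inverse-disparity mapping described above.
    """
    max_disp = 1.0 / depth_range_near   # disparity of the nearest allowed depth
    min_disp = 1.0 / depth_range_far    # disparity of the farthest allowed depth
    disp = 1.0 / np.clip(depth, depth_range_near, depth_range_far)
    q = 255.0 * (disp - min_disp) / (max_disp - min_disp)
    return np.round(q).astype(np.uint8)
```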
However, in specific application scenarios, using the same fixed set of quantization parameter data to quantize the depth values of the pixels in all estimated depth maps of the stitched image may fail to make full use of the expression space of the depth map. For example, the depth_range_near of the estimated depth maps corresponding to some views, that is, the distance to the closest object, is larger than in the estimated depth maps corresponding to other views. Therefore, after quantization with the same quantization parameter data, the entire 8-bit binary expression space is not fully utilized: the maximum quantized depth value of the pixels in some views stays far below 255, and the minimum quantized depth value of the pixels in some views stays far above 0.
FIG. 5 is a schematic diagram of a field-of-view application scenario. A scene region 50 contains an object R and is provided with multiple acquisition devices P1, P2 ... Pn ... PN arranged along an arc, whose optical centers are C1, C2 ... Cn ... CN and whose optical axes are L1, L2 ... Ln ... LN. From the spatial relationship between the object R and the optical centers C1 to CN of the acquisition devices P1 to PN in FIG. 5, it can be seen intuitively that the minimum and maximum distances between the object R and the optical center of each acquisition device differ. Therefore, the minimum and maximum depth distances from the optical center also differ among the estimated depth maps derived from the texture maps captured by the acquisition devices P1 to PN.
Based on this, during depth map quantization, the embodiments of this specification quantize the depth values of the pixels in each estimated depth map with quantization parameters matched to the actual situation of the corresponding view, so that the depth map of every view can make full use of the expression space of the depth quantization bits, which in turn improves the image quality of the reconstructed free-viewpoint video.
Referring to the flowchart of the depth map processing method shown in FIG. 6, the embodiments of this specification may specifically include the following quantization steps:
S61: Acquire estimated depth maps generated based on multiple frame-synchronized texture maps, where the multiple texture maps have different views.
In a specific implementation, as shown in FIG. 1, an acquisition system composed of multiple acquisition devices may capture images synchronously to obtain the multiple frame-synchronized texture maps.
The origin of the coordinate system of an acquisition device (such as a camera) can serve as the optical center, and the depth value can be the distance from each point in the field of view to the optical center along the optical axis. In a specific implementation, the estimated depth map corresponding to each texture map can be obtained based on the multiple frame-synchronized texture maps.
S62: Acquire the depth values of the pixels in the estimated depth map.
S63: Acquire the quantization parameter data corresponding to the view of the estimated depth map and, based on it, quantize the depth values of the pixels in the estimated depth map to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
In a specific implementation, the quantization parameter data may include the minimum depth distance from the optical center and the maximum depth distance from the optical center for the view corresponding to the estimated depth map. To quantize the depth values of the pixels in the estimated depth map, the minimum and maximum depth distances from the optical center for the corresponding view can be acquired first; then, based on these values, a corresponding quantization formula can be applied to the depth values of the corresponding pixels in the estimated depth map to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
In some embodiments of this specification, the following quantization formula is used to quantize the depth value of the corresponding pixel in the estimated depth map:
Depth = round( (2^M − 1) × (1/range − 1/depth_range_far_N) / (1/depth_range_near_N − 1/depth_range_far_N) )
where M is the number of quantization bits for the corresponding pixel in the estimated depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel, N is the view corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map of view N, and depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map of view N.
After the depth values in the estimated depth map are quantized according to the above embodiment, the object closest to the camera (optical center) in the quantized depth map of each view is quantized to a depth value closer to 2^M − 1. In a specific implementation, M can be 8 bits, 16 bits, and so on. If M is 8 bits, the quantized depth value of the object closest to the optical center in each view's quantized depth map is closer to 255.
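Building on the previous sketch, the per-view variant below parameterizes the near/far range and the bit depth. Deriving near_n and far_n from the minimum and maximum of each view's own estimated depth map, shown in the usage comment, is an illustrative assumption; in practice they come from the per-view quantization parameter data.

```python
import numpy as np

def quantize_depth_per_view(depth, near_n, far_n, m_bits=8):
    """Quantize one view's estimated depth map with its own range.

    near_n / far_n play the role of depth_range_near_N / depth_range_far_N,
    so each view spans the full [0, 2^M - 1] expression space.
    """
    max_disp = 1.0 / near_n
    min_disp = 1.0 / far_n
    disp = 1.0 / np.clip(depth, near_n, far_n)
    levels = (1 << m_bits) - 1            # 2^M - 1, e.g. 255 when M = 8
    q = levels * (disp - min_disp) / (max_disp - min_disp)
    return np.round(q).astype(np.uint8 if m_bits <= 8 else np.uint16)

# e.g. per-view parameters taken from the view's own estimated depth map:
# q_n = quantize_depth_per_view(est_depth_n, est_depth_n.min(), est_depth_n.max())
```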
In a specific implementation, the synchronized texture maps of multiple views and the quantized depth maps of the corresponding views can be stitched to obtain a stitched image, and a free-viewpoint video can then be obtained based on the stitched images at multiple frame moments and the parameter data corresponding to the stitched images. Considering the limits of transmission bandwidth, the free-viewpoint video may be compressed before being transmitted to a terminal device for image reconstruction.
Referring to FIG. 7, free-viewpoint video reconstruction requires texture map acquisition and depth map calculation, which involves three main steps: Multi-camera Video Capturing, Camera Parameter Estimation (of intrinsic and extrinsic parameters), and Depth Map Calculation. For multi-camera capture, the video captured by each camera must be frame-level aligned. Texture images are obtained through multi-camera video capture; camera parameters, which may include intrinsic and extrinsic parameter data, are obtained through camera parameter estimation; and depth maps are obtained through depth map calculation. The multiple synchronized texture maps, the depth maps of the corresponding views, and the camera parameters together form 6DoF video data.
In the solutions of the embodiments of this specification, no special camera, such as a light field camera, is required for video capture. Likewise, no complex camera calibration is needed before capture. The positions of the multiple cameras can be laid out and arranged so as to better capture the objects or scenes to be photographed.
After the above three steps, the texture maps captured by the multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained. These three parts of data can be referred to as the data files of the multi-angle free-view video data, and also as 6DoF video data (6 degrees of freedom video data). With these data, the user side can generate virtual viewpoints according to a virtual 6 degrees of freedom (DoF) position, thereby providing a 6DoF video experience.
Referring to FIG. 8, the 6DoF video data and the indicative data can be compressed and transmitted to the user side, and the user side can obtain the user-side 6DoF expression, that is, the aforementioned 6DoF video data and metadata, from the received data. The indicative data may also be called metadata (Metadata).
The metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), which indicates the storage rules for the pixel data of the multiple texture maps and the quantized depth map data in the stitched image; padding pattern metadata (Padding pattern metadata), which can indicate the manner of edge protection applied to the stitched image; the quantization parameter metadata of the corresponding views; and other metadata (Other metadata). The metadata can be stored in a data header file, in the storage order shown in FIG. 9 or in another order.
Referring to FIG. 10, the user side obtains the 6DoF video data, which includes the camera parameters, the texture maps, the quantized depth maps, and the metadata, as well as the interaction behavior data of the user side. With these data, the user side can perform 6DoF rendering by means of depth image-based rendering (DIBR, Depth Image-Based Rendering), thereby generating an image of a virtual viewpoint at the specific 6DoF position produced by the user's interaction behavior; that is, according to a user instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined.
At present, any video frame in free-viewpoint video data is generally expressed as a stitched image formed by the texture maps captured by multiple cameras and the corresponding depth maps. FIG. 11 is a schematic diagram of the structure of such a stitched image: the upper half is the texture map region, divided into 8 texture map sub-regions that respectively store the pixel data of 8 synchronized texture maps, each taken from a different shooting angle, that is, a different view. The lower half is the depth map region, divided into 8 depth map sub-regions that respectively store the quantized depth maps corresponding to the above 8 texture maps. The texture map of view N and the quantized depth map of view N correspond pixel by pixel. The stitched image is compressed and transmitted to the terminal for decoding and DIBR, so that an image can be interpolated at the viewpoint of the user interaction.
The inventor found through research that, as shown in FIG. 11, every texture map has a quantized depth map of the same resolution corresponding to it, so the resolution of the overall stitched image is twice that of the texture map set. Since the video decoding resolution of a terminal (such as a mobile terminal) is generally limited, this expression of free-viewpoint video data can only be realized by reducing the resolution of the texture maps, which reduces the definition of the reconstructed image perceived by the user on the terminal side.
In view of the above problem, in some embodiments of this specification, the quantized depth map may first be down-sampled to obtain a first depth map, and the synchronized texture maps of multiple views and the first depth maps of the corresponding views may be stitched in a preset stitching manner to obtain the stitched image.
To enable those skilled in the art to better understand and implement the embodiments of this specification, two specific examples of down-sampling are given below:
One is to decimate the pixels of the quantized depth map to obtain the first depth map. For example, every other pixel of the quantized depth map may be extracted to obtain the first depth map, whose resolution is then 50% of that of the quantized depth map in each dimension.
The other is to filter the pixels of the quantized depth map based on the corresponding texture map to obtain the first depth map.
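As a rough illustration of the first method, the one-liner below decimates every other pixel in both dimensions (50% resolution per dimension, i.e. the 1/4 down-sampling by area referred to later); the second, texture-guided method would instead weight each output sample by the similarity of neighbouring texture pixels.

```python
def downsample_depth(quant_depth):
    """Method one: keep every other pixel in each dimension of the
    quantized depth map (a simple decimation, no filtering)."""
    return quant_depth[::2, ::2]
```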
To save data storage resources and data transmission resources, the stitched image may be rectangular.
To enable those skilled in the art to better understand and implement the embodiments of this specification, the free-viewpoint video reconstruction method on the terminal side after the above depth map processing is introduced below through specific embodiments.
Referring to the flowchart of the free-viewpoint video reconstruction method shown in FIG. 12, in the embodiments of this specification, the following steps may specifically be adopted for free-viewpoint video reconstruction:
S121: Acquire a free-viewpoint video, where the free-viewpoint video includes stitched images at multiple frame moments and parameter data corresponding to the stitched images; the stitched images include synchronized texture maps of multiple views and quantized depth maps of the corresponding views; and the parameter data corresponding to the stitched images includes the quantization parameter data of the estimated depth maps of the corresponding views and the camera parameter data.
In a specific implementation, the free-viewpoint video may take the form of a compressed video file, or may be transmitted as a video stream. The parameter data of the stitched images can be stored in the header file of the free-viewpoint video data; for the specific form, refer to the foregoing embodiments.
In some embodiments of this specification, the quantization parameter data of the estimated depth maps of the corresponding views may be stored in the form of an array. For example, for a free-viewpoint video whose stitched image contains 16 pairs of texture maps and quantized depth maps, the quantization parameter data may be expressed in order as:
Array Z = [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
S122: Acquire the quantized depth values of the pixels in the quantized depth map.
S123: Acquire the quantization parameter data of the estimated depth map of the view corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map.
S124: Based on the synchronized texture maps of multiple views and the estimated depth maps of the corresponding views, reconstruct the image of the virtual viewpoint according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
For step S123, in some embodiments of this specification, it is implemented as follows:
acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center for the view corresponding to the estimated depth map;
based on the minimum and maximum depth distances from the optical center for the corresponding view, apply a corresponding inverse quantization formula to the quantized depth values in the quantized depth map to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding view.
In a specific embodiment of this specification, the following inverse quantization formulas are used to inverse-quantize the quantized depth values in the quantized depth map to obtain the corresponding pixel values of the estimated depth map:
maxdisp = 1 / depth_range_near_N
mindisp = 1 / depth_range_far_N
range = 1 / ( Depth / (2^M − 1) × (maxdisp − mindisp) + mindisp )
where M is the number of quantization bits for the corresponding pixel in the quantized depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the quantized depth map, N is the view corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map of view N, depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map of view N, maxdisp is the maximum quantized depth distance corresponding to view N, and mindisp is the minimum quantized depth distance corresponding to view N.
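A minimal sketch of this inverse quantization, mirroring the quantization sketch above (same assumption about the inverse-disparity form; the function name is illustrative):

```python
import numpy as np

def dequantize_depth_per_view(q, near_n, far_n, m_bits=8):
    """Recover metric depth from a view's quantized depth map using that
    view's depth_range_near_N / depth_range_far_N."""
    max_disp = 1.0 / near_n
    min_disp = 1.0 / far_n
    levels = (1 << m_bits) - 1
    disp = q.astype(np.float64) / levels * (max_disp - min_disp) + min_disp
    return 1.0 / disp   # = range in the formulas above
```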
In some embodiments of this specification, corresponding to the foregoing embodiments, if the resolution of the quantized depth map is smaller than that of the texture map of the corresponding view, for example because the quantized depth map was down-sampled on the server side, then on the terminal device side the estimated depth map of the corresponding view can be up-sampled to obtain a second depth map, and the second depth map is then used for the reconstruction of the virtual viewpoint image.
In a specific implementation, various up-sampling methods are possible; some example methods are given below:
Method one example: up-sample an estimated depth map that was 1/4-down-sampled to obtain a second depth map with the same resolution as the texture map. Depending on the row and column of the pixel, this specifically involves the following processing:
(1) Take the depth values of the pixels in the estimated depth map as the pixel values at the corresponding even rows and even columns of the second depth map.
(2) For the depth value of a pixel at an even row and odd column of the second depth map, take the corresponding pixel in the corresponding texture map as the middle pixel, and determine the depth value based on the relationship between the luminance channel value of this middle pixel and the luminance channel values of the pixels to its left and right in the texture map.
Specifically, based on the relationship between the luminance channel value of the middle pixel in the corresponding texture map and the luminance channel values of the pixels to its left and right, there are three cases:
a1. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its right is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its left divided by a preset threshold, select the depth value corresponding to the right pixel as the depth value of the corresponding even-row, odd-column pixel of the second depth map;
a2. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its left is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel to its right divided by the preset threshold, select the depth value corresponding to the left pixel as the depth value of the corresponding even-row, odd-column pixel of the second depth map;
a3. Otherwise, select the maximum of the depth values corresponding to the left and right pixels as the depth value of the corresponding even-row, odd-column pixel of the second depth map.
(3) For the depth value of a pixel at an odd row of the second depth map, take the corresponding pixel in the corresponding texture map as the middle pixel, and determine the depth value based on the relationship between the luminance channel value of this middle pixel and the luminance channel values of the pixels above and below it in the texture map.
b1. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel below it is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel above it divided by the preset threshold, select the depth value corresponding to the lower pixel as the depth value of the corresponding odd-row pixel of the second depth map;
b2. If the absolute difference between the luminance channel value of the middle pixel and that of the pixel above it is smaller than the absolute difference between the luminance channel value of the middle pixel and that of the pixel below it divided by the preset threshold, select the depth value corresponding to the upper pixel as the depth value of the corresponding odd-row pixel of the second depth map;
b3. Otherwise, select the maximum of the depth values corresponding to the upper and lower pixels as the depth value of the corresponding odd-row pixel of the second depth map.
The three cases a1 to a3 in step (2) above can be expressed by the formulas:
if abs(pix_C-pix_R) < abs(pix_C-pix_L)/THR, select Dep_R;
if abs(pix_C-pix_L) < abs(pix_C-pix_R)/THR, select Dep_L;
otherwise, select Max(Dep_R, Dep_L).
The three cases b1 to b3 in step (3) above can be expressed by the formulas:
if abs(pix_C-pix_D) < abs(pix_C-pix_U)/THR, select Dep_D;
if abs(pix_C-pix_U) < abs(pix_C-pix_D)/THR, select Dep_U;
otherwise, select Max(Dep_D, Dep_U).
In the above formulas, pix_C is the luminance channel value (Y value) of the middle pixel in the texture map at the position corresponding to the depth value in the second depth map; pix_L is the luminance channel value of the pixel to the left of pix_C; pix_R is the luminance channel value of the pixel to the right of pix_C; pix_U is the luminance channel value of the pixel above pix_C; pix_D is the luminance channel value of the pixel below pix_C; Dep_R is the depth value corresponding to the pixel to the right of the middle pixel; Dep_L is the depth value corresponding to the pixel to the left of the middle pixel; Dep_D is the depth value corresponding to the pixel below the middle pixel; and Dep_U is the depth value corresponding to the pixel above the middle pixel. abs denotes the absolute value, and THR is a settable threshold; in one embodiment of this specification, THR is set to 2.
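The following is a straightforward, deliberately unoptimized numpy sketch of method one, assuming even image dimensions, a single-channel luminance plane for the texture, and clamping at the image borders; these handling details are illustrative assumptions not prescribed by the text above.

```python
import numpy as np

def upsample_depth_thr(dep_small, texture_y, thr=2.0):
    """Up-sample a 1/4-down-sampled depth map to texture resolution,
    guided by the texture's luminance (Y) plane as in steps (1)-(3)."""
    h, w = texture_y.shape
    dep = np.zeros((h, w), dtype=dep_small.dtype)
    dep[0::2, 0::2] = dep_small                   # step (1): copy known samples

    # step (2): even rows, odd columns: choose between left/right neighbours
    for r in range(0, h, 2):
        for c in range(1, w, 2):
            cl, cr = c - 1, min(c + 1, w - 2)     # clamp at the right border
            pix_c = float(texture_y[r, c])
            pix_l = float(texture_y[r, cl])
            pix_r = float(texture_y[r, cr])
            dl, dr = dep[r, cl], dep[r, cr]
            if abs(pix_c - pix_r) < abs(pix_c - pix_l) / thr:
                dep[r, c] = dr
            elif abs(pix_c - pix_l) < abs(pix_c - pix_r) / thr:
                dep[r, c] = dl
            else:
                dep[r, c] = max(dl, dr)

    # step (3): odd rows: same rule with the pixels above and below
    for r in range(1, h, 2):
        ru, rd = r - 1, min(r + 1, h - 2)         # clamp at the bottom border
        for c in range(w):
            pix_c = float(texture_y[r, c])
            pix_u = float(texture_y[ru, c])
            pix_d = float(texture_y[rd, c])
            du, dd = dep[ru, c], dep[rd, c]
            if abs(pix_c - pix_d) < abs(pix_c - pix_u) / thr:
                dep[r, c] = dd
            elif abs(pix_c - pix_u) < abs(pix_c - pix_d) / thr:
                dep[r, c] = du
            else:
                dep[r, c] = max(du, dd)
    return dep
```

Note that the even rows are fully filled by steps (1) and (2) before step (3) reads them, so the vertical rule always compares against valid depth values.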
Method two example:
Take the depth values of the pixels in the estimated depth map as the pixel values at the corresponding rows and columns of the second depth map; for the pixels of the second depth map that have no counterpart among the pixels of the estimated depth map, obtain the depth values by filtering based on the differences between the pixel values of the corresponding pixel in the corresponding texture map and its surrounding pixels.
There are many possible filtering methods; two specific embodiments are given below.
Specific embodiment 1: nearest-neighbor filtering
Specifically, for a pixel of the second depth map that has no counterpart among the pixels of the estimated depth map, the corresponding pixel in the texture map can be compared with the pixels at the four diagonal positions around it, the pixel whose value is closest to that of the corresponding pixel is found, and the depth value in the estimated depth map corresponding to that closest pixel is taken as the depth value, in the second depth map, of the pixel corresponding to the corresponding pixel in the texture map.
Specific embodiment 2: weighted filtering
Specifically, the corresponding pixel in the texture map can be compared with the pixels around it, and, according to the similarity of the pixel values, the depth values in the estimated depth map corresponding to the surrounding pixels are weighted to obtain the depth value, in the second depth map, of the pixel corresponding to the corresponding pixel in the texture map.
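A compact sketch of specific embodiment 1, under simplifying assumptions: the known samples sit at even rows and columns (as in method one), each diagonal neighbour is snapped back to the nearest known sample, and borders are clamped. None of these details are prescribed by the text above; they only make the sketch self-contained.

```python
import numpy as np

def fill_missing_nearest(dep, texture_y):
    """Nearest-neighbor filtering: for a pixel with no counterpart in the
    estimated depth map, compare the texture pixel with its four diagonal
    neighbours and copy the depth of the most similar one.

    Assumes dep holds valid depth at even rows/columns and zeros elsewhere.
    """
    h, w = texture_y.shape
    out = dep.copy()
    for r in range(h):
        for c in range(w):
            if r % 2 == 0 and c % 2 == 0:
                continue                               # already known
            best, best_diff = 0, None
            for dr, dc in ((-1, -1), (-1, 1), (1, -1), (1, 1)):
                rr = min(max(r + dr, 0), h - 1) & ~1   # snap to even row
                cc = min(max(c + dc, 0), w - 1) & ~1   # snap to even column
                diff = abs(float(texture_y[r, c]) - float(texture_y[rr, cc]))
                if best_diff is None or diff < best_diff:
                    best_diff, best = diff, dep[rr, cc]
            out[r, c] = best
    return out
```

Specific embodiment 2 would replace the argmin by a similarity-weighted average of the neighbours' depth values.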
The above shows some methods of up-sampling the estimated depth map to obtain the second depth map. It can be understood that these are only examples, and the embodiments of this specification do not limit the specific up-sampling method. Moreover, the method of up-sampling the estimated depth map of any video frame may correspond to the method by which the quantized depth map was down-sampled to obtain the first depth map, or there may be no such correspondence. In addition, the up-sampling ratio and the down-sampling ratio may be the same or different.
Next, some specific examples of step S124 are given.
In a specific implementation, in order to save data processing resources and improve image reconstruction efficiency while ensuring image reconstruction quality, only some of the texture maps in the stitched image and the estimated depth maps of the corresponding views may be selected, as the target texture maps and target depth maps, for the reconstruction of the virtual viewpoint image. Specifically:
according to the position information of the virtual viewpoint and the parameter data corresponding to the stitched image, multiple target texture maps and target depth maps can be selected from the synchronized texture maps of multiple views and the estimated depth maps of the corresponding views. Afterwards, the target texture maps and target depth maps can be combined and rendered to obtain the image of the virtual viewpoint.
In a specific implementation, the position information of the virtual viewpoint may be determined according to user interaction behavior, or according to a preset. If it is determined based on user interaction behavior, the virtual viewpoint position at the corresponding interaction moment can be determined by acquiring the trajectory data corresponding to the user's interactive operation. In some embodiments of this specification, the position information of the virtual viewpoints corresponding to the respective video frames may also be preset on the server side (such as a server or the cloud) and transmitted in the header file of the free-viewpoint video.
In a specific implementation, the spatial positional relationship between each texture map (with the estimated depth map of the corresponding view) and the virtual viewpoint position can be determined based on the virtual viewpoint position and the parameter data corresponding to the stitched image. To save data processing resources, texture maps and estimated depth maps that satisfy a preset positional relationship and/or quantitative relationship with the virtual viewpoint position can be selected as the target texture maps and target depth maps, according to the position information of the virtual viewpoint and the parameter data corresponding to the stitched image.
For example, the texture maps and estimated depth maps corresponding to the 2 to N viewpoints closest to the virtual viewpoint position may be selected, where N is the number of texture maps in the stitched image, that is, the number of acquisition devices corresponding to the texture maps. In a specific implementation, the quantitative relationship value may be fixed or variable.
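A minimal sketch of such a selection rule, assuming the per-view camera positions have been recovered from the camera parameter data; Euclidean distance is one possible criterion, and a practical selector could also weigh viewing direction.

```python
import numpy as np

def select_target_views(virtual_pos, camera_positions, k):
    """Pick the k capture views whose optical centers are closest to the
    virtual viewpoint position."""
    virtual_pos = np.asarray(virtual_pos, dtype=np.float64)
    cams = np.asarray(camera_positions, dtype=np.float64)
    dists = np.linalg.norm(cams - virtual_pos, axis=1)
    return np.argsort(dists)[:k]          # indices of the selected views
```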
Referring to the flowchart of the combined rendering method shown in FIG. 13, some embodiments of this specification may specifically include the following steps:
S131: Forward-map each selected target depth map of the stitched image to the virtual viewpoint position.
S132: Post-process each forward-mapped target depth map.
In a specific implementation, various post-processing methods are possible. In some embodiments of this specification, at least one of the following methods may be used to post-process the target depth maps:
1) apply foreground edge protection to each forward-mapped target depth map;
2) apply pixel-level filtering to each forward-mapped target depth map.
S133: Backward-map each selected target texture map of the stitched image.
S134: Fuse the virtual texture maps generated by the backward mapping to obtain a fused texture map.
Through the above steps S131 to S134, a reconstructed image can be obtained.
In a specific implementation, hole filling may further be performed on the fused texture map to obtain the reconstructed image corresponding to the virtual viewpoint position at the moment of user interaction. Hole filling can improve the quality of the reconstructed image.
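To make step S134 and the hole-filling step concrete, the sketch below fuses single-channel virtual texture maps with per-view weights and then fills the pixels that every view left empty with a simple neighbour-averaging pass. Treating pixel value 0 as a warp hole and the choice of weights (e.g. inverse distance to the virtual viewpoint) are illustrative assumptions, not part of the method as described.

```python
import numpy as np

def fuse_and_fill(virtual_textures, weights):
    """Weighted fusion of backward-mapped (single-channel) virtual texture
    maps, followed by naive hole filling."""
    stack = np.stack([t.astype(np.float64) for t in virtual_textures])
    valid = stack > 0                                   # assumption: 0 marks a hole
    w = np.asarray(weights, dtype=np.float64).reshape(-1, 1, 1) * valid
    wsum = w.sum(axis=0)
    fused = np.where(wsum > 0,
                     (stack * w).sum(axis=0) / np.maximum(wsum, 1e-9), 0.0)

    # fill pixels that every view left empty with the mean of their valid
    # 4-neighbours, repeating until no fillable pixel remains
    holes = wsum == 0
    while holes.any():
        fpad = np.pad(fused, 1)
        vpad = np.pad(~holes, 1).astype(np.float64)
        neigh = (fpad[:-2, 1:-1] * vpad[:-2, 1:-1] + fpad[2:, 1:-1] * vpad[2:, 1:-1]
                 + fpad[1:-1, :-2] * vpad[1:-1, :-2] + fpad[1:-1, 2:] * vpad[1:-1, 2:])
        count = (vpad[:-2, 1:-1] + vpad[2:, 1:-1]
                 + vpad[1:-1, :-2] + vpad[1:-1, 2:])
        fillable = holes & (count > 0)
        if not fillable.any():
            break                                       # fully empty image
        fused[fillable] = neigh[fillable] / count[fillable]
        holes &= ~fillable
    return fused
```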
In a specific implementation, before image reconstruction, the target depth maps may first be preprocessed (for example, up-sampled); alternatively, all the estimated depth maps obtained by inverse quantization of the stitched image may first be preprocessed (for example, up-sampled), before the virtual-viewpoint-based image reconstruction is performed.
For better understanding and implementation by those skilled in the art, the embodiments of this specification also provide specific embodiments such as apparatuses and devices corresponding to the foregoing method embodiments, which are described below with reference to the accompanying drawings.
The embodiments of this specification also provide a corresponding free-viewpoint video processing method which, referring to FIG. 14, may specifically include the following steps:
S141: Acquire a free-viewpoint video, where the free-viewpoint video includes stitched images at multiple frame moments and parameter data corresponding to the stitched images; the stitched images include synchronized texture maps of multiple views and quantized depth maps of the corresponding views; and the parameter data corresponding to the stitched images includes the quantization parameter data of the estimated depth maps of the corresponding views and the camera parameter data.
In a specific implementation, by acquiring the free-viewpoint video and decoding it, the stitched images at the multiple frame moments and the parameter data corresponding to the stitched images can be obtained.
The specific form of the free-viewpoint video may be the multi-angle free-view video exemplified in the foregoing embodiments, such as a 6DoF video.
By downloading a free-viewpoint video stream or acquiring a stored free-viewpoint video data file, a video frame sequence can be obtained; each video frame may include a stitched image formed by the synchronized texture maps of multiple views and the first depth maps of the corresponding views. One structure of such a stitched image is shown in FIG. 11. It can be understood that other stitched image structures may be adopted; for example, different stitching manners may be used depending on the resolution ratio between the texture map and the first depth map of the corresponding view. For instance, one texture map may correspond to multiple first depth maps (e.g., when the first depth map is a depth map obtained by 25% down-sampling).
In addition to the stitched images, the free-viewpoint video data file may also include metadata describing the stitched images. In a specific implementation, the parameter data of the stitched images can be obtained from the metadata; for example, one or more kinds of information such as the camera parameters of the stitched image, the stitching rule of the stitched image, and the resolution information of the stitched image can be obtained.
In a specific implementation, the parameter information of the stitched image can be transmitted in combination with the stitched image, for example stored in the video file header. The embodiments of this specification limit neither the specific format of the stitched image nor the specific type and storage location of its parameter information, as long as the reconstructed image at the corresponding virtual viewpoint position can be obtained based on the free-viewpoint video.
In a specific implementation, the free-viewpoint video may take the form of a compressed video file, or may be transmitted as a video stream. The parameter data of the stitched images can be stored in the header file of the free-viewpoint video data; for the specific form, refer to the foregoing embodiments.
In some embodiments of this specification, the quantization parameter data of the estimated depth maps of the corresponding views may be stored in the form of an array. For example, for a free-viewpoint video whose stitched image contains 16 pairs of texture maps and quantized depth maps, the quantization parameter data may be expressed in order as:
Array Z = [view 0 quantization parameter value, view 1 quantization parameter value, ..., view 15 quantization parameter value].
S142: Acquire the quantized depth values of the pixels in the quantized depth map.
S143: Acquire the quantization parameter data of the estimated depth map of the view corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map.
For the specific quantization parameter data used in the inverse quantization of step S143, as well as the specific inverse quantization method, reference may be made to the foregoing embodiments; the description is not repeated here.
S144: Determine the position information of the virtual viewpoint in response to user interaction behavior.
In a specific implementation, if the free-viewpoint video adopts a 6DoF expression, the virtual viewpoint position information based on user interaction can be expressed as a 6DoF coordinate, that is, a spatial position together with a viewing orientation. The virtual viewpoint position information can be generated through one or more preset user interaction modes. For example, it may be coordinates entered by a user operation, such as a manual click or a gesture path, or a virtual position determined by voice input; alternatively, the user may be provided with customizable virtual viewpoints (for example, the user may specify a position or perspective in the scene, such as under the basket, courtside, the referee's perspective, or the coach's perspective), or the viewpoint may be based on a specific object (for example, a player on the court, or an actor, guest, or host in the image; after the user clicks on the corresponding object, the view can switch to that object's perspective). It can be understood that the embodiments of the present invention do not limit the specific user interaction mode, as long as the virtual viewpoint position information based on user interaction can be obtained.
As an optional example, the corresponding virtual viewpoint path information can be determined in response to a user's gesture interaction. For gesture interaction, corresponding virtual viewpoint paths can be planned for different forms of gestures, so that the path information of the corresponding virtual viewpoint can be determined from the user's specific gesture operation. For example, it can be planned in advance that sliding a finger left or right on the touch screen corresponds to moving the viewing angle left or right; sliding a finger up or down corresponds to moving the viewpoint position up or down; and a pinch/zoom gesture corresponds to moving the viewpoint position closer or farther.
It can be understood that the above virtual viewpoint paths planned for gesture forms are only exemplary; virtual viewpoint paths based on other gesture forms can be predefined, or user-defined settings can be allowed, thereby enhancing the user experience.
S145: Based on the synchronized texture maps of multiple views and the estimated depth maps of the corresponding views, reconstruct the image of the virtual viewpoint according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
In a specific implementation, according to the virtual viewpoint path information, the texture maps of the corresponding frame moments and the estimated depth maps of the corresponding views can be selected as the target texture maps and target depth maps, and the target texture maps and target depth maps can be combined and rendered to obtain the image of the virtual viewpoint.
For the specific selection method, reference may be made to the foregoing embodiments; it is not detailed here.
It should be noted that, based on the virtual viewpoint path information, some of the texture maps and the second depth maps of the corresponding views in the stitched images of one frame or multiple consecutive frames can be selected in time order as the target texture maps and target depth maps for reconstructing the images of the corresponding virtual viewpoints.
In a specific implementation, further processing may be performed on the reconstructed free-viewpoint image. An exemplary extension is given below.
为丰富用户视觉体验,可以在重建得到的自由视点图像中植入增强现实(Augmented Reality,AR)特效。在本说明一些实施例中,参照图15所示的自由视点视频处理方法的流程图,采用如下方式实现AR特效的植入:In order to enrich the user's visual experience, Augmented Reality (AR) special effects can be implanted in the reconstructed free-viewpoint images. In some embodiments of this description, referring to the flowchart of the free-viewpoint video processing method shown in FIG. 15 , the implantation of AR special effects is implemented in the following manner:
S151,获取所述虚拟视点的图像中的虚拟渲染目标对象。S151. Acquire a virtual rendering target object in the image of the virtual viewpoint.
在具体实施中,可以基于某些指示信息确定自由视点视频的图像中的某些对象作为 虚拟渲染目标对象,所述指示信息可以基于用户交互生成,也可以基于某些预设触发条件或第三方指令得到。在本说明书一可选实施例中,响应于特效生成交互控制指令,可以获取所述虚拟视点的图像中的虚拟渲染目标对象。In a specific implementation, certain objects in the image of the free-view video may be determined as virtual rendering target objects based on certain indication information, and the indication information may be generated based on user interaction, or may be based on certain preset trigger conditions or a third party. command is obtained. In an optional embodiment of the present specification, the virtual rendering target object in the image of the virtual viewpoint may be acquired in response to the interactive control instruction generated by the special effect.
S152,获取基于所述虚拟渲染目标对象的增强现实特效输入数据所生成的虚拟信息图像。S152: Acquire a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object.
在本说明书实施例中,所植入的AR特效以虚拟信息图像的形式呈现。所述虚拟信息图像可以基于所述目标对象的增强现实特效输入数据生成。在确定虚拟渲染目标对象后,可以获取基于所述虚拟渲染目标对象的增强现实特效输入数据所生成的虚拟信息图像。In the embodiments of this specification, the implanted AR special effects are presented in the form of virtual information images. The virtual information image may be generated based on augmented reality special effect input data of the target object. After the virtual rendering target object is determined, a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object can be acquired.
在本说明书实施例中,所述虚拟渲染目标对象对应的虚拟信息图像可以预先生成,也可以响应于特效生成指令即时生成。In the embodiment of this specification, the virtual information image corresponding to the virtual rendering target object may be generated in advance, or may be generated immediately in response to the special effect generation instruction.
在具体实施中,可以基于三维标定得到的所述虚拟渲染目标对象在重建得到的图像中的位置,得到与所述虚拟渲染目标对象位置匹配的虚拟信息图像,从而可以使得到的虚拟信息图像与所述虚拟渲染目标对象在三维空间中的位置更加匹配,进而所展示的虚拟信息图像更加符合三维空间中的真实状态,因而所展示的合成图像更加真实生动,增强用户的视觉体验。In a specific implementation, a virtual information image matching the position of the virtual rendering target object can be obtained based on the position of the virtual rendering target object obtained by three-dimensional calibration in the reconstructed image, so that the obtained virtual information image can be made to match the position of the virtual rendering target object. The position of the virtual rendering target object in the three-dimensional space is more matched, and the displayed virtual information image is more in line with the real state in the three-dimensional space, so the displayed composite image is more realistic and vivid, and the user's visual experience is enhanced.
在具体实施中,可以基于虚拟渲染目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。In a specific implementation, a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
在具体实施中,可以采用多种特效生成方式。In a specific implementation, a variety of special effect generation methods can be adopted.
例如,可以将所述目标对象的增强现实特效输入数据输入至预设的三维模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像;For example, the augmented reality special effect input data of the target object may be input into a preset three-dimensional model, and the output matches the virtual rendering target object based on the position of the virtual rendering target object obtained by the three-dimensional calibration in the image. virtual information images;
又如,可以将所述虚拟渲染目标对象的增强现实特效输入数据,输入至预设的机器学习模型,基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置,输出与所述虚拟渲染目标对象匹配的虚拟信息图像。For another example, the augmented reality special effects input data of the virtual rendering target object can be input into a preset machine learning model, and the position of the virtual rendering target object in the image obtained based on the three-dimensional calibration can be output and the same as that of the virtual rendering target object. A virtual information image that matches the virtual render target object.
S153: Synthesize the virtual information image with the image of the virtual viewpoint and display the result.
In a specific implementation, the virtual information image and the image of the virtual viewpoint can be synthesized and displayed in various ways. Two concrete examples are given below, followed by an illustrative compositing sketch:
Example 1: fuse the virtual information image with the corresponding image to obtain a fused image, and display the fused image.
Example 2: superimpose the virtual information image on the corresponding image to obtain a superimposed composite image, and display the superimposed composite image.
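Purely for illustration, the following is a minimal Python sketch of Example 2, assuming the virtual information image carries an alpha channel marking where the special effect is drawn; the function name, array layouts, and placement arguments are illustrative assumptions, not part of the embodiments above.

```python
import numpy as np

def overlay_virtual_info(viewpoint_img: np.ndarray, info_img_rgba: np.ndarray,
                         top: int, left: int) -> np.ndarray:
    """Superimpose an RGBA virtual information image onto an RGB virtual-viewpoint
    image at the given position (hypothetical helper; alpha-blended overlay)."""
    out = viewpoint_img.astype(np.float32).copy()
    h, w = info_img_rgba.shape[:2]
    region = out[top:top + h, left:left + w]          # assumes the overlay fits in frame
    rgb = info_img_rgba[..., :3].astype(np.float32)
    alpha = info_img_rgba[..., 3:4].astype(np.float32) / 255.0  # 0 = fully transparent
    region[:] = alpha * rgb + (1.0 - alpha) * region
    return out.astype(np.uint8)
```

Under this reading, Example 1 differs only in that the fusion rule may mix the two images globally rather than pasting an alpha-masked patch at one position.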
In a specific implementation, the obtained composite image may be displayed directly, or it may be inserted into a video stream to be played for playback and display. For example, the fused image can be inserted into the video stream to be played.
The free-viewpoint video may include special-effect display identifiers. In a specific implementation, the superimposition position of the virtual information image in the image of the virtual viewpoint may be determined based on a special-effect display identifier, and the virtual information image may then be superimposed and displayed at the determined position.
To help those skilled in the art better understand and implement the solution, an image display process on an interactive terminal is described in detail below. Referring to the schematic diagrams of video playback screens of an interactive terminal shown in FIG. 16 to FIG. 20, the interactive terminal T1 plays video in real time. Referring to FIG. 16, a video frame P1 is displayed. Next, the video frame P2 displayed by the interactive terminal contains multiple special-effect display identifiers, including identifier I1; in video frame P2 these are represented by inverted triangle symbols pointing at the target objects, as shown in FIG. 17. It can be understood that the special-effect display identifiers may also be presented in other ways. When the end user touches and taps the special-effect display identifier I1, the system automatically acquires the virtual information image corresponding to identifier I1 and superimposes it on video frame P3; as shown in FIG. 18, a three-dimensional ring R1 is rendered centered on the court position where athlete Q1 stands. Next, as shown in FIG. 19 and FIG. 20, the end user touches and taps the special-effect display identifier I2 in video frame P3; the system automatically acquires the virtual information image corresponding to identifier I2 and superimposes it on video frame P3, obtaining a superimposed image, namely video frame P4, in which a hit-rate information board M0 is displayed. The hit-rate information board M0 shows the jersey number, name, and hit-rate information of the target object, athlete Q2.
As shown in FIG. 16 to FIG. 20, the end user can continue tapping other special-effect display identifiers shown in the video frames to watch video presenting the AR special effect corresponding to each identifier.
It can be understood that different types of implanted special effects can be distinguished by different types of special-effect display identifiers.
Referring to the schematic structural diagram of the depth map processing apparatus shown in FIG. 21, the depth map processing apparatus 210 may include: an estimated depth map acquisition unit 211, a depth value acquisition unit 212, a quantization parameter data acquisition unit 213, and a quantization processing unit 214. Specifically:
The estimated depth map acquisition unit 211 is adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
The depth value acquisition unit 212 is adapted to acquire the depth values of the pixels in the estimated depth map;
The quantization parameter data acquisition unit 213 is adapted to acquire quantization parameter data corresponding to the viewing angle of the estimated depth map;
The quantization processing unit 214 is adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, obtaining the quantized depth values of the corresponding pixels in the quantized depth map.
In a specific implementation, the quantization parameter data acquisition unit 213 is adapted to acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map. Correspondingly, the quantization processing unit 214 may apply the corresponding quantization formula to the depth values of the corresponding pixels in the estimated depth map, based on these minimum and maximum depth distances, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
For the specific quantization principle of the quantization processing unit 214 and the specific quantization formulas that can be used, reference may be made to the descriptions in the foregoing embodiments.
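Purely for illustration, the following is a minimal sketch of an inverse-depth quantization of this kind, assuming the formula of claim 3 (linearly quantizing the reciprocal of the depth distance between the near and far planes); the function name and the 16-bit setting are illustrative assumptions.

```python
import numpy as np

def quantize_depth(depth: np.ndarray, near: float, far: float, m_bits: int = 16) -> np.ndarray:
    """Map metric depth values (distance from the optical center) to M-bit
    quantized values, assuming the inverse-depth mapping of claim 3 and M <= 16."""
    inv = 1.0 / depth                                   # per-pixel reciprocal depth
    scale = (2 ** m_bits - 1) / (1.0 / near - 1.0 / far)
    q = np.rint((inv - 1.0 / far) * scale)              # near plane -> 2^M - 1, far plane -> 0
    return np.clip(q, 0, 2 ** m_bits - 1).astype(np.uint16)
```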
As an optional example, with continued reference to FIG. 21, the depth map processing apparatus 210 may further include a downsampling processing unit 215 and a stitching unit 216, wherein:
The downsampling processing unit 215 is adapted to downsample the quantized depth map to obtain a first depth map;
The stitching unit 216 is adapted to stitch the synchronized texture maps of the multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method, obtaining a stitched image.
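As an illustration only, a minimal sketch of these two units follows, assuming half-resolution downsampling by decimation and a top/bottom "textures above, depth maps below" layout; both choices stand in for the unspecified "preset stitching method" and are not limitations of the embodiments.

```python
import numpy as np

def downsample_half(depth_q: np.ndarray) -> np.ndarray:
    """Halve both dimensions of a quantized depth map by keeping every other
    pixel; one simple downsampling choice among many."""
    return depth_q[::2, ::2]

def stitch(textures: list, first_depths: list) -> np.ndarray:
    """Stitch synchronized texture maps (H x W x 3, uint8) and their first depth
    maps (H/2 x W/2, uint16) into one image: textures in a top row, 8-bit
    grayscale renderings of the depth maps in a bottom row (assumed layout)."""
    tex_row = np.concatenate(textures, axis=1)
    dep8 = [np.repeat((d >> 8).astype(np.uint8)[..., None], 3, axis=-1) for d in first_depths]
    dep_row = np.concatenate(dep8, axis=1)
    canvas = np.zeros((dep_row.shape[0], tex_row.shape[1], 3), dtype=np.uint8)
    canvas[:, :dep_row.shape[1]] = dep_row              # pad the shorter depth row
    return np.concatenate([tex_row, canvas], axis=0)
```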
Referring to the schematic structural diagram of the free-viewpoint video reconstruction apparatus shown in FIG. 22, the free-viewpoint video reconstruction apparatus 220 may include: a first video acquisition unit 221, a first quantized depth value acquisition unit 222, a first quantization parameter data acquisition unit 223, a first depth map inverse quantization processing unit 224, and a first image reconstruction unit 225. Specifically:
The first video acquisition unit 221 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image including the quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
The first quantized depth value acquisition unit 222 is adapted to acquire the quantized depth values of the pixels in the quantized depth map;
The first quantization parameter data acquisition unit 223 is adapted to acquire the quantization parameter data corresponding to the viewing angle of the quantized depth map;
The first depth map inverse quantization processing unit 224 is adapted to inverse-quantize the quantized depth map of the corresponding viewing angle based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, obtaining the corresponding estimated depth map;
The first image reconstruction unit 225 is adapted to reconstruct the image of the virtual viewpoint based on the texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
In a specific implementation, the first quantization parameter data acquisition unit 223 is adapted to acquire the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map. Correspondingly, the first depth map inverse quantization processing unit 224 is adapted to apply the corresponding inverse quantization formula to the quantized depth values in the quantized depth map, based on these minimum and maximum depth distances, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
In some embodiments of this specification, for the specific inverse quantization formulas used by the first depth map inverse quantization processing unit 224, reference may be made to the foregoing embodiments; details are not repeated here.
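Purely for illustration, the following is a minimal sketch of such an inverse quantization, the inverse of the quantization sketch above, assuming the formulas of claim 7; the function name and bit depth are illustrative assumptions.

```python
import numpy as np

def dequantize_depth(depth_q: np.ndarray, near: float, far: float, m_bits: int = 16) -> np.ndarray:
    """Recover metric depth from M-bit quantized values, assuming claim 7:
    maxdisp/mindisp are the reciprocals of the near/far depth distances."""
    maxdisp = 1.0 / near                                # maximum quantized depth distance
    mindisp = 1.0 / far                                 # minimum quantized depth distance
    inv = depth_q.astype(np.float64) / (2 ** m_bits - 1) * (maxdisp - mindisp) + mindisp
    return 1.0 / inv                                    # back to distance from the optical center
```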
Referring to FIG. 23, an embodiment of this specification further provides a free-viewpoint video processing apparatus. As shown in FIG. 23, the free-viewpoint video processing apparatus 230 may include: a second video acquisition unit 231, a second quantized depth value acquisition unit 232, a second depth map inverse quantization processing unit 233, a virtual viewpoint position determination unit 234, and a second image reconstruction unit 235, wherein:
The second video acquisition unit 231 is adapted to acquire a free-viewpoint video, the free-viewpoint video including stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image including synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image including the quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
The second quantized depth value acquisition unit 232 is adapted to acquire the quantized depth values of the pixels in the quantized depth map;
The second depth map inverse quantization processing unit 233 is adapted to acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map, obtaining the corresponding estimated depth map;
The virtual viewpoint position determination unit 234 is adapted to determine the position information of the virtual viewpoint in response to user interaction behavior;
The second image reconstruction unit 235 is adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
For the specific implementation of the free-viewpoint video processing apparatus in the embodiments of this specification, reference may be made to the aforementioned free-viewpoint video processing method; details are not repeated here.
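As a rough illustration of the reference-view selection that precedes combined rendering (cf. claim 10), the following hedged sketch picks the texture and depth maps of the cameras nearest to the virtual viewpoint; representing cameras by their positions and ranking by Euclidean distance are assumptions, one plausible selection rule among several.

```python
import numpy as np

def select_reference_views(virtual_pos: np.ndarray, camera_positions: list, k: int = 2) -> list:
    """Return the indices of the k capture cameras closest to the virtual
    viewpoint; their texture maps and estimated depth maps become the target
    maps handed to the combined-rendering step."""
    dists = [np.linalg.norm(np.asarray(c, dtype=np.float64) - virtual_pos) for c in camera_positions]
    return list(np.argsort(dists)[:k])
```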
This specification further provides an electronic device. Referring to the schematic structural diagram of the electronic device shown in FIG. 24, the electronic device 240 may include a memory 241 and a processor 242, the memory 241 storing computer instructions executable on the processor 242. When the processor 242 runs the computer instructions, the steps of the method of any of the foregoing embodiments may be performed; for the specific steps, principles, and the like, reference may be made to the corresponding method embodiments described above, which are not repeated here.
In a specific implementation, depending on the specific solution, the electronic device may be deployed on the service side as a server or cloud device, or on the user side as a terminal device.
An embodiment of this specification further provides a corresponding server device. Referring to the schematic structural diagram of the server device shown in FIG. 25, in a specific implementation, as shown in FIG. 25, the server device 250 may include a processor 251 and a communication component 252, wherein:
The processor 251 is adapted to perform the steps of the depth map processing method of any of the foregoing embodiments to obtain quantized depth maps, stitch the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain stitched images, and encapsulate the stitched images of multiple frames together with the corresponding parameter data to obtain a free-viewpoint video;
The communication component 252 is adapted to transmit the free-viewpoint video.
An embodiment of this specification further provides a terminal device. Referring to the schematic structural diagram of the terminal device shown in FIG. 26, in a specific implementation, as shown in FIG. 26, the terminal device 260 may include a communication component 261, a processor 262, and a display component 263, wherein:
The communication component 261 is adapted to acquire a free-viewpoint video;
The processor 262 is adapted to perform the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method of any of the foregoing embodiments; for the specific steps, reference may be made to the descriptions in the foregoing embodiments of the free-viewpoint video reconstruction method and the free-viewpoint video processing method, which are not repeated here.
The display component 263 is adapted to display the reconstructed image obtained by the processor 262.
In the embodiments of this specification, the terminal device may be a mobile terminal such as a mobile phone, a tablet computer, a personal computer, a television, or a combination of any terminal device with an external display apparatus.
An embodiment of this specification further provides a computer-readable storage medium having computer instructions stored thereon, wherein, when the computer instructions are run, the steps of the free-viewpoint video reconstruction method or the free-viewpoint video processing method of any of the foregoing embodiments are performed; for details, reference may be made to the foregoing specific embodiments, which are not repeated here.
In a specific implementation, the computer-readable storage medium may be any of various suitable readable storage media, such as an optical disc, a mechanical hard disk, or a solid-state drive.
Although the embodiments of this specification are disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the embodiments of this specification; therefore, the protection scope of the present invention shall be subject to the scope defined by the claims.

Claims (22)

  1. A depth map processing method, comprising:
    acquiring an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
    acquiring depth values of pixels in the estimated depth map;
    acquiring quantization parameter data corresponding to the viewing angle of the estimated depth map and, based on it, quantizing the depth values of the pixels in the estimated depth map to obtain quantized depth values of the corresponding pixels in a quantized depth map.
  2. The method according to claim 1, wherein acquiring the quantization parameter data corresponding to the viewing angle of the estimated depth map and, based on it, quantizing the depth values of the corresponding pixels in the estimated depth map to obtain the quantized depth values of the corresponding pixels in the quantized depth map comprises:
    acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
    quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map.
  3. The method according to claim 2, wherein quantizing the depth values of the corresponding pixels in the estimated depth map using the corresponding quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the quantized depth values of the corresponding pixels in the quantized depth map, comprises:
    quantizing the depth values of the corresponding pixels in the estimated depth map using the following quantization formula:
    $$\mathrm{Depth} = (2^{M}-1)\cdot\frac{\dfrac{1}{\mathrm{range}}-\dfrac{1}{\mathrm{depth\_range\_far\_N}}}{\dfrac{1}{\mathrm{depth\_range\_near\_N}}-\dfrac{1}{\mathrm{depth\_range\_far\_N}}}$$
    wherein M is the number of quantization bits of the corresponding pixel of the estimated depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel of the estimated depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, and depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N.
  4. The method according to claim 1, further comprising:
    downsampling the quantized depth map to obtain a first depth map;
    stitching the frame-synchronized texture maps of the multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain a stitched image.
  5. A free-viewpoint video reconstruction method, comprising:
    acquiring a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    acquiring quantized depth values of pixels in the quantized depth map;
    acquiring the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantizing the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map;
    reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  6. The method according to claim 5, wherein acquiring the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantizing the quantized depth values in the quantized depth map to obtain the estimated depth map of the corresponding viewing angle comprises:
    acquiring the minimum depth distance from the optical center and the maximum depth distance from the optical center for the viewing angle corresponding to the estimated depth map;
    inverse-quantizing the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle.
  7. The method according to claim 6, wherein inverse-quantizing the quantized depth values in the quantized depth map using the corresponding inverse quantization formula, based on the minimum and maximum depth distances from the optical center for the viewing angle corresponding to the estimated depth map, to obtain the depth values of the corresponding pixels of the estimated depth map of the corresponding viewing angle, comprises:
    inverse-quantizing the quantized depth values in the quantized depth map using the following inverse quantization formulas to obtain the corresponding pixel values in the estimated depth map:
    $$\mathrm{maxdisp} = \frac{1}{\mathrm{depth\_range\_near\_N}}$$
    $$\mathrm{mindisp} = \frac{1}{\mathrm{depth\_range\_far\_N}}$$
    $$\mathrm{range} = 1\Big/\left(\frac{\mathrm{Depth}}{2^{M}-1}\cdot(\mathrm{maxdisp}-\mathrm{mindisp})+\mathrm{mindisp}\right)$$
    wherein M is the number of quantization bits of the corresponding pixel of the quantized depth map, range is the depth value of the corresponding pixel in the estimated depth map, Depth is the quantized depth value of the corresponding pixel in the quantized depth map, N is the viewing angle corresponding to the estimated depth map, depth_range_near_N is the minimum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, depth_range_far_N is the maximum depth distance from the optical center in the estimated depth map corresponding to viewing angle N, maxdisp is the maximum quantized depth distance corresponding to viewing angle N, and mindisp is the minimum quantized depth distance corresponding to viewing angle N.
  8. The method according to any one of claims 5 to 7, wherein the resolution of the quantized depth map is smaller than the resolution of the texture map of the corresponding viewing angle; and before reconstructing the image of the virtual viewpoint, the method further comprises:
    upsampling the estimated depth map of the corresponding viewing angle to obtain a second depth map used for reconstructing the image of the virtual viewpoint.
  9. The method according to claim 8, wherein upsampling the estimated depth map of the corresponding viewing angle to obtain the second depth map used for reconstructing the image of the virtual viewpoint comprises:
    taking the depth values of the pixels in the estimated depth map as the pixel values of the corresponding even rows and even columns in the second depth map;
    for the depth values of the pixels in even rows and odd columns of the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining each depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the pixels to the left and right of the intermediate pixel;
    for the depth values of the pixels in odd rows of the second depth map, determining the corresponding pixel in the corresponding texture map as an intermediate pixel, and determining each depth value based on the relationship between the luminance channel value of the intermediate pixel in the corresponding texture map and the luminance channel values of the pixels above and below the intermediate pixel.
  10. The method according to any one of claims 5 to 7, wherein reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, comprises:
    selecting multiple target texture maps and target depth maps from the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image;
    rendering the target texture maps and target depth maps in combination to obtain the image of the virtual viewpoint.
  11. A free-viewpoint video processing method, comprising:
    acquiring a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    acquiring quantized depth values of pixels in the quantized depth map;
    acquiring the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantizing the quantized depth values of the pixels in the quantized depth map to obtain the corresponding estimated depth map;
    determining position information of a virtual viewpoint in response to user interaction behavior;
    reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  12. The method according to claim 11, wherein determining the position information of the virtual viewpoint in response to user interaction behavior comprises: determining corresponding virtual viewpoint path information in response to a gesture interaction operation of the user;
    and wherein reconstructing the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image, comprises:
    selecting, according to the virtual viewpoint path information, the texture maps in the stitched images at the corresponding frame moments and the estimated depth maps of the corresponding viewing angles as target texture maps and target depth maps;
    rendering the target texture maps and target depth maps in combination to obtain the image of the virtual viewpoint.
  13. The method according to claim 11 or 12, further comprising:
    acquiring a virtual rendering target object in the image of the virtual viewpoint;
    acquiring a virtual information image generated based on augmented reality special-effect input data of the virtual rendering target object;
    synthesizing the virtual information image with the image of the virtual viewpoint and displaying the result.
  14. The method according to claim 13, wherein acquiring the virtual information image generated based on the augmented reality special-effect input data of the virtual rendering target object comprises:
    obtaining a virtual information image matching the position of the virtual rendering target object, according to the position of the virtual rendering target object in the image of the virtual viewpoint obtained through three-dimensional calibration.
  15. The method according to claim 13, wherein acquiring the virtual rendering target object in the image of the virtual viewpoint comprises:
    acquiring the virtual rendering target object in the image of the virtual viewpoint in response to a special-effect generation interaction control instruction.
  16. A depth map processing apparatus, comprising:
    an estimated depth map acquisition unit, adapted to acquire an estimated depth map generated based on multiple frame-synchronized texture maps, the multiple texture maps having different viewing angles;
    a depth value acquisition unit, adapted to acquire depth values of pixels in the estimated depth map;
    a quantization parameter data acquisition unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the estimated depth map;
    a quantization processing unit, adapted to quantize the depth values of the pixels in the estimated depth map based on the quantization parameter data corresponding to the viewing angle of the estimated depth map, obtaining quantized depth values of the corresponding pixels in a quantized depth map.
  17. A free-viewpoint video reconstruction apparatus, comprising:
    a first video acquisition unit, adapted to acquire a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    a first quantized depth value acquisition unit, adapted to acquire quantized depth values of pixels in the quantized depth map;
    a first quantization parameter data acquisition unit, adapted to acquire quantization parameter data corresponding to the viewing angle of the quantized depth map;
    a first depth map inverse quantization processing unit, adapted to inverse-quantize the quantized depth map of the corresponding viewing angle based on the quantization parameter data corresponding to the viewing angle of the quantized depth map, obtaining the corresponding estimated depth map;
    a first image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the acquired position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  18. A free-viewpoint video processing apparatus, comprising:
    a second video acquisition unit, adapted to acquire a free-viewpoint video, the free-viewpoint video comprising stitched images at multiple frame moments and parameter data corresponding to the stitched images, each stitched image comprising synchronized texture maps of multiple viewing angles and quantized depth maps of the corresponding viewing angles, and the parameter data corresponding to the stitched image comprising: quantization parameter data of the estimated depth maps of the corresponding viewing angles and camera parameter data;
    a second quantized depth value acquisition unit, adapted to acquire quantized depth values of pixels in the quantized depth map;
    a second depth map inverse quantization processing unit, adapted to acquire the quantization parameter data of the estimated depth map of the viewing angle corresponding to the quantized depth map and, based on it, inverse-quantize the quantized depth values of the pixels in the quantized depth map, obtaining the corresponding estimated depth map;
    a virtual viewpoint position determination unit, adapted to determine position information of a virtual viewpoint in response to user interaction behavior;
    a second image reconstruction unit, adapted to reconstruct the image of the virtual viewpoint based on the synchronized texture maps of the multiple viewing angles and the estimated depth maps of the corresponding viewing angles, according to the position information of the virtual viewpoint and the camera parameter data corresponding to the stitched image.
  19. An electronic device, comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein, when running the computer instructions, the processor performs the steps of the method of any one of claims 1 to 4, claims 5 to 10, or claims 11 to 15.
  20. A server device, comprising a processor and a communication component, wherein:
    the processor is adapted to perform the steps of the method of any one of claims 1 to 4 to obtain a quantized depth map, stitch the synchronized texture maps of multiple viewing angles and the first depth maps of the corresponding viewing angles according to a preset stitching method to obtain stitched images, and encapsulate the stitched images of multiple frames together with the corresponding parameter data to obtain a free-viewpoint video;
    the communication component is adapted to transmit the free-viewpoint video.
  21. A terminal device, comprising a communication component, a processor, and a display component, wherein:
    the communication component is adapted to acquire a free-viewpoint video;
    the processor is adapted to perform the steps of the method of any one of claims 5 to 10 or claims 11 to 15;
    the display component is adapted to display the reconstructed image obtained by the processor.
  22. A computer-readable storage medium having computer instructions stored thereon, wherein, when run, the computer instructions perform the steps of the method of any one of claims 1 to 4, claims 5 to 10, or claims 11 to 15.
PCT/CN2021/102335 2020-07-03 2021-06-25 Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium WO2022001865A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010630749.X 2020-07-03
CN202010630749.XA CN113963094A (en) 2020-07-03 2020-07-03 Depth map and video processing and reconstruction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022001865A1 true WO2022001865A1 (en) 2022-01-06

Family

ID=79317406

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102335 WO2022001865A1 (en) 2020-07-03 2021-06-25 Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113963094A (en)
WO (1) WO2022001865A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014010583A1 (en) * 2012-07-09 2014-01-16 日本電信電話株式会社 Video image encoding/decoding method, device, program, recording medium
CN103905812A (en) * 2014-03-27 2014-07-02 北京工业大学 Texture/depth combination up-sampling method
CN105049866A (en) * 2015-07-10 2015-11-11 郑州轻工业学院 Rendering distortion model-based code rate allocation method of multi-viewpoint plus depth coding
CN110495178A (en) * 2016-12-02 2019-11-22 华为技术有限公司 The device and method of 3D Video coding
JP2019184308A (en) * 2018-04-04 2019-10-24 日本放送協会 Depth estimation device and program, as well as virtual viewpoint video generator and its program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230033177A1 (en) * 2021-07-30 2023-02-02 Zoox, Inc. Three-dimensional point clouds based on images and depth data
CN117197319A (en) * 2023-11-07 2023-12-08 腾讯科技(深圳)有限公司 Image generation method, device, electronic equipment and storage medium
CN117197319B (en) * 2023-11-07 2024-03-22 腾讯科技(深圳)有限公司 Image generation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113963094A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
US11037365B2 (en) Method, apparatus, medium, terminal, and device for processing multi-angle free-perspective data
WO2022002181A1 (en) Free viewpoint video reconstruction method and playing processing method, and device and storage medium
US10650590B1 (en) Method and system for fully immersive virtual reality
CN111669567B (en) Multi-angle free view video data generation method and device, medium and server
WO2022001865A1 (en) Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
CN111669564B (en) Image reconstruction method, system, device and computer readable storage medium
CN111669561B (en) Multi-angle free view image data processing method and device, medium and equipment
CN111669518A (en) Multi-angle free visual angle interaction method and device, medium, terminal and equipment
TW202029742A (en) Image synthesis
US11348252B1 (en) Method and apparatus for supporting augmented and/or virtual reality playback using tracked objects
CN111669569A (en) Video generation method and device, medium and terminal
CN111669604A (en) Acquisition equipment setting method and device, terminal, acquisition system and equipment
CN111669570B (en) Multi-angle free view video data processing method and device, medium and equipment
CN111669603B (en) Multi-angle free visual angle data processing method and device, medium, terminal and equipment
CN111669568B (en) Multi-angle free view angle interaction method and device, medium, terminal and equipment
WO2022022548A1 (en) Free viewpoint video reconstruction and playing processing method, device, and storage medium
CN111669571B (en) Multi-angle free view image data generation method and device, medium and equipment
CN114881898A (en) Multi-angle free visual angle image data generation method and device, medium and equipment
CN114007058A (en) Depth map correction method, video processing method, video reconstruction method and related devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21834038

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21834038

Country of ref document: EP

Kind code of ref document: A1