WO2021213301A1 - Depth map processing method, video reconstruction method and related devices - Google Patents

Depth map processing method, video reconstruction method and related devices

Info

Publication number
WO2021213301A1
WO2021213301A1, PCT/CN2021/088024, CN2021088024W
Authority
WO
WIPO (PCT)
Prior art keywords
depth map
value
depth
processed
pixel
Prior art date
Application number
PCT/CN2021/088024
Other languages
English (en)
French (fr)
Inventor
盛骁杰
魏开进
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2021213301A1 publication Critical patent/WO2021213301A1/zh

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/128 Adjusting depth or disparity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/282 Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20172 Image enhancement details
    • G06T2207/20182 Noise reduction or smoothing in the temporal domain; Spatio-temporal filtering

Definitions

  • the embodiments of this specification relate to the field of video processing technology, and in particular to a depth map processing method, a video reconstruction method, and related devices.
  • 6 Degrees of Freedom (6DoF) technology provides a viewing experience with a high degree of freedom: users can adjust the viewing angle through interactive operations during viewing, so that they can watch from the free viewpoint they want.
  • DIBR refers to depth image-based rendering.
  • the stability of the depth map in the time domain has an important influence on the quality of the final reconstructed image.
  • one aspect of the embodiments of this specification provides a depth map processing method and related device.
  • another aspect of the embodiments of this specification provides a video reconstruction method and related devices.
  • The embodiments of this specification provide a depth map processing method, including:
  • acquiring a depth map to be processed from an image combination of a current video frame with a multi-angle free view, where the image combination of the current video frame with a multi-angle free view includes multiple groups of texture maps and depth maps that are synchronized at multiple angles and have a corresponding relationship;
  • obtaining a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to the pixel confidence, which is obtained as follows: obtaining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map, and determining the first filter coefficient weight value corresponding to the confidence value, where:
  • the second depth map is a depth map with the same view angle as the to-be-processed depth map in each video frame of the video frame sequence;
  • filtering the pixel corresponding to the position in the depth map to be processed according to a preset filtering method, to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed.
  • the weight value of the window filter coefficient further includes: at least one of the second filter coefficient weight value corresponding to the frame distance and the third filter coefficient weight value corresponding to the pixel similarity; and the second filter coefficient weight value is obtained in the following manner.
  • the obtaining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and each second depth map includes at least one of the following:
  • the determining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and each second depth map based on the third depth map of each corresponding viewing angle includes:
  • based on the depth values of the pixels corresponding to the positions in the to-be-processed depth map and in each second depth map, mapping the texture values of the corresponding positions in the texture map corresponding to the to-be-processed depth map and in the texture maps corresponding to the second depth maps to the corresponding positions in the texture maps corresponding to the third depth maps of the corresponding viewing angles, to obtain the mapped texture values corresponding to the third depth maps of the corresponding viewing angles;
  • matching the mapped texture values with the actual texture values of the corresponding positions in the texture maps corresponding to the third depth maps of the corresponding viewing angles, and determining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map based on the distribution interval of the matching degrees of the texture values corresponding to the third depth maps of the corresponding viewing angles.
  • the determining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and each second depth map based on the third depth map of each corresponding viewing angle includes:
  • determining the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map based on the distribution interval of the matching degrees of the depth values corresponding to the third depth maps of the corresponding viewing angles.
  • the determining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and each second depth map based on the third depth map of each corresponding viewing angle includes:
  • mapping the pixels corresponding to the positions in the to-be-processed depth map and in each second depth map to the corresponding pixel positions in the third depth maps of the corresponding viewing angles according to their depth values, obtaining the depth values of the corresponding pixel positions in the third depth maps of the corresponding viewing angles, and inversely mapping these depth values back to the corresponding pixel positions in the to-be-processed depth map and in each second depth map, to obtain the mapped depth values of the third depth maps of the corresponding viewing angles in the to-be-processed depth map and in each second depth map.
  • determining the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map based on the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and in each second depth map and the pixels in a preset area around them in the depth map where they are located, which includes at least one of the following:
  • matching the pixels corresponding to the positions in the to-be-processed depth map and in each second depth map with the depth values of the pixels in the preset area around them in the depth map where they are located, and respectively determining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map based on the degree of matching of the depth values and the number of pixels whose matching degree meets the preset pixel matching degree threshold;
  • matching the pixel corresponding to the position in the depth map to be processed and in each second depth map with the weighted average of the depth values of the pixels in the preset area around it in the depth map where it is located, and respectively determining the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map based on the degree of matching between that pixel and the corresponding weighted average value.
  • the filtering of the depth value of the pixel corresponding to the position in the depth map to be processed according to a preset filtering method based on the window filter coefficient values corresponding to the video frames, to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed, includes:
  • the current video frame is located in the middle of the sequence of video frames.
  • the embodiment of this specification provides another depth map processing method, including:
  • acquiring a depth map to be processed from an image combination of a current video frame with a multi-angle free view, where the image combination of the current video frame with a multi-angle free view includes multiple groups of texture maps and depth maps that are synchronized at multiple angles and have a corresponding relationship;
  • obtaining a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to the pixel confidence, which is obtained as follows: acquiring the depth maps within a preset viewing angle range around the corresponding viewing angles of the to-be-processed depth map and of each second depth map to obtain the third depth maps of the corresponding viewing angles, determining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map based on the third depth maps of the corresponding viewing angles, and determining the first filter coefficient weight value corresponding to the confidence value;
  • the pixel corresponding to the position in the depth map to be processed is filtered according to a preset filtering method to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed.
  • the embodiment of this specification also provides a video reconstruction method, including:
  • selecting, according to a preset rule and based on the virtual viewpoint position information and the parameter data corresponding to the image combination of the video frame, the texture map and the filtered depth map of the corresponding group in the image combination of the video frame at the moment of user interaction;
  • performing combined rendering on the texture map and the filtered depth map of the corresponding group in the image combination of the video frame at the selected user interaction time, to obtain a reconstructed image corresponding to the position of the virtual viewpoint at the moment of user interaction.
  • the embodiment of this specification also provides a depth map processing device, including:
  • a depth map acquiring unit, adapted to acquire the depth map to be processed from the image combination of the current video frame with a multi-angle free view, where the image combination of the current video frame with a multi-angle free view includes multiple groups of texture maps and depth maps that are synchronized at multiple angles and have a corresponding relationship;
  • a frame sequence obtaining unit, adapted to obtain a video frame sequence of a preset window in the time domain that includes the current video frame;
  • a window filter coefficient value obtaining unit, adapted to obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to the pixel confidence; the window filter coefficient value obtaining unit includes a first filter coefficient weight value obtaining subunit, adapted to obtain the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map and to determine the first filter coefficient weight value corresponding to the confidence value, where the second depth map is a depth map with the same viewing angle as the to-be-processed depth map in each video frame of the video frame sequence;
  • a filtering unit, adapted to filter the pixel corresponding to the position in the depth map to be processed according to a preset filtering method based on the window filter coefficient values corresponding to the video frames, to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed.
  • the window filter coefficient value obtaining unit further includes at least one of the following:
  • the second filter coefficient weight value obtaining subunit is adapted to obtain the frame distance between each video frame in the video frame sequence and the current video frame, and determine the second filter coefficient weight value corresponding to the frame distance;
  • a third filter coefficient weight value obtaining subunit, adapted to obtain the similarity value of the pixel corresponding to the position in the texture map corresponding to each second depth map and in the texture map corresponding to the depth map to be processed, and to determine the third filter coefficient weight value corresponding to the similarity value.
  • the embodiment of this specification also provides a video reconstruction system, including:
  • an acquisition module, adapted to acquire image combinations of video frames with multi-angle free viewing angles, the parameter data corresponding to the image combinations of the video frames, and virtual viewpoint position information based on user interaction, where the image combinations of the video frames include multiple sets of texture maps and depth maps that are synchronized at multiple angles and have a corresponding relationship;
  • a filtering module adapted to filter the depth map in the video frame
  • a selection module, adapted to select, according to a preset rule and based on the virtual viewpoint position information and the parameter data corresponding to the image combination of the video frame, the texture map and the filtered depth map of the corresponding group in the image combination of the video frame at the time of user interaction;
  • an image reconstruction module, adapted to perform combined rendering on the texture map and the filtered depth map of the corresponding group in the image combination of the video frame at the selected user interaction time, based on the virtual viewpoint position information and the parameter data corresponding to the texture map and the depth map in the image combination of the video frame at the time of user interaction, to obtain a reconstructed image corresponding to the position of the virtual viewpoint at the moment of user interaction;
  • the filtering module includes:
  • a depth map acquiring unit, adapted to acquire the depth map to be processed from the image combination of the current video frame with a multi-angle free view, where the image combination of the current video frame with a multi-angle free view includes multiple groups of texture maps and depth maps that are synchronized at multiple angles and have a corresponding relationship;
  • a frame sequence obtaining unit, adapted to obtain a video frame sequence of a preset window in the time domain that includes the current video frame;
  • a window filter coefficient value obtaining unit, adapted to obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to the pixel confidence; the window filter coefficient value obtaining unit includes a first filter coefficient weight value obtaining subunit, adapted to obtain the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map and to determine the first filter coefficient weight value corresponding to the confidence value, where the second depth map is a depth map with the same viewing angle as the to-be-processed depth map in each video frame of the video frame sequence;
  • a filtering unit, adapted to filter the pixel corresponding to the position in the depth map to be processed according to a preset filtering method based on the window filter coefficient values corresponding to the video frames, to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed.
  • the window filter coefficient value obtaining unit further includes at least one of the following:
  • the second filter coefficient weight value obtaining subunit is adapted to obtain the frame distance between each video frame in the video frame sequence and the current video frame, and determine the second filter coefficient weight value corresponding to the frame distance;
  • a third filter coefficient weight value obtaining subunit, adapted to obtain the similarity value of the pixel corresponding to the position in the texture map corresponding to each second depth map and in the texture map corresponding to the depth map to be processed, and to determine the third filter coefficient weight value corresponding to the similarity value.
  • the embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored, and the computer instructions execute the steps of the method described in any of the foregoing embodiments when the computer instructions are run.
  • the depth map to be processed is obtained from the image combination of the current video frame from multiple angles and free perspectives and filtered in the time domain.
  • For the depth map with the same viewing angle as the depth map to be processed in each video frame of the video frame sequence of the preset window in the time domain, that is, the second depth map, the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map is obtained, the first filter coefficient weight value corresponding to the confidence value is determined, the window filter coefficient value is generated based on the first filter coefficient weight value, and, based on the window filter coefficient value corresponding to each video frame, the depth map to be processed is filtered according to a preset filtering method.
  • With the video reconstruction scheme of the embodiments of the present specification, the depth map in the video frame is filtered in the time domain. For the depth map with the same viewing angle as the depth map to be processed in each video frame of the video frame sequence of the preset window in the time domain, that is, the second depth map, the filter coefficient weight value corresponding to the confidence value of the corresponding pixel is added to the window filter coefficient value, which can prevent unreliable depth values in the to-be-processed depth map and in each second depth map from affecting the filtering result, thereby increasing the stability of the depth map in the time domain and further improving the image quality of the reconstructed video.
  • FIG. 1 is a schematic structural diagram of a data processing system in a specific application scenario to which the depth map processing method of the embodiment of this specification is applicable.
  • Fig. 2 is a schematic diagram of a process of generating multi-angle free-view data in an embodiment of this specification.
  • Fig. 3 is a schematic diagram of a user side processing 6DoF video data in an embodiment of this specification.
  • Fig. 4 is a schematic diagram of input and output of a video reconstruction system in an embodiment of this specification.
  • Fig. 5 is a flowchart of a depth map processing method in an embodiment of this specification.
  • Fig. 6 is a schematic diagram of a video frame sequence in an application scenario in an embodiment of this specification.
  • FIG. 7 is a flowchart of a method for obtaining the confidence value of the pixel corresponding to the position in each second depth map in a depth map to be processed in an embodiment of the present specification.
  • FIG. 8 is a flowchart of another method for obtaining the confidence value of the pixel corresponding to the position in each second depth map in the depth map to be processed in the embodiment of the present specification.
  • FIG. 9 is a flowchart of another method for obtaining the confidence value of the pixel corresponding to the position in each second depth map in the depth map to be processed in the embodiment of the present specification.
  • FIG. 10 is a schematic diagram of a scene for determining the confidence value of the pixel corresponding to the position in each second depth map in a depth map to be processed in an embodiment of the present specification.
  • Fig. 11 is a flowchart of a video reconstruction method in an embodiment of this specification.
  • Fig. 12 is a schematic structural diagram of a depth map processing device in an embodiment of this specification.
  • Fig. 13 is a schematic structural diagram of a video reconstruction system in an embodiment of this specification.
  • FIG. 14 is a schematic diagram of the structure of an electronic device in an embodiment of this specification.
  • 6DoF refers to 6 Degrees of Freedom.
  • Users can adjust the viewing angle of the video through interactive means during the viewing process and watch from the free viewpoint they want, enhancing the viewing experience.
  • Free-D playback technology expresses the 6DoF image by acquiring point cloud data of the scene through multi-angle shooting, and reconstructs the 6DoF image or video based on the point cloud data.
  • The depth-map-based 6DoF video generation method performs combined rendering on the texture map and depth map of the corresponding group in the image combination of the video frame at the time of user interaction, based on the virtual viewpoint position and the parameter data corresponding to the texture map and depth map, and thereby reconstructs the 6DoF image or video.
  • the quality of the reconstructed viewpoint can already be close to the quality of the original acquired viewpoint by the DIBR technology.
  • the depth map is filtered in the time domain during the DIBR process to improve the temporal stability of the depth map reconstruction.
  • The inventor found that, in some cases, the quality of the depth map generated after filtering decreases. For this reason, the inventor carried out further in-depth research and experiments and found that the depth values of the pixels in the depth maps participating in the filtering in the time domain are not always reliable; once an unreliable depth value is added to the filtering, the quality of the finally generated depth map is degraded.
  • FIG. 1 shows the layout of a data processing system for a basketball game.
  • The data processing system 10 includes a collection array 11 composed of multiple collection devices, a data processing device 12, a cloud server cluster 13, a playback control device 14, a playback terminal 15 and an interactive terminal 16.
  • the reconstruction of the multi-angle free-view video can be realized, and the user can watch the low-latency multi-angle free-view video.
  • The basketball hoop on the left is taken as the core point of view, the core point of view is taken as the center of a circle, and the fan-shaped area on the same plane as the core point of view is used as the preset multi-angle free viewing angle range.
  • The collection devices in the collection array 11 can be placed in a fan shape at different positions of the field collection area according to the preset multi-angle free viewing angle range, and can each collect video data streams synchronously in real time from their corresponding angles.
  • the data processing device 12 can be placed in a non-collection area on site, which can be regarded as a site server.
  • The data processing device 12 may send a streaming instruction to each collection device in the collection array 11 through a wireless local area network, and each collection device in the collection array 11 transmits the video data stream obtained based on the streaming instruction to the data processing device 12 in real time.
  • When the data processing device 12 receives a video frame interception instruction, it intercepts video frames at the specified frame time from the received multiple video data streams to obtain frame images of multiple synchronized video frames, and uploads the obtained multiple synchronized video frames at the specified frame time to the server cluster 13 in the cloud.
  • The cloud server cluster 13 uses the received frame images of the multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the depth data of each frame image in the image combination, and, based on the parameter data corresponding to the image combination and the pixel data and depth data of the preset frame images in the image combination, performs frame image reconstruction on a preset virtual viewpoint path to obtain the corresponding multi-angle free-view video data; the multi-angle free-view video data may include multi-angle free-view spatial data and multi-angle free-view temporal data of frame images sorted by frame time.
  • the server cluster 13 in the cloud can store the pixel data and depth data of the image combination in the following manner:
  • a stitched image corresponding to the frame time is generated, where the stitched image includes a first field and a second field, the first field includes the pixel data of the preset frame images in the image combination, and the second field includes the depth data of the preset frame images in the image combination.
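  • As an illustrative sketch only (the exact field layout and depth encoding are not specified here, so the top/bottom arrangement and 8-bit depth quantization below are assumptions), a stitched image could be assembled roughly as follows:

```python
import numpy as np

def build_stitched_image(textures, depth_maps, max_depth=10.0):
    """Pack synchronized texture maps (first field) and depth maps (second field) into one
    stitched image. The side-by-side/top-bottom layout and depth quantization are assumed."""
    tex_field = np.concatenate(textures, axis=1)                      # H x (n*W) x 3, uint8
    depth_field = np.concatenate(
        [np.clip(d / max_depth * 255.0, 0, 255).astype(np.uint8) for d in depth_maps],
        axis=1)
    depth_field = np.repeat(depth_field[..., None], 3, axis=2)        # grayscale -> 3 channels
    return np.concatenate([tex_field, depth_field], axis=0)           # texture field on top

# usage sketch: stitched = build_stitched_image([tex_cam0, tex_cam1], [depth_cam0, depth_cam1])
```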
  • the playback control device 14 can insert the received multi-angle free-view video data into the data stream to be played, and the playback terminal 15 receives the data stream to be played from the playback control device 14 and plays it in real time.
  • the playback control device 14 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set as a virtual playback control device to control the data source.
  • a broadcast control device such as a broadcast control station can be used as a broadcast control device in the embodiments of this specification.
  • When the server cluster 13 in the cloud receives an image reconstruction instruction from the interactive terminal 16, it can extract the stitched image of the preset frame images in the corresponding image combination and the parameter data corresponding to that image combination, and transmit them to the interactive terminal 16.
  • The interactive terminal 16 determines the interactive frame time information based on a trigger operation, sends an image reconstruction instruction containing the interactive frame time information to the server cluster 13, and receives the stitched image of the preset frame images in the image combination corresponding to the interactive frame time and the corresponding parameter data returned from the server cluster 13 in the cloud; it then determines the virtual viewpoint position information based on the interactive operation, selects the corresponding pixel data, depth data and corresponding parameter data in the stitched image according to preset rules, performs combined rendering on the selected pixel data and depth data, and reconstructs and plays the multi-angle free-view video data corresponding to the virtual viewpoint position at the interactive frame time.
  • the entities in the video are not completely static.
  • the entities collected by the collection array such as athletes, basketballs, and referees, are mostly in motion.
  • the texture data and pixel data in the image combination of the collected video frames also constantly change with time.
  • the server cluster 13 in the cloud can perform temporal filtering on the depth map that generates the multi-angle free-view video.
  • The corresponding filter coefficient may be set based on the similarity between the texture map of the depth map to be processed and the texture maps of the depth maps with the same viewing angle in the time domain, and time-domain filtering is performed accordingly.
  • In the embodiments of this specification, for the depth map to be processed in the current video frame in the video frame sequence of the preset window in the time domain, and for the depth maps in the other video frames with the same viewing angle as the depth map to be processed in the current video frame, the confidence value of the pixel at the corresponding position is considered, and the filter coefficient weight value corresponding to the confidence value is added to the filter coefficient value of the preset window in the time domain, so as to prevent unreliable depth values in the depth map to be processed and in the depth maps with the same viewing angle in the other video frames from affecting the filtering result, which can improve the stability of the depth map in the time domain.
  • The video data or image data can be acquired through the collection devices and the depth map calculation can be performed. This mainly includes three steps: multi-camera video capturing (Multi-camera Video Capturing), camera parameter estimation (Camera Parameter Estimation) and depth map calculation (Depth Map Calculation). For multi-camera video capturing, the video captured by each camera is required to be frame-level aligned.
  • The texture image (Texture Image) 21 can be obtained through multi-camera video capture (step S21); through the calculation of the camera internal and external parameters (step S22), the camera parameters (Camera Parameter) 22 can be obtained, that is, the parameter data described below, which includes the internal parameter data and external parameter data of the camera; and through depth map calculation (step S23), the depth map (Depth Map) 23 can be obtained.
  • the texture map collected from multiple cameras, all camera parameters, and the depth map of each camera are obtained.
  • These three parts of data can be referred to as data files in the multi-angle free-view video data, and can also be referred to as 6-degree-of-freedom video data (6DoF video data).
  • Based on the 6DoF video data, the user terminal can generate a virtual viewpoint according to the virtual 6DoF (Degree of Freedom) position, thereby providing a 6DoF video experience.
  • the 6DoF video data and indicative data can be compressed and transmitted to the user side, where the indicative data can also be referred to as metadata (Metadata).
  • the user side can obtain the user side 6DoF expression according to the received data, that is, the 6DoF video data and the aforementioned metadata, and then perform 6DoF rendering on the user side.
  • Metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), used to indicate the storage rules of the pixel data of the multiple images and the depth data in the stitched image; padding pattern metadata (Padding pattern metadata), used to indicate the method of edge protection in the stitched image; and other metadata (Other metadata). The metadata can be stored in a data header file.
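  • The field names below are purely hypothetical (the specification does not define a concrete schema); a metadata header describing the stitching and padding patterns might look roughly like this:

```python
# Hypothetical metadata header; field names are illustrative, not taken from the specification
metadata = {
    "stitching_pattern": {
        "texture_field": {"rows": 1, "cols": 30, "origin": "top"},     # where pixel data is stored
        "depth_field":   {"rows": 1, "cols": 30, "origin": "bottom"},  # where depth data is stored
        "depth_bit_depth": 16,
    },
    "padding_pattern": {"mode": "edge_replicate", "width_px": 8},      # edge protection method
    "other_metadata": {"frame_rate": 25, "camera_count": 30},
}
```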
  • 6DoF rendering based on depth image-based rendering (DIBR, Depth Image-Based Rendering) (step S30) can be used to generate a virtual viewpoint image at a specific 6DoF position 35 determined according to user behavior; that is, according to a user instruction, the virtual viewpoint at the 6DoF position corresponding to the instruction is determined.
  • DIBR refers to Depth Image-Based Rendering.
  • the camera parameters, texture map, depth map, and the 6DoF position of the virtual camera can be received as input, and the generated texture map and depth map at the virtual 6DoF position can be output at the same time .
  • the 6DoF position of the virtual camera is the aforementioned 6DoF position determined according to user behavior.
  • the DIBR application software may be software that realizes image reconstruction based on virtual viewpoints in the embodiments of this specification.
  • The DIBR software 40 can receive the camera parameters 41, the texture map 42, the depth map 43 and the 6DoF position data 44 of the virtual camera as input, and can generate a texture map (step S41) and a depth map (step S42) at the virtual 6DoF position, outputting the generated texture map and depth map at the same time.
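  • For illustration, a minimal forward-warping sketch in the spirit of this step is given below (a sketch only, not the implementation of this specification: the pinhole projection model, parameter names and nearest-depth splatting are assumptions, and hole filling is omitted):

```python
import numpy as np

def dibr_forward_warp(texture, depth, K_src, K_virt, R, t):
    """Warp an H x W x 3 source texture into a virtual view using its H x W depth map.
    K_src, K_virt: 3x3 intrinsics; R, t: rotation/translation from source to virtual camera.
    Returns the warped texture and depth at the virtual position; unfilled holes stay zero."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T.astype(np.float64)
    pts_src = np.linalg.inv(K_src) @ pix * depth.reshape(1, -1)   # back-project to 3D
    pts_virt = R @ pts_src + t.reshape(3, 1)                      # move into the virtual camera
    proj = K_virt @ pts_virt
    z = proj[2]
    safe_z = np.where(np.abs(z) < 1e-9, 1e-9, z)
    u = np.round(proj[0] / safe_z).astype(int)
    v = np.round(proj[1] / safe_z).astype(int)
    warped_tex = np.zeros_like(texture)
    warped_depth = np.full((h, w), np.inf)
    flat_tex = texture.reshape(-1, texture.shape[-1])
    for i in np.flatnonzero((z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)):
        if z[i] < warped_depth[v[i], u[i]]:                       # z-buffer: nearest point wins
            warped_depth[v[i], u[i]] = z[i]
            warped_tex[v[i], u[i]] = flat_tex[i]
    return warped_tex, np.where(np.isinf(warped_depth), 0.0, warped_depth)
```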
  • the input depth map can be processed, for example, filtering in the time domain.
  • S52 Obtain a video frame sequence of a preset window in the time domain that includes the current video frame.
  • A video frame sequence of a preset window in the time domain containing the current video frame may be acquired.
  • The Tth frame in the video sequence is the current video frame, the preset window size D in the time domain is equal to 2N+1 frames, and the current video frame is located in the middle of the preset window in the time domain.
  • the current video frame may not be located in the middle position of the video frame sequence of the preset window.
  • the size of the preset window in the time domain can be set according to the requirements of filtering accuracy and taking into account the requirements of processing resources, and based on experience.
  • N can also be other values such as 3 or 4.
  • the value can be selected from different values and determined according to the final filtering effect.
  • the size of the preset window in the time domain can be adjusted according to the position of the current frame in the entire video stream.
  • filtering processing may not be performed, that is, filtering is performed from the N+1th frame.
  • T is greater than N.
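  • As a small sketch of this window selection (the 0-based indexing and the shrink-at-boundary behaviour are assumptions reflecting one of the adjustment options mentioned above):

```python
def temporal_window(num_frames, t, n):
    """Return the frame indices of a temporal window of nominal size 2n+1 centred on frame t.
    Near the start or end of the stream the window is shrunk so it stays inside the sequence."""
    start = max(0, t - n)
    end = min(num_frames - 1, t + n)
    return list(range(start, end + 1))

# e.g. temporal_window(100, 50, 4) -> frames 46..54; temporal_window(100, 1, 4) -> frames 0..5
```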
  • S53 Obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, where the window filter coefficient value is generated from a weight value of at least two dimensions, including: a first filter coefficient weight value corresponding to the pixel confidence.
  • the weight value of the first filter coefficient may be obtained in the following manner:
  • The second depth map is a depth map with the same viewing angle as the to-be-processed depth map in each video frame of the video frame sequence.
  • There are many methods for evaluating the confidence of the pixels in the to-be-processed depth map and in each second depth map and for obtaining the confidence values of the pixels corresponding to the positions in the to-be-processed depth map and in each second depth map.
  • The depth maps within the preset viewing angle range around the corresponding viewing angles of the to-be-processed depth map and of each second depth map in each video frame in the preset window may be acquired to obtain the third depth maps of the corresponding viewing angles, and the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map may be determined based on the third depth maps of the corresponding viewing angles.
  • the depth map to be processed may be determined based on the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and each second depth map and the pixels in the preset area around the depth map where the pixels are located. Neutralize the confidence value of the pixel corresponding to the position in each second depth map.
  • The correspondence relationship between the confidence value and the first filter coefficient weight value may be preset, where the greater the confidence value c, the greater the corresponding first filter coefficient weight value Weight_c, and the smaller the confidence value c, the smaller the corresponding first filter coefficient weight value Weight_c; the two are positively correlated.
  • S54 Filter a pixel corresponding to a position in the depth map to be processed according to a preset filtering method based on the window filter coefficient value corresponding to each video frame, to obtain a filtered pixel corresponding to the position in the depth map to be processed The depth value.
  • By obtaining the confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map (the depth map with the same viewing angle as the depth map to be processed), determining the first filter coefficient weight value corresponding to the confidence value, generating the window filter coefficient value based on the first filter coefficient weight value, and filtering the pixel corresponding to the position in the depth map to be processed according to a preset filtering method based on the window filter coefficient value corresponding to each video frame to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed, it is possible to prevent unreliable depth values in the to-be-processed depth map and in each second depth map from affecting the filtering result, thereby improving the stability of the depth map in the time domain.
  • The window filter coefficient value is generated from weight values of at least two dimensions, and the weight value of one dimension is the first filter coefficient weight value corresponding to the pixel confidence.
  • the weight values of other dimensions selected to generate the window filter coefficient values are described below through specific embodiments.
  • The window filter coefficient value may be generated based on the first filter coefficient weight value and the filter coefficient weight value of one or more other dimensions; that is, it may be generated from the first filter coefficient weight value, at least one of the following filter coefficient weight values, and filter coefficient weight values of other dimensions.
  • Example dimension 1 The second filter coefficient weight value corresponding to the frame distance
  • the frame distance between each video frame in the video frame sequence and the current video frame is obtained, and the second filter coefficient weight value corresponding to the frame distance is determined.
  • The frame distance may be represented by the difference between the positions of the frames in the video frame sequence, or may be in units of the time interval between the corresponding video frames in the video frame sequence. Since frames are usually distributed at equal intervals in a frame sequence, for ease of calculation the difference between the positions of the frames in the video frame sequence is used here. Continuing to refer to Fig. 6, the frame distance between the T-1th or T+1th frame and the current video frame (the Tth frame) is 1 frame, the frame distance between the T-2th or T+2th frame and the current video frame (the Tth frame) is 2 frames, and the frame distance between the T-Nth or T+Nth frame and the current video frame (the Tth frame) is N frames.
  • the corresponding relationship between the frame distance d and the corresponding second filter coefficient weight value Weight_d can be preset. Among them, the smaller the frame distance d, the larger the corresponding second filter coefficient weight value Weight_d; the larger the frame distance d, the smaller the corresponding second filter coefficient weight value Weight_d, and the two are in an inverse correlation.
  • Example dimension 2 The weight value of the third filter coefficient corresponding to the pixel similarity
  • the similarity value of the pixel corresponding to the position of the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed can be obtained, and the third filter coefficient weight corresponding to the similarity value can be determined value.
  • For the pixel corresponding to any position (x1, y1) in the texture map corresponding to the depth map of viewing angle M in the Tth frame, referred to as the first pixel for ease of description and denoted Pixel(x1, y1), its texture value is expressed as Color(x1, y1). The texture value Color(x1', y1') of the pixel corresponding to the first pixel position in the texture map corresponding to each second depth map can be obtained, then the similarity value s of the texture value Color(x1', y1') of the pixel at the corresponding position in the texture map corresponding to each second depth map with respect to the texture value Color(x1, y1) of the first pixel can be obtained, and the third filter coefficient weight value Weight_s corresponding to the similarity value s can be determined.
  • the corresponding relationship between the similarity value s and the corresponding third filter coefficient weight value Weight_s may be preset. Among them, the larger the similarity value s, the smaller the corresponding third filter coefficient weight value Weight_s; the smaller the similarity value s, the larger the corresponding third filter coefficient weight value Weight_s, and the two are in an inverse correlation.
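  • As an illustrative sketch of these per-dimension weights (the exact functional forms are not given in this specification and are assumptions here; only the monotonic directions follow the correspondences described above):

```python
import math

def weight_confidence(c):
    """First filter coefficient weight: larger confidence value -> larger weight."""
    return c                                      # e.g. identity on a [0, 1] confidence value

def weight_frame_distance(d, sigma_d=2.0):
    """Second filter coefficient weight: larger frame distance -> smaller weight."""
    return math.exp(-(d * d) / (2.0 * sigma_d * sigma_d))

def weight_similarity(s, sigma_s=0.5):
    """Third filter coefficient weight, decreasing with the similarity value s as stated above."""
    return math.exp(-s / sigma_s)
```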
  • In addition to the first filter coefficient weight value, the filter coefficient weight values of the above two example dimensions can also be considered: the second filter coefficient weight value corresponding to the frame distance and the third filter coefficient weight value corresponding to the pixel similarity.
  • The weighted average of the products of the depth values of the pixels corresponding to the position in the to-be-processed depth map and in each second depth map and the window filter coefficient values corresponding to the respective video frames can then be calculated, to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed.
  • The product, or a weighted average, of the first filter coefficient weight value and one of the second filter coefficient weight value and the third filter coefficient weight value may also be used as the window filter coefficient value corresponding to each video frame. Then the weighted average of the products of the depth values of the pixels corresponding to the position in the to-be-processed depth map and in each second depth map and the window filter coefficient values corresponding to the respective video frames is calculated, to obtain the filtered depth value of the pixel corresponding to the position in the depth map to be processed.
  • the following formula can be used for filtering processing to obtain the filtered depth value of the second pixel
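  • The formula itself is not reproduced here; a plausible reconstruction consistent with the weighted-average description above (the exact form is an assumption) is

$$ \mathrm{Pixel}^{M}_{\mathrm{filtered}}(x2, y2) = \frac{\sum_{i=T-N}^{T+N} \mathrm{Weight}_i \cdot \mathrm{Pixel}^{M}_{i}(x2, y2)}{\sum_{i=T-N}^{T+N} \mathrm{Weight}_i} $$

    where Weight_i denotes the window filter coefficient value of the i-th video frame, for example the product Weight_i_c × Weight_i_d × Weight_i_s.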
  • The pixels corresponding to the second pixel position in each depth map in the above formula are all represented by Pixel(x2, y2), and the viewing angle identifier and the frame corresponding to each depth map are distinguished by the superscripts and subscripts of Pixel(x2, y2).
  • the method of obtaining window filter coefficient values is not limited to the above embodiments.
  • The first filter coefficient weight value Weight_i_c and the second filter coefficient weight value Weight_i_d may also be used.
  • each pixel in the depth map to be processed can be filtered in turn.
  • Each pixel in the depth map to be processed can also be filtered in parallel in the manner of the above-mentioned embodiments, or multiple pixels can be filtered in batches.
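  • A compact sketch of this per-pixel temporal filtering (a weighted average over the window, vectorized over the whole depth map; the function and argument names are illustrative, and it assumes the current frame sits in the middle of the window as in the example above):

```python
import numpy as np

def temporal_filter_depth(window_depths, window_weights):
    """window_depths:  list of H x W depth maps with the same viewing angle as the to-be-processed
                       depth map, one per frame of the preset window (including the current frame).
    window_weights: list of H x W per-pixel window filter coefficient values, e.g. the product
                    of the confidence, frame-distance and similarity weights for each frame.
    Returns the filtered depth map as the per-pixel weighted average."""
    depths = np.stack(window_depths, axis=0).astype(np.float64)    # (2N+1) x H x W
    weights = np.stack(window_weights, axis=0).astype(np.float64)
    denom = weights.sum(axis=0)
    filtered = (weights * depths).sum(axis=0) / np.maximum(denom, 1e-12)
    current = window_depths[len(window_depths) // 2]               # keep the original value where
    return np.where(denom > 1e-12, filtered, current)              # all weights are zero
```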
  • With the depth map processing method of the above embodiments, in the process of filtering the depth map to be processed, not only are the frame distance between each depth map with the same viewing angle as the depth map to be processed (that is, the second depth map) and the depth map to be processed, and/or the similarity of the texture values of the pixels at the corresponding positions of the texture maps corresponding to the depth maps with the same viewing angle as the depth map to be processed (that is, the second depth maps) considered, but the confidence of the pixels at the corresponding positions in the depth map to be processed and in the depth maps with the same viewing angle in each video frame (that is, the second depth maps) is also considered, and the confidence value of the corresponding pixel is added to the window filter coefficient weight value. This prevents unreliable depth values in the time domain (including unreliable depth values of the to-be-processed depth map and of each second depth map) from affecting the filtering result, and therefore the stability of the depth map in the time domain can be improved.
  • Manner 1 Determine the confidence value of the pixel corresponding to the position in the to-be-processed depth map and each second depth map based on the depth map of the preset viewing angle range around the corresponding viewing angle of each second depth map .
  • Specifically, the depth maps within the preset viewing angle range around the corresponding viewing angles of the to-be-processed depth map and of each second depth map in each video frame can be acquired to obtain the third depth maps of the corresponding viewing angles, and the confidence value of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map is determined based on the third depth maps of the corresponding viewing angles.
  • The viewing angle of the second depth map in each video frame is also M.
  • Each depth map with a viewing angle of M (including the to-be-processed depth map and each second depth map) can then be acquired together with the depth maps within the preset viewing angle range [M-K, M+K] around the corresponding viewing angle, which may be called the third depth maps.
  • The preset viewing angle range may radiate 15°, 30°, 45°, 60°, etc. to both sides with the viewing angle M as the center.
  • The specific value is related to the viewpoint density of the corresponding image combination in the image combination of each video frame: the higher the viewpoint density, the smaller the value range; the lower the viewpoint density, the larger the value range.
  • The viewing angle range can also be determined by the spatial distribution of the viewpoints corresponding to the image combination. For example, for texture maps synchronously collected by a total of 40 acquisition devices arranged in an arc and the corresponding depth maps, M and K can represent the positions of the acquisition devices: M represents the viewing angle of the 10th acquisition device from the left and K is 3, so the viewing angles of the 7th to 13th acquisition devices from the left can be used; based on the depth maps corresponding to the viewing angles of the 7th to 9th and of the 11th to 13th acquisition devices, the confidence value of the pixel at the corresponding position in the depth map corresponding to the 10th acquisition device can be determined.
  • the value interval of the preset viewing angle range may not be centered on the viewing angle of the depth map to be processed, and the specific value may be determined according to the spatial position relationship corresponding to the depth map in each video frame. For example, one or more depth maps closest to the corresponding viewpoint in each video frame may be selected to determine the confidence level of the pixels in the to-be-processed depth map and each second depth map.
  • Manner 2: determine the confidence value of the pixel corresponding to the position in the depth map to be processed and in each second depth map based on the spatial consistency between the pixels corresponding to the positions in the depth map to be processed and in each second depth map and the pixels in the preset area around them in the depth map where they are located.
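  • A minimal sketch of one spatial-consistency variant (matching each pixel against the average depth of its surrounding window, in the spirit of the description above; the uniform weighting, window size, tolerance and binarized output are assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_consistency_confidence(depth, window=5, rel_tol=0.05):
    """A pixel whose depth is close to the (uniformly weighted) average of its surrounding
    window is considered reliable (confidence 1), otherwise unreliable (confidence 0)."""
    local_avg = uniform_filter(depth.astype(np.float64), size=window, mode="nearest")
    match = np.abs(depth - local_avg) <= rel_tol * np.maximum(local_avg, 1e-6)
    return match.astype(np.float64)
```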
  • Example 1: determine the confidence of the pixel based on the matching difference of the texture maps.
  • S71 Acquire a texture map corresponding to the to-be-processed depth map and a texture map corresponding to each second depth map.
  • Specifically, the texture maps of the corresponding viewing angle can be obtained from the image combinations of the current video frame and of the video frames where the second depth maps are located, in the video frame sequence of the preset window. Referring to FIG. 6, the texture maps corresponding to all depth maps with a viewing angle of M in the T-Nth to T+Nth video frames can be obtained respectively.
  • The texture values of the corresponding positions in the texture map corresponding to the to-be-processed depth map and in the texture maps corresponding to the second depth maps are respectively mapped to the corresponding positions in the texture maps corresponding to the third depth maps of the corresponding viewing angles, to obtain the mapped texture values corresponding to the third depth maps of the corresponding viewing angles.
  • In a specific implementation, based on the spatial positional relationship of the images of different viewing angles in the image combination of each video frame, and according to the depth values of the pixels corresponding to the position in the to-be-processed depth map and in each second depth map, the texture values of the corresponding positions in the texture map corresponding to the to-be-processed depth map and in the texture maps corresponding to the second depth maps can be mapped to the corresponding positions in the texture maps corresponding to the third depth maps of the corresponding viewing angles, to obtain the mapped texture values corresponding to the third depth maps of the corresponding viewing angles.
  • For example, the texture values of the texture map corresponding to the depth map of viewing angle M in the T-Nth video frame can be respectively mapped to the corresponding positions in the texture maps corresponding to the third depth maps of multiple viewing angles in the range [M-K, M+K] in the T-Nth video frame, to obtain the mapped texture values Color'(x, y) corresponding to the third depth maps of the multiple viewing angles.
  • The higher the matching degree of the texture values corresponding to the third depth maps of the corresponding viewing angles, the smaller the difference between the corresponding texture maps, the higher the reliability of the depth values of the pixels in the corresponding to-be-processed depth map and second depth maps, and correspondingly the higher the confidence value.
  • The confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map may be comprehensively determined based on the matching degree of the texture values corresponding to the third depth maps of the corresponding viewing angles and the number of third depth maps that meet the corresponding matching degree. For example, it can be set that when the number of matching degrees greater than the preset first matching degree threshold is greater than the preset first number threshold, the corresponding confidence value is set to 1; otherwise the corresponding confidence value is set to 0.
  • the corresponding relationship between the texture value matching degree threshold, the number threshold of the third depth map that meets the corresponding matching degree threshold, and the confidence value can also be set in a gradient.
  • The confidence value of the pixel corresponding to the position in the to-be-processed depth map and in each second depth map can be binarized, that is, it can be 0 or 1, or it can be set to any value in [0, 1] or to a set of discrete values.
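  • A sketch of the matching and confidence steps is given below (the mapped texture values would come from a cross-view warp such as the DIBR-style sketch earlier; the matching metric, 0.8 threshold and view-count rule are illustrative assumptions in line with the description above):

```python
import numpy as np

def texture_match_degree(mapped_color, actual_color):
    """Per-pixel matching degree in [0, 1]: 1 minus the normalized mean absolute difference
    between the mapped texture value and the actual texture value at the corresponding position."""
    diff = np.abs(mapped_color.astype(np.float64) - actual_color.astype(np.float64))
    return 1.0 - diff.mean(axis=-1) / 255.0

def confidence_from_texture_matches(mapped_colors, actual_colors,
                                    match_thresh=0.8, count_thresh=1):
    """mapped_colors / actual_colors: lists of H x W x 3 images, one per third depth map
    (neighbouring viewing angle). A pixel gets confidence 1 when at least count_thresh
    neighbouring views exceed the matching-degree threshold, otherwise 0 (binarized variant)."""
    degrees = np.stack([texture_match_degree(m, a)
                        for m, a in zip(mapped_colors, actual_colors)], axis=0)
    good_views = (degrees >= match_thresh).sum(axis=0)
    return (good_views >= count_thresh).astype(np.float64)
```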
  • Specifically, the mapped texture value Color'(x, y) corresponding to the third depth map of each corresponding viewing angle of each video frame can be matched with the actual texture value Color_1(x, y) at the corresponding position in the texture map corresponding to the third depth map of that viewing angle; for example, the first matching degree threshold is set to 80%.
  • Assume that the third depth maps corresponding to the to-be-processed depth map and to each second depth map in each video frame are the third depth maps within a range of 30 degrees on both sides of the viewing angle of the to-be-processed depth map or of the corresponding second depth map, and that there are third depth maps of three viewing angles within 30 degrees on each side of the viewing angle of the to-be-processed depth map and of each second depth map.
  • The confidence value of the pixel corresponding to the position in the second depth map of a video frame is then determined as follows: if the number of third depth maps that meet the preset first matching degree threshold is 0, the confidence value of the pixel corresponding to the position in the second depth map of the video frame is determined to be 1; if the number of third depth maps that meet the preset first matching degree threshold is greater than or equal to 2, the confidence value of the pixel corresponding to the position in the second depth map of the video frame is determined to be 0.5.
  • For the to-be-processed depth map, there are also third depth maps of three viewing angles, and the confidence value of the pixel corresponding to the position in the to-be-processed depth map is determined under the same judgment conditions as those for the pixel corresponding to the position in the second depth maps.
  • Example 2 Determine the confidence level of the pixel based on the consistency of the depth map
  • One way is to map the depth values of the pixels corresponding to the position in the to-be-processed depth map and in each second depth map to the third depth maps of the corresponding viewing angles, and to match the mapped depth value of the pixel at the corresponding position in the third depth map of each corresponding viewing angle with the actual depth value of the pixel at that position; the other is to first map the acquired depth value of the pixel at the corresponding position in the third depth map back to the corresponding position in the to-be-processed depth map and in the second depth maps, and then to match the mapped depth values of the corresponding viewing angles with the actual depth values of the corresponding positions in the to-be-processed depth map and in the second depth maps respectively.
  • Specifically, the depth values of the pixels corresponding to the position in the to-be-processed depth map and in each second depth map may be mapped to the third depth maps of the corresponding viewing angles, and the mapped depth value of the pixel at the corresponding position in the third depth map of each corresponding viewing angle may be matched with the actual depth value of the pixel at that position. The following steps can be specifically adopted:
  • the image combination of a multi-angle free-view video frame includes multiple sets of texture maps and depth maps that are synchronized at multiple angles, and any video frame includes multiple-view depth maps.
• S81: Map the depth values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map of each video frame in the preset window onto the third depth maps of the corresponding viewing angles according to the preset spatial position relationship, obtaining the mapped depth values of the pixels at the corresponding positions in those third depth maps. Referring to FIG. 6, for each video frame in the window from the T-N-th frame to the T+N-th frame, the third depth maps corresponding to the preset viewing angle range [M-K, M+K] of the depth maps with viewing angle M (including the to-be-processed depth map and each second depth map) can be obtained, and the mapping of depth values between the different depth maps within the preset viewing angle range is completed within the same frame; that is, the depth values of the corresponding pixels in the viewing-angle-M depth maps of each video frame in the preset window are mapped to the corresponding pixels of the third depth maps of the other viewing angles within [M-K, M+K] of the same frame.
• S82: Match the mapped depth values of the pixels at the corresponding positions in the third depth maps of the corresponding viewing angles against the actual depth values of those pixels, and determine the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map based on the distribution intervals of the matching degrees of the depth values corresponding to the third depth maps of the corresponding viewing angles.
• the higher the matching degree of the depth values corresponding to the third depth maps of the corresponding viewing angles, the smaller the difference between the corresponding depth maps, and the more reliable the depth value of the corresponding pixel in the to-be-processed depth map or the second depth map; correspondingly, the higher the confidence value.
• in a specific implementation, the confidence value of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map may be determined comprehensively from the matching degrees of the depth values corresponding to the third depth maps of the corresponding viewing angles and from the number of third depth maps that meet the corresponding matching degree. For example, it can be set that when the number of matching degrees greater than a preset second matching degree threshold (as a specific example, 80%, or alternatively 70%) exceeds a preset second number threshold, the corresponding confidence value is set to 1; otherwise it is set to 0.
• the correspondence between the depth value matching degree threshold, the number threshold of third depth maps meeting the corresponding matching degree threshold, and the confidence value can also be set in gradients.
• the confidence value of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map can be binarized, that is, take the value 0 or 1, or it can be set to any value within [0, 1] or to a set of discrete values.
• in each video frame, the third depth maps corresponding to the to-be-processed depth map or to a second depth map are the depth maps whose viewing angles lie within 30 degrees on either side of the viewing angle of that depth map; within that 30-degree range on both sides there are three third depth maps.
• taking occlusion between viewing angles into account, if the number of third depth maps that meet the preset second matching degree threshold is greater than or equal to 2, the confidence value of the pixel at the corresponding position in the to-be-processed depth map or the second depth map of the video frame is determined to be 1; if that number is 0, the confidence value is also determined to be 1; if exactly one third depth map meets the threshold, the confidence value is determined to be 0.5.
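As a rough illustration of the forward mapping in steps S81/S82, the sketch below projects one pixel of a view-M depth map into a neighbouring view using pinhole camera parameters and compares the projected depth with the depth actually stored there. The camera model, the matrix layout (3x3 intrinsics, 4x4 world-to-camera extrinsics) and the relative-error test used as the "matching degree" are illustrative assumptions, not the patent's prescribed formulation.

```python
import numpy as np

def depth_consistency(u, v, depth_src, depth_third, K_src, RT_src, K_thr, RT_thr,
                      rel_tol=0.05):
    """Project pixel (u, v) of the source depth map into a third (neighbouring)
    view and report whether the projected depth agrees with the depth value the
    third view actually stores at the landing position."""
    d = depth_src[v, u]
    # Back-project to 3D in the source camera, then into world coordinates.
    p_cam = d * np.linalg.inv(K_src) @ np.array([u, v, 1.0])
    p_world = np.linalg.inv(RT_src) @ np.append(p_cam, 1.0)
    # Re-project into the third view.
    p_thr = (RT_thr @ p_world)[:3]
    uvw = K_thr @ p_thr
    u3, v3 = int(round(uvw[0] / uvw[2])), int(round(uvw[1] / uvw[2]))
    h, w = depth_third.shape
    if not (0 <= u3 < w and 0 <= v3 < h) or p_thr[2] <= 0:
        return False                       # falls outside the third view
    mapped, actual = p_thr[2], depth_third[v3, u3]
    return abs(mapped - actual) <= rel_tol * actual
```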
• referring to FIG. 9, the depth values of the corresponding pixels in the third depth maps may be obtained first and inversely mapped to the corresponding positions in the to-be-processed depth map and in each second depth map; the distance between the mapped pixel position and the actual pixel position at the corresponding location is then compared, and the confidence value of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map is determined from the difference between the two. The following steps may be specifically adopted:
• S91: Obtain the depth values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map, map them according to those depth values to the corresponding pixel positions in the third depth maps of the corresponding viewing angles, obtain the depth values at those pixel positions in the third depth maps, and inversely map them to the corresponding pixel positions in the to-be-processed depth map and in each second depth map, thereby obtaining the mapped pixel positions of the third depth map of each corresponding viewing angle in the to-be-processed depth map and in each second depth map.
• as described in the foregoing embodiments, the image combination of a multi-angle free-view video frame includes multiple groups of texture maps and depth maps with corresponding relationships that are synchronized at multiple angles, and any video frame includes depth maps of multiple viewing angles.
• the depth values of the pixels at the corresponding positions in the depth maps with viewing angle M (including the to-be-processed depth map and each second depth map) in each video frame in the window from the T-N-th frame to the T+N-th frame can be obtained and, according to these depth values, mapped to the corresponding pixel positions in the third depth maps corresponding to the preset viewing angle range [M-K, M+K], so as to obtain the depth values of the corresponding pixel positions in the third depth maps of the viewing angles within [M-K, M+K] in each video frame; then, according to the preset spatial position relationship, the acquired depth values of the corresponding pixel positions in the third depth maps of each video frame in the window are inversely mapped to the corresponding pixel positions in the to-be-processed depth map and in each second depth map of the same video frame, yielding the mapped pixel positions of the third depth map of each corresponding viewing angle in the to-be-processed depth map and in each second depth map.
• referring to FIG. 6, for example, according to the depth values of the corresponding pixel positions in the third depth maps with viewing angles in [M-K, M+K] of the T-N-th frame, each third depth map (for example, the third depth maps with viewing angles M-2, M-1, M+1 and M+2 in the T-N-th frame) is inversely mapped into the depth map with viewing angle M of the T-N-th frame, obtaining its corresponding mapped pixel positions in that depth map.
• S92: For the to-be-processed depth map and each second depth map, calculate the pixel distance between the actual pixel position of the pixel at the corresponding position and the mapped pixel position obtained by inverse mapping from the third depth map of the corresponding viewing angle, and determine the confidence values of the pixels at the corresponding positions based on the distribution interval of the calculated pixel distances. The smaller the pixel distance, the more reliable the depth value of the pixel at the corresponding position in the to-be-processed depth map or the second depth map, and correspondingly, the higher the confidence value.
• in a specific implementation, the confidence value of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map may be determined comprehensively from the pixel distances corresponding to the third depth maps of the corresponding viewing angles and from the number of third depth maps that fall within the corresponding distance threshold interval. For example, it can be set that when the number of distances smaller than a preset distance threshold d_0 exceeds a preset third number threshold, the corresponding confidence value is set to 1; otherwise it is set to 0. Similarly, a gradient correspondence between distance thresholds, the number threshold of third depth maps meeting the corresponding distance threshold, and the confidence value can also be used.
• as can be seen from the above, the confidence value of the corresponding pixel in the to-be-processed depth map and in each second depth map can be binarized, that is, take the value 0 or 1, or be set to any value within [0, 1] or to a set of discrete values.
• in each video frame, the third depth maps corresponding to the to-be-processed depth map and to each second depth map are the depth maps whose viewing angles lie within 30 degrees on either side of the viewing angle of the to-be-processed depth map or of the corresponding second depth map; within that 30-degree range on both sides there are three third depth maps.
• taking occlusion between viewing angles into account, if the number of third depth maps whose pixel distance is smaller than the preset first distance threshold is greater than or equal to 2, the confidence value of the pixel at the corresponding position in the to-be-processed depth map or the second depth map of the video frame is determined to be 1; if that number is 0, the confidence value is also determined to be 1; if exactly one third depth map satisfies the distance threshold, the confidence value is determined to be 0.5.
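The reprojection-distance check in S91/S92 can be pictured as a round trip: push the pixel into the third view using its own depth, read the depth the third view stores at the landing point, push that back, and measure how far from the starting pixel it lands. The sketch below uses the same illustrative pinhole-camera conventions as the earlier sketch; the helper names and matrix layout are assumptions for illustration.

```python
import numpy as np

def _project(u, v, d, K_a, RT_a, K_b, RT_b):
    """Map pixel (u, v) with depth d from camera a into camera b (pinhole model)."""
    p_world = np.linalg.inv(RT_a) @ np.append(d * np.linalg.inv(K_a) @ np.array([u, v, 1.0]), 1.0)
    p_b = (RT_b @ p_world)[:3]
    uvw = K_b @ p_b
    return uvw[0] / uvw[2], uvw[1] / uvw[2], p_b[2]

def round_trip_distance(u, v, depth_src, depth_third, K_src, RT_src, K_thr, RT_thr):
    """Pixel distance between (u, v) and where the third view's stored depth maps it back to.
    Smaller distances indicate a more reliable depth value at (u, v)."""
    u3, v3, _ = _project(u, v, depth_src[v, u], K_src, RT_src, K_thr, RT_thr)
    r, c = int(round(v3)), int(round(u3))
    h, w = depth_third.shape
    if not (0 <= r < h and 0 <= c < w):
        return np.inf                      # lands outside the third view
    ub, vb, _ = _project(u3, v3, depth_third[r, c], K_thr, RT_thr, K_src, RT_src)
    return float(np.hypot(ub - u, vb - v))
```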
• Combination method 1: take the product of the pixel confidence determined from the texture map matching difference and the pixel confidence determined from the depth map consistency as the confidence of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map, which can be expressed by the following formula:
• Weight_c = Weight_c_texture * Weight_c_depth
• where Weight_c denotes the confidence of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map, Weight_c_texture denotes the pixel confidence determined from the texture map matching difference, and Weight_c_depth denotes the pixel confidence determined from the depth map consistency.
• Combination method 2: take the weighted sum of the pixel confidence determined from the texture map matching difference and the pixel confidence determined from the depth map consistency as the confidence of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map, which can be expressed as follows:
• Weight_c = a * Weight_c_texture + b * Weight_c_depth
• where Weight_c denotes the confidence of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map, Weight_c_texture denotes the pixel confidence determined from the texture map matching difference, Weight_c_depth denotes the pixel confidence determined from the depth map consistency, and a and b are the corresponding weighting coefficients.
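A minimal sketch of the two combination modes follows; the weighting coefficients a and b are free parameters here (their values are not fixed by the text), so the defaults below are placeholders only.

```python
def combine_product(weight_c_texture, weight_c_depth):
    """Combination method 1: product of the two per-pixel confidences."""
    return weight_c_texture * weight_c_depth

def combine_weighted_sum(weight_c_texture, weight_c_depth, a=0.5, b=0.5):
    """Combination method 2: weighted sum of the two per-pixel confidences."""
    return a * weight_c_texture + b * weight_c_depth

# Usage: a pixel trusted by the texture check but only half-trusted by the depth check.
print(combine_product(1.0, 0.5))          # -> 0.5
print(combine_weighted_sum(1.0, 0.5))     # -> 0.75
```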
• Example 1: match the pixel at the corresponding position in the to-be-processed depth map and in each second depth map against the depth values of the pixels in a preset area surrounding it in the depth map where it is located, and determine the confidence value of that pixel based on the matching degrees of the depth values and on the number of pixels whose matching degree meets a preset pixel matching degree threshold.
• referring to the second depth map Px shown in FIG. 10, for the pixel Pixel(x1', y1') corresponding to any pixel position in the to-be-processed depth map, taken as the pixel whose confidence is to be determined, the pixel Pixel(x1', y1') is matched against the depth value of each pixel in the preset area R surrounding Pixel(x1', y1') in the second depth map Px; for example, if the matching degree of 5 of the 8 surrounding pixels is greater than the preset pixel matching degree threshold of 60%, the confidence of the pixel Pixel(x1', y1') in the second depth map Px may be determined to be 0.8.
• the preset area can take a circular, rectangular or irregular shape; the specific shape is not limited as long as it surrounds the pixel whose confidence is to be determined, and the size of the preset area can be set based on experience.
• Example 2: match the pixel at the corresponding position in the to-be-processed depth map and in each second depth map against the weighted average of the depth values of the pixels in the preset area surrounding it in the depth map where it is located, and determine the confidence value of that pixel based on the matching degree between the pixel and the weighted average.
• for instance, the depth values of the pixels in the preset area R surrounding Pixel(x1', y1') are first weighted and averaged, and similarity matching is then performed; for example, if the matching degree between the weighted average and the depth value of Pixel(x1', y1') is greater than 50%, the confidence of the pixel Pixel(x1', y1') in the second depth map Px may be determined to be 1.
• the above gives a variety of ways to determine the confidence of the pixel at the corresponding position in the to-be-processed depth map and in each second depth map; in specific implementations, at least two of these ways may also be used in combination.
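As an illustration of the two spatial-consistency examples above, the sketch below checks a pixel against its 8 neighbours (Example 1) and against the average of its surrounding window (Example 2). The relative-difference test used as the "matching degree", the square 3x3 window, the uniform weighting, and the mapping from the agreement ratio to a confidence value are all simplifying assumptions (the text's example maps 5-of-8 agreement to 0.8 by its own rule).

```python
import numpy as np

def spatial_confidence_counting(depth, r, c, pixel_match_threshold=0.6, rel_tol=0.05):
    """Example 1: confidence from the fraction of surrounding pixels whose depth
    agrees with the pixel at (r, c); assumes an interior pixel."""
    center = depth[r, c]
    neighbours = [depth[r + dr, c + dc]
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    agreeing = sum(1 for d in neighbours if abs(d - center) <= rel_tol * center)
    ratio = agreeing / len(neighbours)
    return ratio if ratio > pixel_match_threshold else 0.0

def spatial_confidence_weighted_mean(depth, r, c, rel_tol=0.05):
    """Example 2: compare the pixel with the (uniformly weighted) average of its window."""
    window = depth[r - 1:r + 2, c - 1:c + 2]
    mean = window.mean()
    return 1.0 if abs(depth[r, c] - mean) <= rel_tol * mean else 0.0

# Usage on a toy 3x3 depth patch.
patch = np.array([[10.0, 10.1, 9.9], [10.0, 10.0, 10.2], [9.8, 10.1, 10.0]])
print(spatial_confidence_counting(patch, 1, 1))
print(spatial_confidence_weighted_mean(patch, 1, 1))
```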
• the depth values of the pixels at the corresponding positions in the to-be-processed depth map are then filtered according to the preset filtering method to obtain the filtered depth values of those pixels; this avoids the influence of unreliable depth values in the to-be-processed depth map and in each second depth map on the filtering result, and can therefore improve the stability of the depth map in the time domain.
• with the depth map processing method of the foregoing embodiments, after the depth map is filtered in the time domain, the image quality of video reconstruction can be improved.
• an embodiment is used below to explain how video reconstruction is performed.
• S112: Perform filtering processing on the depth maps in the time domain.
• in a specific implementation, the depth map processing method of the embodiments of this specification can be used to perform the filtering processing; for the specific approach, refer to the description of the foregoing embodiments, which is not elaborated further here.
• S113: According to the virtual viewpoint position information and the parameter data corresponding to the image combination of the video frame, select, according to a preset rule, the texture maps and the filtered depth maps of the corresponding groups in the image combination of the video frame at the moment of user interaction.
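A rough sketch of the selection in S113, assuming the "preset rule" is simply to take the capture viewpoints nearest to the virtual viewpoint; the camera-parameter layout and the number of selected groups are illustrative assumptions, and the subsequent combination rendering step is outside this sketch.

```python
import numpy as np

def select_groups(virtual_pos, camera_positions, num_groups=2):
    """Pick the indices of the capture viewpoints closest to the virtual viewpoint.

    `virtual_pos` is the 3D virtual viewpoint, `camera_positions` an (N, 3) array of
    capture-camera centres; the selected indices identify which texture maps and
    filtered depth maps of the image combination take part in rendering."""
    dists = np.linalg.norm(camera_positions - virtual_pos, axis=1)
    return np.argsort(dists)[:num_groups]

# Usage: choose the two nearest of four cameras on an arc.
cams = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0.5], [3, 0, 1.0]], dtype=float)
print(select_groups(np.array([1.2, 0.0, 0.0]), cams))   # e.g. [1 0]
```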
• with the above video reconstruction method, the depth maps in the video frames are filtered in the time domain: for the depth map with the same viewing angle as the to-be-processed depth map in each video frame of the video frame sequence of the preset window in the time domain, that is, the second depth map, the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map are obtained, the first filter coefficient weight value corresponding to the confidence value is determined, and the first filter coefficient weight value is incorporated into the window filter coefficient value. This prevents unreliable depth values in the to-be-processed depth map and in each second depth map from affecting the filtering result, thereby improving the stability of the depth map in the time domain and, in turn, the image quality of the reconstructed video.
  • the depth map processing device 120 may include:
• the depth map acquiring unit 121 is adapted to acquire the to-be-processed depth map from the image combination of the current video frame of the multi-angle free viewing angle, where the image combination of the current video frame of the multi-angle free viewing angle includes multiple groups of texture maps and depth maps with corresponding relationships that are synchronized at multiple angles;
  • the frame sequence obtaining unit 122 is adapted to obtain a video frame sequence including a preset window in the time domain of the current video frame;
• the window filter coefficient value obtaining unit 123 is adapted to obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, the window filter coefficient value being generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to pixel confidence. The window filter coefficient value obtaining unit 123 includes a first filter coefficient weight value obtaining subunit 1231, adapted to obtain the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map and to determine the first filter coefficient weight value corresponding to the confidence value, where the second depth map is the depth map in each video frame of the video frame sequence that has the same viewing angle as the to-be-processed depth map;
  • the filtering unit 124 is adapted to filter the pixels corresponding to the positions in the depth map to be processed based on the corresponding window filter coefficient values of each video frame to obtain the pixels corresponding to the positions in the depth map to be processed The filtered depth value.
  • the first filter coefficient weight value obtaining subunit 1231 may include at least one of the following confidence value determining subunits:
• the first confidence value determining component 12311 is adapted to obtain the depth maps within the preset viewing angle range around the corresponding viewing angles of the to-be-processed depth map and of each second depth map, obtaining the third depth maps of the corresponding viewing angles, and to determine, based on the third depth maps of the corresponding viewing angles, the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map;
  • the second confidence value determining component 12312 is adapted to be based on the spatial consistency of the pixel corresponding to the position in the to-be-processed depth map and each second depth map and the pixel in the preset area around the depth map where the pixel is located, Determine the confidence value of the pixel corresponding to the position in the to-be-processed depth map and each second depth map.
• in a specific implementation, the first confidence value determining component 12311 is adapted to obtain the texture map corresponding to the to-be-processed depth map and the texture maps corresponding to each second depth map; to map, according to the depth values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map, the texture values at the corresponding positions in those texture maps to the corresponding positions in the texture maps corresponding to the third depth maps of the corresponding viewing angles, obtaining the mapped texture values corresponding to the third depth maps; and to match the mapped texture values against the actual texture values at the corresponding positions, determining the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map based on the distribution intervals of the matching degrees of the texture values corresponding to the third depth maps of the corresponding viewing angles.
• in another specific implementation, the first confidence value determining component 12311 maps the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map onto the third depth maps of the corresponding viewing angles to obtain the mapped depth values of the corresponding pixels in those third depth maps, matches the mapped depth values against the actual depth values of the corresponding pixels, and determines the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map based on the distribution intervals of the matching degrees of the depth values corresponding to the third depth maps of the corresponding viewing angles.
• in yet another specific implementation, the first confidence value determining component 12311 is adapted to obtain the depth values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map; to map them, according to those depth values, to the corresponding pixel positions in the third depth maps of the corresponding viewing angles; to obtain the depth values at those pixel positions in the third depth maps and inversely map them to the corresponding pixel positions in the to-be-processed depth map and in each second depth map, obtaining the mapped pixel positions of the third depth map of each corresponding viewing angle in the to-be-processed depth map and in each second depth map; and then to calculate, for the to-be-processed depth map and each second depth map, the pixel distance between the actual pixel position of the pixel at the corresponding position and the mapped pixel position obtained by inverse mapping from the third depth map of the corresponding viewing angle, and determine the confidence values of the pixels at the corresponding positions based on the distribution interval of the calculated pixel distances.
• in a specific implementation, the second confidence value determining component 12312 is adapted to match the pixel at the corresponding position in the to-be-processed depth map and in each second depth map against the depth values of the pixels in the preset area surrounding it in the depth map where it is located, and to determine the confidence values of those pixels based on the matching degrees of the depth values and on the number of pixels whose matching degree meets the preset pixel matching degree threshold.
• in another specific implementation, the second confidence value determining component 12312 matches the pixel at the corresponding position in the to-be-processed depth map and in each second depth map against the weighted average of the depth values of the pixels in the preset area surrounding it in the depth map where it is located, and determines the confidence value of that pixel based on the matching degree between the pixel and the corresponding weighted average.
  • the weight value of the window filter coefficient may further include at least one of the following: a second filter coefficient weight value corresponding to the frame distance, and a third filter coefficient weight value corresponding to the pixel similarity.
  • the window filter coefficient value obtaining unit 123 may further include at least one of the following:
  • the second filter coefficient weight value obtaining subunit 1232 is adapted to obtain the frame distance between each video frame in the video frame sequence and the current video frame, and determine the second filter coefficient weight value corresponding to the frame distance;
  • the third filter coefficient weight value obtaining subunit 1233 is adapted to obtain the similarity value of the pixel corresponding to the position of the texture map corresponding to each second depth map and the texture map corresponding to the depth map to be processed, and to determine the similarity Value corresponding to the third filter coefficient weight value.
• the filtering unit 124 is adapted to take the product, or the weighted average, of the first filter coefficient weight value and at least one of the second filter coefficient weight value and the third filter coefficient weight value as the window filter coefficient value corresponding to each video frame, and to calculate the weighted average of the products of the depth values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map and the window filter coefficient values corresponding to the respective video frames, obtaining the filtered depth value of the pixel at the corresponding position in the to-be-processed depth map.
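Pulling the pieces together, the following is a minimal per-pixel sketch of the filtering computation just described: the window filter coefficient of each frame is taken as the product of the confidence, frame-distance and similarity weights, and the filtered depth is the coefficient-weighted average of the depth values at the corresponding position across the window. Normalising by the sum of coefficients and the fallback for an all-zero window are assumptions of this sketch.

```python
def temporal_filter_pixel(window_depths, weight_c, weight_d, weight_s):
    """Filtered depth value of one pixel of the to-be-processed depth map.

    All four lists run over the 2N+1 frames of the preset window and refer to the
    same pixel position in the same-viewing-angle depth maps:
      window_depths  depth values of the pixel in each frame,
      weight_c       first filter coefficient weights (pixel confidence),
      weight_d       second filter coefficient weights (frame distance),
      weight_s       third filter coefficient weights (texture similarity)."""
    coeffs = [c * d * s for c, d, s in zip(weight_c, weight_d, weight_s)]
    total = sum(coeffs)
    if total == 0:
        return window_depths[len(window_depths) // 2]   # fall back to the current frame
    return sum(w * z for w, z in zip(coeffs, window_depths)) / total

# Usage: a 5-frame window (N = 2); the third entry is the current frame.
depths = [10.2, 10.1, 10.0, 13.5, 10.1]
w_conf = [1.0, 1.0, 1.0, 0.0, 1.0]      # the outlier frame gets zero confidence
w_dist = [0.6, 0.8, 1.0, 0.8, 0.6]
w_sim  = [0.9, 1.0, 1.0, 0.7, 0.9]
print(round(temporal_filter_pixel(depths, w_conf, w_dist, w_sim), 3))
```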
  • the embodiments of this specification also provide a video reconstruction system, which can be used to perform video reconstruction, which can improve the image quality of the reconstructed video.
  • the video reconstruction system 130 includes: an acquisition module 131, a filtering module 132, a selection module 133, and an image reconstruction module 134, wherein:
• the acquisition module 131 is adapted to acquire the image combinations of the multi-angle free-view video frames, the parameter data corresponding to the image combinations of the video frames, and the virtual viewpoint position information based on user interaction, where the image combination of a video frame includes multiple groups of texture maps and depth maps with corresponding relationships that are synchronized at multiple angles;
  • the filtering module 132 is adapted to filter the depth map in the video frame
• the selection module 133 is adapted to select, according to the virtual viewpoint position information and the parameter data corresponding to the image combination of the video frame and according to a preset rule, the texture maps and the filtered depth maps of the corresponding groups in the image combination of the video frame at the moment of user interaction;
• the image reconstruction module 134 is adapted to, based on the virtual viewpoint position information and the parameter data corresponding to the texture maps and depth maps of the corresponding groups in the image combination of the video frame at the moment of user interaction, combine and render the selected texture maps and filtered depth maps of the corresponding groups in the image combination of the video frame at the moment of user interaction, obtaining the reconstructed image corresponding to the virtual viewpoint position at the moment of user interaction;
  • the filtering module 132 may include:
• the depth map acquiring unit 1321 is adapted to acquire the to-be-processed depth map from the image combination of the current video frame of the multi-angle free viewing angle, where the image combination of the current video frame of the multi-angle free viewing angle includes multiple groups of texture maps and depth maps with corresponding relationships that are synchronized at multiple angles;
  • the frame sequence obtaining unit 1322 is adapted to obtain a video frame sequence including a preset window in the time domain of the current video frame;
• the window filter coefficient value obtaining unit 1323 is adapted to obtain a window filter coefficient value corresponding to each video frame in the video frame sequence, the window filter coefficient value being generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to pixel confidence. The window filter coefficient value obtaining unit 1323 includes a first filter coefficient weight value obtaining subunit 13231, adapted to obtain the confidence values of the pixels at the corresponding positions in the to-be-processed depth map and in each second depth map and to determine the first filter coefficient weight value corresponding to the confidence value, where the second depth map is the depth map in each video frame of the video frame sequence that has the same viewing angle as the to-be-processed depth map;
  • the filtering unit 1324 is adapted to filter the pixels corresponding to the positions in the depth map to be processed based on the corresponding window filter coefficient values of each video frame to obtain the pixels corresponding to the positions in the depth map to be processed The filtered depth value.
  • the window filter coefficient value obtaining unit 1323 further includes at least one of the following:
  • the second filter coefficient weight value obtaining subunit 13232 is adapted to obtain the frame distance between each video frame in the video frame sequence and the current video frame, and determine the second filter coefficient weight value corresponding to the frame distance;
  • the third filter coefficient weight value obtaining subunit 13233 is adapted to obtain the similarity value of the pixel corresponding to the position of the texture image corresponding to each second depth map and the texture image corresponding to the depth image to be processed, and to determine the similarity Value corresponding to the third filter coefficient weight value.
• for the specific implementation of the filtering module 132, reference may be made to FIG. 12; the depth map processing device shown in FIG. 12 can be used as the filtering module 132 to perform the temporal filtering, and reference may also be made to the depth map processing device and the depth map processing method of the foregoing embodiments.
• in a specific implementation, the depth map processing device may be implemented by corresponding software, hardware, or a combination of software and hardware. For example, the calculation of the weight values of the filter coefficients can be carried out by one or more CPUs or GPUs, or by the CPU and GPU in coordination, and the CPU can communicate with one or more GPU chips or GPU modules and control each GPU chip or GPU module to perform the filtering processing of the depth map.
  • the embodiment of this specification also provides an electronic device.
  • the electronic device 140 may include a memory 141 and a processor 142.
• when running the computer instructions, the processor 142 can execute the steps of the depth map processing method described in any of the foregoing embodiments or of the video reconstruction method described in any of the foregoing embodiments; for the specific steps, refer to the introduction of the foregoing embodiments, which is not repeated here.
  • the processor 142 may specifically include a CPU chip 1421 formed by one or more CPU cores, or may include a GPU chip 1422, or a chip module composed of the CPU chip 1421 and the GPU chip 1422.
  • the processor 142 and the memory 141 may communicate through a bus or the like, and each chip may also communicate through a corresponding communication interface.
• the embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored; when the computer instructions are run, the steps of the depth map processing method described in any of the foregoing embodiments or of the video reconstruction method described in any of the foregoing embodiments can be executed. For the specific steps, refer to the introduction of the foregoing embodiments, which is not repeated here.
• the server cluster 13 in the cloud can first use the solution of the embodiments of this specification to perform temporal filtering on the depth maps, and then perform image reconstruction based on the texture maps and the filtered depth maps of the corresponding groups in the image combination of the video frame to obtain the reconstructed multi-angle free-view image.
  • the cloud server cluster 13 may include: a first cloud server 131, a second cloud server 132, a third cloud server 133, and a fourth cloud server 134.
  • the first cloud server 131 may be used to determine the parameter data corresponding to the image combination;
  • the second cloud server 132 may be used to determine the depth data of each frame of the image in the image combination;
• the third cloud server 133 may be used to reconstruct frame images for a preset virtual viewpoint path based on the parameter data corresponding to the image combination and on the pixel data and depth data of the image combination, using a depth image based rendering (DIBR) algorithm;
  • the fourth cloud server 134 may be used to generate multi-angle free-view video, where the multi-angle free-view video data may include: multi-angle free-view spatial data and multi-angle free-view time data of frame images sorted by frame time.
• the first cloud server 131, the second cloud server 132, the third cloud server 133, and the fourth cloud server 134 may also be server groups composed of server arrays or server sub-clusters, which is not limited in the embodiments of this specification.
• the second cloud server 132 may obtain a depth map from the image combination of the current multi-angle free-view video frame as the to-be-processed depth map, and perform temporal filtering on the to-be-processed depth map using the solution of the embodiments of this specification, which can improve the stability of the depth map in the time domain.
• when the depth map filtered in the time domain is then used for video reconstruction, whether on the playback terminal 15 or on the interactive terminal 16, the image quality of the video reconstruction can be improved.
  • the collection device can also be set in the ceiling area of the basketball stadium, on the basketball stand, and so on.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle, a matrix, or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific site environment, the number of acquisition devices, the characteristics of the acquisition devices, and the requirements for imaging effects.
  • the collection device may be any device with a camera function, for example, a common camera, a mobile phone, a professional camera, and the like.
  • each collection device in the collection array 11 can transmit the obtained video data stream to the data processing device 12 in real time via a switch 17 or a local area network.
  • the data processing device 12 can be placed in an on-site non-collection area or in the cloud according to a specific scenario, and the server (cluster) and playback control device can be placed in an on-site non-collection area, cloud or terminal access according to the specific scenario.
  • this embodiment is not used to limit the specific implementation and protection scope of the present invention.
• for the working principles, specific functions, and effects of each device, unit, or system in the embodiments of this specification, refer to the specific introduction in the corresponding method embodiments.
• the solutions of the above embodiments are applicable to live or quasi-live broadcast scenarios, but are not limited thereto; the solutions of the embodiments of this specification, covering video or image collection, data processing of the video data streams, and server-side image generation, can also be applied to playback requirements of non-live scenarios, such as recorded broadcast and rebroadcast scenarios that have lower requirements on latency.


Abstract

A depth map processing method, a video reconstruction method, and related devices. The method includes: acquiring a to-be-processed depth map from the image combination of a current video frame of a multi-angle free viewing angle, the image combination of the current video frame of the multi-angle free viewing angle including multiple groups of texture maps and depth maps with corresponding relationships that are synchronized at multiple angles; acquiring a video frame sequence of a preset window in the time domain that contains the current video frame; acquiring a window filter coefficient value corresponding to each video frame in the video frame sequence, the window filter coefficient value being generated from weight values of at least two dimensions, including a first filter coefficient weight value corresponding to pixel confidence; and, based on the window filter coefficient values corresponding to the respective video frames, filtering the pixels at the corresponding positions in the to-be-processed depth map according to a preset filtering method to obtain the filtered depth values of those pixels. The above solution can improve the stability of the depth map in the time domain.

Description

深度图处理方法、视频重建方法及相关装置
本申请要求2020年04月20日递交的申请号为202010312853.4、发明名称为“深度图处理方法、视频重建方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本说明书实施例涉及视频处理技术领域,尤其涉及一种深度图处理方法、视频重建方法及相关装置。
背景技术
6自由度(6 Degree of Freedom,6DoF)技术是为了提供高自由度观看体验的一种技术,用户可以在观看中通过交互操作,来调整观看的视角,从而可以从想观看的自由视点角度进行观看。
在大范围的场景中,比如体育比赛,通过基于深度图的图像绘制(Depth Image Based Rendering,DIBR)技术来实现高自由度的观看是一种具有很大潜力和可行性的方案。相比于点云重建方案对于重建的视点质量和稳定性的不足,DIBR技术在重建的视点质量上已经可以接近原始采集的视点的质量。
在DIBR方案中,深度图在时域上的稳定性对于最终的重建图像的质量具有重要的影响。
发明内容
有鉴于此,为提高深度图在时域上的稳定性,本说明书实施例的一个方面,提供一种深度图处理方法及相关装置。
为提高重建视频的图像质量,本说明书实施例的另一方面,提供一种视频重建方法及相关装置。
首先,本说明书实施例提供了一种深度图生成方法,包括:
从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,采用如下方式获取所述第一滤波系数权重值:获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
基于所述各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
可选地,所述窗口滤波系数的权重值还包括:帧距对应的第二滤波系数权重值和像素相似度对应的第三滤波系数权重值其中至少一种;采用如下方式获取所述第二滤波系数权重值和第三滤波系数权重值:
获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值;
可选地,所述获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括以下至少一种:
获取所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图,得到对应视角的第三深度图,基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值;
基于所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
可选地,所述基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括:
获取所述待处理深度图对应的纹理图和各第二深度图对应的纹理图,根据所述待处理深度图中和各第二深度图中位置相应的像素的深度值,将所述待处理深度图对应的纹理图中和各第二深度图对应的纹理图中相应位置的纹理值分别映射到各对应视角的第三深度图对应的纹理图中的相应位置,得到各对应视角的第三深度图对应的映射纹理值;
将所述映射纹理值分别与各对应视角的第三深度图对应的纹理图中的相应位置的实 际纹理值进行匹配,基于各对应视角的第三深度图对应的纹理值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
可选地,所述基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括:
将所述待处理深度图中和各第二深度图中位置相应的像素映射到各对应视角的第三深度图上,得到所述各对应视角的第三深度图中相应位置像素的映射深度值;
将所述各对应视角的第三深度图中相应位置像素的映射深度值分别与所述各对应视角的第三深度图中相应位置像素的实际深度值进行匹配,基于各对应视角的第三深度图对应的深度值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
可选地,所述基于所述各对应视角的第三深度图,确定所述待处理深度图和各第二深度图中位置相应的像素的置信度值,包括:
分别获取所述待处理深度图中和各第二深度图中位置相应的像素的深度值,根据所述深度值,分别映射到对应视角的第三深度图中相应像素位置,获取并根据对应视角的第三深度图中相应像素位置的深度值,反映射到所述待处理深度图中和各第二深度图中相应的像素位置,得到各对应视角的第三深度图在所述待处理深度图中和各第二深度图中对应的映射像素位置;
分别计算所述待处理深度图中和各第二深度图中相应位置的像素的实际像素位置与对应视角的第三深度图反映射得到的映射像素位置的像素距离,基于计算得到的各像素距离的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
可选地,所述基于所述待处理深度图和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括以下至少一种:
将所述待处理深度图中和各第二深度图中位置相应的像素分别与所述像素所处深度图中周围预设区域内的像素的深度值进行匹配,基于所述深度值的匹配度以及匹配度满足预设像素匹配度阈值的像素的数量,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值;
将所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内的像素的深度值的加权平均值进行匹配,基于所述待处理深度图中和各 第二深度图中位置相应的像素与对应的加权平均值的匹配度,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
可选地,所述基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素的深度值进行滤波,得到所述待处理深度图位置相应的像素滤波后的深度值,包括:
将所述第一滤波系数权重值与所述第二滤波系数权重值和所述第三滤波系数权重值至少其中之一的乘积,或者加权平均值作为各视频帧相应的窗口滤波系数值;
计算所述待处理深度图中和各第二深度图中位置相应的像素的深度值与各视频帧相应的窗口滤波系数值之积的加权平均值,得到所述待处理深度图中位置相应的像素滤波后的深度值。
可选地,所述当前视频帧位于所述视频帧序列的中间位置。
本说明书实施例提供了另一种深度图处理方法,包括:
从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,采用如下方式获取所述第一滤波系数权重值:获取所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图,得到对应视角的第三深度图,基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值;以及确定所述置信度值对应的第一滤波系数权重值;
基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
本说明书实施例还提供了一种视频重建方法,包括:
获取多角度自由视角的视频帧的图像组合、所述视频帧的图像组合对应的参数数据以及基于用户交互的虚拟视点位置信息,其中,所述视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
采用前述任一实施例所述深度图处理方法得到滤波后的深度图;
根据所述虚拟视点位置信息及所述视频帧的图像组合对应的参数数据,按照预设规 则选择用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图;
基于所述虚拟视点位置信息及用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图对应的参数数据,将选择的用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图进行组合渲染,得到所述用户交互时刻虚拟视点位置对应的重建图像。
本说明书实施例还提供了一种深度图处理装置,包括:
深度图获取单元,适于从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
帧序列获取单元,适于获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
窗口滤波系数值获取单元,适于获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,所述窗口滤波系数值获取单元包括:第一滤波系数权重值获取子单元,适于获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
滤波单元,适于基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
可选地,所述窗口滤波系数值获取单元还包括如下至少一种:
第二滤波系数权重值获取子单元,适于获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
第三滤波系数权重值获取子单元,适于获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
本说明书实施例还提供了一种视频重建系统,包括:
获取模块,适于获取多角度自由视角的视频帧的图像组合、所述视频帧的图像组合对应的参数数据以及基于用户交互的虚拟视点位置信息,其中,所述视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
滤波模块,适于对所述视频帧中的深度图进行滤波;
选择模块,适于根据所述虚拟视点位置信息及所述视频帧的图像组合对应的参数数据,按照预设规则选择用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图;
图像重建模块,适于基于所述虚拟视点位置信息及用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图对应的参数数据,将选择的用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图进行组合渲染,得到所述用户交互时刻虚拟视点位置对应的重建图像;
其中,所述滤波模块包括:
深度图获取单元,适于从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
帧序列获取单元,适于获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
窗口滤波系数值获取单元,适于获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,所述窗口滤波系数值获取单元包括:第一滤波系数权重值获取子单元,适于获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
滤波单元,适于基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
可选地,所述窗口滤波系数值获取单元还包括如下至少一种:
第二滤波系数权重值获取子单元,适于获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
第三滤波系数权重值获取子单元,适于获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
本说明书实施例还提供了一种电子设备，包括存储器和处理器，所述存储器上存储有可在所述处理器上运行的计算机指令，所述处理器运行所述计算机指令时执行前述任一实施例所述方法的步骤。
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时执行前述任一实施例所述方法的步骤。
与现有技术相比,本说明书实施例的技术方案具有以下有益效果:
采用本说明书实施例的深度图处理方案,从多角度自由视角的当前视频帧的图像组合获取待处理深度图在时域上进行滤波,对于时域上预设窗口的视频帧序列中各视频帧中与所述待处理深度图视角相同的深度图,即第二深度图,通过获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,并基于所述第一滤波系数权重值生成窗口滤波系数值,基于所述各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值,可以避免引入所述待处理深度图中和各第二深度图中不可靠的深度值对滤波结果造成影响,从而可以提高深度图在时域上的稳定性。
采用本说明书实施例的视频重建方案,其中对于视频帧中的深度图在时域上进行滤波,对于时域上预设窗口的视频帧序列中各视频帧中与所述待处理深度图视角相同的深度图,即第二深度图,通过获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,并将所述第三滤波系数权重值加入到窗口滤波系数值,可以避免引入所述待处理深度图中和各第二深度图中不可靠的深度值对滤波结果造成影响,从而可以提高深度图在时域上的稳定性,进而可以提高重建视频的图像质量。
附图说明
图1是本说明书实施例的深度图处理方法适用的一具体应用场景中的数据处理系统的架构示意图。
图2是本说明书实施例中一种多角度自由视角数据生成过程的示意图。
图3是本说明书实施例中一种用户侧对6DoF视频数据处理的示意图。
图4是本说明书实施例中一种视频重建系统的输入和输出示意图。
图5是本说明书实施例中一种深度图处理方法的流程图。
图6是本说明书实施例中一种应用场景中视频帧序列示意图。
图7是本说明书实施例中一种获取待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法流程图。
图8是本说明书实施例中另一种获取待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法流程图。
图9是本说明书实施例中另一种获取待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法流程图。
图10是本说明书实施例中一种确定待处理深度图中和各第二深度图中位置相应的像素的置信度值的场景示意图。
图11是本说明书实施例中一种视频重建方法的流程图。
图12是本说明书实施例中一种深度图处理装置的结构示意图。
图13是本说明书实施例中一种视频重建系统的结构示意图。
图14是本说明书实施例中一种电子设备的结构示意图。
具体实施方式
在传统的视频播放场景中,例如体育比赛的播放视频,用户在观看过程中往往只能通过一个视点位置观看比赛,无法自己自由切换视点位置,来观看不同视角位置处的比赛画面或比赛过程,也就无法体验在现场一边移动视点一边看比赛的感觉。
采用6自由度(6 Degree of Freedom,6DoF)技术可以提供高自由度观看体验,用户可以在观看过程中通过交互手段,来调整视频观看的视角,从想观看的自由视点角度进行观看,从而大幅度的提升观看体验。
为实现6DoF场景,目前有Free-D回放技术及基于深度图的DIBR技术等。其中,Free-D回放技术是通过多角度拍摄获取场景的点云数据对6DoF图像进行表达,并基于点云数据进行6DoF图像或视频的重建。而基于深度图的6DoF视频生成方法是基于所述虚拟视点位置及对应组的纹理图和深度图对应的参数数据,将用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图进行组合渲染,进行6DoF图像或视频的重建。
相比于点云重建方案对于重建的视点质量和稳定性的不足,DIBR技术在重建的视点质量上已经可以接近原始采集的视点的质量。其中,为提升重建的视点质量,DIBR过程中会对深度图在时域上进行滤波,以提升深度图重建的时域稳定性。
然而,发明人发现,在某些情况下滤波后所生成的深度图质量反而下降。为此,发明人进行了进一步深入的研究和实验,发现时域上参与滤波的深度图中像素的深度值本身并不总是可靠的,在不可靠的深度值加入滤波后,反而导致滤波后最终所生成的深度图的质量下降。
参见图1所示的一种具体应用场景中的数据处理系统的结构示意图,其中示出了一场篮球赛的数据处理系统的布置场景,数据处理系统10包括由多个采集设备组成的采集阵列11、数据处理设备12、云端的服务器集群13、播放控制设备14,播放终端15和交互终端16。采用数据处理系统10,可以实现多角度自由视角视频的重建,用户可以观看低时延的多角度自由视角视频。
参照图1,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列11中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。
而为了不影响采集设备工作,所述数据处理设备12可以置于现场非采集区域,可视为现场服务器。所述数据处理设备12可以通过无线局域网向所述采集阵列11中各采集设备分别发送拉流指令,所述采集阵列11中各采集设备基于所述数据处理设备12发送的拉流指令,将获得的视频数据流实时传输至所述数据处理设备12。
当所述数据处理设备12接收到视频帧截取指令时,从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧的帧图像,并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群13。
相应地,云端的服务器集群13将接收的多个同步视频帧的帧图像作为图像组合,确定所述图像组合相应的参数数据及所述图像组合中各帧图像的深度数据,并基于所述图像组合相应的参数数据、所述图像组合中预设帧图像的像素数据和深度数据,对预设的虚拟视点路径进行帧图像重建,获得相应的多角度自由视角视频数据,所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角时间数据。
在具体实施中,云端的服务器集群13可以采用如下方式存储所述图像组合的像素数据及深度数据:
基于所述图像组合的像素数据及深度数据,生成对应帧时刻的拼接图像,所述拼接图像包括第一字段和第二字段,其中,所述第一字段包括所述图像组合中预设帧图像的像素数据,所述第二字段包括所述图像组合中预设帧图像的深度数据的第二字段。获取的拼接图像和相应的参数数据可以存入数据文件中,当需要获取拼接图像或参数数据时,可以根据数据文件的头文件中相应的存储地址,从相应的存储空间中读取。
然后,播放控制设备14可以将接收到的所述多角度自由视角视频数据插入待播放数 据流中,播放终端15接收来自所述播放控制设备14的待播放数据流并进行实时播放。其中,播放控制设备14可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本说明书实施例中的一种播放控制设备。
当云端的服务器集群13收到的来自交互终端16的图像重建指令时,可以提取所述相应图像组合中预设帧图像的拼接图像及相应图像组合相应的参数数据并传输至所述交互终端16。
交互终端16基于触发操作,确定交互帧时刻信息,向服务器集群13发送包含交互帧时刻信息的图像重建指令,接收从云端的服务器集群13返回的对应交互帧时刻的图像组合中预设帧图像的拼接图像及对应的参数数据,并基于交互操作确定虚拟视点位置信息,按照预设规则选择所述拼接图像中相应的像素数据和深度数据及对应的参数数据,将选择的像素数据和深度数据进行组合渲染,重建得到所述交互帧时刻虚拟视点位置对应的多角度自由视角视频数据并进行播放。
通常而言,视频中的实体不会是完全静止的,例如采用上述数据处理系统,在篮球比赛过程中,采集阵列采集到的实体如运动员、篮球、裁判员等大都处于运动状态。相应地,采集到的视频帧的图像组合中的纹理数据和像素数据均也随着时间变化而不断地变动。
为了提高所生成的多角度自由视角视频图像的质量,针对上述问题,云端的服务器集群13可以对生成多角度自由视角视频的深度图进行时域滤波。例如,对于待处理的深度图,可以基于所述待处理深度图的纹理图与时域上所述待处理深度图具有相同视角的深度图的纹理图的相似度,设置相应的滤波系数进行时域滤波。
发明人发现,实际采集的所述待处理深度图中的像素值,或者与所述待处理深度图具有相同视角的深度图中的像素值,有可能是错误的,例如在所述视角下某些实体被遮挡住了,因此,参与滤波的深度图中像素的深度值本身可能是不可靠的,将不可靠的深度值加入滤波后,反而会导致滤波后最终所生成的深度图的质量下降。针对上述问题,本说明书实施例的深度图处理方案中,对于时域上预设窗口的视频帧序列中当前视频帧中的待处理深度图,以及各视频帧中与当前视频帧中待处理深度图视角相同的深度图,考虑其中位置相应像素的置信度值,并将与所述置信度值对应的滤波系数权重值加入时域上所述预设窗口的滤波系数值,从而可以避免预设窗口的视频帧序列中所述待处理深度图和各视频帧中与当前视频帧中待处理深度图相同视角的深度图中的不可靠深度值对 滤波结果造成影响,从而可以提高深度图在时域上的稳定性。
为使本领域技术人员更好地理解和实现本说明书实施例中的深度图处理方案和视频重建方案,以下首先对基于DIBR得到6DoF视频的原理进行简要介绍。
首先,可以通过采集设备获取视频数据或图像数据,并进行深度图计算,主要包括三个步骤,分别为:多摄像机的视频采集(Multi-camera Video Capturing),摄像机内外参计算(Camera Parameter Estimation),以及深度图计算(Depth Map Calculation)。对于多摄像机采集来说,要求各个摄像机采集的视频可以帧级对齐。结合参考图2,通过多摄像机的视频采集(步骤S21)可以得到纹理图(Texture Image)21;通过摄像机内外参计算(步骤S22),可以得到摄像机参数(Camera Parameter)22,也即后文中的参数数据,包括摄像机的内部参数数据和外部参数数据;通过深度图计算(步骤S23),可以得到深度图(Depth Map)23。在以上三个步骤完成后,就得到了从多摄像机采集来的纹理图、所有的摄像机参数以及每个摄像机的深度图。可以把这三部分数据称作为多角度自由视角视频数据中的数据文件,也可以称作6自由度视频数据(6DoF video data)。有了6自由度视频数据,用户端就可以根据虚拟的6自由度(Degree of Freedom,DoF)位置,来生成虚拟视点,从而提供6DoF的视频体验。
6DoF视频数据以及指示性数据可以经过压缩和传输到达用户侧,其中,指示性数据也可以称作元数据(Metadata)。用户侧可以根据接收到的数据,获取用户侧6DoF表达,也即6DoF视频数据和前述的元数据,进而在用户侧进行6DoF渲染。
其中,元数据可以用来描述6DoF视频数据的数据模式,具体可以包括:拼接模式元数据(Stitching Pattern metadata),用来指示拼接图像中多个图像的像素数据以及深度数据的存储规则;边缘保护元数据(Padding pattern metadata),可以用于指示对拼接图像中进行边缘保护的方式,以及其它元数据(Other metadata)。元数据可以存储于数据头文件。
结合参考图3,基于6DoF视频数据(其中包括摄像机参数31,纹理图和深度图32,以及元数据33),除此之外,还有用户端的交互行为数据34。通过这些数据,可以采用基于深度图渲染(DIBR,Depth Image-Based Rendering)的6DoF渲染(步骤S30),从而在一个特定的根据用户行为产生的6DoF位置35产生虚拟视点的图像,也即根据用户指示,确定与该指示对应的6DoF位置的虚拟视点。
在本说明书实施例所采用的视频重建系统或DIBR应用软件中,可以接收摄像机参数、纹理图、深度图,以及虚拟摄像机的6DoF位置作为输入,同时输出在虚拟6DoF 位置的生成纹理图以及深度图。虚拟摄像机的6DoF位置即前述的根据用户行为确定的6DoF位置。所述DIBR应用软件可以是实现本说明书实施例中基于虚拟视点的图像重建的软件。
在本说明书实施例所采用的一DIBR软件中,结合参考图4,DIBR软件40可以接收摄像机参数41、纹理图42、深度图43,以及虚拟摄像机的6DoF位置数据44作为输入,可以通过生成纹理图S41的步骤和生成深度图的步骤S42生成在虚拟6DoF位置的纹理图以及深度图,并同时输出所生成的纹理图及深度图。
生成在虚拟6DoF位置的纹理图及深度图之前,可以对对输入的深度图进行处理,例如在时域上进行滤波。
以下参照附图,通过具体实施例对本说明书实施例中采用的可以提高深度图在时域上的稳定性的深度图处理方法进行详细介绍。
参照图5所示的深度图处理方法的流程图,具体可以采用如下步骤对深度图进行滤波处理:
S51,从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图。
S52,获取包含所述当前视频帧的时域上预设窗口的视频帧序列。
在具体实施中,在获取到包含待处理深度图的当前视频帧后,可以获取包含所述当前视频帧的时域上预设窗口的视频帧序列。如图6所示的视频帧序列示意图,设视频序列中的第T帧为当前视频帧,在时域上预设窗口大小D等于2N+1帧,且当前视频帧处于时域上预设窗口截取的视频帧序列的中间位置,则可以得到从第T-N帧至第T+N帧共2N+1帧的视频帧序列。
可以理解的是,在具体实施中,当前视频帧也可以不位于预设窗口的视频帧序列的中间位置。
需要说明的是,所述时域上预设窗口的大小可以根据滤波精度要求并兼顾处理资源需求,根据经验进行设置。在本说明书一实施例中,窗口大小D为5个视频帧,即2N+1=5,N=2,在本说明书其他实施例中,N也可以为3或4等其他取值,具体取值可以选择不同的数值,并根据最终的滤波效果进行确定。并且,所述时域上预设窗口的大小可以根据当前帧在整个视频流中的位置进行调整。
在具体实施中,对于在整个视频流中的前N个视频帧中的深度图,可以不进行滤波 处理,即从第N+1帧开始进行滤波,在本说明书实施例中,T大于N。
S53,获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值。
在具体实施中,可以采用如下方式获取所述第一滤波系数权重值:
获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图。
在具体实施中,评估所述待处理深度图中和各第二深度图中像素的置信度,获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法有多种。
例如,可以获取所述预设窗口内各视频帧中所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图,得到对应视角的第三深度图,基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
又如,可以基于所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
下文中将通过具体应用场景详细描述具体如何获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在本说明书实施例中，可以预设置信度值与第一滤波系数权重值的对应关系。其中，置信度值c越大，所对应的第一滤波系数权重值Weight_c越大；置信度值c越小，所对应的第一滤波系数权重值Weight_c越小，二者呈正相关关系。
S54,基于所述各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
采用上述实施例方案,对于时域上预设窗口的视频帧序列中各视频帧中与所述待处理深度图视角相同的深度图,即第二深度图,通过获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,并基于所述第一滤波系数权重值生成窗口滤波系数值,基于所述各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值,可以避免引入所述待处理深度图中和各第二深度图中不可靠的深度值对滤波结果造成影响,从而可以提高深度图在时域 上的稳定性。
在本说明书实施例中,为了提高深度图在时域上的稳定性,如前实施例所述,所述窗口滤波系数值由至少两个维度的权重值生成,且其中一个维度的权重值为像素置信度对应的第一滤波系数权重值。为使本领域技术人员更好地理解和实现本说明书实施例,以下对选取的生成所述窗口滤波系数值的其他维度的权重值通过具体实施例进行示例说明。
可以理解的是,除了以下示例维度的权重值,所述窗口滤波系数值还可以基于所述第一滤波系数权重值和其他一个或多个维度的滤波系数权重值生成,或者,所述窗口滤波系数权重值可以通过包括第一滤波系数权重值、如下至少一个维度的滤波系数权重值以及其他维度的滤波系数权重值生成。
示例维度一:帧距对应的第二滤波系数权重值
具体而言,获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值。
在具体实施中,帧距可以以视频帧序列中帧位置的差值表示,也可以以视频帧序列中相应视频帧之间的时间间隔为单位。由于通常帧序列中帧与帧之间是等间隔分布的,为便于运算,这里选择以视频帧序列中帧位置的差值表示。继续参照图6,例如,第T-1帧和第T+1帧与当前视频帧(第T帧)之间的帧距为1帧,第T-2帧和第T+2帧与当前视频帧(第T帧)之间的帧距为2帧,以此类推,第T-N帧和第T+N帧与当前视频帧(第T帧)之间的帧距为N帧。
在具体实施中,可以预设设置帧距d与对应的第二滤波系数权重值Weight_d的对应关系。其中,帧距d越小,所对应的第二滤波系数权重值Weight_d越大;帧距d越大,所对应的第二滤波系数权重值Weight_d越小,二者呈反相关关系。
示例维度二:像素相似度对应的第三滤波系数权重值
具体而言,可以获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
以下继续结合图6进行说明,例如,获取当前视频帧第T帧中视角M对应的深度图T M为待处理深度图,将窗口[T-N,T+N]内视角M对应的深度图(T-N) M…(T-2) M、(T-1) M、(T+1) M…(T+2) M、(T+N) M依次作为第T-N帧…第T-2帧、第T-1帧、第T+1帧…第T+2帧、第T+N帧与所述待处理深度图T M视角相同的深度图,即所述视频帧序列中当前视频帧T之外的各视频帧与所述待处理深度图T M对应的第二深度图。
继续参照图6,对于第T帧视角M的深度图T M对应的纹理图中的任一位置(x,y)对应的像素,为描述方便,称为第一像素,所述第一像素的纹理值表示为Pixel(x1,y1),可以获得各第二深度图对应的纹理图中与所述第一像素位置相应的像素的纹理值Color(x1’,y1’),进而可以获取各第二深度图对应的纹理图中与位置相应的像素的纹理值Color(x1’,y1’)相对于所述第一像素的纹理值Color(x1,y1)的相似度值,以及确定所述相似度值s对应的第三滤波系数权重值Weight_s。
在具体实施中,可以预设相似度值s与对应的第三滤波系数权重值Weight_s的对应关系。其中,相似度值s越大,所对应的第三滤波系数权重值Weight_s越小;相似度值s越小,所对应的第三滤波系数权重值Weight_s越大,二者呈反相关关系。
对于如何生成本说明书实施例中的窗口滤波系数值，除了基于像素置信度对应的第一滤波系数权重值外，若同时考虑上述两个示例维度的滤波系数权重值：帧距对应的第二滤波系数权重值和像素相似度对应的第三滤波系数权重值，在本说明书一些实施例中，可以将所述第一滤波系数权重值Weight_i_c、所述第二滤波系数权重值Weight_i_d和所述第三滤波系数权重值Weight_i_s之积作为各视频帧相应的窗口滤波系数值Weight_i，即：Weight_i = Weight_i_c * Weight_i_d * Weight_i_s（i取T-N至T+N）。之后，可以计算所述待处理深度图中和各第二深度图中位置相应的像素的深度值与各视频帧相应的窗口滤波系数值之积的加权平均值，得到所述待处理深度图中位置相应的像素滤波后的深度值。
可以理解的是,在具体实施中,也可以将所述第一滤波系数权重值与所述第二滤波系数权重值和所述第三滤波系数权重值其中之一的乘积,或者加权平均值作为各视频帧相应的窗口滤波系数值。之后,计算所述待处理深度图中和各第二深度图中位置相应的像素的深度值与各视频帧相应的窗口滤波系数值之积的加权平均值,得到所述待处理深度图中位置相应的像素滤波后的深度值。
继续结合图6进行说明，对第T帧中视角为M的待处理深度图T_M中的任一像素，为描述方便，称为第二像素，设第二像素的深度值为 $Pixel_T^M(x2,y2)$，可以采用如下公式进行滤波处理，得到第二像素滤波后的深度值 $\widetilde{Pixel}_T^M(x2,y2)$：

$$\widetilde{Pixel}_T^M(x2,y2)=\frac{\sum_{i=T-N}^{T+N}Weight_i\cdot Pixel_i^M(x2,y2)}{\sum_{i=T-N}^{T+N}Weight_i}$$
其中,上述公式中各深度图中与所述第二像素位置对应的像素均用Pixel(x2,y2)表 示,各深度图对应的视角标识和所处帧分别通过Pixel(x2,y2)的上标和下标进行区分。
可以理解的是,在具体实施中,得到窗口滤波系数值的方式并不限于以上实施例,例如,还可以取所述第一滤波系数权重值Weight i_c、第二滤波系数权重值Weight i_d、所述第三滤波系数权重值Weight i_s和三者的算术平均值或者加权平均值,或者其他的权重分配方式得到所述窗口滤波系数值。
采用以上方式对当前视频帧中待处理深度图中各像素分别进行滤波,在具体实施过程中,可以对待处理深度图中各像素依次进行滤波,为提高滤波速度,也可以对待处理深度图中各像素采用上述实施例方式并行地进行滤波处理,或者分批次对多个像素批量进行滤波处理。
采用上述实施例中的深度图处理方法,在对待处理深度图进行滤波过程中,不但考虑到了时域上预设窗口内各视频帧与所述待处理深度图视角相同的深度图(即第二深度图)与所述待处理深度图的帧距,和/或各与所述待处理深度图视角相同的深度图(即第二深度图)对应的纹理图相对应位置像素对应的纹理值的相似度,而且考虑所述待处理深度图中和各视频帧与所述待处理深度图视角相同的深度图(即第二深度图)中相对应像素的置信度,将相对应像素的置信度值加入到窗口滤波系数权重值,从而可以避免在时域上引入不可靠的深度值(包括所述待处理深度图和各第二深度图中不可靠的深度值)对滤波结果造成影响,故可以提高深度图在时域上的稳定性。
为使本领域技术人员更好地理解和实施本说明书实施例,以下通过一些具体实施例详细描述如何获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
方式一,基于所述待处理深度图中和各第二深度图对应视角周围预设视角范围的深度图,确定所述待处理深度图和各第二深度图中位置相应的像素的置信度值。
具体而言,可以获取各视频帧中所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图,得到对应视角的第三深度图,基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应位置的像素的置信度值。
参照图6,对于当前视频帧中视角为M的待处理深度图,各视频帧中所述第二深度图视角也为M,在具体实施中,可以获取视角为M的各深度图(包括待处理深度图和各第二深度图)对应视角周围预设视角范围[M-K,M+K]内的深度图,为描述方便,可以称为第三深度图。例如,具体可以以视角M为中心向两侧分别辐射15°、30°、45°、60°等等。可以理解的是,所述取值仅为示例性说明,并不用于限定本发明的范围,具体取值与各视频帧的图像组合中对应的图像组合的视点密度相关,视点密度越高,取值范围 可以越小,视点密度越低,取值范围可以相应扩大。
在具体实施中,视角范围也可以采用图像组合对应的视点在空间中的分布位置来确定,例如,呈弧线型排布的共40个采集设备同步采集到的纹理图以及由此得到的对应的深度图,M和K可以表示采集设备的位置,例如,M表示从左侧数第10个采集设备的视角,K取3,则可以基于从左数第7个至第13个采集设备的视角,分别基于第7个至第9个采集设备的视角和第11个至第13个采集设备的视角对应的深度图,可以确定第10个采集设备对应的深度图中相应位置的像素的置信度值。
需要说明的是,所述预设视角范围的取值区间也可以不以所述待处理深度图的视角为中心,具体取值可以根据各视频帧中深度图对应的空间位置关系进行确定。例如,可以选择距离各视频帧中对应的视点最近的一个或多个深度图,用于确定所述待处理深度图和各第二深度图中像素的置信度。
方式二,基于所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
需要说明的是,对于方式一,可以有不同的实现形式,以下通过两个示例进行说明,在具体实施中,可以单独采用其中一种方式,或者二者结合使用,或者可以将其中任意一种或其结合进一步与其他方式组合使用,本说明书中示例仅为使本领域技术人员更好地理解和实施本发明,并不用于限定本发明的保护范围。
方式一之示例一:基于纹理图的匹配差异确定像素的置信度
参照图7所示的获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法流程图,具体可以采用如下步骤:
S71,获取所述待处理深度图对应的纹理图和各第二深度图对应的纹理图。
如前实施例所述,由于多角度自由视角视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图,因此可以从所述预设窗口的视频帧序列中当前视频帧和各第二深度图的视频帧的图像组合中获得对应视角的纹理图。参照图6,可以分别获得第T-N帧至第T+N帧视频帧中所有视角为M的深度图对应的纹理图。
S72,根据所述待处理深度图和各第二深度图中位置相应的像素的深度值,将所述待处理深度图对应的纹理图中和各第二深度图对应的纹理图中相应位置的纹理值分别映射到各对应视角的第三深度图对应的纹理图中的相应位置,得到各对应视角的第三深度图对应的映射纹理值。
在具体实施中,可以基于各视频帧的图像组合中不同视角的图像组合的空间位置关系,根据所述待处理深度图和各第二深度图中位置相应的像素的深度值,将所述待处理深度图对应的纹理图中和各第二深度图对应的纹理图中相应位置的纹理值分别映射到各对应视角的第三深度图对应的纹理图中的相应位置,得到各对应视角的第三深度图对应的映射纹理值。
继续结合图6进行说明，对于第T-N帧视频帧中视角M的深度图对应的纹理图的纹理值 $Color_{T-N}^{M}(x,y)$，可以分别映射到第T-N帧视频帧中视角为[M-K,M+K]范围内的多个视角的第三深度图对应的纹理图中的相应位置，得到这多个视角的第三深度图对应的映射纹理值Color'(x,y)，即：

$$Color_{T-N}^{M}(x,y)\;\rightarrow\;Color'^{\,j}_{T-N}(x,y),\quad j\in[M-K,\,M+K],\;j\neq M$$
S73,将所述映射纹理值分别与各对应视角的第三深度图对应的纹理图中的相应位置的实际纹理值进行匹配,基于各对应视角的第三深度图对应的纹理值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
其中,若各对应视角的第三深度图对应的纹理值的匹配度越高,则说明相应的纹理图的差异越小,则对应的待处理深度图中和各第二深度图中像素的深度值的可靠度越高,相应地,置信度值越高。
对于如何基于各对应视角的第三深度图对应的纹理值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,可以有多种实施方式。
在具体实施中,可以基于各对应视角的第三深度图对应的纹理值的匹配度以及满足相应匹配度的第三深度图的数量,综合确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。例如,可以设置当匹配度大于预设的第一匹配度阈值的数量大于预设第一数目阈值,设置对应的置信度值为1,否则设置对应的置信度值为0。类似地,在具体实施中,也可以梯度设置纹理值的匹配度阈值、满足相应匹配度阈值的第三深度图的数量阈值以及置信度值的对应关系。
由上可知,所述待处理深度图中和各第二深度图中位置相应的像素的置信度值可以二值化取值,即取0或1,也可以设置为[0,1]内的任意值或者设置的离散值。
为描述方便,设各对应视角的第三深度图对应的纹理图中的相应位置的实际纹理值为Color_1(x,y),则可以分别将各视频帧的各对应视角的第三深度图对应的映射纹理值Color’(x,y)分别与对应视角的第三深度图对应的纹理图中的相应位置的实际纹理值Color_1(x,y)进行匹配,例如,所述第一匹配度阈值设置为80%。
在本说明书一实施例中,各视频帧中与所述待处理深度图和各第二深度图对应的第三深度图分别为所述待处理深度图和对应的第二深度图的视角两侧30度范围内的第三深度图,在所述待处理深度图和各第二深度图的视角两侧30度范围内,分别存在三个视角的第三深度图。考虑到视角的遮挡,例如若满足所述预设第一匹配度阈值的第三深度图的数量大于或等于2,则确定所述视频帧的第二深度图中位置相应的像素的置信度值为1;若满足所述预设第一匹配度阈值的第三深度图的数量为0,则确定所述视频帧的第二深度图中位置相应的像素的置信度值为1;若满足所述预设第一匹配度阈值的第三深度图的数量大于或等于2,则确定所述视频帧的第二深度图中位置相应的像素的置信度值为0.5。对于所述待处理深度图,同样存在三个视角的第三深度图,所述待处理深度图中位置相应的像素的置信度取值与上述各第二深度图中位置相应的像素的取值判断条件相同。
方式一之示例二:基于深度图的一致性确定像素的置信度
基于深度图的一致性确定像素的置信度,根据深度图映射方向的不同,可以有两种实现方式,一种是将所述待处理深度图中和第二深度图中位置相应的像素的深度值映射到各对应视角的第三深度图上,并将所述各对应视角的第三深度图中相应位置像素的映射深度值分别与相应位置像素的实际深度值进行匹配;另一种是先将获取得到的第三深度图中位置相应像素的深度值映射到对应的所述待处理深度图中和第二深度图中相应位置,再将在所述待处理深度图和各第二深度图对应视角的映射深度值分别与第二深度图中相应位置的实际深度值进行匹配。以下通过具体应用场景展开进行详细描述。
参照图8所示的获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法流程图,可以将所述待处理深度图中和各第二深度图中位置相应的像素的深度值映射到各对应视角的第三深度图上,并将所述各对应视角的第三深度图中相应位置像素的映射深度值分别与相应位置像素的实际深度值进行匹配,具体可以采用如下步骤:
S81,将所述待处理深度图中和各第二深度图中位置相应的像素的深度值映射到各对应视角的第三深度图上,得到所述各对应视角的第三深度图中相应位置像素的映射深度值。
如前实施例所述,由于多角度自由视角视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图,对于任一视频帧,均包括多个视角的深度图。按照预设的空间位置关系,可以将所述预设窗口内各视频帧中所述待处理深度图中和各第二深度图中位置相应的像素的深度值映射到各对应视角的第三深度图上,得到所述各对应视 角的第三深度图中相应位置像素的映射深度值。参照图6,可以分别获得第T-N帧至第T+N帧的窗口内各视频帧中视角为M的深度图(包括所述待处理深度图和各第二深度图)在预设视角范围[M-K,M+K]中对应的第三深度图,并完成同一帧内像素的深度值在预设视角范围内不同深度图之间的映射,即完成所述预设窗口内各视频帧中视角为M的深度图(包括所述待处理深度图和各第二深度图)中位置相应的像素的深度值在帧内所述预设视角范围[M-K,M+K]内其他对应视角的第三深度图中相应位置像素的深度值映射。
S82,将所述各对应视角的第三深度图中相应位置像素的映射深度值分别与所述各对应视角的第三深度图中相应位置像素的实际深度值进行匹配,基于各对应视角的第三深度图对应的深度值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
其中,若各对应视角的第三深度图对应的深度值的匹配度越高,则说明相应的深度图的差异越小,则对应的所述待处理深度图中或第二深度图中像素的深度值的可靠度越高,相应地,置信度值越高。
对于如何基于各对应视角的第三深度图对应的深度值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,可以有多种实施方式。
在具体实施中，可以基于各对应视角的第三深度图对应的深度值的匹配度以及满足相应匹配度的第三深度图的数量，综合确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。例如，可以设置当匹配度大于预设的第二匹配度阈值（作为具体示例，所述第二匹配度阈值取值为80%，或者取值为70%）的第三深度图的数量大于预设第二数目阈值时，对应的置信度值为1，否则对应的置信度值为0。类似地，在具体实施中，也可以梯度设置深度值的匹配度阈值、满足相应匹配度阈值的第三深度图的数量阈值以及置信度值的对应关系。
由上可知,所述待处理深度图中和各第二深度图中位置相应的像素的置信度值可以二值化取值,即取0或1,也可以设置为[0,1]内的任意值或者设置的离散值。
在本说明书一实施例中，各视频帧中与所述待处理深度图或第二深度图对应的第三深度图分别为所述待处理深度图或第二深度图的视角两侧30度范围内的第三深度图，在所述待处理深度图或第二深度图的视角两侧30度范围内，分别存在三个视角的第三深度图。考虑到视角的遮挡，例如若满足所述预设第二匹配度阈值的第三深度图的数量大于或等于2，则确定所述视频帧中的待处理深度图中或第二深度图中位置相应的像素的置信度值为1；若满足所述预设第二匹配度阈值的第三深度图的数量为0，则确定所述视频帧中的待处理深度图中或第二深度图中位置相应的像素的置信度值为1；若满足所述预设第二匹配度阈值的第三深度图的数量为1，则确定所述视频帧中的待处理深度图中或第二深度图中位置相应的像素的置信度值为0.5。
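以下为该深度值一致性判断的一个示意性实现草图，假设已得到某像素映射到各第三深度图相应位置的映射深度值及该位置的实际深度值；其中以相对误差作为匹配判据，判据形式、阈值取值与函数名均为说明所作的假设。

```python
def depth_consistency_confidence(warped_depths, actual_depths, rel_tol=0.2):
    """warped_depths[i]: 待处理/第二深度图中某像素按其深度值映射到第i个第三深度图相应位置后的映射深度值;
    actual_depths[i]: 该第三深度图中相应位置的实际深度值."""
    matched = 0
    for d_map, d_act in zip(warped_depths, actual_depths):
        if d_act > 0 and abs(d_map - d_act) / d_act < rel_tol:
            matched += 1
    if matched >= 2:
        return 1.0
    if matched == 0:
        return 1.0   # 视为遮挡导致的不一致, 不降低置信度
    return 0.5
```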
参照图9所示的获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值的方法流程图，可以先根据所述待处理深度图中和各第二深度图中位置相应的像素的深度值映射到对应视角的第三深度图，再根据第三深度图中相应像素位置的深度值反映射到所述待处理深度图中和各第二深度图中相应位置，将得到的映射像素位置分别与所述待处理深度图中和各第二深度图中相应位置的实际像素位置进行距离比较，根据二者差值确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值，具体可以采用如下步骤：
S91,分别获取所述待处理深度图中和各第二深度图中位置相应的像素的深度值,根据所述深度值,分别映射到对应视角的第三深度图中相应像素位置,获取并根据对应视角的第三深度图中相应像素位置的深度值,反映射到所述待处理深度图中和各第二深度图中相应的像素位置,得到各对应视角的第三深度图在所述待处理深度图中和各第二深度图中对应的映射像素位置。
如前实施例所述,多角度自由视角视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图,对于任一视频帧,均包括多个视角的深度图。可以分别获得第T-N帧至第T+N帧的窗口内各视频帧中视角为M的深度图(包括所述待处理深度图和各第二深度图)中位置相应的像素的深度值,根据所述深度值,分别映射到在预设视角范围[M-K,M+K]内对应的第三深度图中相应像素位置,获取所述各视频帧中在预设视角范围[M-K,M+K]内各对应视角的第三深度图中与所述待处理深度图中和各第二深度图中位置相应的像素的深度值。之后,可以按照预设的空间位置关系,根据获取得到的所述第T-N帧至第T+N帧的窗口内各视频帧中第三深度图中位置相应像素的深度值,反映射到所述待处理深度图中和各第二深度图中相应的像素位置,可以得到各对应视角的第三深度图在所述待处理深度图中和各第二深度图对应的映射像素位置。
参照图6，可以分别获得第T-N帧至第T+N帧的窗口内各视频帧中视角为M的待处理深度图中和各第二深度图中位置相应的像素的深度值，根据所述深度值，分别映射到预设视角范围[M-K,M+K]内对应的第三深度图中的相应像素位置，获取并根据同一帧内在预设视角范围内不同视角的第三深度图中相应像素位置的深度值，之后，可以按照预设的空间位置关系，根据获取得到的各视频帧内对应视角的第三深度图中相应像素位置的深度值，反映射到在同一视频帧内对应的待处理深度图中或第二深度图中相应的像素位置，得到各对应视角的第三深度图在所述待处理深度图中和各第二深度图中对应的映射像素位置。例如，根据第T-N帧内视角范围为[M-K,M+K]中第三深度图相应像素位置的深度值，分别映射到第T-N帧内视角为M的深度图中，可以得到各第三深度图（如第T-N帧内视角分别为M-2、M-1、M+1、M+2的多个第三深度图）在第T-N帧内视角为M的深度图中对应的映射像素位置。
S92,分别计算所述待处理深度图中和各第二深度图中相应位置的像素的实际像素位置与对应视角的第三深度图反映射得到的映射像素位置的像素距离,基于计算得到的各像素距离的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
其中,若像素距离越小,则对应的待处理深度图中或第二深度图中相应位置的像素的深度值的可靠度越高,相应地,置信度值越高。
对于如何基于计算得到的各像素距离的分布区间，确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值，可以有多种实施方式。
在具体实施中，可以基于各对应视角的第三深度图对应的像素距离大小以及满足相应距离阈值区间的第三深度图的数量，综合确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。例如，可以设置当像素距离小于预设的距离阈值d0的第三深度图的数量大于预设第三数目阈值时，对应的置信度值为1，否则对应的置信度值为0。类似地，在具体实施中，也可以梯度设置距离阈值、满足相应距离阈值的第三深度图的数量阈值以及置信度值的对应关系。
由上可知,所述待处理深度图中和各第二深度图中相应的像素的置信度值可以二值化取值,即取0或1,也可以设置为[0,1]内的任意值或者设置的离散值。
在本说明书一实施例中，各视频帧中与所述待处理深度图中和各第二深度图对应的第三深度图分别为所述待处理深度图和各第二深度图的视角两侧30度范围内的第三深度图，在所述待处理深度图和各第二深度图的视角两侧30度范围内，分别存在三个视角的第三深度图。考虑到视角的遮挡，例如若像素距离小于预设第一距离阈值的第三深度图的数量大于或等于2，则确定所述视频帧的待处理深度图中或第二深度图中位置相应的像素的置信度值为1；若像素距离小于所述预设第一距离阈值的第三深度图的数量为0，则确定所述视频帧的待处理深度图中或第二深度图中位置相应的像素的置信度值为1；若像素距离小于所述预设第一距离阈值的第三深度图的数量为1，则确定所述视频帧的待处理深度图中或第二深度图中位置相应的像素的置信度值为0.5。
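以下给出根据反映射得到的映射像素位置与实际像素位置之间的像素距离确定置信度值的一个示意性实现，取值规则与上述实施例一致，但距离阈值的具体数值与函数名均为假设。

```python
import numpy as np

def reprojection_distance_confidence(px_actual, px_backprojected, dist_thresh=1.0):
    """px_actual: 待处理/第二深度图中像素的实际位置 (u, v);
    px_backprojected: 各第三深度图经正向映射再反映射后得到的映射像素位置列表."""
    dists = [np.hypot(u - px_actual[0], v - px_actual[1]) for (u, v) in px_backprojected]
    n_close = sum(1 for d in dists if d < dist_thresh)
    if n_close >= 2:
        return 1.0
    if n_close == 0:
        return 1.0   # 视为遮挡
    return 0.5
```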
以上给出了分别基于纹理图的匹配差异确定像素的置信度以及基于深度图的一致性确定像素的置信度的具体实现示例。在具体实施中,也可以将二者结合共同确定像素的置信度。以下给出一些实施例中的具体结合方式,可以理解的是,以下示例并不用于限定本发明的保护范围。
结合方式一：取基于纹理图的匹配差异确定的像素的置信度和基于深度图的一致性确定的像素的置信度的乘积，作为所述待处理深度图中和各第二深度图中位置相应的像素的置信度，可以用公式表示如下：
Weight_c=Weight_c_texture*Weight_c_depth;
其中,Weight_c表示所述待处理深度图中和第二深度图中位置相应的像素的置信度,Weight_c_texture表示基于纹理图的匹配差异确定的像素的置信度,Weight_c_depth表示基于深度图的一致性确定像素的置信度。
结合方式二:取基于纹理图的匹配差异确定的像素的置信度和基于深度图的一致性确定像素的置信度的加权和作为所述待处理深度图和各第二深度图中位置相应的像素的置信度,可以用公式表示如下:
Weight_c=a*Weight_c_texture+b*Weight_c_depth;
其中，Weight_c表示所述待处理深度图中和各第二深度图中位置相应的像素的置信度，Weight_c_texture表示基于纹理图的匹配差异确定的像素的置信度，Weight_c_depth表示基于深度图的一致性确定的像素的置信度，a为基于纹理图的匹配差异确定的像素的置信度的加权系数，b为基于深度图的一致性确定的像素的置信度的加权系数。
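上述两种结合方式可以用如下示意性代码表示，其中加权系数a、b的默认取值仅为假设：

```python
def combine_confidence(w_texture, w_depth, a=0.5, b=0.5, mode="product"):
    """结合方式一取乘积, 结合方式二取加权和."""
    if mode == "product":
        return w_texture * w_depth
    return a * w_texture + b * w_depth
```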
以上示出了方式一确定像素置信度的方式,以下接着给出方式二确定像素置信度的两种实现示例:
示例一:将所述待处理深度图中和各第二深度图中位置相应的像素分别与所述像素所处深度图周围预设区域内的像素的深度值进行匹配,基于所述深度值的匹配度以及匹配度满足预设像素匹配度阈值的像素的数量,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
参照图10所示的任一第二深度图Px，对于与待处理深度图中任一像素位置相应的像素Pixel(x1’,y1’)，作为待确定置信度的像素，可以分别将像素Pixel(x1’,y1’)与所述第二深度图Px中Pixel(x1’,y1’)周围预设区域R内的各像素的深度值进行匹配，例如，若在预设区域R内的8个像素中有5个像素的匹配度大于预设的像素匹配度阈值60%，则确定所述第二深度图Px中像素Pixel(x1’,y1’)的置信度为0.8。
在具体实施中,所述预设区域可以取圆形、矩形或者不规则形状,具体形状并不做限制,包围所述待确定置信度的像素即可,预设区域的大小可以根据经验进行设置。
示例二：将所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内的像素的深度值的加权平均值进行匹配，基于所述待处理深度图中和各第二深度图中位置相应的像素与所述加权平均值的匹配度，分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
继续参照图10,在本说明书一实施例中,先将Pixel(x1’,y1’)周围预设区域R内的像素的深度值进行加权平均,然后再进行相似度匹配,例如加权平均后与Pixel(x1’,y1’)的深度值的匹配度大于50%,可以确定第二深度图Px中像素Pixel(x1’,y1’)的置信度为1。
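以下给出方式二（空间一致性）上述两个示例的一个合并示意性实现草图，其中匹配判据（以相对误差折算匹配度）、预设区域大小、加权方式（此处取等权平均）以及置信度折算方式均为说明所作的假设。

```python
import numpy as np

def spatial_consistency_confidence(depth_map, x, y, radius=1, pixel_match_thresh=0.6):
    """基于像素与其周围预设区域R内像素深度值的空间一致性估计置信度, 返回(示例一, 示例二)两种结果."""
    h, w = depth_map.shape
    d0 = float(depth_map[y, x])
    neigh = [float(depth_map[j, i])
             for j in range(max(0, y - radius), min(h, y + radius + 1))
             for i in range(max(0, x - radius), min(w, x + radius + 1))
             if (i, j) != (x, y)]
    if not neigh:
        return 1.0, 1.0
    # 示例一: 逐像素匹配, 统计匹配度超过阈值的邻域像素个数, 按比例折算置信度
    match = [1.0 - abs(d - d0) / max(d0, 1e-6) for d in neigh]
    n_matched = sum(1 for m in match if m > pixel_match_thresh)
    conf_a = n_matched / len(neigh)
    # 示例二: 先对邻域深度值加权平均(此处用等权), 再与中心像素匹配
    mean_d = float(np.mean(neigh))
    conf_b = 1.0 if 1.0 - abs(mean_d - d0) / max(d0, 1e-6) > 0.5 else 0.0
    return conf_a, conf_b
```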
以上给出了多种可以确定待处理深度图中和各第二深度图中位置相应的像素的置信度的方式。在具体实施中,可以将其中至少两种方式结合使用。通过将所述待处理深度图中和各第二深度图中对应像素的置信度值对应的第一滤波系数权重值加入到窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素的深度值进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值,可以避免引入所述待处理深度图和各第二深度图中不可靠的深度值对滤波结果造成影响,从而可以提高深度图在时域上的稳定性。
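为便于理解整体的时域滤波过程，以下给出针对单个像素位置的一个示意性实现草图：将置信度对应的第一滤波系数权重值、帧距对应的第二滤波系数权重值和像素相似度对应的第三滤波系数权重值取乘积作为窗口滤波系数值，再对窗口内各帧的深度值作加权平均。其中帧距权重采用高斯形式、各参数取值以及当前帧位于窗口中间位置等均为示意性假设，并非本说明书方案的唯一实现。

```python
import numpy as np

def temporal_filter_depth(depths, confidences, frame_dists, similarities,
                          sigma_t=2.0):
    """对同一像素位置在时域窗口内各帧(视角相同, 含当前帧)的深度值进行加权平均滤波.
    depths[i]: 第i帧该位置的深度值; confidences[i]: 置信度对应的第一滤波系数权重值;
    frame_dists[i]: 与当前帧的帧距; similarities[i]: 对应纹理的像素相似度(0~1)."""
    w1 = np.asarray(confidences, dtype=float)                                      # 第一: 像素置信度
    w2 = np.exp(-np.asarray(frame_dists, dtype=float) ** 2 / (2 * sigma_t ** 2))   # 第二: 帧距
    w3 = np.asarray(similarities, dtype=float)                                     # 第三: 像素相似度
    w = w1 * w2 * w3                                                               # 取乘积作为窗口滤波系数值
    if w.sum() < 1e-8:
        return depths[len(depths) // 2]   # 权重全为0时退化为保留当前帧(假设位于窗口中间)的深度值
    return float(np.dot(w, depths) / w.sum())
```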
采用上述实施例的深度图处理方法,对深度图在时域上进行滤波处理后,可以提高视频重建的图像质量,为使本领域技术人员更好地理解和实施,以下通过一实施例说明如何进行视频重建。
参照图11所示的视频重建方法的流程图,具体可以包括如下步骤:
S111,获取多角度自由视角的视频帧的图像组合、所述视频帧的图像组合对应的参数数据以及基于用户交互的虚拟视点位置信息,其中,所述视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图。
S112,对所述深度图在时域上进行滤波处理。
在具体实施中,可以采用本说明书实施例中的深度图处理方法进行滤波处理,具体方法可以参见前述各实施例的描述,此处不再展开阐述。
S113，根据所述虚拟视点位置信息及所述视频帧的图像组合对应的参数数据，按照预设规则选择用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图。
S114,基于所述虚拟视点位置信息及用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图对应的参数数据,将选择的用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图进行组合渲染,得到所述用户交互时刻虚拟视点位置对应的重建图像。
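以下给出上述S111至S114整体流程的一个Python示意性草图，其中select_views、temporal_filter、dibr_render等辅助函数以及frame_groups的数据组织方式均为假设存在的示例接口，仅用于说明各步骤之间的数据流向。

```python
def reconstruct_virtual_view(frame_groups, params, virtual_viewpoint, t_interact,
                             select_views, temporal_filter, dibr_render):
    """按S111~S114的流程重建用户交互时刻虚拟视点对应的图像."""
    # S111: 取出用户交互时刻的视频帧图像组合(多组纹理图+深度图)及参数数据
    textures, depths = frame_groups[t_interact]
    # S112: 对各视角深度图在时域上进行滤波(可采用本说明书实施例的深度图处理方法)
    depths_filtered = [temporal_filter(frame_groups, t_interact, view)
                       for view in range(len(depths))]
    # S113: 根据虚拟视点位置与参数数据, 按预设规则选择相应组的纹理图和滤波后的深度图
    selected = select_views(virtual_viewpoint, params, textures, depths_filtered)
    # S114: 对选择的纹理图和滤波后的深度图进行组合渲染(如DIBR), 得到虚拟视点的重建图像
    return dibr_render(virtual_viewpoint, params, selected)
```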
采用上述视频重建方法,其中对于视频帧中的深度图在时域上进行滤波,对于时域上预设窗口的视频帧序列中各视频帧中与所述待处理深度图相同视角的深度图,即第二深度图,通过获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,并将所述第一滤波系数权重值加入到窗口滤波系数值,可以避免引入所述待处理深度图和各第二深度图中不可靠的深度值对滤波结果造成影响,从而可以提高深度图在时域上的稳定性,进而可以提高重建视频的图像质量。
本说明书实施例还提供了能够实现前述实施例方法的具体装置和系统,以下参照附图,通过具体实施例进行描述。
本说明书实施例提供了一种深度图处理装置,可以对深度图在时域上进行滤波处理。参照图12所示的深度图处理装置的结构示意图,深度图处理装置120可以包括:
深度图获取单元121,适于从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
帧序列获取单元122,适于获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
窗口滤波系数值获取单元123,适于获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,所述窗口滤波系数值获取单元123包括:第一滤波系数权重值获取子单元1231,适于获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
滤波单元124,适于基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
在具体实施中,所述第一滤波系数权重值获取子单元1231可以包括以下至少一种置信度值确定子单元:
第一置信度值确定构件12311,适于获取所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图,得到对应视角的第三深度图,基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值;
第二置信度值确定构件12312,适于基于所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在本说明书一实施例中,所述第一置信度值确定构件12311,适于获取所述待处理深度图对应的纹理图和各第二深度图对应的纹理图,根据所述待处理深度图中和各第二深度图中位置相应的像素的深度值,将所述待处理深度图对应的纹理图和各第二深度图对应的纹理图中相应位置的纹理值分别映射到各对应视角的第三深度图对应的纹理图中的相应位置,得到各对应视角的第三深度图对应的映射纹理值;将所述映射纹理值分别与各对应视角的第三深度图对应的纹理图中的相应位置的实际纹理值进行匹配,基于各对应视角的第三深度图对应的纹理值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在本说明书另一实施例中,所述第一置信度值确定构件12311,将所述待处理深度图中和各第二深度图中位置相应的像素映射到各对应视角的第三深度图上,得到所述各对应视角的第三深度图中相应位置像素的映射深度值;将所述各对应视角的第三深度图中相应位置像素的映射深度值分别与所述各对应视角的第三深度图中相应位置像素的实际深度值进行匹配,基于各对应视角的第三深度图对应的深度值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在本说明书又一实施例中，所述第一置信度值确定构件12311，适于分别获取所述待处理深度图中和各第二深度图中位置相应的像素的深度值，根据所述深度值，分别映射到对应视角的第三深度图中相应像素位置，获取并根据对应视角的第三深度图中相应像素位置的深度值，反映射到所述待处理深度图中和各第二深度图中相应的像素位置，得到各对应视角的第三深度图在所述待处理深度图中和各第二深度图中对应的映射像素位置；分别计算所述待处理深度图中和各第二深度图中相应位置的像素的实际像素位置与对应视角的第三深度图反映射得到的映射像素位置的像素距离，基于计算得到的各像素距离的分布区间，确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在本说明书一实施例中,所述第二置信度值确定构件12312,适于将所述待处理深度图中和各第二深度图中位置相应的像素分别与所述像素所处深度图中周围预设区域内的像素的深度值进行匹配,基于所述深度值的匹配度以及匹配度满足预设像素匹配度阈值的像素的数量,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在本说明书另一实施例中,所述第二置信度值确定构件12312,将所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内的像素的深度值的加权平均值进行匹配,基于所述待处理深度图中和各第二深度图中位置相应的像素与对应的加权平均值的匹配度,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
在具体实施中,所述窗口滤波系数的权重值还可以包括如下至少一种:帧距对应的第二滤波系数权重值,像素相似度对应的第三滤波系数权重值。
相应地,所述窗口滤波系数值获取单元123还可以包括如下至少一种:
第二滤波系数权重值获取子单元1232,适于获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
第三滤波系数权重值获取子单元1233,适于获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
在本说明书一实施例中,所述滤波单元124,适于将所述第一滤波系数权重值与所述第二滤波系数权重值和所述第三滤波系数权重值至少其中之一的乘积,或者加权平均值作为各视频帧相应的窗口滤波系数值;计算所述待处理深度图中和各第二深度图中位置相应的像素的深度值与各视频帧相应的窗口滤波系数值之积的加权平均值,得到所述待处理深度图中位置相应的滤波后的深度值。
本说明书实施例还提供了一种视频重建系统,采用所述视频重建系统进行视频重建,可以提高重建视频的图像质量。参照图13所示的视频重建系统的结构示意图,视频重建系统130包括:获取模块131、滤波模块132、选择模块133和图像重建模块134,其中:
所述获取模块131,适于获取多角度自由视角的视频帧的图像组合、所述视频帧的图像组合对应的参数数据以及基于用户交互的虚拟视点位置信息,其中,所述视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
所述滤波模块132,适于对所述视频帧中的深度图进行滤波;
所述选择模块133,适于根据所述虚拟视点位置信息及所述视频帧的图像组合对应的参数数据,按照预设规则选择用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图;
所述图像重建模块134,适于基于所述虚拟视点位置信息及用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图对应的参数数据,将选择的用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图进行组合渲染,得到所述用户交互时刻虚拟视点位置对应的重建图像;
其中,所述滤波模块132可以包括:
深度图获取单元1321,适于从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
帧序列获取单元1322,适于获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
窗口滤波系数值获取单元1323,适于获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,所述窗口滤波系数值获取单元包括:第一滤波系数权重值获取子单元13231,适于获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
滤波单元1324,适于基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
在具体实施中,所述窗口滤波系数值获取单元1323还包括如下至少一种:
第二滤波系数权重值获取子单元13232,适于获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
第三滤波系数权重值获取子单元13233,适于获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
所述滤波模块132的具体实现可以参见图12，可以采用图12所示的深度图处理装置作为所述滤波模块132进行时域滤波，具体可以参见前述实施例中的深度图处理装置和深度图处理方法进行实施。需要说明的是，所述深度图处理装置具体可以通过相应的软件、硬件或者软硬件结合的方式实现。其中，各滤波系数权重值的计算可以由一个或多个CPU或者GPU，或者CPU与GPU协同实施，CPU可以与一个或者多个GPU芯片或者GPU模组进行通信，控制各GPU芯片或者GPU模组进行深度图的滤波处理。
本说明书实施例还提供了一种电子设备,参照图14所示的电子设备的结构示意图,电子设备140可以包括存储器141和处理器142,所述存储器141上存储有可在所述处理器142上运行的计算机指令,所述处理器142运行所述计算机指令时可以执行前述任一实施例所述的深度图处理方法或前述任一实施例所述视频重建方法的步骤。具体步骤可以参见前述实施例的介绍,此处不再赘述。
需要说明的是,所述处理器142具体可以包括一个或者多个CPU核心形成的CPU芯片1421,或者可以包括GPU芯片1422,或者是由所述CPU芯片1421和GPU芯片1422组成的芯片模组。处理器142和存储器141之间可以通过总线等进行通信,各芯片之间也可以通过相应的通信接口进行通信。
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,所述计算机指令运行时可以执行前述任一实施例所述的深度图处理方法或前述任一实施例所述视频重建方法的步骤。具体步骤可以参见前述实施例的介绍,此处不再赘述。
为使本领域技术人员更好地理解和实施，以下结合图1所示的具体应用场景进行示例说明。
云端的服务器集群13可以先采用本说明书实施例方案对深度图进行时域滤波,之后再基于视频帧的图像组合中相应组的纹理图和滤波后的深度图进行图像重建,得到重建的多角度自由视角图像。
在具体实施中,所述云端的服务器集群13可以包括:第一云端服务器131,第二云端服务器132,第三云端服务器133,第四云端服务器134。其中,第一云端服务器131可以用于确定所述图像组合相应的参数数据;第二云端服务器132可以用于确定所述图像组合中各帧图像的深度数据;第三云端服务器133可以基于所述图像组合相应的参数数据、所述图像组合的像素数据和深度数据,使用基于深度图的虚拟视点重建(Depth Image Based Rendering,DIBR)算法,对预设的虚拟视点路径进行帧图像重建;所述第四云端服务器134可以用于生成多角度自由视角视频,其中,所述多角度自由视角视频数据可以包括:按照帧时刻排序的帧图像的多角度自由视角空间数据和多角度自由视角 时间数据。
可以理解的是,所述第一云端服务器131、第二云端服务器132、第三云端服务器133、第四云端服务器134也可以为服务器阵列或服务器子集群组成的服务器组,本说明书实施例不做限制。
作为一具体示例，可以由第二云端服务器132从多角度自由视角的当前视频帧的图像组合中获取深度图，作为待处理深度图，所述第二云端服务器132可以采用本说明书前述实施例方案对所述待处理深度图进行时域滤波，可以提高深度图在时域上的稳定性，之后，采用时域滤波后的深度图进行视频重建，不论是在播放终端15还是在交互终端16进行播放，均可以提高视频重建的图像质量。
在本说明书实施例中,采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形、矩阵或者不规则形状排列分布。具体排列方式可以根据具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。
在本说明书一些实施例中,如图1所示,所述采集阵列11中各采集设备可以通过交换机17或局域网等将获得的视频数据流实时传输至所述数据处理设备12。
可以理解的是,所述数据处理设备12可以根据具体情景置于现场非采集区域或云端,所述服务器(集群)和播放控制设备可以根据具体情景置于现场非采集区域,云端或者终端接入侧,本实施例并不用于限制本发明的具体实现和保护范围。本说明书实施例中各装置、系统、设备或系统的具体实现方式、工作原理和具体作用及效果,可以参见对应方法实施例中的具体介绍。
可以理解的是,以上实施例方案适用于直播或准直播场景,但并不仅限于此,本说明书实施例中的方案中对于视频或图像采集、视频数据流的数据处理以及服务器的图像生成等方案也可以适用于非直播场景的播放需求,如录播、转播以及其他有低时延需求的场景。
虽然本说明书实施例披露如上,但本发明并非限定于此。任何本领域技术人员,在不脱离本说明书实施例的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。

Claims (17)

  1. 一种深度图处理方法,其特征在于,包括:
    从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
    获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
    获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,采用如下方式获取所述第一滤波系数权重值:获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
    基于所述各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
  2. 根据权利要求1所述的深度图处理方法,其特征在于,所述窗口滤波系数的权重值还包括:帧距对应的第二滤波系数权重值和像素相似度对应的第三滤波系数权重值其中至少一种;采用如下方式获取所述第二滤波系数权重值和第三滤波系数权重值:
    获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
    获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
  3. 根据权利要求1或2所述的深度图处理方法,其特征在于,所述获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括以下至少一种:
    获取所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图,得到对应视角的第三深度图,基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值;
    基于所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
  4. 根据权利要求3所述的深度图处理方法，其特征在于，所述基于所述各对应视角的第三深度图，确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值，包括：
    获取所述待处理深度图对应的纹理图和各第二深度图对应的纹理图,根据所述待处理深度图中和各第二深度图中位置相应的像素的深度值,将所述待处理深度图对应的纹理图中和各第二深度图对应的纹理图中相应位置的纹理值分别映射到各对应视角的第三深度图对应的纹理图中的相应位置,得到各对应视角的第三深度图对应的映射纹理值;
    将所述映射纹理值分别与各对应视角的第三深度图对应的纹理图中的相应位置的实际纹理值进行匹配,基于各对应视角的第三深度图对应的纹理值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
  5. 根据权利要求3所述的深度图处理方法,其特征在于,所述基于所述各对应视角的第三深度图,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括:
    将所述待处理深度图中和各第二深度图中位置相应的像素映射到各对应视角的第三深度图上,得到所述各对应视角的第三深度图中相应位置像素的映射深度值;
    将所述各对应视角的第三深度图中相应位置像素的映射深度值分别与所述各对应视角的第三深度图中相应位置像素的实际深度值进行匹配,基于各对应视角的第三深度图对应的深度值的匹配度的分布区间,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
  6. 根据权利要求3所述的深度图处理方法,其特征在于,所述基于所述各对应视角的第三深度图,确定所述待处理深度图和各第二深度图中位置相应的像素的置信度值,包括:
    分别获取所述待处理深度图中和各第二深度图中位置相应的像素的深度值,根据所述深度值,分别映射到对应视角的第三深度图中相应像素位置,获取并根据对应视角的第三深度图中相应像素位置的深度值,反映射到所述待处理深度图中和各第二深度图中相应的像素位置,得到各对应视角的第三深度图在所述待处理深度图中和各第二深度图中对应的映射像素位置;
    分别计算所述待处理深度图中和各第二深度图中相应位置的像素的实际像素位置与对应视角的第三深度图反映射得到的映射像素位置的像素距离，基于计算得到的各像素距离的分布区间，确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
  7. 根据权利要求3所述的深度图处理方法,其特征在于,所述基于所述待处理深度图和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内像素的空间一致性,确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,包括以下至少一种:
    将所述待处理深度图中和各第二深度图中位置相应的像素分别与所述像素所处深度图中周围预设区域内的像素的深度值进行匹配,基于所述深度值的匹配度以及匹配度满足预设像素匹配度阈值的像素的数量,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值;
    将所述待处理深度图中和各第二深度图中位置相应的像素与所述像素所处深度图中周围预设区域内的像素的深度值的加权平均值进行匹配,基于所述待处理深度图中和各第二深度图中位置相应的像素与对应的加权平均值的匹配度,分别确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值。
  8. 根据权利要求2所述的深度图处理方法,其特征在于,所述基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素的深度值进行滤波,得到所述待处理深度图位置相应的像素滤波后的深度值,包括:
    将所述第一滤波系数权重值与所述第二滤波系数权重值和所述第三滤波系数权重值至少其中之一的乘积,或者加权平均值作为各视频帧相应的窗口滤波系数值;
    计算所述待处理深度图中和各第二深度图中位置相应的像素的深度值与各视频帧相应的窗口滤波系数值之积的加权平均值,得到所述待处理深度图中位置相应的像素滤波后的深度值。
  9. 根据权利要求1或2所述的深度图处理方法,其特征在于,所述当前视频帧位于所述视频帧序列的中间位置。
  10. 一种深度图处理方法,其特征在于,包括:
    从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
    获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
    获取所述视频帧序列中各视频帧相应的窗口滤波系数值，所述窗口滤波系数值由至少两个维度的权重值生成，其中包括：像素置信度对应的第一滤波系数权重值，采用如下方式获取所述第一滤波系数权重值：获取所述待处理深度图和各第二深度图对应视角周围预设视角范围内的深度图，得到对应视角的第三深度图，基于所述各对应视角的第三深度图，确定所述待处理深度图中和各第二深度图中位置相应的像素的置信度值；以及确定所述置信度值对应的第一滤波系数权重值；
    基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
  11. 一种视频重建方法,其特征在于,包括:
    获取多角度自由视角的视频帧的图像组合、所述视频帧的图像组合对应的参数数据以及基于用户交互的虚拟视点位置信息,其中,所述视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
    采用权利要求1-9任一项所述深度图处理方法得到滤波后的深度图;
    根据所述虚拟视点位置信息及所述视频帧的图像组合对应的参数数据,按照预设规则选择用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图;
    基于所述虚拟视点位置信息及用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图对应的参数数据,将选择的用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图进行组合渲染,得到所述用户交互时刻虚拟视点位置对应的重建图像。
  12. 一种深度图处理装置,其特征在于,包括:
    深度图获取单元,适于从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
    帧序列获取单元,适于获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
    窗口滤波系数值获取单元,适于获取所述视频帧序列中各视频帧相应的窗口滤波系数值,所述窗口滤波系数值由至少两个维度的权重值生成,其中包括:像素置信度对应的第一滤波系数权重值,所述窗口滤波系数值获取单元包括:第一滤波系数权重值获取子单元,适于获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值,以及确定所述置信度值对应的第一滤波系数权重值,其中:所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图;
    滤波单元，适于基于各视频帧相应的窗口滤波系数值，按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波，得到所述待处理深度图中位置相应的像素滤波后的深度值。
  13. 根据权利要求12所述的深度图处理装置,其特征在于,所述窗口滤波系数值获取单元还包括如下至少一种:
    第二滤波系数权重值获取子单元,适于获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
    第三滤波系数权重值获取子单元,适于获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
  14. 一种视频重建系统,其特征在于,包括:
    获取模块,适于获取多角度自由视角的视频帧的图像组合、所述视频帧的图像组合对应的参数数据以及基于用户交互的虚拟视点位置信息,其中,所述视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
    滤波模块,适于对所述视频帧中的深度图进行滤波;
    选择模块,适于根据所述虚拟视点位置信息及所述视频帧的图像组合对应的参数数据,按照预设规则选择用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图;
    图像重建模块,适于基于所述虚拟视点位置信息及用户交互时刻所述视频帧的图像组合中相应组的纹理图和深度图对应的参数数据,将选择的用户交互时刻所述视频帧的图像组合中相应组的纹理图和滤波后的深度图进行组合渲染,得到所述用户交互时刻虚拟视点位置对应的重建图像;
    其中,所述滤波模块包括:
    深度图获取单元,适于从多角度自由视角的当前视频帧的图像组合中获取待处理深度图,所述多角度自由视角的当前视频帧的图像组合包括多个角度同步的多组存在对应关系的纹理图和深度图;
    帧序列获取单元,适于获取包含所述当前视频帧的时域上预设窗口的视频帧序列;
    窗口滤波系数值获取单元，适于获取所述视频帧序列中各视频帧相应的窗口滤波系数值，所述窗口滤波系数值由至少两个维度的权重值生成，其中包括：像素置信度对应的第一滤波系数权重值，所述窗口滤波系数值获取单元包括：第一滤波系数权重值获取子单元，适于获取所述待处理深度图中和各第二深度图中位置相应的像素的置信度值，以及确定所述置信度值对应的第一滤波系数权重值，其中：所述第二深度图为所述视频帧序列各视频帧中与所述待处理深度图视角相同的深度图；
    滤波单元,适于基于各视频帧相应的窗口滤波系数值,按照预设的滤波方式对所述待处理深度图中位置相应的像素进行滤波,得到所述待处理深度图中位置相应的像素滤波后的深度值。
  15. 根据权利要求14所述的视频重建系统,其特征在于,所述窗口滤波系数值获取单元还包括如下至少一种:
    第二滤波系数权重值获取子单元,适于获取所述视频帧序列中各视频帧与所述当前视频帧的帧距,以及确定所述帧距对应的第二滤波系数权重值;
    第三滤波系数权重值获取子单元,适于获取各第二深度图对应的纹理图与所述待处理深度图对应的纹理图中位置相应的像素的相似度值,以及确定所述相似度值对应的第三滤波系数权重值。
  16. 一种电子设备,包括存储器和处理器,所述存储器上存储有可在所述处理器上运行的计算机指令,其特征在于,所述处理器运行所述计算机指令时执行权利要求1至9任一项所述方法或权利要求10或11所述方法的步骤。
  17. 一种计算机可读存储介质,其上存储有计算机指令,其特征在于,所述计算机指令运行时执行权利要求1至9任一项所述方法或权利要求10或11所述方法的步骤。
PCT/CN2021/088024 2020-04-20 2021-04-19 深度图处理方法、视频重建方法及相关装置 WO2021213301A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010312853.4 2020-04-20
CN202010312853.4A CN113542721B (zh) 2020-04-20 2020-04-20 深度图处理方法、视频重建方法及相关装置

Publications (1)

Publication Number Publication Date
WO2021213301A1 true WO2021213301A1 (zh) 2021-10-28

Family

ID=78123605

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088024 WO2021213301A1 (zh) 2020-04-20 2021-04-19 深度图处理方法、视频重建方法及相关装置

Country Status (2)

Country Link
CN (1) CN113542721B (zh)
WO (1) WO2021213301A1 (zh)



Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800129B (zh) * 2012-06-20 2015-09-30 浙江大学 一种基于单幅图像的头发建模和肖像编辑方法
JP6450226B2 (ja) * 2015-03-13 2019-01-09 日本放送協会 カメラ制御装置及びそのプログラム

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104662589A (zh) * 2012-08-21 2015-05-27 派力肯影像公司 用于使用阵列照相机捕捉的图像中的视差检测和校正的系统和方法
KR20160024419A (ko) * 2014-08-25 2016-03-07 국방과학연구소 Dibr 방식의 입체영상 카메라 판별 방법 및 장치
CN107077742A (zh) * 2015-04-28 2017-08-18 华为技术有限公司 一种图像处理装置和方法
CN110390690A (zh) * 2019-07-11 2019-10-29 Oppo广东移动通信有限公司 深度图处理方法和装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117599519A (zh) * 2024-01-24 2024-02-27 山东泽林农业科技有限公司 一种用于数字化反冲洗一体机的智能控制方法
CN117599519B (zh) * 2024-01-24 2024-04-12 山东泽林农业科技有限公司 一种用于数字化反冲洗一体机的智能控制方法

Also Published As

Publication number Publication date
CN113542721B (zh) 2023-04-25
CN113542721A (zh) 2021-10-22

Similar Documents

Publication Publication Date Title
US11217006B2 (en) Methods and systems for performing 3D simulation based on a 2D video image
JP7472362B2 (ja) 受信方法、端末及びプログラム
WO2021083176A1 (zh) 数据交互方法及系统、交互终端、可读存储介质
US11653065B2 (en) Content based stream splitting of video data
WO2018059034A1 (zh) 一种全景视频播放方法及装置
CN110869980B (zh) 将内容分发和呈现为球形视频和3d资产组合
WO2021083178A1 (zh) 数据处理方法及系统、服务器和存储介质
Lee et al. High‐resolution 360 video foveated stitching for real‐time VR
JP2018180655A (ja) 画像処理装置、画像生成方法及びプログラム
WO2021120157A1 (en) Light weight multi-branch and multi-scale person re-identification
US20120120201A1 (en) Method of integrating ad hoc camera networks in interactive mesh systems
Aladagli et al. Predicting head trajectories in 360 virtual reality videos
CN114097248B (zh) 一种视频流处理方法、装置、设备及介质
CN112581627A (zh) 用于体积视频的用户控制的虚拟摄像机的系统和装置
WO2021213301A1 (zh) 深度图处理方法、视频重建方法及相关装置
JP2018033107A (ja) 動画の配信装置及び配信方法
JP7423974B2 (ja) 情報処理システム、情報処理方法及びプログラム
US20230353717A1 (en) Image processing system, image processing method, and storage medium
Zheng et al. Research on panoramic stereo live streaming based on the virtual reality
US20220353484A1 (en) Information processing apparatus, information processing method, and program
WO2021083175A1 (zh) 数据处理方法、设备、系统、可读存储介质及服务器
CN112738009B (zh) 数据同步方法、设备、同步系统、介质和服务器
KR102242710B1 (ko) 반자유 시점 영상을 제공하는 장치
Ozcinar et al. Delivery of omnidirectional video using saliency prediction and optimal bitrate allocation
JP2017103672A (ja) 画像処理装置、画像処理方法及び画像処理プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21793243

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21793243

Country of ref document: EP

Kind code of ref document: A1