WO2022120809A1 - Virtual viewpoint drawing, rendering, and decoding methods and apparatuses, device, and storage medium - Google Patents

Virtual viewpoint drawing, rendering, and decoding methods and apparatuses, device, and storage medium

Info

Publication number
WO2022120809A1
WO2022120809A1 · PCT/CN2020/135779 · CN2020135779W
Authority
WO
WIPO (PCT)
Prior art keywords
target
pixels
viewpoint
visible image
pixel
Prior art date
Application number
PCT/CN2020/135779
Other languages
English (en)
French (fr)
Inventor
杨铀
苏永全
刘琼
吴科君
陈泽
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to CN202080107720.1A priority Critical patent/CN116601958A/zh
Priority to PCT/CN2020/135779 priority patent/WO2022120809A1/zh
Publication of WO2022120809A1 publication Critical patent/WO2022120809A1/zh
Priority to US18/207,982 priority patent/US20230316464A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/04 - Texture mapping
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/10 - Geometric effects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20172 - Image enhancement details
    • G06T2207/20192 - Edge enhancement; Edge preservation

Definitions

  • The embodiments of the present application relate to computer vision technologies, and relate to, but are not limited to, virtual viewpoint drawing, rendering, and decoding methods and apparatuses, devices, and storage media.
  • Immersive video content includes, for example, virtual reality content, three-dimensional content, 180-degree or 360-degree content, as well as immersive formats such as game videos or animations.
  • The compression distortion is very serious; in this way, at the decoding end, the quality of the depth map recovered by decoding is greatly degraded, resulting in obvious noise in the depth map of the generated target viewport (Target Viewport), and the edges of the depth map do not exactly match the edges of the actual texture.
  • One manifestation reflected on the texture map is that there is a transition area at the junction of the foreground and background, and the foreground edge is not steep enough.
  • The virtual viewpoint drawing, rendering, and decoding methods and apparatuses, devices, and storage media provided by the embodiments of the present application can significantly reduce the noise and transition areas of the target texture map of the target viewpoint. The virtual viewpoint drawing, rendering, and decoding methods and apparatuses, devices, and storage media provided by the embodiments of the present application are implemented as follows:
  • A virtual viewpoint drawing method provided by an embodiment of the present application includes: generating an initial visible map (Visibility Map) of a target viewpoint according to the reconstructed depth maps (Reconstructed depth maps) of input viewpoints (Source views); performing quality improvement processing on the initial visible map to obtain a target visible map of the target viewpoint; and coloring the target visible map of the target viewpoint to obtain a target texture map of the target viewpoint.
  • A rendering (Rendering) method provided by an embodiment of the present application includes: performing pruned view reconstruction (Pruned View Reconstruction) on an atlas of depth maps of an input viewpoint to obtain a reconstructed depth map of the input viewpoint; performing the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint to obtain a target texture map of the target viewpoint; and generating a target view (Viewport) of the target viewpoint according to the target texture map of the target viewpoint.
  • A decoding method provided by an embodiment of the present application includes: decoding an input code stream to obtain an atlas of depth maps of an input viewpoint; performing pruned view reconstruction on the atlas of depth maps of the input viewpoint to obtain a reconstructed depth map of the input viewpoint; performing the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint to obtain a target texture map of the target viewpoint; and generating a target view of the target viewpoint according to the target texture map of the target viewpoint.
  • A virtual viewpoint drawing apparatus includes: a visible map generation module, configured to generate an initial visible map of a target viewpoint according to the reconstructed depth map of the input viewpoint; a quality improvement module, configured to perform quality improvement processing on the initial visible map to obtain a target visible map of the target viewpoint; and a coloring module, configured to color the target visible map of the target viewpoint to obtain a target texture map of the target viewpoint.
  • A rendering apparatus includes: a pruned view reconstruction module, configured to perform pruned view reconstruction on an atlas of depth maps of an input viewpoint to obtain a reconstructed depth map of the input viewpoint; a virtual viewpoint drawing module, configured to perform the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint to obtain a target texture map of the target viewpoint; and a target view generation module, configured to generate a target view of the target viewpoint according to the target texture map of the target viewpoint.
  • A decoding apparatus includes: a decoding module, configured to decode an input code stream to obtain an atlas of depth maps of an input viewpoint; a pruned view reconstruction module, configured to perform pruned view reconstruction on the atlas of depth maps of the input viewpoint to obtain a reconstructed depth map of the input viewpoint; a virtual viewpoint drawing module, configured to perform the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint to obtain a target texture map of the target viewpoint; and a target view generation module, configured to generate a target view of the target viewpoint according to the target texture map of the target viewpoint.
  • A view weighting synthesizer (View Weighting Synthesizer, VWS) provided by an embodiment of the present application is used to implement the virtual viewpoint drawing method described in the embodiments of the present application.
  • a rendering device provided by an embodiment of the present application is used to implement the rendering method described in the embodiment of the present application.
  • a decoder provided by an embodiment of the present application is used to implement the decoding method described in the embodiment of the present application.
  • An electronic device provided by an embodiment of the present application includes a memory and a processor, where the memory stores a computer program that can be run on the processor, and when the processor executes the program, any method described in the embodiments of the present application is implemented.
  • a computer-readable storage medium provided by an embodiment of the present application stores a computer program thereon, and when the computer program is executed by a processor, any method described in the embodiments of the present application is implemented.
  • In the embodiments of the present application, the initial visible map of the target viewpoint is generated according to the reconstructed depth map of the input viewpoint; at this point, instead of directly generating the target texture map of the target viewpoint from the initial visible map, quality improvement processing is first performed on the initial visible map, and the resulting target visible map is then colored to obtain the target texture map. In this way, on the one hand, the noise and/or transition areas in the target texture map are significantly reduced;
  • on the other hand, on the basis of ensuring the image quality, the encoding end can use larger quantization parameters to compress and encode the depth map, thereby reducing the coding overhead of the depth map and improving the overall coding efficiency.
  • FIG. 1 is a schematic diagram of a system architecture to which an embodiment of the present application may be applied;
  • FIG. 2 is a schematic structural diagram of the VWS;
  • FIG. 3 is a schematic diagram of the calculation flow of the weight of the uncropped pixel
  • FIG. 4 is a schematic diagram of a comparison between a depth map obtained by using depth estimation and a depth map generated by VWS;
  • Fig. 5 is the contrast schematic diagram of the edge of the depth map and texture map generated by VWS;
  • FIG. 6 is a schematic diagram of an implementation flowchart of a method for rendering a virtual viewpoint according to an embodiment of the present application
  • FIG. 7 is a schematic diagram of an implementation flowchart of a method for rendering a virtual viewpoint according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of an implementation of a virtual viewpoint rendering method according to an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of an implementation of a virtual viewpoint rendering method according to an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of an implementation of a virtual viewpoint rendering method according to an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a depth map optimization technique in viewpoint generation using superpixel segmentation according to an embodiment of the present application
  • FIG. 13 is a schematic diagram of a system architecture after the depth map optimization technology is introduced in an embodiment of the present application.
  • FIG. 14 is a schematic comparison diagram of two depth maps generated using the test sequence of the fencing (Fencing) scene;
  • FIG. 15 is a schematic comparison diagram of two depth maps generated using the test sequence of the frog (Frog) scene;
  • FIG. 16 is a schematic comparison diagram of two texture maps generated using the Fencing test sequence;
  • FIG. 17 is a schematic comparison diagram of two texture maps generated using the Frog test sequence;
  • FIG. 18 is a schematic comparison diagram of two texture maps generated using the test sequence of the parking lot (Carpark) scene;
  • FIG. 19 is a schematic comparison diagram of two texture maps generated using the test sequence of a street (Street) scene;
  • FIG. 20 is a schematic comparison diagram of two texture maps generated using the test sequence of the Painter scene;
  • FIG. 21 is a schematic structural diagram of an apparatus for drawing a virtual viewpoint according to an embodiment of the present application;
  • FIG. 22 is a schematic structural diagram of a rendering apparatus according to an embodiment of the present application.
  • FIG. 23 is a schematic structural diagram of a decoding apparatus according to an embodiment of the present application.
  • FIG. 24 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application.
  • The terms "first/second/third" involved in the embodiments of the present application are merely used to distinguish similar or different objects and do not represent a specific ordering of objects. It is understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • FIG. 1 shows a system architecture to which the embodiments of the present application may be applied, namely the decoding-end system architecture 10 of the Moving Picture Experts Group (MPEG) Test Model of Immersive Video (TMIV). As shown in FIG. 1, the system architecture 10 includes a decoded access unit (Decoded access unit) 11 and a rendering unit (Rendering unit) 12. The decoded access unit 11 contains various metadata and atlas information obtained after decoding; this information is then transmitted to the rendering unit 12 for virtual viewpoint drawing.
  • the subunits marked with the word optional (opt.) represent optional subunits.
  • The patch culling (Patch culling) subunit 121 of the rendering unit 12 filters the patches in the atlas information according to the user's target viewpoint parameters (viewport parameters) and eliminates the patches that do not overlap with the user's target view, thereby reducing the amount of computation when the virtual viewpoint is drawn.
  • The occupancy map recovery (Occupancy reconstruction) subunit 122 of the rendering unit 12 finds the position of each patch in the view according to the information transmitted by the decoding access unit 11, and then pastes the filtered patches into the corresponding positions to complete the pruned view reconstruction (Pruned view reconstruction).
  • The view generation (View synthesis) subunit 123 uses the reconstructed pruned views to perform virtual viewpoint drawing, that is, the drawing of the target viewpoint. Since the generated virtual viewpoint has certain hole areas, the inpainting (Inpainting) subunit 124 needs to fill the holes. Finally, the viewing space handling (Viewing space handling) subunit 125 may smoothly fade the view out to black.
  • The view weighting synthesizer VWS is a virtual viewpoint drawing tool used by MPEG in 3DoF+ TMIV.
  • The VWS is used in the renderer at the decoding end, and is specifically applied in the view synthesis stage after the pruned view reconstruction subunit 126.
  • the VWS mainly includes three modules: a weight calculation module 201 , a visible map generation module 202 and a coloring module 203 .
  • the visible image generation module 202 aims to generate the visible image under the target viewpoint
  • the coloring module 203 aims to colorize the generated visible image under the target viewpoint to obtain the texture image under the target viewpoint. Since the visible image generation module 202 and the coloring module 203 depend on the weight of the input viewpoint relative to the target viewpoint, the weight calculation module 201 aims to calculate the weight of the input viewpoint according to the relationship between the input viewpoint and the target viewpoint.
  • the weight calculation module 201 calculates the weight of the input viewpoint according to the metadata information of the input viewpoint and the metadata information of the target viewpoint.
  • the weight of an input viewpoint is a function of the distance between that viewpoint and the target viewpoint.
  • The contribution of each relevant pixel to the result is weighted by the weight of its corresponding viewpoint.
  • the weight calculation is a pixel-level operation, and the corresponding weight is calculated for the pixels that are not clipped.
  • the pixel weights are updated when the viewpoint is generated. As shown in Figure 3, the weights of uncropped pixels are calculated according to the following steps:
  • the initial weight is the weight of the viewpoint to which the pixel p belongs, and the value depends on the distance between the viewpoint to which the pixel p belongs and the target viewpoint; then the weights of the pixel p are determined by the processes described in a to c below.
  • the weight of the pixel p is added to the original value of the sub-node view.
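  • As a hedged illustration of the statements above (an input viewpoint's weight depends on the distance between that viewpoint and the target viewpoint), the following sketch assumes a simple inverse-distance weighting in Python; the exact weighting function, the pruning-graph update in steps a to c, and the names used here are illustrative and are not taken from this document.

```python
import numpy as np

def viewpoint_weight(source_position, target_position, eps=1e-6):
    """Illustrative weight of an input viewpoint relative to the target viewpoint:
    simply the inverse of the distance between the two camera centers, so that
    closer input viewpoints contribute more (assumed form, not the exact VWS rule)."""
    distance = np.linalg.norm(np.asarray(source_position, dtype=float)
                              - np.asarray(target_position, dtype=float))
    return 1.0 / (distance + eps)

# Example: three input viewpoints and one target viewpoint
sources = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (1.0, 0.5, 0.0)]
target = (0.1, 0.0, 0.0)
weights = [viewpoint_weight(s, target) for s in sources]
print(weights)  # the closest input viewpoints receive the largest weights
```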
  • the purpose of computing the visible map is to obtain the visible map under the target viewpoint from the reconstructed depth map of the input viewpoint (ie, the reconstructed depth map).
  • the whole process is divided into three steps: Warping, Selection and Filtering.
  • In the warping step, the pixels in the depth map of an input viewpoint are reprojected to the target viewpoint to generate a warped depth map. Doing this for each input viewpoint yields several warped depth maps at the target viewpoint.
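  • A minimal sketch of this warping step is shown below, assuming pinhole cameras with known intrinsic matrices and a known relative pose between the input camera and the target camera; the function name, the pose convention, and the simple z-buffer used here are assumptions for illustration only.

```python
import numpy as np

def warp_depth_map(depth_src, K_src, K_tgt, R, t):
    """Reproject every pixel of an input-view depth map into the target view.
    depth_src: (H, W) depth of the input view; K_src, K_tgt: 3x3 intrinsics;
    R, t: rotation and translation from the input camera to the target camera."""
    H, W = depth_src.shape
    warped = np.full((H, W), np.inf)                       # target-view depth buffer
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K_src) @ pix                      # back-project pixels
    pts_src = rays * depth_src.reshape(1, -1)              # 3D points in the input camera
    pts_tgt = R @ pts_src + t.reshape(3, 1)                # transform to the target camera
    proj = K_tgt @ pts_tgt
    z = proj[2]
    valid = z > 1e-6
    u_t = np.round(proj[0, valid] / z[valid]).astype(int)
    v_t = np.round(proj[1, valid] / z[valid]).astype(int)
    z_t = z[valid]
    inside = (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)
    for u, v, d in zip(u_t[inside], v_t[inside], z_t[inside]):
        warped[v, u] = min(warped[v, u], d)                # keep the nearest surface
    return warped
```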
  • The screening (selection) step combines the several generated warped depth maps to generate a relatively complete depth map under the target viewpoint, as shown in the figure.
  • the screening step is performed according to the weight of each input viewpoint, using the pixel-level majority voting principle.
  • The majority voting principle means that when multiple depth values are projected to the same pixel position, the depth value with the most projections is selected as the pixel's depth value.
  • a filtering step is performed on the resulting visible map, which is filtered using a median filter to remove outliers.
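  • The following sketch illustrates the selection (pixel-level majority voting) and filtering steps just described; the quantization step used to group nearly equal depth values and the 3x3 median window are assumed values, and the weighting of each input viewpoint is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import median_filter

def select_and_filter(warped_maps, quant_step=1.0, median_size=3):
    """Combine several warped depth maps into one target-view depth map by keeping,
    at each pixel, the (quantized) depth value that occurs most often, then remove
    isolated outliers with a median filter."""
    stack = np.stack(warped_maps, axis=0)                  # (N, H, W)
    _, H, W = stack.shape
    selected = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            candidates = stack[:, y, x]
            candidates = candidates[np.isfinite(candidates)]
            if candidates.size == 0:
                continue                                   # hole, left for inpainting
            bins = np.round(candidates / quant_step)
            values, counts = np.unique(bins, return_counts=True)
            winner = values[np.argmax(counts)]
            selected[y, x] = np.mean(candidates[bins == winner])
    return median_filter(selected, size=median_size)
```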
  • This step aims to generate the texture map under the target viewpoint.
  • the generation of the texture map under the target view requires the use of the filtered visible map and the restored texture map of the input view.
  • the continuity of the pixels in the input viewpoint in the visible map and the weight of the viewpoint to which they belong need to be considered.
  • the generated texture map is processed using bilinear filtering.
  • the detected pixels from the edge of the object in the input view texture map need to be culled.
  • the depth value calculated by the depth estimation method will have certain errors, resulting in noise in the estimated depth map.
  • Using such a depth map for virtual viewpoint rendering will inevitably result in a certain amount of noise in the generated depth map of the target viewpoint.
  • the left image 401 is a depth map obtained by using depth estimation
  • the right image 402 is a depth map obtained by using the left image 401 to draw virtual viewpoints, that is, a depth map generated by VWS.
  • Depth maps can usually be compressed using video coding standards. After compression coding, the depth map will produce certain compression distortion. Especially when a larger Quantization Parameter (QP) is used to compress the depth map, the compression distortion will be more serious.
  • One manifestation reflected on the texture map is that there is a transition band at the junction of the foreground and background, and the foreground edge is not steep enough.
  • the image 5021 in the white rectangle in the right image 502 has more noise, which is reflected on the texture map, that is, there is a large transition zone at the junction between the foreground and the background in the image area 5022.
  • An embodiment of the present application provides a virtual viewpoint drawing method, which can be applied to any electronic device with data processing capability; the electronic device can be a TV, a projector, a mobile phone, a personal computer, a tablet computer, a virtual reality (Virtual Reality, VR) headset, or any other device with video encoding and decoding functions or only decoding functions.
  • the functions implemented by the virtual viewpoint rendering method can be implemented by calling a program code by a processor in the electronic device.
  • the program code can be stored in a computer storage medium. It can be seen that the electronic device includes at least a processor and a storage medium.
  • FIG. 6 is a schematic diagram of an implementation flowchart of a virtual viewpoint rendering method according to an embodiment of the present application. As shown in FIG. 6 , the method may include the following steps 601 to 603:
  • Step 601 Generate an initial visible map of the target viewpoint according to the reconstructed depth map of the input viewpoint.
  • the electronic device may generate an initial visible map of the target viewpoint based on the reconstructed depth maps of these input viewpoints.
  • the electronic device can obtain the initial visible image through the visible image calculation module 202 shown in FIG. 2 .
  • the visible map and the depth map have the same meaning, and both represent the distance between the scene and the camera position.
  • The difference between the visible map and the depth map is that in the visible map, the closer a point is to the camera position, the smaller its pixel value.
  • Step 602 Perform quality improvement processing on the initial visible image to obtain a target visible image of the target viewpoint.
  • the purpose of the quality upscaling process is to reduce noise and/or transitions in the initially visible image.
  • the electronic device may perform denoising and/or edge enhancement processing on the initial visible image to improve the quality of the initial visible image, thereby obtaining the target visible image of the target viewpoint.
  • transition area refers to the transition zone area existing at the junction in the image.
  • The existence of this area leads to deviations in the subsequent analysis and understanding of the image, that is, the transition at the junction in the final target view is unnatural.
  • the manner in which the electronic device performs denoising and/or edge enhancement processing on the initial visible image may vary. For example, the electronic device performs filtering processing on the initial visible image; for another example, the electronic device performs replacement processing on noise points existing in the initial visible image and pixel values in the transition area.
  • Step 603 coloring the target visible map of the target viewpoint to obtain the target texture map of the target viewpoint.
  • The electronic device generates the initial visible map of the target viewpoint according to the reconstructed depth map of the input viewpoint; at this point, instead of directly generating the target texture map of the target viewpoint from the initial visible map, the initial visible map is subjected to quality improvement processing, and the resulting target visible map is colored to obtain the target texture map; in this way, on the one hand, the noise and/or transition areas in the target texture map are significantly reduced;
  • on the other hand, on the basis of ensuring the image quality, the encoding end can use a larger quantization parameter to compress and encode the depth map, thereby reducing the coding overhead of the depth map and improving the overall coding efficiency.
  • FIG. 7 is a schematic diagram of an implementation flow of the method for rendering a virtual viewpoint according to an embodiment of the present application. As shown in FIG. 7 , the method may include the following steps 701 to 705:
  • Step 701 according to the reconstructed depth map of the input viewpoint, generate the initial visible map of the target viewpoint;
  • the electronic device may decode the input code stream to obtain an atlas of the depth map of the input viewpoint; then, perform cropped view restoration on the atlas of the depth map of the input viewpoint to obtain the input The reconstructed depth map of the viewpoint.
  • the number of input viewpoints based on which the initial visible image is generated is not limited.
  • the electronic device may generate an initial visible map of the target viewpoint according to the reconstructed depth map of one or more input viewpoints.
  • Step 702 obtaining the initial texture map of the target viewpoint
  • When the electronic device decodes the input code stream, it not only obtains the atlas of depth maps of the input viewpoint, but also obtains the atlas of texture maps of the input viewpoint. Based on this, in some embodiments, the electronic device can obtain the initial texture map of the target viewpoint by performing pruned view reconstruction on the atlas of texture maps of the input viewpoint to obtain the reconstructed texture map of the input viewpoint, and then coloring the initial visible map of the target viewpoint according to the reconstructed texture map of the input viewpoint to obtain the initial texture map of the target viewpoint.
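  • As a simplified, hedged sketch of the coloring (shading) idea, the snippet below blends, for one target pixel, the candidate colors fetched from the reconstructed texture maps of the input viewpoints, weighting each contribution by the weight of the viewpoint it comes from; the actual VWS shading rule additionally considers pixel continuity and culls edge pixels, which is omitted here.

```python
import numpy as np

def shade_pixel(colors, weights):
    """Blend candidate colors for one target pixel using viewpoint weights.
    colors: (N, 3) RGB samples, one per contributing input viewpoint;
    weights: (N,) weights of those viewpoints."""
    colors = np.asarray(colors, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.sum() == 0:
        return np.zeros(3)          # no contribution: a hole to be inpainted later
    return (colors * weights[:, None]).sum(axis=0) / weights.sum()
```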
  • Step 703 Segment the initial texture map of the target viewpoint to obtain a segmented region.
  • The reason why the initial texture map is segmented, rather than segmenting the initial visible map directly, is as follows: if the initial visible map were segmented directly, as shown in Figure 8, the segmentation would follow the edges in the initial visible map 801, so if there are noise points on those edges, this segmentation cannot separate them. Compared with segmentation based on the initial visible map, segmentation based on the initial texture map produces better segmentation at the edges (that is, the junctions), so the more accurate segmentation result obtained from the initial texture map can better guide the quality improvement processing of the initial visible map. This is very beneficial for sharpening the edges, so that the noise and transition areas at the edges of the target texture map obtained after quality improvement and coloring are significantly reduced.
  • superpixel segmentation may be performed on the initial texture map to obtain segmented regions.
  • the used superpixel segmentation algorithm may be various, which is not limited in this embodiment of the present application.
  • For example, the superpixel segmentation algorithm may be the simple linear iterative clustering (Simple Linear Iterative Cluster, SLIC) superpixel segmentation algorithm, the Superpixels Extracted via Energy-Driven Sampling (SEEDS) algorithm, the Contour-Relaxed Superpixels (CRS) algorithm, the ETPS algorithm, or the Entropy Rate Superpixels Segmentation (ERS) algorithm, etc.
  • The SLIC superpixel segmentation algorithm is ideal in terms of running speed, compactness of the generated superpixels, and contour preservation. Therefore, in some embodiments, the electronic device uses the SLIC superpixel segmentation algorithm to perform superpixel segmentation on the initial texture map, which can improve the quality of the target texture map to a certain degree without significantly increasing the processing time, so that the objective quality and subjective effect of the final target texture map and the corresponding target view are improved.
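  • A minimal sketch of this segmentation step using scikit-image's SLIC implementation is given below; the number of superpixels (1200) matches the numSuperpixel value given later in this document, while the compactness value is an illustrative choice, not one specified here.

```python
from skimage.segmentation import slic

def superpixel_labels(initial_texture_map, num_superpixels=1200):
    """Segment the initial texture map of the target viewpoint into superpixels.
    initial_texture_map: (H, W, 3) RGB image; returns an (H, W) label map whose
    regions are then transferred onto the initial visible map."""
    return slic(initial_texture_map,
                n_segments=num_superpixels,
                compactness=10,      # illustrative value, not from this document
                start_label=0)
```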
  • Step 704 Perform denoising and/or edge enhancement processing on the corresponding region of the segmented region on the initial visible map to obtain a target visible map of the target viewpoint.
  • The electronic device may use the segmented areas of the initial texture map as the segmented areas of the initial visible map, and classify the pixels of each segmented area of the initial visible map to determine the target pixels to be updated in the initial visible map.
  • the determination of the target pixel to be updated can be achieved through steps 904 and 905 in the following embodiments.
  • Classification algorithms can be varied.
  • the classification algorithm is K-means clustering, decision tree, Bayesian, artificial neural network, support vector machine or classification based on association rules, etc.
  • the electronic device can directly transfer the segmentation result of the initial texture image to the initial visible image, and use the segmented area of the initial texture image as the segmented area of the initial visible image.
  • the electronic device may filter the pixel value of the target pixel in the visible image to update the pixel value.
  • the electronic device may also replace the pixel value of the target pixel in the visible image, so as to update the pixel value.
  • each segmented area corresponds to a pixel replacement value
  • the pixel replacement value may be an average value of pixel values of non-target pixels in the corresponding segmented area.
  • each segmented region corresponds to a cluster center of a non-target pixel class, so the pixel value of the center can be used as the pixel replacement value.
  • Step 705 Colorize the target visible map of the target viewpoint to obtain the target texture map of the target viewpoint.
  • This step can be implemented by the coloring module 203 shown in FIG. 2 .
  • the target visible map of the target viewpoint is colored to obtain the target texture map of the target viewpoint.
  • FIG. 9 is a schematic diagram of an implementation flow of the method for drawing a virtual viewpoint according to an embodiment of the present application. As shown in FIG. 9 , the method may include the following steps 901 to 907:
  • Step 901 according to the reconstructed depth map of the input viewpoint, generate the initial visible map of the target viewpoint;
  • Step 902 obtaining the initial texture map of the target viewpoint
  • Step 903 segment the initial texture map of the target viewpoint to obtain segmented regions
  • Step 904: Cluster the pixel values of the pixels in each segmented area of the initial visible map to obtain at least: the number of pixels of the first type of pixels and the pixel value of the cluster center of the first type of pixels, and the number of pixels of the second type of pixels and the pixel value of the cluster center of the second type of pixels;
  • each segmented region has a corresponding clustering result.
  • clustering algorithms which are not limited in this embodiment of the present application.
  • For example, the K-means clustering algorithm.
  • the initial texture map can be divided into several divided regions through step 903 . These segmented regions are mapped to the initial visible map.
  • The electronic device may separately classify the pixels in some or all of the segmented areas of the initial visible map. For example, in some embodiments, the electronic device may use a classification algorithm (e.g., the K-means clustering algorithm) to divide the pixels in each segmented area of the initial visible map into two categories: non-target pixels belonging to the background area and target pixels belonging to the noise area (or transition area).
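  • The sketch below shows, under the assumption that scikit-learn's K-means is used, how the pixel values of one segmented area of the visible map could be clustered into two classes, yielding the cluster centers and pixel counts (cen1, cen2, num1, num2) used by the criterion described in the following steps; the function name and return convention are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_region(visible_map, region_mask):
    """Cluster the pixel values of one segmented area (given as a boolean mask)
    into two classes and return the cluster centers, the pixel counts, and the
    per-pixel labels (0 for the first class, 1 for the second class)."""
    values = visible_map[region_mask].reshape(-1, 1).astype(float)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(values)
    cen1, cen2 = km.cluster_centers_.ravel()
    num1, num2 = np.bincount(km.labels_, minlength=2)
    return cen1, cen2, num1, num2, km.labels_
```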
  • Step 905: Determine the target pixels to be updated in the corresponding segmented area at least according to one of the relationship between the number of pixels of the first type of pixels and the number of pixels of the second type of pixels in the segmented area, and the relationship between the pixel value of the cluster center of the first type of pixels and the pixel value of the cluster center of the second type of pixels.
  • In the case where a first operation result, obtained by subtracting the pixel value of the cluster center of the second type of pixels from the pixel value of the cluster center of the first type of pixels, is greater than or equal to a first threshold, and a second operation result, obtained by dividing the number of pixels of the first type of pixels by the number of pixels of the second type of pixels, is greater than or equal to a second threshold, the second type of pixels are determined to be the target pixels to be updated in the corresponding segmented area; correspondingly, in this case, the first type of pixels are non-target pixels.
  • In the case where a third operation result, obtained by subtracting the pixel value of the cluster center of the first type of pixels from the pixel value of the cluster center of the second type of pixels, is greater than or equal to the first threshold, and a fourth operation result, obtained by dividing the number of pixels of the second type of pixels by the number of pixels of the first type of pixels, is greater than or equal to the second threshold, the first type of pixels are determined to be the target pixels to be updated in the corresponding segmented area; correspondingly, in this case, the second type of pixels are non-target pixels.
  • Step 906 Update the pixel value of the target pixel in the initial visible image to obtain the target visible image.
  • The way of updating can vary. For example, the pixel values of the target pixels in a segmented area are replaced with the pixel value of the cluster center of the non-target pixels of that area.
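  • Continuing the previous sketch, the snippet below applies the decision criterion of steps 905 and 906: if one class's cluster center exceeds the other's by at least the first threshold and its pixel count exceeds the other's by at least the second threshold times, the smaller class is treated as the target pixels and its values are replaced by the larger class's cluster center; the default thresholds 30 and 7 are the example values given later in this document.

```python
def update_region(visible_map, region_mask, labels, cen1, cen2, num1, num2,
                  thr1=30.0, thr2=7.0):
    """Replace the pixel values of the target pixels of one segmented area with
    the cluster center of the non-target class; otherwise leave the area unchanged."""
    region_values = visible_map[region_mask]
    if cen1 - cen2 >= thr1 and num2 > 0 and num1 / num2 >= thr2:
        region_values[labels == 1] = cen1   # second class = noise/transition pixels
    elif cen2 - cen1 >= thr1 and num1 > 0 and num2 / num1 >= thr2:
        region_values[labels == 0] = cen2   # first class = noise/transition pixels
    # otherwise both classes are non-target pixels and nothing is changed
    visible_map[region_mask] = region_values
```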
  • Step 907 coloring the target visible map of the target viewpoint to obtain the target texture map of the target viewpoint.
  • FIG. 10 is a schematic flowchart of the implementation of the method for rendering a virtual viewpoint according to an embodiment of the present application. As shown in FIG. 10 , the method may include the following steps 1001 to 1007:
  • Step 1001 according to the reconstructed depth map of the input viewpoint, generate the initial visible map of the target viewpoint;
  • Step 1002 obtaining the initial texture map of the target viewpoint
  • Step 1003 segment the initial texture map of the target viewpoint to obtain segmented regions
  • Step 1004 mapping the pixel values of the pixels in the initial visible image to a specific interval to obtain a first visible image
  • the specific interval can be [0,255].
  • engineers can also configure other specific intervals according to actual needs.
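  • A hedged sketch of this mapping, together with the reverse mapping used later in step 1008, is given below; the simple min-max linear mapping is an assumed implementation, consistent with the linear mapping described later in this document.

```python
def map_to_interval(visible_map, low=0.0, high=255.0):
    """Linearly map the pixel values of the initial visible map into [low, high]
    and return the parameters needed to undo the mapping afterwards."""
    vmin, vmax = float(visible_map.min()), float(visible_map.max())
    scale = (high - low) / (vmax - vmin) if vmax > vmin else 1.0
    mapped = (visible_map - vmin) * scale + low
    return mapped, (vmin, scale, low)

def map_back(mapped_map, params):
    """Reverse the linear mapping to recover the original value range."""
    vmin, scale, low = params
    return (mapped_map - low) / scale + vmin
```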
  • Step 1005: Use the segmented areas of the initial texture map as the segmented areas of the first visible map, and cluster the pixels of each segmented area of the first visible map to obtain at least: the number of pixels of the first type of pixels and the pixel value of the cluster center of the first type of pixels, and the number of pixels of the second type of pixels and the pixel value of the cluster center of the second type of pixels;
  • Step 1006: Determine the target pixels to be updated in the corresponding segmented area at least according to one of the relationship between the number of pixels of the first type of pixels and the number of pixels of the second type of pixels, and the relationship between the pixel value of the cluster center of the first type of pixels and the pixel value of the cluster center of the second type of pixels.
  • In the case where a first operation result, obtained by subtracting the pixel value of the cluster center of the second type of pixels from the pixel value of the cluster center of the first type of pixels, is greater than or equal to a first threshold, and a second operation result, obtained by dividing the number of pixels of the first type of pixels by the number of pixels of the second type of pixels, is greater than or equal to a second threshold, the second type of pixels are determined to be the target pixels to be updated in the corresponding segmented area; correspondingly, in this case, the first type of pixels are non-target pixels.
  • In the case where a third operation result, obtained by subtracting the pixel value of the cluster center of the first type of pixels from the pixel value of the cluster center of the second type of pixels, is greater than or equal to the first threshold, and a fourth operation result, obtained by dividing the number of pixels of the second type of pixels by the number of pixels of the first type of pixels, is greater than or equal to the second threshold, the first type of pixels are determined to be the target pixels to be updated in the corresponding segmented area; correspondingly, in this case, the second type of pixels are non-target pixels.
  • The cluster center of the first type of pixels is denoted by cen1, the cluster center of the second type of pixels by cen2, the number of pixels of the first type of pixels by num1, and the number of pixels of the second type of pixels by num2.
  • Otherwise, the first type of pixels and the second type of pixels are both non-target pixels, and the pixel values in the corresponding segmented regions are not processed.
  • the value range of the first threshold is [25, 33] and the value range of the second threshold is [5, 10].
  • the first threshold is 30 and the second threshold is 7.
  • Step 1007 Update the pixel value of the target pixel to be updated in the first visible image to obtain a second visible image.
  • The clustering of the pixels in the segmented areas of the first visible map also determines the non-target pixels in each segmented area; step 1007 may be implemented as follows: determine the pixel replacement value of each segmented area according to the pixel values of the non-target pixels of that segmented area in the first visible map; update the pixel values of the target pixels of each segmented area of the first visible map to the pixel replacement value of the corresponding segmented area, thereby obtaining the second visible map.
  • the pixel value of the cluster center of the non-target pixels in the segmented area of the first visible image may be determined as the pixel replacement value of the segmented area.
  • Step 1008 Perform reverse mapping on pixel values of pixels in the second visible image according to the mapping relationship between the initial visible image and the first visible image to obtain the target visible image.
  • The quality improvement processing of the initial visible map is realized through steps 1002 to 1008: before the target pixels of the initial visible map are determined, the pixel values of the pixels of the map are first mapped to a specific interval, the mapping result (that is, the pixel values of the pixels in the first visible map) is then classified, the pixel values of the target pixels determined by the classification are updated to obtain the second visible map, and finally the second visible map is reversely mapped to obtain the target visible map. In this way, the virtual viewpoint drawing method has a certain generalization ability, so that it can adapt to the processing of images of various scenes.
  • Step 1009 coloring the target visible map of the target viewpoint to obtain the target texture map of the target viewpoint.
  • FIG. 11 is a schematic diagram of an implementation flowchart of the method for rendering a virtual viewpoint according to an embodiment of the present application. As shown in FIG. 11 , the method may include the following steps 111 to 1112:
  • Step 111 decoding the input code stream to obtain an atlas of the depth map of the input viewpoint
  • Step 112 performing clipping view restoration on the atlas of the depth map of the input viewpoint to obtain a reconstructed depth map of the input viewpoint;
  • Step 113 generating an initial visible image of the target viewpoint according to the reconstructed depth map of the input viewpoint
  • Step 114 obtaining the initial texture map of the target viewpoint
  • Step 115 performing superpixel segmentation on the initial texture map of the target viewpoint to obtain a segmentation result
  • Step 116 mapping the pixel values of the pixels in the initial visible image of the target viewpoint to a specific interval to obtain a first visible image
  • Step 117 using the segmentation result as the superpixel segmentation result of the first visible image, and clustering the pixel values of the pixels in the superpixels of the first visible image to obtain a clustering result;
  • The clustering result includes: the number of pixels of the first type of pixels and the pixel value of the cluster center of the first type of pixels, and the number of pixels of the second type of pixels and the pixel value of the cluster center of the second type of pixels;
  • the electronic device may use a K-means clustering algorithm to classify the pixel values of the pixels in each superpixel in the first visible map, respectively.
  • Step 118: According to the relationship between the number of pixels of the first type of pixels and the number of pixels of the second type of pixels, and the relationship between the pixel value of the cluster center of the first type of pixels and the pixel value of the cluster center of the second type of pixels, determine the target pixels in the corresponding superpixel that belong to noise points or transition areas, and determine the non-target pixels in the corresponding superpixel that do not belong to noise points or transition areas.
  • each superpixel corresponds to a clustering result
  • The corresponding superpixel described here refers to the superpixel to which the clustering result being used corresponds, that is, the clustering result being used is the clustering result of that superpixel.
  • In the case where a first operation result, obtained by subtracting the pixel value of the cluster center of the second type of pixels from the pixel value of the cluster center of the first type of pixels, is greater than or equal to a first threshold, and a second operation result, obtained by dividing the number of pixels of the first type of pixels by the number of pixels of the second type of pixels, is greater than or equal to a second threshold, the second type of pixels are determined to be the target pixels to be updated in the corresponding superpixel; correspondingly, in this case, the first type of pixels are non-target pixels.
  • In the case where a third operation result, obtained by subtracting the pixel value of the cluster center of the first type of pixels from the pixel value of the cluster center of the second type of pixels, is greater than or equal to the first threshold, and a fourth operation result, obtained by dividing the number of pixels of the second type of pixels by the number of pixels of the first type of pixels, is greater than or equal to the second threshold, the first type of pixels are determined to be the target pixels to be updated in the corresponding superpixel; correspondingly, in this case, the second type of pixels are non-target pixels.
  • In the case where the third operation result is less than the first threshold or the fourth operation result is less than the second threshold, the first type of pixels and the second type of pixels are both non-target pixels, and the pixel values in the corresponding superpixel are not processed.
  • the value range of the first threshold is [25, 33] and the value range of the second threshold is [5, 10].
  • the first threshold is 30 and the second threshold is 7.
  • Step 119 Determine a pixel replacement value according to the pixel value of the non-target pixel of the superpixel in the first visible image.
  • For example, the mean value of the non-target pixels in the superpixel can be used as the pixel replacement value of the target pixels in the superpixel; for instance, the pixel value of the cluster center of the non-target pixel class of the superpixel is used as the pixel replacement value.
  • Step 1110 Update the pixel value of the target pixel in the superpixel in the first visible image to the pixel replacement value corresponding to the superpixel, thereby obtaining a second visible image.
  • the pixel value of the cluster center of non-target pixels in the superpixels in the first visible map is determined as the pixel replacement value.
  • In the related art, filtering is often used to improve the quality of the noise points and transition areas in the visible map, which disperses their influence over the surrounding pixels.
  • This changes the correct pixel values of the pixels around the noise and transition areas (that is, the non-target pixels), making the objective quality and subjective effect of the final target view slightly poorer.
  • In the embodiments of the present application, the pixel values of these noise points and transition areas are instead replaced with an approximately correct value (that is, the pixel replacement value), so that the values of the pixels surrounding the target pixels, which are not target pixels, are not changed.
  • this method can make the target pixel after replacing the pixel value merge with the surrounding area more naturally, so that the objective quality and subjective effect of the final target view are better.
  • Step 1111 Perform reverse mapping on pixel values of pixels in the second visible image according to the mapping relationship between the initial visible image and the first visible image to obtain the target visible image.
  • Step 1112 Generate a target view of the target viewpoint according to the target visible map of the target viewpoint.
  • An embodiment of the present application provides a rendering method.
  • The rendering method can be applied not only to electronic devices, but also to rendering devices.
  • The method may include: performing pruned view reconstruction on an atlas of depth maps of an input viewpoint to obtain a reconstructed depth map of the input viewpoint; performing the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint to obtain a target texture map of the target viewpoint; and generating a target view of the target viewpoint according to the target texture map of the target viewpoint.
  • The obtaining of the initial texture map of the target viewpoint includes: performing pruned view reconstruction on the atlas of texture maps of the input viewpoint to obtain the reconstructed texture map of the input viewpoint; and coloring the initial visible map of the target viewpoint according to the reconstructed texture map of the input viewpoint to obtain the initial texture map of the target viewpoint.
  • the atlas of the texture map of the input viewpoint is obtained by decoding the code stream by an electronic device
  • Step 113 can be implemented as follows: coloring the target visible map of the target viewpoint to obtain a target texture map of the target viewpoint; filling holes in the target texture map to obtain an initial view; and performing viewing space processing on the initial view to obtain the target view.
  • the description of the rendering method embodiment is similar to the description of the other method embodiments described above, and has similar beneficial effects to the other method embodiments described above.
  • An embodiment of the present application provides a decoding method, the method includes: decoding an input code stream to obtain an atlas of a depth map of an input viewpoint; performing cropped view restoration on the atlas of the depth map of an input viewpoint, obtaining the reconstructed depth map of the input viewpoint; performing the steps in the virtual viewpoint rendering method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint to obtain the target texture map of the target viewpoint; according to the The target texture map of the target viewpoint, and the target view of the target viewpoint is generated.
  • In some embodiments, the code stream is decoded to also obtain an atlas of texture maps of the input viewpoint; the obtaining of the initial texture map of the target viewpoint includes: performing pruned view reconstruction on the atlas of texture maps of the input viewpoint to obtain the reconstructed texture map of the input viewpoint; and coloring the initial visible map of the target viewpoint according to the reconstructed texture map of the input viewpoint to obtain the initial texture map of the target viewpoint.
  • the electronic device decodes to obtain an atlas of depth maps and an atlas of texture maps of one or more input viewpoints.
  • the atlas of the depth map of the input viewpoint is subjected to cropped view restoration to obtain the reconstructed depth map of the input viewpoint;
  • Then the initial visible map of the target viewpoint is generated; at this point, instead of directly generating the target view of the target viewpoint from the initial visible map, quality improvement processing is first performed on the initial visible map, and the target view is generated based on the target visible map obtained by the quality improvement processing. In this way, on the one hand, the noise and/or transition areas in the final target view are significantly reduced; on the other hand, on the basis of ensuring the image quality of the target view, the encoder can compress and encode the depth map with a larger quantization parameter, thereby reducing the coding overhead of the depth map and improving the overall coding efficiency.
  • The description of the decoding method embodiment is similar to the descriptions of the other method embodiments above, and has beneficial effects similar to those of the other method embodiments.
  • For technical details not disclosed in the decoding method embodiment, please refer to the descriptions of the other method embodiments above.
  • a technical solution for optimizing a depth map in viewpoint generation by using superpixel segmentation is provided.
  • This technical solution is an improvement on the basis of VWS and aims to optimize the visible map under the target viewpoint obtained in the visible-map generation step of VWS (the visible map and the depth map have the same meaning, both representing the distance relationship between the scene and the camera position; unlike the depth map, in the visible map the closer a point is to the camera position, the smaller its pixel value).
  • the superpixel segmentation algorithm is used to segment the initial texture map generated by the VWS, and the result obtained from the segmentation is applied to the initial visible map generated by the VWS.
  • K-means clustering is used to cluster the superpixels on the obtained initial visible map, so that the noise points and transition areas that need to be processed can be separated from the areas that do not need to be processed, and then the pixel values of the noise points and transition areas that need to be processed are replaced.
  • The technical solution of the embodiment of the present application is divided into three modules: a superpixel segmentation (Superpixel Segmentation) module 121, a K-means clustering (K-Means Clustering) module 122, and a replacement (Replacement) module 123.
  • First, the generated visible map D (i.e., the initial visible map) is obtained through the visible-map generation step of VWS. Because the test sequences cover different scene contents, the range of pixel values in the initial visible map differs from sequence to sequence.
  • Therefore, a linear mapping algorithm can be used to transform the pixel values in the visible map D into the [0, 255] interval to obtain the visible map D2 (that is, the first visible map). Then, the generated texture map T (that is, the initial texture map) is obtained from the coloring step, and the SLIC superpixel segmentation algorithm is applied to the texture map T to divide it with the number of superpixels (numSuperpixel) set to 1200; the superpixel segmentation result of the texture map T is then applied to the visible map D2, obtaining several superpixels Si divided on the visible map D2.
  • Next, a K-means clustering algorithm is used for each superpixel Si to divide the pixels in it into two classes, C1 and C2. Denote the cluster centers of C1 and C2 as cen1 and cen2, respectively, and the numbers of pixels contained in them as num1 and num2, respectively.
  • If cen1 - cen2 > 30 and num1/num2 > 7, then C1 is considered to be the background area and C2 the noise area or transition area; in this case all pixels in C1 are not processed and keep their original values, and the values of all pixels in C2 are replaced by cen1. Here, 30 is an example of the first threshold and 7 is an example of the second threshold;
  • Conversely, if cen2 - cen1 > 30 and num2/num1 > 7, then C2 is considered to be the background area and C1 the noise area or transition area; in this case all pixels in C2 are not processed and keep their original values, and the values of all pixels in C1 are replaced by cen2;
  • The optimized visible map D3 (that is, the second visible map) is thus obtained.
  • The visible map D3 is then reversely linearly mapped, scaled back to the original value range, to obtain the visible map D4 (that is, the target visible map).
  • Finally, the original visible map D is replaced with the visible map D4, and the coloring step is performed again on D4 to obtain the optimized texture map T2 (i.e., the target texture map).
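  • The following end-to-end sketch, assuming scikit-image's SLIC and scikit-learn's K-means, strings the steps above together (D to D2 by linear mapping, SLIC segmentation of T with numSuperpixel = 1200, per-superpixel K-means with K = 2, replacement using the thresholds 30 and 7, and the reverse mapping to D4); the shading step that produces T2 is assumed to be provided by the VWS renderer and is therefore not shown.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def optimize_visible_map(D, T, num_superpixels=1200, thr1=30.0, thr2=7.0):
    """Sketch of the visible-map optimization described above.
    D: initial visible map from the VWS visible-map generation step;
    T: initial texture map from the VWS coloring step, shaped (H, W, 3)."""
    # 1) Linearly map D to [0, 255] -> D2
    vmin, vmax = float(D.min()), float(D.max())
    scale = 255.0 / (vmax - vmin) if vmax > vmin else 1.0
    D2 = (D - vmin) * scale

    # 2) SLIC superpixel segmentation of T, transferred onto D2
    labels = slic(T, n_segments=num_superpixels, compactness=10, start_label=0)

    # 3) Per-superpixel K-means (K = 2) and replacement of noise/transition pixels
    D3 = D2.copy()
    for s in np.unique(labels):
        mask = labels == s
        vals = D2[mask].reshape(-1, 1)
        if vals.shape[0] < 2:
            continue
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vals)
        cen1, cen2 = km.cluster_centers_.ravel()
        num1, num2 = np.bincount(km.labels_, minlength=2)
        region = D3[mask]
        if cen1 - cen2 > thr1 and num2 > 0 and num1 / num2 > thr2:
            region[km.labels_ == 1] = cen1   # C2 is the noise/transition area
        elif cen2 - cen1 > thr1 and num1 > 0 and num2 / num1 > thr2:
            region[km.labels_ == 0] = cen2   # C1 is the noise/transition area
        D3[mask] = region

    # 4) Reverse linear mapping back to the original value range -> D4
    D4 = D3 / scale + vmin
    return D4   # D4 replaces D and is re-shaded by VWS to obtain T2
```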
  • the system architecture after introducing the depth map optimization technology is shown in Figure 13.
  • the technical solutions provided by the embodiments of the present application can be implemented on TMIV 6.0, and the test sequence of the natural scene content is tested in the Common Test Condition.
  • The experimental results show that after this technical solution is introduced into VWS, the noise in the depth map of the generated target viewpoint is greatly reduced, and the junctions between some foreground and background regions in the texture map become more distinct. Since the SLIC algorithm is adopted for superpixel segmentation, the technical solution improves the quality of the depth map and texture map of the target viewpoint to a certain degree without significantly increasing the rendering time.
  • The experimental configuration in some embodiments is as follows: the superpixel segmentation adopts the SLIC algorithm; the number of superpixels (numSuperpixel) is 1200; the value of K in the K-means clustering algorithm is 2; the threshold on the difference between the cluster centers cen1 and cen2 is chosen to be 30; and the threshold on the ratio of the numbers of pixels num1 and num2 in the two clusters is chosen to be 7.
  • one or more of the above configuration parameters may not be fixed values.
  • Relevant implementations may include: (1) encoding, in the code stream, the above one or more parameter values that need to be used during execution of the method in the embodiments of the present application, where the data units in the code stream that may be used include sequence-layer data units (such as SPS, PPS), picture-layer data units (such as PPS, APS, picture header, slice header, etc.), and block-layer data units (such as CTU and CU layer data units); (2) using implicit derivation to determine one or more of the above parameter values; (3) a combination of (1) and (2), that is, a method of adaptively determining the above parameter values at the sequence, picture, and block layers.
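  • The snippet below merely groups the configurable parameters listed above into a hypothetical container for illustration; the actual bitstream syntax, and which sequence-, picture-, or block-layer data unit would carry each value, is not specified here.

```python
from dataclasses import dataclass

@dataclass
class VisibleMapOptimizationParams:
    """Hypothetical grouping of the configurable parameters; whether they are
    signalled in the code stream or derived implicitly is implementation-defined."""
    num_superpixels: int = 1200   # numSuperpixel
    num_clusters: int = 2         # K in the K-means clustering
    first_threshold: int = 30     # threshold on the cluster-center difference (cen1, cen2)
    second_threshold: int = 7     # threshold on the pixel-count ratio (num1, num2)
```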
  • the comparison of depth maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 14, wherein the depth map 141 on the left is the depth map generated from the fencing (Fencing) test sequence before using the technical solution;
  • the depth map 142 on the right is the depth map generated from the Fencing test sequence after using the technical solution provided by the embodiments of the present application.
  • the noise in the depth map 142 on the right is significantly reduced; in particular, in the area inside the rectangular frame the noise disappears compared with the depth map 141 on the left.
  • after the noise is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the comparison of depth maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 15, wherein the depth map 151 on the left is the depth map generated from the frog (Frog) test sequence before using the technical solution;
  • the depth map 152 on the right is the depth map generated from the Frog test sequence after using the technical solution provided by the embodiments of the present application.
  • the noise in the depth map 152 on the right is reduced; in particular, in the area inside the rectangular box the noise disappears compared with the depth map 151 on the left.
  • after the noise is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the comparison of texture maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 16, wherein the texture map 161 above is the texture map generated from the Fencing test sequence before using the technical solution;
  • the texture map 162 below is the texture map generated from the Fencing test sequence after using the technical solution provided by the embodiments of the present application.
  • the texture map 162 below is of better quality.
  • for example, the edge area in the rectangular box 1611 in the upper texture map 161 has an obvious transition band, while the transition band in the edge area in the rectangular box 1621 in the lower texture map 162 is clearly sharpened; for another example, the rectangular box 1612 in the upper texture map 161 contains an obvious triangle-like noise block, while the noise block in the rectangular box 1622 in the lower texture map 162 disappears; for yet another example, the rectangular box 1613 in the upper texture map 161 clearly contains many noise points, visible in the circular frame of the enlargement shown in 1614, while most of the noise in the rectangular box 1623 in the lower texture map 162 disappears, as can be seen in the circular frame of the enlargement shown in 1624, and the edge area is clearly sharpened; moreover, after the noise is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the comparison of texture maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 17, wherein the texture map 171 above is the texture map generated from the Frog test sequence before using the technical solution;
  • the texture map 172 below is the texture map generated from the Frog test sequence after using the technical solution provided by the embodiments of the present application.
  • the texture map 172 below has better image quality.
  • for example, the edge region of the human hand in the rectangular box 1711 in the upper texture map 171 has an obvious transition band, while the transition band at the edge of the hand in the rectangular box 1721 in the lower texture map 172 is clearly sharpened;
  • for another example, there is an obvious transition band at the edge of the doll's collar in the rectangular frame 1712 in the upper texture map 171, while the transition band at the edge of the doll's collar in the rectangular frame 1722 in the lower texture map 172 disappears; moreover, as can be seen from the figure, after the transition band is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the comparison of texture maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 18, wherein the texture map 181 above is the texture map generated from the parking-lot (Carpark) test sequence before using the technical solution;
  • the texture map 182 below is the texture map generated from the Carpark test sequence after using the technical solution provided by the embodiments of the present application.
  • the texture map 182 below is of better quality.
  • for example, the area in the rectangular frame 1811 in the upper texture map 181, enlarged as shown in 1812, contains obvious noise inside the circular frame, while in the area in the rectangular frame 1821 in the lower texture map 182, enlarged as shown in 1822, most of the noise inside the circular frame disappears, and the window edges in particular become clearer; moreover, after the noise is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the comparison of texture maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 19, wherein the texture map 191 above is the texture map generated from the street (Street) test sequence before using the technical solution;
  • the texture map 192 below is the texture map generated from the Street test sequence after using the technical solution provided by the embodiments of the present application. As can be seen from the figure, the lower texture map 192 has better image quality.
  • for example, the area in the rectangular frame 1911 in the upper texture map 191, enlarged as shown in 1912, has a clear transition band on the upper-left edge of the sign, while in the area in the rectangular frame 1921 in the lower texture map 192, enlarged as shown in 1922, the transition band on the upper-left edge of the sign basically disappears; for another example, the area in the rectangular frame 1913 in the upper texture map 191, enlarged as shown in 1914, has a transition band along the edge of the arc-shaped stick bracket above the car, while in the area in the rectangular frame 1923 in the lower texture map 192, enlarged as shown in 1924, the edge of the arc-shaped stick bracket above the car becomes clear; moreover, as can be seen from the figure, after the transition band is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the comparison of texture maps before and after using the technical solution provided by the embodiments of the present application is shown in FIG. 20, wherein the texture map 201 above is the texture map generated from the painter (Painter) test sequence before using the technical solution;
  • the texture map 202 below is the texture map generated from the Painter test sequence after using the technical solution provided by the embodiments of the present application.
  • the lower texture map 202 has better image quality.
  • for example, the area in the rectangular frame 2011 in the upper texture map 201, enlarged as shown in 2012, has an obvious transition band along the edge of the human hand, while in the area in the rectangular frame 2021 in the lower texture map 202, enlarged as shown in 2022, the transition band along the edge of the hand basically disappears, and the edges of the index finger and middle finger in particular become clearer. Moreover, as can be seen from the figure, after the transition band is replaced with the pixel value of the cluster center, it blends in with the surrounding area, and the image looks natural and clear.
  • the experimental results in FIG. 14 to FIG. 20 show that, compared with the views generated by the VWS, the technical solution effectively suppresses the adverse effect of depth-map compression distortion on the generated views; therefore, relative to the VWS, the depth map can be compressed with a larger QP while the quality of the final texture map is preserved, which reduces the coding overhead of the depth map and improves the overall coding efficiency.
  • in the embodiments of the present application, the simple linear iterative clustering (Simple Linear Iterative Cluster, SLIC) superpixel segmentation algorithm and the K-means clustering algorithm are used to separate the noise points and transition areas in the visible image, and these are then processed, thereby improving the objective quality and subjective effect of the depth map and texture map.
  • the virtual viewpoint drawing apparatus provided by the embodiments of the present application, including the modules it comprises and the units comprised in each module, can be implemented by a decoder or a processor in an electronic device; of course, it can also be implemented by a specific logic circuit; in the course of implementation, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), a graphics processing unit (Graphics Processing Unit, GPU), or the like.
  • FIG. 21 is a schematic structural diagram of a virtual viewpoint drawing apparatus according to an embodiment of the present application. As shown in FIG. 21, the apparatus 21 includes:
  • a visible image generation module 211 configured to generate an initial visible image of the target viewpoint according to the reconstructed depth map of the input viewpoint
  • a visible image optimization module 212 configured to perform quality improvement processing on the initial visible image to obtain a target visible image of the target viewpoint
  • the coloring module 213 is configured to color the target visible image of the target viewpoint to obtain the target texture map of the target viewpoint (a structural sketch of how the three modules fit together is given below).
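Read as an implementation outline, the three modules could be wired together as in the following sketch. The callables passed to the constructor stand in for the VWS visibility and shading stages and for the optimization routine sketched earlier; they are assumptions of this sketch, not TMIV code.

```python
class VirtualViewpointDrawer:
    """Structural sketch of the drawing apparatus of FIG. 21 (modules 211-213)."""

    def __init__(self, generate_visible_map, shade, optimize_visible_map):
        # Injected stand-ins for the VWS visibility stage, the VWS shading stage,
        # and the visible-map optimization routine sketched above.
        self.generate_visible_map = generate_visible_map   # module 211
        self.optimize_visible_map = optimize_visible_map   # module 212
        self.shade = shade                                  # module 213

    def draw(self, recon_depth_maps, recon_texture_maps, target_view_params):
        # Module 211: initial visible map of the target viewpoint.
        D = self.generate_visible_map(recon_depth_maps, target_view_params)
        # Shade once to obtain the initial texture map used for segmentation.
        T = self.shade(D, recon_texture_maps, target_view_params)
        # Module 212: quality-improvement processing of the visible map.
        D4 = self.optimize_visible_map(D, T)
        # Module 213: shade again on the optimized visible map to get T2.
        return self.shade(D4, recon_texture_maps, target_view_params)
```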
  • the visible image optimization module 212 is configured to: perform denoising and/or edge enhancement processing on the initial visible image to obtain the target visible image of the target viewpoint.
  • the visible image optimization module 212 includes: an acquisition unit, configured to obtain an initial texture map of the target viewpoint; a segmentation unit, configured to segment the initial texture map of the target viewpoint to obtain a segmented region; An enhancement unit, configured to perform denoising and/or edge enhancement processing on the corresponding area of the segmented area on the initial visible image to obtain the target visible image of the target viewpoint.
  • the segmentation unit is configured to: use the SLIC superpixel segmentation algorithm to perform superpixel segmentation on the initial texture map of the target viewpoint, and the segmented area is a superpixel.
  • the enhancement unit includes: a classification subunit, configured to use the segmented regions of the initial texture map as the segmented regions of the initial visible image and classify the pixels of each segmented region of the initial visible image, so as to determine the target pixels to be updated in the segmented regions of the initial visible image; and an update subunit, configured to update the pixel values of the target pixels in the initial visible image to obtain the target visible image.
  • the classification subunit is configured to: cluster the pixel values of the pixels of the segmented region of the initial visible image to obtain at least the number of pixels of the first type of pixels and the pixel value of the cluster center of the first type of pixels, and the number of pixels of the second type of pixels and the pixel value of the cluster center of the second type of pixels; and determine the target pixels to be updated in the corresponding segmented region according to at least one of the relationship between the number of pixels of the first type and the number of pixels of the second type, and the relationship between the pixel value of the cluster center of the first type and the pixel value of the cluster center of the second type.
  • the classification subunit is configured to: map the pixel values of the pixels in the initial visible image to a specific interval to obtain a first visible image; use the segmented regions of the initial texture map as the segmented regions of the first visible image and cluster the pixels of each segmented region of the first visible image to obtain at least the number of pixels of the first type of pixels and the pixel value of the cluster center of the first type of pixels, and the number of pixels of the second type of pixels and the pixel value of the cluster center of the second type of pixels; and determine the target pixels to be updated in the segmented region of the first visible image according to at least one of the relationship between the number of pixels of the first type and the number of pixels of the second type, and the relationship between the pixel value of the cluster center of the first type and the pixel value of the cluster center of the second type.
  • correspondingly, the update subunit is configured to: update the pixel values of the target pixels to be updated in the first visible image to obtain a second visible image; and reversely map the pixel values of the pixels in the second visible image according to the mapping relationship between the initial visible image and the first visible image to obtain the target visible image.
  • the classification subunit is further configured to: cluster the pixels of the segmented region of the first visible image and determine the non-target pixels in the segmented region; correspondingly, the update subunit is configured to: determine the pixel replacement value of the segmented region according to the pixel values of the non-target pixels of the segmented region of the first visible image; and update the pixel values of the target pixels of the segmented region of the first visible image to the pixel replacement value of the corresponding segmented region, so as to obtain the second visible image.
  • the update subunit is configured to: determine the pixel value of the cluster center of the non-target pixels in the segmented area of the first visible image as the pixel replacement value of the segmented area.
  • the classification subunit is configured to: when a first operation result obtained by subtracting the pixel value of the cluster center of the second type of pixels from the pixel value of the cluster center of the first type of pixels is greater than or equal to a first threshold, and a second operation result obtained by dividing the number of pixels of the first type of pixels by the number of pixels of the second type of pixels is greater than or equal to a second threshold, determine that the second type of pixels are the target pixels to be updated in the corresponding segmented region; and when a third operation result obtained by subtracting the pixel value of the cluster center of the first type of pixels from the pixel value of the cluster center of the second type of pixels is greater than or equal to the first threshold, and a fourth operation result obtained by dividing the number of pixels of the second type of pixels by the number of pixels of the first type of pixels is greater than or equal to the second threshold, determine that the first type of pixels are the target pixels to be updated in the corresponding segmented region.
  • the classification subunit is configured to: when the first operation result is less than the first threshold or the second operation result is less than the second threshold, and the third operation result is less than the first threshold or the fourth operation result is less than the second threshold, determine that both the first type of pixels and the second type of pixels are non-target pixels of the corresponding segmented region.
  • the value range of the first threshold is [25, 33]
  • the value range of the second threshold is [5, 10].
  • the first threshold is 30 and the second threshold is 7.
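Isolated from the rest of the pipeline, the two-way decision described above can be written as a small helper; the return convention (which class index, if any, is the target) is a presentation choice of this sketch, not mandated by the method.

```python
def classify_region(cen1, cen2, num1, num2, first_threshold=30, second_threshold=7):
    """Return 2 if the second-type pixels are the target pixels, 1 if the
    first-type pixels are the target pixels, and 0 if neither class is updated."""
    if cen1 - cen2 >= first_threshold and num2 > 0 and num1 / num2 >= second_threshold:
        return 2   # second-type pixels are noise/transition, to be replaced with cen1
    if cen2 - cen1 >= first_threshold and num1 > 0 and num2 / num1 >= second_threshold:
        return 1   # first-type pixels are noise/transition, to be replaced with cen2
    return 0       # both classes are non-target pixels; leave the region unchanged
```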
  • FIG. 22 is a schematic structural diagram of the rendering apparatus according to an embodiment of the present application.
  • the apparatus 22 includes a clipping view recovery module 221, a virtual viewpoint drawing module 222 and a target view generation module 223; wherein,
  • a clipping view recovery module 221, configured to perform clipping view recovery on the atlas of the depth map of the input viewpoint to obtain the reconstructed depth map of the input viewpoint;
  • a virtual viewpoint drawing module 222 configured to perform the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint, to obtain a target texture map of the target viewpoint;
  • the target view generation module 223 is configured to generate the target view of the target viewpoint according to the target texture map of the target viewpoint.
  • FIG. 23 is a schematic structural diagram of the decoding apparatus according to the embodiment of the present application.
  • the apparatus 23 includes a decoding module 231, a clipping view recovery module 232, a virtual viewpoint drawing module 233 and a target view generation module 234; wherein,
  • the decoding module 231 is used for decoding the input code stream to obtain the atlas of the depth map of the input viewpoint;
  • a clipping view recovery module 232 configured to perform clipping view recovery on the atlas of the depth map of the input viewpoint to obtain the reconstructed depth map of the input viewpoint;
  • a virtual viewpoint drawing module 233 configured to perform the steps in the virtual viewpoint drawing method described in the embodiments of the present application on the reconstructed depth map of the input viewpoint, to obtain a target texture map of the target viewpoint;
  • the target view generation module 234 is configured to generate the target view of the target viewpoint according to the target texture map of the target viewpoint.
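For orientation, the decoding apparatus of FIG. 23 corresponds to the call sequence sketched below. The `stages` object and its four callables are placeholders for the existing TMIV decoding, pruned-view reconstruction, drawing, and view-generation stages; none of them is an existing API.

```python
def decode_to_target_view(bitstream, target_view_params, stages):
    """Sketch of the decoding flow of FIG. 23 (modules 231-234); placeholder stages."""
    # Decoding module 231: recover the depth and texture atlases of the input viewpoints.
    depth_atlases, texture_atlases, metadata = stages.decode(bitstream)
    # Clipping view recovery module 232: rebuild the per-view reconstructed maps.
    recon_depth_maps = stages.recover_pruned_views(depth_atlases, metadata)
    recon_texture_maps = stages.recover_pruned_views(texture_atlases, metadata)
    # Virtual viewpoint drawing module 233: the method of the embodiments
    # (see the drawing sketch above), producing the target texture map T2.
    T2 = stages.draw_virtual_viewpoint(recon_depth_maps, recon_texture_maps,
                                       target_view_params)
    # Target view generation module 234: inpainting and viewing-space handling.
    return stages.generate_target_view(T2, target_view_params)
```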
  • if the above virtual viewpoint drawing method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the related technologies, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause an electronic device to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, an optical disk, or other media that can store program code.
  • the embodiments of the present application are not limited to any specific combination of hardware and software.
  • FIG. 24 is a schematic diagram of a hardware entity of the electronic device according to an embodiment of the present application.
  • the electronic device 240 includes a memory 241 and a processor 242 , and the memory 241 stores a computer program that can be executed on the processor 242, and when the processor 242 executes the program, the steps in the methods provided in the above embodiments are implemented.
  • the memory 241 is configured to store instructions and applications executable by the processor 242, and may also cache data (for example, image data, audio data, voice communication data and video communication data) to be processed or already processed by the processor 242 and the modules in the electronic device 240; it may be implemented by flash memory (FLASH) or random access memory (Random Access Memory, RAM).
  • the embodiments of the present application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in the virtual viewpoint drawing method provided in the foregoing embodiments are implemented.
  • the embodiments of the present application provide a decoder for implementing the decoding methods described in the embodiments of the present application.
  • the embodiments of the present application provide a rendering device, which is used to implement the rendering methods described in the embodiments of the present application.
  • the embodiments of the present application provide a viewpoint weighted combiner, which is used to implement the methods described in the embodiments of the present application.
  • references throughout the specification to "one embodiment", "an embodiment", "some embodiments" or "other embodiments" mean that a particular feature, structure or characteristic related to the embodiments is included in at least one embodiment of the present application.
  • therefore, appearances of "in one embodiment", "in an embodiment", "in some embodiments" or "in other embodiments" in various places throughout this specification do not necessarily refer to the same embodiment.
  • the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
  • the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the modules is only a logical function division, and there may be other division manners in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical or in other forms.
  • the modules described above as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units; some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each module may be used separately as a unit, or two or more modules may be integrated into one unit; the above integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are executed; the aforementioned storage medium includes a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk, an optical disk, or other media that can store program code.
  • if the above integrated units of the present application are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the related technologies, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause an electronic device to execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
  • the methods disclosed in the several method embodiments provided in this application can be arbitrarily combined under the condition of no conflict to obtain new method embodiments.
  • the features disclosed in the several product embodiments provided in this application can be combined arbitrarily without conflict to obtain a new product embodiment.
  • the features disclosed in several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Signal Processing (AREA)
  • Geometry (AREA)
  • Image Generation (AREA)

Abstract

The embodiments of the present application disclose a virtual viewpoint drawing method, a rendering method, a decoding method, and corresponding apparatuses, devices and storage media. The virtual viewpoint drawing method includes: generating an initial visible image of a target viewpoint according to a reconstructed depth map of an input viewpoint; performing quality improvement processing on the initial visible image to obtain a target visible image of the target viewpoint; and shading the target visible image of the target viewpoint to obtain a target texture map of the target viewpoint.

Description

虚拟视点绘制、渲染、解码方法及装置、设备、存储介质 技术领域
本申请实施例涉及计算机视觉技术,涉及但不限于虚拟视点绘制、渲染、解码方法及装置、设备、存储介质。
背景技术
大多数用户喜欢观看沉浸式视频内容(诸如虚拟现实内容、三维内容、180度内容或360度内容),该沉浸式视频内容能够为观看者提供沉浸式体验。此外,这些用户可能喜欢观看沉浸式格式的计算机生成的内容,诸如游戏视频或动画。
然而,在编码端,由于深度图(Depth Map)的部分像素的深度值存在一定的错误,且采用较大的量化参数对深度图进行压缩编码,所以导致压缩失真非常严重;这样,在解码端,解码恢复的深度图的质量会大幅下降,从而导致生成的目标视点(Target Viewport)的深度图中出现明显噪点,深度图边缘与实际纹理边缘不完全吻合。反映到纹理图上的一种表现是前景和背景的交界处存在过渡区,前景边缘不够陡峭。
发明内容
有鉴于此,本申请实施例提供的虚拟视点绘制、渲染、解码方法及装置、设备、存储介质,能够使得目标视点的目标纹理图的噪点和过渡区明显减少;本申请实施例提供的虚拟视点绘制、渲染、解码方法及装置、设备、存储介质,是这样实现的:
本申请实施例提供的一种虚拟视点绘制方法,包括:根据输入视点(Source views)的重构深度图(Reconstructed depth maps),生成目标视点的初始可见图(Visibility Map);对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图;对所述目标视点的目标可见图进行着色(Shading),得到所述目标视点的目标纹理图。
本申请实施例提供的一种渲染(Rendering)方法,所述方法包括:对输入视点的深度图的图集进行剪切视图恢复(Pruned View Reconstruction),得到所述输入视点的重构深度图;对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;根据所述目标视点的目标纹理图,生成所述目标视点的目标视图(Viewport)。
本申请实施例提供的一种解码方法,所述方法包括:对输入的码流进行解码,得到输入视点的深度图的图集(atlas);对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
本申请实施例提供的一种虚拟视点绘制装置,包括:可见图生成模块,用于根据所述输入视点的重构深度图,生成目标视点的初始可见图;可见图优化模块,用于对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图;着色模块,用于对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
本申请实施例提供的一种渲染装置,包括:剪切视图恢复模块,用于对输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;虚拟视点绘制模块,用于对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;目标视图生成模块,用于根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
本申请实施例提供的一种解码装置,包括:解码模块,用于对输入的码流进行解码,得到输入视点的深度图的图集;剪切视图恢复模块,用于对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;虚拟视点绘制模块,用于对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;目标视图生成模块,用于根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
本申请实施例提供的一种视点加权合成器(View Weighting Synthesizer,VWS),用于实现本申请实施例所述的虚拟视点绘制方法。
本申请实施例提供的一种渲染设备,用于实现本申请实施例所述的渲染方法。
本申请实施例提供的一种解码器,用于实现本申请实施例所述的解码方法。
本申请实施例提供的一种电子设备,包括存储器和处理器,所述存储器存储有可在处理器上运 行的计算机程序,所述处理器执行所述程序时实现本申请实施例所述的任一方法。
本申请实施例提供的一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现本申请实施例所述的任一方法。
在本申请实施例中,根据输入视点的重构深度图,生成目标视点的初始可见图;此时不是直接将目标视点的初始可见图生成目标视点的目标纹理图,而是先对该初始可见图进行质量提升处理,对该处理得到的目标可见图进行着色,得到目标纹理图;如此,一方面使得该目标纹理图中的噪点和/或过渡区明显减少;另一方面,在确保目标纹理图的图像质量的基础上,使得编码端可以使用较大的量化参数对深度图进行压缩编码,从而降低深度图的编码开销,进而提高整体的编码效率。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本申请的实施例,并与说明书一起用于说明本申请的技术方案。
图1为本申请实施例可能适用的一种系统架构示意图;
图2为VWS的结构示意图;
图3为未被剪切的像素的权值的计算流程示意图;
图4为采用深度估计得到的深度图和VWS生成的深度图的对比示意图;
图5为VWS生成的深度图和纹理图的边缘的对比示意图;
图6为本申请实施例虚拟视点绘制方法的实现流程示意图;
图7为本申请实施例虚拟视点绘制方法的实现流程示意图;
图8为初始可见图的示意图;
图9为本申请实施例虚拟视点绘制方法的实现流程示意图;
图10为本申请实施例虚拟视点绘制方法的实现流程示意图;
图11为本申请实施例虚拟视点绘制方法的实现流程示意图;
图12为本申请实施例采用超像素分割对视点生成中深度图优化技术的流程示意图;
图13为本申请实施例引入深度图优化技术后的系统架构示意图;
图14为使用击剑(Fencing)场景的测试序列生成的两种深度图的对比示意图;
图15为使用青蛙(Frog)场景的测试序列生成的两种深度图的对比示意图;
图16为使用Fencing测试序列生成的两种纹理图的对比示意图;
图17为使用Frog测试序列生成的两种纹理图的对比示意图;
图18为使用停车场(Carpark)场景的测试序列生成的两种纹理图的对比示意图;
图19为使用街道(Street)场景的测试序列生成的两种纹理图的对比示意图;
图20为使用画家(Painter)场景的测试序列生成的两种纹理图的对比示意图;
图21为本申请实施例虚拟视点绘制装置的结构示意图;
图22为本申请实施例渲染装置的结构示意图;
图23为本申请实施例解码装置的结构示意图;
图24为本申请实施例的电子设备的硬件实体示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更清楚,下面将结合本申请实施例中的附图,对本申请的具体技术方案做进一步详细描述。以下实施例用于说明本申请,但不用来限制本申请的范围。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
需要指出,本申请实施例所涉及的术语“第一\第二\第三”是为了区别类似或不同的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
本申请实施例描述的系统架构以及业务场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定。本领域普通技术人员可知,随着系统架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
图1示出了本申请实施例可能适用的一种系统架构,即动态图像专家组(Moving Picture Experts Group,MPEG)在3自由度+(3 degrees of freedom+,3DoF+)沉浸式视频测试模型(Test Model of Immersive Video,TMIV)的解码端的系统架构10,如图1所示,该系统架构10包括:解码存取单 元(Decoded access unit)11和渲染单元(Rendering unit)12;其中,解码存取单元11中包含解码后得到的各类元数据和图集(atlas)信息。之后这些信息将会被传输到渲染单元12中进行虚拟视点绘制。标注有可选(opt.)字样的子单元表示可选子单元,由于这些子单元在本申请实施例的技术方案中暂不涉及,因此在此不做描述。渲染单元12的图块剔除(Patch culling)子单元121根据用户目标视点参数(viewport parameters)对图集信息中的图块(patches)进行筛选,剔除与用户目标视图没有重叠的图块,从而降低虚拟视点绘制时的计算量。渲染单元12的占用图恢复(Occupancy reconstruction)子单元122根据解码存取单元11传输来的信息找出各个图块(patch)在视图中的位置,然后将筛选出来的图块(patches)贴入相应的位置完成剪切视图恢复(Pruned view reconstruction)。视图生成(View synthesis)子单元123利用上述重构的剪切视图进行虚拟视点绘制,也就是目标视点的绘制。由于生成的虚拟视点存在一定的空洞区域,所以需要填充(Inpainting)子单元124对空洞进行填充。最后,视图空间处理(Viewing space handling)子单元125可使视图平滑淡出为黑色。
视点加权合成器VWS,是MPEG在3DoF+TMIV中用到的虚拟视点绘制工具。VWS用在解码端的渲染器中,具体应用在剪切视图恢复(Pruned view reconstruction)子单元126之后的视点合成(view synthesis)环节。
如图2所示,相关技术中,VWS主要包括三个模块:权值计算模块201、可见图生成模块202和着色模块203。可见图生成模块202旨在生成目标视点下的可见图,着色模块203旨在对生成的目标视点下的可见图进行着色,得到目标视点下的纹理图。由于可见图生成模块202和着色模块203依赖于输入视点相对于目标视点的权值,所以权值计算模块201旨在根据输入视点和目标视点的关系进行输入视点的权值计算。
1)关于权值计算模块201的相关内容,说明如下:
权值计算模块201根据输入视点的元数据信息和目标视点的元数据信息计算输入视点的权值。输入视点的权值是该视点与目标视点之间距离的函数。在可见图计算和着色的过程中,相关像素对结果的贡献是其对应视点贡献的加权。在处理剪切视图时,由于其内容不完整,所以对剪切视图的权值的计算需要考虑被剪切的图像区域。权值计算是像素级的操作,对未被剪切的像素计算相应的权值。像素的权值在视点生成时进行更新。如图3所示,未被剪切的像素的权值按照下面的步骤计算得出:
对于一个与剪切图中节点N关联的视图中未被剪切的像素p,其初始权值w P=w N。需要说明的是,初始权值是该像素p所属视点的权值,该值取决于像素p所属视点与目标视点之间的距离;接着采用下面a至c所述的过程对像素p的权值进行更新:
a.如果将像素p向子节点视图中进行重投影且p点重投影后对应到子节点视图中的剪切像素,那么像素p的权值就在原来的基础上加上该子节点视图的权值w o,即w p=w p+w o;需要说明的是,子节点视图的权值仅仅取决于它所在视点和目标视点的距离。然后继续对其孙节点执行上述操作。
b.如果像素p重投影后没有对应到其子节点视图上,那么对其孙节点递归执行上述操作。
c.如果像素p重投影后对应其子节点视图中未剪切的像素,则像素p的权值不变,且不再对其孙节点执行上述操作。
2)关于可见图计算模块202的相关内容,说明如下:
计算可见图旨在根据重新恢复的输入视点的深度图(即重构深度图)获得目标视点下的可见图。整个过程分为三个步骤:扭曲(Warping)、筛选(Selection)和滤波(Filtering)。
扭曲步骤中将输入视点的深度图中的像素向目标视点上进行重投影,生成扭曲后的深度图。对输入视点执行这样的操作,得到若干幅目标视点下的扭曲深度图。
筛选步骤对生成的若干幅扭曲深度图进行合并,生成一幅比较完整的目标视点下的深度图,即可见图。筛选步骤根据每个输入视点的权值,采用基于像素级的多数投票原则进行。其中,多数投票原则是指投影到同一个像素位置的深度值可能会有多个,该像素深度值选择为投影最多的那个。
最后,对生成的可见图执行滤波步骤,使用中值滤波器加以滤波,去除离群点。
3)关于着色模块203的相关内容,说明如下:
这个步骤旨在生成目标视点下的纹理图。目标视点下的纹理图的生成需要使用滤波后的可见图和重新恢复的输入视点的纹理图。在这个过程中,需要考虑输入视点中的像素在可见图中的连续性和所属视点的权值。为了提高生成的纹理内容的视觉质量,对生成的纹理图使用双线性滤波进行处理。此外,为了避免混叠(aliasing)现象,对于检测到的来自输入视点纹理图中物体边缘的像素点,需要剔除。
受限于深度采集技术尚不成熟,且设备价格昂贵,相关方案更多的是采用先纹理采集后深度估 计的方法得到深度图。然而,发明人在研究的过程中发现以下问题:
采用深度估计方法计算出的深度值会存在一定的错误,从而导致估计出的深度图中存在噪点。而采用这样的深度图进行虚拟视点绘制势必会导致生成的目标视点深度图中也存在一定的噪点。举例来说,如图4所示,左图401为采用深度估计得到的深度图,右图402为采用左图401进行虚拟视点绘制得到的深度图,即VWS生成的深度图。从图中可以看出,右图402中的噪点较多。
在编码端对深度图压缩之前,通常需要对该深度图进行下采样以降低分辨率。通常可以使用视频编码标准对深度图进行压缩。经过压缩编码后,深度图会产生一定的压缩失真。尤其当采用较大量化参数(Quantization Parameter,QP)对深度图进行压缩时,压缩失真会更严重。发明人基于上述分析,在研究过程中发现:在解码端,解码恢复的深度图质量会大幅下降,从而导致生成的目标视点的深度图中出现明显噪点,以及深度图边缘与实际纹理边缘不完全吻合。反映到纹理图上的一种表现是前景和背景的交界处存在过渡带,前景边缘不够陡峭。
举例来说,如图5所示,左图501是采用量化参数QP=7对深度图进行压缩后的效果,右图502是采用量化参数QP=42对深度图进行压缩后的效果。从图中可以看出,右图502的白矩形框内的图像5021的噪点较多,反映到纹理图上,即图像区域5022中前景与背景的交界处存在较大的过渡带。
由于深度图的压缩失真会导致VWS生成的深度图和纹理图的质量下降。因此,想要生成高质量的深度图和纹理图,需要采用尽可能小的QP对深度图进行压缩。这也就限制了深度图的压缩程度,从而导致了深度图编码开销的增加,降低了编码效率,客观上降低了对“多视点视频和多视点深度”的整体压缩编码效率。
有鉴于此,本申请实施例提供一种虚拟视点绘制方法,所述方法可以应用于任何具有数据处理能力的电子设备,所述电子设备可以是电视机、投影仪、手机、个人计算机、平板电脑、虚拟现实(Virtual Reality,VR)头戴设备等任何具有视频编解码功能或者仅有解码功能的设备。所述虚拟视点绘制方法所实现的功能可以通过所述电子设备中的处理器调用程序代码来实现,当然程序代码可以保存在计算机存储介质中。可见,所述电子设备至少包括处理器和存储介质。
图6为本申请实施例虚拟视点绘制方法的实现流程示意图,如图6所示,所述方法可以包括以下步骤601至步骤603:
步骤601,根据输入视点的重构深度图,生成目标视点的初始可见图。
可以理解地,在1个以上的输入视点的重构深度图的情况下,电子设备可以基于这些输入视点的重构深度图,生成目标视点的初始可见图。在一些实施例中,电子设备可以通过图2所示的可见图计算模块202得到初始可见图。
需要说明的是,可见图和深度图的含义相同,均表示场景距离相机位置的远近关系。而可见图与深度图不同的是,可见图中距离相机位置越近,像素值越小。
步骤602,对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图。
质量提升处理的目的是削减初始可见图中的噪点和/或过渡区。在一些实施例中,电子设备可以对所述初始可见图进行去噪和/或边缘增强处理,以实现初始可见图的质量提升,从而得到所述目标视点的目标可见图。
可以理解地,所谓过渡区,是指图像中交界处存在的过渡带区域,该区域的存在,导致对图像的后续分析和理解易产生偏差,即最终得到的目标视图中的交界处过渡不自然。
电子设备对初始可见图进行去噪和/或边缘增强处理的方式可以是多种多样的。比如,电子设备对初始可见图进行滤波处理;再如,电子设备对初始可见图中存在的噪点和过渡区处的像素值做替换处理。
步骤603,对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
在本申请实施例中,电子设备根据输入视点的重构深度图,生成目标视点的初始可见图;此时不是直接将目标视点的初始可见图生成目标视点的目标纹理图,而是先对该初始可见图进行质量提升处理,对该处理得到的目标可见图进行着色,得到目标纹理图;如此,一方面使该目标纹理图中的噪点和/或过渡区明显减少;一方面,在确保目标纹理图的图像质量的基础上,使得编码端可以使用较大的量化参数对深度图进行压缩编码,从而降低深度图的编码开销,进而提高整体的编码效率。
本申请实施例再提供一种虚拟视点绘制方法,图7为本申请实施例虚拟视点绘制方法的实现流程示意图,如图7所示,所述方法可以包括以下步骤701至步骤705:
步骤701,根据输入视点的重构深度图,生成目标视点的初始可见图;
在一些实施例中,电子设备可以对输入的码流进行解码,得到输入视点的深度图的图集;然后,对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图。
在本申请实施例中,生成初始可见图所依据的输入视点的数目不做限定。电子设备可以根据一个或多个输入视点的重构深度图,生成目标视点的初始可见图。
步骤702,获得所述目标视点的初始纹理图;
实际上,电子设备在对输入的码流进行解码,不仅得到了输入视点的深度图的图集,还得到了所述输入视点的纹理图的图集;基于此,在一些实施例中,电子设备可以这样获得目标视点的初始纹理图:对所述输入视点的纹理图的图集进行剪切视图恢复,得到所述输入视点的重构纹理图;根据所述输入视点的重构纹理图,对所述目标视点的初始可见图进行着色,从而得到所述目标视点的初始纹理图。
步骤703,对所述目标视点的初始纹理图进行分割,得到分割区域。
可以理解地,之所以是对初始纹理图进行分割,而不是直接对初始可见图进行分割,是因为:如果直接对初始可见图进行分割,比如图8所示,它会非常好地沿着初始可见图801中的边缘来分割,这样如果边缘上有一些噪点,这种分割方法无法把这些噪点分割出来。而基于初始纹理图的分割,相比于基于初始可见图的分割,前者能够对边缘处(也就是交界处)产生较好的划分,这样利用初始纹理图的更为准确的分割结果能够更好地指导初始可见图的质量提升处理,会非常利于对边缘处进行一些锐化,如此使质量提升和着色后得到的目标纹理图的边缘处的噪点和过渡区明显减少。
在一些实施例中,可以对所述初始纹理图进行超像素分割,得到分割区域。需要说明的是,所使用的超像素分割算法可以是多种多样的,在本申请实施例中对此不做限制。例如,所述超像素分割算法可以是简单的线性迭代聚类(Simple Linear Iterative Cluster,SLIC)超像素分割算法、通过能量驱动采样的超像素提取(Superpixels Extracted via Energy-Driven Sampling,SEEDS)算法、轮廓松弛超像素(Contour-Relaxed Superpixels,CRS)算法、ETPS或熵率超像素分割(Entropy Rate Superpixels Segmentation,ERS)算法等。
由于相比其他的超像素分割算法,SLIC超像素分割算法在运行速度、生成超像素的紧凑度以及轮廓保持方面都比较理想。因此,在一些实施例中,电子设备采用SLIC超像素分割算法对初始纹理图进行超像素分割,能够在不显著增加处理时间的情况下,对目标纹理图的质量都实现一定程度的提升,从而使最终得到的目标纹理图和相应得到的目标视图的客观质量和主观效果均得到明显提升。
步骤704,对所述分割区域在所述初始可见图上的对应区域进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图。
在一些实施例中,电子设备可以将所述初始纹理图的分割区域作为所述初始可见图的分割区域,对所述初始可见图的分割区域的像素进行分类,以确定出所述初始可见图的分割区域中待更新的目标像素;对所述初始可见图中的所述目标像素的像素值进行更新,得到所述目标可见图。
例如,可以通过如下实施例的步骤904和步骤905实现待更新的目标像素的确定。
分类算法可以是多种多样的。例如分类算法为K均值聚类、决策树、贝叶斯、人工神经网络、支持向量机或基于关联规则的分类等。
可以理解地,初始可见图与初始纹理图中的相同位置表达的场景内容是一致的。因此,这里电子设备可以直接将初始纹理图的分割结果直接搬移至初始可见图上,将初始纹理图的分割区域作为初始可见图的分割区域。
更新目标像素的像素值的方式可以是多种多样的。例如,电子设备可以对可见图中的目标像素的像素值进行滤波,以实现对该像素值的更新。再如,电子设备还可以对可见图中的目标像素的像素值进行替换,以实现对该像素值的更新。其中,每一分割区域对应一像素替换值,该像素替换值可以为对应分割区域中的非目标像素的像素值的均值。在使用聚类算法的情况下,每一分割区域均对应一个非目标像素类的聚类中心,因此可以将该中心的像素值作为像素替换值。
步骤705,对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
该步骤可以通过图2所示的着色模块203实现。根据输入视点的重构纹理图,对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
本申请实施例再提供一种虚拟视点绘制方法,图9为本申请实施例虚拟视点绘制方法的实现流程示意图,如图9所示,所述方法可以包括以下步骤901至步骤907:
步骤901,根据输入视点的重构深度图,生成目标视点的初始可见图;
步骤902,获得所述目标视点的初始纹理图;
步骤903,对所述目标视点的初始纹理图进行分割,得到分割区域;
步骤904,对所述初始可见图的分割区域的像素的像素值进行聚类,至少得到:第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及第二类像素的像素数量和所述第二类像素的 聚类中心的像素值;
至此,每一分割区域均有对应的聚类结果。聚类算法可以是多种多样的,在本申请实施例中对此不做限定。例如,K均值聚类算法。
可以理解地,通过步骤903即可将初始纹理图分为若干个分割区域。将这些分割区域对应到初始可见图上。电子设备可以对初始可见图上的部分分割区域或全部分割区域中的像素分别进行分类。例如,在一些实施例中,电子设备可以利用分类算法(例如K均值聚类算法)将初始可见图中的每一分割区域中的像素划分为两类:属于背景区域的非目标像素和属于噪点区域(或过渡区)的目标像素。
步骤905,至少根据所述分割区域的所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定对应分割区域中待更新的目标像素。
在一些实施例中,在所述第一类像素的聚类中心的像素值减去所述第二类像素的聚类中心的像素值的第一运算结果大于或等于第一阈值,且所述第一类像素的像素数量除以所述第二类像素的像素数量的第二运算结果大于或等于第二阈值的情况下,确定所述第二类像素为对应分割区域的待更新的目标像素;相应地,在该情况下,第一类像素则为非目标像素。
在所述第二类像素的聚类中心的像素值减去所述第一类像素的聚类中心的像素值的第三运算结果大于或等于所述第一阈值,且所述第二类像素的像素数量除以所述第一类像素的像素数量的第四运算结果大于或等于所述第二阈值的情况下,确定所述第一类像素为对应分割区域的待更新的目标像素。相应地,在该情况下,第二类像素则为非目标像素。
在所述第一运算结果小于所述第一阈值或所述第二运算结果小于所述第二阈值、且所述第三运算结果小于所述第一阈值或所述第四运算结果小于所述第二阈值的情况下,确定所述第一类像素和所述第二类像素均为对应分割区域的非目标像素。
步骤906,对所述初始可见图中的所述目标像素的像素值进行更新,得到所述目标可见图。
更新的方式可以是多种多样的。例如,将分割区域的非目标像素的聚类中心的像素值替换该区域的目标像素的像素值。
步骤907,对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
本申请实施例再提供一种虚拟视点绘制方法,图10为本申请实施例虚拟视点绘制方法的实现流程示意图,如图10所示,该方法可以包括以下步骤1001至步骤1007:
步骤1001,根据输入视点的重构深度图,生成目标视点的初始可见图;
步骤1002,获得所述目标视点的初始纹理图;
步骤1003,对所述目标视点的初始纹理图进行分割,得到分割区域;
步骤1004,将所述初始可见图中的像素的像素值映射到特定区间内,得到第一可见图;
对于特定区间不做限制。例如,特定区间可以为[0,255]。当然,在实际应用中,工程人员还可以根据实际需要配置其他特定区间。
步骤1005,将所述初始纹理图的分割区域作为所述第一可见图的分割区域,对所述第一可见图的分割区域的像素进行聚类,至少得到:所述第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及所述第二类像素的像素数量和所述第二类像素的聚类中心的像素值;
步骤1006,至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定所述第一可见图的分割区域中待更新的目标像素;
在一些实施例中,在所述第一类像素的聚类中心的像素值减去所述第二类像素的聚类中心的像素值的第一运算结果大于或等于第一阈值,且所述第一类像素的像素数量除以所述第二类像素的像素数量的第二运算结果大于或等于第二阈值的情况下,确定所述第二类像素为对应分割区域的待更新的目标像素;相应地,在该情况下,第一类像素则为非目标像素。
在所述第二类像素的聚类中心的像素值减去所述第一类像素的聚类中心的像素值的第三运算结果大于或等于所述第一阈值,且所述第二类像素的像素数量除以所述第一类像素的像素数量的第四运算结果大于或等于所述第二阈值的情况下,确定所述第一类像素为对应分割区域的待更新的目标像素。相应地,在该情况下,第二类像素则为非目标像素。
在所述第一运算结果小于所述第一阈值或所述第二运算结果小于所述第二阈值、且所述第三运算结果小于所述第一阈值或所述第四运算结果小于所述第二阈值的情况下,确定所述第一类像素和所述第二类像素均为对应分割区域的非目标像素。
简单来说,第一类像素的聚类中心用cen 1表示,第二类像素的聚类中心用cen 2表示,第一类像素的像素数量用num 1表示,第二类像素的像素数量用num 2表示,那么可以这样确定目标像素和非目标像素:
a)如果cen 1-cen 2≥第一阈值,且num 1/num 2≥第二阈值,那么认为第一类像素为非目标像素,第二类像素为目标像素;
b)如果cen 2-cen 1≥第一阈值,且num 2/num 1≥第二阈值,那么认为第一类像素为目标像素,第二类像素为非目标像素;
c)除上述a)和b)这两种情况,其它情况下第一类像素和第二类像素均为非目标像素,对其对应的分割区域中的像素值不做处理。
在一些实施例中,所述第一阈值的取值范围是[25,33],所述第二阈值的取值范围是[5,10]。例如,所述第一阈值为30,所述第二阈值为7。
步骤1007,对所述第一可见图中的待更新的目标像素的像素值进行更新,得到第二可见图。
在一些实施例中,所述对所述第一可见图的分割区域的像素进行聚类,还确定出所述分割区域的非目标像素;可以这样实现步骤1007:根据所述第一可见图的分割区域的非目标像素的像素值,确定所述分割区域的像素替换值;将所述第一可见图的分割区域的目标像素的像素值,更新为对应分割区域的像素替换值,从而得到第二可见图。
例如,可以将所述第一可见图的分割区域中非目标像素的聚类中心的像素值,确定为所述分割区域的像素替换值。
步骤1008,根据所述初始可见图与所述第一可见图之间的映射关系,将所述第二可见图中的像素的像素值进行反向映射,得到所述目标可见图。
可以理解地,通过步骤1002至步骤1008实现对初始可见图的质量提升处理,即在确定初始可见图的目标像素之前,先将该图的像素的像素值映射到特定区间,再对映射结果(即第一可见图)中的像素的像素值进行分类,并将分类确定出的目标像素的像素值更新,得到第二可见图,最后再将第二可见图反向映射为目标可见图;如此,能够使得所述虚拟视点绘制方法具有一定的泛化能力,从而适应各种不同的场景图像的处理。
步骤1009,对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
本申请实施例再提供一种虚拟视点绘制方法,图11为本申请实施例虚拟视点绘制方法的实现流程示意图,如图11所示,所述方法可以包括以下步骤111至步骤1112:
步骤111,对输入的码流进行解码,得到输入视点的深度图的图集;
步骤112,对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;
步骤113,根据所述输入视点的重构深度图,生成目标视点的初始可见图;
步骤114,获得所述目标视点的初始纹理图;
步骤115,对所述目标视点的初始纹理图进行超像素分割,得到分割结果;
步骤116,将所述目标视点的初始可见图中的像素的像素值映射到特定区间内,得到第一可见图;
步骤117,将所述分割结果作为所述第一可见图的超像素分割结果,对所述第一可见图的超像素中的像素的像素值进行聚类,得到聚类结果;聚类结果包括:第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及第二类像素的像素数量和所述第二类像素的聚类中心的像素值;
在一些实施例中,电子设备可以采用K均值聚类算法分别对第一可见图中的每一超像素中的像素的像素值进行分类。
步骤118,根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系,确定对应超像素中属于噪点或过渡区的目标像素,以及确定所述对应超像素中不属于噪点或过渡区的非目标像素。
可以理解地,每一超像素均对应一聚类结果,所以这里所述的对应超像素是指依据的聚类结果所对应的超像素,即依据的聚类结果为该对应超像素的聚类结果。
在一些实施例中,在所述第一类像素的聚类中心的像素值减去所述第二类像素的聚类中心的像素值的第一运算结果大于或等于第一阈值,且所述第一类像素的像素数量除以所述第二类像素的像素数量的第二运算结果大于或等于第二阈值的情况下,确定所述第二类像素为对应超像素的待更新的目标像素;相应地,在该情况下,第一类像素则为非目标像素。
在一些实施例中,在所述第二类像素的聚类中心的像素值减去所述第一类像素的聚类中心的像素值的第三运算结果大于或等于所述第一阈值,且所述第二类像素的像素数量除以所述第一类像素的像素数量的第四运算结果大于或等于所述第二阈值的情况下,确定所述第一类像素为对应超像素的待更新的目标像素。相应地,在该情况下,第二类像素则为非目标像素。
在一些实施例中,在所述第一运算结果小于所述第一阈值或所述第二运算结果小于所述第二阈值、且所述第三运算结果小于所述第一阈值或所述第四运算结果小于所述第二阈值的情况下,确定所述第一类像素和所述第二类像素均为对应分割区域的非目标像素。
在一些实施例中,所述第一阈值的取值范围是[25,33],所述第二阈值的取值范围是[5,10]。例如,所述第一阈值为30,所述第二阈值为7。
步骤119,根据所述第一可见图中的超像素的非目标像素的像素值,确定像素替换值。
在一些实施例中,可以将超像素中的非目标像素的均值作为该超像素中的目标像素的像素替换值;例如,将该超像素的非目标像素类的聚类中心的像素值作为像素替换值。
步骤1110,将所述第一可见图中的超像素中的目标像素的像素值,更新为所述超像素对应的像素替换值,从而得到第二可见图。
在一些实施例中,将所述第一可见图中的超像素中非目标像素的聚类中心的像素值,确定为所述像素替换值。
在相关技术方案中,往往采用滤波的方法来实现对可见图中的噪点和过渡区的质量提升处理,如此希望把这种影响给分散掉。然而,这样就会改变噪点和过渡区周围的像素(也就是非目标像素)的正确像素值,使得最终得到的目标视图的客观质量和主观效果稍差;
在本申请实施例中,则是将这些噪点和过渡区(也就是目标像素)的像素值用一近似正确的值(即像素替换值)替换,如此使得这些目标像素周围的非目标像素的像素值不会被改变,相比于滤波方法,该方法能够使替换像素值后的目标像素与周围区域融合的更自然,从而使得最终得到的目标视图的客观质量和主观效果更好。
步骤1111,根据所述初始可见图与所述第一可见图之间的映射关系,将所述第二可见图中的像素的像素值进行反向映射,得到所述目标可见图。
步骤1112,根据所述目标视点的目标可见图,生成所述目标视点的目标视图。
本申请实施例提供一种渲染方法,该数据方法不仅可以应用于电子设备,还可以应用于渲染设备,所述方法可以包括:对输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
在一些实施例中,所述获得所述目标视点的初始纹理图,包括:对所述输入视点的纹理图的图集进行剪切视图恢复,得到所述输入视点的重构纹理图;根据所述输入视点的重构纹理图,对所述目标视点的初始可见图进行着色,得到所述目标视点的初始纹理图。
在一些实施例中,所述输入视点的纹理图的图集是电子设备对所述码流进行解码得到的;
在一些实施例中,可以这样实现步骤113:对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图;对所述目标纹理图进行空洞填充,得到初始视图;对所述初始视图进行视图空间处理,得到所述目标视图。
以渲染方法实施例的描述,与上述其他方法实施例的描述是类似的,具有同上述其他方法实施例相似的有益效果。对于渲染方法实施例中未披露的技术细节,请参照上述其他方法实施例的描述而理解。
本申请实施例提供一种解码方法,所述方法包括:对输入的码流进行解码,得到输入视点的深度图的图集;对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
在一些实施例中,对所述码流进行解码,还得到所述输入视点的纹理图的图集;所述获得所述目标视点的初始纹理图,包括:对所述输入视点的纹理图的图集进行剪切视图恢复,得到所述输入视点的重构纹理图;根据所述输入视点的重构纹理图,对所述目标视点的初始可见图进行着色,得到所述目标视点的初始纹理图。
需要说明的是,对于得到多少个输入视点的深度图的图集和纹理图的图集不做限制。电子设备解码得到1个或1个以上的输入视点的深度图的图集和纹理图的图集。
在本申请实施例中,在对接收的码流进行解码得到输入视点的深度图的图集之后,对输入视点 的深度图的图集进行剪切视图恢复,得到输入视点的重构深度图;根据输入视点的重构深度图,生成目标视点的初始可见图;此时不是直接将目标视点的初始可见图生成目标视点的目标视图,而是先对该初始可见图进行质量提升处理,基于质量提升处理得到目标可见图生成目标视图;如此,一方面使得最终得到的目标视图中的噪点和/或过渡区明显减少;另一方面,在确保目标视图的图像质量的基础上,使得编码端可以使用较大的量化参数对深度图进行压缩编码,从而降低深度图的编码开销,进而提高整体的编码效率。
以上对解码方法实施例的描述,与上述其他方法实施例的描述是类似的,具有同上述其他方法实施例相似的有益效果。对于所述解码方法实施例中未披露的技术细节,请参照上述其他方法实施例的描述而理解。
下面将说明本申请实施例在一个实际的应用场景中的示例性应用。
在本申请实施例中,提供一种采用超像素分割对视点生成中的深度图进行优化的技术方案。该技术方案是在VWS的基础上进行的改进,旨在对VWS的可见图生成步骤中得到的目标视点下的可见图进行优化(可见图和深度图的含义相同,均表示场景距离相机位置的远近关系。与深度图不同的是,可见图中距离相机位置越近,像素值越小)。
在本申请实施例中,采用超像素分割算法对VWS生成的初始纹理图进行分割,将分割得到的结果应用到VWS生成的初始可见图上。利用K均值聚类对得到的初始可见图上的超像素进行聚类,如此可以将需要处理的噪点、以及需要处理的过渡区与不需处理的区域分割开来,再对需要处理的噪点和需要处理的过渡区的像素值进行替换。
如图12所示,本申请实施例的技术方案分为三个模块:超像素分割(Superpixel Segmentation)模块121、K均值聚类(K-Means Clustering)模块122和替换(Replacement)模块123;其中,
首先,通过VWS的可见图生成步骤得到生成的可见图D(即所述初始可见图)。由于对于不同场景内容的测试序列,初始可见图中像素值取值范围有差别。在本申请实施例中,可以利用线性映射算法将可见图D中的像素值变换到[0,255]区间内,得到可见图D 2(即所述第一可见图);然后,从着色步骤中得到生成的纹理图T(即所述初始纹理图),对纹理图T采用SLIC超像素分割算法以超像素数量(numSuperpixel)为1200对纹理图T进行分割;再将从纹理图T上得到的超像素分割结果应用到可见图D 2上,得到在可见图D 2上划分出来的若干超像素S i。对每个超像素S i使用K均值聚类算法将其中的像素划分为两个类:C 1和C 2。记C 1和C 2的聚类中心分别为cen 1和cen 2,包含的像素数量分别为num 1和num 2。比较聚类中心cen 1和cen 2,以及像素数量num 1和num 2,采用下面的过程对可见图D 2进行处理:
a)如果cen 1-cen 2>30,且num 1/num 2>7,那么认为C 1为背景区域,C 2为噪点区域或过渡区,则对C 1中所有像素不进行处理,保持原值不变,对C 2中所有像素的值采用cen 1替换;其中,30即为第一阈值的一种示例,7即为第二阈值的一种示例;
b)如果cen 2-cen 1>30,且num 2/num 1>7,那么认为C 2为背景区域,C 1为噪点区域或过渡区,则对C 2中所有像素不进行处理,保持原值不变,对C 1中所有像素的值采用cen 2替换;
c)除上述两种情况,其它情况下对C 1和C 2中所有像素均不处理,保持原值不变。
经过以上处理,得到优化后的可见图D 3(即所述第二可见图)。对可见图D 3反向线性映射,缩放至原取值范围,得到可见图D 4(即所述目标可见图)。使用可见图D 4替换掉原有的可见图D,对可见图D 4再次执行着色步骤,得到优化后的纹理图T2(即所述目标纹理图)。引入深度图优化技术后的系统架构如图13所示。
本申请实施例提供的技术方案,可以在TMIV6.0上实现,并在通用测试条件(Common Test Condition)中对自然场景内容的测试序列进行了测试。实验结果显示,在VWS中引入本技术方案后,生成的目标视点的深度图中的噪点得到了大幅削减,纹理图中一些前景和背景的交界处变得更加分明清晰。由于超像素分割算法采用的是SLIC超像素分割算法,因此,在不显著增加渲染时间的情况下,本技术方案对目标视点深度图和纹理图的质量都实现了一定程度的改进。
在一些实施例中的实验配置为:超像素分割采用的是SLIC算法;超像素数量(numSuperpixel)为1200;K均值聚类算法中的K取值为2;聚类中心cen1和cen2之差的阈值选为30;两个聚类中的像素数量num1和num2比值的阈值选为7。
从方案相关可能的实施方式方面考虑,以上配置参数中的一个或多个可以不是固定值的。相关的实施方式可能包括:(1)在码流中编码本申请实施例的方法执行过程中需要使用的以上一个或多个参数值,使用的码流中数据单元包括:序列层数据单元(如SPS、PPS)、图像层数据单元(如PPS、APS、picture header、slice header等)、块层数据单元(如CTU、CU层数据单元);(2)使用隐含推 导的方法确定以上一个或多个参数值;(3)结合(1)和(2)的,以上参数值的序列、图像、块层自适应确定方法。
举例来说,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的深度图对比效果如图14所示,其中,左边的深度图141为使用本申请实施例提供的技术方案之前,使用击剑(Fencing)场景的测试序列生成的深度图,右边的深度图142为使用本申请实施例提供的技术方案之后,使用Fencing测试序列生成的深度图。从图中可以看出,右边的深度图142的噪点明显减少,尤其是矩形框中的区域,相比于左边的深度图141,噪点消失。且,使用聚类中心的像素值替换噪点之后,与周围区域融为一体,图像效果自然清晰。
又如,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的深度图对比效果如图15所示,其中,左边的深度图151为使用本申请实施例提供的技术方案之前,使用青蛙(Frog)场景的测试序列生成的深度图,右边的深度图152为使用本申请实施例提供的技术方案之后,使用Frog测试序列生成的深度图。从图中可以看出,右边的深度图152的噪点减少,尤其是矩形框中的区域,相比于左边的深度图151,噪点消失。且,使用聚类中心的像素值替换噪点之后,与周围区域融为一体,图像效果自然清晰。
再如,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的纹理图对比效果如图16所示,其中,上方的纹理图161为使用本申请实施例提供的技术方案之前,使用Fencing测试序列生成的纹理图,下方的纹理图162为使用本申请实施例提供的技术方案之后,使用Fencing测试序列生成的纹理图。从图中可以看出,下方的纹理图162图像质量更好。例如,上方的纹理图161中矩形框1611中的边缘区域存在明显的过渡带,而下方的纹理图162中矩形框1621中的边缘区域的过渡带明显被锐化了;又如,上方的纹理图161中矩形框1612中明显存在一块类似三角形的噪点块,而下方的纹理图162中矩形框1622中的噪点块消失;再如,上方的纹理图161中矩形框1613中明显存很多噪点,放大之后,即1614中的圆框区域内存在明显噪点,而下方的纹理图162中矩形框1623中的大部分噪点消失,放大之后,即1624中的圆框区域内的大部分噪点消失,边缘区域明显被锐化了;且,从图中可以看出,使用聚类中心的像素值替换噪点之后,与周围区域融为一体,图像效果自然清晰。
又如,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的纹理图对比效果如图17所示,其中,上方的纹理图171为使用本申请实施例提供的技术方案之前,使用Frog测试序列生成的纹理图,下方的纹理图172为使用本申请实施例提供的技术方案之后,使用Frog测试序列生成的纹理图。从图中可以看出,下方的纹理图172图像质量更好。例如,上方的纹理图171中矩形框1711中人手的边缘区域存在明显的过渡带,而下方的纹理图172中矩形框1721中人手的边缘区域的过渡带明显被锐化了;又如,上方的纹理图171中矩形框1712中玩偶的领子的边缘区域存在明显的过渡带,而下方的纹理图172中矩形框1722中玩偶的领子的边缘的过渡带消失了;且,从图中可以看出,使用聚类中心的像素值替换过渡带之后,与周围区域融为一体,图像效果自然清晰。
再如,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的纹理图对比效果如图18所示,其中,上方的纹理图181为使用本申请实施例提供的技术方案之前,使用停车场(Carpark)场景的测试序列生成的纹理图,下方的纹理图182为使用本申请实施例提供的技术方案之后,使用Carpark测试序列生成的纹理图。从图中可以看出,下方的纹理图182图像质量更好。例如,上方的纹理图181中矩形框1811中的区域,放大后如1812所示,其中圆框中存在明显的噪点,而下方的纹理图182中矩形框1821中的区域,放大后如1822所示,其中圆框中的大部分噪点消失,尤其是窗户边沿更清晰;且,从图中可以看出,使用聚类中心的像素值替换噪点之后,与周围区域融为一体,图像效果自然清晰。
又如,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的纹理图对比效果如图19所示,其中,上方的纹理图191为使用本申请实施例提供的技术方案之前,使用街道(Street)场景的测试序列生成的纹理图,下方的纹理图192为使用本申请实施例提供的技术方案之后,使用Street测试序列生成的纹理图。从图中可以看出,下方的纹理图192图像质量更好。例如,上方的纹理图191中矩形框1911中的区域,放大后如1912所示,其中指示牌的左上方边沿存在明显的过渡带,而下方的纹理图192中矩形框1921中的区域,放大后如1922所示,其中指示牌的左上方边沿的过渡带基本消失;又如,上方的纹理图191中矩形框1913中的区域,放大后如1914所示,其中汽车上方的弧形棍状支架边沿存在过渡带,而下方的纹理图192中矩形框1923中的区域,放大后如1924所示,其中汽车上方的弧形棍状支架的边沿变得清晰了;且,从图中可以看出,使用聚类中心的像素值替换过渡带之后,与周围区域融为一体,图像效果自然清晰。
再如,基于上述实验配置,使用本申请实施例提供的技术方案之前与之后的纹理图对比效果如图20所示,其中,上方的纹理图201为使用本申请实施例提供的技术方案之前,使用画家(Painter)场景的测试序列生成的纹理图,下方的纹理图202为使用本申请实施例提供的技术方案之后,使用Painter测试序列生成的纹理图。从图中可以看出,下方的纹理图202图像质量更好。例如,上方的纹理图201中矩形框2011中的区域,放大后如2012所示,其中人手边沿存在明显的过渡带,而下方的纹理图202中矩形框2021中的区域,放大后如2022所示,其中人手边沿的过渡带基本消失,尤其是食指和中指的边沿更加清晰。且,从图中可以看出,使用聚类中心的像素值替换过渡带之后,与周围区域融为一体,图像效果自然清晰。
从图14至图20所示的实验结果,可以看出,与VWS生成的视图相比,本申请实施例提供的技术方案有效地遏制了因深度图压缩失真对生成的视图造成的不良影响。因此,相对于VWS,在保证最终得到的纹理图的质量的情况下,本申请实施例提供的技术方案可以对深度图采用较大的QP进行压缩,降低深度图的编码开销,进而提高整体的编码效率。
在本申请实施例中,采用简单的线性迭代聚类(Simple Linear Iterative Cluster,SLIC)超像素分割算法和K均值聚类算法分离出可见图中的噪点和过渡区,对此加以处理,从而提升深度图和纹理图的客观质量和主观效果。
基于前述的实施例,本申请实施例提供的虚拟视点绘制装置,包括所包括的各模块、以及各模块所包括的各单元,可以通过电子设备中的解码器或处理器来实现;当然也可以通过具体的逻辑电路实现;在实施的过程中,处理器可以为中央处理器(CPU)、微处理器(MPU)、数字信号处理器(DSP)、现场可编程门阵列(FPGA)或图形处理器(Graphics Processing Unit,GPU)等。
图21为本申请实施例虚拟视点绘制装置的结构示意图,如图21所示,该装置21包括:
可见图生成模块211,用于根据所述输入视点的重构深度图,生成目标视点的初始可见图;
可见图优化模块212,用于对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图;
着色模块213,用于对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
在一些实施例中,可见图优化模块212,用于:对所述初始可见图进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图。
在一些实施例中,可见图优化模块212,包括:获取单元,用于获得所述目标视点的初始纹理图;分割单元,用于对所述目标视点的初始纹理图进行分割,得到分割区域;增强单元,用于对所述分割区域在所述初始可见图上的对应区域进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图。
在一些实施例中,所述分割单元,用于:利用SLIC超像素分割算法,对所述目标视点的初始纹理图进行超像素分割,所述分割区域为超像素。
在一些实施例中,所述增强单元,包括:分类子单元,用于将所述初始纹理图的分割区域作为所述初始可见图的分割区域,对所述初始可见图的分割区域的像素进行分类,以确定出所述初始可见图的分割区域中待更新的目标像素;更新子单元,用于对所述初始可见图中的所述目标像素的像素值进行更新,得到所述目标可见图。
在一些实施例中,所述分类子单元,用于:对所述初始可见图的分割区域的像素的像素值进行聚类,至少得到:第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及第二类像素的像素数量和所述第二类像素的聚类中心的像素值;至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定对应分割区域中待更新的目标像素。
在一些实施例中,所述分类子单元,用于:将所述初始可见图中的像素的像素值映射到特定区间内,得到第一可见图;将所述初始纹理图的分割区域作为所述第一可见图的分割区域,对所述第一可见图的分割区域的像素进行聚类,至少得到:所述第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及所述第二类像素的像素数量和所述第二类像素的聚类中心的像素值;至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定所述第一可见图的分割区域中待更新的目标像素;相应地,所述更新子单元,用于:对所述第一可见图中的待更新的目标像素的像素值进行更新,得到第二可见图;根据所述初始可见图与所述第一可见图之间的映射关系,将所述第二可见图中的像素的像素值进行反向映射,得到所述目标可见图。
在一些实施例中,所述分类子单元,还用于:对所述第一可见图的分割区域的像素进行聚类, 确定出所述分割区域的非目标像素;相应地,所述更新子单元,用于:根据所述第一可见图的分割区域的非目标像素的像素值,确定所述分割区域的像素替换值;将所述第一可见图的分割区域的目标像素的像素值,更新为对应分割区域的像素替换值,从而得到第二可见图。
在一些实施例中,所述更新子单元,用于:将所述第一可见图的分割区域中非目标像素的聚类中心的像素值,确定为所述分割区域的像素替换值。
在一些实施例中,所述分类子单元,用于:在所述第一类像素的聚类中心的像素值减去所述第二类像素的聚类中心的像素值的第一运算结果大于或等于第一阈值,且所述第一类像素的像素数量除以所述第二类像素的像素数量的第二运算结果大于或等于第二阈值的情况下,确定所述第二类像素为对应分割区域的待更新的目标像素;在所述第二类像素的聚类中心的像素值减去所述第一类像素的聚类中心的像素值的第三运算结果大于或等于所述第一阈值,且所述第二类像素的像素数量除以所述第一类像素的像素数量的第四运算结果大于或等于所述第二阈值的情况下,确定所述第一类像素为对应分割区域的待更新的目标像素。
在一些实施例中,所述分类子单元,用于:在所述第一运算结果小于所述第一阈值或所述第二运算结果小于所述第二阈值、且所述第三运算结果小于所述第一阈值或所述第四运算结果小于所述第二阈值的情况下,确定所述第一类像素和所述第二类像素均为对应分割区域的非目标像素。
在一些实施例中,所述第一阈值的取值范围是[25,33],所述第二阈值的取值范围是[5,10]。
在一些实施例中,所述第一阈值为30,所述第二阈值为7。
以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
本申请实施例提供一种渲染装置,图22为本申请实施例渲染装置的结构示意图,如图22所示,该装置22包括剪切视图恢复模块221、虚拟视点绘制模块222和目标视图生成模块223;其中,
剪切视图恢复模块221,用于对输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;
虚拟视点绘制模块222,用于对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;
目标视图生成模块223,用于根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
本申请实施例提供一种解码装置,图23为本申请实施例解码装置的结构示意图,如图23所示,该装置23包括解码模块231、剪切视图恢复模块232、虚拟视点绘制模块233和目标视图生成模块234;其中,
解码模块231,用于对输入的码流进行解码,得到输入视点的深度图的图集;
剪切视图恢复模块232,用于对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;
虚拟视点绘制模块233,用于对所述输入视点的重构深度图执行本申请实施例所述的虚拟视点绘制方法中的步骤,得到所述目标视点的目标纹理图;
目标视图生成模块234,用于根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
需要说明的是,本申请实施例中,如果以软件功能模块的形式实现上述的虚拟视点绘制方法,并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得电子设备执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。这样,本申请实施例不限制于任何特定的硬件和软件结合。
对应地,本申请实施例提供一种电子设备,图24为本申请实施例的电子设备的硬件实体示意图,如图24所示,所述电子设备240包括存储器241和处理器242,所述存储器241存储有可在处理器242上运行的计算机程序,所述处理器242执行所述程序时实现上述实施例中提供的方法中的步骤。
需要说明的是,存储器241配置为存储由处理器242可执行的指令和应用,还可以缓存待处理器242以及电子设备240中各模块待处理或已经处理的数据(例如,图像数据、音频数据、语音通 信数据和视频通信数据),可以通过闪存(FLASH)或随机访问存储器(Random Access Memory,RAM)实现。
对应地,本申请实施例提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述实施例中提供的虚拟视点绘制方法中的步骤。
本申请实施例提供一种解码器,用于实现本申请实施例所述的解码方法。
本申请实施例提供一种渲染设备,用于实现本申请实施例所述的渲染方法。
本申请实施例提供一种视点加权合成器,用于实现本申请实施例所述的方法。
这里需要指出的是:以上电子设备、存储介质、解码器、渲染设备和视点加权合成器实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请电子设备、存储介质、解码器、渲染设备和视点加权合成器实施例中未披露的技术细节,可以参照本申请方法实施例的描述而理解。
应理解,说明书通篇中提到的“一个实施例”或“一实施例”或“一些实施例”或“另一些实施例”意味着与实施例有关的特定特征、结构或特性包括在本申请的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”或“在一些实施例中”或“在另一些实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者设备中还存在另外的相同要素。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个模块或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或模块的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的模块可以是、或也可以不是物理上分开的,作为模块显示的部件可以是、或也可以不是物理模块;既可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部模块来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能模块可以全部集成在一个处理单元中,也可以是各模块分别单独作为一个单元,也可以两个或两个以上模块集成在一个单元中;上述集成的模块既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(Read Only Memory,ROM)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本申请上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得电子设备执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (24)

  1. 一种虚拟视点绘制方法,所述方法包括:
    根据输入视点的重构深度图,生成目标视点的初始可见图;
    对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图;
    对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
  2. 根据权利要求1所述的方法,其中,所述对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图,包括:
    对所述初始可见图进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图。
  3. 根据权利要求2所述的方法,其中,所述对所述初始可见图进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图,包括:
    获得所述目标视点的初始纹理图;
    对所述目标视点的初始纹理图进行分割,得到分割区域;
    对所述分割区域在所述初始可见图上的对应区域进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图。
  4. 根据权利要求3所述的方法,其中,所述对所述目标视点的初始纹理图进行分割,得到分割区域,包括:
    利用SLIC超像素分割算法,对所述目标视点的初始纹理图进行超像素分割,所述分割区域为超像素。
  5. 根据权利要求3所述的方法,其中,所述对所述分割区域在所述初始可见图上的对应区域进行去噪和/或边缘增强处理,得到所述目标视点的目标可见图,包括:
    将所述初始纹理图的分割区域作为所述初始可见图的分割区域,对所述初始可见图的分割区域的像素进行分类,以确定出所述初始可见图的分割区域中待更新的目标像素;
    对所述初始可见图中的所述目标像素的像素值进行更新,得到所述目标可见图。
  6. 根据权利要求5所述的方法,其中,所述对所述初始可见图的分割区域的像素进行分类,以确定出所述初始可见图的分割区域中待更新的目标像素,包括:
    对所述初始可见图的分割区域的像素的像素值进行聚类,至少得到:第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及第二类像素的像素数量和所述第二类像素的聚类中心的像素值;
    至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定对应分割区域中待更新的目标像素。
  7. 根据权利要求6所述的方法,其中,所述对所述初始可见图的分割区域的像素的像素值进行聚类,至少得到:第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及第二类像素的像素数量和所述第二类像素的聚类中心的像素值,包括:
    将所述初始可见图中的像素的像素值映射到特定区间内,得到第一可见图;
    将所述初始纹理图的分割区域作为所述第一可见图的分割区域,对所述第一可见图的分割区域的像素进行聚类,至少得到:所述第一类像素的像素数量和所述第一类像素的聚类中心的像素值,以及所述第二类像素的像素数量和所述第二类像素的聚类中心的像素值;
    相应地,所述至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定对应分割区域中待更新的目标像素,包括:
    至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系中之一,确定所述第一可见图的分割区域中待更新的目标像素;
    相应地,所述对所述初始可见图中的所述目标像素的像素值进行更新,得到所述目标可见图,包括:
    对所述第一可见图中的待更新的目标像素的像素值进行更新,得到第二可见图;
    根据所述初始可见图与所述第一可见图之间的映射关系,将所述第二可见图中的像素的像素值 进行反向映射,得到所述目标可见图。
  8. 根据权利要求7所述的方法,其中,所述对所述第一可见图的分割区域的像素进行聚类,还确定出所述分割区域的非目标像素;相应地,
    所述对所述第一可见图中的待更新的目标像素的像素值进行更新,得到第二可见图,包括:
    根据所述第一可见图的分割区域的非目标像素的像素值,确定所述分割区域的像素替换值;
    将所述第一可见图的分割区域的目标像素的像素值,更新为对应分割区域的像素替换值,从而得到第二可见图。
  9. 根据权利要求8所述的方法,其中,所述根据所述第一可见图的分割区域的非目标像素的像素值,确定所述分割区域的像素替换值,包括:
    将所述第一可见图的分割区域中非目标像素的聚类中心的像素值,确定为所述分割区域的像素替换值。
  10. 根据权利要求7至9任一项所述的方法,其中,所述至少根据所述第一类像素的像素数量与所述第二类像素的像素数量的关系、以及所述第一类像素的聚类中心的像素值与所述第二类像素的聚类中心的像素值的关系之一,确定所述第一可见图的分割区域中待更新的目标像素,包括:
    在所述第一类像素的聚类中心的像素值减去所述第二类像素的聚类中心的像素值的第一运算结果大于或等于第一阈值,且所述第一类像素的像素数量除以所述第二类像素的像素数量的第二运算结果大于或等于第二阈值的情况下,确定所述第二类像素为对应分割区域的待更新的目标像素;
    在所述第二类像素的聚类中心的像素值减去所述第一类像素的聚类中心的像素值的第三运算结果大于或等于所述第一阈值,且所述第二类像素的像素数量除以所述第一类像素的像素数量的第四运算结果大于或等于所述第二阈值的情况下,确定所述第一类像素为对应分割区域的待更新的目标像素。
  11. 根据权利要求10所述的方法,其中,所述对所述第一可见图的分割区域的像素进行聚类,还确定出所述分割区域的非目标像素,包括:
    在所述第一运算结果小于所述第一阈值或所述第二运算结果小于所述第二阈值、且所述第三运算结果小于所述第一阈值或所述第四运算结果小于所述第二阈值的情况下,确定所述第一类像素和所述第二类像素均为对应分割区域的非目标像素。
  12. 根据权利要求10或11所述的方法,其中,所述第一阈值的取值范围是[25,33],所述第二阈值的取值范围是[5,10]。
  13. 根据权利要求12所述的方法,其中,所述第一阈值为30,所述第二阈值为7。
  14. 根据权利要求3所述的方法,其中,所述获得所述目标视点的初始纹理图,包括:
    对解码码流得到的输入视点的纹理图的图集进行剪切视图恢复,得到所述输入视点的重构纹理图;
    根据所述输入视点的重构纹理图,对所述目标视点的初始可见图进行着色,得到所述目标视点的初始纹理图。
  15. 一种渲染方法,所述方法包括:
    对输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;
    对所述输入视点的重构深度图执行如权利要求1至14任一项所述方法中的步骤,得到所述目标视点的目标纹理图;
    根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
  16. 一种解码方法,所述方法包括:
    对输入的码流进行解码,得到输入视点的深度图的图集;
    对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;
    对所述输入视点的重构深度图执行如权利要求1至14任一项所述方法中的步骤,得到所述目标视点的目标纹理图;
    根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
  17. 一种虚拟视点绘制装置,包括:
    可见图生成模块,用于根据所述输入视点的重构深度图,生成目标视点的初始可见图;
    可见图优化模块,用于对所述初始可见图进行质量提升处理,得到所述目标视点的目标可见图;
    着色模块,用于对所述目标视点的目标可见图进行着色,得到所述目标视点的目标纹理图。
  18. 一种渲染装置,包括:
    剪切视图恢复模块,用于对输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的 重构深度图;
    虚拟视点绘制模块,用于对所述输入视点的重构深度图执行如权利要求1至14任一项所述方法中的步骤,得到所述目标视点的目标纹理图;
    目标视图生成模块,用于根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
  19. 一种解码装置,包括:
    解码模块,用于对输入的码流进行解码,得到输入视点的深度图的图集;
    剪切视图恢复模块,用于对所述输入视点的深度图的图集进行剪切视图恢复,得到所述输入视点的重构深度图;
    虚拟视点绘制模块,用于对所述输入视点的重构深度图执行如权利要求1至14任一项所述方法中的步骤,得到所述目标视点的目标纹理图;
    目标视图生成模块,用于根据所述目标视点的目标纹理图,生成所述目标视点的目标视图。
  20. 一种视点加权合成器VWS,用于实现权利要求1至14任一项所述的方法。
  21. 一种渲染设备,用于实现权利要求15所述的方法。
  22. 一种解码器,用于实现权利要求16所述的方法。
  23. 一种电子设备,包括存储器和处理器,所述存储器存储有可在处理器上运行的计算机程序,所述处理器执行所述程序时实现权利要求1至16任一项所述的方法。
  24. 一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现权利要求1至16任一项所述的方法。
PCT/CN2020/135779 2020-12-11 2020-12-11 虚拟视点绘制、渲染、解码方法及装置、设备、存储介质 WO2022120809A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202080107720.1A CN116601958A (zh) 2020-12-11 2020-12-11 虚拟视点绘制、渲染、解码方法及装置、设备、存储介质
PCT/CN2020/135779 WO2022120809A1 (zh) 2020-12-11 2020-12-11 虚拟视点绘制、渲染、解码方法及装置、设备、存储介质
US18/207,982 US20230316464A1 (en) 2020-12-11 2023-06-09 Virtual view drawing method and apparatus, rendering method and apparatus, and decoding method and apparatus, and devices and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/135779 WO2022120809A1 (zh) 2020-12-11 2020-12-11 虚拟视点绘制、渲染、解码方法及装置、设备、存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/207,982 Continuation US20230316464A1 (en) 2020-12-11 2023-06-09 Virtual view drawing method and apparatus, rendering method and apparatus, and decoding method and apparatus, and devices and storage medium

Publications (1)

Publication Number Publication Date
WO2022120809A1 true WO2022120809A1 (zh) 2022-06-16

Family

ID=81974162

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135779 WO2022120809A1 (zh) 2020-12-11 2020-12-11 虚拟视点绘制、渲染、解码方法及装置、设备、存储介质

Country Status (3)

Country Link
US (1) US20230316464A1 (zh)
CN (1) CN116601958A (zh)
WO (1) WO2022120809A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310046A (zh) * 2023-05-16 2023-06-23 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130222534A1 (en) * 2011-08-29 2013-08-29 Nokia Corporation Apparatus, a Method and a Computer Program for Video Coding and Decoding
CN103942756A (zh) * 2014-03-13 2014-07-23 华中科技大学 一种深度图后处理滤波的方法
CN106162198A (zh) * 2016-08-31 2016-11-23 重庆邮电大学 基于不规则匀质块分割的三维视频深度图编码及解码方法
CN106341676A (zh) * 2016-09-29 2017-01-18 济南大学 基于超像素的深度图像预处理和深度空洞填充方法
CN107071383A (zh) * 2017-02-28 2017-08-18 北京大学深圳研究生院 基于图像局部分割的虚拟视点合成方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130222534A1 (en) * 2011-08-29 2013-08-29 Nokia Corporation Apparatus, a Method and a Computer Program for Video Coding and Decoding
CN103942756A (zh) * 2014-03-13 2014-07-23 华中科技大学 一种深度图后处理滤波的方法
CN106162198A (zh) * 2016-08-31 2016-11-23 重庆邮电大学 基于不规则匀质块分割的三维视频深度图编码及解码方法
CN106341676A (zh) * 2016-09-29 2017-01-18 济南大学 基于超像素的深度图像预处理和深度空洞填充方法
CN107071383A (zh) * 2017-02-28 2017-08-18 北京大学深圳研究生院 基于图像局部分割的虚拟视点合成方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310046A (zh) * 2023-05-16 2023-06-23 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机及存储介质
CN116310046B (zh) * 2023-05-16 2023-08-22 腾讯科技(深圳)有限公司 图像处理方法、装置、计算机及存储介质

Also Published As

Publication number Publication date
US20230316464A1 (en) 2023-10-05
CN116601958A (zh) 2023-08-15

Similar Documents

Publication Publication Date Title
US10977809B2 (en) Detecting motion dragging artifacts for dynamic adjustment of frame rate conversion settings
US10354394B2 (en) Dynamic adjustment of frame rate conversion settings
US7324594B2 (en) Method for encoding and decoding free viewpoint videos
US11432009B2 (en) Techniques for encoding and decoding immersive video
JP2002506585A (ja) マスクおよび丸め平均値を使用したオブジェクトベースの符号化システムのためのスプライト生成に関する方法
CN112954393A (zh) 一种基于视频编码的目标跟踪方法、系统、存储介质及终端
US11190803B2 (en) Point cloud coding using homography transform
JP7383128B2 (ja) 画像処理装置
CN113068034B (zh) 视频编码方法及装置、编码器、设备、存储介质
US20230316464A1 (en) Virtual view drawing method and apparatus, rendering method and apparatus, and decoding method and apparatus, and devices and storage medium
Hu et al. An adaptive two-layer light field compression scheme using GNN-based reconstruction
US20230343017A1 (en) Virtual viewport generation method and apparatus, rendering and decoding methods and apparatuses, device and storage medium
US11989919B2 (en) Method and apparatus for encoding and decoding volumetric video data
JP2022533754A (ja) ボリュメトリック映像の符号化および復号化のための方法、装置、およびコンピュータプログラム製品
US20210258590A1 (en) Switchable scalable and multiple description immersive video codec
Alface et al. V3c-based coding of dynamic meshes
CN115665427A (zh) 直播数据的处理方法、装置及电子设备
CN115633179A (zh) 一种用于实时体积视频流传输的压缩方法
Ali et al. Depth image-based spatial error concealment for 3-D video transmission
US20240087185A1 (en) Virtual view drawing method, rendering method, and decoding method
KR102658474B1 (ko) 가상 시점 합성을 위한 영상 부호화/복호화 방법 및 장치
US11727536B2 (en) Method and apparatus for geometric smoothing
US20200413094A1 (en) Method and apparatus for encoding/decoding image and recording medium for storing bitstream
US20230306687A1 (en) Mesh zippering
Lim et al. Adaptive Patch-Wise Depth Range Linear Scaling Method for MPEG Immersive Video Coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964745

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202080107720.1

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964745

Country of ref document: EP

Kind code of ref document: A1