WO2023272510A1 - Multi-plane image generation, data processing, encoding and decoding methods and apparatuses - Google Patents

Multi-plane image generation, data processing, encoding and decoding methods and apparatuses

Info

Publication number
WO2023272510A1
WO2023272510A1 PCT/CN2021/103233 CN2021103233W
Authority
WO
WIPO (PCT)
Prior art keywords
pmpi
smpi
pixel
scene
depth
Prior art date
Application number
PCT/CN2021/103233
Other languages
English (en)
French (fr)
Inventor
杨铀 (Yang You)
蒋小广 (Jiang Xiaoguang)
刘琼 (Liu Qiong)
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority to PCT/CN2021/103233 priority Critical patent/WO2023272510A1/zh
Priority to CN202180099763.4A priority patent/CN117561715A/zh
Publication of WO2023272510A1 publication Critical patent/WO2023272510A1/zh

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability

Definitions

  • Multiplane image (MPI) is a non-redundant scene representation method.
  • MPI decomposes the scene into a series of layers, which are planar layers or spherical layers.
  • the depth range [dmin,dmax] of MPI needs to be set in advance according to the depth of field data of the scene.
  • dmin is the minimum depth, which can be represented by the distance from the layer closest to the reference viewpoint to the reference viewpoint;
  • dmax is the maximum depth, which can be represented by the distance from the layer farthest from the reference viewpoint to the reference viewpoint.
  • Each layer in MPI is divided into two parts: color map (Color frame) and transparency map (Transparency frame).
  • the color map and transparency map of a layer contain the texture information and transparency information of the scene at the position of the plane layer respectively.
  • MPI can be used for immersive video, but the quality needs to be improved.
  • An embodiment of the present disclosure provides a method for generating a multi-plane image, including:
  • Scene division processing including: dividing the 3D scene into multiple scene areas according to the depth information of the 3D scene;
  • the block multi-plane image (PMPI) generating process includes: generating multiple sub-multi-plane images (sMPI) respectively representing the multiple scene areas, where each sMPI includes multiple layers obtained by sampling the scene area represented by that sMPI at different depths.
  • An embodiment of the present disclosure also provides a data processing method for a multi-plane image, including:
  • the PMPI includes a plurality of sub-multi-plane images (sMPI) that respectively represent a plurality of scene areas into which the three-dimensional scene is divided, and each sMPI includes a plurality of layers obtained by sampling the scene area represented by the sMPI at different depths;
  • the original storage data of the PMPI is converted into encapsulated and compressed storage (PCS) data, and the PCS data is used to determine the depth of each effective layer of each pixel in the PMPI and the color and transparency of the pixel on each effective layer.
  • An embodiment of the present disclosure also provides a method for encoding a multi-plane image, including:
  • the PCS data includes image parameters, and the data of the texture attribute part and the transparency attribute part;
  • the PMPI includes a plurality of sub-multiplane images sMPI to respectively represent a plurality of scene regions into which the 3D scene is divided, and each sMPI includes a plurality of layers obtained by sampling at different depths of the scene region represented by the sMPI.
  • An embodiment of the present disclosure also provides a decoding method for a multi-plane image, including:
  • the PMPI includes a plurality of sub-multiplane images sMPI to respectively represent a plurality of scene regions into which the 3D scene is divided, and each sMPI includes a plurality of layers obtained by sampling at different depths of the scene region represented by the sMPI.
  • An embodiment of the present disclosure also provides a code stream, wherein the code stream is generated by encoding a block multi-plane image PMPI, and the code stream includes image parameters and Atlas data of the PMPI; wherein the PMPI includes a plurality of sub-multiplane images sMPI to respectively represent a plurality of scene regions into which the 3D scene is divided, and each sMPI includes a plurality of layers obtained by sampling at different depths of the scene region represented by the sMPI.
  • An embodiment of the present disclosure also provides a multi-plane image data processing device, including a processor and a memory storing a computer program that can run on the processor, wherein, when the processor executes the computer program, the data processing method for a multi-plane image described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a multi-plane image decoding device, including a processor and a memory storing a computer program that can run on the processor, wherein, when the processor executes the computer program, the method for decoding a multi-plane image described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a multi-plane image encoding device, including a processor and a memory storing a computer program that can run on the processor, wherein, when the processor executes the computer program, the encoding method of a multi-plane image described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a device for generating a multi-plane image, including a processor and a memory storing a computer program that can run on the processor, wherein, when the processor executes the computer program, the method for generating a multi-plane image described in any embodiment of the present disclosure is implemented.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method described in any embodiment of the present disclosure is implemented.
  • FIG. 1 is a schematic structural diagram of an exemplary MPI composed of four plane layers
  • FIGS. 2A to 2F are schematic diagrams of six consecutive plane layers in an exemplary MPI, showing a color map and a transparency map of each plane layer;
  • Fig. 3 is a schematic diagram of representing a three-dimensional scene using ordinary MPI
  • FIG. 4 is a schematic diagram of using PMPI to characterize a three-dimensional scene according to an embodiment of the present disclosure
  • Fig. 5 is the flowchart of the PMPI generation method of an embodiment of the present disclosure
  • Figure 7 is a schematic diagram of a PMPI generated using an embodiment of the present disclosure.
  • Fig. 8 is a schematic diagram of a video compression process
  • Fig. 9 is a schematic diagram of PCS data converted from MPI original storage data;
  • Fig. 10 is a flowchart of the PMPI data processing method of an embodiment of the present disclosure;
  • Fig. 11 is a schematic diagram of a first kind of PCS data converted from PMPI original storage data;
  • Fig. 12 is a schematic diagram of a second kind of PCS data converted from PMPI original storage data;
  • Fig. 13 is a schematic diagram of a third kind of PCS data converted from PMPI original storage data;
  • FIG. 14 is a schematic structural diagram of a PMPI encoding device according to an embodiment of the present disclosure.
  • FIG. 15 is a flow chart of a PMPI encoding method according to an embodiment of the present disclosure.
  • Fig. 16 is a flowchart of a PMPI decoding method according to an embodiment of the present disclosure
  • FIG. 17 is a schematic structural diagram of a PMPI rendering device according to an embodiment of the present disclosure.
  • Fig. 18 is a schematic diagram of an apparatus for generating PMPI according to an embodiment of the present disclosure.
  • Words such as "exemplary" or "for example" are used to indicate an example, instance, or illustration. Any embodiment described in this disclosure as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments.
  • "And/or" herein describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone.
  • “A plurality” means two or more than two.
  • Words such as "first" and "second" are used to distinguish between identical or similar items with substantially the same function and effect. Those skilled in the art will understand that such words do not limit quantity or execution order, nor do they necessarily imply a difference.
  • MPI is a hierarchical representation of 3D scenes without redundancy. Please refer to Figure 1.
  • MPI decomposes the scene into a series of plane layers or spherical layers.
  • a reference viewpoint such as a reference camera
  • the depth range [dmin, dmax] of the plane layer needs to be set in advance according to the depth range of the actual scene.
  • the MPI includes S plane layers, and the size of each plane layer is W×H;
  • the size of the MPI can be expressed as W×H×S;
  • W is the number of pixels in the width direction of the MPI;
  • H is the number of pixels in the height direction of the MPI;
  • the MPI contains W×H pixels;
  • the planar image resolution is W×H.
  • the exemplary MPI shown in Figure 1 includes 4 layers, but the number of plane layers or spherical layers that an MPI includes, that is, the number of layers, can also be 2, 3, 5, or more than 5, such as 100 or 200.
  • Each layer of MPI includes a color map and a transparency map, which are used to record the color and transparency of pixels on this layer. A pixel can have different colors and transparency on different layers.
  • MPI is a hierarchical representation of a 3D scene, and can also be regarded as a sampling of a 3D scene.
  • the points on the MPI plane layers are sampling points. From the examples in Figure 2A to Figure 2F, it can be seen that most of the sampling points in this MPI are located at invalid positions of the 3D scene; these positions have no visible surface and their transparency is 0. Only a small number of sampling points are located in the valid areas of the 3D scene, where visible surfaces exist and the transparency is not 0.
  • MPI can be used for immersive video, but is not limited thereto. From the perspective of the immersive experience, the effective areas of the 3D scene play the decisive role; most MPI sampling points are wasted, resulting in low sampling efficiency, and the resolution of the final immersive video is also lower.
  • the depth range [dmin,dmax] of MPI is set according to the global depth of the scene.
  • the depth range is enough to cover most of the effective information of the scene.
  • the depth of the layer closest to the reference viewpoint is called the initial depth dmin of MPI.
  • the depth of the layer farthest from the reference viewpoint may be referred to as the termination depth dmax of the MPI.
  • the parallel lines in the figure are used to indicate the depth position of each layer in the MPI in the three-dimensional scene.
  • In the scene of Fig. 3, one geometry is far away from the other geometries, so in order to represent the main information of the scene (the four geometries), the MPI must use a larger depth range, and the resulting plane layers (four in the figure as an example) are relatively sparse. For the three geometries located in the foreground region, valid information appears on only the two deeper plane layers of the MPI, so the MPI samples the scene inefficiently.
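As an illustrative sketch (not part of the patent text), the placement of MPI layer depths under the two distribution rules discussed in this document can be computed as follows; the function name and the `mode` parameter are assumptions for illustration:

```python
import numpy as np

def layer_depths(d_min, d_max, s, mode="equal_disparity"):
    """Depths of the s layers of an MPI between d_min and d_max.
    'equidistant' spaces layers linearly in depth; 'equal_disparity'
    spaces them linearly in 1/depth, which places more layers close
    to the reference viewpoint."""
    if mode == "equidistant":
        return np.linspace(d_min, d_max, s)
    # equal steps in disparity (inverse depth)
    return 1.0 / np.linspace(1.0 / d_min, 1.0 / d_max, s)
```

For a sparse depth range such as [1, 4] with 4 layers, the equidistant rule yields depths 1, 2, 3, 4, while the equal-disparity rule concentrates layers near the camera.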
  • a single sMPI is also a kind of MPI, but it represents one scene area into which the three-dimensional scene is divided; the scene area can be regarded as a special three-dimensional scene whose size and shape are not fixed and change with the scene content.
  • the way sMPI characterizes the scene area can still use the way ordinary MPI characterizes the three-dimensional scene.
  • an sMPI includes multiple layers sampled at different depths in the scene area. The layers have the same size and shape, each layer includes a color map and a transparency map, and the layers can be distributed according to set rules such as equal depth spacing or equal disparity spacing.
  • the setting of the depth range of the sMPI can also follow the principle of including most of the effective information in the scene area.
  • the end depths of the multiple sMPIs in a PMPI can be set to be the same and the start depths can be set to be different, so as to introduce the depth information of the scene and increase the PMPI's adaptivity to scene depth, enabling more sampling points to be placed at effective positions of the scene.
  • the 3D scene and reference viewpoint shown in FIG. 4 are the same as those in FIG. 3 .
  • the number of scene regions is set to 2
  • the 3D scene is divided into two scene regions, one of which includes the two geometric bodies located on the left side of the reference viewpoint: one geometry is closest to the reference viewpoint, and the other is farther away from it.
  • This scene area can be called the foreground area.
  • Another scene area includes two geometric bodies located on the right side of the reference viewpoint, both of which are far away from the reference viewpoint, and this scene area may be referred to as a background area.
  • the above-mentioned foreground area and background area are each represented by one sMPI, and each sMPI contains 4 plane layers. Because the depth difference between the two geometric bodies in the foreground area is relatively large, the depth range of the sMPI used to represent the foreground area needs to be set larger.
  • the depth difference between the two geometries in the background area is small and close to the rear, and the depth range of the sMPI used to characterize the background area can be set to be small.
  • the start depth of the sMPI used to characterize the foreground area is smaller, while the start depth of the sMPI used to characterize the background area is larger.
  • the PMPI representation of the resulting 3D scene is shown in Fig. 4. It can be seen from the figure that the four layers of the sMPI used to characterize the background area become dense and are all located near the geometry. Compared with the MPI in Figure 3, the PMPI has more sampling points at effective positions, so representing the scene with a PMPI is more efficient in sampling than the layered representation of ordinary MPI.
  • the foreground area and the background area are obtained by dividing the three-dimensional scene into two with a plane in the depth direction, which is schematic.
  • PMPI differs from normal MPI.
  • in practice, the shapes and sizes of the obtained scene areas and layers usually follow the outlines of objects and therefore differ from the illustrated scene areas and layers.
  • An embodiment of the present disclosure provides a method for generating a multi-plane image, as shown in FIG. 5 , including:
  • Step 310 scene division processing: divide the 3D scene into multiple scene areas according to the depth information of the 3D scene;
  • the end depths of the multiple sMPIs are the same, and the start depth of each sMPI is determined according to the minimum depth of the scene area represented by the sMPI.
  • the initial depth of an sMPI can be directly set to the minimum depth value of the scene area, but the present disclosure is not limited thereto; the initial depth can also take a value slightly larger than the minimum depth, such as the mean of the several smallest depth values, or the N-th depth value after the depth values are sorted from small to large (N can be set, or taken as a certain ratio of the number of pixels included in the sMPI).
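A minimal sketch of this start-depth rule (the helper name is hypothetical, not from the patent): n=1 gives the minimum depth of the region, while a larger n gives the n-th smallest depth value as a start depth that is more robust to outliers:

```python
import numpy as np

def region_start_depth(region_depths, n=1):
    """Start depth of an sMPI from the depth values of its scene region.
    n=1 returns the minimum depth; larger n returns the n-th smallest
    value (n could also be chosen as a ratio of the region's pixel count)."""
    d = np.sort(np.asarray(region_depths, dtype=float).ravel())
    return float(d[min(n, d.size) - 1])
```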
  • the depth value of the scene area may be represented by the gray value of the pixels contained in the scene area, and may be obtained from a corresponding area of the depth map of the three-dimensional scene.
  • the starting depths of different sMPIs are different or not completely the same.
  • one or more connected areas may be divided in the same depth interval.
  • the scene area is represented by an sMPI.
  • multiple sMPIs are in one-to-one correspondence with multiple scene areas, and different sMPIs have different starting depths.
  • multiple connected regions existing on the same depth interval can also be used as multiple scene regions, which are represented by multiple sMPIs respectively.
  • the number of layers of the plurality of sMPIs is the same, and/or the distribution rule of the layers is also the same; the distribution rule may be, for example, equal depth spacing or equal disparity spacing.
  • the present disclosure is not limited thereto.
  • the numbers of layers of the multiple sMPIs in the PMPI can also be set to be different, and the distribution rules of the layers can also differ. This adds some coding complexity but allows the 3D scene to be represented more flexibly.
  • the dividing the 3D scene into multiple scene areas according to the depth information of the 3D scene includes:
  • using an image segmentation algorithm to segment the depth map of the 3D scene into multiple depth map regions includes: using a threshold segmentation algorithm or a superpixel segmentation algorithm to segment the depth map of the 3D scene into multiple depth map regions.
  • The threshold segmentation method is a classic image segmentation method. It uses the grayscale difference between the target to be extracted and the background of the image, and determines whether the feature attribute of each pixel meets a threshold requirement in order to decide whether the pixel belongs to the target area or the background area, thereby converting a grayscale image into a binary image. Thresholds can be selected based on human experience, on histograms, and the like. As an example, this embodiment uses the maximum between-class variance (OTSU) method, which automatically determines the threshold that maximizes the between-class variance and divides the image into foreground and background according to its grayscale characteristics.
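The OTSU rule described above can be sketched as follows (a plain NumPy illustration, not the patent's implementation), exhaustively testing every threshold and keeping the one that maximizes the between-class variance:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale image: return the threshold t
    that maximizes the between-class variance w0*w1*(mu0-mu1)^2."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0           # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Applied to a depth map, pixels below the threshold form one depth interval (e.g. foreground) and the rest form the other.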
  • OTSU maximum between-class variance
  • the present disclosure is not limited to dividing a three-dimensional scene into two scene regions, so embodiments of the present disclosure may also use a multi-threshold image segmentation algorithm, such as an OTSU-based multi-threshold image segmentation algorithm.
  • the number of scene regions into which the three-dimensional scene is divided can be generated adaptively by the algorithm and need not be fixed.
  • A superpixel is an irregular pixel block with certain visual significance, composed of adjacent pixels with similar texture, color, brightness and other characteristics. Superpixel segmentation uses the similarity of features between pixels to group them, replacing a large number of pixels with a small number of superpixels to express image features; it has been widely used in computer vision applications such as image segmentation, pose estimation, target tracking, and target recognition.
  • Common superpixel segmentation algorithms include Graph-based, NCut, Turbopixel, Quick-shift, Graph-cut a, Graph-cut b, and SLIC, etc. This embodiment uses SLIC (simple linear iterative clustering).
  • the SLIC algorithm converts the color image into 5-dimensional feature vectors in the CIELAB color space plus XY coordinates, constructs a distance metric on these feature vectors, and performs local clustering of the image pixels to generate compact, approximately uniform superpixels.
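A much-simplified, SLIC-style local clustering on a single-channel depth map (an illustrative sketch only; real SLIC works on CIELAB color plus XY coordinates as described above, and the function name, `k`, `m` and the iteration count are assumptions):

```python
import numpy as np

def slic_like(depth, k=16, m=10.0, iters=5):
    """Simplified SLIC-style local clustering on a depth map.
    Features are (depth value, y, x); m balances the value distance
    against the spatial distance, normalized by the grid step."""
    H, W = depth.shape
    step = int(np.sqrt(H * W / k))                 # grid spacing of the seeds
    ys = np.arange(step // 2, H, step)
    xs = np.arange(step // 2, W, step)
    centers = np.array([[depth[y, x], y, x] for y in ys for x in xs], dtype=float)
    labels = np.zeros((H, W), dtype=int)
    yy, xx = np.mgrid[0:H, 0:W]
    for _ in range(iters):
        dist = np.full((H, W), np.inf)
        # assignment: each center only competes inside a local 2*step window
        for ci, (cv, cy, cx) in enumerate(centers):
            y0, y1 = max(0, int(cy) - step), min(H, int(cy) + step + 1)
            x0, x1 = max(0, int(cx) - step), min(W, int(cx) + step + 1)
            dv = depth[y0:y1, x0:x1] - cv
            ds = np.hypot(yy[y0:y1, x0:x1] - cy, xx[y0:y1, x0:x1] - cx)
            d = np.hypot(dv, m * ds / step)
            win_d, win_l = dist[y0:y1, x0:x1], labels[y0:y1, x0:x1]
            better = d < win_d
            win_d[better] = d[better]
            win_l[better] = ci
        # update: move each center to the mean of its cluster
        for ci in range(len(centers)):
            mask = labels == ci
            if mask.any():
                centers[ci] = [depth[mask].mean(), yy[mask].mean(), xx[mask].mean()]
    return labels
```

On a depth map this groups pixels into compact, depth-coherent regions that can then serve as candidate scene areas.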
  • the image segmentation algorithm is not limited to the above two, and may also be a region segmentation algorithm, an edge segmentation algorithm, etc., and other image segmentation algorithms may also be used to segment the depth map in the embodiments of the present disclosure.
  • an image segmentation algorithm to segment the depth map of the 3D scene into multiple depth map regions includes:
  • a threshold segmentation algorithm is used to segment the depth map of the three-dimensional scene into N depth map regions;
  • a superpixel segmentation algorithm is used to segment the depth map of the three-dimensional scene into N depth map regions;
  • N is the number of scene regions into which the three-dimensional scene is to be divided, which can be set in advance.
  • using a threshold segmentation algorithm to segment the three-dimensional scene, or using the superpixel segmentation algorithm, can make full use of the advantages of different algorithms and achieve better segmentation results.
  • the 3D scene is segmented according to the depth map (Depth map) of the 3D scene shown in FIG.
  • according to the number A of scene areas, the depth map is divided into A scene areas, and different segmentation methods are used for the depth map according to the value of A:
  • the Otsu threshold segmentation algorithm is used to segment the depth map, that is, according to the set A-1 thresholds, the depth map is divided into A regions in different depth intervals according to the gray value.
  • the number of connected regions obtained at this time may be greater than A, but the gray value of some connected regions is in the same depth interval.
  • all connected regions whose gray values are in the same depth interval are regarded as the same depth map region, so that the number of scene areas is A;
  • FIG. 7B is a schematic diagram of a depth map after being segmented using the Otsu threshold segmentation algorithm.
  • FIG. 7C is a schematic diagram of the depth map after the SLIC superpixel segmentation algorithm is used.
  • the 3D scene can thus be divided into multiple scene regions. If the resolution of the depth map is the same as that of the PMPI, the boundaries of the multiple depth map regions can be directly used as the boundaries of the scene areas in the 3D scene. If the resolution of the depth map differs from that of the PMPI, the boundaries of the multiple depth map regions can be upsampled or downsampled for resolution adaptation and then used as the boundaries of the scene areas in the 3D scene.
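The resolution adaptation step can be sketched with nearest-neighbor resampling of the region label map (an assumption for illustration; the patent does not prescribe a particular resampling method):

```python
import numpy as np

def resize_labels(labels, out_h, out_w):
    """Nearest-neighbor resampling of a region label map, so depth-map
    region boundaries can be reused at the PMPI resolution. Works for
    both upsampling and downsampling."""
    h, w = labels.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return labels[rows[:, None], cols[None, :]]
```

Nearest-neighbor is the natural choice here because label values are categorical; interpolating them would invent nonexistent region indices.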
  • the end depth of each sMPI in the PMPI is the same and preset, and the start depth of each sMPI is equal to the minimum depth value, in the corresponding area of the depth map, of the scene area represented by the sMPI.
  • Each sMPI has the same number of layers and is distributed according to the same law (equidistant distribution or equal parallax distribution).
  • the shape and initial depth of the PMPI generated by the embodiments of the present disclosure are more flexible and change adaptively as the depth of field in different regions of the scene changes.
  • the resulting result is that the sampling points of PMPI are gathered on the visible surface of the scene, and the sampling efficiency is improved.
  • the distribution of plane layers of PMPI is denser, which is equivalent to the effect of ordinary MPI with more layers, but the number of sampling points does not increase.
  • the denser depth layer makes the final immersive video generated according to PMPI have more details and better quality.
  • MPI can be displayed as immersive video after video compression.
  • Figure 8 shows the corresponding video processing.
  • the three-dimensional scene images (such as images captured by a 3D camera) collected by the video acquisition device are preprocessed into MPI, and the MPI is compressed and encoded and then transmitted as a code stream.
  • the code stream is decoded and post-processed, and displayed and played in the form of immersive video.
  • MPEG Moving Picture Experts Group
  • PCS encapsulated and compressed storage
  • Image data, such as PCS data, and reference viewpoint camera parameters can be used as the input of the immersive video test model (TMIV: Test model of immersive video) in MPEG.
  • MPI also known as MPI frame
  • MPI frame needs to be preprocessed and converted into PCS data before being input into TMIV.
  • the PCS data records the relevant parameters of each pixel in the MPI.
  • N(i,j): the number of effective layers of pixel (i,j);
  • C(i,j,k): the color data, such as the color value, at the position of the k-th effective layer of pixel (i,j);
  • D(i,j,k): the index of the k-th effective layer of pixel (i,j), D(i,j,k) ∈ [1,S];
  • T(i,j,k): the transparency value at the position of the k-th effective layer of pixel (i,j).
  • Pixel (i, j) is contained in all S layers of the MPI; a layer in which the transparency value of pixel (i, j) is not 0 is called an effective layer of pixel (i, j).
  • Figure 9 shows an example of MPI encapsulated according to the above parameters.
  • the original stored data of the MPI with the size of W×H×S will not be completely preserved in the PCS data.
  • the value of a pixel in some plane layers may be invalid (that is, the pixel is completely transparent there and carries no valid information). Therefore, for pixel (i, j), only the information of its N(i,j) effective plane layers among the S plane layers needs to be preserved.
  • the number of effective plane layers corresponding to each pixel is uncertain.
  • the compressed PCS data reduces the storage space occupied by MPI.
  • the way PCS data is stored reduces the number of memory accesses during subsequent decoding. Given the size W×H of each MPI plane, only two memory access operations are required to read an entire MPI frame into memory.
  • the start depth and end depth of multiple MPI frames within a set time are the same, and the distribution rules of multiple layers in MPI are known.
  • the depth of each layer in the MPI can be calculated according to the preset start depth and end depth. Therefore, the PCS data of a single MPI does not need to additionally record the depth information of the effective layer of the pixel.
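Packing an ordinary MPI into the per-pixel PCS layout described above (N, then D/C/T per effective layer) can be sketched as follows; the array shapes and the function name are assumptions for illustration:

```python
import numpy as np

def mpi_to_pcs(color, alpha):
    """Pack an MPI (color: S x H x W x 3, alpha: S x H x W) into flat
    PCS-style per-pixel lists: for each pixel, the count N of effective
    layers, then (1-based layer index D, color C, transparency T) for
    every layer where the transparency is not 0."""
    S, H, W = alpha.shape
    counts = np.zeros((H, W), dtype=np.int32)
    D, C, T = [], [], []
    for i in range(H):
        for j in range(W):
            ks = np.nonzero(alpha[:, i, j])[0]        # effective layers only
            counts[i, j] = ks.size
            D.extend(ks + 1)                          # 1-based layer indices
            C.extend(color[k, i, j] for k in ks)
            T.extend(alpha[k, i, j] for k in ks)
    return counts, np.array(D), np.array(C), np.array(T)
```

Only the effective-layer entries are stored, which is where the compression over the raw W×H×S representation comes from.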
  • PMPI can also be compressed and encoded as a video frame.
  • PMPI can be directly generated according to the image of the three-dimensional scene, or it can be generated on the basis of ordinary MPI.
  • Before encoding PMPI it is also necessary to convert the original storage data of PMPI into PCS data.
  • PMPI has been transformed on the basis of ordinary MPI.
  • PMPI includes multiple sMPIs.
  • the depth of the layer closest to the reference viewpoint is equal to the initial depth of the sMPI.
  • the depth of the layer farthest from the reference viewpoint is equal to the end depth of the sMPI, and the depths of the other layers lie between the start depth and the end depth of the sMPI.
  • the layers can be distributed according to set rules such as equal depth spacing or equal disparity spacing. Therefore, once the start depth and end depth of an sMPI are known, the depth of each layer of the sMPI can be calculated.
  • the termination depth of sMPI in different PMPIs can be set to be the same.
  • the start depth of an sMPI in a PMPI is related to the depth of the scene area it represents and is not preset. Therefore, when converting the original storage data of the PMPI into PCS data, the embodiments of the present disclosure need to provide additional start-depth information so that the decoding end can accurately calculate the depths of the effective layers of each pixel.
  • An embodiment of the present disclosure provides a multi-plane image data processing method, as shown in FIG. 10 , including:
  • Step 410 obtaining the original storage data of the block multi-plane image PMPI
  • the PMPI includes multiple sub-multiplanar images sMPI to respectively represent multiple scene regions into which the 3D scene is divided, and each sMPI includes multiple layers obtained by sampling at different depths of the scene region represented by the sMPI.
  • Step 420 converting the original storage data of the PMPI into encapsulated and compressed storage PCS data, the PCS data is used to determine the depth of the effective layer of the pixel in the PMPI and the color and transparency of the pixel on the effective layer.
  • an effective layer of a pixel refers to a layer, in the sMPI of the PMPI that contains the pixel, on which the transparency of the pixel is not 0.
  • a pixel can have one or more effective layers, depending on the actual scene.
  • the starting depths of different sMPIs are different or not completely the same.
  • the first PCS data format suitable for PMPI is proposed.
  • the PCS data includes the following parameters of each pixel in the PMPI:
  • The data of the start depths of the plurality of sMPIs is expressed as the start depth of the sMPI where each pixel in the PMPI is located. According to the start depth of the sMPI where a pixel is located, combined with the set end depth of the sMPI and the distribution rule of the layers, the depth of each layer included in the sMPI can be calculated; combined with the layer index of each effective layer of the pixel within its sMPI, the depth of each effective layer of the pixel is then known. For each pixel in the PMPI, once the depth of each effective layer and the color and transparency on each effective layer are obtained, the PMPI image can be completely restored.
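The depth reconstruction described above can be sketched as follows (illustrative only; equidistant and equal-disparity are the two distribution rules named in this document, and the function signature is an assumption):

```python
def effective_layer_depth(e_start, d_end, s, layer_index, mode="equidistant"):
    """Depth of the layer with 1-based index layer_index in an sMPI with
    start depth e_start, end depth d_end and s layers. The decoder only
    needs e_start, the preset d_end and the distribution rule."""
    if s == 1:
        return e_start
    t = (layer_index - 1) / (s - 1)   # 0 at the first layer, 1 at the last
    if mode == "equidistant":
        return e_start + t * (d_end - e_start)
    # equal-disparity: linear interpolation in inverse depth
    return 1.0 / (1.0 / e_start + t * (1.0 / d_end - 1.0 / e_start))
```

Combined with D(i,j,k), the layer index of the k-th effective layer, this yields the depth of every effective layer of a pixel.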
  • the image resolution of the PMPI is W×H
  • the number of sMPIs included in the PMPI is M
  • the number of layers of each sMPI is S.
  • the PCS data format of the PMPI is shown in Figure 11, and the PCS data of the PMPI includes the following parameters:
  • N(i,j): the number of effective layers of pixel (i,j), i ∈ [1,H], j ∈ [1,W];
  • E(i,j): the start depth of the sMPI where pixel (i,j) is located;
  • C(i,j,k): the color data, such as the color value, at the position of the k-th effective layer of pixel (i,j), k ∈ [1,N(i,j)];
  • D(i,j,k): the index of the k-th effective layer of pixel (i,j), D(i,j,k) ∈ [1,S];
  • T(i,j,k): the transparency value at the position of the k-th effective layer of pixel (i,j).
  • Compared with the basic MPI format, the parameter E(i,j), the start depth of the sMPI where pixel (i,j) is located, is added so that the depth of each effective layer of the pixel can be calculated.
  • the PCS data includes:
  • the image resolution of the PMPI is W ⁇ H
  • the number of sMPIs included in the PMPI is M
  • the number of layers of each sMPI is S.
  • The PCS data format of the PMPI is shown in Figure 12, and the PCS data of the PMPI includes the following parameters:
  • N i,j : the number of effective layers of pixel (i,j), i ∈ [1,H], j ∈ [1,W];
  • I i,j : the index of the sMPI where pixel (i,j) is located, I i,j ∈ [1,M];
  • C i,j,k : the color data, e.g., the color value, at the position of the k-th effective layer of pixel (i,j), k ∈ [1,N i,j];
  • D i,j,k : the index of the k-th effective layer of pixel (i,j), D i,j,k ∈ [1,S];
  • T i,j,k : the transparency value at the position of the k-th effective layer of pixel (i,j).
  • Whereas the embodiment of Figure 11 directly uses the parameter E i,j to indicate the starting depth of the sMPI where pixel (i,j) is located,
  • this embodiment determines that starting depth jointly from the starting depth DP m of each sMPI and the index of the sMPI where pixel (i,j) is located.
  • The other parameters are similar.
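Under this second format, the decoder recovers a pixel's start depth by an indexed lookup, which can be sketched as follows (an illustrative helper, not a normative procedure):

```python
def smpi_start_depth(dp, smpi_index):
    """Start depth of the sMPI containing a pixel in the second PCS format.

    dp         : per-sMPI start depths DP_m for m in [1, M], stored 0-based
    smpi_index : the pixel's I_{i,j}, a 1-based index in [1, M]
    """
    return dp[smpi_index - 1]

# Example: three sMPIs with start depths 2.0, 5.0, 8.0; a pixel lies in sMPI 2.
assert smpi_start_depth([2.0, 5.0, 8.0], 2) == 5.0
```

Storing the M start depths once and a small per-pixel index trades a little indirection for less repetition than writing E i,j at every pixel.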
  • the PCS data includes:
  • the depth of each layer included in each sMPI in the PMPI (for each sMPI, the depth of the layer closest to the reference viewpoint is the starting depth of that sMPI);
  • the image resolution of the PMPI is W ⁇ H
  • the number of sMPIs included in the PMPI is M
  • the number of layers of each sMPI is S.
  • The PCS data format of the PMPI is shown in Figure 13, and the PCS data of the PMPI includes the following parameters:
  • DP l : the depth of the l-th layer of the PMPI (the layers of all sMPIs numbered uniformly), l ∈ [1,M×S];
  • N i,j : the number of effective layers of pixel (i,j), i ∈ [1,H], j ∈ [1,W];
  • C i,j,k : the color data, e.g., the color value, at the position of the k-th effective layer of pixel (i,j), k ∈ [1,N i,j];
  • D' i,j,k : the index of the k-th effective layer of pixel (i,j) within the PMPI, D' i,j,k ∈ [1,M×S];
  • T i,j,k : the transparency value at the position of the k-th effective layer of pixel (i,j).
  • By adding information about the sMPI starting depths to the PCS data, the above PCS formats for PMPI enable the decoder to compute the depths of a pixel's effective layers from those starting depths, and thus to accurately restore the PMPI image from the compressed PCS data.
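For instance, assuming equally spaced layers (the text also permits equal-disparity spacing), a decoder could recover a pixel's effective-layer depths from the start depth, the shared end depth, the layer count S, and the layer indices D i,j,k roughly as follows. This is an illustrative sketch under that spacing assumption, not the normative procedure:

```python
def effective_layer_depths(start_depth, end_depth, num_layers, layer_indices):
    """Depths of a pixel's effective layers, assuming the sMPI's layers are
    equally spaced between its start depth and the shared end depth.
    layer_indices are 1-based layer indices within the sMPI (D i,j,k)."""
    step = (end_depth - start_depth) / (num_layers - 1)
    return [start_depth + (k - 1) * step for k in layer_indices]

# An sMPI with 5 layers spanning depths 2..10; effective layers 1, 3 and 5.
assert effective_layer_depths(2.0, 10.0, 5, [1, 3, 5]) == [2.0, 6.0, 10.0]
```

Only the start depth varies per sMPI here; the end depth and layer count are the shared preset values, which is exactly why the PCS formats above need to carry the start depths.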
  • The starting depths of the sMPIs in a PMPI change adaptively and must be recorded in the PMPI's raw data.
  • The raw storage data of the PMPI includes the color-frame and transparency-frame data of each layer in each sMPI, together with data on the pixels contained in each sMPI and its starting depth.
  • The per-sMPI pixel and starting-depth data is represented by a starting-depth map, which indicates the starting depth of the sMPI where each pixel in the PMPI is located; specifically, the position of each pixel in the PMPI may indicate the starting depth of that pixel's sMPI.
  • The starting-depth map also indicates the pixels included in each sMPI, reflecting the sMPI's shape, size, and position.
  • The PMPI is generated using the generation method described in any embodiment of the present disclosure.
  • Each pixel in the PMPI is contained in one sMPI.
  • Each of the layers included in that sMPI records the pixel's color value and transparency value.
  • A pixel of an ordinary MPI is contained in all layers of the ordinary MPI, whereas the PMPI is divided into patches, so a pixel of the PMPI is contained in all the layers of one sMPI.
  • The sMPI containing a pixel is called the sMPI where the pixel is located.
  • The pixel's color and transparency values are recorded in all layers of the sMPI where it is located, but only some of these layers may be effective layers for the pixel.
  • Figure 14 is a structural diagram of an MPI encoding device that can be used in an embodiment of the present disclosure.
  • The input data of the MPI encoding device 10 is the PCS data of the source MPI (e.g., a PMPI), which in this example includes image parameters (View parameters, such as reference viewpoint camera parameters), data of the Texture Attribute component, and data of the Transparency Attribute component.
  • The MPI mask generation (Create mask from MPI) unit 101 is configured to generate MPI masks from the input data.
  • The pixels (also called sampling points) in the MPI layers may be screened against a transparency threshold to obtain a mask for each layer. This distinguishes, on each layer, positions (also called pixels) with large transparency from positions with small transparency, masking out the highly transparent positions to reduce the amount of data.
  • The MPI mask generation unit 101 performs this operation on all MPI frames within an intra-period. Assuming the MPI frame size is W×H×S and an intra-period contains M frames, processing by the MPI mask generation unit 101 yields M masks of size W×H×S.
  • The MPI mask aggregation (Aggregate MPI masks) unit 103 is configured to take the union of the masks lying on the same layer among the M W×H×S masks, yielding one W×H×S mask.
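The mask generation and aggregation steps can be sketched with NumPy as follows. The 0.1 threshold, the array shapes, and the convention that the mask keeps positions whose transparency value exceeds the threshold (the valid-information regions clustered next) are illustrative assumptions, not values from the specification:

```python
import numpy as np

def masks_from_transparency(alpha_frames, threshold=0.1):
    """Per-frame binary masks from transparency frames of shape (M, S, H, W):
    True marks positions whose transparency value exceeds the threshold."""
    return alpha_frames > threshold

def aggregate_masks(masks):
    """Union of the M per-frame masks on each layer -> one (S, H, W) mask."""
    return masks.any(axis=0)

alphas = np.zeros((2, 3, 4, 4))   # M=2 frames, S=3 layers, 4x4 pixels
alphas[0, 1, 0, 0] = 0.9          # above threshold only in frame 0
alphas[1, 1, 2, 2] = 0.7          # above threshold only in frame 1
agg = aggregate_masks(masks_from_transparency(alphas))
assert agg.shape == (3, 4, 4) and agg[1, 0, 0] and agg[1, 2, 2]
```

The union over the intra-period means a position is kept in the aggregated mask if it carries information in any frame of the period.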
  • The effective pixel clustering (Cluster Active pixels) unit 105 is configured to cluster the regions of each layer's mask whose transparency exceeds the threshold (valid-information regions) into a series of clusters.
  • The cluster splitting (Split Clusters) unit 107 is configured to split the clusters produced by the effective pixel clustering unit 105, obtaining the split clusters.
  • The patch packing (Pack patches) unit 109 is configured to recombine the texture frame and transparency frame corresponding to each patch (e.g., a rectangular region containing a cluster) into one image, encoded as atlas data for transmission.
  • The video data generation (Generate video data) unit 111 is configured to generate video data for transmission from the atlas data output by the patch packing unit 109, the video data including texture attribute video data (raw), transparency attribute video data (raw), etc.
  • The parameter encoding unit 113 is configured to encode image parameters from the source MPI data; the encoded image parameters may include a view parameters list (View parameters list), parameter sets (Parameter set), and the like.
  • The sampling points in the MPI are first screened against the transparency threshold to obtain a mask for each planar layer.
  • The size of the MPI is W×H×S.
  • The number of frames contained in a set period of time (intra-period) is M.
  • The above operation is performed on all MPI frames in the period, yielding M masks of size W×H×S.
  • The masks on the same planar layer are then unioned to obtain one W×H×S mask.
  • The regions of each layer's mask whose transparency exceeds the threshold (valid-information regions) are clustered and split into a series of clusters.
  • Patches are obtained from the clusters through steps such as fusion and decomposition.
  • An embodiment of the present disclosure provides a multiplane image encoding method, usable for PMPI encoding. As shown in Figure 15, the encoding method includes:
  • Step 510, receiving the packed compressed storage PCS data of the patch multiplane image PMPI, the PCS data including image parameters and data of the texture attribute part and the transparency attribute part;
  • Step 520, encoding the PCS data to obtain encoded image parameters and atlas data.
  • The PMPI includes multiple sub-multiplane images sMPI that respectively represent the multiple scene regions into which the 3D scene is divided, and each sMPI includes multiple layers obtained by sampling at different depths of the scene region represented by that sMPI.
  • The starting depths of different sMPIs differ, or are not all the same.
  • The image parameters include depth information of the image, and the depth information includes:
  • The PCS data may be converted from the raw storage data of the PMPI by the data processing method described in any embodiment of the present disclosure.
  • The depth information further includes the ending depth and number of layers set for the sMPIs in the PMPI.
  • An embodiment of the present disclosure provides a multiplane image decoding method, as shown in Figure 16, including:
  • Step 610, receiving the encoded bitstream of the patch multiplane image PMPI.
  • The image parameters include depth information of the image, and the depth information includes:
  • Figure 17 shows the architecture of an exemplary MPI rendering device that can be used in embodiments of the present disclosure.
  • The input data of the MPI rendering device 20 includes data of the decoded access unit (Decoded access unit) and viewport parameters (Viewport parameters).
  • The data of the decoded access unit includes view parameters (View parameters), parameter sets (Parameter sets), and unique atlas (Unique Atlas) data; the unique atlas data may include an atlas parameter list (Atlas parameter list), a patch parameter list (Patch parameter list), a block-to-patch map between recombined image blocks and MPI image blocks (Block to patch map), texture video data (Texture video data), transparency video data (Transparency video data), etc.
  • The MPI rendering device 20 includes:
  • a layer depth values decoding (Layer depth values decoding) unit 201, configured to parse the MPI encoded bitstream and extract the depth information of the layers in the MPI;
  • a view synthesis (View synthesis) unit 203, configured to perform view synthesis from the depth information of the MPI layers and the data of the decoded access unit;
  • an inpainting (Inpainting) unit 205, configured to inpaint the synthesized image;
  • a viewing space handling (Viewing space handling) unit, configured to perform viewing space processing on the inpainted image according to the viewport parameters.
  • The MPI rendering device restores the MPI during decoding and simultaneously reads the accompanying image depth information, synthesizing new views from the layer depths and the MPI's texture and transparency frames.
  • An embodiment of the present disclosure also provides a bitstream, wherein the bitstream is generated by encoding a patch multiplane image PMPI and includes the image parameters and atlas data of the PMPI; wherein the PMPI includes multiple sub-multiplane images sMPI that respectively represent the multiple scene regions into which the 3D scene is divided, and each sMPI includes multiple layers obtained by sampling at different depths of the scene region represented by that sMPI.
  • the image parameters include depth information of the image, and the depth information includes:
  • the starting depths of different sMPIs are different or not completely the same.
  • An embodiment of the present disclosure also provides a device for generating multiplane images; referring to Figure 18, it includes a processor 5 and a memory 6 storing a computer program runnable on the processor 5.
  • When executed by the processor 5, the computer program implements the method for generating multiplane images according to any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a multiplane image data processing device, as shown in Figure 18, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the multiplane image data processing method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a multiplane image decoding device; referring to Figure 18, it includes a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the multiplane image decoding method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in any embodiment of the present disclosure.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit.
  • Computer-readable media may include computer-readable storage media that correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, eg, according to a communication protocol. In this manner, a computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • a computer program product may comprise a computer readable medium.
  • Such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk or other magnetic storage, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • Any connection may also properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then those are included in the definition of medium.
  • Disk and disc, as used here, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • The techniques may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • The term "processor" may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • The technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in the disclosed embodiments to emphasize functional aspects of devices configured to perform the described techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with suitable software and/or firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides methods and devices for generating, processing, encoding, and decoding multiplane images. The methods use a patch multiplane image (PMPI) to represent a three-dimensional scene. The PMPI includes multiple sub-multiplane images (sMPI) that respectively represent multiple scene regions into which the 3D scene is divided; the starting depths of the sMPIs differ, and each sMPI includes multiple layers obtained by sampling at different depths of the scene region it represents. The methods and devices of the present disclosure can improve sampling efficiency and video resolution.

Description

Generation, data processing, encoding and decoding methods and devices for multiplane images. Technical Field
Embodiments of the present disclosure relate to, but are not limited to, image processing technology, and more particularly to methods and devices for generating, processing, encoding, and decoding multiplane images.
Background
A multiplane image (MPI) is a non-redundant scene representation. In a spatial coordinate system whose origin is a given reference viewpoint, an MPI decomposes the scene into a series of layers, which are planar or spherical. Taking an MPI composed of planar layers as an example, as shown in Figure 1, multiple planar layers are fronto-parallel to the reference viewpoint and lie at different depths. The depth range [dmin, dmax] of the MPI must be set in advance from the depth data of the scene: dmin, the minimum depth, can be represented by the distance from the layer closest to the reference viewpoint to that viewpoint, and dmax, the maximum depth, by the distance from the layer farthest from the reference viewpoint. Each layer of an MPI has two parts: a color frame (Color frame) and a transparency frame (Transparency frame), containing respectively the texture information and the transparency information of the scene at the position of that planar layer. MPI can be used for immersive video, but the quality still needs improvement.
Summary of the Invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
An embodiment of the present disclosure provides a method for generating a multiplane image, including:
scene partitioning, including: dividing a three-dimensional scene into multiple scene regions according to depth information of the three-dimensional scene; and
patch multiplane image (PMPI) generation, including: generating multiple sub-multiplane images (sMPI) that respectively represent the multiple scene regions, each sMPI including multiple layers obtained by sampling at different depths of the scene region represented by that sMPI.
An embodiment of the present disclosure further provides a data processing method for multiplane images, including:
obtaining raw storage data of a patch multiplane image PMPI, the PMPI including multiple sub-multiplane images sMPI that respectively represent multiple scene regions into which a three-dimensional scene is divided, each sMPI including multiple layers obtained by sampling at different depths of the scene region represented by that sMPI; and
converting the raw storage data of the PMPI into packed compressed storage (PCS) data, the PCS data being used to determine the depths of the effective layers of the pixels in the PMPI and the pixels' colors and transparencies on the effective layers.
An embodiment of the present disclosure further provides an encoding method for multiplane images, including:
receiving packed compressed storage PCS data of a patch multiplane image PMPI, the PCS data including image parameters and data of a texture attribute part and a transparency attribute part; and
encoding the PCS data to obtain encoded image parameters and atlas data;
wherein the PMPI includes multiple sub-multiplane images sMPI that respectively represent multiple scene regions into which a three-dimensional scene is divided, each sMPI including multiple layers obtained by sampling at different depths of the scene region represented by that sMPI.
An embodiment of the present disclosure further provides a decoding method for multiplane images, including:
obtaining the image parameters of a patch multiplane image PMPI, and the data of the texture attribute part and the transparency attribute part, from the image parameters and atlas data in the encoded bitstream of the PMPI;
wherein the PMPI includes multiple sub-multiplane images sMPI that respectively represent multiple scene regions into which a three-dimensional scene is divided, each sMPI including multiple layers obtained by sampling at different depths of the scene region represented by that sMPI.
An embodiment of the present disclosure further provides a bitstream, wherein the bitstream is generated by encoding a patch multiplane image PMPI and includes the image parameters and atlas data of the PMPI; wherein the PMPI includes multiple sub-multiplane images sMPI that respectively represent multiple scene regions into which a three-dimensional scene is divided, each sMPI including multiple layers obtained by sampling at different depths of the scene region represented by that sMPI.
An embodiment of the present disclosure further provides a data processing device for multiplane images, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the data processing method for multiplane images according to any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a decoding device for multiplane images, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the decoding method for multiplane images according to any embodiment of the present disclosure.
An embodiment of the present disclosure further provides an encoding device for multiplane images, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the encoding method for multiplane images according to any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a device for generating multiplane images, including a processor and a memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the method for generating multiplane images according to any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method, decoding method, encoding method, or generating method for multiplane images according to any embodiment of the present disclosure.
Other aspects will become apparent upon reading and understanding the drawings and the detailed description.
Brief Description of the Drawings
The drawings are provided for an understanding of the embodiments of the present disclosure, form a part of the specification, and together with the embodiments serve to explain the technical solutions of the present disclosure without limiting them.
Figure 1 is a structural diagram of an exemplary MPI composed of 4 planar layers;
Figures 2A to 2F are schematic diagrams of 6 consecutive planar layers in an exemplary MPI, showing the color frame and transparency frame of each layer;
Figure 3 is a schematic diagram of representing a 3D scene with an ordinary MPI;
Figure 4 is a schematic diagram of representing a 3D scene with a PMPI according to an embodiment of the present disclosure;
Figure 5 is a flowchart of a PMPI generation method according to an embodiment of the present disclosure;
Figures 6A to 6C are, respectively, the original depth map of an exemplary 3D scene, the depth map after segmentation with a threshold segmentation algorithm, and the depth map after segmentation with a superpixel algorithm;
Figure 7 is a schematic diagram of a PMPI generated using an embodiment of the present disclosure;
Figure 8 is a schematic diagram of a video compression processing flow;
Figure 9 is a schematic diagram of PCS data converted from raw MPI storage data;
Figure 10 is a flowchart of a PMPI data processing method according to an embodiment of the present disclosure;
Figure 11 is a schematic diagram of one kind of PCS data converted from raw PMPI storage data;
Figure 12 is a schematic diagram of another kind of PCS data converted from raw PMPI storage data;
Figure 13 is a schematic diagram of yet another kind of PCS data converted from raw PMPI storage data;
Figure 14 is a structural diagram of a PMPI encoding device according to an embodiment of the present disclosure;
Figure 15 is a flowchart of a PMPI encoding method according to an embodiment of the present disclosure;
Figure 16 is a flowchart of a PMPI decoding method according to an embodiment of the present disclosure;
Figure 17 is a structural diagram of a PMPI rendering device according to an embodiment of the present disclosure;
Figure 18 is a schematic diagram of a PMPI generating device according to an embodiment of the present disclosure.
Detailed Description
The present disclosure describes several embodiments, but the description is illustrative rather than limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope encompassed by the embodiments described in the present disclosure.
In the description of the present disclosure, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment described herein as "exemplary" or "for example" should not be construed as preferable to or more advantageous than other embodiments. Herein, "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. "Multiple" means two or more. In addition, to describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used to distinguish identical or similar items with essentially the same functions and effects. Those skilled in the art will understand that "first", "second", and the like do not limit quantity or execution order, and do not necessarily imply a difference.
In describing representative exemplary embodiments, the specification may have presented methods and/or processes as a particular sequence of steps. However, to the extent that a method or process does not depend on the particular order of steps described herein, it should not be limited to that particular sequence. As those of ordinary skill in the art will appreciate, other step orders are possible. Therefore, the particular order of steps set forth in the specification should not be construed as limiting the claims. Furthermore, claims directed to the method and/or process should not be limited to performing their steps in the order written; those skilled in the art can readily appreciate that the orders may vary while remaining within the spirit and scope of the embodiments of the present disclosure.
MPI is a non-redundant layered representation of a 3D scene. Referring to Figure 1, in a spatial coordinate system whose origin is a given reference viewpoint (e.g., a reference camera), an MPI decomposes the scene into a series of planar or spherical layers. Taking an MPI composed of planar layers as an example, the planar layers are fronto-parallel to the reference viewpoint and lie at different depths. The depth range [dmin, dmax] of the planar layers must be set in advance according to the actual depth range of the scene. Assuming the MPI includes S planar layers, each of size W×H, the size of the MPI can be expressed as W×H×S, where W is the number of pixels in the width direction of the MPI and H the number in the height direction; the MPI contains W×H pixels and the planar image resolution is W×H. The exemplary MPI of Figure 1 includes 4 layers, but the number of planar or spherical layers may also be 2, 3, 5, or more, e.g., 100 or 200. Each layer of the MPI includes a color frame and a transparency frame recording the color and transparency of each pixel on that layer; a pixel may have different colors and transparencies on different layers.
In real scenes, most of the space contains no visible surface. This is directly reflected in the MPI: most regions of the color and transparency frames of the MPI's layers hold invalid values, i.e., contain no visible information. Figures 2A to 2F show 6 consecutive planar layers of an MPI, from the 40th planar layer to the 45th: Figure 2A-1 shows the color frame of the 40th planar layer and Figure 2A-2 its transparency frame; Figure 2B-1 shows the color frame of the 41st planar layer and Figure 2B-2 its transparency frame; and so on for the other figures. The black portions of the transparency frames are invalid regions with transparency 0.
An MPI is a layered representation of a 3D scene and can also be viewed as a sampling of it, the points on the MPI planar layers being sampling points. As the example of Figures 2A to 2F shows, most sampling points of this MPI lie at invalid positions of the scene, where there is no visible surface and the transparency is 0; only a small portion of the sampling points lie in the valid regions of the scene, where visible surfaces exist and the transparency is non-zero. MPI can be used for immersive video, but is not limited thereto. From the perspective of the immersive experience, what matters is the valid regions of the 3D scene, and most MPI sampling points are wasted, resulting in low sampling efficiency and a low resolution of the final immersive video.
The depth range [dmin, dmax] of an MPI is set according to the global depth of the scene; it suffices that the range covers most of the scene's valid information. The depth of the layer closest to the reference viewpoint is called the starting depth dmin of the MPI, and the depth of the layer farthest from the reference viewpoint may be called its ending depth dmax. Take the simple scene of Figure 3 as an example, where the parallel lines indicate the depth positions of the MPI layers within the scene. Because one of the objects is far from the others, the MPI must adopt a large depth range to represent the scene's main information (four objects), and the resulting planar layers (4 in the figure) are therefore sparse. For the three objects in the far region, valid information appears only on the two deepest planar layers of the MPI. The sampling efficiency of the MPI is low.
To address the low sampling efficiency of MPI, embodiments of the present disclosure propose an MPI with depth-adaptive characteristics. To distinguish it from the MPI of Figures 1 and 3, the latter is referred to herein as an ordinary MPI, and the MPI proposed by embodiments of the present disclosure is called a patch multiplane image (PMPI). The PMPI of the embodiments is an MPI that includes multiple sub-multiplane images (sMPI) that respectively represent multiple scene regions into which a 3D scene is divided; each sMPI includes multiple layers sampled at different depths of the scene region it represents; the starting depths of different sMPIs differ, while the ending depths may be the same. A PMPI can be viewed as an extension of the ordinary MPI: the basic units of an ordinary MPI are multiple layers of the same size representing one complete 3D scene, whereas a PMPI uses multiple sMPIs to represent the multiple scene regions into which the scene is divided. Each scene region can be viewed as a patch of the 3D scene, and each sMPI as a patch of the PMPI; the PMPI is thus a layered, patch-based representation of the 3D scene.
A single sMPI is itself an MPI, except that it represents a scene region obtained by dividing the 3D scene; a scene region can be viewed as a special 3D scene whose size and shape are not fixed but vary with the scene content. An sMPI can still represent its scene region the way an ordinary MPI represents a 3D scene: an sMPI includes multiple layers sampled at different depths of the region, the layers have the same size and shape, each layer includes a color frame and a transparency frame, and the layers may be distributed by a set rule such as equal spacing or equal disparity. The depth range of an sMPI can likewise be set so as to cover most of the valid information of its scene region.
Since the multiple scene regions represented by the sMPIs are obtained by dividing the scene according to scene depth, the ending depths of the sMPIs in the PMPI can be set to be the same while the starting depths are set to differ, so as to introduce the scene's depth information, increase the adaptivity to scene depth, and place more sampling points at the valid positions of the scene.
The 3D scene and reference viewpoint of Figure 4 are the same as in Figure 3. When the PMPI of an embodiment of the present disclosure is used to represent the scene, the number of scene regions is set to 2 and the scene is divided into two regions. One region contains the two objects to the left of the reference viewpoint, one closest to it and the other farther away; this region may be called the foreground region. The other region contains the two objects to the right of the reference viewpoint, both far from it; this region may be called the background region.
In the PMPI, the foreground and background regions are each represented by one sMPI, each containing 4 planar layers. Because the depth difference of the two objects in the foreground region is large, the scene depth of the sMPI representing it must be set large. The depth difference of the two objects in the background region is small and they lie near the back, so the depth range of the sMPI representing it can be set small. With the ending depths set to be the same, the sMPI representing the foreground region has a small starting depth while the sMPI representing the background region has a large one. The resulting PMPI representation of the scene is shown in Figure 4. As the figure shows, the 4 layers of the sMPI representing the background region become dense and all lie near the objects. Compared with the MPI of Figure 3, the PMPI places more sampling points at valid positions, so representing the scene with a layered, patch-based PMPI is more sampling-efficient than the layered representation of an ordinary MPI.
Note that in the example of Figure 4 the foreground and background regions are obtained by splitting the scene in two with a plane along the depth direction; this is schematic and only illustrates the difference between the PMPI and an ordinary MPI. In practice, when an image segmentation algorithm is used to generate the PMPI automatically, the shapes and sizes of the obtained scene regions and layers generally follow the object contours and may differ from those illustrated.
An embodiment of the present disclosure provides a method for generating a multiplane image, as shown in Figure 5, including:
Step 310, scene partitioning: dividing a 3D scene into multiple scene regions according to depth information of the 3D scene;
Step 320, patch multiplane image PMPI generation: generating multiple sub-multiplane images sMPI that respectively represent the multiple scene regions, each sMPI including multiple layers sampled at different depths of the scene region represented by that sMPI.
In an exemplary embodiment of the present disclosure, the ending depths of the multiple sMPIs are the same, and the starting depth of each sMPI is determined from the minimum depth of the scene region it represents. When determined this way, the starting depth of the sMPI may be set directly to the minimum depth value of the scene region, but the present disclosure is not limited to this: the starting depth may also take a value slightly greater than the minimum, e.g., the mean of the several smallest depth values, or the N-th depth value after sorting in ascending order (N may be preset or taken as some proportion of the number of pixels in the sMPI), and so on. The depth values of a scene region can be represented by the gray values of the pixels it contains and can be obtained from the corresponding region of the 3D scene's depth map.
In an exemplary embodiment of the present disclosure, among the multiple sMPIs, the starting depths of different sMPIs differ or are not all the same. When the 3D scene is segmented into multiple scene regions, one or more connected regions may be produced within the same depth interval. In this embodiment, all connected regions of the 3D scene whose depths lie in the same depth interval are treated as the same scene region and represented by one sMPI; the sMPIs then correspond one-to-one with the scene regions, and the starting depths of different sMPIs differ. In other embodiments, however, multiple connected regions in the same depth interval may also serve as multiple scene regions, each represented by its own sMPI; in that case some of the sMPIs in the PMPI share a starting depth while others differ, i.e., the starting depths of different sMPIs are not all the same.
In an exemplary embodiment of the present disclosure, the multiple sMPIs correspond one-to-one with the multiple scene regions; the layers of each sMPI are planar or spherical, and each layer includes a texture frame part and a transparency frame part, where the texture frame part contains the color information of the sMPI's scene region at the layer's position and the transparency frame part contains the transparency information of that region at the layer's position.
In an exemplary embodiment of the present disclosure, the sMPIs have the same number of layers and/or the same layer distribution rule, e.g., equal spacing or equal disparity. The present disclosure is not limited to this: in other embodiments, the sMPIs of a PMPI may have different numbers of layers and different distribution rules, which adds some coding complexity but represents the 3D scene more flexibly.
In an exemplary embodiment of the present disclosure, dividing the 3D scene into multiple scene regions according to its depth information includes:
segmenting the depth map of the 3D scene into multiple depth-map regions with an image segmentation algorithm; and
determining the boundaries of scene regions in the 3D scene from the boundaries of the multiple depth-map regions, and dividing the 3D scene into multiple scene regions along those boundaries.
In an exemplary embodiment of the present disclosure, segmenting the depth map of the 3D scene into multiple depth-map regions with an image segmentation algorithm includes: segmenting the depth map of the 3D scene into multiple depth-map regions using a threshold segmentation algorithm or a superpixel segmentation algorithm.
Threshold segmentation is a classic image segmentation method. Exploiting the gray-level difference between the target to be extracted and the background, it determines whether each pixel's feature value meets a threshold requirement to decide whether that pixel belongs to the target region or the background region, thereby converting a grayscale image into a binary image. The threshold can be chosen from manual experience, from a histogram, etc. As an example, this embodiment uses Otsu's method (OTSU), an automatic thresholding method based on maximum between-class variance: according to the gray-level characteristics of the image, it splits the image into foreground and background, and when the optimal threshold is chosen the difference between the two parts is largest, the criterion used in the OTSU algorithm being the maximum between-class variance. The present disclosure is not limited to splitting the 3D scene into two scene regions, so embodiments may also use multi-threshold image segmentation algorithms, e.g., OTSU-based multi-threshold segmentation. With multi-threshold segmentation, the number of scene regions into which the 3D scene is divided may be produced adaptively by the algorithm and need not be fixed.
A "superpixel" is an irregular block of adjacent pixels with similar texture, color, brightness, or other features that carries some visual meaning. Superpixel methods group pixels by the similarity of their features and express image features with a small number of superpixels instead of a large number of pixels; they are widely used in computer vision applications such as image segmentation, pose estimation, object tracking, and object recognition. Common superpixel segmentation algorithms include Graph-based, NCut, Turbopixel, Quick-shift, Graph-cut a, Graph-cut b, and SLIC. This embodiment uses SLIC (simple linear iterative clustering), which converts a color image into a 5-dimensional feature vector in the CIELAB color space plus XY coordinates, constructs a distance metric on this feature vector, and locally clusters the image pixels, producing compact, approximately uniform superpixels.
Image segmentation algorithms are not limited to the above two; region-based segmentation, edge-based segmentation, and other algorithms may also be used in embodiments of the present disclosure to segment the depth map.
In an exemplary embodiment of the present disclosure, segmenting the depth map of the 3D scene into multiple depth-map regions with an image segmentation algorithm includes:
when N is less than a set threshold, segmenting the depth map of the 3D scene into N depth-map regions with a threshold segmentation algorithm;
when N is greater than or equal to the set threshold, segmenting the depth map of the 3D scene into N depth-map regions with a superpixel segmentation algorithm;
where N is the number of scene regions into which the 3D scene is to be divided and may be preset.
In this embodiment, threshold segmentation is used when few scene regions are to be produced, and superpixel segmentation when many are; this makes full use of the strengths of the different algorithms and yields a better segmentation result.
In an example of the present disclosure, the 3D scene is segmented according to the depth map of the 3D scene under the reference view shown in Figure 6A; in practice, the number A of scene regions to produce can be given according to the complexity of the scene depth. The depth map is divided into A scene regions, using a different segmentation method depending on the value of A:
if A < 10, the Otsu threshold segmentation algorithm is applied to the depth map: with A-1 set thresholds, the depth map is segmented by gray value into A regions lying in different depth intervals. The number of connected regions obtained at this point may exceed A, but some connected regions have gray values lying in the same depth interval; this embodiment treats all connected regions whose gray values lie in the same depth interval as one depth-map region, so that the number of scene regions is A;
if A ≥ 10, SLIC superpixel segmentation is applied to the depth map, producing A connected regions, which are taken as the A depth-map regions.
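The A-dependent choice above, and the grouping of gray values into depth intervals under the threshold branch, can be sketched as follows. The function names and the digitize-based binning are illustrative; a real implementation would call an Otsu or SLIC library routine for the actual thresholds and superpixels:

```python
import numpy as np

def choose_segmentation(num_regions, otsu_limit=10):
    """Pick the depth-map segmentation method from the target region count A."""
    return "otsu_multithreshold" if num_regions < otsu_limit else "slic_superpixels"

def label_depth_intervals(depth_map, thresholds):
    """Label each pixel with its depth interval, given A-1 ascending thresholds.
    All connected regions sharing a label count as one scene region."""
    return np.digitize(depth_map, thresholds)

assert choose_segmentation(4) == "otsu_multithreshold"
assert choose_segmentation(12) == "slic_superpixels"
# Two thresholds (64, 160) split gray values into 3 depth intervals (A = 3).
labels = label_depth_intervals(np.array([[10, 120], [200, 30]]), [64, 160])
assert labels.tolist() == [[0, 1], [2, 0]]
```

Merging connected regions by shared interval label is what guarantees exactly A scene regions in the threshold branch.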
Through the above partitioning, the boundaries of the multiple depth-map regions are obtained. Figure 6B is a schematic view of the depth map after segmentation with the Otsu threshold algorithm, and Figure 6C after segmentation with the SLIC superpixel algorithm.
The 3D scene can then be divided into multiple scene regions by determining the scene-region boundaries from the depth-map region boundaries. If the depth map and the PMPI have the same resolution, the depth-map region boundaries can be used directly as the scene-region boundaries in the 3D scene; if the resolutions differ, the boundaries of the depth-map regions can be up-sampled or down-sampled for resolution adaptation before being used as the scene-region boundaries.
In this example, every sMPI of the PMPI has the same preset ending depth dmax, and the starting depth of each sMPI equals the minimum depth value, in the corresponding region of the depth map, of the scene region that sMPI represents. Every sMPI has the same number of layers, distributed by the same rule (equal spacing or equal disparity).
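Under these settings, the layer depths of one sMPI can be computed as follows. This is a sketch for the equal-spacing case; dmax and the layer count are the shared preset values, and equal-disparity spacing would replace the linear interpolation:

```python
import numpy as np

def smpi_layer_depths(region_depths, d_max, num_layers):
    """Layer depths of one sMPI: the start depth is the minimum depth of the
    depth-map region it represents, and the layers are spaced equally from
    there up to the shared ending depth d_max."""
    d_min = float(np.min(region_depths))
    return np.linspace(d_min, d_max, num_layers)

# A region whose minimum depth is 3.0, shared d_max 9.0, 4 layers per sMPI.
assert smpi_layer_depths([3.0, 5.0, 4.0], 9.0, 4).tolist() == [3.0, 5.0, 7.0, 9.0]
```

A background region with a large minimum depth thus gets its layers packed into a narrow band near the objects, which is the densification effect shown in Figure 4.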
Compared with an ordinary MPI, the shapes and starting depths of the PMPI generated by embodiments of the present disclosure are far more flexible and change adaptively with the depth of field of different regions of the scene. As a result, the sampling points of the PMPI concentrate on the visible surfaces of the scene, and sampling efficiency is improved. Moreover, with the same number of layers per sMPI as an ordinary MPI, the planar layers of the PMPI are more densely distributed, which in effect is equivalent to an ordinary MPI with more layers, but without increasing the number of sampling points. As shown in Figure 7, the denser depth layers let the final immersive video generated from the PMPI retain more detail with better quality.
After video compression, an MPI can be presented as immersive video. Figure 8 shows the corresponding video processing flow. At the encoding end, the 3D scene images captured by a video capture device (e.g., images shot by a 3D camera) are pre-processed into an MPI, which is compressed, encoded, and transmitted as a bitstream. At the decoding end, the bitstream is decoded and post-processed, then displayed and played as immersive video.
In MPEG-I, the Moving Picture Experts Group (MPEG) standard for immersive video, the packed compressed storage (PCS) data obtained by compressing the raw storage data of the MPI's texture and transparency frames, together with image data such as reference viewpoint camera parameters, can serve as input to MPEG's Test Model of Immersive Video (TMIV). Before an MPI (also called an MPI frame) is input to TMIV, it must be pre-processed and converted into PCS data.
Taking an MPI with image resolution W×H and S layers, i.e., an MPI of size W×H×S, as an example, it can be converted into PCS form; the PCS data records the following parameters for each pixel of the MPI:
N i,j : the number of effective layers of pixel (i,j);
C i,j,k : the color data, e.g., the color value, at the position of the k-th effective layer of pixel (i,j);
D i,j,k : the index of the k-th effective layer of pixel (i,j) (D i,j,k ∈ [1,S]);
T i,j,k : the transparency value at the position of the k-th effective layer of pixel (i,j).
For an ordinary MPI, pixel (i,j) is contained in all S layers of the MPI; the layers on which pixel (i,j)'s transparency is not 0 are called the effective layers of pixel (i,j).
Figure 9 shows an example of an MPI packed with the above parameters.
As can be seen, the raw storage data of an MPI of size W×H×S is not fully preserved in the PCS data. In practice, a pixel's values on some planar layers may be invalid (i.e., the pixel is fully transparent there, with no valid information). For pixel (i,j), only the information on its N i,j effective planar layers among the S planar layers needs to be kept. Note that the number of effective planar layers differs per pixel. Clearly, the compressed PCS data reduces the storage space occupied by the MPI. In addition, the PCS storage layout reduces the number of memory accesses in subsequent decoding: with the per-layer size W×H of the MPI known, an entire MPI frame can be read into memory with only two memory access operations.
For an ordinary MPI, the starting and ending depths of the multiple MPI frames within a set period are the same, and the distribution rule of the layers in the MPI is known. The depth of every layer of the MPI can be computed from the preset starting and ending depths. The PCS data of a single MPI therefore does not need to additionally record the depth information of a pixel's effective layers.
Like an ordinary MPI, a PMPI can be compressed and encoded as video frames; a PMPI can be generated directly from images of the 3D scene or on the basis of an ordinary MPI. Before a PMPI is encoded, its raw storage data must likewise be converted into PCS data.
As noted above, the PMPI is a modification of the ordinary MPI. A PMPI includes multiple sMPIs; among the layers of each sMPI, the depth of the layer closest to the reference viewpoint equals the sMPI's starting depth, the depth of the layer farthest from the reference viewpoint equals its ending depth, and the depths of the other layers lie between the two, with the layers distributed by a set rule such as equal spacing or equal disparity. Thus, once an sMPI's starting and ending depths are known, the depth of each of its layers can be computed. The ending depths of the sMPIs in different PMPIs can be set to be the same, but the starting depth of an sMPI in a PMPI depends on the depth of the scene region it represents and is not preset. Therefore, when the raw storage data of the PMPI of an embodiment of the present disclosure is converted into PCS data, additional starting-depth information must be provided so that the decoding end can accurately compute the depths of a pixel's effective layers.
An embodiment of the present disclosure provides a data processing method for multiplane images, as shown in Figure 10, including:
Step 410, obtaining the raw storage data of a patch multiplane image PMPI;
the PMPI includes multiple sub-multiplane images sMPI that respectively represent multiple scene regions into which a 3D scene is divided, each sMPI including multiple layers sampled at different depths of the scene region represented by that sMPI.
Step 420, converting the raw storage data of the PMPI into packed compressed storage PCS data, the PCS data being used to determine the depths of the effective layers of the pixels in the PMPI and the pixels' colors and transparencies on the effective layers.
In an exemplary embodiment of the present disclosure, the effective layers of a pixel are the layers of the sMPI containing that pixel in the PMPI on which the pixel's transparency is not 0. A pixel may have one or more effective layers, depending on the actual scene.
In an exemplary embodiment of the present disclosure, among the multiple sMPIs of the PMPI, the starting depths of different sMPIs differ or are not all the same.
An exemplary embodiment of the present disclosure proposes a first PCS data format suitable for PMPI. The PCS data includes the following parameters for each pixel in the PMPI:
the number of effective layers of the pixel;
the starting depth of the sMPI where the pixel is located;
the pixel's color data and transparency data on each effective layer, and the layer index of that effective layer within the pixel's sMPI.
In this example, the starting-depth data of the multiple sMPIs takes the form of the starting depth of the sMPI where each pixel in the PMPI is located. From the starting depth of a pixel's sMPI, combined with the set ending depth of the sMPI and the layer distribution rule, the depth of each layer of that sMPI can be computed; combined further with the layer index of each of the pixel's effective layers within its sMPI, the depth of each effective layer of the pixel is known. Once, for every pixel in the PMPI, the depth of each effective layer and the color and transparency on each effective layer have been obtained, the PMPI image can be fully restored.
In this embodiment, assume the image resolution of the PMPI is W×H, the number of sMPIs in the PMPI is M, and the number of layers of each sMPI is S. The PCS data format of the PMPI is then as shown in Figure 11, and the PCS data of the PMPI includes the following parameters:
N i,j : the number of effective layers of pixel (i,j), i ∈ [1,H], j ∈ [1,W];
E i,j : the starting depth of the sMPI where pixel (i,j) is located;
C i,j,k : the color data, e.g., the color value, at the position of the k-th effective layer of pixel (i,j), k ∈ [1,N i,j];
D i,j,k : the index of the k-th effective layer of pixel (i,j), D i,j,k ∈ [1,S];
T i,j,k : the transparency value at the position of the k-th effective layer of pixel (i,j).
Compared with the MPI PCS data format of Figure 9, the parameter E i,j, the starting depth of the sMPI where pixel (i,j) is located, is added and is used to compute the depth of each effective layer of the pixel.
An exemplary embodiment of the present disclosure proposes a second PCS data format suitable for PMPI. The PCS data includes:
the starting depth of each sMPI in the PMPI; and
the following parameters for each pixel in the PMPI:
the number of effective layers of the pixel;
the index of the sMPI where the pixel is located;
the pixel's color data and transparency data on each effective layer, and the layer index of that effective layer within the pixel's sMPI.
In this embodiment, assume the image resolution of the PMPI is W×H, the number of sMPIs in the PMPI is M, and the number of layers of each sMPI is S. The PCS data format of the PMPI is then as shown in Figure 12, and the PCS data of the PMPI includes the following parameters:
DP m : the starting depth of the m-th sMPI, m ∈ [1,M];
N i,j : the number of effective layers of pixel (i,j), i ∈ [1,H], j ∈ [1,W];
I i,j : the index of the sMPI where pixel (i,j) is located, I i,j ∈ [1,M];
C i,j,k : the color data, e.g., the color value, at the position of the k-th effective layer of pixel (i,j), k ∈ [1,N i,j];
D i,j,k : the index of the k-th effective layer of pixel (i,j), D i,j,k ∈ [1,S];
T i,j,k : the transparency value at the position of the k-th effective layer of pixel (i,j).
Whereas the embodiment of Figure 11 directly uses the parameter E i,j to indicate the starting depth of the sMPI where pixel (i,j) is located, this embodiment determines that starting depth jointly from the starting depth DP m of each sMPI and the index of the sMPI where pixel (i,j) is located. The other parameters are similar.
An exemplary embodiment of the present disclosure proposes a third PCS data format suitable for PMPI. The PCS data includes:
the depth of each layer of each sMPI in the PMPI (for each sMPI, the depth of the layer closest to the reference viewpoint is the starting depth of that sMPI); and
the following parameters for each pixel in the PMPI:
the number of effective layers of the pixel;
the pixel's color data and transparency data on each effective layer, and the layer index of that effective layer within the PMPI.
In this embodiment, assume the image resolution of the PMPI is W×H, the number of sMPIs in the PMPI is M, and the number of layers of each sMPI is S. The PCS data format of the PMPI is then as shown in Figure 13, and the PCS data of the PMPI includes the following parameters:
DP l : the depth of the l-th layer of the PMPI (the layers of all sMPIs numbered uniformly), l ∈ [1,M×S];
N i,j : the number of effective layers of pixel (i,j), i ∈ [1,H], j ∈ [1,W];
C i,j,k : the color data, e.g., the color value, at the position of the k-th effective layer of pixel (i,j), k ∈ [1,N i,j];
D' i,j,k : the index of the k-th effective layer of pixel (i,j) within the PMPI, D' i,j,k ∈ [1,M×S];
T i,j,k : the transparency value at the position of the k-th effective layer of pixel (i,j).
In this embodiment, all layers of the PMPI (each layer of each sMPI) are numbered uniformly and the depth of each layer is recorded; the index of the k-th effective layer of pixel (i,j) is expressed as that layer's index within the PMPI. The depth of each effective layer of a pixel is determined from the recorded per-layer depths of the PMPI and the layer's index within the PMPI.
Besides the parameter combinations provided by the above embodiments, the PCS parameters used to determine the depths of the effective layers of the pixels in the PMPI and the pixels' colors and transparencies on those layers may be combined in other ways; the present disclosure places no limitation on this.
By adding information about the sMPI starting depths to the PCS data, the above PCS data formats for PMPI enable the decoding end to compute the depths of a pixel's effective layers using those starting depths, and thus to accurately restore the PMPI image from the compressed PCS data.
The starting depths of the sMPIs in a PMPI change adaptively and must be recorded in the PMPI's raw data. In an exemplary embodiment of the present disclosure, the raw storage data of the PMPI includes the color-frame and transparency-frame data of every layer of every sMPI, together with data on the pixels contained in each sMPI and its starting depth. In one example, the per-sMPI pixel and starting-depth data is represented by a starting-depth map, which indicates the starting depth of the sMPI where each pixel in the PMPI is located; specifically, at the position of each pixel in the PMPI it may indicate the starting depth of that pixel's sMPI. The starting-depth map also indicates the pixels included in each sMPI, reflecting the sMPI's shape, size, and position.
In an exemplary embodiment of the present disclosure, the PMPI is generated by the generation method of any embodiment of the present disclosure; each pixel of the PMPI is contained in one sMPI, and every layer of that sMPI records the pixel's color value and transparency value. A pixel of an ordinary MPI is contained in all layers of that MPI, whereas a PMPI is partitioned into patches, so a pixel of the PMPI is contained in all layers of one sMPI; herein, the sMPI containing a pixel is called the sMPI where the pixel is located. The pixel's color and transparency values are recorded on all layers of the sMPI where it is located, but only some of those layers may be effective layers for the pixel.
Figure 14 is an architecture diagram of an MPI encoding device that can be used in embodiments of the present disclosure. The input data of the MPI encoding device 10 is the PCS data of the source MPI (e.g., a PMPI), which in this example includes image parameters (View parameters, such as reference viewpoint camera parameters), data of the texture attribute component (Texture Attribute component), and data of the transparency attribute component (Transparency Attribute component).
As shown in Figure 14, the MPI encoding device 10 includes:
an MPI mask generation (Create mask from MPI) unit 101, configured to generate MPI masks from the input data. In one example, the pixels (also called sampling points) in the MPI layers may be screened against a transparency threshold to obtain a mask for each layer. This distinguishes, on each layer, positions (also called pixels) with large transparency from positions with small transparency, masking out the highly transparent positions to reduce the amount of data. The MPI mask generation unit 101 performs this operation on all MPI frames within an intra-period. Assuming the MPI frame size is W×H×S and an intra-period contains M frames, processing by the MPI mask generation unit 101 yields M masks of size W×H×S;
an MPI mask aggregation (Aggregate MPI masks) unit 103, configured to take the union of the masks lying on the same layer among the M W×H×S masks, yielding one W×H×S mask;
an effective pixel clustering (Cluster Active pixels) unit 105, configured to cluster the regions of each layer's mask whose transparency exceeds the threshold (valid-information regions) into a series of clusters;
a cluster splitting (Split Clusters) unit 107, configured to split the clusters produced by the effective pixel clustering unit 105, obtaining the split clusters;
a patch packing (Pack patches) unit 109, configured to recombine the texture frame and transparency frame corresponding to each patch (e.g., a rectangular region containing a cluster) into one image, encoded as atlas data for transmission;
a video data generation (Generate video data) unit 111, configured to generate video data for transmission from the atlas data output by the patch packing unit 109, the video data including texture attribute video data (raw), transparency attribute video data (raw), etc.;
a parameter encoding unit 113, configured to encode image parameters from the source MPI data; the encoded image parameters may include a view parameters list (View parameters list), parameter sets (Parameter set), and the like.
When an MPI is encoded with the above encoding device architecture, the sampling points in the MPI are first screened against the transparency threshold to obtain a mask for each planar layer. Assuming the MPI size is W×H×S and the set intra-period contains M frames, this operation is applied to all MPI frames in the period, yielding M masks of size W×H×S. The masks on the same planar layer are then unioned into a single W×H×S mask. Next, the regions of each layer's mask whose transparency exceeds the threshold (valid-information regions) are clustered and split into a series of clusters. Patches are obtained from the clusters through steps such as fusion and decomposition. The texture and transparency frames corresponding to each small patch are then recombined into single images and encoded as atlas data for transmission.
本公开一实施例提供了一种多平面图像的编码方法,可用于PMPI的编码,如图15所示,所述编码方法包括:
步骤510,接收分块多平面图像PMIP的封装压缩存储PCS数据,所述PCS数据包括图像参数,及纹理属性部分和透明度属性部分的数据;
步骤520,对所述PCS数据进行编码,得到编码后的图像参数和阿特拉斯数据;
其中,所述PMPI包括多个子多平面图像sMPI以分别表征三维场景划分成的多个场景区域,每一sMPI包括在该sMPI所表征场景区域的不同深度采样得到的多个层。
在本公开一示例性的实施例中,所述多个sMPI中,不同sMPI的起始深度不同或不完全相同。
在本公开一示例性的实施例中,所述图像参数包括图像的深度信息,所述深度信息包括:
所述PMPI中每一像素所在sMPI的起始深度,及所述PMPI中每一像素的每一有效层在该像素所在sMPI中的层索引;或者
所述PMPI中每一sMPI的起始深度,所述PMPI中每一像素所在sMPI的索引,及所述PMPI中每一像素的每一有效层在该像素所在sMPI中的层索引;或者
所述PMPI中每一sMPI包括的每一层的深度,及所述PMPI中每一像素的每一有效层在所述PMPI中的层索引。
In an exemplary embodiment of the present disclosure, the PCS data may be converted from the raw storage data of the PMPI according to the data processing method described in any embodiment of the present disclosure.
In an exemplary embodiment of the present disclosure, the depth information further includes the end depth and the number of layers set for the sMPIs in the PMPI.
An embodiment of the present disclosure provides a method for decoding a multiplane image, as shown in Figure 16, including:
Step 610: receiving an encoded bitstream of a partitioned multiplane image (PMPI);
Step 620: obtaining, from the view parameters and atlas data in the encoded bitstream of the PMPI, the view parameters of the PMPI and the data of the texture attribute component and the transparency attribute component;
wherein the PMPI includes multiple sub-multiplane images (sMPIs) respectively representing multiple scene regions into which a three-dimensional scene is divided, and each sMPI includes multiple layers sampled at different depths of the scene region represented by that sMPI.
In an exemplary embodiment of the present disclosure, among the multiple sMPIs, the start depths of different sMPIs are different or not all the same.
In an exemplary embodiment of the present disclosure, the view parameters include depth information of the image, the depth information including:
the start depth of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
the start depth of each sMPI in the PMPI, the index of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
the depth of each layer included in each sMPI in the PMPI, and the layer index, within the PMPI, of each valid layer of each pixel in the PMPI.
In an exemplary embodiment of the present disclosure, the depth information further includes the end depth and the number of layers set for the sMPIs in the PMPI.
Figure 17 shows the architecture of an exemplary MPI rendering apparatus that can be used in embodiments of the present disclosure. The input data of the MPI rendering apparatus 20 includes decoded access unit data and viewport parameters; the decoded access unit data includes view parameters, parameter sets, and unique atlas data, and the unique atlas may include an atlas parameter list, a patch parameter list, a block-to-patch map, texture video data, transparency video data, and so on.
As shown in the figure, the MPI rendering apparatus 20 includes:
a layer depth values decoding unit 201, configured to parse the MPI encoded bitstream and extract the depth information of the layers in the MPI;
a view synthesis unit 203, configured to perform view synthesis according to the depth information of the layers in the MPI and the data of the decoded access unit;
an inpainting unit 205, configured to inpaint the synthesized image; and
a viewing space handling unit, configured to perform viewing space handling on the inpainted image according to the viewport parameters.
With an MPI rendering apparatus based on the above architecture, the MPI is restored during decoding while the accompanying image depth information is read. A new view is then synthesized from the plane depths and the texture and transparency maps of the MPI.
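The synthesis step can be illustrated by the classic back-to-front "over" compositing of MPI layers. This sketch deliberately omits the per-layer warp that a real renderer applies using each plane's depth and the target viewport, and the array shapes are assumptions for the example:

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Composite MPI layers back to front with the 'over' operator.

    colors: (S, H, W, 3) per-layer texture; alphas: (S, H, W) per-layer
    transparency; layer index 0 is assumed to be the farthest layer.
    """
    out = np.zeros(colors.shape[1:])
    for s in range(colors.shape[0]):             # walk from far to near
        a = alphas[s][..., None]
        out = colors[s] * a + out * (1.0 - a)
    return out

# A far red layer fully covered by a near opaque blue layer yields blue.
colors = np.zeros((2, 1, 1, 3))
colors[0, ..., 0] = 1.0                          # far layer: red
colors[1, ..., 2] = 1.0                          # near layer: blue
alphas = np.ones((2, 1, 1))
print(composite_mpi(colors, alphas)[0, 0])       # [0. 0. 1.]
```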
An embodiment of the present disclosure further provides a bitstream, where the bitstream is generated by encoding a partitioned multiplane image (PMPI) and includes the view parameters and atlas data of the PMPI; the PMPI includes multiple sub-multiplane images (sMPIs) respectively representing multiple scene regions into which a three-dimensional scene is divided, and each sMPI includes multiple layers sampled at different depths of the scene region represented by that sMPI.
In an exemplary embodiment of the present disclosure, the view parameters include depth information of the image, the depth information including:
the start depth of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
the start depth of each sMPI in the PMPI, the index of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
the depth of each layer included in each sMPI in the PMPI, and the layer index, within the PMPI, of each valid layer of each pixel in the PMPI.
In an exemplary embodiment of the present disclosure, among the multiple sMPIs, the start depths of different sMPIs are different or not all the same.
An embodiment of the present disclosure further provides an apparatus for generating a multiplane image; referring to Figure 18, the apparatus includes a processor 5 and a memory 6 storing a computer program executable on the processor 5, where the processor 5, when executing the computer program, implements the method for generating a multiplane image described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a multiplane image data processing apparatus; as shown in Figure 18, the apparatus includes a processor and a memory storing a computer program executable on the processor, where the processor, when executing the computer program, implements the multiplane image data processing method described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides an apparatus for encoding a multiplane image; referring to Figure 18, the apparatus includes a processor and a memory storing a computer program executable on the processor, where the processor, when executing the computer program, implements the method for encoding a multiplane image described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides an apparatus for decoding a multiplane image; referring to Figure 18, the apparatus includes a processor and a memory storing a computer program executable on the processor, where the processor, when executing the computer program, implements the method for decoding a multiplane image described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method described in any embodiment of the present disclosure.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as a data storage medium, or communication media including any medium that facilitates transfer of a computer program from one place to another, for example according to a communication protocol. In this manner, computer-readable media generally may correspond to non-transitory tangible computer-readable storage media or a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection may properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The technical solutions of the embodiments of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in the embodiments of this disclosure to emphasize functional aspects of devices configured to perform the described techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Claims (32)

  1. A method for generating a multiplane image, comprising:
    a scene division process, comprising: dividing a three-dimensional scene into multiple scene regions according to depth information of the three-dimensional scene; and
    a partitioned multiplane image (PMPI) generation process, comprising: generating multiple sub-multiplane images (sMPIs) respectively representing the multiple scene regions, each sMPI comprising multiple layers sampled at different depths of the scene region represented by that sMPI.
  2. The generation method according to claim 1, wherein:
    the end depths of the multiple sMPIs are the same, and the start depth of each sMPI is determined according to the minimum depth of the scene region represented by that sMPI.
  3. The generation method according to claim 1, wherein:
    among the multiple sMPIs, the start depths of different sMPIs are different or not all the same.
  4. The generation method according to claim 1, wherein:
    the multiple sMPIs correspond one-to-one to the multiple scene regions, the multiple layers included in each sMPI are planar layers or spherical layers, and each layer includes a texture map part and a transparency map part.
  5. The generation method according to claim 1, wherein:
    the multiple sMPIs have the same number of layers, and/or the multiple sMPIs have the same layer distribution rule.
  6. The generation method according to claim 1, wherein:
    dividing the three-dimensional scene into multiple scene regions according to the depth information of the three-dimensional scene comprises:
    segmenting a depth map of the three-dimensional scene into multiple depth map regions using an image segmentation algorithm; and
    determining boundaries of scene regions in the three-dimensional scene according to boundaries of the multiple depth map regions, and dividing the three-dimensional scene into multiple scene regions according to the boundaries of the scene regions.
  7. The generation method according to claim 6, wherein:
    segmenting the depth map of the three-dimensional scene into multiple depth map regions using an image segmentation algorithm comprises:
    segmenting the depth map of the three-dimensional scene into multiple depth map regions using a threshold segmentation algorithm or a superpixel segmentation algorithm.
  8. The generation method according to claim 6, wherein:
    segmenting the depth map of the three-dimensional scene into multiple depth map regions using an image segmentation algorithm comprises:
    when N is less than a set threshold, segmenting the depth map of the three-dimensional scene into N depth map regions using a threshold segmentation algorithm; and
    when N is greater than or equal to the set threshold, segmenting the depth map of the three-dimensional scene into N depth map regions using a superpixel segmentation algorithm;
    wherein N is the number of scene regions into which the three-dimensional scene is to be divided.
  9. A multiplane image data processing method, comprising:
    obtaining raw storage data of a partitioned multiplane image (PMPI), the PMPI comprising multiple sub-multiplane images (sMPIs) respectively representing multiple scene regions into which a three-dimensional scene is divided, each sMPI comprising multiple layers sampled at different depths of the scene region represented by that sMPI; and
    converting the raw storage data of the PMPI into packed compressed storage (PCS) data, the PCS data being used to determine the depths of the valid layers of the pixels in the PMPI and the colors and transparencies of the pixels on the valid layers.
  10. The data processing method according to claim 9, wherein:
    a valid layer of a pixel refers to a layer, in the sMPI of the PMPI containing the pixel, on which the transparency of the pixel is not 0.
  11. The data processing method according to claim 9, wherein:
    among the multiple sMPIs comprised in the PMPI, the start depths of different sMPIs are different or not all the same.
  12. The data processing method according to claim 9, wherein:
    the PCS data comprises the following parameters for each pixel in the PMPI:
    the start depth of the sMPI where the pixel is located; and
    the color data and transparency data of the pixel on each valid layer, and the layer index of that valid layer within the sMPI where the pixel is located.
  13. The data processing method according to claim 9, wherein:
    the PCS data comprises:
    the start depth of each sMPI in the PMPI; and
    the following parameters for each pixel in the PMPI:
    the index of the sMPI where the pixel is located; and
    the color data and transparency data of the pixel on each valid layer, and the layer index of that valid layer within the sMPI where the pixel is located.
  14. The data processing method according to claim 9, wherein:
    the PCS data comprises:
    the depth of each layer included in each sMPI in the PMPI; and
    the following parameters for each pixel in the PMPI:
    the color data and transparency data of the pixel on each valid layer, and the layer index of that valid layer within the PMPI.
  15. The data processing method according to claim 12, 13, or 14, wherein:
    the PCS data further comprises: the number of valid layers of each pixel in the PMPI.
  16. The data processing method according to claim 9, wherein:
    the raw storage data of the PMPI comprises data of the color map and transparency map of each layer in each sMPI, and data of the pixels contained in, and the start depth of, each sMPI.
  17. The data processing method according to claim 16, wherein:
    the data of the pixels contained in, and the start depth of, each sMPI is represented using a start depth map, the start depth map indicating the start depth of the sMPI where each pixel in the PMPI is located.
  18. The data processing method according to claim 9, wherein:
    the PMPI is generated by the generation method according to any one of claims 1 to 8; each pixel in the PMPI is contained in one sMPI, and each of the multiple layers included in that sMPI records the color value and transparency value of the pixel.
  19. A method for encoding a multiplane image, comprising:
    receiving packed compressed storage (PCS) data of a partitioned multiplane image (PMPI), the PCS data comprising view parameters and data of a texture attribute component and a transparency attribute component; and
    encoding the PCS data to obtain encoded view parameters and atlas data;
    wherein the PMPI comprises multiple sub-multiplane images (sMPIs) respectively representing multiple scene regions into which a three-dimensional scene is divided, and each sMPI comprises multiple layers sampled at different depths of the scene region represented by that sMPI.
  20. The encoding method according to claim 19, wherein:
    among the multiple sMPIs, the start depths of different sMPIs are different or not all the same.
  21. The encoding method according to claim 19, wherein:
    the view parameters comprise depth information of the image, the depth information comprising:
    the start depth of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
    the start depth of each sMPI in the PMPI, the index of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
    the depth of each layer included in each sMPI in the PMPI, and the layer index, within the PMPI, of each valid layer of each pixel in the PMPI.
  22. The encoding method according to claim 19, wherein:
    the PCS data is converted from raw storage data of the PMPI according to the data processing method of any one of claims 9 to 18.
  23. A method for decoding a multiplane image, comprising:
    obtaining, from view parameters and atlas data in an encoded bitstream of a partitioned multiplane image (PMPI), the view parameters of the PMPI and data of a texture attribute component and a transparency attribute component;
    wherein the PMPI comprises multiple sub-multiplane images (sMPIs) respectively representing multiple scene regions into which a three-dimensional scene is divided, and each sMPI comprises multiple layers sampled at different depths of the scene region represented by that sMPI.
  24. The decoding method according to claim 23, wherein:
    among the multiple sMPIs comprised in the PMPI, the start depths of different sMPIs are different or not all the same.
  25. The decoding method according to claim 23, wherein:
    the view parameters comprise depth information of the image, the depth information comprising:
    the start depth of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
    the start depth of each sMPI in the PMPI, the index of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
    the depth of each layer included in each sMPI in the PMPI, and the layer index, within the PMPI, of each valid layer of each pixel in the PMPI.
  26. A bitstream, wherein the bitstream is generated by encoding a partitioned multiplane image (PMPI) and comprises view parameters and atlas data of the PMPI; wherein the PMPI comprises multiple sub-multiplane images (sMPIs) respectively representing multiple scene regions into which a three-dimensional scene is divided, and each sMPI comprises multiple layers sampled at different depths of the scene region represented by that sMPI.
  27. The bitstream according to claim 26, wherein:
    the view parameters comprise depth information of the image, the depth information comprising:
    the start depth of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
    the start depth of each sMPI in the PMPI, the index of the sMPI where each pixel in the PMPI is located, and the layer index, within the sMPI where the pixel is located, of each valid layer of each pixel in the PMPI; or
    the depth of each layer included in each sMPI in the PMPI, and the layer index, within the PMPI, of each valid layer of each pixel in the PMPI.
  28. An apparatus for generating a multiplane image, comprising a processor and a memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the method for generating a multiplane image according to any one of claims 1 to 8.
  29. A multiplane image data processing apparatus, comprising a processor and a memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the multiplane image data processing method according to any one of claims 9 to 18.
  30. An apparatus for encoding a multiplane image, comprising a processor and a memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the method for encoding a multiplane image according to any one of claims 19 to 22.
  31. An apparatus for decoding a multiplane image, comprising a processor and a memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the method for decoding a multiplane image according to any one of claims 23 to 25.
  32. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 25.
PCT/CN2021/103233 2021-06-29 2021-06-29 多平面图像的生成、数据处理、编码和解码方法、装置 WO2023272510A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/103233 WO2023272510A1 (zh) 2021-06-29 2021-06-29 多平面图像的生成、数据处理、编码和解码方法、装置
CN202180099763.4A CN117561715A (zh) 2021-06-29 2021-06-29 多平面图像的生成、数据处理、编码和解码方法、装置


Publications (1)

Publication Number Publication Date
WO2023272510A1 true WO2023272510A1 (zh) 2023-01-05

Family

ID=84689833


Country Status (2)

Country Link
CN (1) CN117561715A (zh)
WO (1) WO2023272510A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095294A (zh) * 2023-04-10 2023-05-09 深圳臻像科技有限公司 根据深度值渲染分辨率的三维光场图像编码方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180033157A1 (en) * 2015-02-25 2018-02-01 Bae Systems Plc Image processing method and apparatus for determining depth within an image
US20200228774A1 (en) * 2019-01-14 2020-07-16 Fyusion, Inc. Free-viewpoint photorealistic view synthesis from casually captured video
CN112055213A (zh) * 2020-01-07 2020-12-08 谷歌有限责任公司 用于生成压缩图像的方法、系统和介质
CN112233165A (zh) * 2020-10-15 2021-01-15 大连理工大学 一种基于多平面图像学习视角合成的基线扩展实现方法
WO2021048276A1 (en) * 2019-09-13 2021-03-18 Interdigital Vc Holdings France, Sas Multiview multiscale methods and apparatus for view synthesis



Also Published As

Publication number Publication date
CN117561715A (zh) 2024-02-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21947467

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180099763.4

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE