WO2023240455A1 - Point cloud encoding method, encoding apparatus, encoding device and storage medium - Google Patents

Point cloud encoding method, encoding apparatus, encoding device and storage medium

Info

Publication number
WO2023240455A1
Authority
WO
WIPO (PCT)
Prior art keywords
prediction mode
attribute information
current point
encoded
encoding
Prior art date
Application number
PCT/CN2022/098709
Other languages
English (en)
French (fr)
Inventor
元辉
王晓辉
郭甜
王婷婷
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority to PCT/CN2022/098709 priority Critical patent/WO2023240455A1/zh
Priority to TW112121142A priority patent/TW202404363A/zh
Publication of WO2023240455A1 publication Critical patent/WO2023240455A1/zh

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria

Definitions

  • the embodiments of the present application relate to the field of coding and decoding technology, and more specifically, to point cloud coding methods, coding devices, coding equipment, and storage media.
  • Point cloud has begun to spread into various fields, such as virtual/augmented reality, robotics, geographic information systems, medical fields, etc.
  • With modern acquisition equipment, large numbers of points on object surfaces can be captured accurately, often corresponding to hundreds of thousands of points in a single scene.
  • Such large numbers of points pose challenges for computer storage and transmission. Therefore, point cloud compression has become a hot research topic.
  • the encoder needs to compress its geometric information and attribute information.
  • How to determine an appropriate predictive coding mode according to the characteristics of different point clouds, so as to balance coding efficiency and coding quality, is a problem that needs to be solved.
  • In a first aspect, an embodiment of this application provides a point cloud encoding method, including:
  • determining the encoding order of the attribute information of the current point cloud based on the geometric information of the current point cloud;
  • determining a preferred prediction mode for the attribute information of the current point to be encoded based on the N neighbor points of the current point in the encoding order, where N is an integer greater than or equal to 1;
  • predicting the attribute information of the current point to be encoded according to the preferred prediction mode, and obtaining the predicted value of the attribute information of the current point to be encoded;
  • determining the residual of the attribute information of the current point to be encoded based on the predicted value;
  • encoding according to the preferred prediction mode and the residual to obtain coded bits, and writing the coded bits into a code stream.
  • this application provides a point cloud encoding device, including:
  • a determination unit configured to determine the encoding sequence of the attribute information of the current point cloud based on the geometric information of the current point cloud;
  • a determining unit further configured to determine a preferred prediction mode for the attribute information of the current point to be encoded based on the N neighbor points of the current point in the encoding order, where N is an integer greater than or equal to 1;
  • a prediction unit configured to predict the attribute information of the current point to be encoded according to the preferred prediction mode, and obtain the predicted value of the attribute information of the current point to be encoded
  • a determining unit further configured to determine the residual of the attribute information of the current point to be encoded based on the predicted value of the attribute information of the current point to be encoded;
  • a coding unit configured to perform coding according to the preferred prediction mode and the residual, obtain coded bits, and write the coded bits into a code stream.
  • this application provides an encoding device, including:
  • a processor adapted to execute a computer program; and
  • a computer-readable storage medium storing a computer program, where the computer program is adapted to be loaded by the processor to execute the encoding method in the above-mentioned first aspect or its respective implementations.
  • the processor is one or more.
  • the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • When the computer program is read and executed by a processor of a computer device, it causes the computer device to execute the encoding method in the above-mentioned first aspect or its respective implementations.
  • embodiments of the present application provide a code stream, which is a code stream generated by the method described in the first aspect or its respective implementations.
  • When this application performs predictive coding on the attribute information of the current point to be encoded in the point cloud, it considers the N neighbor points of the current point in the encoding order; when the difference in the attribute information of the N neighbor points is greater than a first threshold, a rate-distortion optimization (RDO) mechanism is used to determine the preferred prediction mode for the attribute information of the current point to be encoded.
  • The solution of this application is thus conducive to adaptively selecting an appropriate prediction mode for the attribute prediction of each point in the current point cloud, avoiding low coding efficiency or low reconstruction quality at certain points, achieving a balance between coding efficiency and reconstruction quality, and thereby improving the coding performance of the point cloud encoder.
  • Figure 1 is an example of a point cloud image provided by the embodiment of this application.
  • Figure 2 is a partial enlarged view of the point cloud image shown in Figure 1;
  • Figure 3 is an example of a point cloud image with six viewing angles provided by an embodiment of the present application.
  • Figure 4 is a schematic block diagram of the coding framework provided by the embodiment of the present application.
  • Figure 5 is an example of a bounding box provided by an embodiment of the present application.
  • Figure 6 is an example of octree division of bounding boxes provided by the embodiment of the present application.
  • Figure 7 is a schematic flow chart of the encoding method provided by the embodiment of the present application.
  • Figures 8 to 10 show the arrangement sequence of Morton codes in two-dimensional space
  • Figure 11 shows the arrangement order of Morton codes in three-dimensional space
  • Figure 12 is a schematic block diagram of the LOD layer provided by the embodiment of the present application.
  • Figure 13 is a schematic block diagram of the decoding framework provided by the embodiment of the present application.
  • Figure 14 is a schematic flow chart of the decoding method provided by the embodiment of the present application.
  • Figure 15 is a schematic block diagram of a point cloud encoding device provided by an embodiment of the present application.
  • Figure 16 is a schematic block diagram of a coding and decoding device provided by an embodiment of the present application.
  • Point Cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene.
  • Figures 1 and 2 show a three-dimensional point cloud image and a partial enlargement of it, respectively; it can be seen that the point cloud surface consists of densely distributed points.
  • Each pixel of a two-dimensional image has a definite position, so no additional position information needs to be recorded; in contrast, the points of a point cloud are distributed randomly and irregularly in three-dimensional space, so the position of each point in space must be recorded to fully express the point cloud. Similar to a two-dimensional image, each point in the point cloud has corresponding attribute information, such as an RGB color value reflecting the color of the object; besides color, the attribute information of a point may also be a reflectance value, which reflects the surface material of the object. Each point in the point cloud may include geometric information and attribute information, where the geometric information of a point refers to its Cartesian three-dimensional coordinates.
  • The attribute information of each point in the point cloud may include, but is not limited to, at least one of the following: color information, material information, and laser reflection intensity information, where the material information or laser reflection intensity can be reflected by indicators such as reflectance values.
  • Color information can be information in any color space.
  • the color information may be Red Green Blue (RGB) information.
  • The color information may also be luminance-chrominance (YCbCr, YUV) information, where Y represents luminance (Luma), Cb (U) represents the blue chrominance component, and Cr (V) represents the red chrominance component.
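As an illustration of such a color-space conversion, the sketch below uses the ITU-R BT.601 full-range matrix; this is one common choice, and the matrix an actual encoder uses is configuration-dependent:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to YCbCr (full-range BT.601 matrix).

    Y is the luma component; Cb and Cr are the blue and red chroma
    components, offset by 128 so they fit in the 0-255 range."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return round(y), round(cb), round(cr)
```

For example, pure white maps to maximum luma with neutral chroma: `rgb_to_ycbcr(255, 255, 255)` gives `(255, 128, 128)`.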
  • Each point in the point cloud has the same amount of attribute information.
  • each point in the point cloud has two attribute information: color information and laser reflection intensity.
  • each point in the point cloud has three attribute information: color information, material information and laser reflection intensity information.
  • a point cloud image can have multiple viewing angles.
  • the point cloud image as shown in Figure 3 can have six viewing angles.
  • The data storage format corresponding to a point cloud image consists of a file header part and a data part.
  • The header information includes the data format, the data representation type, the total number of points in the point cloud, and the content represented by the point cloud.
  • Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, and because they are obtained by directly sampling real objects, they provide a strong sense of realism while ensuring accuracy. They are therefore widely used, in areas including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, and three-dimensional reconstruction of biological tissues and organs.
  • point clouds can be divided into two categories based on application scenarios, namely, machine-perceived point clouds and human-eye-perceived point clouds.
  • the application scenarios of machine-perceived point cloud include but are not limited to: autonomous navigation system, real-time inspection system, geographical information system, visual sorting robot, rescue and disaster relief robot and other point cloud application scenarios.
  • the application scenarios of point clouds perceived by the human eye include but are not limited to: digital cultural heritage, free viewpoint broadcasting, three-dimensional immersive communication, three-dimensional immersive interaction and other point cloud application scenarios.
  • Point clouds can be divided into dense and sparse point clouds based on the acquisition method; they can also be divided into static and dynamic point clouds based on whether the point cloud changes over time.
  • More specifically, point clouds can be divided into three types: the first type of static point cloud, the second type of dynamic point cloud, and the third type of dynamically acquired point cloud.
  • In the first type (static point cloud), the object is stationary and the device acquiring the point cloud is also stationary;
  • in the second type (dynamic point cloud), the object is moving but the device acquiring the point cloud is stationary;
  • in the third type (dynamically acquired point cloud), the device acquiring the point cloud is in motion.
  • point cloud collection methods include but are not limited to: computer generation, 3D laser scanning, 3D photogrammetry, etc.
  • Computers can generate point clouds of virtual three-dimensional objects and scenes;
  • 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, and can obtain millions of point clouds per second;
  • 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes, capturing tens of millions of points per second.
  • point clouds on the surface of objects can be collected through collection equipment such as photoelectric radar, lidar, laser scanners, and multi-view cameras.
  • the point cloud obtained according to the principle of laser measurement can include the three-dimensional coordinate information of the point and the laser reflection intensity (reflectance) of the point.
  • the point cloud obtained according to the principle of photogrammetry can include the three-dimensional coordinate information of the point and the color information of the point.
  • the point cloud is obtained by combining the principles of laser measurement and photogrammetry, which may include the three-dimensional coordinate information of the point, the laser reflection intensity (reflectance) of the point, and the color information of the point.
  • These technologies reduce the cost and time period of point cloud data acquisition and improve the accuracy of the data.
  • point clouds of biological tissues and organs can be obtained using magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information.
  • Changes in the way point cloud data is obtained have made it possible to acquire large amounts of point cloud data. With the growth of application requirements, the processing of massive 3D point cloud data has encountered bottlenecks of limited storage space and transmission bandwidth.
  • each point in the point cloud of each frame has coordinate information xyz (float) and color information RGB.
  • Point cloud compression generally uses point cloud geometric information and attribute information to be compressed separately.
  • During encoding, the point cloud geometric information is first encoded in the geometry encoder, and the reconstructed geometric information is then input into the attribute encoder as additional information to assist the compression of the point cloud attribute information;
  • the point cloud geometric information is first decoded in the geometry decoder, and then the decoded geometric information is input into the attribute decoder as additional information to assist in decompressing the point cloud attribute information.
  • the entire codec consists of pre-processing/post-processing, geometry encoding/decoding, and attribute encoding/decoding.
  • the point cloud can be encoded and decoded through various types of encoding frameworks and decoding frameworks, respectively.
  • The codec framework may be the Geometry-based Point Cloud Compression (G-PCC) framework or the Video-based Point Cloud Compression (V-PCC) framework provided by the Moving Picture Experts Group (MPEG), or it may be the AVS-PCC framework or the Point Cloud Compression Reference Platform (PCRM) framework provided by the Audio Video Coding Standard (AVS) working group.
  • the G-PCC encoding and decoding framework can be used to compress the first type of static point cloud and the third type of dynamic point cloud, and the V-PCC encoding and decoding framework can be used to compress the second type of dynamic point cloud.
  • the G-PCC encoding and decoding framework is also called point cloud codec TMC13, and the V-PCC encoding and decoding framework is also called point cloud codec TMC2.
  • G-PCC and AVS-PCC both target static sparse point clouds, and their coding frameworks are roughly the same.
  • the following uses the G-PCC framework as an example to describe the encoding and decoding framework applicable to the embodiments of the present application.
  • Figure 4 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • the coding framework 100 can obtain geometric information and attribute information of the point cloud from the collection device.
  • the encoding of point cloud includes geometric information encoding and attribute information encoding.
  • the process of encoding geometric information includes: performing preprocessing on the original point cloud such as coordinate transformation and quantization to remove duplicate points; constructing an octree and then encoding to form a geometric code stream.
  • the geometric information encoding process of the encoder can be implemented through the following units:
  • a coordinate transformation (Transform coordinates) unit 101;
  • a quantization and duplicate point removal (Quantize and remove points) unit 102;
  • an octree analysis (Analyze octree) unit 103;
  • a surface fitting (Analyze surface approximation) unit; and
  • a first arithmetic coding (Arithmetic encode) unit 105.
  • The coordinate transformation unit 101 may be used to transform the world coordinates of points in the point cloud into relative coordinates. For example, the minimum value on each of the x, y, and z axes is subtracted from the geometric coordinates of each point, which is equivalent to a DC-removal operation, transforming the coordinates of points in the point cloud from world coordinates to relative coordinates.
  • The quantization and duplicate point removal unit 102 can reduce the number of distinct coordinates through quantization; after quantization, originally different points may be assigned the same coordinates, so duplicate points can be deleted through a deduplication operation. For example, multiple points with the same quantized position but different attribute information can be merged into a single point through attribute transformation.
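The quantization and duplicate-merging step can be sketched as below. This is a toy illustration with a hypothetical `quantize_and_merge` helper and scalar attributes; the actual unit operates on the codec's internal representations:

```python
from collections import defaultdict

def quantize_and_merge(points, step):
    """points: iterable of ((x, y, z), attr) with float coordinates and a
    scalar attribute. Quantize each coordinate by `step`, then merge the
    points that land on the same quantized position, averaging their
    attribute values (toy stand-in for the attribute transformation)."""
    buckets = defaultdict(list)
    for (x, y, z), attr in points:
        q = (int(x / step), int(y / step), int(z / step))  # truncating quantizer
        buckets[q].append(attr)
    return [(q, sum(attrs) / len(attrs)) for q, attrs in sorted(buckets.items())]
```

With `step = 1.0`, two points falling into the same unit cell are merged into one point carrying their average attribute.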
  • the quantization and duplicate point removal unit 102 is an optional unit module.
  • the octree analysis unit 103 may use an octree encoding method to encode the position information of the quantization point.
  • The point cloud is regularized in the form of an octree, so that the positions of points correspond one-to-one to positions in the octree.
  • The first arithmetic coding unit 105 can use entropy coding to arithmetically encode the geometric information output by the octree analysis unit 103, that is, generate a geometric code stream from that geometric information using arithmetic coding; the geometric code stream may also be called a geometry bitstream.
  • a recursive octree structure is used to regularly express the points in the point cloud as the center of a cube.
  • the entire point cloud can be placed in a cube bounding box.
  • x_min = min(x_0, x_1, ..., x_{K-1});
  • y_min = min(y_0, y_1, ..., y_{K-1});
  • z_min = min(z_0, z_1, ..., z_{K-1});
  • x_max = max(x_0, x_1, ..., x_{K-1});
  • y_max = max(y_0, y_1, ..., y_{K-1});
  • z_max = max(z_0, z_1, ..., z_{K-1}).
  • The origin of the bounding box (x_origin, y_origin, z_origin) can be calculated as x_origin = int(floor(x_min)), y_origin = int(floor(y_min)), z_origin = int(floor(z_min)), where floor() denotes the rounding-down operation and int() denotes the integer-conversion operation.
  • The encoder can then calculate the dimensions of the bounding box along the x-axis, y-axis, and z-axis from the boundary values and the origin, for example as BoundingBoxSize_x = int(x_max - x_origin) + 1, and similarly for the y and z axes.
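A minimal sketch of the bounding-box computation, assuming origin = int(floor(min)) per axis and size = int(max - origin) + 1 (the exact size formula may differ between codecs):

```python
import math

def bounding_box(points):
    """points: list of (x, y, z) floats. Returns (origin, size) of the
    axis-aligned bounding box: origin is the per-axis int(floor(min)),
    and size is assumed here as int(max - origin) + 1 per axis."""
    xs, ys, zs = zip(*points)
    origin = tuple(int(math.floor(min(axis))) for axis in (xs, ys, zs))
    size = tuple(int(max(axis) - o) + 1 for axis, o in zip((xs, ys, zs), origin))
    return origin, size
```

For example, points spanning (0.5, 0.2, 0.0) to (3.7, 2.1, 1.0) yield origin (0, 0, 0) and size (4, 3, 2).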
  • After obtaining the dimensions of the bounding box along the x, y, and z axes, the encoder first performs an octree division of the bounding box, obtaining eight sub-blocks, and then divides the non-empty sub-blocks (blocks containing points) into octants again, recursively, until a certain depth is reached.
  • The non-empty sub-blocks of the final size are called voxels.
  • Each voxel contains one or more points; the geometric positions of these points are normalized to the center point of the voxel, and the attribute value of the center point is taken as the average of the attribute values of all points in the voxel.
  • Each voxel can then be encoded according to the determined encoding order, which encodes the point (or "node") represented by that voxel.
  • The attribute encoding process includes: given the reconstructed geometric information of the input point cloud and the true values of the attribute information, selecting a prediction transform mode to perform prediction transformation on the point cloud attribute information, quantizing the results after the prediction transformation, and performing arithmetic coding to form an attribute code stream, that is, an attribute information bitstream.
  • the attribute encoding process of the encoder can be implemented through the following units:
  • a color conversion (Transform colors) unit 110;
  • an attribute transfer (Transfer attributes) unit 111;
  • a Region Adaptive Hierarchical Transform (RAHT) unit 112;
  • a level-of-detail generation (Generate LOD) unit 113;
  • a lifting transform (Lifting transform) unit 114;
  • a quantization (Quantize) unit 115; and
  • a second arithmetic coding unit 116.
  • the color conversion unit 110 may be used to convert the RGB color space of points in the point cloud into YCbCr format or other formats.
  • The attribute transfer unit 111 can be used to transfer attribute information onto the point cloud geometry based on the reconstructed geometric information; it can also transform the attribute information of points in the point cloud so as to minimize attribute distortion. For example, in the case of geometrically lossy encoding, since the geometric information changes after geometry encoding, the attribute transfer unit 111 needs to reassign an attribute value to each point after geometry encoding so that the attribute error between the reconstructed point cloud and the original point cloud is minimized.
  • the reconstructed geometry information output by the reconstruction geometry unit 106 may be input to the region adaptive hierarchical transformation RAHT unit 112 or the level of detail generation LOD unit 113, and used as auxiliary information for the predictive transformation process of the attribute information output by the attribute transfer unit 111.
  • The point cloud attribute information can be subjected to predictive coding processing through the Region Adaptive Hierarchical Transform (RAHT) unit 112.
  • The quantization unit 115 is used to quantize the residual of the attribute information output by the RAHT unit based on a certain quantization step size, and the second arithmetic coding unit 116 then applies arithmetic coding to the quantized residual to finally obtain the attribute information bitstream.
  • Alternatively, the point cloud attribute information can undergo prediction transform processing through the level-of-detail generation unit 113 and the lifting transform unit 114, after which the quantization unit 115 quantizes the residual based on a certain quantization step size and the second arithmetic coding unit 116 arithmetically encodes the quantized residual to finally obtain the attribute information bitstream.
  • The second arithmetic coding unit 116 may use zero run-length coding to entropy-encode the residuals of the attribute information of points to obtain the attribute code stream, which may be bitstream information.
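A simplified sketch of the zero run-length front end: runs of zero residuals are replaced by their length before entropy coding (the actual bitstream syntax of the codec is more involved than this):

```python
def zero_run_length_encode(residuals):
    """Encode a residual sequence as (zero_run, value) pairs: each nonzero
    residual is preceded by the count of zeros before it, and a trailing
    run of zeros is emitted with value None."""
    pairs, run = [], 0
    for r in residuals:
        if r == 0:
            run += 1
        else:
            pairs.append((run, r))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs
```

For example, the residuals [0, 0, 5, 0, -1, 0, 0, 0] encode to [(2, 5), (1, -1), (3, None)].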
  • Figure 7 is a schematic flowchart of a method for encoding point cloud attribute information provided by an embodiment of the present application.
  • the method 200 may be performed by an encoder or a coding framework, such as the coding framework shown in FIG. 4 .
  • The encoding method 200 may include:
  • S210: Determine the encoding order of the attribute information of the current point cloud based on the geometric information of the current point cloud;
  • S220: Determine the preferred prediction mode of the attribute information of the current point to be encoded based on the N neighbor points of the current point in the encoding order, where N is an integer greater than or equal to 1;
  • S230: Predict the attribute information of the current point to be encoded according to the preferred prediction mode, and obtain the predicted value of the attribute information of the current point to be encoded;
  • S240: Determine the residual of the attribute information of the current point to be encoded based on the predicted value of the attribute information of the current point to be encoded;
  • S250: Encode according to the preferred prediction mode and the residual, obtain coded bits, and write the coded bits into the code stream.
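The per-point steps above can be sketched end to end as below, with `choose_mode`, `predict` and `entropy_encode` as hypothetical placeholder callables standing in for the prediction-mode decision, the per-mode predictor and the arithmetic coder:

```python
def encode_attributes(points, choose_mode, predict, entropy_encode):
    """Sketch of steps S220-S250 for points already sorted into the
    attribute encoding order (S210). `points` is a list of
    (geometry, attribute) pairs; the three callables are placeholders
    for the mode decision, predictor and entropy coder."""
    bits = []
    for i, (geom, attr) in enumerate(points):
        neighbors = points[max(0, i - 3):i]          # up to N=3 previous points
        mode = choose_mode(neighbors)                # S220: preferred prediction mode
        predicted = predict(mode, neighbors)         # S230: predicted attribute value
        residual = attr - predicted                  # S240: residual
        bits.append(entropy_encode(mode, residual))  # S250: coded bits
    return bits
```

With trivial stubs (always mode 0, predict from the previous point, emit (mode, residual) pairs), the loop shows how each point's residual depends on its already-coded neighbors.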
  • the S210 may include:
  • The LOD generation unit 113 can perform Morton reordering or Hilbert reordering on the current point cloud to finally obtain the encoding order of the current point cloud attribute information. After the encoder obtains the encoding order, it can divide the points in the point cloud into layers according to the determined order to obtain the LOD of the current point cloud, and then predict the attribute information of points in the point cloud based on the LOD.
  • Figures 8 to 10 show the arrangement sequence of Morton codes in two-dimensional space.
  • The encoder can adopt the "z"-shaped Morton arrangement order in the two-dimensional space formed by a 2*2 block.
  • The encoder can adopt the "z"-shaped Morton arrangement order among four 2*2 blocks and, combined with the "z"-shaped order inside each 2*2 block, finally obtain the Morton arrangement order used by the encoder in the two-dimensional space formed by a 4*4 block.
  • The encoder can adopt the "z"-shaped Morton arrangement order among four 4*4 blocks, among each group of four 2*2 blocks, and inside each 2*2 block, finally obtaining the Morton arrangement order used by the encoder in the two-dimensional space formed by an 8*8 block.
  • Figure 11 shows the arrangement order of Morton codes in three-dimensional space.
  • Morton's arrangement order is not only applicable to two-dimensional space, but can also be extended to three-dimensional space.
  • Figure 11 shows 16 points; inside each "z" shape and between successive "z" shapes, the Morton arrangement order encodes along the x-axis first, then along the y-axis, and finally along the z-axis.
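The x-then-y-then-z traversal corresponds to interleaving the coordinate bits, with x occupying the lowest bit position (bit order is a convention; some implementations differ):

```python
def morton_code(x, y, z, bits=10):
    """Interleave the bits of non-negative integers (x, y, z) into a
    single Morton code, with x in the lowest bit position so the order
    traverses x first, then y, then z."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # x bits -> positions 0, 3, 6, ...
        code |= ((y >> i) & 1) << (3 * i + 1)    # y bits -> positions 1, 4, 7, ...
        code |= ((z >> i) & 1) << (3 * i + 2)    # z bits -> positions 2, 5, 8, ...
    return code
```

Sorting points by the Morton code of their quantized coordinates then yields the "z"-shaped traversal described above.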
  • the LOD generation process includes: obtaining the Euclidean distance between points based on the geometric information of the points in the point cloud; dividing the points into different LOD layers based on the Euclidean distance.
  • Different ranges of Euclidean distance can be assigned to different LOD layers. For example, a point can be picked at random to form the first LOD layer; then the Euclidean distances between the remaining points and this point are computed, and points whose distance meets a first distance threshold are classified into the second LOD layer.
  • Next, the centroid of the points in the second LOD layer is obtained, the Euclidean distances between the centroid and the points outside the first and second LOD layers are computed, and points whose distance meets a second distance threshold are classified into the third LOD layer.
  • This continues until all points are assigned to an LOD layer.
  • By adjusting the Euclidean distance thresholds, the number of points in each LOD layer can be varied.
  • the LOD layer division method can also adopt other methods, and this application does not limit this.
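A toy sketch of the distance-threshold LOD split described above, using a single seed point and greedy assignment; it illustrates the idea, not the exact codec procedure:

```python
def split_lods(points, thresholds):
    """points: list of (x, y, z); thresholds: ascending distance limits.
    The first point seeds layer 0; every other point joins the first
    layer whose threshold its Euclidean distance to the seed satisfies,
    and points beyond all thresholds fall into a final layer."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    seed = points[0]
    layers = [[seed]] + [[] for _ in thresholds] + [[]]
    for p in points[1:]:
        d = dist(p, seed)
        for k, t in enumerate(thresholds):
            if d <= t:
                layers[k + 1].append(p)
                break
        else:
            layers[-1].append(p)
    return layers
```

Raising a threshold moves points from a farther layer into a nearer one, which is how the per-layer point counts are tuned.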
  • The point cloud can be directly divided into one or more LOD layers, or it can first be divided into multiple point cloud slices, and each slice can then be divided into one or more LOD layers.
  • For example, the point cloud can be divided into multiple point cloud slices, with the number of points in each slice between 550,000 and 1.1 million.
  • Each point cloud slice can be viewed as a separate point cloud.
  • Each point cloud slice can be divided into multiple LOD layers, and each LOD layer includes multiple points.
  • the LOD layer can be divided according to the Euclidean distance between points.
  • Figure 12 is a schematic block diagram of the LOD layer provided by an embodiment of the present application.
  • the point cloud includes multiple points arranged in original order, namely P0, P1, P2, P3, P4, P5, P6, P7, P8 and P9.
  • It is assumed that, based on the Euclidean distances between points, the point cloud can be divided into 3 LOD layers, namely LOD0, LOD1 and LOD2.
  • LOD0 may include P0, P5, P4 and P2; LOD1 may include P1, P6 and P3; and LOD2 may include P9, P8 and P7.
  • LOD0, LOD1 and LOD2 can be used to form the LOD-based order of the point cloud, namely P0, P5, P4, P2, P1, P6, P3, P9, P8 and P7.
  • the LOD-based order can be used as the encoding order of the current point cloud attribute information.
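The distance-threshold layering described above can be illustrated with a minimal sketch. The seed choice (simply the first point), the centroid-based reference point, and the threshold list are assumptions made for this sketch only; the application does not mandate them.

```python
import math

def centroid(pts):
    # Component-wise mean of a list of (x, y, z) points.
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

def dist(a, b):
    # Euclidean distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_lods(points, thresholds):
    # First LOD layer: a single seed point (here simply the first point).
    remaining = list(range(len(points)))
    lods = [[remaining.pop(0)]]
    for t in thresholds:
        # Reference point for this layer: centroid of the previous layer.
        ref = centroid([points[i] for i in lods[-1]])
        layer = [i for i in remaining if dist(points[i], ref) <= t]
        if not layer:
            continue
        remaining = [i for i in remaining if i not in layer]
        lods.append(layer)
    if remaining:
        # Any leftover points form the last layer so every point is classified.
        lods.append(remaining)
    return lods
```

Concatenating the layers in order then yields the LOD-based encoding order of the point cloud.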
  • the S220 may include:
  • the encoder determines the preferred prediction mode for the attribute information of the current point to be encoded based on the N neighbor points of the current point to be encoded in the encoding sequence, where N is an integer greater than or equal to 1.
  • the encoder, based on the neighbor points in the LOD where the current point to be encoded is located in the encoding sequence, creates multiple predictor candidates, that is, second prediction modes (predMode). For example, when encoding the attribute information of the current point to be encoded, the encoder first finds the 3 neighbor points preceding the current point based on the neighbor point search results in the LOD where the current point is located. Based on the 3 neighbor points, 4 second prediction modes can be determined, and the value of the corresponding mode index can be 0 to 3.
  • the reconstructed value of the attribute information of the third nearest neighbor point among the neighbor points except the nearest neighbor point and the second nearest neighbor point is used as the predicted value of the attribute information of the current point to be encoded.
  • the number N of neighbor points is not limited to 3.
  • the number of neighbor points can also be adaptively selected according to the characteristics of points in different LOD layers or points in different point clouds. .
  • the selection of neighbor points is not limited to the LOD layer where the current point to be encoded is located, nor is it limited to points located before the current point in the LOD layer where the current point is located, and is not limited here.
  • the second prediction mode with index 0 refers to determining, based on the distances between the neighbor points P0, P5 and P4 and the current point to be encoded, the weighted average of the reconstructed values of the attribute information of P0, P5 and P4 as the predicted value of the attribute information of the current point P2 to be encoded;
  • the second prediction mode with index 1 means that the reconstructed value of the attribute information of the nearest neighbor point P4 is used as the predicted value of the attribute information of the current point P2; the second prediction mode with index 2 means that the reconstructed value of the attribute information of the second nearest neighbor point P5 is used as the predicted value of the attribute information of the current point P2; and the second prediction mode with index 3 means that the reconstructed value of the attribute information of point P0 is used as the predicted value of the attribute information of the current point P2.
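The four second prediction modes can be sketched as follows. This is an illustration only: it assumes the three neighbors are given sorted nearest-first, and it assumes inverse-distance weights for mode 0 (the exact weighting used in practice is defined by the codec, not by this sketch).

```python
def predict_attribute(mode, neighbors):
    # neighbors: list of (distance, reconstructed_value), sorted nearest-first,
    # e.g. [(d_P4, ref_P4), (d_P5, ref_P5), (d_P0, ref_P0)] for current point P2.
    if mode == 0:
        # Mode 0: distance-based weighted average (inverse-distance weights assumed).
        num = sum(ref / d for d, ref in neighbors)
        den = sum(1.0 / d for d, _ in neighbors)
        return num / den
    # Modes 1..3: copy the reconstructed value of the
    # nearest / second nearest / third nearest neighbor respectively.
    return neighbors[mode - 1][1]
```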
  • the attribute information difference can be the maximum difference of the attribute information of the N neighbor points, the average difference of the attribute information of the N neighbor points or of part of the neighbor points, or another calculation of the difference of the attribute information of the N neighbor points or part of the neighbor points, which is not limited here.
  • the first prediction mode is determined to be the preferred prediction mode for the attribute information of the current point to be encoded.
  • the preferred prediction mode for encoding the refractive index information of the current point to be encoded is: based on the distances between the three neighbor points of the current point to be encoded and the current point to be encoded, determining the predicted value of the refractive index of the current point to be encoded by calculating the weighted average of the reconstructed values of the refractive index information of the three neighbor points. For example, the encoder can obtain the predicted value of the refractive index of the current point through the following formula:
  • attrPred = (Ref 1 /W 1 + Ref 2 /W 2 + Ref 3 /W 3 ) / (1/W 1 + 1/W 2 + 1/W 3 ),
  • where W 1 , W 2 and W 3 respectively represent the geometric distances between neighbor point 1, neighbor point 2 and neighbor point 3 and the current point, and Ref 1 , Ref 2 and Ref 3 respectively represent the reconstructed values of the refractive index information of neighbor point 1, neighbor point 2 and neighbor point 3.
  • when the encoder encodes the color information of the current point to be encoded, it calculates the maximum difference of each of the R, G and B components of the three neighbor points, that is, max(R1,R2,R3)-min(R1,R2,R3), max(G1,G2,G3)-min(G1,G2,G3) and max(B1,B2,B3)-min(B1,B2,B3), and then selects the maximum of the differences of the R, G and B components as the attribute information difference.
  • the first prediction mode, that is, the second prediction mode with index number 0, is determined as the preferred prediction mode for the color information of the current point to be encoded: based on the distances between the three neighbor points of the current point to be encoded and the current point to be encoded, the predicted value of the color information of the current point to be encoded is determined by calculating the weighted average of the reconstructed values of the color information of the three neighbor points.
  • the encoder can obtain the predicted value of any one of the R, G and B components or any one of the Y, U and V components of the color of the current point through the following formula:
  • attrPred = (Ref 1 /W 1 + Ref 2 /W 2 + Ref 3 /W 3 ) / (1/W 1 + 1/W 2 + 1/W 3 ),
  • where W 1 , W 2 and W 3 respectively represent the geometric distances between neighbor point 1, neighbor point 2 and neighbor point 3 and the current point, and Ref 1 , Ref 2 and Ref 3 respectively represent the reconstructed values of the corresponding color component of neighbor point 1, neighbor point 2 and neighbor point 3.
  • the first threshold may be a preset threshold, and when encoding refractive index information and color information, the settings of the first threshold may be the same or different.
  • the first threshold may include but is not limited to 64 and so on.
  • the attribute information difference can be the maximum difference of the attribute information of the N neighbor points, the average difference of the attribute information of the N neighbor points or of part of the neighbor points, or another calculation of the difference of the attribute information of the N neighbor points or part of the neighbor points, which will not be elaborated here.
  • the preferred prediction mode for the attribute information of the current point to be encoded is determined according to the rate-distortion optimization RDO mechanism.
  • determining the preferred prediction mode for the attribute information of the current point to be encoded according to the rate-distortion optimization (RDO) mechanism may include: determining K second prediction modes based on the N neighbor points of the current point to be encoded; determining the reconstruction distortion and coding rate of the attribute information of the current point to be encoded in one or more of the second prediction modes; and determining the preferred prediction mode among the K second prediction modes based on the reconstruction distortion and coding rate in the one or more second prediction modes.
  • K is an integer greater than or equal to 2.
  • the value of K can be adaptively adjusted according to the value of the neighbor point N. For example, when N is 3, K can be 2, 3, 4, etc.
  • based on the 3 neighbor points, 4 second prediction modes (predMode) can be determined, with mode index numbers 0 to 3.
  • the second prediction mode with index 0 refers to determining, based on the distances between the three neighbor points and the current point to be encoded, the weighted average of the reconstructed values of the attribute information of the three neighbor points as the predicted value of the attribute information of the current point to be encoded; the second prediction mode with index 1 refers to using the reconstructed value of the attribute information of the nearest neighbor point as the predicted value of the attribute information of the current point to be encoded; the second prediction mode with index 2 refers to using the reconstructed value of the attribute information of the second nearest neighbor point as the predicted value of the attribute information of the current point to be encoded; and the second prediction mode with index 3 refers to using the reconstructed value of the attribute information of the third nearest neighbor point among the three neighbor points, other than the nearest neighbor point and the second nearest neighbor point, as the predicted value of the attribute information of the current point to be encoded.
  • the three second prediction modes with index numbers 1 to 3 can be used as the second prediction modes analyzed in the RDO mechanism, or the four second prediction modes with index numbers 0 to 3 can be used as the second prediction modes analyzed in the RDO mechanism. Fewer than four of the second prediction modes with index numbers 0 to 3, or fewer than three of those with index numbers 1 to 3, can also be used as the second prediction modes analyzed in the RDO mechanism.
  • the second prediction mode analyzed in the RDO mechanism is of course not limited to the combination of the above-mentioned second prediction modes, and is not limited here.
  • determining the preferred prediction mode among the K second prediction modes based on the reconstruction distortion and coding rate in one or more second prediction modes may include: according to the reconstruction distortion in a prediction mode, determining the distortion parameter D in the prediction mode; according to the coding rate in the prediction mode, determining the code rate parameter R in the prediction mode; and according to the distortion parameter D and the code rate parameter R in the prediction mode, determining the cost value of the prediction mode.
  • the cost value in a second prediction mode can be calculated through the following formula:
  • J indx_i = D indx_i + λ·R indx_i ,
  • J indx_i represents the cost value of the second prediction mode with index i
  • D indx_i represents the distortion parameter when using the second prediction mode with index i for prediction
  • R indx_i represents the code rate parameter when using the second prediction mode with index i for prediction, and λ is a preset coefficient.
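The cost-based mode decision above can be sketched as a minimal loop. The candidate distortions, rates and λ value used below are placeholders for illustration only.

```python
def rdo_select(candidates, lam):
    # candidates: dict mapping mode index i -> (D_i, R_i);
    # picks the mode with the smallest cost J_i = D_i + lam * R_i.
    best_mode, best_cost = None, float("inf")
    for mode in sorted(candidates):
        d, r = candidates[mode]
        j = d + lam * r
        if j < best_cost:
            best_mode, best_cost = mode, j
    return best_mode, best_cost
```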
  • determining the reconstruction distortion of the attribute information of the current point to be encoded in a second prediction mode may include:
  • the encoder can determine the quantized residual using the following formula:
  • attrResidualQuant = (attrValue − attrPred)/Qstep,
  • Qstep represents the quantization step size
  • Qstep is calculated from the quantization parameter (Quantization Parameter, Qp).
  • the quantized residual attrResidualQuant is dequantized and combined with the predicted value attrPred to determine the reconstructed value reconAttr of the current point to be encoded in the second prediction mode i.
  • the encoder can determine the reconstructed value with the following formula: reconAttr = attrResidualQuant × Qstep + attrPred.
  • accordingly, the reconstruction distortion D of the attribute information of the current point to be encoded in the second prediction mode i can be determined, that is: D indx_i = reconAttr − attrValue.
  • the reconstruction distortion D indx_i , or the absolute value of the reconstruction distortion D indx_i , in a second prediction mode can be used as the distortion parameter D indx_i corresponding to the second prediction mode, that is: D indx_i = |reconAttr − attrValue|.
  • the square of the reconstruction distortion D indx_i in a second prediction mode can be used as the distortion parameter D indx_i corresponding to the prediction mode, that is: D indx_i = (reconAttr − attrValue) × (reconAttr − attrValue).
  • the square sum of the reconstruction distortion D indx_i of each color component in a second prediction mode can be used as the distortion parameter D indx_i corresponding to the prediction mode, that is:
  • D indx_i = (reconAttr[R]−attrValue[R])×(reconAttr[R]−attrValue[R]) + (reconAttr[G]−attrValue[G])×(reconAttr[G]−attrValue[G]) + (reconAttr[B]−attrValue[B])×(reconAttr[B]−attrValue[B]).
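A minimal sketch of the distortion computation above, assuming truncating integer division for quantization (the exact rounding rule is not fixed by this passage):

```python
def distortion_abs(attr_value, attr_pred, qstep):
    # Quantize the prediction residual: attrResidualQuant = (attrValue - attrPred) / Qstep.
    residual_quant = int((attr_value - attr_pred) / qstep)
    # Inverse quantization plus prediction gives the reconstructed value reconAttr.
    recon = residual_quant * qstep + attr_pred
    # Use |reconAttr - attrValue| as the distortion parameter D.
    return abs(recon - attr_value)
```

Because the residual is quantized and then dequantized before measuring the distortion, the quantization loss is reflected in D, which is the point made in the following paragraph.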
  • the reconstruction distortion may not only be the distortion obtained after the actual encoding process, but also may be some kind of estimated value, such as an estimated value of the distortion based on a distortion model.
  • the reconstructed value of the attribute information of the current point can be used as a neighbor candidate of the subsequent point, and the reconstructed value of the current point is used to predict the attribute information of the subsequent point.
  • the embodiment of the present application determines the distortion parameter D based on the reconstruction distortion, taking into account the distortion introduced by the inverse quantization of attrResidualQuant, so that the distortion parameter D considered when calculating the rate-distortion cost is closer to the real distortion generated in each second prediction mode. This makes the RDO decision results more accurate, thereby improving the quality of predictive coding.
  • the coding rate of the attribute information of the current point to be encoded in a second prediction mode may be the number of coding bits required for predictive encoding of the attribute information of the current point to be encoded based on the prediction mode.
  • the encoding code rate of the attribute information of the current point to be encoded in a second prediction mode may be a pair of parameters indicating the prediction mode and the attribute information of the current point to be encoded in the prediction mode. The number of encoding bits required to encode the residual.
  • determining the code rate parameter R in a second prediction mode according to the coding rate in the second prediction mode may include:
  • for a second prediction mode, the first number of coded bits required to encode the quantized residual of the attribute information of the current point to be encoded and the second number of coded bits required to encode the parameters of the prediction mode (such as its index) are determined, and the code rate parameter R corresponding to the prediction mode is determined based on the first number of coded bits and the second number of coded bits.
  • the code rate parameter R may be the sum of the first number of encoding bits and the second number of encoding bits. Determining the code rate parameter R based on the first and second numbers of encoding bits makes the code rate parameter R more accurate and consistent with actual operation, improving the peak signal-to-noise ratio performance of the encoder and reducing the size of the attribute code stream.
  • determining the code rate parameter R in a second prediction mode according to the coding rate corresponding to the second prediction mode may include:
  • the encoder can implement hybrid encoding of prediction modes and quantized residuals in the following way:
  • attrResidualEncode = fun(attrResidualQuant, predMode),
  • attrResidualEncode is the encoded data of the attribute information of the current point to be encoded, and the third number of coded bits can be determined based on attrResidualEncode; fun(·) is the hybrid encoding, attrResidualQuant is the quantized value of the residual of the attribute information of the current point to be encoded, and predMode is the index number of the second prediction mode. fun(·) is a reversible function, that is: (attrResidualQuant, predMode) = fun⁻¹(attrResidualEncode).
  • the code rate parameter corresponding to the third number of encoding bits is determined to be the code rate parameter R corresponding to the second prediction mode.
  • the encoder can be implemented with the following probabilistic model:
  • probResGt0 represents the probability that attrResidualEncode is greater than 0, and the initial value is 0.5.
  • probResGt1 represents the probability that attrResidualEncode is greater than 1, and the initial value is 0.5.
  • attrResidualEncode = 2 × (attrResidualQuant − 1) + 1. Since it is necessary to express the sign (positive or negative) of attrResidualEncode, finally let R = R + 1.
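A hypothetical sketch of how such a probability model could turn one residual value into an estimated bit count. The magnitude term below (a fixed-length estimate) and the fixed probabilities are assumptions made for illustration; the application does not specify them.

```python
import math

def estimate_rate_bits(value, prob_gt0=0.5, prob_gt1=0.5):
    # Estimated bits for coding one residual value with two binary
    # probabilities: probResGt0 (value > 0) and probResGt1 (value > 1).
    v = abs(value)
    bits = -math.log2(prob_gt0 if v > 0 else 1.0 - prob_gt0)
    if v > 0:
        bits += -math.log2(prob_gt1 if v > 1 else 1.0 - prob_gt1)
        if v > 1:
            # Remaining magnitude, here a simple fixed-length estimate (assumption).
            bits += math.floor(math.log2(v - 1)) + 1
        bits += 1  # one extra bit for the sign: R = R + 1
    return bits
```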
  • the cost value of the prediction mode can be determined based on the following formula:
  • J indx_i = D indx_i + λ·R indx_i ,
  • J indx_i represents the cost value of the second prediction mode with index i
  • D indx_i represents the distortion parameter when using the second prediction mode with index i for prediction
  • R indx_i represents the code rate parameter when using the second prediction mode with index i for prediction, and λ is a preset coefficient.
  • the cost value can be used to measure the advantages and disadvantages of various second prediction modes.
  • the smaller the cost value the better the coding performance of the second prediction mode.
  • the distortion parameter D and the code rate parameter R are comprehensively considered to determine the cost value, which can comprehensively measure the reconstruction quality and coding efficiency.
  • through the RDO mechanism, the optimal predictive coding mode can be adaptively selected for the points in the point cloud, further improving the encoding performance of the encoder.
  • the value of ⁇ may form a certain functional relationship with the quantization parameter or the quantization step size.
  • In some embodiments, λ is determined as a function of the quantization step size, where Qstep is the quantization step size for quantizing the residual of the attribute information of the current point to be encoded in a second prediction mode, calculated from the quantization parameter (Quantization Parameter, Qp), and β is a preset value, which includes but is not limited to 0.55.
  • In other embodiments, λ is determined as a function of the quantization step size, where Qstep is the quantization step size for quantizing the residual of the attribute information of the current point to be encoded in a second prediction mode, calculated from the quantization parameter (Quantization Parameter, Qp), and β is a preset value, which includes but is not limited to 0.11, 0.26, or any value in [0.01, 1].
  • In some embodiments, λ can also be determined based on the following method: different quantization parameters Qp can correspond to one or more values of λ.
  • for example, the point cloud standard test environment C1 (a test condition with near-lossless geometry and lossy attributes) corresponds to 6 QPs (48, 40, 32, 24, 16, 8); at this time, each of the 6 QPs can correspond to a value of λ.
  • for another example, the point cloud standard test environment CY (a test condition with lossy geometry and near-lossless attributes) corresponds to 5 QPs (10, 16, 22, 28, 34); at this time, each of the 5 QPs can correspond to a value of λ.
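A per-QP lookup of λ could be implemented as below. All numeric values in this table are placeholders for illustration only, not values obtained by the tests described in this application.

```python
# Placeholder lambda values per QP for one test condition (illustrative only).
LAMBDA_BY_QP = {8: 0.11, 16: 0.26, 24: 0.26, 32: 0.55, 40: 0.55, 48: 0.55}

def get_lambda(qp):
    # One (or more) calibrated lambda values may be stored per QP.
    try:
        return LAMBDA_BY_QP[qp]
    except KeyError:
        raise ValueError(f"no lambda calibrated for QP {qp}")
```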
  • λ can be a parameter obtained through a large number of tests.
  • the test range of ⁇ is approximately 0.0 to 4.0, that is, the value range of ⁇ can be 0.0 to 4.0.
  • the test range of ⁇ can also be other numerical ranges, which is not specifically limited in this application.
  • the corresponding values of ⁇ can be different.
  • can be a parameter obtained through a large number of tests.
  • can be a parameter obtained by testing under the sequence type of the current point cloud; in other words, point clouds of different sequence types can be used to pass testing or training. method to obtain the corresponding value of ⁇ .
  • the corresponding values of ⁇ may be different.
  • can be a parameter obtained through a large number of tests.
  • can be a value of ⁇ corresponding to different attribute information categories obtained through testing under the color attribute and refractive index attribute of the current point cloud.
  • the corresponding values of ⁇ can be different.
  • may be a parameter obtained through a large number of tests.
  • may be a parameter obtained by testing under the current component of the current point.
  • different components of points in the point cloud can be used to obtain their corresponding ⁇ values through testing or training.
  • the current component may be the component to be encoded at the current point.
  • the values of ⁇ corresponding to the V component, U component and Y component may be the same or different.
  • the values of ⁇ corresponding to the R component, the G component and the B component may be the same or different.
  • the encoding scheme provided by this application may be applicable to only some components of the current point, or may be applicable to all components of the current point, which is not specifically limited in this application.
  • S250 may include:
  • if the index corresponding to the preferred prediction mode of the current point to be encoded is 0, the index of the prediction mode does not need to be encoded into the code stream; if the index of the preferred prediction mode selected through the RDO mechanism is 1, 2 or 3, the index of the selected preferred prediction mode needs to be encoded into the code stream, that is, the index of the selected preferred prediction mode needs to be encoded together with the residual into the attribute information bit stream.
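The signaling rule above can be sketched as follows. The list-of-tuples "stream" is a stand-in for the actual entropy-coded bit stream, used here only to show what is and is not written.

```python
def write_mode_and_residual(preferred_mode, residual):
    # Mode index 0 is implicit and not written; modes 1-3 are written
    # together with the residual into the attribute bit stream.
    stream = []
    if preferred_mode != 0:
        stream.append(("predMode", preferred_mode))
    stream.append(("residual", residual))
    return stream
```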
  • Figure 13 is a schematic block diagram of the decoding framework 300 provided by the embodiment of the present application.
  • the decoding framework 300 can obtain the code stream of the point cloud from the encoding device, and obtain the geometric information and attribute information of the points in the point cloud by parsing the code stream.
  • the decoding of point clouds includes geometric information decoding and attribute information decoding.
  • the process of decoding geometric information includes: performing arithmetic decoding on the geometric information bit stream; constructing an octree and reconstructing the geometric information of the points to obtain the reconstruction information of the geometric information of the points; and performing inverse coordinate transformation on the reconstructed geometric information of the points to obtain the geometric information of the points.
  • the geometric information of a point can also be called the position information of the point.
  • the attribute information decoding process includes: determining the residual value of the attribute information of the points in the point cloud by parsing the attribute information bit stream; dequantizing the residual value of the attribute information of a point to obtain the dequantized residual value of the attribute information of the point; based on the reconstruction information of the point's geometric information obtained during geometric information decoding, selecting one of multiple prediction modes to perform point cloud prediction and obtain the reconstructed value of the point's attribute information; and performing inverse color conversion on the reconstructed value of the point's attribute information to obtain the decoded point cloud.
  • geometric information decoding can be achieved through the following units: first arithmetic decoding unit 301, octree synthesis (synthesize octree) unit 302, surface fitting (Analyze surface approximation) unit 303, reconstructed geometry (Reconstruct geometry) Unit 304 and inverse transform coordinates unit 305.
  • Attribute information decoding can be realized through the following units: second arithmetic decoding unit 310, inverse quantize unit 311, RAHT reverse transform unit 312, generate LOD (Generate LOD) unit 313, lifting inverse transform (lifting) reverse transform unit 314 and inverse transform colors unit 315.
  • each unit in the decoding framework 300 can be referred to the functions of the corresponding units in the encoding framework 100 .
  • the decoding framework 300 can divide the point cloud into multiple LODs according to the Euclidean distances between points in the point cloud, and then decode the attribute information of the points in the LODs in sequence. The decoding framework 300 can then dequantize the decoded residual value, and obtain the reconstructed value of the point by adding the dequantized residual value and the predicted value of the current point, until all points in the point cloud are decoded. The current point will then serve as a nearest neighbor candidate for subsequent points in the LOD, and the reconstructed value of the current point will be used to predict the attribute information of subsequent points.
  • Figure 14 is a schematic flow chart of a decoding method provided by an embodiment of the present application.
  • the decoder obtains the prediction mode parameters and quantized residual value of the point by decoding the attribute code stream. After performing inverse quantization and inverse transformation on the quantized residual value attrResidualQuant, the residual value attrResidual of the first point can be obtained. The predicted value of the attribute information of the point can be determined based on the prediction mode parameters. The decoder can obtain the attribute reconstruction value reconAttr of the first point based on the residual value attrResidual and the attribute prediction value attrPred of the first point.
  • the attribute reconstruction value reconAttr of the first point can be used as a nearest neighbor candidate of subsequent points. Then the quantized residual value of the second point is parsed from the attribute code stream, inverse quantization and inverse transformation are performed on it, and the result of the inverse quantization and inverse transformation is added to the attribute prediction value of the second point to obtain the attribute reconstruction value of the second point, and so on until the last point of the point cloud is decoded.
  • the decoder may inversely quantize the quantized residual value attrResidualQuant of the current point based on the following formula to obtain the residual value of the current point: attrResidual = attrResidualQuant × Qstep,
  • AttrResidual represents the residual value of the current point
  • attrResidualQuant represents the quantized residual value of the current point
  • Qstep represents the quantization step size.
  • Qstep is calculated from the quantization parameter (Quantization Parameter, Qp).
  • the decoder can obtain the attribute reconstruction value of the current point based on the following formula: reconAttr = attrResidual + attrPred,
  • reconAttr represents the attribute reconstruction value of the current point determined based on the quantized residual value of the current point
  • attrResidual represents the residual value of the current point
  • attrPred represents the attribute prediction value of the current point.
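The two decoder-side steps, inverse quantization followed by adding the prediction, can be combined into one small sketch:

```python
def reconstruct_point(attr_residual_quant, attr_pred, qstep):
    # Inverse quantization: attrResidual = attrResidualQuant * Qstep.
    attr_residual = attr_residual_quant * qstep
    # Reconstruction: reconAttr = attrResidual + attrPred.
    return attr_residual + attr_pred
```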
  • it should be understood that the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • Figure 15 is a schematic block diagram of a point cloud encoding device 400 provided by an embodiment of the present application.
  • the point cloud encoding device 400 may include:
  • the determination unit 410 is configured to determine the encoding order of the attribute information of the current point cloud based on the geometric information of the current point cloud, and to determine the preferred prediction mode for the attribute information of the current point to be encoded based on N neighbor points of the current point to be encoded in the encoding order, where N is an integer greater than or equal to 1;
  • the prediction unit 420 is used to predict the attribute information of the current point to be encoded according to the preferred prediction mode, and obtain the predicted value of the attribute information of the current point to be encoded;
  • the determination unit 410 is also configured to determine the residual of the attribute information of the current point to be encoded based on the predicted value of the attribute information of the current point to be encoded;
  • the encoding unit 430 is used to perform encoding according to the preferred prediction mode and the residual, obtain the encoded bits, and write the encoded bits into the code stream.
  • each unit in the encoding device 400 involved in the embodiment of the present application can be separately or entirely combined into one or several additional units, or some of the units can be further divided into multiple smaller functional units, which can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
  • the above units are divided based on logical functions. In practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the encoding device 400 may also include other units.
  • In other embodiments of the present application, the encoding device 400 involved in the embodiments of the present application can be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method on a general-purpose computing device, such as a general-purpose computer including processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM) and a read-only storage medium (ROM), so as to implement the encoding method provided by the embodiments of the present application.
  • the computer program can be recorded on, for example, a computer-readable storage medium, loaded into any electronic device with data processing capabilities through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
  • the units mentioned above can be implemented in the form of hardware, can also be implemented in the form of instructions in the form of software, or can be implemented in the form of a combination of software and hardware.
  • each step of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly embodied as being completed by a hardware decoding processor, or completed by a combination of hardware and software in the decoding processor.
  • the software can be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • Figure 16 is a schematic structural diagram of the encoding and decoding device 500 provided by the embodiment of the present application.
  • the encoding and decoding device 500 includes at least a processor 510 and a computer-readable storage medium 520 .
  • the processor 510 and the computer-readable storage medium 520 may be connected through a bus or other means.
  • the computer-readable storage medium 520 is used to store a computer program 521.
  • the computer program 521 includes computer instructions.
  • the processor 510 is used to execute the computer instructions stored in the computer-readable storage medium 520.
  • the processor 510 is the computing core and the control core of the encoding and decoding device 500. It is suitable for implementing one or more computer instructions. Specifically, it is suitable for loading and executing one or more computer instructions to implement the corresponding method flow or corresponding functions.
  • the processor 510 may also be called a central processing unit (Central Processing Unit, CPU).
  • the processor 510 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) Or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the computer-readable storage medium 520 can be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it can also be at least one computer-readable storage medium located far away from the aforementioned processor 510.
  • the computer-readable storage medium 520 includes, but is not limited to, volatile memory and/or non-volatile memory.
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
  • Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • RAM Random Access Memory
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • DDR SDRAM double data rate synchronous dynamic random access memory
  • ESDRAM enhanced synchronous dynamic random access memory
  • SLDRAM synchronous link dynamic random access memory
  • DR RAM Direct Rambus random access memory
  • the encoding and decoding device 500 may be the encoding framework shown in Figure 4 or the encoding device 400 shown in Figure 15; the computer-readable storage medium 520 stores first computer instructions, which are loaded and executed by the processor 510 to implement the corresponding steps in the encoding method provided by the embodiments of the present application. To avoid repetition, they will not be described again here.
  • embodiments of the present application also provide a computer-readable storage medium (Memory).
  • the computer-readable storage medium is a memory device in the encoding and decoding device 500 and is used to store programs and data.
  • computer-readable storage medium 520 may include a built-in storage medium in the codec device 500 , and of course may also include an extended storage medium supported by the codec device 500 .
  • the computer-readable storage medium provides a storage space that stores the operating system of the encoding and decoding device 500 .
  • one or more computer instructions suitable for being loaded and executed by the processor 510 are also stored in the storage space.
  • These computer instructions may be one or more computer programs 521 (including program codes). These computer instructions are used to cause the computer to execute the encoding methods provided in the various optional ways above.
  • a computer program product or computer program including computer instructions stored in a computer-readable storage medium.
  • for example, the computer program 521; in this case, the encoding and decoding device 500 may be a computer.
  • the processor 510 reads the computer instructions from the computer-readable storage medium 520.
  • the processor 510 executes the computer instructions, so that the computer executes the encoding method provided in the above various optional ways.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in one computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means.
  • wired such as coaxial cable, optical fiber, digital subscriber line (DSL)
  • wireless such as infrared, wireless, microwave, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An embodiment of this application provides a point cloud encoding method, the method including: determining, based on geometry information of a current point cloud, an encoding order of attribute information of the current point cloud; determining, based on N neighbor points of the current point in the encoding order, a preferred prediction mode for the attribute information of the current point, N being an integer greater than or equal to 1, where, when the difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point is determined according to a rate-distortion optimization (RDO) mechanism; predicting the attribute information of the current point according to the preferred prediction mode to obtain a predicted value of the attribute information of the current point; determining a residual of the attribute information of the current point according to the predicted value of the attribute information of the current point; and performing encoding according to the preferred prediction mode and the residual to obtain encoded bits, and writing the encoded bits into a bitstream. The encoding method provided by this application can improve the coding performance of the encoder.

Description

Point cloud encoding method, encoding apparatus, encoding device, and storage medium
Technical Field
The embodiments of this application relate to the field of encoding and decoding technologies, and more specifically, to a point cloud encoding method, encoding apparatus, encoding device, and storage medium.
Background
Point clouds have begun to spread into various fields, for example, virtual/augmented reality, robotics, geographic information systems, and the medical field. As the accuracy and speed of scanning devices keep improving, large numbers of points on an object's surface can be acquired accurately, and a single scene can often correspond to hundreds of thousands of points. Such a huge number of points also poses challenges for computer storage and transmission. Point cloud compression has therefore become a hot topic.
For point cloud compression, the encoder needs to compress both the geometry information and the attribute information. How to determine, according to the characteristics of different point clouds, a suitable predictive coding mode that balances coding efficiency and coding quality is a problem to be solved.
Summary
An embodiment of this application provides a point cloud encoding method, including:
determining, based on geometry information of a current point cloud, an encoding order of attribute information of the current point cloud;
determining, based on N neighbor points of a current point to be encoded in the encoding order, a preferred prediction mode for the attribute information of the current point to be encoded, N being an integer greater than or equal to 1; wherein,
when the difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to a rate-distortion optimization (RDO) mechanism;
predicting the attribute information of the current point to be encoded according to the preferred prediction mode, to obtain a predicted value of the attribute information of the current point to be encoded;
determining a residual of the attribute information of the current point to be encoded according to the predicted value of the attribute information of the current point to be encoded; and
performing encoding according to the preferred prediction mode and the residual to obtain encoded bits, and writing the encoded bits into a bitstream.
In a second aspect, this application provides a point cloud encoding apparatus, including:
a determining unit, configured to determine, based on geometry information of a current point cloud, an encoding order of attribute information of the current point cloud; and
to determine, based on N neighbor points of a current point to be encoded in the encoding order, a preferred prediction mode for the attribute information of the current point to be encoded, N being an integer greater than or equal to 1; wherein,
when the difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to a rate-distortion optimization (RDO) mechanism;
a prediction unit, configured to predict the attribute information of the current point to be encoded according to the preferred prediction mode, to obtain a predicted value of the attribute information of the current point to be encoded;
the determining unit being further configured to determine a residual of the attribute information of the current point to be encoded according to the predicted value of the attribute information of the current point to be encoded; and
an encoding unit, configured to perform encoding according to the preferred prediction mode and the residual to obtain encoded bits, and to write the encoded bits into a bitstream.
In a third aspect, this application provides an encoding device, including:
a processor, adapted to execute a computer program; and
a computer-readable storage medium, the computer-readable storage medium storing a computer program adapted to be loaded by the processor to execute the encoding method in the first aspect or any of its implementations.
In one implementation, there are one or more processors.
In one implementation, the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when read and executed by a processor of a computer device, causes the computer device to execute the encoding method in the first aspect or any of its implementations.
In a fifth aspect, an embodiment of this application provides a bitstream, the bitstream being generated by the method in the first aspect or any of its implementations.
Based on the above technical solutions, when predictively encoding the attribute information of the current point to be encoded in a point cloud, this application considers the N neighbor points of the current point in the encoding order, and, when the difference in attribute information among the N neighbor points is greater than the first threshold, uses a rate-distortion optimization (RDO) mechanism to determine the preferred prediction mode for the attribute information of the current point to be encoded. The solution of this application helps adaptively select a suitable prediction mode for attribute prediction for the points of the current point cloud, avoids excessively low coding efficiency or reconstruction quality for certain points, and achieves a balance between coding efficiency and reconstruction quality, thereby improving the coding performance of the point cloud encoder.
Brief Description of the Drawings
Figure 1 is an example of a point cloud image provided by an embodiment of this application;
Figure 2 is a partially enlarged view of the point cloud image shown in Figure 1;
Figure 3 is an example of a point cloud image with six viewing angles provided by an embodiment of this application;
Figure 4 is a schematic block diagram of an encoding framework provided by an embodiment of this application;
Figure 5 is an example of a bounding box provided by an embodiment of this application;
Figure 6 is an example of octree partitioning of a bounding box provided by an embodiment of this application;
Figure 7 is a schematic flowchart of an encoding method provided by an embodiment of this application;
Figures 8 to 10 show the Morton code ordering in two-dimensional space;
Figure 11 shows the Morton code ordering in three-dimensional space;
Figure 12 is a schematic block diagram of LOD layers provided by an embodiment of this application;
Figure 13 is a schematic block diagram of a decoding framework provided by an embodiment of this application;
Figure 14 is a schematic flowchart of a decoding method provided by an embodiment of this application;
Figure 15 is a schematic block diagram of a point cloud encoding apparatus provided by an embodiment of this application;
Figure 16 is a schematic block diagram of an encoding and decoding device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
A point cloud (Point Cloud) is a set of discrete points irregularly distributed in space that express the spatial structure and surface attributes of a three-dimensional object or scene. Figures 1 and 2 show a three-dimensional point cloud image and a partially enlarged view of it, respectively; it can be seen that the point cloud surface consists of densely distributed points.
A two-dimensional image has information expressed at every pixel, so its position information need not be recorded separately; the points of a point cloud, however, are distributed randomly and irregularly in three-dimensional space, so the position of every point in space must be recorded to fully express a point cloud. Similar to a two-dimensional image, every point in a point cloud has corresponding attribute information, which may be an RGB color value, the color value reflecting the color of the object; for a point cloud, the attribute information of each point may, besides color, also be a reflectance value, the reflectance value reflecting the surface material of the object. Each point in a point cloud may include geometry information and attribute information, where the geometry information of each point is the Cartesian three-dimensional coordinate data of that point, and the attribute information of each point may include, but is not limited to, at least one of: color information, material information, and laser reflection intensity information, where material information or laser reflection intensity information may be expressed by indicators such as a reflectance value. The color information may be information in any color space. For example, the color information may be red-green-blue (Red Green Blue, RGB) information; as another example, it may be luminance-chrominance (YCbCr, YUV) information, where Y denotes luma (Luma), Cb (U) denotes the blue chroma component, and Cr (V) denotes the red chroma component. Every point in a point cloud has the same number of attributes. For example, every point may have two attributes, color information and laser reflection intensity; or every point may have three attributes, color information, material information, and laser reflection intensity information.
A point cloud image may have multiple viewing angles; for example, the point cloud image shown in Figure 3 may have six viewing angles. The data storage format corresponding to a point cloud image consists of a file header part and a data part; the header contains the data format, the data representation type, the total number of points, and the content represented by the point cloud.
A point cloud can flexibly and conveniently express the spatial structure and surface attributes of a three-dimensional object or scene, and since a point cloud is obtained by directly sampling a real object, it can provide a strong sense of realism while preserving accuracy. It is therefore widely used, in areas including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, and three-dimensional reconstruction of biological tissues and organs.
For example, point clouds can be divided into two major categories by application scenario, namely machine-perception point clouds and human-perception point clouds. Application scenarios of machine-perception point clouds include, but are not limited to: autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and rescue and disaster-relief robots. Application scenarios of human-perception point clouds include, but are not limited to: digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication, and three-dimensional immersive interaction. Correspondingly, point clouds can be divided into dense and sparse point clouds based on the acquisition method, and into static and dynamic point clouds based on the acquisition approach; more specifically, into three types: category-one static point clouds, category-two dynamic point clouds, and category-three dynamically acquired point clouds. For category-one static point clouds, both the object and the acquisition device are stationary; for category-two dynamic point clouds, the object moves but the acquisition device is stationary; for category-three dynamically acquired point clouds, the acquisition device moves.
For example, point cloud acquisition approaches include, but are not limited to: computer generation, 3D laser scanning, and 3D photogrammetry. A computer can generate point clouds of virtual three-dimensional objects and scenes; 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, acquiring millions of points per second; 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes, acquiring tens of millions of points per second. Specifically, point clouds of object surfaces can be collected with acquisition equipment such as photoelectric radar, lidar, laser scanners, and multi-view cameras. A point cloud obtained according to the laser measurement principle may include the three-dimensional coordinate information and laser reflection intensity (reflectance) of each point; a point cloud obtained according to the photogrammetry principle may include the three-dimensional coordinate information and color information of each point; a point cloud obtained by combining the laser measurement and photogrammetry principles may include the three-dimensional coordinate information, laser reflection intensity (reflectance), and color information of each point. These technologies reduce the cost and time of point cloud data acquisition and improve data accuracy. For example, in the medical field, point clouds of biological tissues and organs can be obtained from magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information. These changes in the way point cloud data is acquired make it possible to obtain massive amounts of point cloud data, and with growing application demand, the processing of massive 3D point cloud data runs into the bottlenecks of storage space and transmission bandwidth.
Take a point cloud video with a frame rate of 30 fps (frames per second) and 700,000 points per frame, where every point in every frame has coordinate information xyz (float) and color information RGB (uchar). The data volume of a 10 s point cloud video is then approximately 0.7 million × (4 Byte × 3 + 1 Byte × 3) × 30 fps × 10 s = 3.15 GB, whereas a 1280×720 two-dimensional video with a 4:2:0 YUV sampling format at 24 fps amounts to roughly 1280 × 720 × 12 bit × 24 frames × 10 s ≈ 0.33 GB for 10 s, and a two-view 3D video amounts to roughly 0.33 × 2 = 0.66 GB for 10 s. The data volume of point cloud video thus far exceeds that of two-dimensional and three-dimensional video of the same duration. Therefore, to better manage data, save server storage space, and reduce the transmission traffic and time between server and client, point cloud compression has become a key issue in promoting the development of the point cloud industry.
Point cloud compression generally compresses the geometry information and the attribute information separately. On the encoder side, the geometry information is first encoded in a geometry encoder, and the reconstructed geometry is then fed to the attribute encoder as side information to assist compression of the attribute information of the point cloud; on the decoder side, the geometry information is first decoded in a geometry decoder, and the decoded geometry is then fed to the attribute decoder as side information to assist decompression of the attribute information of the point cloud. The whole codec consists of preprocessing/postprocessing, geometry encoding/decoding, and attribute encoding/decoding.
For example, a point cloud may be encoded and decoded through various types of encoding and decoding frameworks. As an example, the codec framework may be the Geometry Point Cloud Compression (G-PCC) codec framework or the Video Point Cloud Compression (V-PCC) codec framework provided by the Moving Picture Experts Group (MPEG), or the AVS-PCC codec framework or the point cloud compression reference platform (PCRM) framework provided by the Audio Video coding Standard (AVS) working group. The G-PCC codec framework can be used to compress category-one static point clouds and category-three dynamically acquired point clouds, and the V-PCC codec framework can be used to compress category-two dynamic point clouds. The G-PCC codec framework is also called point cloud codec TMC13, and the V-PCC codec framework is also called point cloud codec TMC2. Both G-PCC and AVS-PCC target static sparse point clouds, and their encoding frameworks are roughly the same.
The following uses the G-PCC framework as an example to describe codec frameworks to which the embodiments of this application are applicable.
Figure 4 is a schematic block diagram of an encoding framework provided by an embodiment of this application.
As shown in Figure 4, the encoding framework 100 can obtain the geometry information and attribute information of a point cloud from an acquisition device. Encoding a point cloud includes geometry encoding and attribute encoding. In one embodiment, the geometry encoding process includes: preprocessing the original point cloud by coordinate transformation and quantization with removal of duplicate points; then building an octree and encoding it to form a geometry bitstream.
As shown in Figure 4, the geometry encoding process of the encoder may be implemented by the following units:
a coordinate transform (Transform coordinates) unit 101, a quantize and remove duplicate points (Quantize and remove points) unit 102, an octree analysis (Analyze octree) unit 103, a surface fitting (Analyze surface approximation) unit 104, a first arithmetic encoding (Arithmetic encode) unit 105, and a geometry reconstruction (Reconstruct geometry) unit 106.
The coordinate transform unit 101 may be used to transform the world coordinates of the points in the point cloud into relative coordinates. For example, subtracting the minimum value of the x, y, and z axes from the geometric coordinates of each point, similar to a DC-removal operation, transforms the coordinates of the points from world coordinates to relative coordinates. The quantize and remove duplicate points unit 102 can reduce the number of coordinate values through quantization; after quantization, originally distinct points may be given the same coordinates, and the duplicate points can then be deleted through a deduplication operation; for example, multiple points with the same quantized position but different attribute information can be merged into one point through attribute transformation. In some embodiments of this application, the quantize and remove duplicate points unit 102 is an optional module. The octree analysis unit 103 may encode the position information of the quantized points using octree (octree) coding. For example, the point cloud is regularized in the form of an octree, so that the positions of the points correspond one-to-one to positions in the octree; the positions in the octree that contain points are counted, and their flags (flag) are set to 1, to perform geometry encoding. The first arithmetic encoding unit 105 may entropy-encode the geometry information output by the octree analysis unit 103, i.e., generate a geometry bitstream from the geometry information output by the octree analysis unit 103 using arithmetic coding; the geometry bitstream may also be called a geometry bit stream (geometry bit stream).
The regularization method for point clouds is described below.
Because points in a point cloud are irregularly distributed in space, which poses challenges for the encoding process, a recursive octree structure is used to map the points of the point cloud regularly onto the centers of cubes. For example, as shown in Figure 5, the whole point cloud can be placed inside a cubic bounding box. The coordinates of the points can then be written as (x_k, y_k, z_k), k = 0, …, K-1, where K is the total number of points in the point cloud. The boundary values of the point cloud along the x, y, and z axes are then:
x_min = min(x_0, x_1, …, x_{K-1});
y_min = min(y_0, y_1, …, y_{K-1});
z_min = min(z_0, z_1, …, z_{K-1});
x_max = max(x_0, x_1, …, x_{K-1});
y_max = max(y_0, y_1, …, y_{K-1});
z_max = max(z_0, z_1, …, z_{K-1}).
In addition, the origin (x_origin, y_origin, z_origin) of the bounding box can be calculated as follows:
x_origin = int(floor(x_min));
y_origin = int(floor(y_min));
z_origin = int(floor(z_min)).
Here, floor() denotes rounding down, and int() denotes taking the integer part.
Based on this, the encoder can compute, from the boundary values and the origin formulas, the size of the bounding box along the x, y, and z axes as:
BoudingBoxSize_x = int(x_max - x_origin) + 1;
BoudingBoxSize_y = int(y_max - y_origin) + 1;
BoudingBoxSize_z = int(z_max - z_origin) + 1.
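The bounding-box computation above can be sketched in a short, self-contained example (a minimal illustration of the formulas; the function name and data layout are assumptions, not part of any codec):

```python
import math

def bounding_box(points):
    """Compute the integer origin and per-axis size of the bounding box.

    points: iterable of (x, y, z) float tuples.
    Follows the formulas above: origin = int(floor(min)) per axis,
    size = int(max - origin) + 1 per axis.
    """
    xs, ys, zs = zip(*points)
    origin = tuple(int(math.floor(min(axis))) for axis in (xs, ys, zs))
    size = tuple(int(max(axis) - o) + 1
                 for axis, o in zip((xs, ys, zs), origin))
    return origin, size
```

For instance, two points (0.5, 1.2, 2.9) and (3.4, 0.1, 1.0) yield origin (0, 0, 1) and sizes (4, 2, 2).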
As shown in Figure 6, once the encoder has obtained the bounding box sizes along the x, y, and z axes, it first partitions the bounding box by octree, obtaining eight sub-blocks each time, and then performs a further octree partition on the non-empty sub-blocks (blocks containing points), recursing in this way until a certain depth is reached. The non-empty sub-blocks at the final size are called voxels (voxel); each voxel contains one or more points, whose geometric positions are normalized to the center point of the voxel, and the attribute value of that center point is the average of the attribute values of all points in the voxel. Regularizing the point cloud into blocks in space makes it easier to describe the positional relations between points, and in turn to design a specific encoding order; on this basis, the encoder can encode every voxel, i.e., every point (or "node") that a voxel represents, following the determined encoding order.
The geometry information is reconstructed by combining the geometry output of the octree analysis unit with the geometry output of the surface fitting unit, and the reconstructed geometry information is used to encode the attribute information. The attribute encoding process includes: given the reconstructed geometry information and the true attribute values of the input point cloud, selecting a predictive transform mode to predictively transform the point cloud attribute information, quantizing the result of the predictive transform, and arithmetic-encoding it to form the attribute bitstream, i.e., the attribute information bit stream.
As shown in Figure 4, the attribute encoding process of the encoder may be implemented by the following units:
a color transform (Transform colors) unit 110, an attribute transfer (Transfer attributes) unit 111, a Region Adaptive Hierarchical Transform (RAHT) unit 112, a generate level of detail (Generate LOD) unit 113, a lifting transform (lifting transform) unit 114, a quantize (Quantize) unit 115, and a second arithmetic encoding unit 116.
The color transform unit 110 may be used to transform the RGB color space of the points in the point cloud into the YCbCr format or other formats. The attribute transfer unit 111 may be used to transfer the attribute information onto the point cloud geometry based on the reconstructed geometry information; it may also be used to transform the attribute information of the points so as to minimize attribute distortion. For example, under lossy geometry coding, the geometry information changes after geometry encoding, so the attribute transfer unit 111 must reassign an attribute value to every geometry-encoded point such that the attribute error between the reconstructed point cloud and the original point cloud is minimized. The reconstructed geometry output by the geometry reconstruction unit 106 can be fed into the RAHT unit 112 or the generate LOD unit 113 as side information for the predictive transform of the attribute information output by the attribute transfer unit 111. The point cloud attribute information may be predictively encoded through the RAHT unit 112; the quantize unit 115 quantizes the attribute residual output by the RAHT transform unit with a certain quantization step, and the second arithmetic encoding unit 116 then arithmetic-encodes the quantized residual to finally obtain the attribute information bitstream. Alternatively, the point cloud attribute information may be predictively transformed through the generate LOD unit 113 and the lifting transform unit 114, after which the quantize unit 115 quantizes the residual with a certain quantization step and the second arithmetic encoding unit 116 arithmetic-encodes the quantized residual to finally obtain the attribute information bitstream. In some embodiments, the second arithmetic encoding unit 116 may entropy-encode the attribute residuals of the points using zero run length coding (Zero run length coding) to obtain the attribute bitstream, which may be bit stream information.
Figure 7 is a schematic flowchart of a method for encoding point cloud attribute information provided by an embodiment of this application. The method 200 may be performed by an encoder or an encoding framework, for example the encoding framework shown in Figure 4.
As shown in Figure 7, the encoding method 200 may include:
S210: based on geometry information of the current point cloud, determine the encoding order of the attribute information of the current point cloud;
S220: based on N neighbor points of the current point to be encoded in the encoding order, determine the preferred prediction mode for the attribute information of the current point to be encoded, N being an integer greater than or equal to 1; wherein,
when the difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to a rate-distortion optimization (RDO) mechanism;
S230: predict the attribute information of the current point to be encoded according to the preferred prediction mode, to obtain a predicted value of the attribute information of the current point to be encoded;
S240: determine a residual of the attribute information of the current point to be encoded according to the predicted value of the attribute information of the current point to be encoded;
S250: perform encoding according to the preferred prediction mode and the residual to obtain encoded bits, and write the encoded bits into a bitstream.
In some embodiments, S210 may include:
Taking the encoding framework shown in Figure 4 as an example, based on the geometry reconstruction information of the current point cloud input by the geometry reconstruction unit 106, the generate LOD unit 113 may perform a Morton reorder or a Hilbert (Hilbert) reorder on the current point cloud to obtain the encoding order of the attribute information of the current point cloud. After obtaining this encoding order, the encoder can divide the points of the point cloud into layers following the determined order to obtain the LODs of the current point cloud, and then predict the attribute information of the points based on the LODs.
Figures 8 to 10 show the Morton code ordering in two-dimensional space.
As shown in Figure 8, the encoder can use the 'z'-shaped Morton order in a two-dimensional space formed by 2×2 blocks. As shown in Figure 9, the encoder can use the 'z'-shaped Morton order across a two-dimensional space formed by four 2×2 blocks, where the 'z'-shaped Morton order can also be used within each 2×2 block, ultimately yielding the Morton order used by the encoder in a two-dimensional space of 4×4 blocks. As shown in Figure 10, the encoder can use the 'z'-shaped Morton order across a two-dimensional space formed by four 4×4 blocks, where the 'z'-shaped Morton order can also be used within each group of four 2×2 blocks and within each 2×2 block, ultimately yielding the Morton order used by the encoder in a two-dimensional space of 8×8 blocks.
Figure 11 shows the Morton code ordering in three-dimensional space.
The Morton order applies not only in two-dimensional space but can also be extended to three-dimensional space. For example, Figure 11 shows 16 points; inside each 'z' and between one 'z' and the next, the Morton order always encodes along the x-axis direction first, then along the y axis, and finally along the z axis.
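The x-then-y-then-z traversal described above can be reproduced by interleaving coordinate bits into a Morton code and sorting points by it; a minimal sketch (the function name and bit width are assumptions for illustration):

```python
def morton3d(x, y, z, bits=10):
    """Interleave the bits of non-negative integers (x, y, z).

    The x bit occupies the least-significant slot of each bit triplet,
    so ascending Morton codes advance along x first, then y, then z.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Sorting points by their Morton code yields the traversal order.
points = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0)]
ordered = sorted(points, key=lambda p: morton3d(*p))
```

Here `ordered` visits (0,0,0), then (1,0,0), then (0,1,0), then (0,0,1), matching the x-first traversal.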
The LOD generation process includes: obtaining the Euclidean distances between points according to the geometry information of the points in the point cloud, and dividing the points into different LOD layers according to the Euclidean distances. In one embodiment, the Euclidean distances can be sorted, and distances in different ranges assigned to different LOD layers. For example, a point can be picked at random as the first LOD layer; the Euclidean distances from the remaining points to that point are then calculated, and the points whose Euclidean distance satisfies a first distance threshold are assigned to the second LOD layer. The centroid of the points in the second LOD layer is obtained, the Euclidean distances from the points outside the first and second LOD layers to this centroid are calculated, and the points whose Euclidean distance satisfies a second distance threshold are assigned to the third LOD layer; and so on, until all points are assigned to LOD layers. By adjusting the Euclidean distance thresholds, the number of points in each LOD layer can be made to increase layer by layer. It should be understood that LOD layering can also be done in other ways, which this application does not restrict. Note that the point cloud can be divided into one or more LOD layers directly, or the point cloud can first be divided into multiple point cloud slices (slice) and each slice then divided into one or more LOD layers. For example, the point cloud can be divided into multiple slices, the number of points in each slice being between 550,000 and 1,100,000. Each slice can be regarded as an independent point cloud, and each slice can in turn be divided into multiple LOD layers, each LOD layer containing multiple points. In one embodiment, LOD layering can be done according to the Euclidean distances between points.
Figure 12 is a schematic block diagram of LOD layers provided by an embodiment of this application.
As shown in Figure 12, suppose the point cloud contains multiple points arranged in their original order (original order), namely P0, P1, P2, P3, P4, P5, P6, P7, P8, and P9, and suppose the point cloud can be divided into 3 LOD layers, LOD0, LOD1, and LOD2, based on the Euclidean distances between points. LOD0 may include P0, P5, P4, and P2; LOD1 may include P1, P6, and P3; and LOD2 may include P9, P8, and P7. LOD0, LOD1, and LOD2 can then be used to form the LOD-based order (LOD-based order) of this point cloud, namely P0, P5, P4, P2, P1, P6, P3, P9, P8, and P7. The LOD-based order can serve as the encoding order of the attribute information of the current point cloud.
In some embodiments, S220 may include:
When predicting the current point to be encoded in the point cloud, the encoder determines the preferred prediction mode for the attribute information of the current point to be encoded based on its N neighbor points in the encoding order, N being an integer greater than or equal to 1.
For example, multiple predictor candidates, i.e., second prediction modes (predMode), are created based on N neighbor points in the LOD that contains the current point to be encoded in the encoding order. For instance, when encoding the attribute information of the current point, the encoder first finds the 3 neighbor points preceding the current point based on the neighbor search results in the current point's LOD; 4 second prediction modes can be determined from these 3 neighbors, with mode indices 0 to 3. The second prediction mode with index 0 (predMode = 0) determines, as the predicted value of the attribute information of the current point to be encoded, the weighted average of the reconstructed attribute values of the 3 neighbor points, weighted according to the distances between the 3 neighbors and the current point; the mode with index 1 (predMode = 1) uses the reconstructed attribute value of the nearest of the 3 neighbors as the predicted value; the mode with index 2 (predMode = 2) uses the reconstructed attribute value of the second-nearest neighbor as the predicted value; and the mode with index 3 (predMode = 3) uses the reconstructed attribute value of the third-nearest neighbor, i.e., the one among the 3 neighbors other than the nearest and second-nearest, as the predicted value. Of course, the number of neighbors N is not limited to 3; it can be 1, 2, 4, 5, and so on, and it can also be chosen adaptively according to the characteristics of points in different LOD layers or different point clouds. Moreover, the choice of neighbors is neither limited to the LOD layer containing the current point to be encoded, nor to the points preceding the current point in that LOD layer; no limitation is placed here.
Table 1
Figure PCTCN2022098709-appb-000001
For example, as shown in Table 1, when the candidate prediction methods are used to encode the attribute information of the current point P2: the second prediction mode with index 0 determines the weighted average of the reconstructed attribute values of neighbors P0, P5, and P4, weighted by their distances, as the predicted attribute value of the current point P2; index 1 uses the reconstructed attribute value of the nearest neighbor P4 as the predicted attribute value of P2; index 2 uses that of the second-nearest neighbor P5; and index 3 uses that of the third-nearest neighbor P0.
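The four candidate predictors can be sketched as follows (a minimal illustration; the data layout, neighbors as (distance, reconstructed value) pairs sorted nearest-first, is an assumption, not the reference implementation):

```python
def predict(neighbors, pred_mode):
    """Predicted attribute value under one candidate mode.

    neighbors: list of (distance, reconstructed_attribute) pairs,
    sorted from nearest to farthest (three of them here).
    pred_mode 0: inverse-distance weighted average of the neighbors;
    pred_mode 1, 2, 3: the nearest, second-nearest, or third-nearest
    neighbor's reconstructed value, respectively.
    """
    if pred_mode == 0:
        weights = [1.0 / d for d, _ in neighbors]
        return sum(w * a for w, (_, a) in zip(weights, neighbors)) / sum(weights)
    return neighbors[pred_mode - 1][1]
```

With neighbors at distances 1, 2, 4 carrying values 10, 20, 40, mode 0 gives 30/1.75 ≈ 17.14, while modes 1 to 3 return 10, 20, and 40.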
Further, depending on the attribute difference among the N neighbor points of the current point to be encoded, different ways of determining the preferred prediction mode among the multiple second prediction modes can be chosen adaptively for the predictive encoding of the attribute information of the current point to be encoded. The attribute difference may be the maximum of the attribute differences among the N neighbor points, the average of the attribute differences of the N neighbors or some of them, or some other computation over the attribute differences of all or some of the N neighbors; this is not limited here.
When the attribute difference among the N neighbor points is less than the first threshold, or less than or equal to the first threshold, the first prediction mode is determined as the preferred prediction mode for the attribute information of the current point to be encoded.
For example, taking the case of 3 neighbor points, when the encoder encodes the reflectance information of the current point to be encoded, it first determines the maximum difference maxDiff among the 3 neighbors' reflectance values, i.e., maxDiff = max(abs(Refl_1 - Refl_2), abs(Refl_1 - Refl_3), abs(Refl_2 - Refl_3)). The encoder compares maxDiff with the first threshold; if maxDiff is less than the first threshold, or less than or equal to it, the first prediction mode, i.e., the second prediction mode with index 0, is determined as the preferred prediction mode for the reflectance information of the current point to be encoded: based on the distances between the 3 neighbor points and the current point to be encoded, the predicted reflectance of the current point is determined by computing the weighted average of the 3 neighbors' reconstructed reflectance values. For example, the encoder may obtain the predicted reflectance of the current point via the following inverse-distance weighted formula (reconstructed here from the surrounding definitions; the original equation image is not reproduced):
attrPred = ((1/W_1)·Ref_1 + (1/W_2)·Ref_2 + (1/W_3)·Ref_3) / (1/W_1 + 1/W_2 + 1/W_3),
where W_1, W_2, and W_3 denote the geometric distances of neighbor 1, neighbor 2, and neighbor 3 to the current point, and Ref_1, Ref_2, and Ref_3 denote the reconstructed reflectance values of neighbor 1, neighbor 2, and neighbor 3.
Alternatively, when the encoder encodes the color information of the current point to be encoded, it computes the maximum difference of the 3 neighbors on the R, G, and B components respectively, i.e., max(R1,R2,R3)-min(R1,R2,R3), max(G1,G2,G3)-min(G1,G2,G3), and max(B1,B2,B3)-min(B1,B2,B3), and then takes the largest difference among the R, G, and B components as maxDiff, i.e., maxDiff = max(max(R1,R2,R3)-min(R1,R2,R3), max(G1,G2,G3)-min(G1,G2,G3), max(B1,B2,B3)-min(B1,B2,B3)). The encoder compares maxDiff with the first threshold; if maxDiff is less than the first threshold, or less than or equal to it, the first prediction mode, i.e., the second prediction mode with index 0, is determined as the preferred prediction mode for the color information of the current point to be encoded: based on the distances between the 3 neighbor points and the current point to be encoded, the predicted color of the current point is determined by computing the weighted average of the 3 neighbors' reconstructed color values. For example, the encoder may obtain the predicted value of any of the R, G, B or Y, U, V components of the current point's color via the same inverse-distance weighted formula:
attrPred = ((1/W_1)·Ref_1 + (1/W_2)·Ref_2 + (1/W_3)·Ref_3) / (1/W_1 + 1/W_2 + 1/W_3),
where W_1, W_2, and W_3 denote the geometric distances of neighbor 1, neighbor 2, and neighbor 3 to the current point, and Ref_1, Ref_2, and Ref_3 denote the reconstructed values of a given color component of neighbor 1, neighbor 2, and neighbor 3.
It will be understood that the first threshold may be a preset threshold, and it may be set to the same value or different values when encoding reflectance information and color information. For example, the first threshold may include, but is not limited to, 64. The attribute difference may be the maximum attribute difference among the N neighbor points, the average of the attribute differences of all or some of the N neighbors, or some other computation over the attribute differences of all or some of the N neighbors, which will not be exemplified further here.
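The neighbor-difference test above can be sketched as follows (a minimal illustration; the default threshold of 64 mirrors the example value above and is not normative):

```python
def max_diff_reflectance(refls):
    # The largest pairwise absolute difference equals max - min.
    return max(refls) - min(refls)

def max_diff_color(colors):
    # colors: list of (R, G, B) tuples; take the largest
    # per-component spread across the neighbors.
    return max(max(ch) - min(ch) for ch in zip(*colors))

def needs_rdo(max_diff, threshold=64):
    # True when the neighbors disagree enough that the RDO
    # mechanism should choose the prediction mode.
    return max_diff > threshold
```

For example, reflectances (10, 14, 30) give maxDiff = 20, which stays below 64, so the weighted-average mode (index 0) would be used directly.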
When the attribute difference among the N neighbor points is greater than the first threshold, or greater than or equal to the first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to the rate-distortion optimization (RDO) mechanism.
In a possible embodiment, determining the preferred prediction mode for the attribute information of the current point to be encoded according to the RDO mechanism may include: determining K second prediction modes according to the N neighbor points of the current point to be encoded; determining the reconstruction distortion and coding rate of the attribute information of the current point to be encoded under one or more of the second prediction modes; and determining the preferred prediction mode among the K second prediction modes according to the reconstruction distortions and coding rates under the one or more second prediction modes, K being an integer greater than or equal to 2. The value of K can be adjusted adaptively according to the number of neighbors N; for example, when N is 3, K can be 2, 3, 4, and so on.
For example, still taking the case of 3 neighbor points: 4 second prediction modes (predMode) with indices 0 to 3 can be created based on the 3 neighbors, where the mode with index 0 (predMode = 0) determines, as the predicted attribute value of the current point to be encoded, the weighted average of the reconstructed attribute values of the 3 neighbors, weighted by their distances to the current point; the mode with index 1 (predMode = 1) uses the reconstructed attribute value of the nearest of the 3 neighbors as the predicted value; the mode with index 2 (predMode = 2) uses that of the second-nearest neighbor; and the mode with index 3 (predMode = 3) uses that of the third-nearest neighbor, i.e., the one among the 3 neighbors other than the nearest and second-nearest. When the preferred prediction mode for the attribute information of the current point is determined according to the RDO mechanism, the 3 second prediction modes with indices 1 to 3 can be taken as the second prediction modes analyzed in the RDO mechanism, or the 4 modes with indices 0 to 3, or fewer than 4 of indices 0 to 3 or fewer than 3 of indices 1 to 3; the combinations are of course not limited to those above, and no limitation is placed here.
In a possible embodiment, determining the preferred prediction mode among the K second prediction modes according to the reconstruction distortion and coding rate under one or more second prediction modes may include: determining, from the reconstruction distortion under a second prediction mode, the distortion parameter D of that prediction mode; determining, from the coding rate under that prediction mode, the rate parameter R of that prediction mode; and determining the cost value of that prediction mode from its distortion parameter D and rate parameter R.
For example, the cost value under a second prediction mode can be computed by the following formula:
J_indx_i = D_indx_i + λ × R_indx_i,
where J_indx_i denotes the cost value of the second prediction mode with index i, D_indx_i denotes the distortion parameter when predicting with the second prediction mode of index i, R_indx_i denotes the rate parameter when predicting with the second prediction mode of index i, and λ is a preset coefficient.
In a possible embodiment, determining the reconstruction distortion of the attribute information of the current point to be encoded under a second prediction mode may include:
determining the predicted value attrPred of the attribute information of the current point under second prediction mode i, and determining, from the predicted value attrPred and the original value attrValue, the residual attrResidual of the attribute information of the current point under second prediction mode i, i.e., attrResidual = attrValue - attrPred. The residual attrResidual is quantized to obtain the quantized residual attrResidualQuant. For example, the encoder may determine the quantized residual by the formula:
attrResidualQuant = (attrValue - attrPred) / Qstep,
where Qstep denotes the quantization step, computed from the quantization parameter (Quantization Parameter, Qp).
The quantized residual attrResidualQuant is then dequantized and combined with the predicted value attrPred to determine the reconstructed value reconAttr of the current point under second prediction mode i. For example, the encoder may compute the reconstructed value by the formula:
reconAttr = attrResidualQuant × Qstep + attrPred.
From the reconstructed value reconAttr and the original value attrValue, the reconstruction distortion D of the attribute information of the current point under second prediction mode i can be determined, i.e.:
D = reconAttr - attrValue.
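The quantize/dequantize round trip above can be sketched as follows (a minimal illustration; int() truncation mirrors the formulas as written, while real codecs typically round):

```python
def mode_distortion(attr_value, attr_pred, qstep):
    """Reconstruction distortion D = reconAttr - attrValue for one
    candidate mode, obtained by quantizing and then dequantizing the
    residual exactly as in the formulas above."""
    attr_residual_quant = int((attr_value - attr_pred) / qstep)
    recon_attr = attr_residual_quant * qstep + attr_pred
    return recon_attr - attr_value
```

With attrValue 100, attrPred 90, and Qstep 4, the residual 10 quantizes to 2, reconstructs to 98, and the distortion is -2; with Qstep 1 the round trip is lossless here.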
By traversing the K second prediction modes (i = 0 to K-1), or a subset of the K second prediction modes, the reconstruction distortion D_indx_i of the attribute information of the current point to be encoded under the K modes or the subset of modes can be determined by the above method.
In a possible embodiment, the reconstruction distortion D_indx_i under a second prediction mode, or its absolute value, can be taken as the distortion parameter D_indx_i of that second prediction mode, i.e.:
D_indx_i = reconAttr - attrValue, or
D_indx_i = |reconAttr - attrValue|.
In a possible embodiment, the square of the reconstruction distortion D_indx_i under a second prediction mode can be taken as the distortion parameter D_indx_i of that prediction mode, i.e.:
D_indx_i = (reconAttr - attrValue) × (reconAttr - attrValue).
In a possible embodiment, when the attribute information is a color component, the sum of squares of the reconstruction distortions of each color component under a second prediction mode can be taken as the distortion parameter D_indx_i of that prediction mode, i.e.:
D_indx_i = (reconAttr[R] - attrValue[R]) × (reconAttr[R] - attrValue[R]) + (reconAttr[G] - attrValue[G]) × (reconAttr[G] - attrValue[G]) + (reconAttr[B] - attrValue[B]) × (reconAttr[B] - attrValue[B]).
In a possible embodiment, the reconstruction distortion, besides being the distortion obtained from the actual encoding process, may also be some estimate, for example an estimate of the distortion based on a distortion model.
For example, the reconstructed attribute value of the current point can serve as a nearest-neighbor candidate for subsequent points, and the reconstructed value of the current point can be used to predict the attribute information of subsequent points.
By determining the distortion parameter D from the reconstruction distortion, the embodiments of this application account for the distortion introduced by the dequantization of attrResidualQuant, so that the distortion parameter D considered when computing the rate-distortion cost is closer to the actual distortion produced under each second prediction mode. This makes the RDO decision more accurate and improves the quality of predictive coding.
In a possible embodiment, the coding rate of the attribute information of the current point to be encoded under a second prediction mode may be the number of coded bits required to predictively encode the attribute information of the current point based on that prediction mode.
In a possible embodiment, the coding rate of the attribute information of the current point to be encoded under a second prediction mode may be the number of coded bits required to encode the parameter indicating that prediction mode together with the residual of the attribute information of the current point under that prediction mode.
In a possible embodiment, determining the rate parameter R of a second prediction mode from its coding rate may include:
determining the quantized residual attrResidualQuant of the attribute information of the current point to be encoded under the prediction mode; determining, under that prediction mode, the first number of coded bits required to encode the quantized residual attrResidualQuant, and the second number of coded bits required to encode the parameter indicating the prediction mode (such as its index); and determining the rate parameter R of the prediction mode from the first and second numbers of coded bits. The rate parameter R may be the sum of the first and second numbers of coded bits. Determining R from the first and second coded bits makes the rate parameter R more accurate and consistent with actual operation, improving the encoder's peak signal-to-noise ratio performance and the size of the attribute bitstream.
In a possible embodiment, determining the rate parameter R of a second prediction mode from its corresponding coding rate may include:
determining the quantized residual attrResidualQuant of the attribute information of the current point to be encoded under the prediction mode, and determining, under that prediction mode, the third number of coded bits required to jointly encode the quantized residual attrResidualQuant and the parameter indicating the prediction mode (such as its index). For example, the encoder may implement joint encoding of the prediction mode and the quantized residual as:
attrResidualEncode = fun(attrResidualQuant, predMode),
where attrResidualEncode is the encoded data of the attribute information of the current point to be encoded, from which the third number of coded bits can be determined; fun(·) is the joint encoding, attrResidualQuant is the quantized residual of the attribute information of the current point, and predMode is the index of the second prediction mode. Here fun(·) is an invertible function, i.e.:
{attrResidualQuant, predMode} = fun′(attrResidualEncode).
Based on a preset probability model, the rate parameter corresponding to the third number of coded bits is determined as the rate parameter R of the second prediction mode. For example, the encoder may proceed with the following probability model:
the information entropy of attrResidualEncode is estimated from its corresponding probability models probResGt0 and probResGt1 and taken as its rate parameter R, with the specific formula:
Figure PCTCN2022098709-appb-000004
where probResGt0 denotes the probability that attrResidualEncode is greater than 0, with initial value 0.5, and probResGt1 denotes the probability that attrResidualEncode is greater than 1, with initial value 0.5.
If attrResidualEncode = 0, the computation stops; otherwise it continues:
Figure PCTCN2022098709-appb-000005
where g(attrResidualEncode) = 2 × (|attrResidualEncode| - 1) + 1; since the sign of attrResidualEncode must also be represented, finally R = R + 1.
Once the distortion parameter D and the rate parameter R of the attribute information of the current point to be encoded under a second prediction mode are determined, the cost value of that prediction mode can, for example, be determined by the formula:
J_indx_i = D_indx_i + λ × R_indx_i,
where J_indx_i denotes the cost value of the second prediction mode with index i, D_indx_i denotes the distortion parameter when predicting with the second prediction mode of index i, R_indx_i denotes the rate parameter when predicting with the second prediction mode of index i, and λ is a preset coefficient.
The cost value can be used to compare the merits of the second prediction modes: the smaller the cost value, the better the coding performance of the second prediction mode. Considering the distortion parameter D and the rate parameter R together to determine the cost value allows reconstruction quality and coding efficiency to be weighed jointly, so that the optimal predictive coding mode is selected adaptively for the points of the point cloud through the RDO mechanism, further improving the coding performance of the encoder.
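Mode selection under this cost then reduces to an argmin over J = D + λ × R; a minimal sketch (the data layout is an assumption for illustration):

```python
def rdo_select(modes_dr, lam):
    """Return the index of the candidate mode with the smallest
    cost J = D + lam * R.

    modes_dr: list of (D, R) pairs, one per candidate prediction mode.
    """
    costs = [d + lam * r for d, r in modes_dr]
    return min(range(len(costs)), key=costs.__getitem__)
```

Note how the chosen mode shifts with λ: a small λ favors low distortion, while a large λ favors a low rate.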
In a possible embodiment, the value of λ may have some functional relation with the quantization parameter or the quantization step.
For example, λ may be determined by the formula:
λ = α × Qstep × Qstep,
where Qstep is the quantization step used to quantize the residual of the attribute information of the current point to be encoded under a second prediction mode, computed from the quantization parameter (Quantization Parameter, Qp), and α is a preset value. For example, α includes, but is not limited to, 0.55.
As another example, λ may be determined by the formula:
λ = β × Qstep,
where Qstep is the quantization step used to quantize the residual of the attribute information of the current point to be encoded under a second prediction mode, computed from the quantization parameter (Quantization Parameter, Qp), and β is a preset value. For example, β includes, but is not limited to, 0.11 or 0.26, or any value in [0.01, 1].
As another example, λ may also be determined as follows:
different quantization parameters Qp may correspond to one or more values of λ. For instance, the point cloud standard test condition C1 (a near-lossy geometry, lossless attribute test condition) corresponds to 6 QPs (48, 40, 32, 24, 16, 8), and each of these 6 QPs may correspond to one value of λ. As another instance, the point cloud standard test condition CY (a lossy geometry, near-lossless attribute test condition) corresponds to 5 QPs (10, 16, 22, 28, 34), and each of these 5 QPs may correspond to one value of λ. The performance of the one or more λ values corresponding to a Qp under test, among the one or more quantization parameters Qp, is measured; the best-performing of the one or more λ values of the Qp under test is determined and set as the optimal λ for that Qp; and the final λ is determined from the one or more optimal λ values corresponding to the one or more determined quantization parameters Qp.
In a possible embodiment, λ may be a parameter obtained through extensive testing; optionally, the test range of λ is roughly 0.0 to 4.0, i.e., λ may take values in 0.0 to 4.0. Of course, the test range of λ may also be another numeric range, which this application does not specifically limit.
Optionally, different sequence types may correspond to different values of λ. Optionally, λ may be a parameter obtained through extensive testing; for example, λ may be a parameter obtained by testing under the sequence type of the current point cloud; in other words, the corresponding λ value can be obtained by testing or training with point clouds of different sequence types.
Optionally, different attribute information types may correspond to different values of λ. Optionally, λ may be a parameter obtained through extensive testing; for example, λ values corresponding to different attribute information categories may be obtained by testing separately under the color attribute and the reflectance attribute of the current point cloud.
Optionally, different color components of the current point may correspond to different values of λ. Optionally, λ may be a parameter obtained through extensive testing; for example, λ may be a parameter obtained by testing under the current component of the current point. In other words, the corresponding λ value can be obtained by testing or training with different components of the points in the point cloud. Optionally, the current component may be the component of the current point to be encoded. For example, the λ values corresponding to the V, U, and Y components may be the same or different; likewise, the λ values corresponding to the R, G, and B components may be the same or different. Optionally, the encoding solution provided by this application may apply only to some components of the current point, or to all of its components, which this application does not further limit.
In a possible embodiment, S250 may include:
If the index corresponding to the preferred prediction mode of the current point to be encoded is 0, the index of that prediction mode does not need to be encoded in the bitstream; if the index of the preferred prediction mode selected through the RDO mechanism is 1, 2, or 3, the index of the selected preferred prediction mode needs to be encoded in the bitstream, i.e., the index of the selected preferred prediction mode needs to be encoded into the attribute information bitstream together with the residual.
Figure 13 is a schematic block diagram of a decoding framework 300 provided by an embodiment of this application.
The decoding framework 300 can obtain the bitstream of the point cloud from the encoding device and parse the bitstream to obtain the geometry information and attribute information of the points in the point cloud. Decoding a point cloud includes geometry decoding and attribute decoding. The geometry decoding process includes: arithmetic-decoding the geometry bitstream; building and merging an octree, and reconstructing the geometry information of the points to obtain the reconstructed geometry information; and applying an inverse coordinate transform to the reconstructed geometry information to obtain the geometry information of the points. The geometry information of a point may also be called its position information. The attribute decoding process includes: parsing the attribute bitstream to determine the residual values of the attribute information of the points in the point cloud; dequantizing the attribute residual values to obtain the dequantized residual values; selecting, based on the reconstructed geometry information obtained during geometry decoding, one of multiple prediction modes to perform point cloud prediction and obtain the reconstructed attribute values of the points; and applying an inverse color transform to the reconstructed attribute values to obtain the decoded point cloud.
As shown in Figure 13, geometry decoding may be implemented by the following units: a first arithmetic decoding unit 301, an octree synthesis (synthesize octree) unit 302, a surface fitting (Analyze surface approximation) unit 303, a geometry reconstruction (Reconstruct geometry) unit 304, and an inverse coordinate transform (inverse transform coordinates) unit 305. Attribute decoding may be implemented by the following units: a second arithmetic decoding unit 310, an inverse quantize (inverse quantize) unit 311, a RAHT inverse transform (RAHT reverse transform) unit 312, a generate LOD (Generate LOD) unit 313, a lifting inverse transform (lifting reverse transform) unit 314, and an inverse color transform (inverse transform colors) unit 315.
It should be noted that decompression is the inverse process of compression; similarly, the functions of the units in the decoding framework 300 can be understood with reference to the corresponding units in the encoding framework 100. For example, the decoding framework 300 can divide the point cloud into multiple LODs according to the Euclidean distances between points; then decode the attribute information of the points in the LODs in turn; then dequantize the decoded residual values and add the dequantized residuals to the predicted value of the current point to obtain the reconstructed value of the point cloud, until all points in the point cloud have been decoded. The current point will serve as the nearest-neighbor candidate for subsequent points in the LODs, and its reconstructed value is used to predict the attribute information of subsequent points.
Figure 14 is a schematic flowchart of a decoding method provided by an embodiment of this application.
As shown in Figure 14, the decoder obtains the prediction mode parameter and quantized residual value of a point by decoding the attribute bitstream. Dequantizing and inverse-transforming the quantized residual value attrResidualQuant yields the residual value attrResidual of the first point; the predicted attribute value of the point can be determined from the prediction mode parameter, and the decoder can obtain the reconstructed attribute value reconAttr of the first point from its residual value attrResidual and predicted attribute value attrPred. The reconstructed attribute value reconAttr of the first point can serve as a nearest-neighbor candidate for subsequent points. The decoder then parses the quantized residual value of the second point from the attribute bitstream, dequantizes and inverse-transforms it, and adds the result to the predicted attribute value of the second point to obtain its reconstructed attribute value, and so on, until the last point of the point cloud is decoded.
For example, the decoder may dequantize the quantized residual value attrResidualQuant of the current point by the following formula to obtain the residual value of the current point:
attrResidual = attrResidualQuant × Qstep,
where attrResidual denotes the residual value of the current point, attrResidualQuant denotes the quantized residual value of the current point, and Qstep denotes the quantization step, computed from the quantization parameter (Quantization Parameter, Qp).
For example, the decoder may obtain the reconstructed attribute value of the current point by the formula:
reconAttr = attrResidual + attrPred,
where reconAttr denotes the reconstructed attribute value of the current point determined from its quantized residual value, attrResidual denotes the residual value of the current point, and attrPred denotes the predicted attribute value of the current point.
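The decoder-side reconstruction of the two formulas above can be sketched in a few lines:

```python
def decode_attribute(attr_residual_quant, attr_pred, qstep):
    """Dequantize the parsed residual and add the prediction,
    mirroring attrResidual = attrResidualQuant * Qstep and
    reconAttr = attrResidual + attrPred."""
    attr_residual = attr_residual_quant * qstep
    return attr_residual + attr_pred
```

This is the exact inverse of the encoder's quantization round trip: a quantized residual of 2 with Qstep 4 and prediction 90 reconstructs to 98.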
The preferred embodiments of this application have been described in detail above with reference to the accompanying drawings, but this application is not limited to the specific details of the above embodiments. Within the scope of the technical concept of this application, various simple variations of the technical solutions of this application are possible, and all such simple variations fall within the protection scope of this application. For example, the specific technical features described in the above embodiments may, where not contradictory, be combined in any suitable manner; to avoid unnecessary repetition, the possible combinations are not described separately. As another example, the various embodiments of this application may also be combined arbitrarily, and as long as they do not depart from the idea of this application, such combinations should equally be regarded as content disclosed by this application. It should also be understood that, in the method embodiments of this application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
After the solution provided by this application was implemented on the G-PCC reference software TMC13 V12.0, sequences containing the reflectance attribute from the Moving Picture Experts Group (MPEG) were tested under the CTC CY test condition. The test results are shown in Table 2: for most test sequences, BD-AttrRate is negative, indicating that the solution provided by this application can improve coding performance.
Table 2. Performance comparison between the solution of this application and TMC13 V12.0
Figure PCTCN2022098709-appb-000006
Figure 15 is a schematic block diagram of a point cloud encoding apparatus 400 provided by an embodiment of this application.
As shown in Figure 15, the point cloud encoding apparatus 400 may include:
a determining unit 410, configured to determine, based on geometry information of the current point cloud, the encoding order of the attribute information of the current point cloud; and
to determine, based on N neighbor points of the current point to be encoded in the encoding order, the preferred prediction mode for the attribute information of the current point to be encoded, N being an integer greater than or equal to 1; wherein,
when the difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to the rate-distortion optimization (RDO) mechanism;
a prediction unit 420, configured to predict the attribute information of the current point to be encoded according to the preferred prediction mode, to obtain a predicted value of the attribute information of the current point to be encoded;
the determining unit 410 being further configured to determine a residual of the attribute information of the current point to be encoded according to the predicted value of the attribute information of the current point to be encoded; and
an encoding unit 430, configured to perform encoding according to the preferred prediction mode and the residual to obtain encoded bits, and to write the encoded bits into a bitstream.
It should be understood that the apparatus embodiments correspond to the method embodiments, and similar descriptions can refer to the method embodiments; to avoid repetition, they are not repeated here. It should also be understood that the units of the encoding apparatus 400 in the embodiments of this application may be separately or wholly merged into one or several other units, or one (or some) of them may be further split into multiple functionally smaller units; this can achieve the same operation without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions; in practical applications, the function of one unit may also be realized by multiple units, or the functions of multiple units realized by one unit. In other embodiments of this application, the encoding apparatus 400 may also include other units; in practical applications, these functions may also be realized with the assistance of other units, and may be realized cooperatively by multiple units. According to another embodiment of this application, the encoding apparatus 400 of the embodiments of this application can be constructed, and the encoding method provided by the embodiments of this application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method on a general-purpose computing device comprising a general-purpose computer with processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, loaded through the computer-readable storage medium into any electronic device with data-processing capability, and run therein to implement the corresponding method of the embodiments of this application.
In other words, the units mentioned above may be implemented in hardware form, by instructions in software form, or by a combination of hardware and software. Specifically, the steps of the method embodiments of this application may be completed by integrated logic circuits of hardware in the processor and/or instructions in software form; the steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed to completion by a hardware decoding processor, or executed to completion by a combination of hardware and software in a decoding processor. Optionally, the software may reside in a mature storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
Figure 16 is a schematic structural diagram of an encoding and decoding device 500 provided by an embodiment of this application.
As shown in Figure 16, the encoding and decoding device 500 includes at least a processor 510 and a computer-readable storage medium 520. The processor 510 and the computer-readable storage medium 520 may be connected by a bus or in other ways. The computer-readable storage medium 520 is used to store a computer program 521, the computer program 521 including computer instructions, and the processor 510 is used to execute the computer instructions stored in the computer-readable storage medium 520. The processor 510 is the computing core and control core of the encoding and decoding device 500; it is adapted to implement one or more computer instructions, and specifically adapted to load and execute one or more computer instructions to realize the corresponding method flow or corresponding functions.
As an example, the processor 510 may also be called a central processing unit (Central Processing Unit, CPU). The processor 510 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
As an example, the computer-readable storage medium 520 may be a high-speed RAM memory or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located far away from the aforementioned processor 510. Specifically, the computer-readable storage medium 520 includes, but is not limited to, volatile memory and/or non-volatile memory. The non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (Static RAM, SRAM), dynamic RAM (Dynamic RAM, DRAM), synchronous DRAM (Synchronous DRAM, SDRAM), double data rate SDRAM (Double Data Rate SDRAM, DDR SDRAM), enhanced SDRAM (Enhanced SDRAM, ESDRAM), synch link DRAM (synch link DRAM, SLDRAM), and direct Rambus RAM (Direct Rambus RAM, DR RAM).
In one implementation, the encoding and decoding device 500 may be the encoding framework shown in Figure 4 or the encoding apparatus 400 shown in Figure 15; the computer-readable storage medium 520 stores first computer instructions, which are loaded and executed by the processor 510 to implement the corresponding steps in the encoding method provided by the embodiments of this application; to avoid repetition, they are not described again here.
According to another aspect of this application, an embodiment of this application also provides a computer-readable storage medium (Memory), which is a memory device in the encoding and decoding device 500 used to store programs and data, for example the computer-readable storage medium 520. It can be understood that the computer-readable storage medium 520 here may include both a built-in storage medium of the encoding and decoding device 500 and, of course, an extended storage medium supported by the encoding and decoding device 500. The computer-readable storage medium provides storage space that stores the operating system of the encoding and decoding device 500. In addition, one or more computer instructions suitable for being loaded and executed by the processor 510, which may be one or more computer programs 521 (including program code), are also stored in this storage space. These computer instructions are used to cause a computer to execute the encoding methods provided in the various optional manners above.
According to another aspect of this application, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium, for example the computer program 521. In this case, the encoding and decoding device 500 may be a computer; the processor 510 reads the computer instructions from the computer-readable storage medium 520 and executes them, causing the computer to execute the encoding methods provided in the various optional manners above.
In other words, when implemented in software, the above may be realized wholly or partly in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows of the embodiments of this application are run, or their functions realized, wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wireless (such as infrared, wireless, microwave, etc.) means.
A person of ordinary skill in the art will appreciate that the units and process steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of this application.
Finally, it should be noted that the above is only the specific implementation of this application, but the protection scope of this application is not limited thereto. Any person familiar with this technical field can easily think of changes or substitutions within the technical scope disclosed by this application, and they should all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (25)

  1. A point cloud encoding method, characterized by comprising:
    determining, based on geometry information of a current point cloud, an encoding order of attribute information of the current point cloud;
    determining, based on N neighbor points of a current point to be encoded in the encoding order, a preferred prediction mode for the attribute information of the current point to be encoded, N being an integer greater than or equal to 1; wherein,
    when a difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to a rate-distortion optimization (RDO) mechanism;
    predicting the attribute information of the current point to be encoded according to the preferred prediction mode, to obtain a predicted value of the attribute information of the current point to be encoded;
    determining a residual of the attribute information of the current point to be encoded according to the predicted value of the attribute information of the current point to be encoded; and
    performing encoding according to the preferred prediction mode and the residual to obtain encoded bits, and writing the encoded bits into a bitstream.
  2. The method according to claim 1, characterized in that the determining, based on the N neighbor points of the current point to be encoded in the encoding order, the preferred prediction mode for the attribute information of the current point to be encoded further comprises:
    when the difference in attribute information among the N neighbor points is less than the first threshold, determining a first prediction mode as the preferred prediction mode for the attribute information of the current point to be encoded.
  3. The method according to claim 2, characterized in that predicting the attribute information of the current point to be encoded according to the first prediction mode to obtain the predicted value of the attribute information of the current point to be encoded comprises:
    determining the predicted value of the attribute information of the current point to be encoded according to a weighted average of the attribute information of the N neighbor points.
  4. The method according to claim 1, characterized in that determining the preferred prediction mode for the attribute information of the current point to be encoded according to the rate-distortion optimization (RDO) mechanism comprises:
    determining K second prediction modes according to the N neighbor points of the current point to be encoded;
    determining a reconstruction distortion and a coding rate of the attribute information of the current point to be encoded under one or more of the second prediction modes, K being an integer greater than or equal to 2; and
    determining, according to the reconstruction distortion and the coding rate of the attribute information of the current point to be encoded under the one or more second prediction modes, the preferred prediction mode among the K second prediction modes, for predicting the attribute information of the current point to be encoded.
  5. The method according to claim 4, characterized in that determining the preferred prediction mode among the K second prediction modes according to the reconstruction distortion and the coding rate of the attribute information of the current point to be encoded under the one or more second prediction modes comprises:
    determining, according to a reconstruction distortion of a second prediction mode, a distortion parameter D of the second prediction mode;
    determining, according to a coding rate of the second prediction mode, a rate parameter R of the second prediction mode;
    determining a cost value of the second prediction mode according to the distortion parameter D and the rate parameter R of the second prediction mode; and
    determining the second prediction mode with the smallest cost value among the K second prediction modes as the preferred prediction mode.
  6. The method according to claim 5, characterized in that determining, according to the reconstruction distortion of a second prediction mode, the distortion parameter D of the second prediction mode comprises:
    determining a reconstructed value of the attribute information of the current point to be encoded under a second prediction mode;
    determining, according to the reconstructed value and the original value of the attribute information of the current point to be encoded under the second prediction mode, the reconstruction distortion of the attribute information of the current point to be encoded under the second prediction mode; and
    determining the distortion parameter D of the second prediction mode according to the reconstruction distortion of the attribute information of the current point to be encoded under the second prediction mode.
  7. The method according to claim 6, characterized in that determining the reconstructed value of the attribute information of the current point to be encoded under a second prediction mode comprises:
    determining a predicted value of the attribute information of the current point to be encoded under a second prediction mode;
    determining, according to the predicted value and the original value of the attribute information of the current point to be encoded under the second prediction mode, a residual of the attribute information of the current point to be encoded under the second prediction mode;
    quantizing and dequantizing the residual; and
    determining, according to the dequantized residual and the predicted value, the reconstructed value of the attribute information of the current point to be encoded under the second prediction mode.
  8. The method according to any one of claims 5 to 7, characterized in that determining, according to the reconstruction distortion of a second prediction mode, the distortion parameter D of the second prediction mode comprises:
    determining the reconstruction distortion of the attribute information of the current point to be encoded under a second prediction mode as the distortion parameter D of the second prediction mode; or,
    determining the square of the reconstruction distortion of the attribute information of the current point to be encoded under a second prediction mode as the distortion parameter D of the second prediction mode.
  9. The method according to claim 4, characterized in that the reconstruction distortion of the attribute information of the current point to be encoded under a second prediction mode is an estimate, based on a distortion model, of the distortion of predictively encoding the attribute information of the current point to be encoded using the second prediction mode.
  10. The method according to claim 4, characterized in that the coding rate of the attribute information of the current point to be encoded under a second prediction mode is the number of coded bits required to predictively encode the attribute information of the current point to be encoded based on the second prediction mode.
  11. The method according to claim 4, characterized in that the coding rate of the attribute information of the current point to be encoded under a second prediction mode is the number of coded bits required to encode a parameter indicating the second prediction mode and the residual of the attribute information of the current point to be encoded under the second prediction mode.
  12. The method according to claim 5, characterized in that determining, according to the coding rate of the second prediction mode, the rate parameter R of the second prediction mode comprises:
    determining a quantized value of the residual of the attribute information of the current point to be encoded under the second prediction mode;
    determining, under the second prediction mode, a first number of coded bits required to encode the quantized value of the residual, and a second number of coded bits required to encode a parameter indicating the second prediction mode; and
    determining the rate parameter R of the second prediction mode according to the first number of coded bits and the second number of coded bits.
  13. The method according to claim 5, characterized in that determining, according to the coding rate of the second prediction mode, the rate parameter R of the second prediction mode comprises:
    determining a quantized value of the residual of the attribute information of the current point to be encoded under the second prediction mode;
    determining, under the second prediction mode, a third number of coded bits required to jointly encode the quantized value of the residual and a parameter indicating the second prediction mode; and
    determining, based on a preset probability model, the rate parameter corresponding to the third number of coded bits as the rate parameter R corresponding to the second prediction mode.
  14. The method according to claim 5, characterized in that determining the cost value of the second prediction mode according to the distortion parameter D and the rate parameter R of the second prediction mode comprises:
    determining the cost value based on the following formula:
    J = D + λ × R,
    where J is the cost value, D is the distortion parameter, R is the rate parameter, and λ is a preset coefficient.
  15. The method according to claim 14, characterized in that λ is determined based on the following formula:
    λ = α × Qstep × Qstep,
    where Qstep is the quantization step used to quantize the residual of the attribute information of the current point to be encoded under the second prediction mode, and α is a preset value.
  16. The method according to claim 15, characterized in that α is 0.55.
  17. The method according to claim 14, characterized in that λ is determined based on the following formula:
    λ = β × Qstep,
    where Qstep is the quantization step used to quantize the residual of the attribute information of the current point to be encoded under the second prediction mode, and β is a preset value.
  18. The method according to claim 17, characterized in that the value range of β is [0.01, 1].
  19. The method according to claim 17, characterized in that β is 0.11 or 0.26.
  20. The method according to claim 14, characterized in that λ is determined as follows:
    measuring the performance of one or more λ values corresponding to a Qp under test among one or more quantization parameters Qp;
    determining the best-performing λ value among the one or more λ values of the Qp under test, and setting it as the optimal λ of the Qp under test; and
    determining the final λ according to the one or more optimal λ values corresponding to the determined one or more quantization parameters Qp.
  21. The method according to any one of claims 1 to 20, characterized in that the attribute information is reflectance.
  22. A point cloud encoding apparatus, characterized by comprising:
    a determining unit, configured to determine, based on geometry information of a current point cloud, an encoding order of attribute information of the current point cloud; and
    to determine, based on N neighbor points of a current point to be encoded in the encoding order, a preferred prediction mode for the attribute information of the current point to be encoded, N being an integer greater than or equal to 1; wherein,
    when a difference in attribute information among the N neighbor points is greater than a first threshold, the preferred prediction mode for the attribute information of the current point to be encoded is determined according to a rate-distortion optimization (RDO) mechanism;
    a prediction unit, configured to predict the attribute information of the current point to be encoded according to the preferred prediction mode, to obtain a predicted value of the attribute information of the current point to be encoded;
    the determining unit being further configured to determine a residual of the attribute information of the current point to be encoded according to the predicted value of the attribute information of the current point to be encoded; and
    an encoding unit, configured to perform encoding according to the preferred prediction mode and the residual to obtain encoded bits, and to write the encoded bits into a bitstream.
  23. An encoding device, characterized by comprising:
    a processor adapted to execute a computer program; and
    a computer-readable storage medium storing a computer program which, when executed by the processor, implements the encoding method according to any one of claims 1 to 21.
  24. A computer-readable storage medium, characterized by being used to store a computer program, the computer program causing a computer to execute the encoding method according to any one of claims 1 to 21.
  25. A bitstream, characterized in that the bitstream is generated by the method according to any one of claims 1 to 21.
PCT/CN2022/098709 2022-06-14 2022-06-14 Point cloud encoding method, encoding apparatus, encoding device, and storage medium WO2023240455A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/098709 WO2023240455A1 (zh) 2022-06-14 2022-06-14 Point cloud encoding method, encoding apparatus, encoding device, and storage medium
TW112121142A TW202404363A (zh) 2022-06-14 2023-06-06 Point cloud encoding method, encoding apparatus, encoding device, storage medium, and bitstream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/098709 WO2023240455A1 (zh) 2022-06-14 2022-06-14 Point cloud encoding method, encoding apparatus, encoding device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023240455A1 true WO2023240455A1 (zh) 2023-12-21

Family

ID=89192978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098709 WO2023240455A1 (zh) 2022-06-14 2022-06-14 Point cloud encoding method, encoding apparatus, encoding device, and storage medium

Country Status (2)

Country Link
TW (1) TW202404363A (zh)
WO (1) WO2023240455A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200111236A1 (en) * 2018-10-03 2020-04-09 Apple Inc. Point cloud compression using fixed-point numbers
CN111405281A (zh) * 2020-03-30 2020-07-10 Peking University Shenzhen Graduate School Encoding method and decoding method for point cloud attribute information, storage medium, and terminal device
CN113454691A (zh) * 2019-03-26 2021-09-28 Tencent America LLC Method and apparatus for adaptive point cloud attribute coding and decoding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200111236A1 (en) * 2018-10-03 2020-04-09 Apple Inc. Point cloud compression using fixed-point numbers
CN113454691A (zh) * 2019-03-26 2021-09-28 Tencent America LLC Method and apparatus for adaptive point cloud attribute coding and decoding
CN111405281A (zh) * 2020-03-30 2020-07-10 Peking University Shenzhen Graduate School Encoding method and decoding method for point cloud attribute information, storage medium, and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WG 7: "G-PCC codec description", INTERNATIONAL ORGANIZATION FOR STANDARDIZATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC 1/SC 29/WG 7 MPEG 3D GRAPHICS CODING. ISO/IEC JTC 1/SC 29/WG 7 N0151, no. N0151, 1 July 2021 (2021-07-01), XP093056421 *

Also Published As

Publication number Publication date
TW202404363A (zh) 2024-01-16

Similar Documents

Publication Publication Date Title
WO2023130333A1 (zh) Encoding and decoding method, encoder, decoder, and storage medium
US20230237707A1 (en) Point cloud encoding and decoding method, encoder, decoder and codec system
WO2022062369A1 (zh) Point cloud encoding and decoding method and system, and point cloud encoder and point cloud decoder
WO2022133753A1 (zh) Point cloud encoding and decoding method and system, and point cloud encoder and point cloud decoder
US20230351639A1 (en) Point cloud encoding and decoding method, encoder and decoder
US20230237704A1 (en) Point cloud decoding and encoding method, and decoder, encoder and encoding and decoding system
WO2023240455A1 (zh) Point cloud encoding method, encoding apparatus, encoding device, and storage medium
WO2023015530A1 (zh) Point cloud encoding and decoding method, encoder, decoder, and computer-readable storage medium
WO2023159428A1 (zh) Encoding method, encoder, and storage medium
WO2023197337A1 (zh) Index determination method and apparatus, decoder, and encoder
WO2023197338A1 (zh) Index determination method and apparatus, decoder, and encoder
WO2023023918A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2022257155A1 (zh) Decoding method, encoding method, decoder, encoder, and encoding and decoding device
WO2022133752A1 (zh) Point cloud encoding method, decoding method, encoder, and decoder
WO2023123284A1 (zh) Decoding method, encoding method, decoder, encoder, and storage medium
WO2023240660A1 (zh) Decoding method, encoding method, decoder, and encoder
WO2022217472A1 (zh) Point cloud encoding and decoding method, encoder, decoder, and computer-readable storage medium
WO2022188582A1 (zh) Method and apparatus for selecting neighbor points in a point cloud, and codec
WO2024077548A1 (zh) Point cloud decoding method, point cloud encoding method, decoder, and encoder
WO2024065272A1 (zh) Point cloud encoding and decoding method, apparatus, device, and storage medium
WO2022140937A1 (zh) Point cloud encoding and decoding method and system, and point cloud encoder and point cloud decoder
WO2022133755A1 (zh) Point cloud decoding method, encoding method, decoder, and encoder
WO2023240662A1 (zh) Encoding and decoding method, encoder, decoder, and storage medium
WO2022257528A1 (zh) Point cloud attribute prediction method and apparatus, and related device
CN118075464A (zh) Point cloud attribute prediction method and apparatus, and codec

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22946154

Country of ref document: EP

Kind code of ref document: A1