WO2023197338A1 - Index determination method, apparatus, decoder and encoder - Google Patents

Index determination method, apparatus, decoder and encoder

Info

Publication number
WO2023197338A1
WO2023197338A1 (PCT/CN2022/087244)
Authority
WO
WIPO (PCT)
Prior art keywords
index
node
current node
value
current
Prior art date
Application number
PCT/CN2022/087244
Other languages
English (en)
French (fr)
Inventor
杨付正
李明
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Priority to PCT/CN2022/087244
Publication of WO2023197338A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the embodiments of the present application relate to the field of coding and decoding technology, and more specifically, to an index determination method, device, decoder, and encoder.
  • Point cloud has begun to spread into various fields, such as virtual/augmented reality, robotics, geographic information systems, medical fields, etc.
  • a large number of points on the surface of objects can be accurately obtained, often corresponding to hundreds of thousands of points in a single scene.
  • Such a large number of points also poses challenges for computer storage and transmission. Therefore, point cloud compression has become a hot research topic.
  • For point cloud compression, it is mainly necessary to compress the position information and the attribute information of the point cloud. Specifically, the encoder first performs octree division on the position information of the point cloud to obtain the divided nodes, and then performs arithmetic coding on the current node to be encoded to obtain the geometric code stream; at the same time, after the octree division of the point cloud, the encoder selects, from the already encoded points and according to the position information of the current point, the points used to predict the attribute information of the current point, predicts the attribute information of the current point based on the selected points, compares the prediction with the original value of the attribute information, and encodes the attribute information in different ways to obtain the attribute code stream of the point cloud.
  • For the geometric information, the encoder can use the spatial correlation between the current node to be encoded and its surrounding nodes to perform intra prediction on the placeholder bits to obtain the index of the current node, and perform arithmetic coding based on the index of the current node, so as to implement Context-based Adaptive Binary Arithmetic Coding (CABAC) based on the context model and obtain the geometric code stream.
  • Embodiments of the present application provide an index determination method, device, decoder, and encoder, which can improve the accuracy of the index for the current node, thereby improving decoding performance.
  • this application provides an index determination method, including:
  • the first index of the current node is determined based on the occupied child nodes of at least one decoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • this application provides an index determination method, including:
  • the first index of the current node is determined based on the occupied child nodes of at least one encoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • this application provides an index determination device, including:
  • a determining unit configured to determine a first index of the current node based on the occupied child nodes of at least one decoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • this application provides an index determination device, including:
  • a determining unit configured to determine a first index of the current node based on the occupied child nodes of at least one encoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • this application provides a decoder, including:
  • a processor adapted to implement computer instructions
  • the computer-readable storage medium stores computer instructions, and the computer instructions are suitable for the processor to load and execute the decoding method in the above-mentioned first aspect or its respective implementations.
  • In a specific implementation, there are one or more processors and one or more memories.
  • the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
  • this application provides an encoder, including:
  • a processor adapted to implement computer instructions
  • the computer-readable storage medium stores computer instructions, and the computer instructions are suitable for the processor to load and execute the encoding method in the above-mentioned second aspect or its respective implementations.
  • In a specific implementation, there are one or more processors and one or more memories.
  • the computer-readable storage medium may be integrated with the processor, or the computer-readable storage medium may be provided separately from the processor.
  • the present application provides a computer-readable storage medium that stores computer instructions.
  • When the computer instructions are read and executed by a processor of a computer device, the computer device performs the method in the above-mentioned aspects.
  • the present application provides a code stream, which is the code stream involved in the first aspect or the code stream involved in the second aspect.
  • this application determines the first index of the current node based on the occupied child nodes of at least one encoded or decoded neighbor node of the current node on the plane perpendicular to the k-th axis, which makes better and more detailed use of the spatial correlation between the current node and the at least one neighbor node to predict the first index of the current node; this improves the accuracy of the first index and thereby improves coding and decoding performance.
  • Figure 1 is an example of a point cloud image provided by an embodiment of this application.
  • Figure 2 is a partial enlarged view of the point cloud image shown in Figure 1.
  • Figure 3 is an example of a point cloud image with six viewing angles provided by an embodiment of the present application.
  • Figure 4 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • Figure 5 is an example of a bounding box provided by an embodiment of the present application.
  • Figure 6 is an example of octree division of bounding boxes provided by the embodiment of the present application.
  • Figures 7 to 9 show the arrangement sequence of Morton codes in two-dimensional space.
  • Figure 10 shows the arrangement order of Morton codes in three-dimensional space.
  • Figure 11 is a schematic block diagram of the LOD layer provided by an embodiment of the present application.
  • Figure 12 is a schematic block diagram of a decoding framework provided by an embodiment of the present application.
  • Figure 13 is a schematic flow chart of an index determination method provided by an embodiment of the present application.
  • Figure 14 is an example of occupied child nodes of neighbor nodes on the S axis provided by the embodiment of the present application.
  • Figure 15 is another schematic flow chart of the index determination method provided by the embodiment of the present application.
  • Figure 16 is another schematic flow chart of the index determination method provided by the embodiment of the present application.
  • Figure 17 is a schematic block diagram of an index determination device provided by an embodiment of the present application.
  • Figure 18 is another schematic block diagram of an index determination device provided by an embodiment of the present application.
  • Figure 19 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • Point Cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene.
  • Figures 1 and 2 show three-dimensional point cloud images and local enlargements respectively. It can be seen that the point cloud surface is composed of densely distributed points.
  • Unlike a two-dimensional image, in which information is expressed at every pixel so that no additional position information needs to be recorded, the points of a point cloud are distributed randomly and irregularly in three-dimensional space, so the position of each point in space must be recorded before the point cloud can be completely expressed. Similar to two-dimensional images, each point in the point cloud has corresponding attribute information, usually an RGB color value that reflects the color of the object; for point clouds, the attribute information of each point may, in addition to color, also be a reflectance value, which reflects the surface material of the object. Each point in the point cloud may include geometric information and attribute information. The geometric information of each point in the point cloud refers to the Cartesian three-dimensional coordinate data of the point.
  • the attribute information of each point in the point cloud may include but is not limited to At least one of the following: color information, material information, laser reflection intensity information.
  • Color information can be information in any color space.
  • the color information may be Red Green Blue (RGB) information.
  • the color information may also be brightness and chromaticity (YCbCr, YUV) information.
  • Y represents brightness (Luma)
  • Cb(U) represents the blue chromaticity component
  • Cr(V) represents the red chromaticity component.
  • Each point in the point cloud has the same amount of attribute information.
  • For example, each point in the point cloud may have two types of attribute information: color information and laser reflection intensity;
  • or each point in the point cloud may have three types of attribute information: color information, material information and laser reflection intensity information.
  • a point cloud image can have multiple viewing angles.
  • the point cloud image as shown in Figure 3 can have six viewing angles.
  • the data storage format corresponding to the point cloud image consists of a file header information part and a data part.
  • The header information includes the data format, the data representation type, the total number of points in the point cloud, and the content represented by the point cloud.
  • Point clouds can flexibly and conveniently express the spatial structure and surface properties of three-dimensional objects or scenes, and because point clouds are obtained by directly sampling real objects, they can provide a strong sense of reality while ensuring accuracy. They are therefore widely used, in fields including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, and three-dimensional reconstruction of biological tissues and organs.
  • point clouds can be divided into two categories based on application scenarios, namely, machine-perceived point clouds and human-eye-perceived point clouds.
  • the application scenarios of machine-perceived point cloud include but are not limited to: autonomous navigation system, real-time inspection system, geographical information system, visual sorting robot, rescue and disaster relief robot and other point cloud application scenarios.
  • the application scenarios of point clouds perceived by the human eye include but are not limited to: digital cultural heritage, free viewpoint broadcasting, three-dimensional immersive communication, three-dimensional immersive interaction and other point cloud application scenarios.
  • The point cloud can be divided into dense point clouds and sparse point clouds according to the way the point cloud is acquired; the point cloud can also be divided into static point clouds and dynamic point clouds according to whether the point cloud changes over time.
  • More specifically, according to the usage scenario, point clouds can be divided into three types, namely the first type of static point cloud, the second type of dynamic point cloud, and the third type of dynamically acquired point cloud.
  • For the first type of static point cloud, the object is stationary, and the device acquiring the point cloud is also stationary;
  • for the second type of dynamic point cloud, the object is moving, but the device acquiring the point cloud is stationary;
  • for the third type of dynamically acquired point cloud, the device acquiring the point cloud is moving.
  • point cloud collection methods include but are not limited to: computer generation, 3D laser scanning, 3D photogrammetry, etc.
  • Computers can generate point clouds of virtual three-dimensional objects and scenes;
  • 3D laser scanning can obtain point clouds of static real-world three-dimensional objects or scenes, and can acquire millions of points per second;
  • 3D photogrammetry can obtain point clouds of dynamic real-world three-dimensional objects or scenes, and can acquire tens of millions of points per second.
  • point clouds on the surface of objects can be collected through collection equipment such as photoelectric radar, lidar, laser scanners, and multi-view cameras.
  • the point cloud obtained according to the principle of laser measurement can include the three-dimensional coordinate information of the point and the laser reflection intensity (reflectance) of the point.
  • the point cloud obtained according to the principle of photogrammetry may include the three-dimensional coordinate information of the point and the color information of the point.
  • the point cloud is obtained by combining the principles of laser measurement and photogrammetry, which may include the three-dimensional coordinate information of the point, the laser reflection intensity (reflectance) of the point, and the color information of the point.
  • These technologies reduce the cost and time period of point cloud data acquisition and improve the accuracy of the data.
  • point clouds of biological tissues and organs can be obtained using magnetic resonance imaging (MRI), computed tomography (CT), and electromagnetic positioning information.
  • These technologies reduce the cost and time period of point cloud acquisition and improve the accuracy of the data. Changes in the way point cloud data is obtained have made it possible to acquire large amounts of point cloud data. With the growth of application requirements, the processing of massive 3D point cloud data has encountered bottlenecks limited by storage space and transmission bandwidth.
  • each point in the point cloud of each frame has coordinate information xyz (float) and color information RGB.
  • Point cloud compression generally compresses the geometric information and the attribute information of the point cloud separately.
  • At the encoding end, the point cloud geometric information is first encoded in the geometry encoder, and then the reconstructed geometric information is input into the attribute encoder as additional information to assist point cloud attribute compression;
  • at the decoding end, the point cloud geometric information is first decoded in the geometry decoder, and then the decoded geometric information is input into the attribute decoder as additional information to assist point cloud attribute decompression.
  • the entire codec consists of pre-processing/post-processing, geometry encoding/decoding, and attribute encoding/decoding.
  • the point cloud can be encoded and decoded through various types of encoding frameworks and decoding frameworks, respectively.
  • For example, the codec framework may be the Geometry-based Point Cloud Compression (G-PCC) codec framework or the Video-based Point Cloud Compression (V-PCC) codec framework provided by the Moving Picture Experts Group (MPEG), or it may be the AVS-PCC codec framework or the Point Cloud Compression Reference Platform (PCRM) framework provided by the Audio Video Coding Standard (AVS) working group.
  • the G-PCC encoding and decoding framework can be used to compress the first static point cloud and the third type of dynamically acquired point cloud, and the V-PCC encoding and decoding framework can be used to compress the second type of dynamic point cloud.
  • the G-PCC encoding and decoding framework is also called point cloud codec TMC13, and the V-PCC encoding and decoding framework is also called point cloud codec TMC2.
  • G-PCC and AVS-PCC both target static sparse point clouds, and their coding frameworks are roughly the same.
  • the following uses the G-PCC framework as an example to describe the encoding and decoding framework applicable to the embodiments of the present application.
  • the input point cloud is first divided into slices, and then the divided slices are independently encoded.
  • the geometric information of the point cloud and the attribute information corresponding to the points in the point cloud are encoded separately.
  • The G-PCC coding framework first encodes geometric information. Specifically, coordinate transformation is performed on the geometric information so that the entire point cloud is contained in a bounding box; quantization is then performed, and this quantization step mainly serves the purpose of scaling. Because of quantization rounding, the geometric information of some points becomes identical, and whether to remove duplicate points is decided based on parameters. The process of quantization and removal of duplicate points is also called the voxelization process.
  • the bounding box is divided based on the octree. According to the different depths of octree division levels, the coding of geometric information is divided into a geometric information coding framework based on octree and a geometric information coding framework based on triangle patch set (triangle soup, trisoup).
  • In the octree-based geometric information coding framework, the bounding box is first divided into eight sub-cubes of equal size, and the placeholder bit of each sub-cube is recorded (1 for non-empty, 0 for empty); the non-empty sub-cubes then continue to be divided into eight equal parts, and the division usually stops when the leaf nodes obtained by the division are 1x1x1 unit cubes.
  • In this process, the spatial correlation between the current node and its surrounding nodes is used to perform intra prediction on the placeholder bits, and the corresponding binary arithmetic encoder is selected for arithmetic coding based on the prediction result, so as to implement Context-based Adaptive Binary Arithmetic Coding (CABAC) based on the context model.
  • In the geometric information coding framework based on the triangle patch set (triangle soup, trisoup), octree division is also required first; however, unlike the octree-based framework, the point cloud does not need to be divided step by step into unit cubes with a side length of 1x1x1, and the division stops when the side length of the block reaches W. Based on the surface formed by the point distribution in each block, the intersections between this surface and the twelve edges of the block are obtained, producing at most twelve intersection points (vertices) per block; the coordinates of the intersection points of each block are then encoded in turn to generate a binary code stream.
  • the G-PCC coding framework reconstructs the geometric information after completing the geometric information encoding, and uses the reconstructed geometric information to encode the attribute information of the point cloud.
  • the attribute encoding of point cloud is mainly to encode the color information of points in the point cloud.
  • Before attribute encoding, the G-PCC encoding framework can perform color space conversion on the color information of the points. For example, when the color information of the points in the input point cloud is represented in the RGB color space, the G-PCC encoding framework can convert the color information from the RGB color space to the YUV color space. Then, the G-PCC encoding framework uses the reconstructed geometric information to recolor the point cloud so that the un-encoded attribute information corresponds to the reconstructed geometric information.
  • After recoloring, the attribute information can be encoded using, for example, a Region Adaptive Hierarchical Transform (RAHT), a predicting transform, or a lifting transform.
  • Figure 4 is a schematic block diagram of a coding framework provided by an embodiment of the present application.
  • the encoding framework 100 can obtain the location information and attribute information of the point cloud from the collection device.
  • the coding of point cloud includes position coding and attribute coding.
  • the process of position encoding includes: preprocessing the original point cloud by coordinate transformation, quantization and removing duplicate points; constructing an octree and then encoding to form a geometric code stream.
  • the position encoding process of the encoder can be realized through the following units:
  • a coordinate transformation (Transform coordinates) unit 101, a quantize and remove points (Quantize and remove points) unit 102, an octree analysis (Analyze octree) unit 103, a geometric reconstruction (Reconstruct geometry) unit 104, and a first arithmetic coding (Arithmetic encode) unit 105.
  • The coordinate transformation unit 101 may be used to transform the world coordinates of points in the point cloud into relative coordinates. For example, the minimum values of the x, y and z coordinate axes are subtracted from the geometric coordinates of each point, which is equivalent to a DC-removal operation, so that the coordinates of the points in the point cloud are transformed from world coordinates to relative coordinates and the entire point cloud is contained in a bounding box.
  • The quantization and duplicate point removal unit 102 can reduce the number of coordinates through quantization; after quantization, originally different points may be assigned the same coordinates. Based on this, duplicate points can be deleted through a deduplication operation; for example, multiple points with the same quantized position but different attribute information can be merged into one point through attribute transformation.
  • the quantization and repetitive point removal unit 102 is an optional unit module.
  • the octree analysis unit 103 may encode the quantized point position information using an octree encoding method.
  • In the octree encoding method, the point cloud is regularized in the form of an octree, so that the positions of the points correspond one-to-one to positions in the octree; the positions in the octree that contain points are counted and their flags are recorded as 1, and geometric encoding is then performed.
  • The first arithmetic coding unit 105 can use entropy coding to arithmetically encode the position information output by the octree analysis unit 103, that is, the arithmetic coding method is used to generate a geometric code stream from the position information output by the octree analysis unit 103; the geometric code stream may also be called a geometry bitstream.
  • a recursive octree structure is used to regularly express the points in the point cloud as the center of a cube.
  • the entire point cloud can be placed in a cube bounding box.
  • x_min = min(x_0, x_1, ..., x_(K-1));
  • y_min = min(y_0, y_1, ..., y_(K-1));
  • z_min = min(z_0, z_1, ..., z_(K-1));
  • x_max = max(x_0, x_1, ..., x_(K-1));
  • y_max = max(y_0, y_1, ..., y_(K-1));
  • z_max = max(z_0, z_1, ..., z_(K-1)); where K is the number of points in the point cloud.
  • The origin (x_origin, y_origin, z_origin) of the bounding box can be calculated as follows:
  • x_origin = int(floor(x_min));
  • y_origin = int(floor(y_min));
  • z_origin = int(floor(z_min)).
  • Among them, floor() represents a rounding-down (floor) operation.
  • int() represents an integer conversion (rounding) operation.
  • After that, the encoder can calculate the dimensions of the bounding box in the x-axis, y-axis, and z-axis directions based on the boundary values and the origin.
  • After the encoder obtains the dimensions of the bounding box in the x-axis, y-axis, and z-axis directions, it first performs octree division on the bounding box, obtaining eight sub-blocks each time; the non-empty sub-blocks (sub-blocks containing points) are then divided into octrees again, and this continues recursively until a certain depth is reached.
  • the non-empty sub-blocks of the final size are called voxels.
  • Each voxel contains one or more points; the geometric positions of these points are normalized to the center point of the voxel, and the attribute value of the center point is the average of the attribute values of all points in the voxel.
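  • As an illustration of the bounding-box and voxelization steps described above, the following is a minimal sketch; the helper names and the "+ 1" size convention are assumptions made for illustration and are not taken from this application.

```python
import math

def bounding_box(points):
    """Compute the bounding-box origin and per-axis sizes for (x, y, z) points.
    The int(floor(min)) origin follows the description above; the '+ 1' size
    convention is an assumption."""
    xs, ys, zs = zip(*points)
    origin = (int(math.floor(min(xs))),
              int(math.floor(min(ys))),
              int(math.floor(min(zs))))
    size = (int(max(xs) - origin[0]) + 1,
            int(max(ys) - origin[1]) + 1,
            int(max(zs) - origin[2]) + 1)
    return origin, size

def voxelize(points, attrs, origin):
    """Group points into 1x1x1 voxels: positions snap to the voxel, and the
    voxel attribute is the average of the attributes of its points."""
    voxels = {}
    for (x, y, z), a in zip(points, attrs):
        key = (int(x - origin[0]), int(y - origin[1]), int(z - origin[2]))
        voxels.setdefault(key, []).append(a)
    return {k: sum(v) / len(v) for k, v in voxels.items()}
```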
  • Then, each voxel can be encoded based on the determined encoding order, that is, the point (or "node") represented by each voxel is encoded.
  • the encoder reconstructs the geometric information and uses the reconstructed geometric information to encode the attribute information.
  • The attribute encoding process includes: given the reconstructed information of the position information of the input point cloud and the true values of the attribute information, selecting one of the three prediction modes for point cloud prediction, quantizing the prediction results, and performing arithmetic coding to form the attribute code stream.
  • the attribute encoding process of the encoder can be implemented through the following units:
  • a color space transform (Transform colors) unit 110, an attribute transform (Transfer attributes) unit 111, a Region Adaptive Hierarchical Transform (RAHT) unit 112, a predicting transform unit 113, a lifting transform unit 114, a quantize unit 115, and a second arithmetic coding unit 116.
  • the color space transformation unit 110 may be used to transform the RGB color space of points in the point cloud into YCbCr format or other formats.
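  • For illustration, a common RGB-to-YCbCr conversion (BT.601 coefficients, 8-bit) is sketched below; the exact conversion matrix used by a particular codec may differ.

```python
def rgb_to_ycbcr(r, g, b):
    """Example colour-space transform for unit 110 using BT.601 coefficients.
    Inputs are 8-bit RGB samples; the matrix is illustrative only."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y) + 128   # blue-difference chroma, offset to 8-bit range
    cr = 0.713 * (r - y) + 128   # red-difference chroma, offset to 8-bit range
    return y, cb, cr
```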
  • The attribute transformation unit 111 may be used to transform attribute information of points in the point cloud to minimize attribute distortion. For example, in the case of geometrically lossy coding, since the geometric information changes after geometric coding, the attribute transformation unit 111 needs to reassign an attribute value to each point after geometric coding, so that the attribute error between the reconstructed point cloud and the original point cloud is minimized.
  • the attribute information may be color information of a point.
  • the attribute transformation unit 111 can be used to obtain the original attribute value of the point.
  • any prediction unit can be selected to predict the points in the point cloud.
  • the unit for predicting points in the point cloud may include at least one of the RAHT 112, the predicting transform unit 113, and the lifting transform unit 114.
  • any one of the RAHT 112, the predicting transform unit 113, and the lifting transform unit 114 can be used to predict the attribute information of the point in the point cloud to obtain the attribute prediction value of the point, and then can Based on the attribute prediction value of the point, the residual value of the attribute information of the point is obtained.
  • the residual value of the attribute information of a point may be the original attribute value of the point minus the predicted attribute value of the point.
  • the quantization unit 115 may be used to quantize the residual value of the attribute information of the point. For example, if the quantization unit 115 is connected to the prediction transformation unit 113, the quantization unit 115 may be used to quantize the residual value of the attribute information of the point output by the prediction transformation unit 113. For example, the residual value of the point attribute information output by the prediction transformation unit 113 is quantized using a quantization step size to improve system performance.
  • the second arithmetic coding unit 116 may use zero run length coding to perform entropy coding on the residual value of the attribute information of the point to obtain the attribute code stream.
  • the attribute code stream may be bit stream information.
  • the prediction transformation unit 113 can be used to obtain the original order of the point cloud and divide the point cloud into a level of detail (LOD) based on the original order of the point cloud.
  • The prediction transformation unit 113 can predict the attribute information of the points in the LOD in sequence, and then calculate the residual values of the attribute information of the points, so that subsequent units can perform quantization and coding based on these residual values.
  • Specifically, for each point in the LOD, the three neighbor points located before the current point are found based on the neighbor-point search results in the LOD where the current point is located, and the attribute reconstruction value of at least one of the three neighbor points is used to predict the current point to obtain the attribute prediction value of the current point; based on this, the residual value of the attribute information of the current point can be obtained from the attribute prediction value of the current point and the original attribute value of the current point.
  • the original order of the point clouds obtained by the prediction transformation unit 113 may be the arrangement order obtained by the prediction transformation unit 113 performing Morton reordering on the current point cloud.
  • In other words, the encoder can obtain the original order of the current point cloud by reordering the current point cloud; after obtaining the original order, it can divide the points in the point cloud into layers according to this order to obtain the LODs of the current point cloud, and then predict the attribute information of the points in the point cloud based on the LODs.
  • Figures 7 to 9 show the arrangement sequence of Morton codes in two-dimensional space.
  • the encoder can adopt the "z"-shaped Morton arrangement sequence in the two-dimensional space formed by 2*2 blocks.
  • the encoder can adopt the "z"-shaped Morton arrangement sequence in the two-dimensional space formed by four 2*2 blocks.
  • the "z"-shaped Morton arrangement we can finally get the Morton arrangement used by the encoder in the two-dimensional space formed by 4*4 blocks.
  • the encoder can adopt the "z"-shaped Morton arrangement sequence in the two-dimensional space formed by four 4*4 blocks, where the two-dimensional space formed by each four 2*2 blocks and each
  • the "z"-shaped Morton arrangement sequence can also be used in the two-dimensional space formed by 2*2 blocks, and finally the Morton arrangement order adopted by the encoder in the two-dimensional space formed by 8*8 blocks can be obtained.
  • Figure 10 shows the arrangement order of Morton codes in three-dimensional space.
  • Morton's arrangement order is not only applicable to two-dimensional space, but can also be extended to three-dimensional space.
  • Figure 10 shows 16 points; both inside each "z" and between the "z"s, the Morton order is encoded first along the x-axis, then along the y-axis, and finally along the z-axis.
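  • To make the bit interleaving behind the Morton order concrete, the following is a small illustrative sketch; putting the x bit in the least-significant position matches the x-then-y-then-z traversal described above, but the bit-order convention is an assumption.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of x, y and z into a 3D Morton code; sorting points
    by this value yields the 'z'-shaped traversal described above."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)        # x bit -> fastest-changing position
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# Usage: points sorted by their Morton code follow the Morton order.
# sorted(points, key=lambda p: morton3d(*p))
```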
  • the LOD generation process includes: obtaining the Euclidean distance between points based on the position information of the points in the point cloud; dividing the points into different LOD layers based on the Euclidean distance.
  • Points in different ranges of Euclidean distance can be assigned to different LOD layers. For example, a point can be picked at random as the first LOD layer; then the Euclidean distances between the remaining points and this point are calculated, and the points whose Euclidean distance meets a first threshold are classified into the second LOD layer.
  • Next, the centroid of the points in the second LOD layer is calculated, the Euclidean distances between the points outside the first and second LOD layers and this centroid are calculated, and the points whose Euclidean distance meets a second threshold are classified into the third LOD layer.
  • This continues until all points have been classified into LOD layers.
  • By adjusting the Euclidean distance thresholds, the number of points in each LOD layer can be made to increase progressively.
  • the LOD layer division method can also adopt other methods, and this application does not limit this.
  • In some embodiments, the point cloud can be directly divided into one or more LOD layers, or the point cloud can first be divided into multiple point cloud slices, and each point cloud slice can then be divided into one or more LOD layers.
  • the point cloud can be divided into multiple point cloud slices, and the number of points in each point cloud slice can be between 550,000 and 1.1 million.
  • Each point cloud slice can be viewed as a separate point cloud.
  • Each point cloud slice can be divided into multiple LOD layers, and each LOD layer includes multiple points.
  • the LOD layer can be divided according to the Euclidean distance between points.
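  • The following is a minimal sketch of the distance-threshold LOD partition described above; the seeding rule (taking the first point), the use of the layer centroid as the next reference, and the handling of leftover points are assumptions made for illustration.

```python
import math

def build_lods(points, thresholds):
    """Partition points into LOD layers using one Euclidean-distance
    threshold per additional layer (illustrative sketch)."""
    remaining = list(points)
    seed = remaining.pop(0)                  # pick a point as the first LOD layer
    lods = [[seed]]
    reference = seed
    for thr in thresholds:
        layer = [p for p in remaining if math.dist(p, reference) <= thr]
        remaining = [p for p in remaining if p not in layer]
        lods.append(layer)
        if layer:                            # centroid of the new layer becomes the next reference
            reference = tuple(sum(c) / len(layer) for c in zip(*layer))
    if remaining:                            # any points left over join the last layer
        lods[-1].extend(remaining)
    return lods
```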
  • Figure 11 is a schematic block diagram of the LOD layer provided by an embodiment of the present application.
  • the point cloud includes multiple points arranged in original order, namely P0, P1, P2, P3, P4, P5, P6, P7, P8 and P9.
  • It is assumed that, based on the Euclidean distances between points, the point cloud can be divided into 3 LOD layers, namely LOD0, LOD1 and LOD2.
  • LOD0 may include P0, P5, P4 and P2
  • LOD1 may include P1, P6 and P3
  • LOD2 may include P9, P8 and P7.
  • LOD0, LOD1 and LOD2 can be used to form the LOD-based order of the point cloud, namely P0, P5, P4, P2, P1, P6, P3, P9, P8 and P7.
  • the LOD-based order can be used as the encoding order of the point cloud.
  • When the encoder predicts the current point in the point cloud, it creates multiple predictor candidates based on the neighbor-point search results in the LOD where the current point is located; that is, the index of the prediction mode (predMode) can take a value from 0 to 3.
  • When using the prediction method to encode the attribute information of the current point, the encoder first finds the three neighbor points located before the current point based on the neighbor-point search results in the LOD where the current point is located.
  • The prediction mode with index 0 means that, based on the distances between the three neighbor points and the current point, the weighted average of the attribute reconstruction values of the three neighbor points is determined as the attribute prediction value of the current point;
  • the prediction mode with index 1 means that the attribute reconstruction value of the nearest neighbor point among the three neighbor points is used as the attribute prediction value of the current point;
  • the prediction mode with index 2 means that the attribute reconstruction value of the next-nearest neighbor point is used as the attribute prediction value of the current point;
  • the prediction mode with index 3 means that the attribute reconstruction value of the remaining neighbor point, other than the nearest and next-nearest neighbor points, is used as the attribute prediction value of the current point.
  • After obtaining the candidate attribute prediction values of the current point based on the above prediction modes, the encoder can use a rate distortion optimization (RDO) technique to select the optimal attribute prediction value, and then perform arithmetic coding on the selected attribute prediction value.
  • If the index of the prediction mode of the current point is 0, the index of the prediction mode does not need to be encoded into the code stream; if the index of the prediction mode selected through RDO is 1, 2 or 3, the index of the selected prediction mode needs to be encoded into the code stream, that is, the index of the selected prediction mode is encoded into the attribute code stream.
  • Taking the current point P2 as an example, the prediction mode with index 0 means that the weighted average of the attribute reconstruction values of the neighbor points P0, P5 and P4, weighted according to their distances to P2, is determined as the attribute prediction value of the current point P2;
  • the prediction mode with index 1 means that the attribute reconstruction value of the nearest neighbor point P4 is used as the attribute prediction value of the current point P2;
  • the prediction mode with index 2 means that the attribute reconstruction value of the next-nearest neighbor point P5 is used as the attribute prediction value of the current point P2;
  • the prediction mode with index 3 means that the attribute reconstruction value of the remaining neighbor point P0 is used as the attribute prediction value of the current point P2.
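  • A hedged sketch of the four predMode candidates described above is given below; the inverse-distance weights used for mode 0 are an assumption made for illustration.

```python
def predict_attribute(mode, neighbors):
    """'neighbors' is a list of (distance, reconstructed_value) pairs for the
    three neighbor points, sorted from nearest to farthest."""
    if mode == 0:
        # distance-weighted average of the three reconstructed attribute values
        weights = [1.0 / max(d, 1e-9) for d, _ in neighbors]
        return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)
    # modes 1, 2, 3: nearest, next-nearest and remaining neighbor, respectively
    return neighbors[mode - 1][1]
```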
  • In addition, the encoder first calculates, for at least one neighbor point of the current point, the maximum attribute difference maxDiff, and compares maxDiff with a set threshold. If maxDiff is less than the set threshold, the prediction mode using the weighted average of the neighbor points' attribute values is used; otherwise, the RDO technique is used to select the optimal prediction mode.
  • the rate distortion cost of the prediction mode with index 1, 2 or 3 can be calculated by the following formula:
  • J_indx_i = D_indx_i + λ × R_indx_i;
  • where J_indx_i represents the rate distortion cost when the current point adopts the prediction mode with index i, and D_indx_i represents the corresponding distortion;
  • λ is determined based on the quantization parameter of the current point;
  • R_indx_i represents the number of bits required in the code stream for the quantized residual value obtained when the current point adopts the prediction mode with index i.
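  • The maxDiff gate and the rate-distortion selection described above can be sketched as follows; the function and parameter names are illustrative assumptions.

```python
def choose_prediction_mode(max_diff, threshold, costs, lam):
    """If maxDiff is below the threshold, use the weighted-average mode (index 0);
    otherwise pick the mode minimising J_i = D_i + lam * R_i, where 'costs'
    maps a mode index to (D_i, R_i) and 'lam' is derived from the QP."""
    if max_diff < threshold:
        return 0
    return min(costs, key=lambda m: costs[m][0] + lam * costs[m][1])
```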
  • After the encoder determines the prediction mode used for the current point, it can determine the attribute prediction value attrPred of the current point based on the determined prediction mode, then subtract the attribute prediction value attrPred from the original attribute value attrValue of the current point, and quantize the result to obtain the quantized residual value attrResidualQuant of the current point. For example, the encoder can determine the quantized residual value of the current point through the following formula:
  • attrResidualQuant = (attrValue − attrPred) / Qstep;
  • where attrResidualQuant represents the quantized residual value of the current point,
  • attrPred represents the attribute prediction value of the current point
  • attrValue represents the original attribute value of the current point
  • Qstep represents the quantization step size.
  • Qstep is calculated from the quantization parameter (Quantization Parameter, Qp).
  • the attribute reconstruction value of the current point can be used as a neighbor candidate of the subsequent point, and the reconstruction value of the current point is used to predict the attribute information of the subsequent point.
  • In some embodiments, the encoder may determine the attribute reconstruction value of the current point based on the quantized residual value through the following formula:
  • Recon = attrResidualQuant × Qstep + attrPred;
  • where Recon represents the attribute reconstruction value of the current point determined based on the quantized residual value of the current point,
  • attrResidualQuant represents the quantized residual value of the current point
  • Qstep represents the quantization step size
  • attrPred represents the attribute prediction value of the current point.
  • Qstep is calculated from the quantization parameter (Quantization Parameter, Qp).
  • the attribute predicted value (predictedvalue) of the current point may also be called the predicted value of the attribute information or the predicted color value (predictedColor).
  • the original attribute value of the current point can also be called the real value or the original color value of the attribute information of the current point.
  • the residual value of the current point can also be called the difference between the original attribute value of the current point and the predicted attribute value of the current point, or it can also be called the color residual value (residualColor) of the current point.
  • The attribute reconstruction value (reconstructedValue) of the current point may also be called the reconstructed value of the attribute information of the current point or the reconstructed color value (reconstructedColor).
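  • The quantization and reconstruction formulas above can be written as the following small sketch; the rounding rule applied to the quantized residual is an assumption.

```python
def quantize_residual(attr_value, attr_pred, qstep):
    """attrResidualQuant = (attrValue - attrPred) / Qstep, rounded for signalling."""
    return round((attr_value - attr_pred) / qstep)

def reconstruct_attribute(attr_residual_quant, attr_pred, qstep):
    """Recon = attrResidualQuant * Qstep + attrPred."""
    return attr_residual_quant * qstep + attr_pred
```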
  • Figure 12 is a schematic block diagram of the decoding framework 200 provided by the embodiment of the present application.
  • The decoding framework 200 can obtain the code stream of the point cloud from the encoding device, and obtain the position information and attribute information of the points in the point cloud by parsing the code stream.
  • the decoding of point clouds includes position decoding and attribute decoding.
  • The process of position decoding includes: performing arithmetic decoding on the geometric code stream; constructing the octree and then merging, and reconstructing the position information of the points to obtain the reconstructed information of the position information of the points; and performing coordinate transformation on the reconstructed information of the position information of the points to obtain the position information of the points.
  • the position information of a point can also be called the geometric information of the point.
  • The attribute decoding process includes: parsing the attribute code stream to obtain the residual values of the attribute information of the point cloud; dequantizing the residual values of the attribute information of the points to obtain the dequantized residual values of the attribute information of the points; selecting, based on the reconstruction information of the position information of the points obtained during the position decoding process, one of the three prediction modes for point cloud prediction to obtain the attribute reconstruction values of the points; and performing inverse color space transformation on the attribute reconstruction values of the points to obtain the decoded point cloud.
  • Position decoding can be achieved through the following units: a first arithmetic decoding unit 201, an octree synthesis (synthesize octree) unit 202, a geometric reconstruction (Reconstruct geometry) unit 203, and an inverse transform coordinates unit 204.
  • Attribute decoding can be implemented through the following units: a second arithmetic decoding unit 210, an inverse quantize unit 211, a RAHT unit 212, a predicting transform unit 213, a lifting transform unit 214, and a color space inverse transform (inverse transform colors) unit 215.
  • For the functions of the units in the decoding framework 200, reference can be made to the functions of the corresponding units in the encoding framework 100.
  • Internally, the decoding framework 200 can divide the point cloud into multiple LODs according to the Euclidean distance between points in the point cloud, and then decode the attribute information of the points in the LODs in sequence. For example, the number of zeros (zero_cnt) in the zero-run-length coding technique is determined, and the residual is decoded based on this number of zeros; then, the decoding framework 200 can perform inverse quantization on the decoded residual value and add the predicted value of the current point to the inverse-quantized residual value to obtain the reconstructed value, until the whole point cloud has been decoded. The current point will serve as a nearest-neighbor candidate for points in subsequent LODs, and the reconstructed value of the current point will be used to predict the attribute information of subsequent points.
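  • A hedged sketch of the zero-run-length residual decoding mentioned above is given below; parse_zero_cnt() and parse_residual() stand for hypothetical bitstream-parsing calls and are not actual syntax of any reference software.

```python
def decode_residuals(parse_zero_cnt, parse_residual, num_points):
    """zero_cnt gives how many consecutive residuals are zero before the next
    explicitly signalled residual value (illustrative sketch)."""
    residuals = []
    while len(residuals) < num_points:
        zero_cnt = parse_zero_cnt()
        residuals.extend([0] * min(zero_cnt, num_points - len(residuals)))
        if len(residuals) < num_points:
            residuals.append(parse_residual())
    return residuals
```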
  • As mentioned above, the encoder can use the spatial correlation between the current node to be encoded and its surrounding nodes to perform intra prediction on the placeholder bits, and select the corresponding binary arithmetic encoder for arithmetic encoding based on the prediction result, so as to implement Context-based Adaptive Binary Arithmetic Coding (CABAC) based on the context model and obtain the geometric code stream.
  • In one scheme, the encoder stores the nodes encoded/decoded before the current node, uses the plane information of the previous encoded/decoded node located on a certain plane and the distance between that node and the current node to determine the first index of the current node, further determines the context index based on the determined first index, and then encodes the current node based on the obtained context index.
  • For example, if the plane information of the previous encoded/decoded node is 0, the decoder determines that the first index of the current node is 0; if the plane information of the previous encoded/decoded node is 1, the decoder determines that the first index of the current node is 1; otherwise, the decoder determines that the first index of the current node is -1.
  • However, the accuracy of determining the first index of the current node using the plane information of previously encoded/decoded nodes is low, which reduces the encoding and decoding performance.
  • In particular, the previous encoded/decoded node is not necessarily a neighbor node of the current node, so when its plane information is used to determine the first index of the current node, the accuracy is lower.
  • embodiments of the present application provide an index determination method, device, decoder, and encoder, which can improve the accuracy of the first index, thereby improving decoding performance.
  • Figure 13 is a schematic flow chart of the index determination method 300 provided by the embodiment of the present application. It should be understood that the index determination method 300 can be performed by a decoder. For example, it is applied to the decoding framework 200 shown in FIG. 12 . For the convenience of description, the following takes the decoder as an example.
  • the index determination method 300 may include:
  • the decoder determines the first index of the current node based on the occupied child nodes of at least one decoded neighbor node of the current node on the plane perpendicular to the k-th axis.
  • That is, the first index of the current node is determined based on the occupied child nodes of at least one decoded neighbor node of the current node on the plane perpendicular to the k-th axis, instead of directly using the plane information of a previously encoded/decoded node;
  • therefore, the spatial correlation between the current node and the at least one neighbor node can be used to determine the first index of the current node in a better and more detailed manner, which improves the accuracy of the first index and thereby improves decoding performance.
  • This embodiment determines the first index of the current node based on the occupied child nodes of at least one decoded neighbor node of the current node on a plane perpendicular to the k-th axis, which can bring about gains in decoding performance.
  • Table 2 shows the rate-distortion performance (BD-rate) under the condition of lossy compression of geometric information.
  • The BD-Rate under the condition of lossy compression of geometric information expresses, for the same coding quality, the percentage by which the code rate when the technical solution provided by this application is used is saved (BD-Rate is a negative value) or increased (BD-Rate is a positive value) relative to the code rate when the technical solution provided by this application is not used.
  • Table 3 shows the Bpip ratio (Bpip Ratio) under the condition of lossless compression of geometric information.
  • The Bpip Ratio under the condition of lossless compression of geometric information indicates: without loss of point cloud quality, the ratio, expressed as a percentage, of the code rate when the technical solution provided by this application is used to the code rate when it is not used.
  • The lower the value, the greater the code rate savings achieved by using the solution provided by this application for encoding and decoding.
  • Cat1-A represents a point cloud that only includes the reflectivity information of the point
  • Cat1-A average represents the average BD-rate of each component of Cat1-A under lossy compression of geometric information
  • Cat1-B represents a point cloud whose points only include the color information of the points.
  • Cat1-B average represents the average BD-rate of each component of Cat1-B under lossy compression of geometric information
  • Cat3-fused and Cat3-frame both represent point clouds whose points include the color information of the points and other attribute information.
  • Cat3-fused average represents the average BD-rate of each component of Cat3-fused under geometric information lossy compression
  • Cat3-frame average represents the average BD-rate of each component of Cat3-frame under geometric information lossy compression
  • The overall average (Overall average) represents the average BD-rate of Cat1-A to Cat3-frame under geometric information lossy compression.
  • D1 represents the BD-Rate based on the same point-to-point error
  • D2 represents the BD-Rate based on the same point-to-surface error.
  • the index determination method provided by this application has obvious performance improvements for Cat1-A, Cat3-frame and Cat1-B.
  • the index determination method provided by this application can improve the performance of Cat1-A, Cat3-frame and Cat1-B.
  • The decoder determining the first index of the current node based on the occupied child nodes of at least one decoded neighbor node of the current node on a plane perpendicular to the k-th axis may also be referred to as planar contextualization of the plane mode flag bit occ_plane_pos[k] of the current node on the k-th axis; occ_plane_pos[k] may also be called the plane mode flag of the current node on the plane perpendicular to the k-th axis.
  • It should be noted that the term "occupied child node of the at least one neighbor node" can also be equivalently replaced by "a child node of the at least one neighbor node whose placeholder bit indicates a non-empty value", or by another term with a similar meaning; this application does not specifically limit this.
  • the decoder may determine the occupied child node of the at least one neighbor node based on the occupied bits of each child node of the decoded at least one neighbor node of the current node on a plane perpendicular to the k-th axis. In other words, the decoder may predict the first index of the current node based on placeholder bits (or information) of decoded child nodes of at least one neighbor node of the current node on a plane perpendicular to the k-th axis.
  • For example, occtree_planar_enabled indicates whether the current point cloud allows the use of the planar mode. If occtree_planar_enabled is true, the decoder traverses each axis k to obtain PlanarEligible[k], which indicates whether the current point cloud is allowed to use the planar mode on the k-th axis; optionally, k values of 0, 1, and 2 represent the S, T, and V axes, respectively. If PlanarEligible[k] is true, the decoder obtains occ_single_plane[k], which indicates whether the current node is allowed to use the planar mode on the k-th axis.
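  • A sketch of this syntax walk is shown below; obtain_flag() stands for a hypothetical call that obtains or parses the corresponding flag and is not an actual API of any codec.

```python
def parse_planar_flags(obtain_flag, num_axes=3):
    """Walk occtree_planar_enabled -> PlanarEligible[k] -> occ_single_plane[k]
    for k = 0, 1, 2 (the S, T and V axes), as described above."""
    occ_single_plane = [False] * num_axes
    if not obtain_flag("occtree_planar_enabled"):
        return occ_single_plane
    for k in range(num_axes):
        if obtain_flag(f"PlanarEligible[{k}]"):
            occ_single_plane[k] = obtain_flag(f"occ_single_plane[{k}]")
    return occ_single_plane
```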
  • the decoder may determine the plane mode flag bit occ_plane_pos[k] based on at least one decoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • Table 5 shows the correspondence between k and the planar axis, where k = 0, 1, and 2 correspond to the S, T, and V axes, respectively.
  • the S310 may include:
  • the decoder determines the first index based on the occupied child nodes of the at least one neighbor node.
  • the S310 may include:
  • If the occupied child nodes of the at least one neighbor node are all distributed on the first plane perpendicular to the k-th axis, the decoder determines that the first index is a first value; if the occupied child nodes of the at least one neighbor node are all distributed on the second plane perpendicular to the k-th axis, the decoder determines that the first index is a second value; otherwise, the decoder determines that the first index is a third value.
  • the first plane may be a high plane
  • the second plane may be a low plane
  • In other words, the decoder may determine the first index based on the plane in which the occupied child nodes of the at least one neighbor node are located. If the occupied child nodes of the at least one neighbor node are all distributed in the same plane, the decoder determines the first index based on that plane; for example, if that plane is the first plane, the first index is determined to be the first value, and if it is the second plane, the first index is determined to be the second value. If the occupied child nodes of the at least one neighbor node are not all distributed in the same plane, the first index is determined to be the third value.
  • In one possible implementation, the decoder first determines whether the occupied child nodes of the at least one neighbor node are all distributed on the first plane. If they are all distributed on the first plane, the decoder determines that the first index is the first value; if they are not all distributed on the first plane, the decoder further determines whether the occupied child nodes of the at least one neighbor node are all distributed on the second plane:
  • if they are all distributed on the second plane, the decoder determines that the first index is the second value; if they are not all distributed on the second plane either, the decoder determines that the first index is the third value.
  • In another possible implementation, the decoder first determines whether the occupied child nodes of the at least one neighbor node are all distributed on the second plane. If they are all distributed on the second plane, the decoder determines that the first index is the second value; if they are not all distributed on the second plane, the decoder further determines whether the occupied child nodes of the at least one neighbor node are all distributed on the first plane:
  • if they are all distributed on the first plane, the decoder determines that the first index is the first value; if they are not all distributed on the first plane either, the decoder determines that the first index is the third value.
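  • The first-index derivation described above can be sketched as follows; plane_of() is an assumed helper that tells whether a child node lies in the first (high) or second (low) plane along axis k, and the concrete return values 1 / 0 / -1 for the first, second and third values are assumptions made for illustration.

```python
def first_index(neighbor_occupied_children, axis_k, plane_of):
    """Determine the first index from the occupied child nodes of the decoded
    neighbor nodes on the plane perpendicular to axis k (illustrative sketch)."""
    planes = {plane_of(child, axis_k) for child in neighbor_occupied_children}
    if planes == {"high"}:    # all occupied child nodes on the first plane
        return 1
    if planes == {"low"}:     # all occupied child nodes on the second plane
        return 0
    return -1                 # occupied child nodes on both planes, or none
```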
  • the method 300 may further include:
  • the decoder determines a second index of the current node based on a distance between the at least one neighbor node and the current node.
  • the decoder determines the second index to be a fourth value.
  • if the decoder determines the first index based on the occupied child nodes of the at least one neighbor node, the decoder directly determines the second index to be the fourth value.
  • the fourth value is a predefined value.
  • the pre-definition can be achieved by pre-saving corresponding code, tables, or other means that can be used to indicate relevant information in the device (for example, in a decoder and an encoder); this application does not limit the specific implementation.
  • the predefined value may refer to the value defined in the protocol.
  • the "protocol" may refer to a standard protocol in the field of coding and decoding technology, which may include, for example, VCC or ECM protocols and other related protocols.
  • the fourth value is 0 and is used to indicate that the distance between the at least one neighbor node and the current node is less than or equal to a preset threshold.
  • the fourth value is 0 and is used to indicate that the at least one neighbor node is within a preset area of the current node.
  • the fourth numerical value can also take other numerical values, and this application does not limit its specific value.
  • the distance between the at least one neighbor node and the current node is a Manhattan distance.
  • in other alternative embodiments, the distance between the at least one neighbor node and the current node may also be another type of distance, such as a Euclidean distance or a Morton distance; this application does not limit it.
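  • As a minimal sketch of the distance measures mentioned above (function and parameter names are illustrative, not identifiers from any specification), the Manhattan distance between two node positions, restricted to the plane perpendicular to the k-th axis, could be computed as follows; a Euclidean or Morton distance could be substituted without changing the rest of the scheme.

    #include <cstdlib>

    // Manhattan distance between node positions a and b on the plane perpendicular to axis k
    // (the coordinate along axis k itself is excluded from the sum).
    int manhattanDistanceOnPlane(const int a[3], const int b[3], int k) {
      int d = 0;
      for (int axis = 0; axis < 3; ++axis)
        if (axis != k) d += std::abs(a[axis] - b[axis]);
      return d;
    }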
  • the method 300 may further include:
  • the decoder determines the third index based on plane information of a previous decoded node of the current node on a plane perpendicular to the k-th axis.
  • the plane information of the previous decoded node may also refer to the plane perpendicular to the k-th axis where the occupied child nodes of the previous decoded node are located.
  • the S310 may include:
  • if the plane information is 1, the decoder determines that the third index is the first value; if the plane information is not 1, the decoder determines that the third index is the second value.
  • the S310 may include:
  • if the plane information is 0, the decoder determines that the third index is the second value; if the plane information is not 0, the decoder determines that the third index is the first value.
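  • A minimal sketch of the two symmetric branches above (the function name is illustrative): the third index simply mirrors the plane information of the previous decoded node.

    // planeInfo is 1 when the occupied child nodes of the previous decoded node lie on the
    // first plane perpendicular to the k-th axis, and 0 when they lie on the second plane.
    int determineThirdIndex(int planeInfo) {
      return (planeInfo == 1) ? 1   // first value
                              : 0;  // second value
    }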
  • the method 300 may further include:
  • the decoder determines a fourth index based on the distance between the previous decoded node and the current node.
  • the decoder may determine the third index based on the plane information of the previous decoded node of the current node on the plane perpendicular to the k-th axis.
  • for example, if the decoder determines the third index based on the plane information of the previous decoded node, the decoder may determine the fourth index of the current node based on the distance between the previous decoded node and the current node.
  • the decoder obtains an octree depth layer number n for which planar mode is enabled; based on the n, the fourth index is determined.
  • the current point cloud has octree depth layer continuity enabled in planar mode.
  • if the distance between the previous decoded node and the current node is less than or equal to 2^n, the decoder determines that the fourth index is the fifth value; if the distance between the previous decoded node and the current node is greater than 2^n, the decoder determines that the fourth index of the current node is the sixth value.
  • the fifth value is 0 and is used to indicate that the distance between the previous decoded node and the current node is less than or equal to a preset threshold, or that the previous decoded node is within the preset area of the current node; and/or, the sixth value is 1 and is used to indicate that the distance between the previous decoded node and the current node is greater than the preset threshold, or that the previous decoded node is not within the preset area of the current node.
  • in other alternative embodiments, the fifth value or the sixth value can also take other values; the solution of this application only requires that the fifth value and the sixth value be different from each other, and their specific values are not limited.
  • the fourth index may be determined by directly comparing the distance between the previous decoded node and the current node with n.
  • the fourth index may be determined by directly comparing the distance between the previous decoded node and the current node with other function values related to n.
  • the distance between the previous decoded node and the current node is a Manhattan distance.
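  • A minimal sketch of the threshold test described above, assuming the Manhattan distance has already been computed and that n is the number of octree depth levels for which planar mode is enabled (names are illustrative):

    int determineFourthIndex(int manhattanDist, int n) {
      return (manhattanDist <= (1 << n)) ? 0   // fifth value: close to the previous decoded node
                                         : 1;  // sixth value: far from the previous decoded node
    }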
  • the first value is 1, the second value is 0, and the third value is -1.
  • in other alternative embodiments, the first value, the second value, or the third value can also take other values; the solution of this application only requires that the first value, the second value, and the third value be different from one another, and their specific values are not limited.
  • when the first index is the first value, it indicates that the current node satisfies the planar mode of the first plane; when the first index is the second value, it indicates that the current node satisfies the planar mode of the second plane; when the first index is the third value, it indicates that the current node does not satisfy planar mode.
  • the value of k is 0, 1, 2.
  • when the value of k is 0, 1, or 2, it represents the S, T, or V axis, respectively.
  • the decoder may determine the index of the current node on the S axis based on the occupied child nodes of at least one decoded neighbor node of the current node on the plane perpendicular to the S axis, may determine the index of the current node on the T axis based on the occupied child nodes of at least one decoded neighbor node on the plane perpendicular to the T axis, and may determine the index of the current node on the V axis based on the occupied child nodes of at least one decoded neighbor node of the current node on the plane perpendicular to the V axis.
  • the first index determined by the decoder may include one or more of the index of the current node on the S axis, the index of the current node on the T axis, and the index of the current node on the V axis.
  • the at least one neighbor node includes a decoded node adjacent to the current node.
  • the following takes, as an example, the case in which the decoder first determines whether the occupied child nodes of the at least one neighbor node are all distributed on the first plane and then determines whether they are all distributed on the second plane, to illustrate the method of determining the index of the current node with reference to Figure 14 below.
  • Figure 14 is an example of occupied child nodes of neighbor nodes in the x direction provided by the embodiment of the present application.
  • as shown in Figure 14, the decoder determines the first index of the current node based on the occupied child nodes of at least one decoded neighbor node of the current node on the plane perpendicular to the x direction. The decoded at least one neighbor node of the current node on the plane perpendicular to the x direction includes neighbor node 1 and neighbor node 2; the occupied child node of neighbor node 1 is occupied child node 1, and the occupied child nodes of neighbor node 2 are occupied child node 2 and occupied child node 3.
  • since occupied child node 1, occupied child node 2, and occupied child node 3 are all distributed on the x=0 plane, the decoder can predict that the first index of the current node is the second value; for example, the decoder can determine that the first index of the current node is 0.
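  • Using the determineFirstIndex sketch given earlier, the Figure 14 example could be reproduced as follows (the plane positions are taken directly from the example; everything else is illustrative):

    #include <iostream>
    #include <vector>

    int main() {
      // Occupied child nodes 1, 2 and 3 of neighbor nodes 1 and 2 all lie on the x = 0 plane,
      // i.e. on the second (low) plane perpendicular to the x direction.
      std::vector<ChildOccupancy> children = {
        {true, 0},   // occupied child node 1 of neighbor node 1
        {true, 0},   // occupied child node 2 of neighbor node 2
        {true, 0},   // occupied child node 3 of neighbor node 2
      };
      std::cout << determineFirstIndex(children) << '\n';  // prints 0 (the second value)
      return 0;
    }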
  • Figure 15 is a schematic flow chart of the index determination method 400 provided by the embodiment of the present application. It should be understood that the index determination method 400 can be performed by a decoder. For example, it is applied to the decoding framework 200 shown in FIG. 12 . For the convenience of description, the following takes the decoder as an example.
  • if either of the two decoded neighbor nodes of the current node on the plane perpendicular to the k-th axis is non-empty, the decoder determines whether the occupied child nodes of the two neighbor nodes are all distributed on the first plane perpendicular to the k-th axis.
  • the decoder obtains the number n of octree levels with planar mode enabled.
  • the decoder determines that the fourth index of the current node is 1.
  • Figure 15 is only an example of the present application and should not be understood as a limitation of the present application.
  • in other alternative embodiments, the decoder may also first determine whether the occupied child nodes of the two neighbor nodes are all distributed on the second plane and, if they are not, then determine whether the occupied child nodes of the two neighbor nodes are all distributed on the first plane; alternatively, the decoder may determine at the same time whether the occupied child nodes of the two neighbor nodes are all distributed on the second plane or all distributed on the first plane. This application does not specifically limit this.
  • the method 300 may further include:
  • the decoder decodes the current node based on the first index.
  • the decoder may determine the context index of the current node based on the first index, and perform decoding based on the context index of the current node.
  • the decoder may determine the context index of the current node based on the first index and the second index referred to above.
  • the decoder may also determine the context index of the current node based on the third index referred to above.
  • the decoder may determine the context index of the current node based on the third index and the fourth index referred to above.
  • after the decoder obtains one or more of the index of the current node on the S axis, the index of the current node on the T axis, and the index of the current node on the V axis, it can determine the context index of the current node based on one or more of these indexes, and decode the current node based on the context index of the current node.
  • after the context index of the current node is determined, the arithmetic decoder used for arithmetic decoding of the current node can be determined based on the context index of the current node, and arithmetic decoding can be performed on the current node based on the determined arithmetic decoder to obtain the geometry information of the current node.
  • the first index, the second index, the third index and the fourth index may all be intermediate indexes used to determine the context index of the current node or Intermediate variables.
  • the first index and the third index may be regarded as one type of index, or the first index and the third index may be merged into a single index, which may be called, for example, a plane index.
  • for example, if the at least one neighbor node is non-empty, the plane index is determined based on the occupied child nodes of the at least one neighbor node; if the at least one neighbor node is empty, the plane index is determined based on the plane information of the previous decoded node of the current node on the plane perpendicular to the k-th axis.
  • similarly, the second index and the fourth index may be regarded as one type of index, or the second index and the fourth index may be merged into a single index, which may be called, for example, a distance index.
  • for example, if the at least one neighbor node is non-empty, or the decoder determines the first index mentioned above based on the occupied child nodes of the at least one neighbor node, the decoder determines the distance index to be a preset value (for example, 0); if the at least one neighbor node is empty, or the decoder determines the third index mentioned above based on the plane information of the previous decoded node of the current node on the plane perpendicular to the k-th axis, the decoder determines the distance index based on the distance between the previous decoded node and the current node.
  • the terms first index, second index, third index, and fourth index are only used to distinguish the indexes from each other; they do not limit the number or type of indexes and do not limit the scope of the embodiments of the present application.
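  • The merged plane-index / distance-index view described above could be sketched as follows (the PrevNodeInfo structure and all names are assumptions made for this illustration; only the branching logic is taken from the text):

    struct PrevNodeInfo {
      int planeInfo;       // plane occupied by the previous decoded node: 1 (first) or 0 (second)
      int manhattanDist;   // Manhattan distance between that node and the current node
    };

    struct PlaneAndDistanceIndex { int planeIdx; int distIdx; };

    PlaneAndDistanceIndex determineIndexes(bool neighborsNonEmpty,
                                           int neighborFirstIndex,   // from the occupied child nodes
                                           const PrevNodeInfo& prev,
                                           int n /* planar-enabled octree depth levels */) {
      if (neighborsNonEmpty)
        return { neighborFirstIndex, 0 };                       // first index, preset distance index
      int planeIdx = (prev.planeInfo == 1) ? 1 : 0;             // third index
      int distIdx  = (prev.manhattanDist <= (1 << n)) ? 0 : 1;  // fourth index
      return { planeIdx, distIdx };
    }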
  • determining the context index of the occ_plane_pos[k] flag bit uses the information of the previous decoded node eligible for planar coding mode in the plane perpendicular to the k-th axis, or of the occupied child nodes of the neighbor nodes, including: the Manhattan distance between the current node and that node, and the values of occ_single_plane and occ_plane_pos.
  • the plane perpendicular to the k-th axis of the coding node is identified by its position along the axis modulo 2^14.
  • PlanarNodeAxisLoc[k] represents the plane perpendicular to the k-th axis of the current node, which is obtained based on the position coordinates of the current node under the octree at the current level.
  • ManhattanDist[k] represents the Manhattan distance of the current node from the coordinate origin on the plane perpendicular to the k-th axis, which is obtained by adding the coordinate values on the plane perpendicular to the k-th axis:
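  • Consistent with the ManhattanDist[k] expression reproduced later in this document, the sum of the two in-plane coordinates Ns, Nt, Nv (the node positions on the S, T and V axes) could be written as the following sketch:

    int manhattanDistFromOrigin(int k, int Ns, int Nt, int Nv) {
      switch (k) {
        case 0:  return Nt + Nv;   // plane perpendicular to the S axis
        case 1:  return Ns + Nv;   // plane perpendicular to the T axis
        case 2:  return Ns + Nt;   // plane perpendicular to the V axis
        default: return -1;        // not applicable
      }
    }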
  • k and axisLoc can determine the position of the plane perpendicular to the k-th axis:
  • PrevManhattanDist[k][axisLoc] represents the Manhattan distance of the previous encoded and decoded node qualified for plane encoding mode from the coordinate origin on the plane perpendicular to the k-th axis;
  • PrevOccSinglePlane[k][axisLoc] indicates whether the previous encoded and decoded node qualified for the plane encoding mode satisfies the plane encoding mode;
  • PrevOccPlanePos represents the plane position of the previous encoded and decoded node that is qualified for plane encoding mode.
  • the state shall be updated for each planar-eligible axis:
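  • The per-axis update below reproduces the loop given in the specification excerpt later in this document; after each occupancy_tree_node syntax structure it refreshes the stored information of the previous planar-eligible node:

    for (int k = 0; k < 3; k++)
      if (PlanarEligible[k]) {
        PrevManhattanDist[k][PlanarNodeAxisLoc[k]]  = ManhattanDist[k];
        PrevOccSinglePlane[k][PlanarNodeAxisLoc[k]] = occ_single_plane[k];
        if (occ_single_plane[k])
          PrevOccPlanePos[k][PlanarNodeAxisLoc[k]]  = occ_plane_pos[k];
      }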
  • Contextualization of occ_plane_pos[k] for nodes not eligible for angular contextualization (AngularEligible is 0) is specified by the expression CtxIdxPlanePos.
  • the context index of the plane coding mode flag bit occ_plane_pos[k] is determined as follows:
  • when at least one neighbor node is non-empty (isNeighOccupied is true) and the child node information of the neighbor nodes is accessible (occtree_adjacent_child_enabled is true), the context index of occ_plane_pos[k] is determined by the first index neighPlanePosCtxInc and the second index neighDistCtxInc; otherwise, the context index of occ_plane_pos[k] is determined by the third index prevPlanePosCtxInc and the fourth index prevDistCtxInc.
  • the value of neighDistCtxInc (second index) is 0;
  • prevDistCtxInc (fourth index) is determined by the Manhattan distance between the previous encoded and decoded node and the current node.
  • prevDistCtxInc := Abs(a − b) > 2^numEligiblePlanarLevels, where a and b denote the Manhattan distances of the current node and of the previous coded node from the coordinate origin (ManhattanDist[k] and PrevManhattanDist[k][PlanarNodeAxisLoc[k]], respectively).
  • numEligiblePlanarLevels is the number of octree depth levels for which planar mode is enabled.
  • neighPlanePosCtxInc (first index) is determined by the occupied child nodes of at least one neighbor node.
  • prevPlanePosCtxInc (the third index) is determined by the occupied plane position (the first plane or the second plane) of the previous coded node eligible for planar coding mode.
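  • Putting the four intermediate indexes together, a C-style sketch of the selection logic follows; the structure mirrors the CtxIdxPlanePos expression reproduced later in this document, and the function signature itself is only illustrative:

    int ctxIdxPlanePos(bool isNeighOccupied, bool adjacentChildEnabled,
                       bool planarBufferDisabled, bool prevOccSinglePlane,
                       int k, int adjPlaneCtxInc,
                       int neighPlanePosCtxInc, int neighDistCtxInc,
                       int prevPlanePosCtxInc, int prevDistCtxInc) {
      if (isNeighOccupied && adjacentChildEnabled) {
        // neighbor-based branch: first index (neighPlanePosCtxInc) and second index (neighDistCtxInc)
        if (neighPlanePosCtxInc < 0)
          return adjPlaneCtxInc;
        return 12 * k + 4 * adjPlaneCtxInc + 2 * neighDistCtxInc + neighPlanePosCtxInc + 3;
      }
      // fallback branch: third index (prevPlanePosCtxInc) and fourth index (prevDistCtxInc)
      if (planarBufferDisabled || !prevOccSinglePlane)
        return adjPlaneCtxInc;
      return 12 * k + 4 * adjPlaneCtxInc + 2 * prevDistCtxInc + prevPlanePosCtxInc + 3;
    }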
  • the index determination method according to the embodiment of the present application is described in detail from the perspective of the decoder above.
  • the index determination method according to the embodiment of the present application will be described from the perspective of the encoder with reference to FIG. 16 below.
  • Figure 16 is a schematic flow chart of the index determination method 500 provided by the embodiment of the present application. It should be understood that the index determination method 500 may be performed by an encoder. For example, it is applied to the coding framework 100 shown in FIG. 4 . For ease of description, the following uses an encoder as an example.
  • the index determination method 500 may include:
  • S510, the encoder determines the first index of the current node based on the occupied child nodes of at least one encoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • the S510 may include:
  • the first index is determined based on the occupied child nodes of the at least one neighbor node.
  • if the occupied child nodes of the at least one neighbor node are all distributed on the first plane perpendicular to the k-th axis, the first index is determined to be the first value; if the occupied child nodes of the at least one neighbor node are all distributed on the second plane perpendicular to the k-th axis, the first index is determined to be the second value; otherwise, the first index is determined to be the third value.
  • the method 500 may further include:
  • a second index of the current node is determined based on a distance between the at least one neighbor node and the current node.
  • the second index is determined to be a fourth value.
  • when the fourth value is 0, it indicates that the distance between the at least one neighbor node and the current node is less than or equal to a preset threshold, or that the at least one neighbor node is within the preset area of the current node.
  • the distance between the at least one neighbor node and the current node is a Manhattan distance.
  • the method 500 may further include:
  • the third index is determined based on the plane information of the previous encoded node of the current node on the plane perpendicular to the k-th axis.
  • if the plane information is 1, the third index is determined to be the first value; if the plane information is not 1, the third index is determined to be the second value.
  • the method 500 may further include:
  • a fourth index of the current node is determined based on the distance between the previous encoded node and the current node.
  • a number n of octree depth levels with planar mode enabled is obtained; based on the n, the fourth index is determined.
  • if the distance between the previous encoded node and the current node is less than or equal to 2^n, the fourth index is determined to be the fifth value; if the distance between the previous encoded node and the current node is greater than 2^n, the fourth index is determined to be the sixth value.
  • the distance between the previous encoded node and the current node is a Manhattan distance.
  • the first value is 1, the second value is 0, and the third value is -1.
  • the value of k is 0, 1, 2.
  • the method 500 may further include:
  • the current node is encoded based on its first index.
  • the size of the sequence numbers of the above-mentioned processes does not mean the order of execution.
  • the execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • Figure 17 is a schematic block diagram of the index determination device 600 according to the embodiment of the present application.
  • the index determination device 600 may include:
  • the determining unit 610 is configured to determine the first index of the current node based on the occupied child nodes of at least one decoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • the determining unit 610 is specifically used to:
  • the first index is determined based on the occupied child nodes of the at least one neighbor node.
  • the determining unit 610 is specifically used to:
  • if the occupied child nodes of the at least one neighbor node are all distributed on the first plane perpendicular to the k-th axis, the first index is determined to be the first value;
  • if the occupied child nodes of the at least one neighbor node are all distributed on the second plane perpendicular to the k-th axis, the first index is determined to be the second value;
  • otherwise, the first index is determined to be the third value.
  • the determining unit 610 is also used to:
  • a second index of the current node is determined based on a distance between the at least one neighbor node and the current node.
  • the determining unit 610 is specifically used to:
  • the second index is determined to be a fourth value.
  • when the fourth value is 0, it indicates that the distance between the at least one neighbor node and the current node is less than or equal to a preset threshold, or that the at least one neighbor node is within the preset area of the current node.
  • the distance between the at least one neighbor node and the current node is a Manhattan distance.
  • the determining unit 610 is also used to:
  • the third index is determined based on the plane information of the previous decoded node of the current node on the plane perpendicular to the k-th axis.
  • the determining unit 610 is specifically used to:
  • the third index is determined to be the second value.
  • the determining unit 610 is also used to:
  • a fourth index of the current node is determined based on the distance between the previous decoded node and the current node.
  • the determining unit 610 is specifically used to:
  • the fourth index is determined.
  • the determining unit 610 is specifically used to:
  • the fourth index is determined to be a sixth value.
  • the distance between the previous decoded node and the current node is a Manhattan distance.
  • the first value is 1, the second value is 0, and the third value is -1.
  • the value of k is 0, 1, 2.
  • the determining unit 610 is also used to:
  • the current node is decoded based on the first index of the current node.
  • Figure 18 is a schematic block diagram of the index determination device 700 according to the embodiment of the present application.
  • the index determination device 700 may include:
  • the determining unit 710 is configured to determine the first index of the current node based on the occupied child nodes of at least one encoded neighbor node of the current node on a plane perpendicular to the k-th axis.
  • the determining unit 710 is specifically used to:
  • the first index is determined based on the occupied child nodes of the at least one neighbor node.
  • the determining unit 710 is specifically used to:
  • if the occupied child nodes of the at least one neighbor node are all distributed on the first plane perpendicular to the k-th axis, the first index is determined to be the first value;
  • if the occupied child nodes of the at least one neighbor node are all distributed on the second plane perpendicular to the k-th axis, the first index is determined to be the second value;
  • otherwise, the first index is determined to be the third value.
  • the determining unit 710 is also used to:
  • a second index of the current node is determined based on a distance between the at least one neighbor node and the current node.
  • the determining unit 710 is specifically used to:
  • the second index is determined to be a fourth value.
  • when the fourth value is 0, it indicates that the distance between the at least one neighbor node and the current node is less than or equal to a preset threshold, or that the at least one neighbor node is within the preset area of the current node.
  • the distance between the at least one neighbor node and the current node is a Manhattan distance.
  • the determining unit 710 is also used to:
  • the third index is determined based on the plane information of the previous encoded node of the current node on the plane perpendicular to the k-th axis.
  • the determining unit 710 is specifically used to:
  • the third index is determined to be the second value.
  • the determining unit 710 is also used to:
  • a fourth index of the current node is determined based on the distance between the previous encoded node and the current node.
  • the determining unit 710 is specifically used to:
  • the fourth index is determined.
  • the determining unit 710 is specifically used to:
  • the fourth index is determined to be a sixth value.
  • the distance between the previous encoded node and the current node is a Manhattan distance.
  • the first value is 1, the second value is 0, and the third value is -1.
  • the value of k is 0, 1, 2.
  • the determining unit 710 is also used to:
  • the current node is encoded based on its first index.
  • the device embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, they will not be repeated here.
  • the index determination device 600 shown in FIG. 17 may correspond to the corresponding subject that executes the method 300 of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the index determination device 600 are respectively intended to implement the corresponding processes in the method 300 and the other methods.
  • the index determination device 700 shown in Figure 18 may correspond to the corresponding subject that performs the method 500 of the embodiment of the present application, and the aforementioned and other operations and/or functions of each unit in the index determination device 700 are respectively intended to implement the corresponding processes in the method 500 and the other methods.
  • the units in the index determination device 600 or the index determination device 700 involved in the embodiments of the present application can be separately or entirely combined into one or several other units, or one (or some) of the units can be further divided into multiple functionally smaller units; this can achieve the same operations without affecting the realization of the technical effects of the embodiments of the present application.
  • the above units are divided based on logical functions. In practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the index determination device 600 or the index determination device 700 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • in other embodiments, the index determination device 600 or the index determination device 700 involved in the embodiments of the present application may be constructed by running, on a general-purpose computing device such as a general-purpose computer including processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), a computer program capable of executing each step of the corresponding method, so as to implement the encoding method or decoding method of the embodiments of the present application.
  • the computer program can be recorded on, for example, a computer-readable storage medium, loaded into an electronic device through the computer-readable storage medium, and run therein to implement the corresponding methods of the embodiments of the present application.
  • the units mentioned above can be implemented in the form of hardware, can also be implemented in the form of instructions in the form of software, or can be implemented in the form of a combination of software and hardware.
  • each step of the method embodiments in the embodiments of the present application can be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor.
  • the software can be located in a mature storage medium in this field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, etc.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps in the above method embodiment in combination with its hardware.
  • FIG. 19 is a schematic structural diagram of an electronic device 800 provided by an embodiment of the present application.
  • the electronic device 800 at least includes a processor 810 and a computer-readable storage medium 820 .
  • the processor 810 and the computer-readable storage medium 820 may be connected through a bus or other means.
  • the computer-readable storage medium 820 is used to store a computer program 821.
  • the computer program 821 includes computer instructions.
  • the processor 810 is used to execute the computer instructions stored in the computer-readable storage medium 820.
  • the processor 810 is the computing core and the control core of the electronic device 800. It is suitable for implementing one or more computer instructions. Specifically, it is suitable for loading and executing one or more computer instructions to implement the corresponding method flow or corresponding functions.
  • the processor 810 may also be called a central processing unit (Central Processing Unit, CPU).
  • the processor 810 may include, but is not limited to: a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the computer-readable storage medium 820 can be a high-speed RAM memory, or a non-volatile memory (Non-Volatile Memory), such as at least one disk memory; optionally, it can also be at least one computer-readable storage medium located far away from the aforementioned processor 810.
  • computer-readable storage medium 820 includes, but is not limited to: volatile memory and/or non-volatile memory.
  • non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • Volatile memory may be Random Access Memory (RAM), which is used as an external cache.
  • by way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (Direct Rambus RAM).
  • the electronic device 800 may be the encoder or the coding framework involved in the embodiments of the present application; the computer-readable storage medium 820 stores first computer instructions; the processor 810 loads and executes the first computer instructions stored in the computer-readable storage medium 820 to implement the corresponding steps of the encoding method provided by the embodiments of the present application; to avoid repetition, details are not repeated here.
  • the electronic device 800 may be the decoder or the decoding framework involved in the embodiments of the present application; the computer-readable storage medium 820 stores second computer instructions; the processor 810 loads and executes the second computer instructions stored in the computer-readable storage medium 820 to implement the corresponding steps of the decoding method provided by the embodiments of the present application; to avoid repetition, details are not repeated here.
  • embodiments of the present application also provide a coding and decoding system, including the above-mentioned encoder and decoder.
  • embodiments of the present application also provide a computer-readable storage medium (Memory).
  • the computer-readable storage medium is a memory device in the electronic device 800 and is used to store programs and data.
  • computer-readable storage medium 820 may include a built-in storage medium in the electronic device 800 , and of course may also include an extended storage medium supported by the electronic device 800 .
  • the computer-readable storage medium provides storage space that stores the operating system of the electronic device 800 .
  • one or more computer instructions suitable for being loaded and executed by the processor 810 are also stored in the storage space. These computer instructions may be one or more computer programs 821 (including program codes).
  • a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium.
  • taking the computer program 821 as an example, the data processing device 800 can be a computer.
  • the processor 810 reads the computer instructions from the computer-readable storage medium 820.
  • the processor 810 executes the computer instructions, so that the computer executes the encoding method provided in the above various optional ways. or decoding method.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application provide an index determination method and device, a decoder, and an encoder, relating to the field of coding and decoding technology. The present application determines the first index of the current node based on the occupied child nodes of at least one encoded neighbor node of the current node on the plane perpendicular to the k-th axis, which makes better and finer use of the spatial correlation between the current node and the at least one neighbor node to determine the first index of the current node, improves the accuracy of the first index, and thereby improves decoding performance.

Description

索引确定方法、装置、解码器以及编码器 技术领域
本申请实施例涉及编解码技术领域,并且更具体地,涉及索引确定方法、装置、解码器以及编码器。
背景技术
点云已经开始普及到各个领域,例如,虚拟/增强现实、机器人、地理信息系统、医学领域等。随着扫描设备的基准度和速率的不断提升,可以准确地获取物体表面的大量点云,往往一个场景下就可以对应几十万个点。数量如此庞大的点也给计算机的存储和传输带来了挑战。因此,对点的压缩也就成为一个热点问题。
对于点云的压缩来说,主要需要压缩其位置信息和属性信息。具体而言,编码器先通过对点云的位置信息进行八叉树划分得到划分后的节点,然后对待编码的当前节点进行算数编码以得到几何码流;同时,编码器根据八叉树划分后的当前点的位置信息在已编码的点中选择出用于预测当前点属性信息的预测值的点后,基于选择出的点对其属性信息进行预测,再通过与属性信息的原始值进行做差的方式来编码属性信息以得到点云的属性码流。
在算数编码过程中,编码器可利用待编码的当前节点与周围节点的空间相关性,对占位比特进行帧内预测(intra prediction)得到当前节点的索引,并基于当前节点的索引进行算数编码,以实现基于上下文模型的自适应二进制算术编码(Context-based Adaptive Binary Arithmetic Coding,CABAC)进而得到几何码流。
但是,相关技术中确定当前节点的索引时,其准确度较低,进而降低了编解码性能。
发明内容
本申请实施例提供了一种索引确定方法、装置、解码器以及编码器,能够提升针对当前节点的索引的准确度,进而提升解码性能。
第一方面,本申请提供了一种索引确定方法,包括:
基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
第二方面,本申请提供了一种索引确定方法,包括:
基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
第三方面,本申请提供了一种索引确定装置,包括:
确定单元,用于基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
第四方面,本申请提供了一种索引确定装置,包括:
确定单元,用于基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
第五方面,本申请提供了一种解码器,包括:
处理器,适于实现计算机指令;以及,
计算机可读存储介质,计算机可读存储介质存储有计算机指令,计算机指令适于由处理器加载并执行上述第一方面或其各实现方式中的解码方法。
在一种实现方式中,该处理器为一个或多个,该存储器为一个或多个。
在一种实现方式中,该计算机可读存储介质可以与该处理器集成在一起,或者该计算机可读存储介质与处理器分离设置。
第六方面,本申请提供了一种编码器,包括:
处理器,适于实现计算机指令;以及,
计算机可读存储介质,计算机可读存储介质存储有计算机指令,计算机指令适于由处理器加载并执行上述第二方面或其各实现方式中的编码方法。
在一种实现方式中,该处理器为一个或多个,该存储器为一个或多个。
在一种实现方式中,该计算机可读存储介质可以与该处理器集成在一起,或者该计算机可读存储介质与处理器分离设置。
第七方面,本申请提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机指令,该计算机指令被计算机设备的处理器读取并执行时,使得计算机设备执行上述第一方面涉及的解码方法或 上述第二方面涉及的编码方法。
第八方面,本申请提供了一种码流,该码流上述第一方面中涉及的码流或上述第二方面中涉及的码流。
基于以上技术方案,本申请基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,能够更好更细致的利用当前节点与所述至少一个邻居节点之间的空间相关性预测当前节点的第一索引,提升了针对第一索引的准确度,进而提升解码性能。
附图说明
图1是本申请实施例提供的点云图像的示例。
图2是图1所示的点云图像的局部放大图。
图3是本申请实施例提供的具有的六个观看角度的点云图像的示例。
图4是本申请实施例提供的编码框架的示意性框图。
图5是本申请实施例提供的包围盒的示例。
图6是本申请实施例提供的对包围盒进行八叉树划分的示例。
图7至图9示出了莫顿码在二维空间中的排列顺序。
图10示出了莫顿码在三维空间中的排列顺序。
图11是本申请实施例提供的LOD层的示意性框图。
图12是本申请实施例提供的解码框架的示意性框图。
图13是本申请实施例提供的索引确定方法的示意性流程图。
图14是本申请实施例提供的在S轴上的邻居节点的占据子节点的示例。
图15是本申请实施例提供的索引确定方法的另一示意性流程图。
图16是本申请实施例提供的索引确定方法的再一示意性流程图。
图17是本申请实施例提供的索引确定装置的示意性框图。
图18是本申请实施例提供的索引确定装置的另一示意性框图。
图19是本申请实施例提供的电子设备的示意性框图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。
点云(Point Cloud)是空间中一组无规则分布的、表达三维物体或三维场景的空间结构及表面属性的离散点集。图1和图2分别示出了三维点云图像和局部放大图,可以看到点云表面是由分布稠密的点所组成的。
二维图像在每一个像素点均有信息表达,因此不需要额外记录其位置信息;然而点云中的点在三维空间中的分布具有随机性和不规则性,因此需要记录每一个点在空间中的位置,才能完整地表达一幅点云。与二维图像类似,点云中的每一个点均有对应的属性信息,通常为RGB颜色值,颜色值反映物体的色彩;对于点云来说,每一个点所对应的属性信息除了颜色以外,还可以是反射率(reflectance)值,反射率值反映物体的表面材质。点云中每个点可以包括几何信息和属性信息,其中,点云中每个点的几何信息是指该点的笛卡尔三维坐标数据,点云中每个点的属性信息可以包括但不限于以下至少一种:颜色信息、材质信息、激光反射强度信息。颜色信息可以是任意一种色彩空间上的信息。例如,颜色信息可以是红绿蓝(Red Green Blue,RGB)信息。再如,颜色信息还可以是亮度色度(YCbCr,YUV)信息。其中,Y表示明亮度(Luma),Cb(U)表示蓝色色度分量,Cr(V)表示红色色度分量。点云中的每个点都具有相同数量的属性信息。例如,点云中的每个点都具有颜色信息和激光反射强度两种属性信息。再如,点云中的每个点都具有颜色信息、材质信息和激光反射强度信息三种属性信息。
点云图像可具有的多个观看角度,例如,如图3所示的点云图像可具有的六个观看角度,点云图像对应的数据存储格式由文件头信息部分和数据部分组成,头信息包含了数据格式、数据表示类型、点云总点数、以及点云所表示的内容。
点云可以灵活方便地表达三维物体或场景的空间结构及表面属性,并且由于点云通过直接对真实物体采样获得,在保证精度的前提下能提供极强的真实感,因而应用广泛,其范围包括虚拟现实游戏、计算机辅助设计、地理信息系统、自动导航系统、数字文化遗产、自由视点广播、三维沉浸远程呈现、生物组织器官三维重建等。
示例性地,可以基于应用场景可以将点云划分为两大类别,即机器感知点云和人眼感知点云。机器感知点云的应用场景包括但不限于:自主导航系统、实时巡检系统、地理信息系统、视觉分拣机器人、抢险救灾机器人等点云应用场景。人眼感知点云的应用场景包括但不限于:数字文化遗产、自由视点广 播、三维沉浸通信、三维沉浸交互等点云应用场景。相应的,可以基于点云的获取方式,将点云划分为密集型点云和稀疏型点云;也可基于点云的获取途径将点云划分为静态点云和动态点云,更具体可划分为三种类型的点云,即第一静态点云、第二类动态点云以及第三类动态获取点云。针对第一静态点云,物体是静止的,且获取点云的设备也是静止的;针对第二类动态点云,物体是运动的,但获取点云的设备是静止的;针对第三类动态获取点云,获取点云的设备是运动的。
示例性地,点云的采集途径包括但不限于:计算机生成、3D激光扫描、3D摄影测量等。计算机可以生成虚拟三维物体及场景的点云;3D激光扫描可以获得静态现实世界三维物体或场景的点云,每秒可以获取百万级点云;3D摄影测量可以获得动态现实世界三维物体或场景的点云,每秒可以获取千万级点云。具体而言,可通过光电雷达、激光雷达、激光扫描仪、多视角相机等采集设备,可以采集得到物体表面的点云。根据激光测量原理得到的点云,其可以包括点的三维坐标信息和点的激光反射强度(reflectance)。根据摄影测量原理得到的点云,其可以可包括点的三维坐标信息和点的颜色信息。结合激光测量和摄影测量原理得到点云,其可以可包括点的三维坐标信息、点的激光反射强度(reflectance)和点的颜色信息。这些技术降低了点云数据获取成本和时间周期,提高了数据的精度。例如,在医学领域,由磁共振成像(magnetic resonance imaging,MRI)、计算机断层摄影(computed tomography,CT)、电磁定位信息,可以获得生物组织器官的点云。这些技术降低了点云的获取成本和时间周期,提高了数据的精度。点云数据获取方式的变革,使大量点云数据的获取成为可能,伴随着应用需求的增长,海量3D点云数据的处理遭遇存储空间和传输带宽限制的瓶颈。
以帧率为30fps(帧每秒)的点云视频为例,每帧点云的点数为70万,其中,每一帧点云中的每一个点具有坐标信息xyz(float)和颜色信息RGB(uchar),则10s长度的点云视频的数据量大约为0.7百万(million)×(4Byte×3+1Byte×3)×30fps×10s=3.15GB,而YUV采样格式为4:2:0,帧率为24fps的1280×720二维视频,其10s的数据量约为1280×720×12bit×24frames×10s≈0.33GB,10s的两视角3D视频的数据量约为0.33×2=0.66GB。由此可见,点云视频的数据量远超过相同时长的二维视频和三维视频的数据量。因此,为更好地实现数据管理,节省服务器存储空间,降低服务器与客户端之间的传输流量及传输时间,点云压缩成为促进点云产业发展的关键问题。
点云压缩一般采用点云几何信息和属性信息分别压缩的方式,在编码端,首先在几何编码器中编码点云几何信息,然后将重建几何信息作为附加信息输入到属性编码器中,以辅助点云的属性压缩;在解码端,首先在几何解码器中解码点云几何信息,然后将解码后的几何信息作为附加信息输入到属性解码器中,辅助点云的属性压缩。整个编解码器由预处理/后处理、几何编码/解码、属性编码/解码几部分组成。
示例性地,点云可通过各种类型的编码框架和解码框架分别进行编码和解码。作为示例,编解码框架可以是运动图象专家组(Moving Picture Experts Group,MPEG)提供的几何点云压缩(Geometry Point Cloud Compression,G-PCC)编解码框架或视频点云压缩(Video Point Cloud Compression,V-PCC)编解码框架,也可以是音视频编码标准(Audio Video Standard,AVS)专题组提供的AVS-PCC编解码框架或点云压缩参考平台(PCRM)框架。G-PCC编解码框架可用于针对第一静态点云和第三类动态获取点云进行压缩,V-PCC编解码框架可用于针对第二类动态点云进行压缩。G-PCC编解码框架也称为点云编解码器TMC13,V-PCC编解码框架也称为点云编解码器TMC2。G-PCC及AVS-PCC均针对静态的稀疏型点云,其编码框架大致相同。
下面以G-PCC框架为例对本申请实施例可适用的编解码框架进行说明。
在G-PCC编码框架中,先将输入点云进行切片(slice)划分后,然后对划分得到的切片进行独立编码。在切片中,点云的几何信息和点云中的点所对应的属性信息是分开进行编码的。G-PCC编码框架首先对几何信息进行编码;具体地,先对几何信息进行坐标转换,使点云全都包含在一个包围盒(bounding box)中;然后再进行量化,这一步量化主要起到缩放的作用,由于量化取整,使得一部分点的几何信息相同,根据参数来决定是否移除重复点,量化和移除重复点这一过程又被称为体素化过程。接下来,对包围盒进行基于八叉树(octree)的划分。其中根据八叉树划分层级深度的不同,几何信息的编码又分为基于八叉树的几何信息编码框架和基于三角面片集(triangle soup,trisoup)的几何信息编码框架。
在基于八叉树的几何信息编码框架中,先将包围盒八等分为8个子立方体,并记录子立方体的占位比特(1为非空,0为空),对非空的子立方体继续进行八等分,通常划分得到的叶子节点为1x1x1的单位立方体时停止划分。在这个过程中,利用节点与周围节点的空间相关性,对占位比特进行帧内预测(intra prediction),并基于预测结果选择相应的二进制算数编码器进行算数编码,以实现基于上下文模型的自适应二进制算术编码(Context-based Adaptive Binary Arithmetic Coding,CABAC)并生成二进制码流。
在基于三角面片集的几何信息编码框架中,同样也要先进行八叉树划分,但区别于基于八叉树的几何信息编码框架,基于三角面片集的几何信息编码框架不需要将点云逐级划分到边长为1x1x1的单位立方体,而是划分到块(block)边长为W时停止划分,基于每个块中点云的分布所形成的表面,得到该表面与块的十二条边所产生的至多十二个交点(vertex),然后依次编码每个块的交点的坐标并生成二进制码流。
G-PCC编码框架在完成几何信息编码后对几何信息进行重建,并使用重建的几何信息对点云的属性信息进行编码。点云的属性编码主要是对点云中点的颜色信息进行编码。首先,G-PCC编码框架可以对点的颜色信息进行颜色空间转换,例如,当输入点云中点的颜色信息使用RGB颜色空间表示时,G-PCC编码框架可以将颜色信息从RGB颜色空间转换到YUV颜色空间。然后,G-PCC编码框架利用重建的几何信息对点云重新着色,使得未编码的属性信息与重建的几何信息对应起来。在颜色信息编码中,主要有两种变换方法,一种方法是依赖于细节层(Level of Detail,LOD)划分的基于距离的提升变换,另一种方法是直接进行区域自适应分层变换(Region Adaptive Hierarchal Transform,RAHT),这两种方法都会将颜色信息从空间域变换到频域,得到高频系数和低频系数,最后对系数进行量化和编码,并生成二进制码流。
图4是本申请实施例提供的编码框架的示意性框图。
如图4所示,编码框架100可以从采集设备获取点云的位置信息和属性信息。点云的编码包括位置编码和属性编码。在一个实施例中,位置编码的过程包括:对原始点云进行坐标变换、量化去除重复点等预处理;构建八叉树后进行编码形成几何码流。
如图4所示,编码器的位置编码过程可通过以下单元实现:
坐标变换(Tanmsform coordinates)单元101、量化和移除重复点(Quantize and remove points)单元102、八叉树分析(Analyze octree)单元103、几何重建(Reconstruct geometry)单元104以及第一算术编码(Arithmetic encode)单元105。
坐标变换单元101可用于将点云中点的世界坐标变换为相对坐标。例如,点的几何坐标分别减去xyz坐标轴的最小值,相当于去直流操作,以实现将点云中的点的坐标从世界坐标变换为相对坐标,并使点云全都包含在一个包围盒(bounding box)中。量化和移除重复点单元102可通过量化减少坐标的数目;量化后原先不同的点可能被赋予相同的坐标,基于此,可通过去重操作将重复的点删除;例如,具有相同量化位置和不同属性信息的多个云可通过属性变换合并到一个云中。在本申请的一些实施例中,量化和移除重复点单元102为可选的单元模块。八叉树分析单元103可利用八叉树(octree)编码方式编码量化的点的位置信息。例如,将点云按照八叉树的形式进行规则化处理,由此,点的位置可以和八叉树的位置一一对应,通过统计八叉树中有点的位置,并将其标识(flag)记为1,以进行几何编码。第一算术编码单元105可以采用熵编码方式对八叉树分析单元103输出的位置信息进行算术编码,即将八叉树分析单元103输出的位置信息利用算术编码方式生成几何码流;几何码流也可称为几何比特流(geometry bit stream)。
下面对点云的规则化处理方法进行说明。
由于点云在空间中无规则分布的特性,给编码过程带来挑战,因此采用递归八叉树的结构,将点云中的点规则化地表达成立方体的中心。例如如图5所示,可以将整幅点云放置在一个正方体包围盒内,此时点云中点的坐标可以表示为(x k,y k,z k),k=0,…,K-1,其中K是点云的总点数,则点云在x轴、y轴以及z轴方向上的边界值分别为:
x_min = min(x_0, x_1, …, x_{K-1});
y_min = min(y_0, y_1, …, y_{K-1});
z_min = min(z_0, z_1, …, z_{K-1});
x_max = max(x_0, x_1, …, x_{K-1});
y_max = max(y_0, y_1, …, y_{K-1});
z_max = max(z_0, z_1, …, z_{K-1})。
此外,包围盒的原点(x origin,y origin,z origin)可以计算如下:
x_origin = int(floor(x_min));
y_origin = int(floor(y_min));
z_origin = int(floor(z_min))。
其中,floor()表示向下取整计算或向下舍入计算。int()表示取整运算。
基于此,编码器可以基于边界值和原点的计算公式,计算包围盒在x轴、y轴以及z轴方向上的尺寸如下:
BoundingBoxSize_x = int(x_max − x_origin) + 1;
BoundingBoxSize_y = int(y_max − y_origin) + 1;
BoundingBoxSize_z = int(z_max − z_origin) + 1。
如图6所示,编码器得到包围盒在x轴、y轴以及z轴方向上的尺寸后,首先对包围盒进行八叉树划分,每次得到八个子块,然后对子块中的非空块(包含点的块)进行再一次的八叉树划分,如此递归划分直到某个深度,将最终大小的非空子块称作体素(voxel),每一个voxel中包含一个或多个点,将这些点的几何位置归一化为voxel的中心点,该中心点的属性值取voxel中所有点的属性值的平均值。将点云规则化为空间中的块,有利于描述点云中点与点之前的位置关系,进而有利于设计特定的编码顺序,基于此编码器可基于确定的编码顺序编码每一个体素(voxel),即编码每一个体素所代表的点(或称“节点”)。
编码器几何编码完成后对几何信息进行重建,利用重建的几何信息来对属性信息进行编码。属性编码过程包括:通过给定输入点云的位置信息的重建信息和属性信息的真实值,选择三种预测模式的一种进行点云预测,对预测后的结果进行量化,并进行算术编码形成属性码流。
如图4所示,编码器的属性编码过程可通过以下单元实现:
颜色空间变换(Transform colors)单元110、属性变换(Transfer attributes)单元111、区域自适应分层变换(Region Adaptive Hierarchical Transform,RAHT)单元112、预测变化(predicting transform)单元113以及提升变化(lifting transform)单元114、量化(Quantize)单元115以及第二算术编码单元116。
颜色空间变换单元110可用于将点云中点的RGB色彩空间变换为YCbCr格式或其他格式。属性变换单元111可用于变换点云中点的属性信息,以最小化属性失真。例如,在几何有损编码的情况下,由于几何信息在几何编码之后有所异动,因此需要属性变换单元111为几何编码后的每一个点重新分配属性值,使得重建点云和原始点云的属性误差最小。例如,所述属性信息可以是点的颜色信息。属性变换单元111可用于得到点的属性原始值,经过属性变换单元111变换得到点的属性原始值后,可选择任一种预测单元,对点云中的点进行预测。用于对点云中的点进行预测的单元可包括:RAHT 112、预测变化(predicting transform)单元113以及提升变化(lifting transform)单元114中的至少一项。换言之,RAHT 112、预测变化(predicting transform)单元113以及提升变化(lifting transform)单元114中的任一项可用于对点云中点的属性信息进行预测,以得到点的属性预测值,进而可基于点的属性预测值得到点的属性信息的残差值。例如,点的属性信息的残差值可以是点的属性原始值减去点的属性预测值。量化单元115可用于量化点的属性信息的残差值。例如,若所述量化单元115和所述预测变换单元113相连,则所述量化单元115可用于量化所述预测变换单元113输出的点的属性信息的残差值。例如,对预测变换单元113输出的点的属性信息的残差值使用量化步长进行量化,以实现提升系统性能。第二算术编码单元116可使用零行程编码(Zero run length coding)对点的属性信息的残差值进行熵编码,以得到属性码流。所述属性码流可以是比特流信息。
预测变换单元113可用于获取点云的原始顺序(original order)以及基于点云的原始顺序将点云划分为细节层(level of detail,LOD),预测变换单元113获取点云的LOD后,可对LOD中点的属性信息依次进行预测,进而计算得到点的属性信息的残差值,以便后续单元基于点的属性信息的残差值进行后续的量化编码处理。对LOD中的每一个点,基于当前点所在的LOD上的邻居点搜索结果找到位于当前点之前的3个邻居点,然后利用3个邻居点中的至少一个邻居点的属性重建值对当前点进行预测,得到当前点的属性预测值;基于此,可基于当前点的属性预测值和当前点的属性原始值得到当前点的属性信息的残差值。
预测变换单元113获取的点云的原始顺序可以是预测变换单元113对当前点云进行莫顿重排序的得到的排列顺序。编码器通过对当前点云进行重排序可得到当前点云的原始顺序,编码器得到当前点云的原始顺序后,可按照当前点云的原始顺序对点云中的点进行层的划分,以得到当前点云的LOD,进而基于LOD对点云中的点的属性信息进行预测。
图7至图9示出了莫顿码在二维空间中的排列顺序。
如图7所示,编码器在2*2个块形成的二维空间中可以采用“z”字形莫顿排列顺序。如图8所示,编码器在4个2*2个块形成的二维空间中可以采用“z”字形莫顿排列顺序,其中,每个2*2个块形成的二维空间中也可以采用“z”字形莫顿排列顺序,最终可以得到编码器在4*4个块形成的二维空间中采用的莫顿排列顺序。如图9所示,编码器在4个4*4个块形成的二维空间中可以采用“z”字形莫顿排列顺序,其中,每4个2*2个块形成的二维空间以及每个2*2个块形成的二维空间中也可以采用“z”字形莫顿排列顺序,最终可以得到编码器在8*8个块形成的二维空间中采用的莫顿排列顺序。
图10示出了莫顿码在三维空间中的排列顺序。
如图10所示,莫顿排列顺序不仅适用于二维空间,也可以将其扩展到三维空间中,例如图10中展 示了16个点,每个“z”字内部,每个“z”与“z”之间的莫顿排列顺序都是先沿x轴方向编码,再沿y轴,最后沿z轴。
LOD的生成过程包括:根据点云中点的位置信息,获取点与点之间的欧式距离;根据欧式距离,将点分为不同的LOD层。在一个实施例中,可以将欧式距离进行排序后,将不同范围的欧式距离划分为不同的LOD层。例如,可以随机挑选一个点,作为第一LOD层。然后计算剩余点与该点的欧式距离,并将欧式距离符合第一阈值要求的点,归为第二LOD层。获取第二LOD层中点的质心,计算除第一、第二LOD层以外的点与该质心的欧式距离,并将欧式距离符合第二阈值的点,归为第三LOD层。以此类推,将所有的点都归到LOD层中。通过调整欧式距离的阈值,可以使得每层LOD的点的数量是递增的。应理解,LOD层划分的方式还可以采用其它方式,本申请对此不进行限制。需要说明的是,可以直接将点云划分为一个或多个LOD层,也可以先将点云划分为多个点云切块(slice),再将每一个点云切块划分为一个或多个LOD层。例如,可将点云划分为多个点云切块,每个点云切块的点的个数可以在55万-110万之间。每个点云切块可看成单独的点云。每个点云切块又可以划分为多个LOD层,每个LOD层包括多个点。在一个实施例中,可根据点与点之间的欧式距离,进行LOD层的划分。
图11是本申请实施例提供的LOD层的示意性框图。
如图11所示,假设点云包括按照原始顺序(original order)排列的多个点,即P0,P1,P2,P3,P4,P5,P6,P7,P8以及P9,假设可基于点与点之间的欧式距离可将点云划分为3个LOD层,即LOD0、LOD1以及LOD2。其中,LOD0可包括P0,P5,P4以及P2,LOD2可包括P1,P6以及P3,LOD3可包括P9,P8以及P7。此时,LOD0、LOD1以及LOD2可用于形成该点云的基于LOD的顺序(LOD-based order),即P0,P5,P4,P2,P1,P6,P3,P9,P8以及P7。所述基于LOD的顺序可作为该点云的编码顺序。
示例性地,编码器在预测点云中的当前点时,基于当前点所在的LOD上的邻居点搜索结果,创建多个预测变量候选项,即预测模式(predMode)的索引的取值可以为0~3。例如,当使用预测方式对当前点的属性信息进行编码时,编码器先基于当前点所在的LOD上的邻居点搜索结果找到位于当前点之前的3个邻居点,其中索引为0的预测模式指基于3个邻居点与当前点之间的距离将3个邻居点的重建属性值的加权平均值确定为当前点的属性预测值;索引为1的预测模式指将3个邻居点中最近邻居点的属性重建值作为当前点的属性预测值;索引为2的预测模式指将次近邻居点的属性重建值作为当前点的属性预测值;索引为3的预测模式指将3个邻居点中除最近邻居点和次近邻居点之外的邻居点的属性重建值作为当前点的属性预测值;在基于上述各种预测模式得到当前点的属性预测值的候选项后,编码器可以利用率失真优化(Rate distortion optimization,RDO)技术选择最佳的属性预测值,然后对所选的属性预测值进行算术编码。
进一步的,若当前点的预测模式的索引为0,则码流中不需要编码对预测模式的索引进行编码,若是通过RDO选择的预测模式的索引为1,2或3,则码流中需要对所选的预测模式的索引进行编码,即需要将所选的预测模式的索引编码到属性码流。
表1
如表1所示,当使用预测方式对当前点P2的属性信息进行编码时,索引为0的预测模式指基于邻居点P0、P5以及P4的距离将邻居点P0、P5以及P4的重建属性值的加权平均值确定为当前点P2的属性预测值;索引为1的预测模式指将最近邻居点P4的属性重建值作为当前点P2的属性预测值;索引为2的预测模式指将下一个邻居点P5的属性重建值作为当前点P2的属性预测值;索引为3的预测模式指将下一个邻居点P0的属性重建值作为当前点P2的属性预测值。
下面对RDO技术进行示例性说明。
编码器先对当前点的至少一个邻居点计算其属性的最大差异maxDiff,将maxDiff与设定的阈值进行比较,如果小于设定的阈值则使用邻居点属性值加权平均的预测模式;否则对该点使用RDO技术选择最优预测模式。具体地,编码器计算当前点的至少一个邻居点的属性最大差异maxDiff,例如首先计算当前点的至少一个邻居点在R分量上的最大差异,即max(R1,R2,R3)-min(R1,R2,R3);类似的,编码器计算当前点的至少一个邻居点在G以及B分量上的最大差异,即max(G1,G2,G3)-min(G1,G2,G3)以 及max(B1,B2,B3)-min(B1,B2,B3),然后选择R、G、B分量中的最大差异值作为maxDiff,即maxDiff=max(max(R1,R2,R3)-min(R1,R2,R3),max(G1,G2,G3)-min(G1,G2,G3),max(B1,B2,B3)-min(B1,B2,B3));编码器将得到的maxDiff与设定的阈值比较,若小于设定的阈值则当前点的预测模式设为0,即predMode=0;若大于或等于设定的阈值,则编码器对当前点可以使用RDO技术确定当前点使用的预测模式。对于RDO技术,编码器可以对当前点的每种预测模式计算得到对应的率失真代价,然后选取率失真代价最小的预测模式,即最优预测模式作为当前点的属性预测模式。
示例性地,可通过以下公式计算索引为1、2或3的预测模式的率失真代价:
J_indx_i = D_indx_i + λ × R_indx_i
其中,其中,J indx_i表示当前点采用索引为i的预测模式时的率失真代价,D为attrResidualQuant三个分量的和,即D=attrResidualQuant[0]+attrResidualQuant[1]+attrResidualQuant[2]。λ根据所述当前点的量化参数确定,R indx_i表示当前点采用索引为i的预测模式时得到的量化残差值在码流中所需的比特数。
示例性地,编码器确定出当前点使用的预测模式后,可基于确定的预测模式确定当前点的属性预测值attrPred,再利用当前点的属性原始值attrValue与当前点的属性预测值attrPred相减并对其结果进行量化,以得到当前点的量化残差值attrResidualQuant。例如编码器可通过以下公式确定当前点的量化残差值:
attrResidualQuant=(attrValue-attrPred)/Qstep;
其中,attrResidualQuant表示当前点的量化残差值,attrPred表示当前点的属性预测值,attrValue表示当前点的属性原始值,Qstep表示量化步长。其中,Qstep由量化参数(Quantization Parameter,Qp)计算得到。
示例性地,当前点的属性重建值可以作为后续点的近邻候选项,并利用当前点的重建值对后续点的属性信息进行预测。编码器可通过以下公式基于所述第一量化残差值确定的所述当前点的属性重建值:
Recon=attrResidualQuant×Qstep+attrPred;
其中,Recon表示基于当前点的量化残差值确定的所述当前点的属性重建值,attrResidualQuant表示当前点的量化残差值,Qstep表示量化步长,attrPred表示当前点的属性预测值。其中,Qstep由量化参数(Quantization Parameter,Qp)计算得到。
需要说明的是,本申请中,当前点的属性预测值(predictedvalue)也可称为属性信息的预测值或颜色预测值(predictedColor)。当前点的属性原始值也可称为当前点的属性信息的真实值或颜色原始值。当前点的残差值也可称为当前点的属性原始值与当前点的属性预测值的差值或也可称为当前点的颜色残差值(residualColor)。当前点的属性重建值(reconstructedvalue)也可称为当前点的属性的重建值或颜色重建值(reconstructedColor)。
图12是本申请实施例提供的解码框架200的示意性框图。
解码框架200可以从编码设备获取点云的码流,通过解析码得到点云中的点的位置信息和属性信息。其中点云的解码包括位置解码和属性解码。位置解码的过程包括:对几何码流进行算术解码;构建八叉树后进行合并,对点的位置信息进行重建,以得到点的位置信息的重建信息;对点的位置信息的重建信息进行坐标变换,得到点的位置信息。点的位置信息也可称为点的几何信息。属性解码过程包括:通过解析属性码流,获取点云中点的属性信息的残差值;通过对点的属性信息的残差值进行反量化,得到反量化后的点的属性信息的残差值;基于位置解码过程中获取的点的位置信息的重建信息,选择三种预测模式的一种进行点云预测,得到点的属性重建值;对点的属性重建值进行颜色空间反变换,以得到解码点云。
如图12所示,位置解码可通过以下单元实现:第一算数解码单元201、八叉树分析(synthesize octree)单元202、几何重建(Reconstruct geometry)单元203以及坐标反变化(inverse transform coordinates)单元204。属性编码可通过以下单元实现:第二算数解码单元210、反量化(inverse quantize)单元211、RAHT单元212、预测变化(predicting transform)单元213、提升变化(lifting transform)单元214以及颜色空间反变换(inverse transform colors)单元215。
需要说明的是,解压缩是压缩的逆过程,类似的,解码框架200中的各个单元的功能可参见编码框架100中相应的单元的功能。例如,解码框架200可根据点云中点与点之间的欧式距离将点云划分为多个LOD;然后,依次对LOD中点的属性信息进行解码;例如,计算零行程编码技术中零的数量(zero_cnt),以基于零的数量对残差进行解码;接着,解码框架200可基于解码出的残差值进行反量化,并基于反量化后的残差值与当前点的预测值相加得到该点云的重建值,直到解码完所有的点云。当前点将会作为后续LOD中点的最近邻居,并利用当前点的重建值对后续点的属性信息进行预测。
在算数编码过程中,编码器可利用待编码的当前节点与周围节点的空间相关性,对占位比特进行帧 内预测(intra prediction),并基于预测结果选择相应的二进制算数编码器进行算数编码,以实现基于上下文模型的自适应二进制算术编码(Context-based Adaptive Binary Arithmetic Coding,CABAC)进而得到几何码流。
例如,编码器可利用通过存储当前节点之前的已编解码节点,并利用位于某平面上的上一个已编解码节点的平面信息以及其与当前节点之间的距离对当前节点的第一索引进行确定,进而可基于确定得到的第一索引确定上下文索引,然后基于得到的上下文索引对所述当前节点进行编码。具体地,若上一个已编解码过的节点的平面信息为0时,则解码器确定当前节点的第一索引为0,若上一个已编解码过的节点的平面信息为1时,则解码器确定当前节点的第一索引为1,否则,解码器确定当前节点的第一索引为-1。
其中,确定当前节点的第一索引为-1时,表示确定当前节点不满足平面模式;确定当前节点的第一索引为1时,表示确定当前节点满足k=1的平面模式,确定当前节点的第一索引为0时,表示确定当前节点满足k=0的平面模式。确定当前节点满足k=0的平面模式指确定当前节点中在k=0的平面上存在占据子节点,确定当前节点满足k=1的平面模式指确定当前节点中在k=1的平面上存在占据子节点。
但是,利用已编解码节点的平面信息对当前节点的第一索引进行确定,其准确度较低,进而降低了编解码性能。例如,在莫顿顺序为基础的编码顺序下,上一个已编解码节点并不一定是当前节点的邻居节点,此时,利用已编解码节点的平面信息对当前节点的第一索引进行确定时,其准确度较低。有鉴于此,本申请实施例提供了一种索引确定方法、装置、解码器以及编码器,能够提升针对第一索引的准确度,进而提升解码性能。
图13是本申请实施例提供的索引确定方法300的示意性流程图。应理解,该索引确定方法300可由解码器执行。例如应用于图12所示的解码框架200。为便于描述,下面以解码器为例进行说明。
如图13所示,所述索引确定方法300可包括:
S310,解码器基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
本实施例中,基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,避免了直接已编解码节点的平面信息对当前节点的第一索引进行预测,能够更好更细致的利用当前节点与所述至少一个邻居节点之间的空间相关性确定当前节点的第一索引,提升了针对第一索引的准确度,进而提升解码性能。
本实施例基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,能够带来解码性能的增益。下面结合表2和表3对本申请提供的方案在测试平台上进行测试得到的结果进行说明。其中,表2示出了几何信息有损压缩下的代表率失真(Bit distortion,BD-rate),几何信息有损压缩条件下的BD-Rate表示:在获得相同编码质量的情况下,采用本申请提供的技术方案时的码率比与不采用本申请提供的技术方案时的码率节省(BD-Rate为负值)或增加(BD-Rate为正值)的百分比。表3示出了几何信息无损压缩条件下的Bpip比率(Bpip Ratio),几何信息无损压缩条件下的Bpip Ratio表示:在点云质量无损失的情况下,采用本申请提供的技术方案时的码率占不采用本申请提供的技术方案时的码率的百分比,其数值越低,说明采用本申请提供的方案进行编解码时节省的码率越大。
表2
如表2所示,Cat1-A表示仅包括点的反射率信息的点的点云,Cat1-A average表示在几何信息有损压缩下Cat1A的各个分量的平均BD-rate;Cat1-B表示仅包括点的颜色信息的点的点云,Cat1-B average表示在几何信息有损压缩下Cat1-B的各个分量的平均BD-rate;Cat3-fused和Cat3-frame均表示包括点的颜色信息和其他属性信息的点的点云。Cat3-fused average表示在几何信息有损压缩下Cat3-fused的各个分量的平均BD-rate;Cat3-frame average表示在几何信息有损压缩下Cat3-frame的各个分量的平均BD-rate;总平均值(Overall average)表示Cat1-A至Cat3-frame在几何信息有损压缩下的平均BD-rate。 D1表示基于相同点到点误差下的BD-Rate,D2表示基于相同点到面误差下的BD-Rate。由表2可知,本申请提供的索引确定方法,对Cat1-A、Cat3-frame和Cat1-B具有明显的性能提升。
表3
由表3可知,本申请提供的索引确定方法,对Cat1-A、Cat3-frame和Cat1-B均具有性能提升。
应当理解,本申请中涉及的第一索引的命名不做具体限定。
例如,在其他可替代实施例中,解码器当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,也可称为当前节点在第k轴上的平面模式标志位occ_plane_pos[k],还可以称为平面上下文的(Planar contextualization of occ_plane_pos[k]),还可以称根据当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点确定的表达(expression)或变量。此外,所述至少一个邻居节点的占据子节点也可等同替换为至少一个邻居节点中占位比特的取值表示非空的子节点或具有类似含义的术语,本申请对此不做具体限定。
解码器可基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点中每一个子节点的占位比特,确定所述至少一个邻居节点的占据子节点。换言之,解码器可基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的子节点的占位比特(或信息),预测当前节点的第一索引。
示例性地,下面结合表4对本申请的涉及的索引所在的位置进行说明。
表4
如表4所示,occtree_planar_enabled表示当前点云是否允许使用平面模式。若occtree_planar_enabled为真,则解码器遍历第k轴获取PlanarEligible[k],PlanarEligible[k]表示当前点云在第k轴上是否允许使用平面模式。可选的,k取值为0、1、2时表示S、T、V轴。若PlanarEligible[k]为真,则解码器获取occ_single_plane[k],occ_single_plane[k]表示当前节点在第k轴上是否允许使用平面模式。若occ_single_plane[k]为真,则解码器可基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点,确定平面模式标志位occ_plane_pos[k]。
示例性,表5给出了k和平面轴(Planar axis)的对应关系:
表5
在一些实施例中,所述S310可包括:
若所述至少一个邻居节点为非空,则解码器基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
示例性地,若所述至少一个邻居节点中的任意一个邻居节点为非空,则解码器基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
示例性地,若所述至少一个邻居节点仅包括一个非空的邻居节点、且所述一个非空的邻居节点包括多个占据子节点,则解码器基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
示例性地,若所述至少一个邻居节点包括一个或多个非空的邻居节点,则解码器基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
在一些实施例中,所述S310可包括:
若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第一平面上,则解码器确定所述第一索引为第一数值;若若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第二平面上,则解码器确定所述第一索引为第二数值;否则,解码器确定所述第一索引为第三数值。
示例性地,所述第一平面可以是高平面,所述第二平面可以是低平面。
示例性地,所述第一平面可以是k=1的平面,所述第二平面为k=0的平面。
示例性地,解码器可基于所述至少一个邻居节点的占据子节点所在的平面,确定所述第一索引。若所述至少一个邻居节点的占据子节点分布在同一平面内,则解码器基于所述同一平面确定所述第一索引;例如,若所述同一平面为所述第一平面,则确定所述第一索引为第一数值;若所述同一平面为所述第二平面,则确定所述第一索引为第二数值。若所述至少一个邻居节点的占据子节点未分布在同一平面内,则确定所述第一索引为第三数值。
示例性地,解码器先确定所述至少一个邻居节点的占据子节点是否均分布在所述第一平面上,若所述至少一个邻居节点的占据子节点均分布在所述第一平面上,则解码器确定所述第一索引为第一数值;若所述至少一个邻居节点的占据子节点不都分布在所述第一平面上,则解码器确定所述至少一个邻居节点的占据子节点是否均分布在所述第二平面上,若所述至少一个邻居节点的占据子节点都分布在所述第二平面上,则解码器确定所述第一索引为第二数值;若所述至少一个邻居节点的占据子节点不都分布在所述第二平面上,则解码器确定所述第一索引为第三数值。
示例性地,解码器先确定所述至少一个邻居节点的占据子节点是否均分布在所述第二平面上,若所述至少一个邻居节点的占据子节点均分布在所述第二平面上,则解码器确定所述第一索引为第二数值;若所述至少一个邻居节点的占据子节点不均分布在所述第二平面上,则解码器确定所述至少一个邻居节点的占据子节点是否均分布在所述第一平面上,若所述至少一个邻居节点的占据子节点都分布在所述第一平面上,则解码器确定所述第一索引为第一数值;若所述至少一个邻居节点的占据子节点不都分布在所述第一平面上,则解码器确定所述第一索引为第三数值。
在一些实施例中,所述方法300还可包括:
解码器基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引。
在一些实施例中,解码器确定所述第二索引为第四数值。
示例性地,若解码器基于所述至少一个邻居节点的占据子节点确定所述第一索引,则解码器直接确定所述第二索引为第四数值。
示例性地,所述第四数值为预定义的数值。
示例性地,所述预定义可以通过在设备(例如,包括解码器和编码器)中预先保存相应的代码、表格或其他可用于指示相关信息的方式来实现,本申请对于其具体的实现方式不做限定。比如,预定义的数值可以是指协议中定义的数值。可选地,所述"协议"可以指编解码技术领域的标准协议,例如可以包括VCC或ECM协议等相关协议。
在一些实施例中,所述第四数值为0且用于表示:所述至少一个邻居节点和所述当前节点之间的距离小于或等于预设阈值。
在一些实施例中,所述第四数值为0且用于表示:所述至少一个邻居节点在所述当前节点的预设区域内。
当然,在其他可替代实施例中,所述第四数值也可以取其他数值,本申请对其具体取值不做限定。
在一些实施例中,所述至少一个邻居节点和所述当前节点之间的距离为曼哈顿距离。
当然,在其他可替代实施例中,所述至少一个邻居节点和所述当前节点之间的距离也可以为其他类型的距离,例如欧式距离或莫顿距离,本申请对其具体取值不做限定。
在一些实施例中,所述方法300还可包括:
若所述至少一个邻居节点为空,则解码器基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息,确定所述第三索引。
示例性地,所述前一个已解码节点的平面信息也可以指所述前一个已解码节点的占据子节点所在的垂直于所述第k轴的平面。例如,所述前一个已解码节点的占据子节点所在的垂直于所述第k轴的平面可以是第一平面(例如高平面或k=1的平面)。例如,所述前一个已解码节点的占据子节点所在的垂直于所述第k轴的平面可以是第二平面(例如低平面或k=0的平面)。
在一些实施例中,所述S310可包括:
若所述平面信息为1,则解码器确定所述第三索引为第一数值;若所述平面信息不为1,则解码器确定所述第三索引为第二数值。
示例性地,若所述前一个已解码节点的占据子节点所在的垂直于所述第k轴的平面是第一平面(例 如高平面或k=1的平面),则解码器确定所述第三索引为第一数值;若所述前一个已解码节点的占据子节点所在的垂直于所述第k轴的平面不是所述第一平面(例如高平面或k=1的平面),则解码器确定所述第三索引为第二数值。
在一些实施例中,所述S310可包括:
若所述平面信息为0,则解码器确定所述第三索引为第二数值;若所述平面信息不为0,则解码器确定所述第三索引为第一数值。
在一些实施例中,所述方法300还可包括:
解码器基于所述前一个已解码节点和所述当前节点之间的距离,确定的第四索引。
示例性地,若解码器基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息,确定所述第三索引,则解码器可基于所述前一个已解码节点和所述当前节点之间的距离,确定所述当前节点的第四索引。
在一些实施例中,解码器获取已启用平面模式的八叉树深度层数n;基于所述n,确定所述第四索引。
示例性地,当前点云已启用平面模式的八叉树深度层连续。
示例性地,当前点云的八叉树划分层数为m时,若在第t层启用平面模式且当前点云已启用平面模式的八叉树深度层连续,则对于第t层,n=1;对于t+1层,n=2;依次类推,对于第m层,n=m-t+1。
示例性地,当前点云的八叉树划分层数为10时,若在第5层启用平面模式且当前点云已启用平面模式的八叉树深度层连续,则对于第5层,n=1;对于6层,n=2;对于7层,n=3;对于8层,n=4;对于9层,n=5;对于10层,n=6。
在一些实施例中,若所述前一个已解码节点与所述当前节点之间的距离小于或等于2 n,则解码器确定所述第四索引为第五数值;若所述前一个已解码节点与所述当前节点之间的距离大于2 n,则解码器确定所述当前节点的第二索引为第六数值。
在一些实施例中,所述第五数值为0且用于表示:所述前一个已解码节点和所述当前节点之间的距离小于或等于预设阈值,或所述前一个已解码节点在所述当前节点的预设区域内;和/或,所述第六数值为1且用于表示:所述前一个已解码节点和所述当前节点之间的距离大于预设阈值,或所述前一个已解码节点不在所述当前节点的预设区域内。
当然,在其他可替代实施例中,所述第五数值或所述第六数值也可以取其他数值,本申请的方案只需要保证所述第五数值或所述第六数值不相同即可,对其具体取值不做限定。
当然,在其他实施例中,也可以采用其他方式基于所述n,确定所述第四索引,本申请对此不做具体限定。例如,可以直接将所述前一个已解码节点与所述当前节点之间的距离和所述n进行比较的方式,确定所述第四索引。再如,可以直接将所述前一个已解码节点与所述当前节点之间的距离和与所述n有关的其他函数值进行比较的方式,确定所述第四索引。
在一些实施例中,所述前一个已解码节点与所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述第一数值为1,所述第二数值为0,所述第三数值为-1。
当然,在其他可替代实施例中,所述第一数值,所述第二数值或所述第三数值也可以取其他数值,本申请的方案只需要保证所述第一数值,所述第二数值和所述第三数值互不相同即可,对其具体取值不做限定。
在一些实施例中,所述第一索引为所述第一数值时表征确定所述当前节点满足所述第一平面(例如高平面或k=1的平面)的平面模式,所述第一索引为所述第二数值时表征确定所述当前节点满足所述第二平面(例如低平面或k=0的平面)的平面模式,所述第一索引为所述第三数值时表征确定所述当前节点不满足平面模式。
示例性地,解码器确定所述第一索引为第一数值时,表示解码器可以预测当前节点满足所述第一平面(例如高平面或k=1的平面)的平面模式;解码器确定所述第一索引为第二数值时,表示解码器可以预测当前节点满足所述第二平面模式;解码器确定所述第一索引为第三数值时,表示解码器可以预测当前节点不满足第二平面(例如低平面或k=0的平面)的平面模式。解码器预测当前节点满足所述第二平面(例如低平面或k=0的平面)的平面模式指:解码器可以预测当前节点中在所述第二平面(例如低平面或k=0的平面)上存在占据子节点;解码器预测当前节点满足所述第一平面(例如高平面或k=1的平面)的平面模式指:解码器可以预测当前节点中在所述第一平面(例如高平面或k=1的平面)上存在占据子节点;解码器预测当前节点不满足平面模式指解码器可以预测当前节点不存在占据子节点或存在的占据子节点不都分布在一个平面上。
在一些实施例中,k的取值为0,1,2。
示例性地,k的取值为0,1,2时表示S、T、V轴。
示例性地,解码器可基于当前节点在垂直于S轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点在S轴上的索引,也可基于当前节点在垂直于T轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点在T轴上的索引,还可基于当前节点在垂直于V轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点在V轴上的索引。换言之,解码器确定的第一索引可包括所述当前节点在S轴上的索引,所述当前节点在T轴上的索引,所述当前节点在V轴上的索引中的一项或多项。
在一些实施例中,所述至少一个邻居节点包括已解码的且与所述当前节点相邻的节点。
下面结合图14以解码器先确定所述至少一个邻居节点的占据子节点是否均分布在所述第一平面,再确定所述至少一个邻居节点的占据子节点是否均分布在所述第二平面上为例,对当前节点的索引的确定方法进行示例性说明。
图14是本申请实施例提供的在x方向上的邻居节点的占据子节点的示例。
如图14所示,解码器基于当前节点在垂直于x方向的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,其中,当前节点在垂直于x方向的平面上的已解码的至少一个邻居节点包括邻居节点1和邻居节点2,邻居节点1的占据子节点包括占据子节点1,邻居节点2的占据子节点包括占据子节点2和占据子节点3,由于占据子节点1、占据子节点2以及占据子节点3均分布在x=0的平面上,因此,解码器可预测当前节点的第一索引为第二数值,例如解码器可确定当前节点的第一索引为0。
图15是本申请实施例提供的索引确定方法400的示意性流程图。应理解,该索引确定方法400可由解码器执行。例如应用于图12所示的解码框架200。为便于描述,下面以解码器为例进行说明。
S411,开始。
S412,解码器确定当前节点在垂直于第k轴平面上的已解码的两个邻居节点任意一个为非空?。
S413,若当前节点在垂直于第k轴平面上的已解码的两个邻居节点任意一个为非空,则解码器确定两个邻居节点的占据子节点都分布于垂直于第k轴的第一平面上?
S414,若两个邻居节点的占据子节点不都分布于所述第一平面上,则解码器确定两个邻居节点的占据子节点都分布于垂直于第k轴的第二平面上?
S415,若两个邻居节点的占据子节点不都分布于所述第二平面上,则解码器确定当前节点的第一索引为-1。
S416,若两个邻居节点的占据子节点都分布于所述第二平面上,则解码器确定当前节点的第一索引为0。
S417,若两个邻居节点的占据子节点都分布于所述第一平面上,则解码器确定当前节点的第一索引为1。
S418,解码器确定当前节点的第二索引为0。
S419,若当前节点在垂直于第k轴平面上的已解码的两个邻居节点均为空,则解码器确定在垂直于第k轴平面上的上一个已解码节点的平面信息为1?
S420,若在垂直于第k轴的平面上的上一个已解码节点的平面信息不为1,则解码器确定当前节点的第三索引为0。
S421,若在垂直于第k轴的平面上的上一个已解码节点的平面信息为1,则解码器确定当前节点的第三索引为1。
S422,解码器获取已启用平面模式的八叉树层级数n。
S423,解码器判断上一个已解码节点与当前节点的曼哈顿距离是否大于2的n次方。
S424,若上一个已解码节点与当前节点的曼哈顿距离小于或等于2的n次方,则解码器确定当前节点的第四索引为0。
S425,若上一个已解码节点与当前节点的曼哈顿距离大于2的n次方,则解码器确定当前节点的第四索引为1。
S426,结束。
应理解,图15仅为本申请的示例,不应理解为对本申请的限制。
例如,在其他可替代实施例中,解码器也可以先确定2个邻居节点的占据子节点是否都分布在所述第二平面上,若2个邻居节点的占据子节点不都分布在所述第二平面上,再确定2个邻居节点的占据子节点是否都分布在所述第一平面上;或者,解码器也可同时确定2个邻居节点的占据子节点是否都分布在所述第二平面或所述第一平面上,本申请对此不做具体限定。
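示例性地,下面给出一个与图15所示流程对应的示意性Python片段(其中的函数与参数命名均为本示例的假设,实际实现不限于此),用于说明上述第一索引、第二索引、第三索引和第四索引的确定顺序:

def determine_indexes(neigh_child_planes, prev_plane_pos, prev_dist, n):
    # neigh_child_planes:两个邻居节点的占据子节点在第k轴上的平面位置列表;为空表示两个邻居节点均为空
    # prev_plane_pos:上一个已解码节点的平面信息(1表示第一平面/高平面,0表示第二平面/低平面)
    # prev_dist:上一个已解码节点与当前节点之间的曼哈顿距离
    # n:已启用平面模式的八叉树深度层数
    if neigh_child_planes:                        # S412:存在非空邻居节点
        if all(p == 1 for p in neigh_child_planes):
            first = 1                             # S417:都在第一平面上
        elif all(p == 0 for p in neigh_child_planes):
            first = 0                             # S416:都在第二平面上
        else:
            first = -1                            # S415:不满足平面模式
        return {"first": first, "second": 0}      # S418:第二索引为0
    third = 1 if prev_plane_pos == 1 else 0       # S419~S421
    fourth = 1 if prev_dist > 2 ** n else 0       # S422~S425
    return {"third": third, "fourth": fourth}

# 例如:两个邻居节点的占据子节点均位于低平面时,返回 {"first": 0, "second": 0}
print(determine_indexes([0, 0, 0], prev_plane_pos=None, prev_dist=0, n=1))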
在一些实施例中,所述方法300还可包括:
解码器基于所述第一索引,对所述当前节点进行解码。
示例性地,解码器可基于所述第一索引确定当前节点的上下文索引,并基于当前节点的上下文索引进行解码。
示例性地,解码器可基于所述第一索引和上文涉及的第二索引确定当前节点的上下文索引。
示例性地,解码器还可基于上文涉及的第三索引确定当前节点的上下文索引。
示例性地,解码器可基于所述第三索引和上文涉及的第四索引确定当前节点的上下文索引。
示例性地,解码器确定得到所述当前节点在S轴上的索引、所述当前节点在T轴上的索引、所述当前节点在V轴上的索引中的一项或多项后,可基于所述当前节点在S轴上的索引、所述当前节点在T轴上的索引、所述当前节点在V轴上的索引中的一项或多项,确定当前节点的上下文索引,并基于当前节点的上下文索引对所述当前节点进行解码。
示例性地,解码器确定出当前节点的上下文索引后,可基于当前节点的上下文索引确定用于对所述当前节点进行算数解码的算数解码器;并基于确定的算数解码器对所述当前节点进行算数解码,得到当前节点的几何信息。
值得注意的是,在本申请实施例中,所述第一索引、所述第二索引、所述第三索引和所述第四索引均可以是用于确定当前节点的上下文索引的中间索引或中间变量。
例如,在其他可替代实施例中,所述第一索引和所述第三索引可以称为一种类型的索引,或所述第一索引和所述第三索引可以合并为1个索引,例如可将其称为平面索引。示例性地,若所述至少一个邻居节点为非空,则基于所述至少一个邻居节点的占据子节点,确定这个平面索引;若所述至少一个邻居节点为空,则基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息,确定这个平面索引。
再如,在其他可替代实施例中,所述第二索引和所述第四索引可以称为一种类型的索引,或所述第二索引和所述第四索引也可合并为1个索引,例如可将其称为距离索引。示例性地,若所述至少一个邻居节点为非空或解码器基于所述至少一个邻居节点的占据子节点确定上文涉及的第一索引,则解码器将距离索引确定为一个预设数值(例如0);若所述至少一个邻居节点为空或解码器基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息确定上文涉及的第三索引,则解码器基于所述前一个已解码节点和所述当前节点之间的距离,确定这个距离索引。
也即是说,“第一索引”、“第二索引”、“第三索引”和“第四索引”等术语仅用来将索引彼此区分开,而对索引的数量、类型等并没有限定,即不对本申请实施例的范围构成限制。
下面结合本申请提供的方案对标准文本相关的数据处理流程以及spec中使用到的变量进行示例性说明:
确定occ_plane_pos[k]标志位的上下文索引要用到垂直于第k轴的平面里的前一个有平面编码模式资格的已解码节点或者邻居节点的占据子节点的信息,包括:
当前节点与所述节点之间的曼哈顿距离;
occ_single_plane和occ_plane_pos的值。
垂直于编码节点的第k轴的平面由其沿轴模2^14的位置标识。
PlanarNodeAxisLoc[k]表示当前节点垂直于第k轴的平面,是基于当前节点在当前层级的八叉树下的位置坐标获取的。
ManhattanDist[k]表示当前节点在垂直于第k轴的平面上距离坐标原点的曼哈顿距离,是通过垂直于第k轴的平面上的坐标值相加获取的:
ManhattanDist[k] :=
k == 0 ? Nt + Nv :
k == 1 ? Ns + Nv :
k == 2 ? Ns + Nt : na
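示例性地,下面给出一个与上述表达式对应的示意性Python片段(其中Ns、Nt、Nv表示当前节点在S、T、V轴上的坐标,命名沿用上文;函数名为本示例的假设):

def manhattan_dist(k, Ns, Nt, Nv):
    # 当前节点在垂直于第k轴的平面上距离坐标原点的曼哈顿距离,
    # 即该平面内另外两个坐标分量之和
    if k == 0:
        return Nt + Nv
    if k == 1:
        return Ns + Nv
    if k == 2:
        return Ns + Nt
    raise ValueError("k应取0、1或2")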
前一个具有平面编码模式资格的已编解码的节点的信息是由下面的变量存储的,k和axisLoc可以确定垂直于第k轴的平面的位置:
数组PrevManhattanDist;PrevManhattanDist[k][axisLoc]表示前一个具有平面编码模式资格的已编解码的节点在垂直于第k轴的平面上距离坐标原点的曼哈顿距离;
数组PrevOccSinglePlane;PrevOccSinglePlane[k][axisLoc]表示前一个具有平面编码模式资格的已编解码的节点是否满足平面编码模式;
数组PrevOccPlanePos;PrevOccPlanePos[k][axisLoc]表示前一个具有平面编码模式资格的已编解码的节点的平面位置。
After each occupancy_tree_node syntax structure, the state shall be updated for each planar-eligible axis:
for(k=0;k<3;k++)
if(PlanarEligible[k]){
PrevManhattanDist[k][PlanarNodeAxisLoc[k]]=ManhattanDist[k]
PrevOccSinglePlane[k][PlanarNodeAxisLoc[k]]=occ_single_plane[k]
if(occ_single_plane[k])
PrevOccPlanePos[k][PlanarNodeAxisLoc[k]]=occ_plane_pos[k]
}
即,当前节点完成平面编码模式的编解码后,对于每一个k轴,上述三个变量都要基于当前节点的信息分别更新。
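示例性地,下面给出一个与上述更新过程对应的示意性Python片段(其中以字典模拟PrevManhattanDist等数组,变量命名沿用上文,函数名为本示例的假设):

def update_planar_state(state, planar_eligible, axis_loc, manhattan_dist,
                        occ_single_plane, occ_plane_pos):
    # state:包含"PrevManhattanDist"、"PrevOccSinglePlane"、"PrevOccPlanePos"三个字典,
    #        每个字典以 (k, axis_loc) 为键,对应垂直于第k轴的某个平面位置
    for k in range(3):
        if not planar_eligible[k]:
            continue
        key = (k, axis_loc[k])
        state["PrevManhattanDist"][key] = manhattan_dist[k]
        state["PrevOccSinglePlane"][key] = occ_single_plane[k]
        if occ_single_plane[k]:
            state["PrevOccPlanePos"][key] = occ_plane_pos[k]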
对于不符合角度上下文化条件(AngularEligible为0)的节点的occ_plane_pos[k]的上下文化由表达式CtxIdxPlanePos指定:
Contextualization of occ_plane_pos[k]for nodes not eligible for angular contextualization(AngularEligible is 0)is specified by the expression CtxIdxPlanePos.
CtxIdxPlanePos := isNeighOccupied && occtree_adjacent_child_enabled
? (neighPlanePosCtxInc < 0
  ? adjPlaneCtxInc
  : 12 × k + 4 × adjPlaneCtxInc + 2 × neighDistCtxInc + neighPlanePosCtxInc + 3)
: (occtree_planar_buffer_disabled || !PrevOccSinglePlane[k][PlanarNodeAxisLoc[k]]
  ? adjPlaneCtxInc
  : 12 × k + 4 × adjPlaneCtxInc + 2 × prevDistCtxInc + prevPlanePosCtxInc + 3)
平面编码模式的标志位occ_plane_pos[k]的上下文索引的确定方式如下:
当至少一个邻居节点为非空(isNeighOccupied为真)且邻居节点子节点信息可访问(occtree_adjacent_child_enabled为真)的情况下,由第一索引neighPlanePosCtxInc和第二索引neighDistCtxInc来确定occ_plane_pos[k]的上下文索引;否则,由第三索引prevPlanePosCtxInc和第四索引prevDistCtxInc来确定occ_plane_pos[k]的上下文索引。
neighDistCtxInc(第二索引)的值就是0;
prevDistCtxInc(第四索引)通过前一个已编解码的节点与当前节点之间的曼哈顿距离来确定。
prevDistCtxInc := Abs(a - b) > 2^numEligiblePlanarLevels
where
a=PrevManhattanDist[k][PlanarNodeAxisLoc[k]]
b=ManhattanDist[k]
numEligiblePlanarLevels表示已启用平面编码模式的八叉树深度层数。
当前一个已编解码的节点与当前节点之间的曼哈顿距离大于2的numEligiblePlanarLevels次方(即2的已启用平面编码模式的八叉树深度层数次方)时,prevDistCtxInc赋值为1,否则为0。
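示例性地,下面给出一个与上述prevDistCtxInc表达式对应的示意性Python片段(函数名与参数命名为本示例的假设,变量含义沿用上文):

def prev_dist_ctx_inc(prev_manhattan_dist, curr_manhattan_dist, num_eligible_planar_levels):
    # 两个曼哈顿距离之差大于 2^numEligiblePlanarLevels 时取1,否则取0
    return 1 if abs(prev_manhattan_dist - curr_manhattan_dist) > 2 ** num_eligible_planar_levels else 0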
neighPlanePosCtxInc(第一索引)是通过至少一个邻居节点的占据子节点确定的。
prevPlanePosCtxInc(第三索引)是通过前一个具有平面编码模式资格的已编解码的节点的占据平面位置(第一平面或第二平面)确定的。
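示例性地,下面给出一个与上述CtxIdxPlanePos表达式对应的示意性Python片段(函数名与参数命名为本示例的假设,各变量含义沿用上文),用于说明标志位occ_plane_pos[k]的上下文索引的计算方式:

def ctx_idx_plane_pos(k, is_neigh_occupied, adjacent_child_enabled,
                      neigh_plane_pos_ctx_inc, neigh_dist_ctx_inc,
                      adj_plane_ctx_inc, planar_buffer_disabled,
                      prev_occ_single_plane, prev_dist_ctx_inc,
                      prev_plane_pos_ctx_inc):
    # 至少一个邻居节点非空且邻居节点子节点信息可访问时,使用第一索引和第二索引
    if is_neigh_occupied and adjacent_child_enabled:
        if neigh_plane_pos_ctx_inc < 0:
            return adj_plane_ctx_inc
        return (12 * k + 4 * adj_plane_ctx_inc
                + 2 * neigh_dist_ctx_inc + neigh_plane_pos_ctx_inc + 3)
    # 否则,平面缓存被禁用或前一个节点不满足平面编码模式时回退,
    # 其余情况使用第三索引和第四索引
    if planar_buffer_disabled or not prev_occ_single_plane:
        return adj_plane_ctx_inc
    return (12 * k + 4 * adj_plane_ctx_inc
            + 2 * prev_dist_ctx_inc + prev_plane_pos_ctx_inc + 3)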
上文中从解码器的角度详细描述了根据本申请实施例的索引确定方法,下面将结合图16从编码器的角度描述根据本申请实施例的索引确定方法。
图16是本申请实施例提供的索引确定方法500的示意性流程图。应理解,该索引确定方法500可由编码器执行。例如应用于图4所示的编码框架100。为便于描述,下面以编码器为例进行说明。
如图16所示,所述索引确定方法500可包括:
S510,基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
在一些实施例中,所述S510可包括:
若所述至少一个邻居节点为非空,则基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
在一些实施例中,若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第一平面上,则确定所述第一索引为第一数值;若所述至少一个邻居节点的占据子节点均分布在垂直于第k轴的第二平面上,则确定所述第一索引为第二数值;否则,确定所述第一索引为第三数值。
在一些实施例中,所述方法500还可包括:
基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引。
在一些实施例中,确定所述第二索引为第四数值。
在一些实施例中,所述第四数值为0时表征:所述至少一个邻居节点和所述当前节点之间的距离小于或等于预设阈值,或所述至少一个邻居节点在所述当前节点的预设区域内。
在一些实施例中,所述至少一个邻居节点和所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述方法500还可包括:
若所述至少一个邻居节点为空,则基于所述当前节点在垂直于所述第k轴的平面上的前一个已编码节点的平面信息,确定第三索引。
在一些实施例中,若所述平面信息为1,则确定所述第三索引为第一数值;若所述平面信息不为1,则确定所述第三索引为第二数值。
在一些实施例中,所述方法500还可包括:
基于所述前一个已编码节点和所述当前节点之间的距离,确定所述当前节点的第四索引。
在一些实施例中,获取已启用平面模式的八叉树深度层数n;基于所述n,确定所述第四索引。
在一些实施例中,若所述前一个已编码节点与所述当前节点之间的距离小于或等于2^n,则确定所述第四索引为第五数值;若所述前一个已编码节点与所述当前节点之间的距离大于2^n,则确定所述第四索引为第六数值。
在一些实施例中,所述前一个已编码节点与所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述第一数值为1,所述第二数值为0,所述第三数值为-1。
在一些实施例中,k的取值为0,1,2。
在一些实施例中,所述方法500还可包括:
基于所述当前节点的第一索引,对所述当前节点进行编码。
应当理解,本申请提供的技术方案可同时应用于编解码端,即能够保持两端的同步和一致性;也即是说,索引确定方法500的详细方案可参见索引确定方法300的相关内容,为避免重复,此处不再赘述。
以上结合附图详细描述了本申请的优选实施方式,但是,本申请并不限于上述实施方式中的具体细节,在本申请的技术构思范围内,可以对本申请的技术方案进行多种简单变型,这些简单变型均属于本申请的保护范围。例如,在上述具体实施方式中所描述的各个具体技术特征,在不矛盾的情况下,可以通过任何合适的方式进行组合,为了避免不必要的重复,本申请对各种可能的组合方式不再另行说明。又例如,本申请的各种不同的实施方式之间也可以进行任意组合,只要其不违背本申请的思想,其同样应当视为本申请所公开的内容。还应理解,在本申请的各种方法实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
上文详细描述了本申请的方法实施例,下文结合图17至图18详细描述本申请的装置实施例。
图17是本申请实施例的索引确定装置600的示意性框图。
如图17所示,所述索引确定装置600可包括:
确定单元610,用于基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
在一些实施例中,所述确定单元610具体用于:
若所述至少一个邻居节点为非空,则基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
在一些实施例中,所述确定单元610具体用于:
若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第一平面上,则确定所述第一索引为第一数值;
若所述至少一个邻居节点的占据子节点均分布在垂直于第k轴的第二平面上,则确定所述第一索引为第二数值;
否则,确定所述第一索引为第三数值。
在一些实施例中,所述确定单元610还用于:
基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引。
在一些实施例中,所述确定单元610具体用于:
确定所述第二索引为第四数值。
在一些实施例中,所述第四数值为0时表征:所述至少一个邻居节点和所述当前节点之间的距离小于或等于预设阈值,或所述至少一个邻居节点在所述当前节点的预设区域内。
在一些实施例中,所述至少一个邻居节点和所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述确定单元610还用于:
若所述至少一个邻居节点为空,则基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息,确定第三索引。
在一些实施例中,所述确定单元610具体用于:
若所述平面信息为1,则确定所述第三索引为第一数值;
若所述平面信息不为1,则确定所述第三索引为第二数值。
在一些实施例中,所述确定单元610还用于:
基于所述前一个已解码节点和所述当前节点之间的距离,确定所述当前节点的第四索引。
在一些实施例中,所述确定单元610具体用于:
获取已启用平面模式的八叉树深度层数n;
基于所述n,确定所述第四索引。
在一些实施例中,所述确定单元610具体用于:
若所述前一个已解码节点与所述当前节点之间的距离小于或等于2^n,则确定所述第四索引为第五数值;
若所述前一个已解码节点与所述当前节点之间的距离大于2^n,则确定所述第四索引为第六数值。
在一些实施例中,所述前一个已解码节点与所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述第一数值为1,所述第二数值为0,所述第三数值为-1。
在一些实施例中,k的取值为0,1,2。
在一些实施例中,所述确定单元610还用于:
基于所述当前节点的第一索引,对所述当前节点进行解码。
图18是本申请实施例的索引确定装置700的示意性框图。
如图18所示,所述索引确定装置700可包括:
确定单元710,用于基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
在一些实施例中,所述确定单元710具体用于:
若所述至少一个邻居节点为非空,则基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
在一些实施例中,所述确定单元710具体用于:
若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第一平面上,则确定所述第一索引为第一数值;
若所述至少一个邻居节点的占据子节点均分布在垂直于第k轴的第二平面上,则确定所述第一索引为第二数值;
否则,确定所述第一索引为第三数值。
在一些实施例中,所述确定单元710还用于:
基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引。
在一些实施例中,所述确定单元710具体用于:
确定所述第二索引为第四数值。
在一些实施例中,所述第四数值为0时表征:所述至少一个邻居节点和所述当前节点之间的距离小于或等于预设阈值,或所述至少一个邻居节点在所述当前节点的预设区域内。
在一些实施例中,所述至少一个邻居节点和所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述确定单元710还用于:
若所述至少一个邻居节点为空,则基于所述当前节点在垂直于所述第k轴的平面上的前一个已编码节点的平面信息,确定第三索引。
在一些实施例中,所述确定单元710具体用于:
若所述平面信息为1,则确定所述第三索引为第一数值;
若所述平面信息不为1,则确定所述第三索引为第二数值。
在一些实施例中,所述确定单元710还用于:
基于所述前一个已编码节点和所述当前节点之间的距离,确定所述当前节点的第四索引。
在一些实施例中,所述确定单元710具体用于:
获取已启用平面模式的八叉树深度层数n;
基于所述n,确定所述第四索引。
在一些实施例中,所述确定单元710具体用于:
若所述前一个已编码节点与所述当前节点之间的距离小于或等于2^n,则确定所述第四索引为第五数值;
若所述前一个已编码节点与所述当前节点之间的距离大于2^n,则确定所述第四索引为第六数值。
在一些实施例中,所述前一个已编码节点与所述当前节点之间的距离为曼哈顿距离。
在一些实施例中,所述第一数值为1,所述第二数值为0,所述第三数值为-1。
在一些实施例中,k的取值为0,1,2。
在一些实施例中,所述确定单元710还用于:
基于所述当前节点的第一索引,对所述当前节点进行编码。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图17所示的索引确定装置600可以对应于执行本申请实施例的方法300中的相应主体,并且索引确定装置600中的各个单元的前述和其它操作和/或功能分别为了实现方法300等各个方法中的相应流程。图18所示的索引确定装置700可以对应于执行本申请实施例的方法500中的相应主体,即索引确定装置700中的各个单元的前述和其它操作和/或功能分别为了实现方法500等各个方法中的相应流程。
还应当理解,本申请实施例涉及的索引确定装置600或索引确定装置700中的各个单元可以分别或全部合并为一个或若干个另外的单元来构成,或者其中的某个(些)单元还可以再拆分为功能上更小的多个单元来构成,这可以实现同样的操作,而不影响本申请的实施例的技术效果的实现。上述单元是基于逻辑功能划分的,在实际应用中,一个单元的功能也可以由多个单元来实现,或者多个单元的功能由一个单元实现。在本申请的其它实施例中,该索引确定装置600或索引确定装置700也可以包括其它单元,在实际应用中,这些功能也可以由其它单元协助实现,并且可以由多个单元协作实现。根据本申请的另一个实施例,可以通过在包括例如中央处理单元(CPU)、随机存取存储介质(RAM)、只读存储介质(ROM)等处理元件和存储元件的通用计算机的通用计算设备上运行能够执行相应方法所涉及的各步骤的计算机程序(包括程序代码),来构造本申请实施例涉及的索引确定装置600或索引确定装置700,以及来实现本申请实施例的编码方法或解码方法。计算机程序可以记载于例如计算机可读存储介质上,并通过计算机可读存储介质装载于电子设备中,并在其中运行,来实现本申请实施例的相应方法。
换言之,上文涉及的单元可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过软硬件结合的形式实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件组合执行完成。可选地,软件可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图19是本申请实施例提供的电子设备800的示意结构图。
如图19所示,该电子设备800至少包括处理器810以及计算机可读存储介质820。其中,处理器810以及计算机可读存储介质820可通过总线或者其它方式连接。计算机可读存储介质820用于存储计算机程序821,计算机程序821包括计算机指令,处理器810用于执行计算机可读存储介质820存储的计算机指令。处理器810是电子设备800的计算核心以及控制核心,其适于实现一条或多条计算机指令,具体适于加载并执行一条或多条计算机指令从而实现相应方法流程或相应功能。
作为示例,处理器810也可称为中央处理器(Central Processing Unit,CPU)。处理器810可以包括但不限于:通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
作为示例,计算机可读存储介质820可以是高速RAM存储器,也可以是非易失性存储器(Non-Volatile Memory),例如至少一个磁盘存储器;可选的,还可以是至少一个位于远离前述处理器810的计算机可读存储介质。具体而言,计算机可读存储介质820包括但不限于:易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。作为示例性而非限制性的说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
在一种实现方式中,该电子设备800可以是本申请实施例涉及的编码器或编码框架;该计算机可读存储介质820中存储有第一计算机指令;由处理器810加载并执行计算机可读存储介质820中存放的第一计算机指令,以实现本申请实施例提供的编码方法中的相应步骤;换言之,计算机可读存储介质820中的第一计算机指令由处理器810加载并执行相应步骤,为避免重复,此处不再赘述。
在一种实现方式中,该电子设备800可以是本申请实施例涉及的解码器或解码框架;该计算机可读存储介质820中存储有第二计算机指令;由处理器810加载并执行计算机可读存储介质820中存放的第二计算机指令,以实现本申请实施例提供的解码方法中的相应步骤;换言之,计算机可读存储介质820中的第二计算机指令由处理器810加载并执行相应步骤,为避免重复,此处不再赘述。
根据本申请的另一方面,本申请实施例还提供了一种编解码系统,包括上文涉及的编码器和解码器。
根据本申请的另一方面,本申请实施例还提供了一种计算机可读存储介质(Memory),计算机可读存储介质是电子设备800中的记忆设备,用于存放程序和数据。例如,计算机可读存储介质820。可以理解的是,此处的计算机可读存储介质820既可以包括电子设备800中的内置存储介质,当然也可以包括电子设备800所支持的扩展存储介质。计算机可读存储介质提供存储空间,该存储空间存储了电子设备800的操作系统。并且,在该存储空间中还存放了适于被处理器810加载并执行的一条或多条的计算机指令,这些计算机指令可以是一个或多个的计算机程序821(包括程序代码)。
根据本申请的另一方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。例如,计算机程序821。此时,电子设备800可以是计算机,处理器810从计算机可读存储介质820读取该计算机指令,处理器810执行该计算机指令,使得该计算机执行上述各种可选方式中提供的编码方法或解码方法。
换言之,当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地运行本申请实施例的流程或实现本申请实施例的功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质进行传输,例如,该计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元以及流程步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
最后需要说明的是,以上内容,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (39)

  1. 一种索引确定方法,其特征在于,所述方法适用于解码器,所述方法包括:
    基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
  2. 根据权利要求1所述的方法,其特征在于,所述基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,包括:
    若所述至少一个邻居节点为非空,则基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述至少一个邻居节点的占据子节点,确定所述第一索引,包括:
    若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第一平面上,则确定所述第一索引为第一数值;
    若所述至少一个邻居节点的占据子节点均分布在垂直于第k轴的第二平面上,则确定所述第一索引为第二数值;
    否则,确定所述第一索引为第三数值。
  4. 根据权利要求2或3所述的方法,其特征在于,所述方法还包括:
    基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引。
  5. 根据权利要求4所述的方法,其特征在于,所述基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引,包括:
    确定所述第二索引为第四数值。
  6. 根据权利要求5所述的方法,其特征在于,所述第四数值为0时表征:所述至少一个邻居节点和所述当前节点之间的距离小于或等于预设阈值,或所述至少一个邻居节点在所述当前节点的预设区域内。
  7. 根据权利要求4至6中任一项所述的方法,其特征在于,所述至少一个邻居节点和所述当前节点之间的距离为曼哈顿距离。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述方法还包括:
    若所述至少一个邻居节点为空,则基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息,确定第三索引。
  9. 根据权利要求8所述的方法,其特征在于,所述基于所述当前节点在垂直于所述第k轴的平面上的前一个已解码节点的平面信息,确定第三索引,包括:
    若所述平面信息为1,则确定所述第三索引为第一数值;
    若所述平面信息不为1,则确定所述第三索引为第二数值。
  10. 根据权利要求8或9所述的方法,其特征在于,所述方法还包括:
    基于所述前一个已解码节点和所述当前节点之间的距离,确定所述当前节点的第四索引。
  11. 根据权利要求10所述的方法,其特征在于,所述基于所述前一个已解码节点和所述当前节点之间的距离,确定所述当前节点的第四索引,包括:
    获取已启用平面模式的八叉树深度层数n;
    基于所述n,确定所述第四索引。
  12. 根据权利要求11所述的方法,其特征在于,所述基于所述n,确定所述第四索引,包括:
    若所述前一个已解码节点与所述当前节点之间的距离小于或等于2^n,则确定所述第四索引为第五数值;
    若所述前一个已解码节点与所述当前节点之间的距离大于2^n,则确定所述第四索引为第六数值。
  13. 根据权利要求10至12中任一项所述的方法,其特征在于,所述前一个已解码节点与所述当前节点之间的距离为曼哈顿距离。
  14. 根据权利要求3或9所述的方法,其特征在于,所述第一数值为1,所述第二数值为0,所述第三数值为-1。
  15. 根据权利要求1至14中任一项所述的方法,其特征在于,k的取值为0,1,2。
  16. 根据权利要求1至15中任一项所述的方法,其特征在于,所述方法还包括:
    基于所述当前节点的第一索引,对所述当前节点进行解码。
  17. 一种索引确定方法,其特征在于,所述方法适用于编码器,所述方法包括:
    基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
  18. 根据权利要求17所述的方法,其特征在于,所述基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引,包括:
    若所述至少一个邻居节点为非空,则基于所述至少一个邻居节点的占据子节点,确定所述第一索引。
  19. 根据权利要求18所述的方法,其特征在于,所述基于所述至少一个邻居节点的占据子节点,确定所述第一索引,包括:
    若所述至少一个邻居节点的占据子节点均分布在垂直于所述第k轴的第一平面上,则确定所述第一索引为第一数值;
    若所述至少一个邻居节点的占据子节点均分布在垂直于第k轴的第二平面上,则确定所述第一索引为第二数值;
    否则,确定所述第一索引为第三数值。
  20. 根据权利要求18或19所述的方法,其特征在于,所述方法还包括:
    基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引。
  21. 根据权利要求20所述的方法,其特征在于,所述基于所述至少一个邻居节点和所述当前节点之间的距离,确定所述当前节点的第二索引,包括:
    确定所述第二索引为第四数值。
  22. 根据权利要求21所述的方法,其特征在于,所述第四数值为0时表征:所述至少一个邻居节点和所述当前节点之间的距离小于或等于预设阈值,或所述至少一个邻居节点在所述当前节点的预设区域内。
  23. 根据权利要求20至22中任一项所述的方法,其特征在于,所述至少一个邻居节点和所述当前节点之间的距离为曼哈顿距离。
  24. 根据权利要求17至23中任一项所述的方法,其特征在于,所述方法还包括:
    若所述至少一个邻居节点为空,则基于所述当前节点在垂直于所述第k轴的平面上的前一个已编码节点的平面信息,确定第三索引。
  25. 根据权利要求24所述的方法,其特征在于,所述基于所述当前节点在垂直于所述第k轴的平面上的前一个已编码节点的平面信息,确定第三索引,包括:
    若所述平面信息为1,则确定所述第三索引为第一数值;
    若所述平面信息不为1,则确定所述第三索引为第二数值。
  26. 根据权利要求24或25所述的方法,其特征在于,所述方法还包括:
    基于所述前一个已编码节点和所述当前节点之间的距离,确定所述当前节点的第四索引。
  27. 根据权利要求26所述的方法,其特征在于,所述基于所述前一个已编码节点和所述当前节点之间的距离,确定所述当前节点的第四索引,包括:
    获取已启用平面模式的八叉树深度层数n;
    基于所述n,确定所述第四索引。
  28. 根据权利要求27所述的方法,其特征在于,所述基于所述n,确定所述第四索引,包括:
    若所述前一个已编码节点与所述当前节点之间的距离小于或等于2^n,则确定所述第四索引为第五数值;
    若所述前一个已编码节点与所述当前节点之间的距离大于2^n,则确定所述第四索引为第六数值。
  29. 根据权利要求26至28中任一项所述的方法,其特征在于,所述前一个已编码节点与所述当前节点之间的距离为曼哈顿距离。
  30. 根据权利要求19或25所述的方法,其特征在于,所述第一数值为1,所述第二数值为0,所述第三数值为-1。
  31. 根据权利要求17至30中任一项所述的方法,其特征在于,k的取值为0,1,2。
  32. 根据权利要求17至31中任一项所述的方法,其特征在于,所述方法还包括:
    基于所述当前节点的第一索引,对所述当前节点进行编码。
  33. 一种索引确定装置,其特征在于,包括:
    确定单元,用于基于当前节点在垂直于第k轴的平面上的已解码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
  34. 一种索引确定装置,其特征在于,包括:
    确定单元,用于基于当前节点在垂直于第k轴的平面上的已编码的至少一个邻居节点的占据子节点,确定所述当前节点的第一索引。
  35. 一种解码器,其特征在于,包括:
    处理器,适于执行计算机程序;
    计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被所述处理器执行时,实现如权利要求1至16中任一项所述的方法。
  36. 一种编码器,其特征在于,包括:
    处理器,适于执行计算机程序;
    计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序被所述处理器执行时,实现如权利要求17至32中任一项所述的方法。
  37. 一种计算机可读存储介质,其特征在于,用于存储计算机程序,所述计算机程序使得计算机执行如权利要求1至16中任一项所述的方法或如权利要求17至32中任一项所述的方法。
  38. 一种计算机程序产品,包括计算机程序/指令,其特征在于,所述计算机程序/指令被处理器执行时实现如权利要求1至16中任一项所述的方法或如权利要求17至32中任一项所述的方法。
  39. 一种码流,其特征在于,所述码流如权利要求1至16中任一项所述的方法解码的码流或如权利要求17至32中任一项所述的方法生成的码流。