CN118696538A - Method and apparatus for prediction, encoder, decoder, and codec system - Google Patents
- Publication number
- CN118696538A (application number CN202280091900.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
- H04N19/54—Motion estimation other than block-based using feature points or meshes
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
Abstract
A method and apparatus for prediction, an encoder, a decoder, and a codec system are provided. The prediction method comprises: obtaining a hierarchical structure of a point cloud, where the hierarchical structure comprises a parent block and at least one child block of the parent block; determining a reference block of a current sub-block in the hierarchical structure, where the reference block comprises at least one parent block at the same level as the parent block of the current sub-block and/or at least one sub-block at the same level as the current sub-block; and inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain a predicted attribute value of the current sub-block, where the training data of the neural network model comprises the information of the reference blocks of sub-blocks and the real attribute values of those sub-blocks. The embodiments of the present application can flexibly select the reference block used to predict the current sub-block and, based on the strong expressive power of the neural network model, can help improve the accuracy and stability of the average attribute value of the current sub-block, thereby improving coding and decoding efficiency.
Description
The embodiment of the application relates to the technical field of point cloud coding and decoding, and more particularly relates to a prediction method and device, an encoder, a decoder and a coding and decoding system.
With the continuous development of point cloud technology, compression coding of point cloud data has become an important research problem. Currently, standards for point cloud coding, such as geometry-based point cloud compression (G-PCC), are established by the Audio Video coding Standard Workgroup of China (AVS) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization. How to further improve the performance of point cloud codecs remains a problem to be solved.
Disclosure of Invention
Provided are a prediction method and apparatus, an encoder, a decoder, and a codec system, which can help to improve accuracy and stability of an average attribute value of a current sub-block, and further can improve codec efficiency.
In a first aspect, a method of prediction is provided, comprising:
obtaining a hierarchical structure of a point cloud, wherein the hierarchical structure comprises a parent block and at least one child block of the parent block;
Determining a reference block of a current sub-block in the hierarchical structure, wherein the reference block comprises at least one parent block associated with the same level as a parent block of the current sub-block and/or at least one sub-block associated with the same level as the current sub-block;
Inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain a predicted attribute value of the current sub-block, wherein training data of the neural network model comprises the information of the reference block of the sub-block and a real attribute value of the sub-block.
In a second aspect, there is provided a coding method comprising:
According to the method of the first aspect, a predicted attribute value of the current sub-block is obtained;
Determining a prediction transformation coefficient of the current sub-block according to the prediction attribute value and the point number in the current sub-block;
determining a real transformation coefficient of the current sub-block according to the real attribute value of the current sub-block and the point number in the current sub-block;
Determining a difference value between the predicted transform coefficient and the true transform coefficient;
And writing the difference value into a code stream.
In a third aspect, a decoding method is provided, comprising:
Obtaining the difference value between the predicted transformation coefficient and the real transformation coefficient of the current sub-block according to the code stream;
according to the method of the first aspect, a predicted attribute value of the current sub-block is obtained;
Determining a real transformation coefficient of the current sub-block according to the prediction attribute value and the difference value;
And determining the real attribute value of the current sub-block according to the real transformation coefficient and the point number in the current sub-block.
In a fourth aspect, a method for processing a point cloud is provided, including:
up-sampling the point cloud to obtain the position information of the newly added point;
according to the method of the first aspect, the attribute information of the newly added point is obtained.
In a fifth aspect, there is provided an apparatus for prediction, comprising:
An obtaining unit, configured to obtain a hierarchical structure of a point cloud, where the hierarchical structure includes a parent block and at least one child block of the parent block;
a processing unit, configured to determine a reference block of a current sub-block in the hierarchical structure, where the reference block includes at least one parent block associated with a parent block of the current sub-block at a same level, and/or at least one sub-block associated with the current sub-block at the same level;
And a neural network model, configured to take the information of the reference block and/or the information of the current sub-block as input and to output the predicted attribute value of the current sub-block, where the training data of the neural network model comprises the information of the reference blocks of sub-blocks and the real attribute values of those sub-blocks.
In a sixth aspect, there is provided an encoder comprising:
An obtaining unit, configured to obtain a predicted attribute value of a current sub-block according to the method described in the first aspect;
The processing unit is used for determining the prediction transformation coefficient of the current sub-block according to the prediction attribute value and the number of points in the current sub-block;
The processing unit is further used for determining the real transformation coefficient of the current sub-block according to the real attribute value of the current sub-block and the point number in the current sub-block;
The processing unit is further configured to determine a difference between the predicted transform coefficient and the true transform coefficient;
And the encoding unit is used for writing the difference value into the code stream.
In a seventh aspect, there is provided a decoder comprising:
The acquisition unit is used for acquiring the difference value between the predicted transformation coefficient and the real transformation coefficient of the current sub-block according to the code stream;
the obtaining unit is further configured to obtain a predicted attribute value of the current sub-block according to the method of the first aspect;
The processing unit is used for determining the real transformation coefficient of the current sub-block according to the prediction attribute value and the difference value;
the processing unit is further configured to determine a true attribute value of the current sub-block according to the true transform coefficient and the number of points in the current sub-block.
In an eighth aspect, a codec system is provided, including the encoder of the sixth aspect and the decoder of the seventh aspect.
In a ninth aspect, an apparatus for point cloud processing is provided, including:
The up-sampling unit is used for up-sampling the point cloud to obtain the position information of the newly added point;
An obtaining unit, configured to obtain attribute information of the new point according to the method of the first aspect.
In a tenth aspect, an electronic device is provided that includes a processor and a memory. The memory is for storing a computer program and the processor is for calling and running the computer program stored in the memory for performing the method of any of the above first to fourth aspects.
In an eleventh aspect, there is provided a chip comprising: a processor for calling and running a computer program from a memory, causing a device on which the chip is mounted to perform the method of any one of the first to fourth aspects as described above.
In a twelfth aspect, there is provided a computer-readable storage medium storing a computer program that causes a computer to execute the method of any one of the first to fourth aspects.
In a thirteenth aspect, there is provided a computer program product comprising computer program instructions for causing a computer to perform the method of any one of the first to fourth aspects above.
In a fourteenth aspect, there is provided a computer program which, when run on a computer, causes the computer to perform the method of any one of the first to fourth aspects above.
Through the above technical solution, the reference block used to predict the current sub-block can be selected flexibly, and, based on the strong expressive power of the neural network model, the accuracy and stability of the average attribute value of the current sub-block can be improved. With the predicted attribute value of the current sub-block obtained by this intra prediction method, coding and decoding efficiency can be further improved as the accuracy and stability of the average attribute value improve.
FIG. 1 is a schematic diagram of an octree structure according to embodiments of the present application;
FIG. 2A is a schematic diagram of octree partitioning in accordance with embodiments of the present application;
FIG. 2B is another schematic diagram of octree partitioning in accordance with embodiments of the present application;
FIG. 3 is a schematic diagram of an encoder according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a decoder according to an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram of a method of prediction provided by an embodiment of the present application;
FIG. 6 shows a specific example of a method of prediction according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of an encoding method provided by an embodiment of the present application;
FIG. 8 is a schematic flow chart of a decoding method provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of an apparatus for prediction of an embodiment of the present application;
FIG. 10 is a schematic block diagram of an encoder provided by an embodiment of the present application;
FIG. 11 is a schematic block diagram of a decoder provided by an embodiment of the present application;
Fig. 12 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the scope of the present application.
The embodiments of the present application are applicable to the technical field of point cloud data compression. First, terms related to the embodiments of the present application are described.
1) A point cloud is a three-dimensional (3D) representation of the surface of an object and may refer to a collection of a massive number of points in three-dimensional space. Each point has associated attributes such as color and texture characteristics. For example, an object or scene may be reconstructed as a combination of points using a point cloud. A point in the point cloud may include geometric information and attribute information. As an example, the geometric information of a point may be its three-dimensional coordinates, e.g., (x, y, z) in a Cartesian or other coordinate system; the geometric information of a point may also be referred to as its position information. A point may have associated attribute information such as color, e.g., Red-Green-Blue (RGB) or luma-chroma (YUV) three-component values; other attribute information may include transparency, reflectance, normal vectors, and so on, without limitation.
The point cloud may be static or dynamic. For example, a detailed scan or map of an object or terrain may be static point cloud data, while an environmental scan for machine-vision purposes may be dynamic point cloud data. Because dynamic point cloud data varies over time, a dynamic point cloud may be represented as a time-ordered sequence of point clouds.
Point cloud data may be applied in various fields, such as virtual/augmented reality, machine vision, geographic information systems, and medicine. A point cloud of an object's surface can be acquired by devices such as lidar, laser scanners, and multi-view cameras. The number of points in a point cloud is large, for example several billion, so the amount of raw point cloud data is particularly large, and effective compression techniques, i.e., encoding and decoding, are required to reduce it.
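As a minimal illustration of the representation described above (the array names and values are ours, not the patent's), a point cloud can be held as parallel arrays of per-point geometry and attributes:

```python
import numpy as np

# A toy point cloud: N points, each with (x, y, z) geometry and (r, g, b)
# color attributes. Integer voxel coordinates are assumed for simplicity.
geometry = np.array([[0, 0, 0],
                     [1, 0, 0],
                     [0, 1, 1],
                     [3, 2, 1]], dtype=np.int64)
attributes = np.array([[255, 0, 0],
                       [0, 255, 0],
                       [0, 0, 255],
                       [128, 128, 128]], dtype=np.uint8)

# One attribute vector per point: the two arrays are index-aligned.
assert geometry.shape[0] == attributes.shape[0]
```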
2) The tree structure of a point cloud can represent the result of partitioning the geometric information of the point cloud during encoding or decoding. In tree-based partitioning, the volume space of the point cloud is recursively divided into sub-volumes: the full volume space corresponds to the root node of the tree, and each sub-volume corresponds to a node. For example, whether to divide a sub-volume further may be decided based on whether it contains points. Each node may have an occupancy bit indicating whether the sub-volume corresponding to the node contains points. Optionally, the occupancy bits may be arithmetically encoded to obtain a binary code stream.
As an example, the tree structure may be an octree. In the octree structure of a point cloud, the volume space or sub-volume is a cube, and each division further yields eight sub-volumes (sub-cubes). Fig. 1 shows a schematic diagram of an octree structure. As shown in fig. 1, block 10 may be the root node, corresponding to the volume space of a complete point cloud, e.g., a cube. The volume space corresponding to block 10 may be divided into 8 sub-volumes, each corresponding to one of the blocks in dashed box 20. Block 10 is the parent block (also called the parent node) of the blocks in dashed box 20, and the blocks in dashed box 20 are child blocks (also called child nodes) of block 10; the blocks in dashed box 20 are sibling blocks of one another. As shown in fig. 1, the sub-blocks of block 10 (i.e., the blocks in dashed box 20) may include blocks containing points, whose occupancy bit is 1, indicating that the corresponding sub-volume contains points; they may also include blocks containing no points, whose occupancy bit is 0, indicating that the corresponding sub-volume is empty. A parent block may be represented by the occupancy bits of its child blocks; for example, block 10 may be represented in binary as "00001001", indicating that the occupancy bits of sub-blocks 21 and 22 are 1.
Illustratively, the blocks in dashed box 20 whose occupancy bit is 1, e.g., blocks 21 and 22, may each be further divided into 8 sub-blocks. Accordingly, blocks 21 and 22 are the parent blocks of the 8 blocks obtained by further dividing their corresponding sub-volumes, e.g., the blocks in dashed box 30. Similarly, block 21 may be represented in binary as "01001000", indicating that the occupancy bits of sub-blocks 31 and 32 are 1; block 22 may be represented in binary as "00100000", indicating that the occupancy bit of sub-block 33 is 1. Optionally, the occupancy bits may be arithmetically encoded to obtain a binary code stream.
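The occupancy-byte representation described above can be sketched as follows. The bit ordering (child index i mapped to bit i) is an assumption for illustration; the actual ordering is fixed by the codec specification.

```python
def occupancy_byte(occupied_children):
    """Pack the 8 child-occupancy flags of a parent block into one byte.

    occupied_children: iterable of child indices (0..7) whose sub-volumes
    contain points. Bit ordering (child i -> bit i) is illustrative.
    """
    byte = 0
    for i in occupied_children:
        byte |= 1 << i
    return byte

# Example loosely mirroring the description: two of eight children occupied.
b = occupancy_byte([0, 3])
assert format(b, '08b') == '00001001'
```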
In some alternative embodiments, block 10 may itself correspond to a sub-volume; i.e., the octree structure in fig. 1 may be part of the octree structure corresponding to a complete point cloud, which the present application does not limit.
In some alternative embodiments, blocks of the octree structure having the same depth may constitute one layer. The octree structure may include at least two layers, each layer may include at least one block, and each block may correspond to one sub-volume. The octree structure is thus a hierarchical structure. In other embodiments, the tree structure of the point cloud may be at least one of a quadtree structure, a binary tree structure, or another hierarchical structure such as a non-uniform space-division structure, without limitation.
As an example, referring to fig. 1, when block 10 is the root node, each block in dashed box 20 has depth 1, and these blocks belong to one layer; illustratively, the layer corresponding to dashed box 20 may be layer 0 of the octree structure. Similarly, each block in dashed box 30 has depth 2 and these blocks belong to one layer; illustratively, the layer corresponding to dashed box 30 may be layer 1. When the sub-volumes corresponding to the blocks in dashed box 30 are further divided, the octree structure has blocks of greater depth, corresponding to more layers. Typically, as the depth value increases, the number of blocks in each layer increases in turn.
FIG. 2A shows the spatial positions of the 8 sub-blocks (sub-blocks 0-7) generated by octree partitioning relative to their parent block (the current block). When the current node encodes its 8-bit occupancy code, neighbor reference information within the same layer can be obtained, e.g., the occupancy of neighboring sub-blocks in the left, front, and bottom directions (the negative x, y, and z directions of the coordinate system). For example, for sub-blocks at different positions of the current block, at least one of the 3 coplanar, 3 collinear, and 1 co-vertex same-layer neighbors may be used as a reference block. FIG. 2B shows examples of coplanar and collinear neighbors of a same-layer block: from left to right, the upper-right-back coplanar neighbors, lower-left-front coplanar neighbors, upper-right-back collinear neighbors, and lower-left-front collinear neighbors.
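The neighbor counts stated above (3 coplanar, 3 collinear, 1 co-vertex in the negative directions) can be checked with a small enumeration; the offset convention used here is an assumption for illustration:

```python
from itertools import product

# Candidate same-layer neighbors of a block, restricted to the negative
# x/y/z directions, classified by how they touch the current block:
# shared face (coplanar), shared edge (collinear), shared vertex (co-vertex).
offsets = [d for d in product((0, -1), repeat=3) if d != (0, 0, 0)]
coplanar  = [d for d in offsets if sum(d) == -1]  # one axis shifted
collinear = [d for d in offsets if sum(d) == -2]  # two axes shifted
covertex  = [d for d in offsets if sum(d) == -3]  # all three axes shifted

assert len(coplanar) == 3 and len(collinear) == 3 and len(covertex) == 1
```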
Next, a codec framework applicable to point cloud compression according to an embodiment of the present application is described with reference to figs. 3 and 4.
Fig. 3 is a schematic block diagram of an encoder 100 provided by an embodiment of the present application. Illustratively, the encoder 100 may be a G-PCC encoder. The input to the encoder 100 includes the geometric information and attribute information of the point cloud. For example, the input point cloud may be divided into slices, and each slice is encoded independently. Within one slice, the geometric information and attribute information of the point cloud are encoded separately. As shown in fig. 3, the encoder 100 may first apply a coordinate conversion to the geometric information so that the point cloud is entirely contained in one bounding box; the bounding box may be regarded as the volume space corresponding to the point cloud. A voxelization process may then be performed, including, for example, quantization and removal of duplicate points. Quantization scales the result of the coordinate conversion; because quantization rounding makes the geometric information of some points identical, whether to remove duplicate points can be decided according to the parameters. Next, the bounding box may be octree-partitioned. Depending on the octree division depth, the encoding of geometric information can be classified as octree-based encoding or triangle-soup (trisoup)-based encoding.
In the octree-based encoding process, the bounding box is divided into 8 subcubes and the occupancy bit of each subcube is recorded. An occupancy bit of 1 indicates that the subcube is non-empty, in other words occupied by points of the point cloud, i.e., it contains points; an occupancy bit of 0 indicates that the subcube is empty, i.e., it contains no points. Non-empty subcubes continue to be divided into eight. For example, the partitioning may stop when the resulting leaf nodes are 1×1×1 unit cubes.
A subcube may be regarded as a sub-volume, i.e., one division of the bounding box or volume space. In the octree, the bounding box may be referred to as the root node, and each subcube as a child node of the root node, i.e., a sub-block.
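A simplified sketch of the recursive octree partitioning described above (not the actual G-PCC implementation; the function name and the child-index convention are assumptions):

```python
def build_octree(points, origin, size, occupancies):
    """Recursively partition a cubic volume and record occupancy bytes.

    points: list of (x, y, z) integer coordinates inside the cube at
    `origin` with side `size` (a power of two). Recursion stops at
    1x1x1 unit cubes or empty sub-volumes.
    """
    if size == 1 or not points:
        return
    half = size // 2
    children = [[] for _ in range(8)]
    for (x, y, z) in points:
        # Child index from the high/low half of each axis (convention assumed).
        i = (((x - origin[0]) >= half) << 2 |
             ((y - origin[1]) >= half) << 1 |
             ((z - origin[2]) >= half))
        children[i].append((x, y, z))
    occupancies.append(sum(1 << i for i in range(8) if children[i]))
    for i, pts in enumerate(children):
        child_origin = (origin[0] + half * ((i >> 2) & 1),
                        origin[1] + half * ((i >> 1) & 1),
                        origin[2] + half * (i & 1))
        build_octree(pts, child_origin, half, occupancies)

# Two diagonally opposite corners of a 2x2x2 cube: children 0 and 7 occupied.
occ = []
build_octree([(0, 0, 0), (1, 1, 1)], (0, 0, 0), 2, occ)
assert occ == [0b10000001]
```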
In the octree partitioning process, intra prediction can be performed on the occupancy bits using the spatial correlation between a block and its surrounding blocks. Context modeling can then be performed to obtain context information for the block, and arithmetic coding (e.g., adaptive binary arithmetic coding) performed based on the context information to generate a binary code stream, i.e., the geometric code stream.
In trisoup-based encoding, octree partitioning is also performed. Unlike the octree-based process, the point cloud does not need to be divided level by level down to 1×1×1 unit cubes; instead, the division stops at blocks with side length W. Based on the surface formed by the distribution of points within each block, up to twelve intersections (vertices) of the surface with the twelve edges of the block are obtained. The vertex coordinates of each block are encoded in turn to generate a binary code stream, i.e., the geometric code stream.
After the G-PCC encoder finishes encoding the geometric information, it reconstructs the geometric information and encodes the attribute information of the point cloud using the reconstructed geometry. Illustratively, attribute encoding mainly encodes the color information of points in the point cloud. First, the encoder may apply a color conversion: for example, when the colors of the input point cloud are represented in the RGB color space, the encoder may convert them to the YUV color space. The point cloud is then recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometry. Next, the color information is transformed. There are two transform methods: a distance-based lifting transform that depends on Level-of-Detail (LOD) division, and a direct Region-Adaptive Hierarchical Transform (RAHT). Both transform color information from the spatial domain to the frequency domain to obtain high-frequency and low-frequency coefficients, which are finally quantized and arithmetically encoded to generate a binary code stream, i.e., the attribute code stream.
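The RGB-to-YUV conversion step might look like the following; the BT.601 coefficients are one common choice, and the patent does not specify a particular matrix:

```python
def rgb_to_yuv(r, g, b):
    """BT.601 full-range RGB -> YUV. One common choice for the color-space
    conversion step; the exact matrix is an assumption here."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v

# White maps to full luma and (near-)zero chroma.
y, u, v = rgb_to_yuv(255, 255, 255)
assert abs(y - 255.0) < 1e-6
assert abs(u) < 0.2 and abs(v) < 0.2
```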
Optionally, in the attribute encoding process, Morton codes may be used to order the point cloud. The nearest neighbors of a point to be encoded (also called the point to be predicted) are searched using the geometric spatial relationship, and interpolation prediction is performed on the point to be encoded using the reconstructed attribute values of the found neighboring points, yielding a predicted attribute value. A differential operation between the real attribute value and the predicted attribute value then gives the prediction residual, which is finally quantized and arithmetically encoded into a binary code stream.
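Morton-code ordering interleaves the coordinate bits so that spatially close points tend to be adjacent in the sorted order; a sketch (the bit-interleaving convention, x in the highest position, is assumed):

```python
def morton3(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into one Morton code, the ordering
    used to sort points before neighbor search (illustrative convention)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i + 2)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i)
    return code

# Sorting by Morton code keeps spatially close points close in the order.
pts = [(1, 1, 1), (0, 0, 0), (0, 0, 1)]
assert sorted(pts, key=lambda p: morton3(*p)) == [(0, 0, 0), (0, 0, 1), (1, 1, 1)]
```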
Fig. 4 is a schematic block diagram of a decoder 200 provided by an embodiment of the present application. The input to the decoder 200 includes the geometric code stream and the attribute code stream of the point cloud, which are decoded separately. As shown in fig. 4, the decoder 200 performs arithmetic decoding, context modeling, octree division, inverse quantization, and inverse coordinate conversion on the input geometric code stream to obtain the geometric information, and performs arithmetic decoding, inverse quantization, inverse transform, attribute reconstruction, and inverse color conversion on the input attribute code stream to obtain the attribute information. The decoding process is the inverse of the encoding process.
A RATH intra prediction method, for a series of 1x1 unit cubes obtained by octree geometry coding, arranged from small to large in the order of the morton code size. Defining A n as attribute values of n sub-blocks, and T n as transform coefficients of n sub-blocks. Initially, a n=T n.
The transform proceeds stepwise upward from the smallest unit cubes. Assume that two adjacent sub-blocks contain w_1 and w_2 points with attribute values a_1 and a_2 and transform coefficients T_1 and T_2, respectively; define the attribute value of the points contained in the next-higher-level parent block containing these two sub-blocks as a_P, with transform coefficient T_P. Then:

$$\begin{bmatrix} T_P \\ T_{AC} \end{bmatrix} = M_{w_1,w_2} \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}$$

wherein

$$M_{w_1,w_2} = \frac{1}{\sqrt{w_1+w_2}} \begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}.$$
In other words, in the RAHT transform the transform coefficients of the parent block are directly inherited from the transform coefficients of its sub-blocks, except for the first transform coefficient (called the DC, or direct-current, coefficient) of each sub-block. The DC coefficient of the parent block is obtained by transforming the DC coefficients of the sub-blocks, and the transformation matrix is related to the number of points in each sub-block, i.e.:

$$T_P^{DC} = \frac{\sqrt{w_1}\,T_1^{DC} + \sqrt{w_2}\,T_2^{DC}}{\sqrt{w_1+w_2}}.$$
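A minimal numerical sketch of this one-level merge, assuming the standard orthonormal RAHT butterfly (function and variable names are illustrative, not from the application):

```python
import numpy as np

def raht_merge(dc1: float, dc2: float, w1: int, w2: int):
    """Merge the DC coefficients of two sibling sub-blocks (with w1 and w2
    points) into the parent DC coefficient and one AC coefficient."""
    s = np.sqrt(w1 + w2)
    a, b = np.sqrt(w1) / s, np.sqrt(w2) / s
    dc_parent = a * dc1 + b * dc2   # inherited by the next level
    ac = -b * dc1 + a * dc2         # quantized and entropy-coded
    return dc_parent, ac

# Two sub-blocks: 3 points with mean attribute 10, 1 point with mean 20.
# At the leaf level, DC_i = sqrt(w_i) * (mean attribute of the sub-block).
dc1 = np.sqrt(3) * 10.0
dc2 = np.sqrt(1) * 20.0
dc_p, ac = raht_merge(dc1, dc2, 3, 1)
# Parent DC equals (sum of attributes) / sqrt(total points):
# (3*10 + 20) / sqrt(4) = 25.0
print(round(dc_p, 6))  # → 25.0
```

Because the butterfly is orthonormal, the energy of the two child DC coefficients is preserved across the merge, which is what makes the coefficient hierarchy suitable for quantization.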
Through this stepwise transformation, the transform coefficients are finally quantized and entropy-encoded; at the decoding end, the attribute value of each point can be reconstructed from the decoded transform coefficients.
In another RAHT intra prediction method, RAHT is adjusted to transform progressively downward from the largest parent block. Initially, the DC coefficient of the largest parent block is defined as

$$DC = \frac{1}{\sqrt{W}}\sum_{i=1}^{W} a_i,$$

where W is the number of points in the entire parent block and a_i is the attribute value of each point, and the AC coefficient is 0. The DC and AC coefficients of the sub-blocks are then computed through the transform matrix; in the above example:

$$\begin{bmatrix} DC_1 \\ DC_2 \end{bmatrix} = \frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix} \sqrt{w_1} & -\sqrt{w_2} \\ \sqrt{w_2} & \sqrt{w_1} \end{bmatrix}\begin{bmatrix} DC_P \\ AC_P \end{bmatrix}.$$
The resulting DC coefficients are then used for the next level of conversion, while the AC coefficients are encoded directly into the code stream. In the stepwise conversion from parent block to sub-blocks, the transform coefficients of a sub-block can be calculated if the attribute value and the number of points of each sub-block are known. A prediction method is therefore introduced to predict the attribute value of each sub-block: for every sub-block of a parent block, the average attribute values of 19 parent blocks in total (the parent block of the sub-block, the 6 peer parent blocks coplanar with that parent block, and the 12 peer parent blocks collinear with it), together with the distances between the sub-block and these 19 parent blocks, are computed, and the predicted average attribute value of the sub-block is:

$$a_{up} = \frac{\sum_{k} a_k / d_k}{\sum_{k} 1 / d_k}$$
where a_up is the predicted average attribute value of the sub-block, k runs over the non-empty parent blocks among the 19 parent blocks, d_k is the distance between the current sub-block and the k-th non-empty parent block, and a_k is the average attribute value of that non-empty parent block.
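The inverse-distance weighting described above can be sketched as follows (gathering the 19-block neighborhood is omitted; the helper name is hypothetical):

```python
def predict_mean_attribute(neighbors):
    """neighbors: list of (mean_attribute, distance) pairs for the
    non-empty parent blocks among the 19 candidates.
    Returns the inverse-distance weighted prediction a_up."""
    num = sum(a / d for a, d in neighbors)
    den = sum(1.0 / d for a, d in neighbors)
    return num / den

# Two non-empty neighbors: attribute 10 at distance 1, attribute 40 at distance 2.
# a_up = (10/1 + 40/2) / (1/1 + 1/2) = 30 / 1.5 = 20.0
print(predict_mean_attribute([(10.0, 1.0), (40.0, 2.0)]))  # → 20.0
```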
An average predicted attribute value of each sub-block is obtained through prediction, and one RAHT transform is applied to the predicted attribute values to obtain a series of predicted transform coefficients, namely:

$$\begin{bmatrix} DC_P^{pred} \\ AC^{pred} \end{bmatrix} = \frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}\begin{bmatrix} \sqrt{w_1}\,a_{up,1} \\ \sqrt{w_2}\,a_{up,2} \end{bmatrix}$$
The real transform coefficients obtained by transforming the real attribute values are calculated in the same way, namely:

$$\begin{bmatrix} DC_P^{true} \\ AC^{true} \end{bmatrix} = \frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}\begin{bmatrix} \sqrt{w_1}\,a_1 \\ \sqrt{w_2}\,a_2 \end{bmatrix}$$
The difference between the true transform coefficients and the predicted transform coefficients is then calculated, namely:

$$\Delta DC = DC_P^{true} - DC_P^{pred}, \qquad \Delta AC = AC^{true} - AC^{pred}.$$
The residuals of the AC coefficients are directly quantized and entropy-encoded, while the DC coefficients continue to the next level of transformation. Thus, what is finally quantized and entropy-coded is the difference between the true and predicted transform coefficients.
In this process of predicting the attribute values of sub-blocks from parent blocks, only a simple linear weighting of the attribute values of the parent block, the 6 peer parent blocks coplanar with it, and the 12 peer parent blocks collinear with it is used to predict the average attribute value of a sub-block. As a result, the accuracy of the predicted average attribute value is low and its stability is poor, which affects the coding and decoding efficiency.
In view of this, an embodiment of the present application provides a prediction method. First, a reference block of a current sub-block is determined in the hierarchical structure of the point cloud; then the information of the reference block and/or the information of the current sub-block is input into a neural network model to obtain a predicted attribute value of the current sub-block. The reference block includes at least one parent block associated, at the same level, with the parent block of the current sub-block, and/or at least one sub-block associated with the current sub-block at the same level; the training data of the neural network model includes information of the reference blocks of sub-blocks and the true attribute values of those sub-blocks. The embodiment of the application can flexibly select the reference block used to predict the current sub-block and, based on the strong expressive capability of the neural network model, helps improve the accuracy and stability of the average attribute value of the current sub-block.
The embodiment of the application also provides an encoding method, which encodes according to the predicted attribute value of the current sub-block obtained by the above prediction method to obtain the code stream of the point cloud. Specifically, the predicted transform coefficient of the current sub-block may be determined from its predicted attribute value and the number of points it contains; the true transform coefficient of the current sub-block may be determined from its true attribute value and the number of points it contains; the difference between the predicted and true transform coefficients is then determined and written into the code stream.
The embodiment of the application also provides a decoding method, which decodes according to the predicted attribute value of the current sub-block obtained by the above prediction method to obtain the attribute information of the point cloud. Specifically, the difference between the predicted and true transform coefficients of the current sub-block may be obtained from the code stream; the true transform coefficient of the current sub-block is determined from its predicted attribute value and the difference; and the true attribute value of the current sub-block is determined from the true transform coefficient and the number of points in the current sub-block.
In the embodiment of the application, encoding and decoding are carried out according to the predicted attribute value of the current sub-block obtained by the prediction method, which can further improve coding and decoding efficiency while improving the accuracy and stability of the average attribute value of the current sub-block.
The embodiment of the application also provides a method for processing the point cloud, which can be used for up-sampling the point cloud to obtain the position information of the newly added point, and acquiring the attribute information of the newly added point according to the prediction method, so that the density of the point cloud can be improved.
The technical scheme provided by the embodiment of the application is described in detail below with reference to the accompanying drawings.
Fig. 5 shows a schematic flow chart of a prediction method 300 provided by an embodiment of the present application. The prediction method 300 may be applied to the encoder 100 shown in Fig. 3 or the decoder 200 shown in Fig. 4 to implement compression encoding and decoding of the point cloud. As shown in Fig. 5, method 300 includes steps 310 to 330.
310, A hierarchical structure of a point cloud is obtained, wherein the hierarchical structure includes a parent block and at least one child block of the parent block.
Illustratively, the hierarchical structure may include at least one of an octree structure, a quadtree structure, a binary tree structure, and a non-uniform spatial division structure. For example, the octree structure may be described in fig. 1 and 2, and will not be described here.
320, Determining a reference block of a current sub-block in the hierarchical structure, wherein the reference block includes at least one parent block associated with a parent block of the current sub-block at a same level (i.e., peer) and/or at least one sub-block associated with the current sub-block at a same level. The reference block is used to predict a prediction attribute value of the current sub-block.
The current sub-block is the block whose attribute value is currently to be predicted; it may be, for example, a certain sub-cube in an octree structure, without limitation. The current sub-block may also be referred to as the sub-block to be predicted, without limitation.
Illustratively, the at least one parent block associated with the same hierarchy as the parent block of the current child block may include at least one of:
a peer parent block coplanar with the parent block of the current sub-block; a peer parent block collinear with the parent block of the current sub-block; a peer parent block sharing a vertex (co-point) with the parent block of the current sub-block; a peer parent block at a distance of two blocks from the parent block of the current sub-block along the positive or negative x-axis; a peer parent block at a distance of two blocks along the positive or negative y-axis; and a peer parent block at a distance of two blocks along the positive or negative z-axis.
Here, at least one parent block associated with the same hierarchy as the parent block of the current sub-block may be referred to as a range of reference parent blocks for prediction.
For example, the at least one sub-block associated with the same hierarchy of the current sub-block may include at least one of:
a peer sub-block coplanar with the current sub-block; a peer sub-block collinear with the current sub-block; a peer sub-block sharing a vertex (co-point) with the current sub-block; a peer sub-block at a distance of two blocks from the current sub-block along the positive or negative x-axis; a peer sub-block at a distance of two blocks along the positive or negative y-axis; and a peer sub-block at a distance of two blocks along the positive or negative z-axis.
Here, at least one sub-block associated with the same level of the current sub-block may be referred to as a range of reference sub-blocks for prediction. The sum of the range of the reference parent block for prediction and the range of the reference child block for prediction is the range of the reference block for prediction.
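For concreteness, the peer neighborhoods listed above can be enumerated from integer block offsets (the offset convention is an assumption consistent with the face/edge/vertex sharing described in the text):

```python
from itertools import product

# Offsets of a peer block relative to the current block (unit = one block side).
offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

coplanar  = [o for o in offsets if sum(abs(c) for c in o) == 1]  # share a face
collinear = [o for o in offsets if sum(abs(c) for c in o) == 2]  # share an edge
copoint   = [o for o in offsets if sum(abs(c) for c in o) == 3]  # share a vertex

# Blocks two units away along one axis (positive or negative x/y/z).
axis2 = [tuple(2 * s if i == ax else 0 for i in range(3))
         for ax in range(3) for s in (1, -1)]

print(len(coplanar), len(collinear), len(copoint), len(axis2))  # → 6 12 8 6
```

The counts 6, 12, and 8 match the coplanar, collinear, and co-point peer blocks named in the text (26 immediate neighbors in total), with 6 more candidates at axis distance two.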
It should be noted that, when encoding and decoding proceed level by level, the information of all of the above reference parent blocks used for predicting the attribute value is available at the decoding end, whereas only the information of some of the above reference sub-blocks is available. Which sub-blocks are available may be determined, for example, by the traversal order adopted for the hierarchical structure at decoding time, which is not limited by the present application. As an alternative implementation, the reference sub-blocks may be selected from these available sub-blocks at decoding time; the application is not limited in this regard.
In some alternative embodiments, the attribute value of a reference block is taken to be 0 when the reference block contains no points. That is, a parent block within the range of reference parent blocks used for prediction has its attribute value regarded as 0 if it contains no points; and/or a sub-block within the range of reference sub-blocks used for prediction has its attribute value regarded as 0 if it contains no points. A reference block that contains no points may be referred to as an empty (unoccupied) reference block.
And 330, inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain a predicted attribute value of the current sub-block, wherein training data of the neural network model comprises the information of the reference block of the sub-block and the real attribute value of the sub-block. The predicted attribute value may also be referred to as an average attribute value, as the application is not limited in this regard.
For example, the neural network model may include a multi-layer perceptron and/or a Transformer.
By training the neural network model on the training data, i.e., the information of the reference blocks of sub-blocks and the true attribute values of those sub-blocks, the predicted attribute value output by the trained model can be made as close as possible to the true attribute value of the current sub-block, improving the accuracy and stability of the average attribute value of the current sub-block.
In some embodiments, when the prediction method provided by the embodiment of the present application is applied to the encoding and decoding of the point cloud, the training target of the neural network model may be adjusted in consideration of the stability of the data, for example, the stability of the prediction may be considered.
In some embodiments, the output of the neural network model may be a predicted attribute value of the current sub-block.
In other embodiments, an initial predicted value may also be determined, and the output of the neural network model may then be the difference between the initial predicted value and the true attribute value. Illustratively, the initial predicted value may be calculated from the parent block of the current sub-block and at least one peer parent block associated with it; the present application is not limited in this respect.
Therefore, in the embodiment of the present application, the prediction attribute value of the current sub-block can be obtained by determining the reference block of the current sub-block in the hierarchical structure of the point cloud, and then inputting the information of the reference block and/or the information of the current sub-block into the neural network model, wherein the reference block comprises at least one parent block associated with the same level as the parent block of the current sub-block, and/or at least one sub-block associated with the same level as the current sub-block, and the training data of the neural network model comprises the information of the reference block of the sub-block and the real attribute value of the sub-block. The embodiment of the application can flexibly select the reference block for predicting the current sub-block and is beneficial to improving the accuracy and stability of the average attribute value of the current sub-block based on the strong expression capability of the neural network model.
In some alternative embodiments, two parameters m and k may be defined, where m ≤ k and m, k are each positive integers. Whether to predict the attribute value of the current sub-block, or the manner in which it is predicted, may be determined based on the parameters m and k. By way of example, the following three cases may exist.
Case 1
When the number of non-empty reference blocks among the reference blocks of the current sub-block (i.e., within the range of reference blocks used for prediction) is less than m, the attribute value of the current sub-block is not predicted. A non-empty reference block contains at least one point.
Case 2
When the number of non-empty reference blocks in the reference blocks of the current sub-block (i.e. the range of the reference blocks used for prediction) is larger than k, k non-empty reference blocks in the non-empty reference blocks are selected, and the information of the k non-empty reference blocks and/or the information of the current sub-block are input into the neural network model to obtain the prediction attribute value of the current sub-block.
As a possible implementation manner, the k non-empty reference blocks may be determined according to distance information of the non-empty reference block of the reference block from the current sub-block. By way of example, the distance information may include, without limitation, euclidean distance and/or Manhattan distance. K non-empty parent blocks and/or non-empty sub-blocks closest to the current sub-block may be selected as reference blocks for prediction, e.g., based on euclidean distance and/or manhattan distance between non-empty reference blocks and the current sub-block in the reference blocks.
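Selecting the k nearest non-empty reference blocks by Euclidean or Manhattan distance might be sketched as follows (names are illustrative):

```python
import math

def select_k_nearest(nonempty_blocks, current_center, k, metric="euclidean"):
    """nonempty_blocks: list of (center_xyz, mean_attribute) tuples.
    Returns the k blocks closest to current_center under the chosen metric."""
    def dist(p):
        d = [p[i] - current_center[i] for i in range(3)]
        if metric == "manhattan":
            return sum(abs(x) for x in d)
        return math.sqrt(sum(x * x for x in d))
    ranked = sorted(nonempty_blocks, key=lambda b: dist(b[0]))
    return ranked[:k]

blocks = [((0, 0, 2), 5.0), ((1, 0, 0), 7.0), ((3, 3, 3), 9.0)]
nearest = select_k_nearest(blocks, (0, 0, 0), k=2)
print([attr for _, attr in nearest])  # → [7.0, 5.0]
```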
Case 3
When the number of non-empty reference blocks among the reference blocks of the current sub-block (i.e., within the range of reference blocks used for prediction) is greater than or equal to m and less than k, the empty reference blocks may be interpolated to obtain k non-empty reference blocks, and then the information of the k non-empty reference blocks and/or the information of the current sub-block is input into the neural network model to obtain the predicted attribute value of the current sub-block.
As a possible implementation, at least one non-empty peer block of a first (empty) reference block is obtained, and the first reference block is then interpolated according to these non-empty blocks to obtain the k non-empty reference blocks.
For example, the reference blocks within the range used for prediction may be interpolated successively according to a certain priority until k non-empty reference blocks are reached. For a block to be interpolated, such as the first reference block described above, the interpolation may refer to one or more non-empty peer reference blocks that are coplanar, collinear, or co-point with it.
As another possible implementation, a weighted average of the attribute values of n1 non-empty reference blocks may be determined according to the attribute values of the n1 non-empty reference blocks and their distances from a second (empty) reference block, that is:

$$\bar{a} = \frac{\sum_{i=1}^{n_1} a_i / d_i}{\sum_{i=1}^{n_1} 1 / d_i}$$
where n1 is a positive integer with m ≤ n1 ≤ k, a_i is the attribute value of the i-th of the n1 non-empty reference blocks, and d_i is the distance between the i-th non-empty reference block and the current second reference block. The second reference block is then interpolated according to the weighted average so as to obtain the k non-empty reference blocks.
For example, starting from the n1 non-empty reference blocks already obtained, the reference blocks within the range used for prediction may be interpolated successively according to a certain priority until k non-empty reference blocks are reached. For a block to be interpolated, for example the second reference block described above, the attribute values of the n1 already-obtained non-empty reference blocks and their distances from the block being interpolated may be used, and the weighted average of the attribute values is taken as the final interpolation result.
As another possible implementation, a first average of the attribute values of n2 non-empty reference blocks may be determined, together with a second average of the distances between the n2 non-empty reference blocks and the current sub-block; a third (empty) reference block is then interpolated according to these averages to obtain the k non-empty reference blocks, the attribute value of the third reference block being the first average and its distance from the current sub-block being the second average.
For example, from the n2 non-empty reference blocks already obtained, the average of their attribute values and the average of their distances from the current sub-block may be calculated; the remaining (k - n2) reference blocks (i.e., instances of the third reference block) are then all assigned this average distance to the current sub-block and this average attribute value.
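A sketch of this mean-filling interpolation (helper names are hypothetical):

```python
def mean_fill(nonempty, k):
    """nonempty: list of (mean_attribute, distance_to_current_subblock)
    pairs for the n2 available non-empty reference blocks.
    Pads the list to k entries, each padded entry carrying the average
    attribute value and the average distance of the available blocks."""
    n2 = len(nonempty)
    avg_attr = sum(a for a, _ in nonempty) / n2
    avg_dist = sum(d for _, d in nonempty) / n2
    return nonempty + [(avg_attr, avg_dist)] * (k - n2)

filled = mean_fill([(10.0, 1.0), (20.0, 3.0)], k=4)
print(filled[2])  # → (15.0, 2.0)
```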
In some optional embodiments, at least one of an attribute value of the reference block, distance information between the reference block and the current sub-block, and auxiliary information of the current sub-block may be input into the neural network model, to obtain a predicted attribute value of the current sub-block.
Illustratively, the auxiliary information includes at least one of three-dimensional space coordinates of the current sub-block, three-dimensional space coordinates of a parent block of the current sub-block, a relative position of the current sub-block in the parent block thereof, a transformation level at which the current sub-block is located, size information of the current sub-block, a size of the parent block of the current sub-block, and a spatial distribution of sub-blocks in the parent block of the current sub-block.
In some embodiments, the auxiliary information may be null, as the application is not limited in this regard.
In some optional embodiments, it may further be determined that at least one of the number of non-empty blocks among the reference blocks, the ratio of non-empty blocks to the total number of reference blocks, and the attribute difference among the non-empty reference blocks satisfies a preset condition. That is, the prediction method of method 300 is used to predict the attribute value of the current sub-block only if at least one of these quantities satisfies the preset condition. Otherwise, when none of them satisfies the preset condition, the predicted attribute value of the current sub-block may be determined by another prediction manner, for example in the manner of the prior art.
Alternatively, a dynamic switching identifier may be defined: when the predicted attribute value of the current sub-block is to be determined according to the prediction method provided by the embodiment of the present application, the identifier is set to 1; otherwise it is set to 0. As an example, when method 300 is used for attribute encoding, the dynamic switching identifier may be written into the code stream.
Fig. 6 shows a specific example of a prediction method according to an embodiment of the present application. As shown in Fig. 6, the reference blocks of the current sub-block, i.e., the range of reference blocks used for prediction, include: the parent block of the current sub-block (1 block), the peer parent blocks coplanar with it (6 blocks), the peer parent blocks collinear with it (12 blocks), and the peer parent blocks sharing a vertex with it (8 blocks), for a total of 27 parent blocks. Alternatively, 12 non-empty parent blocks may be selected from the 27 parent blocks as reference blocks (i.e., k = 12), for example the 12 with the smallest Euclidean distance to the current sub-block. Alternatively, when the number of non-empty parent blocks among the 27 is greater than or equal to 12, no interpolation operation need be performed. For example, the Euclidean distances of the selected non-empty parent blocks from the current sub-block may be calculated, giving 12 elements. Alternatively, the auxiliary information of the current sub-block may include the relative position of the current sub-block in its parent block and the side length of the current sub-block, for a total of 26 elements overall. As shown in Fig. 6, the 26 elements may be concatenated into a vector and input into a multi-layer perceptron comprising three hidden layers. Correspondingly, the output of the network is the predicted attribute value of the current sub-block.
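The example network can be sketched with placeholder weights (the layer widths and the exact composition of the 26-element vector are assumptions; a real model would use trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 26-element input: 12 reference-block mean attributes,
# 12 Euclidean distances to the current sub-block, the sub-block's position
# index inside its parent (0..7), and its side length. This split is an
# assumption consistent with the 26-element count in the example.
x = np.concatenate([rng.random(12), rng.random(12), [3.0], [1.0]])

def mlp_forward(x, widths=(26, 64, 64, 64, 1)):
    """Three hidden layers with ReLU; the final scalar is the predicted
    mean attribute value. Weights here are random placeholders."""
    h = x
    for i in range(len(widths) - 1):
        W = rng.standard_normal((widths[i + 1], widths[i])) * 0.1
        b = np.zeros(widths[i + 1])
        h = W @ h + b
        if i < len(widths) - 2:      # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return float(h[0])

pred = mlp_forward(x)
print(np.isfinite(pred))  # → True
```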
Therefore, the embodiment of the application can help to improve the accuracy and stability of the average attribute value of the current sub-block by flexibly selecting the reference block for predicting the current sub-block and based on the strong expression capability of the neural network model.
Fig. 7 shows a schematic flow chart of an encoding method 500 provided by an embodiment of the present application. The encoding method 500 may be applied to the encoder 100 shown in fig. 3, for example, geometric information and attribute information of a point cloud may be input into the encoder 100, so as to implement compression encoding of the point cloud. As shown in fig. 7, method 500 includes steps 510 through 550.
And 510, obtaining the prediction attribute value of the current sub-block.
Illustratively, the predicted attribute value of the current sub-block may be obtained according to the method 300 shown in Fig. 5. For details, reference may be made to the descriptions of Figs. 5 and 6, which are not repeated here.
And 520, determining the prediction transformation coefficient of the current sub-block according to the prediction attribute value of the current sub-block and the point number in the current sub-block.
As a specific example, the above-described method 300 and RAHT may be combined for encoding the point cloud. In the layer-by-layer RAHT transformation from parent block to sub-blocks, the DC coefficient of the largest parent block is initially defined as

$$DC = \frac{1}{\sqrt{W}}\sum_{i=1}^{W} a_i,$$

where W is the number of points in the entire parent block and a_i is the attribute value of each point, and the AC coefficient is 0. The DC and AC coefficients of the sub-blocks are calculated through the transform matrix, namely:

$$\begin{bmatrix} DC_1 \\ DC_2 \end{bmatrix} = \frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix} \sqrt{w_1} & -\sqrt{w_2} \\ \sqrt{w_2} & \sqrt{w_1} \end{bmatrix}\begin{bmatrix} DC_P \\ AC_P \end{bmatrix}.$$
The resulting DC coefficients continue to the next level of conversion, and the AC coefficients are encoded directly. In the progressive conversion from parent block to sub-blocks, the transform coefficients of a sub-block can be calculated if the attribute value and the number of points of each sub-block are known. Using the prediction method provided by the embodiment of the present application, the output value of the network can be rounded to obtain the predicted attribute value a_up of a sub-block. After predicting the attribute value of each sub-block, one RAHT transform is applied to the predicted attribute values to obtain a series of predicted transform coefficients, namely:

$$\begin{bmatrix} DC_P^{pred} \\ AC^{pred} \end{bmatrix} = \frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}\begin{bmatrix} \sqrt{w_1}\,a_{up,1} \\ \sqrt{w_2}\,a_{up,2} \end{bmatrix}.$$
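The upward merge and its downward inverse can be checked with a small round-trip sketch (assuming the orthonormal butterfly; names are illustrative):

```python
import numpy as np

def merge(dc1, dc2, w1, w2):
    """Upward butterfly: two sub-block DCs -> parent DC and one AC."""
    s = np.sqrt(w1 + w2)
    a, b = np.sqrt(w1) / s, np.sqrt(w2) / s
    return a * dc1 + b * dc2, -b * dc1 + a * dc2

def split(dc_parent, ac, w1, w2):
    """Downward butterfly: parent DC and AC -> the two sub-block DCs.
    This is the transpose (inverse) of the orthonormal merge matrix."""
    s = np.sqrt(w1 + w2)
    a, b = np.sqrt(w1) / s, np.sqrt(w2) / s
    return a * dc_parent - b * ac, b * dc_parent + a * ac

dc_p, ac = merge(17.0, 20.0, w1=3, w2=1)
dc1, dc2 = split(dc_p, ac, w1=3, w2=1)
print(round(dc1, 6), round(dc2, 6))  # → 17.0 20.0
```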
and 530, determining the real transformation coefficient of the current sub-block according to the real attribute value of the current sub-block and the point number in the current sub-block.
Illustratively, the true transform coefficients obtained by transforming the true attribute values are calculated in the same way, namely:

$$\begin{bmatrix} DC_P^{true} \\ AC^{true} \end{bmatrix} = \frac{1}{\sqrt{w_1+w_2}}\begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix}\begin{bmatrix} \sqrt{w_1}\,a_1 \\ \sqrt{w_2}\,a_2 \end{bmatrix}$$
The difference between the predicted transform coefficient and the true transform coefficient is determined 540.
Illustratively, the difference between the true transform coefficients and the predicted transform coefficients is calculated, namely:

$$\Delta DC = DC_P^{true} - DC_P^{pred}, \qquad \Delta AC = AC^{true} - AC^{pred}.$$
The difference is written 550 into the code stream.
Illustratively, the residuals of the AC coefficients are written into the code stream; for example, they may be quantized and entropy-encoded directly, while the DC coefficients continue to the next level of transformation. Thus, what is finally quantized and entropy-coded is the difference between the true and predicted transform coefficients.
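The residual computation and quantization of steps 520 to 550 can be condensed into a sketch (uniform scalar quantization with step `qstep` is an assumption; the codec's actual quantizer may differ):

```python
def coefficient_residuals(true_coeffs, pred_coeffs, qstep=0.5):
    """Quantized differences between true and predicted AC coefficients;
    these integer symbols are what is entropy-coded into the bitstream."""
    return [round((t - p) / qstep) for t, p in zip(true_coeffs, pred_coeffs)]

res = coefficient_residuals([2.3, -1.1, 0.4], [2.0, -1.0, 0.0])
print(res)  # → [1, 0, 1]
```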
Therefore, the embodiment of the application encodes according to the prediction attribute value of the current sub-block obtained by the prediction method, and can further improve the encoding efficiency under the condition of improving the accuracy and stability of the average attribute value of the current sub-block.
Fig. 8 shows a schematic flow chart of a decoding method 600 provided by an embodiment of the present application. The decoding method 600 may be applied to the decoder 200 shown in fig. 4, for example, the geometric code stream and the attribute code stream of the point cloud may be input into the decoder 200, so as to implement decoding of the point cloud. As shown in fig. 8, method 600 includes steps 610 through 640.
And 610, obtaining the prediction attribute value of the current sub-block.
Illustratively, the predicted attribute value of the current sub-block may be obtained according to the method 300 shown in Fig. 5. For details, reference may be made to the descriptions of Figs. 5 and 6, which are not repeated here.
And 620, obtaining the difference value between the predicted transformation coefficient and the real transformation coefficient of the current sub-block according to the code stream.
And 630, determining the real transformation coefficient of the current sub-block according to the prediction attribute value and the difference value.
For example, determining the actual transform coefficient of the current sub-block according to the prediction attribute value and the difference value is an inverse process of calculating the difference value between the prediction transform coefficient and the actual transform coefficient of the current sub-block in the method 500, and specific reference may be made to the description in fig. 7, which is not repeated herein.
640, Determining the real attribute value of the current sub-block according to the real transform coefficient and the number of points in the current sub-block.
For example, determining the true attribute value of the current sub-block from the true transform coefficient and the number of points in the current sub-block is the inverse of computing the true transform coefficient in method 500; for details refer to the description of Fig. 7, which is not repeated here.
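A minimal decode-side sketch under the same assumptions as on the encoder side (orthonormal butterfly, hypothetical quantization step `qstep`):

```python
import math

def decode_subblock_dc(pred_dc, residual, qstep=0.5):
    """Reconstruct the true DC transform coefficient from the predicted
    coefficient and the decoded quantized residual (step 630)."""
    return pred_dc + residual * qstep

def dc_to_mean_attribute(dc, num_points):
    """At the leaf level DC = sqrt(w) * mean attribute, so invert it
    (step 640)."""
    return dc / math.sqrt(num_points)

dc_true = decode_subblock_dc(pred_dc=8.0, residual=4)
print(dc_to_mean_attribute(dc_true, num_points=4))  # → 5.0
```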
In some alternative embodiments, when encoding and decoding of the point cloud are performed according to the prediction method provided by the embodiments of the present application, for example method 300 within method 500 or 600, the prediction method 300 may be dynamically switched with another method of predicting the attribute value of the current sub-block. By way of example, that other method may compute a weighted average of the attribute values of 19 reference blocks, namely the 1 parent block containing the current sub-block, the 6 peer parent blocks coplanar with that parent block, and the 12 peer parent blocks collinear with it. The conditions for dynamic switching may include, for example: at least one of the number of non-empty blocks among the reference blocks, the ratio of non-empty blocks to the total number of reference blocks, and the attribute difference among the non-empty reference blocks satisfies a preset condition.
Alternatively, a dynamic handoff identifier may be defined. In the process of encoding and decoding, when the prediction method needs to be switched, the switching identifier may be defined as 1, otherwise, as 0. At this time, the dynamic switching identifier needs to be written into the code stream.
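The switching condition described above can be sketched as a simple occupancy test on the reference blocks. The threshold names and default values below are illustrative assumptions, not values from the application:

```python
def should_switch(reference_blocks, min_nonempty=4, min_ratio=0.5, max_attr_spread=30.0):
    """Return True when the neural predictor should be swapped for the
    weighted-average predictor. reference_blocks is a list of dicts with
    'num_points' and 'attr' keys; all thresholds are illustrative."""
    non_empty = [b for b in reference_blocks if b["num_points"] > 0]
    if len(non_empty) < min_nonempty:
        return True  # too few occupied neighbours for the network
    if len(non_empty) / len(reference_blocks) < min_ratio:
        return True  # occupancy ratio among the reference blocks is too low
    attrs = [b["attr"] for b in non_empty]
    return max(attrs) - min(attrs) > max_attr_spread  # attributes too spread out
```

When the result is written into the code stream as a one-bit flag, both encoder and decoder can instead evaluate this same deterministic test to avoid signalling it.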
In some optional embodiments, when encoding and decoding the point cloud, the real attribute value of each block in the first n layers of the hierarchical structure may be quantized, entropy encoded, and written into the code stream; starting from the (n+1)-th layer of the hierarchical structure, the prediction attribute value of each sub-block is predicted using the prediction method provided by the embodiments of the present application, for example the method 300, a difference is calculated between the real attribute value of the sub-block and the predicted attribute value, and the difference is quantized and entropy encoded to form the attribute code stream of the point cloud.
In some alternative embodiments, when the hierarchical structure is an octree structure, the foregoing process of encoding and decoding the attributes of the point cloud may be embedded directly into the geometric encoding and decoding process, so that synchronous progressive encoding and decoding of geometry and attributes can be implemented at the encoding/decoding end.
The embodiments of the present application also provide a method for processing a point cloud, in which the point cloud may be up-sampled to obtain the position information of newly added points, and the attribute information of the newly added points is then obtained according to the prediction method provided by the embodiments of the present application, for example, the method 300.
Specifically, the resolution of a point cloud can be described as the spatial density of its points: the more points within the same spatial extent, the better the subjective quality of the point cloud. By up-sampling the point cloud, for example through some operation at the decoding end, the point cloud can be made denser. Up-sampling yields the position information of the newly added points, and the predicted attribute value of each newly added point can then be obtained through the prediction method provided by the embodiments of the present application. By way of example, a newly added point may fall within a certain smallest sub-block of size 1 x 1, whose attribute value may be obtained using the prediction method provided in the embodiments of the present application, for example, the method 300 described above. Therefore, the embodiments of the present application can help to increase the density of the point cloud.
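As a sketch of the up-sampling step, one simple rule (an assumption for illustration, not the application's prescribed operation) inserts a new point at the midpoint between each point and its nearest neighbour; the attributes of the new points would then come from the prediction method rather than from copying:

```python
import numpy as np

def upsample_midpoints(positions):
    """Propose one new point at the midpoint between each point and its
    nearest neighbour. Only positions are produced here; attributes of the
    new points are left to the prediction method (e.g. the method 300)."""
    positions = np.asarray(positions, dtype=float)
    new_pts = []
    for i, p in enumerate(positions):
        d = np.linalg.norm(positions - p, axis=1)
        d[i] = np.inf                      # ignore the point itself
        j = int(np.argmin(d))
        new_pts.append((p + positions[j]) / 2.0)
    return np.unique(np.round(new_pts, 6), axis=0)  # deduplicate shared midpoints
```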
The method embodiments of the present application are described in detail above with reference to fig. 5 to 8, and the apparatus embodiments of the present application are described in detail below with reference to fig. 9 to 12.
Fig. 9 is a schematic block diagram of an apparatus 900 for prediction according to an embodiment of the present application, as shown in fig. 9, the apparatus 900 may include an acquisition unit 910, a processing unit 920, and a neural network model 930.
An obtaining unit 910, configured to obtain a hierarchical structure of a point cloud, where the hierarchical structure includes a parent block and at least one child block of the parent block;
A processing unit 920, configured to determine a reference block of a current sub-block in the hierarchical structure, where the reference block includes at least one parent block associated with a parent block of the current sub-block at a same level, and/or at least one sub-block associated with the current sub-block at the same level;
and the neural network model 930 is configured to input the information of the reference block and/or the information of the current sub-block, and obtain a predicted attribute value of the current sub-block, where the training data of the neural network model includes the information of the reference block of the sub-block and the real attribute value of the sub-block.
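The unit structure above can be illustrated with a minimal sketch. The layer sizes, the feature layout, and the use of a plain NumPy MLP are all assumptions for illustration; the application only requires some neural network model trained on reference-block information and real attribute values:

```python
import numpy as np

class TinyMLPPredictor:
    """Minimal stand-in for the neural network model 930: a one-hidden-layer
    MLP mapping flattened reference-block features (e.g. attribute and
    distance per block) plus current-sub-block auxiliary features to one
    predicted attribute value. All shapes here are assumptions."""

    def __init__(self, in_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, ref_features, cur_features):
        x = np.concatenate([np.ravel(ref_features), np.ravel(cur_features)])
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return float(h @ self.w2 + self.b2)
```

In training, the weights would be fitted so that `predict` approximates the real attribute value of each sub-block given its reference-block information.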
Optionally, the at least one parent block associated with the same hierarchy as the parent block of the current sub-block includes at least one of:
A parent block coplanar with the parent block of the current sub-block at the same level, a parent block collinear with the parent block of the current sub-block at the same level, a parent block co-point with the parent block of the current sub-block at the same level, a parent block at the same level as the parent block of the current sub-block and at a distance of two blocks from it along the positive or negative x-axis, a parent block at the same level as the parent block of the current sub-block and at a distance of two blocks from it along the positive or negative y-axis, and a parent block at the same level as the parent block of the current sub-block and at a distance of two blocks from it along the positive or negative z-axis.
Optionally, the at least one sub-block associated with the same level of the current sub-block includes at least one of:
A sub-block coplanar with the current sub-block at the same level, a sub-block collinear with the current sub-block at the same level, a sub-block co-point with the current sub-block at the same level, a sub-block at the same level as the current sub-block and at a distance of two blocks from it along the positive or negative x-axis, a sub-block at the same level as the current sub-block and at a distance of two blocks from it along the positive or negative y-axis, and a sub-block at the same level as the current sub-block and at a distance of two blocks from it along the positive or negative z-axis.
Optionally, the processing unit 920 is further configured to:
determining that the number of non-empty reference blocks in the reference blocks is greater than or equal to k, wherein the non-empty reference blocks contain at least one point, and k is a positive integer;
determining k non-empty reference blocks in the non-empty reference blocks of the reference block;
The neural network model 930 is specifically configured to input information of the k non-empty reference blocks and/or information of the current sub-block, to obtain a prediction attribute value of the current sub-block.
Optionally, the processing unit 920 is specifically configured to:
And determining the k non-empty reference blocks according to the distance information between the non-empty reference blocks of the reference blocks and the current sub-block.
Optionally, the distance information includes euclidean distance and/or manhattan distance.
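The selection of the k nearest non-empty reference blocks can be sketched as follows; the `(center, num_points, attr)` tuple layout is an assumption for illustration:

```python
def select_k_nearest(ref_blocks, cur_center, k, metric="manhattan"):
    """Keep the k non-empty reference blocks closest to the current
    sub-block, under either Manhattan or Euclidean distance."""
    def dist(center):
        diffs = [abs(a - b) for a, b in zip(center, cur_center)]
        if metric == "manhattan":
            return sum(diffs)
        return sum(d * d for d in diffs) ** 0.5   # euclidean
    non_empty = [b for b in ref_blocks if b[1] > 0]  # blocks with >= 1 point
    non_empty.sort(key=lambda b: dist(b[0]))         # ascending distance
    return non_empty[:k]
```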
Optionally, the processing unit 920 is further configured to:
Determining that the number of non-empty reference blocks in the reference blocks is greater than or equal to m and less than k, wherein the non-empty reference blocks contain at least one point, m and k are positive integers, and m is less than or equal to k;
Interpolating the reference blocks with empty space in the reference blocks to obtain k non-empty reference blocks;
The neural network model 930 is specifically configured to input information of the k non-empty reference blocks and/or information of the current sub-block, to obtain a prediction attribute value of the current sub-block.
Optionally, the processing unit 920 is specifically configured to:
Acquiring, for a first reference block with empty space among the reference blocks, at least one non-empty block that is coplanar, collinear, or co-point with it at the same level;
And interpolating the first reference block according to the non-empty blocks to obtain the k non-empty reference blocks.
Optionally, the processing unit 920 is specifically configured to:
Determining a weighted average value of the attribute values of the n1 non-empty reference blocks according to the attribute values of the n1 non-empty reference blocks in the reference blocks and the distance between the n1 non-empty reference blocks and a second reference block with empty space, wherein n1 is a positive integer, and m is less than or equal to n1 and less than or equal to k;
And interpolating the second reference block according to the weighted average value to obtain the k non-empty reference blocks.
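A minimal sketch of this distance-weighted interpolation, assuming an inverse-distance weighting (the application specifies a weighted average but does not fix the exact weights):

```python
def fill_empty_block(nonempty, empty_center):
    """Interpolate one reference block whose space is empty: an inverse-
    distance weighted average of the n1 non-empty blocks' attributes.
    nonempty is a list of (center_xyz, attr) pairs; the 1/d weighting is an
    assumed instance of the weighted average described in the text."""
    total_w, acc = 0.0, 0.0
    for center, attr in nonempty:
        d = sum(abs(a - b) for a, b in zip(center, empty_center))  # Manhattan
        w = 1.0 / max(d, 1e-9)   # guard against a zero distance
        total_w += w
        acc += w * attr
    return acc / total_w
```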
Optionally, the processing unit 920 is specifically configured to:
determining a first average value of attribute values of n2 non-empty reference blocks in the reference block;
determining a second average value of distances from the n2 non-empty reference blocks to a third reference block with empty space;
And interpolating the third reference block according to the first average value and the second average value to obtain the k non-empty reference blocks, wherein the attribute value of the third reference block is the first average value, and the distance between the third reference block and the current sub-block is the second average value.
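This variant can be sketched directly from the two averages: the synthetic third reference block takes the mean attribute value (first average) and is placed at the mean distance to the current sub-block (second average):

```python
def fill_by_means(nonempty_attrs, nonempty_dists):
    """Mean-based interpolation: returns (first average, second average),
    i.e. the attribute value assigned to the third reference block and its
    distance to the current sub-block."""
    first_avg = sum(nonempty_attrs) / len(nonempty_attrs)
    second_avg = sum(nonempty_dists) / len(nonempty_dists)
    return first_avg, second_avg
```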
Optionally, the neural network model 930 is specifically configured to input at least one of an attribute value of the reference block, distance information between the reference block and the current sub-block, and auxiliary information of the current sub-block, to obtain a predicted attribute value of the current sub-block.
Optionally, the auxiliary information includes at least one of three-dimensional space coordinates of the current sub-block, three-dimensional space coordinates of a parent block of the current sub-block, a relative position of the current sub-block in the parent block, a transformation level of the current sub-block, size information of the current sub-block, a size of the parent block of the current sub-block, and a spatial distribution condition of sub-blocks in the parent block of the current sub-block.
Optionally, the processing unit 920 is further configured to:
And determining that at least one of the number of the non-empty blocks in the reference blocks, the ratio of the non-empty blocks to the total number of the reference blocks and the attribute difference of the non-empty blocks in the reference blocks meets a preset condition.
Optionally, in a case where the reference block does not include a point, the attribute value of the reference block is 0.
Optionally, the hierarchical structure includes at least one of an octree structure, a quadtree structure, a binary tree structure, and a non-uniform spatial division structure.
Optionally, the neural network model includes a multi-layer perceptron and/or a Transformer.
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus 900 for prediction shown in fig. 9 may correspond to the body that performs the method 300 of the embodiment of the present application, and the foregoing and other operations and/or functions of each module in the apparatus 900 are respectively for implementing the corresponding flows of the methods in fig. 5, which are not described herein for brevity.
Fig. 10 is a schematic block diagram of an encoder 1000, such as the encoder of fig. 3, in accordance with embodiments of the present application. As shown in fig. 10, the encoder 1000 may include an acquisition unit 1010, a processing unit 1020, and an encoding unit 1030.
An obtaining unit 1010, configured to obtain a prediction attribute value of the current sub-block. The prediction attribute value of the current sub-block may be obtained, for example, in accordance with the method 300 shown in fig. 5 and described above, without limitation.
A processing unit 1020, configured to determine the prediction transform coefficient of the current sub-block according to the prediction attribute value and the number of points in the current sub-block.
The processing unit 1020 is further configured to determine the real transform coefficient of the current sub-block according to the real attribute value of the current sub-block and the number of points in the current sub-block.
The processing unit 1020 is further configured to determine a difference value between the predicted transform coefficient and the real transform coefficient.
An encoding unit 1030, configured to write the difference value into the code stream.
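The encoder flow above can be sketched end to end. The coefficient formula `attr * sqrt(num_points)` mirrors a RAHT-style DC coefficient and is purely an assumption, since the application defers the exact transform to the description of fig. 7:

```python
import math

def encode_residual(pred_attr, real_attr, num_points):
    """Sketch of the encoder 1000 pipeline: derive predicted and real
    transform coefficients from the attribute values and the point count,
    then return their difference, which would be quantized, entropy encoded,
    and written into the code stream."""
    pred_coeff = pred_attr * math.sqrt(num_points)  # assumed formula
    real_coeff = real_attr * math.sqrt(num_points)
    return real_coeff - pred_coeff
```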
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the encoder 1000 shown in fig. 10 may correspond to the body that performs the method 500 of the embodiment of the present application, and the foregoing and other operations and/or functions of each module in the encoder 1000 are respectively for implementing the corresponding flows of the methods in fig. 7, which are not described herein for brevity.
Fig. 11 is a schematic block diagram of a decoder 1100, such as the decoder of fig. 4, in accordance with an embodiment of the present application. As shown in fig. 11, the decoder 1100 may include an acquisition unit 1110 and a processing unit 1120.
An obtaining unit 1110, configured to obtain, according to the code stream, a difference value between the predicted transform coefficient and the actual transform coefficient of the current sub-block.
The obtaining unit 1110 is further configured to obtain the predicted attribute value of the current sub-block, for example, the predicted attribute value of the current sub-block may be obtained according to the method 300 shown in fig. 5, which is not limited.
A processing unit 1120, configured to determine a true transform coefficient of the current sub-block according to the prediction attribute value and the difference value.
The processing unit 1120 is further configured to determine a true attribute value of the current sub-block according to the true transform coefficient and the number of points in the current sub-block.
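The decoder-side inverse can be sketched under the same assumed RAHT-style coefficient formula `attr * sqrt(num_points)` (the application defers the exact transform to the description of fig. 8):

```python
import math

def decode_real_attr(pred_attr, residual, num_points):
    """Sketch of the decoder 1100 pipeline: rebuild the true transform
    coefficient from the predicted attribute value and the residual parsed
    from the code stream, then invert the assumed coefficient formula to
    recover the true attribute value."""
    pred_coeff = pred_attr * math.sqrt(num_points)  # assumed formula
    real_coeff = pred_coeff + residual
    return real_coeff / math.sqrt(num_points)
```

Under this assumption the decoder exactly undoes the encoder-side residual computation, so the reconstructed attribute matches the encoder's real attribute before quantization.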
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the decoder 1100 shown in fig. 11 may correspond to the body that performs the method 600 of the embodiment of the present application, and the foregoing and other operations and/or functions of each module in the decoder 1100 are respectively for implementing the corresponding flows of the methods in fig. 8, which are not described herein for brevity.
The embodiments of the present application also provide an apparatus for processing a point cloud, which includes an up-sampling unit and an obtaining unit. The up-sampling unit is configured to up-sample the point cloud to obtain the position information of newly added points; the obtaining unit is configured to obtain the attribute information of the newly added points, for example, the predicted attribute value may be obtained according to the method 300 shown in fig. 5, without limitation.
The apparatus and system of embodiments of the present application are described above in terms of functional modules in connection with the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiment in the embodiment of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in a software form, and the steps of the method disclosed in connection with the embodiment of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.
Fig. 12 is a schematic block diagram of an electronic device 1200 provided by an embodiment of the present application.
As shown in fig. 12, the electronic device 1200 may include:
A memory 1210 and a processor 1220, the memory 1210 being adapted to store a computer program and to transfer the program code to the processor 1220. In other words, the processor 1220 may call and run a computer program from the memory 1210 to implement the method in the embodiment of the present application.
For example, the processor 1220 may be configured to perform the steps of the methods 300, 500 or 600 described above according to instructions in the computer program.
In some embodiments of the application, the processor 1220 may include, but is not limited to:
A general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the application, the memory 1210 includes, but is not limited to:
Volatile memory and/or nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In some embodiments of the present application, the computer program may be divided into one or more modules, which are stored in the memory 1210 and executed by the processor 1220 to perform the point cloud processing method provided by the present application. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device 1200.
Optionally, as shown in fig. 12, the electronic device 1200 may further include:
a transceiver 1230, the transceiver 1230 may be coupled to the processor 1220 or memory 1210.
Processor 1220 may control transceiver 1230 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 1230 may include a transmitter and a receiver. The transceiver 1230 may further include antennas, the number of which may be one or more.
It should be appreciated that the various components in the electronic device 1200 are connected by a bus system that includes a power bus, a control bus, and a status signal bus in addition to a data bus.
According to an aspect of the present application, there is provided an encoder comprising a processor and a memory for storing a computer program, the processor being adapted to invoke and run the computer program stored in the memory, such that the encoder performs the encoding method of the above method embodiments.
According to an aspect of the present application, there is provided a decoder comprising a processor and a memory for storing a computer program, the processor being adapted to invoke and run the computer program stored in the memory, such that the decoder performs the decoding method of the above-described method embodiments.
According to an aspect of the present application, there is provided a codec system including the above encoder and decoder.
According to an aspect of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
According to another aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the method of the above-described method embodiments.
In other words, when implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), a semiconductor medium (e.g., a solid state drive (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present application, and such changes and substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (29)
- A method of prediction, comprising: obtaining a hierarchical structure of a point cloud, wherein the hierarchical structure comprises a parent block and at least one child block of the parent block; determining a reference block of a current sub-block in the hierarchical structure, wherein the reference block comprises at least one parent block associated with the same level as a parent block of the current sub-block and/or at least one sub-block associated with the same level as the current sub-block; and inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain a predicted attribute value of the current sub-block, wherein training data of the neural network model comprises the information of the reference block of the sub-block and a real attribute value of the sub-block.
- The method of claim 1, wherein the at least one parent block associated with the same hierarchy as the parent block of the current sub-block comprises at least one of: a parent block coplanar with the parent block of the current sub-block at the same level, a parent block collinear with the parent block of the current sub-block at the same level, a parent block co-point with the parent block of the current sub-block at the same level, a parent block at the same level as the parent block of the current sub-block and at a distance of two blocks from it along the positive or negative x-axis, a parent block at the same level as the parent block of the current sub-block and at a distance of two blocks from it along the positive or negative y-axis, and a parent block at the same level as the parent block of the current sub-block and at a distance of two blocks from it along the positive or negative z-axis.
- The method of claim 1, wherein the at least one sub-block associated with the same hierarchy as the current sub-block comprises at least one of: a sub-block coplanar with the current sub-block at the same level, a sub-block collinear with the current sub-block at the same level, a sub-block co-point with the current sub-block at the same level, a sub-block at the same level as the current sub-block and at a distance of two blocks from it along the positive or negative x-axis, a sub-block at the same level as the current sub-block and at a distance of two blocks from it along the positive or negative y-axis, and a sub-block at the same level as the current sub-block and at a distance of two blocks from it along the positive or negative z-axis.
- A method according to any one of claims 1-3, wherein said inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain the predicted attribute value of the current sub-block comprises: determining that the number of non-empty reference blocks in the reference blocks is greater than or equal to k, wherein the non-empty reference blocks contain at least one point, and k is a positive integer; determining k non-empty reference blocks in the non-empty reference blocks of the reference block; and inputting the information of the k non-empty reference blocks and/or the information of the current sub-block into the neural network model to obtain the prediction attribute value of the current sub-block.
- The method of claim 4, wherein said determining k non-empty reference blocks among the non-empty reference blocks of the reference block comprises: determining the k non-empty reference blocks according to the distance information between the non-empty reference blocks of the reference blocks and the current sub-block.
- The method of claim 5, wherein the distance information comprises euclidean distance and/or manhattan distance.
- The method according to claim 1, wherein inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain the predicted attribute value of the current sub-block comprises:Determining that the number of non-empty reference blocks in the reference blocks is greater than or equal to m and less than k, wherein the non-empty reference blocks contain at least one point, m and k are positive integers, and m is less than or equal to k;Interpolating the reference blocks with empty space in the reference blocks to obtain k non-empty reference blocks;and inputting the information of the k non-empty reference blocks and/or the information of the current sub-block into the neural network model to obtain the prediction attribute value of the current sub-block.
- The method of claim 7, wherein interpolating the reference blocks with empty space among the reference blocks to obtain k non-empty reference blocks comprises:Acquiring at least one non-empty block which is coplanar, collinear and co-point in the same level of a first reference block with empty occupation in the reference blocks;And interpolating the first reference block according to the non-empty blocks to obtain the k non-empty reference blocks.
- The method of claim 7, wherein interpolating the reference blocks with empty space among the reference blocks to obtain k non-empty reference blocks comprises:Determining a weighted average value of the attribute values of the n1 non-empty reference blocks according to the attribute values of the n1 non-empty reference blocks in the reference blocks and the distance between the n1 non-empty reference blocks and a second reference block with empty space, wherein n1 is a positive integer, and m is less than or equal to n1 and less than or equal to k;And interpolating the second reference block according to the weighted average value to obtain the k non-empty reference blocks.
- The method of claim 7, wherein interpolating the reference blocks with empty space among the reference blocks to obtain k non-empty reference blocks comprises:determining a first average value of attribute values of n2 non-empty reference blocks in the reference block;determining a second average value of distances from the n2 non-empty reference blocks to a third reference block with empty space;And interpolating the third reference block according to the first average value and the second average value to obtain the k non-empty reference blocks, wherein the attribute value of the third reference block is the first average value, and the distance between the third reference block and the current sub-block is the second average value.
- The method according to any one of claims 1-10, wherein inputting the information of the reference block and/or the information of the current sub-block into a neural network model to obtain the predicted attribute value of the current sub-block comprises: inputting at least one of the attribute value of the reference block, distance information between the reference block and the current sub-block, and auxiliary information of the current sub-block into the neural network model to obtain the predicted attribute value of the current sub-block.
- The method of claim 11, wherein the auxiliary information comprises at least one of: the three-dimensional spatial coordinates of the current sub-block, the three-dimensional spatial coordinates of the parent block of the current sub-block, the relative position of the current sub-block within its parent block, the transform level at which the current sub-block is located, the size information of the current sub-block, the size of the parent block of the current sub-block, and the spatial distribution of sub-blocks in the parent block of the current sub-block.
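Claims 11-12 feed reference-block attributes, distances, and auxiliary information into the prediction network. A minimal sketch of how such inputs might be assembled and passed through the multi-layer perceptron named in claim 16 — where the feature ordering, helper names, and the tiny hand-set weights are all hypothetical, not the trained model:

```python
def build_features(ref_attrs, ref_dists, aux):
    """Concatenate reference-block attribute values, their distances to
    the current sub-block, and auxiliary fields (e.g. block coordinates,
    transform level, sizes) into one input vector. Ordering is assumed."""
    return list(ref_attrs) + list(ref_dists) + list(aux)

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a linear output — the simplest
    multi-layer perceptron; placeholder weights, not trained values."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return sum(wo * hi for wo, hi in zip(w2, h)) + b2
```

In practice the network would be trained, per claim 1, on reference-block information paired with the true attribute values of sub-blocks.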
- The method according to any one of claims 1-12, further comprising: determining that at least one of the number of non-empty blocks among the reference blocks, the ratio of the number of non-empty blocks to the total number of reference blocks, and the attribute difference among the non-empty blocks in the reference blocks satisfies a preset condition.
- The method according to any one of claims 1-13, wherein the attribute value of a reference block is 0 in the case that the reference block contains no points.
- The method of any one of claims 1-14, wherein the hierarchical structure comprises at least one of an octree structure, a quadtree structure, a binary tree structure, and a non-uniform spatial partition structure.
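For the octree case named above, each level of the hierarchy halves the block size along every axis, so the block containing a point at level L is obtained by right-shifting its integer coordinates. A minimal sketch, assuming power-of-two integer coordinates (helper names are hypothetical):

```python
def octree_cell(point, level, depth):
    """Block (cell) index containing `point` at a given octree level;
    `depth` is the level of the finest (leaf) blocks."""
    shift = depth - level
    return tuple(c >> shift for c in point)

def child_index(point, level, depth):
    """Which of the 8 children of the parent block the point falls in,
    packed as one bit per axis (x -> bit 0, y -> bit 1, z -> bit 2)."""
    shift = depth - level
    return sum(((c >> shift) & 1) << i for i, c in enumerate(point))
```

Under this layout, the parent block of a sub-block at level L is simply its cell at level L-1, which is how the parent/sub-block relations in claim 1 can be navigated.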
- The method of any one of claims 1-15, wherein the neural network model comprises a multi-layer perceptron and/or a transformer.
- An encoding method, comprising: obtaining a predicted attribute value of a current sub-block according to the method of any one of claims 1-16; determining a predicted transform coefficient of the current sub-block according to the predicted attribute value and the number of points in the current sub-block; determining a true transform coefficient of the current sub-block according to the true attribute value of the current sub-block and the number of points in the current sub-block; determining a difference between the predicted transform coefficient and the true transform coefficient; and writing the difference into a bitstream.
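The encoding steps in claim 17 can be sketched as follows. The claim does not specify the transform, so the sqrt-of-point-count normalization below (as used for DC coefficients in RAHT-style schemes) and the sign of the residual are assumptions for illustration only:

```python
import math

def transform_coeff(attr_value, num_points):
    """Assumed transform: coefficient derived from the attribute value
    and the point count (RAHT-style DC normalization; an assumption)."""
    return attr_value * math.sqrt(num_points)

def encode_residual(pred_attr, true_attr, num_points):
    """Claim-17-style residual: difference between the true and predicted
    transform coefficients, which is what gets written to the bitstream."""
    pred_c = transform_coeff(pred_attr, num_points)
    true_c = transform_coeff(true_attr, num_points)
    return true_c - pred_c
```

Because only the residual is coded, an accurate neural-network prediction shrinks the magnitude of the value written to the bitstream, which is the compression gain the method targets.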
- The method of claim 17, further comprising: writing a switch flag into the bitstream, wherein the switch flag indicates that the predicted attribute value of the current sub-block is obtained according to the method of any one of claims 1-16.
- A decoding method, comprising: obtaining, from a bitstream, a difference between a predicted transform coefficient and a true transform coefficient of a current sub-block; obtaining a predicted attribute value of the current sub-block according to the method of any one of claims 1-15; determining the true transform coefficient of the current sub-block according to the predicted attribute value and the difference; and determining the true attribute value of the current sub-block according to the true transform coefficient and the number of points in the current sub-block.
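The decoder-side reconstruction in claim 19 mirrors the encoder: rebuild the true coefficient from the predicted attribute plus the decoded residual, then invert the transform. As before, the sqrt-of-point-count normalization and the sign convention are illustrative assumptions, not taken from the claims:

```python
import math

def decode_attr(pred_attr, residual, num_points):
    """Reconstruct the true attribute value from the locally computed
    prediction and the residual parsed from the bitstream."""
    pred_c = pred_attr * math.sqrt(num_points)  # predicted transform coeff
    true_c = pred_c + residual                  # true = predicted + difference
    return true_c / math.sqrt(num_points)       # invert the assumed transform
```

Note that the decoder runs the same neural-network prediction as the encoder on already-reconstructed data, so the two sides stay synchronized without transmitting the prediction itself.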
- The method of claim 19, further comprising: obtaining a switch flag from the bitstream, wherein the switch flag indicates that the predicted attribute value of the current sub-block is obtained according to the method of any one of claims 1-16.
- A point cloud processing method, comprising: up-sampling a point cloud to obtain position information of newly added points; and obtaining attribute information of the newly added points according to the method of any one of claims 1-16.
- A prediction apparatus, comprising: an obtaining unit configured to obtain a hierarchical structure of a point cloud, wherein the hierarchical structure comprises a parent block and at least one sub-block of the parent block; a processing unit configured to determine a reference block of a current sub-block in the hierarchical structure, wherein the reference block comprises at least one parent block associated at the same level with the parent block of the current sub-block, and/or at least one sub-block associated at the same level with the current sub-block; and a neural network model configured to take the information of the reference block and/or the information of the current sub-block as input and output the predicted attribute value of the current sub-block, wherein the training data of the neural network model comprises information of reference blocks of sub-blocks and the true attribute values of those sub-blocks.
- An encoder, comprising: an obtaining unit configured to obtain a predicted attribute value of a current sub-block according to the method of any one of claims 1-16; and a processing unit configured to determine a predicted transform coefficient of the current sub-block according to the predicted attribute value and the number of points in the current sub-block, determine a true transform coefficient of the current sub-block according to the true attribute value of the current sub-block and the number of points in the current sub-block, and determine a difference between the predicted transform coefficient and the true transform coefficient; and an encoding unit configured to write the difference into a bitstream.
- A decoder, comprising: an obtaining unit configured to obtain, from a bitstream, a difference between a predicted transform coefficient and a true transform coefficient of a current sub-block, and further configured to obtain a predicted attribute value of the current sub-block according to the method of any one of claims 1-15; and a processing unit configured to determine the true transform coefficient of the current sub-block according to the predicted attribute value and the difference, and further configured to determine the true attribute value of the current sub-block according to the true transform coefficient and the number of points in the current sub-block.
- A codec system comprising an encoder as claimed in claim 23 and a decoder as claimed in claim 24.
- An apparatus for point cloud processing, comprising: an up-sampling unit configured to up-sample a point cloud to obtain position information of newly added points; and an obtaining unit configured to obtain attribute information of the newly added points according to the method of any one of claims 1-16.
- An electronic device, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory to cause the electronic device to perform the method of any one of claims 1-21.
- A computer-readable storage medium storing a computer program, the computer program causing a computer to perform the method of any one of claims 1-21.
- A computer program product comprising computer program code which, when run by an electronic device, causes the electronic device to perform the method of any one of claims 1-21.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/076368 WO2023155045A1 (en) | 2022-02-15 | 2022-02-15 | Prediction method and apparatus, coder, decoder, and coding and decoding system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118696538A true CN118696538A (en) | 2024-09-24 |
Family
ID=87577296
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280091900.4A Pending CN118696538A (en) | 2022-02-15 | 2022-02-15 | Method and apparatus for prediction, encoder, decoder, and codec system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118696538A (en) |
WO (1) | WO2023155045A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10223810B2 (en) * | 2016-05-28 | 2019-03-05 | Microsoft Technology Licensing, Llc | Region-adaptive hierarchical transform and entropy coding for point cloud compression, and corresponding decompression |
CN118781209A (en) * | 2018-12-14 | 2024-10-15 | 交互数字Vc控股公司 | Method and apparatus for programmatically coloring spatial data |
CN111405281A (en) * | 2020-03-30 | 2020-07-10 | 北京大学深圳研究生院 | Point cloud attribute information encoding method, point cloud attribute information decoding method, storage medium and terminal equipment |
WO2021258374A1 (en) * | 2020-06-24 | 2021-12-30 | Beijing Xiaomi Mobile Software Co., Ltd. | Method for encoding and decoding a point cloud |
- 2022-02-15 CN CN202280091900.4A patent/CN118696538A/en active Pending
- 2022-02-15 WO PCT/CN2022/076368 patent/WO2023155045A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023155045A1 (en) | 2023-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111095929B (en) | System, method and computer readable medium for compressing attribute information for point cloud | |
JP7386337B2 (en) | Division method, encoder, decoder and computer storage medium | |
Daribo et al. | Efficient rate-distortion compression of dynamic point cloud for grid-pattern-based 3D scanning systems | |
TW202041034A (en) | Information processing device and method | |
KR20220127837A (en) | Method and apparatus for HAAR-based point cloud coding | |
US11936909B2 (en) | Prediction method, encoder, decoder, and computer storage medium | |
US20230377208A1 (en) | Geometry coordinate scaling for ai-based dynamic point cloud coding | |
CN113518226A (en) | G-PCC point cloud coding improvement method based on ground segmentation | |
CN115086658B (en) | Point cloud data processing method and device, storage medium and encoding and decoding equipment | |
WO2022131948A1 (en) | Devices and methods for sequential coding for point cloud compression | |
CN114598883A (en) | Point cloud attribute prediction method, encoder, decoder and storage medium | |
WO2021062771A1 (en) | Color component prediction method, encoder, decoder, and computer storage medium | |
WO2023225091A1 (en) | Geometry coordinate scaling for ai-based dynamic point cloud coding | |
JP7470211B2 (en) | Method and apparatus for computing distance-based weighted averages for point cloud coding | |
CN118696538A (en) | Method and apparatus for prediction, encoder, decoder, and codec system | |
CN115914650A (en) | Point cloud encoding and decoding method, encoder, decoder and storage medium | |
WO2023133710A1 (en) | Encoding method, decoding method, encoder, decoder, and encoding and decoding system | |
WO2024216479A1 (en) | Encoding and decoding method, code stream, encoder, decoder and storage medium | |
US20240362823A1 (en) | Encoding method and decoding method | |
WO2024207481A1 (en) | Encoding method, decoding method, encoder, decoder, bitstream and storage medium | |
WO2023024842A1 (en) | Point cloud encoding/decoding method, apparatus and device, and storage medium | |
WO2024216477A1 (en) | Encoding/decoding method, encoder, decoder, code stream, and storage medium | |
WO2024212114A1 (en) | Point cloud encoding method and apparatus, point cloud decoding method and apparatus, device, and storage medium | |
WO2024207456A1 (en) | Method for encoding and decoding, encoder, decoder, code stream, and storage medium | |
WO2024207463A1 (en) | Point cloud encoding/decoding method and apparatus, and device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||