WO2023130333A1 - Encoding and decoding method, encoder, decoder, and storage medium - Google Patents
Encoding and decoding method, encoder, decoder, and storage medium
- Publication number
- WO2023130333A1 (PCT/CN2022/070598)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- point cloud
- layer
- module
- information
- current frame
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 108
- 238000013528 artificial neural network Methods 0.000 claims abstract description 144
- 238000012549 training Methods 0.000 claims abstract description 42
- 230000004913 activation Effects 0.000 claims description 98
- 238000013138 pruning Methods 0.000 claims description 68
- 238000000605 extraction Methods 0.000 claims description 56
- 238000005070 sampling Methods 0.000 claims description 42
- 230000015654 memory Effects 0.000 claims description 38
- 230000006837 decompression Effects 0.000 claims description 37
- 230000006835 compression Effects 0.000 claims description 34
- 238000007906 compression Methods 0.000 claims description 34
- 238000004590 computer program Methods 0.000 claims description 21
- 238000013139 quantization Methods 0.000 claims description 13
- 238000010586 diagram Methods 0.000 description 28
- 239000000203 mixture Substances 0.000 description 24
- 239000013598 vector Substances 0.000 description 23
- 238000004422 calculation algorithm Methods 0.000 description 21
- 230000008569 process Effects 0.000 description 18
- 230000006870 function Effects 0.000 description 13
- 238000004891 communication Methods 0.000 description 7
- 230000000670 limiting effect Effects 0.000 description 6
- 238000012545 processing Methods 0.000 description 6
- 238000004364 calculation method Methods 0.000 description 5
- 230000001360 synchronised effect Effects 0.000 description 5
- 230000009466 transformation Effects 0.000 description 5
- 238000011176 pooling Methods 0.000 description 4
- 230000003068 static effect Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 230000002829 reductive effect Effects 0.000 description 2
- 238000010146 3D printing Methods 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000009365 direct transmission Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000002310 reflectometry Methods 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 238000011426 transformation method Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
Definitions
- the embodiments of the present application relate to the technical field of video encoding and decoding, and in particular, relate to an encoding and decoding method, an encoder, a decoder, and a storage medium.
- a point cloud is defined as a collection of points in a three-dimensional space, where each point is expressed as three-dimensional coordinates and specific attribute information.
- point cloud is widely used in virtual reality, immersive telepresence, 3D printing and other fields.
- a typical application of point clouds is to represent a three-dimensional image of a moving human body in virtual reality and telepresence; such an image is called a dynamic point cloud (Dynamic Point Cloud, DPC), and its data volume is huge, so compression is a key technology in these applications.
- the existing dynamic point cloud compression technology constructs a separate neural network for motion estimation and motion compensation; during training, motion vectors must be labeled for the data set, which increases the difficulty of training, and the encoding and decoding efficiency of the codec framework constructed in this way still needs to be improved.
- the embodiment of the present application provides an encoding and decoding method, an encoder, a decoder, and a storage medium.
- An end-to-end neural network is used for encoding and decoding, which can not only improve the quality of the point cloud but also save the code rate, thereby improving the encoding and decoding efficiency.
- the embodiment of the present application provides an encoding method applied to an encoder, and the method includes:
- the first neural network is an end-to-end neural network, and the first neural network is configured as:
- the embodiment of the present application provides a decoding method, which is applied to a decoder, and the method includes:
- the second neural network is an end-to-end neural network, and the second neural network is configured as:
- an encoder which includes a determination unit and an encoding unit; wherein,
- the determination unit is configured to determine the current frame point cloud, and the reference frame reconstruction point cloud corresponding to the current frame point cloud;
- the encoding unit is configured to use the preset first neural network to encode the current frame point cloud based on the reference frame reconstruction point cloud, and write the obtained encoded bits into the code stream;
- the first neural network is an end-to-end neural network, and the first neural network is configured as:
- an embodiment of the present application provides an encoder, where the encoder includes a first memory and a first processor; wherein,
- a first memory for storing a computer program capable of running on the first processor
- the first processor is configured to execute the method of the first aspect when running the computer program.
- the embodiment of the present application provides a decoder, the decoder includes an acquisition unit and a decoding unit, wherein,
- the obtaining unit is used to obtain a code stream
- the decoding unit is used to use the preset second neural network to decode the code stream to obtain the reconstruction point cloud of the current frame;
- the second neural network is an end-to-end neural network, and the second neural network is configured as:
- the embodiment of the present application provides a decoder, where the decoder includes a second memory and a second processor; wherein,
- a second memory for storing a computer program capable of running on the second processor
- the second processor is configured to execute the method as described in the third aspect when running the computer program.
- the embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, and when the computer program is executed by the first processor, the method as described in the first aspect is implemented, or when the computer program is executed by the second processor, the method as described in the second aspect is implemented.
- the embodiment of the present application provides a codec method, an encoder, a decoder, and a storage medium.
- in the encoder, the current frame point cloud is determined, along with the reference frame reconstruction point cloud corresponding to the current frame point cloud;
- the first neural network encodes the current frame point cloud based on the reference frame reconstruction point cloud, and writes the obtained encoded bits into the code stream; wherein the first neural network is an end-to-end neural network, and the first neural network is configured to: perform inter-frame prediction based on the reference frame reconstruction point cloud and the current frame point cloud to obtain motion information and residual information of the current frame point cloud; and encode the motion information and the residual information, writing the obtained encoded bits into the code stream.
- in the decoder, the code stream is obtained; the second neural network is used to decode the code stream to obtain the current frame reconstruction point cloud; wherein the second neural network is an end-to-end neural network, and the second neural network is configured to: decode the code stream to determine the motion information and residual information of the current frame point cloud; perform motion compensation based on the motion information and the reference frame reconstruction point cloud to obtain the prediction information of the current frame point cloud; and obtain the current frame reconstruction point cloud based on the residual information and the prediction information of the current frame point cloud.
- the encoder uses an end-to-end neural network for point cloud encoding.
- the network does not require the training sample set to be additionally labeled with the motion information of the samples during training, which reduces the difficulty of training.
- the network is trained with the goal of ensuring point cloud reconstruction quality while reducing the bit rate, so using this network for encoding can not only improve the quality of the point cloud but also save the bit rate, thereby improving the encoding efficiency.
- the decoder uses the second neural network to reconstruct the point cloud.
- the second neural network can be understood as a part of the network structure that has the decoding function in the first neural network.
- the neural networks at the encoding end and the decoding end perform end-to-end self-supervised learning as a whole, reducing human intervention; using this network for decoding can reduce distortion and ensure the quality of the reconstructed point cloud.
- Fig. 1 is a schematic diagram of the composition framework of a G-PCC encoder
- Fig. 2 is a schematic diagram of the composition framework of a G-PCC decoder
- FIG. 3 is a schematic flowchart of an encoding method provided in an embodiment of the present application.
- FIG. 4 is a schematic diagram of the composition and structure of the inter-frame prediction module in the embodiment of the present application.
- FIG. 5 is a schematic diagram of the composition and structure of the first neural network in the embodiment of the present application.
- FIG. 6 is a schematic diagram of the composition and structure of the downsampling module in the embodiment of the present application.
- FIG. 7 is a schematic diagram of the composition and structure of the first upsampling module in the embodiment of the present application.
- FIG. 8 is a schematic diagram of the composition and structure of the second upsampling module in the embodiment of the present application.
- FIG. 9 is a schematic flowchart of a decoding method in an embodiment of the present application.
- FIG. 10 is a schematic diagram of the composition and structure of the second neural network in the embodiment of the present application.
- FIG. 11 is a schematic diagram of the composition and structure of an encoder provided in an embodiment of the present application.
- FIG. 12 is a schematic diagram of a specific hardware structure of an encoder provided in an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of a decoder provided in an embodiment of the present application.
- FIG. 14 is a schematic diagram of a specific hardware structure of a decoder provided in an embodiment of the present application.
- FIG. 15 is a schematic diagram of the composition and structure of an encoding and decoding system provided by an embodiment of the present application.
- references to "some embodiments" describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or a different subset of all possible embodiments, and they can be combined with each other without conflict.
- "first/second/third" in the embodiments of the present application is only used to distinguish similar objects and does not represent a specific ordering of the objects. Understandably, the specific order or sequence of "first/second/third" may be interchanged where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
- Geometry-based Point Cloud Compression (G-PCC or GPCC)
- Video-based Point Cloud Compression (V-PCC or VPCC)
- Variational AutoEncoder (VAE)
- AutoEncoder (AE)
- Variational AutoDecoder (VAD)
- AutoDecoder (AD)
- Multi-Layer Perceptron (MLP)
- three-layer initial residual network (Inception Residual Network, IRN)
- Binary Cross Entropy (BCE)
- Octree
- Bounding box
- K Nearest Neighbor (KNN)
- Point cloud is a three-dimensional representation of the surface of an object.
- the point cloud (data) on the surface of an object can be collected through acquisition equipment such as photoelectric radar, laser radar, laser scanner, and multi-view camera.
- Point cloud refers to a collection of massive three-dimensional points, and the points in the point cloud can include point location information and point attribute information.
- the point position information may be three-dimensional coordinate information of the point.
- the location information of a point may also be referred to as geometric information of a point.
- the attribute information of a point may include color information and/or reflectivity and the like.
- color information may be information on any color space.
- color information may be RGB information. Wherein, R represents red (Red, R), G represents green (Green, G), and B represents blue (Blue, B).
- the color information may be luminance chrominance (YCbCr, YUV) information. Among them, Y represents brightness, Cb(U) represents blue chroma, and Cr(V) represents red chroma.
- the points in the point cloud can include the three-dimensional coordinate information of the point and the laser reflection intensity (reflectance) of the point.
- the points in the point cloud may include the three-dimensional coordinate information of the point and the color information of the point.
- the points in the point cloud may include the three-dimensional coordinate information of the point, the laser reflection intensity (reflectance) of the point, and the color information of the point.
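- As a purely illustrative aside (not part of the application), such a point cloud can be held as an array of coordinates plus per-point attributes; the values below are made up:

```python
import numpy as np

# Hypothetical miniature point cloud: 4 points, each with geometry and attributes.
xyz = np.array([[0.0, 0.0, 0.0],            # three-dimensional coordinates (geometric information)
                [1.0, 0.5, 0.2],
                [2.3, 1.1, 0.7],
                [0.4, 2.2, 1.9]])
rgb = np.array([[255, 0, 0],                 # color attribute of each point
                [0, 255, 0],
                [0, 0, 255],
                [128, 128, 128]], dtype=np.uint8)
reflectance = np.array([0.12, 0.80, 0.45, 0.33])   # laser reflection intensity of each point

point0 = (xyz[0], rgb[0], reflectance[0])    # one point = (geometry, color, reflectance)
```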
- Point clouds can be divided in the following ways:
- the first type: static point cloud, that is, the object is stationary and the device that acquires the point cloud is also stationary;
- the second type: dynamic point cloud, the object is moving but the device that acquires the point cloud is stationary;
- the third type: dynamically acquired point cloud, the device that acquires the point cloud is in motion.
- For example, according to the purpose of the point cloud, it can be divided into two categories:
- Category 1: machine perception point cloud, which can be used in scenarios such as autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and emergency rescue robots;
- Category 2: human eye perception point cloud, which can be used in point cloud application scenarios such as digital cultural heritage, free viewpoint broadcasting, 3D immersive communication, and 3D immersive interaction.
- since the point cloud is a collection of massive points, storing the point cloud not only consumes a large amount of memory but is also not conducive to transmission, and there is no bandwidth large enough at the network layer to support direct transmission of the point cloud without compression; therefore, the point cloud needs to be compressed.
- the point cloud coding framework that compresses the point cloud may be the G-PCC codec framework or the V-PCC codec framework provided by the Moving Picture Experts Group (MPEG), or the AVS-PCC codec framework provided by the Audio Video Standard (AVS).
- the G-PCC codec framework can be used to compress the first type of static point cloud and the third type of dynamically acquired point cloud
- the V-PCC codec framework can be used to compress the second type of dynamic point cloud.
- the description here mainly focuses on the G-PCC codec framework.
- each slice is independently encoded.
- FIG. 1 is a schematic diagram of a composition framework of a G-PCC encoder. As shown in Figure 1, this G-PCC encoder is applied to a point cloud encoder.
- the point cloud data is divided into multiple slices through slice division first.
- the geometric information of the point cloud and the attribute information corresponding to each point cloud are encoded separately.
- the geometric information undergoes coordinate transformation so that the entire point cloud is contained in a bounding box, and is then quantized; this quantization step mainly plays a role of scaling.
- due to the rounding in quantization, the geometric information of some points becomes identical, and parameters are used to decide whether to remove these duplicate points; the process of quantization and duplicate-point removal is also called voxelization. Octree division is then performed on the bounding box. In the octree-based geometric information encoding process, the bounding box is divided into 8 sub-cubes, and the non-empty sub-cubes (those containing points of the point cloud) are further divided into 8 sub-cubes, until the leaf nodes obtained by division are 1×1×1 unit cubes.
- in trisoup-based geometric information encoding, octree division is also performed first, but unlike octree-based geometric information encoding, trisoup does not need to divide the point cloud step by step down to unit cubes with a side length of 1×1×1; instead, division stops when sub-blocks (blocks) with a side length of W are obtained.
- based on the surface formed by the point cloud distribution in each block, at most twelve intersection points (vertices) between that surface and the twelve edges of the block are obtained; the vertices are arithmetically encoded (surface fitting based on the intersection points) to generate a binary geometric bit stream, that is, a geometry code stream. The vertices are also used in the geometric reconstruction process, and the reconstructed geometric information is used when encoding the attributes of the point cloud.
- after geometric encoding is completed and the geometric information is reconstructed, color conversion is performed to convert the color information (that is, the attribute information) from the RGB color space to the YUV color space. Then, the point cloud is recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information.
- Attribute coding is mainly carried out for color information. In the process of color information coding, there are mainly two transformation methods, one is distance-based lifting transformation that relies on LOD division, and the other is direct RAHT transformation.
- Both methods transform the color information from the spatial domain to the frequency domain, obtain high-frequency and low-frequency coefficients through the transform, and finally quantize the coefficients (that is, coefficient quantization).
- after the geometric encoding data processed by octree division and surface fitting and the attribute encoding data processed by coefficient quantization are combined slice by slice, the vertex coordinates of each block are encoded in turn (that is, arithmetic coding) to generate a binary attribute bit stream, that is, an attribute code stream.
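- To make the octree-based geometric coding step above concrete, here is a generic sketch of one level of occupancy-driven subdivision; it illustrates the general principle only, it is not the G-PCC reference implementation, and the helper name is invented for this example:

```python
import numpy as np

def octree_split(points, origin, size):
    """Split a cubic bounding box into 8 sub-cubes and return an 8-bit occupancy
    code (bit i = 1 if sub-cube i contains at least one point) together with the
    non-empty children, which would be subdivided further in the same way."""
    half = size / 2.0
    code, children = 0, []
    for i in range(8):
        offset = np.array([(i >> 2) & 1, (i >> 1) & 1, i & 1]) * half  # child position
        lo, hi = origin + offset, origin + offset + half
        inside = np.all((points >= lo) & (points < hi), axis=1)
        if inside.any():
            code |= 1 << i
            children.append((points[inside], lo, half))
    return code, children

pts = np.random.rand(200, 3) * 8.0
occupancy_code, kids = octree_split(pts, origin=np.zeros(3), size=8.0)
```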
- FIG. 2 is a schematic diagram of a composition framework of a G-PCC decoder. As shown in Fig. 2, this G-PCC decoder is applied to a point cloud decoder. In the G-PCC decoding framework, for the obtained binary code stream, the geometry bit stream and the attribute bit stream in the binary code stream are first decoded independently.
- when decoding the geometry bit stream, the geometric information of the point cloud is obtained through arithmetic decoding, octree synthesis, surface fitting, geometry reconstruction, and inverse coordinate transformation; when decoding the attribute bit stream, the attribute information of the point cloud is obtained through arithmetic decoding, inverse quantization, LOD-based inverse lifting transform or RAHT-based inverse transform, and inverse color conversion; the 3D image model of the point cloud data to be encoded is restored based on the geometric information and the attribute information.
- the existing G-PCC codec framework uses a separate network for motion estimation and motion compensation. During training, the loss value between the predicted motion vector and the real motion vector must be calculated, which often requires the motion vectors to be labeled for the data set. Applying such a network may cause a large difference between the reconstructed point cloud and the original point cloud, with serious distortion, which affects the quality of the entire point cloud.
- the embodiment of the present application proposes a codec method, which can affect the motion estimation and motion compensation in the G-PCC encoding framework, and can also affect the motion compensation in the G-PCC decoding framework.
- FIG. 3 is a schematic flowchart of the encoding method provided in the embodiment of the present application. As shown in FIG. 3, the method may include:
- Step 301 Determine the current frame point cloud, and the reference frame reconstruction point cloud corresponding to the current frame point cloud;
- the encoding method described in the embodiment of the present application specifically refers to the point cloud encoding method, which can be applied to a point cloud encoder (in the embodiment of the present application, it may be simply referred to as "encoder").
- the point cloud of the current frame can be understood as the point cloud to be encoded.
- the reference frame reconstruction point cloud can be understood as an encoded point cloud; it may be the reconstructed point cloud of the previous frame, or a set of reconstruction points of some encoded points in the current frame point cloud. That is to say, the reference point of a point to be encoded can be a reconstruction point of the previous frame or of the current frame.
- each point in the current frame point cloud corresponds to one piece of geometric information and one piece of attribute information; the geometric information represents the spatial position of the point, specifically its three-dimensional geometric coordinates.
- the attribute information may include color components, specifically color information of any color space.
- the attribute information may be color information in RGB space, may also be color information in YUV space, may also be color information in YCbCr space, etc., which are not specifically limited in this embodiment of the present application.
- Step 302 Use the preset first neural network to encode the current frame point cloud based on the reference frame reconstruction point cloud, and write the obtained encoded bits into the code stream;
- the first neural network is an end-to-end neural network, and the first neural network is configured as:
- the first neural network includes an inter-frame prediction module, and the inter-frame prediction module is configured to:
- the residual information is obtained based on the current frame point cloud and prediction information of the current frame point cloud.
- the current frame point cloud can be understood as the real information of the current frame point cloud, and the residual information is obtained by subtracting the prediction information from the real information.
- the real information specifically includes the real value of the attribute of each point
- the predicted information includes the predicted value of the attribute of each point.
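- As a minimal numeric sketch of this relationship (the values are invented), the residual is the per-point difference between the real attribute values and the predicted ones, and the reconstruction adds the residual back to the prediction:

```python
import numpy as np

actual = np.array([[0.82, 0.10, 0.33],       # real attribute values of the current frame points
                   [0.45, 0.66, 0.21]])
predicted = np.array([[0.80, 0.12, 0.30],    # prediction information from motion compensation
                      [0.50, 0.60, 0.25]])
residual = actual - predicted                # residual information to be compressed
reconstructed = predicted + residual         # decoder-side reconstruction of the attributes
```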
- inter prediction includes motion estimation and motion compensation.
- for motion estimation, the embodiment of the present application provides a multi-scale motion estimation method to solve the problem of excessive time and space complexity of existing motion estimation networks.
- for motion compensation, the embodiment of the present application provides a bounded three-nearest neighbor interpolation algorithm, which solves the problem of poor interpolation effect in sparse point cloud space.
- the inter prediction module includes a multi-scale motion estimation module, and the multi-scale motion estimation module is configured to:
- connect the reference frame reconstruction point cloud with the current frame point cloud to obtain connection data;
- the final motion information is obtained.
- the low-scale motion estimation can be understood as a kind of low-precision motion estimation, and the obtained low-scale motion information (ie, the first motion information) is used to represent the approximate motion direction of the object in the point cloud of the current frame.
- the low-scale motion information represents the motion information of a point cloud block including a person from a reference frame to a current frame.
- High-scale motion estimation can be understood as a high-precision motion estimation, and the obtained high-scale motion information (ie, second motion information) is used to represent the specific motion direction of the object in the point cloud of the current frame.
- the high-scale motion information represents the motion information of different parts of the human body from the reference frame to the current frame in the point cloud block containing the person.
- the motion information specifically refers to a motion vector, which can be decomposed into motion components along the three axis directions x, y, and z, and which participates in motion compensation.
- low-scale motion estimation is performed first to obtain low-scale motion information, which includes rough motion vectors.
- Use low-scale motion information to guide high-scale motion estimation, and obtain high-scale motion information, including fine motion vectors.
- the low-scale motion information is added to the high-scale motion information to obtain comprehensive motion information.
- the comprehensive motion information can accurately represent the motion characteristics of the point to be encoded, improve the accuracy of motion estimation, and then improve the accuracy of subsequent motion compensation and improve the quality of point cloud reconstruction.
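- The coarse-to-fine idea can be sketched as follows. This is a dense-voxel stand-in written in PyTorch purely for illustration: the application itself operates on sparse tensors with sparse convolutions, and the layer sizes and the exact way the coarse field guides the fine one are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class MultiScaleMotionEstimator(nn.Module):
    """Coarse (low-scale) motion is estimated at reduced resolution, upsampled,
    and used to guide a fine (high-scale) refinement; the two are summed."""
    def __init__(self, feat=64):
        super().__init__()
        self.extract = nn.Sequential(                      # extraction of original motion features
            nn.Conv3d(2 * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv3d(feat, feat, 2, stride=2), nn.ReLU())
        self.low_head = nn.Conv3d(feat, feat, 3, padding=1)   # low-scale motion information
        self.up = nn.ConvTranspose3d(feat, feat, 2, stride=2) # bring guidance back to full scale
        self.high_head = nn.Conv3d(feat, feat, 3, padding=1)  # high-scale refinement

    def forward(self, ref_feat, cur_feat):
        x = self.extract(torch.cat([ref_feat, cur_feat], dim=1))  # "connection data"
        low = self.low_head(self.down(x))                         # low-scale motion estimation
        guide = self.up(low)                                       # coarse field at full resolution
        high = self.high_head(x - guide)                           # refinement guided by the coarse field
        return guide + high                                        # final motion information

ref = torch.randn(1, 64, 16, 16, 16)     # reference frame features (dense stand-in)
cur = torch.randn(1, 64, 16, 16, 16)     # current frame features
motion = MultiScaleMotionEstimator()(ref, cur)
```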
- the inter prediction module includes a first compression module and a first decompression module corresponding to the first compression module;
- the first compression module is configured to: down-sample the motion information; perform quantization and entropy coding on the down-sampled motion information to obtain coded bits of the motion information;
- the first decompression module is configured to: perform entropy decoding and up-sampling on coded bits of the motion information to obtain decoded motion information.
- the first decompression module further includes inverse quantization after entropy decoding.
- the first compression module includes: a convolution layer, a quantizer, and an autoencoder (AE), and the first decompression module includes: an autodecoder (AD) and a deconvolution layer.
- the motion information is down-sampled and quantized
- the probability distribution is obtained through the entropy model
- the AE is used for arithmetic coding to obtain the 01 bit stream, which is transmitted to the decoding end.
- Corresponding entropy decoding and upsampling are performed at the decoding end, and the decoded motion information is obtained to participate in point cloud reconstruction.
- Corresponding entropy decoding and upsampling also need to be performed at the encoding end, and the decoded motion information is obtained to participate in point cloud reconstruction, so that the current frame reconstruction point cloud is obtained to participate in the encoding of the next frame point cloud.
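- A compact sketch of this compress/decompress round trip is given below. Dense convolutions again stand in for the sparse ones, the arithmetic coder and entropy model are abstracted away, and the rounding-versus-noise switch anticipates the training detail described later in this text:

```python
import torch
import torch.nn as nn

class MotionInfoCodec(nn.Module):
    """Downsample the motion features, quantize them, (entropy-code them into the
    bit stream), then decode and upsample back to obtain the decoded motion
    information. The entropy coding step itself is omitted from this sketch."""
    def __init__(self, feat=64, latent=48):
        super().__init__()
        self.down = nn.Conv3d(feat, latent, 2, stride=2)          # compression: Conv(48, 2, 2)
        self.up = nn.ConvTranspose3d(latent, feat, 2, stride=2)   # decompression: Deconv(64, 2, 2)

    def quantize(self, y, training):
        if training:                                              # quantization is not differentiable,
            return y + torch.empty_like(y).uniform_(-0.5, 0.5)    # so add uniform noise when training
        return torch.round(y)                                     # hard rounding at inference time

    def forward(self, motion_feat, training=False):
        y = self.down(motion_feat)
        y_hat = self.quantize(y, training)            # y_hat is what would be entropy coded
        return self.up(y_hat)                         # decoded motion information

decoded = MotionInfoCodec()(torch.randn(1, 64, 16, 16, 16))
```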
- the inter-frame prediction module at the coding end also includes a motion compensation module, which uses a preset interpolation algorithm to perform motion compensation.
- the interpolation algorithm may be a bounded three-nearest neighbor interpolation algorithm, or a trilinear interpolation algorithm.
- the motion compensation module when the motion compensation module performs motion compensation based on a bounded three-nearest neighbor interpolation algorithm, the motion compensation module is configured to:
- the penalty coefficient is used to limit the weights of the K neighbor points of the isolated point.
- the penalty coefficient can be understood as limiting the selection boundary of the neighboring points. For the isolated point, the distance between the neighboring points is far away, and the penalty coefficient limits the weight of the neighboring points of the isolated point, so as to avoid that the isolated point can still obtain a large attribute after interpolation Predictive value.
- K neighboring points refer to the K points closest to the second geometric coordinates in the reference frame, and the second geometric coordinates may be integers or decimals.
- the penalty coefficient is used to limit the sum of the weights of the K neighbor points of the isolated point
- when the sum of the weights of the K neighboring points is less than or equal to the penalty coefficient, the attribute prediction value of the target point is determined based on the sum of the weights of the K neighboring points, the weights of the K neighboring points, and the attribute reconstruction values of the K neighboring points.
- the distance between the neighboring point and the second geometric coordinate is determined based on the second geometric coordinate and the geometric coordinates of the K neighboring points, and the weight is determined based on the distance.
- the penalty coefficient can limit the weights of the K neighbor points of an isolated point either by limiting the weight of each neighbor point or by limiting the sum of the weights of the K neighbor points, so as to avoid the isolated point still obtaining a large attribute prediction value after interpolation.
- λ is the penalty coefficient.
- the weight of the j-th neighbor point decreases as the distance d_ij increases, so that distant neighbors are penalized; but compared with bilinear interpolation, the penalty coefficient does not immediately force the weight to zero: the weight becomes zero only when d_ij ≥ λ. In experiments, λ is usually set to 3.
- bounded three-nearest neighbor interpolation has a larger search range, which effectively avoids the problem that the predicted value of the attribute obtained by interpolation is zero.
- the penalty coefficient ⁇ is used to limit the sum of the weights of the three neighbors of the outliers.
- this solution uses channel motion vectors instead of the original motion vectors in some embodiments. Specifically, when the motion information of the target point is the motion information of the target point on a target channel, the attribute prediction value of the target point on the target channel is determined; wherein the target channel is one of all the channels of the current frame point cloud.
- Δx_i, Δy_i, Δz_i are the components of the motion vector corresponding to channel c along the x, y, and z axes.
- the bounded three-nearest neighbor interpolation algorithm used in motion compensation has a larger search range, which effectively avoids the problem that the attribute prediction value obtained by interpolation is zero.
- the penalty coefficient is used to avoid that the isolated points can still obtain a large attribute prediction value after interpolation, and improve the accuracy of attribute value prediction.
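- The following sketch illustrates only the bounded idea: the predicted attribute of a target point is a distance-weighted average of its K nearest reference points, and neighbors farther than the bound λ contribute nothing, so isolated points do not receive large predicted values. The specific weight formula used here is an assumption for illustration, not the exact formula of the application:

```python
import numpy as np

def bounded_knn_predict(query_xyz, ref_xyz, ref_attr, k=3, lam=3.0):
    """Bounded K-nearest-neighbor interpolation sketch: weights fall to zero for
    neighbors at distance >= lam, which penalizes isolated points."""
    d = np.linalg.norm(ref_xyz - query_xyz, axis=1)      # distances to all reference points
    idx = np.argsort(d)[:k]                              # K nearest neighbors
    w = np.maximum(lam - d[idx], 0.0)                    # bounded weights (assumed form)
    if w.sum() == 0.0:                                   # fully isolated target point
        return np.zeros(ref_attr.shape[1])
    w = w / w.sum()
    return (w[:, None] * ref_attr[idx]).sum(axis=0)      # attribute prediction value

ref_xyz = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [9, 9, 9]], dtype=float)
ref_attr = np.array([[1.0], [2.0], [3.0], [7.0]])
pred = bounded_knn_predict(np.array([0.4, 0.3, 0.0]), ref_xyz, ref_attr, k=3, lam=3.0)
```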
- a trilinear interpolation algorithm may also be used.
- bilinear interpolation is a common interpolation method applied to images.
- Trilinear interpolation is bilinear interpolation taking into account the z-axis.
- the offset set is N³ = {(x, y, z) | x, y, z ∈ {0, 1}}.
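- A standard trilinear interpolation routine over the eight corner offsets in N³ looks like this (generic textbook form, shown only to make the offset set concrete):

```python
import numpy as np

def trilinear(grid, x, y, z):
    """Weighted sum over the 8 corners given by the offset set
    N3 = {(dx, dy, dz) | dx, dy, dz in {0, 1}}."""
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    fx, fy, fz = x - x0, y - y0, z - z0
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((fx if dx else 1 - fx) *
                     (fy if dy else 1 - fy) *
                     (fz if dz else 1 - fz))             # separable weight of this corner
                value += w * grid[x0 + dx, y0 + dy, z0 + dz]
    return value

grid = np.arange(27, dtype=float).reshape(3, 3, 3)
print(trilinear(grid, 0.5, 0.5, 0.5))   # average of the 8 surrounding voxels: 6.5
```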
- the following further illustrates the inter-frame prediction module in the first neural network in the embodiment of the present application.
- FIG. 4 is a schematic structural diagram of an inter-frame prediction module in an embodiment of the present application.
- the inter-frame prediction module includes a multi-scale motion estimation module, a motion information compression and decompression module, and a motion compensation module.
- the multi-scale motion estimation module includes a connection module, which is used to connect the reconstructed point cloud of the reference frame with the point cloud of the current frame to obtain connection data.
- the current frame point cloud and the reference frame reconstructed point cloud are in the form of sparse tensors; the sparse tensor form of the current frame point cloud p₂ consists of its set of geometric coordinates and the feature associated with each coordinate, and the sparse tensor form of the reference frame reconstruction point cloud p₁ is defined in the same way.
- a motion vector is extracted using a sparse convolution-based motion estimator.
- based on p₁ and p₂, the connected sparse tensor p_c is defined as follows:
- p.c is defined as the set of geometric coordinates of a sparse tensor p;
- f_i, the feature of the connected sparse tensor p_c corresponding to the geometric coordinates (x_i, y_i, z_i), is obtained by splicing (concatenating) the corresponding feature vectors of p₁ and p₂;
- p[x_i, y_i, z_i] is defined as the feature of the sparse tensor p corresponding to the geometric coordinates (x_i, y_i, z_i);
- ∩ represents the intersection symbol;
- − represents the complement (set difference) symbol;
- p₁.c − p₂.c represents the coordinates that belong to p₁.c but not to p₂.c;
- p₂.c − p₁.c represents the coordinates that belong to p₂.c but not to p₁.c.
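- A minimal sketch of this connection operation, under the assumption that the connected tensor is defined on the union of the two coordinate sets and that the side missing a coordinate contributes a zero feature (plain dicts stand in for sparse tensors):

```python
import numpy as np

def connect_sparse(p1, p2, dim1, dim2):
    """Connect two 'sparse tensors' (coordinate -> feature dicts): coordinates are
    unioned, and the features are spliced, zero-filling whichever side is missing."""
    pc = {}
    for c in set(p1) | set(p2):                          # p1.c ∪ p2.c
        f1 = p1.get(c, np.zeros(dim1))                   # zero feature if c ∉ p1.c
        f2 = p2.get(c, np.zeros(dim2))                   # zero feature if c ∉ p2.c
        pc[c] = np.concatenate([f1, f2])                 # f_i: spliced feature vector
    return pc

p1 = {(0, 0, 0): np.array([1.0, 2.0]), (1, 0, 0): np.array([3.0, 4.0])}
p2 = {(0, 0, 0): np.array([5.0]), (0, 1, 0): np.array([6.0])}
pc = connect_sparse(p1, p2, dim1=2, dim2=1)
# pc[(0, 1, 0)] -> [0., 0., 6.]  (coordinate only in p2.c, so the p1 side is zero-filled)
```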
- the multi-scale motion estimation module includes an extraction module comprising two convolutional layers, each followed by an activation layer;
- the extraction module is configured to: input the connection data into each convolutional layer and the subsequent activation layer in turn to obtain the original motion information.
- the first convolutional layer parameter of the extraction module is Conv(64, 3, 1)
- the second convolutional layer parameter is Conv(64, 3, 1).
- the multi-scale motion estimation module includes a first motion estimation module, and the first motion estimation module includes: a convolutional layer, an activation layer, and a three-layer initial residual network;
- the first motion estimation module is configured to: input the original motion information to the convolutional layer, the activation layer, and the three-layer initial residual network in sequence to perform low-scale motion estimation to obtain the first Sports information.
- the first motion estimation module can be understood as a low-scale motion estimation module, which is used to perform rough motion estimation on the point cloud of the current frame.
- the convolutional layer parameter of the first motion estimation module is Conv(64, 2, 2), which is used to down-sample the original motion information.
- the multi-scale motion estimation module includes a second motion estimation module, and the second motion estimation module includes: a deconvolution layer, a first pruning layer, a subtractor, and a convolution layer ;
- the second motion estimation module is configured to:
- the second motion estimation module can be understood as a high-scale motion estimation module, which is used to perform precise motion estimation on the current frame point cloud under the guidance of the low-scale motion estimation result.
- the parameters of the deconvolution layer of the second motion estimation module are Deconv(64, 2, 2)
- the parameters of the convolution layer are Conv(64, 2, 2).
- the multi-scale motion estimation module also includes: a second pruning layer, a third pruning layer and an adder;
- the multi-scale motion estimation module is configured to:
- the adder adds the pruned first motion information and the second motion information to obtain the final motion information.
- the motion information includes motion features and geometric coordinates. That is to say, encoding and decoding motion information includes encoding and decoding motion features and geometric coordinates.
- the motion information compression and decompression module specifically compresses and decompresses motion features, and the lossless encoder performs lossless encoding on the geometric coordinate set C P2 corresponding to the current frame point cloud P 2 and writes the coded bits into the code stream.
- the motion feature passes through the convolutional layer Conv (48, 2, 2), the quantizer Q and the autoencoder AE, and the encoded bits are written into the code stream.
- the motion information decompression module decompresses the motion features: the code stream is decoded through the auto-decoder and the deconvolution layer Deconv(64, 2, 2).
- the motion compensation module also includes an extraction module, which is used to obtain the motion information of the target point from the decoded motion information.
- the extraction module includes: a first pruning layer, a first convolutional layer, a pooling layer, a deconvolution layer, a second pruning layer, a second convolutional layer and an adder.
- the decoded motion features are pruned through the first pruning layer, so that the pruned motion information has the same geometric coordinate set as the residual information;
- An adder is used to add the low-scale motion information and the high-scale motion information to obtain the motion information of each channel.
- the second pruning layer prunes the output of the deconvolution layer based on the decoded geometric coordinate set C P2 of the current frame point cloud, so that the geometric coordinate sets of the low-scale motion information and the high-scale motion information before addition are the same .
- the reference frame point cloud P 1 and the motion information output by the adder are used for interpolation operation to obtain prediction information P' 2 .
- the first neural network further includes a first feature extraction module and a second feature extraction module located before the inter prediction module;
- the first feature extraction module is configured to: perform feature extraction on the reference frame reconstruction point cloud, and convert the reference frame reconstruction point cloud into a sparse tensor form;
- the second feature extraction module is configured to: perform feature extraction on the current frame point cloud, and convert the current frame point cloud into a sparse tensor form.
- the point cloud is converted into a sparse tensor form through the feature extraction module, and then the subsequent motion estimation, motion compensation, and encoding and decoding operations are performed.
- each feature extraction module includes a first down-sampling module and a second down-sampling module;
- the first down-sampling module includes: two convolutional layers, an activation layer following each convolutional layer, and a three-layer initial residual network;
- the second down-sampling module includes: two convolutional layers, an activation layer following each convolutional layer, and a three-layer initial residual network; the convolutional layer parameters of the first down-sampling module and the second down-sampling module are different.
- Fig. 5 is a schematic diagram of the composition structure of the first neural network in the embodiment of the present application.
- the first neural network includes a feature extraction module, an inter-frame prediction module, a residual compression and decompression module, and a point cloud reconstruction module.
- the first feature extraction module is used to perform feature extraction on the reconstructed point cloud of the previous frame to obtain the sparse tensor form P 1 of the reconstructed point cloud of the previous frame
- the second feature extraction module is used to perform feature extraction on the current frame point cloud , to obtain the sparse tensor form P 2 of the point cloud of the current frame.
- FIG. 6 is a schematic diagram of the composition and structure of the downsampling module in the embodiment of the present application.
- As shown in FIG. 6, the downsampling module is implemented using a feature extractor based on a sparse convolutional network, which maps the point cloud geometric space to the point cloud feature space, that is, to the sparse tensor form of the point cloud.
- the downsampling module consists of a convolution layer with a convolution kernel size of 3 and a step size of 1, and a convolution layer with a convolution kernel size of 2 and a step size of 2.
- Each convolution layer is followed by a ReLU activation layer.
- IRN denotes the three-layer initial residual network (Inception Residual Network).
- the parameter H of the convolutional layer in the downsampling module represents the hidden dimension
- O represents the output dimension.
- H and O are shown in Figure 5, that is, the first convolutional layer H of the first downsampling module is 16, and the second convolutional layer O is 32, the first convolutional layer H of the second downsampling module is 32, and the second convolutional layer O is 64.
- Conv(c, k, s) identifies a convolutional layer with a channel number (dimension) of c, a convolution kernel size of k, and a step size of s.
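- The block just described can be sketched as follows with dense 3D convolutions standing in for the sparse convolutions of the application, using the Conv(c, k, s) notation from the text; the residual wiring around the IRN stand-in is an assumption of this sketch:

```python
import torch
import torch.nn as nn

def conv(c_in, c_out, k, s):
    """Conv(c, k, s): c output channels, kernel size k, stride s (dense stand-in)."""
    return nn.Conv3d(c_in, c_out, kernel_size=k, stride=s, padding=(k - 1) // 2)

class DownsamplingModule(nn.Module):
    """One feature-extraction stage: Conv(H, 3, 1) + ReLU, Conv(O, 2, 2) + ReLU,
    followed by a simple 3-convolution residual stack standing in for the IRN."""
    def __init__(self, c_in, hidden, out):
        super().__init__()
        self.conv1 = conv(c_in, hidden, 3, 1)
        self.conv2 = conv(hidden, out, 2, 2)
        self.irn = nn.Sequential(conv(out, out, 3, 1), nn.ReLU(),
                                 conv(out, out, 3, 1), nn.ReLU(),
                                 conv(out, out, 3, 1))

    def forward(self, x):
        x = torch.relu(self.conv1(x))        # Conv(H, 3, 1) + ReLU
        x = torch.relu(self.conv2(x))        # Conv(O, 2, 2) + ReLU (halves the resolution)
        return x + self.irn(x)               # residual connection around the IRN stand-in

# From the text: first stage H = 16, O = 32; second stage H = 32, O = 64.
feats = DownsamplingModule(1, 16, 32)(torch.randn(1, 1, 32, 32, 32))   # -> (1, 32, 16, 16, 16)
```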
- the first neural network includes a second compression module and a second decompression module corresponding to the second compression module; that is, the residual compression and decompression module in FIG. 5 .
- the second compression module is configured to: down-sample the residual information; perform quantization and entropy coding on the down-sampled residual information to obtain the encoding bits;
- the second decompression module is configured to: perform entropy decoding on coded bits of the residual information to obtain decoded residual information.
- the second decompression module further includes inverse quantization after entropy decoding.
- the second compression module includes: a convolutional layer Conv(32,8), a quantizer Q, and an autoencoder (AE), and the second decompression module includes: an autodecoder (AD).
- the probability distribution is obtained through the entropy model, and the AE is used for arithmetic coding to obtain the 01 bit stream, which is transmitted to the decoding end.
- Corresponding entropy decoding and upsampling need to be performed at the decoding end, and the decoded residual information is obtained to participate in point cloud reconstruction.
- Corresponding entropy decoding and upsampling also need to be performed at the encoding end, and the decoded residual information is obtained to participate in point cloud reconstruction.
- residual information includes residuals and geometric coordinates. That is to say, encoding and decoding the residual information includes encoding and decoding the residual and the geometric coordinates CR .
- the first neural network further includes a point cloud reconstruction module located after the inter-frame prediction module; the point cloud reconstruction module is configured to:
- Up-sampling is performed on the first reconstructed point cloud to obtain the reconstructed point cloud of the current frame.
- the point cloud reconstruction module includes a first upsampling module, a second upsampling module and a third upsampling module.
- Fig. 7 is a schematic diagram of the composition and structure of the first upsampling module in the embodiment of the present application.
- the first upsampling module includes: a deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network (IRN), an adder, a second convolution layer, a classification layer (Classify), and a pruning layer;
- the first upsampling module is configured to:
- the pruning layer prunes the addition result based on the first set of geometric coordinates to obtain the first reconstructed point cloud.
- the parameter H in the upsampling module represents the hidden dimension
- O represents the output dimension.
- the specific values of H and O are shown in Figure 5, that is, the deconvolution layer H of the first upsampling module is 32, and the first convolution layer O is 32.
- Conv(c, k, s) identifies a convolutional layer with a channel number (dimension) of c, a convolution kernel size of k, and a step size of s.
- the upsampling module consists of a deconvolution layer with a convolution kernel size of 2 and a step size of 2, and a convolution layer with a convolution kernel size of 3 and a step size of 1.
- the convolutional layers are connected with the ReLU activation function.
- a classification layer is used to determine the probability distribution of occupancy, and pruning is performed.
- a retention coefficient is defined, and after pruning only the points whose occupancy probability ranks within the top (coefficient × N) of the sparse tensor are retained.
- that is, the occupancy condition is to select the top (coefficient × N) points ranked by occupancy probability.
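- The pruning rule can be sketched as below, keeping only the points whose classifier-predicted occupancy probability ranks in the top coefficient × N; the coefficient name rho and the sigmoid on the classification output are assumptions of this sketch:

```python
import torch

def prune_by_occupancy(coords, occupancy_logits, n, rho=1.0):
    """Keep only the rho * n points with the highest predicted occupancy probability."""
    probs = torch.sigmoid(occupancy_logits)          # classification layer output -> probability
    keep = min(int(rho * n), coords.shape[0])
    top = torch.topk(probs, keep).indices            # indices of the most likely occupied points
    return coords[top], probs[top]

coords = torch.randint(0, 64, (1000, 3))             # candidate voxel coordinates
logits = torch.randn(1000)                           # occupancy scores from the classification layer
kept_coords, kept_probs = prune_by_occupancy(coords, logits, n=500)
```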
- the point cloud reconstruction module includes a second upsampling module and a third upsampling module, which are used to perform two upsampling on the first reconstructed point cloud output by the first upsampling module to obtain Reconstruct the point cloud for the current frame.
- FIG 8 is a schematic diagram of the composition and structure of the second upsampling module in the embodiment of the present application.
- the second upsampling module includes: a first deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, a second convolution layer, a first classification layer, and a first pruning layer;
- the second upsampling module is configured to: sequentially pass the first reconstructed point cloud through the first deconvolution layer, the first activation layer, the first convolution layer, the second activation layer and three layers of initial residuals network to obtain the first reconstructed point cloud after upsampling;
- the first pruning layer prunes the upsampled first reconstructed point cloud based on the second set of geometric coordinates to obtain a second reconstructed point cloud.
- the deconvolution layer and convolution layer parameters in the second upsampling module are shown in Figure 8, where the deconvolution layer H of the second upsampling module is 64, and the first convolution layer O is 64.
- the third upsampling module includes: a second deconvolution layer, a third activation layer, a third convolution layer, a fourth activation layer, a three-layer initial residual network, a fourth convolution layer, a second classification layer, second pruning layer;
- the third upsampling module is configured to: pass the second reconstructed point cloud through the second deconvolution layer, the third activation layer, the third convolution layer, the fourth activation layer, and three layers of initial residuals in sequence network to obtain the second reconstructed point cloud after upsampling;
- the second pruning layer prunes the upsampled second reconstruction point cloud based on the third geometric coordinate set to obtain the current frame reconstruction point cloud; wherein the parameters of the first deconvolution layer of the second upsampling module and the second deconvolution layer of the third upsampling module are different, and the parameters of the first convolution layer of the second upsampling module and the third convolution layer of the third upsampling module are different.
- the composition structure of the third upsampling module is the same as that of the second upsampling module, but the parameters of the convolution layer and the deconvolution layer are different: the deconvolution layer H of the third upsampling module is 16, and its convolution layer O is 16.
- the embodiment of this application provides an end-to-end neural network, which uses a multi-scale motion estimation network, a bounded three-nearest neighbor interpolation algorithm, and a deep-learning-based factorized variational autoencoder entropy model, greatly improving the encoding efficiency.
- the calculation process is all composed of matrix operations, which has good parallelism, and can obtain a huge acceleration effect when running on a graphics processing unit (GPU).
- GPU graphics processing unit
- the encoding method provided in the embodiment of the present application further includes: training the first neural network.
- the training sample set includes one or more sample point clouds;
- the first sample point cloud is any sample point cloud in the training sample set
- the first sample point cloud is input into the first neural network as the current frame point cloud
- the output corresponding to the first sample point cloud is a motion information code stream, a residual information code stream, and its reconstructed point cloud; the distortion loss value of the first sample point cloud is determined based on the first sample point cloud and the reconstructed point cloud, and the code rate loss value of the first sample point cloud is calculated based on the motion information code stream and the residual information code stream; a loss function is built with the training target of ensuring point cloud reconstruction quality while reducing the code rate, and the total loss value is calculated.
- when the loss value of the first neural network is greater than the preset threshold (that is, the loss condition is not met), the network parameters are adjusted for the next round of training; when the loss value is less than or equal to the preset threshold (that is, the loss condition is met), the trained first neural network is obtained and used for dynamic point cloud encoding.
- the loss function of the first neural network is composed of two parts: the distortion of the point cloud, denoted as D; the code rate, denoted as R.
- sparse convolution is used to downsample the motion information/residual information to obtain the downsampled feature y. Since the quantization process is not differentiable, uniform noise U(-0.5, 0.5) is added to y instead of quantization during the training phase; the quantized feature is then entropy encoded and decoded with the arithmetic coder, as sketched below.
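- Under these definitions the training objective has the usual rate-distortion form L = D + λ·R. The sketch below assumes a binary cross-entropy distortion on the predicted occupancy (BCE is listed in the glossary above) and a rate term taken as the estimated code length from the entropy model's likelihoods; the trade-off weight λ is an assumption:

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(occupancy_logits, occupancy_target, likelihoods, lam=1.0):
    """Total loss = distortion D (BCE on occupancy) + lam * rate R (estimated bits)."""
    d = F.binary_cross_entropy_with_logits(occupancy_logits, occupancy_target)
    r = (-torch.log2(likelihoods)).sum()            # -log2 p(y_hat): estimated code length in bits
    return d + lam * r

# Training-time quantization surrogate described above: y_hat = y + u, u ~ U(-0.5, 0.5).
y = torch.randn(4, 8)
y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
```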
- the decoding end and the encoding end can perform end-to-end self-supervised learning as a whole, reducing human intervention; using this network for encoding and decoding can not only improve the quality of the point cloud but also save the bit rate, thereby improving the encoding and decoding efficiency.
- FIG. 9 is a schematic flowchart of the decoding method in the embodiment of the present application. As shown in FIG. 9, the method may include:
- Step 901 Obtain code stream
- bit stream includes motion information and residual information of the point cloud.
- the second neural network is used to decode the bit stream and reconstruct the point cloud.
- Step 902 Use the preset second neural network to decode the code stream to obtain the current frame reconstruction point cloud
- the second neural network is an end-to-end neural network, and the second neural network is configured as:
- decoding method described in the embodiment of the present application specifically refers to the point cloud decoding method, which can be applied to a point cloud decoder (in the embodiment of the present application, it may be simply referred to as "decoder").
- the point cloud of the current frame can be understood as the point cloud to be decoded.
- the reference frame reconstruction point cloud can be understood as a decoded point cloud
- the reference frame reconstruction point cloud can be a reconstruction point cloud for the previous frame, or a reconstruction point set of some decoded points in the current frame point cloud. That is to say, the reference point of the point to be decoded may be the reconstruction point of the previous frame or the current frame.
- the second neural network includes a first decompression module
- the first decompression module is configured to: perform entropy decoding and up-sampling on coded bits of the motion information in the code stream to obtain the motion information.
- the first decompression module includes: an autodecoder (AD) and a deconvolution layer.
- the first decompression module performs entropy decoding and upsampling on the code stream, and the decoded motion information is obtained to participate in point cloud reconstruction.
- the second neural network includes a motion compensation module, and the motion compensation module uses a preset interpolation algorithm to perform motion compensation.
- the interpolation algorithm may be a bounded three-nearest neighbor interpolation algorithm, or a trilinear interpolation algorithm.
- the motion compensation module when the motion compensation module performs motion compensation based on a bounded three-nearest neighbor interpolation algorithm, the motion compensation module is configured to:
- the penalty coefficient is used to limit the weights of the K neighbor points of the isolated point.
- the penalty coefficient can be understood as limiting the selection boundary of the neighboring points. For the isolated point, the distance between the neighboring points is far away, and the penalty coefficient limits the weight of the neighboring points of the isolated point, so as to avoid that the isolated point can still obtain a large attribute after interpolation Predictive value.
- the K neighboring points refer to the K points closest to the second geometric coordinate in the reference frame
- the second geometric coordinate is the position of the target point in the reference frame
- the second geometric coordinate can be an integer or a fractional value.
- the penalty coefficient is used to limit the sum of the weights of the K neighbor points of the isolated point
- determining the attribute prediction value of the target point in the current frame point cloud based on the attribute reconstruction values of the K neighboring points in the reference frame reconstructed point cloud and a preset penalty coefficient includes: determining the weights of the K neighboring points based on the second geometric coordinate and the geometric coordinates of the K neighboring points; when the sum of the weights of the K neighboring points is greater than the penalty coefficient, determining the attribute prediction value of the target point based on the penalty coefficient, the weights of the K neighboring points and the attribute reconstruction values of the K neighboring points; and when the sum of the weights of the K neighboring points is less than or equal to the penalty coefficient, determining the attribute prediction value of the target point based on the sum of the weights of the K neighboring points, the weights of the K neighboring points and the attribute reconstruction values of the K neighboring points.
- in practice, the penalty coefficient can limit the weights of the K neighboring points of an isolated point either by limiting the weight of each neighboring point or by limiting the sum of the weights of the K neighboring points.
- α is the penalty coefficient. The weight of the j-th neighboring point decreases as the distance d_ij increases, which penalizes the offset of that neighbor from the interpolation position; however, unlike bilinear interpolation, this penalty coefficient does not force the weight to zero, and the weight approaches 0 only when d_ij → ∞. In experiments, α is usually set to 3.
- bounded three-nearest neighbor interpolation has a larger search range, which effectively avoids the problem that the predicted value of the attribute obtained by interpolation is zero.
- the penalty coefficient ⁇ is used to limit the sum of the weights of the three neighbors of the outliers.
- the above scheme assumes that all channels of the point cloud feature space use the same motion vector, but in practice the motion vectors of the individual feature channels may differ; to improve the efficiency of motion compensation, this solution uses channel motion vectors instead of the original motion vectors in some embodiments. Specifically, when the motion information of the target point is the motion information of the target point on a target channel, the attribute prediction value of the target point on the target channel is determined; wherein the target channel is one of all the channels in the current frame point cloud.
- Δx_i, Δy_i and Δz_i are the components of the motion vector corresponding to channel c in the x, y and z directions, respectively.
- the bounded three-nearest neighbor interpolation algorithm used in motion estimation has a larger search range, which effectively avoids the problem that the predicted value of the attribute obtained by interpolation is zero.
- the penalty coefficient is used to avoid that the isolated points can still obtain a large attribute prediction value after interpolation, and improve the accuracy of attribute value prediction.
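- To make the bounded three-nearest-neighbor idea concrete, the sketch below shows one plausible reading in plain NumPy/SciPy: inverse-distance weights over the K nearest reference points, with the weight sum bounded from below by the penalty coefficient α so that isolated points cannot be re-normalized into large predictions. The weight formula, the exact case handling and all names are assumptions rather than the patent's equations.

```python
import numpy as np
from scipy.spatial import cKDTree

def bounded_knn_interpolate(ref_xyz, ref_attr, query_xyz, k=3, alpha=3.0, eps=1e-6):
    """Sketch of bounded K-nearest-neighbor attribute interpolation.

    ref_xyz:   (N, 3) reference-frame geometry
    ref_attr:  (N, C) attribute reconstruction values of the reference frame
    query_xyz: (M, 3) motion-compensated positions (first coordinate plus motion vector)
    """
    dist, idx = cKDTree(ref_xyz).query(query_xyz, k=k)    # (M, k) neighbor distances / indices
    w = 1.0 / (dist + eps)                                # farther neighbors get smaller weights
    denom = np.maximum(w.sum(axis=1, keepdims=True), alpha)
    # For an isolated point the weight sum stays below alpha, so dividing by alpha
    # shrinks its prediction instead of re-normalizing tiny weights back to 1.
    return (w[..., None] * ref_attr[idx]).sum(axis=1) / denom
```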
- the interpolation algorithm may also use a trilinear interpolation algorithm.
- bilinear interpolation is a common interpolation method applied to images.
- Trilinear interpolation is bilinear interpolation taking into account the z-axis.
- the offset set is defined as N_3 = {(x, y, z) | x, y, z ∈ {0, 1}}.
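- The following is a small, self-contained sketch of trilinear interpolation over the eight offsets in N_3 for a sparsely populated voxel grid; the dictionary-based lookup and all names are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def trilinear_interpolate(ref_lookup, query_xyz, feat_dim):
    """Interpolate features at fractional coordinates from the 8 surrounding voxels.

    ref_lookup: dict mapping integer (x, y, z) tuples to length-feat_dim feature vectors
    query_xyz:  (M, 3) fractional coordinates to be interpolated
    """
    offsets = [(dx, dy, dz) for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]  # the offset set N3
    out = np.zeros((len(query_xyz), feat_dim))
    for m, q in enumerate(np.asarray(query_xyz, dtype=float)):
        base = np.floor(q)                       # floor: the rounding-down operation
        frac = q - base
        for off in offsets:
            corner = tuple((base + np.array(off)).astype(int))
            # per-axis weight is frac for offset 1 and (1 - frac) for offset 0,
            # i.e. bilinear interpolation extended to the z axis
            w = np.prod([f if o else 1.0 - f for o, f in zip(off, frac)])
            out[m] += w * ref_lookup.get(corner, np.zeros(feat_dim))
    return out
```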
- the specific structure of the motion compensation module can be referred to in FIG. 4 , and the motion compensation module also includes an extraction module for obtaining motion information of the target point from the decoded motion information.
- the extraction module includes: a first pruning layer, a first convolutional layer, a pooling layer, a deconvolution layer, a second pruning layer, a second convolutional layer and an adder.
- the decoded motion features are pruned by the first pruning layer, so that the geometric coordinate set of the pruned motion information is the same as that of the residual information; the first convolution layer Conv(64x3,3,1) and the pooling layer Depooling(2,2) are used to extract the low-scale motion information of each channel; and the deconvolution layer Deconv(64x3,3,1), the second pruning layer and the second convolution layer Conv(64x3,3,1) are used to extract the high-scale motion information of each channel;
- An adder is used to add the low-scale motion information and the high-scale motion information to obtain the motion information of each channel.
- the second pruning layer prunes the output of the deconvolution layer based on the decoded geometric coordinate set C P2 of the current frame point cloud, so that the geometric coordinate sets of the low-scale motion information and the high-scale motion information before addition are the same .
- an interpolation operation is then performed, based on the bounded three-nearest-neighbor interpolation algorithm, using the reference frame point cloud P1 and the motion information output by the adder, to obtain the prediction information P'2.
- the second neural network further includes a first feature extraction module located before the motion compensation module;
- the first feature extraction module is configured to: perform feature extraction on the reconstructed point cloud of the reference frame, and convert the reconstructed point cloud of the reference frame into a sparse tensor form.
- the point cloud is converted into a sparse tensor form through the feature extraction module, and then the subsequent motion estimation and decoding operations are performed.
- the first feature extraction module includes a first down-sampling module and a second down-sampling module
- the first downsampling module includes: two convolutional layers, an activation layer followed by each convolutional layer, and a three-layer initial residual network:
- the second down-sampling module includes: two convolutional layers, an activation layer followed by each convolutional layer, and a three-layer initial residual network;
- the convolution layer parameters of the first down-sampling module and the second down-sampling module are different.
- Fig. 10 is a schematic diagram of the composition and structure of the second neural network in the embodiment of the present application.
- the second neural network includes a first feature extraction module, a first decompression module (i.e., a motion information decompression module), a motion compensation module, a second decompression module (i.e., a residual decompression module), and a point cloud reconstruction module.
- the first feature extraction module is used to perform feature extraction on the reconstructed point cloud of the previous frame to obtain the sparse tensor form P 1 of the reconstructed point cloud of the previous frame.
- a schematic diagram of the composition and structure of the down-sampling module in the first feature extraction module is shown in FIG. 6 .
- the first decompression module is configured to: perform entropy decoding and up-sampling on coded bits of the motion information to obtain decoded motion information.
- the second decompression module is configured to: perform entropy decoding on the coded bits of the residual information to obtain decoded residual information.
- the residual information includes the residual and the geometric coordinates. That is to say, encoding and decoding the residual information includes encoding and decoding the residual and the geometric coordinate set C_R.
- the second neural network further includes a point cloud reconstruction module located after the motion compensation module;
- the point cloud reconstruction module is configured to: perform up-sampling on the decoded residual information to obtain up-sampled residual information; obtain a first reconstructed point cloud based on the up-sampled residual information and the prediction information of the current frame point cloud; and perform up-sampling on the first reconstructed point cloud to obtain the reconstructed point cloud of the current frame.
- the point cloud reconstruction module includes a first upsampling module, a second upsampling module and a third upsampling module.
- the first upsampling module includes: a deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, an adder, a second convolution layer, a classification layer, and a pruning layer;
- the first upsampling module is configured to:
- the residual information is sequentially passed through the deconvolution layer, the first activation layer, the first convolution layer, the second activation layer and the three-layer initial residual network to obtain the up-sampled residual information; the adder adds the up-sampled residual information and the prediction information of the current frame point cloud; and the addition result is passed through the second convolution layer and the classification layer in turn to determine a first set of geometric coordinates that satisfies the occupancy condition;
- the pruning layer prunes the addition result based on the first set of geometric coordinates to obtain the first reconstructed point cloud.
- the second upsampling module includes: a first deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, a second convolution layer, a first classification layer, first pruning layer;
- the second upsampling module is configured to: sequentially pass the first reconstructed point cloud through the first deconvolution layer, the first activation layer, the first convolution layer, the second activation layer and three layers of initial residuals network to obtain the first reconstructed point cloud after upsampling;
- the first pruning layer prunes the upsampled first reconstructed point cloud based on the second set of geometric coordinates to obtain a second reconstructed point cloud;
- the third upsampling module includes: a second deconvolution layer, a third activation layer, a third convolution layer, a fourth activation layer, a three-layer initial residual network, a fourth convolution layer, a second classification layer, second pruning layer;
- the third upsampling module is configured to: pass the second reconstructed point cloud through the second deconvolution layer, the third activation layer, the third convolution layer, the fourth activation layer, and three layers of initial residuals in sequence network to obtain the second reconstructed point cloud after upsampling;
- the second pruning layer prunes the up-sampled second reconstructed point cloud based on the third set of geometric coordinates to obtain the reconstructed point cloud of the current frame; wherein the parameters of the first deconvolution layer of the second upsampling module and the second deconvolution layer of the third upsampling module are different, and the parameters of the first convolution layer of the second upsampling module and the third convolution layer of the third upsampling module are different.
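- As a rough, dense-tensor sketch of one such upsampling block (deconvolution, activations, convolution, a residual refinement stage standing in for the three-layer initial residual network, occupancy classification and pruning), using standard PyTorch layers in place of sparse convolutions; the channel sizes, the pruning ratio rho and all names are assumptions:

```python
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Sketch: upsample features 2x, refine, classify occupancy, keep the top rho*N voxels."""
    def __init__(self, in_ch, hidden, out_ch, rho=1.0):
        super().__init__()
        self.deconv = nn.ConvTranspose3d(in_ch, hidden, kernel_size=2, stride=2)
        self.conv = nn.Conv3d(hidden, out_ch, kernel_size=3, padding=1)
        self.refine = nn.Sequential(  # stand-in for the three-layer initial residual network
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(out_ch, out_ch, 3, padding=1))
        self.classify = nn.Conv3d(out_ch, 1, kernel_size=1)  # occupancy logits
        self.rho = rho

    def forward(self, x, n_points):
        h = torch.relu(self.deconv(x))
        h = torch.relu(self.conv(h))
        h = h + self.refine(h)
        occ = self.classify(h).flatten()
        # pruning: keep only the rho*N voxels with the highest occupancy score
        keep = torch.topk(occ, k=min(int(self.rho * n_points), occ.numel())).indices
        return h, keep
```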
- the decoding method provided in the embodiment of the present application further includes: training the second neural network.
- since the second neural network used at the decoding end is identical to the part of the first neural network at the encoding end that has the decoding function, the decoding-end and encoding-end networks can be trained as a whole through end-to-end self-supervised learning; after the training is completed, the encoding end retains the entire network (i.e., the first neural network), and the decoding end retains the part of the network shown in FIG. 10 (i.e., the second neural network).
- the decoder and encoder networks can perform end-to-end self-supervised learning as a whole, reducing human intervention, and using this network for decoding can reduce distortion and ensure the quality of the reconstructed point cloud.
- FIG. 11 shows a schematic diagram of the composition and structure of an encoder 110 provided in the embodiment of the present application.
- the encoder 110 may include: a determining unit 1101 and an encoding unit 1102,
- the determination unit is configured to determine the current frame point cloud, and the reference frame reconstruction point cloud corresponding to the current frame point cloud;
- the encoding unit is configured to use the preset first neural network to encode the current frame point cloud based on the reference frame reconstructed point cloud, and to write the obtained coded bits into the code stream;
- the first neural network is an end-to-end neural network, and the first neural network is configured to: perform inter-frame prediction based on the reference frame reconstructed point cloud and the current frame point cloud to obtain the motion information and residual information of the current frame point cloud; and encode the motion information and the residual information, writing the obtained coded bits into the code stream.
- the first neural network includes an inter prediction module configured to: perform multi-scale motion estimation based on the reference frame reconstructed point cloud and the current frame point cloud to obtain the motion information; perform motion compensation based on the decoded motion information and the reference frame reconstructed point cloud to obtain the prediction information of the current frame point cloud; and obtain the residual information based on the current frame point cloud and the prediction information of the current frame point cloud.
- the inter prediction module includes a multi-scale motion estimation module configured to:
- connect the reconstructed point cloud of the reference frame with the point cloud of the current frame to obtain connection data; extract original motion information from the connection data; perform low-scale motion estimation on the original motion information to obtain first motion information; perform high-scale motion estimation on the original motion information based on the first motion information to obtain second motion information; and obtain the final motion information based on the first motion information and the second motion information.
- the multi-scale motion estimation module includes an extraction module comprising: two convolutional layers each followed by an activation layer;
- the extraction module is configured to: input the connection data into each convolutional layer and the subsequent activation layer in turn to obtain the original motion information.
- the multi-scale motion estimation module includes a first motion estimation module comprising: a convolutional layer, an activation layer, and a three-layer initial residual network;
- the first motion estimation module is configured to: input the original motion information to the convolutional layer, the activation layer, and the three-layer initial residual network in sequence to perform low-scale motion estimation to obtain the first motion information.
- the multi-scale motion estimation module comprises a second motion estimation module comprising: a deconvolution layer, a first pruning layer, a subtractor and a convolution layer;
- the second motion estimation module is configured to: up-sample the first motion information by using the deconvolution layer to obtain up-sampled first motion information; prune the up-sampled first motion information by using the first pruning layer, so that the geometric coordinate set of the pruned first motion information is the same as that of the original motion information; and subtract the pruned first motion information from the original motion information by using the subtractor, and then down-sample the result by using the convolution layer to obtain the second motion information.
- the multi-scale motion estimation module further includes: a second pruning layer, a third pruning layer and an adder;
- the multi-scale motion estimation module is configured to:
- the second pruning layer prunes the first motion information so that the geometric coordinate set of the pruned first motion information is the same as that of the residual information; the third pruning layer prunes the second motion information so that the geometric coordinate set of the pruned second motion information is the same as that of the residual information; and the adder adds the pruned first motion information and the pruned second motion information to obtain the final motion information.
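- A dense-tensor sketch of this two-scale estimation flow (extract, coarse estimate, upsample, subtract, refine, add) is shown below; standard PyTorch layers stand in for the sparse convolutions and pruning, input spatial sizes are assumed to be even, and the layer sizes and names are assumptions:

```python
import torch
import torch.nn as nn

class MultiScaleMotionEstimator(nn.Module):
    """Dense-tensor sketch of the two-scale motion estimation described above."""
    def __init__(self, ch=64):
        super().__init__()
        self.extract = nn.Sequential(nn.Conv3d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU())
        self.coarse = nn.Sequential(nn.Conv3d(ch, ch, 2, stride=2), nn.ReLU())  # low-scale estimate
        self.up = nn.ConvTranspose3d(ch, ch, 2, stride=2)
        self.fine = nn.Conv3d(ch, ch, 2, stride=2)                              # high-scale estimate

    def forward(self, ref_feat, cur_feat):
        x = self.extract(torch.cat([ref_feat, cur_feat], dim=1))  # "connection data" -> original motion info
        low = self.coarse(x)                                      # first (low-scale) motion information
        detail = self.fine(x - self.up(low))                      # residual motion guided by the coarse estimate
        return low + detail                                       # final motion information
```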
- the inter prediction module includes a first compression module and a first decompression module corresponding to the first compression module;
- the first compression module is configured to: down-sample the motion information, and perform quantization and entropy encoding on the down-sampled motion information to obtain the coded bits of the motion information;
- the first decompression module is configured to:
- Entropy decoding and upsampling are performed on the coded bits of the motion information to obtain decoded motion information.
- the inter prediction module includes a motion compensation module configured to: obtain the motion information of a target point in the current frame point cloud from the decoded motion information; determine, based on the first geometric coordinate of the target point in the current frame point cloud and the motion information of the target point, the corresponding second geometric coordinate of the target point in the reference frame reconstructed point cloud; determine K neighboring points in the reference frame reconstructed point cloud based on the second geometric coordinate; and determine the attribute prediction value of the target point in the current frame point cloud based on the attribute reconstruction values of the K neighboring points and a preset penalty coefficient;
- the penalty coefficient is used to limit the weights of the K neighbor points of the isolated point.
- the penalty coefficient is used to limit the sum of the weights of the K neighbors of the isolated point
- determining the attribute prediction value of the target point in the current frame point cloud based on the attribute reconstruction values of the K neighboring points in the reference frame reconstructed point cloud and a preset penalty coefficient includes: determining the weights of the K neighboring points based on the second geometric coordinate and the geometric coordinates of the K neighboring points; when the sum of the weights of the K neighboring points is greater than the penalty coefficient, determining the attribute prediction value of the target point based on the penalty coefficient, the weights of the K neighboring points and the attribute reconstruction values of the K neighboring points; and when the sum of the weights of the K neighboring points is less than or equal to the penalty coefficient, determining the attribute prediction value of the target point based on the sum of the weights of the K neighboring points, the weights of the K neighboring points and the attribute reconstruction values of the K neighboring points.
- the motion information of the target point is the motion information of the target point on the target channel
- the target channel is one of all channels in the point cloud of the current frame.
- the first neural network further includes a first feature extraction module and a second feature extraction module located before the inter prediction module;
- the first feature extraction module is configured to: perform feature extraction on the reference frame reconstruction point cloud, and convert the reference frame reconstruction point cloud into a sparse tensor form;
- the second feature extraction module is configured to: perform feature extraction on the current frame point cloud, and convert the current frame point cloud into a sparse tensor form.
- each feature extraction module includes a first downsampling module and a second downsampling module
- the first downsampling module includes: two convolutional layers, an activation layer followed by each convolutional layer, and a three-layer initial residual network:
- the second down-sampling module includes: two convolutional layers, an activation layer followed by each convolutional layer, and a three-layer initial residual network;
- the convolution layer parameters of the first down-sampling module and the second down-sampling module are different.
- the first neural network further includes a point cloud reconstruction module located after the inter prediction module;
- the point cloud reconstruction module is configured to: perform up-sampling on the decoded residual information to obtain up-sampled residual information; obtain a first reconstructed point cloud based on the up-sampled residual information and the prediction information of the current frame point cloud; and perform up-sampling on the first reconstructed point cloud to obtain the reconstructed point cloud of the current frame.
- the point cloud reconstruction module includes a first upsampling module
- the first upsampling module includes: a deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, an adder, a second convolution layer, a classification layer, and a pruning layer;
- the first upsampling module is configured to:
- the pruning layer prunes the addition result based on the first set of geometric coordinates to obtain the first reconstructed point cloud.
- the point cloud reconstruction module includes a second upsampling module and a third upsampling module
- the second upsampling module includes: a first deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, a second convolution layer, a first classification layer, first pruning layer;
- the second upsampling module is configured to: sequentially pass the first reconstructed point cloud through the first deconvolution layer, the first activation layer, the first convolution layer, the second activation layer and three layers of initial residuals network to obtain the first reconstructed point cloud after upsampling;
- the first pruning layer prunes the upsampled first reconstructed point cloud based on the second set of geometric coordinates to obtain a second reconstructed point cloud;
- the third upsampling module includes: a second deconvolution layer, a third activation layer, a third convolution layer, a fourth activation layer, a three-layer initial residual network, a fourth convolution layer, a second classification layer, second pruning layer;
- the third upsampling module is configured to: pass the second reconstructed point cloud through the second deconvolution layer, the third activation layer, the third convolution layer, the fourth activation layer, and three layers of initial residuals in sequence network to obtain the second reconstructed point cloud after upsampling;
- the second pruning layer prunes the up-sampled second reconstructed point cloud based on the third set of geometric coordinates to obtain the reconstructed point cloud of the current frame; wherein the parameters of the first deconvolution layer of the second upsampling module and the second deconvolution layer of the third upsampling module are different, and the parameters of the first convolution layer of the second upsampling module and the third convolution layer of the third upsampling module are different.
- the first neural network includes a second compression module and a second decompression module corresponding to the second compression module;
- the second compression module is configured to: down-sample the residual information, and perform quantization and entropy encoding on the down-sampled residual information to obtain the coded bits of the residual information;
- the second decompression module is configured to:
- Entropy decoding is performed on the coded bits of the residual information to obtain decoded residual information.
- the training unit is configured to: obtain a training sample set, wherein the training sample set includes one or more sample point clouds; encode and reconstruct a first sample point cloud in the training sample set by using the first neural network to obtain the code rate of the first sample point cloud and a reconstructed point cloud; determine the distortion of the first sample point cloud based on the first sample point cloud and the reconstructed point cloud; calculate a loss value based on the distortion and the code rate of the first sample point cloud; adjust the network parameters of the first neural network when the loss value does not meet the convergence condition; and determine that the training of the first neural network is completed when the loss value meets the convergence condition.
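- A compact sketch of this training objective and convergence check is given below, assuming the loss form L = λD + R stated elsewhere in this description; the model interface, the MSE stand-in for point cloud distortion and the threshold-style convergence test are assumptions:

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(distortion: torch.Tensor, rate_bits: torch.Tensor, lam: float) -> torch.Tensor:
    """L = lambda * D + R, balancing reconstruction quality against code rate."""
    return lam * distortion + rate_bits

def train_step(first_network, optimizer, sample_cloud, lam=1.0, tol=1e-3):
    recon, rate_bits = first_network(sample_cloud)     # encode + reconstruct the sample point cloud
    distortion = F.mse_loss(recon, sample_cloud)       # stand-in distortion measure
    loss = rate_distortion_loss(distortion, rate_bits, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item() <= tol                          # True once the convergence condition is met
```

- Adjusting λ trades off the two terms and yields different rate points of the network.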
- a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular.
- each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
- if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
- in essence, the technical solution of this embodiment, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment.
- the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other various media that can store program codes.
- the embodiment of the present application provides a computer storage medium, which is applied to the encoder 110; the computer storage medium stores a computer program, and when the computer program is executed by the first processor, the method described in any one of the preceding embodiments is implemented.
- the encoder 110 may include: a first communication interface 1201 , a first memory 1202 and a first processor 1203 ; each component is coupled together through a first bus system 1204 .
- the first bus system 1204 is used to realize connection and communication between these components.
- the first bus system 1204 also includes a power bus, a control bus and a status signal bus.
- however, for clarity of illustration, the various buses are labeled as the first bus system 1204 in FIG. 12. Among them,
- the first communication interface 1201 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
- the first memory 1202 is used to store computer programs that can run on the first processor 1203;
- the first processor 1203 is configured to execute the steps of the encoding method of the present application when running the computer program.
- the first memory 1202 in the embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
- the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
- the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
- Static Random Access Memory (SRAM)
- Dynamic Random Access Memory (DRAM)
- Synchronous Dynamic Random Access Memory (SDRAM)
- Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM)
- Enhanced Synchronous Dynamic Random Access Memory (ESDRAM)
- Synchlink Dynamic Random Access Memory (SLDRAM)
- Direct Rambus Random Access Memory (DRRAM)
- the first memory 1202 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
- the first processor 1203 may be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the above method may be implemented by an integrated logic circuit of hardware in the first processor 1203 or an instruction in the form of software.
- the above-mentioned first processor 1203 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
- Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
- the storage medium is located in the first memory 1202, and the first processor 1203 reads the information in the first memory 1202, and completes the steps of the above method in combination with its hardware.
- the embodiments described in this application may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof.
- the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processor (Digital Signal Processing, DSP), digital signal processing device (DSP Device, DSPD), programmable Logic device (Programmable Logic Device, PLD), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA), general-purpose processor, controller, microcontroller, microprocessor, other devices used to perform the functions described in this application electronic unit or its combination.
- the techniques described herein can be implemented through modules (eg, procedures, functions, and so on) that perform the functions described herein.
- Software codes can be stored in memory and executed by a processor. Memory can be implemented within the processor or external to the processor.
- the first processor 1203 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
- FIG. 13 shows a schematic diagram of the composition and structure of a decoder 130 provided in the embodiment of the present application.
- the decoder 130 may include: an acquisition unit 1301 and a decoding unit 1302, wherein,
- the obtaining unit 1301 is configured to obtain a code stream
- the decoding unit 1302 is configured to use the preset second neural network to decode the code stream to obtain the reconstructed point cloud of the current frame;
- the second neural network is an end-to-end neural network, and the second neural network is configured to: decode the code stream to determine the motion information and residual information of the current frame point cloud; perform motion compensation based on the motion information and the reference frame reconstructed point cloud to obtain the prediction information of the current frame point cloud; and obtain the reconstructed point cloud of the current frame based on the residual information and the prediction information of the current frame point cloud.
- the second neural network includes a motion compensation module configured to: obtain the motion information of a target point in the current frame point cloud from the decoded motion information; determine, based on the first geometric coordinate of the target point in the current frame point cloud and the motion information of the target point, the corresponding second geometric coordinate of the target point in the reference frame reconstructed point cloud; determine K neighboring points in the reference frame reconstructed point cloud based on the second geometric coordinate; and determine the attribute prediction value of the target point in the current frame point cloud based on the attribute reconstruction values of the K neighboring points and a preset penalty coefficient;
- the penalty coefficient is used to limit the weights of the K neighbor points of the isolated point.
- the penalty coefficient is used to limit the sum of the weights of the K neighbors of the isolated point
- determining the attribute prediction value of the target point in the current frame point cloud based on the attribute reconstruction values of the K neighboring points in the reference frame reconstructed point cloud and a preset penalty coefficient includes: determining the weights of the K neighboring points based on the second geometric coordinate and the geometric coordinates of the K neighboring points; when the sum of the weights of the K neighboring points is greater than the penalty coefficient, determining the attribute prediction value of the target point based on the penalty coefficient, the weights of the K neighboring points and the attribute reconstruction values of the K neighboring points; and when the sum of the weights of the K neighboring points is less than or equal to the penalty coefficient, determining the attribute prediction value of the target point based on the sum of the weights of the K neighboring points, the weights of the K neighboring points and the attribute reconstruction values of the K neighboring points.
- the motion information of the target point is the motion information of the target point on the target channel
- the target channel is one of all channels in the point cloud of the current frame.
- the second neural network further includes a first feature extraction module located before the motion compensation module;
- the first feature extraction module is configured to: perform feature extraction on the reconstructed point cloud of the reference frame, and convert the reconstructed point cloud of the reference frame into a sparse tensor form.
- the first feature extraction module includes a first downsampling module and a second downsampling module
- the first downsampling module includes: two convolutional layers, an activation layer followed by each convolutional layer, and a three-layer initial residual network:
- the second down-sampling module includes: two convolutional layers, an activation layer followed by each convolutional layer, and a three-layer initial residual network;
- the convolution layer parameters of the first down-sampling module and the second down-sampling module are different.
- the second neural network further includes a point cloud reconstruction module located after the motion compensation module;
- the point cloud reconstruction module is configured to: perform up-sampling on the residual information to obtain up-sampled residual information; obtain a first reconstructed point cloud based on the up-sampled residual information and the prediction information of the current frame point cloud; and perform up-sampling on the first reconstructed point cloud to obtain the reconstructed point cloud of the current frame.
- the point cloud reconstruction module includes a first upsampling module
- the first upsampling module includes: a deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, an adder, a second convolution layer, a classification layer, and a pruning layer;
- the first upsampling module is configured to:
- the residual information is sequentially passed through the deconvolution layer, the first activation layer, the first convolution layer, the second activation layer and the three-layer initial residual network to obtain the upsampled residual information;
- the pruning layer prunes the addition result based on the first set of geometric coordinates to obtain the first reconstructed point cloud.
- the point cloud reconstruction module includes a second upsampling module and a third upsampling module
- the second upsampling module includes: a first deconvolution layer, a first activation layer, a first convolution layer, a second activation layer, a three-layer initial residual network, a second convolution layer, a first classification layer, first pruning layer;
- the second upsampling module is configured to: sequentially pass the first reconstructed point cloud through the first deconvolution layer, the first activation layer, the first convolution layer, the second activation layer and three layers of initial residuals network to obtain the first reconstructed point cloud after upsampling;
- the first pruning layer prunes the upsampled first reconstructed point cloud based on the second set of geometric coordinates to obtain a second reconstructed point cloud;
- the third upsampling module includes: a second deconvolution layer, a third activation layer, a third convolution layer, a fourth activation layer, a three-layer initial residual network, a fourth convolution layer, a second classification layer, second pruning layer;
- the third upsampling module is configured to: pass the second reconstructed point cloud through the second deconvolution layer, the third activation layer, the third convolution layer, the fourth activation layer, and three layers of initial residuals in sequence network to obtain the second reconstructed point cloud after upsampling;
- the second pruning layer prunes the up-sampled second reconstructed point cloud based on the third set of geometric coordinates to obtain the reconstructed point cloud of the current frame; wherein the parameters of the first deconvolution layer of the second upsampling module and the second deconvolution layer of the third upsampling module are different, and the parameters of the first convolution layer of the second upsampling module and the third convolution layer of the third upsampling module are different.
- the second neural network includes a second decompression module
- the second decompression module is configured to: perform entropy decoding on coded bits of the residual information in the code stream to obtain decoded residual information.
- the decoder 130 may include: a second communication interface 1401 , a second memory 1402 , and a second processor 1403 ; each component is coupled together through a second bus system 1404 .
- the second bus system 1404 is used to realize connection and communication between these components.
- the second bus system 1404 includes not only a data bus, but also a power bus, a control bus and a status signal bus. However, the various buses are labeled as the second bus system 1404 in FIG. 14 for clarity of illustration. Among them,
- the second communication interface 1401 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
- the second memory 1402 is used to store computer programs that can run on the second processor 1403;
- the second processor 1403 is configured to execute the steps of the decoding method of the present application when running the computer program.
- FIG. 15 shows a schematic diagram of the composition and structure of a codec system provided by the embodiment of the present application.
- the codec system 150 may include an encoder 1501 and a decoder 1502 .
- the encoder 1501 may be the encoder described in any one of the foregoing embodiments
- the decoder 1502 may be the decoder described in any one of the foregoing embodiments.
- the encoder uses an end-to-end neural network for point cloud encoding; the network does not require a training sample set with additionally annotated motion information during training, which reduces the difficulty of training, and the network takes reducing the bit rate while ensuring the quality of point cloud reconstruction as its training goal.
- the decoder uses the second neural network to reconstruct the point cloud.
- the second neural network can be understood as a part of the network structure that has the decoding function in the first neural network.
- the neural networks at the encoding end and the decoding end perform end-to-end self-supervised learning as a whole, reducing human intervention; using the network for decoding can reduce distortion and ensure the quality of the reconstructed point cloud.
- in the encoder, the current frame point cloud and the reference frame reconstructed point cloud corresponding to the current frame point cloud are determined; the preset first neural network is used to encode the current frame point cloud based on the reference frame reconstructed point cloud, and the obtained coded bits are written into the code stream;
- wherein the first neural network is an end-to-end neural network configured to: perform inter-frame prediction based on the reference frame reconstructed point cloud and the current frame point cloud to obtain the motion information and residual information of the current frame point cloud; and encode the motion information and the residual information, writing the obtained coded bits into the code stream.
- in the decoder, the code stream is obtained, and the second neural network is used to decode the code stream to obtain the reconstructed point cloud of the current frame; wherein the second neural network is an end-to-end neural network configured to: decode the code stream to determine the motion information and residual information of the current frame point cloud; perform motion compensation based on the motion information and the reference frame reconstructed point cloud to obtain the prediction information of the current frame point cloud; and obtain the reconstructed point cloud of the current frame based on the residual information and the prediction information of the current frame point cloud.
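- Putting the decoding-side pieces together, the following pseudocode-style sketch shows the order of operations implied by this configuration; the sub-module names on `net` and the bitstream fields are assumptions, not an actual API:

```python
def decode_current_frame(bitstream, reference_recon, net):
    """Sketch of the second neural network's decode flow."""
    ref_feat = net.feature_extract(reference_recon)              # reference frame -> sparse-tensor features
    motion = net.motion_decompress(bitstream.motion_bits)        # entropy decode + upsample motion info
    residual = net.residual_decompress(bitstream.residual_bits)  # entropy decode residual info
    prediction = net.motion_compensate(ref_feat, motion)         # e.g. bounded three-nearest-neighbor interpolation
    first_recon = net.add_residual(prediction, residual)         # upsample residual and add to the prediction
    return net.reconstruct(first_recon)                          # further upsampling -> current frame reconstruction
```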
- the encoder uses an end-to-end neural network for point cloud encoding.
- the network does not require additional training sample sets for the motion information of samples during training, which reduces the difficulty of training.
- the network takes reducing the bit rate while ensuring the quality of point cloud reconstruction as its training goal; using this network for encoding can not only improve the quality of the point cloud, but also save the bit rate, thereby improving the encoding efficiency.
- the decoder uses the second neural network to reconstruct the point cloud.
- the second neural network can be understood as a part of the network structure that has the decoding function in the first neural network.
- the neural networks at the encoding end and the decoding end perform end-to-end self-supervised learning as a whole, reducing human intervention; using the network for decoding can reduce distortion and ensure the quality of the reconstructed point cloud.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
本申请实施例公开了一种编解码方法、编码器、解码器以及存储介质,该方法包括:确定当前帧点云和参考帧重建点云;利用预设第一神经网络基于参考帧重建点云对当前帧点云进行编码,将得到的编码比特写入码流;其中,第一神经网络为端到端神经网络配置成:基于参考帧重建点云和当前帧点云进行帧间预测,得到当前帧点云的运动信息和残差信息;进行编码,将得到的编码比特写入码流。这样,利用端到端的神经网络进行点云编码,该网络在训练时无需额外标注样本的运动信息的训练样本集,降低了训练难度,以降低码率保证点云重建质量为训练目标,使用该网络进行编解码不仅能够提升点云的质量,还能够节省码率,进而提高编解码效率。
Description
本申请实施例涉及视频编解码技术领域,尤其涉及一种编解码方法、编码器、解码器以及存储介质。
点云被定义为三维空间中点的集合,其中每个点被表示为三维坐标和具体的属性信息。随着三维重建和三维成像技术的发展,点云被广泛应用于虚拟现实、沉浸式远程呈现、三维打印等领域。点云的一种典型应用是在虚拟现实和远程呈现中表示动态人体的三维影像,这种影像被称为动态点云(Dynamic Point Cloud,DPC),点云的数据量庞大,对动态点云的压缩是这些应用中的关键技术。
现有的动态点云压缩技术单独针对运动估计和补偿部分构建神经网络,在训练过程中需要为数据集标注运动向量,增加训练难度,由此构建的编解码框架的编解码效率有待提高。
发明内容
本申请实施例提供一种编解码方法、编码器、解码器以及存储介质,采用一种端到端神经网络进行编解码,不仅能够提升点云的质量,还能够节省码率,进而提高编解码效率。
本申请实施例的技术方案可以如下实现:
第一方面,本申请实施例提供了一种编码方法,应用于编码器,该方法包括:
确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;
利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;
其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:
基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;
对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。
第二方面,本申请实施例提供了一种解码方法,应用于解码器,该方法包括:
获取码流;
利用预设第二神经网络解码码流得到当前帧重建点云;
其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:
解码码流,确定当前帧点云的运动信息和残差信息;
基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;
基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。
第三方面,本申请实施例提供了一种编码器,该编码器包括确定单元和编码单元;其中,
所述确定单元,用于确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;
所述编码单元,用于利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;
其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:
基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;
对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。
第四方面,本申请实施例提供了一种编码器,该编码器包括第一存储器和第一处理器;其中,
第一存储器,用于存储能够在第一处理器上运行的计算机程序;
第一处理器,用于在运行计算机程序时,执行如第一方面的方法。
第五方面,本申请实施例提供了一种解码器,该解码器包括获取单元和解码单元,其中,
所述获取单元,用于获取码流;
所述解码单元,用于利用预设第二神经网络解码码流得到当前帧重建点云;
其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:
解码码流,确定当前帧点云的运动信息和残差信息;
基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;
基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。
第六方面,本申请实施例提供了一种解码器,该解码器包括第二存储器和第二处理器;其中,
第二存储器,用于存储能够在第二处理器上运行的计算机程序;
第二处理器,用于在运行计算机程序时,执行如第三方面所述的方法。
第七方面,本申请实施例提供了一种计算机存储介质,该计算机存储介质存储有计算机程序,所述计算机程序被第 一处理器执行时实现如第一方面所述的方法、或者被第二处理器执行时实现如第二方面所述的方法。
本申请实施例提供了一种编解码方法、编码器、解码器以及存储介质,在编码器中,确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。在解码器中,获取码流;利用预设第二神经网络解码码流得到当前帧重建点云;其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:解码码流,确定当前帧点云的运动信息和残差信息;基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。这样,编码器利用一种端到端的神经网络进行点云编码,该网络在训练时无需额外标注样本的运动信息的训练样本集,降低了训练难度,该网络以降低码率保证点云重建质量为训练目标,使用该网络进行编码不仅能够提升点云的质量,还能够节省码率,进而提高编码效率。相应地,解码器利用第二神经网络进行点云重建,第二神经网络可以理解为第一神经网络中具备解码功能的部分网络结构,编码端和解码端的神经网络作为一个整体进行端到端自监督学习,减少人为干预,使用该网络进行解码,能够降低的失真保证重建点云质量。
图1为一种G-PCC编码器的组成框架示意图;
图2为一种G-PCC解码器的组成框架示意图;
图3为本申请实施例提供的编码方法的流程示意图;
图4为本申请实施例中帧间预测模块的组成结构示意图;
图5为本申请实施例中第一神经网络的组成结构示意图;
图6为本申请实施例中下采样模块的组成结构示意图;
图7为本申请实施例中第一上采样模块的组成结构示意图;
图8为本申请实施例中第二上采样模块的组成结构示意图;
图9为本申请实施例中解码方法的流程示意图;
图10为本申请实施例中第二神经网络的组成结构示意图;
图11为本申请实施例提供的一种编码器的组成结构示意图;
图12为本申请实施例提供的一种编码器的具体硬件结构示意图;
图13为本申请实施例提供的一种解码器的组成结构示意图;
图14为本申请实施例提供的一种解码器的具体硬件结构示意图;
图15为本申请实施例提供的一种编解码系统的组成结构示意图。
为了能够更加详尽地了解本申请实施例的特点与技术内容,下面结合附图对本申请实施例的实现进行详细阐述,所附附图仅供参考说明之用,并非用来限定本申请实施例。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。还需要指出,本申请实施例所涉及的术语“第一\第二\第三”仅是用于区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。
对本申请实施例进行进一步详细说明之前,先对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释:
基于几何的点云压缩(Geometry-based Point Cloud Compression,G-PCC或GPCC),基于视频的点云压缩(Video-based Point Cloud Compression,V-PCC或VPCC),变分自编码器(Variational AutoEncoder,VAE),自编码器(AutoEncoder,AE),变分自解码器(Variational AutoDecoder,VAD),自解码器(AutoDecoder,AD),多层感知机(Multi-layer Perceptron),三层初始残差网络(Inception Residual Network,IRN),二元交叉熵(binary cross entropy),八叉树(Octree),包围盒(bounding box),K近邻(K Nearest Neighbor,KNN)
点云是物体表面的三维表现形式,通过光电雷达、激光雷达、激光扫描仪、多视角相机等采集设备,可以采集得到物体表面的点云(数据)。
点云(Point Cloud)是指海量三维点的集合,点云中的点可以包括点的位置信息和点的属性信息。例如,点的位置信息可以是点的三维坐标信息。点的位置信息也可称为点的几何信息。例如,点的属性信息可包括颜色信息和/或反射率等等。例如,颜色信息可以是任意一种色彩空间上的信息。例如,颜色信息可以是RGB信息。其中,R表示红色(Red,R),G表示绿色(Green,G),B表示蓝色(Blue,B)。再如,颜色信息可以是亮度色度(YCbCr,YUV)信息。其中,Y表示明亮度,Cb(U)表示蓝色色度,Cr(V)表示红色色度。
根据激光测量原理得到的点云,点云中的点可以包括点的三维坐标信息和点的激光反射强度(reflectance)。再如, 根据摄影测量原理得到的点云,点云中的点可以可包括点的三维坐标信息和点的颜色信息。再如,结合激光测量和摄影测量原理得到点云,点云中的点可以可包括点的三维坐标信息、点的激光反射强度(reflectance)和点的颜色信息。
点云可以按获取的途径分为:
第一类静态点云:即物体是静止的,获取点云的设备也是静止的;第二类动态点云:物体是运动的,但获取点云的设备是静止的;第三类动态获取点云:获取点云的设备是运动的。
例如,按点云的用途分为两大类:
类别一:机器感知点云,其可以用于自主导航系统、实时巡检系统、地理信息系统、视觉分拣机器人、抢险救灾机器人等场景;类别二:人眼感知点云,其可以用于数字文化遗产、自由视点广播、三维沉浸通信、三维沉浸交互等点云应用场景。
由于点云是海量点的集合,存储点云不仅会消耗大量的内存,而且不利于传输,也没有这么大的带宽可以支持将点云不经过压缩直接在网络层进行传输,因此,需要对点云进行压缩。
截止目前,可对点云进行压缩的点云编码框架可以是运动图像专家组(Moving Picture Experts Group,MPEG)提供的G-PCC编解码框架或V-PCC编解码框架,也可以是音视频编码标准(Audio Video Standard,AVS)提供的AVS-PCC编解码框架。其中,G-PCC编解码框架可用于针对第一类静态点云和第三类动态获取点云进行压缩,V-PCC编解码框架可用于针对第二类动态点云进行压缩。在本申请实施例中,这里主要是针对G-PCC编解码框架进行描述。
可以理解,在点云G-PCC编解码框架中,将输入三维图像模型的点云进行条带(slice)划分后,对每一个slice进行独立编码。
图1为一种G-PCC编码器的组成框架示意图。如图1所示,该G-PCC编码器应用于点云编码器。在该G-PCC编码框架中,针对待编码的点云数据,首先通过slice划分,将点云数据划分为多个slice。在每一个slice中,点云的几何信息和每个点云所对应的属性信息是分开进行编码的。在几何编码过程中,对几何信息进行坐标转换,使点云全都包含在一个bounding box中,然后再进行量化,这一步量化主要起到缩放的作用,由于量化取整,使得一部分点云的几何信息相同,于是再基于参数来决定是否移除重复点,量化和移除重复点这一过程又被称为体素化过程。接着对bounding box进行八叉树划分。在基于八叉树的几何信息编码流程中,将包围盒八等分为8个子立方体,对非空的(包含点云中的点)的子立方体继续进行八等分,直到划分得到的叶子结点为1×1×1的单位立方体时停止划分,对叶子结点中的点进行算术编码,生成二进制的几何比特流,即几何码流。在基于三角面片集(triangle soup,trisoup)的几何信息编码过程中,同样也要先进行八叉树划分,但区别于基于八叉树的几何信息编码,该trisoup不需要将点云逐级划分到边长为1×1×1的单位立方体,而是划分到子块(block)边长为W时停止划分,基于每个block种点云的分布所形成的表面,得到该表面与block的十二条边所产生的至多十二个交点(vertex),对vertex进行算术编码(基于交点进行表面拟合),生成二进制的几何比特流,即几何码流。Vertex还用于在几何重建的过程的实现,而重建的集合信息在对点云的属性编码时使用。
在属性编码过程中,几何编码完成,对几何信息进行重建后,进行颜色转换,将颜色信息(即属性信息)从RGB颜色空间转换到YUV颜色空间。然后,利用重建的几何信息对点云重新着色,使得未编码的属性信息与重建的几何信息对应起来。属性编码主要针对颜色信息进行,在颜色信息编码过程中,主要有两种变换方法,一是依赖于LOD划分的基于距离的提升变换,二是直接进行RAHT变换,这两种方法都会将颜色信息从空间域转换到频域,通过变换得到高频系数和低频系数,最后对系数进行量化(即量化系数),最后,将经过八叉树划分及表面拟合的几何编码数据与量化系数处理属性编码数据进行slice合成后,依次编码每个block的vertex坐标(即算术编码),生成二进制的属性比特流,即属性码流。
图2为一种G-PCC解码器的组成框架示意图。如图2所示,该G-PCC解码器应用于点云编码器。在该G-PCC解码框架中,针对所获取的二进制码流,首先对二进制码流中的几何比特流和属性比特流分别进行独立解码。在对几何比特流的解码时,通过算术解码-八叉树合成-表面拟合-重建几何-逆坐标转换,得到点云的几何信息;在对属性比特流的解码时,通过算术解码-反量化-基于LOD的提升逆变换或者基于RAHT的逆变换-逆颜色转换,得到点云的属性信息,基于几何信息和属性信息还原待编码的点云数据的三维图像模型。
然而,已有的G-PCC编解码框架在进行运动估计和运动补偿时,会使用单独的网络来实现,在训练过程中需要计算预测的运动向量与真实的运动向量之间的损失值,往往需要为数据集标注运动向量,该网络的应用可能会使得重建点云和原始点云相差比较大,失真较为严重,从而会影响到整个点云的质量。
基于此,本申请实施例提出一种编解码方法,该方法可以影响G-PCC编码框架中的运动估计和运动补偿的部分,也可以影响G-PCC解码框架中的运动补偿的部分。
下面将结合附图对本申请各实施例进行清楚、完整的描述。
本申请实施例提供了一种点云编码方法,应用于编码器,图3为本申请实施例提供的编码方法的流程示意图,如图3所示,该方法可以包括:
步骤301:确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;
需要说明的是,本申请实施例所述的编码方法具体是指点云编码方法,可以应用于点云编码器(本申请实施例中,可简称为“编码器”)。
当前帧点云可以理解为待编码的点云,对于当前帧点云中的一个点,在对该点进行编码时,其可以作为当前帧点云中的待编码点,而该点的周围存在有多个已编码点。参考帧重建点云可以理解为已编码点云,参考帧重建点云可以为上一帧重建点云,或者当前帧点云中部分已编码点的重建点集合。也就是说,待编码点的参考点可以为上一帧或当前帧的重建点。
进一步地,在本申请实施例中,对于当前帧点云中的一个点,其对应一个几何信息和一个属性信息;其中,几何信 息表征该点的空间位置,几何信息具体为三维几何坐标。属性信息可以包括颜色分量,具体为任意颜色空间的颜色信息。示例性地,属性信息可以为RGB空间的颜色信息,也可以为YUV空间的颜色信息,还可以为YCbCr空间的颜色信息等等,本申请实施例不作具体限定。
步骤302:利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;
其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:
基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;
对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。
示例性的,在一些实施例中,所述第一神经网络包括帧间预测模块,所述帧间预测模块配置成:
基于所述参考帧重建点云和所述当前帧点云进行多尺度运动估计,得到所述运动信息;
基于解码后的运动信息和所述参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;
基于所述当前帧点云和所述当前帧点云的预测信息,得到所述残差信息。
这里,当前帧点云可以理解为当前帧点云的真实信息,真实信息与预测信息相减得到残差信息,真实信息具体包括各点的属性真实值,预测信息包括各点的属性预测值。
这里,帧间预测包括运动估计和运动补偿。对于运动估计,本申请实施例提供了一种多尺度运动估计方法,用来解决现有运动估计网络时间与空间复杂度过高的问题。对于运动补偿,本申请实施例提供了一种有界三近邻插值算法,解决了在稀疏的点云空间中插值效果不佳的问题。
示例性的,在一些实施例中,所述帧间预测模块包括多尺度运动估计模块,所述多尺度运动估计模块配置成:
将所述参考帧重建点云和所述当前帧点云进行连接,得到连接数据;
从连接数据中提取原始运动信息;
对所述原始运动信息进行低尺度运动估计,得到第一运动信息;
基于所述第一运动信息对所述原始运动信息进行高尺度运动估计,得到第二运动信息;
基于所述第一运动信息和所述第二运动信息,得到最终的所述运动信息。
这里,低尺度运动估计可以理解为一种低精度的运动估计,得到的低尺度运动信息(即第一运动信息)用于表示当前帧点云中的物体的大致运动方向。示例性的,低尺度运动信息表示包含人的点云块从参考帧到当前帧的运动信息。
高尺度运动估计可以理解为一种高精度的运动估计,得到的高尺度运动信息(即第二运动信息)用于表示当前帧点云中的物体的具体运动方向。示例性的,高尺度运动信息表示包含人的点云块中人体不同部位从参考帧到当前帧的运动信息。
示例性的,运动信息具体指运动向量,运动向量可以分解为xyz三分方向上的运动分量,参与到运动补偿中。
也就是说,在编码端,首先进行低尺度的运动估计,得到低尺度的运动信息,低尺度的运动信息包含了粗略的运动向量。使用低尺度的运动信息指引高尺度的运动估计,得到高尺度的运动信息,包含了精细的运动向量。将低尺度的运动信息与高尺度的运动信息相加,得到综合运动信息。综合运动信息能够跟准确的表示待编码点的运动特征,提高运动估计精度,进而提高后续运动补偿精度,提高点云重建质量。
示例性的,在一些实施例中,所述帧间预测模块包括第一压缩模块以及与所述第一压缩模块对应的第一解压缩模块;
所述第一压缩模块配置成:对所述运动信息进行下采样;对下采样后的运动信息进行量化和熵编码,得到所述运动信息的编码比特;
所述第一解压缩模块配置成:对所述运动信息的编码比特进行熵解码和上采样,得到解码后的运动信息。
需要说明的是,若第一压缩模块的量化步长大于1,则第一解压缩模块在熵解码之后还包括反量化。
示例性的,第一压缩模块包括:卷积层、量化器和自编码器(AE),第一解压缩模块包括:自解码器(AD)和反卷积层。运动信息进行下采样、量化之后经熵模型得到概率分布,使用AE进行算术编码得到01比特流,传至解码端。在解码端进行对应的熵解码与上采样,得到解码后的运行信息参与到点云重建。在编码端也需进行对应的熵解码与上采样,得到解码后的运行信息参与到点云重建,得到当前帧重建点云参与到下一帧点云的编码中。
进一步地,在编码端帧间预测模块还包括运动补偿模块,运动补偿模块采用预设的插值算法进行运动补偿。示例性的,插值算法可以为有界三近邻插值算法,或者三线性插值算法。
示例性的,在一些实施例中,当运动补偿模块基于有界三近邻插值算法进行运动补偿时,所述运动补偿模块配置成:
从解码后的运动信息中获取所述当前帧点云中目标点的运动信息;
基于所述目标点在所述当前帧点云中的第一几何坐标和所述目标点的运动信息,确定所述目标点在所述参考帧重建点云中对应的第二几何坐标;
基于所述第二几何坐标在所述参考帧重建点云中确定K个近邻点;
基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值;
其中,所述惩罚系数用于限制孤立点的K个近邻点的权重。
这里,惩罚系数可以理解为限制近邻点的选取边界,对于孤立点来说,近邻点距离较远,惩罚系数限制孤立点的近邻点的权重,避免孤立点在插值后仍能获得较大的属性预测值。
需要说明的是,K个近邻点是指参考帧中与第二几何坐标距离最近的K个点,第二几何坐标可以整数或小数。当K=3时,利用参考帧中3个近邻点的属性重建值进行预测。
示例性的,在一些实施例中,所述惩罚系数用于限制孤立点的K个近邻点的权重之和,
所述基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当 前帧点云中的属性预测值,包括:
基于所述第二几何坐标和所述K个近邻点的几何坐标确定所述K个近邻点的权重;
所述K个近邻点的权重之和大于所述惩罚系数时,基于所述惩罚系数、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值;
所述K个近邻点的权重之和小于或者等于所述惩罚系数时,基于所述K个近邻点的权重之和、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值。
示例性的,基于第二几何坐标和K个近邻点的几何坐标确定近邻点与第二几何坐标之间的距离,基于所述距离确定权重。
距离越远权重越小,距离越近权重越大,惩罚系数可以通过限制每个近邻点的权重,或者通过限制K个近邻点的权重之和,来实现限制孤立点的K个近邻点的权重,来避免孤立点在插值后仍能获得较大的属性预测值。
示例性的,以K=3为例对有界三近邻插值算法进行进一步地举例说明。
对于当前帧的特征的几何坐标集合 C={(x_i, y_i, z_i) ∣ i∈{1,2,…,N}} 与参考帧的特征 P′={(x_i′, y_i′, z_i′, f_i′) ∣ i∈{0,1,…,N′}}，f_i′为参考帧点 (x_i′, y_i′, z_i′) 的属性重建值，对应几何坐标 (x_i, y_i, z_i) 的属性预测值的计算方法如下：
其中，第j个近邻为 (x_i+Δx_i, y_i+Δy_i, z_i+Δz_i) 在参考帧的几何坐标集合C′中的第j个近邻点，Δx_i、Δy_i、Δz_i分别是运动向量在x、y、z方向上的分量。α为惩罚系数，第j个近邻的权重在d_ij较大时会减小，以此惩罚相较于插值位置的偏移，但相较于双线性插值，该惩罚系数不会使权重为零，仅当d_ij→∞时权重才趋于0。在实验中，α通常设置为3。相较于双线性插值，有界三近邻插值的搜索范围更大，有效避免了插值得到的属性预测值为零的问题。同时，为了避免孤立点在插值后仍能获得较大的属性预测值，使用惩罚系数α限制孤立点的3近邻权重之和。
需要说明的是,上述方案假设点云特征空间各个通道都使用同一个运动向量,但实际上各个特征通道的运动向量可能有一定区别。为提升运动补偿的效率,本方案在一些实施例中使用通道运动向量代替原有的运动向量。具体地,所述目标点的运动信息为所述目标点在目标通道上的运动信息时,确定所述目标点在所述目标通道上的属性预测值;其中,所述目标通道为所述当前帧点云中所有通道中的一个通道。
运动估计时采用有界三近邻插值算法的搜索范围更大,有效避免了插值得到的属性预测值为零的问题。同时,使用惩罚系数避免孤立点在插值后仍能获得较大的属性预测值,提高属性值预测的准确性。
在一些实施例中,还可以采用三线性插值算法。实际应用中,双线性插值是一种应用于图像的常见插值方式。三线性插值即为考虑到z轴的双线性插值。
定义偏移集 N_3={(x,y,z) ∣ x,y,z∈{0,1}}，则对于稀疏张量 p={(x_i, y_i, z_i, f_i) ∣ i∈{0,1,…,N}} 和待插值坐标集合 C′={(x_i′, y_i′, z_i′) ∣ i∈{0,1,…,M}}，插值后的特征集F′为：
F′={f_i′ ∣ i∈{0,1,…,M}}
其中floor为向下取整操作。
下面对本申请实施例中第一神经网络中的帧间预测模块进行进一步地举例说明。
示例性的,图4为本申请实施例中帧间预测模块的组成结构示意图,如图4所示,帧间预测模块包括多尺度运动估计模块、运动信息压缩和解压缩模块和运动补偿模块。
其中,多尺度运动估计模块包括连接模块,连接模块用于将所述参考帧重建点云和所述当前帧点云进行连接,得到连接数据。
示例性的，当前帧点云和参考帧重建点云为稀疏张量形式，当前帧点云p_2的稀疏张量形式为：
p_2={(x_i, y_i, z_i, f_i) ∣ (x_i, y_i, z_i)∈C_2}
参考帧重建点云p_1的稀疏张量形式为：
p_1={(x_i′, y_i′, z_i′, f_i) ∣ (x_i′, y_i′, z_i′)∈C_1}
将两帧点云特征空间的稀疏张量连接后，使用基于稀疏卷积的运动估计器提取运动向量。对于稀疏张量p_1与p_2，定义连接后的稀疏张量p_c为：
p_c={(x_i, y_i, z_i, f_i) ∣ (x_i, y_i, z_i)∈p_1.c∪p_2.c}
其中p.c定义为稀疏张量p的几何坐标集合。f_i定义如下：
其中，f_i定义为稀疏张量p_c对应几何坐标(x_i, y_i, z_i)的特征，⊕为向量的拼接操作，p[x_i, y_i, z_i]定义为稀疏张量p对应几何坐标(x_i, y_i, z_i)的特征。∩代表交集符号，-代表补集符号，p_1.c-p_2.c代表坐标属于p_1.c但不属于p_2.c，p_2.c-p_1.c代表坐标属于p_2.c但不属于p_1.c。
多尺度运动估计模块包括提取模块,所述提取模块包括:两个卷积层以及每个卷积层后跟的激活层;
所述提取模块配置成:将所述连接数据依次输入每个卷积层,以及其后的激活层,得到所述原始运动信息。
示例性的,如图4所示,提取模块的第一个卷积层参数为Conv(64,3,1),第二卷积层参数为Conv(64,3,1)。
示例性的,在一些实施例中,所述多尺度运动估计模块包括第一运动估计模块,所述第一运动估计模块包括:卷积层、激活层和三层初始残差网络;
所述第一运动估计模块配置成:将所述原始运动信息依次输入到所述卷积层、所述激活层和所述三层初始残差网络进行低尺度的运动估计,得到所述第一运动信息。
这里,第一运动估计模块可以理解为低尺度运动估计模块,用于对当前帧点云进行粗略的运动估计。示例性的,如图4所示,第一运动估计模块的卷积层参数为Conv(64,2,2),用于对原始运动信息进行下采样。
示例性的,在一些实施例中,所述多尺度运动估计模块包括第二运动估计模块,所述第二运动估计模块包括:反卷积层、第一剪枝层、减法器和卷积层;
所述第二运动估计模块配置成:
利用所述反卷积层对所述第一运动信息进行上采样,得到上采样后的第一运动信息;
利用所述第一剪枝层对所述上采样后的第一运动信息进行剪枝,使得剪枝后的第一运动信息与所述原始运动信息的几何坐标集合相同;
利用所述减法器将所述原始运动信息与所述剪枝后的第一运动信息相减,再利用所述卷积层进行下采样,得到所述第二运动信息。
这里,第二运动估计模块可以理解为高尺度运动估计模块,用于在第尺度运动估计模块的指导下对当前帧点云进行精确的运动估计。示例性的,如图4所示,第二运动估计模块的反卷积层参数为Deconv(64,2,2),卷积层参数为Conv(64,2,2)。
如图4所示,多尺度运动估计模块还包括:第二剪枝层、第三剪枝层和加法器;
所述多尺度运动估计模块配置成:
利用所述第二剪枝层对所述第一运动信息进行剪枝，使得剪枝后的第一运动信息与所述残差信息的几何坐标集合C_R相同；
利用所述第三剪枝层对所述第二运动信息进行剪枝，使得剪枝后的第二运动特征信息与所述残差信息的几何坐标集合C_R相同；
利用所述加法器将剪枝后的第一运动信息和所述第二运动信息相加,得到最终的所述运动信息。
本申请实施例中,运动信息包括运动特征和几何坐标。也就是说,对运动信息进行编解码包括对运动特征和几何坐标进行编解码。
如图4所示，运动信息压缩和解压缩模块具体对运动特征进行压缩和解压缩，无损编码器对当前帧点云P_2对应的几何坐标集合C_P2进行无损编码，将编码比特写入码流。运动特征经过卷积层Conv(48,2,2)、量化器Q和自编码器AE，得到编码比特写入码流。
运动信息解压缩模块对运动特征进行解压缩，码流经过自解码器和反卷积层Deconv(64,2,2)进行解码。
运动补偿模块还包括提取模块,用于从解码后的运动信息获取目标点的运动信息。
提取模块包括:第一剪枝层、第一卷积层、池化层、反卷积层、第二剪枝层、第二卷积层和加法器。
解码后的运动特征经过第一剪枝层进行剪枝,使得剪枝后的运动信息与所述残差信息的几何坐标集合相同;
利用第一卷积层Conv(64x3,3,1)和池化层Depooling(2,2)提取每个通道的低尺度运动信息;
利用反卷积层Deconv(64x3,3,1)、第二剪枝层和卷积层Conv(64x3,3,1)提取每个通道的高尺度运动信息;
利用加法器将低尺度运动信息和高尺度运动信息相加,得到每个通道的运动信息。
这里，第二剪枝层基于当前帧点云解码后的几何坐标集合C_P2对反卷积层的输出进行剪枝，使得相加之前的低尺度运动信息和高尺度运动信息的几何坐标集合相同。
进一步地，基于有界三近邻插值算法利用参考帧点云P_1和加法器输出的运动信息进行插值运算，得到预测信息P′_2。
示例性的,在一些实施例中,所述第一神经网络还包括位于所述帧间预测模块之前的第一特征提取模块和第二特征提取模块;
所述第一特征提取模块配置成:对所述参考帧重建点云进行特征提取,将所述参考帧重建点云转换成稀疏张量形式;
所述第二特征提取模块配置成:对所述当前帧点云进行特征提取,将所述当前帧点云转换成稀疏张量形式。
也就是说,通过特征提取模块将点云转换为稀疏张量形式,再进行后续的运动估计、运动估计、编解码操作。
示例性的,在一些实施例中,每个特征提取模块包括第一下采样模块和第二下采样模块;所述第一下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络:所述第二下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络;所述第一下采样模块和所述第二下采样模块的卷积层参数不同。
图5为本申请实施例中第一神经网络的组成结构示意图,如图5所示,第一神经网络包括特征提取模块,帧间预测 模块,残差压缩和解压缩模块,以及点云重建模块。
其中，第一特征提取模块用于对上一帧重建点云进行特征提取，得到上一帧重建点云的稀疏张量形式P_1，第二特征提取模块用于对当前帧点云进行特征提取，得到当前帧点云的稀疏张量形式P_2。
图6为本申请实施例中下采样模块的组成结构示意图,如图6所示,
使用基于稀疏卷积网络的特征提取器实现下采样模块,将点云几何空间映射至点云特征空间,即点云的稀疏张量形式。下采样模块由一层卷积核大小为3,步长为1的卷积层,与一层卷积核大小为2,步长为2的卷积层组成,每层卷积层后跟ReLU激活层。同时,使用初始残差网络(Inception Residual Network,IRN)提升特征提取效率。下采样模块中卷积层的参数H代表隐藏维度,O代表输出维度,H与O的具体值见图5,即第一下采样模块的第一卷积层H为16,第二卷积层O为32,第二下采样模块的第一卷积层H为32,第二卷积层O为64。Conv(c,k,s)标识通道数(维度)为c,卷积核大小为k,步长为s的卷积层。
所述第一神经网络包括第二压缩模块以及与所述第二压缩模块对应的第二解压缩模块;即图5中残差压缩和解压缩模块。
示例性的,在一些实施例中,所述第二压缩模块配置成:对所述残差信息进行下采样;对下采样后的残差信息进行量化和熵编码,得到所述残差信息的编码比特;
所述第二解压缩模块配置成:对所述残差信息的编码比特进行熵解码,得到解码后的残差信息。
需要说明的是,若第二压缩模块的量化步长大于1,则第二解压缩模块在熵解码之后还包括反量化。
如图5所示,第二压缩模块包括:卷积层Conv(32,8)、量化器Q和自编码器(AE),第二解压缩模块包括:自解码器(AD)。运动信息进行下采样、量化之后经熵模型得到概率分布,使用AE进行算术编码得到01比特流,传至解码端。在解码端需进行对应的熵解码与上采样,得到解码后的运行信息参与到点云重建。在编码端也需进行对应的熵解码与上采样,得到解码后的运行信息参与到点云重建。
实际应用中，残差信息包括残差和几何坐标。也就是说，对残差信息进行编解码包括对残差和几何坐标C_R进行编解码。
示例性的,在一些实施例中,所述第一神经网络还包括位于所述帧间预测模块之后的点云重建模块;所述点云重建模块配置成:
对解码后的残差信息进行上采样,得到上采样后的残差信息;
基于上采样后的残差信息和所述当前帧点云的预测信息,得到第一重建点云;
对第一重建点云进行上采样,得到所述当前帧重建点云。
如图5所示,所述点云重建模块包括第一上采样模块、第二上采样模块和第三上采样模块。
图7为本申请实施例中第一上采样模块的组成结构示意图,如图7所示,所述第一上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络IRN、加法器、第二卷积层、分类层(Classify)、剪枝层;
所述第一上采样模块配置成:
将解码后的残差信息依次经过所述反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的残差信息;
利用所述加法器将上采样后的残差信息和所述当前帧点云的预测信息相加;
将所述相加结果依次经过所述第二卷积层和分类层,确定满足占用条件的第一几何坐标集合;
所述剪枝层基于所述第一几何坐标集合对所述相加结果进行剪枝,得到所述第一重建点云。
上采样模块中参数H代表隐藏维度,O代表输出维度,H与O的具体值见图5,即第一上采样模块的反卷积层H为32,第一卷积层O为32。Conv(c,k,s)标识通道数(维度)为c,卷积核大小为k,步长为s的卷积层。
使用稀疏卷积网络实现点云的上采样。上采样模块由一层卷积核大小为2,步长为2的反卷积层,与一层卷积核大小为3,步长为1的卷积层构成。卷积层之间用ReLU激活函数连接。同时,使用初始残差网络(Inception Residual Network)协助上采样。上采样后通过一个分类层,判断出占用情况的概率分布,并进行剪枝,对于点数为N的原点云,定义系数ρ,则剪枝后仅保留稀疏张量中占用概率前ρN的点。这里,占用条件为选取占用概率前ρN的点。
示例性的,在一些实施例中,所述点云重建模块包括第二上采样模块和第三上采样模块,用于对第一上采样模块输出的第一重建点云进行两次上采样得到当前帧重建点云。
图8为本申请实施例中第二上采样模块的组成结构示意图,如图8所示,所述第二上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络IRN、第二卷积层、分类层、剪枝层;
所述第二上采样模块配置成:将所述第一重建点云依次经过所述第一反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的第一重建点云;
将所述上采样后的第一重建点云依次经过所述第二卷积层、第一分类层,确定满足占用条件的第二几何坐标集合;
所述第一剪枝层基于所述第二几何坐标集合对所述上采样后的第一重建点云进行剪枝,得到第二重建点云。
第二上采样模块中反卷积层和卷积层参数如图8所示,其中第二上采样模块的反卷积层H为64,第一卷积层O为64。
所述第三上采样模块包括:第二反卷积层、第三激活层、第三卷积层、第四激活层、三层初始残差网络、第四卷积层、第二分类层、第二剪枝层;
所述第三上采样模块配置成:将所述第二重建点云依次经过所述第二反卷积层、第三激活层、第三卷积层、第四激活层和三层初始残差网络,得到上采样后的第二重建点云;
将所述上采样后的第二重建点云依次经过所述第四卷积层和第二分类层,确定满足占用条件的第三几何坐标集合;
所述第二剪枝层基于所述第三几何坐标集合对所述上采样后的第二重建点云进行剪枝,得到所述当前帧重建点云;其中,所述第二上采样模块的第一反卷积层和所述第三上采样模块的第二反卷积层的参数不同,所述第二上采样模块的第一卷积层和所述第三上采样模块的第三卷积层参数不同。
也就是说,第三上采样模块的组成结构和第二上采样模块相同,卷积层和反卷积层参数有所差别,其中第三上采样模块的反卷积层H为16,第三卷积层O为16。
本申请实施例提供了一个端到端的神经网络,其中采用了多尺度运动估计网络,有界三近邻插值算法,以及基于深度学习的因式变分自编码熵模型,极大地提升了编码效率。另外,计算过程全部由矩阵运算构成,有良好的并行性,在图形处理器(GPU)上运行时能获得巨大的加速效果。
进一步地,本申请实施例提供的编码方法还包括:训练第一神经网络。
具体地,获取训练样本集;其中,所述训练样本集中包括一个或多个样本点云;
利用所述第一神经网络对所述训练样本集中的第一样本点云进行编码和重建,得到所述第一样本点云的码率和重建点云;
基于所述第一样本点云和所述重建点云,确定所述第一样本点云的失真;
基于所述第一样本点云的失真和码率计算损失值;
所述损失值不满足收敛条件时,调节所述第一神经网络的网络参数;
所述损失值满足收敛条件时,确定所述第一神经网络训练完成。
如图5所示,第一样本点云为训练样本集中任意一个样本点云,第一样本点云作为当前帧点云输入到第一神经网络中,输出第一样本点云对应的运动信息码流和残差信息码流,及其重建点云,基于第一样本点云和重建点云,确定第一样本点云的失真损失值,基于运动信息码流和残差信息码流计算第一样本点云的码率损失值,以降低码率保证点云重建质量为训练目标构建损失函数,计算总损失值,当第一神经网络的损失值大于预设阈值(即不满足损失条件),调整网络参数进行下一次训练,当损失值小于或者等于预设阈值(即满足损失条件),得到训练好的第一神经网络,用于动态点云编码中。
示例性的,第一神经网络的损失函数由两部分构成:点云的失真,记为D;码率,记为R。
使用系数λ平衡失真与码率的关系,调整λ可以得到网络不同的码率点。
L=λD+R
在一些实施例中，使用稀疏卷积对运动信息/残差信息进行下采样，得到下采样后的特征y，由于量化过程不可导，因此在训练阶段使用均匀噪声U(-0.5,0.5)代替量化。记量化后的特征为ŷ，使用算术编码器对ŷ进行熵编解码，相应的码率由熵模型估计得到。
可见,对这种端到端的神经网络只需利用包含样本点云的样本集进行训练,无需额外标注样本点云的运动信息,降低了训练集的获取难度,该网络以降低码率保证点云重建质量为训练目标,使用该网络进行编码不仅能够提升点云的质量,还能够节省码率,进而提高编码效率。
另外,由于解码端所使用的第二神经网络与编码端的第一神经网络中解码功能的部分网络结构完全相同,因此,解码端和编码端可以作为一个整体进行端到端自监督学习,减少人为干预,使用该网络进行编解码,使用该网络进行编码不仅能够提升点云的质量,还能够节省码率,进而提高编码和解码效率。
在本申请的又一实施例中还提供了一种解码方法,图9为本申请实施例中解码方法的流程示意图,如图9所示,该方法可以包括:
步骤901:获取码流;
这里,码流中包含点云的运动信息和残差信息,本申请实施例中利用第二神经网络解码码流并进行点云重建。
步骤902:预设第二神经网络解码码流得到当前帧重建点云;
其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:
解码码流,确定当前帧点云的运动信息和残差信息;
基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;
基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。
需要说明的是,本申请实施例所述的解码方法具体是指点云解码方法,可以应用于点云解码器(本申请实施例中,可简称为“解码器”)。
当前帧点云可以理解为待解码的点云,对于当前帧点云中的一个点,在对该点进行解码时,其可以作为当前帧点云中的待解码点,而该点的周围存在有多个已解码点。参考帧重建点云可以理解为已解码点云,参考帧重建点云可以为上一帧重建点云,或者当前帧点云中部分已解码点的重建点集合。也就是说,待解码点的参考点可以为上一帧或当前帧的重建点。
在一些实施例中,所述第二神经网络包括第一解压缩模块;
所述第一解压缩模块配置成:对所述码流中所述运动信息的编码比特进行熵解码和上采样,得到所述运动信息。
第一解压缩模块包括:自解码器(AD)和反卷积层。第一解压缩模块对码流进行熵解码与上采样,得到解码后的运行信息参与到点云重建。
在一些实施例中,所述第二神经网络包括运动补偿模块,运动补偿模块采用预设的插值算法进行运动补偿。示例性的,插值算法可以为有界三近邻插值算法,或者三线性插值算法。
示例性的,当运动补偿模块基于有界三近邻插值算法进行运动补偿时,所述运动补偿模块配置成:
从解码后的运动信息中获取所述当前帧点云中目标点的运动信息;
基于所述目标点在所述当前帧点云中的第一几何坐标和所述目标点的运动信息,确定所述目标点在所述参考帧重建点云中对应的第二几何坐标;
基于所述第二几何坐标在所述参考帧重建点云中确定K个近邻点;
基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值;
其中,所述惩罚系数用于限制孤立点的K个近邻点的权重。
这里,惩罚系数可以理解为限制近邻点的选取边界,对于孤立点来说,近邻点距离较远,惩罚系数限制孤立点的近邻点的权重,避免孤立点在插值后仍能获得较大的属性预测值。
需要说明的是,K个近邻点是指参考帧中与第二几何坐标距离最近的K个点,第二几何坐标为目标点在参考帧中的位置,第二几何坐标可以整数或小数。当K=3时,利用参考帧中3个近邻点的属性重建值进行预测。
示例性的,在一些实施例中,所述惩罚系数用于限制孤立点的K个近邻点的权重之和,
所述基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值,包括:
基于所述第二几何坐标和所述K个近邻点的几何坐标确定所述K个近邻点的权重;
所述K个近邻点的权重之和大于所述惩罚系数时,基于所述惩罚系数、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值;
所述K个近邻点的权重之和小于或者等于所述惩罚系数时,基于所述K个近邻点的权重之和、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值。
也就是说,实际应用中,惩罚系数可以通过限制每个近邻点的权重,或者通过限制K个近邻点的权重之和,来实现限制孤立点的K个近邻点的权重。
示例性的,以K=3为例对有界三近邻插值算法进行进一步地举例说明。
对于当前帧的特征的几何坐标集合 C={(x_i, y_i, z_i) ∣ i∈{1,2,…,N}} 与参考帧的特征 P′={(x_i′, y_i′, z_i′, f_i′) ∣ i∈{0,1,…,N′}}，f_i′为参考帧点 (x_i′, y_i′, z_i′) 的属性重建值，对应几何坐标 (x_i, y_i, z_i) 的属性预测值的计算方法如下：
其中，第j个近邻为 (x_i+Δx_i, y_i+Δy_i, z_i+Δz_i) 在参考帧的几何坐标集合C′中的第j个近邻点，Δx_i、Δy_i、Δz_i分别是运动向量在x、y、z方向上的分量。α为惩罚系数，第j个近邻的权重在d_ij较大时会减小，以此惩罚相较于插值位置的偏移，但相较于双线性插值，该惩罚系数不会使权重为零，仅当d_ij→∞时权重才趋于0。在实验中，α通常设置为3。相较于双线性插值，有界三近邻插值的搜索范围更大，有效避免了插值得到的属性预测值为零的问题。同时，为了避免孤立点在插值后仍能获得较大的属性预测值，使用惩罚系数α限制孤立点的3近邻权重之和。
需要说明的是,上述方案假设点云特征空间各个通道都使用同一个运动向量,但实际上各个特征通道的运动向量可能有一定区别。为提升运动补偿的效率,本方案在一些实施例中使用通道运动向量代替原有的运动向量。具体地,所述目标点的运动信息为所述目标点在目标通道上的运动信息时,确定所述目标点在所述目标通道上的属性预测值;其中,所述目标通道为所述当前帧点云中所有通道中的一个通道。
运动估计时采用有界三近邻插值算法的搜索范围更大,有效避免了插值得到的属性预测值为零的问题。同时,使用惩罚系数避免孤立点在插值后仍能获得较大的属性预测值,提高属性值预测的准确性。
在一些实施例中,插值算法还可以采用三线性插值算法。实际应用中,双线性插值是一种应用于图像的常见插值方式。三线性插值即为考虑到z轴的双线性插值。
定义偏移集 N_3={(x,y,z) ∣ x,y,z∈{0,1}}，则对于稀疏张量 p={(x_i, y_i, z_i, f_i) ∣ i∈{0,1,…,N}} 和待插值坐标集合 C′={(x_i′, y_i′, z_i′) ∣ i∈{0,1,…,M}}，插值后的特征集F′为：
F′={f_i′ ∣ i∈{0,1,…,M}}
其中floor为向下取整操作。
运动补偿模块的具体结构可以参见图4,运动补偿模块还包括提取模块,用于从解码后的运动信息获取目标点的运动信息。
提取模块包括:第一剪枝层、第一卷积层、池化层、反卷积层、第二剪枝层、第二卷积层和加法器。
解码后的运动特征经过第一剪枝层进行剪枝,使得剪枝后的运动信息与所述残差信息的几何坐标集合相同;
利用第一卷积层Conv(64x3,3,1)和池化层Depooling(2,2)提取每个通道的低尺度运动信息;
利用反卷积层Deconv(64x3,3,1)、第二剪枝层和卷积层Conv(64x3,3,1)提取每个通道的高尺度运动信息;
利用加法器将低尺度运动信息和高尺度运动信息相加,得到每个通道的运动信息。
这里,第二剪枝层基于当前帧点云解码后的几何坐标集合C_P2对反卷积层的输出进行剪枝,使得相加之前的低尺度运动信息和高尺度运动信息的几何坐标集合相同。
进一步地,基于有界三近邻插值算法利用参考帧点云P_1和加法器输出的运动信息进行插值运算,得到预测信息P′_2。
示例性的,在一些实施例中,所述第二神经网络还包括位于所述运动补偿模块之前的第一特征提取模块;
所述第一特征提取模块配置成:对所述参考帧重建点云进行特征提取,将所述参考帧重建点云转换成稀疏张量形式。
也就是说,通过特征提取模块将点云转换为稀疏张量形式,再进行后续的运动估计和解码操作。
示例性的,在一些实施例中,所述第一特征提取模块包括第一下采样模块和第二下采样模块;
所述第一下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络:
所述第二下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络;
所述第一下采样模块和所述第二下采样模块的卷积层参数不同。
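下面以普通三维卷积代替稀疏卷积,给出下采样模块"卷积+激活×2,再接三层初始残差网络"这一连接关系的示意;其中的通道数、卷积核大小与步长均为假设值,并非本申请的具体参数:

```python
import torch.nn as nn

class ResBlock3d(nn.Module):
    """三层初始残差网络中单个残差块的示意(层数与通道为假设)。"""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class DownSampleBlock(nn.Module):
    """下采样模块示意:卷积+激活 ×2,再接三个残差块;第二个卷积 stride=2 完成一次下采样。"""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=2, stride=2), nn.ReLU(inplace=True),
            ResBlock3d(out_ch), ResBlock3d(out_ch), ResBlock3d(out_ch),
        )
    def forward(self, x):
        return self.net(x)
```

第一下采样模块与第二下采样模块可由两个不同参数的 DownSampleBlock 级联得到。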
图10为本申请实施例中第二神经网络的组成结构示意图,如图10所示,第二神经网络包括第一特征提取模块,第一解压缩模块(即运动信息解压缩模块),运动补偿模块,第二解压缩模块(即残差解压缩模块),以及点云重建模块。
其中,第一特征提取模块用于对上一帧重建点云进行特征提取,得到上一帧重建点云的稀疏张量形式P_1。第一特征提取模块中下采样模块的组成结构示意图如图6所示。
第一解压缩模块配置成:对所述运动信息的编码比特进行熵解码和上采样,得到解码后的运动信息。
第二解压缩模块配置成:对所述残差信息的编码比特进行熵解码,得到解码后的残差信息。
实际应用中,残差信息包括残差和几何坐标。也就是说,对残差信息进行编解码包括对残差和几何坐标C_R进行编解码。
在一些实施例中,所述第二神经网络还包括位于所述运动补偿模块之后的点云重建模块;
所述点云重建模块配置成:
对所述残差信息进行上采样,得到上采样后的残差信息;
基于上采样后的残差信息和所述当前帧点云的预测信息,得到第一重建点云;
对第一重建点云进行上采样,得到所述当前帧重建点云。
如图10所示,所述点云重建模块包括第一上采样模块、第二上采样模块和第三上采样模块。
所述第一上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、加法器、第二卷积层、分类层、剪枝层;
所述第一上采样模块配置成:
将残差信息依次经过所述反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的残差信息;
利用所述加法器将上采样后的残差信息和所述当前帧点云的预测信息相加;
将所述相加结果依次经过所述第二卷积层和分类层,确定满足占用条件的第一几何坐标集合;
所述剪枝层基于所述第一几何坐标集合对所述相加结果进行剪枝,得到所述第一重建点云。
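以下片段示意第一上采样模块的主要流程:对残差上采样、与预测信息相加、经卷积与分类层得到占用分数、再按占用条件剪枝;同样以普通三维卷积代替稀疏卷积,省略了三层初始残差网络,各参数均为假设值:

```python
import torch.nn as nn

class UpSampleAndPrune(nn.Module):
    """第一上采样模块的流程示意(稠密卷积代替稀疏卷积,参数为假设值)。"""
    def __init__(self, ch):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose3d(ch, ch, kernel_size=2, stride=2), nn.ReLU(inplace=True),
            nn.Conv3d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            # 此处省略三层初始残差网络
        )
        self.classify = nn.Conv3d(ch, 1, kernel_size=3, padding=1)  # 分类层:输出占用分数

    def forward(self, residual, prediction, occ_thresh=0.0):
        fused = self.up(residual) + prediction           # 加法器:上采样后的残差 + 预测信息
        occ = self.classify(fused).squeeze(1)            # 每个位置的占用分数
        keep = occ > occ_thresh                          # 满足占用条件的几何坐标集合
        return fused * keep.unsqueeze(1).float(), keep   # 剪枝:仅保留被判定占用的位置
```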
所述第二上采样模块包括:第一反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、第二卷积层、第一分类层、第一剪枝层;
所述第二上采样模块配置成:将所述第一重建点云依次经过所述第一反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的第一重建点云;
将所述上采样后的第一重建点云依次经过所述第二卷积层、第一分类层,确定满足占用条件的第二几何坐标集合;
所述第一剪枝层基于所述第二几何坐标集合对所述上采样后的第一重建点云进行剪枝,得到第二重建点云;
所述第三上采样模块包括:第二反卷积层、第三激活层、第三卷积层、第四激活层、三层初始残差网络、第四卷积层、第二分类层、第二剪枝层;
所述第三上采样模块配置成:将所述第二重建点云依次经过所述第二反卷积层、第三激活层、第三卷积层、第四激活层和三层初始残差网络,得到上采样后的第二重建点云;
将所述上采样后的第二重建点云依次经过所述第四卷积层和第二分类层,确定满足占用条件的第三几何坐标集合;
所述第二剪枝层基于所述第三几何坐标集合对所述上采样后的第二重建点云进行剪枝,得到所述当前帧重建点云;其中,所述第二上采样模块的第一反卷积层和所述第三上采样模块的第二反卷积层的参数不同,所述第二上采样模块的第一卷积层和所述第三上采样模块的第三卷积层参数不同。
进一步地,本申请实施例提供的解码方法还包括:训练第二神经网络。
训练第二神经网络时,由于解码端所使用的第二神经网络与编码端的第一神经网络中解码功能的部分网络结构完全相同,因此,解码端和编码端网络可以作为一个整体进行端到端自监督学习,训练完成后,编码端保留整个网络(即第一神经网络),解码端保留图10所示的部分网络(即第二神经网络)。
采用上述方案,解码端和编码端网络可以作为一个整体进行端到端自监督学习,减少人为干预;使用该网络进行解码,能够以较低的失真保证重建点云质量。
在本申请的再一实施例中,基于前述实施例相同的发明构思,参见图11,其示出了本申请实施例提供的一种编码器110的组成结构示意图。如图11所示,该编码器110可以包括:确定单元1101和编码单元1102,
所述确定单元,用于确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;
所述编码单元,用于利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;
其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:
基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;
对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。
在一些实施例中,所述第一神经网络包括帧间预测模块,所述帧间预测模块配置成:
基于所述参考帧重建点云和所述当前帧点云进行多尺度运动估计,得到所述运动信息;
基于解码后的运动信息和所述参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;
基于所述当前帧点云和所述当前帧点云的预测信息,得到所述残差信息。
在一些实施例中,所述帧间预测模块包括多尺度运动估计模块,所述多尺度运动估计模块配置成:
将所述参考帧重建点云和所述当前帧点云进行连接,得到连接数据;
从连接数据中提取原始运动信息;
对所述原始运动信息进行低尺度运动估计,得到第一运动信息;
基于所述第一运动信息对所述原始运动信息进行高尺度运动估计,得到第二运动信息;
基于所述第一运动信息和所述第二运动信息,得到最终的所述运动信息。
在一些实施例中,所述多尺度运动估计模块包括提取模块,所述提取模块包括:两个卷积层以及每个卷积层后跟的激活层;
所述提取模块配置成:将所述连接数据依次输入每个卷积层,以及其后的激活层,得到所述原始运动信息。
在一些实施例中,所述多尺度运动估计模块包括第一运动估计模块,所述第一运动估计模块包括:卷积层、激活层和三层初始残差网络;
所述第一运动估计模块配置成:将所述原始运动信息依次输入到所述卷积层、所述激活层和所述三层初始残差网络进行低尺度的运动估计,得到所述第一运动信息。
在一些实施例中,所述多尺度运动估计模块包括第二运动估计模块,所述第二运动估计模块包括:反卷积层、第一剪枝层、减法器和卷积层;
所述第二运动估计模块配置成:
利用所述反卷积层对所述第一运动信息进行上采样,得到上采样后的第一运动信息;
利用所述第一剪枝层对所述上采样后的第一运动信息进行剪枝,使得剪枝后的第一运动信息与所述原始运动信息的几何坐标集合相同;
利用所述减法器将所述原始运动信息与所述剪枝后的第一运动信息相减,再利用所述卷积层进行下采样,得到所述第二运动信息。
在一些实施例中,所述多尺度运动估计模块还包括:第二剪枝层、第三剪枝层和加法器;
所述多尺度运动估计模块配置成:
利用所述第二剪枝层对所述第一运动信息进行剪枝,使得剪枝后的第一运动信息与所述残差信息的几何坐标集合相同;
利用所述第三剪枝层对所述第二运动信息进行剪枝,使得剪枝后的第二运动特征信息与所述残差信息的几何坐标集合相同;
利用所述加法器将剪枝后的第一运动信息和所述第二运动信息相加,得到最终的所述运动信息。
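下面给出多尺度运动估计模块数据流的一个示意(以稠密卷积代替稀疏卷积,稀疏实现中用于对齐坐标集合的剪枝层在此省略;通道数与卷积参数均为假设值):

```python
import torch
import torch.nn as nn

class MultiScaleMotionEstimation(nn.Module):
    """多尺度运动估计流程示意:拼接 -> 提取原始运动信息 -> 低尺度估计 ->
    上采样求差再下采样得到高尺度信息 -> 两者相加得到最终运动信息。"""
    def __init__(self, feat_ch=64, mot_ch=48):
        super().__init__()
        self.extract = nn.Sequential(                     # 提取模块:两个卷积层 + 激活层
            nn.Conv3d(feat_ch * 2, mot_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(mot_ch, mot_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.low = nn.Sequential(                         # 第一运动估计(低尺度)
            nn.Conv3d(mot_ch, mot_ch, 2, stride=2), nn.ReLU(inplace=True),
        )
        self.up = nn.ConvTranspose3d(mot_ch, mot_ch, 2, stride=2)  # 反卷积上采样
        self.high = nn.Conv3d(mot_ch, mot_ch, 2, stride=2)         # 差值再下采样得高尺度信息

    def forward(self, ref_feat, cur_feat):
        x = torch.cat([ref_feat, cur_feat], dim=1)        # 参考帧与当前帧特征连接
        raw = self.extract(x)                             # 原始运动信息
        m1 = self.low(raw)                                # 第一运动信息
        diff = raw - self.up(m1)                          # 原始运动信息减去上采样后的第一运动信息
        m2 = self.high(diff)                              # 第二运动信息
        return m1 + m2                                    # 相加得到最终的运动信息
```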
在一些实施例中,所述帧间预测模块包括第一压缩模块以及与所述第一压缩模块对应的第一解压缩模块;
所述第一压缩模块配置成:
对所述运动信息进行下采样;
对下采样后的运动信息进行量化和熵编码,得到所述运动信息的编码比特;
所述第一解压缩模块配置成:
对所述运动信息的编码比特进行熵解码和上采样,得到解码后的运动信息。
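第一压缩/解压缩模块的前后处理可示意如下(仅包含下采样、量化与上采样部分;熵编解码由算术编码器完成,此处不展开,所有参数均为假设值):

```python
import torch
import torch.nn as nn

class MotionCompression(nn.Module):
    """第一压缩/解压缩模块的流程示意(稠密卷积代替稀疏卷积,参数为假设值)。"""
    def __init__(self, ch=48):
        super().__init__()
        self.down = nn.Conv3d(ch, ch, kernel_size=2, stride=2)          # 压缩端下采样
        self.up = nn.ConvTranspose3d(ch, ch, kernel_size=2, stride=2)   # 解压缩端上采样

    def compress(self, motion, training=False):
        y = self.down(motion)
        if training:                                   # 训练阶段用均匀噪声近似量化
            return y + torch.empty_like(y).uniform_(-0.5, 0.5)
        return torch.round(y)                          # 推理阶段取整后送入算术编码器

    def decompress(self, y_hat):
        return self.up(y_hat)                          # 熵解码得到的特征经反卷积上采样
```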
在一些实施例中,所述帧间预测模块包括运动补偿模块,所述运动补偿模块配置成:
从解码后的运动信息中获取所述当前帧点云中目标点的运动信息;
基于所述目标点在所述当前帧点云中的第一几何坐标和所述目标点的运动信息,确定所述目标点在所述参考帧重建点云中对应的第二几何坐标;
基于所述第二几何坐标在所述参考帧重建点云中确定K个近邻点;
基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值;
其中,所述惩罚系数用于限制孤立点的K个近邻点的权重。
在一些实施例中,所述惩罚系数用于限制孤立点的K个近邻点的权重之和,
所述基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值,包括:
基于所述第二几何坐标和所述K个近邻点的几何坐标确定所述K个近邻点的权重;
所述K个近邻点的权重之和大于所述惩罚系数时,基于所述惩罚系数、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值;
所述K个近邻点的权重之和小于或者等于所述惩罚系数时,基于所述K个近邻点的权重之和、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值。
在一些实施例中,所述目标点的运动信息为所述目标点在目标通道上的运动信息时,确定所述目标点在所述目标通道上的属性预测值;
其中,所述目标通道为所述当前帧点云中所有通道中的一个通道。
在一些实施例中,所述第一神经网络还包括位于所述帧间预测模块之前的第一特征提取模块和第二特征提取模块;
所述第一特征提取模块配置成:对所述参考帧重建点云进行特征提取,将所述参考帧重建点云转换成稀疏张量形式;
所述第二特征提取模块配置成:对所述当前帧点云进行特征提取,将所述当前帧点云转换成稀疏张量形式。
在一些实施例中,每个特征提取模块包括第一下采样模块和第二下采样模块;
所述第一下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络:
所述第二下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络;
所述第一下采样模块和所述第二下采样模块的卷积层参数不同。
在一些实施例中,所述第一神经网络还包括位于所述帧间预测模块之后的点云重建模块;
所述点云重建模块配置成:
对解码后的残差信息进行上采样,得到上采样后的残差信息;
基于上采样后的残差信息和所述当前帧点云的预测信息,得到第一重建点云;
对第一重建点云进行上采样,得到所述当前帧重建点云。
在一些实施例中,所述点云重建模块包括第一上采样模块,
所述第一上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、加法器、第二卷积层、分类层、剪枝层;
所述第一上采样模块配置成:
将解码后的残差信息依次经过所述反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的残差信息;
利用所述加法器将上采样后的残差信息和所述当前帧点云的预测信息相加;
将所述相加结果依次经过所述第二卷积层和分类层,确定满足占用条件的第一几何坐标集合;
所述剪枝层基于所述第一几何坐标集合对所述相加结果进行剪枝,得到所述第一重建点云。
在一些实施例中,所述点云重建模块包括第二上采样模块和第三上采样模块,
所述第二上采样模块包括:第一反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、第二卷积层、第一分类层、第一剪枝层;
所述第二上采样模块配置成:将所述第一重建点云依次经过所述第一反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的第一重建点云;
将所述上采样后的第一重建点云依次经过所述第二卷积层、第一分类层,确定满足占用条件的第二几何坐标集合;
所述第一剪枝层基于所述第二几何坐标集合对所述上采样后的第一重建点云进行剪枝,得到第二重建点云;
所述第三上采样模块包括:第二反卷积层、第三激活层、第三卷积层、第四激活层、三层初始残差网络、第四卷积层、第二分类层、第二剪枝层;
所述第三上采样模块配置成:将所述第二重建点云依次经过所述第二反卷积层、第三激活层、第三卷积层、第四激活层和三层初始残差网络,得到上采样后的第二重建点云;
将所述上采样后的第二重建点云依次经过所述第四卷积层和第二分类层,确定满足占用条件的第三几何坐标集合;
所述第二剪枝层基于所述第三几何坐标集合对所述上采样后的第二重建点云进行剪枝,得到所述当前帧重建点云;其中,所述第二上采样模块的第一反卷积层和所述第三上采样模块的第二反卷积层的参数不同,所述第二上采样模块的第一卷积层和所述第三上采样模块的第三卷积层参数不同。
在一些实施例中,所述第一神经网络包括第二压缩模块以及与所述第二压缩模块对应的第二解压缩模块;
所述第二压缩模块配置成:
对所述残差信息进行下采样;
对下采样后的残差信息进行量化和熵编码,得到所述残差信息的编码比特;
所述第二解压缩模块配置成:
对所述残差信息的编码比特进行熵解码,得到解码后的残差信息。
在一些实施例中,编码器110还包括训练单元,用于获取训练样本集;其中,所述训练样本集中包括一个或多个样本点云;利用所述第一神经网络对所述训练样本集中的第一样本点云进行编码和重建,得到所述第一样本点云的码率和重建点云;基于所述第一样本点云和所述重建点云,确定所述第一样本点云的失真;基于所述第一样本点云的失真和码率计算损失值;所述损失值不满足收敛条件时,调节所述第一神经网络的网络参数;所述损失值满足收敛条件时,确定所述第一神经网络训练完成。
可以理解地,在本申请实施例中,“单元”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是模块,还可以是非模块化的。而且在本实施例中的各组成部分可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
因此,本申请实施例提供了一种计算机存储介质,应用于编码器110,该计算机存储介质存储有计算机程序,所述计算机程序被第一处理器执行时实现前述实施例中任一项所述的方法。
基于上述编码器110的组成以及计算机存储介质,参见图12,其示出了本申请实施例提供的编码器110的具体硬件结构示意图。如图12所示,编码器110可以包括:第一通信接口1201、第一存储器1202和第一处理器1203;各个组件通过第一总线系统1204耦合在一起。可理解,第一总线系统1204用于实现这些组件之间的连接通信。第一总线系统1204除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图12中将各种总线都标为第一总线系统1204。其中,
第一通信接口1201,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第一存储器1202,用于存储能够在第一处理器1203上运行的计算机程序;
第一处理器1203,用于在运行所述计算机程序时,执行本申请编码方法的步骤。
可以理解,本申请实施例中的第一存储器1202可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请描述的系统和方法的第一存储器1202旨在包括但不限于这些和任意其它适合类型的存储器。
而第一处理器1203可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过第一处理器1203中的硬件的集成逻辑电路或者软件形式的指令完成。上述的第一处理器1203可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于第一存储器1202,第一处理器1203读取第一存储器1202中的信息,结合其硬件完成上述方法的步骤。
可以理解的是,本申请描述的这些实施例可以用硬件、软件、固件、中间件、微码或其组合来实现。对于硬件实现,处理单元可以实现在一个或多个专用集成电路(Application Specific Integrated Circuits,ASIC)、数字信号处理器(Digital Signal Processing,DSP)、数字信号处理设备(DSP Device,DSPD)、可编程逻辑设备(Programmable Logic Device,PLD)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、通用处理器、控制器、微控制器、微处理器、用于执行本申请所述功能的其它电子单元或其组合中。对于软件实现,可通过执行本申请所述功能的模块(例如过程、函数等)来实现本申请所述的技术。软件代码可存储在存储器中并通过处理器执行。存储器可以在处理器中或在处理器外部实现。
可选地,作为另一个实施例,第一处理器1203还配置为在运行所述计算机程序时,执行前述实施例中任一项所述的方法。
在本申请的再一实施例中,基于前述实施例相同的发明构思,参见图13,其示出了本申请实施例提供的一种解码器130的组成结构示意图。如图13所示,该解码器130可以包括:获取单元1301和解码单元1302,其中,
所述获取单元1301,用于获取码流;
所述解码单元1302,用于利用预设第二神经网络解码码流得到当前帧重建点云;
其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:
解码码流,确定当前帧点云的运动信息和残差信息;
基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;
基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。
在一些实施例中,所述第二神经网络包括运动补偿模块,所述运动补偿模块配置成:
从所述运动信息中获取所述当前帧点云中目标点的运动信息;
基于所述目标点在所述当前帧点云中的第一几何坐标和所述目标点的运动信息,确定所述目标点在所述参考帧重建点云中对应的第二几何坐标;
基于所述第二几何坐标在所述参考帧重建点云中确定K个近邻点;
基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值;
其中,所述惩罚系数用于限制孤立点的K个近邻点的权重。
在一些实施例中,所述惩罚系数用于限制孤立点的K个近邻点的权重之和,
所述基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值,包括:
基于所述第二几何坐标和所述K个近邻点的几何坐标确定所述K个近邻点的权重;
所述K个近邻点的权重之和大于所述惩罚系数时,基于所述惩罚系数、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值;
所述K个近邻点的权重之和小于或者等于所述惩罚系数时,基于所述K个近邻点的权重之和、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值。
在一些实施例中,所述目标点的运动信息为所述目标点在目标通道上的运动信息时,确定所述目标点在所述目标通道上的属性预测值;
其中,所述目标通道为所述当前帧点云中所有通道中的一个通道。
在一些实施例中,所述第二神经网络还包括位于所述运动补偿模块之前的第一特征提取模块;
所述第一特征提取模块配置成:对所述参考帧重建点云进行特征提取,将所述参考帧重建点云转换成稀疏张量形式。
在一些实施例中,所述第一特征提取模块包括第一下采样模块和第二下采样模块;
所述第一下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络:
所述第二下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络;
所述第一下采样模块和所述第二下采样模块的卷积层参数不同。
在一些实施例中,所述第二神经网络还包括位于所述运动补偿模块之后的点云重建模块;
所述点云重建模块配置成:
对所述残差信息进行上采样,得到上采样后的残差信息;
基于上采样后的残差信息和所述当前帧点云的预测信息,得到第一重建点云;
对第一重建点云进行上采样,得到所述当前帧重建点云。
在一些实施例中,所述点云重建模块包括第一上采样模块,
所述第一上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、加法器、第二卷积层、分类层、剪枝层;
所述第一上采样模块配置成:
将残差信息依次经过所述反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的残差信息;
利用所述加法器将上采样后的残差信息和所述当前帧点云的预测信息相加;
将所述相加结果依次经过所述第二卷积层和分类层,确定满足占用条件的第一几何坐标集合;
所述剪枝层基于所述第一几何坐标集合对所述相加结果进行剪枝,得到所述第一重建点云。
在一些实施例中,所述点云重建模块包括第二上采样模块和第三上采样模块,
所述第二上采样模块包括:第一反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、第二卷积层、第一分类层、第一剪枝层;
所述第二上采样模块配置成:将所述第一重建点云依次经过所述第一反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的第一重建点云;
将所述上采样后的第一重建点云依次经过所述第二卷积层、第一分类层,确定满足占用条件的第二几何坐标集合;
所述第一剪枝层基于所述第二几何坐标集合对所述上采样后的第一重建点云进行剪枝,得到第二重建点云;
所述第三上采样模块包括:第二反卷积层、第三激活层、第三卷积层、第四激活层、三层初始残差网络、第四卷积层、第二分类层、第二剪枝层;
所述第三上采样模块配置成:将所述第二重建点云依次经过所述第二反卷积层、第三激活层、第三卷积层、第四激活层和三层初始残差网络,得到上采样后的第二重建点云;
将所述上采样后的第二重建点云依次经过所述第四卷积层和第二分类层,确定满足占用条件的第三几何坐标集合;
所述第二剪枝层基于所述第三几何坐标集合对所述上采样后的第二重建点云进行剪枝,得到所述当前帧重建点云;其中,所述第二上采样模块的第一反卷积层和所述第三上采样模块的第二反卷积层的参数不同,所述第二上采样模块的第一卷积层和所述第三上采样模块的第三卷积层参数不同。
在一些实施例中,所述第二神经网络包括第二解压缩模块;
所述第二解压缩模块配置成:对所述码流中所述残差信息的编码比特进行熵解码,得到解码后的残差信息。
基于上述解码器130的组成以及计算机存储介质,参见图14,其示出了本申请实施例提供的解码器130的具体硬件结构示意图。如图14所示,解码器130可以包括:第二通信接口1401、第二存储器1402和第二处理器1403;各个组件通过第二总线系统1404耦合在一起。可理解,第二总线系统1404用于实现这些组件之间的连接通信。第二总线系统1404除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图14中将各种总线都标为第二总线系统1404。其中,
第二通信接口1401,用于在与其他外部网元之间进行收发信息过程中,信号的接收和发送;
第二存储器1402,用于存储能够在第二处理器1403上运行的计算机程序;
第二处理器1403,用于在运行所述计算机程序时,执行本申请解码方法的步骤。
在本申请的再一实施例中,参见图15,其示出了本申请实施例提供的一种编解码系统的组成结构示意图。如图15所示,编解码系统150可以包括编码器1501和解码器1502。其中,编码器1501可以为前述实施例中任一项所述的编码器,解码器1502可以为前述实施例中任一项所述的解码器。
在本申请实施例中,该编解码系统150中,编码器利用一种端到端的神经网络进行点云编码,该网络在训练时只需利用包含样本点云的训练样本集,无需额外标注样本点云的运动信息,降低了训练难度,该网络以降低码率保证点云重建质量为训练目标,使用该网络进行编码不仅能够提升点云的质量,还能够节省码率,进而提高编码效率。相应地,解码器利用第二神经网络进行点云重建,第二神经网络可以理解为第一神经网络中具备解码功能的部分网络结构,编码端和解码端的神经网络作为一个整体进行端到端自监督学习,减少人为干预,使用该网络进行解码,能够以较低的失真保证重建点云质量。
需要说明的是,在本申请中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本申请所提供的几个方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。本申请所提供的几个产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。本申请所提供的几个方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
本申请实施例中,在编码器中,确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。在解码器中,获取码流;利用预设第二神经网络解码码流得到当前帧重建点云;其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:解码码流,确定当前帧点云的运动信息和残差信息;基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。这样,编码器利用一种端到端的神经网络进行点云编码,该网络在训练时只需利用包含样本点云的训练样本集,无需额外标注样本点云的运动信息,降低了训练难度,该网络以降低码率保证点云重建质量为训练目标,使用该网络进行编码不仅能够提升点云的质量,还能够节省码率,进而提高编码效率。相应地,解码器利用第二神经网络进行点云重建,第二神经网络可以理解为第一神经网络中具备解码功能的部分网络结构,编码端和解码端的神经网络作为一个整体进行端到端自监督学习,减少人为干预,使用该网络进行解码,能够以较低的失真保证重建点云质量。
Claims (34)
- 一种编码方法,应用于编码器,所述方法包括:确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。
- 根据权利要求1所述的方法,其中,所述第一神经网络包括帧间预测模块,所述帧间预测模块配置成:基于所述参考帧重建点云和所述当前帧点云进行多尺度运动估计,得到所述运动信息;基于解码后的运动信息和所述参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;基于所述当前帧点云和所述当前帧点云的预测信息,得到所述残差信息。
- 根据权利要求2所述的方法,其中,所述帧间预测模块包括多尺度运动估计模块,所述多尺度运动估计模块配置成:将所述参考帧重建点云和所述当前帧点云进行连接,得到连接数据;从连接数据中提取原始运动信息;对所述原始运动信息进行低尺度运动估计,得到第一运动信息;基于所述第一运动信息对所述原始运动信息进行高尺度运动估计,得到第二运动信息;基于所述第一运动信息和所述第二运动信息,得到最终的所述运动信息。
- 根据权利要求3所述的方法,其中,所述多尺度运动估计模块包括提取模块,所述提取模块包括:两个卷积层以及每个卷积层后跟的激活层;所述提取模块配置成:将所述连接数据依次输入每个卷积层,以及其后的激活层,得到所述原始运动信息。
- 根据权利要求3所述的方法,其中,所述多尺度运动估计模块包括第一运动估计模块,所述第一运动估计模块包括:卷积层、激活层和三层初始残差网络;所述第一运动估计模块配置成:将所述原始运动信息依次输入到所述卷积层、所述激活层和所述三层初始残差网络进行低尺度的运动估计,得到所述第一运动信息。
- 根据权利要求3所述的方法,其中,所述多尺度运动估计模块包括第二运动估计模块,所述第二运动估计模块包括:反卷积层、第一剪枝层、减法器和卷积层;所述第二运动估计模块配置成:利用所述反卷积层对所述第一运动信息进行上采样,得到上采样后的第一运动信息;利用所述第一剪枝层对所述上采样后的第一运动信息进行剪枝,使得剪枝后的第一运动信息与所述原始运动信息的几何坐标集合相同;利用所述减法器将所述原始运动信息与所述剪枝后的第一运动信息相减,再利用所述卷积层进行下采样,得到所述第二运动信息。
- 根据权利要求3所述的方法,其中,所述多尺度运动估计模块还包括:第二剪枝层、第三剪枝层和加法器;所述多尺度运动估计模块配置成:利用所述第二剪枝层对所述第一运动信息进行剪枝,使得剪枝后的第一运动信息与所述残差信息的几何坐标集合相同;利用所述第三剪枝层对所述第二运动信息进行剪枝,使得剪枝后的第二运动特征信息与所述残差信息的几何坐标集合相同;利用所述加法器将剪枝后的第一运动信息和所述第二运动信息相加,得到最终的所述运动信息。
- 根据权利要求2所述的方法,其中,所述帧间预测模块包括第一压缩模块以及与所述第一压缩模块对应的第一解压缩模块;所述第一压缩模块配置成:对所述运动信息进行下采样;对下采样后的运动信息进行量化和熵编码,得到所述运动信息的编码比特;所述第一解压缩模块配置成:对所述运动信息的编码比特进行熵解码和上采样,得到所述解码后的运动信息。
- 根据权利要求2所述的方法,其中,所述帧间预测模块包括运动补偿模块,所述运动补偿模块配置成:从所述解码后的运动信息中获取所述当前帧点云中目标点的运动信息;基于所述目标点在所述当前帧点云中的第一几何坐标和所述目标点的运动信息,确定所述目标点在所述参考帧重建点云中对应的第二几何坐标;基于所述第二几何坐标在所述参考帧重建点云中确定K个近邻点;基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值;其中,所述惩罚系数用于限制孤立点的K个近邻点的权重。
- 根据权利要求9所述的方法,其中,所述惩罚系数用于限制孤立点的K个近邻点的权重之和,所述基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值,包括:基于所述第二几何坐标和所述K个近邻点的几何坐标确定所述K个近邻点的权重;所述K个近邻点的权重之和大于所述惩罚系数时,基于所述惩罚系数、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值;所述K个近邻点的权重之和小于或者等于所述惩罚系数时,基于所述K个近邻点的权重之和、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值。
- 根据权利要求9所述的方法,其中,所述目标点的运动信息为所述目标点在目标通道上的运动信息时,确定所述目标点在所述目标通道上的属性预测值;其中,所述目标通道为所述当前帧点云中所有通道中的一个通道。
- 根据权利要求2所述的方法,其中,所述第一神经网络还包括位于所述帧间预测模块之前的第一特征提取模块和第二特征提取模块;所述第一特征提取模块配置成:对所述参考帧重建点云进行特征提取,将所述参考帧重建点云转换成稀疏张量形式;所述第二特征提取模块配置成:对所述当前帧点云进行特征提取,将所述当前帧点云转换成稀疏张量形式。
- 根据权利要求12所述的方法,其中,每个特征提取模块包括第一下采样模块和第二下采样模块;所述第一下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络:所述第二下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络;所述第一下采样模块和所述第二下采样模块的卷积层参数不同。
- 根据权利要求2所述的方法,其中,所述第一神经网络还包括位于所述帧间预测模块之后的点云重建模块;所述点云重建模块配置成:对解码后的残差信息进行上采样,得到上采样后的残差信息;基于上采样后的残差信息和所述当前帧点云的预测信息,得到第一重建点云;对第一重建点云进行上采样,得到所述当前帧重建点云。
- 根据权利要求14所述的方法,其中,所述点云重建模块包括第一上采样模块,所述第一上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、加法器、第二卷积层、分类层、剪枝层;所述第一上采样模块配置成:将所述解码后的残差信息依次经过所述反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的残差信息;利用所述加法器将上采样后的残差信息和所述当前帧点云的预测信息相加;将所述相加结果依次经过所述第二卷积层和分类层,确定满足占用条件的第一几何坐标集合;所述剪枝层基于所述第一几何坐标集合对所述相加结果进行剪枝,得到所述第一重建点云。
- 根据权利要求14所述的方法,其中,所述点云重建模块包括第二上采样模块和第三上采样模块,所述第二上采样模块包括:第一反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、第二卷积层、第一分类层、第一剪枝层;所述第二上采样模块配置成:将所述第一重建点云依次经过所述第一反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的第一重建点云;将所述上采样后的第一重建点云依次经过所述第二卷积层、第一分类层,确定满足占用条件的第二几何坐标集合;所述第一剪枝层基于所述第二几何坐标集合对所述上采样后的第一重建点云进行剪枝,得到第二重建点云;所述第三上采样模块包括:第二反卷积层、第三激活层、第三卷积层、第四激活层、三层初始残差网络、第四卷积层、第二分类层、第二剪枝层;所述第三上采样模块配置成:将所述第二重建点云依次经过所述第二反卷积层、第三激活层、第三卷积层、第四激活层和三层初始残差网络,得到上采样后的第二重建点云;将所述上采样后的第二重建点云依次经过所述第四卷积层和第二分类层,确定满足占用条件的第三几何坐标集合;所述第二剪枝层基于所述第三几何坐标集合对所述上采样后的第二重建点云进行剪枝,得到所述当前帧重建点云;其中,所述第二上采样模块的第一反卷积层和所述第三上采样模块的第二反卷积层的参数不同,所述第二上采样模块的第一卷积层和所述第三上采样模块的第三卷积层参数不同。
- 根据权利要求1所述的方法,其中,所述第一神经网络包括第二压缩模块以及与所述第二压缩模块对应的第二解压缩模块;所述第二压缩模块配置成:对所述残差信息进行下采样;对下采样后的残差信息进行量化和熵编码,得到所述残差信息的编码比特;所述第二解压缩模块配置成:对所述残差信息的编码比特进行熵解码,得到解码后的残差信息。
- 根据权利要求1-17任一项所述的方法,其中,所述方法还包括:获取训练样本集;其中,所述训练样本集中包括一个或多个样本点云;利用所述第一神经网络对所述训练样本集中的第一样本点云进行编码和重建,得到所述第一样本点云的码率和重建点云;基于所述第一样本点云和所述重建点云,确定所述第一样本点云的失真;基于所述第一样本点云的失真和码率计算损失值;所述损失值不满足收敛条件时,调节所述第一神经网络的网络参数;所述损失值满足收敛条件时,确定所述第一神经网络训练完成。
- 一种解码方法,应用于解码器,所述方法包括:获取码流;利用预设第二神经网络解码码流得到当前帧重建点云;其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:解码码流,确定当前帧点云的运动信息和残差信息;基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。
- 根据权利要求19所述的方法,其中,所述第二神经网络包括第一解压缩模块;所述第一解压缩模块配置成:对所述码流中所述运动信息的编码比特进行熵解码和上采样,得到所述运动信息。
- 根据权利要求19所述的方法,其中,所述第二神经网络包括运动补偿模块,所述运动补偿模块配置成:从所述运动信息中获取所述当前帧点云中目标点的运动信息;基于所述目标点在所述当前帧点云中的第一几何坐标和所述目标点的运动信息,确定所述目标点在所述参考帧重建点云中对应的第二几何坐标;基于所述第二几何坐标在所述参考帧重建点云中确定K个近邻点;基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值;其中,所述惩罚系数用于限制孤立点的K个近邻点的权重。
- 根据权利要求21所述的方法,其中,所述惩罚系数用于限制孤立点的K个近邻点的权重之和,所述基于所述K个近邻点在所述参考帧重建点云中的属性重建值,以及预设的惩罚系数,确定所述目标点在所述当前帧点云中的属性预测值,包括:基于所述第二几何坐标和所述K个近邻点的几何坐标确定所述K个近邻点的权重;所述K个近邻点的权重之和大于所述惩罚系数时,基于所述惩罚系数、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值;所述K个近邻点的权重之和小于或者等于所述惩罚系数时,基于所述K个近邻点的权重之和、所述K个近邻点的权重和所述K个近邻点的属性重建值,确定所述目标点的属性预测值。
- 根据权利要求21所述的方法,其中,所述目标点的运动信息为所述目标点在目标通道上的运动信息时,确定所述目标点在所述目标通道上的属性预测值;其中,所述目标通道为所述当前帧点云中所有通道中的一个通道。
- 根据权利要求21所述的方法,其中,所述第二神经网络还包括位于所述运动补偿模块之前的第一特征提取模块;所述第一特征提取模块配置成:对所述参考帧重建点云进行特征提取,将所述参考帧重建点云转换成稀疏张量形式。
- 根据权利要求24所述的方法,其中,所述第一特征提取模块包括第一下采样模块和第二下采样模块;所述第一下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络:所述第二下采样模块包括:两个卷积层、每个卷积层后跟的激活层和三层初始残差网络;所述第一下采样模块和所述第二下采样模块的卷积层参数不同。
- 根据权利要求21所述的方法,其中,所述第二神经网络还包括位于所述运动补偿模块之后的点云重建模块;所述点云重建模块配置成:对所述残差信息进行上采样,得到上采样后的残差信息;基于上采样后的残差信息和所述当前帧点云的预测信息,得到第一重建点云;对第一重建点云进行上采样,得到所述当前帧重建点云。
- 根据权利要求26所述的方法,其中,所述点云重建模块包括第一上采样模块,所述第一上采样模块包括:反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、加法器、第二卷积层、分类层、剪枝层;所述第一上采样模块配置成:将所述残差信息依次经过所述反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的残差信息;利用所述加法器将上采样后的残差信息和所述当前帧点云的预测信息相加;将所述相加结果依次经过所述第二卷积层和分类层,确定满足占用条件的第一几何坐标集合;所述剪枝层基于所述第一几何坐标集合对所述相加结果进行剪枝,得到所述第一重建点云。
- 根据权利要求26所述的方法,其中,所述点云重建模块包括第二上采样模块和第三上采样模块,所述第二上采样模块包括:第一反卷积层、第一激活层、第一卷积层、第二激活层、三层初始残差网络、第二卷积层、第一分类层、第一剪枝层;所述第二上采样模块配置成:将所述第一重建点云依次经过所述第一反卷积层、第一激活层、第一卷积层、第二激活层和三层初始残差网络,得到上采样后的第一重建点云;将所述上采样后的第一重建点云依次经过所述第二卷积层、第一分类层,确定满足占用条件的第二几何坐标集合;所述第一剪枝层基于所述第二几何坐标集合对所述上采样后的第一重建点云进行剪枝,得到第二重建点云;所述第三上采样模块包括:第二反卷积层、第三激活层、第三卷积层、第四激活层、三层初始残差网络、第四卷积层、第二分类层、第二剪枝层;所述第三上采样模块配置成:将所述第二重建点云依次经过所述第二反卷积层、第三激活层、第三卷积层、第四激活层和三层初始残差网络,得到上采样后的第二重建点云;将所述上采样后的第二重建点云依次经过所述第四卷积层和第二分类层,确定满足占用条件的第三几何坐标集合;所述第二剪枝层基于所述第三几何坐标集合对所述上采样后的第二重建点云进行剪枝,得到所述当前帧重建点云;其中,所述第二上采样模块的第一反卷积层和所述第三上采样模块的第二反卷积层的参数不同,所述第二上采样模块的第一卷积层和所述第三上采样模块的第三卷积层参数不同。
- 根据权利要求19所述的方法,其中,所述第二神经网络包括第二解压缩模块;所述第二解压缩模块配置成:对所述码流中所述残差信息的编码比特进行熵解码,得到所述残差信息。
- 一种编码器,所述编码器包括确定单元和编码单元;其中,所述确定单元,用于确定当前帧点云,以及所述当前帧点云对应的参考帧重建点云;所述编码单元,用于利用预设第一神经网络基于所述参考帧重建点云对所述当前帧点云进行编码,将得到的编码比特写入码流;其中,所述第一神经网络为端到端神经网络,所述第一神经网络配置成:基于所述参考帧重建点云和所述当前帧点云进行帧间预测,得到所述当前帧点云的运动信息和残差信息;对所述运动信息和所述残差信息进行编码,将得到的编码比特写入码流。
- 一种编码器,所述编码器包括第一存储器和第一处理器;其中,所述第一存储器,用于存储能够在所述第一处理器上运行的计算机程序;所述第一处理器,用于在运行所述计算机程序时,执行如权利要求1至18任一项所述的方法。
- 一种解码器,所述解码器包括获取单元和解码单元,其中,所述获取单元,用于获取码流;所述解码单元,用于利用预设第二神经网络解码码流得到当前帧重建点云;其中,所述第二神经网络为端到端神经网络,所述第二神经网络配置成:解码码流,确定当前帧点云的运动信息和残差信息;基于所述运动信息和参考帧重建点云进行运动补偿,得到所述当前帧点云的预测信息;基于所述残差信息和所述当前帧点云的预测信息,得到所述当前帧重建点云。
- 一种解码器,所述解码器包括第二存储器和第二处理器;其中,所述第二存储器,用于存储能够在所述第二处理器上运行的计算机程序;所述第二处理器,用于在运行所述计算机程序时,执行如权利要求19至29任一项所述的方法。
- 一种计算机存储介质,其中,所述计算机存储介质存储有计算机程序,所述计算机程序被第一处理器执行时实现如权利要求1-18任一项所述的方法、或者被第二处理器执行时实现如权利要求19-29任一项所述的方法。