CN115550653A - Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network - Google Patents

Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Info

Publication number
CN115550653A
Authority
CN
China
Prior art keywords
mode
coding
maximum
variance
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211154185.2A
Other languages
Chinese (zh)
Inventor
李跃
阙识澄
万亚平
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of South China
Original Assignee
University of South China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of South China filed Critical University of South China
Priority to CN202211154185.2A priority Critical patent/CN115550653A/en
Publication of CN115550653A publication Critical patent/CN115550653A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Abstract

A dynamic 3D point cloud coding mode rapid determination method and system based on a lightweight neural network relate to the technical field of video coding. The invention carries out the merge/skip mode predictive coding process according to the original HEVC coding scheme, extracts and processes feature information of the current coding CU to obtain the input feature x1 of the T1 network, and inputs x1 into the T1 network to obtain a judgment result; if the T1 network judges that the merge/skip mode is the best mode, the decision process for subsequent modes is skipped and the partitioning stage is entered directly, otherwise the inter 2N×2N mode predictive coding process continues to be executed. Feature information of the current coding CU is then extracted and processed to obtain the input feature x2 of the T2 network, and x2 is input into the T2 network to obtain a judgment result; if the T2 network judges that the merge/skip mode or the inter 2N×2N mode is the optimal coding mode, the subsequent mode decision process is skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues. Compared with the prior art, the method can greatly shorten the inter-frame coding time of dynamic 3D point clouds.

Description

Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network
Technical Field
The invention relates to the technical field of video coding, in particular to a method and a system for quickly determining a dynamic 3D point cloud coding mode based on a lightweight neural network.
Background
Three-dimensional industrial and commercial scanning devices, such as VR devices, RGB-D cameras, and light detection and ranging (LiDAR) sensors, are more common and less expensive than ever. These sensing devices can scan and generate large amounts of 3D data, and 3D visual representation methods such as polygonal meshes, light fields, and point clouds are becoming more popular because they can represent 3D data in a more realistic fashion. Among these three-dimensional digital representation formats, the point cloud strikes a good compromise between convenience of acquisition, fidelity, and ease of manipulation of the data, and point cloud techniques are therefore being adopted more and more frequently.
Point cloud technology lays a solid foundation for the advance of visual technology; 3D technologies appearing more and more frequently in daily life in recent years include immersive Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). These advanced technologies play a disruptive role in many applications, including exploration of historical sites and art museums, immersive real-time telecommunication, interactive portrait games, mobile navigation, and so on. However, because the amount of data carried by a point cloud is extremely large, raw point cloud data cannot be cached and streamed directly, which means that encoding and compressing a point cloud is a very complicated and time-consuming process. To overcome this obstacle, efficient Point Cloud Compression (PCC) is needed.
Disclosure of Invention
One of the objectives of the present invention is to provide a method for quickly determining a dynamic 3D point cloud encoding mode based on a lightweight neural network, which can reduce the inter-frame encoding time of the dynamic 3D point cloud without affecting the encoding quality.
In order to solve the technical problems, the invention adopts the following technical scheme: a dynamic 3D point cloud coding mode rapid determination method based on a lightweight neural network comprises the following steps:
Step 1, data collection: extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag.
Step 2, data processing: first, the maximum binary-partition prediction distortion variance and maximum quaternary-partition prediction distortion variance among the extracted features are normalized, distinguishing geometry-map and texture-map coding; the two features are then truncated at their upper and lower bounds, with overflowing values assigned the boundary values so as to limit them to the range [0, 1]; data cleaning is then performed according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; finally, max-min normalization is applied to the quantization parameter.
Step 3, parameter training: CU samples for the T1 network in the training set are used as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; CU samples for the T2 network in the training set are used as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result.
Step 4, model deployment: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
Preferably, in Step 1, when calculating the maximum binary-partition prediction distortion variance or the maximum quaternary-partition prediction distortion variance, the CU is divided into sub-blocks by horizontal and vertical binary or quaternary partitioning, and the maximum prediction distortion transform variance among the sub-blocks is extracted as the feature.
More preferably, in Step 1, the T1 and T2 network features are extracted separately. The features extracted for the T1 network are the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, and previous layer mode; the merge/skip mode coding flag feature is used only in the T2 network.
Further, the calculation of the prediction distortion transform is based on a prediction distortion transform matrix (given as an equation image in the original publication): the prediction distortion transform matrix of width w and height h is obtained from the prediction distortion, that is, from the difference between the pixel value matrix predicted by the coding mode for the current coding CU of width w and height h and the original pixel value matrix of the current coding CU, scaled by a transform coefficient α that takes different values according to the category of the current coding image.
According to this matrix, the prediction distortion transform variance of the different sub-blocks is calculated under horizontal and vertical binary or quaternary partitioning. The maximum variance over the four sub-blocks under binary partitioning is the maximum binary-partition prediction distortion variance BDV, and the maximum quaternary-partition prediction distortion variance QDV is defined analogously:
BDV = max{Var_i^{h,B}, Var_i^{v,B}}, i ∈ {1, 2}
QDV = max{Var_i^{h,Q}, Var_i^{v,Q}}, i ∈ {1, 2, 3, 4}
preferably, in step2, the normalization method for the maximum variance of the bi-partition prediction distortion and the maximum variance of the tetra-partition prediction distortion is defined as follows:
Figure BDA0003857759590000048
Figure BDA0003857759590000049
Figure BDA00038577595900000410
where BDV and QDV denote the maximum variance of bi-partition prediction distortion and the maximum variance of quad-partition prediction distortion, var, respectively i Representing a horizontal, vertical, binary or quaternary division methodAnd dividing the subblock with the maximum prediction distortion transformation variance, wherein the normalization coefficient beta has different fixed values according to different coded geometries and texture maps.
More preferably, in Step 2, data cleaning is performed as follows: for samples of the T1 network, a sample is deleted if BDV or QDV < 0.2 and its final coding mode is not the merge/skip mode, or if BDV or QDV > 0.8 and its final coding mode is the merge/skip mode; for samples of the T2 network, a sample is deleted if BDV or QDV < 0.1 and its final coding mode is neither the merge/skip mode nor the inter 2N×2N mode, or if BDV or QDV > 0.9 and its final coding mode is the merge/skip mode or the inter 2N×2N mode.
More preferably, in Step 2, the max-min normalization of the quantization parameter QP is defined by a formula (given as an equation image in the original publication) that maps QP to the range [0, 1] using the maximum and minimum quantization parameter values.
further, for the algorithm flow deployed in the codec, the T1 and T2 networks have respective use conditions, the use condition of the T1 network is that the current coding CU does not belong to a placeholder image and is not an I-frame, and the use condition of the T2 network is that the current coding CU does not belong to a placeholder image and is not an I-frame and the T1 network does not make a fast mode decision.
In addition, the invention also provides a dynamic 3D point cloud coding mode rapid determination system based on a lightweight neural network, which comprises:
A data collection module: used to extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag.
A data processing module: used to normalize the maximum binary-partition and quaternary-partition prediction distortion variances among the extracted features, distinguishing geometry-map and texture-map coding; then truncate the two features at their upper and lower bounds, assigning overflowing values the boundary values so as to limit them to the range [0, 1]; then perform data cleaning according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; and finally apply max-min normalization to the quantization parameter.
A parameter training module: used to take CU samples for the T1 network in the training set as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; and to take CU samples for the T2 network in the training set as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result.
A model deployment module: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
The invention uses the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and (for T2 only) the merge/skip mode coding flag as the input features of the neural networks, and achieves the goal of optimizing coding time by skipping part of the mode decision process through lightweight neural networks without incurring excessive extra computational complexity. Meanwhile, the invention also preprocesses the sample set to improve the ability of the lightweight neural networks to fit the data, and during neural network training judges whether a coding mode is within an acceptable range by adding the RD cost of the coding mode, thereby helping the trained networks remain compatible with coding modes of higher coding efficiency. As a result, the computational complexity of dynamic 3D point cloud inter-frame coding is effectively reduced and the coding speed is improved with essentially no reduction in coding quality.
Drawings
FIG. 1 is a model for extracting neural network training data in the present invention;
FIG. 2 is a flow chart of the fast decision-making for the inter-frame coding mode of dynamic 3D point cloud in the present invention;
fig. 3 is an example of a method of bi-dividing or quad-dividing one CU coding block.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention is further described below with reference to the following examples and the accompanying drawings, which are not intended to limit the present invention.
A dynamic 3D point cloud coding mode rapid determination method based on a lightweight neural network comprises the following steps:
Step 1, data collection: data are extracted according to the feature extraction model shown in FIG. 1; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag.
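For illustration only (an assumption of this description, not part of the original disclosure), one convenient way to hold such a per-CU training sample is a simple record containing the nine features and a label; the field names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CuSample:
    """One training sample extracted for a coding unit (CU).

    Field names are illustrative; the publication only names the features,
    not a storage format.
    """
    bdv: float            # maximum binary-partition prediction distortion variance
    qdv: float            # maximum quaternary-partition prediction distortion variance
    cbf: int              # coding block flag
    depth: int            # coding block depth
    qp: int               # quantization parameter
    coding_order: int     # real-time coding order
    cu_type: int          # coding unit type (geometry map or texture map)
    prev_layer_mode: int  # previous layer mode
    merge_skip_flag: int  # merge/skip mode coding flag (used by the T2 network only)
    label: int            # 0: best mode already found, 1: continue mode decision
```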
Further, when calculating the maximum binary-partition prediction distortion variance or the maximum quaternary-partition prediction distortion variance, the CU is divided into sub-blocks by the horizontal and vertical binary or quaternary partitioning shown in FIG. 3, and the maximum prediction distortion transform variance among the sub-blocks is extracted as the feature.
Furthermore, the T1 and T2 network features are extracted separately. The features extracted for the T1 network are the first eight features, namely the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, and previous layer mode; the merge/skip mode coding flag feature is used only in the T2 network.
Further, the calculation of the prediction distortion transform is based on a prediction distortion transform matrix (given as an equation image in the original publication): the prediction distortion transform matrix of width w and height h is obtained from the prediction distortion, that is, from the difference between the pixel value matrix predicted by the coding mode for the current coding CU of width w and height h and the original pixel value matrix of the current coding CU, scaled by a transform coefficient α that takes different values according to the category of the current coding image.
According to this matrix, the prediction distortion transform variance of the different sub-blocks can be calculated under the partitioning shown in FIG. 3. The maximum variance over the four sub-blocks under binary partitioning is the maximum binary-partition prediction distortion variance BDV, and the maximum quaternary-partition prediction distortion variance QDV is defined in the same way:
BDV = max{Var_i^{h,B}, Var_i^{v,B}}, i ∈ {1, 2}
QDV = max{Var_i^{h,Q}, Var_i^{v,Q}}, i ∈ {1, 2, 3, 4}
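To make the partitioning and variance computation above concrete, the following is a minimal numpy sketch (an illustrative assumption of this description; the original publication gives only the equations, and the exact variance definition used by the encoder is assumed here to be the ordinary sample variance) that derives BDV and QDV from a prediction distortion transform matrix D.

```python
import numpy as np

def max_partition_variances(D: np.ndarray):
    """Return (BDV, QDV) for a 2N x 2N prediction distortion transform matrix D.

    BDV: maximum variance over the sub-blocks of the horizontal and vertical
         binary partitions (two 2N x N blocks plus two N x 2N blocks).
    QDV: maximum variance over the sub-blocks of the horizontal and vertical
         quaternary partitions (four 2N x N/2 blocks plus four N/2 x 2N blocks).
    """
    binary_blocks = (
        np.array_split(D, 2, axis=0) +   # horizontal binary split
        np.array_split(D, 2, axis=1)     # vertical binary split
    )
    quaternary_blocks = (
        np.array_split(D, 4, axis=0) +   # horizontal quaternary split
        np.array_split(D, 4, axis=1)     # vertical quaternary split
    )
    bdv = max(float(np.var(b)) for b in binary_blocks)
    qdv = max(float(np.var(b)) for b in quaternary_blocks)
    return bdv, qdv

# Example: D would be the alpha-scaled difference between the mode-predicted
# and original pixel value matrices of the current CU (random here for illustration).
D = np.random.randn(32, 32)
bdv, qdv = max_partition_variances(D)
```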
Step 2, data processing: first, the maximum binary-partition prediction distortion variance and maximum quaternary-partition prediction distortion variance among the extracted features are normalized, distinguishing geometry-map and texture-map coding; then, to limit the two features to the range [0, 1], upper and lower bound overflow truncation is applied and overflowing values are assigned the boundary values; data cleaning is then performed according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; finally, max-min normalization is applied to the quantization parameter;
Further, the normalization of the maximum binary-partition prediction distortion variance and the maximum quaternary-partition prediction distortion variance is defined by a set of formulas (given as equation images in the original publication), in which BDV and QDV denote the maximum binary-partition and maximum quaternary-partition prediction distortion variances respectively, Var_i denotes the maximum prediction distortion transform variance among the sub-blocks obtained by the partitioning shown in FIG. 3, and the normalization coefficient β takes different fixed values depending on whether a geometry map or a texture map is being coded.
Furthermore, data cleaning is performed as follows: for samples of the T1 network, a sample is deleted if BDV or QDV < 0.2 and its final coding mode is not the merge/skip mode, or if BDV or QDV > 0.8 and its final coding mode is the merge/skip mode; for samples of the T2 network, a sample is deleted if BDV or QDV < 0.1 and its final coding mode is neither the merge/skip mode nor the inter 2N×2N mode, or if BDV or QDV > 0.9 and its final coding mode is the merge/skip mode or the inter 2N×2N mode.
Further, the max-min normalization of the quantization parameter QP is defined by a formula (given as an equation image in the original publication) that maps QP to the range [0, 1] using the maximum and minimum quantization parameter values.
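Putting the normalization, truncation, cleaning, and quantization parameter normalization steps together, a hedged preprocessing sketch could look as follows. It assumes that the BDV/QDV normalization divides by the fixed coefficient β of the corresponding map type and that the QP normalization is the standard max-min form, since the exact formulas appear only as images in the original publication; the field names are hypothetical.

```python
def preprocess_sample(s, beta, qp_min, qp_max, network="T1"):
    """Normalize and clean one CU sample dict; return None if it is cleaned out.

    Assumptions (not spelled out in the published text):
      * BDV/QDV are divided by the fixed coefficient `beta` of the current map
        type (geometry or texture), then clipped to [0, 1];
      * QP is max-min normalized to [0, 1];
      * label 0 means the final mode was merge/skip (T1) or
        merge/skip / inter 2Nx2N (T2).
    """
    s = dict(s)
    # Normalization distinguishing geometry-map and texture-map coding,
    # followed by upper/lower bound overflow truncation to [0, 1].
    for key in ("bdv", "qdv"):
        s[key] = min(max(s[key] / beta, 0.0), 1.0)

    # Data cleaning thresholds differ between the T1 and T2 networks.
    low, high = (0.2, 0.8) if network == "T1" else (0.1, 0.9)
    skip_like = s["label"] == 0
    if (s["bdv"] < low or s["qdv"] < low) and not skip_like:
        return None
    if (s["bdv"] > high or s["qdv"] > high) and skip_like:
        return None

    # Max-min normalization of the quantization parameter.
    s["qp"] = (s["qp"] - qp_min) / (qp_max - qp_min)
    return s
```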
Step 3, parameter training: CU samples for the T1 network in the training set are used as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; CU samples for the T2 network in the training set are used as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result.
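The publication describes T1 and T2 only as lightweight networks whose fully connected layers act as a binary classifier over the feature vector; a minimal PyTorch sketch of such a classifier (the framework, layer sizes, and activation choices are assumptions of this description) is shown below.

```python
import torch
import torch.nn as nn

class LightweightModeClassifier(nn.Module):
    """Small fully connected binary classifier for T1 (8 features) or T2 (9 features)."""

    def __init__(self, in_features: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # output thresholded at 0.5 to decide whether remaining modes are skipped
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# T1 takes the first eight features; T2 additionally takes the merge/skip coding flag.
t1 = LightweightModeClassifier(in_features=8)
t2 = LightweightModeClassifier(in_features=9)
```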
Step4, model deployment: according to a mode decision flow chart different from the original scheme shown in fig. 2, after the merge/skip mode decision is finished, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, skipping all subsequent mode decision processes, and directly entering a partition stage, otherwise continuing the subsequent mode decision process; after the decision of the inter2 Nx 2N mode is finished, if the T2 network judges that the merge/skip mode or the inter2 Nx 2N mode has the best mode of the current coding CU, skipping all subsequent mode decision processes, and directly entering a dividing stage, otherwise, continuing the subsequent mode decision process.
Further, for the algorithm flow deployed in the codec, the T1 and T2 networks each have their own use conditions: the T1 network is used when the current coding CU does not belong to the occupancy map image and is not in an I-frame; the T2 network is used when the current coding CU does not belong to the occupancy map image, is not in an I-frame, and the T1 network has not made a fast mode decision.
The variance of the difference between the luminance pixel values of a CU and the luminance pixel values predicted by a mode largely reflects whether the current coding mode is the best mode; if the variance of the difference between the luminance pixel values of the current CU and the reconstructed luminance pixel values is large, the current CU needs to be further divided. Therefore, the present invention takes the variance of the difference between the luminance pixel values of the current CU and the mode-predicted luminance pixel values as an important input feature of the network. In addition, compared with conventional heuristic methods, using a lightweight neural network as a self-learned classification criterion is closer to the actual situation, and the inter-frame coding time of V-PCC + HEVC can be effectively reduced while the coding quality remains essentially unchanged.
For the neural network training data extraction model provided by this embodiment, as shown in FIG. 1: the transparent rectangular boxes in the figure represent the original processing steps; the dark rectangular boxes represent the feature extraction steps; solid arrows indicate the direction of CU information flow; hollow arrows indicate the direction of feature transfer; the three-dimensional block represents the network trained on T1 features, which is enabled only when T2 features are being extracted, and if it is disabled only route (1) is effective; route (1) indicates that the T1 network output is greater than 0.5; route (2) indicates that the T1 network output is less than or equal to 0.5. The extraction model works as follows:
Step 1, predict the current coding block with the merge/skip mode;
Step 2, calculate the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, and previous layer mode of the current coding block, and record the obtained features as T1 features;
Step 3, if T2 features need to be extracted, input the T1 features into the T1 network and judge from the network prediction result whether a fast mode decision should be made; if so, go to Step 8, otherwise go to Step 4; if T2 features do not need to be extracted, go directly to Step 4;
Step 4, predict the current coding block with the inter 2N×2N mode;
Step 5, calculate the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag of the current coding block, and record the obtained features as T2 features;
Step 6, apply the prediction of the other coding modes to the current coding block;
Step 7, label the stored T1 or T2 feature set according to the mode prediction result;
Step 8, perform the subsequent encoding process.
The T1 and T2 features are extracted in two separate passes, one model's features at a time; when T2 features are extracted, coding unit samples for which the T1 network judges that a fast mode decision should be made do not enter the feature set.
The labeling in Step 7 is performed as follows:
If T1 features are currently being extracted, coding unit samples that finally select the merge/skip mode as the best coding mode are labeled 0, and CU samples that select any other mode as the best coding mode are labeled 1;
If T2 features are currently being extracted, coding unit samples that finally select the merge/skip mode or the inter 2N×2N mode as the best coding mode are labeled 0, and coding unit samples that select any other mode as the best coding mode are labeled 1.
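The labeling rule above can be written compactly; the following sketch uses illustrative names (the string values for the coding modes are assumptions, not encoder identifiers).

```python
def label_sample(best_mode: str, feature_set: str) -> int:
    """Return the training label of a CU sample under the Step 7 rule.

    best_mode: the optimal coding mode finally selected by the encoder,
               e.g. "merge/skip", "inter_2Nx2N", "inter_2NxN", "intra", ...
    feature_set: "T1" or "T2".
    """
    if feature_set == "T1":
        return 0 if best_mode == "merge/skip" else 1
    # T2: label 0 if merge/skip or inter 2Nx2N is the best mode.
    return 0 if best_mode in ("merge/skip", "inter_2Nx2N") else 1
```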
For the fast decision flow for dynamic-3D-point-cloud-oriented inter-frame coding modes in this embodiment, as shown in FIG. 2: solid arrows indicate the direction of coding unit information flow; hollow arrows indicate the direction of coding unit occupancy map information flow; dashed arrows indicate the mode skipping direction; rectangular boxes represent the mode decision process; solid blocks represent the neural network modules. The flow works as follows:
Step 1, predict the current coding unit with the merge/skip mode;
Step 2, obtain the input features of the T1 network from the mode prediction result of the current coding block and the corresponding occupancy information in the occupancy map, input them into the T1 network to obtain the neural network prediction result, and judge from the result whether the subsequent coding modes should be skipped; if not, go to Step 3, otherwise go to Step 6;
Step 3, predict the current coding unit with the inter 2N×2N mode;
Step 4, obtain the input features of the T2 network from the mode prediction result of the current coding block and the corresponding occupancy information in the occupancy map, input them into the T2 network to obtain the neural network prediction result, and judge from the result whether the subsequent coding modes should be skipped; if not, go to Step 5, otherwise go to Step 6;
Step 5, apply asymmetric and intra mode predictive coding to the current coding unit;
Step 6, perform further coding unit partitioning on the current coding unit.
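The decision flow of FIG. 2, together with the use conditions of the T1 and T2 networks, can be summarized as the following sketch; the callables and the threshold interpretation (an output not greater than 0.5 is read as "the best mode has already been found", i.e. training label 0) are assumptions of this description rather than encoder APIs.

```python
from typing import Callable

def fast_mode_decision(
    is_occupancy_map: bool,
    is_intra_frame: bool,
    t1_score: Callable[[], float],  # runs merge/skip prediction, extracts T1 features, returns T1 output
    t2_score: Callable[[], float],  # runs inter 2Nx2N prediction, extracts T2 features, returns T2 output
    threshold: float = 0.5,
) -> str:
    """Return which stage the encoder jumps to for the current CU.

    The T1/T2 networks are consulted only for non-occupancy-map CUs in
    non-I frames; otherwise the full mode decision runs as in the original scheme.
    """
    use_networks = not is_occupancy_map and not is_intra_frame

    if use_networks and t1_score() <= threshold:
        return "partitioning"  # skip inter 2Nx2N, asymmetric and intra mode decisions
    if use_networks and t2_score() <= threshold:
        return "partitioning"  # skip asymmetric and intra mode decisions
    return "remaining mode decisions, then partitioning"
```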
In addition, FIG. 3 illustrates the binary and quaternary partitioning of a CU coding block. The width and height of a current coding block may be any of 64×64, 32×32, 16×16, or 8×8, collectively denoted 2N×2N. So that the variance of the prediction distortion matrix describes the error produced when the current coding mode predicts the coding unit, the whole 2N×2N block is divided according to four different methods: into two 2N×N sub-regions, into two N×2N sub-regions, into four 2N×(N/2) sub-regions, or into four (N/2)×2N sub-regions. The prediction distortion variance of each sub-region is calculated separately, and the maximum value is taken as the feature.
Next, a simulation experiment is performed to verify the encoding performance of the dynamic 3D point cloud encoding mode fast determination method based on the lightweight neural network proposed in this embodiment.
In order to evaluate the feasibility and effectiveness of the method, the dynamic 3D point cloud coding reference software TMC2-v15.0 and the HEVC reference software HM16.18 + SCM8.8 were run independently as the test platform. The tests used 5 different test sequences with a resolution of 1024 × 1024 × 1024 provided by 8iVSLF: Queen, Loot, RedAndBlack, Soldier, and Longdress. The coding quantization parameter pairs (QPs) were set to ([32, 42], [28, 37], [24, 32], [20, 27], [16, 22]), and the coding configuration was the RA (Random Access) mode. The point-to-point error (D1) and point-to-plane error (D2) were used to evaluate geometry map distortion, the mean square errors of the different attributes (Luma, Chroma Cb, Chroma Cr) were used to evaluate texture map distortion, and ΔT was used to measure the time saving. The definitions of D1, D2, and the attribute mean square errors are given as equation images in the original publication; the time saving is averaged over the 5 QP sets:
ΔT = (1/5) × Σ_{i=1}^{5} (T_ref^i - T_pro^i) / T_ref^i × 100%
where T_ref^i denotes the coding time of the original test model for the i-th set of QPs of a test sequence, and T_pro^i denotes the coding time after the method provided by this embodiment is applied to the original test model.
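As a small illustration (not part of the original text), the average time saving of one test sequence can be computed directly from the per-QP-set coding times.

```python
def delta_t(ref_times, pro_times):
    """Average percentage coding-time saving over the QP sets of one sequence.

    ref_times: coding times of the original test model for each QP set.
    pro_times: coding times after applying the proposed method for the same QP sets.
    """
    assert len(ref_times) == len(pro_times)
    savings = [(r - p) / r * 100.0 for r, p in zip(ref_times, pro_times)]
    return sum(savings) / len(savings)

# Example with five QP sets (times in seconds are purely illustrative).
print(delta_t([100, 95, 90, 85, 80], [45, 44, 55, 41, 50]))
```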
Table 1: comparison of the Performance of this example with that of TMC2v15.0+ HM16.20+ SCM8.8 (unit:%)
Figure BDA0003857759590000152
As can be seen from the coding time saving and bit-rate comparison in Table 1, the method provided by this embodiment saves the coding time of the five test sequences by 55.2%, 53.8%, 38.4%, 52.1%, and 36.8% respectively, for an average total time saving of 47.3%, while the mean square errors measuring geometry map coding distortion change by -0.5%, -0.5%, 0.1%, and -0.2%, and the mean square errors measuring attribute map coding distortion change by 0.9%, -1.3%, 0.5%, 0.1%, -1.3%, and -0.3%.
The experimental results show that the method achieves a good time optimization effect on different point cloud sequences, and the reduction in video coding quality remains within an acceptable range. In summary, the method of the present invention effectively balances coding time savings against bit-rate increase while keeping the reduction in coding quality within a range acceptable to the human eye.
In recent years, in view of the excellent performance of neural networks in the fields of computer vision and video coding compression, neural networks for point cloud video coding deserve study. To meet the time-saving requirement, a lightweight neural network is used to speed up the coding mode decision process, so that, on top of eliminating redundant coding time, the extra time consumption introduced by the algorithm itself is kept as small as possible, while by the nature of the algorithm the reduction in video coding quality remains within an acceptable range. The invention improves on V-PCC, in which the point cloud is first divided into 3D patches, the 3D patches are then projected onto a two-dimensional (2D) plane and packed into geometry and attribute videos, blank areas in the geometry and attribute videos are then filled to maintain spatial continuity and improve video compression efficiency, and finally the geometry and attribute videos are compressed with High Efficiency Video Coding (HEVC). The coding computational complexity is effectively reduced while the coding quality remains almost unchanged.

Claims (10)

1. The method for quickly determining the dynamic 3D point cloud coding mode based on the lightweight neural network is characterized by comprising the following steps of:
step 1, data collection: extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag;
step 2, data processing: first, the maximum binary-partition prediction distortion variance and maximum quaternary-partition prediction distortion variance among the extracted features are normalized, distinguishing geometry-map and texture-map coding; the two features are then truncated at their upper and lower bounds, with overflowing values assigned the boundary values so as to limit them to the range [0, 1]; data cleaning is then performed according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; finally, max-min normalization is applied to the quantization parameter;
step 3, parameter training: CU samples for the T1 network in the training set are used as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; CU samples for the T2 network in the training set are used as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result;
step 4, model deployment: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
2. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step1, when the maximum variance of the bi-partition prediction distortion or the maximum variance of the tetra-partition prediction distortion is calculated, the sub-blocks are divided by a horizontal and vertical bi-partition or tetra-partition method, and finally, the value with the maximum variance of the prediction distortion transformation in the sub-blocks is extracted as the feature.
3. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step1, the characteristics of the T1 and T2 networks are extracted respectively, the characteristics extracted by the T1 network are the maximum variance of the second partition prediction distortion, the maximum variance of the fourth partition prediction distortion, a coding block flag, a coding block depth, a quantization parameter, a real-time coding sequence, a coding unit type and a previous layer mode, and the characteristic merge/skip mode coding flag is only used in the T2 network.
4. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 2, wherein: the calculation of the prediction distortion transform is based on a prediction distortion transform matrix (given as an equation image in the original publication): the prediction distortion transform matrix of width w and height h is obtained from the prediction distortion, that is, from the difference between the pixel value matrix predicted by the coding mode for the current coding CU of width w and height h and the original pixel value matrix of the current coding CU, scaled by a transform coefficient α that takes different values according to the category of the current coding image;
according to this matrix, the prediction distortion transform variance of the different sub-blocks is calculated under horizontal and vertical binary or quaternary partitioning, the maximum variance over the four sub-blocks under binary partitioning being the maximum binary-partition prediction distortion variance BDV, and the maximum quaternary-partition prediction distortion variance QDV being defined as follows:
BDV = max{Var_i^{h,B}, Var_i^{v,B}}, i ∈ {1, 2}
QDV = max{Var_i^{h,Q}, Var_i^{v,Q}}, i ∈ {1, 2, 3, 4}.
5. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 2, wherein: in step 2, the normalization of the maximum binary-partition prediction distortion variance and the maximum quaternary-partition prediction distortion variance is defined by a set of formulas (given as equation images in the original publication), in which BDV and QDV denote the maximum binary-partition and maximum quaternary-partition prediction distortion variances respectively, Var_i denotes the maximum prediction distortion transform variance among the sub-blocks obtained by horizontal or vertical binary or quaternary partitioning, and the normalization coefficient β takes different fixed values depending on whether a geometry map or a texture map is being coded.
6. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step 2, data cleaning is performed as follows: for samples of the T1 network, a sample is deleted if BDV or QDV < 0.2 and its final coding mode is not the merge/skip mode, or if BDV or QDV > 0.8 and its final coding mode is the merge/skip mode; for samples of the T2 network, a sample is deleted if BDV or QDV < 0.1 and its final coding mode is neither the merge/skip mode nor the inter 2N×2N mode, or if BDV or QDV > 0.9 and its final coding mode is the merge/skip mode or the inter 2N×2N mode.
7. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step 2, the max-min normalization of the quantization parameter QP is defined by a formula (given as an equation image in the original publication) that maps QP to the range [0, 1] using the maximum and minimum quantization parameter values.
8. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: for the algorithm flow deployed in the codec, the T1 and T2 networks each have their own use conditions: the T1 network is used when the current coding CU does not belong to the occupancy map image and is not in an I-frame, and the T2 network is used when the current coding CU does not belong to the occupancy map image, is not in an I-frame, and the T1 network has not made a fast mode decision.
9. A dynamic 3D point cloud coding mode rapid determination system based on a lightweight neural network, characterized by comprising:
a data collection module: used to extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag;
a data processing module: used to normalize the maximum binary-partition and quaternary-partition prediction distortion variances among the extracted features, distinguishing geometry-map and texture-map coding; then truncate the two features at their upper and lower bounds, assigning overflowing values the boundary values so as to limit them to the range [0, 1]; then perform data cleaning according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; and finally apply max-min normalization to the quantization parameter;
a parameter training module: used to take CU samples for the T1 network in the training set as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; and to take CU samples for the T2 network in the training set as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result;
a model deployment module: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
10. The system for rapidly determining the dynamic 3D point cloud coding mode based on the lightweight neural network as claimed in claim 9, wherein: the system operates according to the method for rapidly determining the dynamic 3D point cloud coding mode based on a lightweight neural network of any one of claims 2-8.
CN202211154185.2A 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network Pending CN115550653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211154185.2A CN115550653A (en) 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211154185.2A CN115550653A (en) 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Publications (1)

Publication Number Publication Date
CN115550653A true CN115550653A (en) 2022-12-30

Family

ID=84727491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211154185.2A Pending CN115550653A (en) 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Country Status (1)

Country Link
CN (1) CN115550653A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination