CN115550653A - Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network - Google Patents

Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Info

Publication number
CN115550653A
Authority
CN
China
Prior art keywords
mode
coding
maximum
variance
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211154185.2A
Other languages
Chinese (zh)
Inventor
李跃
阙识澄
万亚平
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of South China
Original Assignee
University of South China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of South China filed Critical University of South China
Priority to CN202211154185.2A priority Critical patent/CN115550653A/en
Publication of CN115550653A publication Critical patent/CN115550653A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Abstract

A dynamic 3D point cloud coding mode rapid determination method and system based on a lightweight neural network relate to the technical field of video coding. The invention carries out the merge/skip mode predictive coding process according to the original HEVC coding scheme, extracts and processes feature information of the current coding CU to obtain the input feature x1 of the T1 network, and inputs x1 into the T1 network to obtain a judgment result; if the T1 network judges that the merge/skip mode is the best mode, the decision process for subsequent modes is skipped and the partitioning stage is entered directly, otherwise the inter 2N×2N mode predictive coding process continues to be executed. Feature information of the current coding CU is then extracted and processed to obtain the input feature x2 of the T2 network, and x2 is input into the T2 network to obtain a judgment result; if the T2 network judges that the merge/skip mode or the inter 2N×2N mode is the optimal coding mode, the subsequent mode decision process is skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues. Compared with the prior art, the method can greatly shorten the inter-frame coding time of dynamic 3D point clouds.

Description

Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network
Technical Field
The invention relates to the technical field of video coding, in particular to a method and a system for quickly determining a dynamic 3D point cloud coding mode based on a lightweight neural network.
Background
Three-dimensional industrial and commercial scanning devices, such as VR devices, RGB-D cameras, and light detection and ranging (LiDAR) sensors, are more common and less expensive than ever. These sensing devices can scan and generate large amounts of 3D data, and 3D visual representation methods such as polygonal meshes, light fields, and point clouds are becoming more popular because they can represent 3D data in a more realistic fashion. Among these three-dimensional digital representation formats, the point cloud strikes a good compromise between convenience of acquisition, fidelity, and ease of manipulation of the data, and point cloud techniques are therefore being adopted more and more frequently.
Point cloud technology lays a solid foundation for the advance of visual technology; 3D technologies appearing more and more frequently in daily life in recent years include immersive Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). These advanced technologies play a disruptive role in many applications, including exploration of historical sites and art museums, immersive real-time telecommunication, interactive portrait games, mobile navigation, and so on. However, because the amount of data carried by a point cloud is extremely large, raw point cloud data cannot be cached and streamed directly, which means that encoding and compressing a point cloud is a very complicated and time-consuming process. To overcome this obstacle, efficient Point Cloud Compression (PCC) is needed.
Disclosure of Invention
One of the objectives of the present invention is to provide a method for quickly determining a dynamic 3D point cloud encoding mode based on a lightweight neural network, which can reduce the inter-frame encoding time of the dynamic 3D point cloud without affecting the encoding quality.
In order to solve the technical problems, the invention adopts the following technical scheme: a dynamic 3D point cloud coding mode rapid determination method based on a lightweight neural network comprises the following steps:
Step 1, data collection: extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag.
Step 2, data processing: first, the maximum binary-partition prediction distortion variance and maximum quaternary-partition prediction distortion variance among the extracted features are normalized, distinguishing geometry-map and texture-map coding; the two features are then truncated at their upper and lower bounds, with overflowing values assigned the boundary values so as to limit them to the range [0, 1]; data cleaning is then performed according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; finally, max-min normalization is applied to the quantization parameter.
Step 3, parameter training: CU samples for the T1 network in the training set are used as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; CU samples for the T2 network in the training set are used as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result.
Step 4, model deployment: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
Preferably, in Step 1, when calculating the maximum binary-partition prediction distortion variance or the maximum quaternary-partition prediction distortion variance, the CU is divided into sub-blocks by horizontal and vertical binary or quaternary partitioning, and the maximum prediction distortion transform variance among the sub-blocks is extracted as the feature.
More preferably, in Step 1, the T1 and T2 network features are extracted separately. The features extracted for the T1 network are the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, and previous layer mode; the merge/skip mode coding flag feature is used only in the T2 network.
Further, the calculation of the prediction distortion transform is based on a prediction distortion transform matrix (given as an equation image in the original publication): the prediction distortion transform matrix of width w and height h is obtained from the prediction distortion, that is, from the difference between the pixel value matrix predicted by the coding mode for the current coding CU of width w and height h and the original pixel value matrix of the current coding CU, scaled by a transform coefficient α that takes different values according to the category of the current coding image.
According to this matrix, the prediction distortion transform variance of the different sub-blocks is calculated under horizontal and vertical binary or quaternary partitioning. The maximum variance over the four sub-blocks under binary partitioning is the maximum binary-partition prediction distortion variance BDV, and the maximum quaternary-partition prediction distortion variance QDV is defined analogously:
BDV = max{Var_i^{h,B}, Var_i^{v,B}}, i ∈ {1, 2}
QDV = max{Var_i^{h,Q}, Var_i^{v,Q}}, i ∈ {1, 2, 3, 4}
preferably, in step2, the normalization method for the maximum variance of the bi-partition prediction distortion and the maximum variance of the tetra-partition prediction distortion is defined as follows:
Figure BDA0003857759590000048
Figure BDA0003857759590000049
Figure BDA00038577595900000410
where BDV and QDV denote the maximum variance of bi-partition prediction distortion and the maximum variance of quad-partition prediction distortion, var, respectively i Representing a horizontal, vertical, binary or quaternary division methodAnd dividing the subblock with the maximum prediction distortion transformation variance, wherein the normalization coefficient beta has different fixed values according to different coded geometries and texture maps.
More preferably, in Step 2, data cleaning is performed as follows: for samples of the T1 network, a sample is deleted if BDV or QDV < 0.2 and its final coding mode is not the merge/skip mode, or if BDV or QDV > 0.8 and its final coding mode is the merge/skip mode; for samples of the T2 network, a sample is deleted if BDV or QDV < 0.1 and its final coding mode is neither the merge/skip mode nor the inter 2N×2N mode, or if BDV or QDV > 0.9 and its final coding mode is the merge/skip mode or the inter 2N×2N mode.
More preferably, in Step 2, the max-min normalization of the quantization parameter QP is defined by a formula (given as an equation image in the original publication) that maps QP to the range [0, 1] using the maximum and minimum quantization parameter values.
further, for the algorithm flow deployed in the codec, the T1 and T2 networks have respective use conditions, the use condition of the T1 network is that the current coding CU does not belong to a placeholder image and is not an I-frame, and the use condition of the T2 network is that the current coding CU does not belong to a placeholder image and is not an I-frame and the T1 network does not make a fast mode decision.
In addition, the invention also provides a dynamic 3D point cloud coding mode rapid determination system based on a lightweight neural network, which comprises:
A data collection module: used to extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag.
A data processing module: used to normalize the maximum binary-partition and quaternary-partition prediction distortion variances among the extracted features, distinguishing geometry-map and texture-map coding; then truncate the two features at their upper and lower bounds, assigning overflowing values the boundary values so as to limit them to the range [0, 1]; then perform data cleaning according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; and finally apply max-min normalization to the quantization parameter.
A parameter training module: used to take CU samples for the T1 network in the training set as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; and to take CU samples for the T2 network in the training set as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result.
A model deployment module: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
The invention uses the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and (for T2 only) the merge/skip mode coding flag as the input features of the neural networks, and achieves the goal of optimizing coding time by skipping part of the mode decision process through lightweight neural networks without incurring excessive extra computational complexity. Meanwhile, the invention also preprocesses the sample set to improve the ability of the lightweight neural networks to fit the data, and during neural network training judges whether a coding mode is within an acceptable range by adding the RD cost of the coding mode, thereby helping the trained networks remain compatible with coding modes of higher coding efficiency. As a result, the computational complexity of dynamic 3D point cloud inter-frame coding is effectively reduced and the coding speed is improved with essentially no reduction in coding quality.
Drawings
FIG. 1 is a model for extracting neural network training data in the present invention;
FIG. 2 is a flow chart of the fast decision-making for the inter-frame coding mode of dynamic 3D point cloud in the present invention;
fig. 3 is an example of a method of bi-dividing or quad-dividing one CU coding block.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention is further described below with reference to the following examples and the accompanying drawings, which are not intended to limit the present invention.
A dynamic 3D point cloud coding mode rapid determination method based on a lightweight neural network comprises the following steps:
Step 1, data collection: data are extracted according to the feature extraction model shown in FIG. 1; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag.
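For illustration only (an assumption of this description, not part of the original disclosure), one convenient way to hold such a per-CU training sample is a simple record containing the nine features and a label; the field names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CuSample:
    """One training sample extracted for a coding unit (CU).

    Field names are illustrative; the publication only names the features,
    not a storage format.
    """
    bdv: float            # maximum binary-partition prediction distortion variance
    qdv: float            # maximum quaternary-partition prediction distortion variance
    cbf: int              # coding block flag
    depth: int            # coding block depth
    qp: int               # quantization parameter
    coding_order: int     # real-time coding order
    cu_type: int          # coding unit type (geometry map or texture map)
    prev_layer_mode: int  # previous layer mode
    merge_skip_flag: int  # merge/skip mode coding flag (used by the T2 network only)
    label: int            # 0: best mode already found, 1: continue mode decision
```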
Further, when calculating the maximum binary-partition prediction distortion variance or the maximum quaternary-partition prediction distortion variance, the CU is divided into sub-blocks by the horizontal and vertical binary or quaternary partitioning shown in FIG. 3, and the maximum prediction distortion transform variance among the sub-blocks is extracted as the feature.
Furthermore, the T1 and T2 network features are extracted separately. The features extracted for the T1 network are the first eight features, namely the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, and previous layer mode; the merge/skip mode coding flag feature is used only in the T2 network.
Further, the calculation of the prediction distortion transform is based on a prediction distortion transform matrix (given as an equation image in the original publication): the prediction distortion transform matrix of width w and height h is obtained from the prediction distortion, that is, from the difference between the pixel value matrix predicted by the coding mode for the current coding CU of width w and height h and the original pixel value matrix of the current coding CU, scaled by a transform coefficient α that takes different values according to the category of the current coding image.
According to this matrix, the prediction distortion transform variance of the different sub-blocks can be calculated under the partitioning shown in FIG. 3. The maximum variance over the four sub-blocks under binary partitioning is the maximum binary-partition prediction distortion variance BDV, and the maximum quaternary-partition prediction distortion variance QDV is defined in the same way:
BDV = max{Var_i^{h,B}, Var_i^{v,B}}, i ∈ {1, 2}
QDV = max{Var_i^{h,Q}, Var_i^{v,Q}}, i ∈ {1, 2, 3, 4}
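To make the partitioning and variance computation above concrete, the following is a minimal numpy sketch (an illustrative assumption of this description; the original publication gives only the equations, and the exact variance definition used by the encoder is assumed here to be the ordinary sample variance) that derives BDV and QDV from a prediction distortion transform matrix D.

```python
import numpy as np

def max_partition_variances(D: np.ndarray):
    """Return (BDV, QDV) for a 2N x 2N prediction distortion transform matrix D.

    BDV: maximum variance over the sub-blocks of the horizontal and vertical
         binary partitions (two 2N x N blocks plus two N x 2N blocks).
    QDV: maximum variance over the sub-blocks of the horizontal and vertical
         quaternary partitions (four 2N x N/2 blocks plus four N/2 x 2N blocks).
    """
    binary_blocks = (
        np.array_split(D, 2, axis=0) +   # horizontal binary split
        np.array_split(D, 2, axis=1)     # vertical binary split
    )
    quaternary_blocks = (
        np.array_split(D, 4, axis=0) +   # horizontal quaternary split
        np.array_split(D, 4, axis=1)     # vertical quaternary split
    )
    bdv = max(float(np.var(b)) for b in binary_blocks)
    qdv = max(float(np.var(b)) for b in quaternary_blocks)
    return bdv, qdv

# Example: D would be the alpha-scaled difference between the mode-predicted
# and original pixel value matrices of the current CU (random here for illustration).
D = np.random.randn(32, 32)
bdv, qdv = max_partition_variances(D)
```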
Step 2, data processing: first, the maximum binary-partition prediction distortion variance and maximum quaternary-partition prediction distortion variance among the extracted features are normalized, distinguishing geometry-map and texture-map coding; then, to limit the two features to the range [0, 1], upper and lower bound overflow truncation is applied and overflowing values are assigned the boundary values; data cleaning is then performed according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; finally, max-min normalization is applied to the quantization parameter;
Further, the normalization of the maximum binary-partition prediction distortion variance and the maximum quaternary-partition prediction distortion variance is defined by a set of formulas (given as equation images in the original publication), in which BDV and QDV denote the maximum binary-partition and maximum quaternary-partition prediction distortion variances respectively, Var_i denotes the maximum prediction distortion transform variance among the sub-blocks obtained by the partitioning shown in FIG. 3, and the normalization coefficient β takes different fixed values depending on whether a geometry map or a texture map is being coded.
Furthermore, data cleaning is performed as follows: for samples of the T1 network, a sample is deleted if BDV or QDV < 0.2 and its final coding mode is not the merge/skip mode, or if BDV or QDV > 0.8 and its final coding mode is the merge/skip mode; for samples of the T2 network, a sample is deleted if BDV or QDV < 0.1 and its final coding mode is neither the merge/skip mode nor the inter 2N×2N mode, or if BDV or QDV > 0.9 and its final coding mode is the merge/skip mode or the inter 2N×2N mode.
Further, the max-min normalization of the quantization parameter QP is defined by a formula (given as an equation image in the original publication) that maps QP to the range [0, 1] using the maximum and minimum quantization parameter values.
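Putting the normalization, truncation, cleaning, and quantization parameter normalization steps together, a hedged preprocessing sketch could look as follows. It assumes that the BDV/QDV normalization divides by the fixed coefficient β of the corresponding map type and that the QP normalization is the standard max-min form, since the exact formulas appear only as images in the original publication; the field names are hypothetical.

```python
def preprocess_sample(s, beta, qp_min, qp_max, network="T1"):
    """Normalize and clean one CU sample dict; return None if it is cleaned out.

    Assumptions (not spelled out in the published text):
      * BDV/QDV are divided by the fixed coefficient `beta` of the current map
        type (geometry or texture), then clipped to [0, 1];
      * QP is max-min normalized to [0, 1];
      * label 0 means the final mode was merge/skip (T1) or
        merge/skip / inter 2Nx2N (T2).
    """
    s = dict(s)
    # Normalization distinguishing geometry-map and texture-map coding,
    # followed by upper/lower bound overflow truncation to [0, 1].
    for key in ("bdv", "qdv"):
        s[key] = min(max(s[key] / beta, 0.0), 1.0)

    # Data cleaning thresholds differ between the T1 and T2 networks.
    low, high = (0.2, 0.8) if network == "T1" else (0.1, 0.9)
    skip_like = s["label"] == 0
    if (s["bdv"] < low or s["qdv"] < low) and not skip_like:
        return None
    if (s["bdv"] > high or s["qdv"] > high) and skip_like:
        return None

    # Max-min normalization of the quantization parameter.
    s["qp"] = (s["qp"] - qp_min) / (qp_max - qp_min)
    return s
```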
Step 3, parameter training: CU samples for the T1 network in the training set are used as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; CU samples for the T2 network in the training set are used as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result.
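The publication describes T1 and T2 only as lightweight networks whose fully connected layers act as a binary classifier over the feature vector; a minimal PyTorch sketch of such a classifier (the framework, layer sizes, and activation choices are assumptions of this description) is shown below.

```python
import torch
import torch.nn as nn

class LightweightModeClassifier(nn.Module):
    """Small fully connected binary classifier for T1 (8 features) or T2 (9 features)."""

    def __init__(self, in_features: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # output thresholded at 0.5 to decide whether remaining modes are skipped
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# T1 takes the first eight features; T2 additionally takes the merge/skip coding flag.
t1 = LightweightModeClassifier(in_features=8)
t2 = LightweightModeClassifier(in_features=9)
```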
Step4, model deployment: according to a mode decision flow chart different from the original scheme shown in fig. 2, after the merge/skip mode decision is finished, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, skipping all subsequent mode decision processes, and directly entering a partition stage, otherwise continuing the subsequent mode decision process; after the decision of the inter2 Nx 2N mode is finished, if the T2 network judges that the merge/skip mode or the inter2 Nx 2N mode has the best mode of the current coding CU, skipping all subsequent mode decision processes, and directly entering a dividing stage, otherwise, continuing the subsequent mode decision process.
Further, for the algorithm flow deployed in the codec, the T1 and T2 networks each have their own use conditions: the T1 network is used when the current coding CU does not belong to the occupancy map image and is not in an I-frame; the T2 network is used when the current coding CU does not belong to the occupancy map image, is not in an I-frame, and the T1 network has not made a fast mode decision.
The variance of the difference between the luminance pixel values of a CU and the luminance pixel values predicted by a mode largely reflects whether the current coding mode is the best mode; if the variance of the difference between the luminance pixel values of the current CU and the reconstructed luminance pixel values is large, the current CU needs to be further divided. Therefore, the present invention takes the variance of the difference between the luminance pixel values of the current CU and the mode-predicted luminance pixel values as an important input feature of the network. In addition, compared with conventional heuristic methods, using a lightweight neural network as a self-learned classification criterion is closer to the actual situation, and the inter-frame coding time of V-PCC + HEVC can be effectively reduced while the coding quality remains essentially unchanged.
For the neural network training data extraction model provided by this embodiment, as shown in FIG. 1: the transparent rectangular boxes in the figure represent the original processing steps; the dark rectangular boxes represent the feature extraction steps; solid arrows indicate the direction of CU information flow; hollow arrows indicate the direction of feature transfer; the three-dimensional block represents the network trained on T1 features, which is enabled only when T2 features are being extracted, and if it is disabled only route (1) is effective; route (1) indicates that the T1 network output is greater than 0.5; route (2) indicates that the T1 network output is less than or equal to 0.5. The extraction model works as follows:
Step 1, predict the current coding block with the merge/skip mode;
Step 2, calculate the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, and previous layer mode of the current coding block, and record the obtained features as T1 features;
Step 3, if T2 features need to be extracted, input the T1 features into the T1 network and judge from the network prediction result whether a fast mode decision should be made; if so, go to Step 8, otherwise go to Step 4; if T2 features do not need to be extracted, go directly to Step 4;
Step 4, predict the current coding block with the inter 2N×2N mode;
Step 5, calculate the maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag of the current coding block, and record the obtained features as T2 features;
Step 6, apply the prediction of the other coding modes to the current coding block;
Step 7, label the stored T1 or T2 feature set according to the mode prediction result;
Step 8, perform the subsequent encoding process.
The T1 and T2 features are extracted in two separate passes, one model's features at a time; when T2 features are extracted, coding unit samples for which the T1 network judges that a fast mode decision should be made do not enter the feature set.
The labeling in Step 7 is performed as follows:
If T1 features are currently being extracted, coding unit samples that finally select the merge/skip mode as the best coding mode are labeled 0, and CU samples that select any other mode as the best coding mode are labeled 1;
If T2 features are currently being extracted, coding unit samples that finally select the merge/skip mode or the inter 2N×2N mode as the best coding mode are labeled 0, and coding unit samples that select any other mode as the best coding mode are labeled 1.
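The labeling rule above can be written compactly; the following sketch uses illustrative names (the string values for the coding modes are assumptions, not encoder identifiers).

```python
def label_sample(best_mode: str, feature_set: str) -> int:
    """Return the training label of a CU sample under the Step 7 rule.

    best_mode: the optimal coding mode finally selected by the encoder,
               e.g. "merge/skip", "inter_2Nx2N", "inter_2NxN", "intra", ...
    feature_set: "T1" or "T2".
    """
    if feature_set == "T1":
        return 0 if best_mode == "merge/skip" else 1
    # T2: label 0 if merge/skip or inter 2Nx2N is the best mode.
    return 0 if best_mode in ("merge/skip", "inter_2Nx2N") else 1
```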
For the fast decision flow for dynamic-3D-point-cloud-oriented inter-frame coding modes in this embodiment, as shown in FIG. 2: solid arrows indicate the direction of coding unit information flow; hollow arrows indicate the direction of coding unit occupancy map information flow; dashed arrows indicate the mode skipping direction; rectangular boxes represent the mode decision process; solid blocks represent the neural network modules. The flow works as follows:
Step 1, predict the current coding unit with the merge/skip mode;
Step 2, obtain the input features of the T1 network from the mode prediction result of the current coding block and the corresponding occupancy information in the occupancy map, input them into the T1 network to obtain the neural network prediction result, and judge from the result whether the subsequent coding modes should be skipped; if not, go to Step 3, otherwise go to Step 6;
Step 3, predict the current coding unit with the inter 2N×2N mode;
Step 4, obtain the input features of the T2 network from the mode prediction result of the current coding block and the corresponding occupancy information in the occupancy map, input them into the T2 network to obtain the neural network prediction result, and judge from the result whether the subsequent coding modes should be skipped; if not, go to Step 5, otherwise go to Step 6;
Step 5, apply asymmetric and intra mode predictive coding to the current coding unit;
Step 6, perform further coding unit partitioning on the current coding unit.
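The decision flow of FIG. 2, together with the use conditions of the T1 and T2 networks, can be summarized as the following sketch; the callables and the threshold interpretation (an output not greater than 0.5 is read as "the best mode has already been found", i.e. training label 0) are assumptions of this description rather than encoder APIs.

```python
from typing import Callable

def fast_mode_decision(
    is_occupancy_map: bool,
    is_intra_frame: bool,
    t1_score: Callable[[], float],  # runs merge/skip prediction, extracts T1 features, returns T1 output
    t2_score: Callable[[], float],  # runs inter 2Nx2N prediction, extracts T2 features, returns T2 output
    threshold: float = 0.5,
) -> str:
    """Return which stage the encoder jumps to for the current CU.

    The T1/T2 networks are consulted only for non-occupancy-map CUs in
    non-I frames; otherwise the full mode decision runs as in the original scheme.
    """
    use_networks = not is_occupancy_map and not is_intra_frame

    if use_networks and t1_score() <= threshold:
        return "partitioning"  # skip inter 2Nx2N, asymmetric and intra mode decisions
    if use_networks and t2_score() <= threshold:
        return "partitioning"  # skip asymmetric and intra mode decisions
    return "remaining mode decisions, then partitioning"
```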
In addition, FIG. 3 illustrates the binary and quaternary partitioning of a CU coding block. The width and height of a current coding block may be any of 64×64, 32×32, 16×16, or 8×8, collectively denoted 2N×2N. So that the variance of the prediction distortion matrix describes the error produced when the current coding mode predicts the coding unit, the whole 2N×2N block is divided according to four different methods: into two 2N×N sub-regions, into two N×2N sub-regions, into four 2N×(N/2) sub-regions, or into four (N/2)×2N sub-regions. The prediction distortion variance of each sub-region is calculated separately, and the maximum value is taken as the feature.
Next, a simulation experiment is performed to verify the encoding performance of the dynamic 3D point cloud encoding mode fast determination method based on the lightweight neural network proposed in this embodiment.
In order to evaluate the feasibility and effectiveness of the method, the dynamic 3D point cloud coding reference software TMC2-v15.0 and the HEVC reference software HM16.18 + SCM8.8 were run independently as the test platform. The tests used 5 different test sequences with a resolution of 1024 × 1024 × 1024 provided by 8iVSLF: Queen, Loot, RedAndBlack, Soldier, and Longdress. The coding quantization parameter pairs (QPs) were set to ([32, 42], [28, 37], [24, 32], [20, 27], [16, 22]), and the coding configuration was the RA (Random Access) mode. The point-to-point error (D1) and point-to-plane error (D2) were used to evaluate geometry map distortion, the mean square errors of the different attributes (Luma, Chroma Cb, Chroma Cr) were used to evaluate texture map distortion, and ΔT was used to measure the time saving. The definitions of D1, D2, and the attribute mean square errors are given as equation images in the original publication; the time saving is averaged over the 5 QP sets:
ΔT = (1/5) × Σ_{i=1}^{5} (T_ref^i - T_pro^i) / T_ref^i × 100%
where T_ref^i denotes the coding time of the original test model for the i-th set of QPs of a test sequence, and T_pro^i denotes the coding time after the method provided by this embodiment is applied to the original test model.
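As a small illustration (not part of the original text), the average time saving of one test sequence can be computed directly from the per-QP-set coding times.

```python
def delta_t(ref_times, pro_times):
    """Average percentage coding-time saving over the QP sets of one sequence.

    ref_times: coding times of the original test model for each QP set.
    pro_times: coding times after applying the proposed method for the same QP sets.
    """
    assert len(ref_times) == len(pro_times)
    savings = [(r - p) / r * 100.0 for r, p in zip(ref_times, pro_times)]
    return sum(savings) / len(savings)

# Example with five QP sets (times in seconds are purely illustrative).
print(delta_t([100, 95, 90, 85, 80], [45, 44, 55, 41, 50]))
```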
Table 1: comparison of the Performance of this example with that of TMC2v15.0+ HM16.20+ SCM8.8 (unit:%)
Figure BDA0003857759590000152
As can be seen from the coding time saving and bit-rate comparison in Table 1, the method provided by this embodiment saves the coding time of the five test sequences by 55.2%, 53.8%, 38.4%, 52.1%, and 36.8% respectively, for an average total time saving of 47.3%, while the mean square errors measuring geometry map coding distortion change by -0.5%, -0.5%, 0.1%, and -0.2%, and the mean square errors measuring attribute map coding distortion change by 0.9%, -1.3%, 0.5%, 0.1%, -1.3%, and -0.3%.
The experimental results show that the method achieves a good time optimization effect on different point cloud sequences, and the reduction in video coding quality remains within an acceptable range. In summary, the method of the present invention effectively balances coding time savings against bit-rate increase while keeping the reduction in coding quality within a range acceptable to the human eye.
In recent years, in view of the excellent performance of neural networks in the fields of computer vision and video coding compression, neural networks for point cloud video coding deserve study. To meet the time-saving requirement, a lightweight neural network is used to speed up the coding mode decision process, so that, on top of eliminating redundant coding time, the extra time consumption introduced by the algorithm itself is kept as small as possible, while by the nature of the algorithm the reduction in video coding quality remains within an acceptable range. The invention improves on V-PCC, in which the point cloud is first divided into 3D patches, the 3D patches are then projected onto a two-dimensional (2D) plane and packed into geometry and attribute videos, blank areas in the geometry and attribute videos are then filled to maintain spatial continuity and improve video compression efficiency, and finally the geometry and attribute videos are compressed with High Efficiency Video Coding (HEVC). The coding computational complexity is effectively reduced while the coding quality remains almost unchanged.

Claims (10)

1. The method for quickly determining the dynamic 3D point cloud coding mode based on the lightweight neural network is characterized by comprising the following steps of:
step 1, data collection: extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag;
step 2, data processing: first, the maximum binary-partition prediction distortion variance and maximum quaternary-partition prediction distortion variance among the extracted features are normalized, distinguishing geometry-map and texture-map coding; the two features are then truncated at their upper and lower bounds, with overflowing values assigned the boundary values so as to limit them to the range [0, 1]; data cleaning is then performed according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; finally, max-min normalization is applied to the quantization parameter;
step 3, parameter training: CU samples for the T1 network in the training set are used as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; CU samples for the T2 network in the training set are used as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result;
step 4, model deployment: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
2. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step1, when the maximum variance of the bi-partition prediction distortion or the maximum variance of the tetra-partition prediction distortion is calculated, the sub-blocks are divided by a horizontal and vertical bi-partition or tetra-partition method, and finally, the value with the maximum variance of the prediction distortion transformation in the sub-blocks is extracted as the feature.
3. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step1, the characteristics of the T1 and T2 networks are extracted respectively, the characteristics extracted by the T1 network are the maximum variance of the second partition prediction distortion, the maximum variance of the fourth partition prediction distortion, a coding block flag, a coding block depth, a quantization parameter, a real-time coding sequence, a coding unit type and a previous layer mode, and the characteristic merge/skip mode coding flag is only used in the T2 network.
4. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 2, wherein: the calculation of the prediction distortion transform is based on a prediction distortion transform matrix (given as an equation image in the original publication): the prediction distortion transform matrix of width w and height h is obtained from the prediction distortion, that is, from the difference between the pixel value matrix predicted by the coding mode for the current coding CU of width w and height h and the original pixel value matrix of the current coding CU, scaled by a transform coefficient α that takes different values according to the category of the current coding image;
according to this matrix, the prediction distortion transform variance of the different sub-blocks is calculated under horizontal and vertical binary or quaternary partitioning, the maximum variance over the four sub-blocks under binary partitioning being the maximum binary-partition prediction distortion variance BDV, and the maximum quaternary-partition prediction distortion variance QDV being defined as follows:
BDV = max{Var_i^{h,B}, Var_i^{v,B}}, i ∈ {1, 2}
QDV = max{Var_i^{h,Q}, Var_i^{v,Q}}, i ∈ {1, 2, 3, 4}.
5. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 2, wherein: in step 2, the normalization of the maximum binary-partition prediction distortion variance and the maximum quaternary-partition prediction distortion variance is defined by a set of formulas (given as equation images in the original publication), in which BDV and QDV denote the maximum binary-partition and maximum quaternary-partition prediction distortion variances respectively, Var_i denotes the maximum prediction distortion transform variance among the sub-blocks obtained by horizontal or vertical binary or quaternary partitioning, and the normalization coefficient β takes different fixed values depending on whether a geometry map or a texture map is being coded.
6. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step 2, data cleaning is performed as follows: for samples of the T1 network, a sample is deleted if BDV or QDV < 0.2 and its final coding mode is not the merge/skip mode, or if BDV or QDV > 0.8 and its final coding mode is the merge/skip mode; for samples of the T2 network, a sample is deleted if BDV or QDV < 0.1 and its final coding mode is neither the merge/skip mode nor the inter 2N×2N mode, or if BDV or QDV > 0.9 and its final coding mode is the merge/skip mode or the inter 2N×2N mode.
7. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: in step 2, the max-min normalization of the quantization parameter QP is defined by a formula (given as an equation image in the original publication) that maps QP to the range [0, 1] using the maximum and minimum quantization parameter values.
8. The method for rapidly determining the encoding mode of the dynamic 3D point cloud based on the lightweight neural network as claimed in claim 1, wherein: for the algorithm flow deployed in the codec, the T1 and T2 networks each have their own use conditions: the T1 network is used when the current coding CU does not belong to the occupancy map image and is not in an I-frame, and the T2 network is used when the current coding CU does not belong to the occupancy map image, is not in an I-frame, and the T1 network has not made a fast mode decision.
9. A dynamic 3D point cloud coding mode rapid determination system based on a lightweight neural network, characterized by comprising:
a data collection module: used to extract data; the samples are used for training the T1 and T2 networks, and the extracted features include: maximum binary-partition prediction distortion variance, maximum quaternary-partition prediction distortion variance, coding block flag, coding block depth, quantization parameter, real-time coding order, coding unit type, previous layer mode, and merge/skip mode coding flag;
a data processing module: used to normalize the maximum binary-partition and quaternary-partition prediction distortion variances among the extracted features, distinguishing geometry-map and texture-map coding; then truncate the two features at their upper and lower bounds, assigning overflowing values the boundary values so as to limit them to the range [0, 1]; then perform data cleaning according to the sample's maximum binary-partition and quaternary-partition prediction distortion variances; and finally apply max-min normalization to the quantization parameter;
a parameter training module: used to take CU samples for the T1 network in the training set as the input of the lightweight neural network T1, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode is the best mode; and to take CU samples for the T2 network in the training set as the input of the lightweight neural network T2, which uses fully connected layers as a classifier to judge from the sample input whether the merge/skip mode or the inter 2N×2N mode is the best mode; finally, a fast mode decision is made according to the classification result;
a model deployment module: after the merge/skip mode decision finishes, if the T1 network judges that the merge/skip mode is the best mode of the current coding CU, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues; after the inter 2N×2N mode decision finishes, if the T2 network judges that the best mode of the current coding CU is among the merge/skip mode and the inter 2N×2N mode, all subsequent mode decision processes are skipped and the partitioning stage is entered directly, otherwise the subsequent mode decision process continues.
10. The system for rapidly determining the dynamic 3D point cloud coding mode based on the lightweight neural network as claimed in claim 9, wherein: the system operates according to the method for rapidly determining the dynamic 3D point cloud coding mode based on a lightweight neural network of any one of claims 2-8.
CN202211154185.2A 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network Pending CN115550653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211154185.2A CN115550653A (en) 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211154185.2A CN115550653A (en) 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Publications (1)

Publication Number Publication Date
CN115550653A true CN115550653A (en) 2022-12-30

Family

ID=84727491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211154185.2A Pending CN115550653A (en) 2022-09-21 2022-09-21 Dynamic 3D point cloud coding mode rapid determination method and system based on lightweight neural network

Country Status (1)

Country Link
CN (1) CN115550653A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination