CN111586405A - Prediction mode rapid selection method based on ALF filtering in multifunctional video coding - Google Patents


Info

Publication number
CN111586405A
CN111586405A
Authority
CN
China
Prior art keywords
mode
modes
gradient
current
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010331912.2A
Other languages
Chinese (zh)
Other versions
CN111586405B (en)
Inventor
张昊
钟培雄
傅枧根
冯冰雪
马学睿
Current Assignee
Central South University
Original Assignee
Central South University
Priority date
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202010331912.2A
Publication of CN111586405A
Application granted
Publication of CN111586405B
Active legal status
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/107: Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/124: Quantisation
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/70: Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/82: Details of filtering operations for video compression involving filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a fast prediction-mode selection method based on ALF filtering in multifunctional video coding. The inventive concept of the method is to predict the intra angular prediction mode of the current block from the directional gradient values obtained during ALF filtering, narrowing the search range of angular prediction modes to reduce encoding time as much as possible, and to use the same directional gradient values to skip unnecessary CU partition modes in advance, accelerating the CU partitioning process of inter coding. The method has simple steps and a small amount of computation, and can be conveniently put into practical use.

Description

Prediction mode rapid selection method based on ALF filtering in multifunctional video coding
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a prediction mode rapid selection method based on ALF filtering in multifunctional video coding.
Background
In recent years, with the development of video services in various application scenarios, people consume ever more video and demand ever higher video quality. For example, some video websites and live-streaming platforms provide 4K to 8K ultra-high-definition video streams and introduce High Dynamic Range (HDR) technology to offer users a higher-quality viewing experience. With the commercialization of 5G, network bandwidth has greatly increased, and new kinds of video content, applications and services have emerged, further enriching the video ecosystem of the whole Internet. At the same time, the share of video traffic in Internet traffic keeps growing, posing a greater challenge to today's global computer communication networks. The conventional High Efficiency Video Coding (HEVC) standard can no longer deliver sufficient compression performance, so in October 2015 the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) established the Joint Video Exploration Team (JVET) to research a new video coding standard, named Versatile Video Coding (VVC, rendered in this document as multifunctional video coding), and issued the corresponding encoder test model VTM1.0. The encoder test model has since been updated to version VTM7.0. The VTM adopts a quadtree with nested multi-type tree structure (QTMT), making the partitioning of Coding Units (CUs) more flexible; many new coding tools, such as separate luma and chroma partitioning structures and multiple transform selection, have been adopted by VVC and integrated into the VTM.
Meanwhile, on top of the previous-generation standard HEVC, in addition to the deblocking filter (DBF) and sample adaptive offset (SAO) filter, an adaptive loop filter (ALF) is introduced: for the luminance component, one of 25 filters is selected for each 4×4 block according to the direction and activity of the local gradients computed along the horizontal, vertical, 45-degree and 135-degree diagonal directions, further improving the filtering effect.
The VTM has three in-loop filters in total. In addition to the DBF and SAO of HEVC, ALF is also employed. The filtering order in the VTM is DBF → SAO → ALF. In the VTM, the SAO and DBF processes are almost the same as in HEVC.
The implementation process of the ALF filtering is as follows:
(1) a filter shape is determined. In the VTM, the two diamond filters shown in fig. 1 are used: a 5×5 diamond filter for the chrominance components and a 7×7 diamond filter for the luminance component;
(2) for the luminance component, the gradient value D and activity value A of each 4×4 block are calculated, and the required filter index is derived from D and A; for the chroma components, no classification is applied, i.e. a single set of ALF coefficients is applied to each chroma component;
(3) performing a geometric transformation on the corresponding filter coefficients and filter clipping values, which is equivalent to applying the geometric transformation to the area covered by the filter;
(4) the corresponding region is filtered.
The ALF filtering technology has the following characteristics:
(1) depending on the direction and activity of the local gradients, ALF selects one of 25 filters for each 4×4 block of the luminance component, and only 1 filter for the chrominance components;
(2) ALF filtering is performed after the whole video frame has been coded;
(3) ALF applies a subsampled one-dimensional Laplacian calculation to reduce the complexity of block classification; as shown in fig. 2, the calculations of all four gradients use the same subsampled positions;
(4) ALF simplifies the filtering operation by applying three geometric transformations to the filter coefficients and filter clipping values: diagonal flip, vertical flip and rotation.
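The subsampled one-dimensional Laplacian described above can be sketched as follows. This is a simplified illustration in Python; the normative VVC computation fixes an 8×8 window around each 4×4 block and specific subsample positions, which are not reproduced exactly here. The four sums feed the directionality/activity classification that picks one of the 25 luma filters.

```python
import numpy as np

def alf_gradients(win: np.ndarray):
    """Subsampled 1-D Laplacian gradient sums over a pixel window.

    Returns (gv, gh, gd1, gd2): vertical, horizontal and the two
    diagonal gradient sums. Rows are subsampled 2:1, mirroring the
    shared subsample positions shown in fig. 2 (simplified sketch,
    not the exact normative window).
    """
    gv = gh = gd1 = gd2 = 0
    h, w = win.shape
    for i in range(1, h - 1, 2):          # 2:1 vertical subsampling
        for j in range(1, w - 1):
            c = 2 * int(win[i, j])        # 1-D Laplacian: |2c - a - b|
            gv  += abs(c - int(win[i - 1, j])     - int(win[i + 1, j]))
            gh  += abs(c - int(win[i, j - 1])     - int(win[i, j + 1]))
            gd1 += abs(c - int(win[i - 1, j - 1]) - int(win[i + 1, j + 1]))
            gd2 += abs(c - int(win[i - 1, j + 1]) - int(win[i + 1, j - 1]))
    return gv, gh, gd1, gd2
```

On a flat block all four sums are zero; on a vertical edge only the horizontal and diagonal gradients respond, which is what the classifier exploits.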
Like HEVC and H.264/AVC, Versatile Video Coding (VVC) also employs a block-based hybrid coding framework. Fig. 3 shows a typical VVC video encoding flow. An input picture is first divided into square blocks of equal size called Coding Tree Units (CTUs); each CTU is the root node of the quadtree with nested multi-type tree partitioning structure. The CTU is further divided according to this structure into Coding Units (CUs), which are the basic units for prediction. A CU first performs intra prediction or inter prediction. In intra prediction, the pixel prediction values of the current CU are obtained mainly by linear interpolation from spatially neighbouring reference pixels; in inter prediction, they are obtained by motion compensation from temporally neighbouring reference pixels (in one or more previous frames). The prediction is then subtracted from the original block to obtain the residual, which is transformed to further reduce the spatial correlation of neighbouring pixel errors and yield the residual coefficients. After quantization, the residual coefficients are entropy-coded together with the coding mode and related coding parameters to produce the compressed bitstream. On the other hand, the quantized residual coefficients are inverse-quantized and inverse-transformed, the resulting residual is added to the prediction to obtain the reconstructed pixels, and the reconstructed picture is filtered to generate a reference frame, which is stored in the decoded picture buffer to serve as reference pixels for intra or inter prediction of subsequent CUs.
To adapt to richer image textures, HEVC defines intra prediction modes for many different prediction directions. HEVC has 35 intra luma prediction modes, of which 33 are directional (angular) prediction modes and the other two are the DC and planar modes, as shown in fig. 4. There are 5 chroma prediction modes: mode 0 is the planar mode, corresponding to luma mode 0; mode 1 is the vertical mode, corresponding to luma mode 26; mode 2 is the horizontal mode, corresponding to luma mode 10; mode 3 is the DC mode, corresponding to luma mode 1; mode 4, also called the derived mode, uses the same mode as the corresponding luma block.
To better represent arbitrary edge directions in video images, VVC defines 67 intra prediction modes: 65 angular prediction modes plus the DC and planar modes. The 65 angular modes include the 33 angular modes of HEVC, and the DC and planar modes are the same as in HEVC, as shown in fig. 5. In VTM5.0 there are several Low-Frequency Non-Separable Transform (LFNST) passes for the luminance component. In the first LFNST pass, the prediction-mode process is: 1) initialize the luma prediction mode; 2) traverse the 67 prediction modes, skipping the 33 angular modes newly added in VVC and performing SATD calculation only on the 35 modes already present in HEVC; select the few modes with the smallest SATD values and store them, together with their costs, in a mode list and a cost list; 3) traverse the modes selected in the previous step; for each mode between 2 and 66, compare its SATD with that of its two neighbouring modes, select the one with the smallest SATD value, and update the mode list and cost list without changing the number of modes; 4) construct the MPM list, traverse its 6 modes, calculate their SATD values, compare them with those in the cost list from the previous step, select the modes with the smallest SATD, and update the mode list and cost list again; 5) derive the MIP candidate modes using the Hadamard transform; 6) add the MPM modes of MIP to the mode list and update the mode count and cost list; 7) delete non-MPM modes from the ISP list; 8) combine the regular intra, MIP and ISP modes into a complete mode list; 9) traverse all modes in the mode list, evaluate each with RD cost, and select the mode with the smallest RD cost as the best prediction mode.
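The SATD-based rough mode decision above can be sketched as follows. This is an illustrative Python sketch, not VTM code: `predictions` is a hypothetical mapping from mode index to predicted block, and only the candidate-list construction (keep the N modes with the smallest SATD) is shown.

```python
import numpy as np

# 4x4 Hadamard matrix used for the SATD cost.
H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def satd4x4(orig: np.ndarray, pred: np.ndarray) -> int:
    """Sum of absolute Hadamard-transformed differences (4x4 SATD)."""
    d = orig.astype(int) - pred.astype(int)
    return int(np.abs(H4 @ d @ H4).sum())

def rough_mode_decision(orig, predictions, n_best=3):
    """Keep the n_best modes with the smallest SATD cost.

    `predictions` maps mode index -> predicted 4x4 block (hypothetical
    interface); mirrors the mode-list/cost-list construction described
    above, not VTM's exact implementation.
    """
    costs = sorted((satd4x4(orig, p), m) for m, p in predictions.items())
    cand_list = [m for _, m in costs[:n_best]]
    cost_list = [c for c, _ in costs[:n_best]]
    return cand_list, cost_list
```

A mode whose prediction matches the block exactly has SATD 0 and therefore heads the candidate list.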
When a video image is coded, it is first divided into several fixed-size square coding tree units (CTUs); taking each CTU as the starting point of partitioning, the CTU is adaptively divided into a number of coding units (CUs) according to the local characteristics of the image.
For increasingly diverse application scenarios, VVC introduces the concept of a quadtree with nested binary and ternary splits, replaces the simpler partitioning of HEVC with more flexible and diverse CU partitioning, and enlarges the maximum CU size to 128×128. With this partitioning structure, VVC no longer distinguishes the concepts of CU, PU and TU, and four binary and ternary split modes are added: horizontal binary split, vertical binary split, horizontal ternary split and vertical ternary split (as shown in fig. 6). Thanks to these flexible split modes, a CU can obtain a better partition structure, effectively improving coding efficiency.
When encoding a picture, the VVC encoder follows the quadtree-based nested multi-type tree partitioning. First, the picture is divided into basic coding units, the CTUs (128×128 pixels). Then, taking each CTU as the starting point, it is partitioned with a quadtree structure. Each leaf node of the quadtree partition is recursively partitioned with the multi-type tree structure, and partitioning stops when the CU partition depth reaches the set maximum. During partitioning, the rate-distortion cost of every partition mode under the various prediction modes is calculated, and the prediction mode corresponding to the partition mode with the smallest rate-distortion cost is selected as the best prediction mode.
The quadtree with nested multi-type tree structure adopted by VVC allows more flexible and diverse partitioning of coding units in a video image, such as the CTU partitioned with this structure shown in fig. 7; CU partitions that better match the image characteristics can be obtained, yielding better coding quality and efficiency. However, this partition structure also brings a serious problem: since every partition mode must be traversed down to the smallest CU and its rate-distortion cost calculated, the computational complexity of encoding increases greatly. Therefore, reducing the computational complexity of CU partitioning is the key to saving encoding time and speeding up encoding.
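The exhaustive traversal whose cost motivates the invention can be sketched as a toy recursive search. The `block_cost` function is a caller-supplied stand-in for the RD cost of coding a CU without further splitting (the real encoder derives it from prediction, transform and entropy coding); only the combinatorial structure of the QT/BT/TT search is illustrated.

```python
def best_partition(block_cost, w, h, depth=0, max_depth=3):
    """Exhaustive toy search over QT/BT/TT splits of a w x h block.

    Returns (cost, split_name) for the cheapest choice; every split
    mode is traversed recursively, which is why the real search is so
    expensive. Sketch only: legal VVC split constraints are simplified.
    """
    best = (block_cost(w, h), 'none')          # cost of not splitting
    if depth < max_depth:
        splits = {}
        if w >= 2 and h >= 2:
            splits['QT'] = [(w // 2, h // 2)] * 4
        if h >= 2: splits['BT_H'] = [(w, h // 2)] * 2
        if w >= 2: splits['BT_V'] = [(w // 2, h)] * 2
        if h >= 4: splits['TT_H'] = [(w, h // 4), (w, h // 2), (w, h // 4)]
        if w >= 4: splits['TT_V'] = [(w // 4, h), (w // 2, h), (w // 4, h)]
        for name, subs in splits.items():
            cost = sum(best_partition(block_cost, sw, sh, depth + 1, max_depth)[0]
                       for sw, sh in subs)
            if cost < best[0]:
                best = (cost, name)
    return best
```

With an area-proportional cost no split ever pays off; with a cost that grows super-linearly in area, the search recurses all the way down, illustrating the complexity the proposed method tries to prune.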
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a prediction mode fast selection method based on ALF filtering in multifunctional video coding.
According to a first aspect of the present invention, a prediction mode fast selection method based on ALF filtering in multifunctional video coding is provided, the method comprising the following steps:
s1: obtaining encoded data, the encoded data being obtained by statistical analysis of video-sequence code streams encoded with VTM5.0;
s2: performing inter-frame prediction on the current CU to obtain the optimal MV of the current PU, reserving the vector value of the optimal MV, and determining the corresponding optimal reference block;
s3: performing statistical analysis on the gradient information of the optimal reference block;
s4: calculating the maximum average pixel gradient value AvgGrad of the current CU reference block;
s5: obtaining an optimal division mode;
s6: entering intra prediction for the current PU, traversing the 67 intra prediction modes, skipping the angular modes newly added in VVC, performing SATD calculation on the 35 prediction modes present in HEVC, and determining the modes in the mode list CandList;
s7: if a mode in the mode list CandList lies between modes 2 and 66, comparing its SATD value with those of its two neighbouring modes and adding the mode with the smallest SATD value to CandList;
s8: constructing an MPM according to adjacent blocks on the left side and the upper side of the current block, traversing modes in an MPM list, and adding the modes in the MPM into CandList if the modes in the MPM are not included in the CandList;
s9: respectively executing the MIP module and the ISP module, and updating a candidate list CandList;
s10: acquiring four kinds of gradient values of ALF filtering corresponding to each pixel point in a reference block;
s11: counting the sum of the ALF filtering gradient values in each direction in the reference block, selecting the ALF filtering gradient direction with the minimum sum of the gradient values as a candidate direction mode of current PU intra-frame prediction, and storing the ALF filtering gradient direction;
s12: calculating the range of the prediction direction of the current PU;
s13: reserving candidate modes in the CandList, determining a final candidate prediction mode range, and selecting an optimal intra-frame prediction mode;
s14: and determining the optimal prediction mode and the partition mode of the current CU, and performing predictive coding on the next CU.
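Steps S10 and S11 above, which reuse the ALF gradients of the reference block to pick a candidate intra direction, can be sketched as follows. The interface is hypothetical: `grad_maps` maps each of the four ALF direction names to the per-pixel gradient array of the reference block (same size as the current PU). The direction with the smallest gradient sum is the one along which pixels change least, hence the most promising prediction direction.

```python
import numpy as np

def candidate_direction(grad_maps):
    """Pick the ALF gradient direction with the smallest gradient sum.

    `grad_maps`: {'Hor'|'Ver'|'Diag1'|'Diag2': per-pixel gradient array
    for the reference block} (hypothetical interface, per steps S10-S11).
    """
    sums = {d: float(np.sum(g)) for d, g in grad_maps.items()}
    return min(sums, key=sums.get)   # smoothest direction wins
```

The returned direction name is then stored and compared against the angular modes in CandList in step S12.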
In step S1, the main encoding parameters are shown in Table 1: 10 frames of each of 3 sequences are encoded at four quantization parameters (QPs) using the encoder_lowdelay_P_vtm configuration. The 3 video sequences all come from the official test sequences released for VVC and have different resolutions and texture features. Note that the conditions for obtaining the encoded data are not limited to Table 1 and may be set according to the specific requirements of the scenario.
TABLE 1 test conditions
According to some embodiments of the invention, in step S3, the gradient information includes four directions.
According to some embodiments of the invention, the statistical analysis method is: calculate the sums SumGradV, SumGradH, SumGradG1 and SumGradG2 of the four directional gradients over all pixels in the reference block; then compute the difference between the maximum MaxGrad and the minimum MinGrad of the four gradient sums and normalize it, obtaining GradRatio, the ratio of this difference to the maximum gradient sum.
GradRatio reflects the balance and uniformity of the directional gradients: the smaller the value, the more evenly the directional gradients are distributed. Meanwhile, the absolute magnitude of the gradient in each direction must also be kept within a certain range, indicating that pixel values change slowly along every gradient direction; otherwise, the image pixels change more sharply.
In step S4, the maximum average pixel gradient value AvgGrad of the current CU's reference block is calculated to describe the gradient change along the direction of the maximum gradient. The thresholds α and β are empirical values obtained through extensive experiments and continuous tuning. If the GradRatio of the current CU is smaller than α and its AvgGrad is smaller than β, the pixel values of the current CU region change gently and the image is relatively flat, so partitioning of the current block can be skipped; otherwise, the current region changes sharply, contains more local detail and larger differences between image parts, and further partitioning is needed.
In step S5, the possible partition modes are determined by calculating the GradRatio and AvgGrad of the sub-blocks under each partition mode. For ternary and binary splits, if any sub-block satisfies GradRatio greater than the set threshold α or AvgGrad greater than β, the current partition mode is added to the candidate partition mode list; otherwise, the current partition mode is skipped. Finally, the candidate partition modes are screened by RD cost to obtain the best partition mode.
According to some embodiments of the present invention, in step S6, the patterns in the pattern list CandList are the patterns with the smallest SATD value.
According to some embodiments of the present invention, in step S10, the ALF filtering includes four gradient directions: Hor, Ver, Diag1 and Diag2.
In step S10, the size of the reference block must be the same as the size of the current PU, so when traversing each pixel of the reference block, the width and height of the current PU are used as the maximum values of the loop.
In step S12, it is determined whether the direction mode Dir selected in the previous step coincides with an angular mode in the CandList candidate mode list. If not, the default prediction modes in CandList are taken directly as the prediction direction range; if the selected gradient direction Dir is in CandList, it becomes one of the candidate modes. In that case, if Dir is Diag2, i.e. the angular modes with mode indexes 2 and 66, the prediction direction ranges of the current PU are [2,4] and [64,66]; otherwise, the prediction range of the current PU runs from Dir-2 to Dir+2.
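The range derivation of step S12 can be sketched as follows. The function works on 67-mode indexes; mapping the named ALF directions onto those indexes (e.g. Diag2 onto the extreme modes 2 and 66, as stated above) is assumed to have been done by the caller. The fallback behaviour when Dir is not in CandList (keep the default CandList modes) is one plausible reading of the description.

```python
def prediction_ranges(dir_mode, cand_list):
    """Angular search ranges for the current PU (step S12 sketch).

    dir_mode:  67-mode index matching the selected ALF gradient direction.
    cand_list: angular modes currently in CandList.
    Returns a list of inclusive (lo, hi) mode ranges.
    """
    if dir_mode not in cand_list:
        # No overlap: fall back to the default modes already in CandList.
        return [(m, m) for m in cand_list]
    if dir_mode in (2, 66):            # Diag2 maps to both extreme modes
        return [(2, 4), (64, 66)]
    return [(dir_mode - 2, dir_mode + 2)]
```

For example, a horizontal-ish direction landing on mode 34 restricts the search to modes 32 through 36 instead of all 65 angular modes.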
The prediction mode quick selection method provided by the embodiment of the invention at least has the following technical effects:
the embodiment of the invention provides a prediction mode rapid selection method based on ALF filtering in VVC, and the inventive concept of the method is as follows: (1) predicting the intra-frame prediction angle mode of the current block by using the direction gradient value obtained in the ALF filtering process, and reducing the search range of the angle prediction mode to reduce the encoding time as much as possible; (2) and skipping unnecessary CU division modes in advance by using the direction gradient values obtained by the ALF filtering, and accelerating the CU division process of inter-frame coding.
The prediction mode rapid selection method provided by the embodiment of the invention has the advantages of simple steps and small calculation amount, and can be conveniently put into practical application.
Drawings
Fig. 1 is a schematic diagram of a diamond filter employed by the ALF to filter luminance and chrominance components.
Fig. 2 is a schematic diagram of a down-sampling mode of four gradient directions in the ALF filtering.
Fig. 3 is a typical VVC video encoding flow diagram.
Fig. 4 is a diagram illustrating 35 intra prediction directions of HEVC.
Fig. 5 is a diagram illustrating 67 intra prediction directions of VVC.
Fig. 6 is a schematic diagram of multi-type CU partitioning of VVC.
Fig. 7 is a diagram illustrating an example of nested multi-type tree CTU partitioning.
Fig. 8 is a flowchart of a prediction mode fast selection method based on ALF filtering in VVC.
Fig. 9 is an intra encoding flow chart.
Detailed Description
The following are specific examples of the present invention, and the technical solutions of the present invention will be further described with reference to the examples, but the present invention is not limited to the examples.
Example 1
The present example provides a prediction mode fast selection method based on ALF filtering in VVC, where the flow is shown in fig. 8, and the specific steps include:
s1: data is prepared.
Specifically, to obtain the encoded data, the code streams of 3 different types of video sequences are statistically analysed with VTM5.0, the official VVC test software. The main encoding parameters are shown in Table 1: 10 frames of each of the 3 sequences are encoded at four quantization parameters (QPs) using the encoder_lowdelay_P_vtm configuration of VVC.
S2: and performing inter-frame prediction on the current CU, acquiring the optimal MV of the current PU, reserving the vector value of the optimal MV, and finding the corresponding optimal reference block.
S3: gather the four-direction gradient statistics of the reference block. Calculate the sums SumGradV, SumGradH, SumGradG1 and SumGradG2 of the four directional gradients over all pixels in the reference block. Then compute the difference between the maximum MaxGrad and the minimum MinGrad of the four gradient sums and normalize it, obtaining GradRatio, the ratio of this difference to the maximum gradient sum. GradRatio reflects the balance and uniformity of the directional gradients: the smaller the value, the more evenly the directional gradients are distributed. Meanwhile, the absolute magnitude of the gradient in each direction must also be kept within a certain range, indicating that pixel values change slowly along every gradient direction; otherwise, the image pixels change more sharply.
S4: calculate the maximum average pixel gradient value AvgGrad of the current CU's reference block to describe the gradient change along the direction of the maximum gradient. The thresholds α and β are empirical values obtained through extensive experiments and continuous tuning. If the GradRatio of the current CU is smaller than α and its AvgGrad is smaller than β, the pixel values of the current CU region change gently and the image is relatively flat, so partitioning of the current block can be skipped; otherwise, the current region changes sharply, contains more local detail and larger differences between image parts, and further partitioning is needed.
S5: the modes are possibly divided by calculating the GradRatio and AvgGrad of the sub-blocks divided in each mode. For the trifurcate division and the binary division, if the sub-blocks exist and meet the condition that the GradRatio is greater than a set threshold value alpha or the AvgGrad is greater than beta, the current division mode is added into a candidate division mode list; otherwise, the current partition mode is skipped. And finally, screening the candidate partition modes by RDcost to obtain the optimal partition mode.
S6: entering current PU intra prediction, traversing 67 intra prediction modes, skipping new angle modes in VVC, performing SATD calculation only on 35 prediction modes in HEVC, selecting several modes with the smallest SATD value, and adding them into a mode list CandList, as shown in fig. 9.
S7: for the selected patterns, if the patterns are between patterns 2-66, the SATD values of each pattern and its adjacent two patterns are compared, and the pattern with the smallest SATD value is selected and added to the pattern list CandList.
S8: and constructing an MPM according to adjacent blocks at the left side and the upper side of the current block, traversing several modes in an MPM list, and adding the modes in the MPM into CandList if the modes in the MPM are not included in the CandList.
S9: the MIP module and the ISP module are executed separately to update the candidate list CandList.
S10: acquiring four gradient values of ALF filtering corresponding to each pixel point in a reference block, and dividing the values into four gradient directions: hor, Ver, Diag1, Diag 2. The size of the reference block must be the same as the size of the current PU, so that when traversing each pixel point of the reference block, the width and height of the current PU are used as the maximum values of the loop.
S11: and counting the sum of the ALF filtering gradient values in each direction in the reference block, selecting the ALF filtering gradient direction with the minimum value MinGrad of the sum of the gradient values as a candidate direction mode of the current PU intra-frame prediction, and storing.
S12: the range of prediction directions for the current PU is calculated. And judging whether the direction mode Dir selected in the last step is overlapped with the angle mode in the CandList candidate mode list. If the prediction direction is not repeated, directly taking a default prediction mode in the CandList list as the range of the prediction direction; if the selected gradient direction is in the CandList list, then the gradient direction Dir is selected as one of the candidate modes: at this time, if the gradient direction Dir selected in the previous step is Diag2, that is, the angle modes corresponding to the mode indexes 2 and 66, the prediction direction ranges of the current PU are [2,4] and [64,66 ]; otherwise, the prediction range of the current PU is the minimum value Dir-2 and the maximum value Dir + 2.
S13: within the prediction range, only the candidate modes in CandList are retained, the final candidate prediction mode range is determined, and the optimal intra-frame prediction mode is selected.
S14: the optimal prediction mode and partition mode of the current CU are determined, and predictive coding proceeds to the next CU.
For the encoder settings, the test video sequences were the officially recommended sequences with 4:2:0 sampling format, encoded with the default lowdelay_P configuration. Encoding performance is evaluated mainly by two indexes, BDBR (Bjøntegaard Delta Bit Rate) and TS, taking the original VTM5.0 encoder as the reference. BDBR represents the bit-rate difference between two coding methods at the same objective quality; it is obtained by encoding the same video at four QP values (22, 27, 32 and 37) and computing the bit rate and PSNR at each. BDBR jointly reflects the bit rate and the quality of the video, and represents the percentage of bit rate that the better coding method can save at the same objective quality. In general, a negative value means the bit rate is reduced and performance improves at the same PSNR, while a positive value indicates an increased bit rate and decreased performance. TS measures how much the fast algorithm reduces the encoding time relative to the original encoder, and is calculated as follows:
TS = (T_O - T_p) / T_O × 100%

where T_p is the total encoding time of VTM5.0 with the fast algorithm embedded, and T_O is the total encoding time of the original encoder VTM5.0. The results of the experiments are shown in Tables 2 and 3.
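The TS calculation can be expressed directly as a one-line helper (the sample timings below are illustrative, chosen to match the Table 2 average):

```python
def time_saving(t_fast, t_orig):
    """TS = (T_O - T_p) / T_O * 100: percentage of total encoding time
    saved by the fast algorithm (T_p) relative to the original encoder (T_O)."""
    return (t_orig - t_fast) / t_orig * 100.0
```

For example, an original run of 100 s cut to 88.54 s gives a TS of 11.46%.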
TABLE 2 Intra-frame mode fast selection experimental results
Sequence of Y(BDBR) TS
FourPeople 1.83% 10.21%
BasketballPass 1.27% 11.86%
RaceHorses 1.42% 12.31%
Average 1.51% 11.46%
TABLE 3 fast decision-making experimental results for inter-frame CU partition modes
[Table 3 appears only as an image in the original publication; its contents are not reproduced here.]
According to the experimental results, the intra-frame prediction mode fast selection algorithm and the inter-frame CU fast partition decision algorithm of the invention increase the Y-component BDBR by 1.51% and 1.42% on average, respectively, indicating that the bit rate does not increase significantly and the compression performance of the encoder is effectively preserved; the encoding time is reduced by 11.46% and 12.7% respectively compared with the original encoder, lowering the encoding complexity.

Claims (5)

1. A prediction mode fast selection method based on ALF filtering in multifunctional video coding is characterized by comprising the following steps:
S1: obtaining coded data, wherein the coded data is obtained through statistical analysis of a video sequence bit stream encoded by VTM5.0;
S2: performing inter-frame prediction on the current CU to obtain the optimal MV of the current PU, retaining the vector value of the optimal MV, and determining the corresponding optimal reference block;
S3: performing statistical analysis on the gradient information of the optimal reference block;
S4: calculating the maximum average pixel gradient value AvgGrad of the reference block of the current CU;
S5: obtaining the optimal partition mode;
S6: entering intra-frame prediction of the current PU, traversing the 67 intra-frame prediction modes while skipping the angular modes newly added in VVC, performing SATD calculation on the 35 prediction modes of HEVC, and determining the modes in a mode list CandList;
S7: if a mode in the mode list CandList is between modes 2 and 66, comparing the SATD value of the mode with those of its two adjacent modes, and adding the mode with the minimum SATD value to the mode list CandList;
S8: constructing an MPM list according to the adjacent blocks on the left side and the upper side of the current block, traversing the modes in the MPM list, and adding any mode of the MPM list not already included in CandList to CandList;
S9: executing the MIP module and the ISP module respectively, and updating the candidate list CandList;
S10: acquiring the four gradient values of ALF filtering corresponding to each pixel point in the reference block;
S11: counting the sum of the ALF filtering gradient values in each direction within the reference block, selecting the ALF filtering gradient direction with the minimum sum of gradient values as a candidate direction mode for intra-frame prediction of the current PU, and storing it;
S12: calculating the range of the prediction direction of the current PU;
S13: retaining the candidate modes in CandList, determining the final candidate prediction mode range, and selecting the optimal intra-frame prediction mode;
S14: determining the optimal prediction mode and partition mode of the current CU, and performing predictive coding on the next CU.
2. The method of claim 1, wherein in step S3, the gradient information comprises four directions.
3. The method of claim 2, wherein the statistical analysis comprises: respectively calculating the sums SumGradV, SumGradH, SumGradG1 and SumGradG2 of the gradients of each pixel in the optimal reference block in the four directions, then calculating the difference between the maximum value MaxGrad and the minimum value MinGrad of the four gradient sums, and performing normalization to obtain the ratio GradRatio of the difference to the maximum value MaxGrad.
4. The method of claim 1, wherein in step S6, the modes in said mode list CandList are the modes with the minimum SATD value.
5. The method of claim 1, wherein in step S10, the four gradient values of the ALF filtering correspond to four gradient directions: Hor, Ver, Diag1, Diag2.
CN202010331912.2A 2020-04-24 2020-04-24 Prediction mode rapid selection method based on ALF filtering in multifunctional video coding Active CN111586405B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010331912.2A CN111586405B (en) 2020-04-24 2020-04-24 Prediction mode rapid selection method based on ALF filtering in multifunctional video coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010331912.2A CN111586405B (en) 2020-04-24 2020-04-24 Prediction mode rapid selection method based on ALF filtering in multifunctional video coding

Publications (2)

Publication Number Publication Date
CN111586405A true CN111586405A (en) 2020-08-25
CN111586405B CN111586405B (en) 2022-03-29

Family

ID=72125106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010331912.2A Active CN111586405B (en) 2020-04-24 2020-04-24 Prediction mode rapid selection method based on ALF filtering in multifunctional video coding

Country Status (1)

Country Link
CN (1) CN111586405B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024383A (en) * 2012-12-14 2013-04-03 北京工业大学 Intra-frame lossless compression coding method based on HEVC (high efficiency video coding) frame
CN103338371A (en) * 2013-06-07 2013-10-02 东华理工大学 Fast and efficient video coding intra mode determining method
CN110519591A (en) * 2019-08-29 2019-11-29 中南大学 A kind of prediction mode fast selecting method based on intraframe coding in multipurpose coding
WO2020009514A1 (en) * 2018-07-06 2020-01-09 한국전자통신연구원 Image encoding/decoding method and device, and recording medium in which bitstream is stored

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MOURAD AKLOUF et al.: "Low Complexity Versatile Video Coding (VVC) for Low Bitrate Applications", 2019 8th European Workshop on Visual Information Processing (EUVIP) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449273A (en) * 2020-11-06 2022-05-06 北京大学 HEVC-based enhanced block partitioning search method and device
CN113014925A (en) * 2021-01-27 2021-06-22 重庆邮电大学 H.266/VVC inter-frame coding CU fast dividing method and storage medium
CN115866247A (en) * 2023-03-02 2023-03-28 中南大学 Video coding intra-frame prediction method and system based on MAE pre-training model
CN115866247B (en) * 2023-03-02 2023-05-09 中南大学 Video coding intra-frame prediction method and system based on MAE pre-training model

Also Published As

Publication number Publication date
CN111586405B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN105959706B (en) Image encoding device and method, and image decoding device and method
TWI717309B (en) Image encoding device, image decoding device and recording medium
KR101562158B1 (en) Moving image encoding device, moving image decoding device, moving image coding method, and moving image decoding method
JP6082073B2 (en) Image decoding apparatus, image decoding method, image encoding apparatus, image encoding method, and bit stream
KR102026519B1 (en) Image encoding device, image decoding device, image encoding method, image decoding method and recording medium
CN111586405B (en) Prediction mode rapid selection method based on ALF filtering in multifunctional video coding
KR20140093247A (en) Method and device for optimizing encoding/decoding of compensation offsets for a set of reconstructed samples of an image
JP5795525B2 (en) Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, and image decoding program
CN110365982B (en) Multi-transformation selection accelerating method for intra-frame coding in multipurpose coding
CN110446036B (en) Coding unit rapid partitioning method based on intra-frame coding in multipurpose coding
CN110519591B (en) Method for quickly selecting prediction mode based on intra-frame coding in multipurpose coding
CN111541896B (en) VVC-based intra-frame prediction mode optimization method and system
WO2014049981A1 (en) Video encoding device, video decoding device, video encoding method and video decoding method
KR20200039591A (en) Method and apparatus for encoding/decoding image and recording medium for storing bitstream
JPWO2010137322A1 (en) Image encoding device, image decoding device, image encoding method, and image decoding method
KR100771641B1 (en) Fast mehtod of determinning h.264 mode
JP2014090326A (en) Moving image encoder, moving image decoder, moving image encoding method and moving image decoding method
JP2014090327A (en) Moving image encoder, moving image decoder, moving image encoding method and moving image decoding method
JPWO2013001720A1 (en) Image encoding device, image decoding device, image encoding method, and image decoding method
WO2013108882A1 (en) Video encoding apparatus, video decoding apparatus, video encoding method, and video decoding method
WO2014049982A1 (en) Video encoding device, video decoding device, video encoding method and video decoding method
KR102140271B1 (en) Fast intra coding method and apparatus using coding unit split based on threshold value
KR20120100742A (en) Methods of coding usnig skip mode and apparatuses for using the same

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant