CN113709492A

CN113709492A - SHVC (scalable video coding) spatial scalable video coding method based on distribution characteristics

Info

Publication number: CN113709492A
Application number: CN202110978616.6A
Authority: CN
Inventors: 汪大勇; 解乐乐; 王欣; 王倩敏; 宋丽娟
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Guangzhou Dayu Chuangfu Technology Co ltd
Priority date: 2021-08-25
Filing date: 2021-08-25
Publication date: 2021-11-26
Anticipated expiration: 2041-08-25
Also published as: CN113709492B

Abstract

The invention belongs to the field of SHVC video coding, and particularly relates to an SHVC spatial scalable video coding method based on distribution characteristics. The method comprises: acquiring the depth of the current coding unit (CU) and judging, according to that depth, whether to skip it; if the depth is skipped, selecting the next depth for coding; if not, judging whether the current mode is the optimal mode, and if it is not, predicting the optimal mode using the directional modes; then judging, according to the optimal mode, whether to terminate the division; if so, outputting the division result, and if not, selecting the next depth for coding. The invention predicts the directional mode with a variable-step halving search, which avoids the inefficiency of performing the Hadamard transform on directional modes that are unlikely to be selected.

Description

SHVC (scalable video coding) spatial scalable video coding method based on distribution characteristics

Technical Field

The invention belongs to the field of SHVC (scalable video coding) video coding, and particularly relates to an SHVC spatial scalable video coding method based on distribution characteristics.

Background

Technologies such as digital television broadcasting, video conferencing, wireless video streaming and smartphone communication are increasingly widely used in daily life, which has produced many different kinds of terminal devices. These devices may have different screen resolutions, so a video stream must adapt to the resolution of each screen; scalable high efficiency video coding (SHVC) is an effective approach to this problem. SHVC consists of a base layer (BL) and one or more enhancement layers (EL). As shown in FIG. 1, to accommodate different screen resolutions, spatial SHVC (SSHVC) encodes the different layers from sequences of different resolutions, and by selecting the appropriate layer, SSHVC can serve devices with various screen resolutions.

In SSHVC consisting of one BL and one or more ELs, the BL uses only intra-layer prediction, while the EL additionally uses inter-layer prediction; intra prediction coding in the EL is the same as in HEVC. Since the BL and EL have the same content but different resolutions, the BL must be upsampled for prediction, i.e., inter-layer prediction; the corresponding mode is denoted the inter-layer reference (ILR) mode. The coding process of HEVC is already very complex, and SSHVC must code all of its layers, so its coding process is more complex still, which limits its wide application, especially in wireless and real-time scenarios. Therefore, it is important to reduce the encoding complexity and increase the encoding speed.

Some existing algorithms can improve the coding speed to some extent, but the spatial SHVC still has some problems to be solved: 1. texture features and correlation are commonly used to predict candidate depths; however, their connection to depth selection is not straightforward; therefore, using them alone to predict depth selection does not guarantee optimal performance; 2. to increase the coding speed, the mode selection is usually terminated early with residual coefficients; without studying the principle behind it, early termination mode selection using only residual coefficients does not lead to the best performance.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a distribution characteristic-based SHVC spatial scalable video coding method, which comprises the following steps:

S1: acquiring the depth of the current coding unit (CU); if the current CU depth is 1 or 2, determining whether to skip the current depth according to the residual coefficients of the enhancement layer ILR mode; if so, performing step S5, otherwise performing step S2;

S2: judging whether the ILR mode of the current CU is the optimal mode by adopting a GMM-EM method; if so, executing step S4, otherwise executing step S3;

S3: performing intra-frame prediction on the mode of the current CU to obtain the optimal mode of the current CU;

S4: judging whether the current CU continues to be divided according to the residual coefficients of the optimal mode, and outputting the division result of the CU if the division stops; if the current CU continues to be divided, acquiring the depth of the current CU; if the current depth is 3, directly exiting the division to obtain the final division result; if the current depth is not 3, executing step S5;

S5: dividing the current CU into four sub-CUs, and performing steps S1 to S4 on each of the four sub-CUs.

Preferably, the process of determining whether to skip the current depth according to the residual coefficient includes: coding a current coding unit CU to obtain a residual coefficient map of an enhancement layer ILR mode; dividing the residual coefficient graph to obtain a first residual coefficient graph and a second residual coefficient graph; respectively calculating the expectation and variance of the first residual coefficient map and the second residual coefficient map, judging whether the expectation and variance of the first residual coefficient map are different from the expectation and variance of the second residual coefficient map, if the expectation and variance of the first residual coefficient map are different from the expectation and variance of the second residual coefficient map, skipping the current depth, otherwise, not skipping the current depth.

Further, the process of calculating the expectation and variance of a residual coefficient map comprises: taking the coefficients of each divided residual coefficient map as residual coefficient samples, on the assumption that each coefficient follows a Gaussian distribution; obtaining the probability density function and the corresponding likelihood function of the samples by maximum likelihood estimation; and obtaining the expectation and the variance of the divided residual coefficient map from the probability density function and the likelihood function.

Further, the process of determining whether the expectation and variance of the first residual coefficient map are different from the expectation and variance of the second residual coefficient map includes: inputting the expectation and the variance of the first residual coefficient graph into a judgment condition to obtain a first judgment result; inputting the expectation and the variance of the second residual coefficient map into a judgment condition to obtain a second judgment result; and comparing the first judgment result with the second judgment result to obtain a judgment result.

Further, the judgment conditions are as follows:

wherein the quantities involved are the mean value of the samples, the expectation $\mu_1$, the standard deviation $\sigma_1$, the number $n$ of residual coefficients in each part, and the threshold $s_\alpha$.

Preferably, the step of determining whether the ILR mode of the current CU is the optimal mode by using the GMM-EM method includes: saving the coding modes and rate-distortion costs of the CUs at each depth in the previous frame and the current frame; obtaining, from these saved values, the coding modes and rate-distortion costs of the CUs adjacent to each CU in the previous frame and the current frame; storing the rate-distortion costs belonging to the ILR mode in rec0 and those belonging to the Intra mode in rec1; after the current CU finishes coding in the ILR mode, obtaining the rate-distortion cost of the ILR mode for the current CU according to the rate-distortion costs stored in rec0 and rec1; performing GMM conversion on the rate-distortion cost of the ILR mode of the current CU to obtain the rate-distortion-based probability; obtaining the number-based probability of the current CU according to the coding modes of the adjacent CUs; predicting the probability that the current CU adopts the ILR mode from the rate-distortion-based probability and the number-based probability; and judging whether the ILR mode of the current CU is the optimal mode according to the resulting probability.

Preferably, the intra prediction of the mode of the current CU includes: predicting the mode of the current CU by a method based on directional modes (DMs); a CU has 35 DMs in total; the process of predicting the mode of the CU includes:

step 1: selecting DM0, DM1, DM10 and DM26 among the directional modes for Hadamard transformation; taking the smaller Hadamard cost of DM0 and DM1 as HC1 and the smaller Hadamard cost of DM10 and DM26 as HC2; comparing HC1 and HC2; if HC1 is smaller than HC2, the optimal DM is among DM0 and DM1, and step 10 is executed; otherwise, step 2 is executed;

step 2: comparing the Hadamard costs of DM10 and DM26; if the Hadamard cost of DM10 is less than that of DM26, executing step 3; if the Hadamard cost of DM10 is greater than that of DM26, executing step 5; otherwise, executing step 7;

step 3: checking DM8, DM9, DM11 and DM12, and judging whether an LMD exists among DM9, DM11 and DM12; if so, that mode is the optimal directional mode, and step 10 is executed; otherwise, step 4 is executed;

step 4: checking DM2, DM6, DM14 and DM18; if the DM with the minimum Hadamard cost is not among DM2, DM6, DM8, DM12, DM14 and DM18, directly executing step 10; otherwise, obtaining the optimal DM by a binary search method, and executing step 9;

step 5: checking DM24, DM25, DM27 and DM28; if an LMD exists among DM24, DM25 and DM27, that DM is the optimal DM, and step 10 is executed; otherwise, step 6 is executed;

step 6: checking DM18, DM22, DM30 and DM34; if the DM with the minimum Hadamard cost is not among DM18, DM22, DM8, DM24, DM28, DM30 and DM34, directly executing step 10; otherwise, obtaining the optimal DM by a binary search method, and executing step 9;

step 7: checking the other DMs except DM10 and DM26; if an LMD exists among DM9, DM10, DM11, DM25, DM26 and DM27, that DM is the optimal DM, and step 10 is executed; otherwise, step 8 is executed;

step 8: checking DM2, DM6, DM14, DM18, DM22, DM30 and DM34; if the DM with the minimum Hadamard cost is not among DM2, DM6, DM8, DM12, DM14, DM18, DM22, DM24, DM28, DM30 and DM34, directly executing step 10; otherwise, obtaining the optimal DM by a binary search method, and executing step 9;

step 9: checking the DM midway between the DM with the minimum Hadamard cost and its neighboring checked DM on the left (right), and selecting the DM with the minimum Hadamard cost; repeating this process until the selected DM is an LMD, which is the optimal DM; then executing step 10;

step 10: the DM selection terminates.

Preferably, the process of determining whether the current CU continues to be divided according to the residual coefficient of the optimal mode is the same as the process of determining whether the current depth is skipped according to the residual coefficient of the enhancement layer ILR mode.

The invention has the advantages that: the Hadamard transform is terminated in advance according to the significance difference, and the variable step length halving search method is adopted to predict and select the directional mode of the CU, so that the problem of low efficiency caused by the Hadamard transform of the directional mode with low possibility is solved.

Drawings

FIG. 1 is a diagram of the base layer and the enhancement layer in conventional SHVC;

FIG. 2 is a flow chart of the algorithm of the present invention;

FIG. 3 is a graph of a residual coefficient map according to the present invention;

FIG. 4 is a diagram of a neighboring CU structure of the present invention;

FIG. 5 is a schematic view of all directional modes of the present invention;

FIG. 6 is a diagram illustrating the prediction results of the type 1 directional mode of the present invention;

FIG. 7 is a diagram illustrating the results of the variable step size check and binary search of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The original encoding method comprises: coding the whole coding unit (CU) to obtain its rate-distortion cost (RDcost), denoted C1; dividing the CU into 4 sub-CUs of depth 1, coding each sub-CU to obtain its optimal RDcost, and summing the optimal RDcosts of all the depth-1 sub-CUs to obtain C2; comparing C1 with C2 and taking the smaller value as the optimal RDcost; if C1 is smaller than C2, the current CU is not divided, otherwise the division continues; repeating the above process until the depth of the current CU is 3, at which point the division stops, finally giving the optimal RDcost at depths 0, 1, 2 and 3.
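The exhaustive procedure described above can be sketched as a recursion. This is only an illustration under stated assumptions, not the reference encoder: `rd_cost` is a hypothetical callback that returns the RD cost of coding a block whole, and the `(x, y, size)` block tuple is an invented representation.

```python
from typing import Callable, Tuple

Block = Tuple[int, int, int]  # (x, y, size); hypothetical representation

MAX_DEPTH = 3  # depths 0..3 correspond to 64x64 .. 8x8 CUs


def best_rd_cost(block: Block, depth: int,
                 rd_cost: Callable[[Block, int], float]) -> Tuple[float, bool]:
    """Return (optimal RD cost, split flag) for `block` at `depth`."""
    c1 = rd_cost(block, depth)  # C1: cost of coding the CU whole
    if depth == MAX_DEPTH:
        return c1, False  # a depth-3 CU is never divided further
    x, y, size = block
    half = size // 2
    subs = [(x, y, half), (x + half, y, half),
            (x, y + half, half), (x + half, y + half, half)]
    # C2: sum of the optimal costs of the four sub-CUs
    c2 = sum(best_rd_cost(s, depth + 1, rd_cost)[0] for s in subs)
    if c1 < c2:
        return c1, False  # keep the CU whole
    return c2, True  # dividing is at least as cheap


if __name__ == "__main__":
    # Toy cost that grows with the cube of the block size, so splitting wins.
    print(best_rd_cost((0, 0, 64), 0, lambda b, d: float(b[2] ** 3)))
```

Every CU at every depth is coded in this scheme, which is the cost the skip and early-termination tests of the invention aim to avoid.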

A SHVC spatial scalable video coding method based on distribution characteristics, as shown in fig. 2, the method comprising:

S1: acquiring the depth of the current coding unit (CU); if the depth of the current CU is 1 or 2, judging whether to skip the current depth according to the residual coefficients of the enhancement layer ILR mode; if so, executing step S5, otherwise executing step S2;

S2: judging whether the ILR mode of the current CU is the optimal mode by adopting the GMM-EM method; if so, executing step S4, otherwise executing step S3;

S3: performing intra-frame prediction on the mode of the current CU, namely performing candidate directional mode prediction on the current CU by a method based on directional modes (DMs), to obtain the optimal mode of the current CU;

S4: judging the depth of the current CU according to the optimal mode; if the current depth is 0, executing step S5; if the current depth is 1 or 2, judging whether to terminate the CU division according to the final residual coefficients; if so, outputting the CU division result, and if not, executing step S5; if the current depth is 3, directly exiting the division to obtain the final division result;

S5: dividing the current CU into four sub-CUs, and performing steps S1 to S4 on each of the four sub-CUs.
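Steps S1 to S5 of the method can be read as a recursive control flow. The sketch below only traces which branch is taken; the three callbacks `skip_depth`, `ilr_is_best` and `stop_split` are hypothetical stand-ins for the residual-coefficient test, the GMM-EM test and the termination test, and the `trace` strings are invented for illustration.

```python
from typing import Callable, List


def partition_cu(depth: int,
                 skip_depth: Callable[[int], bool],
                 ilr_is_best: Callable[[int], bool],
                 stop_split: Callable[[int], bool],
                 trace: List[str]) -> None:
    """Walk steps S1 to S5 for one CU and log each decision in `trace`."""
    # S1: only depths 1 and 2 can be skipped outright.
    if depth in (1, 2) and skip_depth(depth):
        trace.append(f"d{depth}:skip")
        split = True
    else:
        # S2: GMM-EM test; S3 (intra DM search) runs only if ILR is not optimal.
        trace.append(f"d{depth}:ilr" if ilr_is_best(depth) else f"d{depth}:intra")
        # S4: depth 0 always divides, depth 3 never does.
        if depth == 0:
            split = True
        elif depth == 3:
            split = False
        else:
            split = not stop_split(depth)
    if split:
        # S5: repeat S1 to S4 on the four sub-CUs.
        for _ in range(4):
            partition_cu(depth + 1, skip_depth, ilr_is_best, stop_split, trace)
    else:
        trace.append(f"d{depth}:leaf")


if __name__ == "__main__":
    log: List[str] = []
    partition_cu(2, lambda d: True, lambda d: True, lambda d: True, log)
    print(log)
```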

The process of judging whether to skip the current depth according to the residual coefficients comprises the following steps: coding the current CU, and obtaining the residual coefficient map of the CU after the ILR mode is finished; dividing the residual coefficient map to obtain a first residual coefficient map and a second residual coefficient map; respectively calculating the expectation and variance of the first residual coefficient map and of the second residual coefficient map; judging, by a hypothesis testing method, whether the expectation and variance of the two maps are different; skipping the current depth if they are different, and not skipping the current depth if they are not different.

Optionally, the manner of dividing the residual coefficient map includes dividing the residual coefficient map left and right to obtain a left half residual coefficient map and a right half residual coefficient map; and respectively calculating expectation and variance of the left half residual coefficient graph and the right half residual coefficient graph, judging whether the two parts have significant difference according to a hypothesis testing method, and skipping the current coding depth if the two parts have significant difference.

Optionally, the manner of dividing the residual coefficient map includes dividing the residual coefficient map up and down to obtain an upper half residual coefficient map and a lower half residual coefficient map; and respectively calculating expectation and variance of the residual coefficient map of the upper half part and the residual coefficient map of the lower half part, judging whether the two parts have significant difference according to a hypothesis testing method, and skipping the current coding depth if the two parts have significant difference.

Preferably, the manner of dividing the residual coefficient map includes dividing the residual coefficient map left and right or up and down to obtain a left half residual coefficient map and a right half residual coefficient map or an upper half residual coefficient map and a lower half residual coefficient map; and respectively calculating expectation and variance of the left half residual coefficient map and the right half residual coefficient map or the upper half residual coefficient map and the lower half residual coefficient map, judging whether the two parts have significant difference according to a hypothesis testing method, and skipping the current coding depth if the two parts have significant difference.

In SHVC, each CTU includes four depths, corresponding to Coding Units (CUs) of size 64 × 64 to 8 × 8, and each CU needs to check an ILR mode and an intra mode and then select a mode with a low Rate Distortion (RD) cost as a best mode. The corresponding mode distribution between inter-layer (ILR) mode and Intra (Intra) mode is shown in table 1:

Sequence	ILR	Intra
Blue-sky	98.50%	1.50%
Ducks	99.76%	0.24%
Park_Joy	99.07%	0.93%
Pedestrian	90.65%	9.35%
Tractor	97.68%	2.32%
Town	96.84%	3.16%
Station2	95.27%	4.73%
Average	96.82%	3.18%

TABLE 1 ILR mode and Intra mode distributions

As can be seen from Table 1, the average percentage of the ILR mode is 96.82%, i.e., most CUs select the ILR mode as the best mode. This is because the content in the BL and the EL is the same, the QPs in the BL and the EL are similar or even identical, and the inter-layer correlation is therefore strong; moreover, the ILR prediction can be obtained directly by upsampling the co-located CU in the BL, which is a simple process. Therefore, in the invention, the residual coefficients of the ILR mode are obtained first, and whether the current CU needs to be further divided is then judged according to these residual coefficients; if so, the current depth can be skipped directly; otherwise, the ILR mode and the Intra mode need to be further coded.

First, the residual coefficient map of the CU is divided into upper and lower parts, as shown in FIG. 3(a), and into left and right parts, as shown in FIG. 3(b). If there is a significant difference between the two parts of either partition, the current CU needs to be further divided. If the CU can be predicted accurately with the best mode, the corresponding residual coefficients will follow a Gaussian distribution. In that case, the residual coefficients of one part of each partition are modeled as:

$$X \sim N(\mu_1, \sigma_1^2)$$

where X represents the residual coefficient set of one part (here, the upper half or the left half), and $\mu_1$ and $\sigma_1^2$ are respectively the expectation and variance of that part.

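If $x_1, x_2, \ldots, x_n$ are the samples in the residual coefficient set X, maximum likelihood estimation (MLE) is used to estimate the parameters. The probability density function of each sample is:

$$f(x;\mu_1,\sigma_1^2)=\frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right)$$

where $x$ denotes a sample in the residual coefficient set X, and $\mu_1$ and $\sigma_1^2$ are respectively the expectation and variance of the residual coefficient set.

The likelihood function corresponding to the probability density function is:

$$L(\mu_1,\sigma_1^2)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left(-\frac{(x_i-\mu_1)^2}{2\sigma_1^2}\right)$$

where $L$ denotes the likelihood function and $n$ denotes the number of residual coefficients in each part. To obtain $\mu_1$ and $\sigma_1^2$, the partial derivatives of $\ln L$ are set to zero:

$$\frac{\partial \ln L}{\partial \mu_1}=\frac{1}{\sigma_1^2}\sum_{i=1}^{n}(x_i-\mu_1)=0,\qquad \frac{\partial \ln L}{\partial \sigma_1^2}=-\frac{n}{2\sigma_1^2}+\frac{1}{2\sigma_1^4}\sum_{i=1}^{n}(x_i-\mu_1)^2=0$$

Solving these equations gives:

$$\mu_1=\bar{x},\qquad \sigma_1^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

where $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$ denotes the average of the samples.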

If Y is the residual coefficient set of the other part, $y_1, y_2, \ldots, y_n$ are the samples in that set, and the average of these samples is taken as its sample mean, the condition for judging whether the two parts differ significantly is as follows:

where $n$ is the number of residual coefficients in each part and $\alpha$ is the significance level; for any $\alpha$, the corresponding threshold $s_\alpha$ can be obtained from the Gaussian distribution table. If the condition is satisfied, the two parts differ significantly.
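The parameter estimation and the significance check can be illustrated with a short sketch. The exact test statistic is not reproduced in this text, so the code assumes a one-sample z-test that compares the sample mean of the second part against the expectation and standard deviation fitted to the first part; the function names and the `>=` comparison are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def gaussian_mle(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood estimates (mean, variance) of a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n  # divides by n, not n - 1
    return mean, var


def z_statistic(part_x: Sequence[float], part_y: Sequence[float]) -> float:
    """Assumed statistic: |mean(Y) - mu_1| / (sigma_1 / sqrt(n))."""
    mu1, var1 = gaussian_mle(part_x)
    n = len(part_y)
    y_bar = sum(part_y) / n
    if var1 == 0.0:
        # Degenerate first part: any deviation counts as infinitely significant.
        return 0.0 if y_bar == mu1 else math.inf
    return abs(y_bar - mu1) / math.sqrt(var1 / n)


def differs_significantly(part_x: Sequence[float], part_y: Sequence[float],
                          s_alpha: float) -> bool:
    """True when the assumed statistic reaches the threshold s_alpha."""
    return z_statistic(part_x, part_y) >= s_alpha


if __name__ == "__main__":
    top = [1.0, 2.0, 3.0, 4.0]
    bottom = [3.0, 4.0, 5.0, 6.0]
    print(gaussian_mle(top), z_statistic(top, bottom))
```

In an encoder, `part_x` and `part_y` would be the two halves of the residual coefficient map, flattened to one-dimensional sequences.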

Since the probability of different depths being skipped may be different, an optimal threshold for each depth needs to be selected. And for the depth 2, a common value is adopted for detection, and the coding efficiency corresponding to the value is high. In order to improve the coding efficiency, the maximum value of 3.49 in the gaussian distribution table is selected, and the multiple of the maximum value is further tested, and the corresponding coding efficiency is shown in table 2.

TABLE 2 coding efficiency under different test values

In Table 2, the coding efficiency is represented by BDBR, which measures the bit rate difference at the same peak signal-to-noise ratio (PSNR) in the EL. A positive or negative BDBR reflects a loss or gain in coding efficiency, respectively. As can be seen from Table 2, when the test value is greater than or equal to 20.94, the BDBRs of all the sequences are less than 0.1%. Therefore, the test value 20.94 is selected as the threshold for depth 2. Likewise, the threshold for depth 1 is 31.41. If depth 0 is skipped, the corresponding coding efficiency is significantly reduced in some sequences; therefore, depth 0 is not skipped. The depth skip condition is:

where depth denotes the depth of the current CU.
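The per-depth thresholds can be collected in a small lookup. The values 31.41 and 20.94 come from the text (they equal 9 × 3.49 and 6 × 3.49); the function name, the `>=` comparison and the convention of passing one statistic per partition (upper/lower and left/right) are assumptions for illustration.

```python
from typing import Iterable

# Thresholds reported in the text: depth 1 -> 31.41, depth 2 -> 20.94.
DEPTH_THRESHOLDS = {1: 31.41, 2: 20.94}


def should_skip_depth(depth: int, statistics: Iterable[float]) -> bool:
    """Skip the depth if any partition's test statistic reaches the threshold.

    Depths without a threshold (0 and 3) are never skipped.
    """
    threshold = DEPTH_THRESHOLDS.get(depth)
    if threshold is None:
        return False
    return any(s >= threshold for s in statistics)


if __name__ == "__main__":
    # One statistic per partition; both values are made up.
    print(should_skip_depth(2, [5.0, 21.0]), should_skip_depth(1, [5.0, 21.0]))
```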

The process of judging whether the ILR mode of the current CU is the optimal mode by adopting the GMM-EM method comprises the following steps: saving the coding modes and rate-distortion costs of the CUs at each depth in the previous frame and the current frame; obtaining, from these saved values, the coding modes and rate-distortion costs of the CUs adjacent to each CU in the previous frame and the current frame; storing the rate-distortion costs belonging to the ILR mode in rec0 and those belonging to the Intra mode in rec1; obtaining the rate-distortion cost of the ILR mode for the current CU according to the rate-distortion costs stored in rec0 and rec1; performing GMM conversion on the rate-distortion cost of the ILR mode of the current CU to obtain the rate-distortion-based probability; obtaining the number-based probability of the current CU according to the coding modes of the adjacent CUs; predicting the probability that the current CU adopts the ILR mode from the rate-distortion-based probability and the number-based probability; and judging whether the ILR mode of the current CU is the optimal mode according to the resulting probability. Specifically, the process of judging whether the ILR mode of the current CU is the optimal mode by using the GMM-EM method includes:

(1) First, a variable curCosts0 of type vector<pair<int, double>> is used to store the final coding mode and rate-distortion cost of each depth-0 CU of the current frame, and preCosts0, which has the same type as curCosts0, is used to store the final coding mode and rate-distortion cost of each depth-0 CU of the previous frame. Each time a frame finishes encoding, curCosts0 is copied to preCosts0 and curCosts0 is re-initialized. The same applies to depths 1, 2 and 3, so that the coding mode and rate-distortion cost of each coded CU at every depth in the previous frame and the current frame are obtained.
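The bookkeeping in (1) can be sketched as follows. The text describes per-depth variables such as curCosts0 and preCosts0; this sketch replaces them with one list per depth inside a hypothetical `CostHistory` class, and the encoding of the mode as an integer (0 for ILR, 1 for Intra) is an assumption.

```python
from typing import List, Tuple

Entry = Tuple[int, float]  # (coding mode, RD cost); 0 = ILR, 1 = Intra (assumed)


class CostHistory:
    """Per-depth (mode, RD cost) records for the current and previous frame."""

    def __init__(self, num_depths: int = 4) -> None:
        self.cur: List[List[Entry]] = [[] for _ in range(num_depths)]
        self.pre: List[List[Entry]] = [[] for _ in range(num_depths)]

    def record(self, depth: int, mode: int, rd_cost: float) -> None:
        """Store the final mode and cost of one coded CU of the current frame."""
        self.cur[depth].append((mode, rd_cost))

    def end_frame(self) -> None:
        """Move the current records to `pre` and start fresh for the next frame."""
        self.pre = self.cur
        self.cur = [[] for _ in self.pre]


if __name__ == "__main__":
    history = CostHistory()
    history.record(0, 0, 12.5)
    history.end_frame()
    print(history.pre[0], history.cur[0])
```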

(2) The adjacent CUs of the previous frame and the current frame of each CU can be obtained through (1), the coding modes and the rate-distortion costs of the adjacent CUs can be obtained, the rate-distortion costs belonging to the ILR mode are stored by using rec0, the rate-distortion costs belonging to the Intra mode are stored by using rec1, and the rate-distortion costs of the ILR mode adopted by the current CU can be obtained because the current CU completes the coding in the ILR mode at the moment.

The structure of the current CU and its neighboring CUs is shown in FIG. 4, where $U_0$ is the current CU; $U_1$, $U_2$, $U_3$ and $U_4$ are the neighboring CUs of the current CU; and $U_5$, $U_6$, $U_7$, $U_8$ and $U_9$ are the CUs in the previous frame co-located with $U_0$, $U_1$, $U_2$, $U_3$ and $U_4$.

(3) GMM conversion is performed on the information obtained in step (2) to obtain the rate-distortion-based probability. The specific process is as follows:

For any $U_i$, the rate-distortion cost is denoted $rd_i$, and the corresponding Gaussian mixture model is:

$$p(rd_i)=\pi_1 N(rd_i\mid\mu_1,\Sigma_1)+\pi_2 N(rd_i\mid\mu_2,\Sigma_2)$$

where $\pi_1$ is the probability of adopting the ILR mode, and $\mu_1$ and $\Sigma_1$ are respectively the expectation and variance of its rate-distortion cost; $\pi_2$ is the probability of adopting the Intra mode, and $\mu_2$ and $\Sigma_2$ are respectively the expectation and variance of its rate-distortion cost.

To obtain the six parameter values $\pi_1$, $\mu_1$, $\Sigma_1$, $\pi_2$, $\mu_2$ and $\Sigma_2$, maximum likelihood estimation is adopted, with the likelihood function:

$$\prod_{i=0}^{M-1}p(rd_i)=\prod_{i=0}^{M-1}\sum_{k=1}^{2}\pi_k N(rd_i\mid\mu_k,\Sigma_k)$$

where M represents the number of CUs comprising the current CU and its adjacent CUs, p represents the probability, $rd_i$ represents the rate-distortion cost and N represents the Gaussian distribution.

Preferably, the value of M is set to 10.

The logarithm of the likelihood function is:

$$\sum_{i=0}^{M-1}\ln\left(\sum_{k=1}^{2}\pi_k N(rd_i\mid\mu_k,\Sigma_k)\right)$$

Maximizing the log-likelihood with respect to $\mu_k$ gives:

$$\mu_k=\frac{1}{N_k}\sum_{i=0}^{M-1}\gamma(i,k)\,rd_i$$

and with respect to $\Sigma_k$:

$$\Sigma_k=\frac{1}{N_k}\sum_{i=0}^{M-1}\gamma(i,k)\,(rd_i-\mu_k)(rd_i-\mu_k)^{T}$$

where $\gamma(i,k)$ is the probability that the i-th sample is produced by the k-th component, T represents the transpose, and

$$N_k=\sum_{i=0}^{M-1}\gamma(i,k)$$

is the sum of the probabilities that all samples are generated by the k-th component. The probability of using each mode (k = 1 for the ILR mode) is:

$$\pi_k=\frac{N_k}{N}$$

where $N=N_1+N_2$ is the sum of the probability sums of the two modes, and $\gamma(i,k)$ is obtained from:

$$\gamma(i,k)=\frac{\pi_k N(rd_i\mid\mu_k,\Sigma_k)}{\sum_{j=1}^{2}\pi_j N(rd_i\mid\mu_j,\Sigma_j)}$$

The values of $\mu_k$, $\Sigma_k$, $\pi_k$ and $\gamma(i,k)$ are updated iteratively until $\gamma(i,k)$ converges.

Since the current CU is $U_0$, determining whether the ILR mode is the best mode requires determining whether $\gamma(0,k)$ has converged. Let the value of $\gamma(0,k)$ at the i-th iteration be denoted $\gamma_i(0,k)$. To avoid unnecessary iterations, the iteration can be terminated once the absolute difference between $\gamma_{i-1}(0,k)$ and $\gamma_i(0,k)$ is very small. With 0.01 selected as the threshold:

$$|\gamma_i(0,k)-\gamma_{i-1}(0,k)|\le 0.01$$

If this condition is met, the iteration is terminated directly. Through this process, the probability that the current CU selects the ILR mode is obtained; it is defined as the rate-distortion-based probability.
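The iteration with its 0.01 stopping rule can be sketched for scalar rate-distortion costs using the standard EM updates for a two-component Gaussian mixture. How the encoder initializes the components is not stated here, so the `init` argument (for example, statistics derived from rec0 and rec1) is an assumption, as are the function names, `max_iter` and the variance floor.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a scalar Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ilr_probability(costs: Sequence[float],
                    init: Sequence[Tuple[float, float, float]],
                    tol: float = 0.01, max_iter: int = 100,
                    var_floor: float = 1e-6) -> float:
    """Return gamma(0, ILR) for costs[0], the current CU.

    init holds (pi, mu, var) for the ILR component, then the Intra component.
    """
    params: List[List[float]] = [list(p) for p in init]
    prev: List[float] = []
    gamma: List[List[float]] = []
    for _ in range(max_iter):
        # E-step: responsibility of each component for each sample.
        gamma = []
        for x in costs:
            w = [pi * normal_pdf(x, mu, var) for pi, mu, var in params]
            total = sum(w)
            gamma.append([wk / total for wk in w] if total > 0.0 else [0.5, 0.5])
        # Stop once gamma(0, k) changes by at most tol between iterations.
        if prev and all(abs(g - p) <= tol for g, p in zip(gamma[0], prev)):
            break
        prev = gamma[0]
        # M-step: update pi_k, mu_k and the variance of each component.
        for k in range(2):
            n_k = sum(g[k] for g in gamma)
            if n_k <= 0.0:
                continue  # leave an empty component unchanged
            mu = sum(g[k] * x for g, x in zip(gamma, costs)) / n_k
            var = sum(g[k] * (x - mu) ** 2 for g, x in zip(gamma, costs)) / n_k
            params[k] = [n_k / len(costs), mu, max(var, var_floor)]
    return gamma[0][0]


if __name__ == "__main__":
    rd = [100.0, 105.0, 98.0, 102.0, 300.0, 310.0, 101.0, 99.0, 305.0, 103.0]
    print(ilr_probability(rd, [(0.5, 100.0, 25.0), (0.5, 300.0, 25.0)]))
```

The returned value plays the role of the rate-distortion-based probability for the current CU.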

(4) The number-based probability of the current CU is obtained from the coding modes of the neighboring CUs. The specific process is as follows:

Since neighboring CUs are usually similar, neighboring CUs are used for prediction. The more neighboring CUs use the ILR mode, the higher the probability that the current CU uses this mode, and vice versa. The probability that the current CU selects the ILR mode is proportional to the number of neighboring CUs using the ILR mode. As shown in FIG. 4, the current CU has nine neighboring CUs. Thus, the probability that the current CU selects the ILR mode can be written as

where k is the number of neighboring CUs using the ILR mode. Since this probability is obtained from the number of neighboring CUs using the ILR mode, it is defined as the number-based probability.
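A minimal sketch of the number-based probability follows. The expression itself is not reproduced in this text; the sketch assumes the simplest proportional form, k divided by the number of neighboring CUs (nine in FIG. 4), and the mode labels "ILR" and "Intra" are invented for illustration.

```python
from typing import Sequence


def number_based_probability(neighbour_modes: Sequence[str]) -> float:
    """Fraction of neighboring CUs that were coded in ILR mode (assumed form)."""
    if not neighbour_modes:
        raise ValueError("at least one neighboring CU is required")
    k = sum(1 for mode in neighbour_modes if mode == "ILR")
    return k / len(neighbour_modes)


if __name__ == "__main__":
    # Six of the nine neighbors U1..U9 use ILR mode in this made-up example.
    modes = ["ILR"] * 6 + ["Intra"] * 3
    print(number_based_probability(modes))
```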

(5) Since both the rate-distortion-based probability and the number-based probability are strongly related to the selection of the ILR mode, the probability of using the ILR mode is predicted from both. Let A and B denote the rate-distortion-based probability and the number-based probability, respectively. Because the two are independent of each other, the probability of early depth termination is:

$$p_r=p(A+B)=p(A)+p(B)-p(A)\,p(B)$$

where $p_r$ indicates the probability of early depth termination; if $p_r$ is greater than or equal to 0.6, the current CU terminates early. $p(A+B)$ is the probability that at least one of A and B holds, $p(A)$ denotes the rate-distortion-based probability, and $p(B)$ denotes the number-based probability. The values 0.6, 0.7, 0.8 and 0.9 were used during testing; the corresponding BDBRs are shown in Table 3.

Sequence	p_r=0.6	p_r=0.7	p_r=0.8	p_r=0.9
Blue-sky	-0.3%	-0.3%	-0.3%	-0.3%
Ducks	0.0%	0.0%	0.0%	0.0%
Park_Joy	0.0%	0.0%	0.0%	0.0%
Pedestrian	-0.1%	-0.1%	-0.1%	-0.1%
Tractor	-0.1%	0.0%	0.0%	0.0%
Town	-0.1%	-0.1%	-0.1%	-0.1%
Station2	0.0%	-0.1%	-0.1%	-0.2%

TABLE 3 Probability of early depth termination p_r and the corresponding BDBR

As can be seen from Table 3, as $p_r$ increases, the BDBR remains substantially unchanged except for the sequence "Station2". The BDBR of "Station2" reaches its minimum value when $p_r$ is equal to 0.9. Therefore, 0.9 is selected as the optimum value; that is, if $p_r$ is greater than 0.9, the current depth terminates early.
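The combination rule is easy to check numerically. In the sketch below, the function names are hypothetical, the threshold is left as a parameter defaulting to 0.9, and the `>=` comparison is an assumption (the text uses both "greater than or equal to" and "greater than").

```python
def early_termination_probability(p_a: float, p_b: float) -> float:
    """p_r = p(A) + p(B) - p(A) p(B), for independent A and B."""
    return p_a + p_b - p_a * p_b


def terminates_early(p_a: float, p_b: float, threshold: float = 0.9) -> bool:
    """Early depth termination once p_r reaches the chosen threshold."""
    return early_termination_probability(p_a, p_b) >= threshold


if __name__ == "__main__":
    # Made-up inputs: rate-distortion-based 0.5, number-based 0.5.
    print(early_termination_probability(0.5, 0.5), terminates_early(0.5, 0.5))
```

Because $p_r = 1-(1-p(A))(1-p(B))$, the combined value is never smaller than either input, so a single strong indicator is enough to push $p_r$ toward the threshold.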

The directional modes (DMs) in SHVC are shown in FIG. 5. There are 2 non-directional modes, DC (DM0) and planar (DM1), and 33 directional modes (DM2 to DM34). In general, DM0 and DM1 are well suited to simple CUs. Similar to HEVC, SHVC first checks all 35 DMs in rough mode decision (RMD) to obtain their Hadamard costs (HC) and selects the N DMs with the smallest HC; it then checks these DMs in the rate-distortion optimization (RDO) process and selects the DM with the smallest RDO value as the best DM. This process gives the optimal coding efficiency. However, checking many unnecessary directional modes takes a great deal of unnecessary encoding time; in particular, the RMD process always checks all 35 directional modes, which is very time-consuming. Large CUs in the enhancement layer (EL) are usually very simple if they use the Intra mode; small CUs in the EL have little texture variation because of their small size, so they are usually simple as well. A simple CU may therefore have special DM characteristics, and studying these characteristics in the EL helps to increase the coding speed. Different DMs may have different probabilities of being selected as the best DM over all CUs; these probabilities are obtained first, and the distribution of the DMs is then studied by grouping DMs with similar probabilities.

A CU has 35 directional modes in total, namely DM0, DM1, DM2, DM3, DM4, DM5, DM6, DM7, DM8, DM9, DM10, DM11, DM12, DM13, DM14, DM15, DM16, DM17, DM18, DM19, DM20, DM21, DM22, DM23, DM24, DM25, DM26, DM27, DM28, DM29, DM30, DM31, DM32, DM33 and DM34. The probability that the CU selects each directional mode is calculated, and all directional modes are classified according to the calculated probabilities, yielding three classes.

The probability is calculated as:

p_i = n_i / M

where n_i is the number of CUs for which DM_i is selected as the best DM, and M is the number of all CUs.
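As a small illustration of this ratio, the sketch below counts how often each DM is chosen as the best DM and divides by the number of CUs. The function name `dm_probabilities` and the sample list are made up for the example.

```python
from collections import Counter

def dm_probabilities(best_dms):
    """Return [p_0, ..., p_34], where p_i = n_i / M.

    best_dms holds the index of the best DM for each CU, so M = len(best_dms)
    and n_i is the number of CUs whose best DM is DM_i. The list must not be
    empty.
    """
    counts = Counter(best_dms)
    total = len(best_dms)
    return [counts[i] / total for i in range(35)]

# Made-up sample: the best DM of eight CUs.
sample_best_dms = [0, 0, 1, 10, 26, 0, 1, 10]
probs = dm_probabilities(sample_best_dms)
```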

The directional modes are classified into class 0, class 1 and class 2. Class 0 includes DM0 and DM1. Class 1 includes DM8, DM9, DM10, DM11, DM12, DM24, DM25, DM26, DM27 and DM28. Class 2 includes DM2, DM3, DM4, DM5, DM6, DM7, DM13, DM14, DM15, DM16, DM17, DM18, DM19, DM20, DM21, DM22, DM23, DM29, DM30, DM31, DM32, DM33 and DM34.

The class 1 directional modes are divided into two groups: DM8, DM9, DM10, DM11 and DM12 form the horizontal directional mode group, and DM24, DM25, DM26, DM27 and DM28 form the vertical directional mode group.

The class 2 directional modes are divided into four groups: DM2, DM3, DM4, DM5, DM6 and DM7 form the first directional mode group; DM13, DM14, DM15, DM16, DM17 and DM18 form the second; DM19, DM20, DM21, DM22 and DM23 form the third; and DM29, DM30, DM31, DM32, DM33 and DM34 form the fourth.
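The grouping can be written down as a lookup table, which makes it easy to confirm that the seven groups cover all 35 modes exactly once. The dictionary keys and the helper `group_of` are illustrative names only.

```python
# Classes and groups as listed in the text; the key names are illustrative.
DM_GROUPS = {
    "class0": [0, 1],
    "class1_horizontal": list(range(8, 13)),   # DM8 .. DM12
    "class1_vertical": list(range(24, 29)),    # DM24 .. DM28
    "class2_group1": list(range(2, 8)),        # DM2 .. DM7
    "class2_group2": list(range(13, 19)),      # DM13 .. DM18
    "class2_group3": list(range(19, 24)),      # DM19 .. DM23
    "class2_group4": list(range(29, 35)),      # DM29 .. DM34
}

def group_of(dm):
    """Return the name of the group that contains directional mode dm."""
    for name, members in DM_GROUPS.items():
        if dm in members:
            return name
    raise ValueError(f"DM{dm} is not a valid directional mode")

all_members = sorted(dm for members in DM_GROUPS.values() for dm in members)
```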

During intra prediction of the mode of the current CU, significance difference prediction is applied to the class 0 directional modes. The specific process is as follows. Denote the HC of DM_i as HC_i, and let min() denote the smaller of two HC values. DM0 and DM1 in class 0 are non-directional modes; in class 1, DM10 is the horizontal direction and DM26 is the vertical direction. If HC0 and HC1 are significantly smaller than HC10 and HC26, the optimal DM lies in class 0. Therefore, DM0 and DM1 in class 0 and DM10 and DM26 in class 1 are checked, and whether the optimal DM lies in class 0 is determined from the difference between min(HC0, HC1) and min(HC10, HC26). During directional mode prediction, DM selection may be terminated early at any point.

As shown in fig. 6, a significance difference prediction and descending direction search is performed for the class 1 directional modes. Since DM10 is a horizontal DM and DM26 is a vertical DM, if HC10 is significantly smaller than HC26, the optimal DM is likely to be in the horizontal directional mode group; conversely, it is likely to be in the vertical directional mode group. First, DM10 and DM26 in class 1 are checked, and the likely directional mode group is predicted from the difference between HC10 and HC26. After the likely group is obtained, the two directly neighboring DMs of DM10 or DM26 are checked according to that group, and the best DM is then searched for within the group according to Hadamard cost.

Since the probability that a DM in classes 0 and 1 is the best DM is high, if a DM and its two immediately adjacent DMs have been checked and its HC is the smallest of all checked DMs, this DM is the likely best DM (LBD). To obtain the LBD as early as possible, the search proceeds in the direction of decreasing Hadamard cost. For example, if the horizontal group is the likely directional mode group, the two immediately adjacent DMs of DM10, namely DM9 and DM11, are checked next. Depending on the combination of HC9, HC10 and HC11, there are three cases, as shown in fig. 6. In each case the LBD is searched for in the direction of the arrows.

Preferably, the three cases are:

(1) If HC10 is the smallest of the checked Hadamard costs, DM10 is the LBD and DM selection is terminated early.

(2) If the Hadamard cost decreases toward both the left and the right of DM10, DM8 and DM12 are further checked in the decreasing directions to determine whether DM9 or DM11 is the LBD; if not, DM7 and DM13 in class 2 are further checked to determine whether DM8 or DM12 is the LBD.

(3) If the Hadamard costs of the three directional modes are monotonic, the LBD is searched for along the decreasing direction. If HC9 > HC10 > HC11, DM12 is further checked to determine whether DM11 is the LBD; if not, DM13 in class 2 is further checked to determine whether DM12 is the LBD. If HC11 > HC10 > HC9, DM8 is further checked to determine whether DM9 is the LBD; if not, DM7 in class 2 is further checked to determine whether DM8 is the LBD.
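The three cases can be sketched for the horizontal group as follows. This is a simplified illustration, not the patented implementation: `hc` is a hypothetical callable returning the Hadamard cost of a DM, a plain comparison stands in for the significance test, and the function returns a flag saying whether an LBD was found or whether the search has to continue in class 2.

```python
def search_horizontal(hc):
    """Descending-direction search around DM10.

    Returns (dm, found): dm is the LBD if found is True; otherwise dm is the
    cheapest DM checked so far and the search would continue in class 2.
    """
    cost = {dm: hc(dm) for dm in (9, 10, 11)}

    def check(*dms):
        for dm in dms:
            if dm not in cost:
                cost[dm] = hc(dm)

    def is_lbd(dm):
        # LBD: both direct neighbours are checked and dm has the smallest HC.
        return (dm - 1 in cost and dm + 1 in cost
                and cost[dm] == min(cost.values()))

    if is_lbd(10):                                    # case 1
        return 10, True
    if cost[9] < cost[10] and cost[11] < cost[10]:    # case 2: falls both ways
        check(8, 12)
        for dm in (9, 11):
            if is_lbd(dm):
                return dm, True
        check(7, 13)                                  # DM7, DM13 are class 2
        for dm in (8, 12):
            if is_lbd(dm):
                return dm, True
    else:                                             # case 3: monotonic
        step = 1 if cost[11] < cost[10] else -1
        first, second = 10 + step, 10 + 2 * step
        check(second)
        if is_lbd(first):
            return first, True
        check(second + step)                          # DM7 or DM13 in class 2
        if is_lbd(second):
            return second, True
    return min(cost, key=cost.get), False

case2_costs = {7: 9, 8: 1, 9: 2, 10: 5, 11: 3, 12: 4, 13: 6}  # made-up values
```

With costs that fall monotonically toward DM12, for instance, the sketch checks DM12 and then DM13 before accepting DM12 as the LBD, matching case (3).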

To determine whether DM8 or DM12 in class 1 is the LBD, DM7 or DM13 in class 2 must be checked. If the significance difference prediction and descending direction search yields an LBD once DM7 or DM13 has been checked, that LBD can be regarded as the best DM and DM selection can be terminated.

A variable-step binary search is performed for the class 2 directional modes. Specifically, the best DM in class 2 is searched for using DM7 and DM13, or DM23 and DM29, as the starting DMs. For example, with DM7 in class 2 (already checked) as the starting DM, DM6 is checked using a step size of 1 (the distance between 7 and 6 is 1); starting from DM6, DM2 is checked using a step size of 4 (the distance between 6 and 2 is 4). If a DM in class 2 has the smallest HC among all checked DMs, a binary search is used to find the best DM.

More specifically, let mDM be the DM with the smallest HC among all checked DMs. The midpoint between mDM and its nearest checked left neighbor lDM is checked first, then the midpoint between mDM and its nearest checked right neighbor rDM, and finally the DM with the smallest HC among all checked DMs is selected. This process is repeated until the selected DM is an LBD. For example, if DM4 has the smallest HC among all checked DMs and its nearest checked left and right neighbors are DM2 and DM6, then the midpoint between DM2 and DM4, i.e. DM3, and the midpoint between DM4 and DM6, i.e. DM5, are further checked, and the DM with the smallest HC among all checked DMs is selected. If that DM is an LBD, DM selection can be terminated early; otherwise the process is repeated. An example of the variable-step check and binary search is shown in fig. 7.
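A minimal sketch of the variable-step check followed by the binary refinement, for the group DM2 to DM7, is given below. The names `binary_refine` and `search_class2_group1` and the cost callable `hc` are assumptions; the sketch ignores DMs already checked during the class 1 search and uses a plain minimum in place of any significance test.

```python
def binary_refine(hc, checked):
    """Halve the gap around the cheapest checked DM until no gap remains.

    checked maps DM index to Hadamard cost and is extended in place.
    """
    while True:
        m = min(checked, key=checked.get)                        # mDM
        left = max((d for d in checked if d < m), default=None)   # lDM
        right = min((d for d in checked if d > m), default=None)  # rDM
        midpoints = [(m + nb) // 2 for nb in (left, right)
                     if nb is not None and abs(nb - m) > 1]
        if not midpoints:
            # The nearest checked neighbours are adjacent (or m sits at the
            # edge of the checked range), so m is taken as the LBD.
            return m
        for d in midpoints:
            checked[d] = hc(d)

def search_class2_group1(hc):
    """Start at DM7, step by 1 to DM6, then by 4 to DM2, then refine."""
    dm = 7
    checked = {dm: hc(dm)}
    for step in (1, 4):
        dm -= step
        checked[dm] = hc(dm)
    return binary_refine(hc, checked)

result = search_class2_group1(lambda dm: abs(dm - 4))  # toy cost, minimum at DM4
```

With the toy cost above, the sketch checks DM7, DM6 and DM2, then DM4 as the midpoint of DM2 and DM6, then DM3 and DM5, and stops at DM4, which mirrors the DM4 example in the text.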

Specifically, the process of intra-predicting the mode of the current CU includes:

Step 1: apply the Hadamard transform to DM0, DM1, DM10 and DM26; let HC1 be the smaller Hadamard cost of DM0 and DM1, and HC2 the smaller Hadamard cost of DM10 and DM26; compare HC1 and HC2: if HC1 is smaller than HC2, the optimal DM is among DM0 and DM1, and step 10 is executed; otherwise step 2 is executed;

Step 7: check the DMs of class 2 other than DM10 and DM26; if an LMD exists among DM9, DM10, DM11, DM25, DM26 and DM27, that DM is the optimal DM and step 10 is executed; otherwise step 8 is executed;

Step 10: DM selection terminates.

To determine whether there is a significant difference between the HCs of two DMs, the corresponding residual coefficients are examined. Let R₁ and R₂ be the residuals of the two DMs; their difference is:

R＝R₁-R₂

by Hadamard transformation, the above equation can be rewritten as:

HRH＝HR₁H-HR₂H

where H denotes a Hadamard matrix.

According to the Cauchy inequality, the expression for HRH can be rewritten as:

where m is the side length of the current CU. It then follows that:

where x_{i,j} is the value of HRH at position (i, j), calculated as:

x_{i,j} = \sum_{k=1}^{m} \sum_{p=1}^{m} h_{ik} r_{kp} h_{pj}

where h_{ik} denotes the Hadamard value at position (i, k), h_{pj} denotes the Hadamard value at position (p, j), and r_{kp} denotes the value of R at position (k, p).
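The element-wise definition of HRH can be checked numerically with a short sketch. The helper names `hadamard` and `hrh` and the 2x2 residuals are made up for illustration; the Hadamard matrix is built with the Sylvester construction, which assumes the side length m is a power of two, and indices are zero-based in the code.

```python
def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of two)."""
    h = [[1]]
    while len(h) < m:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def hrh(r):
    """x[i][j] = sum over k and p of h[i][k] * r[k][p] * h[p][j]."""
    m = len(r)
    h = hadamard(m)
    return [[sum(h[i][k] * r[k][p] * h[p][j]
                 for k in range(m) for p in range(m))
             for j in range(m)]
            for i in range(m)]

# Made-up 2x2 residuals of two DMs and their difference R = R1 - R2.
r1 = [[5, 3], [2, 1]]
r2 = [[4, 3], [2, 0]]
r = [[a - b for a, b in zip(row1, row2)] for row1, row2 in zip(r1, r2)]

x = hrh(r)
# Linearity: HRH equals HR1H - HR2H element by element.
diff = [[a - b for a, b in zip(row1, row2)]
        for row1, row2 in zip(hrh(r1), hrh(r2))]
```

Here `x` and `diff` agree, which is the identity HRH = HR₁H - HR₂H stated above.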

If each quantized value in HRH is less than k, there is no significant difference between R₁ and R₂. The following condition should then be satisfied:

where k is a parameter value and Q_step is the quantization step size, which can be obtained from the quantization parameter.

According to the calculation formula of R, the calculation formula of HRH, the Cauchy inequality and the above conditions, the following can be obtained:

The condition under which HC₁ and HC₂ have no significant difference is:

The condition under which HC₁ and HC₂ have a significant difference is:

where HC₁ and HC₂ denote the Hadamard costs of the two DMs. If the first condition is satisfied, the two are not significantly different; otherwise they are significantly different. In the latter case, if HC₁ < HC₂, then HC₁ is significantly smaller than HC₂, and vice versa.

To obtain the most suitable value of k, the above conditions were tested; the corresponding BDBR results are shown in Table 4.

TABLE 4 different k values and corresponding BDBRs

As can be seen from Table 4, there is a turning point at k equal to 5: when k is greater than or equal to 5, the corresponding BDBR is less than 0.1% for all sequences. This means that good performance can be obtained with k equal to 5. Increasing k further yields only a small additional gain in coding speed. Therefore, k is set to 5.

The specific content in step S4 is: after the current CU is coded, obtaining a final residual coefficient map of the current CU; respectively obtaining expectation and variance of the left half part and the right half part of the residual coefficient graph, and judging whether the two parts have significance difference according to a hypothesis testing method; and respectively obtaining the expectation and the variance of the upper half part and the lower half part of the residual coefficient graph, and judging whether the two parts have significant difference according to a hypothesis testing method.
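The split into halves and the estimation of expectation and variance can be sketched as below. The sketch assumes, as in claim 3, that the coefficients follow a Gaussian distribution, for which the maximum likelihood estimates are the sample mean and the variance with divisor n. The helper names and the 4x4 coefficient map are made up.

```python
def gaussian_mle(samples):
    """Maximum likelihood estimates (mean, variance) of a Gaussian sample."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # divisor n, not n - 1
    return mean, var

def split_halves(coeffs):
    """Flatten the left, right, upper and lower halves of a square map."""
    half = len(coeffs) // 2
    return {
        "left": [v for row in coeffs for v in row[:half]],
        "right": [v for row in coeffs for v in row[half:]],
        "upper": [v for row in coeffs[:half] for v in row],
        "lower": [v for row in coeffs[half:] for v in row],
    }

# Made-up residual coefficient map whose left and right halves clearly differ.
coeffs = [[1, 1, 3, 3],
          [1, 1, 3, 3],
          [1, 1, 3, 3],
          [1, 1, 3, 3]]
stats = {name: gaussian_mle(vals) for name, vals in split_halves(coeffs).items()}
```

In this toy map the left and right halves have different means, while the upper and lower halves share the same mean and variance.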

Let μ₁ and σ₁² be the expectation and variance of one of the parts (here, the upper or left half), let Z be the residual coefficient of the other part, and let z₁, z₂, ..., z_n be its samples. Whether Z also satisfies μ₂ and σ₂² can be tested with the following formula:

where α is the significance level and n is the number of residual coefficients in each part. The corresponding threshold value can be obtained by consulting the Gaussian distribution table. If |γ_i(0, k) - γ_{i-1}(0, k)| ≤ 0.01 is satisfied, the residual coefficients of the two parts share the same expected value and variance. In that case there is no significant difference between the two parts, and the current CU can be terminated early.

The values of e_α at depth 1 and depth 2 are as follows:

If neither the left and right parts nor the upper and lower parts show a significant difference, the division is terminated early.
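The test formula itself is not reproduced in this text, so the following is only a sketch under stated assumptions: it uses a standard two-sided z-test on the mean, with the threshold taken from the standard normal distribution. The function name `same_distribution`, the default `alpha=0.05`, and the sample values are illustrative and are not the patent's e_α values; the sketch also tests only the expectation, not the variance.

```python
from math import sqrt
from statistics import NormalDist

def same_distribution(mu1, var1, samples, alpha=0.05):
    """Two-sided z-test: is the mean of samples consistent with N(mu1, var1)?

    Assumed stand-in for the hypothesis test in the text. Returns True when
    no significant difference is detected at significance level alpha.
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    if var1 == 0:
        # Degenerate reference part: only an identical mean is accepted.
        return sample_mean == mu1
    z = (sample_mean - mu1) / sqrt(var1 / n)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 0.05
    return abs(z) <= threshold

def terminate_split_early(left_right_same, upper_lower_same):
    """Division stops early only if neither pair of halves differs."""
    return left_right_same and upper_lower_same

similar = same_distribution(0.0, 1.0, [0.1, -0.1, 0.2, -0.2])
different = same_distribution(0.0, 1.0, [5.0] * 16)
```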

To verify the performance of the proposed spatial SHVC fast intra prediction algorithm, the reference software SHM 11.0 was used, and the tests were run on a server with an Intel(R) 2.0 GHz processor and 30 GB of memory. The training sequences and the test sequences do not overlap, which ensures the generality of the algorithm. The performance of the algorithm was evaluated in terms of both coding efficiency and coding speed. Coding efficiency covers bit rate and visual quality and is expressed as BDBR, which is the bit rate difference at the same PSNR compared with the reference software in the EL. Coding speed is expressed as TS, which measures only the percentage of encoding run time saved in the EL.

To verify the performance of the proposed algorithm, all proposed strategies are integrated into it. Its performance is compared with that of the PAPS, EETBS and FIICA algorithms. All algorithms are tested on the same computing platform. Since there are two settings each for the scalability ratio and the QP set, their combinations give four cases in the EL. Case 1 uses a scalability ratio of 1.5 and the QP set (22, 26, 30, 34); case 2 uses a scalability ratio of 1.5 and the QP set (24, 28, 32, 36); case 3 uses a scalability ratio of 2 and the QP set (22, 26, 30, 34); case 4 uses a scalability ratio of 2 and the QP set (24, 28, 32, 36). Table 6 (case 1), Table 7 (case 2), Table 8 (case 3) and Table 9 (case 4) list the overall performance comparisons in terms of coding efficiency and coding speed.
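The four test cases are simply the Cartesian product of the two scalability ratios and the two QP sets, which can be enumerated as in this small sketch (the dictionary layout and key names are illustrative):

```python
from itertools import product

SCALABILITY_RATIOS = (1.5, 2.0)
QP_SETS = ((22, 26, 30, 34), (24, 28, 32, 36))

# product() varies the QP set fastest, which reproduces the case numbering
# used in the text: case1 and case2 share ratio 1.5, case3 and case4 ratio 2.
cases = {
    f"case{i}": {"ratio": ratio, "qps": qps}
    for i, (ratio, qps) in enumerate(product(SCALABILITY_RATIOS, QP_SETS), start=1)
}
```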

TABLE 6 case1 Performance comparison

TABLE 7 case2 Performance comparison

TABLE 8 case3 Performance comparison

TABLE 9 case4 Performance comparison

In Table 6 (case 1), the average BDBRs of the proposed algorithm, PAPS, EETBS and FIICA are 0.02%, 0.30%, 0.20% and 0.06%, respectively. The average TS values of the four algorithms are 79.66%, 67.03%, 55.34% and 47.15%, respectively. In this test, the BDBR of the proposed algorithm is smaller than that of the other three algorithms, and its coding speed is clearly higher.

In Table 7 (case 2), the average BDBRs of the proposed algorithm, PAPS, EETBS and FIICA are -0.14%, 0.38%, -0.20% and -0.18%, respectively. The average TS values of the four algorithms are 81.26%, 65.85%, 56.30% and 45.75%, respectively. In this test, the BDBR of the proposed algorithm is smaller than that of PAPS and slightly larger than those of EETBS and FIICA, and its coding speed is clearly higher than that of the other three algorithms.

In Table 8 (case 3), the average BDBRs of the proposed algorithm, PAPS, EETBS and FIICA are 0.94%, 0.62%, 0.35% and 0.38%, respectively. The average TS values of the four algorithms are 76.34%, 68.30%, 54.49% and 42.22%, respectively. In this test, the BDBR of the proposed algorithm is larger than that of the other three algorithms, and its coding speed is clearly higher.

In Table 9 (case 4), the average BDBRs of the proposed algorithm, PAPS, EETBS and FIICA are 0.68%, 0.31% and 0.40%, respectively. The average TS values of the four algorithms are 78.02%, 66.67%, 54.11% and 43.25%, respectively. In this test, the BDBR of the proposed algorithm is smaller than that of PAPS and slightly larger than those of EETBS and FIICA, and its coding speed is clearly higher than that of the other three algorithms.

To clearly demonstrate the performance of the algorithms presented herein, table 10 provides a comparison of the overall average performance of these four algorithms in all four cases.

TABLE 10 comparison of Overall average Performance of different algorithms

The overall average BDBR for the proposed algorithm, PAPS, EETBS and FIICA was 0.38%, 0.49%, 0.17% and 0.16%, respectively. The total average TS for the four algorithms was 78.82%, 66.96%, 55.06%, and 44.59%, respectively. Therefore, the encoding speed of the algorithm is obviously faster than that of the other three algorithms. Meanwhile, the BDBR of the algorithm is smaller than the PAPS algorithm and larger than the EETBS algorithm and the FIICA algorithm.
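The overall average TS values can be cross-checked against the per-case averages reported for Tables 6 to 9. The sketch below takes the four per-case TS values of each algorithm as stated in the text and averages them; the dictionary layout is illustrative.

```python
# Per-case average TS in percent (case 1 to case 4), as reported in the text.
ts_per_case = {
    "proposed": [79.66, 81.26, 76.34, 78.02],
    "PAPS": [67.03, 65.85, 68.30, 66.67],
    "EETBS": [55.34, 56.30, 54.49, 54.11],
    "FIICA": [47.15, 45.75, 42.22, 43.25],
}

# Unweighted mean over the four cases for each algorithm.
overall_ts = {name: sum(vals) / len(vals) for name, vals in ts_per_case.items()}
```

The resulting means agree with the overall TS figures quoted above to within rounding.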

The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and should not be construed as limiting it; any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for SHVC spatial scalable video coding based on distribution characteristics, the method comprising:

S1: acquiring the depth of the current coding unit CU; if the depth of the current CU is 1 or 2, judging whether to skip the current depth according to the residual coefficients of the enhancement layer ILR mode; if so, executing step S5, otherwise executing step S2;

2. The SHVC spatial scalable video coding method according to claim 1, wherein the step of determining whether to skip the current depth according to the residual coefficients comprises: coding a current coding unit CU to obtain a residual coefficient map of an enhancement layer ILR mode; dividing the residual coefficient graph to obtain a first residual coefficient graph and a second residual coefficient graph; respectively calculating the expectation and variance of the first residual coefficient map and the second residual coefficient map, judging whether the expectation and variance of the first residual coefficient map are different from the expectation and variance of the second residual coefficient map, if the expectation and variance of the first residual coefficient map are different from the expectation and variance of the second residual coefficient map, skipping the current depth, otherwise, not skipping the current depth.

3. The SHVC spatial scalable video coding method according to claim 2, wherein the process of calculating the expectation and variance of the residual coefficient map comprises: assuming that each coefficient in the residual coefficient maps follows a Gaussian distribution, obtaining residual coefficient samples of the divided residual coefficient maps; obtaining the probability density function and the corresponding likelihood function of the residual coefficient samples by a maximum likelihood estimation algorithm; and obtaining the expectation and the variance of the divided residual coefficient maps according to the probability density function and the likelihood function.

4. The SHVC spatial scalable video coding method according to claim 2, wherein the determining whether the expectation and variance of the first residual coefficient map are different from the expectation and variance of the second residual coefficient map comprises: inputting the expectation and the variance of the first residual coefficient graph into a judgment condition to obtain a first judgment result; inputting the expectation and the variance of the second residual coefficient map into a judgment condition to obtain a second judgment result; and comparing the first judgment result with the second judgment result to obtain a judgment result.

5. The SHVC spatial scalable video coding method based on distribution characteristics as claimed in claim 4, wherein the determination condition is:

wherein the content of the first and second substances,

6. The SHVC spatial scalable video coding method according to claim 1, wherein the step of determining whether the ILR mode of the current CU is the optimal mode by using the GMM-EM method comprises: saving the coding modes and rate distortion costs of the CUs at each depth of the previous frame and the current frame; obtaining, for each CU, the coding modes and rate distortion costs of its adjacent CUs in the previous frame and the current frame according to the saved coding modes and rate distortion costs; using rec0 to store the rate distortion costs belonging to the ILR mode and rec1 to store the rate distortion costs of the Intra mode; after the current CU finishes coding in the ILR mode, obtaining the rate distortion cost of the ILR mode adopted by the current CU according to the rate distortion costs stored in rec0 and rec1; performing GMM conversion on the rate distortion cost of the ILR mode of the current CU to obtain the probability based on rate distortion; obtaining the probability of the current CU based on quantity according to the coding modes of the adjacent CUs; predicting the probability that the current CU adopts the ILR mode according to the probability based on rate distortion and the probability based on quantity; and judging whether the ILR mode of the current CU is the optimal mode according to the probability of the ILR mode.

7. The method of claim 1, wherein intra-predicting the mode of the current CU comprises: predicting the mode of the current CU by a method based on directional modes DM; the CU has 35 directional modes DM in total; the process of predicting the mode of the CU includes:

Step 7: checking the other DMs except DM10 and DM26; if an LMD exists among DM9, DM10, DM11, DM25, DM26 and DM27, determining that this DM is the optimal DM and executing step 10; otherwise executing step 8;

Step 9: checking the midpoint between the DM having the minimum Hadamard cost and its nearest checked left or right neighboring DM, and selecting the DM having the minimum Hadamard cost; repeating this process until the DM is an LMD, which is the optimal DM, and executing step 10;

step 10: the DM selection terminates.

8. The method of claim 1, wherein the determining whether the current CU continues to be partitioned according to the residual coefficients of the best mode is the same as the determining whether to skip the current depth according to the residual coefficients of the enhancement layer ILR mode.