CN113055670A - HEVC/H.265-based video coding method and system - Google Patents

HEVC/H.265-based video coding method and system

Info

Publication number
CN113055670A
CN113055670A
Authority
CN
China
Prior art keywords
frame
inter
prediction
model
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110251124.7A
Other languages
Chinese (zh)
Other versions
CN113055670B (en)
Inventor
程志刚
程雨菡
贾春华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yuhan Technology Co ltd
Original Assignee
Hangzhou Yuhan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yuhan Technology Co ltd filed Critical Hangzhou Yuhan Technology Co ltd
Priority to CN202110251124.7A priority Critical patent/CN113055670B/en
Publication of CN113055670A publication Critical patent/CN113055670A/en
Application granted granted Critical
Publication of CN113055670B publication Critical patent/CN113055670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/567 Motion estimation based on rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding

Abstract

The invention discloses a video coding method and system based on HEVC/H.265, comprising the following steps: acquiring video image data and establishing a virtual background frame; for a CU of depth l in the i-th frame, calculating the inter-frame difference BD_l against the co-located CU of the adjacent frame and the background difference BD_l' against the co-located CU of the virtual background frame; for the CU of depth l in the i-th frame, calculating BD_{l+1} and BD_{l+1}' for the 4 sub-block CUs into which it divides; selecting the optimal inter-frame prediction model based on the division depth l and BD_l, BD_l'; and making the CU-division continue/terminate decision based on the calculated BD_l, BD_l', BD_{l+1}, BD_{l+1}'. The invention can effectively reduce the computational complexity caused by inter-frame division and the inter-frame coding complexity caused by exhaustive rate-distortion traversal and selection.

Description

HEVC/H.265-based video coding method and system
Technical Field
The invention relates to the technical field of video coding, in particular to a method and a system for video coding based on HEVC/H.265.
Background
A video file is composed of successive image frames, each frame being a still image. Owing to the persistence-of-vision effect of the human eye, a sequence of image frames played at a sufficient rate is perceived as continuous video. Because adjacent frames are extremely similar, the original video must be encoded and compressed before storage and transmission to remove redundancy in the spatial and temporal dimensions. Video coding and compression is a precondition for computer video processing: a digitized video signal has a very high data bandwidth, usually above 20 Mbps, which video coding typically reduces to 1-10 Mbps so that the video signal can be stored on a computer and processed accordingly.
As shown in fig. 1, the video encoding process is mainly divided into four steps of prediction, transformation, quantization and entropy encoding, wherein the prediction is mainly divided into intra-frame prediction and inter-frame prediction.
The intra-frame prediction of video coding refers to that the current pixel block is predicted by using the pixel block coded by the current image by utilizing the correlation of a video spatial domain so as to achieve the aim of removing the video spatial domain redundancy.
Inter-frame prediction is used to reduce temporal redundancy by exploiting the correlation between adjacent frames: the current picture to be coded uses previously coded and reconstructed pictures as reference frames, and the current block to be coded searches the reference frames for similar blocks to use as predicted values. When predicting the current block, it may be divided into smaller Prediction Blocks (PB), and the best-matching prediction is searched with the prediction block as the basic unit, so as to minimize the difference between the predicted and actual values of the current block, reduce the number of coding bits, and improve the compression ratio.
To meet the demand for compressing high-definition video, the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) formed a joint collaborative team on video coding and developed the new-generation high-performance video coding standard HEVC/H.265. The goal of doubling coding efficiency relative to the preceding H.264/AVC standard has been substantially achieved.
However, this coding structure uses a quadtree and larger Coding Units (CUs), which significantly increases encoder computational complexity and makes it difficult for encoding time to meet real-time requirements. To improve compression efficiency, the H.265 encoder has a more flexible block-division scheme; as shown in fig. 2, in H.265 each frame is first divided sequentially into 64 × 64 LCUs (largest coding units), and starting from the LCU, with coding depths from 0 to 3, each CU (coding unit) may be recursively divided into CUs of 4 sizes (64 × 64, 32 × 32, 16 × 16, 8 × 8), constructing a quadtree coding structure.
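As a small illustration of the quadtree just described (an editorial sketch, not the patent's code), the depth-to-CU-size mapping is simply a halving per depth:

```python
# Illustrative sketch (not from the patent): the H.265 quadtree maps
# coding depth l in [0, 3] to CU side lengths 64, 32, 16, 8.
def cu_side(depth: int) -> int:
    """Return the CU side length at a given quadtree depth (LCU = 64x64)."""
    if not 0 <= depth <= 3:
        raise ValueError("H.265 coding depth must be in [0, 3]")
    return 64 >> depth

print([cu_side(l) for l in range(4)])  # -> [64, 32, 16, 8]
```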
In the H.265 inter-coding process, the CU at each coding depth has its corresponding PU (prediction unit) partition modes for motion estimation and motion compensation. As shown in fig. 3, for a CU at a certain depth l, the inter prediction modes include SKIP, Merge, rectangular motion partitions (Square: inter 2N × 2N, inter N × N), symmetric motion partitions (SMP: inter 2N × N, inter N × 2N), asymmetric motion partitions (AMP: inter 2N × nU, inter 2N × nD, inter nL × 2N, inter nR × 2N), and intra modes (intra 2N × 2N, intra N × N).
For the same perceptual quality, HEVC/H.265 achieves about a 50% lower bitrate than H.264, but its coding complexity also increases. The HEVC video coding standard adopts a more flexible coding structure and adds a series of new technologies. Among these, steps such as recursive quadtree division and rate-distortion-cost-minimizing prediction-mode selection greatly increase encoder computational complexity and seriously hinder the popularization and application of the HEVC/H.265 video coding standard; specifically:
1. In the inter-frame coding process, H.265 adopts a quadtree partition structure to improve coding performance: the Coding Unit (CU) size changes from H.264's 16 × 16 to a range of 8 × 8 to 64 × 64, increasing the complexity of the whole inter-frame coding process. Meanwhile, during CU division, determining the quadtree structure of a CU requires a complete traversal of depths l from 0 to 3, computing a total of 4^0 + 4^1 + 4^2 + 4^3 = 85 recursive divisions, a complicated calculation process.
2. In the inter-frame prediction process, H.265 traverses all inter prediction modes according to this flow and selects the mode with the minimum coding cost as the optimal prediction mode. The partition modes are traversed symmetric first, then asymmetric, so the optimal symmetric prediction mode is obtained before the asymmetric modes are predicted; the order is SKIP, Merge, 2N × 2N, N × N, N × 2N, 2N × nU, 2N × nD, nL × 2N, nR × 2N. The minimum number of inter-prediction-mode traversals is therefore (1 + 4 + 16 + 256) × 6 = 1662 and the maximum is (1 + 4 + 16 + 256) × 8 = 2216, so the computational complexity of inter-prediction-mode selection over the whole video coding process is very high. This traversal clearly imposes a heavy computational load at the encoding end: video compression consumes a long encoding time and cannot meet the growing demand for real-time video compression.
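The complexity figures quoted above can be checked with a few lines of arithmetic:

```python
# Checking the complexity figures quoted above. The 85 comes from the
# full depth-0..3 recursion; the 1662/2216 figures use the text's own
# (1 + 4 + 16 + 256) weighting with 6 to 8 candidate modes.
cu_visits = sum(4 ** l for l in range(4))  # 4^0 + 4^1 + 4^2 + 4^3
min_modes = (1 + 4 + 16 + 256) * 6
max_modes = (1 + 4 + 16 + 256) * 8
print(cu_visits, min_modes, max_modes)  # -> 85 1662 2216
```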
In summary, a large amount of operation complexity is introduced in the inter-frame prediction process of HEVC, and how to effectively reduce the operation amount of an encoder becomes a problem to be solved urgently at present.
Disclosure of Invention
In view of the above problems in the prior art, the present invention provides a method and system for video coding based on HEVC/H.265.
The invention discloses a video coding method based on HEVC/H.265, which comprises the following steps:
acquiring video image data and establishing a virtual background frame;
for a CU of depth l in the i-th frame, calculating the inter-frame difference BD_l against the co-located CU of the adjacent frame and the background difference BD_l' against the co-located CU of the virtual background frame;
for the CU of depth l in the i-th frame, calculating BD_{l+1} and BD_{l+1}' for the 4 sub-block CUs into which it divides;
if BD_l ≤ T1, BD_l' ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}' ≤ BD_l' are satisfied, the current depth l is the optimal depth, and subsequent inter-frame prediction is performed;
in the inter prediction process:
if l = 0 and BD_l, BD_l' ≤ T3, the SKIP model is selected as the optimal inter-frame prediction model;
if l = 0 and T3 < BD_l, BD_l' ≤ T4, or l ≤ 1 and BD_l, BD_l' ≤ T4, the 2N × 2N inter prediction models are traversed and the model with the minimum rate-distortion cost is selected as the optimal inter-frame prediction model;
if BD_l, BD_l' ≥ T5, the inter prediction models other than the AMP models are traversed and the model with the minimum rate-distortion cost is selected as the optimal inter-frame prediction model; wherein T3 < T4 < T5.
As a further improvement of the present invention, the method for establishing the virtual background frame includes:
taking the first H frames of the original video as samples and performing data statistics on each pixel point of each frame, the pixel values ranging from 0 to 255;
and selecting the median of the pixel values of the pixels at the same position of each frame as the pixel value of the virtual background frame, thereby establishing the virtual background frame.
As a further improvement of the present invention,
the inter-frame difference BDlThe calculation formula of (2) is as follows:
Figure BDA0002966117890000031
the inter-frame difference BDlThe formula for calculation of' is:
Figure BDA0002966117890000041
in the formula (f)i(x, y) represents the pixel value of the coordinate (x, y) in the CU of the ith frame in the sequence video, fi-1(x, y) represents the pixel value of the coordinate (x, y) of the same-position CU in the adjacent previous frame, fB(x, y) represents the pixel value with coordinates (x, y) in the same position CU in the background frame, N is the side length of CU block, and M is the length of CU blockdIs a coded index, Md=2l,l∈[0,3]。
As a further improvement of the invention, the method also comprises the following steps:
if the conditions BD_l ≤ T1, BD_l' ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}' ≤ BD_l' are not satisfied, the division continues until the division depth reaches l = 3 and terminates.
As a further improvement of the invention, the values of T1 and T2 are 8-15%.
As a further improvement of the invention, the method also comprises the following steps:
if none of the conditions (l = 0 and BD_l, BD_l' ≤ T3; l = 0 and T3 < BD_l, BD_l' ≤ T4; l ≤ 1 and BD_l, BD_l' ≤ T4; BD_l, BD_l' ≥ T5) is satisfied, all inter prediction models are traversed and the model with the minimum rate-distortion cost is selected as the optimal inter-frame prediction model.
As a further improvement of the invention, the value of T3 is 1-3%, the value of T4 is 5-15%, and the value of T5 is more than 30%.
The invention also discloses a system for video coding based on HEVC/H.265, comprising:
the creating module is used for acquiring video image data and establishing a virtual background frame;
a calculation module for calculating, for the CU of depth l in the i-th frame, the inter-frame difference BD_l against the co-located CU of the adjacent frame, the background difference BD_l' against the co-located CU of the virtual background frame, and BD_{l+1}, BD_{l+1}' for the 4 sub-block CUs into which the CU divides;
A division judgment module for:
when BD_l ≤ T1, BD_l' ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}' ≤ BD_l' are satisfied, taking the current depth l as the optimal depth and performing subsequent inter-frame prediction;
when the conditions BD_l ≤ T1, BD_l' ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}' ≤ BD_l' are not satisfied, continuing the division until the division depth reaches l = 3, then terminating;
an inter-frame prediction judgment module to:
when l = 0 and BD_l, BD_l' ≤ T3, selecting the SKIP model as the optimal inter-frame prediction model;
when l = 0 and T3 < BD_l, BD_l' ≤ T4, or l ≤ 1 and BD_l, BD_l' ≤ T4, traversing the 2N × 2N inter prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model;
when BD_l, BD_l' ≥ T5, traversing the inter prediction models other than the AMP models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model; wherein T3 < T4 < T5;
when none of the conditions (l = 0 and BD_l, BD_l' ≤ T3; l = 0 and T3 < BD_l, BD_l' ≤ T4; l ≤ 1 and BD_l, BD_l' ≤ T4; BD_l, BD_l' ≥ T5) is satisfied, traversing all inter prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model.
As a further improvement of the present invention, in the calculation module,
the inter-frame difference BD_l is calculated as:
BD_l = ( Σ_{x=1..N} Σ_{y=1..N} | f_i(x,y) − f_{i−1}(x,y) | ) / (255 · N²)
the background difference BD_l' is calculated as:
BD_l' = ( Σ_{x=1..N} Σ_{y=1..N} | f_i(x,y) − f_B(x,y) | ) / (255 · N²)
where f_i(x,y) denotes the pixel value at coordinate (x,y) in the CU of the i-th frame of the video sequence, f_{i−1}(x,y) the pixel value at (x,y) in the co-located CU of the adjacent previous frame, f_B(x,y) the pixel value at (x,y) in the co-located CU of the background frame, N the side length of the CU block, and M_d a coding index with M_d = 2^l, l ∈ [0, 3] (so that N = 64/M_d at depth l).
As a further improvement of the invention, the values of T1 and T2 are 8-15%, the value of T3 is 1-3%, the value of T4 is 5-15%, and the value of T5 is more than 30%.
Compared with the prior art, the invention has the beneficial effects that:
in order to reduce the computation of inter-frame division, the invention exploits the particularity of surveillance video and determines whether a CU continues dividing by calculating the differences between the coded frame and its adjacent frame and background frame, thereby determining the CU coding depth l and reducing the complexity of quadtree division;
in order to reduce the large amount of computation caused by traversal loops in prediction-mode selection, the invention uses the difference values between the coded frame at the current depth and its adjacent frame and background frame to reasonably classify the motion region to which the coded frame belongs, skips unnecessary inter prediction models, and then selects the optimal inter prediction model by rate-distortion calculation, reducing the computation caused by calculating the rate-distortion cost of all models.
Drawings
FIG. 1 is a flowchart of conventional video encoding;
FIG. 2 is a block diagram of an inter-frame partition quadtree;
FIG. 3 is a PU inter prediction model;
fig. 4 is a flowchart of a method for HEVC/h.265-based video coding according to an embodiment of the present invention;
fig. 5 is a block diagram of a system for HEVC/h.265-based video coding according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
In order to improve encoder performance and better support high-resolution video coding, the HEVC video coding standard adopts a more flexible coding structure and adds a series of new technologies. Among these, steps such as recursive quadtree division and rate-distortion-cost-minimizing prediction-mode selection greatly increase encoder computational complexity and seriously hinder the popularization and application of the HEVC video coding standard. To improve compression in the quadtree-division step, the minimum coding-unit size is further subdivided to 8 × 8 from H.264's original 16 × 16, increasing the complexity of the whole inter-frame coding. In addition, selecting the optimal inter prediction mode requires motion-estimation and transform-unit traversal sub-loops: during motion-vector search, the transform unit is traversed and the rate-distortion cost is calculated for each search point, and the mode with the minimum rate-distortion cost is selected as the optimal prediction model; this traversal-loop calculation greatly increases the computation of the inter-frame coding process. Therefore, algorithmically optimizing the inter-frame quadtree-division process and inter-prediction-model selection becomes the key to reducing video coding complexity.
In order to reduce the computation of inter-frame division, the invention exploits the particularity of surveillance video (the motion characteristics of adjacent frames of a video image and their difference from the background frame in background modeling) to determine whether a CU continues dividing by calculating the differences between the coded frame and its adjacent frame and background frame, thereby determining the CU coding depth l, l ∈ [0, 3], and reducing the complexity of quadtree division.
In order to reduce the large amount of computation caused by traversal loops in prediction-mode selection, the invention uses the difference values between the coded frame at the current depth and its adjacent frame and background frame to reasonably classify the motion region to which the coded frame belongs, skips unnecessary inter prediction models, and then selects the optimal inter prediction model by rate-distortion calculation, reducing the computation caused by calculating the rate-distortion cost of all models.
Specifically, the method comprises the following steps:
as shown in fig. 4, the present invention provides a method for video coding based on HEVC/H.265, comprising:
step 1, acquiring video image data and establishing a virtual background frame;
specifically, the method comprises the following steps:
the surveillance video image data has particular properties: the background is relatively fixed, scene changes are small, motion content is relatively limited, and so on. The invention uses these properties to establish a virtual background frame. The method for establishing the virtual background frame is specifically: taking the first H frames of the original video as samples and performing data statistics on each pixel point of each frame, the pixel values ranging from 0 to 255; then selecting the median of the pixel values at the same position across the sample frames as the pixel value of the virtual background frame, thereby establishing the virtual background frame.
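The median-background step described above can be sketched as follows; the array shapes, dtype, and function name are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

# A minimal sketch of the virtual-background-frame step: the first H
# frames are used as samples and the per-pixel median is taken as the
# background pixel value.
def build_virtual_background(frames: np.ndarray) -> np.ndarray:
    """frames: (H, rows, cols) uint8 samples; returns the median frame."""
    return np.median(frames, axis=0).astype(np.uint8)

# Example: 5 sample frames of a 4x4 image.
rng = np.random.default_rng(0)
samples = rng.integers(0, 256, size=(5, 4, 4), dtype=np.uint8)
print(build_virtual_background(samples).shape)  # -> (4, 4)
```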
Step 2: for the CU of the i-th frame at depth l (initially l = 0), calculate the inter-frame difference BD_l against the co-located CU of the adjacent frame and the background difference BD_l' against the co-located CU of the virtual background frame;
Specifically, the method comprises the following steps:
the inter-frame difference BD_l is calculated as:
BD_l = ( Σ_{x=1..N} Σ_{y=1..N} | f_i(x,y) − f_{i−1}(x,y) | ) / (255 · N²)
the background difference BD_l' is calculated as:
BD_l' = ( Σ_{x=1..N} Σ_{y=1..N} | f_i(x,y) − f_B(x,y) | ) / (255 · N²)
where f_i(x,y) denotes the pixel value at coordinate (x,y) in the CU of the i-th frame of the video sequence, f_{i−1}(x,y) the pixel value at (x,y) in the co-located CU of the adjacent previous frame, f_B(x,y) the pixel value at (x,y) in the co-located CU of the background frame, N the side length of the CU block, and M_d a coding index related to the coding depth l, M_d = 2^l, l ∈ [0, 3] (so that N = 64/M_d at depth l). BD_l and BD_l' reflect the degree of difference between data values in the residual: the smaller BD_l and BD_l', the smaller the motion and texture variation of the co-located CU between adjacent frames. In general, the lower the degree of difference from the background frame and the adjacent frame, the smaller the BD_l and BD_l' values, the more uniform the residual distribution, and the less likely the division is to continue downward.
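A hedged sketch of the BD_l / BD_l' difference measures just defined; the exact normalization is an assumption here (mean absolute pixel difference scaled by the 0-255 range, yielding a fraction comparable to the percentage thresholds T1-T5):

```python
import numpy as np

# Illustrative sketch of the block-difference measure. BD_l uses the
# previous frame's co-located CU as ref; BD_l' uses the virtual
# background frame's co-located CU.
def block_difference(cur: np.ndarray, ref: np.ndarray) -> float:
    """Normalized difference of two co-located N x N CU blocks, in [0, 1]."""
    diff = np.abs(cur.astype(np.int32) - ref.astype(np.int32))
    return float(diff.mean() / 255.0)

a = np.full((8, 8), 100, dtype=np.uint8)
b = np.full((8, 8), 110, dtype=np.uint8)
print(round(block_difference(a, b), 4))  # -> 0.0392
```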
Step 3: for the CU of depth l in the i-th frame, calculate BD_{l+1} and BD_{l+1}' for the 4 sub-block CUs into which it divides;
Specifically, the method comprises the following steps:
the calculation formulas for BD_{l+1} and BD_{l+1}' are the same as those for BD_l and BD_l'.
Step 4: based on the calculated BD_l, BD_l', BD_{l+1}, BD_{l+1}', make the CU-division continue/terminate decision; specifically:
if BD_l ≤ T1, BD_l' ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}' ≤ BD_l' are satisfied, the division stops, the current depth l is the optimal depth, and inter-frame prediction and the inter-frame coding rhythm continue downward with the current CU;
if these conditions are not satisfied, the division continues until the division depth reaches l = 3 and terminates; that is, it is further determined whether l = 3: if so, the division depth has reached l = 3 and the process terminates; if not, the depth becomes l + 1 and the judgment continues;
furthermore, T1 and T2 take values of 8-15% and can be adjusted according to different video conditions and QP values.
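The step-4 continue/terminate rule can be sketched as a small predicate; BD values are taken as fractions in [0, 1], the 12% defaults follow the embodiment's parameter settings (T1 = T2 = 12%), and all names are illustrative:

```python
# Sketch of the step-4 decision: keep the current depth l only when the
# current CU differs little from both references and none of the four
# sub-CUs differs more than the parent.
def stop_splitting(bd_l, bd_l_bg, bd_next, bd_next_bg, t1=0.12, t2=0.12):
    """True if depth l is kept as the optimal depth (division stops)."""
    return (bd_l <= t1 and bd_l_bg <= t2
            and bd_next <= bd_l and bd_next_bg <= bd_l_bg)

print(stop_splitting(0.05, 0.04, 0.03, 0.03))  # -> True
print(stop_splitting(0.20, 0.04, 0.03, 0.03))  # -> False
```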
After the CU inter-frame division is determined, all inter prediction models would still need to be traversed; each CU has 11 corresponding PU modes, so if the traversal range of PU modes for the current CU can be reduced, inter-mode selection can be accelerated. To this end, the invention continues from step 4 with:
Step 5: based on the division depth l and BD_l, BD_l', select the optimal inter prediction model, specifically:
if l = 0 and BD_l, BD_l' ≤ T3, the current LCU's differences from the background frame and the adjacent frame are essentially consistent and further division can be skipped directly; the SKIP model is selected as the optimal inter-frame prediction model;
if l = 0 and T3 < BD_l, BD_l' ≤ T4, or l ≤ 1 and BD_l, BD_l' ≤ T4, the overall motion of the video frame is relatively smooth; the 2N × 2N inter prediction models (SKIP, Merge, inter 2N × 2N, intra 2N × 2N) are traversed and the model with the minimum rate-distortion cost is selected as the optimal inter-frame prediction model;
if BD_l, BD_l' ≥ T5, the overall video motion is relatively complex; the remaining inter prediction models except the AMP models are traversed and the model with the minimum rate-distortion cost is selected as the optimal inter-frame prediction model;
if none of the conditions (l = 0 and BD_l, BD_l' ≤ T3; l = 0 and T3 < BD_l, BD_l' ≤ T4; l ≤ 1 and BD_l, BD_l' ≤ T4; BD_l, BD_l' ≥ T5) is satisfied, all inter prediction models are traversed and the model with the minimum rate-distortion cost is selected as the optimal inter-frame prediction model;
furthermore, T3 < T4 < T5; T3 takes a value of 1-3%, T4 a value of 5-15%, and T5 a value above 30%.
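The step-5 selection rules can be summarized as a decision function; the model-name strings and the most-specific-rule-first evaluation order are illustrative assumptions, with thresholds following the text (T3 < T4 < T5, e.g. 2%, 5%, 30%):

```python
# Sketch of the step-5 model-selection rules. Both BD_l and BD_l' (as
# fractions in [0, 1]) must fall in a band for that band's rule to
# fire; otherwise all models are traversed.
def candidate_models(l, bd, bd_bg, t3=0.02, t4=0.05, t5=0.30):
    if l == 0 and bd <= t3 and bd_bg <= t3:
        return ["SKIP"]
    if (l == 0 and t3 < bd <= t4 and t3 < bd_bg <= t4) or \
       (l <= 1 and bd <= t4 and bd_bg <= t4):
        return ["SKIP", "Merge", "inter2Nx2N", "intra2Nx2N"]
    if bd >= t5 and bd_bg >= t5:
        return ["all models except AMP"]
    return ["all models"]

print(candidate_models(0, 0.01, 0.01))  # -> ['SKIP']
```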
As shown in fig. 5, the present invention provides a system for video coding based on HEVC/h.265, comprising:
a creating module for implementing the step 1;
a calculating module for realizing the steps 2 and 3;
a dividing and judging module for realizing the step 4;
and an inter-frame prediction judgment module for implementing the step 5.
The invention has the advantages that:
The method exploits the relatively fixed background scene of surveillance video, calculating the difference degree BD_l between the current CU and its 4 sub-CU blocks and the background-frame difference degree BD_l' in place of the traditional traversal loop to decide the tendency to divide downward. The algorithm effectively reduces the computational complexity caused by inter-frame division.
The method derives, from a large amount of data, the proportions of inter prediction models under different inter-frame and background difference-degree values at different depths, infers the inter prediction model accordingly, and skips unnecessary inter prediction models, reducing the computation caused by exhaustive rate-distortion selection and hence the inter-frame coding complexity.
Example:
The experimental environment is Windows Server 2008 R2 with a single Intel E5-2620 CPU @ 2.1 GHz (8 cores) and 32 GB of RAM.
Three types of video (road-traffic surveillance, community surveillance and office surveillance) were compared as experimental content: the video compression software Hnew implementing the designed algorithm was compared against HM12.0 on H.265 sequences, and the algorithm's performance was evaluated by PSNR (peak signal-to-noise ratio, an objective measure of video coding quality) and the time-change ratio ΔT = (T_HM − T_new)/T_HM × 100%.
The new algorithm sets the relevant parameters to T1 = 12%, T2 = 12%, T3 = 2%, T4 = 5% and T5 = 30%, and performs the comparison. The results are shown in Table 1:
TABLE 1
(Table 1 is reproduced as an image in the original publication.)
As can be seen from Table 1, with video quality almost unchanged (PSNR decreases by only 0.06 dB on average), video coding time is greatly reduced (average ΔT = 30.5%). This indicates that the algorithm, by accelerating CU division and PU prediction-model selection when coding surveillance video, reduces the time consumed by video coding while effectively ensuring coding quality.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for video coding based on HEVC/H.265, comprising:
acquiring video image data and establishing a virtual background frame;
for a CU of depth l in the i-th frame, calculating the inter-frame difference BD_l against the co-located CU of the adjacent frame and the background difference BD_l′ against the co-located CU of the virtual background frame;
for the CU of depth l in the i-th frame, calculating BD_{l+1} and BD_{l+1}′ of the 4 sub-CUs obtained by splitting the CU;
if BD_l ≤ T1, BD_l′ ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}′ ≤ BD_l′, taking the current depth l as the optimal depth and performing subsequent inter-frame prediction;
in the inter-frame prediction process:
if l = 0 and BD_l, BD_l′ ≤ T3, selecting the SKIP model as the optimal inter-frame prediction model;
if l = 0 and T3 < BD_l, BD_l′ ≤ T4, or l ≤ 1 and BD_l, BD_l′ ≤ T4, traversing the 2N×2N inter-frame prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model;
if BD_l, BD_l′ ≥ T5, traversing the inter-frame prediction models except the AMP prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model; wherein T3 < T4 < T5.
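As a rough illustration, the threshold cascade of claim 1 for choosing which prediction models to evaluate can be sketched as follows. The function name, the mode-set representation, and the default thresholds (taken from the experimental settings, as fractions) are illustrative, not part of the patent:

```python
def select_inter_mode_candidates(l, bd, bd_p, T3=0.02, T4=0.05, T5=0.30):
    """Early inter-prediction mode decision following the cascade in claim 1.

    l    -- current CU depth (0..3)
    bd   -- inter-frame difference BD_l of the co-located CU
    bd_p -- background difference BD_l' against the virtual background frame
    Returns the set of candidate modes still to be compared by
    rate-distortion cost; thresholds satisfy T3 < T4 < T5.
    """
    if l == 0 and bd <= T3 and bd_p <= T3:
        return {"SKIP"}                        # nearly static CU: SKIP wins
    if (l == 0 and T3 < bd <= T4 and T3 < bd_p <= T4) or \
       (l <= 1 and bd <= T4 and bd_p <= T4):
        return {"2Nx2N"}                       # low motion: only 2Nx2N
    if bd >= T5 and bd_p >= T5:
        # high motion: traverse everything except the AMP partitions
        return {"2Nx2N", "2NxN", "Nx2N", "NxN"}
    # no early-out condition holds: traverse all models (cf. claim 6),
    # including the asymmetric (AMP) partitions
    return {"2Nx2N", "2NxN", "Nx2N", "NxN",
            "2NxnU", "2NxnD", "nLx2N", "nRx2N"}

print(select_inter_mode_candidates(0, 0.01, 0.01))  # {'SKIP'}
```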
2. The method of claim 1, wherein the method for creating the virtual background frame comprises:
taking the first H frames of the original video as samples and performing data statistics on every pixel of each frame, the pixel values ranging from 0 to 255;
and selecting the median of the pixel values at the same position across the sample frames as the pixel value of the virtual background frame, thereby establishing the virtual background frame.
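The per-pixel median construction of claim 2 can be sketched in pure Python (the helper name and the tiny frame stack are illustrative):

```python
from statistics import median

def build_virtual_background(frames):
    """Build the virtual background frame from the first H sample frames.

    frames -- list of H frames, each a list of rows of pixel values (0..255).
    The per-pixel median over the H samples suppresses transient foreground
    objects, leaving a static background estimate.
    """
    H = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[int(median(frames[h][y][x] for h in range(H)))
             for x in range(width)]
            for y in range(height)]

# Tiny illustration: five 1x1 "frames" in which a moving object (value 200)
# appears once; the median recovers the background value 50.
frames = [[[50]], [[50]], [[200]], [[50]], [[50]]]
print(build_virtual_background(frames))  # [[50]]
```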
3. The method of claim 1,
the inter-frame difference BD_l is calculated as:
BD_l = ( Σ_{x=1}^{N/M_d} Σ_{y=1}^{N/M_d} | f_i(x,y) − f_{i−1}(x,y) | ) / ( 255 · (N/M_d)² ) × 100%
the background difference BD_l′ is calculated as:
BD_l′ = ( Σ_{x=1}^{N/M_d} Σ_{y=1}^{N/M_d} | f_i(x,y) − f_B(x,y) | ) / ( 255 · (N/M_d)² ) × 100%
where f_i(x,y) denotes the pixel value at coordinate (x,y) in the CU of the i-th frame of the video sequence, f_{i−1}(x,y) the pixel value at (x,y) in the co-located CU of the adjacent previous frame, f_B(x,y) the pixel value at (x,y) in the co-located CU of the background frame, N is the side length of the CU block, and M_d is the coding depth index, M_d = 2^l, l ∈ [0,3].
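Reading the two differences as mean absolute pixel differences normalized to the 8-bit range, so that they can be compared directly against the percentage thresholds T1..T5 (this normalization is an assumption; the original formulas appear in the source only as images), the computation can be sketched as:

```python
def block_difference(cur, ref):
    """Normalized mean absolute difference between two co-located CU blocks.

    cur, ref -- n x n lists of pixel values in 0..255.
    BD_l  uses the co-located CU of the previous frame as ref;
    BD_l' uses the co-located CU of the virtual background frame as ref.
    The result is a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    n = len(cur)
    total = sum(abs(cur[y][x] - ref[y][x])
                for y in range(n) for x in range(n))
    return total / (255.0 * n * n)

# A 2x2 block whose pixels all differ by 5 from the reference:
print(round(block_difference([[60, 60], [60, 60]],
                             [[55, 55], [55, 55]]), 4))  # 0.0196
```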
4. The method of claim 1, further comprising:
if the condition BD_l ≤ T1, BD_l′ ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}′ ≤ BD_l′ is not satisfied, continuing the division until the division depth reaches l = 3, and then terminating.
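The early-termination rule of claims 1 and 4 can be sketched as a recursive depth decision. The function name and bd_fn are illustrative, and the four sub-CU differences are simplified here to a single aggregate value per depth:

```python
def decide_depth(l, bd_fn, T1=0.12, T2=0.12, max_depth=3):
    """Return the chosen partition depth for a CU.

    bd_fn(l) -- supplied by the caller; returns (BD_l, BD_l') for the CU
                at depth l (inter-frame and background differences, as
                fractions of the 8-bit range).
    Splitting stops early when the block is already homogeneous
    (BD_l <= T1 and BD_l' <= T2) and splitting would not reduce either
    difference; otherwise division continues down to depth l = 3.
    """
    if l == max_depth:
        return l                      # claim 4: terminate at l = 3
    bd, bd_p = bd_fn(l)
    bd_sub, bd_sub_p = bd_fn(l + 1)   # aggregate difference of the sub-CUs
    if bd <= T1 and bd_p <= T2 and bd_sub <= bd and bd_sub_p <= bd_p:
        return l                      # current depth is optimal; stop splitting
    return decide_depth(l + 1, bd_fn, T1, T2, max_depth)

# A static CU: differences are tiny at every depth, so depth 0 is kept.
print(decide_depth(0, lambda l: (0.01, 0.01)))  # 0
```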
5. The method of claim 1 or 4, wherein the values of T1 and T2 are 8% to 15%.
6. The method of claim 1, further comprising:
if none of the conditions l = 0 and BD_l, BD_l′ ≤ T3; l = 0 and T3 < BD_l, BD_l′ ≤ T4; l ≤ 1 and BD_l, BD_l′ ≤ T4; and BD_l, BD_l′ ≥ T5 is satisfied, traversing all the inter-frame prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model.
7. The method of claim 1 or 6, wherein the value of T3 is 1% to 3%, the value of T4 is 5% to 15%, and the value of T5 is more than 30%.
8. A system for HEVC/h.265 based video coding, comprising:
the creating module is used for acquiring video image data and establishing a virtual background frame;
a calculating module for calculating, for a CU of depth l in the i-th frame, the inter-frame difference BD_l against the co-located CU of the adjacent frame, the background difference BD_l′ against the co-located CU of the virtual background frame, and BD_{l+1}, BD_{l+1}′ of the 4 sub-CUs obtained by splitting the CU;
A division judgment module for:
when BD_l ≤ T1, BD_l′ ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}′ ≤ BD_l′, taking the current depth l as the optimal depth and performing subsequent inter-frame prediction;
when the condition BD_l ≤ T1, BD_l′ ≤ T2, BD_{l+1} ≤ BD_l and BD_{l+1}′ ≤ BD_l′ is not satisfied, continuing the division until the division depth reaches l = 3, and then terminating;
an inter-frame prediction judgment module to:
when l = 0 and BD_l, BD_l′ ≤ T3, selecting the SKIP model as the optimal inter-frame prediction model;
when l = 0 and T3 < BD_l, BD_l′ ≤ T4, or l ≤ 1 and BD_l, BD_l′ ≤ T4, traversing the 2N×2N inter-frame prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model;
when BD_l, BD_l′ ≥ T5, traversing the inter-frame prediction models except the AMP prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model; wherein T3 < T4 < T5;
when none of the conditions l = 0 and BD_l, BD_l′ ≤ T3; l = 0 and T3 < BD_l, BD_l′ ≤ T4; l ≤ 1 and BD_l, BD_l′ ≤ T4; and BD_l, BD_l′ ≥ T5 is satisfied, traversing all the inter-frame prediction models and selecting the model with the minimum rate-distortion cost as the optimal inter-frame prediction model.
9. The system of claim 8, wherein in the computing module,
the inter-frame difference BD_l is calculated as:
BD_l = ( Σ_{x=1}^{N/M_d} Σ_{y=1}^{N/M_d} | f_i(x,y) − f_{i−1}(x,y) | ) / ( 255 · (N/M_d)² ) × 100%
the background difference BD_l′ is calculated as:
BD_l′ = ( Σ_{x=1}^{N/M_d} Σ_{y=1}^{N/M_d} | f_i(x,y) − f_B(x,y) | ) / ( 255 · (N/M_d)² ) × 100%
where f_i(x,y) denotes the pixel value at coordinate (x,y) in the CU of the i-th frame of the video sequence, f_{i−1}(x,y) the pixel value at (x,y) in the co-located CU of the adjacent previous frame, f_B(x,y) the pixel value at (x,y) in the co-located CU of the background frame, N is the side length of the CU block, and M_d is the coding depth index, M_d = 2^l, l ∈ [0,3].
10. The system of claim 8, wherein the values of T1 and T2 are 8% to 15%, T3 is 1% to 3%, T4 is 5% to 15%, and T5 is more than 30%.
CN202110251124.7A 2021-03-08 2021-03-08 HEVC/H.265-based video coding method and system Active CN113055670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110251124.7A CN113055670B (en) 2021-03-08 2021-03-08 HEVC/H.265-based video coding method and system


Publications (2)

Publication Number Publication Date
CN113055670A true CN113055670A (en) 2021-06-29
CN113055670B CN113055670B (en) 2024-03-19

Family

ID=76510256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110251124.7A Active CN113055670B (en) 2021-03-08 2021-03-08 HEVC/H.265-based video coding method and system

Country Status (1)

Country Link
CN (1) CN113055670B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101277447A (en) * 2008-04-15 2008-10-01 北京航空航天大学 Method for rapidly predicting frame space of aerophotographic traffic video
US20130335527A1 (en) * 2011-03-18 2013-12-19 Sony Corporation Image processing device, image processing method, and program
CN105847793A (en) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 Video coding method and device and video decoding method and device
CN110992288A (en) * 2019-12-06 2020-04-10 武汉科技大学 Video image blind denoising method used in mine shaft environment



Similar Documents

Publication Publication Date Title
CN109688414B (en) VVC intra-frame coding unit candidate prediction mode reduction and block division early termination method
JP6863669B2 (en) Image coding device, image coding method, image decoding device and image decoding method
TWI717309B (en) Image encoding device, image decoding device and recording medium
US8311097B2 (en) Image processing method for adaptive spatial-temporal resolution frame
TWI626842B (en) Motion picture coding device and its operation method
US7336720B2 (en) Real-time video coding/decoding
JP4391809B2 (en) System and method for adaptively encoding a sequence of images
KR102021257B1 (en) Image decoding device, image coding device, image decoding method, image coding method and storage medium
KR102244315B1 (en) Method and Apparatus for image encoding
JP2001510311A (en) Object-based rate control apparatus and method in coding scheme
JP2001054119A (en) Image prediction coding method
KR20050105271A (en) Video encoding
US7881374B2 (en) Method and apparatus for 3-D subband video coding
JP4391810B2 (en) System and method for adaptively encoding a sequence of images
CN101984665A (en) Video transmission quality evaluating method and system
CN111741299B (en) Method, device and equipment for selecting intra-frame prediction mode and storage medium
JP4417054B2 (en) Motion estimation method and apparatus referring to discrete cosine transform coefficient
CN113163199B (en) H265-based video rapid prediction method, rapid coding method and system
CN113055670B (en) HEVC/H.265-based video coding method and system
CN114827616A (en) Compressed video quality enhancement method based on space-time information balance
CN101262607B (en) Two-folded prediction video coding and decoding method and device
An et al. Low-complexity motion estimation for H.264/AVC through perceptual video coding.
CN101977317A (en) Intra-frame prediction method and device
KR100586103B1 (en) Method for moving picture coding
Tang et al. Optimization of CU Partition Based on Texture Degree in H.266/VVC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 201, 2nd Floor, Building 1, No. 3 Liansheng Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 310023

Applicant after: Zhejiang Yuhan Technology Co.,Ltd.

Address before: 310000 room 1312, block B, building 3, Chuangxin Times Square, Yuhang District, Hangzhou City, Zhejiang Province

Applicant before: Hangzhou Yuhan Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant