CN104602019A - Video coding method and device

Info

Publication number: CN104602019A
Application number: CN201410855788.4A
Authority: CN
Inventors: 蔡砚刚; 魏伟; 白茂生; 刘阳
Original assignee: LeTV Information Technology Beijing Co Ltd
Current assignee: LeTV Information Technology Beijing Co Ltd
Priority date: 2014-12-31
Filing date: 2014-12-31
Publication date: 2015-05-06

Abstract

The invention discloses a video coding method and a video coding device. The method comprises: down-sampling the initial-resolution video image to be coded to generate down-sampled video images at one or more resolution levels; performing a reference frame decision on the down-sampled video image at each resolution level to obtain the optimal reference frame of each PU of each image frame at that level; and, when performing the reference frame decision on the initial-resolution video image, adding the reference frames contained in both the down-sampling reference frame set of the current PU and the current reference frame lists to the reference frame candidate set of the current PU, and selecting the optimal reference frame of the current PU from that candidate set. With the disclosed method and device, 50% of the reference frame decision time can be saved on the existing x265 coding framework.

Description

Video coding method and device

Technical field

The present application relates to the field of video coding, and in particular to a video coding method and device.

Background

When a video image is coded with inter prediction, the scenes in adjacent frames of a moving picture are correlated. The picture can therefore be divided into blocks or macroblocks, and the encoder searches for the position of each block or macroblock in an adjacent frame and derives the relative spatial displacement between the two. This displacement is what is usually called the motion vector, and the process of obtaining it is called motion estimation.

The motion vector and the prediction error obtained after motion matching are sent together to the decoder. The decoder uses the position indicated by the motion vector to find the corresponding block or macroblock in an adjacent, already decoded reference frame, and adds the prediction error to it to reconstruct the block or macroblock at its position in the current frame.

Motion estimation removes the inter-frame redundancy of video images and greatly reduces the number of bits needed to transmit video. It is therefore an important component of a video coding system. The general method is as follows: let the frame at time t be the current frame f(x, y) and the frame at time t' be the reference frame f'(x, y). The reference frame may precede or follow the current frame in time, as shown in Fig. 1: when t' < t this is called forward motion estimation, and when t' > t it is called backward motion estimation. Searching the reference frame at t' for the best match of each block of the current frame at t yields the corresponding motion field d(x; t, t ± Δt), that is, the motion vectors of the current frame.

To improve prediction accuracy, H.264 can select, from a group of previously or subsequently coded pictures, the one or two pictures that best match the current frame as reference pictures (also called reference frames) for inter coding. This greatly increases the complexity of inter prediction, but the repeated comparisons significantly improve the accuracy of the matched prediction. H.264 can select the best matching picture from up to 15 reference pictures, which are divided into two reference picture lists (also called reference frame lists), denoted List0 and List1.

For inter-coded macroblocks and macroblock partitions in a P slice, the reference picture is selected from reference frame list List0; for inter-coded macroblocks and macroblock partitions in a B slice, reference pictures can be selected from both List0 and List1.

In List0, the nearest forward picture by POC (Picture Order Count) is assigned index 0, followed by the remaining forward pictures (in order of decreasing POC, that is, increasing distance from the current picture, as in Table 1), and then the backward pictures (in order of increasing POC).

In List1, the nearest backward picture is assigned index 0, followed by the remaining backward pictures (in order of increasing POC), and then the forward pictures (in order of decreasing POC).

The prediction modes of P slices and B slices are shown in Fig. 2.

For example, as shown in Table 1, an H.264 decoder stores six short-term reference pictures whose POCs are 123, 125, 126, 128, 129 and 130, and the POC of the current picture is 127. All six short-term reference pictures are marked as "used for reference" in List0 and List1.

Index	List0	List1
0	126	128
1	125	129
2	123	130
3	128	126
4	129	125
5	130	123

Table 1: Short-term buffer indices
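The ordering in Table 1 can be reproduced with a short sketch. It is only an illustration for short-term pictures identified by their POC: the function name `build_ref_lists` is hypothetical, and long-term pictures and list reordering commands are ignored.

```python
def build_ref_lists(current_poc, ref_pocs):
    """Order short-term reference POCs into (List0, List1).

    Forward pictures (POC below the current one) are sorted nearest first,
    backward pictures (POC above the current one) are sorted nearest first.
    """
    forward = sorted((p for p in ref_pocs if p < current_poc), reverse=True)
    backward = sorted(p for p in ref_pocs if p > current_poc)
    list0 = forward + backward   # forward pictures first
    list1 = backward + forward   # backward pictures first
    return list0, list1


list0, list1 = build_ref_lists(127, [123, 125, 126, 128, 129, 130])
for index, (ref0, ref1) in enumerate(zip(list0, list1)):
    print(index, ref0, ref1)
```

Run with the POCs of the example, the printed rows match the List0 and List1 columns of Table 1.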

HEVC (High Efficiency Video Coding) inherits the multiple reference frame scheme of H.264: from a group of previously or subsequently coded pictures it selects the one or two pictures that best match the current frame as reference pictures for inter coding, which likewise greatly increases the complexity of inter prediction.

In recent years, with the development of the Internet and of hardware devices, the cost of producing video has kept falling while the demand for sharing and accessing video has kept growing. Video resolution is also increasing: more and more high-definition video (1920 × 1080, denoted 1080p) and even ultra-high-definition video (3840 × 2160, denoted 4K) is being produced, distributed and played. This trend will accelerate the adoption of HEVC in video coding. Because HEVC partitions coding blocks recursively, continuing to use the high-complexity reference frame decision scheme of H.264 would cause the coding speed to drop exponentially.

Summary of the invention

The technical problem to be solved by the present application is to overcome the deficiencies of the prior art and to provide a video coding method and device that can increase HEVC video coding speed.

To solve the above problem, the application provides a video coding method, comprising:

down-sampling the initial-resolution video image to be coded to generate down-sampled video images at one or more resolution levels;

performing a reference frame decision on the down-sampled video image at each resolution level to obtain the optimal reference frame of each PU of each image frame in the down-sampled image at that level;

when performing the reference frame decision on the initial-resolution video image, including in the reference frame candidate set of the current PU the reference frames contained in both the down-sampling reference frame set of the current PU and the current reference frame lists, and selecting the optimal reference frame of the current PU from the reference frame candidate set;

wherein the down-sampling reference frame set of the current PU comprises the optimal reference frames of the PUs that correspond to the current PU in the down-sampled image at each resolution level;

and the PUs that correspond to the current PU in the down-sampled image at each resolution level comprise the PUs in each down-sampled video image that coincide or partially coincide with the image region occupied by the current PU.

In addition, when the reference frame decision is performed on the initial-resolution video image, the reference frames of the blocks adjacent to the current PU are also included in the reference frame candidate set of the current PU.

In addition, when the reference frame decision is performed on the down-sampled video image at each resolution level, the coding cost of the matching block of a PU in each candidate reference frame is computed using the sum of absolute differences (SAD) as the criterion, and the candidate reference frame whose matching block has the smallest coding cost is selected as the optimal reference frame of that PU.

In addition, when the reference frame decision is performed on the initial-resolution video image, the coding cost of the matching block of a PU in each candidate reference frame of the reference frame candidate set is computed using the rate-distortion cost as the criterion, and the candidate reference frame whose matching block minimizes the sum of the coding cost and λ × R(ref) is selected as the optimal reference frame of that PU;

wherein λ is the Lagrangian constant and R(ref) is the number of bits occupied by the reference frame index ref.

In addition, the initial-resolution video image that is down-sampled is the original frame of the video image.

The present invention also provides a video coding method, comprising:

when performing a reference frame decision on the video image to be coded, taking the reference frames of the blocks adjacent to the current PU as candidate reference frames, and selecting the optimal reference frame of the current PU from among them.

In addition, when the reference frame decision is performed on the video image to be coded, the coding cost of the matching block of the current PU in each candidate reference frame is computed using the rate-distortion cost as the criterion, and the candidate reference frame whose matching block minimizes the sum of the coding cost and λ × R(ref) is selected as the optimal reference frame of the current PU;

wherein λ is the Lagrangian constant and R(ref) is the number of bits occupied by the reference frame index.

The present invention also provides a video coding device comprising a down-sampling unit and a prediction unit, wherein:

the down-sampling unit is configured to down-sample the initial-resolution video image to be coded to generate down-sampled video images at one or more resolution levels;

the prediction unit is configured to perform a reference frame decision on the down-sampled video image at each resolution level to obtain the optimal reference frame of each PU of each image frame in the down-sampled image at that level;

the prediction unit is further configured to, when performing the reference frame decision on the initial-resolution video image, include in the reference frame candidate set of the current PU the reference frames contained in both the down-sampling reference frame set of the current PU and the current reference frame lists, and to select the optimal reference frame of the current PU from the reference frame candidate set;

In addition, when performing the reference frame decision on the initial-resolution video image, the prediction unit also includes the reference frames of the blocks adjacent to the current PU in the reference frame candidate set of the current PU.

The present invention also provides a video coding device comprising a prediction unit, wherein:

the prediction unit is configured to, when performing a reference frame decision on the video image to be coded, take the reference frames of the blocks adjacent to the current PU as candidate reference frames and select the optimal reference frame of the current PU from among them.

In summary, the present invention proposes a fast multiple-reference-frame decision scheme that significantly reduces the amount of computation. When the scheme is applied to a video coding system, the candidate reference frames of the current coding block are obtained from the coded data of the down-sampled frames, which reduces the number of candidate reference frames and speeds up the reference frame decision without sacrificing inter prediction quality. In addition, the invention can obtain candidate reference frames of the current coding block from its neighbouring prediction blocks, which further improves the prediction accuracy of the reference frame decision. With the method and device of the invention, 50% of the reference frame decision time can be saved on the existing x265 coding framework.

Brief description of the drawings

Fig. 1 is a schematic diagram of forward and backward motion estimation in the prior art;

Fig. 2 is a schematic diagram of the prediction modes of P slices and B slices in the H.264 protocol;

Fig. 3 is a flow chart of the video coding method of the first method embodiment of the present invention;

Fig. 4 is a schematic diagram of down-sampling by decimation;

Fig. 5 is a schematic diagram of the spatial candidate positions for motion information in HEVC;

Fig. 6 is a flow chart of the video coding method of the second method embodiment of the present invention;

Fig. 7 is a structural diagram of the first embodiment of the video coding device of the present invention;

Fig. 8 is a structural diagram of the second embodiment of the video coding device of the present invention.

Detailed description

The present invention mainly comprises two aspects: hierarchical reference frame decision and adjacent-block reference frame decision.

Hierarchical reference frame decision means the following: the initial-resolution video image to be coded is down-sampled to generate down-sampled video images at one or more resolution levels; a reference frame decision is performed for the PUs (Prediction Units) in the down-sampled image at each resolution level to obtain the optimal reference frame of each PU of each image frame at that level; and, when the reference frame decision is performed on the initial-resolution video image, the intersection of (a) the set of optimal reference frames of the PUs in the down-sampled images at each resolution level that coincide or partially coincide with the image region of the current PU and (b) the reference frame lists (List0 and List1) of the current PU is used as the reference frame candidate set for the decision.

Adjacent-block reference frame decision means that, when the reference frame decision is performed on a video image, the set of reference frames of the blocks adjacent to the current PU is used as the reference frame candidate set for the decision.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

First method embodiment

Fig. 3 is a flow chart of the video coding method of the first method embodiment of the present invention. As shown in Fig. 3, the method comprises:

Step 301: down-sample the sequence of initial-resolution video images to be coded (the L_0-level video images) to generate the down-sampled video images at each resolution level, namely the L_1, ..., L_D-level video images, where D is an integer greater than or equal to 1;

In this step, different down-sampling ratios (the ratio of source resolution to target resolution) can be used. A down-sampling ratio of Ds:1 means that 1 pixel is sampled out of every Ds pixels, where Ds is an integer greater than 1.

In this embodiment, to better preserve the original characteristics of the video (for example, its texture), Ds = 2 is used; that is, down-sampling is performed by decimation, as shown in Fig. 4.

In addition, to speed up computation, this embodiment down-samples only the original frames; reconstructed frames are not processed.
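Step 301 can be sketched as follows for a single frame stored as a list of pixel rows. This is a minimal illustration, not the encoder's implementation: the names `decimate` and `build_pyramid` are hypothetical, the ratio Ds:1 is assumed to apply in both dimensions, and keeping the top-left pixel of each group without any filtering is an assumption, since Fig. 4 is not reproduced here.

```python
def decimate(frame, ds=2):
    """Keep one pixel out of every `ds` pixels in each dimension (no filtering)."""
    return [row[::ds] for row in frame[::ds]]


def build_pyramid(frame, levels, ds=2):
    """Return [L_0, L_1, ..., L_D], where L_0 is the original frame and D = levels."""
    pyramid = [frame]
    for _ in range(levels):
        pyramid.append(decimate(pyramid[-1], ds))
    return pyramid


# Toy 8x8 "original frame" whose pixel value encodes its position.
frame = [[y * 8 + x for x in range(8)] for y in range(8)]
pyramid = build_pyramid(frame, levels=2)
print([(len(level), len(level[0])) for level in pyramid])
```

With Ds = 2 and D = 2, the 8 × 8 toy frame yields levels of 8 × 8, 4 × 4 and 2 × 2 pixels. Only the original frame is fed in, in line with the embodiment's choice not to down-sample reconstructed frames.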

Step 302: perform a reference frame decision on the down-sampled video images at each level (the L_1 to L_D-level video images) to obtain the optimal reference frame of each PU (which may be called a down-sampled image PU) in the down-sampled video images at each level;

Specifically, for each PU in the L_d-level video image (1 ≤ d ≤ D), the reference frame decision can be performed as follows:

Motion estimation is used to obtain the matching block of the current PU of the L_d-level video image in each reference frame of the reference frame list, the coding cost (Cost) of each matching block is computed, and the reference frame whose matching block has the smallest coding cost is selected as the optimal reference frame of the PU;

Let the reference frames in the current reference frame list be ref_0, ref_1, ref_2, ..., ref_n;

and let the coding costs of the matching blocks in these reference frames be Cost_0, Cost_1, Cost_2, ..., Cost_n;

Then the optimal reference frame of the current PU in the L_d-level video image is:

ref = \underset{ref = 0, 1, 2, \ldots, n}{\arg\min} (Cost_{ref});

(Formula 1)

That is, the reference frame corresponding to the matching block with the smallest coding cost (the best matching block) is taken as the optimal reference frame.

In this embodiment, for the L_d-level video image (1 ≤ d ≤ D), the coding cost (Cost) of each matching block is computed using the sum of absolute differences (SAD) as the criterion:

SAD(s, c(MV)) = \sum_{x = 1, y = 1}^{B_x, B_y} | s[x, y] - c[x - MV_x, y - MV_y] |;

(Formula 2)

where s is the original data currently being coded; c is the coded and reconstructed reference frame data used for motion compensation; B_x and B_y are the width and height of the current prediction block; MV is the candidate motion vector; and MV_x and MV_y are its x and y components.

That is, for each PU in the L_d-level video image (1 ≤ d ≤ D), the best matching block of the current PU is selected by the minimum-SAD principle, and the optimal reference frame of the current PU is selected accordingly.
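The decision of step 302 can be illustrated with a small sketch that evaluates the SAD of formula 2 over a search window and then applies the argmin of formula 1 across reference frames. The function names, the exhaustive (full) search and the tiny ±1 search range are assumptions made for illustration; the source does not specify the search strategy. Indices are 0-based and frames are lists of pixel rows.

```python
def sad(cur, ref, bx, by, x0, y0, mvx, mvy):
    """Formula 2 for the bx-by-by block whose top-left corner is (x0, y0)."""
    total = 0
    for y in range(y0, y0 + by):
        for x in range(x0, x0 + bx):
            total += abs(cur[y][x] - ref[y - mvy][x - mvx])
    return total


def best_match(cur, ref, bx, by, x0, y0, search=1):
    """Full search over [-search, search]^2; returns (minimum SAD, motion vector)."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = x0 - mvx, y0 - mvy
            # Skip vectors whose matching block would leave the reference frame.
            if rx < 0 or ry < 0 or rx + bx > width or ry + by > height:
                continue
            cost = sad(cur, ref, bx, by, x0, y0, mvx, mvy)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best


def best_reference(cur, refs, bx, by, x0, y0, search=1):
    """Formula 1: index of the reference frame whose best match has minimum cost."""
    costs = [best_match(cur, ref, bx, by, x0, y0, search)[0] for ref in refs]
    return min(range(len(costs)), key=costs.__getitem__), costs


cur = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
ref_blank = [[0] * 4 for _ in range(4)]
ref_shifted = [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 0, 0]]
print(best_reference(cur, [ref_blank, ref_shifted], bx=2, by=2, x0=1, y0=1))
```

In this toy case the second reference frame contains the block shifted by one pixel, so its best match has SAD 0 and it is selected.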

Step 303: before performing the reference frame decision on each PU in the initial-resolution video image (the L_0-level video image), first determine the corresponding PUs of each PU in each down-sampled video image, and the optimal reference frame of each corresponding PU;

Specifically, denote the current PU in the L_0-level video image as PU_0. The corresponding PUs of PU_0 in each down-sampled video image are the PUs in that down-sampled video image that coincide or partially coincide with the region occupied by PU_0.

PU_0 may have multiple corresponding PUs across the down-sampled video images; they are denoted as the down-sampled PU set P_b:

P_b = {PU_b1, ..., PU_bn}; the PU_bi in P_b may come from the same or from different down-sampling resolution levels, where 1 ≤ i ≤ n and n is an integer greater than or equal to 1.

Different PU_bi in P_b may have the same or different reference frames; the set of reference frames corresponding to P_b is denoted as the down-sampling reference frame set R_b:

R_b = {R_b1, ..., R_bm}, where m is an integer greater than or equal to 1 and less than or equal to 15.
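One way to picture step 303 is to map every down-sampled PU back to L_0 coordinates and test it for overlap with the region of PU_0. The sketch below does this under several assumptions that are not in the source: PUs are axis-aligned rectangles, every level uses the same ratio Ds (so level d is scaled by Ds^d), and reference frames are identified by their POC. The `PU` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PU:
    x: int         # left edge, in pixels of the PU's own level
    y: int         # top edge
    w: int         # width
    h: int         # height
    best_ref: int  # optimal reference frame from step 302 (here: its POC)


def overlaps(a, b):
    """True if rectangles given as (x, y, w, h) share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def downsampled_ref_set(region, levels, ds=2):
    """Collect R_b for the L_0 region (x, y, w, h) of PU_0.

    `levels[d - 1]` holds the PUs of the L_d-level image.
    """
    r_b = set()
    for d, pus in enumerate(levels, start=1):
        scale = ds ** d
        for pu in pus:
            # Express the level-d PU in L_0 coordinates before comparing.
            mapped = (pu.x * scale, pu.y * scale, pu.w * scale, pu.h * scale)
            if overlaps(region, mapped):
                r_b.add(pu.best_ref)
    return r_b


levels = [
    [PU(0, 0, 8, 8, 126), PU(8, 0, 8, 8, 125), PU(0, 8, 16, 8, 128)],  # L_1
    [PU(0, 0, 8, 8, 126)],                                             # L_2
]
print(downsampled_ref_set((16, 0, 16, 16), levels))
```

Here PU_0 covers the 16 × 16 region at (16, 0); it overlaps one L_1 PU (reference 125) and the single L_2 PU (reference 126), so R_b contains those two reference frames.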

Step 304: for each PU in the initial-resolution video image (the L_0-level video image), include in its reference frame candidate set the reference frames contained in both its down-sampling reference frame set and the reference frame lists;

Specifically, denote the current PU in the L_0-level video image as PU_0, and let the down-sampling reference frame set of PU_0 be R_b = {R_b1, ..., R_bm}. Let the reference frame lists be List0 = {ref_01, ..., ref_0k} and List1 = {ref_11, ..., ref_1j}, where k and j are positive integers and k + j is less than or equal to 15. Then the reference frame candidate set C_0 of PU_0 is:

C_0 = R_b ∩ (List0 ∪ List1), where ∩ denotes intersection and ∪ denotes union.

That is, the reference frames that are contained in both R_b and List0, and the reference frames that are contained in both R_b and List1, are included in the reference frame candidate set C_0 of PU_0.

In a specific implementation, every reference frame in the reference frame lists List0 and List1 can first be marked as "need not be searched"; the reference frames in R_b are then compared with those in List0 and List1, and if a reference frame in List0 or List1 is also contained in R_b, the flag of that reference frame in the reference frame list is updated to "needs to be searched".
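The flag-based implementation described for step 304 might look like the following sketch. The function name `mark_search_flags` and the use of POC values to identify reference frames are assumptions for illustration only.

```python
def mark_search_flags(list0, list1, r_b):
    """Return one boolean flag per entry of List0 and List1.

    A flag is True ("needs to be searched") only if the entry also appears
    in R_b, which realises C_0 = R_b ∩ (List0 ∪ List1) in place.
    """
    flags0 = [False] * len(list0)  # start as "need not be searched"
    flags1 = [False] * len(list1)
    for flags, ref_list in ((flags0, list0), (flags1, list1)):
        for i, ref in enumerate(ref_list):
            if ref in r_b:
                flags[i] = True
    return flags0, flags1


list0 = [126, 125, 123, 128, 129, 130]
list1 = [128, 129, 130, 126, 125, 123]
flags0, flags1 = mark_search_flags(list0, list1, r_b={125, 126})
candidates = {ref for ref, flag in zip(list0 + list1, flags0 + flags1) if flag}
print(flags0, flags1, candidates)
```

Keeping flags on the existing lists, rather than building a new list, preserves the original reference indices, which matter later when the bits for the reference frame index are counted.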

Step 305: for each PU in the initial-resolution video image (the L_0-level video image), update its reference frame candidate set with the reference frames of its adjacent blocks; that is, the reference frames of the blocks adjacent to each PU are also included in the reference frame candidate set of that PU;

Fig. 5 is a schematic diagram of the spatial candidate positions for motion information in HEVC.

In this embodiment, {a_0, a_1, b_0, b_1, b_2} shown in Fig. 5 can be used as the adjacent blocks of a PU, and the reference frame of each adjacent block is included in the reference frame candidate set of the PU.

Specifically, denote the current PU in the L_0-level video image as PU_0, with reference frame candidate set C_0 = R_b ∩ (List0 ∪ List1). Denote the set of reference frames of the blocks adjacent to PU_0 as the adjacent-block reference frame set R_n = {R_n1, ..., R_ns}, where s is an integer less than or equal to 5;

Then: C_0 = (R_b ∩ (List0 ∪ List1)) ∪ R_n;

That is, on the basis of step 304, the reference frames contained in R_n are further included in the reference frame candidate set C_0 of PU_0.

In a specific implementation, the reference frames of the blocks adjacent to the PU can be compared with the reference frames in the reference frame lists List0 and List1; if a reference frame in List0 or List1 is also the reference frame of an adjacent block of the PU, the flag of that reference frame in the reference frame list is updated to "needs to be searched".

Including the reference frames of adjacent blocks in the reference frame candidate set reduces the probability of missing the optimal reference frame.

This step is optional.
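Steps 304 and 305 together can be expressed with plain set operations. The sketch below is illustrative only: the function name, the dictionary keyed by the neighbour positions a0 to b2, and the use of `None` for a neighbour without a reference frame (for example an intra-coded or unavailable block) are assumptions. Neighbour references are kept only if they occur in List0 or List1, mirroring the flag-based implementation described above.

```python
def candidate_set(r_b, list0, list1, neighbour_refs=None):
    """C_0 = (R_b ∩ (List0 ∪ List1)) ∪ R_n, where R_n is optional (step 305)."""
    available = set(list0) | set(list1)
    c0 = set(r_b) & available
    if neighbour_refs:
        # None marks a neighbour that contributes no reference frame.
        r_n = {ref for ref in neighbour_refs.values() if ref is not None}
        # Only references present in the current lists can be flagged for search.
        c0 |= r_n & available
    return c0


list0 = [126, 125, 123, 128, 129, 130]
list1 = [128, 129, 130, 126, 125, 123]
neighbours = {"a0": 126, "a1": 128, "b0": None, "b1": 126, "b2": 128}

print(candidate_set({125, 126}, list0, list1))              # step 304 only
print(candidate_set({125, 126}, list0, list1, neighbours))  # with step 305
```

In this example the neighbours add reference 128 to the candidate set, so three of the six available reference frames are searched instead of all six.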

Step 306: perform the reference frame decision using the reference frame candidate set of each PU in the initial-resolution video image (the L_0-level video image) to obtain the optimal reference frame of each PU;

In this step, for the L_0-level video image the coding cost of each matching block is computed as a rate-distortion cost, that is, the best matching block is selected by the minimum rate-distortion cost principle; the optimal reference frame is then selected by combining the coding cost of each matching block with the number of bits occupied by the reference frame index.

Specifically, denote the current PU in the L_0-level video image as PU_0, with reference frame candidate set C_0 = {ref_0, ..., ref_n}, where n is an integer less than 15;

Motion estimation is used to obtain the matching block of PU_0 in each reference frame of C_0, and the coding cost (Cost) of each matching block is computed: Cost_0, Cost_1, Cost_2, ..., Cost_n;

Then the optimal reference frame of PU_0 is:

ref = \underset{ref = 0, 1, 2, \ldots, n}{\arg\min} (Cost_{ref} + λ \times R(ref));

(Formula 3)

where R(ref) is the number of bits occupied by the reference frame index and λ is the Lagrangian constant.

That is, the reference frame that minimizes the sum of the coding cost and λ × R(ref) is taken as the optimal reference frame.
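The selection rule of formula 3 is easy to check with a few made-up numbers. In the sketch below the function name, the cost values and the per-index bit counts are all hypothetical; they only show how λ trades matching quality against the cost of signalling the reference frame index.

```python
def select_reference(costs, ref_bits, lam):
    """Formula 3: the candidate minimising Cost_ref + lam * R(ref).

    `costs` maps a candidate index to the coding cost of its best matching
    block; `ref_bits` maps the same index to the bits needed to signal it.
    """
    return min(costs, key=lambda ref: costs[ref] + lam * ref_bits[ref])


costs = {0: 410.0, 1: 400.0, 2: 395.0}  # illustrative values only
ref_bits = {0: 1, 1: 3, 2: 3}           # illustrative values only

print(select_reference(costs, ref_bits, lam=4.0))
print(select_reference(costs, ref_bits, lam=10.0))
```

With λ = 4 the totals are 414, 412 and 407, so candidate 2 wins on its lower matching cost; with λ = 10 the totals become 420, 430 and 425, and the cheaper-to-signal candidate 0 is chosen instead.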

In this embodiment, for the L_0-level video image, the coding cost (Cost) of each matching block is computed using the rate-distortion cost J as the criterion:

J(MV, λ) = SAD(s, c(MV)) + λ × R(MV - PMV); (Formula 4)

where MV is the candidate motion vector, λ is the Lagrangian constant, PMV is the median predicted motion vector, and R(MV - PMV) is the number of bits that coding the motion vector difference may require.
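Formula 4 needs an estimate of R(MV - PMV). The source does not say how the bits are counted, so the sketch below assumes, purely for illustration, that each component of the motion vector difference is charged the length of its signed Exp-Golomb code; a real encoder may use a different estimate (for example a table tuned to its entropy coder). All function names are hypothetical.

```python
def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code of the integer v."""
    # Signed-to-unsigned mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ...
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    # An unsigned Exp-Golomb code of k has 2 * floor(log2(k + 1)) + 1 bits.
    return 2 * (code_num + 1).bit_length() - 1


def mvd_bits(mv, pmv):
    """Assumed R(MV - PMV): bits for both components of the difference."""
    return se_golomb_bits(mv[0] - pmv[0]) + se_golomb_bits(mv[1] - pmv[1])


def rd_cost(sad_value, mv, pmv, lam):
    """Formula 4: J(MV, lam) = SAD + lam * R(MV - PMV)."""
    return sad_value + lam * mvd_bits(mv, pmv)


print(rd_cost(100, mv=(3, -1), pmv=(1, 0), lam=4))
```

For MV = (3, -1) and PMV = (1, 0) the difference is (2, -1), which costs 5 + 3 = 8 bits under this assumption, giving J = 100 + 4 × 8 = 132. A vector equal to the predictor costs only 2 bits, which is why candidates close to PMV are favoured.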

Step 307: perform inter coding using the optimal reference frame of each PU in the initial-resolution video image (the L_0-level video image), and carry out the subsequent video coding process.

Second method embodiment

Fig. 6 is a flow chart of the video coding method of the second method embodiment of the present invention. As shown in Fig. 6, the method comprises:

Step 601: for each PU in the video image to be coded, determine its reference frame candidate set from the reference frames of its adjacent blocks; that is, the reference frames of the blocks adjacent to each PU are included in the reference frame candidate set of that PU;

In a specific implementation, every reference frame in the reference frame lists List0 and List1 can first be marked as "need not be searched"; the reference frames of the blocks adjacent to the PU are then compared with the reference frames in List0 and List1, and if a reference frame in List0 or List1 is also the reference frame of an adjacent block of the PU, the flag of that reference frame in the reference frame list is updated to "needs to be searched".

Step 602: perform the reference frame decision using the reference frame candidate set of each PU in the video image to be coded to obtain the optimal reference frame of each PU;

In this step, the coding cost of each matching block is computed as a rate-distortion cost, that is, the best matching block is selected by the minimum rate-distortion cost principle; the optimal reference frame is then selected by combining the coding cost of each matching block with the number of bits occupied by the reference frame index.

Specifically, denote the current PU in the video image to be coded as PU_0, with reference frame candidate set C_0 = {ref_0, ..., ref_n}, where n is an integer less than 15;

Motion estimation is used to obtain the matching block of PU_0 in each reference frame of C_0, and the coding cost (Cost) of each matching block is computed: Cost_0, Cost_1, Cost_2, ..., Cost_n;

Then the optimal reference frame of PU_0 is:

ref = \underset{ref = 0, 1, 2, \ldots, n}{\arg\min} (Cost_{ref} + λ \times R(ref));

In this embodiment, the coding cost (Cost) of each matching block is computed using the rate-distortion cost J as the criterion:

J(MV, λ) = SAD(s, c(MV)) + λ × R(MV - PMV);

Step 603: perform inter coding using the optimal reference frame of each PU in the video image to be coded, and carry out the subsequent video coding process.

First device embodiment

Fig. 7 is a structural diagram of the first embodiment of the video coding device of the present invention. As shown in Fig. 7, the device comprises a down-sampling unit and a prediction unit, wherein:

the down-sampling unit is configured to down-sample the initial-resolution video image to be coded to generate down-sampled video images at one or more resolution levels;

the prediction unit is configured to perform a reference frame decision on the down-sampled video image at each resolution level to obtain the optimal reference frame of each PU of each image frame in the down-sampled image at that level;

the prediction unit is further configured to, when performing the reference frame decision on the initial-resolution video image, include in the reference frame candidate set of the current PU the reference frames contained in both the down-sampling reference frame set of the current PU and the current reference frame lists, and to select the optimal reference frame of the current PU from the reference frame candidate set;

In addition, when performing the reference frame decision on the initial-resolution video image, the prediction unit also includes the reference frames of the blocks adjacent to the current PU in the reference frame candidate set of the current PU.

Second device embodiment

Fig. 8 is a structural diagram of the second embodiment of the video coding device of the present invention. As shown in Fig. 8, the device comprises a prediction unit, wherein:

the prediction unit is configured to, when performing a reference frame decision on the video image to be coded, take the reference frames of the blocks adjacent to the current PU as candidate reference frames and select the optimal reference frame of the current PU from among them.

Claims

1. A video coding method, characterized in that the method comprises:

2. The method according to claim 1, characterized in that

when the reference frame decision is performed on the initial-resolution video image, the reference frames of the blocks adjacent to the current PU are also included in the reference frame candidate set of the current PU.

3. The method according to claim 1, characterized in that

when the reference frame decision is performed on the down-sampled video image at each resolution level, the coding cost of the matching block of a PU in each candidate reference frame is computed using the sum of absolute differences (SAD) as the criterion, and the candidate reference frame whose matching block has the smallest coding cost is selected as the optimal reference frame of that PU.

4. The method according to claim 1, characterized in that

when the reference frame decision is performed on the initial-resolution video image, the coding cost of the matching block of a PU in each candidate reference frame of the reference frame candidate set is computed using the rate-distortion cost as the criterion, and the candidate reference frame whose matching block minimizes the sum of the coding cost and λ × R(ref) is selected as the optimal reference frame of that PU;

5. The method according to claim 1, characterized in that

the initial-resolution video image that is down-sampled is the original frame of the video image.

6. A video coding method, characterized in that the method comprises:

7. The method according to claim 6, characterized in that

when the reference frame decision is performed on the video image to be coded, the coding cost of the matching block of the current PU in each candidate reference frame is computed using the rate-distortion cost as the criterion, and the candidate reference frame whose matching block minimizes the sum of the coding cost and λ × R(ref) is selected as the optimal reference frame of the current PU;

8. A video coding device, characterized in that the device comprises a down-sampling unit and a prediction unit, wherein:

9. The device according to claim 8, characterized in that

when performing the reference frame decision on the initial-resolution video image, the prediction unit also includes the reference frames of the blocks adjacent to the current PU in the reference frame candidate set of the current PU.

10. A video coding device, characterized in that the device comprises a prediction unit, wherein: