CN100452880C

CN100452880C - Integer discrete cosine transform method for video coding

Info

Publication number: CN100452880C
Application number: CNB2006100121618A
Authority: CN
Inventors: 赵欣; 王宇; 李凤亭
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2006-06-09
Filing date: 2006-06-09
Publication date: 2009-01-14
Anticipated expiration: 2026-06-09
Also published as: CN1874510A

Abstract

The present invention provides an integer discrete cosine transform method for video coding, belonging to the technical field of video transmission. First, every element of the integer discrete cosine transform kernel is equivalently split, yielding N matrices. The N matrices are grouped and added to obtain M sub-transform kernels. M sub-transforms are computed from the M sub-transform kernels, and the M sub-transform results are merged in order of increasing subscript. The result matrix DX^T of the first processing unit is used as the Y matrix of the second processing unit, and the steps are repeated to obtain the integer discrete cosine transform coefficients. The advantage of the method is that it exploits the truncation operation of the integer DCT: part of the computational redundancy is removed before truncation is carried out, so the overall bit width of the adders in the PU is reduced and hardware resources are saved.

Description

An integer discrete cosine transform method for video coding

Technical field

The present invention relates to an integer discrete cosine transform method for video coding and belongs to the technical field of video transmission.

Background technology

In the prior art, the paper "Development of integer cosine transforms by the principle of dyadic symmetry" (Proc. Inst. Elect. Eng., Part I, vol. 136, Aug. 1989, pp. 276-282) discloses an integer discrete cosine transform method that can be used for video coding. Its principle is as follows: the discrete cosine transform is expressed mathematically as F = DXD^T, where D and X are N × N matrices, D^T is the transpose of D, and D is called the transform kernel. For an integer discrete cosine transform, all elements of D are integers, and neither the size nor the values of D are unique. For example, the transform kernel used in the baseline profile of the H.264 video coding standard is

[\begin{matrix} 1 & 1 & 1 & 1 \\ 2 & 1 & - 1 & - 2 \\ 1 & - 1 & - 1 & 1 \\ 1 & - 2 & 2 & - 1 \end{matrix}],

and the transform kernel used in the digital audio/video coding standard (hereinafter AVS) is

[\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 10 & 9 & 6 & 2 & - 2 & - 6 & - 9 & - 10 \\ 10 & 4 & - 4 & - 10 & - 10 & - 4 & 4 & 10 \\ 9 & - 2 & - 10 & - 6 & 6 & 10 & 2 & - 9 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 6 & - 10 & 2 & 9 & - 9 & - 2 & 10 & - 6 \\ 4 & - 10 & 10 & - 4 & - 4 & 10 & - 10 & 4 \\ 2 & - 6 & 9 & - 10 & 10 & - 9 & 6 & - 2 \end{matrix}] .

The conventional fast implementation of the above integer discrete cosine transform uses a flow-graph butterfly structure to compute a one-dimensional integer discrete cosine transform (hereinafter DCT), and then realizes the full two-dimensional integer DCT by cascading two identical processing units (hereinafter PU) of this structure. This is called row-column decomposition; its block diagram is shown in Fig. 1, and it is expressed mathematically as F = DXD^T = D(DX^T)^T. The computation performed by one PU is DX^T, i.e. a one-dimensional integer DCT, so the whole transform can be realized by two PUs in series. The PU is therefore the computational core of the integer discrete cosine transform. The mathematical operations a PU performs are a matrix transpose and a matrix multiplication. For the first PU, the transpose can be obtained by feeding in the input matrix in transposed order, whereas the second PU needs a dedicated transpose unit.
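The row-column decomposition described here can be checked numerically. The following is a minimal Python sketch, not the hardware structure itself: the helper names `transpose`, `matmul` and `pu` and the sample block `X` are assumptions made for illustration, the right shifts that follow each PU are omitted, and the H.264 kernel quoted above is used only because it is small.

```python
# Illustrative sketch: two identical PUs in series reproduce F = D X D^T.
# Helper names and the sample block X are assumptions made for this example.
D = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]  # H.264 baseline kernel quoted in the text

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def pu(d, x):
    """One processing unit: transpose the input, then left-multiply by d."""
    return matmul(d, transpose(x))

X = [[5, 11, 8, 10],
     [9, 8, 4, 12],
     [1, 10, 11, 4],
     [19, 6, 15, 7]]

two_pass = pu(D, pu(D, X))                    # D (D X^T)^T
direct = matmul(matmul(D, X), transpose(D))   # D X D^T
print(two_pass == direct)
```

Because the second PU transposes its input again, the two transposes cancel and the cascade equals the direct product.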

In the conventional DCT algorithm above, taking the AVS transform kernel as an example, the butterfly-based flow graph is shown in Fig. 2:

Here x0 to x7 are one column of X^T, and the flow graph as a whole computes X0 to X7, the product of that column with the transform kernel. Passing each of the 8 columns of X^T through the flow graph yields the result DX^T of one PU. This structure is well suited to hardware because the multiplications become multiplications by small integers, which can be implemented with shift operations.

The drawback of the conventional butterfly structure is as follows. The elements of the integer transform kernel are all integers, whereas the standard discrete cosine transform kernel is an orthogonal matrix whose elements all have absolute value less than 1. Multiplying by the integer kernel therefore introduces a large data gain, and an extra truncation operation is needed to cancel it. Because the data are represented in binary, truncation means directly discarding a number of low-order bits, which is equivalent to dividing the data by a power of 2. As shown in Fig. 1, each PU is followed by a right shift: the first PU shifts right by a bits and the second by b bits. For the AVS transform kernel, a and b satisfy a + b = 9, because the gain of this integer kernel relative to the standard floating-point kernel is 2^9; the value differs from kernel to kernel. Since the right shift discards the low-order bits of the result, every computation inside the PU that only affects those bits is redundant and wastes hardware resources. Owing to the inherent structure of the butterfly, however, these redundant computations cannot be isolated cleanly, which hinders efficient use of hardware resources.
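As a small illustration of why truncation makes low-order work redundant, the Python sketch below models truncation as an arithmetic right shift. This is an assumption: the text only says that the low bits are discarded, and the function name `truncate` is hypothetical.

```python
def truncate(value, bits):
    """Discard the lowest `bits` bits (arithmetic shift, i.e. floor division)."""
    return value >> bits

# Truncating by a bits is the same as floor division by 2**a.
assert truncate(1234, 3) == 1234 // 2**3
assert truncate(-1234, 3) == -1234 // 2**3

# Results that differ only in their low 3 bits are indistinguishable afterwards,
# so any work spent producing exactly those bits is wasted.
base = 0b101101000
assert all(truncate(base + low, 3) == truncate(base, 3) for low in range(8))

# Truncating in two stages (a bits, then b bits) equals one shift by a + b.
assert truncate(truncate(98765, 3), 6) == truncate(98765, 9)
```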

Summary of the invention

The object of the present invention is to propose an integer discrete cosine transform method for video coding that addresses the shortcomings of the prior art by removing part of the computational redundancy directly within the computation of the PU, thereby saving hardware resources.

The integer discrete cosine transform method for video coding proposed by the present invention comprises the following steps:

(1) Equivalently split each element of the integer discrete cosine transform kernel to obtain N matrices D_0, D_1, ..., D_{N-1}, so that the original integer discrete cosine transform kernel D can be written as D = D_0 + D_1 + ... + D_{N-1};

(2) Group and add the N matrices to obtain M sub-transform kernels H_0, H_1, ..., H_{M-1}, where each sub-transform kernel is the sum of a group of n matrices, 1 < n < N;

(3) From the M sub-transform kernels, compute the M sub-transforms H_0X^T, H_1X^T, H_2X^T, ..., H_{M-1}X^T, and merge the M sub-transform results in order of increasing subscript to obtain the result of the first processing unit, i.e. DX^T = H_0X^T + H_1X^T + ... + H_{M-1}X^T, where X is a luminance block matrix of a video frame and X^T is the transpose of X;

(4) Use the result matrix DX^T of the first processing unit as the Y matrix of the second processing unit and repeat steps (1) to (3) to obtain DY^T; the integer discrete cosine transform coefficients are then F = DXD^T = (H_0 + H_1 + ... + H_{M-1}) X (H_0 + H_1 + ... + H_{M-1})^T = (H_0 + H_1 + ... + H_{M-1})(H_0X^T + H_1X^T + ... + H_{M-1}X^T)^T.

In the above method, the equivalent splitting of each element of the integer discrete cosine transform kernel comprises the following steps:

(1) According to the binary representation of each integer element of the transform kernel, split the element into a sum of powers of 2;

(2) In order from low powers to high powers, assemble the split terms of the i-th power of every element of the integer discrete cosine transform kernel into the matrix D_i, where 0 ≤ i ≤ N-1.

In the above method, the M sub-transforms are merged as follows:

(1) For each of the M sub-transforms, extract a common factor of the form 2^j (0 ≤ j ≤ N-1);

(2) Merge H_0X^T and H_1X^T. Let the common factor extracted from H_1X^T be 2^{j1}. If j1 ≤ a, merge only the part of the binary representation of the matrix elements above bit j1; if j1 > a, merge only the part above bit a. Denote the merged result X_temp, where a is determined by the integer discrete cosine transform kernel being processed;

(3) Merge the result X_temp with H_2X^T by the method of step (2), and again denote the merged result X_temp;

(4) Repeat steps (2) and (3), merging the sub-transforms one by one, until all M sub-transforms have been merged.

The features and advantages of the proposed integer discrete cosine transform method for video coding are as follows. It adopts an approach entirely different from the conventional fast algorithm: starting from the integer transform kernel, a kernel whose elements are small integers is split into several sub-transform kernels. Because a power-of-2 common factor can be extracted from each sub-transform kernel, the truncation operation of the integer DCT can be exploited to reduce the overall bit width of the adders in the PU and save hardware resources. Compared with the conventional butterfly structure, the method isolates part of the computational redundancy in the integer discrete cosine transform and removes it before truncation is performed, which reduces the overall adder bit width of the PU. Compared with the existing butterfly structure, an implementation of the method on a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC) can save more than 10% of hardware resources.

Description of drawings

Fig. 1 is a block diagram of the existing row-column decomposition structure for implementing the integer DCT.

Fig. 2 is the flow graph of the conventional DCT implementation used for AVS.

Fig. 3 is a block diagram of the register file used in existing implementations of the transpose operation.

Fig. 4 is a partial flow graph of the DCT implemented with the method of the present invention.

Fig. 5 is the corresponding part of the flow graph shown in Fig. 2.

Fig. 6 is the hardware structure used to implement the DCT of the present invention.

Fig. 7 and Fig. 8 compare the adder resources used by the method of the present invention and by the existing method.

Embodiment

The integer discrete cosine transform method for video coding proposed by the present invention first equivalently splits each element of the integer discrete cosine transform kernel to obtain N matrices D_0, D_1, ..., D_{N-1}, so that the original kernel D can be written as D = D_0 + D_1 + ... + D_{N-1}. The N matrices are grouped and added to obtain M sub-transform kernels H_0, H_1, ..., H_{M-1}, each of which is the sum of a group of n matrices. From the M sub-transform kernels, the M sub-transforms H_0X^T, H_1X^T, H_2X^T, ..., H_{M-1}X^T are computed, and the M sub-transform results are merged in order of increasing subscript to obtain the result of the first processing unit, i.e. DX^T = H_0X^T + H_1X^T + ... + H_{M-1}X^T, where X is a luminance block matrix of a video frame and X^T is the transpose of X. The result matrix DX^T of the first processing unit is used as the Y matrix of the second processing unit and the above steps are repeated to obtain DY^T; the integer discrete cosine transform coefficients are then F = DXD^T = (H_0 + H_1 + ... + H_{M-1}) X (H_0 + H_1 + ... + H_{M-1})^T = (H_0 + H_1 + ... + H_{M-1})(H_0X^T + H_1X^T + ... + H_{M-1}X^T)^T.

In the above method, the equivalent splitting of each element of the integer discrete cosine transform kernel may comprise the following steps: (1) according to the binary representation of each integer element of the transform kernel, split the element into a sum of powers of 2; (2) in order from low powers to high powers, assemble the split terms of the i-th power of every element of the kernel into the matrix D_i, where 0 ≤ i ≤ N-1.

In the above method, the M sub-transforms may be merged as follows: (1) for each of the M sub-transforms, extract a common factor of the form 2^j (0 ≤ j ≤ N-1); (2) merge H_0X^T and H_1X^T: let the common factor extracted from H_1X^T be 2^{j1}; if j1 ≤ a, merge only the part of the binary representation of the matrix elements above bit j1, and if j1 > a, merge only the part above bit a; denote the merged result X_temp, where a is determined by the integer discrete cosine transform kernel being processed; (3) merge the result X_temp with H_2X^T by the method of step (2) and again denote the merged result X_temp; (4) repeat steps (2) and (3), merging the sub-transforms one by one, until all M sub-transforms have been merged.

The present invention belongs to the field of image coding and image processing and applies in particular to fast implementation of the integer discrete cosine transform.

In the present invention the DCT uses the 8-point one-dimensional DCT as its basic unit, and the two-dimensional DCT is realized by row-column decomposition. The mathematical expression of row-column decomposition is F = DXD^T = D(DX^T)^T; that is, DX^T is taken as one PU, whose function is to transpose the input matrix and left-multiply it by D. Two PUs in series realize the two-dimensional DCT from one-dimensional DCTs, as shown in Fig. 1. Although the two PUs perform the same mathematical operations, their transposes are handled differently: for the first PU the transpose can be achieved by feeding in the matrix in transposed order, whereas the second PU must have a dedicated transpose unit. In the present invention the transpose unit is implemented with the prior-art register file shown in Fig. 3.

The present invention is directed at the integer DCT, whose transform kernel consists of integers and is therefore easy to implement in hardware with adders and shifters.

For any integer transform kernel, and small-integer kernels in particular, the present invention equivalently splits each of its elements. The split follows the binary representation of the integer element, which is written as a sum of powers of 2; for example, the element 6 can be split as 2^1 + 2^2. In order from low powers to high powers, the split terms of the i-th power of the kernel elements are assembled into the matrix D_i, so that D_i contains only the elements 2^i (up to sign) and 0. This yields N matrices D_0, D_1, ..., D_{N-1}. These N matrices are then grouped and added to obtain M new sub-transform kernels H_0, H_1, ..., H_{M-1}, where each element of a sub-transform kernel is the sum of the elements at the corresponding position in the matrices of its group.
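The splitting and grouping just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: `bit_planes` and `add` are hypothetical helper names, the split is the plain binary one (so 6 becomes 2 + 4), signs are carried along with each term, and a single kernel row is used to keep the example short.

```python
# Sketch of the binary split into bit planes D_i and their grouping into H_k.
# `bit_planes` and `add` are hypothetical helper names for this illustration.
def bit_planes(kernel, n_planes):
    """Return planes D_0..D_{n-1}; entries of D_i are 0 or +/-2**i."""
    planes = []
    for i in range(n_planes):
        planes.append([[(1 if v > 0 else -1) * (abs(v) & (1 << i)) for v in row]
                       for row in kernel])
    return planes

def add(*mats):
    """Element-wise sum of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

kernel = [[10, 9, 6, 2, -2, -6, -9, -10]]   # second row of the AVS kernel
D0, D1, D2, D3 = bit_planes(kernel, 4)

H0 = add(D0, D1)   # low planes grouped into one sub-transform kernel
H1 = add(D2, D3)   # high planes grouped into another

print(D1)                       # the plane of 2s
print(add(H0, H1) == kernel)    # the split is exact
```

How the planes are grouped is a design choice; the grouping shown (two low planes, two high planes) is only one possibility.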

The M matrices H_0, H_1, ..., H_{M-1} obtained in this way are called sub-transform kernels in the present invention. The result of the whole transform is then the sum of the results of the M sub-transforms, i.e.

F = DXD^T = (H_0 + H_1 + ... + H_{M-1}) X (H_0 + H_1 + ... + H_{M-1})^T

= (H_0 + H_1 + ... + H_{M-1})(H_0X^T + H_1X^T + ... + H_{M-1}X^T)^T

where H_0X^T, H_1X^T, ..., H_{M-1}X^T are defined in the present invention as sub-transforms.

A common factor of the form 2^j (0 ≤ j ≤ N-1) can be extracted from each sub-transform. Extracting it reduces the magnitude of the kernel elements so that the elements of every sub-transform kernel lie in a small range, which greatly simplifies the computation of the sub-transforms and allows the truncation operation to be used to reduce the input bit width of the adders to some extent. The method of reducing the input bit width is described below:

Assume that the truncation operation discards the lowest a bits of the binary representation of the result.

When the proposed integer DCT algorithm is implemented with row-column decomposition, the results of the sub-transforms must be merged after they have been computed. First H_0X^T and H_1X^T are merged; the merged result is then merged with H_2X^T, and the process is repeated until all sub-transforms have been merged.

Because the common factor 2^{j1} can be extracted from H_1X^T, the lowest j1 bits of every element of H_1X^T are zero, so adding corresponding elements of the two sub-transform results produces no carry out of the lowest j1 bits. Therefore, if j1 ≤ a, the lowest j1 bits of the H_0X^T result can be discarded directly without being added; if j1 > a, the lowest a bits of the H_0X^T result can be discarded directly without being added. The merged result is denoted X_temp.

The first two sub-transforms are merged as described above and the result is denoted X_temp. X_temp is then merged with the sub-transform H_2X^T, from which the common factor 2^{j2} can be extracted; following the same procedure gives a merged result that is again denoted X_temp.

X_temp is then merged in turn with the remaining sub-transforms until all sub-transforms have been merged, giving the final result H_0X^T + H_1X^T + ... + H_{M-1}X^T, i.e. DX^T, which is denoted the Y matrix.
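The merging rule can be illustrated for a single pair of matrix elements. The Python sketch below is an interpretation under stated assumptions rather than the patented circuit: truncation is modelled as an arithmetic right shift, `merge_truncated` is a hypothetical name, and `high_quotient` stands for an element of the second sub-transform with its common factor 2^j already removed.

```python
# Sketch of merging two sub-transform results under truncation.
# Assumes truncation is an arithmetic right shift; names are hypothetical.
def merge_truncated(low_term, high_quotient, j, a):
    """Compute (low_term + high_quotient * 2**j) >> a.

    high_quotient is the second sub-transform result with its common factor
    2**j already removed, so its low j bits never have to be added.
    """
    k = min(j, a)                       # bits that can be dropped before adding
    partial = (low_term >> k) + (high_quotient << (j - k))
    return partial >> (a - k)           # remaining truncation, if any

# Exhaustive check on a small range, including negative values.
for a in range(0, 5):
    for j in range(0, 5):
        for low in range(-40, 41):
            for q in range(-5, 6):
                assert merge_truncated(low, q, j, a) == (low + (q << j)) >> a
```

In this model the adder only ever sees `low_term >> k`, which is k bits narrower than the untruncated operand; that is the source of the bit-width saving the text describes.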

When the forward transform is carried out with row-column decomposition, the result DX^T of the first PU above is fed as the Y matrix into a second PU of identical structure to compute DY^T. After the computation of the second PU, a complete integer DCT has been carried out.

Fig. 4 shows part of the implementation structure of the proposed method as a flow graph, and Fig. 5 shows the conventional butterfly implementation of the same part. Comparing the two shows that, along the direction of data flow, the bit width grows less in the proposed algorithm than in the conventional butterfly. Fig. 4 and Fig. 5 give the flow graphs for only some of the transform coefficients because the proposed algorithm differs considerably in structure from the conventional butterfly and is not easily described as a flow graph; a complete graph would be cluttered and would obscure the difference between the algorithms.

An embodiment of the method of the present invention is described below:

Taking the integer transform kernel of the AVS video coding standard as an example, the following illustrates the proposed fast integer DCT algorithm based on bit-plane decomposition of the transform kernel.

[\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 10 & 9 & 6 & 2 & - 2 & - 6 & - 9 & - 10 \\ 10 & 4 & - 4 & - 10 & - 10 & - 4 & 4 & 10 \\ 9 & - 2 & - 10 & - 6 & 6 & 10 & 2 & - 9 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 6 & - 10 & 2 & 9 & - 9 & - 2 & 10 & - 6 \\ 4 & - 10 & 10 & - 4 & - 4 & 10 & - 10 & 4 \\ 2 & - 6 & 9 & - 10 & 10 & - 9 & 6 & - 2 \end{matrix}] .

All elements of this integer transform kernel are integers with absolute value less than 16, so every bit of their binary representation above the fourth is zero, and the kernel can therefore be split into four matrices. (The elements 6 and -6 are special in that each can be split in two different ways: 6 = 4 + 2 = 8 - 2 and -6 = -4 - 2 = -8 + 2. Practical comparison showed the form 6 = 8 - 2, -6 = -8 + 2 to be more favorable.)

[\begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & - 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & - 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & - 1 & 0 & 0 \end{matrix}],

[\begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & - 2 & 2 & - 2 & 2 & 0 & - 2 \\ 2 & 0 & 0 & - 2 & - 2 & 0 & 0 & 2 \\ 0 & - 2 & - 2 & 2 & - 2 & 2 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ - 2 & - 2 & 2 & 0 & 0 & - 2 & 2 & 2 \\ 0 & - 2 & 2 & 0 & 0 & 2 & - 2 & 0 \\ 2 & 2 & 0 & - 2 & 2 & 0 & - 2 & - 2 \end{matrix}]

[\begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 4 & - 4 & 0 & 0 & - 4 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 4 & 0 & 0 & - 4 & - 4 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{matrix}],

[\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 8 & 8 & 8 & 0 & 0 & - 8 & - 8 & - 8 \\ 8 & 0 & 0 & - 8 & - 8 & 0 & 0 & 8 \\ 8 & 0 & - 8 & - 8 & 8 & 8 & 0 & - 8 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 8 & - 8 & 0 & 8 & - 8 & 0 & 8 & - 8 \\ 0 & - 8 & 8 & 0 & 0 & 8 & - 8 & 0 \\ 0 & - 8 & 8 & - 8 & 8 & - 8 & 8 & 0 \end{matrix}]
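The four matrices above can be reproduced and checked programmatically. The following Python sketch is illustrative only: `DIGITS` and `split_planes` are hypothetical names, and the digit table simply encodes the split chosen in the text, including 6 = 8 - 2.

```python
# Sketch: reproduce the four matrices above from the AVS kernel.
# The digit table encodes the split chosen in the text (6 = 8 - 2);
# DIGITS and split_planes are hypothetical names.
AVS = [[8, 8, 8, 8, 8, 8, 8, 8],
       [10, 9, 6, 2, -2, -6, -9, -10],
       [10, 4, -4, -10, -10, -4, 4, 10],
       [9, -2, -10, -6, 6, 10, 2, -9],
       [8, -8, -8, 8, 8, -8, -8, 8],
       [6, -10, 2, 9, -9, -2, 10, -6],
       [4, -10, 10, -4, -4, 10, -10, 4],
       [2, -6, 9, -10, 10, -9, 6, -2]]

# magnitude -> (term in the 1s plane, 2s plane, 4s plane, 8s plane)
DIGITS = {2: (0, 2, 0, 0), 4: (0, 0, 4, 0), 6: (0, -2, 0, 8),
          8: (0, 0, 0, 8), 9: (1, 0, 0, 8), 10: (0, 2, 0, 8)}

def split_planes(kernel):
    """Return the four planes; negative elements negate every term."""
    planes = [[[0] * len(row) for row in kernel] for _ in range(4)]
    for r, row in enumerate(kernel):
        for c, v in enumerate(row):
            sign = 1 if v > 0 else -1
            for i, term in enumerate(DIGITS[abs(v)]):
                planes[i][r][c] = sign * term
    return planes

D0, D1, D2, D3 = split_planes(AVS)
total = [[sum(p[r][c] for p in (D0, D1, D2, D3)) for c in range(8)]
         for r in range(8)]
print(total == AVS)   # the four planes add back up to the kernel
print(D1[1])          # second row of the plane of 2s
```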

Of these, the first two matrices can be merged into

H_{0} = [\begin{matrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 1 & - 2 & 2 & - 2 & 2 & - 1 & - 2 \\ 2 & 0 & 0 & - 2 & - 2 & 0 & 0 & 2 \\ 1 & - 2 & - 2 & 2 & - 2 & 2 & 2 & - 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ - 2 & - 2 & 2 & 1 & - 1 & - 2 & 2 & 2 \\ 0 & - 2 & 2 & 0 & 0 & 2 & - 2 & 0 \\ 2 & 2 & 1 & - 2 & 2 & - 1 & - 2 & - 2 \end{matrix}],

and the last two matrices can be merged into

H_{1} = [\begin{matrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 8 & 8 & 8 & 0 & 0 & - 8 & - 8 & - 8 \\ 8 & 4 & - 4 & - 8 & - 8 & - 4 & 4 & 8 \\ 8 & 0 & - 8 & - 8 & 8 & 8 & 0 & - 8 \\ 8 & - 8 & - 8 & 8 & 8 & - 8 & - 8 & 8 \\ 8 & - 8 & 0 & 8 & - 8 & 0 & 8 & - 8 \\ 4 & - 8 & 8 & - 4 & - 4 & 8 & - 8 & 4 \\ 0 & - 8 & 8 & - 8 & 8 & - 8 & 8 & 0 \end{matrix}]

The PU is thus decomposed as DX^T = H_0X^T + H_1X^T, where H_1 can also be written as

H_{1} = 4 \times [\begin{matrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 0 & 0 & - 2 & - 2 & - 2 \\ 2 & 1 & - 1 & - 2 & - 2 & - 1 & 1 & 2 \\ 2 & 0 & - 2 & - 2 & 2 & 2 & 0 & - 2 \\ 2 & - 2 & - 2 & 2 & 2 & - 2 & - 2 & 2 \\ 2 & - 2 & 0 & 2 & - 2 & 0 & 2 & - 2 \\ 1 & - 2 & 2 & - 1 & - 1 & 2 & - 2 & 1 \\ 0 & - 2 & 2 & - 2 & 2 & - 2 & 2 & 0 \end{matrix}],

The two sub-transform kernels of the PU then contain, apart from zeros and the common factor 4 of H_1, only the four integers -2, -1, 1 and 2. Because the coefficients are so similar, the computations become highly repetitive and the adders can be reused to the full.
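The claim about the element range can be verified directly. In the Python sketch below, `HIGH` and `sign` are hypothetical names; the table lists, for each magnitude in the AVS kernel, the part assigned to H_1 under the split described in the text, and the remainder goes to H_0.

```python
# Sketch: build H_0 and H_1 from the AVS kernel and inspect their elements.
AVS = [[8, 8, 8, 8, 8, 8, 8, 8],
       [10, 9, 6, 2, -2, -6, -9, -10],
       [10, 4, -4, -10, -10, -4, 4, 10],
       [9, -2, -10, -6, 6, 10, 2, -9],
       [8, -8, -8, 8, 8, -8, -8, 8],
       [6, -10, 2, 9, -9, -2, 10, -6],
       [4, -10, 10, -4, -4, 10, -10, 4],
       [2, -6, 9, -10, 10, -9, 6, -2]]

# magnitude -> part assigned to H_1 (the 4s and 8s planes); the rest is H_0
HIGH = {2: 0, 4: 4, 6: 8, 8: 8, 9: 8, 10: 8}

def sign(v):
    return 1 if v > 0 else -1

H1 = [[sign(v) * HIGH[abs(v)] for v in row] for row in AVS]
H0 = [[v - h for v, h in zip(row, hrow)] for row, hrow in zip(AVS, H1)]
H1_reduced = [[h // 4 for h in row] for row in H1]   # common factor 4 removed

values = {v for m in (H0, H1_reduced) for row in m for v in row}
print(sorted(values))   # every element of both reduced kernels
```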

Using the left-right symmetry of the integer DCT kernel, the transform kernels of the two sub-transforms H_0X^T and H_1X^T can be reduced to

H_{0} = [\begin{matrix} 0 & 0 & 0 & 0 \\ 2 & 1 & - 2 & 2 \\ 2 & 0 & 0 & - 2 \\ 1 & - 2 & - 2 & 2 \\ 0 & 0 & 0 & 0 \\ - 2 & - 2 & 2 & 1 \\ 0 & - 2 & 2 & 0 \\ 2 & 2 & 1 & - 2 \end{matrix}],

H_{1} = 4 \times [\begin{matrix} 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 0 \\ 2 & 1 & - 1 & - 2 \\ 2 & 0 & - 2 & - 2 \\ 2 & - 2 & - 2 & 2 \\ 2 & - 2 & 0 & 2 \\ 1 & - 2 & 2 & - 1 \\ 0 & - 2 & 2 & - 2 \end{matrix}]

The even rows of the two sub-transforms then have identical inputs, and so do the odd rows, so adders can be shared between the even rows of the two sub-transforms and between their odd rows. (Rows are numbered from 1 here, so the odd rows are the first, third, fifth and seventh rows.) Taking the odd rows of H_0X^T and H_1X^T as an example, H_0X^T can be decomposed into the odd-row transform H_{0e}X_p^T and the even-row transform H_{0o}X_m^T, where the kernel of H_{0e}X_p^T corresponds to the odd-row elements of H_0 and the kernel of H_{0o}X_m^T corresponds to the even-row elements of H_0. The results of the two transforms correspond to the odd-row and even-row results of H_0X^T, and the same holds for H_1X^T. As described above, H_{0e}X_p^T and H_{1e}X_p^T have the same input and can therefore share adders.

H_{0 e} = [\begin{matrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & - 2 \\ 0 & 0 & 0 & 0 \\ 0 & - 2 & 2 & 0 \end{matrix}],

H_{1 e} = 4 \times [\begin{matrix} 2 & 2 & 2 & 2 \\ 2 & 1 & - 1 & - 2 \\ 2 & - 2 & - 2 & 2 \\ 1 & - 2 & 2 & - 1 \end{matrix}],

X_{p}^{T} = [\begin{matrix} X_{i 0} + X_{i 7} \\ X_{i 1} + X_{i 6} \\ X_{i 2} + X_{i 5} \\ X_{i 3} + X_{i 4} \end{matrix}],

Inspection of H_{0e} and H_{1e} shows that the two kernels have a great deal of computational redundancy. X_p^T[0] ± X_p^T[3] and X_p^T[1] ± X_p^T[2] can be computed first; the results of H_{0e}X_p^T and H_{1e}X_p^T are then obtained simply by adding and subtracting combinations of these four values.
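The sharing of intermediate sums can be sketched for one input column as follows. This Python fragment is an illustration, not the circuit of Fig. 4: the function and variable names are hypothetical, and the sample column is arbitrary.

```python
# Sketch of sharing adders between H_0e X_p and H_1e X_p for one input column.
# Function and variable names are hypothetical.
H0e = [[0, 0, 0, 0], [2, 0, 0, -2], [0, 0, 0, 0], [0, -2, 2, 0]]
H1e_reduced = [[2, 2, 2, 2], [2, 1, -1, -2],
               [2, -2, -2, 2], [1, -2, 2, -1]]   # H_1e with the factor 4 removed

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def even_part_shared(x):
    """Return X_p, H_0e X_p and (H_1e / 4) X_p using four shared terms."""
    p = [x[0] + x[7], x[1] + x[6], x[2] + x[5], x[3] + x[4]]   # X_p
    s03, d03 = p[0] + p[3], p[0] - p[3]
    s12, d12 = p[1] + p[2], p[1] - p[2]
    low = [0, 2 * d03, 0, -2 * d12]
    high = [2 * (s03 + s12), 2 * d03 + d12, 2 * (s03 - s12), d03 - 2 * d12]
    return p, low, high

x = [12, -7, 33, 4, 250, -128, 9, 61]
p, low, high = even_part_shared(x)
print(low == matvec(H0e, p), high == matvec(H1e_reduced, p))
```

The multiplications by 2 are shifts in hardware, so after X_p is formed only the four shared terms and a handful of further additions are needed for both sub-transforms.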

Likewise, for

H_{0 o} = [\begin{matrix} 2 & 1 & - 2 & 2 \\ 1 & - 2 & - 2 & 2 \\ - 2 & - 2 & 2 & 1 \\ 2 & 2 & 1 & - 2 \end{matrix}],

H_{1 o} = 4 \times [\begin{matrix} 2 & 2 & 2 & 0 \\ 2 & 0 & - 2 & - 2 \\ 2 & - 2 & 0 & 2 \\ 0 & - 2 & 2 & - 2 \end{matrix}] = 8 \times [\begin{matrix} 1 & 1 & 1 & 0 \\ 1 & 0 & - 1 & - 1 \\ 1 & - 1 & 0 & 1 \\ 0 & - 1 & 1 & - 1 \end{matrix}],

X_{m}^{T} = [\begin{matrix} X_{i 0} - X_{i 7} \\ X_{i 1} - X_{i 6} \\ X_{i 2} - X_{i 5} \\ X_{i 3} - X_{i 4} \end{matrix}],

the computational redundancy of these two kernels can likewise be exploited to reduce the number of adders. Moreover, the common factor 4 can be extracted from H_{1e} and the common factor 8 from H_{1o}, which is very useful for exploiting the truncation in video coding to reduce the adder input bit width. In the DCT for video coding based on row-column decomposition, the lowest 3 bits of the transform result can be truncated after the first PU, so this 3-bit truncation can be taken into account when merging the sub-transform results. Adding the lowest 2 bits in H_{0e}X_p^T + H_{1e}X_p^T produces no carry, so these two bits can be ignored directly, reducing the adder input bit width; adding the lowest 3 bits in H_{0o}X_m^T + H_{1o}X_m^T produces no carry, so these three bits can be ignored directly, again reducing the adder input bit width. Note further that all elements of the first and third rows of H_{0e} are zero, so merging the sub-transforms for these two rows requires no additional adders.
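The whole first PU can be checked for one column at a time. The Python sketch below is a model under stated assumptions, not the hardware of Fig. 6: truncation is taken to be an arithmetic right shift by 3 bits, `pu_column` and the `_r` suffix (common factor removed) are hypothetical names, and random test columns stand in for real pixel data.

```python
import random

# Sketch: one column of the first PU computed via the split kernels,
# dropping low bits before the merge, compared with direct truncation.
AVS = [[8, 8, 8, 8, 8, 8, 8, 8],
       [10, 9, 6, 2, -2, -6, -9, -10],
       [10, 4, -4, -10, -10, -4, 4, 10],
       [9, -2, -10, -6, 6, 10, 2, -9],
       [8, -8, -8, 8, 8, -8, -8, 8],
       [6, -10, 2, 9, -9, -2, 10, -6],
       [4, -10, 10, -4, -4, 10, -10, 4],
       [2, -6, 9, -10, 10, -9, 6, -2]]
H0e = [[0, 0, 0, 0], [2, 0, 0, -2], [0, 0, 0, 0], [0, -2, 2, 0]]
H1e_r = [[2, 2, 2, 2], [2, 1, -1, -2], [2, -2, -2, 2], [1, -2, 2, -1]]  # H_1e / 4
H0o = [[2, 1, -2, 2], [1, -2, -2, 2], [-2, -2, 2, 1], [2, 2, 1, -2]]
H1o_r = [[1, 1, 1, 0], [1, 0, -1, -1], [1, -1, 0, 1], [0, -1, 1, -1]]   # H_1o / 8

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def pu_column(x):
    """One output column of the first PU, truncated by 3 bits."""
    p = [x[i] + x[7 - i] for i in range(4)]    # X_p
    m = [x[i] - x[7 - i] for i in range(4)]    # X_m
    even = [((lo >> 2) + hi) >> 1              # low 2 bits never enter the adder
            for lo, hi in zip(matvec(H0e, p), matvec(H1e_r, p))]
    odd = [(lo >> 3) + hi                      # low 3 bits never enter the adder
           for lo, hi in zip(matvec(H0o, m), matvec(H1o_r, m))]
    out = [0] * 8
    out[0::2], out[1::2] = even, odd
    return out

random.seed(0)
for _ in range(1000):
    x = [random.randint(-255, 255) for _ in range(8)]
    assert pu_column(x) == [v >> 3 for v in matvec(AVS, x)]
```

Under this model the result is bit-exact, which is consistent with the statement that ignoring those low bits loses nothing.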

Cascading the two PUs described above completes the method of the present invention and realizes a fast, lossless integer DCT.

When an integer DCT conforming to the AVS standard is implemented with the algorithm of the present invention, the main hardware resources consumed by the proposed algorithm based on bit-plane decomposition of the transform kernel are as follows:

Name	Quantity
8-bit adder	8
9-bit adder	8
10-bit adder	10
11-bit adder	10
12-bit adder	2

The main hardware resources consumed by the conventional butterfly computation are:

Name	Quantity
8-bit adder	8
9-bit adder	8
10-bit adder	2
11-bit adder	6
12-bit adder	4
13-bit adder	6
14-bit adder	
15-bit adder	4

The advantages and effects of the method compared with the prior art are illustrated below with Fig. 7 and Fig. 8. Fig. 7 corresponds to the butterfly computation and Fig. 8 to the method of the present invention; the horizontal axis is the adder input bit width and the vertical axis is the number of adders used. For an integer DCT conforming to the AVS standard, the first PU truncates 3 bits and the second PU truncates 7 bits. The comparison shows that the two algorithms use the same number of adders (38), but the adders of the bit-plane decomposition algorithm are concentrated at input widths of 8, 9, 10 and 11 bits, whereas those of the butterfly computation range from 8 to 15 bits. The input bit width of the bit-plane decomposition algorithm is therefore lower overall, so when the DCT module is implemented on an FPGA or ASIC the bit-plane decomposition algorithm saves more resources. For an FPGA implementation, taking the Xilinx Virtex-IV as an example, the bit-plane decomposition algorithm saves more than 10% of resources.

The bit-plane decomposition algorithm can be applied to different integer transform kernels, although the bit-plane matrices obtained and the bit-plane grouping that maximizes adder reuse may differ from kernel to kernel. In every case the approach is to extract as large a common factor as possible, so as to narrow the range of the sub-transform kernel elements and use truncation to reduce the adder bit width, and to adjust the bit-plane coefficients so that adder reuse is as high as possible; resource cost is reduced from these two directions.

Claims

1. An integer discrete cosine transform method for video coding, characterized in that the method comprises the following steps:

(1) equivalently splitting each element of the integer discrete cosine transform kernel to obtain N matrices D_0, D_1, ..., D_{N-1}, so that the original integer discrete cosine transform kernel D can be written as D = D_0 + D_1 + ... + D_{N-1}, wherein the equivalent splitting of each element of the integer discrete cosine transform kernel comprises the following steps:

(1-1) according to the binary representation of each integer element of the transform kernel, splitting the element into a sum of powers of 2;

(1-2) in order from low powers to high powers, assembling the split terms of the i-th power of every element of the integer discrete cosine transform kernel into the matrix D_i, where 0 ≤ i ≤ N-1;

(2) grouping and adding the N matrices to obtain M sub-transform kernels H_0, H_1, ..., H_{M-1};

(3) from the M sub-transform kernels, computing the M sub-transforms H_0X^T, H_1X^T, H_2X^T, ..., H_{M-1}X^T, and merging the M sub-transform results in order of increasing subscript to obtain the result of the first processing unit, i.e. DX^T = H_0X^T + H_1X^T + ... + H_{M-1}X^T, where X is a luminance block matrix of a video frame and X^T is the transpose of X, wherein the merging of the M sub-transform results comprises the following steps:

(3-1) for each of the M sub-transforms, extracting a common factor of the form 2^j, where 0 ≤ j ≤ N-1;

(3-2) merging H_0X^T and H_1X^T: letting the common factor extracted from H_1X^T be 2^{j1}, if j1 ≤ a, merging only the part of the binary representation of the matrix elements above bit j1, and if j1 > a, merging only the part above bit a; the merged result is denoted X_temp, where a is determined by the integer discrete cosine transform kernel being processed;

(3-3) merging the result X_temp with H_2X^T by the method of step (3-2), the merged result again being denoted X_temp;

(3-4) repeating steps (3-2) and (3-3), merging the sub-transforms one by one, until all M sub-transforms have been merged;

(4) setting the above transform result matrix DX^T = Y, using the Y matrix as the X matrix in step (3), and repeating steps (1) to (3) to obtain DY^T, whereupon the integer discrete cosine transform coefficients are: F = DXD^T = (H_0 + H_1 + ... + H_{M-1}) X (H_0 + H_1 + ... + H_{M-1})^T = (H_0 + H_1 + ... + H_{M-1})(H_0X^T + H_1X^T + ... + H_{M-1}X^T)^T.