CN100452880C - Integral discrete cosine transform method in use for encoding video - Google Patents

Info

Publication number
CN100452880C
Authority
CN
China
Prior art keywords
sub
matrix
discrete cosine
cosine transform
conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100121618A
Other languages
Chinese (zh)
Other versions
CN1874510A (en
Inventor
赵欣
王宇
李凤亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CNB2006100121618A priority Critical patent/CN100452880C/en
Publication of CN1874510A publication Critical patent/CN1874510A/en
Application granted granted Critical
Publication of CN100452880C publication Critical patent/CN100452880C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides an integer discrete cosine transform (DCT) method for video coding, belonging to the technical field of video transmission. First, each element of the integer DCT kernel is split into an equivalent sum, yielding N matrices. The N matrices are grouped and added to obtain M sub-kernels. The M sub-transforms are computed from the M sub-kernels, and the M sub-transform results are merged in ascending order of subscript. The result $D X^{T}$ of the first processing unit (PU) is used as the matrix Y of the second processing unit, and the steps are repeated to obtain the integer DCT coefficients. The advantage of the method is that it exploits the truncation operation of the integer DCT: part of the computational redundancy is removed before truncation is carried out, so the overall bit width of the adders in the PU is reduced and hardware resources are saved.

Description

An integer discrete cosine transform method for video coding
Technical field
The present invention relates to an integer discrete cosine transform method for video coding and belongs to the field of video transmission technology.
Background art
In the prior art, the paper "Development of integer cosine transforms by the principle of dyadic symmetry" (Proc. Inst. Elect. Eng., Part I, vol. 136, Aug. 1989, pp. 276-282) discloses an integer discrete cosine transform method that can be used for video coding. Its principle is as follows. The discrete cosine transform is expressed mathematically as $F = D X D^{T}$, where D and X are N × N matrices, $D^{T}$ is the transpose of D, and D is called the transform kernel. For an integer discrete cosine transform, all elements of D are integers, and neither the size nor the values of D are unique. For example, the transform kernel used in the Baseline profile of the H.264 video coding standard is

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{pmatrix},$$

and the transform kernel used in the digital audio/video coding standard (hereinafter AVS) is

$$\begin{pmatrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 10 & 9 & 6 & 2 & -2 & -6 & -9 & -10 \\ 10 & 4 & -4 & -10 & -10 & -4 & 4 & 10 \\ 9 & -2 & -10 & -6 & 6 & 10 & 2 & -9 \\ 8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\ 6 & -10 & 2 & 9 & -9 & -2 & 10 & -6 \\ 4 & -10 & 10 & -4 & -4 & 10 & -10 & 4 \\ 2 & -6 & 9 & -10 & 10 & -9 & 6 & -2 \end{pmatrix}.$$
The conventional fast implementation of the integer discrete cosine transform described above computes the one-dimensional integer discrete cosine transform (hereinafter DCT) with a flow-graph butterfly structure, and then realizes the complete two-dimensional integer DCT by cascading two identical processing units (hereinafter PU) of this structure. This is called the row-column decomposition method; its block diagram is shown in Fig. 1. Mathematically, $F = D X D^{T} = D (D X^{T})^{T}$. The computation performed by one PU is $D X^{T}$, that is, a one-dimensional integer DCT, so the whole transform can be realized by two PUs in series. The PU is therefore the computational core of the integer DCT. The operations it performs are a matrix transpose and a matrix multiplication. For the first PU, the transpose can be obtained simply by feeding in the input matrix in transposed order; the second PU needs a dedicated transpose unit.
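The row-column identity $F = D X D^{T} = D (D X^{T})^{T}$ can be checked numerically. The following is a minimal sketch in plain Python, using the H.264 kernel quoted above; the helper names (`transpose`, `matmul`, `pu`, `dct2d_row_column`, `dct2d_direct`) and the all-ones test block are illustrative assumptions, and truncation is deliberately left out.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain integer matrix product a * b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pu(d, x):
    """One processing unit: transpose the input, then left-multiply by D."""
    return matmul(d, transpose(x))


def dct2d_row_column(d, x):
    """Two PUs in series: Y = D X^T, then D Y^T = D X D^T."""
    y = pu(d, x)
    return pu(d, y)


def dct2d_direct(d, x):
    """Reference computation F = D X D^T."""
    return matmul(matmul(d, x), transpose(d))


# H.264 Baseline kernel as quoted in the text.
H264_KERNEL = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# Hypothetical input block (all ones), chosen only for illustration.
X_BLOCK = [[1, 1, 1, 1] for _ in range(4)]

F = dct2d_row_column(H264_KERNEL, X_BLOCK)
assert F == dct2d_direct(H264_KERNEL, X_BLOCK)
```

For a constant block only the DC coefficient `F[0][0]` is nonzero, which is a quick sanity check on the kernel orientation.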
In the conventional DCT algorithm described above, taking the AVS transform kernel as an example, the butterfly-based flow graph is shown in Fig. 2.
In Fig. 2, x0 to x7 are one column of $X^{T}$, and the flow graph as a whole computes X0 to X7, the product of that column with the transform kernel. Passing each of the 8 columns of $X^{T}$ through the flow graph yields the result $D X^{T}$ of one PU. This structure is well suited to hardware implementation, because the multiplications are converted into multiplications by small integers, which can be implemented with shift operations.
The drawback of the conventional butterfly structure is as follows. The elements of an integer transform kernel are all integers, whereas the standard discrete cosine transform kernel is an orthogonal matrix whose elements all have absolute value less than 1. Multiplying by the integer kernel therefore introduces a large data gain, and an additional truncation operation is needed to cancel it. Because the data are represented in binary, truncation means directly discarding a number of low-order bits, which is equivalent to dividing the data by a power of 2. As shown in Fig. 1, each PU is followed by a right shift: the first PU shifts right by a bits and the second by b bits. For the AVS transform kernel, a and b satisfy a + b = 9, because the gain of this integer kernel relative to the standard floating-point kernel is $2^{9}$; the value differs from kernel to kernel. Since the right shift discards the low-order bits of the result, every computation inside the PU that contributes only to those bits is redundant and wastes hardware resources. Owing to the inherent properties of the butterfly structure, however, this redundancy cannot easily be separated out, which hinders efficient use of hardware resources.
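One plausible way to see where a gain of about $2^{9}$ comes from, offered here as an interpretation rather than as the patent's own derivation: each row of an orthonormal DCT kernel has unit energy, so the two-dimensional gain of coefficient (i, j) under the integer kernel is the square root of the product of the energies of rows i and j. The sketch below computes the row energies of the AVS kernel; the name `row_energy` is an illustrative assumption.

```python
# AVS 8x8 integer transform kernel as quoted in the text.
AVS_KERNEL = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [10, 9, 6, 2, -2, -6, -9, -10],
    [10, 4, -4, -10, -10, -4, 4, 10],
    [9, -2, -10, -6, 6, 10, 2, -9],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -10, 2, 9, -9, -2, 10, -6],
    [4, -10, 10, -4, -4, 10, -10, 4],
    [2, -6, 9, -10, 10, -9, 6, -2],
]

# Sum of squares of each row; an orthonormal kernel would give 1 for every row.
row_energy = [sum(v * v for v in row) for row in AVS_KERNEL]

# The largest row energy is exactly 2**9; the other rows are slightly below it.
assert max(row_energy) == 2 ** 9
assert all(2 ** 8 < e <= 2 ** 9 for e in row_energy)
```

The rows do not all have the same energy, so $2^{9}$ is the gain of the largest rows and an approximation for the others.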
Summary of the invention
The object of the present invention is to address the shortcomings of the prior art by proposing an integer discrete cosine transform method for video coding that removes part of the computational redundancy directly within the PU computation, thereby saving hardware resources.
The integer discrete cosine transform method for video coding proposed by the present invention comprises the following steps:
(1) Split each element of the integer DCT kernel D into an equivalent sum, obtaining N matrices $D_0, D_1, \dots, D_{N-1}$, so that the original kernel can be written $D = D_0 + D_1 + \dots + D_{N-1}$;
(2) Group and add the N matrices to obtain M sub-kernels $H_0, H_1, \dots, H_{M-1}$, where each sub-kernel is the sum of one group of n matrices, 1 < n < N;
(3) Using the M sub-kernels, compute the M sub-transforms $H_0 X^{T}, H_1 X^{T}, H_2 X^{T}, \dots, H_{M-1} X^{T}$, and merge the M sub-transform results in ascending order of subscript to obtain the result of the first processing unit, $D X^{T} = H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T}$, where X is a luminance block matrix of a video frame and $X^{T}$ is the transpose of X;
(4) Take the result $D X^{T}$ of the first processing unit as the matrix Y of the second processing unit and repeat steps (1) to (3) to obtain $D Y^{T}$. The integer DCT coefficients are then $F = D X D^{T} = (H_0 + H_1 + \dots + H_{M-1})\, X\, (H_0 + H_1 + \dots + H_{M-1})^{T} = (H_0 + H_1 + \dots + H_{M-1}) (H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T})^{T}$.
In the above method, each element of the integer DCT kernel is split into an equivalent sum as follows:
(1) According to the binary representation of each integer element of the kernel, split the element into a sum of powers of 2;
(2) In order from low to high power, collect the $2^{i}$ term of every element of the kernel into matrix $D_i$, where 0 ≤ i ≤ N-1.
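The two splitting steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `bit_planes` is hypothetical, and negative elements are handled by splitting the magnitude and reapplying the sign, which is an assumption (the text only speaks of the binary representation of the element).

```python
def bit_planes(kernel):
    """Split an integer matrix into planes D_0 .. D_{N-1}.

    Plane i holds, for each element, the 2**i term of its binary
    magnitude with the element's sign, so each plane contains only
    0 and +/- 2**i, and the planes sum to the original kernel.
    """
    n_planes = max(abs(v) for row in kernel for v in row).bit_length()
    planes = []
    for i in range(n_planes):
        weight = 1 << i
        plane = [[(1 if v >= 0 else -1) * (abs(v) & weight) for v in row]
                 for row in kernel]
        planes.append(plane)
    return planes


def add_matrices(matrices):
    """Element-wise sum of a list of equally sized matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


# H.264 Baseline kernel, used here only as a small example.
H264_KERNEL = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

D_PLANES = bit_planes(H264_KERNEL)
assert add_matrices(D_PLANES) == H264_KERNEL
```

For this kernel the largest magnitude is 2, so the split yields two planes: one with entries in {0, ±1} and one with entries in {0, ±2}.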
In the above method, the M sub-transforms are merged as follows:
(1) From each of the M sub-transforms, extract a common factor $2^{j}$ (0 ≤ j ≤ N-1);
(2) Merge $H_0 X^{T}$ and $H_1 X^{T}$. Let the common factor extracted from $H_1 X^{T}$ be $2^{j_1}$. If $j_1 \le a$, merge only the part of the binary representation of each matrix element above the lowest $j_1$ bits; if $j_1 > a$, merge only the part above the lowest a bits. Denote the merged result $X_{\mathrm{Temp}}$. Here a is determined by the integer DCT kernel being processed;
(3) Merge the result $X_{\mathrm{Temp}}$ with $H_2 X^{T}$ by the method of step (2), and again denote the merged result $X_{\mathrm{Temp}}$;
(4) Repeat steps (2) and (3), merging all sub-transforms one by one in turn, until the merging of the M sub-transforms is complete.
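The merging rule in step (2) can be sketched for a single pair of matrix elements. The sketch assumes that truncation by a bits is an arithmetic right shift (floor division by $2^{a}$); the function name `merge_truncated` and its parameters are illustrative. Because the second operand is a multiple of $2^{j_1}$, its lowest $k = \min(j_1, a)$ bits are zero, so dropping the lowest k bits of the first operand before the addition does not change the truncated sum.

```python
def merge_truncated(h0_val, h1_val, j1, a):
    """Return (h0_val + h1_val) >> a using a narrower addition.

    h1_val must be a multiple of 2**j1. Only the bits above the lowest
    k = min(j1, a) bits take part in the addition; the remaining
    a - k bits are truncated afterwards.
    """
    if h1_val % (1 << j1) != 0:
        raise ValueError("h1_val must be a multiple of 2**j1")
    k = min(j1, a)
    partial = (h0_val >> k) + (h1_val >> k)  # adder input is k bits narrower
    return partial >> (a - k)


# Example: common factor 2**2 extracted from the second operand, 3-bit truncation.
assert merge_truncated(13, 20, j1=2, a=3) == (13 + 20) >> 3
assert merge_truncated(-13, 20, j1=2, a=3) == (-13 + 20) >> 3
```

Python's `>>` on negative integers rounds toward negative infinity, which matches an arithmetic shift in two's-complement hardware; a different rounding convention would need a different check.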
The features and advantages of the integer discrete cosine transform method for video coding proposed by the present invention are as follows. It takes an approach entirely different from conventional fast algorithms: starting from the integer transform kernel, it splits a kernel whose elements are small integers into several sub-kernels. Because a power-of-2 common factor can be extracted from each sub-kernel, the truncation operation of the integer DCT can be exploited to reduce the overall bit width of the adders in the PU and save hardware resources. Compared with the conventional butterfly structure, the method can isolate part of the computational redundancy in the integer DCT and remove it before truncation is carried out, thereby reducing the overall adder bit width of the PU. When implemented in a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), the method saves more than 10% of hardware resources compared with the existing butterfly structure.
Description of the drawings
Fig. 1 is a block diagram of the existing row-column decomposition structure for implementing the integer DCT.
Fig. 2 is the flow graph of the conventional implementation of the DCT used in AVS.
Fig. 3 is a block diagram of the register file structure used in the prior art to implement the transpose operation.
Fig. 4 is a partial flow graph of the DCT implemented with the method of the present invention.
Fig. 5 is the corresponding part of the flow graph shown in Fig. 2.
Fig. 6 is a diagram of the hardware structure used to implement the DCT of the present invention.
Fig. 7 and Fig. 8 compare the adder resources used by the existing method and by the method of the present invention.
Detailed description of the embodiments
The integer discrete cosine transform method for video coding proposed by the present invention first splits each element of the integer DCT kernel D into an equivalent sum, obtaining N matrices $D_0, D_1, \dots, D_{N-1}$, so that the original kernel can be written $D = D_0 + D_1 + \dots + D_{N-1}$. The N matrices are grouped and added to obtain M sub-kernels $H_0, H_1, \dots, H_{M-1}$, where each sub-kernel is the sum of one group of n matrices. Using the M sub-kernels, the M sub-transforms $H_0 X^{T}, H_1 X^{T}, H_2 X^{T}, \dots, H_{M-1} X^{T}$ are computed, and the M sub-transform results are merged in ascending order of subscript to obtain the result of the first processing unit, $D X^{T} = H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T}$, where X is a luminance block matrix of a video frame and $X^{T}$ is the transpose of X. The result $D X^{T}$ of the first processing unit is taken as the matrix Y of the second processing unit and the above steps are repeated to obtain $D Y^{T}$. The integer DCT coefficients are then $F = D X D^{T} = (H_0 + H_1 + \dots + H_{M-1})\, X\, (H_0 + H_1 + \dots + H_{M-1})^{T} = (H_0 + H_1 + \dots + H_{M-1}) (H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T})^{T}$.
In the above method, each element of the integer DCT kernel may be split into an equivalent sum as follows: (1) according to the binary representation of each integer element of the kernel, split the element into a sum of powers of 2; (2) in order from low to high power, collect the $2^{i}$ term of every element of the kernel into matrix $D_i$, where 0 ≤ i ≤ N-1.
In the above method, the M sub-transforms may be merged as follows: (1) from each of the M sub-transforms, extract a common factor $2^{j}$ (0 ≤ j ≤ N-1); (2) merge $H_0 X^{T}$ and $H_1 X^{T}$: let the common factor extracted from $H_1 X^{T}$ be $2^{j_1}$; if $j_1 \le a$, merge only the part of the binary representation of each matrix element above the lowest $j_1$ bits; if $j_1 > a$, merge only the part above the lowest a bits; denote the merged result $X_{\mathrm{Temp}}$, where a is determined by the integer DCT kernel being processed; (3) merge $X_{\mathrm{Temp}}$ with $H_2 X^{T}$ by the method of step (2), and again denote the merged result $X_{\mathrm{Temp}}$; (4) repeat steps (2) and (3), merging all sub-transforms one by one in turn, until the merging of the M sub-transforms is complete.
The invention belongs to the field of image coding and image processing and applies in particular to the fast implementation of the integer discrete cosine transform.
In the present invention the basic unit is the 8-point one-dimensional DCT, and the two-dimensional DCT is realized with a row-column decomposition structure. The row-column decomposition is expressed mathematically as $F = D X D^{T} = D (D X^{T})^{T}$, with $D X^{T}$ as one PU. The PU transposes its input matrix and left-multiplies the result by the matrix D. Two PUs in series realize the two-dimensional DCT from one-dimensional DCTs, as shown in Fig. 1. Although the two PUs perform the same mathematical operations, their transpose operations differ: the transpose of the first PU can be obtained by feeding in the input matrix in transposed order, whereas the second PU must have a dedicated transpose unit. In the present invention the transpose unit is implemented with the prior-art register file shown in Fig. 3.
The present invention is directed at the integer DCT, whose transform kernel consists of integers and is therefore convenient to implement in hardware with adders and shifters.
For any integer transform kernel, and for small-integer kernels in particular, the present invention splits each element into an equivalent sum. The split follows the binary representation of the element, which is written as a sum of powers of 2; for example, the element 6 can be split as $2^{1} + 2^{2}$. In order from low to high power, the $2^{i}$ terms of all kernel elements are collected into matrix $D_i$, so that $D_i$ contains only the elements 0 and $\pm 2^{i}$. This yields N matrices $D_0, D_1, \dots, D_{N-1}$, which are then grouped and added to give M new sub-kernels $H_0, H_1, \dots, H_{M-1}$, where each element of a sub-kernel is the sum of the elements at the corresponding position of the matrices in its group.
The M matrices $H_0, H_1, \dots, H_{M-1}$ obtained in this way are called sub-kernels in the present invention. The result of the whole transform is the sum of the results of the M sub-transforms, that is,

$$\begin{aligned} F = D X D^{T} &= (H_0 + H_1 + \dots + H_{M-1})\, X\, (H_0 + H_1 + \dots + H_{M-1})^{T} \\ &= (H_0 + H_1 + \dots + H_{M-1}) (H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T})^{T}, \end{aligned}$$

where $H_0 X^{T}, H_1 X^{T}, \dots, H_{M-1} X^{T}$ are defined in the present invention as sub-transforms.
A common factor $2^{j}$ (0 ≤ j ≤ N-1) can be extracted from each sub-transform. Extracting it lowers the magnitude of the kernel elements so that all elements of the sub-kernel lie in a small range. This greatly simplifies the computation of the sub-transform and allows the truncation operation to be used to reduce the adder input bit width to some extent. The input bit width is reduced as follows.
Assume that the truncation operation discards the lowest a bits of the binary representation of the result.
When the integer DCT algorithm proposed by the present invention is implemented with the row-column decomposition structure, the sub-transform results must be merged once they have been computed. First $H_0 X^{T}$ and $H_1 X^{T}$ are merged; the merged result is then merged with $H_2 X^{T}$, and so on until all sub-transforms have been merged.
Because the common factor $2^{j_1}$ can be extracted from $H_1 X^{T}$, the lowest $j_1$ bits of every element of $H_1 X^{T}$ are zero, so when corresponding elements of the two sub-transform results are added, the addition of the lowest $j_1$ bits cannot produce a carry. If $j_1 \le a$, the lowest $j_1$ bits of the $H_0 X^{T}$ result can be discarded directly without being added; if $j_1 > a$, the lowest a bits of the $H_0 X^{T}$ result can be discarded directly without being added. The merged result is denoted $X_{\mathrm{Temp}}$.
The first two sub-transforms are merged as described above, giving $X_{\mathrm{Temp}}$. $X_{\mathrm{Temp}}$ is then merged with the sub-transform $H_2 X^{T}$, from which the common factor $2^{j_2}$ can be extracted; following the same procedure gives a merged result that is again denoted $X_{\mathrm{Temp}}$.
$X_{\mathrm{Temp}}$ is then merged with the remaining sub-transforms in turn until all have been merged, giving the final result $H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T}$, that is, $D X^{T}$, denoted as the matrix Y.
To carry out the forward transform with the row-column decomposition structure, the result $D X^{T}$ of the first PU is fed as the matrix Y into a second PU of identical structure, which computes $D Y^{T}$. The computation of the second PU completes the integer DCT.
Fig. 4 shows part of the flow graph of the proposed method, and Fig. 5 shows the same part of the conventional butterfly implementation. Comparing the two shows that the bit width grows more slowly along the data-flow direction in the proposed algorithm than in the conventional butterfly computation. Figs. 4 and 5 give the flow graph for only some of the transform coefficients, because the structure of the proposed algorithm differs considerably from that of the butterfly computation and does not lend itself to a flow-graph description; a complete flow graph would be cluttered and would obscure the differences between the algorithms.
An embodiment of the method of the present invention is described below.
The integer transform kernel of the AVS video coding standard is used as an example to illustrate the proposed fast integer DCT algorithm based on bit-plane decomposition of the transform kernel.

$$D = \begin{pmatrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 10 & 9 & 6 & 2 & -2 & -6 & -9 & -10 \\ 10 & 4 & -4 & -10 & -10 & -4 & 4 & 10 \\ 9 & -2 & -10 & -6 & 6 & 10 & 2 & -9 \\ 8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\ 6 & -10 & 2 & 9 & -9 & -2 & 10 & -6 \\ 4 & -10 & 10 & -4 & -4 & 10 & -10 & 4 \\ 2 & -6 & 9 & -10 & 10 & -9 & 6 & -2 \end{pmatrix}$$

All elements of this kernel are integers with absolute value less than 16, so in binary all bits above the fourth are zero, and the kernel can be split into four matrices. The elements 6 and -6 are a special case: each can be split in two different ways, 6 = 4 + 2 = 8 - 2 and -6 = -4 - 2 = -8 + 2. In practice the split 6 = 8 - 2, -6 = -8 + 2 was found to be more favorable. The four matrices are:
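$$D_0 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \end{pmatrix}, \quad D_1 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & -2 & 2 & -2 & 2 & 0 & -2 \\ 2 & 0 & 0 & -2 & -2 & 0 & 0 & 2 \\ 0 & -2 & -2 & 2 & -2 & 2 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -2 & -2 & 2 & 0 & 0 & -2 & 2 & 2 \\ 0 & -2 & 2 & 0 & 0 & 2 & -2 & 0 \\ 2 & 2 & 0 & -2 & 2 & 0 & -2 & -2 \end{pmatrix},$$

$$D_2 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 4 & -4 & 0 & 0 & -4 & 4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 4 & 0 & 0 & -4 & -4 & 0 & 0 & 4 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad D_3 = \begin{pmatrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 8 & 8 & 8 & 0 & 0 & -8 & -8 & -8 \\ 8 & 0 & 0 & -8 & -8 & 0 & 0 & 8 \\ 8 & 0 & -8 & -8 & 8 & 8 & 0 & -8 \\ 8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\ 8 & -8 & 0 & 8 & -8 & 0 & 8 & -8 \\ 0 & -8 & 8 & 0 & 0 & 8 & -8 & 0 \\ 0 & -8 & 8 & -8 & 8 & -8 & 8 & 0 \end{pmatrix}.$$

The first two matrices can be merged into

$$H_0 = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 1 & -2 & 2 & -2 & 2 & -1 & -2 \\ 2 & 0 & 0 & -2 & -2 & 0 & 0 & 2 \\ 1 & -2 & -2 & 2 & -2 & 2 & 2 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -2 & -2 & 2 & 1 & -1 & -2 & 2 & 2 \\ 0 & -2 & 2 & 0 & 0 & 2 & -2 & 0 \\ 2 & 2 & 1 & -2 & 2 & -1 & -2 & -2 \end{pmatrix},$$

and the last two matrices can be merged into

$$H_1 = \begin{pmatrix} 8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\ 8 & 8 & 8 & 0 & 0 & -8 & -8 & -8 \\ 8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\ 8 & 0 & -8 & -8 & 8 & 8 & 0 & -8 \\ 8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\ 8 & -8 & 0 & 8 & -8 & 0 & 8 & -8 \\ 4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\ 0 & -8 & 8 & -8 & 8 & -8 & 8 & 0 \end{pmatrix}.$$

The PU is thus decomposed into $D X^{T} = H_0 X^{T} + H_1 X^{T}$, where $H_1$ can also be written

$$H_1 = 4 \times \begin{pmatrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 0 & 0 & -2 & -2 & -2 \\ 2 & 1 & -1 & -2 & -2 & -1 & 1 & 2 \\ 2 & 0 & -2 & -2 & 2 & 2 & 0 & -2 \\ 2 & -2 & -2 & 2 & 2 & -2 & -2 & 2 \\ 2 & -2 & 0 & 2 & -2 & 0 & 2 & -2 \\ 1 & -2 & 2 & -1 & -1 & 2 & -2 & 1 \\ 0 & -2 & 2 & -2 & 2 & -2 & 2 & 0 \end{pmatrix}.$$

Apart from 0, the two sub-kernels of the PU then contain only the four integers -2, -1, 1 and 2. Because the coefficients are so close to one another, more of the operations repeat, and adder reuse can be fully exploited.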
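The decomposition of the AVS kernel described above can be reproduced and checked in a few lines. This is a minimal sketch: the table `SPLIT` encodes the signed power-of-two split of each magnitude (with 6 = 8 - 2 as the text prefers), and the names `SPLIT`, `plane` and `add_matrices` are illustrative assumptions rather than anything defined in the patent.

```python
# AVS 8x8 integer transform kernel as quoted in the text.
AVS_KERNEL = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [10, 9, 6, 2, -2, -6, -9, -10],
    [10, 4, -4, -10, -10, -4, 4, 10],
    [9, -2, -10, -6, 6, 10, 2, -9],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -10, 2, 9, -9, -2, 10, -6],
    [4, -10, 10, -4, -4, 10, -10, 4],
    [2, -6, 9, -10, 10, -9, 6, -2],
]

# Signed power-of-two terms for each magnitude that occurs in the kernel.
SPLIT = {2: (2,), 4: (4,), 6: (8, -2), 8: (8,), 9: (8, 1), 10: (8, 2)}


def plane(kernel, weight):
    """Collect the +/- weight term of every element into one matrix."""
    result = []
    for row in kernel:
        new_row = []
        for v in row:
            sign = 1 if v > 0 else -1
            term = sum(t for t in SPLIT[abs(v)] if abs(t) == weight)
            new_row.append(sign * term)
        result.append(new_row)
    return result


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


D0, D1, D2, D3 = (plane(AVS_KERNEL, w) for w in (1, 2, 4, 8))
H0 = add_matrices(D0, D1)  # low planes: elements in {-2, ..., 2}
H1 = add_matrices(D2, D3)  # high planes: every element is a multiple of 4

assert add_matrices(H0, H1) == AVS_KERNEL
assert all(abs(v) <= 2 for row in H0 for v in row)
assert all(v % 4 == 0 and abs(v // 4) <= 2 for row in H1 for v in row)
```

The last two assertions capture the property the text relies on: after extracting the factor 4 from the second sub-kernel, both sub-kernels contain only values between -2 and 2.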
By the left-right symmetry of the integer DCT kernel, the kernels of the two sub-transforms $H_0 X^{T}$ and $H_1 X^{T}$ can be reduced to

$$H_0 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 1 & -2 & 2 \\ 2 & 0 & 0 & -2 \\ 1 & -2 & -2 & 2 \\ 0 & 0 & 0 & 0 \\ -2 & -2 & 2 & 1 \\ 0 & -2 & 2 & 0 \\ 2 & 2 & 1 & -2 \end{pmatrix}, \quad H_1 = 4 \times \begin{pmatrix} 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 0 \\ 2 & 1 & -1 & -2 \\ 2 & 0 & -2 & -2 \\ 2 & -2 & -2 & 2 \\ 2 & -2 & 0 & 2 \\ 1 & -2 & 2 & -1 \\ 0 & -2 & 2 & -2 \end{pmatrix}.$$

The even rows of the two sub-transforms then have the same input, and the odd rows have the same input (rows are counted from 1), so adders can be shared between the even rows of the two sub-transforms and likewise between their odd rows. Take the odd rows of $H_0 X^{T}$ and $H_1 X^{T}$ as an example. $H_0 X^{T}$ can be decomposed into the odd-row transform $H_{0e} X_p^{T}$ and the even-row transform $H_{0o} X_m^{T}$, where the kernel of $H_{0e} X_p^{T}$ consists of the odd-row elements of $H_0$ and the kernel of $H_{0o} X_m^{T}$ consists of the even-row elements of $H_0$; the results of the two transforms are the odd-row and even-row results of $H_0 X^{T}$ respectively. The same holds for $H_1 X^{T}$. As described above, $H_{0e} X_p^{T}$ and $H_{1e} X_p^{T}$ have the same input and can therefore share adders.

$$H_{0e} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & -2 & 2 & 0 \end{pmatrix}, \quad H_{1e} = 4 \times \begin{pmatrix} 2 & 2 & 2 & 2 \\ 2 & 1 & -1 & -2 \\ 2 & -2 & -2 & 2 \\ 1 & -2 & 2 & -1 \end{pmatrix}, \quad X_p^{T} = \begin{pmatrix} X_{i0} + X_{i7} \\ X_{i1} + X_{i6} \\ X_{i2} + X_{i5} \\ X_{i3} + X_{i4} \end{pmatrix}$$

Inspection of $H_{0e}$ and $H_{1e}$ shows considerable computational redundancy between the two kernels. One can first compute $X_p^{T}[0] \pm X_p^{T}[3]$ and $X_p^{T}[1] \pm X_p^{T}[2]$; the results of $H_{0e} X_p^{T}$ and $H_{1e} X_p^{T}$ are then obtained by adding and subtracting combinations of these four values.
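The sharing described above can be made concrete. The sketch below computes both $H_{0e} X_p^{T}$ and $H_{1e} X_p^{T}$ for one column from the four shared values $X_p^{T}[0] \pm X_p^{T}[3]$ and $X_p^{T}[1] \pm X_p^{T}[2]$, and compares them against the plain matrix products. The function names and the particular way the four values are combined are illustrative assumptions; the patent's Fig. 4 may wire the adders differently.

```python
H0E = [[0, 0, 0, 0], [2, 0, 0, -2], [0, 0, 0, 0], [0, -2, 2, 0]]
H1E = [[8, 8, 8, 8], [8, 4, -4, -8], [8, -8, -8, 8], [4, -8, 8, -4]]


def matvec(m, v):
    """Plain matrix-vector product, used as the reference."""
    return [sum(c * x for c, x in zip(row, v)) for row in m]


def even_part_shared(xp):
    """Compute H0e*xp and H1e*xp from four shared sums and differences."""
    s0, d0 = xp[0] + xp[3], xp[0] - xp[3]
    s1, d1 = xp[1] + xp[2], xp[1] - xp[2]
    h0e = [0, 2 * d0, 0, -2 * d1]
    # Multiplications by 2, 4 and 8 are shifts in hardware.
    h1e = [8 * (s0 + s1), 4 * (2 * d0 + d1), 8 * (s0 - s1), 4 * (d0 - 2 * d1)]
    return h0e, h1e


# Hypothetical input column (already folded into sums X_i0+X_i7, ...).
XP = [37, -12, 5, 90]
h0e_out, h1e_out = even_part_shared(XP)
assert h0e_out == matvec(H0E, XP)
assert h1e_out == matvec(H1E, XP)
```

In this arrangement the terms `2 * d0` and `d1` appear in both outputs, which is the kind of overlap that lets one adder serve both sub-transforms.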
Similarly, for

$$H_{0o} = \begin{pmatrix} 2 & 1 & -2 & 2 \\ 1 & -2 & -2 & 2 \\ -2 & -2 & 2 & 1 \\ 2 & 2 & 1 & -2 \end{pmatrix}, \quad H_{1o} = 4 \times \begin{pmatrix} 2 & 2 & 2 & 0 \\ 2 & 0 & -2 & -2 \\ 2 & -2 & 0 & 2 \\ 0 & -2 & 2 & -2 \end{pmatrix} = 8 \times \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & -1 & -1 \\ 1 & -1 & 0 & 1 \\ 0 & -1 & 1 & -1 \end{pmatrix},$$

$$X_m^{T} = \begin{pmatrix} X_{i0} - X_{i7} \\ X_{i1} - X_{i6} \\ X_{i2} - X_{i5} \\ X_{i3} - X_{i4} \end{pmatrix},$$

the computational redundancy between the two kernels can likewise be used to reduce the number of adders. Moreover, the common factor 4 can be extracted from $H_{1e}$ and the common factor 8 from $H_{1o}$, which is very useful for exploiting the truncation performed in video coding to reduce the adder input bit width. In a DCT for video coding based on row-column decomposition, the lowest 3 bits of the transform result can be truncated after the first PU. Because of these common factors, the 3-bit truncation can be taken into account when the sub-transform results are merged. The addition of the lowest 2 bits of $H_{0e} X_p^{T} + H_{1e} X_p^{T}$ cannot produce a carry, so these two bits can be ignored, reducing the adder input bit width. The addition of the lowest 3 bits of $H_{0o} X_m^{T} + H_{1o} X_m^{T}$ cannot produce a carry, so these three bits can be ignored, again reducing the adder input bit width. Note also that the first and third rows of $H_{0e}$ are all zero, so merging these two rows requires no additional adder.
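The following sketch applies the reduced-width merging to one input column of the first PU and checks that the result is bit-identical to the direct computation followed by a 3-bit truncation. It assumes that truncation is an arithmetic right shift, and the function names are illustrative; it models the arithmetic only, not the adder sharing or the hardware structure of Fig. 6.

```python
AVS_KERNEL = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [10, 9, 6, 2, -2, -6, -9, -10],
    [10, 4, -4, -10, -10, -4, 4, 10],
    [9, -2, -10, -6, 6, 10, 2, -9],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -10, 2, 9, -9, -2, 10, -6],
    [4, -10, 10, -4, -4, 10, -10, 4],
    [2, -6, 9, -10, 10, -9, 6, -2],
]
H0E = [[0, 0, 0, 0], [2, 0, 0, -2], [0, 0, 0, 0], [0, -2, 2, 0]]
H1E_OVER_4 = [[2, 2, 2, 2], [2, 1, -1, -2], [2, -2, -2, 2], [1, -2, 2, -1]]
H0O = [[2, 1, -2, 2], [1, -2, -2, 2], [-2, -2, 2, 1], [2, 2, 1, -2]]
H1O_OVER_8 = [[1, 1, 1, 0], [1, 0, -1, -1], [1, -1, 0, 1], [0, -1, 1, -1]]


def dot(row, vec):
    return sum(c * v for c, v in zip(row, vec))


def pu_column_direct(x):
    """Reference: multiply by the full kernel, then truncate 3 bits."""
    return [dot(row, x) >> 3 for row in AVS_KERNEL]


def pu_column_reduced(x):
    """Same result, but the low bits of the H0 part never enter the adder."""
    xp = [x[i] + x[7 - i] for i in range(4)]
    xm = [x[i] - x[7 - i] for i in range(4)]
    out = [0] * 8
    for r in range(4):
        # Rows 1, 3, 5, 7 (counted from 1): H1e part is a multiple of 4,
        # so drop 2 bits of the H0e part, add, then truncate 1 more bit.
        out[2 * r] = ((dot(H0E[r], xp) >> 2) + dot(H1E_OVER_4[r], xp)) >> 1
        # Rows 2, 4, 6, 8: H1o part is a multiple of 8, so all 3 truncated
        # bits of the H0o part can be dropped before the addition.
        out[2 * r + 1] = (dot(H0O[r], xm) >> 3) + dot(H1O_OVER_8[r], xm)
    return out


# Hypothetical input column, for illustration only.
X_COLUMN = [12, -7, 255, 0, 31, -128, 64, 3]
assert pu_column_reduced(X_COLUMN) == pu_column_direct(X_COLUMN)
```

The equality holds for any integer input under the floor-shift assumption, because the part taken from the second sub-kernel contributes nothing to the bits that are dropped early.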
Cascading the two PUs described above completes the method of the present invention and realizes a fast, lossless integer DCT.
For an integer DCT conforming to the AVS standard implemented with the algorithm described in the present invention, the main hardware resources consumed by the proposed algorithm based on bit-plane decomposition of the transform kernel are:
Adder width    Number
8-bit          8
9-bit          8
10-bit         10
11-bit         10
12-bit         2
The main hardware resources consumed by the conventional butterfly computation are:
Adder width    Number
8-bit          8
9-bit          8
10-bit         2
11-bit         6
12-bit         4
13-bit         6
14-bit
15-bit         4
The advantages and effects of the method of the present invention over the prior art are illustrated by Fig. 7 and Fig. 8. Fig. 7 shows the butterfly computation and Fig. 8 the method of the present invention; the horizontal axis is the adder input bit width and the vertical axis is the number of adders used. In an integer DCT conforming to the AVS standard, the first PU truncates 3 bits and the second PU truncates 7 bits. The comparison shows that the two algorithms use the same number of adders (38), but the adders used by the bit-plane-decomposition algorithm are concentrated at input widths of 8, 9, 10 and 11 bits, whereas those of the butterfly computation range from 8 to 15 bits. The input bit width of the bit-plane-decomposition algorithm is therefore lower overall, so when the DCT module is implemented in an FPGA or ASIC, the bit-plane-decomposition algorithm saves more resources. In an FPGA implementation, taking the Xilinx Virtex-IV as an example, the bit-plane-decomposition algorithm saves more than 10% of resources.
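As a rough cross-check of the adder tables, the counts can be totalled and the summed input bit width compared. This is only a crude proxy computed from the listed figures, not the FPGA measurement reported in the text. The count for 14-bit adders is missing from the source table; the sketch assumes it is 0, which is the value consistent with the stated total of 38 adders. All names are illustrative.

```python
# Adder counts keyed by input bit width, as listed in the text.
PROPOSED = {8: 8, 9: 8, 10: 10, 11: 10, 12: 2}
# Assumption: 14-bit count taken as 0 (not given in the source table).
BUTTERFLY = {8: 8, 9: 8, 10: 2, 11: 6, 12: 4, 13: 6, 14: 0, 15: 4}


def adder_count(table):
    """Total number of adders."""
    return sum(table.values())


def total_width(table):
    """Sum of input bit widths over all adders."""
    return sum(width * count for width, count in table.items())


assert adder_count(PROPOSED) == adder_count(BUTTERFLY) == 38

# Relative reduction in summed adder input width under the assumption above.
width_saving = 1 - total_width(PROPOSED) / total_width(BUTTERFLY)
```

The summed widths are 370 bits for the proposed structure and 408 bits for the butterfly under this assumption. Actual resource use also depends on registers, routing and the transpose unit, so this figure should not be read as a substitute for the reported saving.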
The bit-plane-decomposition algorithm can be applied to other integer transform kernels as well. For a different kernel, the bit-plane matrices obtained by the decomposition, and the grouping of bit planes that gives the greatest adder reuse, may differ. In every case, however, the approach reduces resource cost in two ways: by extracting common factors that are as large as possible, which narrows the range of the sub-kernel elements and lets truncation reduce the adder bit width, and by adjusting the bit-plane coefficients so that adder reuse is as high as possible.

Claims (1)

1. An integer discrete cosine transform method for video coding, characterized in that the method comprises the following steps:
(1) splitting each element of the integer DCT kernel D into an equivalent sum, obtaining N matrices $D_0, D_1, \dots, D_{N-1}$, so that the original kernel is expressed as $D = D_0 + D_1 + \dots + D_{N-1}$, wherein splitting each element of the integer DCT kernel into an equivalent sum comprises the following steps:
(1-1) according to the binary representation of each integer element of the kernel, splitting the element into a sum of powers of 2;
(1-2) in order from low to high power, collecting the $2^{i}$ term of every element of the kernel into matrix $D_i$, where 0 ≤ i ≤ N-1;
(2) grouping and adding the N matrices to obtain M sub-kernels $H_0, H_1, \dots, H_{M-1}$;
(3) using the M sub-kernels, computing the M sub-transforms $H_0 X^{T}, H_1 X^{T}, H_2 X^{T}, \dots, H_{M-1} X^{T}$, and merging the M sub-transform results in ascending order of subscript to obtain the result of the first processing unit, $D X^{T} = H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T}$, where X is a luminance block matrix of a video frame and $X^{T}$ is the transpose of X, wherein merging the M sub-transform results comprises the following steps:
(3-1) from each of the M sub-transforms, extracting a common factor $2^{j}$, where 0 ≤ j ≤ N-1;
(3-2) merging $H_0 X^{T}$ and $H_1 X^{T}$: letting the common factor extracted from $H_1 X^{T}$ be $2^{j_1}$, if $j_1 \le a$, merging only the part of the binary representation of each matrix element above the lowest $j_1$ bits, and if $j_1 > a$, merging only the part above the lowest a bits; the merged result is denoted $X_{\mathrm{Temp}}$, where a is determined by the integer DCT kernel being processed;
(3-3) merging the result $X_{\mathrm{Temp}}$ with $H_2 X^{T}$ by the method of step (3-2), the merged result again being denoted $X_{\mathrm{Temp}}$;
(3-4) repeating steps (3-2) and (3-3), merging all sub-transforms one by one in turn, until the merging of the M sub-transforms is complete;
(4) letting the above result $D X^{T}$ be the matrix Y, using Y as the matrix X in step (3), and repeating steps (1) to (3) to obtain $D Y^{T}$; the integer DCT coefficients are then $F = D X D^{T} = (H_0 + H_1 + \dots + H_{M-1})\, X\, (H_0 + H_1 + \dots + H_{M-1})^{T} = (H_0 + H_1 + \dots + H_{M-1}) (H_0 X^{T} + H_1 X^{T} + \dots + H_{M-1} X^{T})^{T}$.
CNB2006100121618A 2006-06-09 2006-06-09 Integral discrete cosine transform method in use for encoding video Expired - Fee Related CN100452880C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100121618A CN100452880C (en) 2006-06-09 2006-06-09 Integral discrete cosine transform method in use for encoding video

Publications (2)

Publication Number Publication Date
CN1874510A CN1874510A (en) 2006-12-06
CN100452880C true CN100452880C (en) 2009-01-14

Family

ID=37484730

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100121618A Expired - Fee Related CN100452880C (en) 2006-06-09 2006-06-09 Integral discrete cosine transform method in use for encoding video

Country Status (1)

Country Link
CN (1) CN100452880C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106231304A (en) * 2016-08-30 2016-12-14 成都金本华电子有限公司 Video decoding integer transform method improved with a one-dimensional fast butterfly algorithm
CN106550267B (en) * 2016-11-25 2019-03-29 广州酷狗计算机科技有限公司 Multimedia messages coding/decoding method and device
CN107018420B (en) * 2017-05-08 2019-07-12 电子科技大学 Low-power two-dimensional discrete cosine transform method and circuit
CN112804531A (en) * 2021-04-12 2021-05-14 北京电信易通信息技术股份有限公司 Method for constructing an HEVC (High Efficiency Video Coding) encoding chip based on HLS (high-level synthesis)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN88102019A (en) * 1987-04-10 1988-12-21 菲利浦光灯制造公司 Television transmission system using transform coding
US5226002A (en) * 1991-06-28 1993-07-06 Industrial Technology Research Institute Matrix multiplier circuit
CN1253340A (en) * 1998-10-30 2000-05-17 惠普公司 Signal processing of distributed operation architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design of a two-dimensional DCT/IDCT processor based on a highly parallel architecture. Liu Feng, Dai Guoding, Zhuang Yiqi. Journal of Circuits and Systems, Vol. 8, No. 3. 2003 *

Similar Documents

Publication Publication Date Title
Meher et al. Efficient integer DCT architectures for HEVC
Chen et al. Efficient architecture of variable size HEVC 2D-DCT for FPGA platforms
CN100452880C (en) Integral discrete cosine transform method in use for encoding video
Lee et al. Precision-aware self-quantizing hardware architectures for the discrete wavelet transform
Zheng et al. A reconfigurable architecture for discrete cosine transform in video coding
Prabhu et al. Design of area and power efficient Radix-4 DIT FFT butterfly unit using floating point fused arithmetic
CN114007079A (en) Conversion circuit, method, device and encoder
CN108184127B (en) Configurable multi-size DCT (discrete cosine transform) transformation hardware multiplexing architecture
CN103092559A (en) Multiplying unit structure for discrete cosine transformation (DCT)/inverse discrete cosine transformation (IDCT) circuit under high efficiency video coding (HEVC) standard
Kim et al. Low-power multiplierless DCT for image/video coders
Tasdizen et al. A high performance and low cost hardware architecture for H.264 transform and quantization algorithms
Wu et al. Hardware efficient multiplier‐less multi‐level 2D DWT architecture without off‐chip RAM
CN104811738B (en) Low-overhead multi-standard 8 × 8 one-dimensional discrete cosine transform circuit based on resource sharing
CN104661036A (en) Video encoding method and system
CN1526103B (en) Discrete cosine transform device
CN203279074U (en) Two-dimensional discrete cosine transform (DCT)/inverse discrete cosine transform (IDCT) circuit
Braatz et al. A new hardware friendly 2D-DCT HEVC compliant algorithm and its high throughput and low power hardware design
Mozafari et al. Hartley Stochastic Computing For Convolutional Neural Networks
CN100388316C (en) High-precision number cosine converting circuit without multiplier and its conversion
Anas et al. FPGA implementation of a pipelined 2D-DCT and simplified quantization for real-time applications
Liu et al. Unified algorithms for computation of different points integer 1-D DCT/IDCT for the HEVC standard
Chandran et al. NEDA based hybrid architecture for DCT—HWT
CN109451307B (en) One-dimensional DCT operation method and DCT transformation device based on approximate coefficient
Yagain et al. High Speed ASIC Design of DCT for Image Compression
Vaithiyanathan et al. High speed low power DWT structure with log based FPU in FPGAs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090114

Termination date: 20100609