CN100388316C - High-precision multiplier-less digital cosine transform circuit and transform method thereof

Info

Publication number: CN100388316C
Application number: CNB2005100252037A
Authority: CN
Inventors: 林豪
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Xiamen Ziguang exhibition Rui Technology Co. Ltd.
Priority date: 2005-04-19
Filing date: 2005-04-19
Publication date: 2008-05-14
Anticipated expiration: 2025-04-19
Also published as: CN1855149A

Abstract

The present invention provides a high-precision, multiplier-less digital cosine transform (DCT) and quantization method. The method comprises the following steps: step 1, each multiplier is replaced with two shifters and one adder-subtractor to complete a DCT with scale factor s; step 2, the transform output is multiplied by L in the quantizer, where L is s divided by the quantization coefficient, to obtain the quantized DCT result. By selecting specific numbers, the resource-hungry multiplier is not needed in the hardware circuit, so the invention is a multiplier-less DCT method. The structure is highly parallel and regular, and the hardware computing units can be reused, so the hardware circuit is very simple; DCT and IDCT can be realized in the same hardware circuit with very high computational precision.

Description

High-precision multiplier-less digital cosine transform circuit and transform method thereof

Technical field

The present invention relates to a multiplier-less digital cosine transform circuit and a transform method thereof.

Background technology

In the encoding process of most still-image compression standards (such as JPEG) and moving-image compression standards (such as MPEG1, MPEG2, MPEG4 and H263), a digital cosine transform (Digital Cosine Transform, DCT) module first transforms the raw image data (or the image data after motion estimation) from the time domain to the frequency domain; a quantizer then quantizes the frequency-domain signal (i.e. the DCT result); finally, the quantized frequency-domain signal is compressed. In this process the quantizer divides the frequency-domain signal by a specific quantization parameter, which is usually implemented by multiplying the frequency-domain signal by the reciprocal of the quantization parameter.

The decoding process is as follows: the compressed bitstream is first decompressed to obtain the quantized frequency-domain signal, which is then dequantized to recover the correct frequency-domain signal (i.e. the DCT result); finally, an inverse digital cosine transform (Inverse Digital Cosine Transform, IDCT) module regenerates the raw image data (or the image data after motion estimation). In this process the quantizer multiplies the quantized frequency-domain signal by the specific quantization parameter to recover the correct frequency-domain signal.

In short, the DCT (and IDCT) and the quantizer are the most important computation steps in image compression (and decompression). Designing a low-complexity DCT (and IDCT) and quantizer that are easy to implement in hardware is of great significance for improving system performance, reducing system power consumption and reducing system cost.

The DCT algorithm used in the still-image and moving-image compression standards is the 8 × 8 two-dimensional DCT:

F (u, v) = \frac{1}{4} C (u) C (v) \sum_{x = 0}^{7} \sum_{y = 0}^{7} f (x, y) \cos \frac{(2 x + 1) u \pi}{16} \cos \frac{(2 y + 1) v \pi}{16}

u, v, x, y = 0, 1, 2, ..., 7

where x, y are the spatial-domain coordinates, u, v are the frequency-domain coordinates, and

C (u), C (v) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}

The IDCT algorithm is the 8 × 8 two-dimensional IDCT:

f (x, y) = \frac{1}{4} \sum_{u = 0}^{7} \sum_{v = 0}^{7} C (u) C (v) F (u, v) \cos \frac{(2 x + 1) u \pi}{16} \cos \frac{(2 y + 1) v \pi}{16}

In a hardware implementation, the 8 × 8 two-dimensional DCT is usually decomposed into 16 one-dimensional 8-point DCTs (as shown in Fig. 1). The one-dimensional DCT and IDCT are:

F (u) = \frac{1}{2} C (u) \sum_{x = 0}^{7} f (x) \cos \frac{(2 x + 1) u \pi}{16}

f (x) = \frac{1}{2} \sum_{u = 0}^{7} C (u) F (u) \cos \frac{(2 x + 1) u \pi}{16}
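As a reference point only, the one-dimensional formulas and the row-column decomposition can be sketched in floating point as follows. This is a direct, multiplication-heavy evaluation and not the circuit of the invention; the function names (`dct_1d`, `idct_1d`, `dct_2d`) and the `block[x][y]` layout are assumptions made for this illustration.

```python
import math

N = 8  # transform length used throughout the text


def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_1d(f):
    """Direct 8-point DCT: F(u) = 1/2 C(u) sum_x f(x) cos((2x+1) u pi / 16)."""
    return [0.5 * c(u) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                             for x in range(N))
            for u in range(N)]


def idct_1d(F):
    """Direct 8-point IDCT: f(x) = 1/2 sum_u C(u) F(u) cos((2x+1) u pi / 16)."""
    return [0.5 * sum(c(u) * F[u] * math.cos((2 * x + 1) * u * math.pi / 16)
                      for u in range(N))
            for x in range(N)]


def dct_2d(block):
    """8x8 DCT as 8 row transforms followed by 8 column transforms (16 in total).

    block[x][y] holds f(x, y); the result F[u][v] holds F(u, v).
    """
    rows = [dct_1d(block[x]) for x in range(N)]                         # transform over y
    cols = [dct_1d([rows[x][v] for x in range(N)]) for v in range(N)]   # transform over x
    return [[cols[v][u] for v in range(N)] for u in range(N)]
```

Applying `idct_1d` to the output of `dct_1d` returns the original samples up to rounding error, which is the property the shared DCT/IDCT structure discussed below relies on.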

As can be seen, the above DCT and IDCT formulas contain a large number of multiplications. Various existing software algorithms and hardware circuits aim to reduce the number of multiplications, but overly complicated algorithms are unfavorable for hardware implementation. In addition, whether the DCT and IDCT can use the same structure is also an important factor to consider in a hardware implementation.

For example, existing one-dimensional DCT and IDCT circuits usually adopt the decomposition schemes shown in Fig. 2 and Fig. 3, referred to below as the Chen scheme and the Loeffler scheme after their proposers. These two schemes not only reduce the number of multiplications required, but also have good structural symmetry and are easy to implement in hardware; in particular, the DCT and IDCT can be realized with the same structure. The core computing unit of both schemes is the cross multiply-add unit circuit (as shown in Fig. 4), which comprises four multiplications and two additions (out1=in1*a+in2*b; out2=in2*a-in1*b). The inverse operation of this computing unit is realized by the same computing unit circuit, so the DCT and IDCT of the Chen scheme and the Loeffler scheme can be realized with the same circuit.
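The cross multiply-add unit and its self-inverse structure can be illustrated with a short sketch. It assumes that a and b are the cosine and sine of one angle (so that a² + b² = 1) and that the inverse is obtained by feeding the same unit with b negated; both points are assumptions of this illustration, and the function name `cross_unit` is hypothetical.

```python
import math


def cross_unit(in1, in2, a, b):
    """Cross multiply-add unit of Fig. 4: four multiplications, two additions."""
    out1 = in1 * a + in2 * b
    out2 = in2 * a - in1 * b
    return out1, out2


# Assumed coefficients: a = cos(theta), b = sin(theta), hence a*a + b*b = 1.
theta = 6 * math.pi / 16
a, b = math.cos(theta), math.sin(theta)

y1, y2 = cross_unit(10.0, -3.0, a, b)
# The same unit, driven with b negated, undoes the operation.
x1, x2 = cross_unit(y1, y2, a, -b)
```

Because the forward and inverse operations differ only in the sign of one coefficient, a single hardware unit can serve both directions.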

Many existing schemes replace the cross multiply-add unit shown in Fig. 4 with several cascaded shifts and additions/subtractions (as shown in Fig. 5; out1=in1*p1+in2; out2=in1*(1-p1*p2)-in2*p2), and replace multiplications with additions by choosing p values whose binary representation has as few digits as possible. Because of this restriction on the p values, the output of Fig. 5 must still be multiplied by a specific scale factor s to obtain the same result as Fig. 4. The "multiply by a specific scale factor s" operation can be performed at the end of the one-dimensional DCT (as shown in Fig. 6). Further, the two "multiply by a specific scale factor s" operations can be merged and performed at the end of the 8 × 8 two-dimensional DCT (as shown in Fig. 7). In this way, this class of schemes eliminates most multiplications.

Usually, in the still-image and moving-image compression standards, the DCT result is sent to a quantizer, where it is divided by the quantization parameter, i.e. multiplied by the reciprocal of the quantization parameter. Therefore, "multiply by the overall scale factor" can be merged with "multiply by the reciprocal of the quantization parameter" in the quantizer into a single multiplication. Such a DCT is commonly called a "multiplier-less DCT" or a "DCT with scale factor" (as shown in Fig. 8).
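The merging of the two multiplications can be shown with a minimal sketch. The function names, the rounding rule (`round`) and the sample values of s and the quantization parameter are assumptions chosen only for illustration.

```python
def quantize_separate(x, s, q):
    """Scale by s, then divide by the quantization parameter q (two operations)."""
    return round((x * s) / q)


def quantize_merged(x, merged):
    """Single multiplication by the precomputed constant merged = s / q."""
    return round(x * merged)


s = 2 / 13   # hypothetical scale factor
q = 4        # hypothetical quantization parameter
L = s / q    # computed once per coefficient position, not per sample

samples = [130, 137, -45, 7]
separate = [quantize_separate(x, s, q) for x in samples]
merged = [quantize_merged(x, L) for x in samples]
```

Since the quantizer already needs one multiplication per coefficient, folding s into that constant adds no extra hardware multiplier.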

However, schemes that replace one multiplication with several cascaded shifts and additions/subtractions have low computational precision. To meet the requirements of IEEE Std 1180-1990, the additions and subtractions required may outweigh the multiplications saved. In addition, cascaded shifts and additions/subtractions increase the number of computation steps and hence the computation delay, which is unfavorable for implementing a high-speed DCT circuit in hardware.

Summary of the invention

The technical problem to be solved by the present invention is to provide a high-precision multiplier-less hardware circuit that replaces multipliers with only a few shifters and adders and still achieves very high computational precision.

In order to solve the above technical problem, the technical solution adopted by the present invention is:

A high-precision multiplier-less digital cosine transform circuit, characterized in that it comprises a first unit circuit, a second unit circuit and registers, connected by circuitry;

wherein the first unit circuit consists of 4 first adder-subtractors and has 4 inputs and 4 outputs;

the second unit circuit consists of 16 shifters and 12 adder-subtractors: every two shifters are followed by one second adder-subtractor, every two second adder-subtractors are followed by one third adder-subtractor, every four shifters share one input, and each third adder-subtractor has one output;

the inputs and outputs of the first unit circuit and the second unit circuit are all connected to the registers.

Further, the present invention also provides a high-precision multiplier-less digital cosine transform and quantization method, in which h2/h6 ≈ C2/C6, h3/h5 ≈ C3/C5, h1/h7 ≈ C1/C7 and h4/h0 ≈ C4/C0, where Ck = cos(kπ/16) (k = 0, 1, ..., 7); each h value is a natural number whose binary representation needs as few digits as possible and contains as few "1" or "-1" digits as possible;

The computation process comprises the following steps:

Step 1: replace each multiplier with two shifters and one adder-subtractor to complete the digital cosine transform with scale factor s;

Step 2: multiply the transform output by L in the quantizer, where L is s divided by the quantization parameter, to obtain the quantized DCT result.

Further, the high-precision multiplier-less inverse digital cosine transform circuit of the present invention has the same structure as the digital cosine transform circuit described above; it comprises a first unit circuit, a second unit circuit and registers, connected by circuitry;

Further, the high-precision multiplier-less inverse digital cosine transform and quantization method of the present invention first selects a specific value s', where s' and s differ by a scale factor.

The computation process comprises the following steps:

Step 1: multiply the DCT result by L' to obtain the x values, where L' is the product of s' and the quantization parameter;

Step 2: replace each multiplier with two shifters and one adder-subtractor to complete the inverse digital cosine transform with scale factor s', obtaining the correct dequantized and inverse-transformed result;

The advantages of the present invention are:

1. By selecting specific numbers, the resource-hungry multiplier is not needed in the hardware circuit; the invention is a "multiplier-less DCT";

2. The structure is highly parallel and regular, and the hardware computing units can be reused, so the hardware circuit is very simple;

3. The DCT and IDCT can be realized with the same hardware circuit;

4. Very high computational precision can be achieved.

Description of drawings

Fig. 1 is a block diagram of an 8 × 8 two-dimensional DCT realized with one-dimensional 8-point DCTs.

Fig. 2 is a block diagram of the hardware computing circuit of the existing Chen scheme.

Fig. 3 is a block diagram of the hardware computing circuit of the existing Loeffler scheme.

Fig. 4 is a schematic diagram of the cross multiply-add unit in the existing Chen and Loeffler schemes.

Fig. 5 is a schematic diagram of the existing computing unit built from cascaded shifts and additions/subtractions.

Fig. 6 is a block diagram in which the multiplication is performed at the end of the one-dimensional DCT.

Fig. 7 is a block diagram in which the multiplication is performed at the end of the two-dimensional DCT.

Fig. 8 is a block diagram of the multiplier-less DCT combined with the quantizer.

Fig. 9 is a schematic diagram of the DCT (and IDCT) circuit of the present invention.

Fig. 10 is a schematic diagram of the decomposition of the computing unit circuit shown in Fig. 9.

Fig. 11 is a circuit structure diagram of the A unit shown in Fig. 10.

Fig. 12 is a circuit structure diagram of the B unit shown in Fig. 10.

Fig. 13 is a schematic diagram of the pipeline structure of the DCT shown in Fig. 9.

Fig. 14 is a schematic diagram of the pipeline structure of the IDCT shown in Fig. 9.

Fig. 15 is a schematic diagram of the DCT circuit of another specific embodiment of the present invention.

Embodiment

The present invention takes a different approach to realizing the cross multiply-add unit circuit of the Chen scheme and achieves a high-precision "multiplier-less DCT circuit". It is described in detail as follows:

As shown in Fig. 5, if the multiplication factors a, b of the core computing unit of the Chen and Loeffler schemes are divided by s, the output is also divided by s, so the output must be multiplied by s to obtain the correct result. The core idea of the present invention is to vary s and choose suitable values A, B such that A ≈ a/s and B ≈ b/s, where A and B are natural numbers whose binary representation needs as few digits as possible and contains as few "1" or "-1" digits as possible. For example, if A and B are restricted to natural numbers less than or equal to 24 that contain no more than two "1" or "-1" digits, then multiplication by A or B can be performed with two shifters and one adder-subtractor instead of a multiplier, where each shifter shifts a binary number left by 0 to 4 bits. The admissible values of A and B are 1, 2, 3, 4, 5, 6, 7 (7=8-1), 8, 9, 10, 12, 14 (14=16-2), 15 (15=16-1), 16, 17, 18, 20 and 24. The remaining numbers, 11, 13, 19, 21, 22 and 23, cannot be expressed as a binary number with two "1" or "-1" digits.
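The list of admissible values can be reproduced by enumerating every sum or difference of two left-shifted copies of 1. The sketch below assumes that a single power of two is also admissible (it can be formed, for example, as 8 + 8 = 16) and uses a hypothetical function name.

```python
def two_term_values(max_shift=4, limit=24):
    """Natural numbers <= limit of the form 2**i + 2**j or 2**i - 2**j, 0 <= i, j <= max_shift."""
    powers = [1 << k for k in range(max_shift + 1)]   # 1, 2, 4, 8, 16
    values = set(powers)                              # single powers of two
    for p in powers:
        for q in powers:
            for v in (p + q, p - q):
                if 1 <= v <= limit:
                    values.add(v)
    return sorted(values)


admissible = two_term_values()
excluded = [n for n in range(1, 25) if n not in admissible]
print(admissible)
print(excluded)
```

Each admissible value corresponds to one shifter pair plus one adder-subtractor, which is exactly the resource budget stated for a single multiplication.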

The present invention proposes the one-dimensional multiplier-less DCT circuit shown in Fig. 9. It can be derived that "multiply by a specific scale factor s" can be performed at the end of the two-dimensional DCT and merged with "multiply by the reciprocal of the quantization parameter" in the quantizer into a single multiplication, as shown in Fig. 8.

The calculation of the h and s values in Fig. 9 is described in detail below with reference to Fig. 2:

For the computing unit {v2, v3} -> {x2, x3} in Fig. 2, the following processing can be done:

Consider

c_{2} = \cos (2 \pi / 16) = 0.92387953

c_{6} = \cos (6 \pi / 16) = 0.38268343

\frac{c_{2}}{c_{6}} = 2.41421356 \approx \frac{12}{5}

We can select

h_{2} = 12

h_{6} = 5

In this way we obtain approximations with an error of about 0.5%:

\frac{h_{2}}{\sqrt{h_{2}^{2} + h_{6}^{2}}} = 0.92307692 = 0.99913126 c_{2} \approx c_{2}

\frac{h_{6}}{\sqrt{h_{2}^{2} + h_{6}^{2}}} = 0.38461538 = 1.00504843 c_{6} \approx c_{6}

For the computing unit {w4, w7} -> {x4, x7} in Fig. 2, the following processing can be done:

Consider

c_{1} = \cos (\pi / 16) = 0.98078528

c_{7} = \cos (7 \pi / 16) = 0.19509032

\frac{c_{1}}{c_{7}} = 5.0273394 \approx 5

We select

h_{1} = 5

h_{7} = 1

In this way we obtain approximations with an error of less than 0.6%:

\frac{h_{1}}{\sqrt{h_{1}^{2} + h_{7}^{2}}} = 0.98058067 = 0.99979139 c_{1} \approx c_{1}

\frac{h_{7}}{\sqrt{h_{1}^{2} + h_{7}^{2}}} = 0.19611613 = 1.00525814 c_{7} \approx c_{7}

For the computing unit {w5, w6} -> {x5, x6} in Fig. 2, the following processing can be done:

Consider

c_{3} = \cos (3 \pi / 16) = 0.83146961

c_{5} = \cos (5 \pi / 16) = 0.55557023

\frac{c_{3}}{c_{5}} = 1.49660576 \approx \frac{3}{2}

We select

h_{3} = 3

h_{5} = 2

In this way we obtain approximations with an error of less than 0.2%:

\frac{h_{3}}{\sqrt{h_{3}^{2} + h_{5}^{2}}} = 0.83205029 = 1.00069838 c_{3} \approx c_{3}

\frac{h_{5}}{\sqrt{h_{3}^{2} + h_{5}^{2}}} = 0.55470019 = 0.99843397 c_{5} \approx c_{5}

For the computing unit {u4, u5, u6, u7} -> {v4, v5, v6, v7} in Fig. 2, the following processing can be done:

c_{4} = \cos (4 \pi / 16) = 0.70710678 \approx \frac{12}{17}

We select

h_{0} = 17

h_{4} = 12

In this way we obtain an approximation with an error of less than 0.2%:

\frac{h_{4}}{h_{0}} = 0.70588235 = 0.99826840 c_{4} \approx c_{4}
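The quality of the selected h values can be checked numerically. The following sketch recomputes, for each pair, the normalized integers h/sqrt(h_a² + h_b²) relative to the cosines they approximate; the helper names (`c`, `pair_ratios`) are introduced here for illustration only.

```python
import math


def c(k):
    """c_k = cos(k * pi / 16)."""
    return math.cos(k * math.pi / 16)


def pair_ratios(h_a, h_b, k_a, k_b):
    """Ratios (h_a / norm) / c_{k_a} and (h_b / norm) / c_{k_b}, norm = sqrt(h_a^2 + h_b^2)."""
    norm = math.hypot(h_a, h_b)
    return (h_a / norm) / c(k_a), (h_b / norm) / c(k_b)


ratios = {
    "h2,h6": pair_ratios(12, 5, 2, 6),
    "h1,h7": pair_ratios(5, 1, 1, 7),
    "h3,h5": pair_ratios(3, 2, 3, 5),
}
ratio_c4 = (12 / 17) / c(4)   # h4 / h0 against c4

for name, (ra, rb) in ratios.items():
    print(f"{name}: {ra:.8f} {rb:.8f}")
print(f"h4/h0: {ratio_c4:.8f}")
```

A ratio of exactly 1 would mean a perfect match; the deviation from 1 is the relative error absorbed by the choice of h values.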

The values s0, s1, s2 and s3 are obtained from the following formulas:

s_{0} = 4 c_{4}

s_{2} = \frac{2}{\sqrt{h_{2}^{2} + h_{6}^{2}}}

s_{1} = \frac{1}{2} \cdot \frac{(\frac{c_{0}}{h_{0}} + \frac{c_{4}}{h_{4}})}{\sqrt{h_{1}^{2} + h_{7}^{2}}}

s_{3} = \frac{1}{2} \cdot \frac{(\frac{c_{0}}{h_{0}} + \frac{c_{4}}{h_{4}})}{\sqrt{h_{3}^{2} + h_{5}^{2}}}

Exploiting the symmetry and regularity of the circuit structure, the computing unit shown in Fig. 9 can be decomposed into the 7 parts shown in Fig. 10 (the input f, the output F and the intermediate variables u, v, w, x are held in registers, generally in two's-complement representation). Parts A1, A2, A3 and A4 are identical; each consists of 4 first adder-subtractors 1 with 4 inputs and 4 outputs, so they can share one first unit circuit, as shown in Fig. 11 (in hardware circuit design, two's-complement adders and subtractors are generally not distinguished and are collectively called two's-complement adder-subtractors). Parts B1, B2, B3 and B4 can likewise share one second unit circuit, which consists of 16 shifters and 12 adder-subtractors: every two shifters 2 are followed by one second adder-subtractor 3, every two second adder-subtractors 3 are followed by one third adder-subtractor 4, every four shifters 2 share one input, and each third adder-subtractor has one output, as shown in Fig. 12. The registers are connected to the inputs and outputs of the first unit circuit and the second unit circuit.

In the first unit circuit, output signals out0 and out1 are the sum or difference of input signals in0 and in1; output signals out2 and out3 are the sum or difference of input signals in2 and in3. In the second unit circuit, input signal in0 is first shifted to obtain signals in0_0 and in0_1, which are then added (or subtracted) to obtain out0_0, so that out0_0 is the product of in0 and h; input signal in1 is likewise shifted to obtain in1_0 and in1_1, which are added (or subtracted) to obtain out0_1, so that out0_1 is the product of in1 and h; finally, the sum or difference of out0_0 and out0_1 gives output signal out0.
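A behavioural sketch of the two unit circuits on integer data is given below. Several details are assumptions of this illustration and are not stated in the text: which output of the first unit is the sum and which the difference, the way a value h is split into two shifts and a sign, and all function names.

```python
def shift_add_decompose(h, max_shift=4):
    """Find (i, j, sign) with h == 2**i + sign * 2**j, using shifts of 0..max_shift bits."""
    for i in range(max_shift + 1):
        for j in range(max_shift + 1):
            for sign in (1, -1):
                if (1 << i) + sign * (1 << j) == h:
                    return i, j, sign
    raise ValueError(f"{h} needs more than two shifted terms")


def times_h(x, h):
    """Multiply x by h with two shifters and one adder-subtractor (no multiplier)."""
    i, j, sign = shift_add_decompose(h)
    first, second = x << i, x << j        # the two shifter outputs
    return first + second if sign > 0 else first - second


def unit_a(in0, in1, in2, in3):
    """First unit circuit: four adder-subtractors (sum/difference order assumed)."""
    return in0 + in1, in0 - in1, in2 + in3, in2 - in3


def unit_b_output(in0, in1, h_a, h_b, subtract=False):
    """One output of the second unit circuit: in0*h_a combined with in1*h_b."""
    out0_0 = times_h(in0, h_a)            # two shifters + second adder-subtractor
    out0_1 = times_h(in1, h_b)            # two shifters + second adder-subtractor
    return out0_0 - out0_1 if subtract else out0_0 + out0_1   # third adder-subtractor
```

One output therefore costs four shifters and three adder-subtractors, which is consistent with the stated total of 16 shifters and 12 adder-subtractors for four outputs.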

The DCT circuit of the present invention can adopt the pipelined scheme shown in Fig. 13 and achieve a processing speed of one one-dimensional DCT per 4 clock cycles. The horizontal axis in Fig. 13 represents time, and each A or B part takes 1 clock cycle. At any moment at most one A part and one B part are being computed, so only one first unit circuit (A unit) and one second unit circuit (B unit) are needed to implement the invention while still completing one one-dimensional DCT every 4 clock cycles.
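The timing claim can be checked with a small model of the Fig. 13 schedule. The cycle-by-cycle assignment below is transcribed from the working steps listed for that figure; the data layout, the function names and the exclusion of the final multiplication by L from the cycle count are assumptions of this sketch.

```python
from collections import Counter

# step -> list of (unit, inputs, outputs) for one one-dimensional DCT
STEPS = {
    1: [("A", "f1 f6 f2 f5", "u1 u6 u2 u5")],
    2: [("A", "f0 f7 f3 f4", "u0 u7 u3 u4")],
    3: [("A", "u1 u2 u3 u0", "v1 v2 v3 v0"), ("B", "u7 u4 u5 u6", "v7 v4 v5 v6")],
    4: [("A", "v4 v5 v6 v7", "w4 w5 w6 w7"), ("B", "v1 v0 v2 v3", "x1 x0 x2 x3")],
    5: [("B", "w4 w7 w5 w6", "x4 x7 x5 x6")],
}
CYCLES_PER_TRANSFORM = 4   # a new transform starts every 4 cycles


def build_schedule(n_transforms):
    """Map each clock cycle to the operations issued in it."""
    schedule = {}
    for k in range(n_transforms):
        for step, ops in STEPS.items():
            cycle = CYCLES_PER_TRANSFORM * k + step
            for unit, ins, outs in ops:
                schedule.setdefault(cycle, []).append((k, unit, ins.split(), outs.split()))
    return schedule


def check_schedule(schedule):
    """Verify unit usage and data dependencies; return the last cycle used."""
    written = {}   # (transform, signal) -> cycle in which it was stored
    for cycle in sorted(schedule):
        usage = Counter(unit for _, unit, _, _ in schedule[cycle])
        assert all(n <= 1 for n in usage.values()), "a unit is used twice in one cycle"
        for k, _, ins, _ in schedule[cycle]:
            for name in ins:   # every non-input operand must already be in a register
                assert name.startswith("f") or written[(k, name)] < cycle
        for k, _, _, outs in schedule[cycle]:
            for name in outs:
                written[(k, name)] = cycle
    return max(schedule)


last_cycle = check_schedule(build_schedule(16))
print(last_cycle)
```

In this model step 5 of one transform overlaps with step 1 of the next, which is why one A unit and one B unit suffice for a throughput of one transform per 4 cycles.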

As shown in Fig. 13, the circuit of the present invention performs the DCT and quantization according to the following steps (let L denote s divided by the quantization parameter):

Step 1: take {f1, f6, f2, f5} from the registers as the input of the first unit circuit; the output of the first unit circuit, {u1, u6, u2, u5}, is stored in the registers;

Step 2: take {f0, f7, f3, f4} from the registers as the input of the first unit circuit; the output of the first unit circuit, {u0, u7, u3, u4}, is stored in the registers;

Step 3: take {u1, u2, u3, u0} from the registers as the input of the first unit circuit; the output of the first unit circuit, {v1, v2, v3, v0}, is stored in the registers. At the same time, take {u7, u4, u5, u6} from the registers as the input of the second unit circuit; the output of the second unit circuit, {v7, v4, v5, v6}, is stored in the registers;

Step 4: take {v4, v5, v6, v7} from the registers as the input of the first unit circuit; the output of the first unit circuit, {w4, w5, w6, w7}, is stored in the registers. At the same time, take {v1, v0, v2, v3} from the registers as the input of the second unit circuit; the output of the second unit circuit, {x1, x0, x2, x3}, is stored in the registers;

Step 5: take {w4, w7, w5, w6} from the registers as the input of the second unit circuit; the output of the second unit circuit, {x4, x7, x5, x6}, is stored in the registers. At the same time, take {f1, f6, f2, f5} of the next one-dimensional DCT from the registers as the input of the first unit circuit; the output of the first unit circuit, {u1, u6, u2, u5} of the next one-dimensional DCT, is stored in the registers;

Step 6: repeat steps 2 to 5 until 16 one-dimensional DCTs are completed;

Step 7: multiply the obtained x values by L to obtain the result of the 8 × 8 two-dimensional DCT.

IDCT circuit:

The Chen scheme is structurally symmetric: reversing the data flow direction in Fig. 2 yields the IDCT circuit of the Chen scheme. This is because the inverse operation of every cross multiply-add computing unit in Fig. 2 is that computing unit itself.

The present invention inherits this property of the Chen scheme: reversing the data flow direction in Fig. 9 yields the IDCT circuit of the present invention. The difference is that the inverse operation of the B computing unit differs from the B computing unit by a scale factor. This scale factor can be compensated in s; the values of s' for the IDCT are as follows.

s_{0}^{'} = \frac{1}{s_{0}}

s_{2}^{'} = \frac{1}{s_{2}} \cdot (h_{2}^{2} + h_{6}^{2})

s_{1}^{'} = \frac{1}{s_{1}} \cdot \frac{(h_{1}^{2} + h_{7}^{2})}{h_{0} h_{4}}

s_{3}^{'} = \frac{1}{s_{3}} \cdot \frac{(h_{3}^{2} + h_{5}^{2})}{h_{0} h_{4}}

Therefore the DCT and IDCT can be realized with the same circuit in the present invention. As shown in Fig. 14, the circuit of the present invention performs dequantization and the IDCT according to the following steps (let L' denote the product of s' and the quantization parameter):

Step 1: multiply the result of the 8 × 8 two-dimensional DCT by L' to obtain the x values;

Step 2: take {x4, x7, x5, x6} from the registers as the input of the second unit circuit; the output of the second unit circuit, {w4, w7, w5, w6}, is stored in the registers;

Step 3: take {w4, w5, w6, w7} from the registers as the input of the first unit circuit; the output of the first unit circuit, {v4, v5, v6, v7}, is stored in the registers. At the same time, take {x1, x0, x2, x3} from the registers as the input of the second unit circuit; the output of the second unit circuit, {v1, v0, v2, v3}, is stored in the registers;

Step 4: take {v1, v2, v3, v0} from the registers as the input of the first unit circuit; the output of the first unit circuit, {u1, u2, u3, u0}, is stored in the registers. At the same time, take {v7, v4, v5, v6} from the registers as the input of the second unit circuit; the output of the second unit circuit, {u7, u4, u5, u6}, is stored in the registers;

Step 5: take {u0, u7, u3, u4} from the registers as the input of the first unit circuit; the output of the first unit circuit, {f0, f7, f3, f4}, is stored in the registers;

Step 6: take {u1, u6, u2, u5} from the registers as the input of the first unit circuit; the output of the first unit circuit, {f1, f6, f2, f5}, is stored in the registers. At the same time, take {x4, x7, x5, x6} of the next one-dimensional transform from the registers as the input of the second unit circuit; the output of the second unit circuit, {w4, w7, w5, w6} of the next one-dimensional transform, is stored in the registers;

Step 7: repeat steps 3 to 6 until 16 one-dimensional IDCTs are completed.

As shown in Fig. 15, in another specific embodiment of the present invention (based on the Loeffler scheme), the following values can similarly be selected:

h_{2} = 12

h_{6} = 5

h_{1} = 5

h_{7} = 1

h_{3} = 3

h_{5} = 2

For t0 and t1, the following processing can be done:

Consider

r_{0} = \frac{1}{\sqrt{h_{3}^{2} + h_{5}^{2}}} = 0.27735009

r_{1} = \frac{1}{\sqrt{h_{1}^{2} + h_{7}^{2}}} = 0.19611613

\frac{r_{0}}{r_{1}} = \frac{\sqrt{26}}{\sqrt{13}} = \sqrt{2} \approx \frac{17}{12}

We select

t_{0} = 17

t_{1} = 12

In this way we obtain approximations with an error of less than 0.1%:

\frac{t_{0}}{\frac{1}{2} (\frac{t_{0}}{r_{0}} + \frac{t_{1}}{r_{1}})} = 0.27759043 = 1.000866551 r_{0} \approx r_{0}

\frac{t_{1}}{\frac{1}{2} (\frac{t_{0}}{r_{0}} + \frac{t_{1}}{r_{1}})} = 0.19594619 = 0.999133448 r_{1} \approx r_{1}
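The two approximations for r0 and r1 can be recomputed directly from the selected integers. The variable names in this sketch are chosen for illustration; the values of h and t are those selected above.

```python
import math

h1, h7, h3, h5 = 5, 1, 3, 2
t0, t1 = 17, 12

r0 = 1 / math.hypot(h3, h5)   # 1 / sqrt(13)
r1 = 1 / math.hypot(h1, h7)   # 1 / sqrt(26)

# Common denominator of both approximations: the mean of t0/r0 and t1/r1.
mean = 0.5 * (t0 / r0 + t1 / r1)
approx_r0 = t0 / mean
approx_r1 = t1 / mean

print(f"{approx_r0:.8f} = {approx_r0 / r0:.9f} * r0")
print(f"{approx_r1:.8f} = {approx_r1 / r1:.9f} * r1")
```

The two relative errors are equal in magnitude and opposite in sign because the denominator is the mean of the two scaled terms.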

In summary, for the DCT:

s_{0} = c_{4}

s_{2} = \frac{1}{\sqrt{h_{2}^{2} + h_{6}^{2}}}

s_{1} = \frac{c_{4}}{\frac{1}{2} (\frac{t_{0}}{r_{0}} + \frac{t_{1}}{r_{1}})}

s_{3} = \frac{1}{\frac{1}{2} (\frac{t_{0}}{r_{0}} + \frac{t_{1}}{r_{1}})}

For the IDCT:

s_{0}^{'} = \frac{1}{s_{0}}

s_{2}^{'} = \frac{1}{s_{2}} \cdot (h_{2}^{2} + h_{6}^{2})

s_{1}^{'} = \frac{1}{s_{1}} \cdot \frac{(h_{1}^{2} + h_{7}^{2})}{t_{0} t_{1}}

s_{3}^{'} = \frac{1}{s_{3}} \cdot \frac{(h_{3}^{2} + h_{5}^{2})}{t_{0} t_{1}}

Similarly, the DCT circuit and IDCT circuit in this embodiment can also be realized with one A unit circuit (Fig. 11) and one B unit circuit (Fig. 12). However, they can only reach a processing speed of one one-dimensional DCT per 5 clock cycles.

The computational error in each of the above examples is within 1%, which satisfies the requirements of most image compression standards. To reach higher computational precision, the h values can be sought in a larger range to approximate C7/C1, C6/C2, C5/C3 and C4. If h is a natural number less than or equal to 224 that contains no more than three "1" or "-1" digits, the requirements of IEEE Std 1180-1990 can be met.
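How such a search over a larger range might look is sketched below for the ratio c2/c6. The construction of the candidate set (sums of at most a given number of signed powers of two), the upper shift bound and the function names are assumptions of this illustration; the sketch only compares ratio errors and makes no statement about IEEE Std 1180-1990 compliance.

```python
import math
from itertools import product


def signed_digit_values(max_terms, limit, max_shift=8):
    """Natural numbers <= limit that are a sum of at most max_terms signed powers of two."""
    powers = [1 << k for k in range(max_shift + 1)]
    terms = [0] + powers + [-p for p in powers]    # 0 means "term not used"
    values = set()
    for combo in product(terms, repeat=max_terms):
        total = sum(combo)
        if 1 <= total <= limit:
            values.add(total)
    return sorted(values)


def best_ratio(target, values):
    """Pair (a, b) from values whose ratio a/b is closest to target, with its relative error."""
    best = None
    for a in values:
        for b in values:
            err = abs(a / b - target) / target
            if best is None or err < best[2]:
                best = (a, b, err)
    return best


target = math.cos(2 * math.pi / 16) / math.cos(6 * math.pi / 16)   # c2 / c6
best_two = best_ratio(target, signed_digit_values(2, 24))      # two digits, h <= 24
best_three = best_ratio(target, signed_digit_values(3, 224))   # three digits, h <= 224
print(best_two)
print(best_three)
```

With two digits and h up to 24 the search returns the ratio 12/5 used above; widening the candidate set can only keep or reduce the ratio error, at the cost of one more shifter and adder-subtractor per multiplication.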

The protection scope of the present invention is not limited to the above specific embodiments; all variations based on techniques known to those skilled in the art fall within the protection scope of the present invention. For example, the circuit schematics of Fig. 9 and Fig. 15 may be decomposed in different ways and given different pipeline structures, and different specific h values and combinations thereof may be adopted in the circuit schematics of Fig. 9 and Fig. 15.

Claims

1. A high-precision multiplier-less digital cosine transform circuit, characterized in that it comprises a first unit circuit, a second unit circuit and registers, connected by circuitry;

2. A high-precision multiplier-less digital cosine transform and quantization method, characterized in that a specific s value is first selected such that A ≈ a/s and B ≈ b/s, where a, b are multiplication factors and A, B are natural numbers whose binary representation needs as few digits as possible and contains as few "1" or "-1" digits as possible, the method comprising the following steps:

Step 1: replace each multiplier with shifters and an adder-subtractor to complete the digital cosine transform with scale factor s;

Step 2: multiply the transform output by L in the quantizer, where L is s divided by the quantization parameter, to obtain the quantized DCT result;

wherein the s value is selected as follows: h2/h6 ≈ C2/C6, h3/h5 ≈ C3/C5, h1/h7 ≈ C1/C7 and h4/h0 ≈ C4/C0, where Ck = cos(kπ/16) (k = 0, 1, ..., 7); each h value is a natural number whose binary representation needs as few digits as possible and contains as few "1" or "-1" digits as possible.

3. The high-precision multiplier-less digital cosine transform and quantization method according to claim 2, characterized in that the admissible values of A, B or h are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 20 and 24, so that one multiplier can be replaced with two shifters and one adder-subtractor to complete the digital cosine transform.

4. The high-precision multiplier-less digital cosine transform and quantization method according to claim 3, characterized in that A, B or h is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 20 or 24 divided or multiplied by a power of 2.

5. The high-precision multiplier-less digital cosine transform and quantization method according to claim 2, characterized in that the digital cosine transform method with scale factor s in step 1 comprises the following steps:

Step 6: repeat steps 2 to 5 until 16 one-dimensional DCTs are completed.

6. The high-precision multiplier-less digital cosine transform and quantization method according to claim 5, characterized in that the input/output relation of the first unit circuit is: output signals out0 and out1 are the sum or difference of input signals in0 and in1; output signals out2 and out3 are the sum or difference of input signals in2 and in3.

7. The high-precision multiplier-less digital cosine transform and quantization method according to claim 5, characterized in that the input/output relation of the second unit circuit is: input signal in0 is shifted to obtain signals in0_0 and in0_1, which are then added or subtracted to obtain output signal out0_0, so that out0_0 is the product of in0 and h; input signal in1 is shifted to obtain signals in1_0 and in1_1, which are then added or subtracted to obtain output signal out0_1, so that out0_1 is the product of in1 and h; finally, the sum or difference of out0_0 and out0_1 gives output signal out0.

8. A high-precision multiplier-less inverse digital cosine transform circuit, characterized in that it comprises a first unit circuit, a second unit circuit and registers, connected by circuitry;

9. A high-precision multiplier-less inverse digital cosine transform and quantization method, characterized in that a specific value s' is first selected, where s' and s differ by a scale factor, the computation process comprising the following steps:

wherein the inverse digital cosine transform method with scale factor s in said step 2 comprises the following steps:

Step 1: take {x4, x7, x5, x6} from the registers as the input of the second unit circuit; the output of the second unit circuit, {w4, w7, w5, w6}, is stored in the registers;

Step 2: take {w4, w5, w6, w7} from the registers as the input of the first unit circuit; the output of the first unit circuit, {v4, v5, v6, v7}, is stored in the registers. At the same time, take {x1, x0, x2, x3} from the registers as the input of the second unit circuit; the output of the second unit circuit, {v1, v0, v2, v3}, is stored in the registers;

Step 3: take {v1, v2, v3, v0} from the registers as the input of the first unit circuit; the output of the first unit circuit, {u1, u2, u3, u0}, is stored in the registers. At the same time, take {v7, v4, v5, v6} from the registers as the input of the second unit circuit; the output of the second unit circuit, {u7, u4, u5, u6}, is stored in the registers;

Step 4: take {u0, u7, u3, u4} from the registers as the input of the first unit circuit; the output of the first unit circuit, {f0, f7, f3, f4}, is stored in the registers;

Step 5: take {u1, u6, u2, u5} from the registers as the input of the first unit circuit; the output of the first unit circuit, {f1, f6, f2, f5}, is stored in the registers. At the same time, take {x4, x7, x5, x6} of the next one-dimensional transform from the registers as the input of the second unit circuit; the output of the second unit circuit, {w4, w7, w5, w6} of the next one-dimensional transform, is stored in the registers;

Step 6: repeat steps 2 to 5 until 16 one-dimensional IDCTs are completed.