CN103067023A

CN103067023A - Efficient lifting-based discrete wavelet transform (DWT) encoding method and encoder

Info

Publication number: CN103067023A
Application number: CN2012105070637A
Authority: CN
Inventors: 张为; 姜喆; 刘艳艳; 高志宇
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2012-11-29
Filing date: 2012-11-29
Publication date: 2013-04-24

Abstract

The invention relates to the field of image encoding and decoding, and provides an encoder that has high encoding efficiency and can satisfy the needs of high-speed encoding applications. The technical scheme is an efficient lifting-based discrete wavelet transform (DWT) encoder comprising a column encoding unit for the column filtering part of the 2-D DWT, a row encoding unit for the row filtering part of the 2-D DWT, and a transposing buffer unit that buffers data between the column filter and the row filter and adjusts the data flow structure. Data first enter the column encoding unit for the column transform and then enter the row encoding unit for the row transform. The output data of the row filter form the four subband signals LL, LH, HL and HH after gain calculation, and once the four subband signals are rearranged, the decomposition process of the 2-D DWT is complete. The encoder is mainly used for image encoding and decoding.

Description

Efficient lifting-based DWT encoding method and encoder

Technical field

The present invention relates to the field of image encoding and decoding, and specifically to an efficient lifting-based DWT encoding method and encoder.

Background technology

The discrete wavelet transform (DWT) is an effective multiresolution analysis tool with good time-frequency localization; it decomposes a signal into subbands with different time-domain characteristics. Its coding efficiency and image reconstruction quality are both higher than those of the traditional discrete cosine transform (DCT), so it is widely used in signal processing and image compression, for example in MPEG-4 and JPEG2000.

The traditional wavelet transform is implemented by convolution, which is computationally complex and has high memory requirements, making it unfavorable for VLSI implementation; see K. K. Parhi and T. Nishitani, "VLSI architecture for discrete wavelet transforms," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1. To address this problem, Daubechies et al. proposed the key technique of the second-generation wavelet transform, the lifting algorithm; see I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Appl., vol. 4, no. 3, pp. 245-267, Mar. 1998. This algorithm effectively halves the amount of computation relative to convolution, improves the speed and practicality of the wavelet transform, and makes efficient hardware implementation possible.

Many hardware implementations of lifting-based DWT encoders now exist, and each architecture has advanced the technology to some extent, but factors that limit hardware efficiency remain. Several typical hardware implementations and their problems are as follows.

Implementation 1: The lifting algorithm is implemented directly: as the algorithm requires, the row transform is first completed row by row, and the column transform is then carried out column by column. See J. M. Jou, Y. H. Shiau, and C. C. Liu, "Efficient VLSI architectures for the biorthogonal wavelet transform by filter bank and lifting scheme," in Proc. IEEE ISCAS, May 2001.

Problem 1: Because this architecture must complete all row transforms before the column transform can start, the entire row-transform result has to be buffered. For large data volumes such as high-definition images, this intermediate result is very large and requires additional external SRAM. The column transform must also wait for the row transform to finish, which lengthens the total computation time. Moreover, because the architecture of each transform stage is not optimized, the critical path and its delay are long, which further limits the encoder's processing speed.

Implementation 2: The line-based DWT architecture executes the row transform and the column transform in parallel, which not only raises processing speed but also allows a transposing buffer between the row and column transforms to replace the off-chip SRAM. See C. Chrysafis and A. Ortega, "Line-based, reduced memory, wavelet image compression," IEEE Trans. Signal Process., vol. 9, no. 3, pp. 378-389, Mar. 2000.

Problem 2: Although the row and column transforms are processed in parallel, the data are still processed serially through a single input, and the transposing buffer between the row and column transforms is still large. The critical-path delay inside each filter also remains long.

Implementation 3: Xiong et al. proposed two implementations, FA and HA, which shorten the output latency and computation time through parallel row and column filter circuits and need no transposing buffer between the row and column transforms in the architecture. See C. Xiong, J. Tian, and J. Liu, "Efficient Architectures for Two-Dimensional Discrete Wavelet Transform Using Lifting Scheme," IEEE Trans. Image Process., vol. 16, no. 3, pp. 607-614, Mar. 2007.

Problem 3: This architecture multiplexes the predict and update circuits and brings the critical path to Tm+2Ta, where Tm and Ta are the critical-path delays of one multiplier and one adder respectively. However, it still does not break the accumulated delay of the multiplier and adders, so the critical path remains long. In addition, although the parallel filter design reduces the transposing buffer to 4 registers and halves the computation time, the input data must be buffered, so the total internal buffer is still 5.5N when processing N*N image data, and the hardware complexity remains high.

Implementation 4: The flipping structure, first proposed by Huang, has been an important DWT architecture in recent years. It applies an equivalent transformation to the lifting formulas, abandons the earlier practice of placing multipliers on the path between the input node and the computation node, and effectively reduces the critical-path delay by changing the multiplier positions. The architecture can also be pipelined by adding pipeline registers: a 5-stage pipeline brings the critical-path delay down to 1 Tm. See C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1080-1089, Apr. 2004.

Problem 4: As an improved line-based DWT architecture, the flipping structure likewise incurs a very large intermediate buffer when it is pipelined. Huang therefore holds that the number of pipeline registers in the 1-D DWT circuit should be limited, because it determines the size of the intermediate buffer in the 2-D DWT architecture; this, however, undoubtedly also limits how far the critical path can be shortened. Here 1-D and 2-D denote the dimension of the wavelet transform: a single row filtering or column filtering alone is a 1-D wavelet transform, and row filtering followed by column filtering is a 2-D wavelet transform.

Summary of the invention

The present invention aims to overcome the deficiencies of the prior art and to provide an encoder that has higher encoding efficiency and can satisfy the needs of high-speed encoding applications. To this end, the technical scheme adopted by the invention is an efficient lifting-based DWT encoding method with the following steps:

Formulas (1) to (6) are the lifting step formulas of the 9/7 DWT. They consist of two parts, 4 lifting steps and 2 scaling steps: formulas (1) to (4) are the 4 lifting steps, corresponding in order to lifting 1, prediction 1, lifting 2 and prediction 2; formulas (5) and (6) are the 2 scaling steps, high-frequency scaling and low-frequency scaling respectively;

\frac{1}{\alpha} \times y(2n+1) = \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \qquad (1)

\frac{1}{\beta} \times y(2n) = \frac{1}{\beta} \times x(2n) + y(2n-1) + y(2n+1) \qquad (2)

\frac{1}{\gamma} \times H(2n+1) = \frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2) \qquad (3)

\frac{1}{\delta} \times L(2n) = \frac{1}{\delta} \times y(2n) + H(2n-1) + H(2n+1) \qquad (4)

H^{\circ}(2n+1) = e \times H(2n+1) \qquad (5)

L^{\circ}(2n) = f \times L(2n) \qquad (6)

Here n is an integer greater than or equal to 0, x(n) is the input original image value, y(n) is the first-stage result of the algorithm, and H°(2n+1) and L°(2n) are the high-frequency and low-frequency components finally produced by the decomposition. The coefficient values are: α = -1.586134342, β = -0.052980118, γ = 0.882911075, δ = 0.443506852, K = 1.230174105.
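As an illustration of formulas (1) to (4), the following minimal Python sketch applies the four lifting steps to a 1-D sequence. The function names `lift_97` and `sym`, the even-length input and the mirrored (symmetric) boundary extension are assumptions made for the sketch; the text above does not specify boundary handling. The scaling steps (5) and (6) are left out because the values of e and f are not given here.

```python
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852


def sym(i, n):
    """Mirror an out-of-range index back into [0, n-1] (assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def lift_97(x):
    """Apply lifting steps (1)-(4) to an even-length sequence x.

    Returns (H, L): the unscaled high- and low-frequency samples.
    """
    n = len(x)
    y = list(x)
    # Formula (1): odd samples of y from the input x.
    for i in range(1, n, 2):
        y[i] = x[i] + ALPHA * (x[sym(i - 1, n)] + x[sym(i + 1, n)])
    # Formula (2): even samples of y from x and the new odd samples.
    for i in range(0, n, 2):
        y[i] = x[i] + BETA * (y[sym(i - 1, n)] + y[sym(i + 1, n)])
    out = list(y)
    # Formula (3): H at odd positions.
    for i in range(1, n, 2):
        out[i] = y[i] + GAMMA * (y[sym(i - 1, n)] + y[sym(i + 1, n)])
    # Formula (4): L at even positions, using the H values just computed.
    for i in range(0, n, 2):
        out[i] = y[i] + DELTA * (out[sym(i - 1, n)] + out[sym(i + 1, n)])
    return out[1::2], out[0::2]
```

For a constant input such as `lift_97([10.0] * 8)`, the high-frequency samples come out close to zero, which is a quick sanity check on the coefficient values listed above.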

Substituting formula (1) into formula (2) and regrouping the terms by equivalent transformation gives:

\frac{1}{\alpha\beta} \times y(2n) = \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times y(2n-1) + \frac{1}{\alpha} \times y(2n+1)

= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) + x(2n)

= (\frac{1}{\alpha\beta} + 1) \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)

= [(\frac{1}{\alpha\beta} + 1) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)] + [\frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2)] \qquad (7)

Similarly, we obtain:

\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = \frac{1}{\alpha\beta} \{ [(\frac{1}{\delta\gamma} + 1) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2)] + [\frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2)] \} \qquad (8)

Four intermediate variables D_{1}^{k}(n), D_{2}^{k}(n), D_{3}^{k}(n), D_{4}^{k}(n) are defined as shown in formulas (9) to (12), where k is the index of the row currently being computed for the row transform and the current scan number for the column transform; in the column transform, one parallel scan of two adjacent rows counts as one scan:

D_{1}^{k}(n) = \frac{1}{\alpha} \times x(2n+1) + x(2n) \qquad (9)

D_{2}^{k}(n) = (\frac{1}{\alpha\beta} + 1) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \qquad (10)

D_{3}^{k}(n) = \frac{1}{\gamma} \times y(2n+1) + y(2n) \qquad (11)

D_{4}^{k}(n) = (\frac{1}{\delta\gamma} + 1) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2) \qquad (12)

Substituting the variables D_{1}^{k}(n), D_{2}^{k}(n), D_{3}^{k}(n), D_{4}^{k}(n) into formulas (1), (7), (3) and (8) gives the improved algorithm:

\frac{1}{\alpha} \times y(2n+1) = D_{1}^{k}(n) + x(2n+2) \qquad (13)

\frac{1}{\alpha\beta} \times y(2n) = D_{2}^{k}(n) + D_{1}^{k}(n) + x(2n+2) \qquad (14)

\frac{1}{\gamma} \times H(2n+1) = D_{3}^{k}(n) + y(2n+2) \qquad (15)

\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = D_{4}^{k}(n) + D_{3}^{k}(n) + y(2n+2) \qquad (16)

The efficient lifting-based DWT encoder has the following structure:

a column coding unit, for the column filtering part of the 2-D DWT;

a row coding unit, for the row filtering part of the 2-D DWT;

a transposing buffer unit, for buffering data between the column and row filters and adjusting the data flow structure;

Data enter the column coding unit for the column transform. The coding unit has two inputs and two outputs and reads in and processes two data per clock cycle. After column coding has completed a certain number of rows, the results are output to the transposing buffer unit. Once transposed, the data flow meets the requirements of row coding, and the data enter the row coding unit to start the row transform. The output data of the row filter form the four subband signals LL, LH, HL and HH after gain calculation, and rearranging the four subband components completes one level of 2-D DWT decomposition.

The column coding unit consists of registers, multipliers, adders and four dual-port RAMs of size N, namely sram1, sram2, sram3 and sram4, which store the intermediate variables produced during computation; N is the dimension of the N*N image data. The whole column coding unit is a 5-stage pipeline, each stage consisting of a register and the adder or multiplier that performs the corresponding operation. Once the column coding unit starts working, data enter the column filter along the row direction, and two data are read in per clock cycle: the odd term x(2n+1) and the even term x(2n). Stage 1 multiplies the data read in; in the next cycle the results take part in computing the intermediate variables D_{1}^{k}(n) and D_{2}^{k}(n) in stage 2, and the newly obtained D_{1}^{k}(n) and D_{2}^{k}(n) are written to sram1 and sram2. At the same time, the required intermediate variables D_{1}^{k-1}(n) and D_{2}^{k-1}(n) are read from sram1 and sram2 to compute y(2n-1)/α and y(2n-2)/αβ. The intermediate variables D_{1}^{k-1}(n) and D_{2}^{k-1}(n) stored for the previous two rows are held for N cycles and then read out for the column transform of the two newly read rows, while the newly produced D_{1}^{k}(n) and D_{2}^{k}(n) are written to sram1 and sram2, overwriting D_{1}^{k-1}(n) and D_{2}^{k-1}(n) at the same addresses. With the two registers of pipeline stage 3 as the boundary, the first half of the circuit computes formulas (13) and (14) and the second half computes formulas (15) and (16). The second half changes the coefficients of the multipliers in stage 3, which receive y(2n-1)/α and y(2n-2)/αβ from stage 2 and perform the corresponding multiplications. Stage 4 computes formulas (15) and (16) and at the same time updates the corresponding intermediate variables D_{3}^{k-1}(n) and D_{4}^{k-1}(n), which are stored in sram3 and sram4 respectively; stage 5 then outputs the final high- and low-frequency results.

The row coding unit consists of row registers, row multipliers, row adders and four further register buffers buffer1, buffer2, buffer3 and buffer4. The whole row coding unit is a 5-stage pipeline, each stage consisting of a row register and the row adder or row multiplier that performs the corresponding operation. The input data are multiplied in stage 1; stage 2 computes the intermediate variables D_{1}^{k}(n+1) and D_{2}^{k}(n+1) and writes them to buffer1 and buffer2, while the intermediate variables D_{1}^{k}(n) and D_{2}^{k}(n) at the other address of buffer1 and buffer2 are read to compute y(2n+1)/α and y(2n)/αβ. The buffers allow the intermediate variables of the high-frequency row and the low-frequency row to be read alternately, so that the computations for the two rows complete independently. Stage 3 multiplies the outputs y(2n+1)/α and y(2n)/αβ of the first half of the circuit and passes them to stage 4, which computes formulas (15) and (16) and updates the intermediate variables D_{3}^{k}(n+1) and D_{4}^{k}(n+1). The final high- and low-frequency components H and L are output in stage 5.

Technical features and effects of the invention:

Using dual inputs and dual outputs together with the proposed improved algorithm, the 2-D DWT encoding device of the invention realizes an architecture in which a 3-stage pipeline processes one level of lifting, which effectively reduces the number of internal registers and achieves a critical-path delay of one Tm. The design performs the column transform first, so the column-transform result need not be buffered: the data are passed directly to the row coding unit through the transposing buffer unit, which replaces the transposing buffer of size 1.5N generally required when processing N*N image data. In the 2-D architecture only the column filter unit uses internal buffers, 2 of size 2N, so the hardware storage cost is extremely low.

Description of drawings

Fig. 1 is a schematic diagram of the improved lifting-based 9/7 DWT algorithm of the invention;

Fig. 2 is the overall block diagram of the DWT encoder of the invention;

Fig. 3 shows the column coding unit;

Fig. 4 shows the transposing buffer unit;

Fig. 5 shows the data flow before and after the transposing unit;

Fig. 6 shows the row coding unit.

Embodiment

An embodiment of the invention provides an improved lifting-based 9/7 DWT algorithm and encoding device, used to raise the output speed of the DWT encoder, reduce hardware resource consumption and improve the encoder's coding efficiency.

The invention comprises a novel lifting algorithm and a 2-D discrete wavelet transform encoding device based on that algorithm.

Improved DWT algorithm: the 9/7 wavelet algorithm comprises two levels of prediction and update, plus a final gain. To match the hardware architecture, four corresponding intermediate variables D_{1}^{k}(n), D_{2}^{k}(n), D_{3}^{k}(n), D_{4}^{k}(n) are introduced into the algorithm, which makes the correspondence between the new algorithm and the hardware architecture explicit.

The improved DWT encoding device comprises:

A column coding unit, for the column filtering part of the 2-D DWT.

A row coding unit, for the row filtering part of the 2-D DWT.

A transposing buffer unit, for buffering data between the column and row filters and adjusting the data flow structure.

To match the dual-input dual-output architecture of the design, raise the processing speed of the encoder and reduce the resource overhead of the coding hardware, this embodiment provides an improved lifting-based 9/7 DWT algorithm.

Fig. 1 shows the relationship between the improved algorithm of the invention and the original algorithm. The invention proposes a new form of intermediate variable and, on that basis, modifies the original 9/7 lifting algorithm according to the needs of the design, yielding the improved 9/7 DWT algorithm used by the invention.

Formulas (1) to (6) are the lifting step formulas of the original 9/7 DWT. They consist of two parts, 4 lifting steps and 2 scaling steps. Formulas (1) to (4) are the 4 lifting steps, corresponding in order to lifting 1, prediction 1, lifting 2 and prediction 2. Formulas (5) and (6) are the 2 scaling steps, high-frequency scaling and low-frequency scaling respectively.

\frac{1}{\alpha} \times y(2n+1) = \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \qquad (1)

\frac{1}{\beta} \times y(2n) = \frac{1}{\beta} \times x(2n) + y(2n-1) + y(2n+1) \qquad (2)

\frac{1}{\gamma} \times H(2n+1) = \frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2) \qquad (3)

\frac{1}{\delta} \times L(2n) = \frac{1}{\delta} \times y(2n) + H(2n-1) + H(2n+1) \qquad (4)

H^{\circ}(2n+1) = e \times H(2n+1) \qquad (5)

L^{\circ}(2n) = f \times L(2n) \qquad (6)

As the lifting formulas above show, the lifting algorithm has low complexity and is well suited to hardware implementation. However, most hardware architectures based on the traditional lifting algorithm suffer from low clock frequency and low hardware efficiency, which hinders their use. The invention therefore further optimizes the original lifting algorithm through a new way of combining terms, making the data more tightly coupled and meeting the design goal of achieving a single-multiplier critical-path delay with fewer pipeline stages on the basis of parallel scanning. The formulas of the invention's algorithm are derived as follows:

\frac{1}{\alpha\beta} \times y(2n) = \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times y(2n-1) + \frac{1}{\alpha} \times y(2n+1)

= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) + x(2n)

= (\frac{1}{\alpha\beta} + 1) \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)

= [(\frac{1}{\alpha\beta} + 1) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)] + [\frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2)] \qquad (7)

Similarly, we obtain:

\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = \frac{1}{\alpha\beta} \{ [(\frac{1}{\delta\gamma} + 1) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2)] + [\frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2)] \} \qquad (8)
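The derivation of formula (8) follows the same pattern as that of formula (7). As a worked sketch that uses only formulas (3) and (4): multiply formula (4) by 1/γ, substitute formula (3) at indices n and n-1, regroup, and finally multiply both sides by 1/(αβ).

```latex
\frac{1}{\delta\gamma} L(2n)
  = \frac{1}{\delta\gamma} y(2n) + \frac{1}{\gamma} H(2n-1) + \frac{1}{\gamma} H(2n+1)

\frac{1}{\gamma} H(2n-1) = \frac{1}{\gamma} y(2n-1) + y(2n-2) + y(2n), \qquad
\frac{1}{\gamma} H(2n+1) = \frac{1}{\gamma} y(2n+1) + y(2n) + y(2n+2)

\frac{1}{\delta\gamma} L(2n)
  = \left[ \left( \frac{1}{\delta\gamma} + 1 \right) y(2n) + \frac{1}{\gamma} y(2n-1) + y(2n-2) \right]
  + \left[ \frac{1}{\gamma} y(2n+1) + y(2n) + y(2n+2) \right]

\frac{1}{\alpha\beta\delta\gamma} L(2n)
  = \frac{1}{\alpha\beta} \left\{ \left[ \cdots \right] + \left[ \cdots \right] \right\}
```

The two bracketed groups in the third line are the same groups that appear inside the braces of formula (8).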

The four intermediate variables D_{1}^{k}(n), D_{2}^{k}(n), D_{3}^{k}(n), D_{4}^{k}(n) are defined as shown in formulas (9) to (12), where k is the index of the row currently being computed for the row transform and the current scan number for the column transform; in the column transform, one parallel scan of two adjacent rows counts as one scan.

D_{1}^{k}(n) = \frac{1}{\alpha} \times x(2n+1) + x(2n) \qquad (9)

D_{2}^{k}(n) = (\frac{1}{\alpha\beta} + 1) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \qquad (10)

D_{3}^{k}(n) = \frac{1}{\gamma} \times y(2n+1) + y(2n) \qquad (11)

D_{4}^{k}(n) = (\frac{1}{\delta\gamma} + 1) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2) \qquad (12)

Substituting the variables D_{1}^{k}(n), D_{2}^{k}(n), D_{3}^{k}(n), D_{4}^{k}(n) into formulas (1), (7), (3) and (8) gives the improved algorithm used by the invention:

\frac{1}{\alpha} \times y(2n+1) = D_{1}^{k}(n) + x(2n+2) \qquad (13)

\frac{1}{\alpha\beta} \times y(2n) = D_{2}^{k}(n) + D_{1}^{k}(n) + x(2n+2) \qquad (14)

\frac{1}{\gamma} \times H(2n+1) = D_{3}^{k}(n) + y(2n+2) \qquad (15)

\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = D_{4}^{k}(n) + D_{3}^{k}(n) + y(2n+2) \qquad (16)
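To make the regrouping concrete, the following Python sketch numerically compares formulas (13), (14) and (15) with the direct lifting steps (1), (2) and (3) on interior samples of a 1-D sequence. The helper names (`direct_y`, `d1`, `d2`, `d3`, `max_mismatch`) are hypothetical and the superscript k is dropped, since a single 1-D sequence is used; boundary samples are skipped so that no boundary rule has to be assumed.

```python
ALPHA, BETA, GAMMA = -1.586134342, -0.052980118, 0.882911075


def direct_y(x, i):
    """y(i) from formulas (1) and (2); valid for interior indices only."""
    if i % 2:  # odd index: formula (1)
        return x[i] + ALPHA * (x[i - 1] + x[i + 1])
    # even index: formula (2)
    return x[i] + BETA * (direct_y(x, i - 1) + direct_y(x, i + 1))


def d1(x, n):
    """Intermediate variable of formula (9)."""
    return x[2 * n + 1] / ALPHA + x[2 * n]


def d2(x, n):
    """Intermediate variable of formula (10)."""
    return (1 / (ALPHA * BETA) + 1) * x[2 * n] + x[2 * n - 1] / ALPHA + x[2 * n - 2]


def d3(y, n):
    """Intermediate variable of formula (11)."""
    return y[2 * n + 1] / GAMMA + y[2 * n]


def max_mismatch(x):
    """Largest gap between formulas (13)-(15) and the direct lifting steps."""
    m = len(x)
    y = {i: direct_y(x, i) for i in range(1, m - 2)}
    worst = 0.0
    for n in range(1, m // 2 - 2):
        e13 = d1(x, n) + x[2 * n + 2] - y[2 * n + 1] / ALPHA
        e14 = d2(x, n) + d1(x, n) + x[2 * n + 2] - y[2 * n] / (ALPHA * BETA)
        h = y[2 * n + 1] + GAMMA * (y[2 * n] + y[2 * n + 2])  # formula (3)
        e15 = d3(y, n) + y[2 * n + 2] - h / GAMMA
        worst = max(worst, abs(e13), abs(e14), abs(e15))
    return worst
```

Calling `max_mismatch` on any even-length sequence of at least 8 samples should return a value at floating-point rounding level, which indicates that the intermediate-variable form reproduces the original lifting steps for these three formulas.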

The device of the invention comprises:

A column coding unit, for the column filtering part of the 2-D DWT.

A row coding unit, for the row filtering part of the 2-D DWT.

The invention adopts an overall dual-input dual-output structure, as shown in Fig. 2. After the encoding device starts working, it first reads in the original data, which enter the column coding unit for the column transform. The coding unit has two inputs and two outputs and reads in and processes two data per clock cycle. After column coding has completed a certain number of rows, the results are output to the transposing buffer unit. Once transposed, the data flow meets the requirements of row coding, and the data enter the row coding unit to start the row transform. The output data of the row filter form the four subband signals LL, LH, HL and HH after gain calculation, and rearranging the four subband components completes one level of 2-D DWT decomposition.

The column coding unit architecture of the invention is shown in Fig. 3, where the rectangles marked D are registers and the circles marked x and + are multipliers and adders respectively. To reduce the transposing buffer between the row and column transforms and raise computation speed, a dual-input dual-output structure is adopted. The whole coding unit is a 5-stage pipeline. Once the column coding unit starts working, data enter the column filter along the row direction, and two data are read in per clock cycle: the odd term x(2n+1) and the even term x(2n). Stage 1 multiplies the data read in; in the next cycle the results take part in computing the intermediate variables D_{1}^{k}(n) and D_{2}^{k}(n) in stage 2, and the newly obtained D_{1}^{k}(n) and D_{2}^{k}(n) are written to sram1 and sram2. At the same time, when k > 1, the required intermediate variables D_{1}^{k-1}(n) and D_{2}^{k-1}(n) are read from sram1 and sram2 to compute y(2n-1)/α and y(2n-2)/αβ. For the column transform, because a row-based parallel transfer mode is used, the D_{1}^{k-1}(n) and D_{2}^{k-1}(n) read out are the intermediate variables computed and stored at the corresponding column position of the previous two rows. In other words, the intermediate variables D_{1}^{k-1}(n) and D_{2}^{k-1}(n) stored for the previous two rows are held for N cycles and then read out for the column transform of the two newly read rows, while the newly produced D_{1}^{k}(n) and D_{2}^{k}(n) are written to sram1 and sram2, overwriting D_{1}^{k-1}(n) and D_{2}^{k-1}(n) at the same addresses. sram1 and sram2 are two on-chip dual-port RAMs of size N that store the intermediate variables produced during computation, where N is the dimension of the N*N image data. Comparing formulas (13) to (16) shows that formulas (13) and (15), and likewise formulas (14) and (16), have symmetric structures and differ in two respects: (1) the coefficients of the multiplications differ; (2) the raw image data x(n) in formulas (13) and (14) are replaced by the first-stage results y(n) in formulas (15) and (16). In the hardware circuit of Fig. 3 the front and rear circuits are therefore symmetric: with the two registers in the third column of the figure as the boundary, the first half of the circuit computes formulas (13) and (14) and the second half computes formulas (15) and (16). The second half therefore only needs to change the coefficients of the multipliers in stage 3, which receive y(2n-1)/α and y(2n-2)/αβ from stage 2 and perform the corresponding multiplications. The remaining coding work then proceeds in the same way as in the front circuit: stage 4 computes formulas (15) and (16) and at the same time updates the corresponding intermediate variables D_{3}^{k-1}(n) and D_{4}^{k-1}(n), and stage 5 outputs the final high- and low-frequency results.
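The role of sram1 and sram2 can be illustrated with a small Python sketch of the first lifting stage of the column transform, processing two image rows per scan. It models only the data flow, not the 5-stage pipeline timing. The function name `column_stage1`, the list-of-rows image layout and the zero values assumed above the first image row are assumptions of the sketch; the outputs are also multiplied back by α and αβ so they can be compared with y directly, whereas the text passes y/α and y/αβ on to stage 3.

```python
ALPHA, BETA = -1.586134342, -0.052980118
C2 = 1 / (ALPHA * BETA) + 1  # multiplier of x(2n) in formula (10)


def column_stage1(img):
    """First lifting stage of the column transform, two rows per scan.

    img is a list of rows with an even row count. Samples above the image
    are taken as zero (an assumption made only to keep the sketch short).
    Returns {row_index: [y values]} for every row completed by the stream.
    """
    width = len(img[0])
    sram1 = [0.0] * width  # D1 of the previous scan, one entry per column
    sram2 = [0.0] * width  # D2 of the previous scan, one entry per column
    done = {}
    for k in range(len(img) // 2):
        even_row, odd_row = img[2 * k], img[2 * k + 1]
        y_odd, y_even = [], []
        for c in range(width):
            x_e, x_o = even_row[c], odd_row[c]
            d1_prev, d2_prev = sram1[c], sram2[c]
            # Formulas (13)/(14) for the previous scan: x_e plays x(2n+2).
            y_odd.append(ALPHA * (d1_prev + x_e))
            y_even.append(ALPHA * BETA * (d2_prev + d1_prev + x_e))
            # Formulas (9)/(10) for the current scan; d1_prev already holds
            # x(2n-1)/alpha + x(2n-2), so it is reused inside D2.
            sram1[c] = x_o / ALPHA + x_e
            sram2[c] = C2 * x_e + d1_prev
        if k > 0:  # the first scan only fills the buffers
            done[2 * k - 2] = y_even
            done[2 * k - 1] = y_odd
    return done
```

Each buffer entry is read once and overwritten once per scan, N cycles apart for a row of width N, which matches the description of a value being held for N cycles before it is replaced.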

A column coding unit of this design lets the column filter perform parallel scanning while limiting the internal buffer to 4N. The number of registers needed for the lifting computation is kept at 12. This design not only lowers hardware complexity; because the computation between any two registers is at most one adder or one multiplier, it also achieves a critical-path delay of one Tm, so the circuit can run at a higher clock frequency.

Fig. 4 shows the transposing buffer unit. Because the column transform uses a dual-input dual-output parallel filter circuit, the problem of an oversized transposing buffer in traditional line-based DWT architectures is avoided, and a low-cost transposing module suffices to replace the 1.5N transposing buffer of typical implementations. The transposing buffer unit of the invention uses 3 registers and two multiplexers (MUX); its purpose is to convert the output data flow into the data order required by row coding, so that the column and row coding circuits can work in parallel. The data arrangement before and after transposition is given in Fig. 5.

Fig. 6 shows the architecture of the row coding unit. Its circuit is basically the same as that of the column coding unit: both are built on a 5-stage pipeline, and the front and rear parts of the architecture are symmetric. If the two data streams read in by the dual-input dual-output architecture are regarded as a high-frequency row and a low-frequency row, the scanning order is essentially a Z-type scan between the two rows: data are read alternately from the low-frequency row and the high-frequency row, each cycle reading two adjacent data in the same row, until the last data of the high-frequency row have been read, after which the scan jumps to the next low-frequency row to read two new rows of data. The only difference between the row and column coding unit architectures therefore lies in how the intermediate variables D_{1}^{k}(n), D_{2}^{k}(n), D_{3}^{k}(n), D_{4}^{k}(n) are stored and read in pipeline stages 2 and 4. The row coding unit does not need the 4N intermediate buffer of column coding; two buffers of 2 registers each suffice to store the intermediate variables. The storage scheme differs slightly from the column transform: the input data are multiplied in stage 1, and stage 2 computes the intermediate variables D_{1}^{k}(n+1) and D_{2}^{k}(n+1) and writes them to the buffer, while the intermediate variables D_{1}^{k}(n) and D_{2}^{k}(n) at the other buffer address are read to compute y(2n+1)/α and y(2n)/αβ. In addition, because the row transform reads the two rows alternately, the intermediate variables need not be held for N cycles; the buffers allow the intermediate variables of the high-frequency and low-frequency rows to be read alternately, so that the computations for the two rows complete independently without interfering with each other. Stage 3 multiplies the outputs y(2n+1)/α and y(2n)/αβ of the first half of the circuit and passes them to stage 4, which computes formulas (15) and (16) and updates the intermediate variables D_{3}^{k}(n+1) and D_{4}^{k}(n+1). The final high- and low-frequency components H and L are output in stage 5.
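The Z-type scan described above can be sketched as a small Python generator that lists the read addresses in order. The name `z_scan` is hypothetical, and the sketch assumes the low-frequency and high-frequency rows are stored as consecutive rows `low` and `low + 1` with an even width; the actual data arrangement is the one given in Fig. 5.

```python
def z_scan(rows, width):
    """Yield (row, col, col + 1) read addresses in the Z-type order."""
    for low in range(0, rows, 2):   # low-frequency row of the current pair
        high = low + 1              # matching high-frequency row
        for c in range(0, width, 2):
            yield (low, c, c + 1)   # two adjacent data from the low row
            yield (high, c, c + 1)  # then two adjacent data from the high row
```

For example, `list(z_scan(2, 4))` gives `[(0, 0, 1), (1, 0, 1), (0, 2, 3), (1, 2, 3)]`: the reads alternate between the two rows, so an intermediate variable is needed again only two reads later, which is why a two-entry buffer per variable is enough.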

In summary, the invention first proposes an improved 9/7 DWT lifting algorithm and then designs an encoder based on it; the encoder realizes the DWT coding circuit on the basis of parallel scanning. For the 2-D architecture, the dual-input dual-output parallel scan limits the internal buffer to as little as 4N, and a transposing module made of 3 registers and 2 MUXes replaces the 1.5N transposing buffer. At the same time, each coding unit achieves a critical-path delay of one multiplier with a small register overhead, and the computation time for a whole N*N image frame is N²/2. The invention therefore has low hardware complexity and high processing speed, and is a highly efficient DWT encoding device.
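As a worked example of these figures, take N = 512 (an image size chosen only for illustration) and read the computation time N²/2 as clock cycles, which is an assumption consistent with two data being read per clock cycle:

```latex
T = \frac{N^2}{2} = \frac{512^2}{2} = 131072 \ \text{cycles}

4N = 2048 \quad \text{(internal buffer of this design)}, \qquad
5.5N = 2816 \quad \text{(internal buffer cited for the FA/HA architectures)}

1.5N = 768 \quad \text{(typical transposing buffer)} \quad \text{vs.} \quad
3 \ \text{registers} + 2 \ \text{MUX}
```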

Claims

1. An efficient lifting-based DWT encoding method, characterized in that the steps are: formulas (1) to (6) are the lifting step formulas of the 9/7 DWT and consist of two parts, 4 lifting steps and 2 scaling steps: formulas (1) to (4) are the 4 lifting steps, corresponding in order to lifting 1, prediction 1, lifting 2 and prediction 2; formulas (5) and (6) are the 2 scaling steps, high-frequency scaling and low-frequency scaling respectively;

$$\frac{1}{\alpha} \times y(2n+1) = \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \tag{1}$$

$$\frac{1}{\beta} \times y(2n) = \frac{1}{\beta} \times x(2n) + y(2n-1) + y(2n+1) \tag{2}$$

$$\frac{1}{\gamma} \times H(2n+1) = \frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2) \tag{3}$$

$$\frac{1}{\delta} \times L(2n) = \frac{1}{\delta} \times y(2n) + H(2n-1) + H(2n+1) \tag{4}$$

$$H^{\circ}(2n+1) = e \times H(2n+1) \tag{5}$$

$$L^{\circ}(2n) = f \times L(2n) \tag{6}$$

$$\begin{aligned}
\frac{1}{\alpha\beta} \times y(2n) &= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times y(2n-1) + \frac{1}{\alpha} \times y(2n+1) \\
&= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) + x(2n) \\
&= \left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \\
&= \left[\left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)\right] + \left[\frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2)\right]
\end{aligned} \tag{7}$$

Similarly, we obtain:

$$\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = \frac{1}{\alpha\beta} \left\{ \left[\left(\frac{1}{\delta\gamma} + 1\right) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2)\right] + \left[\frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2)\right] \right\} \tag{8}$$

$$D_1^k(n) = \frac{1}{\alpha} \times x(2n+1) + x(2n) \tag{9}$$

$$D_2^k(n) = \left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \tag{10}$$

$$D_3^k(n) = \frac{1}{\gamma} \times y(2n+1) + y(2n) \tag{11}$$

$$D_4^k(n) = \left(\frac{1}{\delta\gamma} + 1\right) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2) \tag{12}$$

$$\frac{1}{\alpha} \times y(2n+1) = D_1^k(n) + x(2n+2) \tag{13}$$

$$\frac{1}{\alpha\beta} \times y(2n) = D_2^k(n) + D_1^k(n) + x(2n+2) \tag{14}$$

$$\frac{1}{\gamma} \times H(2n+1) = D_3^k(n) + y(2n+2) \tag{15}$$

$$\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = D_4^k(n) + D_3^k(n) + y(2n+2) \tag{16}$$.
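
As an illustrative check of the first half of the derivation, the following Python sketch compares formulas (1) and (2), evaluated directly, with the regrouped form of formulas (9), (10), (13) and (14). It is a minimal sketch under stated assumptions: the numerical values of α and β are the commonly quoted CDF 9/7 lifting constants and are not given in this claim, the function names `direct` and `regrouped` are hypothetical, and only interior indices n are evaluated because no boundary extension is specified here.

```python
import math

# Commonly quoted CDF 9/7 lifting constants (an assumption; the claim
# leaves alpha and beta symbolic).
ALPHA = -1.586134342
BETA = -0.05298011854


def direct(x, n):
    """Return (y(2n+1), y(2n)) from formulas (1) and (2), interior n only."""
    def y_odd(m):
        # Formula (1) solved for y(2m+1).
        return x[2 * m + 1] + ALPHA * (x[2 * m] + x[2 * m + 2])
    # Formula (2) solved for y(2n).
    y_even = x[2 * n] + BETA * (y_odd(n - 1) + y_odd(n))
    return y_odd(n), y_even


def regrouped(x, n):
    """Return the same pair via the intermediate variables D1 and D2."""
    d1 = x[2 * n + 1] / ALPHA + x[2 * n]                         # formula (9)
    d2 = ((1 / (ALPHA * BETA) + 1) * x[2 * n]
          + x[2 * n - 1] / ALPHA + x[2 * n - 2])                 # formula (10)
    y_odd_scaled = d1 + x[2 * n + 2]          # formula (13): y(2n+1)/alpha
    y_even_scaled = d2 + d1 + x[2 * n + 2]    # formula (14): y(2n)/(alpha*beta)
    return ALPHA * y_odd_scaled, ALPHA * BETA * y_even_scaled


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
for n in (1, 2):
    for a, b in zip(direct(x, n), regrouped(x, n)):
        assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
```

In the regrouped form the only term that depends on the newest sample pair is x(2n+2), so everything else can be computed one step earlier and held as D_1^k(n) and D_2^k(n).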

2. A lifting-based efficient DWT encoder, characterized in that its structure comprises:

a column coding unit, used for the column-filtering part of the 2-D DWT;

a row coding unit, used for the row-filtering part of the 2-D DWT;

3. The lifting-based efficient DWT encoder as claimed in claim 2, characterized in that the column coding unit consists of registers, multipliers, adders and 4 dual-port RAMs of size N; the 4 dual-port RAMs of size N, namely sram1, sram2, sram3 and sram4, store the intermediate variables produced during calculation, N being the dimension of the N×N image data; the whole column coding unit consists of a 5-stage pipeline, each stage being formed by a register and the adders or multipliers that perform the corresponding operation; when the column coding unit operates, data enter the column filter along the row direction, and two data are read in each clock cycle, namely the odd term x(2n+1) and the even term x(2n); the data read in are multiplied in the 1st pipeline stage, and in the next cycle the results take part in the calculation of the intermediate variables D_1^k(n) and D_2^k(n) in the 2nd pipeline stage; the newly obtained D_1^k(n) and D_2^k(n) are written to sram1 and sram2 while, at the same time, the required intermediate variables D_1^{k-1}(n) and D_2^{k-1}(n) are read from sram1 and sram2 to calculate y(2n-1)/α and y(2n-2)/αβ; the intermediate variables D_1^{k-1}(n) and D_2^{k-1}(n) stored for the previous two rows are held for N cycles and then read out for the column transform of the two newly read rows of data, while the newly produced D_1^k(n) and D_2^k(n) are written to sram1 and sram2, overwriting the contents at the addresses of D_1^{k-1}(n) and D_2^{k-1}(n); with the two registers of the 3rd pipeline stage as the boundary, the first half of the circuit evaluates formulas 13 and 14 and the second half evaluates formulas 15 and 16; the second half of the circuit requires different multiplier coefficients in the 3rd pipeline stage, where it receives the y(2n-1)/α and y(2n-2)/αβ calculated by the 2nd pipeline stage and performs the corresponding multiplications; in the 4th pipeline stage, formulas 15 and 16 are evaluated while the corresponding intermediate variables D_3^{k-1}(n) and D_4^{k-1}(n) are updated, D_3^{k-1}(n) and D_4^{k-1}(n) being stored in sram3 and sram4 respectively; the final high- and low-frequency results are then output in the 5th pipeline stage.
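
The way sram1 and sram2 hold D_1 and D_2 for N cycles can be pictured with the following Python sketch of the first half of the column pass (formulas 13 and 14 only). It is a software illustration under stated assumptions and not the claimed circuit: the function name `column_first_half` is hypothetical, the values of α and β are the commonly quoted CDF 9/7 constants, boundary rows are skipped because no extension rule is given here, and D_2 is computed from the previous D_1 using the identity (1/α)x(2k-1) + x(2k-2) = D_1 of the previous row pair, which follows from formulas (9) and (10).

```python
ALPHA = -1.586134342    # commonly quoted CDF 9/7 constants, assumed here
BETA = -0.05298011854


def column_first_half(img):
    """Scan an image one row pair at a time and return scaled y values.

    out[(r, c)] holds y(r)/alpha for odd r and y(r)/(alpha*beta) for even r,
    computed along column c. Boundary rows are omitted.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2:
        raise ValueError("row count must be even")
    sram1 = [0.0] * cols    # D1 of the previous row pair, one word per column
    sram2 = [0.0] * cols    # D2 of the previous row pair, one word per column
    out = {}
    for k in range(rows // 2):            # row pair (2k, 2k+1)
        for c in range(cols):             # data enter along the row direction
            x_even, x_odd = img[2 * k][c], img[2 * k + 1][c]
            d1_prev, d2_prev = sram1[c], sram2[c]   # written `cols` cycles ago
            if k >= 1:
                out[(2 * k - 1, c)] = d1_prev + x_even            # formula (13)
            if k >= 2:
                out[(2 * k - 2, c)] = d2_prev + d1_prev + x_even  # formula (14)
            sram1[c] = x_odd / ALPHA + x_even                     # formula (9)
            # Formula (10), with (1/alpha)x(2k-1) + x(2k-2) taken from d1_prev.
            sram2[c] = (1 / (ALPHA * BETA) + 1) * x_even + d1_prev
    return out
```

Each SRAM word is read and overwritten in the same step, so one word per column suffices for each intermediate variable, consistent with the size-N memories named in the claim.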

4. The lifting-based efficient DWT encoder as claimed in claim 2, characterized in that the row coding unit consists of row registers, row multipliers, row adders and 4 further register buffers, namely buffer1, buffer2, buffer3 and buffer4; the whole row coding unit consists of a 5-stage pipeline, each stage being formed by a row register and the row adders or row multipliers that perform the corresponding operation; the input data are multiplied in the 1st pipeline stage, and in the 2nd pipeline stage the intermediate variables D_1^k(n+1) and D_2^k(n+1) are calculated and written to the corresponding buffer1 and buffer2, while the intermediate variables D_1^k(n) and D_2^k(n) at the other address of buffer1 and buffer2 are read to calculate y(2n+1)/α and y(2n)/αβ; the buffers are accessed alternately for the intermediate variables of the high-frequency row and of the low-frequency row, so that the calculations for the high-frequency row and the low-frequency row are completed independently; in the 3rd pipeline stage the outputs of the first half of the circuit, y(2n+1)/α and y(2n)/αβ, are multiplied and passed to the 4th pipeline stage, which evaluates formulas 15 and 16 and updates the intermediate variables D_3^k(n+1) and D_4^k(n+1); the final high- and low-frequency components H and L are output in the 5th pipeline stage.