CN103067023A - Efficient lifting-based discrete wavelet transform (DWT) encoding method and encoder

Publication number
CN103067023A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105070637A
Other languages
Chinese (zh)
Inventor
张为
姜喆
刘艳艳
高志宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Application filed by Tianjin University
Priority to CN2012105070637A
Publication of CN103067023A

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the field of image encoding and decoding and provides an encoder that has high coding efficiency and meets the needs of high-speed coding applications. The technical scheme is an efficient lifting-based discrete wavelet transform (DWT) encoder comprising a column encoding unit for the column-filtering part of the 2-D DWT, a row encoding unit for the row-filtering part of the 2-D DWT, and a transposition buffer unit that buffers data between the column filter and the row filter and reorders the data flow. Data first enter the column encoding unit for the column transform and then enter the row encoding unit for the row transform. After gain calculation, the output of the row filter forms the four subband signals LL, LH, HL and HH; rearranging the four subbands completes the 2-D DWT decomposition. The encoder is mainly used for image encoding and decoding.

Description

Efficient lifting-based DWT coding method and encoder
Technical field
The present invention relates to the field of image coding and decoding, and specifically to an efficient lifting-based DWT coding method and encoder.
Background art
The discrete wavelet transform (DWT) is an effective multiresolution analysis tool with good time-frequency localization; it decomposes a signal into subbands with different time-domain characteristics. Its coding efficiency and image reconstruction quality are both higher than those of the traditional discrete cosine transform (DCT), so it is widely used in signal processing and image compression, for example in MPEG-4 and JPEG2000.
The traditional wavelet transform is implemented by convolution, which is computationally complex, requires a large amount of memory, and is ill-suited to VLSI implementation; see K. K. Parhi and T. Nishitani, "VLSI architecture for discrete wavelet transforms," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1. To address this problem, Daubechies et al. proposed the lifting algorithm, the key technique of the second-generation wavelet transform; see I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Appl., vol. 4, no. 3, pp. 245-267, Mar. 1998. This algorithm reduces the amount of computation to about half that of the convolution approach, improves the speed and practicality of the wavelet transform, and makes efficient hardware implementation possible.
Many hardware implementations of lifting-based DWT encoders exist today. Each architecture has advanced the technology to some degree, but factors that limit the efficiency of hardware implementation remain. Several typical hardware implementations and their problems are described below.
Implementation 1: The lifting algorithm is implemented directly. As the algorithm prescribes, the row transform is completed row by row first, and the column transform is then carried out column by column. See J. M. Jou, Y. H. Shiau, and C. C. Liu, "Efficient VLSI architectures for the biorthogonal wavelet transform by filter bank and lifting scheme," in Proc. IEEE ISCAS, May 2001.
Problem 1: Because this architecture must finish the entire row transform before the column transform can start, the whole row-transform result has to be buffered. For large data volumes such as high-definition images, this intermediate result is very large and requires additional external SRAM. In addition, the column transform must wait for the row transform to finish, which lengthens the total computation time. Because the architecture is not optimized within each transform stage, the critical path and its delay are long, which further limits the processing speed of the encoder.
Implementation 2: A line-based DWT architecture. This structure executes the row transform and the column transform in parallel, which raises the processing speed and allows a transposition buffer between the row and column stages to replace the off-chip SRAM. See C. Chrysafis and A. Ortega, "Line-based, reduced memory, wavelet image compression," IEEE Trans. Signal Process., vol. 9, no. 3, pp. 378-389, Mar. 2000.
Problem 2: Although the row and column transforms run in parallel, the data are still processed serially through a single input, and the transposition buffer between the row and column stages is still large. The critical-path delay within each filter also remains long.
Implementation 3: Xiong et al. proposed two implementations, FA and HA, which shorten the output latency and computation time through parallel row and column filter circuits and require no transposition buffer between the row and column stages. See C. Xiong, J. Tian, and J. Liu, "Efficient Architectures for Two-Dimensional Discrete Wavelet Transform Using Lifting Scheme," IEEE Trans. Image Process., vol. 16, no. 3, pp. 607-614, Mar. 2007.
Problem 3: This architecture reuses the prediction and update circuits and brings the critical path to Tm+2Ta, where Tm and Ta are the delays of one multiplier and one adder respectively. It still does not break the accumulated delay of multiplications and additions, however, so the critical path remains long. Moreover, although the parallel filter design reduces the transposition buffer to 4 registers and halves the computation time, the input data must be buffered, so the total internal buffer is still 5.5N when processing an N*N image, and the hardware complexity remains high.
Implementation 4: The flipping structure, first proposed by Huang, is an important DWT architecture of recent years. It applies an equivalent transformation to the lifting formulas and abandons the earlier practice of placing the multipliers on the paths between the input nodes and the computing nodes; relocating the multipliers effectively reduces the critical-path delay. The architecture can also be pipelined by adding pipeline registers, and a 5-stage pipeline brings the critical-path delay down to 1 Tm. See C.-T. Huang, P.-C. Tseng, and L.-G. Chen, "Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform," IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1080-1089, Apr. 2004.
Problem 4: As an improved line-based DWT architecture, the flipping structure likewise incurs a large increase in intermediate buffering when it is pipelined. Huang therefore argues that the number of pipeline registers in the 1-D DWT circuit should be limited, because it determines the size of the intermediate buffer in the 2-D DWT architecture; this limit, however, also hinders shortening the critical path. Here 1-D and 2-D denote the dimension of the wavelet transform: a single row filtering or column filtering is a 1-D wavelet transform, and row filtering followed by column filtering is a 2-D wavelet transform.
Summary of the invention
The present invention aims to overcome the deficiencies of the prior art and to provide an encoder that has high coding efficiency and meets the needs of high-speed coding applications. To this end, the technical scheme adopted by the invention is an efficient lifting-based DWT coding method with the following steps:
Formulas (1)~(6) are the lifting-step formulas of the 9/7 DWT. They consist of two parts, 4 lifting steps and 2 scaling steps. Formulas (1)~(4) are the 4 lifting steps, corresponding in order to the first lifting, the first prediction, the second lifting and the second prediction. Formulas (5)~(6) are the 2 scaling steps, the high-frequency scaling and the low-frequency scaling respectively.
$$\frac{1}{\alpha} \times y(2n+1) = \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \tag{1}$$
$$\frac{1}{\beta} \times y(2n) = \frac{1}{\beta} \times x(2n) + y(2n-1) + y(2n+1) \tag{2}$$
$$\frac{1}{\gamma} \times H(2n+1) = \frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2) \tag{3}$$
$$\frac{1}{\delta} \times L(2n) = \frac{1}{\delta} \times y(2n) + H(2n-1) + H(2n+1) \tag{4}$$
$$H^{\circ}(2n+1) = e \times H(2n+1) \tag{5}$$
$$L^{\circ}(2n) = f \times L(2n) \tag{6}$$
Here n is an integer greater than or equal to 0, x(n) is the original input image value, y(n) is the first-stage result of the algorithm, and H°(2n+1) and L°(2n) are the high-frequency and low-frequency components finally produced by the decomposition. The coefficient values are: α=-1.586134342, β=-0.052980118, γ=0.882911075, δ=0.443506852, K=1.230174105.
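As a reading aid, the four lifting steps of formulas (1)~(4) can be sketched in software as follows. The function names, the mirrored handling of the signal borders and the omission of the scaling steps (5)~(6) are assumptions of this sketch; the text does not specify a boundary rule or the values of e and f.

```python
# Coefficients as listed in the text.
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852


def sym(i, n):
    """Mirror an out-of-range index back into 0..n-1 (assumed boundary rule)."""
    if i < 0:
        i = -i
    if i >= n:
        i = 2 * (n - 1) - i
    return i


def lift_step(src, on_odd, coeff):
    """out(i) = src(i) + coeff * (src(i-1) + src(i+1)) on odd or even samples."""
    n = len(src)
    out = list(src)
    start = 1 if on_odd else 0
    for i in range(start, n, 2):
        out[i] = src[i] + coeff * (src[sym(i - 1, n)] + src[sym(i + 1, n)])
    return out


def lifting_97(x):
    """Formulas (1)-(4), without the scaling steps (5)-(6).

    Returns (L, H): the even-indexed low-frequency samples and the
    odd-indexed high-frequency samples."""
    s = lift_step(x, True, ALPHA)    # formula (1): y(2n+1)
    s = lift_step(s, False, BETA)    # formula (2): y(2n)
    s = lift_step(s, True, GAMMA)    # formula (3): H(2n+1)
    s = lift_step(s, False, DELTA)   # formula (4): L(2n)
    return s[0::2], s[1::2]
```

Each formula in the text is written divided by its coefficient; multiplying formula (1) through by α gives y(2n+1) = x(2n+1) + α[x(2n) + x(2n+2)], which is the form used in `lift_step`.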
Substituting formula (1) into formula (2) and regrouping the terms by equivalent transformation gives:
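$$\begin{aligned}
\frac{1}{\alpha\beta} \times y(2n) &= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times y(2n-1) + \frac{1}{\alpha} \times y(2n+1) \\
&= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \\
&\quad + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) + x(2n) \\
&= \left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \\
&\quad + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \\
&= \left[\left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)\right] + \left[\frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2)\right]
\end{aligned} \tag{7}$$
Similarly, we obtain:
$$\begin{aligned}
\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = \frac{1}{\alpha\beta} \Big\{ &\left[\left(\frac{1}{\delta\gamma} + 1\right) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2)\right] \\
&+ \left[\frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2)\right] \Big\}
\end{aligned} \tag{8}$$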
Four intermediate variables $D_1^k(n)$, $D_2^k(n)$, $D_3^k(n)$, $D_4^k(n)$ are defined as shown in formulas (9)~(12), where k is the index of the row currently being computed for the row transform and the index of the current scan for the column transform; in the column transform, one scan is defined as the parallel scanning of two adjacent rows:
$$D_1^k(n) = \frac{1}{\alpha} \times x(2n+1) + x(2n) \tag{9}$$
$$D_2^k(n) = \left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \tag{10}$$
$$D_3^k(n) = \frac{1}{\gamma} \times y(2n+1) + y(2n) \tag{11}$$
$$D_4^k(n) = \left(\frac{1}{\delta\gamma} + 1\right) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2) \tag{12}$$
Substituting the variables $D_1^k(n)$, $D_2^k(n)$, $D_3^k(n)$, $D_4^k(n)$ into formulas (1), (7), (3) and (8) gives the improved algorithm:
$$\frac{1}{\alpha} \times y(2n+1) = D_1^k(n) + x(2n+2) \tag{13}$$
$$\frac{1}{\alpha\beta} \times y(2n) = D_2^k(n) + D_1^k(n) + x(2n+2) \tag{14}$$
$$\frac{1}{\gamma} \times H(2n+1) = D_3^k(n) + y(2n+2) \tag{15}$$
$$\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = D_4^k(n) + D_3^k(n) + y(2n+2) \tag{16}$$
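The equivalence between the regrouped formulas (13)~(14) and the original formulas (1)~(2) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical and only interior samples are handled, since the text does not state a boundary rule.

```python
ALPHA = -1.586134342
BETA = -0.052980118


def direct_y(x, n):
    """y(2n+1) and y(2n) from formulas (1) and (2); requires n >= 1."""
    def y_odd(m):
        return x[2 * m + 1] + ALPHA * (x[2 * m] + x[2 * m + 2])
    y_even = x[2 * n] + BETA * (y_odd(n - 1) + y_odd(n))
    return y_odd(n), y_even


def improved_y(x, n):
    """The same two values through D1, D2 and formulas (13) and (14)."""
    d1 = x[2 * n + 1] / ALPHA + x[2 * n]                    # formula (9)
    d2 = ((1 / (ALPHA * BETA) + 1) * x[2 * n]
          + x[2 * n - 1] / ALPHA + x[2 * n - 2])            # formula (10)
    y_odd_scaled = d1 + x[2 * n + 2]                        # formula (13): y(2n+1)/alpha
    y_even_scaled = d2 + d1 + x[2 * n + 2]                  # formula (14): y(2n)/(alpha*beta)
    return ALPHA * y_odd_scaled, ALPHA * BETA * y_even_scaled
```

For any interior index n, `direct_y(x, n)` and `improved_y(x, n)` agree up to floating-point rounding. The regrouped form needs only one addition with the newly arriving sample x(2n+2) once D1 and D2 are available.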
The efficient lifting-based DWT encoder has the following structure:
A column coding unit, for the column-filtering part of the 2-D DWT;
A row coding unit, for the row-filtering part of the 2-D DWT;
A transposition buffer unit, for buffering data between the column filter and the row filter and reordering the data flow;
Data enter the column coding unit for the column transform. The coding unit has two inputs and two outputs and reads and processes two data items per clock cycle. After the column coding has completed a certain number of rows, the results are output to the transposition buffer unit. After transposition the data flow meets the requirements of row coding, and the data enter the row coding unit to begin the row transform. After gain calculation, the output of the row filter forms the four subband signals LL, LH, HL and HH; rearranging the four subband components completes one level of 2-D DWT decomposition.
The column coding unit consists of registers, multipliers, adders and four dual-port RAMs of size N, named sram1, sram2, sram3 and sram4, which store the intermediate variables produced during computation; N is the side length of the N*N image. The whole column coding unit is a 5-stage pipeline, each stage consisting of a register and the adder or multiplier that performs the corresponding operation. Once the column coding unit starts working, data enter the column filter row by row, two per clock cycle: the odd sample x(2n+1) and the even sample x(2n). Stage 1 multiplies the data read in; in the next cycle the products take part in the stage-2 calculation of the intermediate variables $D_1^k(n)$ and $D_2^k(n)$, and the newly obtained $D_1^k(n)$ and $D_2^k(n)$ are written to sram1 and sram2. At the same time, the required intermediate variables $D_1^{k-1}(n)$ and $D_2^{k-1}(n)$ are read from sram1 and sram2 to calculate y(2n-1)/α and y(2n-2)/αβ. The intermediate variables $D_1^{k-1}(n)$ and $D_2^{k-1}(n)$ stored for the previous two rows are held for N cycles and then read out for the column transform of the two newly read rows, while the newly produced $D_1^k(n)$ and $D_2^k(n)$ are written to sram1 and sram2, overwriting the contents at the addresses of $D_1^{k-1}(n)$ and $D_2^{k-1}(n)$. With the two registers of pipeline stage 3 as the boundary, the front half of the circuit computes formulas (13) and (14), and the rear half computes formulas (15) and (16). The rear half requires different multiplier coefficients in stage 3, where it receives y(2n-1)/α and y(2n-2)/αβ from stage 2 and performs the corresponding multiplications. Stage 4 computes formulas (15) and (16) and at the same time updates the corresponding intermediate variables $D_3^{k-1}(n)$ and $D_4^{k-1}(n)$, which are stored in sram3 and sram4 respectively. Stage 5 then outputs the final high-frequency and low-frequency results.
The row coding unit consists of row registers, row multipliers, row adders and four additional register buffers, buffer1, buffer2, buffer3 and buffer4. The whole row coding unit is a 5-stage pipeline, each stage consisting of a row register and the row adder or row multiplier that performs the corresponding operation. The input data are multiplied in stage 1. Stage 2 calculates the intermediate variables $D_1^k(n+1)$ and $D_2^k(n+1)$ and writes them to buffer1 and buffer2, while the intermediate variables $D_1^k(n)$ and $D_2^k(n)$ at the other address of buffer1 and buffer2 are read to calculate y(2n+1)/α and y(2n)/αβ. The buffers allow the intermediate variables of the high-frequency row and the low-frequency row to be read alternately, so that the calculations for the two rows proceed independently. Stage 3 multiplies the front-half outputs y(2n+1)/α and y(2n)/αβ and passes them to stage 4, which computes formulas (15) and (16) and updates the intermediate variables $D_3^k(n+1)$ and $D_4^k(n+1)$. The final high-frequency and low-frequency components H and L are output in stage 5.
Technical features and effects of the present invention:
With two inputs and two outputs and the proposed improved algorithm, the 2-D DWT coding device of the invention realizes an architecture in which a 3-stage pipeline processes one lifting level, which effectively reduces the number of internal registers and achieves a critical-path delay of one Tm. The design carries out the column transform first, so the column-transform result need not be buffered: the data pass directly through the transposition buffer unit to the row coding unit, which replaces the 1.5N transposition buffer generally required when processing N*N image data. In the 2-D architecture, internal buffers are used only in the column filter unit, as two buffers of size 2N, so the hardware storage cost is very low.
Brief description of the drawings
Fig. 1 is a schematic diagram of the improved lifting-based 9/7 DWT algorithm of the invention;
Fig. 2 is the overall block diagram of the DWT encoder of the invention;
Fig. 3 shows the column coding unit;
Fig. 4 shows the transposition buffer unit;
Fig. 5 illustrates the data flow before and after the transposition unit;
Fig. 6 shows the row coding unit.
Embodiment
An embodiment of the invention provides an improved lifting-based 9/7 DWT algorithm and a coding device, which raise the output speed of the DWT encoder, reduce hardware resource consumption, and improve the coding efficiency of the encoder.
The invention comprises a novel lifting algorithm and a two-dimensional discrete wavelet transform coding device based on that algorithm.
Improved DWT algorithm: the 9/7 wavelet algorithm comprises two levels of prediction and update, plus a final gain step. In keeping with the hardware architecture, four corresponding intermediate variables $D_1^k(n)$, $D_2^k(n)$, $D_3^k(n)$, $D_4^k(n)$ are defined in the algorithm, which makes the correspondence between the new algorithm and the hardware architecture clear.
The improved DWT coding device comprises:
A column coding unit, for the column-filtering part of the 2-D DWT.
A row coding unit, for the row-filtering part of the 2-D DWT.
A transposition buffer unit, for buffering data between the column filter and the row filter and reordering the data flow.
To match the dual-input dual-output architecture of the design, raise the processing speed of the encoder and reduce the hardware resource overhead, this embodiment provides an improved lifting-based 9/7 DWT algorithm.
Fig. 1 shows the relationship between the improved algorithm of the invention and the original algorithm. The invention introduces a new form of intermediate variable and, on that basis, modifies the original 9/7 lifting algorithm according to the needs of the design, yielding the improved 9/7 DWT algorithm used by the invention.
Formulas (1)~(6) are the lifting-step formulas of the original 9/7 DWT. They consist of two parts, 4 lifting steps and 2 scaling steps. Formulas (1)~(4) are the 4 lifting steps, corresponding in order to the first lifting, the first prediction, the second lifting and the second prediction. Formulas (5)~(6) are the 2 scaling steps, the high-frequency scaling and the low-frequency scaling respectively.
$$\frac{1}{\alpha} \times y(2n+1) = \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \tag{1}$$
$$\frac{1}{\beta} \times y(2n) = \frac{1}{\beta} \times x(2n) + y(2n-1) + y(2n+1) \tag{2}$$
$$\frac{1}{\gamma} \times H(2n+1) = \frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2) \tag{3}$$
$$\frac{1}{\delta} \times L(2n) = \frac{1}{\delta} \times y(2n) + H(2n-1) + H(2n+1) \tag{4}$$
$$H^{\circ}(2n+1) = e \times H(2n+1) \tag{5}$$
$$L^{\circ}(2n) = f \times L(2n) \tag{6}$$
The lifting formulas above show that the lifting algorithm has low complexity and is well suited to hardware implementation. However, most hardware architectures based on the traditional lifting algorithm suffer from low clock frequency and low hardware efficiency, which has hindered their use. The invention therefore further optimizes the original lifting algorithm by regrouping its terms so that the data are combined more tightly, with the design goal of achieving a critical-path delay of a single multiplier with fewer pipeline stages on the basis of parallel scanning. The formulas of the invention are derived as follows:
Substituting formula (1) into formula (2) and regrouping the terms by equivalent transformation gives:
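$$\begin{aligned}
\frac{1}{\alpha\beta} \times y(2n) &= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times y(2n-1) + \frac{1}{\alpha} \times y(2n+1) \\
&= \frac{1}{\alpha\beta} \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \\
&\quad + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) + x(2n) \\
&= \left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2) \\
&\quad + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \\
&= \left[\left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2)\right] + \left[\frac{1}{\alpha} \times x(2n+1) + x(2n) + x(2n+2)\right]
\end{aligned} \tag{7}$$
Similarly, we can obtain:
$$\begin{aligned}
\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = \frac{1}{\alpha\beta} \Big\{ &\left[\left(\frac{1}{\delta\gamma} + 1\right) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2)\right] \\
&+ \left[\frac{1}{\gamma} \times y(2n+1) + y(2n) + y(2n+2)\right] \Big\}
\end{aligned} \tag{8}$$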
The four intermediate variables $D_1^k(n)$, $D_2^k(n)$, $D_3^k(n)$, $D_4^k(n)$ are defined as shown in formulas (9)~(12), where k is the index of the row currently being computed for the row transform and the index of the current scan for the column transform; in the column transform, one scan is defined as the parallel scanning of two adjacent rows.
$$D_1^k(n) = \frac{1}{\alpha} \times x(2n+1) + x(2n) \tag{9}$$
$$D_2^k(n) = \left(\frac{1}{\alpha\beta} + 1\right) \times x(2n) + \frac{1}{\alpha} \times x(2n-1) + x(2n-2) \tag{10}$$
$$D_3^k(n) = \frac{1}{\gamma} \times y(2n+1) + y(2n) \tag{11}$$
$$D_4^k(n) = \left(\frac{1}{\delta\gamma} + 1\right) \times y(2n) + \frac{1}{\gamma} \times y(2n-1) + y(2n-2) \tag{12}$$
Substituting the variables $D_1^k(n)$, $D_2^k(n)$, $D_3^k(n)$, $D_4^k(n)$ into formulas (1), (7), (3) and (8) gives the improved algorithm used by the invention:
$$\frac{1}{\alpha} \times y(2n+1) = D_1^k(n) + x(2n+2) \tag{13}$$
$$\frac{1}{\alpha\beta} \times y(2n) = D_2^k(n) + D_1^k(n) + x(2n+2) \tag{14}$$
$$\frac{1}{\gamma} \times H(2n+1) = D_3^k(n) + y(2n+2) \tag{15}$$
$$\frac{1}{\alpha\beta\delta\gamma} \times L(2n) = D_4^k(n) + D_3^k(n) + y(2n+2) \tag{16}$$
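The second lifting level can be checked in the same spirit. The sketch below works with unscaled y values, which is an assumption made for readability; the common factor 1/(αβ) that appears on both sides of formula (8) is therefore left out, and the function names are hypothetical.

```python
GAMMA = 0.882911075
DELTA = 0.443506852


def second_level_direct(y, n):
    """H(2n+1) and L(2n) from formulas (3) and (4); requires n >= 1."""
    def high(m):
        return y[2 * m + 1] + GAMMA * (y[2 * m] + y[2 * m + 2])
    low = y[2 * n] + DELTA * (high(n - 1) + high(n))
    return high(n), low


def second_level_regrouped(y, n):
    """The same values through D3 and D4 (formulas (11) and (12))."""
    d3 = y[2 * n + 1] / GAMMA + y[2 * n]
    d4 = ((1 / (DELTA * GAMMA) + 1) * y[2 * n]
          + y[2 * n - 1] / GAMMA + y[2 * n - 2])
    high_scaled = d3 + y[2 * n + 2]        # formula (15): H(2n+1)/gamma
    low_scaled = d4 + d3 + y[2 * n + 2]    # braces of formula (8): L(2n)/(gamma*delta)
    return GAMMA * high_scaled, GAMMA * DELTA * low_scaled
```

The two functions return the same pair of values for interior n, which mirrors the symmetry between the first-level pair (13), (14) and the second-level pair (15), (16).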
The device of the invention comprises:
A column coding unit, for the column-filtering part of the 2-D DWT.
A row coding unit, for the row-filtering part of the 2-D DWT.
A transposition buffer unit, for buffering data between the column filter and the row filter and reordering the data flow;
The invention adopts an overall dual-input dual-output structure, as shown in Fig. 2. After the coding device starts working, it first reads in the raw data. The data enter the column coding unit for the column transform; the coding unit has two inputs and two outputs and reads and processes two data items per clock cycle. After the column coding has completed a certain number of rows, the results are output to the transposition buffer unit. After transposition the data flow meets the requirements of row coding, and the data enter the row coding unit to begin the row transform. After gain calculation, the output of the row filter forms the four subband signals LL, LH, HL and HH; rearranging the four subband components completes one level of 2-D DWT decomposition.
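The overall data flow (column transform first, then row transform, then four subbands) can be modelled in software as below. This is a functional sketch only, not a model of the pipeline timing: the mirrored borders, the omission of the gain step and the labelling of the two mixed subbands are assumptions, and all function names are hypothetical.

```python
ALPHA, BETA, GAMMA, DELTA = -1.586134342, -0.052980118, 0.882911075, 0.443506852


def lift_1d(x):
    """Formulas (1)-(4) on one line of samples; mirrored borders are assumed."""
    n = len(x)
    s = list(x)

    def mirror(i):
        return -i if i < 0 else (2 * (n - 1) - i if i >= n else i)

    for start, c in ((1, ALPHA), (0, BETA), (1, GAMMA), (0, DELTA)):
        for i in range(start, n, 2):
            s[i] += c * (s[mirror(i - 1)] + s[mirror(i + 1)])
    return s[0::2], s[1::2]              # (low, high)


def transpose(m):
    return [list(r) for r in zip(*m)]


def dwt2_level(img):
    """One decomposition level: column transform, then row transform."""
    col_low, col_high = zip(*(lift_1d(c) for c in transpose(img)))
    v_low, v_high = transpose(col_low), transpose(col_high)   # back to rows

    def row_pass(block):
        lo, hi = zip(*(lift_1d(r) for r in block))
        return [list(r) for r in lo], [list(r) for r in hi]

    ll, hl = row_pass(v_low)     # row-direction low / high of the column-low band
    lh, hh = row_pass(v_high)    # row-direction low / high of the column-high band
    return ll, lh, hl, hh
```

For an N*N input each returned subband has N/2 rows and N/2 columns. A constant image ends up almost entirely in LL, which is a quick sanity check on the filter coefficients.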
Fig. 3 shows the architecture of the column coding unit of the invention. The rectangles labelled D are registers, and the circles labelled x and + are multipliers and adders respectively. To reduce the transposition buffer between the row and column stages and raise the computation speed, a dual-input dual-output structure is used. The whole coding unit is a 5-stage pipeline. Once the column coding unit starts working, data enter the column filter row by row, two per clock cycle: the odd sample x(2n+1) and the even sample x(2n). Stage 1 multiplies the data read in; in the next cycle the products take part in the stage-2 calculation of the intermediate variables $D_1^k(n)$ and $D_2^k(n)$, and the newly obtained $D_1^k(n)$ and $D_2^k(n)$ are written to sram1 and sram2. At the same time, when k>1, the required intermediate variables $D_1^{k-1}(n)$ and $D_2^{k-1}(n)$ are read from sram1 and sram2 to calculate y(2n-1)/α and y(2n-2)/αβ.
For the column transform, because a row-based parallel transfer mode is used, the intermediate variables $D_1^{k-1}(n)$ and $D_2^{k-1}(n)$ that are read are those computed and stored at the corresponding column position for the previous two rows. That is, the intermediate variables stored for the previous two rows are held for N cycles and then read out for the column transform of the two newly read rows, while the newly produced $D_1^k(n)$ and $D_2^k(n)$ are written to sram1 and sram2, overwriting the contents at the addresses of $D_1^{k-1}(n)$ and $D_2^{k-1}(n)$. sram1 and sram2 are two on-chip dual-port RAMs of size N that store the intermediate variables produced during computation; N is the side length of the N*N image.
Comparing formulas (13) to (16) shows that formulas (13) and (15), and likewise formulas (14) and (16), have symmetric structures. They differ in two respects: (1) the multiplier coefficients are different; (2) the raw image data x(n) in formulas (13) and (14) are replaced by the first-stage results y(n) in formulas (15) and (16). In the hardware circuit of Fig. 3 the front and rear circuits are therefore symmetric: with the two registers of the third row in the figure as the boundary, the front half computes formulas (13) and (14) and the rear half computes formulas (15) and (16). The rear half accordingly requires different multiplier coefficients in stage 3, where it receives y(2n-1)/α and y(2n-2)/αβ from stage 2 and performs the corresponding multiplications. The remaining coding work then proceeds in the same way as in the front circuit: stage 4 computes formulas (15) and (16) and at the same time updates the corresponding intermediate variables $D_3^{k-1}(n)$ and $D_4^{k-1}(n)$, and stage 5 outputs the final high-frequency and low-frequency results.
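The role of sram1 and sram2 in the front half of the column coding unit can be illustrated with a small behavioural model. It is a sketch under stated assumptions: the buffers start at zero (equivalent to zero extension above the image), pipeline timing is ignored, the last scan is not flushed, and the function name is hypothetical. The update of sram2 uses the identity $D_2^k = x(2k)/(\alpha\beta) + y(2k-1)/\alpha$, which follows from formulas (9), (10) and (13).

```python
ALPHA = -1.586134342
BETA = -0.052980118


def column_stage_stream(image):
    """First lifting level of the column transform, two rows per scan.

    sram1/sram2 model the two size-N buffers that hold D1 and D2 of the
    previous scan, one entry per column (zero initial contents assumed).
    Yields (k, y_odd, y_even): the rows y(2k-1) and y(2k-2) of every column."""
    width = len(image[0])
    sram1 = [0.0] * width
    sram2 = [0.0] * width
    for k in range(len(image) // 2):
        even_row, odd_row = image[2 * k], image[2 * k + 1]
        y_odd, y_even = [], []
        for c in range(width):
            x_even, x_odd = even_row[c], odd_row[c]
            # Formula (13) for scan k-1: y(2k-1)/alpha = D1^{k-1} + x(2k)
            y_odd_scaled = sram1[c] + x_even
            # Formula (14) for scan k-1: y(2k-2)/(alpha*beta)
            y_even_scaled = sram2[c] + sram1[c] + x_even
            # The new intermediates overwrite the old ones at the same address.
            sram1[c] = x_odd / ALPHA + x_even                    # formula (9)
            sram2[c] = x_even / (ALPHA * BETA) + y_odd_scaled    # formula (10), regrouped
            y_odd.append(ALPHA * y_odd_scaled)
            y_even.append(ALPHA * BETA * y_even_scaled)
        if k > 0:
            yield k, y_odd, y_even
```

Each column position is visited once per scan, so a value written to a buffer at scan k-1 is read again one full row later, which matches the N-cycle retention described above.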
A column coding unit of this design lets the column filter perform parallel scanning while limiting the internal buffer to 4N. The number of registers needed for the lifting calculation is kept to 12. This design not only lowers the hardware complexity; the logic between any two registers is at most one adder or one multiplier, so the critical-path delay is one Tm and the circuit can run at a higher clock frequency.
Fig. 4 shows the transposition buffer unit. Because the column transform uses a dual-input dual-output parallel filter circuit, the problem of an oversized transposition buffer in traditional line-based DWT architectures is avoided: a low-cost transposition module can replace the 1.5N transposition buffer of typical implementations. The transposition buffer unit of the invention uses 3 registers and two MUXes to convert the output data flow into the data order required by row coding, so that the row and column coding circuits can work in parallel. Fig. 5 shows the data arrangement before and after transposition.
Fig. 6 shows the architecture of the row coding unit. Its circuit is essentially the same as that of the column coding unit: it is built on a 5-stage pipeline, and the front and rear halves are symmetric. If the two data streams read in by the dual-input dual-output architecture are treated as a high-frequency row and a low-frequency row, the scan order is in essence a Z-shaped scan between the high-frequency row and the low-frequency row: data are read alternately from the low-frequency row and the high-frequency row, two adjacent data of one row per cycle, and after the last data of the high-frequency row have been read, the scan moves to the next low-frequency row to read two new rows. The only difference between the row coding unit and the column coding unit is therefore how the intermediate variables $D_1^k(n)$, $D_2^k(n)$, $D_3^k(n)$, $D_4^k(n)$ are stored and read in pipeline stages 2 and 4.
The row coding unit does not need the 4N intermediate buffer of the column coding unit; two buffers of 2 registers each are enough to store the intermediate variables. The storage scheme differs slightly from that of the column transform: the input data are multiplied in stage 1, and stage 2 calculates the intermediate variables $D_1^k(n+1)$ and $D_2^k(n+1)$ and writes them to the buffer, while the intermediate variables $D_1^k(n)$ and $D_2^k(n)$ at the other address of the buffer are read to calculate y(2n+1)/α and y(2n)/αβ. In addition, because the row transform reads two rows alternately, the intermediate variables need not be held for N cycles; the buffers allow the intermediate variables of the high-frequency row and the low-frequency row to be read alternately, so that the calculations for the two rows are completely independent and do not interfere with each other. Stage 3 multiplies the front-half outputs y(2n+1)/α and y(2n)/αβ and passes them to stage 4, which computes formulas (15) and (16) and updates the intermediate variables $D_3^k(n+1)$ and $D_4^k(n+1)$. The final high-frequency and low-frequency components H and L are output in stage 5.
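The Z-shaped scan and the two-register buffers can be sketched as follows. The exact read order, the one-slot-per-row organisation of the buffers and their zero initial contents are assumptions of this sketch, since the figures are not reproduced here; the function names are hypothetical and only the first lifting level is modelled.

```python
ALPHA = -1.586134342
BETA = -0.052980118


def z_scan(low_row, high_row):
    """Assumed read order: one pair from the low row, then one from the high row."""
    for j in range(0, len(low_row), 2):
        yield 0, low_row[j], low_row[j + 1]
        yield 1, high_row[j], high_row[j + 1]


def row_stage_stream(low_row, high_row):
    """First lifting level along two interleaved rows.

    buffer1/buffer2 hold D1 and D2 of the previous pair, one slot per row,
    so two registers each are enough. Returns, per row, a list whose entry n
    is the tuple (y(2n-2), y(2n-1)); entry 0 depends on the assumed zero start."""
    buffer1 = [0.0, 0.0]
    buffer2 = [0.0, 0.0]
    out = ([], [])
    for slot, x_even, x_odd in z_scan(low_row, high_row):
        y_odd_scaled = buffer1[slot] + x_even                     # formula (13)
        y_even_scaled = buffer2[slot] + buffer1[slot] + x_even    # formula (14)
        buffer1[slot] = x_odd / ALPHA + x_even                    # formula (9)
        buffer2[slot] = x_even / (ALPHA * BETA) + y_odd_scaled    # formula (10), regrouped
        out[slot].append((ALPHA * BETA * y_even_scaled, ALPHA * y_odd_scaled))
    return out
```

Because consecutive samples of the same row arrive only two reads apart, an intermediate value is needed again almost immediately, which is why this stage needs no N-cycle buffer.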
In summary, the invention first proposes an improved 9/7 DWT lifting algorithm and designs an encoder on that basis; the encoder implements the DWT coding circuit with parallel scanning. For the 2-D architecture, the dual-input dual-output parallel scan limits the internal buffer to as little as 4N, and a transposition module of 3 registers and 2 MUXes replaces the 1.5N transposition buffer. Each coding unit achieves a critical-path delay of one multiplier with a small register overhead, and the computation time for a whole N*N frame is $N^2/2$. The invention therefore has low hardware complexity and high processing speed, and is a highly efficient DWT coding device.
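To make the quoted figures concrete, the short sketch below evaluates them for a given image size. The function name and dictionary keys are hypothetical, and the time figure is interpreted as a count of clock cycles on the basis that two samples are read per cycle.

```python
def encoder_budget(n):
    """Resource and time figures quoted in the text for an n x n image."""
    return {
        "cycles_per_frame": n * n // 2,        # two samples read per clock cycle
        "internal_buffer_words": 4 * n,        # four size-N dual-port RAMs
        "transpose_registers": 3,              # plus 2 MUXes
        "replaced_transpose_words": 1.5 * n,   # buffer of typical implementations
    }
```

For a 512*512 image this gives 131072 cycles per frame and 2048 words of internal buffer, against a transposition buffer of 768 words that the 3-register module replaces.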

Claims (4)

1. An efficient lifting-based DWT encoding method, characterized by the following steps: formulas (1) to (6) are the lifting-step formulas of the 9/7 DWT and consist of two parts, four lifting steps and two scaling steps. Formulas (1) to (4) are the four lifting steps, corresponding in order to the first prediction, the first update, the second prediction, and the second update; formulas (5) and (6) are the two scaling steps, the high-frequency scaling and the low-frequency scaling respectively:
$$\frac{1}{\alpha}\, y(2n+1) = \frac{1}{\alpha}\, x(2n+1) + x(2n) + x(2n+2) \tag{1}$$
$$\frac{1}{\beta}\, y(2n) = \frac{1}{\beta}\, x(2n) + y(2n-1) + y(2n+1) \tag{2}$$
$$\frac{1}{\gamma}\, H(2n+1) = \frac{1}{\gamma}\, y(2n+1) + y(2n) + y(2n+2) \tag{3}$$
$$\frac{1}{\delta}\, L(2n) = \frac{1}{\delta}\, y(2n) + H(2n-1) + H(2n+1) \tag{4}$$
$$H^{\circ}(2n+1) = e \times H(2n+1) \tag{5}$$
$$L^{\circ}(2n) = f \times L(2n) \tag{6}$$
where n is an integer greater than or equal to 0, x(n) is the input original image value, y(n) is the first-stage result of the algorithm, and H°(2n+1) and L°(2n) are the high-frequency and low-frequency components finally produced by the decomposition. The coefficient values are: α = -1.586134342, β = -0.052980118, γ = 0.882911075, δ = 0.443506852, K = 1.230174105.
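The lifting and scaling steps of formulas (1) to (6) can be sketched in software as follows. This is a minimal illustration, not the claimed circuit: the function names `lift_97` and `mirror` are hypothetical, symmetric extension at the signal boundaries is an assumption (the formulas do not specify boundary handling), and the scaling factors e and f are left as parameters because their values are not given in this passage.

```python
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852


def mirror(i, length):
    """Whole-sample symmetric extension of an index (assumed boundary rule)."""
    if i < 0:
        return -i
    if i >= length:
        return 2 * (length - 1) - i
    return i


def lift_97(x, e=1.0, f=1.0):
    """Apply formulas (1)-(6) to an even-length sequence; returns (low, high)."""
    n_samples = len(x)

    def at(seq, i):
        return seq[mirror(i, n_samples)]

    y = list(x)
    # (1): y(2n+1) = x(2n+1) + alpha * (x(2n) + x(2n+2))
    for i in range(1, n_samples, 2):
        y[i] = x[i] + ALPHA * (at(x, i - 1) + at(x, i + 1))
    # (2): y(2n) = x(2n) + beta * (y(2n-1) + y(2n+1))
    for i in range(0, n_samples, 2):
        y[i] = x[i] + BETA * (at(y, i - 1) + at(y, i + 1))
    out = list(y)
    # (3): H(2n+1) = y(2n+1) + gamma * (y(2n) + y(2n+2))
    for i in range(1, n_samples, 2):
        out[i] = y[i] + GAMMA * (at(y, i - 1) + at(y, i + 1))
    # (4): L(2n) = y(2n) + delta * (H(2n-1) + H(2n+1))
    for i in range(0, n_samples, 2):
        out[i] = y[i] + DELTA * (at(out, i - 1) + at(out, i + 1))
    # (5), (6): scaling with caller-supplied factors e and f
    high = [e * out[i] for i in range(1, n_samples, 2)]
    low = [f * out[i] for i in range(0, n_samples, 2)]
    return low, high
```

With e = f = 1 and a constant input such as `lift_97([1.0] * 8)`, the high-frequency outputs come out very close to zero and the low-frequency outputs close to 1.23, which gives a quick sanity check on the coefficient values listed above.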
Substituting formula (1) into formula (2) and regrouping the terms by equivalent transformation gives:
$$\begin{aligned}
\frac{1}{\alpha\beta}\, y(2n) &= \frac{1}{\alpha\beta}\, x(2n) + \frac{1}{\alpha}\, y(2n-1) + \frac{1}{\alpha}\, y(2n+1) \\
&= \frac{1}{\alpha\beta}\, x(2n) + \frac{1}{\alpha}\, x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha}\, x(2n-1) + x(2n-2) + x(2n) \\
&= \left(\frac{1}{\alpha\beta} + 1\right) x(2n) + \frac{1}{\alpha}\, x(2n+1) + x(2n) + x(2n+2) + \frac{1}{\alpha}\, x(2n-1) + x(2n-2) \\
&= \left[\left(\frac{1}{\alpha\beta} + 1\right) x(2n) + \frac{1}{\alpha}\, x(2n-1) + x(2n-2)\right] + \left[\frac{1}{\alpha}\, x(2n+1) + x(2n) + x(2n+2)\right]
\end{aligned} \tag{7}$$
Similarly:
$$\begin{aligned}
\frac{1}{\alpha\beta\delta\gamma}\, L(2n) = \frac{1}{\alpha\beta} \Big\{ &\left[\left(\frac{1}{\delta\gamma} + 1\right) y(2n) + \frac{1}{\gamma}\, y(2n-1) + y(2n-2)\right] \\
+ &\left[\frac{1}{\gamma}\, y(2n+1) + y(2n) + y(2n+2)\right] \Big\}
\end{aligned} \tag{8}$$
Four intermediate variables D_1^k(n), D_2^k(n), D_3^k(n), and D_4^k(n) are defined as shown in formulas (9) to (12), where k is the index of the row currently being computed in the row transform and the current scan number in the column transform; in the column transform, completing the parallel scan of two adjacent rows counts as one scan:
$$D_1^k(n) = \frac{1}{\alpha}\, x(2n+1) + x(2n) \tag{9}$$
$$D_2^k(n) = \left(\frac{1}{\alpha\beta} + 1\right) x(2n) + \frac{1}{\alpha}\, x(2n-1) + x(2n-2) \tag{10}$$
$$D_3^k(n) = \frac{1}{\gamma}\, y(2n+1) + y(2n) \tag{11}$$
$$D_4^k(n) = \left(\frac{1}{\delta\gamma} + 1\right) y(2n) + \frac{1}{\gamma}\, y(2n-1) + y(2n-2) \tag{12}$$
Substituting the variables D_1^k(n), D_2^k(n), D_3^k(n), and D_4^k(n) into formulas (1), (7), (3), and (8) gives the improved algorithm:
$$\frac{1}{\alpha}\, y(2n+1) = D_1^k(n) + x(2n+2) \tag{13}$$
$$\frac{1}{\alpha\beta}\, y(2n) = D_2^k(n) + D_1^k(n) + x(2n+2) \tag{14}$$
$$\frac{1}{\gamma}\, H(2n+1) = D_3^k(n) + y(2n+2) \tag{15}$$
$$\frac{1}{\alpha\beta\delta\gamma}\, L(2n) = D_4^k(n) + D_3^k(n) + y(2n+2). \tag{16}$$
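The following sketch checks, for the first half of the algorithm only, that the intermediate-variable form of formulas (9), (10), (13), and (14) reproduces the scaled results of formulas (1) and (2). It is a software model under stated assumptions, not the claimed hardware: the function names are hypothetical, only interior samples are compared so that no boundary rule has to be assumed, and D_2 is updated through the identity D_2(n) = (1/(αβ) + 1)·x(2n) + D_1(n-1), which follows from formulas (9) and (10) but is not stated in the claim.

```python
ALPHA = -1.586134342
BETA = -0.052980118


def direct_first_half(x):
    """y(2n+1)/alpha and y(2n)/(alpha*beta) from formulas (1) and (2), interior only."""
    y_odd = {i: x[i] + ALPHA * (x[i - 1] + x[i + 1])
             for i in range(1, len(x) - 1, 2)}
    scaled = {i: v / ALPHA for i, v in y_odd.items()}
    for i in range(2, len(x), 2):
        if i - 1 in y_odd and i + 1 in y_odd:
            y_even = x[i] + BETA * (y_odd[i - 1] + y_odd[i + 1])
            scaled[i] = y_even / (ALPHA * BETA)
    return scaled


def stream_first_half(x):
    """Same outputs, computed pair by pair with stored D1 and D2 values."""
    c = 1.0 / (ALPHA * BETA) + 1.0
    scaled = {}
    d1 = None  # holds D1(n-1) once one pair has been consumed
    d2 = None  # holds D2(n-1) once two pairs have been consumed
    for n in range(len(x) // 2):
        even, odd = x[2 * n], x[2 * n + 1]
        if d1 is not None:
            scaled[2 * n - 1] = d1 + even            # formula (13) at index n-1
            if d2 is not None:
                scaled[2 * n - 2] = d2 + d1 + even   # formula (14) at index n-1
            d2 = c * even + d1                       # formula (10) via D1(n-1)
        d1 = odd / ALPHA + even                      # formula (9)
    return scaled
```

Calling both functions on the same even-length sequence and comparing the returned dictionaries key by key (with a small floating-point tolerance) shows that each new input pair needs only the two stored values, which is the property the pipelined architecture relies on.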
2. An efficient lifting-based DWT encoder, characterized in that its structure comprises:
a column coding unit, for the column filtering part of the 2-D DWT;
a row coding unit, for the row filtering part of the 2-D DWT;
a transposition buffer unit, for buffering the data between the column filter and the row filter and adjusting the data flow structure;
Data enter the column coding unit for the column transform; the coding unit is dual-input dual-output and reads in and processes two data per clock cycle. After column coding has completed a certain number of rows, the results are output to the transposition buffer unit. After transposition, the data flow meets the requirements of row coding, and the data enter the row coding unit to begin the row transform. After gain calculation, the data output by the row filter form the four subband signals LL, LH, HL, and HH, and rearranging the four subband components completes one level of 2-D DWT decomposition.
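The data flow of claim 2 (column transform, transposition, row transform, four subbands) can be sketched in software as follows. This is an illustration under several assumptions, not the claimed hardware: the names `lift_1d`, `transpose`, and `dwt2_one_level` are hypothetical, boundaries use symmetric extension, the gain calculation is omitted so the subbands are unscaled, the subband labels take the first letter from the column filter and the second from the row filter, and a full in-memory transpose stands in for the transposition buffer unit.

```python
COEFFS = (-1.586134342, -0.052980118, 0.882911075, 0.443506852)


def lift_1d(x):
    """Unscaled lifting steps (1)-(4) on an even-length sequence: (low, high)."""
    s = list(x)
    n = len(s)
    for step, coeff in enumerate(COEFFS):
        start = 1 if step % 2 == 0 else 0   # odd samples first, then even, alternately
        for i in range(start, n, 2):
            left = s[i - 1] if i > 0 else s[1]           # symmetric extension (assumed)
            right = s[i + 1] if i + 1 < n else s[n - 2]
            s[i] += coeff * (left + right)
    return s[0::2], s[1::2]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def dwt2_one_level(image):
    """One level of 2-D decomposition: columns first, then rows."""
    # Column transform: every column is filtered as a 1-D signal.
    col_pairs = [lift_1d(col) for col in transpose(image)]
    col_low = transpose([pair[0] for pair in col_pairs])    # back to row order
    col_high = transpose([pair[1] for pair in col_pairs])

    def row_pass(block):
        pairs = [lift_1d(row) for row in block]
        return [p[0] for p in pairs], [p[1] for p in pairs]

    # Row transform on each half produces the four subbands.
    ll, lh = row_pass(col_low)
    hl, hh = row_pass(col_high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}
```

For an 8×8 input, each returned subband is 4×4. On a constant image the LH, HL, and HH subbands come out very close to zero, which is a convenient check that the column and row passes are wired together correctly.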
3. The efficient lifting-based DWT encoder as claimed in claim 2, characterized in that the column coding unit consists of registers, multipliers, adders, and four dual-port RAMs of size N, namely sram1, sram2, sram3, and sram4, which store the intermediate variables produced during computation, where N is the dimension of the N×N image data. The whole column coding unit is a 5-stage pipeline, each stage consisting of a register and the adders or multipliers that perform the corresponding operation. When the column coding unit operates, data enter the column filter along the row direction, two data per clock cycle: the odd term x(2n+1) and the even term x(2n). Pipeline stage 1 then multiplies the data read in, and in the next cycle the results take part in computing the intermediate variables D_1^k(n) and D_2^k(n) in stage 2; the newly obtained D_1^k(n) and D_2^k(n) are written to sram1 and sram2, and at the same time the required intermediate variables D_1^{k-1}(n) and D_2^{k-1}(n) are read from sram1 and sram2 to compute y(2n-1)/α and y(2n-2)/αβ. The intermediate variables D_1^{k-1}(n) and D_2^{k-1}(n) stored for the previous two rows are held for N cycles and then read for the column transform of the two newly read rows, while the newly produced D_1^k(n) and D_2^k(n) are written to sram1 and sram2, overwriting the addresses that held D_1^{k-1}(n) and D_2^{k-1}(n). With the two registers of pipeline stage 3 as the boundary, the first-half circuit computes formulas (13) and (14) and the second-half circuit computes formulas (15) and (16). For the second-half circuit the coefficients of the multipliers are changed: stage 3 receives y(2n-1)/α and y(2n-2)/αβ computed by stage 2 and performs the corresponding multiplications. Stage 4 computes formulas (15) and (16) while updating the corresponding intermediate variables D_3^{k-1}(n) and D_4^{k-1}(n), which are stored in sram3 and sram4 respectively; the final high- and low-frequency results are then output in stage 5.
4. The efficient lifting-based DWT encoder as claimed in claim 2, characterized in that the row coding unit consists of row registers, row multipliers, row adders, and four further register buffers, buffer1, buffer2, buffer3, and buffer4. The whole row coding unit is a 5-stage pipeline, each stage consisting of a row register and the row adders or row multipliers that perform the corresponding operation. The input data are multiplied in pipeline stage 1; stage 2 computes the intermediate variables D_1^k(n+1) and D_2^k(n+1) and writes them to buffer1 and buffer2, while reading the intermediate variables D_1^k(n) and D_2^k(n) from the other address of buffer1 and buffer2 to compute y(2n+1)/α and y(2n)/αβ. The buffers are read alternately for the intermediate variables of the high-frequency row and the low-frequency row, so that the two rows are computed independently. Stage 3 multiplies the outputs y(2n+1)/α and y(2n)/αβ of the first-half circuit and passes them to stage 4, which computes formulas (15) and (16) and updates the intermediate variables D_3^k(n+1) and D_4^k(n+1). The final high- and low-frequency components H and L are output in stage 5.
CN2012105070637A 2012-11-29 2012-11-29 Efficient discrete wavelet transform (DWT) encoding method and encoder based on promotion Pending CN103067023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105070637A CN103067023A (en) 2012-11-29 2012-11-29 Efficient discrete wavelet transform (DWT) encoding method and encoder based on promotion

Publications (1)

Publication Number Publication Date
CN103067023A true CN103067023A (en) 2013-04-24

Family

ID=48109506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012105070637A Pending CN103067023A (en) 2012-11-29 2012-11-29 Efficient discrete wavelet transform (DWT) encoding method and encoder based on promotion

Country Status (1)

Country Link
CN (1) CN103067023A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101534439A (en) * 2008-03-13 2009-09-16 中国科学院声学研究所 Low power consumption parallel wavelet transforming VLSI structure
CN102572429A (en) * 2011-12-29 2012-07-11 东南大学 Hardware framework for two-dimensional discrete wavelet transformation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Chao et al.: "An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform", Proceedings of the IEEE International Conference on Multimedia and Expo, 2007 *
Wei Zhang et al.: "An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems II: Express Briefs *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109379602A (en) * 2018-10-26 2019-02-22 西安科锐盛创新科技有限公司 Data access method and its system based on cloud storage
CN109379602B (en) * 2018-10-26 2021-02-26 上海麦克风文化传媒有限公司 Data access method and system based on cloud storage
CN110365990A (en) * 2019-06-21 2019-10-22 武汉玉航科技有限公司 Narrowband quasi-lossless video encoding system
CN112136128A (en) * 2019-08-30 2020-12-25 深圳市大疆创新科技有限公司 Data processing method and device
WO2021035715A1 (en) * 2019-08-30 2021-03-04 深圳市大疆创新科技有限公司 Data processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130424