TW497356B

TW497356B - Architecture suitable for executing two-dimensional discrete wavelet transform

Info

Publication number: TW497356B
Application number: TW89102008A
Authority: TW
Inventors: Liang-Gee Chen; Po-Cheng Wu; Yuan-Chen Liu; Yeong-Kang Lai
Original assignee: Nat Science Council
Priority date: 2000-02-03
Filing date: 2000-02-03
Publication date: 2002-08-01

Abstract

The present invention provides an architecture for performing the two-dimensional discrete wavelet transform (2-D DWT). It is used in image compression systems to perform multiple levels of decomposition that split an original image into a plurality of frequency bands. The architecture comprises: a transform module that decomposes the input image data into four frequency bands, of which the band that is low-frequency in both the horizontal and vertical directions serves as the input image data for the next level of decomposition; a memory module that stores that low-frequency band; and a multiplexer that selects the image data to be fed into the transform module. Two techniques are used in the design of the transform module to increase its hardware efficiency. The first is polyphase decomposition, applied to the first-stage horizontal filtering, which splits the filter coefficients into an odd-numbered part and an even-numbered part. The second is coefficient folding, applied to the second-stage vertical filtering, which lets every two filter coefficients share one set of multiplier, adder and register. The resulting 2-D DWT architecture has 100% hardware utilization, a short computation time, a regular data flow and low control complexity.

Description

V. Description of the Invention

The present invention relates to the two-dimensional discrete wavelet transform (2-D DWT), and in particular to a hardware architecture suitable for performing it.

A wide variety of microprocessors capable of processing images and sound in real time have been developed alongside the rapid progress of VLSI technology. The 2-D DWT has become a core technique of new-generation image processing systems: new image compression standards such as JPEG-2000 and MPEG-4 both use it for image compression and processing. Its applications are not limited to image compression; it is also of great value in audio processing, computer graphics, numerical analysis, radar target recognition and other fields. Architectures that perform the 2-D DWT consist mainly of multirate filters. Because the amount of data to be processed is very large, such an architecture plays a pivotal role in practical applications such as digital cameras. The 2-D DWT implemented with separable FIR filters is given by:

$$X_{LL}^{j}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} g(i_1)\, g(i_2)\, X_{LL}^{j-1}(2n_1 - i_1,\ 2n_2 - i_2) \tag{1}$$

$$X_{LH}^{j}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} g(i_1)\, h(i_2)\, X_{LL}^{j-1}(2n_1 - i_1,\ 2n_2 - i_2) \tag{2}$$

$$X_{HL}^{j}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} h(i_1)\, g(i_2)\, X_{LL}^{j-1}(2n_1 - i_1,\ 2n_2 - i_2) \tag{3}$$

$$X_{HH}^{j}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} h(i_1)\, h(i_2)\, X_{LL}^{j-1}(2n_1 - i_1,\ 2n_2 - i_2) \tag{4}$$

where j = 1, ..., J, J is the number of decomposition levels of the 2-D DWT, K is the filter length, and g(n) and h(n) are the impulse responses of the low-pass filter G(z) and the high-pass filter H(z), respectively. $X_{LL}^{0}(n_1, n_2)$ denotes the input image. Figure 1 is a schematic of a three-level 2-D DWT.

In Figure 1, each level comprises two stages: the first stage performs horizontal filtering and the second performs vertical filtering. At the first level, the input image is N × N and the outputs are the four sub-bands LL, LH, HL and HH, each N/2 × N/2. At the second level, the input is the LL band and the outputs are the four sub-bands LLLL, LLLH, LLHL and LLHH, each N/4 × N/4. At the third level, the input is the LLLL band and the outputs are the four sub-bands (LL)²LL, (LL)²LH, (LL)²HL and (LL)²HH, each N/8 × N/8. Further levels follow in the same way.
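As an illustration of one decomposition level, the following minimal sketch evaluates the four separable double sums directly for a small square image. It is not the patented hardware: the function name `dwt2_level`, the wrap-around treatment of samples outside the image, and the Haar example filters are all assumptions made for the example.

```python
def dwt2_level(x, g, h):
    """One level of a separable 2-D DWT, written as the direct double sum.

    x is a square list of lists, g and h are the low-pass and high-pass
    impulse responses. Samples outside the image wrap around, which is an
    assumption; the source text does not state a boundary rule.
    """
    n = len(x)
    half = n // 2

    def band(f1, f2):
        # f1 filters along the first index, f2 along the second index.
        out = [[0.0] * half for _ in range(half)]
        for n1 in range(half):
            for n2 in range(half):
                acc = 0.0
                for i1, c1 in enumerate(f1):
                    for i2, c2 in enumerate(f2):
                        acc += c1 * c2 * x[(2 * n1 - i1) % n][(2 * n2 - i2) % n]
                out[n1][n2] = acc
        return out

    return {"LL": band(g, g), "LH": band(g, h),
            "HL": band(h, g), "HH": band(h, h)}


# Haar filters, chosen only because they are short (hypothetical example).
g = [0.5, 0.5]
h = [0.5, -0.5]
image = [[float(r + c) for c in range(8)] for r in range(8)]
bands = dwt2_level(image, g, h)
for name, band in bands.items():
    print(name, len(band), "x", len(band[0]))
```

Each of the four outputs is N/2 × N/2, matching the sub-band sizes described for the first level; feeding the LL output back into the same function gives the next level.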

Among the architectures currently used to perform the 2-D DWT, the most common is the parallel filter architecture. Its design is based on the modified recursive pyramid algorithm (MRPA), which interleaves the computations of the second and all later levels into the computation of the first level. MRPA was originally applied to the one-dimensional discrete wavelet transform. Because of decimation, each level processes half as much data as the level before it, so the total amount of data to be processed is:

$$\sum_{j=1}^{J} \frac{N}{2^{j-1}} = N + \frac{N}{2} + \frac{N}{2^{2}} + \cdots + \frac{N}{2^{J-1}} \tag{5}$$

where J is the number of decomposition levels, N is the amount of data processed at the first level, N/2 is the amount processed at the second level, and so on, up to $N/2^{J-1}$ at level J. If J is large enough, (5) becomes:

$$2\left(1 - 2^{-J}\right)N \approx 2N = N + N \tag{6}$$

Thus the amount of data processed at the second and later levels (N) equals the amount processed at the first level (N), so the idle time slots in the first-level computation can be filled exactly. As Figure 2 shows, every time slot is then occupied and the hardware is fully utilized; MRPA is therefore well suited to the one-dimensional discrete wavelet transform. We have found, however, that MRPA is not suitable for the 2-D DWT. As Figure 3 shows, each level now processes only one quarter as much data as the level before it, so the total amount of data to be processed is:

$$N^{2} + \frac{N^{2}}{4} + \frac{N^{2}}{4^{2}} + \cdots + \frac{N^{2}}{4^{J-1}} \tag{7}$$

where J is the number of decomposition levels, $N^{2}$ is the amount of data processed at the first level, $N^{2}/4$ is the amount processed at the second level, and so on, up to $N^{2}/4^{J-1}$ at level J. When J is large enough, (7) becomes:

$$\frac{4}{3}\left(1 - 4^{-J}\right)N^{2} \approx \frac{4}{3}N^{2} = N^{2} + \frac{1}{3}N^{2} \tag{8}$$

Thus the amount of data processed at the second and later levels ($N^{2}/3$) is only one third of that processed at the first level ($N^{2}$). The idle time slots of the first level cannot be filled, so the hardware sits idle with low utilization, and complex control circuitry is needed to handle the data streams interleaved between levels.

Figure 4 is a schematic of a parallel filter architecture. It consists mainly of four filters, Hor 1, Hor 2, Ver 1 and Ver 2, and two transpose memories, Storage 1 and Storage 2, that perform row-column transposition. Hor 1 performs only the first-level horizontal filtering, Hor 2 performs the horizontal filtering of the second and all later levels, and Ver 1 and Ver 2 perform all the vertical filtering. Figure 5 shows how the work is distributed among the four filters when the parallel filter architecture of Figure 4 performs the 2-D DWT. From Figure 5 we can derive the individual and overall average hardware utilization of the four filters for different numbers of decomposition levels J:

$$\text{Hor 1}: \quad 1 \tag{9}$$

$$\text{Ver 1}: \quad \sum_{j=1}^{J} \frac{1}{2 \cdot 4^{j-1}} = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots + \frac{1}{2 \cdot 4^{J-1}} = \frac{2}{3}\left(1 - 4^{-J}\right) \tag{10}$$

$$\text{Ver 2}: \quad \sum_{j=1}^{J} \frac{1}{2 \cdot 4^{j-1}} = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots + \frac{1}{2 \cdot 4^{J-1}} = \frac{2}{3}\left(1 - 4^{-J}\right) \tag{11}$$

$$\text{Hor 2}: \quad \sum_{j=2}^{J} \frac{1}{4^{j-1}} = \frac{1}{4} + \frac{1}{16} + \cdots + \frac{1}{4^{J-1}} = \frac{1}{3}\left(1 - 4^{-(J-1)}\right) \tag{12}$$

$$\text{Average}: \quad \frac{1}{4}\left(\text{Hor 1} + \text{Ver 1} + \text{Ver 2} + \text{Hor 2}\right) = \frac{2}{3}\left(1 - 4^{-J}\right) \tag{13}$$
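The workload totals and the utilization figures above can be checked with exact rational arithmetic. The sketch below is only a numerical check under stated assumptions: the function names are hypothetical, and the per-filter series for Hor 1, Ver 1, Ver 2 and Hor 2 are encoded as read from the damaged source text, with time normalized to the period Hor 1 needs for the first level.

```python
from fractions import Fraction


def total_work(levels, ratio):
    """Work summed over all levels, in units of the first level's work.

    ratio is 2 for the 1-D transform and 4 for the 2-D transform.
    """
    return sum(Fraction(1, ratio ** (j - 1)) for j in range(1, levels + 1))


def parallel_filter_utilization(levels):
    """Return (Hor 1, Ver 1 or Ver 2, Hor 2, average) utilization."""
    hor1 = Fraction(1)
    ver = sum(Fraction(1, 2 * 4 ** (j - 1)) for j in range(1, levels + 1))
    # Hor 2 only works on levels 2..J, so it is idle when levels == 1.
    hor2 = sum(Fraction(1, 4 ** (j - 1)) for j in range(2, levels + 1))
    average = (hor1 + 2 * ver + hor2) / 4
    return hor1, ver, hor2, average


for J in range(1, 6):
    *_, avg = parallel_filter_utilization(J)
    # The average should match the closed form (2/3)(1 - 4^-J).
    assert avg == Fraction(2, 3) * (1 - Fraction(1, 4 ** J))
    print(J, float(total_work(J, 2)), float(total_work(J, 4)), f"{float(avg):.2%}")
```

In one dimension the later levels add up to almost as much work as the first level, whereas in two dimensions they add up to less than one third of it, which is why the interleaving leaves the 2-D hardware partly idle.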

Table 1 lists the hardware utilization of the parallel filter architecture for different numbers of decomposition levels. With a single level, its hardware utilization is only 50%; as the number of levels increases, the utilization converges to 66.67%. Hardware utilization is therefore low and the computation time long. This is the greatest drawback of the parallel filter architecture, and of the hardware architectures currently used to perform the 2-D DWT.

In view of these shortcomings of the conventional architectures, and after careful study and persistent effort, we have developed an architecture with 100% hardware utilization, a short computation time, a regular data flow and low control complexity. The invention is described below.

The object of the invention is to provide an architecture for performing the 2-D DWT that has 100% hardware utilization, a short computation time, a regular data flow and low control complexity.

According to the concept of the invention, an architecture for performing the two-dimensional discrete wavelet transform (2-D DWT) is used in an image compression system to perform multiple levels of decomposition that split original image data into a plurality of frequency bands. It comprises: a transform module that decomposes input image data into four frequency bands, of which the band that is low-frequency in both the horizontal and vertical directions serves as the input image data for the next decomposition; and a multiplexer that selects the image data to be fed into the transform module.

According to this concept, the architecture further comprises a memory module that stores the band that is low-frequency in both the horizontal and vertical directions.

According to this concept, the transform module comprises a first stage that performs horizontal filtering and a second stage that performs vertical filtering.

According to this concept, the coefficients of the first-stage filter are split by polyphase decomposition into an odd-numbered part and an even-numbered part.

According to this concept, the coefficients of the second-stage filter use coefficient folding, so that every two coefficients share one set of multiplier, adder and register.

According to this concept, the registers of the second stage are row registers, each comprising: a plurality of register blocks, whose number equals the number of decomposition levels of the architecture; a plurality of one-to-two (1 × 2) demultiplexers, each located between register blocks, which receives the output of one register block and routes it either to the input of the next register block or to the output of the row register; and a plurality of select signal lines, each electrically connected to one of the demultiplexers to select that demultiplexer's output.

According to this concept, the size of the memory module is one quarter of the size of the original image data.

Another aspect of the invention is an architecture for performing the two-dimensional discrete wavelet transform (2-D DWT), used in an image compression system to perform a single level of decomposition that splits original image data into four frequency bands. It comprises a transform module that decomposes the original image data into four frequency bands.

According to this aspect, the transform module comprises a first stage that performs horizontal filtering and a second stage that performs vertical filtering.

According to this aspect, the coefficients of the first-stage filter are split by polyphase decomposition into an odd-numbered part and an even-numbered part.

According to this aspect, the coefficients of the second-stage filter use coefficient folding, so that every two coefficients share one set of multiplier, adder and register.

The invention is described in detail below with reference to the drawings.

The elements shown in the drawings are:

71 transform module
72 multiplexer
73 memory module
81 filter
82 decimator

Figure 6 shows the hardware architecture of the invention for performing the 2-D DWT. It comprises a transform module 71, a multiplexer 72 and a memory module 73 (RAM module), where the memory module 73 has size N/2 × N/2. Decomposition proceeds as follows. At the first level, the multiplexer 72 selects the input image and passes it to the transform module 71, which decomposes it into the four sub-bands LL, LH, HL and HH and writes the LL band back to the memory module 73. Once the first level is complete, the multiplexer 72 selects data from the memory module 73 and feeds the LL band into the transform module 71 for the second level, which splits the LL band into the four sub-bands LLLL, LLLH, LLHL and LLHH and writes the LLLL band back to the memory module 73. Once the second level is complete, the multiplexer 72 again selects data from the memory module 73 and feeds the LLLL band into the transform module 71 for the third level, which splits the LLLL band into the four sub-bands (LL)²LL, (LL)²LH, (LL)²HL and (LL)²HH and writes the (LL)²LL band back to the memory module 73. This is repeated until the desired level J is reached.
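The level-by-level loop through the multiplexer, transform module and memory module can be sketched in software as follows. This is a behavioral illustration only, under stated assumptions: `haar_transform_module` is a hypothetical stand-in that uses 2 × 2 Haar averaging and differencing instead of K-tap filters, and the mapping of LH and HL to row and column differences is an assumed naming convention.

```python
def haar_transform_module(x):
    """Stand-in for transform module 71: split x into LL, LH, HL, HH."""
    half = len(x) // 2
    bands = {name: [[0.0] * half for _ in range(half)]
             for name in ("LL", "LH", "HL", "HH")}
    for r in range(half):
        for c in range(half):
            a, b = x[2 * r][2 * c], x[2 * r][2 * c + 1]
            d, e = x[2 * r + 1][2 * c], x[2 * r + 1][2 * c + 1]
            bands["LL"][r][c] = (a + b + d + e) / 4
            bands["LH"][r][c] = (a + b - d - e) / 4
            bands["HL"][r][c] = (a - b + d - e) / 4
            bands["HH"][r][c] = (a - b - d + e) / 4
    return bands


def multilevel_dwt(image, levels):
    """Run `levels` decompositions, feeding LL back through the memory."""
    ram = None            # memory module 73: never larger than N/2 x N/2
    outputs = []
    for level in range(1, levels + 1):
        # Multiplexer 72: external image first, stored LL band afterwards.
        source = image if level == 1 else ram
        bands = haar_transform_module(source)
        ram = bands.pop("LL")        # LL is written back for the next level
        outputs.append(bands)        # LH, HL, HH leave the architecture
    return ram, outputs


image = [[float((r * c) % 7) for c in range(8)] for r in range(8)]
final_ll, outputs = multilevel_dwt(image, 3)
print([len(o["HH"]) for o in outputs], len(final_ll))
```

Because only the LL band is ever written back, the largest array held in `ram` is the first-level LL band, which is one quarter of the original image.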

If only a single level of decomposition is required, the memory module 73 and the multiplexer 72 can be omitted, and the outputs of the transform module 71 are simply the four sub-bands LL, LH, HL and HH. The advantage of this approach is that the data flow is very regular, so attention can be concentrated on the design of the transform module.

Figure 7 shows the transform module as a tree structure with two stages: the first stage performs horizontal filtering and the second performs vertical filtering. To design the transform module efficiently, let the first stage require hardware area a and computation time t. Figure 7 shows that in the original design the second stage needs twice as many filters as the first, so its hardware area is 2a. On the other hand, because the first stage decimates, each second-stage filter processes half as much data as the first stage, so the second stage needs a computation time of only t/2. But the second stage must wait for the first stage, which leaves 2a × (t − t/2) = at of second-stage hardware capacity idle. The original design of the transform module is therefore inefficient.

To solve this problem, we first consider a single decimation filter. As Figure 8 shows, it consists of a filter 81 followed by a two-fold decimator 82. Because of the decimator, one of every two filtered samples is discarded, which wastes a great deal of the filter hardware. We therefore adopt two different techniques to increase hardware efficiency.

The first technique is polyphase decomposition. Figure 9 shows a decimation filter that uses it: the filter coefficients are split into an even-numbered part and an odd-numbered part. In even clock cycles the input sample enters the odd-numbered part and is multiplied by the odd-numbered coefficients; in odd clock cycles the input sample enters the even-numbered part and is multiplied by the even-numbered coefficients. The output is the sum of the odd and even parts. With polyphase decomposition, the internal operating clock rate is half the input clock rate, so the data input clock rate can be doubled: the same amount of data is processed in half the time. The timing of the decimation filter with polyphase decomposition is listed in a table.

The second technique is coefficient folding. As Figure 10 shows, every two coefficients share one set of multiplier, adder and register, and a switch changes the path of the data flow. It operates as follows, taking PE0 first. In clock cycle 0, the input sample x(0) is multiplied by coefficient a1 and added to the register value (initially 0), and the result a1x(0) is stored in register R0. In clock cycle 1, the input sample x(1) is multiplied by coefficient a0 and added to the value in R0, namely a1x(0), and the result a0x(1) + a1x(0) is output. In clock cycle 2, the input sample x(2) is multiplied by coefficient a1 and added to the partial sum a2x(1) + a3x(0), and the result a1x(2) + a2x(1) + a3x(0) is stored in R0. In clock cycle 3, the input sample x(3) is multiplied by coefficient a0 and added to the value in R0, namely a1x(2) + a2x(1) + a3x(0), and the result a0x(3) + a1x(2) + a2x(1) + a3x(0) is output. Subsequent clock cycles proceed in the same way, and PE1 operates in the same manner as PE0. Because every two coefficients share one multiplier, adder and register, this technique reduces the required hardware area from about A to A/2. The timing of the decimation filter with coefficient folding is also listed in a table.

Applying these two techniques to the first-stage and second-stage decimation filters gives four different design methods, listed in Table 4. If the first stage uses polyphase decomposition and the second stage uses coefficient folding, the total hardware area of the two stages is 2a and the total computation time falls to t/2. No filter is then idle, the area-time product is reduced, and the overall efficiency of the transform module improves. The other design methods listed in Table 4 leave some filter hardware idle and are therefore not efficient design strategies.

Because the input arrives in raster order, a row of data must be stored in order to perform the vertical filtering, so each coefficient needs a row register for temporary storage. The registers in Figure 10 must therefore be replaced by row registers, as shown in Figure 11, where x_i(n) denotes the data of row i and every two input rows produce one output row.
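The two techniques described above, polyphase decomposition and coefficient folding, can be compared against a plain filter-then-decimate reference in a few lines. This is a behavioral sketch under stated assumptions, not the circuit of the figures: the function names are hypothetical, samples before x(0) are taken as zero, the filter has an even number of taps, and the folded model assumes that each processing element's single register is rewritten on every cycle.

```python
def decimate_direct(x, a):
    """Filter every sample, then keep every other output (half is wasted)."""
    full = [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]
    return full[1::2]


def decimate_polyphase(x, a):
    """Even taps see odd-cycle samples, odd taps see even-cycle samples."""
    even_taps, odd_taps = a[0::2], a[1::2]
    even_in, odd_in = x[0::2], x[1::2]
    out = []
    for m in range(len(odd_in)):
        acc = sum(c * odd_in[m - i] for i, c in enumerate(even_taps) if m - i >= 0)
        acc += sum(c * even_in[m - i] for i, c in enumerate(odd_taps) if m - i >= 0)
        out.append(acc)
    return out


def decimate_folded(x, a):
    """Cycle-level model: PE p owns taps a[2p] and a[2p+1] and one register."""
    pes = len(a) // 2
    regs = [0] * (pes + 1)            # extra zero entry feeds the last PE
    out = []
    for n, sample in enumerate(x):
        if n % 2 == 0:
            # Even cycle: odd-numbered tap plus the next PE's partial sum.
            new = [a[2 * p + 1] * sample + regs[p + 1] for p in range(pes)]
        else:
            # Odd cycle: even-numbered tap plus this PE's own register.
            new = [a[2 * p] * sample + regs[p] for p in range(pes)]
            out.append(new[0])        # PE0 delivers one output per two inputs
        regs = new + [0]
    return out


a = [1, 2, 3, 4]                      # a0..a3, arbitrary example taps
x = [1, 2, 3, 4, 5, 6, 7, 8]
print(decimate_direct(x, a), decimate_polyphase(x, a), decimate_folded(x, a))
```

All three functions return the same sequence. The second output, for instance, is a0x(3) + a1x(2) + a2x(1) + a3x(0), the value that the PE0 walk-through delivers in clock cycle 3.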

Figure 12(a) shows the structure of the row register. It consists of J select signals, in order $N/2^{J}, N/2^{J-1}, \ldots, N/8, N/4, N/2$, and J register blocks whose sizes are, in order, $N/2^{J}, N/2^{J}, N/2^{J-1}, N/2^{J-2}, \ldots, N/16, N/8, N/4$, where J is the number of decomposition levels. The relationship between the select signals and the row register size at each level is as follows.

At the first level, select signal N/2 is 1 and the rest are 0, and the row register size is the sum of all J register blocks:

$$\frac{N}{2^{J}} + \frac{N}{2^{J}} + \frac{N}{2^{J-1}} + \cdots + \frac{N}{8} + \frac{N}{4} = \frac{N}{2} \tag{14}$$

At the second level, select signal N/4 is 1 and the rest are 0, and the row register size is the sum of the first J − 1 register blocks:

$$\frac{N}{2^{J}} + \frac{N}{2^{J}} + \frac{N}{2^{J-1}} + \cdots + \frac{N}{16} + \frac{N}{8} = \frac{N}{4} \tag{15}$$

At the third level, select signal N/8 is 1 and the rest are 0, and the row register size is the sum of the first J − 2 register blocks:

$$\frac{N}{2^{J}} + \frac{N}{2^{J}} + \frac{N}{2^{J-1}} + \cdots + \frac{N}{32} + \frac{N}{16} = \frac{N}{8} \tag{16}$$

and so on. At level J − 2, select signal $N/2^{J-2}$ is 1 and the rest are 0, and the row register size is the sum of the first three register blocks:

$$\frac{N}{2^{J}} + \frac{N}{2^{J}} + \frac{N}{2^{J-1}} = \frac{N}{2^{J-2}} \tag{17}$$

At level J − 1, select signal $N/2^{J-1}$ is 1 and the rest are 0, and the row register size is the sum of the first two register blocks:

$$\frac{N}{2^{J}} + \frac{N}{2^{J}} = \frac{N}{2^{J-1}} \tag{18}$$

At level J, select signal $N/2^{J}$ is 1 and the rest are 0, and the row register size is that of the first register block alone, $N/2^{J}$.

Thus at the first level the row register length is N/2, which is enough to store one row of data at that level. At each subsequent level the length is halved again: N/4 at the second level, N/8 at the third, and so on. Figure 12(b) shows the one-to-two demultiplexer of Figure 12(a) and its input-output relationship.
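The block sizes and the one-hot select signals of the row register can be tabulated with a short sketch. The function names are hypothetical, and N is assumed to be a power of two that is divisible by 2^J.

```python
def block_sizes(n, levels):
    """Register block sizes in order: N/2^J, N/2^J, N/2^(J-1), ..., N/4."""
    return [n >> levels] + [n >> e for e in range(levels, 1, -1)]


def select_signals(levels, level):
    """One-hot select lines, ordered N/2^J, ..., N/4, N/2."""
    return [1 if e == level else 0 for e in range(levels, 0, -1)]


def row_register_length(n, levels, level):
    """Active length at `level`: the sum of the first J - level + 1 blocks."""
    return sum(block_sizes(n, levels)[: levels - level + 1])


N, J = 512, 4
print(block_sizes(N, J))
for level in range(1, J + 1):
    print(level, select_signals(J, level), row_register_length(N, J, level))
```

For N = 512 and J = 4 the blocks are 32, 32, 64 and 128 samples long, and the active length at levels 1 to 4 is 256, 128, 64 and 32, that is N/2, N/4, N/8 and N/16.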

Figure 13 shows the transform module designed with these two techniques. Assume that the low-pass filter has four coefficients a0, a1, a2 and a3, and that the high-pass filter has four coefficients b0, b1, b2 and b3. The first-stage decimation filters are implemented in FIR direct form, so the low-pass and high-pass filters can share the same registers. Although we assume that the first-stage and second-stage filters have the same length, this is not required in practice. Because the first-stage decimation filter uses polyphase decomposition, the internal operating clock rate is half the data input clock rate.

Figure 14 illustrates the data input of the architecture, taking an 8 × 8 image block as an example: after a three-level 2-D DWT, four 1 × 1 pixels are obtained at the third level. A table lists the timing for Figure 14. Its clock cycles are internal operating clock cycles, so two samples are input per clock cycle. The first level of decomposition runs through clock cycle 31, the second level occupies clock cycles 32 to 39, and the third level follows in the subsequent clock cycles. The architecture is regular, so it extends easily to multiple levels of decomposition, and it is not restricted by the type of filter coefficients.

Representative architectures for performing the 2-D DWT today include the parallel filter architecture, the direct architecture, the non-separable architecture, the SIMD architecture and the systolic-parallel architecture. Table 7 compares the architecture of the invention with these. The items compared include the number of multipliers, the number of adders, memory size, computation time, control complexity and hardware utilization.
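The level-by-level timing of the 8 × 8 example in Figure 14 can be reproduced with simple arithmetic. The sketch below assumes that clock cycles are numbered from 0, that two samples enter per internal clock cycle, and that each level starts immediately after the previous one with no pipeline latency; the function name is hypothetical.

```python
def level_schedule(n, levels, samples_per_cycle=2):
    """Return (level, first_cycle, last_cycle) for an n x n input image."""
    schedule = []
    start = 0
    for level in range(1, levels + 1):
        side = n >> (level - 1)              # input side length at this level
        cycles = side * side // samples_per_cycle
        schedule.append((level, start, start + cycles - 1))
        start += cycles
    return schedule


for level, first, last in level_schedule(8, 3):
    print(f"level {level}: cycles {first} to {last}")
```

Under these assumptions the first level occupies cycles 0 to 31 and the second level cycles 32 to 39, in line with the timing described for Figure 14, and the third level takes two further cycles.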

第18頁 497356 五、發明說明（15) 相同的時脈速率，單位Λ 度，ρ是影像大小，；為'時：週期’參數Κ是滤波器的長算如下： J為刀頻層數，而本案的運算時間計 τ: + -^ +ill. 4. -l 2 19) 1 2 4 42 43 +- + ^r) = |(l-4-^ 上式中，最前面的因子1/2 θ 率是外部資料輸入時脈速率=一:本案之内部工作時脈速速度下，吾人可將資料輸入的時脈丄在相同的硬體低所需運算時間。從第七卜上、率提鬲為兩倍，以降構明顯優於其它架構， θ較=果顯不本案所提之架度、硬體利用率等。疋運异時間、控制的複雜此外’吾人針對本案盘目田對於在不同的分頻層數時，、盆=二遍的平行濾波器架構率做—比較。平行嗆、古时力，、所吊的運算時間與硬體利用它將第二居及篦一靥”从稱^木用了 MRPA的方式來設計，中，第八表顯示比較的結果。插在弟-層的運算分頻層數時的運算時五弟十五圖顯示了兩者在不同 ° 可以發現只有在只有一層的刀頻時（J = 1 )，本案之運算時 ^(1^4-1) 時脈週期，P #八相g t f間為 3 )N2二0. 5N2個〜w ’ j近者分頻層數的掷 2 曰加（J > 4)，運算時間收斂至三0〜4^) 巧J。然而，平行濾波器架構Page 18 497356 V. Description of the invention (15) The same clock rate, unit Λ degree, ρ is the image size, and is 'time: period' parameter κ is the length of the filter as follows: J is the number of knife frequency layers, The calculation time of this case is τ: +-^ + ill. 4. -l 2 19) 1 2 4 42 43 +-+ ^ r) = | (l-4- ^ In the above formula, the first factor 1 / 2 θ rate is the external data input clock rate = one: At the pulse speed speed of the internal work in this case, we can set the clock of the data input to the same hardware as the required computing time. From the seventh, the rate It is doubled to reduce the structure significantly better than other architectures. Θ is more than the framework and hardware utilization rate mentioned in this case. 疋 The time and complexity of control are different. When the number of frequency division layers is different, the ratio of the parallel filter structure of the basin = two passes is compared—the parallelism, the ancient time force, the suspended operation time, and the hardware use will make it the second and the first. "Congmu uses MRPA to design. In the eighth table, the comparison results are shown. The calculation time when inserting the frequency division number in the brother-layer calculation is five. The fifteenth figure shows that the two are at different degrees. 
It can be found that only when there is only one level of knife frequency (J = 1), the calculation in this case ^ (1 ^ 4-1) clock cycle, P #eight phase gtf is 3 ) N2 2 0. 5N2 ~ w 'j The nearest division frequency layer is tossed 2 (J > 4), the operation time converges to 3 0 ~ 4 ^) Q. However, the parallel filter architecture

第19頁 3 Ν24· 6 7N_時脈週期 497356 五、發明說明（16) 總是需要時脈週期的運算時間m 顯不了兩者在不同分頻層數時的硬體五用辜，、有50/。，隨著分頻層數的增加（J >7)，硬體收斂至66 _ 6 7%，而本案所提出的架構能夠一直的硬體利用率。芦付1 u U /〇Page 19 3 Ν24 · 6 7N_Clock cycle 497356 V. Description of the invention (16) The operation time m of the clock cycle is always required. 50 /. As the number of frequency division layers increases (J > 7), the hardware converges to 66 -67%, and the architecture proposed in this case can always use the hardware. Lu Fu 1 u U / 〇
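The layer-by-layer timing and the closed form of equation (19) can be cross-checked with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, clock cycles are counted from 0 in units of the internal clock (chosen so that the second layer of an 8×8 block occupies cycles 32 to 39, as in the text), and each layer is assumed to start immediately after the previous one.

```python
from fractions import Fraction


def level_schedule(n, levels, samples_per_cycle=2):
    """Return (first_cycle, last_cycle) for each layer of an n x n image.

    Assumptions of this sketch: cycles are numbered from 0 on the internal
    clock, two samples enter per cycle, and layers run back to back.
    """
    schedule = []
    start = 0
    side = n
    for _ in range(levels):
        cycles = (side * side) // samples_per_cycle
        schedule.append((start, start + cycles - 1))
        start += cycles
        side //= 2  # the LL band fed to the next layer is half as wide and tall
    return schedule


def total_cycles(n, levels):
    """Closed form of equation (19): T = (2/3) * (1 - 4**-J) * N**2."""
    return Fraction(2, 3) * (1 - Fraction(1, 4 ** levels)) * n * n


if __name__ == "__main__":
    print(level_schedule(8, 3))  # [(0, 31), (32, 39), (40, 41)]
    print(total_cycles(8, 3))    # 42
    # Computing time as a fraction of N**2 for a growing number of layers.
    for j in (1, 2, 4, 8):
        print(j, float(total_cycles(512, j)) / 512 ** 2)
```

For the 8×8 example the per-layer cycle counts (32, 8, and 2) add up to the 42 cycles given by the closed form, and the printed ratios start at 0.5 for J = 1 and approach 2/3 as J grows.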

As for memory, the proposed architecture requires a memory module 3 of size $N/2 \times N/2$ to store the intermediate data produced during the operation. However, if the image being processed already resides in memory, as in a digital camera, the same memory can be used to store these intermediate data, that is, the original image is treated as the input of the (LL) band. In that case the memory module is no longer needed and the $N^2/4$ term in Table 7 can be dropped, which greatly reduces the required memory.
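The memory figure above follows from the size of the LL band at each layer. The sketch below is a minimal illustration of that arithmetic, not part of the disclosed hardware; the function names and the `image_already_in_memory` flag are hypothetical, and only the dedicated memory module is counted (other storage listed in Table 7 is outside its scope).

```python
def ll_band_sizes(n, levels):
    """Number of LL coefficients produced by each layer of an n x n image."""
    sizes = []
    side = n
    for _ in range(levels):
        side //= 2              # each layer halves width and height
        sizes.append(side * side)
    return sizes


def dedicated_memory_words(n, image_already_in_memory=False):
    """Words needed by the dedicated memory module.

    The first layer produces the largest LL band, (n/2) x (n/2) = n*n/4
    words, so a module of that size also holds every later LL band.
    If the source image memory is reused (hypothetical flag), the
    dedicated module can be omitted.
    """
    if image_already_in_memory:
        return 0
    return (n // 2) * (n // 2)


if __name__ == "__main__":
    print(ll_band_sizes(8, 3))                # [16, 4, 1]
    print(dedicated_memory_words(8))          # 16
    print(dedicated_memory_words(8, True))    # 0
```

For the 8×8 example the largest intermediate band has 16 coefficients, one quarter of the 64-pixel image, and the third layer ends with a single LL coefficient.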

To meet the needs of real-time processing, many architectures for performing the two-dimensional discrete wavelet transform have been proposed. Their greatest shortcomings, however, are low hardware utilization and long computing time. The present invention therefore proposes an effective architecture that addresses these shortcomings, and the architecture has been verified in the Verilog hardware description language. In summary, the advantages of the proposed architecture are 100% hardware utilization, fast computing time, regular data flow, and low control complexity, which make it well suited to new-generation image compression standards such as JPEG-2000 and MPEG-4.

The present invention may be modified in various ways by those skilled in the art without departing from the scope of protection sought in the appended claims.

Brief description of the drawings

Fig. 1: a schematic diagram of a three-layer two-dimensional discrete wavelet transform;
Fig. 2: a schematic diagram of the conventional modified recursive pyramid algorithm (MRPA) applied to the one-dimensional discrete wavelet transform (1-D DWT);
Fig. 3: a schematic diagram of the conventional modified recursive pyramid algorithm (MRPA) applied to the two-dimensional discrete wavelet transform (2-D DWT);
Fig. 4: a schematic diagram of the conventional parallel filter architecture;
Fig. 5: a task allocation diagram of the conventional parallel filter architecture;
Fig. 6: a schematic diagram of the architecture of the present invention for performing the two-dimensional discrete wavelet transform;
Fig. 7: a tree-structured transform module;
Fig. 8: a schematic diagram of a decimation filter with a decimation factor of 2;
Fig. 9: a structural diagram of a decimation filter using the polyphase decomposition technique;
Fig. 10: a structural diagram of a decimation filter using the coefficient folding technique;
Fig. 11: a structural diagram of the decimation filter of Fig. 10 after its registers are replaced by row registers;
Fig. 12(A): a structural diagram of the row register;
Fig. 12(B): a one-to-two demultiplexer electrically connected in the row register, and its input-output relationship;
Fig. 13: an architecture diagram of the transform module of the present invention;
Fig. 14: a schematic diagram of the three-layer two-dimensional discrete wavelet transform of the present invention;
Fig. 15: a comparison of the computing time of the architecture of the present invention and the conventional parallel filter architecture; and
Fig. 16: a comparison of the hardware utilization of the architecture of the present invention and the conventional parallel filter architecture.

Claims

1. An architecture suitable for performing a two-dimensional discrete wavelet transform, which performs a plurality of layers of frequency division to divide original image data into a plurality of frequency bands, comprising: a transform module for dividing input image data into four frequency bands, wherein the band that is low-frequency in both the horizontal and vertical directions serves as the input image data for the next layer of frequency division; and a multiplexer for selecting the image data to be fed into the transform module.
2. The architecture of claim 1, further comprising a memory module for storing the band that is low-frequency in both the horizontal and vertical directions.
3. The architecture of claim 1, wherein the transform module comprises a first stage for performing a horizontal filtering operation and a second stage for performing a vertical filtering operation.
4. The architecture of claim 3, wherein the filter coefficients of the first stage are decomposed into an odd part and an even part by a polyphase decomposition technique.
5. The architecture of claim 3, wherein the filter coefficients of the second stage use a coefficient folding technique so that every two of the coefficients share one set of a multiplier, an adder, and a register.
6. The architecture of claim 5, wherein the register is a row register comprising: a plurality of register blocks, whose number equals the number of layers of frequency division of the architecture; a plurality of one-to-two demultiplexers (1×2 demultiplexers), electrically connected between the register blocks, each accepting the output of one register block as its input, with one output of the demultiplexer becoming the input of the next register block and the other output becoming the output of the row register; and a plurality of select signal lines, each electrically connected to one of the one-to-two demultiplexers to select the output of that demultiplexer.
7. The architecture of claim 2, wherein the size of the memory module is one quarter of the size of the original image data.
8. An architecture suitable for performing a two-dimensional discrete wavelet transform, which performs a single layer of frequency division to divide original image data into four frequency bands, comprising: a transform module for dividing the original image data into four frequency bands.
9. The architecture of claim 8, wherein the transform module comprises a first stage for performing a horizontal filtering operation and a second stage for performing a vertical filtering operation.
10. The architecture of claim 9, wherein the filter coefficients of the first stage are decomposed into an odd part and an even part by a polyphase decomposition technique.
11. The architecture of claim 9, wherein the filter coefficients of the second stage use a coefficient folding technique so that every two of the coefficients share one set of a multiplier, an adder, and a register.
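The two filter techniques named in the claims can be illustrated behaviorally. The sketch below is not the register-level structure of the disclosed figures; it is a minimal software model under stated assumptions: the function names are hypothetical, the filter is causal with zero samples assumed before the first input, the input length and filter length are even, and the decimating filter is taken to be y[n] = Σ h[k]·x[2n − k]. The polyphase version splits the coefficients into even and odd parts that each work on half-rate data, and the folded version uses K/2 multiply-accumulate units, each alternating between two coefficients so that every unit performs one multiplication per input sample.

```python
def decimate_direct(x, h):
    """Reference: filter, then keep every second output sample."""
    out = []
    for n in range(len(x) // 2):
        acc = 0
        for k, hk in enumerate(h):
            t = 2 * n - k
            if 0 <= t < len(x):
                acc += hk * x[t]
        out.append(acc)
    return out


def decimate_polyphase(x, h):
    """Polyphase form: even and odd coefficient parts on half-rate streams."""
    h_even, h_odd = h[0::2], h[1::2]
    x_even, x_odd = x[0::2], x[1::2]
    out = []
    for n in range(len(x) // 2):
        acc = 0
        for i, c in enumerate(h_even):
            if 0 <= n - i < len(x_even):
                acc += c * x_even[n - i]          # h[2i] * x[2(n - i)]
        for i, c in enumerate(h_odd):
            if 0 <= n - i - 1 < len(x_odd):
                acc += c * x_odd[n - i - 1]       # h[2i+1] * x[2(n - i) - 1]
        out.append(acc)
    return out


def decimate_folded(x, h):
    """Folded form: K/2 units, each time-shared by coefficients h[2i], h[2i+1]."""
    assert len(h) % 2 == 0, "this sketch assumes an even filter length"
    units = len(h) // 2
    n_out = len(x) // 2
    acc = [0] * (n_out + units + 1)               # partial sums of the outputs
    for t, sample in enumerate(x):
        m, odd = divmod(t, 2)
        for i in range(units):
            coeff = h[2 * i + odd]                # even cycle: h[2i]; odd cycle: h[2i+1]
            acc[m + i + odd] += coeff * sample    # one multiplication per unit per cycle
    return acc[:n_out]


if __name__ == "__main__":
    x = list(range(1, 17))
    h = [1, 2, 3, 4]
    ref = decimate_direct(x, h)
    print(ref == decimate_polyphase(x, h), ref == decimate_folded(x, h))  # True True
```

In the folded model no multiplier is ever idle, which is the behavioral counterpart of the hardware sharing described in claims 5 and 11.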
