TWI414186B - Dct/idct circuit - Google Patents
- Publication number
- TWI414186B (application TW098127040A)
- Authority
- TW
- Taiwan
- Prior art keywords
- module
- dct
- data
- idct
- microcode
Landscapes
- Complex Calculations (AREA)
Abstract
Description
The present invention relates to video encoding and decoding, and in particular to a discrete cosine transform and inverse discrete cosine transform circuit.
The discrete cosine transform and its inverse (Discrete Cosine Transform / Inverse Discrete Cosine Transform, DCT/IDCT) are used in the encoding and decoding of digital image data, respectively. During encoding, an image is usually divided into many blocks of 8*8 pixels, and a DCT is applied to each block in turn to convert it into frequency-domain data. During decoding, an IDCT is applied to the frequency-domain data to restore the pixel data. A two-dimensional DCT/IDCT is usually performed as one pass of one-dimensional row (or column) transforms followed by one pass of one-dimensional column (or row) transforms. These computations involve a large number of butterfly operations.
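The row-then-column decomposition described above can be sketched as follows. This is a minimal illustration that assumes the textbook orthonormal floating-point DCT-II and its inverse, not the integer transforms that WMV9 or H.264 actually specify; the function names `dct_1d`, `idct_1d` and `transform_2d` are chosen here for illustration only.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (textbook definition)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            acc += scale * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(acc)
    return out


def transform_2d(block, transform_1d):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [transform_1d(list(r)) for r in block]
    cols = [transform_1d(list(c)) for c in zip(*rows)]
    # cols[j] holds transformed column j; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    block = [[(r * 8 + c) % 17 for c in range(8)] for r in range(8)]
    coeffs = transform_2d(block, dct_1d)       # encoder side: pixels -> frequency
    restored = transform_2d(coeffs, idct_1d)   # decoder side: frequency -> pixels
    print(round(restored[3][5]), block[3][5])
```

Because the two passes are independent one-dimensional transforms, a circuit only has to implement the one-dimensional case efficiently, which is where the butterfly operations come in.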
At present there are many types of video codec protocols, the mainstream ones being H.264, WMV9, MPEG-2 and so on, and all of them require DCT/IDCT operations for encoding and decoding. However, the specific computation differs from protocol to protocol. Because the computation is actually carried out by the underlying hardware circuit, each protocol calls for a differently designed circuit. Market demand requires video products to support several codec protocols, so a product must provide a corresponding arithmetic circuit for each of them, which makes the underlying hardware complicated to design, bulky, and costly.
In view of this, there is a need for a DCT/IDCT circuit that can implement various types of butterfly operations, so as to support multiple codec protocols at once and to further improve encoding and decoding efficiency.
The DCT/IDCT circuit in an embodiment of the present invention is used to implement multiple types of DCT/IDCT. Each DCT/IDCT comprises a plurality of butterfly operations, and each butterfly operation involves a plurality of operation coefficients. The DCT/IDCT circuit includes a microcode memory, a controller, and a butterfly operation circuit. The microcode memory stores a microcode list for the DCT/IDCT types; the list contains multiple groups of microcodes, one group per DCT/IDCT type. The controller looks up, in the microcode memory, the microcode group corresponding to the DCT/IDCT to be performed and reads its microcodes in order. The butterfly operation circuit performs butterfly operations according to the microcodes read by the controller, thereby implementing the DCT/IDCT. The butterfly operation circuit includes a coefficient register module, a selection input module, an arithmetic module, and a result register module. The coefficient register module stores the operation coefficients of the butterfly operations. The result register module stores the results of the butterfly operations. The selection input module selects, according to the microcode read by the controller, either an operation coefficient held in the coefficient register module or a result held in the result register module, and outputs it. The arithmetic module operates on the data output by the selection input module according to the microcode read by the controller and stores the result in the result register module.
The foregoing and the technical effects of the invention can be readily understood from the following detailed description of the embodiments in conjunction with the accompanying drawings.
Referring to FIG. 1, a block diagram of a discrete cosine transform and inverse discrete cosine transform (DCT/IDCT) circuit 10 according to an embodiment of the present invention is shown.
In general, the DCT/IDCT of any video codec protocol can be decomposed into a plurality of butterfly operations, and each protocol specifies this in detail; the protocols differ only in the specific butterfly operations involved. Consequently, making the butterfly operations more efficient makes the DCT/IDCT more efficient. FIG. 2 is a schematic diagram of the 8*8 IDCT of the WMV9 protocol decomposed into a plurality of butterfly operations. In the figure, C0, C1 ... C7 are the operation coefficients of the DCT/IDCT; -3, 8, 3, -5 and so on are its operation constants; and D0, D1 ... D7 are its results. The operation coefficients vary with the parameters of the pixel block, whereas the operation constants are fixed for each type of DCT/IDCT. As the figure shows, an 8*8 IDCT can be reduced entirely to butterfly operations. The IDCTs of other video codec protocols can be decomposed into similar butterfly operations; because this decomposition is known in the art, it is not described further here.
It should be noted that FIG. 2 is only an example. The DCT/IDCT circuit 10 of the present invention is not limited to the IDCT shown in FIG. 2; it can handle all IDCT operations, such as 2*2 and 4*4, and it can support multiple video codec protocols, such as H.264 and MPEG-2.
Referring again to FIG. 1, in this embodiment the DCT/IDCT circuit 10 includes a microcode memory 12, a controller 13, and a butterfly operation circuit 14.
The microcode memory 12 stores the microcode list for the DCT/IDCT types. In this embodiment, each video protocol corresponds to several types of DCT/IDCT, such as 2*2, 4*4 and 8*8, and each type of DCT/IDCT corresponds to one group of microcodes. Because each video protocol specifies its DCT/IDCT types and the butterfly operations they contain, the microcode memory 12 stores multiple groups of microcodes, which together form the microcode list. Each group contains a plurality of microcodes, and each butterfly operation corresponds to at least one microcode. In this embodiment, the microcode list is prepared in advance according to the video protocols and stored in the microcode memory 12.
The controller 13 reads microcodes from the microcode memory 12 and controls the butterfly operation circuit 14 to perform butterfly operations, thereby implementing the DCT/IDCT.
In this embodiment, the butterfly operation circuit 14 performs butterfly operations according to the microcodes and includes a coefficient register module 140, a selection input module 142, an arithmetic module 144, and a result register module 146.
The coefficient register module 140 stores the coefficients of the butterfly operations.
The result register module 146 stores the results of the butterfly operations.
The selection input module 142 selects, according to the microcode, either a butterfly coefficient held in the coefficient register module 140 or a butterfly result held in the result register module 146, and outputs it.
The arithmetic module 144 performs the butterfly operation on the data received from the selection input module 142 according to the microcode and stores the result in the result register module 146 as the microcode directs. Specifically, the controller 13 controls the selection input module 142 to feed specific data to the arithmetic module 144 in a specific clock cycle for the arithmetic module 144 to operate on.
FIG. 3 is a detailed circuit diagram of one embodiment of the butterfly operation circuit 14 shown in FIG. 1. For simplicity and clarity, where several elements in the drawings share a name, only one of them is labeled, and the label refers to all elements of that name. In addition, because the arithmetic units have identical structures, the detailed circuits of some arithmetic units are omitted.
In this embodiment, the selection input module 142 includes a plurality of multiplexers 1420 and a plurality of D flip-flops 1422.
The multiplexers 1420 select and output either a butterfly coefficient held in the coefficient register module 140 or a butterfly result held in the result register module 146. In this embodiment, when the arithmetic module 144 starts a new round of computation, the multiplexers 1420 can also select and output the result of the arithmetic module 144's previous round.
The D flip-flops 1422 control the clock cycle in which the output of the multiplexers 1420 is delivered, so as to synchronize the data fed into the arithmetic module 144. In this embodiment, the controller 13 first routes a given coefficient through a multiplexer 1420 to a D flip-flop 1422, which then passes it to the arithmetic module 144 in a specified clock cycle.
The arithmetic module 144 includes at least two arithmetic units 145. In this embodiment, each arithmetic unit 145 includes a shifter 1450 and an adder-subtractor 1452.
The shifter 1450 is connected to the selection input module 142 and shifts the data supplied by the selection input module 142. In this embodiment, shifting the input left by n bits multiplies it by 2^n, and shifting it right by n bits divides it by 2^n. For example, a left shift of 1 bit multiplies the input by 2, a left shift of 2 bits multiplies it by 4, and a left shift of 3 bits multiplies it by 8.
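As a small illustration of the shifter's role, the sketch below models it on Python integers. The function names are hypothetical, and the right shift behaves like floor division, so it equals an exact division by 2^n only when the value is a multiple of 2^n.

```python
def shift(value: int, n: int, left: bool = True) -> int:
    """Model of the shifter: left shift multiplies by 2**n, right shift divides by 2**n (floored)."""
    return value << n if left else value >> n


def times_five(x: int) -> int:
    """Multiply by 5 without a multiplier: (x << 2) + x."""
    return shift(x, 2) + x


if __name__ == "__main__":
    print(shift(3, 1), shift(3, 2), shift(3, 3))  # multiples of 3 by 2, 4 and 8
    print(times_five(7))
```

Combining one shift with one addition or subtraction is what lets an arithmetic unit multiply by constants such as 5 (4+1) without a hardware multiplier.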
The adder-subtractor 1452 has a first input connected to the output of the shifter 1450 and a second input connected to the selection input module 142. It adds or subtracts the data output by the shifter 1450 and the data supplied by the selection input module 142, and stores the result in the result register module 146.
FIG. 4 is a schematic diagram of composing microcodes from butterfly operations and performing butterfly operations according to the microcodes, in an embodiment of the invention. In this embodiment, each microcode includes an operation instruction, a result destination, a plurality of data sources, and a shift value.
The result destination indicates where in the result register module 146 the result output by the adder-subtractor 1452 is to be stored. In this embodiment, the result destinations are set in advance, and the adder-subtractor 1452 stores its result in the preset location that the result destination indicates. For example, to store a result in the first register, the result destination is set to r1; to store it in the second register, the result destination is set to r2.
The data sources indicate where the data operated on by the arithmetic module 144 come from. In this embodiment, they indicate the origin of the data that the shifter 1450 and the adder-subtractor 1452 obtain from the selection input module 142. The selection input module 142 feeds two kinds of data to the arithmetic module 144: operation coefficients of the DCT/IDCT that are to be operated on, and results held in the result register module 146. For example, to fetch an operation coefficient, the data source is set to the output of the corresponding D flip-flop 1422, and that D flip-flop 1422 is controlled to output the required coefficient in a specific clock cycle. To fetch a result, the data source is set to the location where that result was stored, such as the first register.
The shift value indicates the number of bits by which the shifter 1450 in the arithmetic module 144 shifts the data supplied by the selection input module 142.
The operation instruction indicates the data on which the arithmetic module 144 operates and the rules applied to them. It includes a shift direction, a plurality of operands, and a plurality of operation rules. The arithmetic module 144 shifts operands according to the shift value and shift direction to obtain shifted values, and then combines the shifted values with other operands according to the operation rules. For example, an operation instruction may include a shift direction, first to fourth operands, a first operation rule, and a second operation rule. In that case, the arithmetic module 144 shifts the first operand according to the shift value and shift direction and combines the shifted value with the second operand according to the first operation rule; it likewise shifts the third operand and combines the shifted value with the fourth operand according to the second operation rule.
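The four microcode fields can be pictured as a small record. The sketch below is one possible decoding of the textual form the description uses for its example microcodes (such as "Asl_rA_add_DA_rB_sub_DB, rD1, RL, RH, 4"); the `Microcode` class, its field names, and the `parse_microcode` helper are assumptions made for illustration, not the patent's actual binary encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microcode:
    shift_direction: str   # e.g. "Asl" = shift left
    unit1: tuple           # (operand to shift, rule, second operand) for the first unit
    unit2: tuple           # same triple for the second unit
    destination: str       # result destination, e.g. "rD1"
    sources: tuple         # data sources, e.g. ("RL", "RH")
    shift_value: int       # number of bits to shift


def parse_microcode(text: str) -> Microcode:
    """Split 'instruction, destination, source..., shift' into its fields."""
    instruction, destination, *sources, shift_value = [f.strip() for f in text.split(",")]
    # The instruction is assumed to have the fixed seven-part layout seen in the examples.
    direction, a1, rule1, a2, b1, rule2, b2 = instruction.split("_")
    return Microcode(direction, (a1, rule1, a2), (b1, rule2, b2),
                     destination, tuple(sources), int(shift_value))


if __name__ == "__main__":
    print(parse_microcode("Asl_rA_add_DA_rB_sub_DB, rD1, RL, RH, 4"))
```

Keeping the two operand/rule triples separate mirrors the two arithmetic units, each of which receives one shifted operand and one unshifted operand.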
The operation rules are the operations that an arithmetic unit 145 can perform, such as add, subtract, and negate-then-add.
The operands are the data indicated by the data sources. When the arithmetic module 144 starts a new round of computation, the operands can also include the results of its previous round.
In this embodiment, one butterfly operation corresponds to at least one microcode. If the arithmetic module 144 includes two arithmetic units 145, each microcode takes only one clock cycle to execute, and increasing the speed only requires adding arithmetic units. If, as shown in FIG. 3, the arithmetic module 144 includes four arithmetic units, every two microcodes take only one clock cycle, which is markedly faster. Moreover, the arithmetic unit 145 uses a simple shifter 1450 in place of a complex multiplier to perform multiplication, which markedly reduces circuit complexity and greatly lowers cost.
For example, consider the first butterfly operation BF1 shown in FIG. 4. It computes R2=5C0+16C4 and R3=16C0-5C4, where C0 and C4 are operation coefficients of the DCT/IDCT, 5 and 16 are operation constants of the DCT/IDCT, and R2 and R3 are the results of BF1. Its microcodes can be written as "Asl_rA_add_rA_rB_add_rB, rD0, RH, RL, 2" and "Asl_rA_add_DA_rB_sub_DB, rD1, RL, RH, 4", where "Asl_rA_add_rA_rB_add_rB" and "Asl_rA_add_DA_rB_sub_DB" are the operation instructions, rD0 and rD1 are the result destinations, RH and RL are the data sources, and 2 and 4 are the shift values. In an operation instruction, Asl tells the shifter 1450 to shift the received data left, that is, to multiply it by a power of two. rA and rB denote the data fetched from the data sources, and DA and DB denote the results of the previous clock cycle. In the first microcode, rA_add_rA means shift rA left by 2 bits and add rA to the shifted value, and rB_add_rB means the same for rB, which gives R0=(C0<<2)+C0=5C0 and R1=(C4<<2)+C4=5C4. In the second microcode, rA_add_DA means shift rA left by 4 bits and add DA to the shifted value, and rB_sub_DB means shift rB left by 4 bits and subtract DB from the shifted value. Here DA and DB are the results of the previous clock cycle, R0 and R1, and the data sources are listed in the opposite order, so rA is C4 and rB is C0. The results are R2=(C4<<4)+5C0=16C4+5C0 and R3=(C0<<4)-5C4=16C0-5C4.
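The BF1 walk-through can be checked with a short simulation. The sketch below is a hedged model, not the circuit itself: `run_microcode` is a hypothetical helper, it assumes that RH holds C0 and RL holds C4 (inferred from the results stated above), it ignores the result-destination field and simply returns both outputs, and it supports only the left-shift mnemonic Asl because no other mnemonic is given in the text.

```python
def run_microcode(code, sources, previous):
    """Execute one microcode string on two modeled arithmetic units.

    code     -- e.g. "Asl_rA_add_rA_rB_add_rB, rD0, RH, RL, 2"
    sources  -- values behind the data-source names, e.g. {"RH": c0, "RL": c4}
    previous -- (DA, DB): the two results of the previous clock cycle
    """
    instruction, _dest, src_a, src_b, shift = [f.strip() for f in code.split(",")]
    direction, a1, rule_a, a2, b1, rule_b, b2 = instruction.split("_")
    if direction != "Asl":
        raise ValueError("only the left-shift mnemonic 'Asl' is modeled here")
    operands = {"rA": sources[src_a], "rB": sources[src_b],
                "DA": previous[0], "DB": previous[1]}

    def unit(first, rule, second):
        shifted = operands[first] << int(shift)   # shifter
        if rule == "add":                         # adder-subtractor rules
            return shifted + operands[second]
        if rule == "sub":
            return shifted - operands[second]
        if rule == "isub":                        # negate, then add
            return -shifted + operands[second]
        raise ValueError(f"unknown rule: {rule}")

    return unit(a1, rule_a, a2), unit(b1, rule_b, b2)


if __name__ == "__main__":
    c0, c4 = 3, 7
    src = {"RH": c0, "RL": c4}                    # assumed mapping of sources to coefficients
    r0, r1 = run_microcode("Asl_rA_add_rA_rB_add_rB, rD0, RH, RL, 2", src, (0, 0))
    r2, r3 = run_microcode("Asl_rA_add_DA_rB_sub_DB, rD1, RL, RH, 4", src, (r0, r1))
    print(r2 == 5 * c0 + 16 * c4, r3 == 16 * c0 - 5 * c4)
```

Both units read their operands before either result is written back, which is why the second microcode can use R0 and R1 from the previous cycle as DA and DB.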
The microcodes for the second butterfly operation BF2 shown in FIG. 4 are composed on much the same principle as those for BF1. In them, rB_isub_DB means shift rB left, negate the shifted value, and then add DB to it, which gives R5=-(C4<<4)+R1=-16C4+4C0.
FIG. 5 is a flowchart of the controller 13 shown in FIG. 1 controlling the butterfly operation circuit 14 to perform a butterfly operation. In this embodiment, the butterfly operation is carried out by the circuit shown in FIG. 3. The shifter and adder-subtractor of the first arithmetic unit are referred to as the first shifter and the first adder-subtractor, and those of the second arithmetic unit as the second shifter and the second adder-subtractor. The following uses an arithmetic module 144 with these two arithmetic units to illustrate how the butterfly operation circuit 14 performs a butterfly operation.
In step S500, the controller 13 finds the microcode group corresponding to the type of DCT/IDCT to be performed in the microcode memory 12 and reads the next microcode in the group in order. In this embodiment, the microcode read includes an operation instruction, a result destination, a plurality of data sources, and a shift value, and the operation instruction includes a shift direction, first to fourth operands, a first operation rule, and a second operation rule.
In step S502, the controller 13, according to the operation instruction of the microcode read, controls the selection input module 142 to send the first operand to the first shifter.
In step S504, the controller 13 controls the first shifter to shift the first operand by the shift value and to send the shifted value to the first input of the first adder-subtractor.
In step S506, the controller 13, according to the operation instruction of the microcode, sends the second operand to the second input of the first adder-subtractor.
In step S508, the controller 13, according to the first operation rule, controls the first adder-subtractor to operate on the data at its first and second inputs and to store the result at the result destination.
In step S510, the controller 13, according to the operation instruction of the microcode, controls the selection input module 142 to send the third operand to the second shifter.
In step S512, the controller 13 controls the second shifter to shift the third operand by the shift value and to send the shifted value to the first input of the second adder-subtractor.
In step S514, the controller 13, according to the operation instruction of the microcode, sends the fourth operand to the second input of the second adder-subtractor.
In step S516, the controller 13, according to the second operation rule, controls the second adder-subtractor to operate on the data at its first and second inputs and to store the result at the result destination.
In this way, once the controller 13 has driven the butterfly operation circuit 14 through all the microcodes in the group, the DCT/IDCT corresponding to that group is complete.
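The control flow of steps S500 to S516 can be sketched as a loop over one microcode group. Everything named below is hypothetical: the table key `"demo_bf1"`, the pre-decoded dictionary layout, and the single `regs` dictionary that stands in for both the coefficient and result register modules. The two-entry group reuses the BF1 example from the description rather than a full transform, whose microcodes the text does not list.

```python
# Hypothetical pre-decoded microcode list: one group per DCT/IDCT type.
MICROCODE_TABLE = {
    "demo_bf1": [
        {"shift": 2, "unit1": ("C0", "add", "C0"), "unit2": ("C4", "add", "C4"),
         "dest": ("R0", "R1")},
        {"shift": 4, "unit1": ("C4", "add", "R0"), "unit2": ("C0", "sub", "R1"),
         "dest": ("R2", "R3")},
    ],
}


def add_sub(rule, shifted, other):
    """Adder-subtractor rules: add, subtract, or negate-then-add."""
    if rule == "add":
        return shifted + other
    if rule == "sub":
        return shifted - other
    if rule == "isub":
        return -shifted + other
    raise ValueError(f"unknown rule: {rule}")


def run_transform(kind, coefficients):
    """Run every microcode of the group selected by `kind` (left shifts only)."""
    regs = dict(coefficients)                    # coefficients and results share one namespace here
    for mc in MICROCODE_TABLE[kind]:             # S500: look up the group, read microcodes in order
        results = []
        for first, rule, second in (mc["unit1"], mc["unit2"]):
            shifted = regs[first] << mc["shift"]              # S502-S504 / S510-S512
            other = regs[second]                              # S506 / S514
            results.append(add_sub(rule, shifted, other))     # S508 / S516
        # Write back only after both units have read, modeling parallel operation.
        for name, value in zip(mc["dest"], results):
            regs[name] = value
    return regs


if __name__ == "__main__":
    out = run_transform("demo_bf1", {"C0": 3, "C4": 7})
    print(out["R2"], out["R3"])
```

Supporting another transform type in this model means adding another group to the table, which is the point of the microcoded design: the datapath stays the same and only the stored microcodes change.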
Because the arithmetic module 144 includes at least two arithmetic units 145, the two sets of steps S502-S508 and S510-S516 can be regarded as running in parallel, so the computation for one microcode can be regarded as completing within one clock cycle. If the number of arithmetic units 145 is doubled, two microcodes can be executed in parallel, which further reduces computation time and improves performance.
In this embodiment, the butterfly operations can be divided into three types according to the number of clock cycles they require, which depends on the values of the operation constants of the DCT/IDCT; see FIG. 6, a schematic diagram of the butterfly operation types. In the first type, one or both of the constants a and b equal 1; such a butterfly operation corresponds to one microcode and takes two arithmetic units 145 one clock cycle. In the second type, a is 2^n or 1/2^n and b is 2^n, 1/2^n, 2^n±1 or 1/2^n±1; such a butterfly operation corresponds to two microcodes and takes two arithmetic units 145 two clock cycles. All others belong to the third type, which corresponds to three microcodes and takes two arithmetic units 145 three clock cycles. According to statistics, for the 8*8 IDCT specified by the WMV9 video protocol, the first type accounts for 50% of the butterfly operations, the second type for 28%, and the third type for 22%. For the 8*8 IDCT specified by the H.264 video protocol, the first type accounts for 85% and the second type for 15%. The DCT/IDCT circuit 10 can therefore greatly improve the efficiency of DCT/IDCT computation.
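The stated type shares imply an average number of clock cycles per butterfly operation on two arithmetic units. The sketch below only does that arithmetic; the averages are derived here from the percentages in the text and are not figures given by the patent, and the names `CYCLES_PER_TYPE`, `WMV9_MIX`, `H264_MIX` and `mean_cycles` are illustrative.

```python
# Clock cycles per butterfly type on two arithmetic units, as stated in the text.
CYCLES_PER_TYPE = {1: 1, 2: 2, 3: 3}

# Share of each butterfly type in the 8*8 IDCT, as stated in the text.
WMV9_MIX = {1: 0.50, 2: 0.28, 3: 0.22}
H264_MIX = {1: 0.85, 2: 0.15}


def mean_cycles(mix):
    """Weighted average of clock cycles per butterfly operation."""
    return sum(share * CYCLES_PER_TYPE[kind] for kind, share in mix.items())


if __name__ == "__main__":
    print(f"WMV9 : {mean_cycles(WMV9_MIX):.2f} cycles per butterfly")
    print(f"H.264: {mean_cycles(H264_MIX):.2f} cycles per butterfly")
```

Under these shares the average works out to about 1.72 cycles per butterfly for WMV9 and about 1.15 for H.264, since most butterflies fall into the one-cycle type.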
The DCT/IDCT circuit 10 provided by the embodiments of the present invention converts the different video protocols into different microcode groups and performs butterfly operations according to those groups. It can thereby support all the DCT/IDCT types of all types of video protocols, which greatly reduces the complexity of the hardware needed for DCT/IDCT. Furthermore, because this embodiment uses shifters in place of the multipliers of conventional circuits to perform multiplication, and uses a plurality of arithmetic units operating in parallel, it not only significantly lowers circuit cost but also greatly increases computation speed and improves performance.
In summary, the present invention meets the requirements for an invention patent, and a patent application is filed in accordance with the law. However, the foregoing describes only preferred embodiments of the present invention; equivalent modifications or variations made by those skilled in the art in keeping with the spirit of the invention shall fall within the scope of the following claims.
10‧‧‧DCT/IDCT circuit
12‧‧‧Microcode memory
13‧‧‧Controller
14‧‧‧Butterfly operation circuit
140‧‧‧Coefficient register module
142‧‧‧Selection input module
1420‧‧‧Multiplexer
1422‧‧‧D flip-flop
144‧‧‧Arithmetic module
145‧‧‧Arithmetic unit
1450‧‧‧Shifter
1452‧‧‧Adder-subtractor
146‧‧‧Result register module
FIG. 1 is a block diagram of a DCT/IDCT circuit according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the 8*8 IDCT of the WMV9 protocol decomposed into a plurality of butterfly operations.
FIG. 3 is a detailed circuit diagram of one embodiment of the butterfly operation circuit shown in FIG. 1.
FIG. 4 is a schematic diagram of composing microcodes from butterfly operations and performing butterfly operations according to the microcodes, in an embodiment of the present invention.
FIG. 5 is a flowchart of the controller shown in FIG. 1 controlling the butterfly operation circuit to perform a butterfly operation.
FIG. 6 is a schematic diagram of the types of butterfly operations performed in an embodiment of the present invention.
Claims (10)
A discrete cosine transform and inverse discrete cosine transform (DCT/IDCT) circuit for implementing multiple types of DCT/IDCT, each DCT/IDCT comprising a plurality of butterfly operations, each butterfly operation comprising a plurality of operation coefficients, the DCT/IDCT circuit comprising:
a microcode memory for storing a microcode list corresponding to the DCT/IDCT types, wherein the microcode list comprises a plurality of microcode groups, and each DCT/IDCT type corresponds to one microcode group;
a controller for looking up, in the microcode memory, the microcode group corresponding to the DCT/IDCT to be implemented, and sequentially reading the microcodes therein; and
a butterfly operation circuit for performing butterfly operations according to the microcode read by the controller, so as to implement the DCT/IDCT, wherein the butterfly operation circuit comprises:
a coefficient register module for storing the operation coefficients of the butterfly operation;
a result register module for storing the operation results of the butterfly operation;
a selection input module for selecting, according to the microcode read by the controller, the operation coefficient stored in the coefficient register module or the operation result stored in the result register module for output; and
an operation module for operating on the data output by the selection input module according to the microcode read by the controller, and storing the operation result in the result register module.
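The division of labor in this claim can be illustrated with a small software model. The following is a minimal sketch, not the patented circuit: the `Microcode` fields, the `"coef"`/`"res"` bank names, and the `demo_butterfly` microcode group are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microcode:
    # Hypothetical fields, loosely following the claim language.
    src_a: tuple    # ("coef" | "res", index): operand routed through the shifter
    src_b: tuple    # operand fed directly to the adder-subtractor
    shift: int      # number of bit positions to shift src_a left
    subtract: bool  # True: shifted_a - b, False: shifted_a + b
    dest: int       # index in the result register module


class ButterflyCircuit:
    def __init__(self, coefficients, n_results=8):
        self.coef = list(coefficients)  # coefficient register module
        self.res = [0] * n_results      # result register module

    def select(self, source):
        """Selection input module: pick a coefficient or an earlier result."""
        bank, index = source
        return self.coef[index] if bank == "coef" else self.res[index]

    def execute(self, mc):
        """Operation module: shift, add or subtract, then store the result."""
        a = self.select(mc.src_a) << mc.shift
        b = self.select(mc.src_b)
        self.res[mc.dest] = a - b if mc.subtract else a + b


# Microcode memory: one microcode group per transform type (illustrative contents).
MICROCODE_MEMORY = {
    "demo_butterfly": [
        Microcode(("coef", 0), ("coef", 1), 0, False, 0),  # res[0] = x0 + x1
        Microcode(("coef", 0), ("coef", 1), 0, True, 1),   # res[1] = x0 - x1
        Microcode(("res", 0), ("res", 1), 1, False, 2),    # res[2] = 2*res[0] + res[1]
    ],
}


def run_transform(kind, coefficients):
    """Controller: look up the group for `kind` and issue its microcodes in order."""
    circuit = ButterflyCircuit(coefficients)
    for mc in MICROCODE_MEMORY[kind]:
        circuit.execute(mc)
    return circuit.res
```

In this model, supporting another transform type means adding another entry to `MICROCODE_MEMORY`; the `ButterflyCircuit` class itself stays unchanged, which mirrors the hardware-reuse argument of the description.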
The DCT/IDCT circuit of claim 1, wherein each microcode comprises:
an operation instruction for indicating the data on which the operation module operates and the operation rules applied to that data;
a result storage location for indicating where in the result register module the operation result output by the operation module is stored;
a plurality of data sources for indicating the data that the selection input module selects for input to the operation module; and
a shift value for indicating the number of bits by which the operation module shifts the data input by the selection input module.
The DCT/IDCT circuit of claim 2, wherein the operation instruction comprises:
a shift direction for indicating the direction in which the operation module shifts the data input by the selection input module;
a plurality of operation data items for indicating the data on which the operation module operates; and
a plurality of operation rules for indicating the operations performed by the operation module.
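One way to picture a microcode word with these fields is as a packed bit-field. The sketch below is an assumption throughout: the claims name the fields but give no widths, ordering, or encoding, so the `FIELDS` layout (20 bits in total) is purely hypothetical.

```python
# Hypothetical field layout, least-significant field first.
# Widths and ordering are assumptions; the claims do not specify them.
FIELDS = (
    ("dest", 4),        # result storage location
    ("src_b", 5),       # second data source
    ("src_a", 5),       # first data source
    ("shift", 4),       # shift value (number of bit positions)
    ("shift_left", 1),  # shift direction: 1 = left, 0 = right
    ("subtract", 1),    # operation rule: 1 = subtract, 0 = add
)


def pack(**values):
    """Pack named field values into a single microcode word."""
    word, offset = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << offset
        offset += width
    return word


def unpack(word):
    """Split a microcode word back into its named fields."""
    fields, offset = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> offset) & ((1 << width) - 1)
        offset += width
    return fields
```

Packing and unpacking are inverses for any values that fit their fields, so a controller model can store plain integers in its microcode memory and decode them on each read.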
The DCT/IDCT circuit of claim 1, wherein the operation module comprises at least two operation units, each operation unit comprising:
a shifter connected to the selection input module, for shifting the data input by the selection input module; and
an adder-subtractor having a first input connected to the output of the shifter and a second input connected to the selection input module, for adding or subtracting, according to the microcode selected by the controller, the data input from the shifter and the data input from the selection input module, to obtain the operation result.
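A shifter followed by an adder-subtractor is enough to multiply by small integer constants without a hardware multiplier. The sketch below models one operation unit as a function and chains two of them; the function names and the constant 12 are chosen only for illustration and are not taken from the patent.

```python
def operation_unit(a, b, shift, subtract=False, left=True):
    """One operation unit: shifter on the first input, adder-subtractor after it."""
    shifted = a << shift if left else a >> shift
    return shifted - b if subtract else shifted + b


def times_12(x):
    """Multiply by 12 with two chained units: 12x = ((x << 1) + x) << 2."""
    t = operation_unit(x, x, shift=1)     # 3x
    return operation_unit(t, 0, shift=2)  # 12x


def butterfly(x0, x1):
    """Two units working side by side on the same pair of inputs."""
    total = operation_unit(x0, x1, shift=0)
    difference = operation_unit(x0, x1, shift=0, subtract=True)
    return total, difference
```

Having at least two units lets the sum and the difference of a butterfly be produced in the same step, while chaining units across steps builds up constant multiplications from shifts and additions.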
The DCT/IDCT circuit of claim 1, wherein the selection input module comprises:
a plurality of multiplexers for selecting, for output, the operation coefficient stored in the coefficient register module or the operation result stored in the result register module; and
a plurality of D flip-flops for controlling the clock cycle in which the multiplexers output data, so as to synchronize the data input to the operation module.
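The role of the D flip-flops can be seen in a small behavioral model: each multiplexer output is held back until the next clock edge, so both operands reach the operation module together. This is a rough sketch under stated assumptions; the class names, the two-lane arrangement, and the bank numbering (0 for coefficients, 1 for results) are hypothetical.

```python
class DFlipFlop:
    """Holds its input until the next clock edge."""

    def __init__(self, initial=0):
        self.q = initial   # value visible at the output
        self._d = initial  # value waiting at the input

    def set_input(self, value):
        self._d = value

    def clock(self):
        self.q = self._d
        return self.q


def mux(select, inputs):
    """Multiplexer: forward the input chosen by `select`."""
    return inputs[select]


class SelectionInput:
    """Two mux + flip-flop lanes feeding one operation unit."""

    def __init__(self, coefficients, results):
        self.banks = (coefficients, results)  # 0 = coefficient bank, 1 = result bank
        self.lanes = (DFlipFlop(), DFlipFlop())

    def present(self, lane, bank, index):
        """Route one register value to the input of a lane's flip-flop."""
        self.lanes[lane].set_input(mux(bank, self.banks)[index])

    def clock(self):
        """Clock edge: both lanes update their outputs at the same time."""
        return tuple(ff.clock() for ff in self.lanes)
```

Until `clock()` is called, the outputs keep their previous values even though new selections have been presented, which is the synchronizing behavior the claim attributes to the D flip-flops.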
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW098127040A TWI414186B (en) | 2009-08-12 | 2009-08-12 | Dct/idct circuit |
Publications (2)
Publication Number | Publication Date |
---|---|
TW201106703A (en) | 2011-02-16 |
TWI414186B (en) | 2013-11-01 |
Family
ID=44814427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW098127040A TWI414186B (en) | 2009-08-12 | 2009-08-12 | Dct/idct circuit |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI414186B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0572262A2 (en) * | 1992-05-28 | 1993-12-01 | C-Cube Microsystems, Inc. | Decoder for compressed video signals |
TW331611B (en) * | 1993-11-22 | 1998-05-11 | Winbond Electronics Corp | The loop & parallel processing method and apparatus for executing DCT and IDCT |
TW364269B (en) * | 1998-01-02 | 1999-07-11 | Winbond Electronic Corp | Discreet cosine transform/inverse discreet cosine transform circuit |
2009-08-12: TW application TW098127040A, patent TWI414186B (en), status: not active (IP right cessation)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |