JP2006227666A

JP2006227666A - Matrix operation unit

Info

Publication number: JP2006227666A
Application number: JP2005037070A
Authority: JP
Inventors: Masahiro Koyama; 政洋小山
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2005-02-15
Filing date: 2005-02-15
Publication date: 2006-08-31

Abstract

PROBLEM TO BE SOLVED: To provide a matrix operation unit that reduces the circuit scale of the orthogonal transform circuit used to compress and decompress image data in H.264.

SOLUTION: The 4×4 integer approximate inverse discrete cosine transform and the 4×4 Hadamard transform in H.264 are computed as the products CB and CA of a data matrix C to be transformed with a constant matrix B and a constant matrix A, respectively. Matrices B and A differ only in that some components have absolute value 1/2 in B and 1 in A. Each data input path that carries row direction data of matrix C multiplied by a ±1/2 component of matrix B is therefore provided with a ×1/2 multiplier and a data selector. According to a DCT_mode signal, the data selector outputs the result of the ×1/2 multiplier in the product CB mode and the unmodified data in the product CA mode, so that both products are obtained from a shared adder/subtractor group.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a matrix operation device, and in particular to a configuration that is applied to an orthogonal transform circuit used for compressing and decompressing image data according to H.264 or the like, and that executes the integer approximate discrete cosine transform or the integer approximate inverse discrete cosine transform together with the Hadamard transform in a small circuit.

H.264 (also called "MPEG-4/AVC") is attracting attention as the international standard video coding scheme that follows MPEG (Moving Picture Experts Group)-1, -2, and -4. H.264 is a next-generation, high-efficiency video compression scheme standardized jointly by the ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) and ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission), which formed the JVT (Joint Video Team) for that purpose. Its data compression ratio is three times that of MPEG-2 and twice that of MPEG-4, and it is expected to prove effective in a wide range of fields such as mobile phones, digital cameras, and video recorders.

H.264 adopts the Hadamard transform and the integer approximate discrete cosine transform as its image compression engine. The basic processing unit of the orthogonal transform is as small as 4 × 4 pixels. The conventional discrete cosine transform rounds infinite-length coefficients to finite length, which causes a mismatch between encoder and decoder. In H.264, the transform coefficients are merged with the quantization coefficients and approximated by integers so that the computation can be carried out entirely in integer arithmetic.

Accordingly, if the data to be transformed is C_ji (where j = 0, 1, 2, 3 and i = 0, 1, 2, 3) and the transformed data is T_ji, the integer approximate inverse discrete cosine transform used in the decoder is the operation given by Equation 5 below. It can be computed with only additions, subtractions, and bit shifts (1/2 denotes a right shift that discards the least significant bit).

Because the result of Equation 5 depends on the order of computation, the standard specifies that the data C_ji be processed in the column direction first and then in the row direction.
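As a small illustration of the remark that 1/2 denotes a right shift, the following Python sketch models the ×1/2 operation as an arithmetic right shift by one bit. The function name `half` is hypothetical and not taken from the patent. The sketch relies on Python's `>>` operator, which rounds toward negative infinity for negative integers; how a particular hardware implementation treats negative values is not stated in the text.

```python
def half(x: int) -> int:
    """Model x * 1/2 as a right shift that discards the least significant bit."""
    return x >> 1


if __name__ == "__main__":
    # Even, odd, and negative inputs show where the discarded bit matters.
    for value in (6, 7, -3):
        print(value, "->", half(value))
```

Because the discarded bit is lost at each shift, applying the shifts in a different order can give a different final value, which is consistent with the statement that the computation order must be fixed.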

The integer approximate discrete cosine transform used on the encoder side is the operation given by Equation 6 below. As in the decoder, it can be computed with only additions, subtractions, and bit shifts (2 denotes a left shift).

In H.264, as shown in FIG. 1, a macroblock (16 × 16 pixels) 101 is divided into basic unit blocks 100 of 4 × 4 pixels, each of which is orthogonally transformed. The macroblock 101 of the luminance signal (Y) consists of 16 basic unit blocks 100, and the blocks 102 and 103 of the color difference signals (Cr, Cb) each consist of 4 basic unit blocks 100.

A characteristic feature, as shown in FIG. 1, is that for the macroblock 101 of the luminance signal (Y), in a specific mode, a block 104 is formed by collecting only the DC components of the basic processing units after the integer approximate discrete cosine transform, and the Hadamard transform is applied to that block 104. With C_ji as the data to be transformed and T_ji as the transformed data, as before, this Hadamard transform is the operation given by Equation 7 below.

For the color difference signals (Cr, Cb), on the other hand, the hierarchical orthogonal transform is always applied. For each color difference signal (Cr, Cb), blocks 105 and 106 are likewise formed by collecting only the DC components of the basic processing units after the integer approximate discrete cosine transform, and the Hadamard transform of Equation 8 below is applied, with H_ji as the data to be transformed and D_ji as the transformed data.

The H.264 standard is described in Non-Patent Document 1 below, and Non-Patent Document 2 below is an explanatory text on it.
Non-Patent Document 1: "H.264 Standard", Draft ITU-T Recommendation and Final International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)
Non-Patent Document 2: Shinya Kadono, Yoshihiro Kikuchi, and Teruhiko Suzuki (eds.), "H.264/AVC Textbook", Impress Net Business Company, published December 1, 2004

In conventional orthogonal transform circuits of encoders and decoders that handle H.264 image data, the integer approximate discrete cosine transform (encoder) or the integer approximate inverse discrete cosine transform (decoder) and the Hadamard transform are implemented as separate circuits. The Hadamard transform circuits are in turn separate for the DC components of the luminance signal (Y) and for the DC components of the color difference signals (Cr, Cb). However, comparing the coefficient matrices of Equations 5 and 6 with that of Equation 7 shows that they differ only in that some components are "1/2" or "2" instead of "1"; apart from this, the operations have the same form. Likewise, Equations 7 and 8 are the same operation with the same coefficients as far as the data C₀₀, C₀₃, C₃₀, C₃₃ and H₀₀, H₀₃, H₃₀, H₃₃ are concerned.

Focusing on this point, the present invention was created to provide a matrix operation device that unifies the circuits executing the integer approximate discrete cosine transform, the integer approximate inverse discrete cosine transform, and the Hadamard transform, thereby reducing the circuit scale of the orthogonal transform circuit and, in turn, the power consumption of encoders and decoders.

The first invention is a matrix operation device that computes the product CB of the variable matrix C of Equation 1 below and the constant matrix B of Equation 2 below (where the constant K is a rational number other than 1), and the product CA of the variable matrix C and the constant matrix A of Equation 3 below, the device comprising:

two multipliers that multiply the data C_j1 and C_j3, out of the row direction data {C_j0, C_j1, C_j2, C_j3} of the variable matrix C input in parallel, by the constant K;

two data selectors, one per multiplier, that output the multiplier's output data in the product CB mode and the multiplier's input data in the product CA mode;

a first adder/subtractor group that obtains (C_j0 + C_j2), (C_j1 + αC_j3), (αC_j1 − C_j3), and (C_j0 − C_j2) from the input row direction data and the output data of the data selectors, where α is K in the product CB mode and 1 in the product CA mode; and

a second adder/subtractor group that uses the results of the first adder/subtractor group to obtain (C_j0 + C_j2) + (C_j1 + αC_j3), (C_j0 − C_j2) + (αC_j1 − C_j3), (C_j0 − C_j2) − (αC_j1 − C_j3), and (C_j0 + C_j2) − (C_j1 + αC_j3).
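The datapath of the first invention can be sketched in a few lines of Python. This is only an illustration transcribed from the expressions in the text: the function and variable names are hypothetical, and K = 1/2 is assumed (the decoder case) and modelled as a right shift by one bit.

```python
def first_invention_row(row, cb_mode):
    """Return one output row of C*B (cb_mode=True) or C*A (cb_mode=False).

    Assumption: K = 1/2, modelled as an arithmetic right shift by one bit.
    """
    c0, c1, c2, c3 = row
    # Multipliers followed by data selectors: the multiplier output is used
    # in the CB mode, the untouched multiplier input in the CA mode.
    c1_sel = (c1 >> 1) if cb_mode else c1
    c3_sel = (c3 >> 1) if cb_mode else c3
    # First adder/subtractor group.
    s0 = c0 + c2
    s1 = c1 + c3_sel          # C_j1 + alpha * C_j3
    s2 = c1_sel - c3          # alpha * C_j1 - C_j3
    s3 = c0 - c2
    # Second adder/subtractor group.
    return [s0 + s1, s3 + s2, s3 - s2, s0 - s1]


if __name__ == "__main__":
    sample = [4, 8, 2, 6]
    print("CA mode:", first_invention_row(sample, cb_mode=False))
    print("CB mode:", first_invention_row(sample, cb_mode=True))
```

The two modes share all eight adders and subtractors; only the two selected operands change, which is the point of the construction.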

According to this invention, switching the outputs of the two data selectors between the operation modes yields the matrix products CB and CA as the results of the second adder/subtractor group. In other words, two different matrix products can be computed by a single circuit that shares the adder/subtractor groups, merely by adding a multiplier and a data selector to specific input paths. One such matrix operation device produces the result of the 2 × 2 Hadamard transform in a single pass, and performs the one-dimensional operation of the 4 × 4 integer approximate inverse discrete cosine transform and the 4 × 4 Hadamard transform. With four devices, the two-dimensional operation becomes possible for both transforms, and with eight devices the result is obtained in a single pass.

The second invention is a matrix operation device that computes the product CD of the variable matrix C and the constant matrix D of Equation 4 below (where the constant R is a rational number other than 1), and the product CA of the variable matrix C and the constant matrix A, the device comprising:

an input-side adder/subtractor group that obtains (C_j0 + C_j3), (C_j1 + C_j2), (C_j1 − C_j2), and (C_j0 − C_j3) from the row direction data {C_j0, C_j1, C_j2, C_j3} of the variable matrix C input in parallel;

two multipliers that multiply the results (C_j1 + C_j2) and (C_j0 − C_j3) of the input-side adder/subtractor group by the multiplier R;

two data selectors, one per multiplier, that output the multiplier's output data in the product CD mode and the multiplier's input data in the product CA mode; and

an output-side adder/subtractor group that obtains (C_j0 + C_j3) + (C_j1 + C_j2), β(C_j0 − C_j3) + (C_j1 − C_j2), (C_j0 − C_j3) − (C_j1 − C_j2), and (C_j0 + C_j3) − β(C_j1 + C_j2) from the results of the input-side adder/subtractor group and the output data of the data selectors, where β is R in the product CD mode and 1 in the product CA mode.
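The second datapath can be sketched in the same way. The Python below is a hedged transcription of the four output expressions exactly as they appear in the text; the names are hypothetical, and R = 2 is assumed (the encoder case) and modelled as a left shift by one bit.

```python
def second_invention_row(row, cd_mode):
    """Return one output row of C*D (cd_mode=True) or C*A (cd_mode=False).

    Assumption: R = 2, modelled as a left shift by one bit.
    """
    c0, c1, c2, c3 = row
    # Input-side adder/subtractor group.
    a0 = c0 + c3
    a1 = c1 + c2
    a2 = c1 - c2
    a3 = c0 - c3
    # Multipliers sit on two intermediate results, each followed by a selector.
    a1_sel = (a1 << 1) if cd_mode else a1   # beta * (C_j1 + C_j2)
    a3_sel = (a3 << 1) if cd_mode else a3   # beta * (C_j0 - C_j3)
    # Output-side adder/subtractor group, term by term as in the text.
    return [a0 + a1, a3_sel + a2, a3 - a2, a0 - a1_sel]


if __name__ == "__main__":
    sample = [4, 8, 2, 6]
    print("CA mode:", second_invention_row(sample, cd_mode=False))
    print("CD mode:", second_invention_row(sample, cd_mode=True))
```

Placing the multipliers after the first additions is what keeps the count at two multiplier/selector pairs, since each scaled sum or difference covers two input terms at once.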

In this invention as well, switching the outputs of the two data selectors between the operation modes yields the matrix products CD and CA as the results of the output-side adder/subtractor group. Because the constant matrix D applies the constant R (a rational number other than 1) to every element of the row direction data, adopting the configuration of the first invention would require four sets of multipliers and data selectors. By choosing how the terms multiplied by β are grouped during the computation, this invention achieves a circuit that needs only two sets.

Based on the above configuration, the matrix operation device of the present invention has the following effects.
The first invention allows the product CB of the variable matrix C (Equation 1) and the constant matrix B (Equation 2) and the product CA of the variable matrix C (Equation 1) and the constant matrix A (Equation 3) to be computed by the same circuit. This reduces the circuit scale of an orthogonal transform circuit that, like the H.264 decoder, executes the integer approximate inverse discrete cosine transform and the Hadamard transform, simplifies the manufacturing process, and lowers power consumption.
The second invention allows the product CD of the variable matrix C (Equation 1) and the constant matrix D (Equation 4) and the product CA of the variable matrix C (Equation 1) and the constant matrix A (Equation 3) to be computed by the same circuit. Applied to the orthogonal transform circuit on the H.264 encoder side, it has the same effects as the invention of claim 1. In particular, obtaining the product CD with the matrix operation device of claim 1 would require four sets of multipliers and data selectors, whereas this invention needs only two sets.

Embodiments of the matrix operation device of the present invention, and examples of orthogonal transform circuits to which it is applied, are described below with reference to FIGS. 2 to 6.
[Embodiment 1]
This embodiment concerns a matrix operation device applied to the orthogonal transform processing performed when decoding H.264 image data. First, the 4 × 4 integer approximate inverse discrete cosine transform of Equation 5 and the 4 × 4 Hadamard transform of Equation 7 can be expanded into Equations 9 and 10 below, respectively.

The 2 × 2 Hadamard transform of Equation 8 expands into Equation 11 below.

From Equations 9 and 10, if α is defined as a variable that is set to 1/2 for the 4 × 4 integer approximate inverse discrete cosine transform and to 1 for the 4 × 4 Hadamard transform, the column direction operation on the coefficient matrix to be transformed consists of the four expressions shown in Equation 12 below.

Similarly, the row direction operation on the coefficient matrix to be transformed consists of the four expressions shown in Equation 13 below.

Given the form of Equations 12 and 13, the matrix operation device shown in FIG. 2 can perform the operations needed for the 4 × 4 integer approximate inverse discrete cosine transform and the 4 × 4 Hadamard transform one column or one row at a time. The device consists of a first adder/subtractor group 11 to 14 made up of two adders 11, 12 and two subtractors 13, 14; a second adder/subtractor group 15 to 18 likewise made up of two adders 15, 16 and two subtractors 17, 18; two ×1/2 operators 19, 20; and two data selectors 21, 22. For the four inputs IN0, IN1, IN2, IN3, in the first adder/subtractor group 11 to 14, adder 11 adds IN0 and IN1; subtractor 13 subtracts IN1 from IN0; subtractor 14 subtracts IN3 from the output of data selector 21, which selects either IN2 or 1/2*IN2 (IN2 multiplied by 1/2 in the ×1/2 operator 19); and adder 12 adds IN2 to the output of data selector 22, which selects either IN3 or 1/2*IN3 (IN3 multiplied by 1/2 in the ×1/2 operator 20).

In the second adder/subtractor group 15 to 18, which operates on the results of the first adder/subtractor group 11 to 14, adder 15 adds the results of adder 11 and adder 12; adder 16 adds the results of subtractor 13 and subtractor 14; subtractor 17 subtracts the result of subtractor 14 from that of subtractor 13; and subtractor 18 subtracts the result of adder 12 from that of adder 11. The results of the second adder/subtractor group 15 to 18 form the four outputs OUT0, OUT1, OUT2, OUT3. The data selectors 21, 22 are controlled by the DTC_mode signal: for the 4 × 4 Hadamard transform the DTC_mode signal is set to "0" and they output IN2 and IN3, respectively; for the 4 × 4 integer approximate inverse discrete cosine transform the DTC_mode signal is set to "1" and they output the results of the ×1/2 operators 19, 20, respectively.

For the operation of Equation 12 (column direction), C_j0, C_j1, C_j2, C_j3 are input to IN0, IN1, IN2, IN3, respectively, with the DTC_mode signal set to "0" for the 4 × 4 Hadamard transform and to "1" for the 4 × 4 integer approximate inverse discrete cosine transform, and X_j0, X_j1, X_j2, X_j3 are obtained from OUT0, OUT1, OUT2, OUT3, respectively. For the operation of Equation 13 (row direction), X_j0, X_j1, X_j2, X_j3 are input to IN0, IN1, IN2, IN3, respectively, and the final result is obtained.

For the 2 × 2 Hadamard transform, Equation 11 above gives the four expressions shown in Equation 14 below.

Therefore, in the matrix operation device of FIG. 2, the final result is obtained by inputting H₀₀, H₁₀, H₀₁, H₁₁ to IN0, IN1, IN2, IN3, respectively, and setting the DTC_mode signal to "0".
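The 2 × 2 case can be checked with a short Python sketch. The function `fig2_unit` (a hypothetical name) transcribes the FIG. 2 connections as the prose describes them for units 11 to 22. The reference `hadamard2x2_direct` uses the usual form S·H·S with S = [[1, 1], [1, −1]], which is an assumption about Equation 8 since the equation itself is not reproduced in the text.

```python
def fig2_unit(in0, in1, in2, in3, dtc_mode):
    """FIG. 2 datapath, transcribed from the prose description of units 11-22."""
    sel21 = (in2 >> 1) if dtc_mode else in2   # data selector 21 / x1/2 operator 19
    sel22 = (in3 >> 1) if dtc_mode else in3   # data selector 22 / x1/2 operator 20
    add11 = in0 + in1
    add12 = sel22 + in2
    sub13 = in0 - in1
    sub14 = sel21 - in3
    # Second group: adder 15, adder 16, subtractor 17, subtractor 18.
    return [add11 + add12, sub13 + sub14, sub13 - sub14, add11 - add12]


def hadamard2x2_direct(h):
    """Reference 2x2 Hadamard transform D = S * H * S (assumed form)."""
    (h00, h01), (h10, h11) = h
    return [
        [h00 + h01 + h10 + h11, h00 - h01 + h10 - h11],
        [h00 + h01 - h10 - h11, h00 - h01 - h10 + h11],
    ]


if __name__ == "__main__":
    h = [[5, 3], [2, 1]]
    # Inputs are applied in the order H00, H10, H01, H11 with DTC_mode = 0.
    print(fig2_unit(h[0][0], h[1][0], h[0][1], h[1][1], dtc_mode=0))
    print(hadamard2x2_direct(h))
```

Under these assumptions the four outputs are the four Hadamard coefficients, appearing on OUT0 to OUT3 in the order D₀₀, D₁₀, D₁₁, D₀₁; the text does not state this pin assignment, so it should be read as a property of the sketch rather than of the patent figure.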

[Embodiment 2]
This embodiment concerns a matrix operation device applied to the orthogonal transform processing performed when encoding H.264 image data. First, the 4 × 4 integer approximate discrete cosine transform of Equation 6 can be expanded into Equation 15 below.

As in Embodiment 1, the 4 × 4 Hadamard transform of Equation 7 and the 2 × 2 Hadamard transform of Equation 8 expand into Equations 10 and 11, respectively.

From Equations 15 and 10, if β is defined as a variable that is set to 2 for the 4 × 4 integer approximate discrete cosine transform and to 1 for the 4 × 4 Hadamard transform, the column direction operation on the coefficient matrix to be transformed consists of the four expressions shown in Equation 16 below.

Similarly, the row direction operation on the coefficient matrix to be transformed consists of the four expressions shown in Equation 17 below.

Given the form of Equations 16 and 17, the matrix operation device shown in FIG. 3 can perform the operations needed for the 4 × 4 integer approximate discrete cosine transform and the 4 × 4 Hadamard transform one column or one row at a time. The device consists of an input-side adder/subtractor group 51 to 54 made up of two adders 51, 52 and two subtractors 53, 54; an output-side adder/subtractor group 55 to 58 likewise made up of two adders 55, 58 and two subtractors 56, 57; two ×2 operators 59, 60; and two data selectors 61, 62. For the four inputs IN0, IN1, IN2, IN3, in the input-side adder/subtractor group 51 to 54, adder 51 adds IN0 and IN3, adder 52 adds IN1 and IN2, subtractor 53 subtracts IN2 from IN1, and subtractor 54 subtracts IN3 from IN0.
In the output-side adder/subtractor group 55 to 58, adder 55 adds the results of adder 51 and adder 52; subtractor 56 subtracts from the result of adder 51 the output of data selector 61, which selects either the result of adder 52 or that result multiplied by 2 in the ×2 operator 59; subtractor 57 subtracts the result of subtractor 53 from that of subtractor 54; and adder 58 adds the result of subtractor 53 to the output of data selector 62, which selects either the result of subtractor 54 or that result multiplied by 2 in the ×2 operator 60. The results of the output-side adder/subtractor group 55 to 58 form the four outputs OUT0, OUT1, OUT2, OUT3. The data selectors 61, 62 are controlled by the DTC_mode signal: for the 4 × 4 Hadamard transform the DTC_mode signal is set to "0" and they output the results of adder 52 and subtractor 54, respectively; for the 4 × 4 integer approximate discrete cosine transform the DTC_mode signal is set to "1" and they output the results of the ×2 operators 59, 60, respectively.

This example concerns an orthogonal transform circuit for H.264 image data processing that uses one matrix operation device of Embodiment 1 or Embodiment 2; FIG. 4 shows the system circuit. In the figure, 70 is the matrix operation device: one circuit of FIG. 2 is used for a decoder and one circuit of FIG. 3 for an encoder. 71 is an input switching unit consisting of four data selectors. Setting the Path_mode signal to "0" or "1" makes the selectors switch the inputs IN0, IN1, IN2, IN3 of the matrix operation device 70 between the C_ji sequence, which is the coefficient matrix data to be transformed, and the X_ij sequence, which is obtained by performing the column direction operation on the C_ji sequence and transposing the result. 72 is a transposition memory whose input/output structure saves the X_ji sequence that the matrix operation device 70 computes from the C_ji sequence in the column direction operation, and reads out the transposed X_ij sequence from the saved X_ji sequence to the input switching unit 71.
This configuration rests on the following observation. With input matrix C, coefficient matrix Z, and t() denoting matrix transposition, the output matrix U is U = t(Z)·{C·Z}, which can be rewritten as U = t(t({C·Z})·Z). Both passes therefore reduce to multiplying an input matrix by the coefficient matrix Z.
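The rewriting of U follows from the transposition rule $(PQ)^{\mathsf T} = Q^{\mathsf T}P^{\mathsf T}$. Written out in standard notation, with $t(\cdot)$ replaced by a superscript $\mathsf T$ (a notational choice made here, not one used in the text):

```latex
\begin{aligned}
\bigl((CZ)^{\mathsf T}\, Z\bigr)^{\mathsf T}
  &= Z^{\mathsf T}\,\bigl((CZ)^{\mathsf T}\bigr)^{\mathsf T} \\
  &= Z^{\mathsf T}\,(CZ) \\
  &= U .
\end{aligned}
```

The second pass is thus again a product of the form "matrix times Z", applied to the transposed first-pass result, and the outer transposition only determines the order in which the final outputs appear.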

Based on the above configuration, the orthogonal transformation circuit of this example executes a 4 × 4 matrix operation in synchronization with the clock by the following procedure. The encoder case is taken as the example here (matrix operation unit 70 is the circuit of FIG. 3 and executes the integer approximate discrete cosine transform and the Hadamard transform).
(1) Select with the DCT_mode signal whether the integer approximate discrete cosine transform or the Hadamard transform is computed, and set the Path_mode signal to “0”.
(2) Input the data rows {C₀₀, C₀₁, C₀₂, C₀₃}, {C₁₀, C₁₁, C₁₂, C₁₃}, {C₂₀, C₂₁, C₂₂, C₂₃}, {C₃₀, C₃₁, C₃₂, C₃₃} to matrix operation unit 70.
(3) Temporarily save in transposition memory 72 the primary output data {X₀₀, X₀₁, X₀₂, X₀₃}, {X₁₀, X₁₁, X₁₂, X₁₃}, {X₂₀, X₂₁, X₂₂, X₂₃}, {X₃₀, X₃₁, X₃₂, X₃₃} that matrix operation unit 70 computes from the input rows.
(4) Switch the Path_mode signal to “1” and input to matrix operation unit 70 the transposed data rows {X₀₀, X₁₀, X₂₀, X₃₀}, {X₀₁, X₁₁, X₂₁, X₃₁}, {X₀₂, X₁₂, X₂₂, X₃₂}, {X₀₃, X₁₃, X₂₃, X₃₃} read from transposition memory 72.
(5) The secondary output data {T₀₀, T₁₀, T₂₀, T₃₀}, {T₀₁, T₁₁, T₂₁, T₃₁}, {T₀₂, T₁₂, T₂₂, T₃₂}, {T₀₃, T₁₃, T₂₃, T₃₃} that matrix operation unit 70 computes from these rows is the result of the orthogonal transform.
In this example the computation takes at least 8 cycles, but the circuit scale can be kept small. For the 2 × 2 Hadamard transform, the primary output obtained in (3) is used directly as the transform result.
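The five steps can be modelled in software as follows. This is a minimal sketch under stated assumptions, not the circuit itself: the function names are hypothetical, the unit is assumed to accept one 4-element row per clock cycle (which matches the 8-cycle minimum given in the text), and the Hadamard matrix stands in for the coefficient matrix selected by the mode signal.

```python
# Hypothetical software model of the single-unit circuit with a transposition memory.

A = [[1,  1,  1,  1],
     [1,  1, -1, -1],
     [1, -1, -1,  1],
     [1, -1,  1, -1]]   # Hadamard coefficient matrix (assumed mode)

def row_unit(row, z=A):
    """Stand-in for matrix operation unit 70: one 4-element row times Z."""
    return [sum(row[k] * z[k][j] for k in range(4)) for j in range(4)]

def transform_single_unit(c):
    """Return (secondary output rows, cycle count) for a 4x4 block c."""
    cycles = 0
    transposition_memory = []
    # Path_mode = 0: rows of C enter the unit, one per cycle (steps 2 and 3).
    for row in c:
        transposition_memory.append(row_unit(row))
        cycles += 1
    # Path_mode = 1: the memory is read column by column, i.e. transposed (step 4).
    out = []
    for col in zip(*transposition_memory):
        out.append(row_unit(list(col)))   # step 5: row i holds T_0i, T_1i, T_2i, T_3i
        cycles += 1
    return out, cycles
```

Note that the secondary output comes out in transposed order, which is why step (5) lists it as {T₀₀, T₁₀, T₂₀, T₃₀} and so on.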

This example concerns an orthogonal transformation circuit for H.264 image data processing that uses four of the matrix operation units of Embodiment 1 or Embodiment 2; FIG. 5 shows the system circuit. In the figure, 70-1 to 70-4 are the matrix operation units; each is the circuit of FIG. 2 in the case of a decoder or the circuit of FIG. 3 in the case of an encoder. 71-1 to 71-4 are input switching units, each consisting of four data selectors. Setting the Path_mode signal to “0” or “1” makes the four selectors switch the inputs IN0, IN1, IN2, IN3 of matrix operation units 70-1 to 70-4 between the C_ji series, which is the coefficient matrix data to be transformed, and the X_ij series, which is obtained by performing the column-direction operation on the C_ji series and transposing the result. 73 is a flip-flop circuit (hereinafter “FF circuit”) that stores the X_ji series computed by matrix operation units 70-1 to 70-4 from the C_ji series by the column-direction operation. Its output terminals are connected to the input terminals of input switching units 71-1 to 71-4 in such a way that the stored X_ji matrix is presented as the transposed matrix X_ij.

Based on the above configuration, the orthogonal transformation circuit of this example executes a 4 × 4 matrix operation in synchronization with the clock by the following procedure. As in Example 1, the encoder case is taken as the example.
(1) Select with the DCT_mode signal whether the integer approximate discrete cosine transform or the Hadamard transform is computed, and set the Path_mode signal to “0”.
(2) Input the data {C₀₀, C₀₁, C₀₂, C₀₃, C₁₀, C₁₁, C₁₂, C₁₃, C₂₀, C₂₁, C₂₂, C₂₃, C₃₀, C₃₁, C₃₂, C₃₃} to matrix operation units 70-1 to 70-4.
(3) Temporarily save in FF circuit 73 the primary output data {X₀₀, X₀₁, X₀₂, X₀₃, X₁₀, X₁₁, X₁₂, X₁₃, X₂₀, X₂₁, X₂₂, X₂₃, X₃₀, X₃₁, X₃₂, X₃₃} that matrix operation units 70-1 to 70-4 compute from the input data.
(4) Switch the Path_mode signal to “1” and input the output of FF circuit 73 to matrix operation units 70-1 to 70-4. [The data presented to matrix operation units 70-1 to 70-4 is the transpose of the primary output matrix: {X₀₀, X₁₀, X₂₀, X₃₀, X₀₁, X₁₁, X₂₁, X₃₁, X₀₂, X₁₂, X₂₂, X₃₂, X₀₃, X₁₃, X₂₃, X₃₃}.]
(5) The secondary output data {T₀₀, T₁₀, T₂₀, T₃₀, T₀₁, T₁₁, T₂₁, T₃₁, T₀₂, T₁₂, T₂₂, T₃₂, T₀₃, T₁₃, T₂₃, T₃₃} that matrix operation units 70-1 to 70-4 compute from this data is the result of the orthogonal transform.
This example has the advantage that the computation time can be as short as 2 cycles. The circuit scale is larger than in Example 1, but it is still far smaller than when the integer approximate discrete cosine transform and the Hadamard transform are built as independent circuits. For the 2 × 2 Hadamard transform, as in Example 1, the primary output obtained in (3) is used directly as the transform result.
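The two-cycle behaviour can be illustrated with a small register-level model. This is a sketch under assumptions, not the patented circuit: the class name `FourUnitCircuit` and its `clock` method are hypothetical, the FF register is assumed to latch the unit outputs at the end of every cycle, and the Hadamard matrix is used as the coefficient matrix.

```python
# Hypothetical model: four row units, one FF register, and a Path_mode input selector.

A = [[1,  1,  1,  1],
     [1,  1, -1, -1],
     [1, -1, -1,  1],
     [1, -1,  1, -1]]   # Hadamard coefficient matrix (assumed mode)

def row_unit(row, z=A):
    """Stand-in for one matrix operation unit: one 4-element row times Z."""
    return [sum(row[k] * z[k][j] for k in range(4)) for j in range(4)]

class FourUnitCircuit:
    def __init__(self):
        self.ff = [[0] * 4 for _ in range(4)]   # contents of FF circuit 73

    def clock(self, path_mode, c_rows=None):
        """Advance one clock cycle and return the four unit outputs."""
        if path_mode == 0:
            inputs = c_rows                                 # C series from outside
        else:
            inputs = [list(col) for col in zip(*self.ff)]   # transposed FF wiring
        outputs = [row_unit(r) for r in inputs]             # units 70-1 to 70-4 in parallel
        self.ff = outputs                                   # latched for the next cycle (assumption)
        return outputs
```

A full 4 × 4 transform is then `circuit.clock(0, C)` followed by `circuit.clock(1)`, that is, two cycles instead of the eight needed with a single unit.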

This example concerns an orthogonal transformation circuit for H.264 image data processing that uses eight of the matrix operation units of Embodiment 1 or Embodiment 2; FIG. 6 shows the system circuit. In the figure, 70-1 to 70-4 and 70-5 to 70-8 are the matrix operation units; each is the circuit of FIG. 2 in the case of a decoder or the circuit of FIG. 3 in the case of an encoder. The output terminals of matrix operation units 70-1 to 70-4 are connected to the input terminals of matrix operation units 70-5 to 70-8 in such a way that the output data {X₀₀, X₀₁, X₀₂, X₀₃, X₁₀, X₁₁, X₁₂, X₁₃, X₂₀, X₂₁, X₂₂, X₂₃, X₃₀, X₃₁, X₃₂, X₃₃} of units 70-1 to 70-4 enters units 70-5 to 70-8 as the transposed matrix data {X₀₀, X₁₀, X₂₀, X₃₀, X₀₁, X₁₁, X₂₁, X₃₁, X₀₂, X₁₂, X₂₂, X₃₂, X₀₃, X₁₃, X₂₃, X₃₃}.

Based on the above configuration, the orthogonal transformation circuit of this example executes a 4 × 4 matrix operation in synchronization with the clock by the following procedure. As in Examples 1 and 2, the encoder case is taken as the example.
(1) Select with the DCT_mode signal whether the integer approximate discrete cosine transform or the Hadamard transform is computed, and set the Path_mode signal to “0”.
(2) Input the data {C₀₀, C₀₁, C₀₂, C₀₃, C₁₀, C₁₁, C₁₂, C₁₃, C₂₀, C₂₁, C₂₂, C₂₃, C₃₀, C₃₁, C₃₂, C₃₃} to matrix operation units 70-1 to 70-4.
(3) The output data {T₀₀, T₁₀, T₂₀, T₃₀, T₀₁, T₁₁, T₂₁, T₃₁, T₀₂, T₁₂, T₂₂, T₃₂, T₀₃, T₁₃, T₂₃, T₃₃} of matrix operation units 70-5 to 70-8, corresponding to the data input to matrix operation units 70-1 to 70-4, is directly the result of the orthogonal transform.
This example has the advantage that the computation time can be as short as 1 cycle. The circuit scale is larger still than in Example 2. Even so, compared with an orthogonal transformation circuit of the same computing capability in which the integer approximate discrete cosine transform and the Hadamard transform are built as independent circuits, half the circuit scale suffices. For the 2 × 2 Hadamard transform, the outputs of the first-stage matrix operation units 70-1 to 70-4 are used directly as the transform result.
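The cascade and the remark on the 2 × 2 Hadamard transform can be sketched as follows. Everything here is illustrative: the function names are hypothetical, the Hadamard matrix is assumed as the coefficient matrix, and the mapping of the 2 × 2 block onto a single input row [W₀₀, W₀₁, W₁₀, W₁₁] is an assumption, since the text does not state how the 2 × 2 data is fed to the units. Under that assumption, one pass through a first-stage unit yields the four 2 × 2 Hadamard coefficients in the order Y₀₀, Y₁₀, Y₁₁, Y₀₁.

```python
# Hypothetical model of the eight-unit cascade (two stages joined by transposed wiring).

A = [[1,  1,  1,  1],
     [1,  1, -1, -1],
     [1, -1, -1,  1],
     [1, -1,  1, -1]]   # Hadamard coefficient matrix (assumed mode)

def stage(rows, z=A):
    """Row units working in parallel: each input row is multiplied by Z."""
    return [[sum(r[k] * z[k][j] for k in range(4)) for j in range(4)]
            for r in rows]

def cascade(c, z=A):
    """Units 70-1 to 70-4, fixed transposed wiring, then units 70-5 to 70-8."""
    x = stage(c, z)
    x_t = [list(col) for col in zip(*x)]   # wiring presents X transposed
    return stage(x_t, z)                   # row i holds T_0i, T_1i, T_2i, T_3i

def hadamard_2x2_first_stage(w):
    """2x2 Hadamard transform read from one first-stage unit (assumed mapping)."""
    flat = [w[0][0], w[0][1], w[1][0], w[1][1]]
    y00, y10, y11, y01 = stage([flat])[0]  # output order under the assumed mapping
    return [[y00, y01], [y10, y11]]
```

Because the transposition is done by wiring rather than by a memory or register, no intermediate clocked storage appears in this model.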

The matrix operation unit of the present invention can be applied to orthogonal transformation circuits for image data processing such as H.264.

FIG. 1 is a diagram showing the orthogonal transform processing units of a macroblock in H.264.
FIG. 2 is a block diagram showing the configuration of the matrix operation unit according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing the configuration of the matrix operation unit according to Embodiment 2 of the present invention.
FIG. 4 is a block diagram of an orthogonal transformation circuit using one matrix operation unit according to the embodiments (Example 1).
FIG. 5 is a block diagram of an orthogonal transformation circuit using four matrix operation units according to the embodiments (Example 2).
FIG. 6 is a block diagram of an orthogonal transformation circuit using eight matrix operation units according to the embodiments (Example 3).

Explanation of symbols

11, 12, 15, 16, 51, 52, 55, 57: adders
13, 14, 17, 18, 53, 54, 56, 58: subtractors
19, 20: ×1/2 multipliers
21, 22, 61, 62: data selectors
59, 60: ×2 multipliers
70, 70-1 to 70-4, 70-5 to 70-8: matrix operation units
71, 71-1 to 71-4: input switching units
72: transposition memory
73: FF circuit
101: macroblock of the luminance signal
102: block of the color-difference signal (Cb)
103: block of the color-difference signal (Cr)
104: block collecting only the DC components of each basic processing unit after the integer approximate discrete cosine transform of the luminance signal (Y) macroblock
105: block collecting only the DC components of each basic processing unit after the integer approximate discrete cosine transform of the color-difference signal (Cb) block
106: block collecting only the DC components of each basic processing unit after the integer approximate discrete cosine transform of the color-difference signal (Cr) block

Claims

A matrix operation unit that computes the product CB of the variable matrix C of Formula 1 below and the constant matrix B of Formula 2 below (where the constant K is a rational number other than 1), and the product CA of the variable matrix C and the constant matrix A of Formula 3 below, comprising:

two multipliers that multiply the data C_j1 and C_j3 of the row-direction data {C_j0, C_j1, C_j2, C_j3} of the variable matrix C, input in parallel, by the constant K;
two data selectors, one provided for each multiplier, each outputting the output-side data of its multiplier in the operation mode of the product CB and the input-side data of its multiplier in the operation mode of the product CA;
a first adder/subtractor group that obtains (C_j0 + C_j2), (C_j1 + αC_j3), (αC_j1 − C_j3), (C_j0 − C_j2) from the input row-direction data and the output data of the data selectors [where α is K in the operation mode of the product CB and 1 in the operation mode of the product CA]; and
a second adder/subtractor group that uses the results of the first adder/subtractor group to obtain (C_j0 + C_j2) + (C_j1 + αC_j3), (C_j0 − C_j2) + (αC_j1 − C_j3), (C_j0 − C_j2) − (αC_j1 − C_j3), (C_j0 + C_j2) − (C_j1 + αC_j3).
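The datapath recited in this claim can be sketched in software. The following is an illustration under assumptions, not the claimed circuit: the function and variable names are hypothetical, K = 1/2 is taken from the abstract, and the matrices `B_INFERRED` and `A_INFERRED` are not copied from Formulas 2 and 3 (which are not reproduced in this text) but read off from the adder/subtractor outputs listed in the claim.

```python
from fractions import Fraction

K = Fraction(1, 2)   # value suggested by the abstract; the claim only requires K != 1

def unit_claim1(row, mode, k=K):
    """One row through the datapath; mode 'CB' gives alpha = K, mode 'CA' gives alpha = 1."""
    c0, c1, c2, c3 = row
    # Two multipliers by K, each followed by a data selector.
    sel1 = k * c1 if mode == "CB" else c1
    sel3 = k * c3 if mode == "CB" else c3
    # First adder/subtractor group.
    s0 = c0 + c2
    s1 = c1 + sel3
    s2 = sel1 - c3
    s3 = c0 - c2
    # Second adder/subtractor group.
    return [s0 + s1, s3 + s2, s3 - s2, s0 - s1]

# Matrices inferred from the claimed outputs (assumption, see lead-in).
B_INFERRED = [[1,  1,  1,  1],
              [1,  K, -K, -1],
              [1, -1, -1,  1],
              [K, -1,  1, -K]]
A_INFERRED = [[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1, -1,  1],
              [1, -1,  1, -1]]

def row_times(row, m):
    """Reference computation: row vector times a 4x4 matrix."""
    return [sum(row[k] * m[k][j] for k in range(4)) for j in range(4)]
```

The only hardware difference between the two modes is the pair of selectors, which is the source of the circuit-scale saving described in the abstract.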

A matrix operation unit that computes the product CD of the variable matrix C and the constant matrix D of Formula 4 below (where the constant R is a rational number other than 1), and the product CA of the variable matrix C and the constant matrix A, comprising:

an input-side adder/subtractor group that obtains (C_j0 + C_j3), (C_j1 + C_j2), (C_j1 − C_j2), (C_j0 − C_j3) from the row-direction data {C_j0, C_j1, C_j2, C_j3} of the variable matrix C, input in parallel;
two multipliers that multiply (C_j1 + C_j2) and (C_j0 − C_j3), which are results of the input-side adder/subtractor group, by the constant R;
two data selectors, one provided for each multiplier, each outputting the output-side data of its multiplier in the operation mode of the product CD and the input-side data of its multiplier in the operation mode of the product CA; and
an output-side adder/subtractor group that obtains (C_j0 + C_j3) + (C_j1 + C_j2), β(C_j0 − C_j3) + (C_j1 − C_j2), (C_j0 − C_j3) − (C_j1 − C_j2), (C_j0 + C_j3) − β(C_j1 + C_j2) from the results of the input-side adder/subtractor group and the output data of the data selectors [where β is R in the operation mode of the product CD and 1 in the operation mode of the product CA].