JP2901896B2 - Orthogonal transform processor

Info

Publication number: JP2901896B2
Application number: JP7094656A
Authority: JP
Inventors: 真木豊蔵; 潔岡本; 義史松本
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-05-10
Filing date: 1995-04-20
Publication date: 1999-06-07
Anticipated expiration: 2014-06-07
Also published as: JPH08185389A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an orthogonal transform processor suitable for use in image information processing.

[0002]

[Prior Art] In recent years, small-scale circuits that implement orthogonal transforms have been in demand as an important part of high-efficiency compression coding schemes for two-dimensional image data. An encoder uses a forward orthogonal transform such as the discrete cosine transform (DCT) or the discrete sine transform (DST). A decoder uses an inverse orthogonal transform such as the inverse discrete cosine transform (IDCT) or the inverse discrete sine transform (IDST).

[0003] US Pat. No. 4,791,598 discloses a two-dimensional DCT processor composed of two one-dimensional DCT processors and a transposition memory interposed between them. Each of the two one-dimensional DCT processors incorporates a distributed arithmetic (DA) circuit that obtains vector inner products using a ROM (read only memory) instead of multipliers. The DA circuit has a plurality of ROM and accumulator units (RACs). Each RAC has a ROM that stores, in the form of a lookup table, partial sums of vector inner products based on the discrete cosine matrix, and an accumulator that shift-aligns and adds the partial sums read sequentially from the ROM, with bit-slice words used as addresses, to obtain the vector inner product corresponding to the input vector. This two-dimensional DCT processor configuration can also be applied to a two-dimensional IDCT processor.

[0004] Suppose that two-dimensional IDCT processing is applied to input data consisting of 8 × 8 elements. The input data is represented by an 8-row, 8-column matrix $Y$ with elements $y_{ij}$ ($i = 0$ to 7, $j = 0$ to 7). Consider also an 8-row, 8-column inverse discrete cosine matrix $D$. The elements $d_{ij}$ of the matrix $D$ are

$$d_{i0} = \frac{1}{2\sqrt{2}}, \quad i = 0 \text{ to } 7$$
$$d_{ij} = \frac{1}{2}\cos\frac{(2i+1)j\pi}{16}, \quad i = 0 \text{ to } 7,\ j = 1 \text{ to } 7 \qquad (1)$$

The two-dimensional IDCT of the matrix $Y$ is $DYD^T$, where $D^T$ is the transpose of the matrix $D$. With a one-dimensional IDCT processor that calculates the one-dimensional IDCT of $Y$, that is, the matrix product $DY$, and a transposition means, the intermediate matrix $X = (DY)^T$ is easily obtained. The final result $DYD^T$ is obtained in the same way, because $DYD^T = (D(DY)^T)^T = (DX)^T$. In other words, the one-dimensional IDCT processor that calculates the matrix product $DY$ plays an important role in implementing the two-dimensional IDCT.
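The identity $DYD^T = (DX)^T$ with $X = (DY)^T$ can be checked numerically. The following is a minimal sketch in plain Python, not part of the patent; the function names (`idct_matrix`, `idct_2d_two_pass`, `idct_2d_direct`) are illustrative assumptions.

```python
import math

N = 8

def idct_matrix():
    """Inverse discrete cosine matrix D as defined by equation (1)."""
    return [[1 / (2 * math.sqrt(2)) if j == 0
             else 0.5 * math.cos((2 * i + 1) * j * math.pi / 16)
             for j in range(N)] for i in range(N)]

def matmul(a, b):
    """Plain 8x8 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def idct_2d_two_pass(y):
    """Two 1-D IDCT passes with a transposition after each: (D (D Y)^T)^T."""
    d = idct_matrix()
    x = transpose(matmul(d, y))       # intermediate matrix X = (D Y)^T
    return transpose(matmul(d, x))    # (D X)^T, which equals D Y D^T

def idct_2d_direct(y):
    """Reference computation of D Y D^T."""
    d = idct_matrix()
    return matmul(matmul(d, y), transpose(d))
```

Both functions return the same 8 × 8 result up to rounding, which is why a single one-dimensional processor design plus a transposition memory suffices for the two-dimensional transform.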

[0005] The result of the one-dimensional IDCT on the $j$-th column of the matrix $Y$ is represented by the $j$-th column of an 8-row, 8-column matrix $W$. The elements $w_{ij}$ of the matrix $W$ are

$$w_{ij} = \sum_{k=0}^{7} d_{ik}\, y_{kj}, \quad i = 0 \text{ to } 7,\ j = 0 \text{ to } 7 \qquad (2)$$

The element $w_{ij}$ is the inner product of the $i$-th row of the matrix $D$ and the $j$-th column of the matrix $Y$, and is a sum of eight products. The processing that obtains the elements $w_{ij}$ is called 8-point IDCT processing.

[0006] A one-dimensional IDCT processor with eight multipliers and eight accumulators can calculate in parallel the eight inner products $w_{0j}, w_{1j}, w_{2j}, w_{3j}, w_{4j}, w_{5j}, w_{6j}, w_{7j}$ that make up the $j$-th column of the matrix $W$, where

$$\begin{aligned}
w_{0j} &= \sum_{k=0}^{7} d_{0k}\, y_{kj} & w_{1j} &= \sum_{k=0}^{7} d_{1k}\, y_{kj} \\
w_{2j} &= \sum_{k=0}^{7} d_{2k}\, y_{kj} & w_{3j} &= \sum_{k=0}^{7} d_{3k}\, y_{kj} \\
w_{4j} &= \sum_{k=0}^{7} d_{4k}\, y_{kj} & w_{5j} &= \sum_{k=0}^{7} d_{5k}\, y_{kj} \\
w_{6j} &= \sum_{k=0}^{7} d_{6k}\, y_{kj} & w_{7j} &= \sum_{k=0}^{7} d_{7k}\, y_{kj}
\end{aligned} \qquad (3)$$
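As a baseline for the later embodiments, the eight-multiplier arrangement of equation (3) can be modelled as follows. This is an illustrative sketch only; the function names and the multiplication counter are assumptions added here to make the cost visible.

```python
import math

def idct_matrix():
    """Inverse discrete cosine matrix D as defined by equation (1)."""
    return [[1 / (2 * math.sqrt(2)) if k == 0
             else 0.5 * math.cos((2 * i + 1) * k * math.pi / 16)
             for k in range(8)] for i in range(8)]

def idct_1d_eight_multipliers(column):
    """Compute w_0j..w_7j from y_0j..y_7j, one input element per cycle.

    Returns the eight inner products and the number of multiplications used.
    """
    d = idct_matrix()
    acc = [0.0] * 8                    # one accumulator per inner product
    multiplications = 0
    for k, y in enumerate(column):     # cycle k receives y_kj
        for i in range(8):             # eight multipliers working in parallel
            acc[i] += d[i][k] * y
            multiplications += 1
    return acc, multiplications
```

Each column costs 64 multiplications spread over eight multipliers, which is the hardware cost the invention sets out to reduce.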

[0007]

[Problems to Be Solved by the Invention] The one-dimensional IDCT processor with eight multipliers described above has the problem that, when implemented in VLSI (very large scale integration), the multipliers occupy a large area on the chip.

[0008] Moreover, when the parallel calculation of the eight inner products expressed by equation (3) is implemented with the conventional DA circuit described above, a large ROM size is required.

[0009] An object of the present invention is to reduce the circuit scale of an orthogonal transform processor such as a one-dimensional IDCT processor.

[0010]

[Means for Solving the Problems] To achieve the above object, a first orthogonal transform processor according to the present invention reduces the number of multipliers by exploiting the regularity of the elements of the inverse discrete cosine matrix or the inverse discrete sine matrix, and distributes the result of each multiplier to a plurality of accumulators.

[0011] A second orthogonal transform processor according to the present invention divides each of a plurality of inner product calculations into two constant multiplications and the calculation of one partial inner product, and executes the two constant multiplications in a constant multiplication circuit. In addition, the calculations of the plurality of partial inner products are executed in parallel by a DA circuit.

[0012]

[Operation] With the first orthogonal transform processor according to the present invention, 8-point IDCT processing, for example, which conventionally required eight multipliers, requires only four or three multipliers.

[0013] The second orthogonal transform processor according to the present invention requires only the two multipliers, or the one multiplier, in the constant multiplication circuit. Furthermore, because part of the inner product calculation is executed by the constant multiplication circuit, the ROM size of the DA circuit is reduced.

[0014]

[Embodiments] One-dimensional IDCT processors according to embodiments of the present invention are described below with reference to the drawings.

[0015] (Embodiment 1) First, define $t_n$ ($n = 0$ to 7) as

$$t_0 = \frac{1}{2\sqrt{2}}, \qquad t_n = \frac{1}{2}\cos\frac{n\pi}{16}, \quad n = 1 \text{ to } 7 \qquad (4)$$

Then, using the symmetry of the cosine function, the calculation of the eight inner products expressed by equation (3) can be written as shown in FIG. 1.

[0016] In the matrix operation of FIG. 1, disregarding the signs (±), the coefficient to be multiplied by $y_{0j}$ is $t_0$; the coefficients to be multiplied by $y_{1j}$ are $t_1, t_3, t_5, t_7$; by $y_{2j}$ are $t_2, t_6$; by $y_{3j}$ are $t_3, t_7, t_1, t_5$; the coefficient to be multiplied by $y_{4j}$ is $t_4$; the coefficients to be multiplied by $y_{5j}$ are $t_5, t_1, t_7, t_3$; by $y_{6j}$ are $t_6, t_2$; and by $y_{7j}$ are $t_7, t_5, t_3, t_1$. Therefore, as shown in FIG. 2, when the eight elements $y_{ij}$ ($i = 0$ to 7) of the input data are supplied sequentially, at most four multiplications need to be executed per cycle. FIG. 3 shows the coefficient matrix $E$ used in the procedure of FIG. 2. The coefficient matrix $E$ is a 4-row, 8-column matrix whose elements are the absolute values of the elements in rows 0 to 3 of the 8-row, 8-column inverse discrete cosine matrix in FIG. 1.
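Since FIG. 1 and FIG. 3 are not reproduced in the text, the following sketch derives the same information from equations (1) and (4). It is an illustration under stated assumptions: the helper `coeff_index_and_sign` and the representation of $E$ by the indices $n$ of $t_n$ are choices made here, not notation from the patent.

```python
import math

def t(n):
    """Coefficient t_n of equation (4)."""
    return 1 / (2 * math.sqrt(2)) if n == 0 else 0.5 * math.cos(n * math.pi / 16)

def coeff_index_and_sign(i, k):
    """Return (n, sign) such that d_ik = sign * t_n for the matrix of equation (1)."""
    if k == 0:
        return 0, 1
    m = ((2 * i + 1) * k) % 32     # angle in units of pi/16, reduced modulo 2*pi
    if m > 16:
        m = 32 - m                 # cos(2*pi - a) = cos(a)
    if m > 8:
        return 16 - m, -1          # cos(pi - a) = -cos(a)
    return m, 1

# Coefficient matrix E, written as the index n of |d_ik| = t_n for rows 0..3
E = [[coeff_index_and_sign(i, k)[0] for k in range(8)] for i in range(4)]

# Number of distinct coefficients needed for each input element y_kj
distinct_per_cycle = [len({coeff_index_and_sign(i, k)[0] for i in range(8)})
                      for k in range(8)]
```

Under these definitions `distinct_per_cycle` never exceeds four, which matches the statement that four multiplications per cycle are enough.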

[0017] FIG. 4 shows the configuration of a one-dimensional IDCT processor according to the first embodiment of the present invention. This configuration uses the coefficient matrix $E$ of FIG. 3. In FIG. 4, 101 to 104 are first to fourth coefficient memories, 105 to 108 are first to fourth multipliers, 109 to 116 are first to eighth accumulators, and 117 is an eight-input selector. The first coefficient memory 101 stores the eight elements of row 0 of the matrix $E$, the second coefficient memory 102 stores the eight elements of row 1, the third coefficient memory 103 stores the eight elements of row 2, and the fourth coefficient memory 104 stores the eight elements of row 3. Binary data $y_{ij}$ ($i = 0$ to 7, $j = 0$ to 7) in two's complement representation is supplied from the input terminal to the first to fourth multipliers 105 to 108 in the order $y_{00}$ to $y_{70}$, $y_{01}$ to $y_{71}$, ..., $y_{07}$ to $y_{77}$. The first multiplier 105 multiplies $y_{ij}$ by the output of the first coefficient memory 101, the second multiplier 106 multiplies $y_{ij}$ by the output of the second coefficient memory 102, the third multiplier 107 multiplies $y_{ij}$ by the output of the third coefficient memory 103, and the fourth multiplier 108 multiplies $y_{ij}$ by the output of the fourth coefficient memory 104. The first to eighth accumulators 109 to 116 use the results of the first to fourth multipliers 105 to 108 to execute in parallel the accumulations that yield the eight inner products $w_{0j}, w_{1j}, w_{2j}, w_{3j}, w_{4j}, w_{5j}, w_{6j}, w_{7j}$. The eight-input selector 117 selects the results of the first to eighth accumulators 109 to 116 in turn and outputs the data $w_{ij}$ ($i = 0$ to 7, $j = 0$ to 7) in the order $w_{00}$ to $w_{70}$, $w_{01}$ to $w_{71}$, ..., $w_{07}$ to $w_{77}$.

[0018] FIG. 5 shows the internal configuration of the accumulator 110 in FIG. 4, which obtains $w_{1j}$. In FIG. 5, 201 is a four-input selector, 202 is a two's complementer, 203 is an adder, 204 is an accumulation register, and 205 is a buffer register. The four-input selector 201 selects one of the results of the first to fourth multipliers 105 to 108. Depending on the value of $i$, the two's complementer 202 either passes the output of the four-input selector 201 unchanged or outputs its two's complement. Specifically, corresponding to the calculation of the inner product $w_{1j}$ of row 1 of the inverse discrete cosine matrix in FIG. 1, $(t_0, t_3, t_6, -t_7, -t_4, -t_1, -t_2, -t_5)$, and the input data vector $(y_{0j}, y_{1j}, y_{2j}, y_{3j}, y_{4j}, y_{5j}, y_{6j}, y_{7j})$, the two's complementer 202 is controlled so that it passes the output of the four-input selector 201 unchanged in the cycles $i = 0, 1, 2$ and outputs its two's complement in the cycles $i = 3, 4, 5, 6, 7$. The two's complement of data $x$ is obtained by inverting all bits of $x$ and then adding 1. The adder 203 calculates the sum of the result of the two's complementer 202 and the value held in the accumulation register 204. The contents of the accumulation register 204 are initialized to 0 in advance and are then overwritten with the result of the adder 203. The buffer register 205 holds the output of the accumulation register 204 so as to guarantee the pipeline operation of the one-dimensional IDCT processor. The other accumulators in FIG. 4 have the same internal configuration as in FIG. 5.
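The accumulate path of FIG. 5 can be modelled with fixed-width integer words. The sketch below is an assumption-laden illustration: the 16-bit word width, the function names, and the use of integer products are all choices made here; the patent does not state a word width.

```python
def twos_complement(word, width=16):
    """Negate a width-bit word by inverting every bit and then adding 1."""
    mask = (1 << width) - 1
    return ((word ^ mask) + 1) & mask

def to_signed(word, width=16):
    """Interpret a width-bit word as a signed two's complement integer."""
    return word - (1 << width) if word & (1 << (width - 1)) else word

def accumulate(products, negate_flags, width=16):
    """Model of FIG. 5: complementer 202, adder 203, accumulation register 204."""
    mask = (1 << width) - 1
    acc = 0                                      # register 204 initialized to 0
    for p, neg in zip(products, negate_flags):   # one selected product per cycle
        word = p & mask
        if neg:
            word = twos_complement(word, width)  # pass through or negate
        acc = (acc + word) & mask                # adder 203 writes back to 204
    return to_signed(acc, width)
```

For the accumulator 110, `negate_flags` would be false for the first three cycles and true for the remaining five, following the sign pattern of row 1.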

[0019] The operation of the one-dimensional IDCT processor according to the first embodiment of the present invention is described below with reference to FIGS. 4 and 5.

[0020] In the first cycle, data $y_{00}$ is supplied from the input terminal. Meanwhile, $t_0, t_0, t_0, t_0$ are read from the coefficient memories 101 to 104, respectively, and the multipliers 105 to 108 calculate the four products $t_0 y_{00}$, $t_0 y_{00}$, $t_0 y_{00}$, $t_0 y_{00}$ in parallel. Next, the four-input selector 201 of each of the accumulators 109 to 116 selects one of the results of the four multipliers 105 to 108. In this case the results of the four multipliers 105 to 108 are all the same, so any of them may be selected. The two's complementer 202 of each of the accumulators 109 to 116 passes the output of the four-input selector 201 unchanged. The adder 203 of each of the accumulators 109 to 116 calculates the sum of the result of the two's complementer 202 and the output of the accumulation register 204, which has been initialized to 0 in advance, and writes the sum to the accumulation register 204. As a result, the same product $t_0 y_{00}$ is stored in the accumulation registers 204 of all the accumulators 109 to 116.

[0021] In the second cycle, data $y_{10}$ is supplied from the input terminal. Meanwhile, $t_1, t_3, t_5, t_7$ are read from the coefficient memories 101 to 104, respectively, and the multipliers 105 to 108 calculate the four products $t_1 y_{10}$, $t_3 y_{10}$, $t_5 y_{10}$, $t_7 y_{10}$ in parallel. Next, the four-input selector 201 of each of the accumulators 109 to 116 selects one of the results of the four multipliers 105 to 108. In this case, the first accumulator 109 selects the result $t_1 y_{10}$ of the first multiplier 105, the second accumulator 110 selects the result $t_3 y_{10}$ of the second multiplier 106, the third accumulator 111 selects the result $t_5 y_{10}$ of the third multiplier 107, the fourth accumulator 112 selects the result $t_7 y_{10}$ of the fourth multiplier 108, the fifth accumulator 113 selects the result $t_7 y_{10}$ of the fourth multiplier 108, the sixth accumulator 114 selects the result $t_5 y_{10}$ of the third multiplier 107, the seventh accumulator 115 selects the result $t_3 y_{10}$ of the second multiplier 106, and the eighth accumulator 116 selects the result $t_1 y_{10}$ of the first multiplier 105. The two's complementers 202 of the first to fourth accumulators 109 to 112 pass the output of the four-input selector 201 unchanged. The two's complementers 202 of the fifth to eighth accumulators 113 to 116 output the two's complement of the output of the four-input selector 201. The adder 203 of each of the accumulators 109 to 116 calculates the sum of the result of the two's complementer 202 and the output of the accumulation register 204, and writes the sum to the accumulation register 204. As a result, the accumulation registers 204 hold $t_0 y_{00} + t_1 y_{10}$ in the first accumulator 109, $t_0 y_{00} + t_3 y_{10}$ in the second accumulator 110, $t_0 y_{00} + t_5 y_{10}$ in the third accumulator 111, $t_0 y_{00} + t_7 y_{10}$ in the fourth accumulator 112, $t_0 y_{00} - t_7 y_{10}$ in the fifth accumulator 113, $t_0 y_{00} - t_5 y_{10}$ in the sixth accumulator 114, $t_0 y_{00} - t_3 y_{10}$ in the seventh accumulator 115, and $t_0 y_{00} - t_1 y_{10}$ in the eighth accumulator 116.

[0022] In the third to eighth cycles, data $y_{20}, y_{30}, y_{40}, y_{50}, y_{60}, y_{70}$ are supplied sequentially from the input terminal. Consequently, at the end of the eighth cycle, the eight inner products $w_{00}, w_{10}, w_{20}, w_{30}, w_{40}, w_{50}, w_{60}, w_{70}$ are stored in the accumulation registers 204 of the accumulators 109 to 116.

[0023] In the ninth cycle, data $y_{01}$ is supplied from the input terminal and the same processing as in the first cycle is executed. At the same time, the contents $w_{00}, w_{10}, w_{20}, w_{30}, w_{40}, w_{50}, w_{60}, w_{70}$ of the accumulation registers 204 of the accumulators 109 to 116 are transferred to the respective buffer registers 205. The eight-input selector 117 then selects and outputs the output $w_{00}$ of the first accumulator 109.

[0024] In the tenth cycle, data $y_{11}$ is supplied from the input terminal and the same processing as in the second cycle is executed. The eight-input selector 117 selects and outputs the output $w_{10}$ of the second accumulator 110.

[0025] By repeating the same processing thereafter, output data $w_{00}$ to $w_{70}$, $w_{01}$ to $w_{71}$, ..., $w_{07}$ to $w_{77}$ corresponding to the continuously supplied input data $y_{00}$ to $y_{70}$, $y_{01}$ to $y_{71}$, ..., $y_{07}$ to $y_{77}$ are obtained continuously.
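The cycle-by-cycle operation described above can be sketched as a small behavioural model: four multipliers per cycle, eight accumulators, and a sign applied at each accumulator. This is an illustration in floating point rather than the two's complement hardware arithmetic, and the names `COEFF_MEM` and `idct_1d_four_multipliers` are assumptions. The multiplier choice `min(i, 7 - i)` follows the selections listed for the second cycle.

```python
import math

def t(n):
    """Coefficient t_n of equation (4)."""
    return 1 / (2 * math.sqrt(2)) if n == 0 else 0.5 * math.cos(n * math.pi / 16)

def coeff_index_and_sign(i, k):
    """Return (n, sign) such that d_ik = sign * t_n."""
    if k == 0:
        return 0, 1
    m = ((2 * i + 1) * k) % 32
    if m > 16:
        m = 32 - m
    if m > 8:
        return 16 - m, -1
    return m, 1

# Coefficient memories 101 to 104 hold rows 0 to 3 of the matrix E.
COEFF_MEM = [[t(coeff_index_and_sign(i, k)[0]) for k in range(8)] for i in range(4)]

def idct_1d_four_multipliers(column):
    """Compute w_0j..w_7j from y_0j..y_7j with four multiplications per cycle."""
    acc = [0.0] * 8                                         # registers 204, cleared
    for k, y in enumerate(column):                          # cycle k receives y_kj
        products = [COEFF_MEM[m][k] * y for m in range(4)]  # multipliers 105 to 108
        for i in range(8):                                  # accumulators 109 to 116
            p = products[min(i, 7 - i)]                     # rows i and 7-i share |d_ik|
            sign = coeff_index_and_sign(i, k)[1]
            acc[i] += p if sign > 0 else -p                 # complementer, then adder
    return acc
```

The model performs 32 multiplications per column instead of 64, while each accumulator still receives exactly the product it needs.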

[0026] FIG. 6 shows a modification of the accumulator 110 of FIG. 5. In the example of FIG. 6, a one's complementer 212 is used in place of the two's complementer 202. Depending on the value of $i$, the one's complementer 212 either passes the output of the four-input selector 201 unchanged or outputs its one's complement. Specifically, corresponding to the calculation of the inner product $w_{1j}$ of row 1 of the inverse discrete cosine matrix in FIG. 1, $(t_0, t_3, t_6, -t_7, -t_4, -t_1, -t_2, -t_5)$, and the input data vector $(y_{0j}, y_{1j}, y_{2j}, y_{3j}, y_{4j}, y_{5j}, y_{6j}, y_{7j})$, the one's complementer 212 is controlled so that it passes the output of the four-input selector 201 unchanged in the cycles $i = 0, 1, 2$ and outputs its one's complement in the cycles $i = 3, 4, 5, 6, 7$. The one's complement of data $x$ is obtained by inverting all bits of $x$. The initial value of the accumulation register 204 is set to the number of negative elements among the eight elements of row 1 of the inverse discrete cosine matrix, that is, to 5.
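The reason the initial value equals the number of negative terms is that the two's complement of $x$ is its one's complement plus 1, so the five missing "+1" corrections can be added once up front. A minimal sketch, assuming a 16-bit word width and integer products (neither is specified in the patent):

```python
def accumulate_ones_complement(products, negate_flags, width=16):
    """Model of FIG. 6: bit inversion only, with the +1 terms preloaded."""
    mask = (1 << width) - 1
    acc = sum(negate_flags) & mask      # initial value = number of negative terms
    for p, neg in zip(products, negate_flags):
        word = p & mask
        if neg:
            word ^= mask                # one's complement: invert all bits
        acc = (acc + word) & mask
    # Interpret the final register contents as a signed value.
    return acc - (1 << width) if acc & (1 << (width - 1)) else acc
```

With the row 1 sign pattern (three positive terms followed by five negative ones), the register starts at 5 and the final value equals the signed sum, without a per-cycle increment in the complementer.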

[0027] As described above, the first embodiment provides a one-dimensional IDCT processor in which the number of multipliers is reduced to four. The four-input selectors 201 (FIGS. 5 and 6) of the accumulators 109 to 116 may be omitted and fixed wiring used instead. In that case, only the result of the first multiplier 105 is supplied to the first and eighth accumulators 109 and 116, only the result of the second multiplier 106 to the second and seventh accumulators 110 and 115, only the result of the third multiplier 107 to the third and sixth accumulators 111 and 114, and only the result of the fourth multiplier 108 to the fourth and fifth accumulators 112 and 113.

[0028] (Embodiment 2) To carry out the matrix operation of FIG. 1, 22 multiplications must be executed in eight cycles, as shown in FIG. 2. The average number of multiplications per cycle is 2.75. In the second embodiment, therefore, as shown in FIG. 7, at most three multiplications are executed per cycle when the eight elements $y_{ij}$ ($i = 0$ to 7) of the input data are supplied sequentially. To this end, a register that holds an input element is provided so that the input element of the previous cycle can be used in addition to the input element of the current cycle. That is, in a given cycle the first-group coefficients $t_1, t_0$ are multiplied by the elements $y_{1j}, y_{0j}$; in the next cycle the second-group coefficients $t_3, t_5, t_7$ are multiplied by the element $y_{1j}$; in the next cycle the third-group coefficients $t_3, t_2, t_6$ are multiplied by the elements $y_{3j}, y_{2j}$; in the next cycle the fourth-group coefficients $t_7, t_1, t_5$ are multiplied by the element $y_{3j}$; in the next cycle the fifth-group coefficients $t_5, t_4$ are multiplied by the elements $y_{5j}, y_{4j}$; in the next cycle the sixth-group coefficients $t_1, t_7, t_3$ are multiplied by the element $y_{5j}$; in the next cycle the seventh-group coefficients $t_7, t_6, t_2$ are multiplied by the elements $y_{7j}, y_{6j}$; and in the next cycle the eighth-group coefficients $t_5, t_3, t_1$ are multiplied by the element $y_{7j}$. FIG. 8 shows the coefficient matrix $G$ used in the procedure of FIG. 7. The coefficient matrix $G$ is a 3-row, 8-column matrix composed of column 0 with the three coefficients $t_1, t_0, t_0$, which include the first-group coefficients; column 1 with the second-group coefficients $t_3, t_5, t_7$; column 2 with the third-group coefficients $t_3, t_2, t_6$; column 3 with the fourth-group coefficients $t_7, t_1, t_5$; column 4 with the three coefficients $t_5, t_4, t_4$, which include the fifth-group coefficients; column 5 with the sixth-group coefficients $t_1, t_7, t_3$; column 6 with the seventh-group coefficients $t_7, t_6, t_2$; and column 7 with the eighth-group coefficients $t_5, t_3, t_1$.
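The eight groups can be checked against the products that the matrix operation actually needs. The sketch below encodes each group as pairs (coefficient index $n$, input index $k$); this encoding and the names `SCHEDULE`, `required`, and `scheduled` are assumptions for illustration, not notation from the patent.

```python
def coeff_index(i, k):
    """Return n such that |d_ik| = t_n for the matrix of equation (1)."""
    if k == 0:
        return 0
    m = ((2 * i + 1) * k) % 32     # angle in units of pi/16
    if m > 16:
        m = 32 - m
    return 16 - m if m > 8 else m

# One entry per cycle: the (n, k) pairs, meaning the product t_n * y_kj.
SCHEDULE = [
    [(1, 1), (0, 0)],              # group 1: t1*y1, t0*y0
    [(3, 1), (5, 1), (7, 1)],      # group 2: t3, t5, t7 times y1
    [(3, 3), (2, 2), (6, 2)],      # group 3: t3*y3, t2*y2, t6*y2
    [(7, 3), (1, 3), (5, 3)],      # group 4: t7, t1, t5 times y3
    [(5, 5), (4, 4)],              # group 5: t5*y5, t4*y4
    [(1, 5), (7, 5), (3, 5)],      # group 6: t1, t7, t3 times y5
    [(7, 7), (6, 6), (2, 6)],      # group 7: t7*y7, t6*y6, t2*y6
    [(5, 7), (3, 7), (1, 7)],      # group 8: t5, t3, t1 times y7
]

# Every distinct product needed by the eight inner products of one column.
required = {(coeff_index(i, k), k) for i in range(8) for k in range(8)}
scheduled = {pair for group in SCHEDULE for pair in group}
```

Comparing `scheduled` with `required` confirms that the 22 distinct products are all covered with no more than three multiplications in any cycle. Columns 0 and 4 of $G$ carry a third, repeated coefficient only because the matrix has three rows.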

[0029] FIG. 9 shows the configuration of a one-dimensional IDCT processor according to the second embodiment of the present invention. This configuration uses the coefficient matrix $G$ of FIG. 8. In FIG. 9, 301 is an input register, 302 to 304 are first to third coefficient memories, 305 is a two-input selector, 306 to 308 are first to third multipliers, 309 is a temporary register, 310 to 317 are first to eighth accumulators, and 318 is an eight-input selector. The first coefficient memory 302 stores the eight elements of row 0 of the matrix $G$, the second coefficient memory 303 stores the eight elements of row 1, and the third coefficient memory 304 stores the eight elements of row 2. Binary data $y_{ij}$ ($i = 0$ to 7, $j = 0$ to 7) in two's complement representation is supplied from the input terminal to the input register 301 and the two-input selector 305 in the order $y_{00}$ to $y_{70}$, $y_{01}$ to $y_{71}$, ..., $y_{07}$ to $y_{77}$. The two-input selector 305 selects either the data supplied directly from the input terminal or the output data of the input register 301. The first multiplier 306 multiplies the output of the two-input selector 305 by the output of the first coefficient memory 302, the second multiplier 307 multiplies the output of the input register 301 by the output of the second coefficient memory 303, and the third multiplier 308 multiplies the output of the input register 301 by the output of the third coefficient memory 304. The temporary register 309 temporarily holds the output of the first multiplier 306. The first to eighth accumulators 310 to 317 use the output data of the temporary register 309 and the results of the first to third multipliers 306 to 308 to execute in parallel the accumulations that yield the eight inner products $w_{0j}, w_{1j}, w_{2j}, w_{3j}, w_{4j}, w_{5j}, w_{6j}, w_{7j}$; the internal configuration of each is as shown in FIG. 5 or FIG. 6. The eight-input selector 318 selects the results of the first to eighth accumulators 310 to 317 in turn and outputs the data $w_{ij}$ ($i = 0$ to 7, $j = 0$ to 7) in the order $w_{00}$ to $w_{70}$, $w_{01}$ to $w_{71}$, ..., $w_{07}$ to $w_{77}$.

[0030] The operation of the one-dimensional IDCT processor according to the second embodiment of the present invention is described below with reference to FIGS. 9 and 5.
【００３１】第１のサイクルでは、入力端子からデータ
ｙ₀₀が供給される。更に、第１のサイクルの終わりで、
該データｙ₀₀が入力レジスタ３０１に書き込まれる。[0031] In the first cycle, the data y ₀₀ is supplied from the input terminal. Further, at the end of the first cycle,
The data y ₀₀ is written to the input register 301.

[0032] In the second cycle, data $y_{10}$ is supplied from the input terminal and is selected by the two-input selector 305. Meanwhile, $t_1, t_0, t_0$ are read from the coefficient memories 302 to 304, respectively, and the multipliers 306 to 308 calculate the three products $t_1 y_{10}$, $t_0 y_{00}$, $t_0 y_{00}$ in parallel. Next, the four-input selector 201 of each of the accumulators 310 to 317 selects one of the results of the second and third multipliers 307 and 308. In this case the results of the second and third multipliers 307 and 308 are the same, so either may be selected. The two's complementer 202 of each of the accumulators 310 to 317 passes the output of the four-input selector 201 unchanged. The adder 203 of each of the accumulators 310 to 317 calculates the sum of the result of the two's complementer 202 and the output of the accumulation register 204, which has been initialized to 0 in advance, and writes the sum to the accumulation register 204. As a result, the same product $t_0 y_{00}$ is stored in the accumulation registers 204 of all the accumulators 310 to 317. At the end of the second cycle, the data $y_{10}$ is written to the input register 301 and the result $t_1 y_{10}$ of the first multiplier 306 is written to the temporary register 309.

【００３３】第３のサイクルでは、入力端子からデータ
ｙ₂₀が供給される。２入力セレクタ３０５は、入力レジ
スタ３０１の出力データｙ₁₀を選択する。一方、係数メ
モリ３０２〜３０４からそれぞれｔ₃，ｔ₅，ｔ₇が読
み出され、乗算器３０６〜３０８により３個の積ｔ₃ｙ
₁₀，ｔ₅ｙ₁₀，ｔ₇ｙ₁₀が並列に計算される。次に、累
算器３１０〜３１７の４入力セレクタ２０１により、一
時レジスタ３０９の出力データと３個の乗算器３０６〜
３０８の結果とのうちの１個がそれぞれ選択される。こ
の場合、第１の累算器３１０では一時レジスタ３０９の
出力データｔ₁ｙ₁₀が、第２の累算器３１１では第１の
乗算器３０６の結果ｔ₃ｙ₁₀が、第３の累算器３１２で
は第２の乗算器３０７の結果ｔ₅ｙ₁₀が、第４の累算器
３１３では第３の乗算器３０８の結果ｔ₇ｙ₁₀が、第５
の累算器３１４では第３の乗算器３０８の結果ｔ₇ｙ₁₀
が、第６の累算器３１５では第２の乗算器３０７の結果
ｔ₅ｙ₁₀が、第７の累算器３１６では第１の乗算器３０
６の結果ｔ₃ｙ₁₀が、第８の累算器３１７では一時レジ
スタ３０９の出力データｔ₁ｙ₁₀がそれぞれ選択され
る。第１〜第４の累算器３１０〜３１３の２の補数器２
０２は、それぞれ４入力セレクタ２０１の出力をそのま
ま通過させる。第５〜第８の累算器３１４〜３１７の２
の補数器２０２は、それぞれ４入力セレクタ２０１の出
力の２の補数を出力する。累算器３１０〜３１７の加算
器２０３は、２の補数器２０２の結果と累算レジスタ２
０４の出力との和を計算し、その加算結果を累算レジス
タ２０４にそれぞれ書き込む。この結果、第１の累算器
３１０ではｔ₀ｙ₀₀＋ｔ₁ｙ₁₀が、第２の累算器３１１
ではｔ₀ｙ₀₀＋ｔ₃ｙ₁₀が、第３の累算器３１２ではｔ
₀ｙ₀₀＋ｔ₅ｙ₁₀が、第４の累算器３１３ではｔ₀ｙ₀₀
＋ｔ₇ｙ₁₀が、第５の累算器３１４ではｔ₀ｙ₀₀−ｔ₇
ｙ₁₀が、第６の累算器３１５ではｔ₀ｙ₀₀−ｔ₅ｙ
₁₀が、第７の累算器３１６ではｔ₀ｙ₀₀−ｔ₃ｙ₁₀が、
第８の累算器３１７ではｔ₀ｙ₀₀−ｔ₁ｙ₁₀がそれぞれ
累算レジスタ２０４に格納される。更に、第３のサイク
ルの終わりで、データｙ₂₀が入力レジスタ３０１に書き
込まれ、かつ第１の乗算器３０６の結果ｔ₃ｙ₁₀が一時
レジスタ３０９に書き込まれる。[0033] In the third cycle, data y₂₀ is supplied from the input terminal. The two-input selector 305 selects the output data y₁₀ of the input register 301. Meanwhile, t₃, t₅, and t₇ are read from the coefficient memories 302 to 304, respectively, and the multipliers 306 to 308 compute the three products t₃y₁₀, t₅y₁₀, and t₇y₁₀ in parallel. Next, the four-input selector 201 of each of the accumulators 310 to 317 selects one of the output data of the temporary register 309 and the results of the three multipliers 306 to 308. In this case, the first accumulator 310 selects the output data t₁y₁₀ of the temporary register 309, the second accumulator 311 selects the result t₃y₁₀ of the first multiplier 306, the third accumulator 312 selects the result t₅y₁₀ of the second multiplier 307, the fourth accumulator 313 selects the result t₇y₁₀ of the third multiplier 308, the fifth accumulator 314 selects the result t₇y₁₀ of the third multiplier 308, the sixth accumulator 315 selects the result t₅y₁₀ of the second multiplier 307, the seventh accumulator 316 selects the result t₃y₁₀ of the first multiplier 306, and the eighth accumulator 317 selects the output data t₁y₁₀ of the temporary register 309. The two's complementers 202 of the first to fourth accumulators 310 to 313 pass the output of the four-input selector 201 unchanged. The two's complementers 202 of the fifth to eighth accumulators 314 to 317 output the two's complement of the output of the four-input selector 201. The adder 203 of each of the accumulators 310 to 317 computes the sum of the result of the two's complementer 202 and the output of the accumulation register 204, and writes the sum to the accumulation register 204. As a result, the accumulation registers 204 hold t₀y₀₀ + t₁y₁₀ in the first accumulator 310, t₀y₀₀ + t₃y₁₀ in the second accumulator 311, t₀y₀₀ + t₅y₁₀ in the third accumulator 312, t₀y₀₀ + t₇y₁₀ in the fourth accumulator 313, t₀y₀₀ − t₇y₁₀ in the fifth accumulator 314, t₀y₀₀ − t₅y₁₀ in the sixth accumulator 315, t₀y₀₀ − t₃y₁₀ in the seventh accumulator 316, and t₀y₀₀ − t₁y₁₀ in the eighth accumulator 317. At the end of the third cycle, data y₂₀ is written to the input register 301, and the result t₃y₁₀ of the first multiplier 306 is written to the temporary register 309.
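The sharing of products in the third cycle can be illustrated with a short Python sketch. It is not part of the patent. It assumes that the coefficients of equation (4), which is not reproduced in this section, are t_k = cos(kπ/16)/2, and the function names are hypothetical. The sketch shows how three multiplier outputs plus the temporary register yield eight signed accumulator updates.

```python
import math

def t(k):
    """Assumed coefficient (equation (4) is not shown here): t_k = cos(k*pi/16)/2."""
    return 0.5 * math.cos(k * math.pi / 16)

def third_cycle_updates(y10, temp_register):
    """Return the value each of the eight accumulators adds in the third cycle.

    temp_register holds t1*y10, produced by the first multiplier one cycle earlier.
    """
    # Outputs of the first, second and third multipliers in this cycle.
    m1, m2, m3 = t(3) * y10, t(5) * y10, t(7) * y10
    # Source chosen by the four-input selector of accumulators 1..8.
    sources = [temp_register, m1, m2, m3, m3, m2, m1, temp_register]
    # +1: two's complementer passes the value; -1: it negates the value.
    signs = [1, 1, 1, 1, -1, -1, -1, -1]
    return [s * v for s, v in zip(signs, sources)]
```

Under the assumed coefficients, the i-th returned value equals cos((2i+1)π/16)/2 times y₁₀, which is the k = 1 column of the 8-point IDCT matrix.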

【００３４】第４のサイクルでは、入力端子からデータ
ｙ₃₀が供給され、該データｙ₃₀が２入力セレクタ３０５
により選択される。一方、係数メモリ３０２〜３０４か
らそれぞれｔ₃，ｔ₂，ｔ₆が読み出され、乗算器３０
６〜３０８により３個の積ｔ₃ｙ₃₀，ｔ₂ｙ₂₀，ｔ₆ｙ
₂₀が並列に計算される。次に、累算器３１０〜３１７の
４入力セレクタ２０１により、第２及び第３の乗算器３
０７，３０８の結果のうちの１個がそれぞれ選択され
る。この場合、第１、第４、第５及び第８の累算器３１
０，３１３，３１４，３１７では第２の乗算器３０７の
結果ｔ₂ｙ₂₀が、第２、第３、第６及び第７の累算器３
１１，３１２，３１５，３１６では第３の乗算器３０８
の結果ｔ₆ｙ₂₀がそれぞれ選択される。第１、第２、第
７及び第８の累算器３１０，３１１，３１６，３１７の
２の補数器２０２は、それぞれ４入力セレクタ２０１の
出力をそのまま通過させる。第３〜第６の累算器３１２
〜３１５の２の補数器２０２は、それぞれ４入力セレク
タ２０１の出力の２の補数を出力する。累算器３１０〜
３１７の加算器２０３は、２の補数器２０２の結果と累
算レジスタ２０４の出力との和を計算し、その加算結果
を累算レジスタ２０４にそれぞれ書き込む。この結果、
第１の累算器３１０ではｔ₀ｙ₀₀＋ｔ₁ｙ₁₀＋ｔ₂ｙ₂₀
が、第２の累算器３１１ではｔ₀ｙ₀₀＋ｔ₃ｙ₁₀＋ｔ₆
ｙ₂₀が、第３の累算器３１２ではｔ₀ｙ₀₀＋ｔ₅ｙ₁₀−
ｔ₆ｙ₂₀が、第４の累算器３１３ではｔ₀ｙ₀₀＋ｔ₇ｙ
₁₀−ｔ₂ｙ₂₀が、第５の累算器３１４ではｔ₀ｙ₀₀−ｔ
₇ｙ₁₀−ｔ₂ｙ₂₀が、第６の累算器３１５ではｔ₀ｙ₀₀
−ｔ₅ｙ₁₀−ｔ₆ｙ₂₀が、第７の累算器３１６ではｔ₀
ｙ₀₀−ｔ₃ｙ₁₀＋ｔ₆ｙ₂₀が、第８の累算器３１７では
ｔ₀ｙ₀₀−ｔ₁ｙ₁₀＋ｔ₂ｙ₂₀がそれぞれ累算レジスタ
２０４に格納される。更に、第４のサイクルの終わり
で、データｙ₃₀が入力レジスタ３０１に書き込まれ、第
１の乗算器３０６の結果ｔ₃ｙ₃₀が一時レジスタ３０９
に書き込まれる。[0034] In the fourth cycle, data y₃₀ is supplied from the input terminal and is selected by the two-input selector 305. Meanwhile, t₃, t₂, and t₆ are read from the coefficient memories 302 to 304, respectively, and the multipliers 306 to 308 compute the three products t₃y₃₀, t₂y₂₀, and t₆y₂₀ in parallel. Next, the four-input selector 201 of each of the accumulators 310 to 317 selects one of the results of the second and third multipliers 307 and 308. In this case, the first, fourth, fifth, and eighth accumulators 310, 313, 314, and 317 select the result t₂y₂₀ of the second multiplier 307, and the second, third, sixth, and seventh accumulators 311, 312, 315, and 316 select the result t₆y₂₀ of the third multiplier 308. The two's complementers 202 of the first, second, seventh, and eighth accumulators 310, 311, 316, and 317 pass the output of the four-input selector 201 unchanged. The two's complementers 202 of the third to sixth accumulators 312 to 315 output the two's complement of the output of the four-input selector 201. The adder 203 of each of the accumulators 310 to 317 computes the sum of the result of the two's complementer 202 and the output of the accumulation register 204, and writes the sum to the accumulation register 204. As a result, the accumulation registers 204 hold t₀y₀₀ + t₁y₁₀ + t₂y₂₀ in the first accumulator 310, t₀y₀₀ + t₃y₁₀ + t₆y₂₀ in the second accumulator 311, t₀y₀₀ + t₅y₁₀ − t₆y₂₀ in the third accumulator 312, t₀y₀₀ + t₇y₁₀ − t₂y₂₀ in the fourth accumulator 313, t₀y₀₀ − t₇y₁₀ − t₂y₂₀ in the fifth accumulator 314, t₀y₀₀ − t₅y₁₀ − t₆y₂₀ in the sixth accumulator 315, t₀y₀₀ − t₃y₁₀ + t₆y₂₀ in the seventh accumulator 316, and t₀y₀₀ − t₁y₁₀ + t₂y₂₀ in the eighth accumulator 317. At the end of the fourth cycle, data y₃₀ is written to the input register 301, and the result t₃y₃₀ of the first multiplier 306 is written to the temporary register 309.

【００３５】第５のサイクルでは、入力端子からデータ
ｙ₄₀が供給される。２入力セレクタ３０５は、入力レジ
スタ３０１の出力データｙ₃₀を選択する。一方、係数メ
モリ３０２〜３０４からそれぞれｔ₇，ｔ₁，ｔ₅が読
み出され、乗算器３０６〜３０８により３個の積ｔ₇ｙ
₃₀，ｔ₁ｙ₃₀，ｔ₅ｙ₃₀が並列に計算される。次に、累
算器３１０〜３１７の４入力セレクタ２０１により、一
時レジスタ３０９の出力データと３個の乗算器３０６〜
３０８の結果とのうちの１個がそれぞれ選択される。こ
の場合、第１の累算器３１０では一時レジスタ３０９の
出力データｔ₃ｙ₃₀が、第２の累算器３１１では第１の
乗算器３０６の結果ｔ₇ｙ₃₀が、第３の累算器３１２で
は第２の乗算器３０７の結果ｔ₁ｙ₃₀が、第４の累算器
３１３では第３の乗算器３０８の結果ｔ₅ｙ₃₀が、第５
の累算器３１４では第３の乗算器３０８の結果ｔ₅ｙ₃₀
が、第６の累算器３１５では第２の乗算器３０７の結果
ｔ₁ｙ₃₀が、第７の累算器３１６では第１の乗算器３０
６の結果ｔ₇ｙ₃₀が、第８の累算器３１７では一時レジ
スタ３０９の出力データｔ₃ｙ₃₀がそれぞれ選択され
る。第１、第５、第６及び第７の累算器３１０，３１
４，３１５，３１６の２の補数器２０２は、それぞれ４
入力セレクタ２０１の出力をそのまま通過させる。第
２、第３、第４及び第８の累算器３１１，３１２，３１
３，３１７の２の補数器２０２は、それぞれ４入力セレ
クタ２０１の出力の２の補数を出力する。累算器３１０
〜３１７の加算器２０３は、２の補数器２０２の結果と
累算レジスタ２０４の出力との和を計算し、その加算結
果を累算レジスタ２０４にそれぞれ書き込む。この結
果、第１の累算器３１０ではｔ₀ｙ₀₀＋ｔ₁ｙ₁₀＋ｔ₂
ｙ₂₀＋ｔ₃ｙ₃₀が、第２の累算器３１１ではｔ₀ｙ₀₀＋
ｔ₃ｙ₁₀＋ｔ₆ｙ₂₀−ｔ₇ｙ₃₀が、第３の累算器３１２
ではｔ₀ｙ₀₀＋ｔ₅ｙ₁₀−ｔ₆ｙ₂₀−ｔ₁ｙ₃₀が、第４
の累算器３１３ではｔ₀ｙ₀₀＋ｔ₇ｙ₁₀−ｔ₂ｙ₂₀−ｔ
₅ｙ₃₀が、第５の累算器３１４ではｔ₀ｙ₀₀−ｔ₇ｙ₁₀
−ｔ₂ｙ₂₀＋ｔ₅ｙ₃₀が、第６の累算器３１５ではｔ₀
ｙ₀₀−ｔ₅ｙ₁₀−ｔ₆ｙ₂₀＋ｔ₁ｙ₃₀が、第７の累算器
３１６ではｔ₀ｙ₀₀−ｔ₃ｙ₁₀＋ｔ₆ｙ₂₀＋ｔ₇ｙ
₃₀が、第８の累算器３１７ではｔ₀ｙ₀₀−ｔ₁ｙ₁₀＋ｔ
₂ｙ₂₀−ｔ₃ｙ₃₀がそれぞれ累算レジスタ２０４に格納
される。更に、第５のサイクルの終わりで、データｙ₄₀
が入力レジスタ３０１に書き込まれ、かつ第１の乗算器
３０６の結果ｔ₇ｙ₃₀が一時レジスタ３０９に書き込ま
れる。In the fifth cycle, data y₄₀ is supplied from the input terminal. The two-input selector 305 selects the output data y₃₀ of the input register 301. Meanwhile, t₇, t₁, and t₅ are read from the coefficient memories 302 to 304, respectively, and the multipliers 306 to 308 compute the three products t₇y₃₀, t₁y₃₀, and t₅y₃₀ in parallel. Next, the four-input selector 201 of each of the accumulators 310 to 317 selects one of the output data of the temporary register 309 and the results of the three multipliers 306 to 308. In this case, the first accumulator 310 selects the output data t₃y₃₀ of the temporary register 309, the second accumulator 311 selects the result t₇y₃₀ of the first multiplier 306, the third accumulator 312 selects the result t₁y₃₀ of the second multiplier 307, the fourth accumulator 313 selects the result t₅y₃₀ of the third multiplier 308, the fifth accumulator 314 selects the result t₅y₃₀ of the third multiplier 308, the sixth accumulator 315 selects the result t₁y₃₀ of the second multiplier 307, the seventh accumulator 316 selects the result t₇y₃₀ of the first multiplier 306, and the eighth accumulator 317 selects the output data t₃y₃₀ of the temporary register 309. The two's complementers 202 of the first, fifth, sixth, and seventh accumulators 310, 314, 315, and 316 pass the output of the four-input selector 201 unchanged. The two's complementers 202 of the second, third, fourth, and eighth accumulators 311, 312, 313, and 317 output the two's complement of the output of the four-input selector 201. The adder 203 of each of the accumulators 310 to 317 computes the sum of the result of the two's complementer 202 and the output of the accumulation register 204, and writes the sum to the accumulation register 204. As a result, the accumulation registers 204 hold t₀y₀₀ + t₁y₁₀ + t₂y₂₀ + t₃y₃₀ in the first accumulator 310, t₀y₀₀ + t₃y₁₀ + t₆y₂₀ − t₇y₃₀ in the second accumulator 311, t₀y₀₀ + t₅y₁₀ − t₆y₂₀ − t₁y₃₀ in the third accumulator 312, t₀y₀₀ + t₇y₁₀ − t₂y₂₀ − t₅y₃₀ in the fourth accumulator 313, t₀y₀₀ − t₇y₁₀ − t₂y₂₀ + t₅y₃₀ in the fifth accumulator 314, t₀y₀₀ − t₅y₁₀ − t₆y₂₀ + t₁y₃₀ in the sixth accumulator 315, t₀y₀₀ − t₃y₁₀ + t₆y₂₀ + t₇y₃₀ in the seventh accumulator 316, and t₀y₀₀ − t₁y₁₀ + t₂y₂₀ − t₃y₃₀ in the eighth accumulator 317. At the end of the fifth cycle, data y₄₀ is written to the input register 301, and the result t₇y₃₀ of the first multiplier 306 is written to the temporary register 309.

【００３６】第６から第９のサイクルでは、入力端子か
らデータｙ₅₀，ｙ₆₀，ｙ₇₀，ｙ₀₁が順次供給される。し
たがって、第９サイクルの終りには、累算器３１０〜３
１７の累算レジスタ２０４に、８個の内積ｗ₀₀，ｗ₁₀，
ｗ₂₀，ｗ₃₀，ｗ₄₀，ｗ₅₀，ｗ₆₀，ｗ₇₀が格納される。更
に、第９のサイクルの終わりで、データｙ₀₁が入力レジ
スタ３０１に書き込まれ、かつ第１の乗算器３０６の結
果ｔ₅ｙ₇₀が一時レジスタ３０９に書き込まれる。In the sixth to ninth cycles, data y₅₀, y₆₀, y₇₀, and y₀₁ are sequentially supplied from the input terminal. Consequently, at the end of the ninth cycle, the accumulation registers 204 of the accumulators 310 to 317 hold the eight inner products w₀₀, w₁₀, w₂₀, w₃₀, w₄₀, w₅₀, w₆₀, and w₇₀. Also at the end of the ninth cycle, data y₀₁ is written to the input register 301, and the result t₅y₇₀ of the first multiplier 306 is written to the temporary register 309.
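The accumulation described in the preceding cycles can be checked numerically. The following Python sketch is not taken from the patent. It assumes that the coefficients of equation (4), which is not reproduced in this section, are t_k = cos(kπ/16)/2 for k = 1 to 7 with t₀ = t₄, and it models each accumulator as a running sum with a selectable sign. The names `t`, `coeff_index_and_sign`, `accumulate_column`, and `direct_idct` are illustrative only.

```python
import math

def t(k):
    """Assumed coefficient table: t_k = cos(k*pi/16)/2, with t_0 = t_4."""
    if k == 0:
        k = 4
    return 0.5 * math.cos(k * math.pi / 16)

def coeff_index_and_sign(i, k):
    """Return (m, s) such that the matrix entry d_ik equals s * t_m."""
    if k == 0:
        return 0, 1
    n = ((2 * i + 1) * k) % 32      # angle n*pi/16, reduced modulo 2*pi
    if n > 16:
        n = 32 - n                  # cos(2*pi - x) = cos(x)
    if n > 8:
        return 16 - n, -1           # cos(pi - x) = -cos(x)
    return n, 1

def accumulate_column(y):
    """Feed y_0..y_7 one value per cycle into eight accumulators; return w_0..w_7."""
    acc = [0.0] * 8                 # accumulation registers, initialised to 0
    for k, yk in enumerate(y):
        for i in range(8):
            m, s = coeff_index_and_sign(i, k)
            acc[i] += s * t(m) * yk
    return acc

def direct_idct(y):
    """Reference sum: w_i = sum_k c_k * y_k * cos((2i+1)k*pi/16) / 2, c_0 = 1/sqrt(2)."""
    out = []
    for i in range(8):
        total = 0.0
        for k, yk in enumerate(y):
            c = 1 / math.sqrt(2) if k == 0 else 1.0
            total += 0.5 * c * yk * math.cos((2 * i + 1) * k * math.pi / 16)
        out.append(total)
    return out
```

With these assumptions, `coeff_index_and_sign` reproduces the coefficient and sign pattern listed in the text for the second to fifth cycles, for example −t₇ for the second accumulator when y₃₀ is processed.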

【００３７】第１０のサイクルでは、入力端子からデー
タｙ₁₁が供給されて上記第２のサイクルと同様の処理が
実行されるとともに、累算器３１０〜３１７の累算レジ
スタ２０４の保持内容ｗ₀₀，ｗ₁₀，ｗ₂₀，ｗ₃₀，ｗ₄₀，
ｗ₅₀，ｗ₆₀，ｗ₇₀がバッファレジスタ２０５へそれぞれ
転送される。そして、８入力セレクタ３１８は、第１の
累算器３１０の出力ｗ₀₀を選択出力する。In the tenth cycle, data y₁₁ is supplied from the input terminal and the same processing as in the second cycle is executed. In addition, the contents w₀₀, w₁₀, w₂₀, w₃₀, w₄₀, w₅₀, w₆₀, and w₇₀ held in the accumulation registers 204 of the accumulators 310 to 317 are transferred to the respective buffer registers 205. The 8-input selector 318 then selects and outputs the output w₀₀ of the first accumulator 310.

【００３８】以下同様の処理を繰り返すことにより、連
続的に供給される入力データｙ₀₀〜ｙ₇₀，ｙ₀₁〜ｙ₇₁，
…，ｙ₀₇〜ｙ₇₇に対応した出力データｗ₀₀〜ｗ₇₀，ｗ₀₁
〜ｗ₇₁，…，ｗ₀₇〜ｗ₇₇が連続して得られる。Thereafter, by repeating the same processing, output data w₀₀ to w₇₀, w₀₁ to w₇₁, ..., w₀₇ to w₇₇ corresponding to the continuously supplied input data y₀₀ to y₇₀, y₀₁ to y₇₁, ..., y₀₇ to y₇₇ are obtained in succession.

【００３９】以上のとおり、第２の実施例によれば、乗
算器の数が３に低減された１次元ＩＤＣＴプロセッサを
実現できる。As described above, according to the second embodiment, a one-dimensional IDCT processor in which the number of multipliers is reduced to three can be realized.

【００４０】（実施例３）式（４）から、ｔ₀＝ｔ₄で
あることが直ちに分かる。この関係を利用すると、式
（２）は、ｗ_ij＝ｄ_i0ｙ_0j＋Σ_k=1 ³ｄ_ikｙ_kj＋ｄ_i4ｙ_4j＋Σ_k=5 ⁷ｄ_ikｙ_kj ＝ｔ₀ｙ_0j＋Σ_k=1 ³ｄ_ikｙ_kj±ｔ₀ｙ_4j＋Σ_k=5 ⁷ｄ_ikｙ_kj ＝ｔ₀ｙ_0j±ｔ₀ｙ_4j＋Σ_k=1 ³ｄ_ikｙ_kj＋Σ_k=5 ⁷ｄ_ikｙ_kj ＝ｔ₀ｙ_0j±ｔ₀ｙ_4j＋ω_ij …（５）のように変形される。ここに、式（５）中の“±”は、
ｉ＝０，３，４，７の場合に“＋”を、ｉ＝１，２，
５，６の場合に“−”をそれぞれ意味する（図１参
照）。また、式（５）中のω_ijは部分内積であって、 ω_ij＝Σ_k=1 ³ｄ_ikｙ_kj＋Σ_k=5 ⁷ｄ_ikｙ_kj …（６）である。式（６）によれば、図１の行列演算のサイズ
は、図１０のように低減される。(Embodiment 3) From equation (4), it is immediately apparent that t₀ = t₄. Using this relationship, equation (2) can be rewritten as
w_ij = d_i0 y_0j + Σ_{k=1}^{3} d_ik y_kj + d_i4 y_4j + Σ_{k=5}^{7} d_ik y_kj
     = t₀ y_0j + Σ_{k=1}^{3} d_ik y_kj ± t₀ y_4j + Σ_{k=5}^{7} d_ik y_kj
     = t₀ y_0j ± t₀ y_4j + Σ_{k=1}^{3} d_ik y_kj + Σ_{k=5}^{7} d_ik y_kj
     = t₀ y_0j ± t₀ y_4j + ω_ij … (5)
Here, "±" in equation (5) means "+" for i = 0, 3, 4, 7 and "−" for i = 1, 2, 5, 6 (see FIG. 1). Further, ω_ij in equation (5) is a partial inner product given by
ω_ij = Σ_{k=1}^{3} d_ik y_kj + Σ_{k=5}^{7} d_ik y_kj … (6)
According to equation (6), the size of the matrix operation of FIG. 1 is reduced as shown in FIG. 10.
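As a sanity check on equations (5) and (6), the following Python sketch compares the split form with the full eight-term inner product. It is an illustration only, not the patent's circuit. It assumes the usual 8-point IDCT matrix d_ik = c_k cos((2i+1)kπ/16)/2 with c₀ = 1/√2, since equations (2) and (4) are not reproduced in this section, and the function names are hypothetical.

```python
import math

def d(i, k):
    """Assumed IDCT matrix entry: d_ik = c_k * cos((2i+1)k*pi/16) / 2."""
    c = 1 / math.sqrt(2) if k == 0 else 1.0
    return 0.5 * c * math.cos((2 * i + 1) * k * math.pi / 16)

# t_0 = t_4 under the assumed scaling.
T0 = 0.5 * math.cos(math.pi / 4)

def partial_inner_product(i, y):
    """omega_i of equation (6): only the k = 1..3 and k = 5..7 terms."""
    return sum(d(i, k) * y[k] for k in (1, 2, 3, 5, 6, 7))

def inner_product_eq5(i, y):
    """w_i of equation (5): t0*y0 +/- t0*y4 + omega_i."""
    sign = 1 if i in (0, 3, 4, 7) else -1
    return T0 * y[0] + sign * T0 * y[4] + partial_inner_product(i, y)
```

For any input column `y`, `inner_product_eq5(i, y)` should agree with the full sum of `d(i, k) * y[k]` over k = 0 to 7 up to floating-point rounding.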

【００４１】本発明の第３の実施例に係る１次元ＩＤＣ
Ｔプロセッサの構成を図１１に示す。この構成は、式
（５）の演算を実行するものである。図１１において、
１０は入力バッファ、１１は定数乗算回路、１２は分布
演算（ＤＡ）回路、１３は合成演算（ＲＡ）回路であ
る。入力端子から、１６ビット長の２の補数表示の２進
数データｙ_ij（ｉ＝０〜７，ｊ＝０〜７）が、ｙ₀₀〜ｙ
₇₀、ｙ₀₁〜ｙ₇₁、…、ｙ₀₇〜ｙ₇₇の順序で入力バッファ
１０へ供給される。入力バッファ１０は、データｙ_0j，
ｙ_4jを定数乗算回路１１へ、データｙ_1j，ｙ_2j，ｙ_3j，
ｙ_5j，ｙ_6j，ｙ_7jをＤＡ回路１２へそれぞれ供給する。
定数乗算回路１１は、２個の定数乗算ｔ₀ｙ_0j，ｔ₀ｙ
_4jを実行するものである。ＤＡ回路１２は、図１０の行
列演算を実行することにより部分内積ω_ijを求めるもの
である。ＲＡ回路１３は、ｔ₀ｙ_0j、ｔ₀ｙ_4j及びω_ij
から、式（５）に従って内積ｗ_ijを求めるものである。FIG. 11 shows the configuration of a one-dimensional IDCT processor according to a third embodiment of the present invention. This configuration executes the operation of equation (5). In FIG. 11, reference numeral 10 denotes an input buffer, 11 a constant multiplication circuit, 12 a distributed arithmetic (DA) circuit, and 13 a synthesis operation (RA) circuit. Binary data y_ij (i = 0 to 7, j = 0 to 7), 16 bits long in two's complement representation, are supplied from the input terminal to the input buffer 10 in the order y₀₀ to y₇₀, y₀₁ to y₇₁, ..., y₀₇ to y₇₇. The input buffer 10 supplies the data y_0j and y_4j to the constant multiplication circuit 11, and the data y_1j, y_2j, y_3j, y_5j, y_6j, and y_7j to the DA circuit 12. The constant multiplication circuit 11 executes the two constant multiplications t₀y_0j and t₀y_4j. The DA circuit 12 obtains the partial inner product ω_ij by executing the matrix operation of FIG. 10. The RA circuit 13 obtains the inner product w_ij from t₀y_0j, t₀y_4j, and ω_ij according to equation (5).

【００４２】入力バッファ１０の内部構成を図１２に示
す。入力バッファ１０は、各々データｙ_0j，ｙ_1j，
ｙ_2j，ｙ_3j，ｙ_4j，ｙ_5j，ｙ_6j，ｙ_7jを保持するための
８個のレジスタ４００〜４０７で構成される。FIG. 12 shows the internal configuration of the input buffer 10. The input buffer 10 consists of eight registers 400 to 407 for holding the data y_0j, y_1j, y_2j, y_3j, y_4j, y_5j, y_6j, and y_7j, respectively.

【００４３】定数乗算回路１１の内部構成を図１３に示
す。定数乗算回路１１は、データｙ_0jを保持するための
入力レジスタ４１０と、データｙ_4jを保持するための入
力レジスタ４１１と、２個のデータｙ_0j，ｙ_4jを順次選
択するための２入力セレクタ４１２と、２個の定数乗算
ｔ₀ｙ_0j，ｔ₀ｙ_4jを順次実行するための乗算器４１３
と、積ｔ₀ｙ_0jを保持するための一時レジスタ４１４
と、積ｔ₀ｙ_4jを保持するための一時レジスタ４１５
と、当該１次元ＩＤＣＴプロセッサのパイプライン動作
を保証するように両一時レジスタ４１４，４１５の出力
を保持するための２個のバッファレジスタ４１６，４１
７とで構成される。FIG. 13 shows the internal configuration of the constant multiplication circuit 11. The constant multiplication circuit 11 consists of an input register 410 for holding the data y_0j, an input register 411 for holding the data y_4j, a two-input selector 412 for sequentially selecting the two data y_0j and y_4j, a multiplier 413 for sequentially executing the two constant multiplications t₀y_0j and t₀y_4j, a temporary register 414 for holding the product t₀y_0j, a temporary register 415 for holding the product t₀y_4j, and two buffer registers 416 and 417 for holding the outputs of the two temporary registers 414 and 415 so as to guarantee the pipeline operation of the one-dimensional IDCT processor.

【００４４】ＤＡ回路１２の内部構成を図１４に示す。
ＤＡ回路１２は、６個のシフトレジスタ４２０〜４２５
と、８個の６ビット入力ＲＡＣ４２６〜４３３と、８個
のバッファレジスタ４３４〜４４１と、８入力セレクタ
４４２とで構成される。シフトレジスタ４２０〜４２５
は、各々データｙ_1j，ｙ_2j，ｙ_3j，ｙ_5j，ｙ_6j，ｙ_7jを
保持し、各々の最下位２ビットを次々とシフトアウトす
るものである。シフトレジスタ４２０〜４２５の各々の
最下位ビットは第１のビットスライスワードｑ₀とし
て、各々の最下位ビットより１桁上位のビットは第２の
ビットスライスワードｑ₁としてそれぞれ６ビット入力
ＲＡＣ４２６〜４３３へ供給される。６ビット入力ＲＡ
Ｃ４２６は、図１５に示すように、第１のＲＯＭ７１
と、第２のＲＯＭ７２と、３入力加減算器７３と、シフ
タ７４と、累算レジスタ７５とで構成される。第１のＲ
ＯＭ７１は、第１のビットスライスワードｑ₀をアドレ
スとして受け取り、対応するベクトル内積の部分和を３
入力加減算器７３へ第１の入力として供給するものであ
る。第２のＲＯＭ７２は、第２のビットスライスワード
ｑ₁をアドレスとして受け取り、対応するベクトル内積
の部分和を３入力加減算器７３へ第２の入力として供給
するものである。累算レジスタ７５の保持出力は、３入
力加減算器７３へ第３の入力として供給される。ただ
し、第２の入力は、第１及び第３の入力より１ビット上
位の重みを持つ。累算レジスタ７５の保持内容は、予め
０に初期化される。３入力加減算器７３は、第１〜第３
の入力の加算を実行するものである。ただし、最後のビ
ットスライスワードｑ₁に係る部分和については、減算
を実行する。シフタ７４は、３入力加減算器７３の結果
の桁移動のための左シフタである。累算レジスタ７５の
保持内容は、シフタ７４の出力に書き換えられる。最終
的に、累算レジスタ７５から部分内積ω_0jが出力され
る。図１４中の他の６ビット入力ＲＡＣの内部構成も図
１５と同様である。したがって、８個の６ビット入力Ｒ
ＡＣ４２６〜４３３で８個の部分内積ω_0j，ω_1j，
ω_2j，ω_3j，ω_4j，ω_5j，ω_6j，ω_7jが並列に求められ
る。バッファレジスタ４３４〜４４１は、当該１次元Ｉ
ＤＣＴプロセッサのパイプライン動作を保証するよう
に、６ビット入力ＲＡＣ４２６〜４３３の出力を保持す
るものである。８入力セレクタ４４２は、バッファレジ
スタ４３４〜４４１の保持データを順次選択して、部分
内積ω_ij（ｉ＝０〜７，ｊ＝０〜７）を、ω_0j，ω_1j，
ω_2j，ω_3j，ω_4j，ω_5j，ω_6j，ω_7jの順序で出力する
ものである。FIG. 14 shows the internal configuration of the DA circuit 12. The DA circuit 12 consists of six shift registers 420 to 425, eight 6-bit input RACs 426 to 433, eight buffer registers 434 to 441, and an 8-input selector 442. The shift registers 420 to 425 hold the data y_1j, y_2j, y_3j, y_5j, y_6j, and y_7j, respectively, and each shifts out its least significant two bits at a time. The least significant bits of the shift registers 420 to 425 are supplied to the 6-bit input RACs 426 to 433 as a first bit-slice word q₀, and the bits one position above the least significant bits are supplied as a second bit-slice word q₁. As shown in FIG. 15, the 6-bit input RAC 426 consists of a first ROM 71, a second ROM 72, a three-input adder/subtractor 73, a shifter 74, and an accumulation register 75. The first ROM 71 receives the first bit-slice word q₀ as an address and supplies the corresponding partial sum of the vector inner product to the three-input adder/subtractor 73 as a first input. The second ROM 72 receives the second bit-slice word q₁ as an address and supplies the corresponding partial sum of the vector inner product to the three-input adder/subtractor 73 as a second input. The output held by the accumulation register 75 is supplied to the three-input adder/subtractor 73 as a third input. The second input carries a weight one bit higher than the first and third inputs. The contents of the accumulation register 75 are initialized to 0 in advance. The three-input adder/subtractor 73 adds the first to third inputs, except that it subtracts the partial sum associated with the last bit-slice word q₁, which consists of the sign bits of the two's complement data. The shifter 74 is a left shifter for aligning the digits of the result of the three-input adder/subtractor 73. The contents of the accumulation register 75 are overwritten with the output of the shifter 74. Finally, the partial inner product ω_0j is output from the accumulation register 75. The other 6-bit input RACs in FIG. 14 have the same internal configuration as in FIG. 15. The eight 6-bit input RACs 426 to 433 therefore obtain the eight partial inner products ω_0j, ω_1j, ω_2j, ω_3j, ω_4j, ω_5j, ω_6j, and ω_7j in parallel. The buffer registers 434 to 441 hold the outputs of the 6-bit input RACs 426 to 433 so as to guarantee the pipeline operation of the one-dimensional IDCT processor. The 8-input selector 442 sequentially selects the data held in the buffer registers 434 to 441 and outputs the partial inner products ω_ij (i = 0 to 7, j = 0 to 7) in the order ω_0j, ω_1j, ω_2j, ω_3j, ω_4j, ω_5j, ω_6j, ω_7j.
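The ROM-and-accumulator principle described above can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not the circuit of FIG. 15: the coefficients are arbitrary integers rather than fixed-point cosine values, and the shifter is replaced by explicit power-of-two weights so that the result can be compared exactly with an ordinary inner product. The names `build_rom`, `bit_slice`, and `rac_inner_product` are hypothetical.

```python
def build_rom(coeffs):
    """ROM contents: entry a holds the sum of coeffs[k] for every set bit k of a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << n)]

def bit_slice(words, bit):
    """Address formed from bit number `bit` of every input word."""
    return sum(((w >> bit) & 1) << k for k, w in enumerate(words))

def rac_inner_product(coeffs, xs, width=16):
    """Distributed arithmetic, two bit slices per step, for signed `width`-bit inputs."""
    rom = build_rom(coeffs)
    mask = (1 << width) - 1
    words = [x & mask for x in xs]          # two's complement bit patterns
    acc = 0
    for step in range(width // 2):
        low = rom[bit_slice(words, 2 * step)]        # slice q0, weight 1
        high = rom[bit_slice(words, 2 * step + 1)]   # slice q1, weight 2
        if step == width // 2 - 1:
            high = -high                    # sign bits carry negative weight
        acc += (low + 2 * high) << (2 * step)
    return acc
```

With six inputs each ROM has 2⁶ = 64 entries, and 16-bit data are consumed in eight steps of two bits each. No multiplier is involved: each step needs only two table look-ups and one three-operand addition.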

【００４５】ＲＡ回路１３の内部構成を図１６に示す。
ＲＡ回路１３は、定数乗算回路１１から供給された２個
の積ｔ₀ｙ_0j，ｔ₀ｙ_4jと、ＤＡ回路１２から供給され
た部分内積ω_ijとの加減算を実行して内積ｗ_ijを求める
ための３入力加減算器４５０で構成される。ただし、積
ｔ₀ｙ_4jについては、式（５）に従って、ｉの値に応じ
て加算又は減算が選択される。具体的には、ｉ＝０，
３，４，７のサイクルでは加算を選択し、ｉ＝１，２，
５，６のサイクルでは減算を選択するように制御され
る。FIG. 16 shows the internal configuration of the RA circuit 13. The RA circuit 13 consists of a three-input adder/subtractor 450 that obtains the inner product w_ij by adding or subtracting the two products t₀y_0j and t₀y_4j supplied from the constant multiplication circuit 11 and the partial inner product ω_ij supplied from the DA circuit 12. For the product t₀y_4j, addition or subtraction is selected according to the value of i, following equation (5). Specifically, control is performed so that addition is selected in the cycles where i = 0, 3, 4, 7 and subtraction is selected in the cycles where i = 1, 2, 5, 6.

【００４６】以下、図１１〜図１６に基づいて、本発明
の第３の実施例に係る１次元ＩＤＣＴプロセッサの動作
を説明する。The operation of the one-dimensional IDCT processor according to the third embodiment of the present invention will be described below with reference to FIGS. 11 to 16.

【００４７】第１から第８のサイクルでは、入力端子か
ら入力バッファ１０に８個のデータｙ₀₀，ｙ₁₀，ｙ₂₀，
ｙ₃₀，ｙ₄₀，ｙ₅₀，ｙ₆₀，ｙ₇₀が順次入力される。これ
らのデータは、それぞれレジスタ４００〜４０７に格納
される。In the first to eighth cycles, the eight data y₀₀, y₁₀, y₂₀, y₃₀, y₄₀, y₅₀, y₆₀, and y₇₀ are sequentially input from the input terminal to the input buffer 10. These data are stored in the registers 400 to 407, respectively.

【００４８】第９のサイクルでは、入力バッファ１０の
データが定数乗算回路１１及びＤＡ回路１２へ転送され
る。すなわち、データｙ₀₀，ｙ₄₀は定数乗算回路１１の
入力レジスタ４１０，４１１に、データｙ₁₀，ｙ₂₀，ｙ
₃₀，ｙ₅₀，ｙ₆₀，ｙ₇₀はＤＡ回路１２のシフトレジスタ
４２０〜４２５にそれぞれ格納される。In the ninth cycle, the data in the input buffer 10 are transferred to the constant multiplication circuit 11 and the DA circuit 12. That is, the data y₀₀ and y₄₀ are stored in the input registers 410 and 411 of the constant multiplication circuit 11, and the data y₁₀, y₂₀, y₃₀, y₅₀, y₆₀, and y₇₀ are stored in the shift registers 420 to 425 of the DA circuit 12, respectively.

【００４９】第１０から第１３のサイクルでは、定数乗
算回路１１の２入力セレクタ４１２によりデータｙ₀₀が
選択され、乗算器４１３により定数乗算ｔ₀ｙ₀₀が実行
され、その結果が一時レジスタ４１４に書き込まれる。
第１４から第１７のサイクルでは、２入力セレクタ４１
２によりデータｙ₄₀が選択され、乗算器４１３により定
数乗算ｔ₀ｙ₄₀が実行され、その結果が一時レジスタ４
１５に書き込まれる。一方、ＤＡ回路１２では、第１０
から第１７のサイクルにおいて、６ビット入力ＲＡＣ４
２６〜４３３により８個の部分内積ω₀₀，ω₁₀，ω₂₀，
ω₃₀，ω₄₀，ω₅₀，ω₆₀，ω₇₀が求められる。In the tenth to thirteenth cycles, the two-input selector 412 of the constant multiplication circuit 11 selects the data y₀₀, the multiplier 413 executes the constant multiplication t₀y₀₀, and the result is written to the temporary register 414. In the fourteenth to seventeenth cycles, the two-input selector 412 selects the data y₄₀, the multiplier 413 executes the constant multiplication t₀y₄₀, and the result is written to the temporary register 415. Meanwhile, in the DA circuit 12, the 6-bit input RACs 426 to 433 obtain the eight partial inner products ω₀₀, ω₁₀, ω₂₀, ω₃₀, ω₄₀, ω₅₀, ω₆₀, and ω₇₀ during the tenth to seventeenth cycles.

【００５０】第１８のサイクルでは、定数乗算回路１１
の一時レジスタ４１４，４１５の保持データがバッファ
レジスタ４１６，４１７へ、ＤＡ回路１２の６ビット入
力ＲＡＣ４２６〜４３３の出力データがバッファレジス
タ４３４〜４４１へそれぞれ転送される。In the eighteenth cycle, the data held in the temporary registers 414 and 415 of the constant multiplication circuit 11 are transferred to the buffer registers 416 and 417, and the output data of the 6-bit input RACs 426 to 433 of the DA circuit 12 are transferred to the buffer registers 434 to 441, respectively.

【００５１】第１９から第２６のサイクルでは、ＤＡ回
路１２の８入力セレクタ４４２が部分内積ω₀₀，ω₁₀，
ω₂₀，ω₃₀，ω₄₀，ω₅₀，ω₆₀，ω₇₀をＲＡ回路１３へ
順次供給する。一方、積ｔ₀ｙ₀₀，ｔ₀ｙ₄₀が定数演算
回路１１からＲＡ回路１３へ供給される。ＲＡ回路１１
の３入力加減算器４５０は、式（５）に従って、内積ｗ
₀₀，ｗ₁₀，ｗ₂₀，ｗ₃₀，ｗ₄₀，ｗ₅₀，ｗ₆₀，ｗ₇₀を順次
出力する。In the nineteenth to twenty-sixth cycles, the 8-input selector 442 of the DA circuit 12 sequentially supplies the partial inner products ω₀₀, ω₁₀, ω₂₀, ω₃₀, ω₄₀, ω₅₀, ω₆₀, and ω₇₀ to the RA circuit 13. Meanwhile, the products t₀y₀₀ and t₀y₄₀ are supplied from the constant multiplication circuit 11 to the RA circuit 13. The three-input adder/subtractor 450 of the RA circuit 13 sequentially outputs the inner products w₀₀, w₁₀, w₂₀, w₃₀, w₄₀, w₅₀, w₆₀, and w₇₀ according to equation (5).

【００５２】次の８個のデータｙ₀₁，ｙ₁₁，ｙ₂₁，
ｙ₃₁，ｙ₄₁，ｙ₅₁，ｙ₆₁，ｙ₇₁に関する処理は、第９か
ら第３４のサイクルにおいて、上記第１から第２６のサ
イクルの処理と同様に行われる。この結果、第２７から
第３４のサイクルで内積ｗ₀₁，ｗ₁₁，ｗ₂₁，ｗ₃₁，
ｗ₄₁，ｗ₅₁，ｗ₆₁，ｗ₇₁が順次出力される。The next eight data y₀₁, y₁₁, y₂₁, y₃₁, y₄₁, y₅₁, y₆₁, and y₇₁ are processed in the ninth to thirty-fourth cycles in the same manner as in the first to twenty-sixth cycles. As a result, the inner products w₀₁, w₁₁, w₂₁, w₃₁, w₄₁, w₅₁, w₆₁, and w₇₁ are sequentially output in the twenty-seventh to thirty-fourth cycles.
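The cycle numbers given for the first two input columns suggest a simple schedule in which every stage of column j starts eight cycles after the same stage of column j − 1. The Python sketch below encodes that reading. The eight-cycle offset for later columns is an inference from the text rather than something it states explicitly, and the function and key names are hypothetical.

```python
def column_schedule(j):
    """Cycle ranges (1-based, inclusive) for input column j of the third embodiment.

    Assumes each column is offset by exactly eight cycles from the previous one.
    """
    base = 8 * j
    return {
        "load_input_buffer": (base + 1, base + 8),
        "transfer_to_multiplier_and_da": (base + 9, base + 9),
        "multiply_and_da": (base + 10, base + 17),
        "transfer_to_buffer_registers": (base + 18, base + 18),
        "output_inner_products": (base + 19, base + 26),
    }
```

For j = 0 and j = 1 this reproduces the cycle numbers in the text. The stages of consecutive columns overlap in time, which is consistent with the stated role of the buffer registers in guaranteeing pipeline operation.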

【００５３】以下同様の処理を繰り返すことにより、連
続的に供給される入力データｙ₀₀〜ｙ₇₀，ｙ₀₁〜ｙ₇₁，
…，ｙ₀₇〜ｙ₇₇に対応した出力データｗ₀₀〜ｗ₇₀，ｗ₀₁
〜ｗ₇₁，…，ｗ₀₇〜ｗ₇₇が連続して得られる。Thereafter, by repeating the same processing, output data w₀₀ to w₇₀, w₀₁ to w₇₁, ..., w₀₇ to w₇₇ corresponding to the continuously supplied input data y₀₀ to y₇₀, y₀₁ to y₇₁, ..., y₀₇ to y₇₇ are obtained in succession.

【００５４】以上のとおり、第３の実施例によれば、乗
算器の数が１に低減された１次元ＩＤＣＴプロセッサを
実現できる。しかも、定数乗算回路１１の中の乗算器４
１３は、２変数入力の乗算器に比べて回路規模が小さ
い。また、内積計算の一部を定数乗算回路１１で実行す
るので、ＤＡ回路１２のＲＯＭサイズが低減される。As described above, according to the third embodiment, a one-dimensional IDCT processor in which the number of multipliers is reduced to one can be realized. Moreover, the multiplier 413 in the constant multiplication circuit 11 has a smaller circuit scale than a multiplier with two variable inputs. Further, since part of the inner product calculation is executed by the constant multiplication circuit 11, the ROM size of the DA circuit 12 is reduced.

【００５５】以下、上記第３の実施例の変形例について
説明する。式（５）は、ｗ_ij＝ｔ₀（ｙ_0j±ｙ_4j）＋ω_ij …（７）のように変形される。ここに、式（７）中の“±”は、
ｉ＝０，３，４，７の場合に“＋”を、ｉ＝１，２，
５，６の場合に“−”をそれぞれ意味する（図１参
照）。図１７の定数乗算回路１１及び図１８のＲＡ回路
１３は、式（７）の演算手順を採用したものである。A modification of the third embodiment is described below. Equation (5) can be rewritten as
w_ij = t₀ (y_0j ± y_4j) + ω_ij … (7)
Here, "±" in equation (7) means "+" for i = 0, 3, 4, 7 and "−" for i = 1, 2, 5, 6 (see FIG. 1). The constant multiplication circuit 11 of FIG. 17 and the RA circuit 13 of FIG. 18 adopt the calculation procedure of equation (7).

【００５６】図１７に示した定数乗算回路１１は、デー
タｙ_0jを保持するための入力レジスタ５００と、データ
ｙ_4jを保持するための入力レジスタ５０１と、加算ｙ_0j
＋ｙ_4j及び減算ｙ_0j−ｙ_4jを順次実行するための２入力
加減算器５０２と、２個の定数乗算ｔ₀（ｙ_0j＋
ｙ_4j），ｔ₀（ｙ_0j−ｙ_4j）を順次実行するための乗算
器５０３と、積ｔ₀（ｙ_0j＋ｙ_4j）を保持するための一
時レジスタ５０４と、積ｔ₀（ｙ_0j−ｙ_4j）を保持する
ための一時レジスタ５０５と、両一時レジスタ５０４，
５０５の出力を保持するための２個のバッファレジスタ
５０６，５０７とで構成される。The constant multiplication circuit 11 shown in FIG. 17 consists of an input register 500 for holding the data y_0j, an input register 501 for holding the data y_4j, a two-input adder/subtractor 502 for sequentially executing the addition y_0j + y_4j and the subtraction y_0j − y_4j, a multiplier 503 for sequentially executing the two constant multiplications t₀(y_0j + y_4j) and t₀(y_0j − y_4j), a temporary register 504 for holding the product t₀(y_0j + y_4j), a temporary register 505 for holding the product t₀(y_0j − y_4j), and two buffer registers 506 and 507 for holding the outputs of the two temporary registers 504 and 505.

【００５７】図１７の定数乗算回路１１を採用する場合
には、図１６のＲＡ回路１３は図１８のように変形され
る。図１８のＲＡ回路１３は、定数乗算回路１１から供
給された２個の積ｔ₀（ｙ_0j＋ｙ_4j），ｔ₀（ｙ_0j−ｙ
_4j）のうちのいずれか一方を選択するための２入力セレ
クタ５１０と、該２入力セレクタ５１０で選択された積
とＤＡ回路１２から供給された部分内積ω_ijとの加算を
実行して内積ｗ_ijを求めるための２入力加算器５１１と
で構成される。２入力セレクタ５１０は、式（７）に従
って、ｉ＝０，３，４，７のサイクルではｔ₀（ｙ_0j＋
ｙ_4j）を選択し、ｉ＝１，２，５，６のサイクルではｔ
₀（ｙ_0j−ｙ_4j）を選択するように制御される。When the constant multiplication circuit 11 of FIG. 17 is adopted, the RA circuit 13 of FIG. 16 is modified as shown in FIG. 18. The RA circuit 13 of FIG. 18 consists of a two-input selector 510 for selecting one of the two products t₀(y_0j + y_4j) and t₀(y_0j − y_4j) supplied from the constant multiplication circuit 11, and a two-input adder 511 that obtains the inner product w_ij by adding the product selected by the two-input selector 510 and the partial inner product ω_ij supplied from the DA circuit 12. Following equation (7), the two-input selector 510 is controlled so as to select t₀(y_0j + y_4j) in the cycles where i = 0, 3, 4, 7 and t₀(y_0j − y_4j) in the cycles where i = 1, 2, 5, 6.
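The data path of FIGS. 17 and 18 can be modelled in a few lines of Python. This is a behavioural sketch only. The value of t₀ is an assumption (cos(π/4)/2, since equation (4) is not reproduced in this section), and the function names are hypothetical.

```python
import math

# Assumed value of t_0 = t_4.
T0 = 0.5 * math.cos(math.pi / 4)

def constant_multiplier_fig17(y0, y4):
    """Return the two products t0*(y0 + y4) and t0*(y0 - y4)."""
    return T0 * (y0 + y4), T0 * (y0 - y4)

def ra_fig18(i, products, omega_i):
    """Select one product according to i, then add the partial inner product."""
    plus, minus = products
    chosen = plus if i in (0, 3, 4, 7) else minus
    return chosen + omega_i
```

Compared with the three-input adder/subtractor of FIG. 16, this arrangement moves the add-or-subtract decision in front of the multiplier, so the RA stage needs only a selector and a two-input adder.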

【００５８】さて、図１０の行列演算は、図１９のよう
に変形される。図１９中の８行６列の行列の要素の半分
は０である。したがって、図１９の行列演算は、図２０
（ａ）及び図２０（ｂ）のように２つに分割される。図
２０（ａ）中の４個の部分内積ρ_0j，ρ_1j，ρ_2j，ρ_3j
は４個の２ビット入力ＲＡＣで、図２０（ｂ）中の４個
の部分内積σ_0j，σ_1j，σ_2j，σ_3jは４個の４ビット入
力ＲＡＣでそれぞれ求めることができる。また、図２０
（ａ）及び図２０（ｂ）から、 ω_0j＝ρ_0j＋σ_0j ω_1j＝ρ_1j＋σ_1j ω_2j＝ρ_2j＋σ_2j ω_3j＝ρ_3j＋σ_3j ω_4j＝ρ_3j−σ_3j ω_5j＝ρ_2j−σ_2j ω_6j＝ρ_1j−σ_1j ω_7j＝ρ_0j−σ_0j …（８）であることが分かる。図２１のＤＡ回路１２は、図２０
（ａ）及び図２０（ｂ）の行列演算をそれぞれＲＡＣで
実行したうえ、式（８）を用いて部分内積ω_ijを求める
ものである。The matrix operation of FIG. 10 can be transformed as shown in FIG. 19. Half of the elements of the 8-row, 6-column matrix in FIG. 19 are 0. The matrix operation of FIG. 19 can therefore be divided into two, as shown in FIG. 20(a) and FIG. 20(b). The four partial inner products ρ_0j, ρ_1j, ρ_2j, ρ_3j in FIG. 20(a) can be obtained by four 2-bit input RACs, and the four partial inner products σ_0j, σ_1j, σ_2j, σ_3j in FIG. 20(b) by four 4-bit input RACs. From FIG. 20(a) and FIG. 20(b), it also follows that
ω_0j = ρ_0j + σ_0j
ω_1j = ρ_1j + σ_1j
ω_2j = ρ_2j + σ_2j
ω_3j = ρ_3j + σ_3j
ω_4j = ρ_3j − σ_3j
ω_5j = ρ_2j − σ_2j
ω_6j = ρ_1j − σ_1j
ω_7j = ρ_0j − σ_0j … (8)
The DA circuit 12 of FIG. 21 executes the matrix operations of FIG. 20(a) and FIG. 20(b) with RACs and then obtains the partial inner products ω_ij using equation (8).
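Equation (8) can be verified with a small Python sketch. It assumes, as an illustration, the usual IDCT matrix entries d_ik = cos((2i+1)kπ/16)/2 for k ≥ 1, and it takes ρ to be the contribution of the even-indexed inputs y_2j and y_6j and σ the contribution of the odd-indexed inputs y_1j, y_3j, y_5j, and y_7j, which matches the 2-bit and 4-bit RAC input widths. The function names are hypothetical.

```python
import math

def d(i, k):
    """Assumed IDCT matrix entry for k >= 1: cos((2i+1)k*pi/16) / 2."""
    return 0.5 * math.cos((2 * i + 1) * k * math.pi / 16)

def rho(i, y):
    """Even-index part (inputs y2 and y6), defined for i = 0..3."""
    return d(i, 2) * y[2] + d(i, 6) * y[6]

def sigma(i, y):
    """Odd-index part (inputs y1, y3, y5, y7), defined for i = 0..3."""
    return sum(d(i, k) * y[k] for k in (1, 3, 5, 7))

def omega_from_halves(y):
    """Rebuild omega_0..omega_7 from four rho and four sigma values via equation (8)."""
    out = [0.0] * 8
    for i in range(4):
        r, s = rho(i, y), sigma(i, y)
        out[i] = r + s          # omega_i     = rho_i + sigma_i
        out[7 - i] = r - s      # omega_(7-i) = rho_i - sigma_i
    return out
```

The split pays off in table size: a ROM addressed by a 6-bit slice needs 2⁶ = 64 entries, whereas ROMs addressed by 4-bit and 2-bit slices need 16 and 4 entries, respectively.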

【００５９】図２１のＤＡ回路１２は、６個のシフトレ
ジスタ７００〜７０５と、４個の４ビット入力ＲＡＣ７
０６〜７０９と、４個の２ビット入力ＲＡＣ７１０〜７
１３と、８個のバッファレジスタ７１４〜７２１と、第
１の４入力セレクタ７２２と、第２の４入力セレクタ７
２３と、２入力加減算器７２４とで構成される。シフト
レジスタ７００〜７０５は、各々データｙ_1j，ｙ_2j，ｙ
_3j，ｙ_5j，ｙ_6j，ｙ_7jを保持し、各々の最下位２ビット
を次々とシフトアウトするものである。４個のシフトレ
ジスタ７００，７０２，７０３，７０５の各々の最下位
ビットは第１のビットスライスワードｓ₀として、各々
の最下位ビットより１桁上位のビットは第２のビットス
ライスワードｓ₁としてそれぞれ４ビット入力ＲＡＣ７
０６〜７０９へ供給される。２個のシフトレジスタ７０
１，７０４の各々の最下位ビットは第３のビットスライ
スワードｒ₀として、各々の最下位ビットより１桁上位
のビットは第４のビットスライスワードｒ₁としてそれ
ぞれ２ビット入力ＲＡＣ７１０〜７１３へ供給される。
４ビット入力ＲＡＣ７０６は、図２２に示すように、第
１のＲＯＭ８１と、第２のＲＯＭ８２と、３入力加減算
器８３と、シフタ８４と、累算レジスタ８５とで構成さ
れる。図２１中の他の４ビット入力ＲＡＣの内部構成も
図２２と同様である。したがって、４個の４ビット入力
ＲＡＣ７０６〜７０９で４個の部分内積σ_0j，σ_1j，σ
_2j，σ_3jが並列に求められる。２ビット入力ＲＡＣ７１
０は、図２３に示すように、第１のＲＯＭ９１と、第２
のＲＯＭ９２と、３入力加減算器９３と、シフタ９４
と、累算レジスタ９５とで構成される。図２１中の他の
２ビット入力ＲＡＣの内部構成も図２３と同様である。
したがって、４個の２ビット入力ＲＡＣ７１０〜７１３
で４個の部分内積ρ_0j，ρ_1j，ρ_2j，ρ_3jが並列に求め
られる。バッファレジスタ７１４〜７２１は、当該１次
元ＩＤＣＴプロセッサのパイプライン動作を保証するよ
うに、８個のＲＡＣ７０６〜７１３の出力を保持するも
のである。第１の４入力セレクタ７２２は、バッファレ
ジスタ７１４〜７１７の保持データを選択して、部分内
積σ_0j，σ_1j，σ_2j，σ_3j，σ_3j，σ_2j，σ_1j，σ_0jを
２入力加減算器７２４へ順次供給するものである。第２
の４入力セレクタ７２３は、バッファレジスタ７１８〜
７２１の保持データを選択して、部分内積ρ_0j，ρ_1j，
ρ_2j，ρ_3j，ρ_3j，ρ_2j，ρ_1j，ρ_0jを２入力加減算器
７２４へ順次供給するものである。２入力加減算器７２
４は、式（８）に従って加減算を実行するものである。
すなわち、部分内積ω_ij（ｉ＝０〜７，ｊ＝０〜７）
が、ω_0j，ω_1j，ω_2j，ω_3j，ω_4j，ω_5j，ω_6j，ω_7j
の順序で２入力加減算器７２４から出力される。The DA circuit 12 in FIG. 21 comprises six shift registers 700 to 705, four 4-bit input RACs 706 to 709, four 2-bit input RACs 710 to 713, eight buffer registers 714 to 721, a first 4-input selector 722, a second 4-input selector 723, and a 2-input adder/subtractor 724. The shift registers 700 to 705 hold the data y_1j, y_2j, y_3j, y_5j, y_6j, and y_7j, respectively, and successively shift out the least significant two bits of each. The least significant bits of the four shift registers 700, 702, 703, and 705 are supplied to the 4-bit input RACs 706 to 709 as a first bit slice word s₀, and the bits one digit above those least significant bits are supplied to the same RACs as a second bit slice word s₁. The least significant bits of the two shift registers 701 and 704 are supplied to the 2-bit input RACs 710 to 713 as a third bit slice word r₀, and the bits one digit above those least significant bits are supplied to the same RACs as a fourth bit slice word r₁. As shown in FIG. 22, the 4-bit input RAC 706 comprises a first ROM 81, a second ROM 82, a 3-input adder/subtractor 83, a shifter 84, and an accumulation register 85. The other 4-bit input RACs in FIG. 21 have the same internal configuration as in FIG. 22. Therefore, the four 4-bit input RACs 706 to 709 obtain the four partial inner products σ_0j, σ_1j, σ_2j, σ_3j in parallel. As shown in FIG. 23, the 2-bit input RAC 710 comprises a first ROM 91, a second ROM 92, a 3-input adder/subtractor 93, a shifter 94, and an accumulation register 95. The other 2-bit input RACs in FIG. 21 have the same internal configuration as in FIG. 23. Therefore, the four 2-bit input RACs 710 to 713 obtain the four partial inner products ρ_0j, ρ_1j, ρ_2j, ρ_3j in parallel. The buffer registers 714 to 721 hold the outputs of the eight RACs 706 to 713 so as to guarantee the pipeline operation of the one-dimensional IDCT processor. The first 4-input selector 722 selects the data held in the buffer registers 714 to 717 and sequentially supplies the partial inner products σ_0j, σ_1j, σ_2j, σ_3j, σ_3j, σ_2j, σ_1j, σ_0j to the 2-input adder/subtractor 724. The second 4-input selector 723 selects the data held in the buffer registers 718 to 721 and sequentially supplies the partial inner products ρ_0j, ρ_1j, ρ_2j, ρ_3j, ρ_3j, ρ_2j, ρ_1j, ρ_0j to the 2-input adder/subtractor 724. The 2-input adder/subtractor 724 executes addition and subtraction in accordance with equation (8). That is, the partial inner products ω_ij (i = 0 to 7, j = 0 to 7) are output from the 2-input adder/subtractor 724 in the order ω_0j, ω_1j, ω_2j, ω_3j, ω_4j, ω_5j, ω_6j, ω_7j.
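As a rough software model of one RAC of this kind, the sketch below evaluates an inner product with fixed coefficients by consuming two bits of every input word per cycle, least significant bits first, and looking up precomputed partial sums. Several points are assumptions made for illustration and are not taken from the figures: the names `build_rom` and `rac_inner_product`, the use of integer coefficients in place of the cosine-derived ROM contents, the choice that the second ROM stores doubled partial sums, and the replacement of the hardware shifter by an explicit power-of-two weight per cycle. The inputs are taken to be two's-complement words, so the second ROM's value is subtracted only in the final cycle, when its address is formed from the sign bits.

```python
def build_rom(coeffs, weight=1):
    """Return a lookup table: entry `addr` holds weight times the sum of
    coeffs[k] over every k whose bit is set in `addr`."""
    return [
        weight * sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def rac_inner_product(coeffs, xs, bits=8):
    """Compute sum(coeffs[k] * xs[k]) by table lookup, two bits per cycle.

    xs are signed integers that fit in `bits`-bit two's complement.
    """
    if len(coeffs) != len(xs):
        raise ValueError("coeffs and xs must have the same length")
    if bits % 2:
        raise ValueError("bits must be even: two bits are consumed per cycle")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= x <= hi for x in xs):
        raise ValueError("input does not fit in the given word length")

    rom0 = build_rom(coeffs)            # addressed by the lower bit slice
    rom1 = build_rom(coeffs, weight=2)  # addressed by the upper bit slice
    regs = [x & ((1 << bits) - 1) for x in xs]  # model of the shift registers
    acc = 0
    cycles = bits // 2
    for cycle in range(cycles):
        # Gather one bit from each register into the two bit slice words.
        s0 = sum((r & 1) << k for k, r in enumerate(regs))
        s1 = sum(((r >> 1) & 1) << k for k, r in enumerate(regs))
        regs = [r >> 2 for r in regs]   # shift out the two consumed bits
        # In the last cycle the upper slice holds the sign bits: subtract.
        upper = -rom1[s1] if cycle == cycles - 1 else rom1[s1]
        acc += (rom0[s0] + upper) << (2 * cycle)
    return acc


print(rac_inner_product([3, -5, 7, 2], [5, -3, 100, -128]))
```

With four coefficients each table has 16 entries, matching the address width of a 4-bit input RAC; with two coefficients each table has 4 entries.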

【００６０】図１１中の定数演算回路１１及びＲＡ回路
１３の内部構成は、図１３と図１６との組み合わせ、図
１７と図１８との組み合わせなどの中から適宜選択され
る。また、図１１中のＤＡ回路１２の内部構成は、図１
４及び図２１などの中から適宜選択される。The internal configurations of the constant operation circuit 11 and the RA circuit 13 in FIG. 11 are selected as appropriate from, for example, the combination of FIGS. 13 and 16 and the combination of FIGS. 17 and 18. The internal configuration of the DA circuit 12 in FIG. 11 is selected as appropriate from, for example, FIG. 14 and FIG. 21.

【００６１】なお、上記第１〜第３の実施例では８ポイ
ントＩＤＣＴ処理について説明したが、各実施例は１６
ポイントＩＤＣＴ処理、８ポイントＩＤＳＴ処理、１６
ポイントＩＤＳＴ処理などに容易に変形できる。Although the first to third embodiments have been described for 8-point IDCT processing, each embodiment can easily be modified for 16-point IDCT processing, 8-point IDST processing, 16-point IDST processing, and the like.

【００６２】[0062]

【発明の効果】以上説明してきたとおり、本発明によれ
ば、所要の乗算器数が大幅に低減される結果、直交変換
プロセッサの回路規模が低減される。また、複数の内積
計算の各々を２個の定数乗算と１個の部分内積計算とに
分割することとすれば、内積計算の全てをＤＡ回路で実
現する場合に比べてＲＯＭサイズが低減される結果、直
交変換プロセッサの回路規模が低減される。As described above, according to the present invention, the required number of multipliers is greatly reduced, and as a result, the circuit scale of the orthogonal transform processor is reduced. Further, if each of the plurality of inner product calculations is divided into two constant multiplications and one partial inner product calculation, the ROM size is reduced as compared with the case where all the inner product calculations are realized by DA circuits. As a result, the circuit scale of the orthogonal transform processor is reduced.
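The ROM-size effect can be made concrete with a small word count. The sketch below rests on stated assumptions rather than on figures from the specification: a ROM addressed by a k-bit bit slice word is taken to hold 2^k words with no further table compression, every RAC is taken to contain two such ROMs, and the "all DA" case is modeled hypothetically as eight 8-bit input RACs. The helper names `rom_words` and `da_rom_words` are illustrative.

```python
def rom_words(address_bits):
    """Words in a lookup table addressed by `address_bits` bits."""
    return 1 << address_bits


def da_rom_words(racs):
    """Total ROM words for a list of (rac_count, address_bits, roms_per_rac)."""
    return sum(count * roms * rom_words(bits) for count, bits, roms in racs)


# Hypothetical baseline: all eight inputs handled by distributed arithmetic.
all_da = da_rom_words([(8, 8, 2)])
# Two inputs split off as constant multiplications: eight 6-bit input RACs.
six_bit = da_rom_words([(8, 6, 2)])
# Variant with four 4-bit input RACs and four 2-bit input RACs.
split = da_rom_words([(4, 4, 2), (4, 2, 2)])

print(all_da, six_bit, split)  # 4096 1024 160
```

Under these assumptions, removing two inputs from the table address cuts each ROM to a quarter of its size, and the 4-bit/2-bit split reduces the total further.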

[Brief description of the drawings]

【図１】本発明に係るＩＤＣＴプロセッサによって実行
されるべき行列演算を示す図である。FIG. 1 illustrates a matrix operation to be performed by an IDCT processor according to the present invention.

【図２】図１の行列演算の１つの実行手順を示す図であ
る。FIG. 2 is a diagram showing one execution procedure of the matrix operation of FIG. 1.

【図３】図２の手順で用いられる係数行列を示す図であ
る。FIG. 3 is a diagram showing a coefficient matrix used in the procedure of FIG. 2.

【図４】図３の係数行列を採用した、本発明の第１の実
施例に係るＩＤＣＴプロセッサの構成図である。FIG. 4 is a configuration diagram of an IDCT processor according to the first embodiment of the present invention, which employs the coefficient matrix of FIG. 3.

【図５】図４中の１個の累算器の内部構成図である。FIG. 5 is an internal configuration diagram of one accumulator in FIG. 4.

【図６】図５の累算器の変形例を示す図である。FIG. 6 is a diagram showing a modification of the accumulator of FIG. 5.

【図７】図１の行列演算の他の実行手順を示す図であ
る。FIG. 7 is a diagram showing another execution procedure of the matrix operation of FIG. 1.

【図８】図７の手順で用いられる係数行列を示す図であ
る。FIG. 8 is a diagram showing a coefficient matrix used in the procedure of FIG. 7.

【図９】図８の係数行列を採用した、本発明の第２の実
施例に係るＩＤＣＴプロセッサの構成図である。FIG. 9 is a configuration diagram of an IDCT processor according to the second embodiment of the present invention, which employs the coefficient matrix of FIG. 8.

【図１０】図１の行列演算の一部を示す図である。FIG. 10 is a diagram showing a part of the matrix operation of FIG. 1.

【図１１】本発明の第３の実施例に係るＩＤＣＴプロセ
ッサの構成図である。FIG. 11 is a configuration diagram of an IDCT processor according to a third embodiment of the present invention.

【図１２】図１１中の入力バッファの内部構成図であ
る。FIG. 12 is an internal configuration diagram of the input buffer in FIG. 11.

【図１３】図１１中の定数乗算回路の内部構成図であ
る。FIG. 13 is an internal configuration diagram of the constant multiplication circuit in FIG. 11.

【図１４】図１１中の分布演算回路の内部構成図であ
る。FIG. 14 is an internal configuration diagram of the distribution operation circuit in FIG. 11.

【図１５】図１０の行列演算を実行するための、図１４
中の１個の６ビット入力ＲＡＣの内部構成図である。FIG. 15 is an internal configuration diagram of one 6-bit input RAC in FIG. 14 for executing the matrix operation of FIG. 10.

【図１６】図１１中の合成演算回路の内部構成図であ
る。FIG. 16 is an internal configuration diagram of the synthesis operation circuit in FIG. 11.

【図１７】図１３の定数乗算回路の変形例を示す図であ
る。FIG. 17 is a diagram showing a modification of the constant multiplication circuit of FIG. 13.

【図１８】図１７の定数乗算回路を採用したＩＤＣＴプ
ロセッサにおける合成演算回路の内部構成図である。FIG. 18 is an internal configuration diagram of the synthesis operation circuit in an IDCT processor employing the constant multiplication circuit of FIG. 17.

【図１９】図１０から導出された行列演算を示す図であ
る。FIG. 19 is a diagram showing a matrix operation derived from FIG. 10.

【図２０】（ａ）及び（ｂ）は図１９から分割された２
つの行列演算を示す図である。FIGS. 20(a) and 20(b) are diagrams showing the two matrix operations into which FIG. 19 is divided.

【図２１】図１４の分布演算回路の変形例を示す図であ
る。FIG. 21 is a diagram showing a modification of the distribution operation circuit of FIG. 14.

【図２２】図２０（ｂ）の行列演算を実行するための、
図２１中の１個の４ビット入力ＲＡＣの内部構成図であ
る。FIG. 22 is an internal configuration diagram of one 4-bit input RAC in FIG. 21 for executing the matrix operation of FIG. 20(b).

【図２３】図２０（ａ）の行列演算を実行するための、
図２１中の１個の２ビット入力ＲＡＣの内部構成図であ
る。FIG. 23 is an internal configuration diagram of one 2-bit input RAC in FIG. 21 for executing the matrix operation of FIG. 20(a).

[Explanation of symbols]

１０入力バッファ１１定数乗算回路１２分布演算回路（ＤＡ回路）１３合成演算回路（ＲＡ回路）７１，７２，８１，８２，９１，９２ＲＯＭ７３，８３，９３３入力加減算器７４，８４，９４シフタ７５，８５，９５累算レジスタ１０１〜１０４，３０２〜３０４係数メモリ１０５〜１０８，３０６〜３０８乗算器１０９〜１１６，３１０〜３１７累算器１１７，３１８８入力セレクタ２０１４入力セレクタ２０２２の補数器２０３加算器２０４累算レジスタ２０５バッファレジスタ２１２１の補数器３０１入力レジスタ３０５２入力セレクタ３０９一時レジスタ４００〜４０７レジスタ４１０，４１１，５００，５０１入力レジスタ４１２，５１０２入力セレクタ４１３，５０３乗算器４１４，４１５，５０４，５０５一時レジスタ４１６，４１７，５０６，５０７バッファレジスタ４２０〜４２５，７００〜７０５シフトレジスタ４２６〜４３３６ビット入力ＲＡＣ４３４〜４４１，７１４〜７２１バッファレジスタ４４２８入力セレクタ４５０３入力加減算器５０２２入力加減算器５１１２入力加算器７０６〜７０９４ビット入力ＲＡＣ７１０〜７１３２ビット入力ＲＡＣ７２２，７２３４入力セレクタ７２４２入力加減算器
Reference Signs List
10: input buffer
11: constant multiplication circuit
12: distribution operation circuit (DA circuit)
13: synthesis operation circuit (RA circuit)
71, 72, 81, 82, 91, 92: ROM
73, 83, 93: 3-input adder/subtractor
74, 84, 94: shifter
75, 85, 95: accumulation register
101 to 104, 302 to 304: coefficient memory
105 to 108, 306 to 308: multiplier
109 to 116, 310 to 317: accumulator
117, 318: 8-input selector
201: 4-input selector
202: two's complementer
203: adder
204: accumulation register
205: buffer register
212: one's complementer
301: input register
305: 2-input selector
309: temporary register
400 to 407: register
410, 411, 500, 501: input register
412, 510: 2-input selector
413, 503: multiplier
414, 415, 504, 505: temporary register
416, 417, 506, 507: buffer register
420 to 425, 700 to 705: shift register
426 to 433: 6-bit input RAC
434 to 441, 714 to 721: buffer register
442: 8-input selector
450: 3-input adder/subtractor
502: 2-input adder/subtractor
511: 2-input adder
706 to 709: 4-bit input RAC
710 to 713: 2-bit input RAC
722, 723: 4-input selector
724: 2-input adder/subtractor

フロントページの続き (58)調査した分野(Int.Cl.⁶，ＤＢ名) G06F 17/14 G16T 1/00 H03M 7/30 H04N 1/41 ＪＩＣＳＴファイル（ＪＯＩＳ) Continued from the front page (58) Fields searched (Int. Cl.⁶, DB name): G06F 17/14; G16T 1/00; H03M 7/30; H04N 1/41; JICST file (JOIS)

Claims

(57) [Claims]

1. An orthogonal transform processor for performing orthogonal transform processing on input data composed of 2^{n+1} elements (n is an integer of 2 or more), comprising: first to 2^n-th coefficient memories, each for storing 2^{n+1} of the absolute values of 2^n × 2^{n+1} coefficients among the 2^{n+1} × 2^{n+1} coefficients of an orthogonal transform matrix; first to 2^n-th multipliers, each for multiplying one element of the input data by one of the 2^{n+1} coefficients stored in the corresponding one of the first to 2^n-th coefficient memories; first to 2^{n+1}-th accumulators, each for performing accumulation using the results of the first to 2^n-th multipliers while restoring the signs of the coefficients of the orthogonal transform matrix, so that 2^{n+1} inner products corresponding to the orthogonal transform matrix are obtained in parallel; and a 2^{n+1}-input selector for sequentially selecting the results of the first to 2^{n+1}-th accumulators and outputting them as elements of the output data of the orthogonal transform processor.

2. The orthogonal transform processor according to claim 1, wherein n is 2.

3. The orthogonal transform processor according to claim 2, wherein each of the first to eighth accumulators comprises: a two's complementer for selectively outputting either the result itself of one of the first to fourth multipliers or the two's complement of that result; an adder for adding the output of the two's complementer and an accumulation result; an accumulation register for holding 0 in advance as an initial value of the accumulation result and for holding and outputting the result of the adder as an intermediate value of the accumulation result; and a buffer register for holding and outputting the output of the accumulation register.

4. The orthogonal transform processor according to claim 3, wherein each of said first to eighth accumulators further comprises a four-input selector for selecting the result of one of said first to fourth multipliers and outputting it as the input of the two's complementer.

5. The orthogonal transform processor according to claim 2, wherein each of the first to eighth accumulators comprises: a one's complementer for selectively outputting either the result itself of one of the first to fourth multipliers or the one's complement of that result; an adder for adding the output of the one's complementer and an accumulation result; an accumulation register for holding and outputting the result of the adder as an intermediate value of the accumulation result; and a buffer register for holding and outputting the output of the accumulation register.

6. The orthogonal transform processor according to claim 5, wherein each of said first to eighth accumulators further comprises a four-input selector for selecting the result of one of said first to fourth multipliers and outputting it as the input of the one's complementer.

7. An orthogonal transform processor for performing orthogonal transform processing on input data composed of 2^{n+1} elements (n is an integer of 2 or more), comprising: first to (2^n − 1)-th coefficient memories, each for storing 2^{n+1} of the absolute values of (2^n − 1) × 2^{n+1} coefficients among the 2^{n+1} × 2^{n+1} coefficients of an orthogonal transform matrix; an input register for holding and outputting one element of the input data supplied as an input; a two-input selector for selecting and outputting one of the input and the output of the input register; a first multiplier for multiplying the output of the two-input selector by one of the 2^{n+1} coefficients stored in the first coefficient memory; second to (2^n − 1)-th multipliers, each for multiplying the output of the input register by one of the 2^{n+1} coefficients stored in the corresponding one of the second to (2^n − 1)-th coefficient memories; a temporary register for holding and outputting the result of the first multiplier; first to 2^{n+1}-th accumulators, each for performing accumulation using the results of the first to (2^n − 1)-th multipliers and the output of the temporary register while restoring the signs of the coefficients of the orthogonal transform matrix, so that 2^{n+1} inner products corresponding to the orthogonal transform matrix are obtained in parallel; and a 2^{n+1}-input selector for sequentially selecting the results of the first to 2^{n+1}-th accumulators and outputting them as elements of the output data of the orthogonal transform processor.

8. The orthogonal transform processor according to claim 7, wherein n is 2.

9. The orthogonal transform processor according to claim 8, wherein each of said first to eighth accumulators comprises: a four-input selector for selecting and outputting one of the results of said first to third multipliers and the output of said temporary register; a two's complementer for selectively outputting either the output itself of the four-input selector or the two's complement of that output; an adder for adding the output of the two's complementer and an accumulation result; an accumulation register for holding 0 in advance as an initial value of the accumulation result and for holding and outputting the result of the adder as an intermediate value of the accumulation result; and a buffer register for holding and outputting the output of the accumulation register.

10. The orthogonal transform processor according to claim 8, wherein each of said first to eighth accumulators comprises: a four-input selector for selecting and outputting one of the results of said first to third multipliers and the output of said temporary register; a one's complementer for selectively outputting either the output itself of the four-input selector or the one's complement of that output; an adder for adding the output of the one's complementer and an accumulation result; an accumulation register for holding a constant in advance as an initial value of the accumulation result and for holding and outputting the result of the adder as an intermediate value of the accumulation result; and a buffer register for holding and outputting the output of the accumulation register.

11. An orthogonal transform processor for performing orthogonal transform processing on input data composed of 2^{n+1} elements (n is an integer of 2 or more), comprising: an input buffer for collectively holding and outputting 2^{n+1} consecutive elements of the input data; a constant multiplication circuit for receiving the first element and the (2^n + 1)-th element of the 2^{n+1} elements from the input buffer and outputting two constant multiplication results in parallel; a distribution operation circuit for receiving the other (2^{n+1} − 2) elements from the input buffer and sequentially outputting 2^{n+1} partial inner products corresponding to an orthogonal transform matrix; and a synthesis operation circuit for executing a synthesis operation on the two outputs of the constant multiplication circuit and the output of the distribution operation circuit so as to obtain the elements of the output data of the orthogonal transform processor.

12. The orthogonal transform processor according to claim 11, wherein n is 2.

13. The orthogonal transform processor according to claim 12, wherein said input buffer comprises eight registers, each for holding and outputting one of eight consecutive elements of said input data.

14. The orthogonal transform processor according to claim 12, wherein said constant multiplication circuit comprises: a first input register for holding and outputting the first element of eight consecutive elements of said input data; a second input register for holding and outputting the fifth element of the eight consecutive elements of the input data; a two-input selector for sequentially selecting and outputting the output of the first input register and the output of the second input register; a multiplier for sequentially executing, using the output of the two-input selector, a first constant multiplication of the output of the first input register and a second constant multiplication of the output of the second input register; a first temporary register for holding and outputting the result of the first constant multiplication; and a second temporary register for holding and outputting the result of the second constant multiplication; and wherein said synthesis operation circuit comprises: a first buffer register for holding and outputting the output of the first temporary register; a second buffer register for holding and outputting the output of the second temporary register; and a three-input adder/subtractor for performing addition/subtraction using the output of the first buffer register and the output of the distribution operation circuit as addition inputs and the output of the second buffer register as an addition/subtraction input.

15. The orthogonal transform processor according to claim 12, wherein said constant multiplication circuit comprises: a first input register for holding and outputting the first element of eight consecutive elements of said input data; a second input register for holding and outputting the fifth element of the eight consecutive elements of the input data; a two-input adder/subtractor for sequentially executing addition and subtraction of the output of the first input register and the output of the second input register; a multiplier for executing a first constant multiplication of the addition result of the two-input adder/subtractor and a second constant multiplication of the subtraction result of the two-input adder/subtractor; a first temporary register for holding and outputting the result of the first constant multiplication; and a second temporary register for holding and outputting the result of the second constant multiplication; and wherein said synthesis operation circuit comprises: a first buffer register for holding and outputting the output of the first temporary register; a second buffer register for holding and outputting the output of the second temporary register; a two-input selector for selecting and outputting one of the output of the first buffer register and the output of the second buffer register; and a two-input adder for adding the output of the two-input selector and the output of the distribution operation circuit.

16. The orthogonal transform processor according to claim 12, wherein said distribution operation circuit comprises: six shift registers for holding the second, third, fourth, sixth, seventh and eighth elements of eight consecutive elements of said input data and for successively shifting out the least significant two bits of each of the six elements, so that the least significant bits of the six elements are collected into a first bit slice word and the bits one digit above the least significant bits of the six elements are collected into a second bit slice word; eight 6-bit input RACs, each for executing a product-sum operation based on the first and second bit slice words, so that eight partial inner products corresponding to the orthogonal transform matrix are obtained in parallel; eight buffer registers for holding and outputting the results of the eight 6-bit input RACs; and an 8-input selector for sequentially selecting and outputting the outputs of the eight buffer registers.

17. The orthogonal transform processor according to claim 16, wherein each of said eight 6-bit input RACs comprises: a first ROM for storing partial sums of a vector inner product based on said orthogonal transform matrix so as to be indexed using said first bit slice word as an address; a second ROM for storing partial sums of a vector inner product based on the orthogonal transform matrix so as to be indexed using said second bit slice word as an address; a three-input adder/subtractor for performing addition/subtraction using the partial sum indexed from the first ROM as a first addition input, the partial sum indexed from the second ROM as an addition/subtraction input, and an accumulation result as a second addition input; a shifter for shifting the result of the three-input adder/subtractor to the left; and an accumulation register for holding 0 in advance as an initial value of the accumulation result and for holding and outputting the output of the shifter as an intermediate value of the accumulation result.

18. The orthogonal transform processor according to claim 12, wherein the distribution operation circuit comprises: four shift registers for holding the second, fourth, sixth and eighth elements of eight consecutive elements of the input data and for successively shifting out the least significant two bits of each of the four elements, so that the least significant bits of the four elements are collected into a first bit slice word and the bits one digit above the least significant bits of the four elements are collected into a second bit slice word; two shift registers for holding the third and seventh elements of the eight consecutive elements of the input data and for successively shifting out the least significant two bits of each of the two elements, so that the least significant bits of the two elements are collected into a third bit slice word and the bits one digit above the least significant bits of the two elements are collected into a fourth bit slice word; four 4-bit input RACs, each for executing a product-sum operation based on the first and second bit slice words, so that four partial inner products corresponding to the orthogonal transform matrix are obtained in parallel; four 2-bit input RACs, each for executing a product-sum operation based on the third and fourth bit slice words, so that four partial inner products corresponding to the orthogonal transform matrix are obtained in parallel; first to fourth buffer registers for holding and outputting the results of the four 4-bit input RACs; fifth to eighth buffer registers for holding and outputting the results of the four 2-bit input RACs; a first four-input selector for sequentially selecting and outputting the outputs of the first to fourth buffer registers; a second four-input selector for sequentially selecting and outputting the outputs of the fifth to eighth buffer registers; and a two-input adder/subtractor for performing addition/subtraction using the output of the first four-input selector as an addition/subtraction input and the output of the second four-input selector as an addition input.

19. The orthogonal transform processor according to claim 18, wherein each of said four 4-bit input RACs comprises: a first ROM for storing partial sums of a vector inner product based on said orthogonal transform matrix so as to be indexed using said first bit slice word as an address; a second ROM for storing partial sums of a vector inner product based on the orthogonal transform matrix so as to be indexed using said second bit slice word as an address; a three-input adder/subtractor for performing addition/subtraction using the partial sum indexed from the first ROM as a first addition input, the partial sum indexed from the second ROM as an addition/subtraction input, and an accumulation result as a second addition input; a shifter for shifting the result of the three-input adder/subtractor to the left; and an accumulation register for holding 0 in advance as an initial value of the accumulation result and for holding and outputting the output of the shifter as an intermediate value of the accumulation result.

20. The orthogonal transform processor according to claim 18, wherein each of the four 2-bit input RACs comprises: a first ROM for storing partial sums of a vector inner product based on the orthogonal transform matrix so as to be indexed using the third bit slice word as an address; a second ROM for storing partial sums of a vector inner product based on the orthogonal transform matrix so as to be indexed using the fourth bit slice word as an address; a three-input adder/subtractor for performing addition/subtraction using the partial sum indexed from the first ROM as a first addition input, the partial sum indexed from the second ROM as an addition/subtraction input, and an accumulation result as a second addition input; a shifter for shifting the result of the three-input adder/subtractor to the left; and an accumulation register for holding 0 in advance as an initial value of the accumulation result and for holding and outputting the output of the shifter as an intermediate value of the accumulation result.