JP3185211B2

JP3185211B2 - Matrix data multiplier

Info

Publication number: JP3185211B2
Application number: JP32528989A
Authority: JP
Inventors: 光晴大木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1989-12-15
Filing date: 1989-12-15
Publication date: 2001-07-09
Anticipated expiration: 2016-07-09
Also published as: JPH03186969A

Description

DETAILED DESCRIPTION OF THE INVENTION
The present invention will be described in the following order.

A. Field of Industrial Application
B. Summary of the Invention
C. Prior Art
D. Problems to Be Solved by the Invention
E. Means for Solving the Problems (FIG. 1)
F. Operation
G. Embodiment
G₁. Configuration of One Embodiment (FIGS. 1 to 3)
G₂. Operation of One Embodiment (FIGS. 1 to 11)
H. Effects of the Invention

A. Field of Industrial Application

The present invention relates to a matrix data multiplication device suitable for digital image processing and the like.

B. Summary of the Invention

The present invention concerns a matrix data multiplication device that includes inner product operation circuits for computing matrix inner products and rearrangement circuits for reordering matrix data. A required constant matrix is decomposed into a plurality of sparse matrices. The elements of one sparse matrix are restricted to 0, +1, and −1, while the elements of another, lower-order sparse matrix carry the data components of the constant matrix. This reduces the circuit scale of the inner product operation circuits, simplifies their configuration, and reduces the number of operations, thereby improving operation speed.

C. Prior Art

Various discrete orthogonal transforms suited to digital image processing are known. Among them, the discrete cosine transform (DCT) is well suited to bandwidth compression, and its processing scheme is relatively simple.

For order N, the DCT and its inverse transform (IDCT) are defined using a matrix whose first row consists entirely of a single constant [value not reproduced in the source text] and whose second and subsequent rows consist of the elements cos{(2x+1)kπ/2N} (x = 0, 1, ..., N−1; k = 1, ..., N−1). In the two-dimensional case, the transforms are expressed as follows.

[Y] = [N]·[X]·ᵗ[N] ... (1a)
[X] = ᵗ[N]·[Y]·[N] ... (1b)

When the matrix size is 2^N rows by 2^N columns, equation (1a) carries a coefficient of 1/2^(N+1). Because this is equivalent to a data shift of N+1 bits, the coefficient is omitted here. Alternatively, if equations (1a) and (1b) are each defined to carry a coefficient of 1/2^(N−1), the DCT and the IDCT become symmetric.
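Equations (1a) and (1b) can be illustrated with a minimal pure-Python sketch. The function names are hypothetical, and the first-row constant 1/√2 is an assumption taken from the standard DCT-II, since the source text does not reproduce that value. With this assumption [N]·ᵗ[N] equals (n/2) times the identity, so the inverse needs an overall factor of (2/n)², which for n = 2^N is 1/2^(N−1) per transform as noted above.

```python
import math

def dct_matrix(n):
    """Build the order-n matrix [N].

    Row 0 is ASSUMED to be the constant 1/sqrt(2) (standard DCT-II);
    rows k >= 1 hold cos((2x + 1) k pi / (2n)) as stated in the text.
    """
    rows = [[1.0 / math.sqrt(2.0)] * n]
    for k in range(1, n):
        rows.append([math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dct2(x):
    """Equation (1a): [Y] = [N] [X] t[N], scale factor omitted."""
    nm = dct_matrix(len(x))
    return matmul(matmul(nm, x), transpose(nm))

def idct2(y):
    """Equation (1b): [X] = t[N] [Y] [N], with the omitted scale restored."""
    n = len(y)
    nm = dct_matrix(n)
    raw = matmul(matmul(transpose(nm), y), nm)
    scale = (2.0 / n) ** 2  # undoes the (n/2)^2 gain of the round trip
    return [[v * scale for v in row] for row in raw]
```

Applying `idct2(dct2(x))` to a square block returns the block up to floating-point error, which is a quick check that the two equations are mutually inverse under the assumed normalization.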

Conventionally, matrix data multiplications such as those of equations (1a) and (1b) have been carried out by a multiplication device consisting of inner product operation circuits and a rearrangement circuit (corner turner), as shown in FIG. 12.

In FIG. 12, (10) and (20) are inner product operation circuits. For simplicity, both have a fourth-order configuration corresponding to a matrix of 4 rows and 4 columns, and they are connected via a corner turner (30).

That is, a data matrix [X] as in equation (2) is input from the terminal IN, and the first inner product operation circuit (10) computes its inner product with a coefficient matrix [A] as in equation (3).

The inner product operation circuit (10) has three unit delay elements (11₁), (11₂), and (11₃) cascaded in reverse order. Four latches (12₁), (12₂), (12₃), and (12₄) are connected to the output end, the two intermediate connection points, and the input end of this cascade, respectively. Multipliers (13₁) to (13₄) follow the latches (12₁) to (12₄), coefficient ROMs (14₁) to (14₄) are connected to the respective multipliers, and the outputs of the multipliers (13₁) to (13₄) are connected to an adder (15). The result is a finite impulse response (FIR) transversal filter configuration.

Similarly, the inner product operation circuit (20) has an FIR transversal filter configuration. Its elements carry the reference numerals of the corresponding elements of circuit (10) with the tens digit "1" replaced by "2", and redundant description is omitted. However, the coefficients b_ij stored in the ROMs (24₁) to (24₄) differ from the coefficients a_ij stored in the ROMs (14₁) to (14₄).

The corner turner (30) consists of a pair of RAMs (31) and (32) and changeover switches (33) and (34) on the input and output sides. The two switches (33) and (34) are switched in tandem so that, while data is being written to one of the RAMs (31) and (32), data is read from the other. The capacity of each of the RAMs (31) and (32) is 16 words, corresponding to the 4-row, 4-column matrix described above.
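The double-buffered behavior of the corner turner can be modeled in a few lines. This is a behavioral sketch only; the class name, the method name, and the address formula `c * n + r` for column-order storage are assumptions made for illustration, not details given in the source.

```python
class CornerTurner:
    """Behavioral model of a two-RAM (ping-pong) corner turner."""

    def __init__(self, n=4):
        self.n = n
        # Two banks of n*n words, standing in for RAMs (31) and (32).
        self.banks = [[0] * (n * n), [0] * (n * n)]
        self.write_bank = 0

    def process(self, block):
        """Write one column-order block; return the previous block in row order."""
        n = self.n
        if len(block) != n * n:
            raise ValueError("block must hold n*n words")
        self.banks[self.write_bank] = list(block)
        read = self.banks[1 - self.write_bank]
        # Element (r, c) was stored at address c*n + r; read it out row by row.
        out = [read[c * n + r] for r in range(n) for c in range(n)]
        # Swap roles, like the linked switches (33) and (34).
        self.write_bank = 1 - self.write_bank
        return out
```

The first call returns the initial (all-zero) contents of the idle bank, so the transposed data appears one block later, matching the write-one-read-the-other timing described above.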

Next, matrix data multiplication in the conventional example of FIG. 12 will be described with reference to FIG. 13.

Data of the input matrix [X], in units of 16 words as shown in FIG. 13A, is supplied from the input terminal IN in order from the first column (x₁₁, x₂₁, x₃₁, x₄₁) to the fourth column (x₁₄, x₂₄, x₃₄, x₄₄).

At time t₁, when three cycles (3T) have elapsed from the input start time t₀ of the unit data, the first-column data x₁₁, x₂₁, and x₃₁ are present at the outputs of the unit delay elements (11₁), (11₂), and (11₃), respectively, and the fourth datum x₄₁ is present at the input of the delay element (11₃).

In this state, a common enable pulse is supplied to the latches, and the four first-column data x₁₁, x₂₁, x₃₁, and x₄₁ are captured by the four latches (12₁), (12₂), (12₃), and (12₄), respectively. As shown in FIGS. 13B, 13D, 13F, and 13H, they are held for a period of 4T starting at time t₂, which is 4T after the input start time t₀.

The ROMs (14₁), (14₂), (14₃), and (14₄) store the coefficients a_i1, a_i2, a_i3, and a_i4 (i = 1, 2, 3, 4) of the respective columns of the coefficient matrix [A]. As shown in FIGS. 13C, 13E, 13G, and 13J, these coefficients are supplied one per cycle from time t₂ onward to the corresponding multipliers (13₁), (13₂), (13₃), and (13₄), where they are multiplied by the first-column data x_i1 (i = 1, 2, 3, 4) held in the corresponding latches (12₁), (12₂), (12₃), and (12₄).

That is, in the first, second, third, and fourth cycles after time t₂, the coefficients a_1j, a_2j, a_3j, and a_4j (j = 1, 2, 3, 4) of rows 1, 2, 3, and 4 of the coefficient matrix are multiplied by the first-column data x₁₁, x₂₁, x₃₁, and x₄₁ of the input matrix.

The adder (15) sums the outputs of the multipliers (13₁) to (13₄), so that, as shown in FIG. 13K, the first-column data u₁₁, u₂₁, u₃₁, and u₄₁ of the product matrix [U] of the following equation (4) are obtained in the four cycles after time t₂.

[U] = [A]·[X] ... (4)

Meanwhile, as shown in FIG. 13A, input of the second-column data x₁₂, x₂₂, x₃₂, and x₄₂ of the matrix [X] begins at time t₂. As before, by time t₃, which is 4T after time t₂, the second-column data x₁₂, x₂₂, x₃₂, and x₄₂ have been latched in the latches (12₁), (12₂), (12₃), and (12₄), respectively. In each cycle from time t₃ onward, the ROMs (14₁), (14₂), (14₃), and (14₄) again sequentially output the coefficients a_i1, a_i2, a_i3, and a_i4 (i = 1, 2, 3, 4) of the respective columns of the matrix [A].

In the same manner as above, the second-column data u₁₂, u₂₂, u₃₂, and u₄₂ of the product matrix [U] of equation (4) are obtained in the four cycles after time t₃.

Likewise, the third-column data u₁₃ to u₄₃ of the product matrix [U] are obtained in the four cycles after the next time t₄, and the fourth-column data u₁₄ to u₄₄ are obtained in the four cycles after the following time t₅.

The 16 words of the matrix [U] obtained in this way, in column order, are written alternately to the RAMs (31) and (32) of the corner turner (30). By using different addresses for writing and reading, the data of the matrix [U] is read alternately from the RAMs (31) and (32) in row order and supplied to the second inner product operation circuit (20). There it is multiplied by a second coefficient matrix [B] in exactly the same manner as above, and the data of the product matrix [Y] expressed by the following equation (5) is delivered to the terminal OUT.
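The two-stage flow of the conventional device (inner product, corner turn, inner product) can be sketched as follows. The function names are hypothetical. One modeling choice is an assumption: because the second stage receives rows of [U], its coefficient table is taken here as the transpose of [B], so that each output vector is a row of [U]·[B].

```python
def inner_product_stage(coeff, vectors):
    """For each latched input vector, emit one inner product per coefficient row."""
    return [[sum(c * x for c, x in zip(row, vec)) for row in coeff]
            for vec in vectors]

def transpose(m):
    return [list(col) for col in zip(*m)]

def conventional_multiply(a, x, b):
    """Return [Y] = [A][X][B] as a list of rows, mimicking the FIG. 12 data flow."""
    # Stage 1: columns of X enter in order; columns of U = A X leave.
    u_columns = inner_product_stage(a, transpose(x))
    # Corner turner: data written in column order is read back in row order.
    u_rows = transpose(u_columns)
    # Stage 2: each row of U is combined with the columns of B.
    return inner_product_stage(transpose(b), u_rows)
```

For example, with `a = [[1, 2], [3, 4]]`, `x = [[0, 1], [1, 0]]`, and `b = [[2, 0], [0, 3]]`, the function returns `[[4, 3], [8, 9]]`, which equals the direct product [A]·[X]·[B].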

[Y] = [U]·[B] = [A]·[X]·[B] ... (5)

D. Problems to Be Solved by the Invention

When the matrix size is 8 rows by 8 columns, the constant matrix [N] of equation (1) is expressed as in the following equation (6).

Here, the elements a to n are cosines of predetermined angles in units of π/16, as shown in FIG. 14.

Also, as is clear from equation (1), which defines the DCT and the IDCT, each element y_ij of the matrix [Y] is expressed as a linear combination of the elements x_ij of the matrix [X].

Therefore, as shown in FIG. 15, the relationship of the following equation (7) holds between the matrix [Xc], a 64th-order vector formed by inputting the 8-row, 8-column elements x₁₁ to x₈₈ in column order, and the matrix [Yc], a 64th-order vector formed by outputting the 8-row, 8-column elements y₁₁ to y₈₈ in column order.

[Yc] = [M]·[Xc] ... (7)

Here, [M] is a constant matrix of 64 rows and 64 columns.
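The source does not spell out how [M] is built from [N]. One standard way to see that such a matrix exists is the identity vec([N]·[X]·ᵗ[N]) = ([N] ⊗ [N])·vec([X]) for column-order stacking, where ⊗ is the Kronecker product. The sketch below checks this identity on a small integer example; the helper names are hypothetical, and the patent's own [M] may include scaling not shown here.

```python
def kron(a, b):
    """Kronecker product: entry (i*p + k, j*q + l) equals a[i][j] * b[k][l]."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]

def vec_columns(m):
    """Stack the columns of m into one vector (column order)."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]

def matvec(m, v):
    return [sum(c * x for c, x in zip(row, v)) for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def two_sided(n, x):
    """Direct evaluation of [N][X]t[N] for comparison."""
    return matmul(matmul(n, x), transpose(n))
```

For an 8-row, 8-column [N], `kron(n, n)` has 64 rows and 64 columns, which matches the size of [M] stated above.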

However, when the conventional matrix data multiplication device described above performs the operation of equation (7), it computes the result in one step using, for example, a 64th-order inner product operation circuit. As a result, the circuit scale becomes enormous, the configuration becomes complex, and the large number of operations limits the operation speed.

In view of this, an object of the present invention is to provide a matrix data multiplication device that has a small circuit scale and a simple configuration and that is capable of high-speed operation with a reduced number of operations.

E. Means for Solving the Problems

The present invention is a matrix data multiplication device comprising an inner product operation circuit for computing the inner product of matrices and a rearrangement circuit for rearranging the data components of a matrix into a predetermined order. The device is provided with a fourth-order first inner product operation circuit (42) whose coefficients are +1 and −1, a 16th-order second inner product operation circuit (44) whose coefficients are 0, +1, and −1, and a fourth-order third inner product operation circuit (45) including a memory in which the data components of a constant matrix are stored. Input data of 8 rows and 8 columns is supplied to the first inner product operation circuit via a first rearrangement circuit (41). The output of the first inner product operation circuit is supplied to the second inner product operation circuit via a second rearrangement circuit (43). The output of the second inner product operation circuit is supplied directly to the third inner product operation circuit, and the output of the third inner product operation circuit is delivered via a third rearrangement circuit (46).

F. Operation

According to the present invention, the inner product operation circuits are small in scale and simple in configuration, and the number of operations is reduced, enabling high-speed operation.

G. Embodiment

An embodiment of the matrix data multiplication device according to the present invention will now be described with reference to FIGS. 1 to 11.

G₁. Configuration of One Embodiment

FIG. 1 shows the configuration of one embodiment of the present invention, and FIGS. 2 and 3 show the configuration of its main parts.

In FIG. 1, data of 8 rows and 8 columns is input from the input terminal IN in column order, as shown by the matrix [Xc] in FIG. 15 described above, and is supplied via a 64-word first corner turner (41) to a fourth-order first inner product operation circuit (42). The output of this inner product operation circuit (42) is supplied via a 64-word second corner turner (43) to a 16th-order second inner product operation circuit (44). The output of the inner product operation circuit (44) is supplied to a fourth-order third inner product operation circuit (45), and the output of the inner product operation circuit (45) is delivered to the output terminal OUT via a 64-word third corner turner (46).

As described later, the coefficients of the first inner product operation circuit (42) are only +1 and −1, and the coefficients of the second inner product operation circuit (44) are only 0, +1, and −1. The coefficients of the third inner product operation circuit (45) take values specific to the DCT.

In FIG. 2, (50) is a fourth-order inner product operation circuit corresponding to the inner product operation circuit (42) of FIG. 1. Three unit delay elements (51₁), (51₂), and (51₃) are cascaded in reverse order, and four latches (52₁), (52₂), (52₃), and (52₄) are connected to the output end, the two intermediate connection points, and the input end of the cascade, respectively. The outputs of the latches (52₁) to (52₄) are supplied to the + side contacts of switches (53₁) to (53₄), respectively, and are also supplied via two's complement circuits (54₁) to (54₄) to the − side contacts of the switches (53₁) to (53₄). The outputs of the switches (53₁) to (53₄) are supplied to an adder (55).

Each of the switches (53₁) to (53₄), together with the corresponding complement circuit (54₁) to (54₄), forms a multiplier whose coefficients are only +1 and −1. The switches are switched independently of one another by a system control circuit (56).

As is well known, each of the two's complement circuits (54₁) to (54₄) consists of a NOT circuit and an adder circuit.
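The way a NOT circuit plus an adder replaces a multiplier for coefficients of ±1 can be sketched as follows. The 16-bit word width and all function names are assumptions for illustration; the source does not state a word length.

```python
def twos_complement(word, bits=16):
    """Negate a fixed-width word: invert every bit (NOT), then add 1."""
    mask = (1 << bits) - 1
    return ((word ^ mask) + 1) & mask

def to_signed(word, bits=16):
    """Interpret a fixed-width word as a signed integer."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word

def sign_inner_product(latched, signs, bits=16):
    """Inner product with coefficients restricted to +1 and -1.

    Each switch passes either the latched word (+ contact) or its
    two's complement (- contact); the adder sums the selected words.
    """
    mask = (1 << bits) - 1
    acc = 0
    for value, sign in zip(latched, signs):
        if sign not in (1, -1):
            raise ValueError("coefficients must be +1 or -1")
        word = value & mask
        selected = word if sign == 1 else twos_complement(word, bits)
        acc = (acc + selected) & mask
    return to_signed(acc, bits)
```

For example, `sign_inner_product([3, 5, 7, 2], [1, -1, 1, -1])` evaluates 3 − 5 + 7 − 2 and returns 3 without performing any multiplication.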

In FIG. 3, (60) is a 16th-order inner product operation circuit corresponding to the inner product operation circuit (44) of FIG. 1. Fifteen unit delay elements (61₁), (61₂) to (61₁₅) are cascaded in reverse order, and sixteen latches (62₁), (62₂) to (62₁₆) are connected to the output end, the intermediate connection points, and the input end of the cascade, respectively. The outputs of the latches (62₁) to (62₁₆) are supplied to the + side contacts of three-contact changeover switches (63₁) to (63₁₆), respectively, and are also supplied via two's complement circuits (64₁) to (64₁₆) to the − side contacts of the switches (63₁) to (63₁₆). The value 0 is supplied to the third contact of each of the switches (63₁) to (63₁₆), and the outputs of the switches (63₁) to (63₁₆) are supplied to an adder (65).

Each of the switches (63₁) to (63₁₆), together with the corresponding complement circuit (64₁) to (64₁₆), forms a multiplier whose coefficients are only 0, +1, and −1. The switches are switched independently of one another by a system control circuit (66).
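The three-contact switch can be modeled as a selector with three positions. In the sketch below, a matrix row is written as a string of symbols; "+" and "-" follow the notation the figures use for +1 and −1, while the use of "0" for the zero contact and the function name are assumptions for illustration.

```python
# Switch position for each symbol in a coefficient row.
SWITCH_POSITIONS = {"+": 1, "-": -1, "0": 0}

def ternary_inner_product(latched, row_symbols):
    """Inner product with coefficients restricted to 0, +1, and -1.

    Each latched value is added, subtracted, or ignored depending on the
    switch position, so only additions and subtractions are performed.
    """
    if len(latched) != len(row_symbols):
        raise ValueError("one switch position is needed per latch")
    acc = 0
    for value, symbol in zip(latched, row_symbols):
        position = SWITCH_POSITIONS[symbol]
        if position == 1:
            acc += value
        elif position == -1:
            acc -= value
        # position == 0: the zero contact contributes nothing
    return acc
```

With integer inputs the result is exact, since no fractional coefficient is ever applied.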

G₂. Operation of One Embodiment

Next, the operation of the embodiment of FIG. 1 will be described with reference also to FIGS. 4 to 11.

In the embodiment of FIG. 1, the 64-row, 64-column constant matrix [M] for the DCT is decomposed into six matrices as shown in the following equation (8).

[M] = [W]·[V]·[TS]·[R]·[L]·[Q]/8 ... (8)

The matrices [Q], [R], and [W] correspond to the first, second, and third corner turners (41), (43), and (46), respectively, and the matrices [L], [TS], and [V] correspond to the first, second, and third inner product operation circuits (42), (44), and (45), respectively. Each of the matrices [Q] to [W] has 64 rows and 64 columns and, as shown in FIGS. 4 to 11, is a sparse matrix containing a large number of zero elements.

In FIGS. 4 to 11, + and − denote +1 and −1, respectively. The same applies to the later figures showing other matrices.

In the corner turner (41), as shown in FIG. 4, each row and each column of the matrix [Q] has exactly one element equal to +1, and the remaining 63 elements are all 0. The 64 words of the input data X are therefore simply rearranged.
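A matrix with exactly one +1 in every row and column is a permutation matrix, so multiplying by it is the same as reading the input words in a different order. The sketch below checks this equivalence on a small example; the index list and function names are hypothetical, and the actual ordering encoded by [Q] is given only in FIG. 4.

```python
def permutation_matrix(source_index):
    """Build the matrix whose row i has its single 1 in column source_index[i]."""
    n = len(source_index)
    if sorted(source_index) != list(range(n)):
        raise ValueError("source_index must be a permutation of 0..n-1")
    return [[1 if col == source_index[row] else 0 for col in range(n)]
            for row in range(n)]

def reorder(source_index, data):
    """What a corner turner does: output word i is input word source_index[i]."""
    return [data[i] for i in source_index]

def matvec(m, v):
    return [sum(c * x for c, x in zip(row, v)) for row in m]
```

For instance, `reorder([2, 0, 3, 1], [10, 20, 30, 40])` gives `[30, 10, 40, 20]`, the same result as multiplying by the corresponding permutation matrix, but with no arithmetic at all.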

In the inner product operation circuit (42), the rearranged data QX undergoes the operation represented by the matrix [L] of FIG. 5. As the figure shows, this matrix [L] is a sparse matrix in which sixteen identical 4-row, 4-column submatrices, consisting only of +1 and −1 elements, are arranged along the diagonal and all other elements are 0. It can therefore be processed by a fourth-order inner product operation circuit (50) such as that shown in FIG. 2.
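The reason a fourth-order circuit suffices for a 64-row, 64-column block-diagonal matrix can be sketched as follows: the same small block is applied to each successive group of four words. The block values below are a hypothetical ±1 example chosen for illustration; the actual submatrix of [L] appears only in FIG. 5.

```python
# Hypothetical 4x4 block of +1/-1 entries (NOT the block from FIG. 5).
HYPOTHETICAL_BLOCK = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def apply_block_diagonal(block, vector):
    """Apply one small block to each consecutive group of len(block) words."""
    n = len(block)
    if len(vector) % n:
        raise ValueError("vector length must be a multiple of the block size")
    out = []
    for start in range(0, len(vector), n):
        chunk = vector[start:start + n]
        out.extend(sum(c * v for c, v in zip(row, chunk)) for row in block)
    return out

def expand_block_diagonal(block, copies):
    """Build the full sparse matrix, for checking against a dense multiply."""
    n = len(block)
    size = n * copies
    full = [[0] * size for _ in range(size)]
    for b in range(copies):
        for i in range(n):
            for j in range(n):
                full[b * n + i][b * n + j] = block[i][j]
    return full
```

Applying the block to sixteen groups of four words gives the same 64 outputs as a dense multiplication by the expanded matrix, while touching only 4 inputs per output instead of 64.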

In FIG. 2, the data QX is supplied from the input terminal IN in units of 64 words. Four data at a time are captured by the four latches (52₁), (52₂), (52₃), and (52₄) and held for a period of 4T.

The four switches (53₁), (53₂), (53₃), and (53₄) are set to the + side or the − side according to whether the corresponding element of the 4-row, 4-column submatrix of the matrix [L] is +1 or −1. The data held in the latches (52₁) to (52₄) is thereby multiplied by a coefficient of +1 or −1, summed in the adder (55), and output from the terminal OUT.

The 64 words of data LQX output from the inner product operation circuit (42) are rearranged in the second corner turner (43) as represented by the matrix [R] shown in FIGS. 6 and 7A to 7D.

In the second inner product operation circuit (44), the rearranged data RLQX undergoes the operation represented by the matrix [TS] of FIG. 8. As the figure shows, this matrix [TS] is a sparse matrix in which four 16-row, 16-column submatrices, consisting only of +1, −1, and 0 elements, are arranged along the diagonal and all other elements are 0. It can therefore be processed by a 16th-order inner product operation circuit (60) such as that shown in FIG. 3.

In FIG. 3, the data RLQX is supplied from the input terminal IN in units of 64 words. Sixteen data at a time are captured by the sixteen latches (62₁) to (62₁₆) and held for a period of 16T.

The sixteen switches (63₁) to (63₁₆) are set to the 0 side, the + side, or the − side according to whether the corresponding element of the 16-row, 16-column submatrix of the matrix [TS] is 0, +1, or −1. The data held in the latches (62₁) to (62₁₆) is thereby multiplied by a coefficient of 0, +1, or −1, summed in the adder (65), and output from the terminal OUT.

The 64 words of data TSRLQX output from the inner product operation circuit (44) then undergo, in the third inner product operation circuit (45), the operation represented by the matrix [V] of FIG. 9. As the figure shows, this matrix [V] is a sparse matrix in which four 4-row, 4-column submatrices are arranged along the diagonal and all other elements are 0. It can therefore be processed by an ordinary fourth-order inner product operation circuit (45) such as that shown in FIG. 12 described above.

The 64 words of data VTSRLQX output from the inner product operation circuit (45) are rearranged in the third corner turner (46) as represented by the matrix [W] shown in FIGS. 10 and 11A to 11D, yielding the desired output data WVTSRLQX.

In the embodiment of FIG. 1, the matrices [L], [TS], and [V] representing the operations of the inner product operation circuits (42), (44), and (45) are all sparse, so the number of multiplier circuits can be reduced and each inner product operation circuit can be made small.
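A rough count makes the saving concrete. The sketch below is an upper-bound estimate under stated assumptions, not a figure from the source: it assumes every one of the 64 output words of a stage is a full inner product of that stage's order, and it ignores zero coefficients that would reduce the work further. The function name is hypothetical.

```python
def stage_terms(words, order):
    """Terms accumulated by a stage: one inner product of `order` terms per word."""
    return words * order

WORDS = 64  # one 8-row, 8-column block

# Single 64th-order inner product circuit: every term needs a real multiplier.
dense_multiplies = stage_terms(WORDS, 64)

# Factorized form: stages [L] (order 4) and [TS] (order 16) only add or
# subtract; only stage [V] (order 4) multiplies by stored coefficients.
add_or_subtract_terms = stage_terms(WORDS, 4) + stage_terms(WORDS, 16)
true_multiplies = stage_terms(WORDS, 4)

print(dense_multiplies, add_or_subtract_terms, true_multiplies)
```

Under these assumptions the number of true multiplications per block falls from 4096 to 256, with at most 1280 additional sign-selected additions.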

Furthermore, because the coefficients of the matrices [L] and [TS] are only 0, +1, and −1, the multipliers of the inner product operation circuits (42) and (44) can be simplified as shown in FIGS. 2 and 3, and no rounding error arises in these inner product operations.

In addition, the submatrices that make up the matrices [L], [TS], and [V] are all arranged along the diagonal, and the transposes of these matrices have the same form. The inverse transform can therefore also be handled by a configuration similar to that of the embodiment of FIG. 1.

If the matrices [TS] and [V] are merged to form a matrix [VTS], a single ordinary 16th-order inner product operation circuit can be used in place of the 16th-order and fourth-order inner product operation circuits (44) and (45).

H. Effects of the Invention

As described in detail above, according to the present invention, a desired constant matrix is decomposed into a plurality of sparse matrices, the elements of one sparse matrix are restricted to 0, +1, and −1, and the elements of another, lower-order sparse matrix carry the data components of the constant matrix. This yields a matrix data multiplication device in which the inner product operation circuits are small in scale and simple in configuration, and in which the number of operations is reduced and the operation speed is improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the matrix data multiplication device according to the present invention. FIGS. 2 and 3 are block diagrams showing the configuration of main parts of the embodiment. FIGS. 4 to 11 are diagrams showing matrices for explaining the operation of main parts of the embodiment. FIG. 12 is a block diagram showing a configuration example of a conventional matrix data multiplication device. FIG. 13 is a time chart for explaining the operation of the conventional example. FIGS. 14 and 15 are diagrams for explaining the present invention.

(14₁) to (14₄), (24₁) to (24₄): ROMs
(41), (43), (46), (81), (83), (85), (87): rearrangement circuits
(42), (44), (45), (47), (48), (49), (82), (84), (86): inner product operation circuits
(54₁) to (54₄), (64₁) to (64₁₅), (75₁) to (75₈): two's complement circuits

Continuation of the front page

(56) References
IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 51, no. 5, pp. 1304-1307
Signal Processing, vol. 5, no. 1, pp. 81-85

(58) Field surveyed (Int. Cl.⁷, DB name)
G06F 17/14
JICST file (JOIS)

Claims

(57) [Claims]

1. A matrix data multiplication device comprising an inner product operation circuit for computing the inner product of matrices and a rearrangement circuit for rearranging the data components of a matrix into a predetermined order, the device being provided with: a fourth-order first inner product operation circuit having coefficients +1 and −1; a 16th-order second inner product operation circuit having coefficients 0, +1, and −1; and a fourth-order third inner product operation circuit including a memory in which data components of a constant matrix are stored, wherein input data of 8 rows and 8 columns is supplied to the first inner product operation circuit via a first rearrangement circuit, the output of the first inner product operation circuit is supplied to the second inner product operation circuit via a second rearrangement circuit, the output of the second inner product operation circuit is supplied directly to the third inner product operation circuit, and the output of the third inner product operation circuit is delivered via a third rearrangement circuit.