JP2766858B2

JP2766858B2 - Neural network parallel simulation method and device used therefor

Info

Publication number: JP2766858B2
Application number: JP14255889A
Authority: JP
Inventors: 琢美渡辺
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-06-05
Filing date: 1989-06-05
Publication date: 1998-06-18
Anticipated expiration: 2013-06-18
Also published as: JPH036769A

Description

DETAILED DESCRIPTION OF THE INVENTION [Industrial applications]

The present invention relates to a neural network parallel simulation method, and an apparatus used therefor, for a hierarchical neural network in which a plurality of processing layers are arranged in hierarchical order. Each processing layer consists of a plurality of units, each unit outputs the value obtained by applying a nonlinear, differentiable function to the sum of its input values, and the weighted output values of the units in one processing layer are propagated as inputs to the units of the hierarchically adjacent processing layer. The method processes in parallel the learning that is performed by correcting the weights between units so that a desired output pattern is obtained for an input pattern.

In particular, the invention relates to a neural network parallel simulation method and apparatus well suited to cases in which the correction of the connections between units, in the neural network learning algorithms used for pattern identification, speech recognition, and the like, is carried out by parallel processing.

[Prior art]

First, the backpropagation (backward transmission) process, a conventionally proposed learning algorithm for hierarchically structured networks, is described briefly with reference to FIG. 1. For simplicity, the example has a processing layer serving as the input layer (hereinafter simply "the input layer"), a processing layer serving as a single intermediate layer (hereinafter simply "the intermediate layer"), and a processing layer serving as the output layer (hereinafter simply "the output layer"), with three units in each processing layer. The description applies correspondingly when there are two or more intermediate layers or when a processing layer has four or more units.

As shown in FIG. 1, the network has a hierarchical structure. The input layer, intermediate layer, and output layer are connected unidirectionally, from the input layer to the intermediate layer and then from the intermediate layer to the output layer. There are no connections between units within a processing layer, and no feedback connections from the output of a processing layer to the input of an earlier processing layer.
In the backward propagation process described below, however, learning is performed by propagating from the output layer toward the input layer. For details, see D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Vol. 1), pp. 318-362, MIT Press, Cambridge, Massachusetts, 1986.

The backpropagation (backward transmission) algorithm is a learning algorithm that finds a local minimum of the error function of a multilayer (hierarchical) network. Data propagates from the input layer through the intermediate layer to the output layer.

In this forward propagation process, taking any one processing layer as the l-th processing layer, the output value of a unit in the l-th processing layer is obtained by applying a differentiable function (for example, the sigmoid function) to the weighted sum of the outputs of all units of the (l−1)-th processing layer that are connected to that unit. The forward propagation process repeats this computation for each processing layer.

In general, for a multilayer (hierarchical) network consisting of L processing layers, let $u_i(l)$ be the input and $a_i(l)$ the output of the i-th unit of the l-th processing layer. They are related as follows. (In FIG. 1, for simplicity, all units are numbered 1, 2, 3, ... in order from the input-layer side, and i and j denote those numbers.)

$$u_i(l) = \sum_{j} w_{ij}(l)\, a_j(l-1) \qquad (1)$$

$$a_i(l) = f(u_i(l)) \qquad (2)$$

$$1 \le i \le N(l), \quad 1 \le j \le N(l-1), \quad 1 \le l \le L$$

Here $w_{ij}(l)$ is the weight between the j-th unit of the (l−1)-th processing layer and the i-th unit of the l-th processing layer, and $N(l)$ is the number of units in the l-th processing layer.

The backward propagation process follows the forward propagation process and propagates data in the opposite direction. Proceeding from the output layer toward the input layer, it computes the weighted sum of the errors for each processing layer nearer the input, obtains the error gradient layer by layer, and corrects the weights so as to reduce the error.

That is, when a pattern is presented to the multilayer (hierarchical) network, the change $\Delta w_{ij}$ in the weight between units is

$$\Delta w_{ij} = \delta_j\, o_i \qquad (3)$$

where $o_i$ is the input value from unit i to unit j. The value of $\delta_j$ depends on whether unit j belongs to the output layer or to an intermediate layer. If unit j is an output-layer unit,

$$\delta_j = (t_j - o_j)\, f'(\mathrm{net}_j) \qquad (4)$$

If unit j is an intermediate-layer unit, then writing its $\delta_j$ as $\delta_j(l)$,

$$\delta_j(l) = f'(\mathrm{net}_j) \sum_{k} \delta_k(l+1)\, w_{kj}(l+1) \qquad (5)$$

where k is a unit number of the same kind as i, assigned from the output side. Here $t_j$ is the teacher signal (desired value), and $\mathrm{net}_j$ is

$$\mathrm{net}_j = \sum_{i} w_{ji}\, o_i \qquad (4)'$$

The concrete processing of the backpropagation algorithm is as follows. The forward propagation process below is performed first, followed by the backward propagation process below.

(i) Forward propagation process
(a) Pass the input value, or the output value of a unit in the processing layer on the input-layer side, to the corresponding weight.
(b) Compute the product of this value and the weight.
(c) For each set of weights connected to the same unit of the next processing layer toward the output side, as seen from the processing layer in (a), compute the weighted sum.
(d) Apply the function f to this value.

(ii) Backward propagation process
(a) Pass the error to the corresponding weight.
(b) Compute the product of the error and the weight.
(c) Compute the sum of these values from the units of the processing layer on the output-layer side.
(d) Compute the derivative of the function f.
(e) Correct the weight according to the error gradient.

These steps are repeated until convergence.

Conventionally, this processing has been carried out on sequential general-purpose computers. When adjacent processing layers have m and n units respectively, there are m × n connections between them, and learning requires many iterations. For a neural network with a large n, the processing therefore takes an enormous amount of time.
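Equations (1) to (5) can be sketched in a few lines of NumPy as a sequential reference. This is a minimal illustration and not code from the patent: the function names, the 3-3-3 layer sizes, the random initial weights, the sample pattern, and the learning rate `eta` are all assumptions (equation (3) carries no explicit rate, which corresponds to `eta = 1`), and f is taken to be the sigmoid so that f'(net) = a(1 − a).

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(weights, x):
    """Equations (1) and (2): return the outputs of every layer, input first."""
    acts = [np.asarray(x, dtype=float)]
    for W in weights:                     # W[i, j] plays the role of w_ij(l)
        acts.append(sigmoid(W @ acts[-1]))
    return acts

def backward(weights, acts, t, eta=1.0):
    """Equations (3) to (5): update the weights in place for one pattern."""
    out = acts[-1]
    delta = (t - out) * out * (1.0 - out)             # eq. (4), sigmoid derivative
    for l in range(len(weights) - 1, -1, -1):
        a_prev = acts[l]
        # eq. (5), evaluated with the weights as they were before this update
        delta_prev = (weights[l].T @ delta) * a_prev * (1.0 - a_prev)
        # eq. (3): destination-unit delta times source-unit output
        weights[l] += eta * np.outer(delta, a_prev)
        delta = delta_prev

def squared_error(weights, x, t):
    return 0.5 * np.sum((t - forward(weights, x)[-1]) ** 2)

# Hypothetical 3-3-3 network and training pattern (assumed values).
rng = np.random.default_rng(0)
weights = [rng.uniform(-0.5, 0.5, size=(3, 3)) for _ in range(2)]
x = np.array([0.1, 0.9, 0.5])
t = np.array([1.0, 0.0, 1.0])

error_before = squared_error(weights, x, t)
for _ in range(200):                      # "repeat until convergence", truncated
    backward(weights, forward(weights, x), t, eta=0.5)
error_after = squared_error(weights, x, t)
print(error_before, error_after)
```

Every multiply-add in `W @ acts[-1]` and `weights[l].T @ delta` runs one after another on a sequential machine, which is the m × n cost per layer pair that the text identifies as the bottleneck.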

[Object of the present invention]

The object of the present invention is to speed up the backpropagation algorithm described above by executing it with a high degree of parallelism.

[Means of the present invention]

The neural network parallel simulation method according to the present invention applies to a hierarchical neural network in which a plurality of processing layers are arranged in hierarchical order, each processing layer consisting of a plurality of units, each unit outputting the value obtained by applying a nonlinear, differentiable function to the sum of its input values, and in which the weighted output values of the units in one processing layer are propagated as inputs to the units of the hierarchically adjacent processing layer. The method processes in parallel the learning performed by correcting the weights between units so that a desired output pattern is obtained for an input pattern, and is characterized as follows.

(A) Letting n be the number of units in the processing layer that has the most units, a group of arithmetic elements is used in which n × n arithmetic elements are arranged in an n × n two-dimensional grid and adjacent arithmetic elements exchange data.

(B) Taking any one of the processing layers as the l-th processing layer, the weights from one unit of the (l−1)-th processing layer to all units of the l-th processing layer are assigned to the arithmetic elements of one column (or row) of the group, and the weights from one unit of the l-th processing layer to all units of the (l+1)-th processing layer are assigned to the arithmetic elements of one row (or column) of the group.

(C) A forward propagation process, which propagates data from the processing layer at one end of the hierarchy toward the processing layer at the other end, and a backward propagation process, which follows the forward propagation process and propagates data in the opposite direction, are performed.

(a) The forward propagation process consists of the following steps.
(i) Every arithmetic element of the group simultaneously multiplies its input value by the weight stored in that element.
(ii) For each row (or column) of the group, the products are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at the other end of each row (or column) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that row (or column).
(iii) Each arithmetic element on the end column (or row), as seen in the row (or column) direction, then applies the function to the sum accumulated in (ii), and the result is propagated along each row (or column) from that end element toward the element at the other end.
(iv) After this propagation, every arithmetic element simultaneously multiplies the value received in (iii), which takes the place of the input value in (i), by the weight stored in that element.
(v) For each column (or row) of the group, the products are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at one end of each column (or row) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that column (or row).
(vi) Each arithmetic element on the end row (or column), as seen in the column (or row) direction, then applies the function to the sum accumulated in (v), and the result is propagated along each column (or row) from that end element toward the element at the other end.
These steps are repeated according to the number of processing layers, so that the output values are obtained from the arithmetic elements at one end of the group as seen in the column (or row) direction.

(b) The backward propagation process consists of the following steps.
(i) Each arithmetic element computes the value obtained by applying a function to the error between the teacher signal and the output value, the same calculation being carried out for every column (or row) of the group.
(ii) Every arithmetic element then simultaneously computes its weight change and updates the weight stored in that element with the result.
(iii) For each row (or column) of the group, values are then transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at one end of each row (or column) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that row (or column).
(iv) The result is next propagated along each row (or column), from the end corresponding to "the other end" in (iii) toward the end corresponding to "one end" in (iii).
(v) Every arithmetic element then simultaneously computes its weight change and updates the weight stored in that element with the result.
These steps are repeated, alternating the transfer direction between the row direction and the column direction, until the column (or row) is reached whose arithmetic elements correspond to the weights of the units of the processing layer at one end of the hierarchy. Learning is thereby performed in parallel.

The apparatus used for the neural network parallel simulation method according to the present invention is a neural network parallel simulation apparatus for the same kind of hierarchical neural network, that is, one in which a plurality of processing layers are arranged in hierarchical order, each processing layer consisting of a plurality of units, each unit outputting the value obtained by applying a nonlinear, differentiable function to the sum of its input values, and in which the weighted output values of the units in one processing layer are propagated as inputs to the units of the hierarchically adjacent processing layer. The apparatus processes in parallel the learning performed by correcting the weights between units so that a desired output pattern is obtained for an input pattern, and is characterized as follows.

(A) Letting n be the number of units in the processing layer that has the most units, the apparatus has a group of arithmetic elements in which the arithmetic elements are arranged in an n × n two-dimensional array and adjacent arithmetic elements exchange data.

(B) Taking any one of the processing layers as the l-th processing layer, the weights from one unit of the (l−1)-th processing layer to all units of the l-th processing layer are assigned to the arithmetic elements of one column (or row) of the group, and the weights from one unit of the l-th processing layer to all units of the (l+1)-th processing layer are assigned to the arithmetic elements of one row (or column) of the group.

(C) The apparatus is arranged to perform a forward propagation process, which propagates data from the lowest processing layer of the hierarchy toward the highest processing layer, and a backward propagation process, which follows the forward propagation process and propagates data in the opposite direction.

(a) The forward propagation process consists of the following steps.
(i) Every arithmetic element of the group simultaneously multiplies its input value by the weight stored in that element.
(ii) For each row (or column) of the group, the products are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at the other end of each row (or column) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that row (or column).
(iii) Each arithmetic element on the end column (or row), as seen in the row (or column) direction, then applies the function to the sum accumulated in (ii), and the result is propagated along each row (or column) from that end element toward the element at the other end.
(iv) After this propagation, every arithmetic element simultaneously multiplies the value received in (iii), which takes the place of the input value in (i), by the weight stored in that element.
(v) For each column (or row) of the group, the products are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at one end of each column (or row) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that column (or row).
(vi) Each arithmetic element on the end row (or column), as seen in the column (or row) direction, then applies the function to the sum accumulated in (v), and the result is propagated along each column (or row) from that end element toward the element at the other end.
These steps are repeated according to the number of processing layers, so that the output values are obtained from the units at one end of the group as seen in the column (or row) direction.

(b) The backward propagation process consists of the following steps.
(i) Each arithmetic element computes the value obtained by applying a function to the error between the teacher signal and the output value, the same calculation being carried out for every column (or row) of the group.
(ii) Every arithmetic element then simultaneously computes its weight change and updates the weight stored in that element with the result.
(iii) For each row (or column) of the group, values are then transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at one end of each row (or column) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that row (or column).
(iv) The result is next propagated along each row (or column), from the end corresponding to "the other end" in (iii) toward the end corresponding to "one end" in (iii).
(v) Every arithmetic element then simultaneously computes its weight change and updates the weight stored in that element with the result.
These steps are repeated, alternating the transfer direction between the row direction and the column direction, until the column is reached whose arithmetic elements correspond to the units of the lowest processing layer. Learning is thereby performed in parallel.

(D) The forward propagation process and the backward propagation process are repeated for other input patterns, so that learning is performed in parallel.

Next, the method according to the present invention is described through a concrete example based, for simplicity, on the network model shown in FIG. 1. Even if the number of intermediate processing layers or the number of units in each processing layer is larger than in FIG. 1, processing corresponding to the following description can be performed.

The setting is again a hierarchical neural network in which a plurality of processing layers (input layer, intermediate layer, output layer) are arranged in hierarchical order, each consisting of a plurality of units that output the value obtained by applying a nonlinear, differentiable function to the sum of their input values, with the weighted output values of the units in one processing layer propagated as inputs to the units of the hierarchically adjacent processing layer, and the learning performed by correcting the weights between units, so that a desired output pattern is obtained for an input pattern, is processed in parallel. As shown in FIG. 2, letting n (3 in the figure) be the number of units in the processing layer that has the most units, a group of arithmetic elements is used in which n × n (3 × 3 in the figure) arithmetic elements (processors PE) are arranged in an n × n (3 × 3 in the figure) two-dimensional grid and adjacent arithmetic elements exchange data.

As is clear from FIGS. 1 and 2, taking any one of the processing layers (input layer, intermediate layer, output layer) as the l-th processing layer, the weights from one unit of the (l−1)-th processing layer to all units of the l-th processing layer are assigned to the arithmetic elements of one column (or row) of the group, and the weights from one unit of the l-th processing layer to all units of the (l+1)-th processing layer are assigned to the arithmetic elements of one row (or column) of the group.

The initial values of the weights of all arithmetic elements, the input values (shown in FIG. 1 as the input pattern $a_{1i}$, $a_{2i}$, $a_{3i}$), and the teacher signals (desired values, shown in FIG. 1 as $d_1$, $d_2$, $d_3$) are determined in advance. As shown in FIG. 2, these data (of which only $a_{1i}$ to $a_{3i}$ and $d_1$ to $d_3$ are shown as data in FIG. 2) are sent to the processors PE.

Next, a forward propagation process, which propagates data from the processing layer at one end of the hierarchy (the input layer) toward the processing layer at the other end (the output layer), and a backward propagation process, which follows the forward propagation process and propagates data in the opposite direction, are performed.

The forward propagation process consists of the following steps, shown in FIG. 3A.

(i) Every arithmetic element of the group simultaneously multiplies its input value by the weight stored in that arithmetic element (processor). In the upper-right processor PE of FIG. 3A, for example, the input value $a_{3i}$ is multiplied by the weight $w_{36}$.
(ii) For each row (or column; rows in FIG. 3) of the group, the products are transferred from the arithmetic element at one end (the right end in FIG. 3A) toward the arithmetic element at the other end (the left end in FIG. 3A) while being accumulated, so that the element at the other end (the left end in FIG. 3A) of each row (or column) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that row (or column). This is the value of equation (1).
(iii) Each arithmetic element on the end column (or row), as seen in the row (or column) direction, then applies the function to the sum accumulated in (ii), and the result (the value of equation (2)) is propagated along each row (or column) from the element at that end toward the element at the other end.
(iv) After this propagation, every arithmetic element simultaneously multiplies the value received in (iii), which takes the place of the input value in (i), by the weight stored in that element.
(v) For each column (or row; columns in FIG. 3) of the group, the products are transferred from the arithmetic element at one end (the upper end in FIG. 3) toward the arithmetic element at the other end (the lower end in FIG. 3A) while being accumulated, so that the element at the other end of each column (or row) obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that column (or row). This is again the value of equation (1).
(vi) Each arithmetic element on the end row (or column), as seen in the column (or row) direction, then applies the function to the sum accumulated in (v), and the result is propagated along each column (or row) from the element at that end toward the element at the other end.
(vii) Steps (i) to (vi) are repeated according to the number of processing layers, so that the output values are obtained from the arithmetic elements at one end (the upper end in FIG. 3A) of the group as seen in the column (or row) direction.

The backward propagation process consists of the following steps, shown in FIG. 3B.

(i) Each arithmetic element of the group computes the value obtained by applying a function to the error between the teacher signal and the output value (equation (4)), the same calculation being carried out for every column (or row; columns in FIG. 3) of the group.
(ii) Every arithmetic element then simultaneously computes its weight change (equation (3)) and updates the weight stored in that element with the result.
(iii) For each row (or column) of the group, values are then transferred from the arithmetic element at one end toward the arithmetic element at the other end while being accumulated, so that the element at the end of each row (or column) in the transfer direction obtains the weighted input value for the unit, in the relevant processing layer, that corresponds to that row (or column). This is the value of equation (5).
(iv) The result is next propagated along each row (or column), from the end corresponding to "the other end" in (iii) toward the end corresponding to "one end" in (iii).
(v) Every arithmetic element then simultaneously computes its weight change (equation (3)) and updates the weight stored in that element with the result.
(vi) Steps (i) to (v) are repeated, alternating the transfer direction between the row direction and the column direction, until the column (or row) is reached whose arithmetic elements correspond to the weights of the units of the processing layer at one end of the hierarchy. Learning is thereby performed in parallel.

As is clear from the above, the present invention is characterized in that the weights are assigned to the individual processors and data transfers and computations in the row and column directions are repeated, so that learning proceeds with a high degree of parallelism not only in the computation but also in the data transfer.
As described above, the input value a _3i is multiplied by the weight w ₃₆ . (Ii) The operation result is written in each row (or column) of the operation element group.
(A row in FIG. 3), one end thereof (FIG. 3A
In FIG. 3, the operation element at the right end (from the right end)
, The data is transferred while repeating the addition in order toward the operation element at the left end (in FIG. 3, left end in FIG. 3A) of each row (or column) of the operation element group. The calculation result of the weighted input value to each unit of the processing layer among the plurality of processing layers corresponding to (or column) (the value of the above-described equation (1)) is obtained. (Iii) After obtaining the calculation result, for each operation element on one end column (or row) in the row (or column) direction of the operation element group, the addition result obtained by repeating the addition of (ii) A calculation to which a function is applied is performed, and the calculation result (having the value of Expression (2) described above) is calculated for each row (or column) of the calculation element group from the calculation element at one end to the calculation at the other end. Propagate toward the element. (Iv) After the propagation, in each operation element of the operation element group,
The multiplication of the value corresponding to the input value of (i) obtained by the propagation of the calculation result of (iii) and the weight stored in each operation element is performed simultaneously. (V) The result of the operation is stored in each column (or row) of the operation element group.
(Columns in FIG. 3), the addition is performed in order from the operation element at one end (the upper end in FIG. 3) to the operation element at the other end (the lower end in FIG. 3A). Is repeated, and the arithmetic element at the other end of each column (or row) of the arithmetic element group is weighted to each unit of the processing layer among the plurality of processing layers corresponding to the column (or row) Calculation result of the input value (has the value of the above formula (1))
Get. (Vi) After obtaining the calculation result, in each operation element on one end row (or column) viewed in the column (or row) direction of the operation element group, the addition result by repeating the addition of (v) And a calculation result is propagated from the operation element at one end to the operation element at the other end for each column (or row) of the operation element group. (Vii) By repeating the above-mentioned processes (i) to (vi) according to the plurality of processing layers,
One end (third end) of the operation element group in the column (or row) direction
An output value is obtained from the operation element at the upper end in FIG. A). In addition, the backward propagation processing is the following processing shown in FIG. 3B. (I) In each operation element of the operation element group, a calculation (shown in the above equation (4)) is performed by applying a function to the error between the teacher signal and the output value, and the calculation result is calculated. Each column (or row) of the element group (column in FIG. 3)
For the same calculation. (Ii) After the calculation, in each operation element of the operation element group,
The calculation of the change in the weight (shown in the above-described equation (3)) is performed, and the weight stored in each operation element is updated with the calculation result at the same time. (Iii) Thereafter, for each row (or column) of the operation element group,
From the operation element at one end to the operation element at the other end, transfer is performed while repeating the addition in order, and the operation element at the end in the direction of each row (or column) of the operation element group is added to the row (or column). ), The calculation result of the weighted input value to each unit of the processing layer among the plurality of processing layers ((5) described above).
With the value of the formula). (Iv) Next, for each row (or column) of the operation element group, the calculation result is converted from one end corresponding to the other end of (iii) to the other end corresponding to one end of (iii). Propagate towards the edge. (V) After obtaining the calculation result, the calculation of the change of the weight (shown by the above-mentioned equation (3)) is performed in each calculation element of the calculation element group, and each calculation element is calculated by the calculation result. The updating of the weight stored in the operation element is performed at the same time. (Vi) The above-described processes (i) to (v) are performed while changing the transfer direction in the row direction and the column direction, while changing the unit of the processing element group at one end of the plurality of processing layers of the arithmetic element group. Learning is performed in parallel by repeating until a column (or row) having an operation element corresponding to the weight is reached. The apparatus used in the neural network parallel simulation method according to the present invention is, as is clear from the above description, the present invention assigns weights to each processor, and repeatedly performs data transfer in row and column directions, and computation. It is characterized in that learning is performed with a high degree of parallelism not only in arithmetic but also in data transfer.
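
The forward propagation steps (i) to (vi) described above can be sketched in plain Python for the three-layer case. This is an illustrative sketch only, not the patented implementation. The function name `forward_on_grid`, the nested-list representation of the n × n processor grid, and the choice of the sigmoid as the non-linear function are assumptions introduced here. The sequential loops only imitate transfers and multiplications that the arithmetic elements would carry out simultaneously.

```python
import math


def sigmoid(x):
    """Non-linear differentiable function (assumed here to be the sigmoid)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward_on_grid(a, w1, w2):
    """Imitate forward steps (i)-(vi) on an n x n grid of arithmetic elements.

    a  : list of n input values; a[c] is shared by every element of column c.
    w1 : w1[r][c] = weight from input unit c to hidden unit r
         (one column holds all weights leaving one input unit).
    w2 : w2[r][c] = weight from hidden unit r to output unit c
         (one row holds all weights leaving one hidden unit).
    Returns (hidden outputs, output-layer outputs).
    """
    n = len(a)

    # (i) every element multiplies its input value by its stored weight
    prod = [[w1[r][c] * a[c] for c in range(n)] for r in range(n)]

    # (ii) row-wise transfer with repeated addition, right end to left end
    net_hidden = []
    for r in range(n):
        acc = 0.0
        for c in range(n - 1, -1, -1):
            acc += prod[r][c]
        net_hidden.append(acc)

    # (iii) apply the function at the row end; the result is then
    #       propagated along the row, so every element of row r sees h[r]
    h = [sigmoid(x) for x in net_hidden]

    # (iv) every element multiplies the propagated value by its second weight
    prod2 = [[w2[r][c] * h[r] for c in range(n)] for r in range(n)]

    # (v) column-wise transfer with repeated addition, top to bottom
    net_out = []
    for c in range(n):
        acc = 0.0
        for r in range(n):
            acc += prod2[r][c]
        net_out.append(acc)

    # (vi) apply the function at the column end to obtain the outputs
    return h, [sigmoid(x) for x in net_out]
```

In this sketch the transfer direction switches from rows to columns between the two layers, which is why a single n × n grid can hold the weights of both layers without moving them.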

[Example]

Next, an embodiment of the present invention will be described with reference to FIG. 4.

FIG. 4 shows an example configuration of the present invention, which has a preprocessing unit 1, an interface unit 2, an array unit 4, and a control unit 5.

The preprocessing unit 1 controls the control unit 5, which in turn controls the array unit 4 and the interface unit 2. It also prepares the initial value of each weight, the patterns to be learned (input patterns), and the corresponding desired output signals (teacher signals). The preprocessing unit 1 is implemented as a sequential computer.

FIG. 5 shows an example configuration of the array unit 4 shown in FIG. 4. In this figure, PE denotes a processor and 6 denotes a control signal line.

FIG. 6 shows each processor PE shown in FIG. 5. In this figure, 301 to 304 are selection circuits, 305 is a register, 306 is an accumulator, 307 is an arithmetic unit, 308 is a register file, and 309 is a control register.

The selection circuit 301 selects, when the processor PE communicates with adjacent processors PE, from which adjacent processor PE (upper, lower, left, or right) data is received.

The selection circuit 302 selects which data is stored in the register 305.

The selection circuit 303 selects which data is output when the processor PE communicates with an adjacent processor PE. If the output of the selection circuit 301 is selected, the data from the adjacent processor PE is output as it is, without being stored in a storage element such as the register 305. The selection circuit 303 can make all processors PE behave identically under the control signal 6 sent from the control unit 5 to all processors PE. In addition, it allows each processor PE to select its output signal individually according to the data stored in the control register 309 of that processor PE.

The selection circuit 304 selects the data supplied to one of the input ports of the arithmetic unit 307.

For communication between processors PE in the array unit 4 shown in FIG. 5, the register 305 of each processor PE can be operated like a shift register, so that all processors PE simultaneously shift their data to the adjacent processor PE above (or below, to the left, or to the right).

Further, if the control register 309 in each processor PE is set appropriately and the selection circuit 303 is controlled appropriately, one processor PE can output the output of its arithmetic unit 307 or of its register 305 to an adjacent processor PE (this processor PE is called the sending processor PE). Another processor PE can write the data received from another processor PE into its register 305 and at the same time output that data through the selection circuit 303 (this processor PE is called the receiving processor PE). This function is called ripple transfer.

To operate the apparatus according to the present invention shown in FIG. 4, the preprocessing unit 1 creates the initial values of the weights between units, the patterns to be learned (input patterns), and the corresponding desired output signals (teacher signals). It sends them through the interface unit 2 to the array unit 4 so that the data is allocated to the processors PE as shown in FIG. 3.

The teacher signals and the input patterns are the same data within each column (or row) of processors PE, so they are sent using the ripple transfer described above.

The initial values of the weights differ from one processor PE to another, so they are sent using ordinary shift transfer.

In each processor PE, the data is stored from the register 305, through the arithmetic unit 307, at an appropriate address in the register file 308.

The control unit 5 shown in FIG. 4 sequentially generates, in accordance with control signals from the preprocessing unit 1, the group of instructions that controls the interface unit 2 and the array unit 4 so that the subsequent processing is carried out.

First, the input pattern or the weight is read from the register file 308 and stored in the accumulator 306. The arithmetic unit 307 then computes the product of the input pattern and the weight, and the result is stored in the accumulator 306 and afterwards in the register file 308.

For each row (or column) of processors PE, these values are added in order using addition based on the ripple transfer described above (ripple addition), and the result for each row (or column) is stored in the processor PE at the end of that row (or column).

Ripple addition is performed as follows. The output of the selection circuit 301 is selected and added in the arithmetic unit 307 to the data from the register file 308. At the same time, the selection circuit 302 selects the output of the selection circuit 301, so that the data from the adjacent processor PE is stored.

In this way, the input data of each unit of the next layer is obtained in parallel.

The weighted sum stored in the processor PE at the end of each row (or column) is broadcast to the other processors PE of that row (or column) using ripple transfer, and each processor PE computes the value of the sigmoid function with this value as input.

Alternatively, the sigmoid function may be computed in the processor PE at the end of each row (or column) and its value then broadcast along the row (or column) using ripple transfer.

Next, the same processing as described above is performed with the sigmoid function values as the input data of the units of the next layer. In this case, however, the data transfer direction is the column (or row) direction, as described earlier.

The same processing is then repeated a number of times corresponding to the number of intermediate layers.

A detailed description of the backward propagation process is omitted, but it can be carried out in the same manner as described above.

When the numbers of units in the layers differ, the weights that correspond to no connection are controlled so as to remain 0 at all times. The method is thereby applicable to any hierarchical network structure without feedback.

When the number of units in a layer exceeds the number of processors PE along one side of the two-dimensional processor array, the problem array is simply folded to a size that fits in the physical array. That is, the folded data is stored in the depth direction of the register file in each processor PE, or of a local memory directly accessible from each processor PE, and is processed serially, one physical processor array at a time.
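
The ripple addition, the ripple-transfer broadcast, and the zero-weight padding described in this example can be illustrated with a short plain-Python sketch. It is a behavioural model under stated assumptions, not a description of the actual circuit. The function names `ripple_add`, `ripple_broadcast`, and `pad_weights` are hypothetical, a row of processors PE is modelled as a Python list, and the loops stand in for data that ripples from one processor PE to its neighbour.

```python
def ripple_add(local_values):
    """Ripple addition along one row of processors PE.

    local_values[i] is the product held in the register file of PE i.
    A running sum enters at the right-end PE; each PE adds its own value
    to the incoming sum and passes the result to its left neighbour.
    The returned value is what the left-end PE holds at the end.
    """
    carried = 0.0
    for value in reversed(local_values):
        carried = carried + value
    return carried


def ripple_broadcast(n, source_value):
    """Ripple transfer of one value along a row of n processors PE.

    PE 0 acts as the sending PE; every other PE latches the incoming
    value into its register and forwards it to the next PE.
    Returns the register contents of all n PEs after the transfer.
    """
    registers = [None] * n
    registers[0] = source_value
    for i in range(1, n):
        registers[i] = registers[i - 1]
    return registers


def pad_weights(weights, n):
    """Place a smaller weight matrix on an n x n grid.

    Grid positions with no corresponding connection keep the weight 0,
    so they contribute nothing to any ripple addition.
    """
    grid = [[0.0] * n for _ in range(n)]
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid[r][c] = w
    return grid
```

For instance, a layer pair with fewer units than the grid side can be loaded with `pad_weights` and then processed by the same row and column sums as a full-size layer, because the padded positions add 0.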

[Effects of the present invention]

As is clear from the above description, according to the present invention the degree of parallelism increases with the number of processors PE, so the simulation of large-scale networks can be accelerated.

The array unit of processors PE, which performs the learning process that accounts for most of the total processing time, consists of simple, identical processors PE connected regularly in two dimensions. It can therefore easily be implemented as an LSI. For the same amount of hardware, more processors can be mounted than with ordinary 32-bit processors, which makes the invention well suited to the simulation of large-scale networks.

Further, because the calculation of the weights for each layer, the calculation of the weighted sums, the calculation of the function (the sigmoid function), and so on are all performed in parallel, learning can be performed at very high speed.

The following table compares the simulation speed obtained when the present invention was actually implemented with the simulation speed of the conventional algorithm run on a general-purpose computer. The comparison is for one intermediate layer, number of input neurons = number of output neurons = 256, and 100 learning iterations.

As is clear from the above table, the present invention achieves a learning speed about 45 times the simulation speed on a large general-purpose computer.

Furthermore, when the present invention is applied to character recognition, the weight values of a network that outputs an answer selected from among the already learned patterns, not only for learned character patterns but also for unknown patterns, can be obtained in a very short time.

[Brief description of the drawings]

FIG. 1 is a diagram showing a hierarchical network having a three-layer structure.
FIG. 2 is a diagram showing the mapping of various data to the processors PE.
FIG. 3A is a diagram showing the processing during the forward propagation process.
FIG. 3B is a diagram showing the processing during the backward propagation process.
FIG. 4 is a diagram showing an example configuration of the present invention.
FIG. 5 is a diagram showing an example configuration of the array unit.
FIG. 6 is a diagram showing an example configuration of the processor PE.

PE: processor
301 to 304: selection circuits
305: register
306: accumulator
307: arithmetic unit (ALU)
308: register file
309: control register

Claims

(57) [Claims]

1. A neural network parallel simulation method for processing in parallel, on a hierarchical neural network, learning performed by modifying weights between units so that a desired output pattern is obtained for an input pattern, the hierarchical neural network having a configuration in which a plurality of processing layers, each consisting of a plurality of units whose output value is generated by applying a non-linear differentiable function to the sum of a plurality of input values, are arranged sequentially and hierarchically, and in which outputs obtained by weighting the output value of each unit included in one of the plurality of processing layers are propagated as inputs to the units of the processing layer hierarchically adjacent to that processing layer, wherein:

an arithmetic element group is used in which, where n is the number of units of the processing layer having the largest number of units among the plurality of processing layers, n × n arithmetic elements are arranged in an n × n two-dimensional lattice and adjacent arithmetic elements exchange data with each other;

when one of the plurality of processing layers is regarded generally as the l-th processing layer in the hierarchy, the weights from one unit of the (l−1)-th processing layer to all units of the l-th processing layer are made to correspond respectively to the arithmetic elements of one column (or row) of the arithmetic element group, and the weights from one unit of the l-th processing layer to all units of the (l+1)-th processing layer are made to correspond respectively to the arithmetic elements of one row (or column) of the arithmetic element group;

a forward propagation process, in which data is propagated from the processing layer at one end of the hierarchy of the plurality of processing layers to the processing layer at the other end, and a backward propagation process, in which, after the forward propagation process, data is propagated in the direction opposite to that of the forward propagation process, are performed;

the forward propagation process is a process in which
(i) every arithmetic element of the arithmetic element group simultaneously multiplies an input value by the weight stored in that arithmetic element,
(ii) for each row (or column) of the arithmetic element group, the results are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being added in order, so that the arithmetic element at the other end of each row (or column) obtains the sum of the weighted input values for each unit of the processing layer corresponding to that row (or column),
(iii) after that result has been obtained, each arithmetic element on the column (or row) at one end, as seen in the row (or column) direction of the arithmetic element group, applies a function to the sum obtained by the repeated addition in (ii), and the result is propagated along each row (or column) of the arithmetic element group from the arithmetic element at one end toward the arithmetic element at the other end,
(iv) after the propagation, every arithmetic element of the arithmetic element group simultaneously multiplies the value obtained through the propagation in (iii), which corresponds to the input value in (i), by the weight stored in that arithmetic element,
(v) for each column (or row) of the arithmetic element group, the results are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being added in order, so that the arithmetic element at the other end of each column (or row) obtains the sum of the weighted input values for each unit of the processing layer corresponding to that column (or row), and
(vi) after that result has been obtained, each arithmetic element on the row (or column) at one end, as seen in the column (or row) direction of the arithmetic element group, applies a function to the sum obtained by the repeated addition in (v), and the result is propagated along each column (or row) of the arithmetic element group from the arithmetic element at one end toward the arithmetic element at the other end,
these steps being repeated according to the number of the plurality of processing layers so that output values are obtained from the arithmetic elements at one end of the arithmetic element group in the column (or row) direction; and

the backward propagation process is a process in which
(i) every arithmetic element of the arithmetic element group applies a function to the error between a teacher signal and the output value, the same calculation being performed within each column (or row) of the arithmetic element group,
(ii) after that calculation, every arithmetic element of the arithmetic element group simultaneously calculates the change in its weight and updates the weight stored in that arithmetic element with the result,
(iii) thereafter, for each row (or column) of the arithmetic element group, values are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being added in order, so that the arithmetic element at the other end of each row (or column) obtains the sum of the weighted input values for each unit of the processing layer corresponding to that row (or column),
(iv) next, for each row (or column) of the arithmetic element group, that result is propagated from the end corresponding to the other end in (iii) toward the end corresponding to the one end in (iii), and
(v) after that result has been obtained, every arithmetic element of the arithmetic element group simultaneously calculates the change in its weight and updates the weight stored in that arithmetic element with the result,
these steps being repeated, with the transfer direction alternating between the row direction and the column direction, until the column (or row) having the arithmetic elements corresponding to the weights of the units of the processing layer at one end of the plurality of processing layers is reached, whereby learning is performed in parallel.

2. A neural network parallel simulation apparatus for processing in parallel, on a hierarchical neural network, learning performed by modifying weights between units so that a desired output pattern is obtained for an input pattern, the hierarchical neural network having a configuration in which a plurality of processing layers, each consisting of a plurality of units whose output value is generated by applying a non-linear differentiable function to the sum of a plurality of input values, are arranged sequentially and hierarchically, and in which outputs obtained by weighting the output value of each unit included in one of the plurality of processing layers are propagated as inputs to the units of the processing layer hierarchically adjacent to that processing layer, wherein:

(a) the apparatus has an arithmetic element group in which, where n is the number of units of the processing layer having the largest number of units among the plurality of processing layers, n × n arithmetic elements are arranged in an n × n two-dimensional lattice and adjacent arithmetic elements exchange data with each other;

(b) when one of the plurality of processing layers is regarded generally as the l-th processing layer in the hierarchy, the weights from one unit of the (l−1)-th processing layer to all units of the l-th processing layer are made to correspond respectively to the arithmetic elements of one column (or row) of the arithmetic element group, and the weights from one unit of the l-th processing layer to all units of the (l+1)-th processing layer are made to correspond respectively to the arithmetic elements of one row (or column) of the arithmetic element group;

(c) a forward propagation process, in which data is propagated from the processing layer at one end of the hierarchy of the plurality of processing layers to the processing layer at the other end, and a backward propagation process, in which, after the forward propagation process, data is propagated in the direction opposite to that of the forward propagation process, are performed, where

(A) the forward propagation process is a process in which
(i) every arithmetic element of the arithmetic element group simultaneously multiplies an input value by the weight stored in that arithmetic element,
(ii) for each row (or column) of the arithmetic element group, the results are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being added in order, so that the arithmetic element at the other end of each row (or column) obtains the sum of the weighted input values for each unit of the processing layer corresponding to that row (or column),
(iii) after that result has been obtained, each arithmetic element on the column (or row) at one end, as seen in the row (or column) direction of the arithmetic element group, applies a function to the sum obtained by the repeated addition in (ii), and the result is propagated along each row (or column) of the arithmetic element group from the arithmetic element at one end toward the arithmetic element at the other end,
(iv) after the propagation, every arithmetic element of the arithmetic element group simultaneously multiplies the value obtained through the propagation in (iii), which corresponds to the input value in (i), by the weight stored in that arithmetic element,
(v) for each column (or row) of the arithmetic element group, the results are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being added in order, so that the arithmetic element at the other end of each column (or row) obtains the sum of the weighted input values for each unit of the processing layer corresponding to that column (or row), and
(vi) after that result has been obtained, each arithmetic element on the row (or column) at one end, as seen in the column (or row) direction of the arithmetic element group, applies a function to the sum obtained by the repeated addition in (v), and the result is propagated along each column (or row) of the arithmetic element group from the arithmetic element at one end toward the arithmetic element at the other end,
these steps being repeated according to the number of the plurality of processing layers so that output values are obtained from the arithmetic elements at one end of the arithmetic element group in the column (or row) direction, and

(B) the backward propagation process is a process in which
(i) every arithmetic element of the arithmetic element group applies a function to the error between a teacher signal and the output value, the same calculation being performed within each column (or row) of the arithmetic element group,
(ii) after that calculation, every arithmetic element of the arithmetic element group simultaneously calculates the change in its weight and updates the weight stored in that arithmetic element with the result,
(iii) thereafter, for each row (or column) of the arithmetic element group, values are transferred from the arithmetic element at one end toward the arithmetic element at the other end while being added in order, so that the arithmetic element at the other end of each row (or column) obtains the sum of the weighted input values for each unit of the processing layer corresponding to that row (or column),
(iv) next, for each row (or column) of the arithmetic element group, that result is propagated from the end corresponding to the other end in (iii) toward the end corresponding to the one end in (iii), and
(v) after that result has been obtained, every arithmetic element of the arithmetic element group simultaneously calculates the change in its weight and updates the weight stored in that arithmetic element with the result,
these steps being repeated, with the transfer direction alternating between the row direction and the column direction, until the column (or row) having the arithmetic elements corresponding to the units of the processing layer at one end of the plurality of processing layers is reached, whereby learning is performed in parallel; and

(d) learning is performed in parallel by repeating the forward propagation process and the backward propagation process for other input patterns.