JPH0421153A - Learning processor

Info

Publication number: JPH0421153A
Application number: JP2124249A
Authority: JP
Inventors: Atsunobu Hiraiwa; 平岩　篤信
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1990-05-16
Filing date: 1990-05-16
Publication date: 1992-01-24

Abstract

PURPOSE: To execute learning processing efficiently by dividing the connection weight elements between an input vector and the respective neurons and allocating them to ring-coupled processing elements. CONSTITUTION: A vertical ring divides the network into N parts, and the processing elements PE0 to PE(N-1) in the ring each hold different connection weight values. To perform the inner product operation, the maximum value search, and the update neuron search, intermediate results, maximum values, and update neuron information are circulated along the vertical ring. A horizontal ring distributes the input data into D groups, and the processing elements PE0 to PE(D-1) in the ring each hold different data and the same connection weights. Each time the host computer HOST has presented all of the data, the connection weight changes and the updated connection weights are circulated along the horizontal ring. Learning processing of a network composed of neurons ni is thereby executed efficiently in accordance with the prescribed processing algorithm.

Description

[Detailed description of the invention]

A. Field of Industrial Application

The present invention relates to a learning processing device for neural networks, and in particular to a learning processing device that employs the Kohonen Feature Maps (KFM) algorithm.

B. Summary of the Invention

In a neural network learning processing device that employs the KFM algorithm, the present invention ring-couples or mesh-couples a large number of processing elements, which enables efficient learning through parallel processing.

C. Prior Art

In recent years, research on artificial neural networks (ANN: Artificial Neural Network) has been advancing in many fields.

As shown in Fig. 5, a neuron $n_i$ constituting a neural network can be regarded as a multi-input, single-output nonlinear element with a threshold characteristic. For inputs $X_j$ ($j = 1, 2, \ldots, n$), it transforms the sum of the products $W_{ij} X_j$ with the connection weights $W_{ij}$ by a predetermined threshold function $f$, such as a sigmoid function, and outputs the value $O_i$ given by Equation 1:

$$O_i = f\Bigl(\sum_j W_{ij} X_j\Bigr) \qquad \text{(Equation 1)}$$
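
As a minimal sketch of the neuron model described above, the following Python function evaluates a weighted sum and passes it through a threshold function. The logistic sigmoid is only one possible choice of f, and the names `sigmoid`, `neuron_output`, and the example values are assumptions made for illustration; they do not appear in the patent.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, one common choice for the threshold function f."""
    return 1.0 / (1.0 + math.exp(-u))


def neuron_output(weights, inputs, f=sigmoid):
    """Return O_i = f(sum_j W_ij * X_j) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs))
    return f(total)


# Hypothetical example values, chosen only for illustration.
w_i = [0.5, -1.0, 2.0]
x = [1.0, 0.5, 0.25]
o_i = neuron_output(w_i, x)  # weighted sum is 0.5 before the sigmoid
```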

For neural networks composed of such neurons, various models that learn the connection weights $W_{ij}$ of each neuron $n_i$ for various patterns of inputs $X_j$ have been proposed, and applications to fields such as image processing, speech recognition, and control are being attempted.

The KFM (Kohonen Feature Maps) algorithm has been proposed as one learning algorithm for such neural networks. The KFM algorithm is an unsupervised learning algorithm that performs self-organizing pattern recognition with a two-dimensional neural network, as shown in Fig. 6, composed of neurons $n_i$ that have connection weight vectors $W_i$ for an input vector $X$.

In the KFM algorithm, the search phase comes first. For each node $i$, the Euclidean distance $\eta_i$ between the input vector $X$ and the connection weight vector $W_i$ is calculated by Equation 2, and the neuron with the minimum distance $\eta_i$ is searched for:

$$\eta_i = \lVert X - W_i \rVert \qquad \text{(Equation 2)}$$

The square $\eta_i^2$ of the distance $\eta_i$ is expressed by Equation 3:

$$\eta_i^2 = \lVert X - W_i \rVert^2 = \lVert W_i \rVert^2 + \lVert X \rVert^2 - 2 X^T W_i \qquad \text{(Equation 3)}$$

From Equation 3, the node $i$ with the maximum inner product $X^T W_i$ has the minimum distance $\eta_i$, so the node $i$ with the minimum distance $\eta_i$ can be found by this inner product operation.
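
The inner-product search described above can be sketched as follows. Note one assumption that the text leaves implicit: the node with the largest inner product is guaranteed to be the node with the smallest Euclidean distance only when the weight norms are equal across nodes, so this sketch normalizes the weight vectors first. The function names and example vectors are hypothetical.

```python
import math


def nearest_by_distance(x, weights):
    """Index i that minimizes the Euclidean distance ||X - W_i||."""
    dists = [
        math.sqrt(sum((xj - wj) ** 2 for xj, wj in zip(x, w)))
        for w in weights
    ]
    return min(range(len(weights)), key=dists.__getitem__)


def nearest_by_inner_product(x, weights):
    """Index i that maximizes the inner product X^T W_i."""
    dots = [sum(xj * wj for xj, wj in zip(x, w)) for w in weights]
    return max(range(len(weights)), key=dots.__getitem__)


def normalize(v):
    """Scale a vector to unit length (assumption: equal weight norms)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


# Hypothetical weight vectors, normalized so the two searches agree.
weights = [normalize(w) for w in ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.5])]
x = [0.9, 0.8]
winner_d = nearest_by_distance(x, weights)
winner_p = nearest_by_inner_product(x, weights)
```

With unit-norm weights both searches select the same node; without normalization the two results can differ.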

Next, in the update phase, the connection weight vectors $W_i$ of the neuron $n_x$ with the minimum distance $\eta_i$ and of the neurons $n_{x1}$ to $n_{x8}$ adjacent to $n_x$ are updated by Equation 4:

$$W_i = W_i + \alpha (X - W_i) \qquad \text{(Equation 4)}$$

Here $\alpha$ is the learning constant.
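
The update rule above can be illustrated with a short sketch. It assumes a rectangular grid of neurons stored in row-major order and an 8-neighborhood around the winning neuron; the helper names `grid_neighbors` and `update_weights` and the 3 x 3 example are assumptions for illustration only.

```python
def grid_neighbors(index, rows, cols):
    """Winner plus its adjacent neurons (up to 8) on a rows x cols grid.

    Assumption: neurons are numbered in row-major order and the
    neighborhood does not wrap around the grid edges.
    """
    r, c = divmod(index, cols)
    selected = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                selected.append(rr * cols + cc)
    return selected


def update_weights(weights, x, update_indices, alpha):
    """Apply W_i <- W_i + alpha * (X - W_i) in place to the selected neurons."""
    for i in update_indices:
        weights[i] = [w + alpha * (xj - w) for w, xj in zip(weights[i], x)]
    return weights


# Hypothetical 3 x 3 map with the winner in the top-left corner.
weights = [[0.0, 0.0] for _ in range(9)]
selected = grid_neighbors(0, rows=3, cols=3)
update_weights(weights, [1.0, 2.0], selected, alpha=0.5)
```

Only the winner and its in-grid neighbors move toward the input; all other weight vectors are left unchanged.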

In the KFM algorithm, the search phase and the update phase are repeated, so that the two-dimensional network of neurons having the connection weight vectors $W_i$ performs self-organizing pattern recognition in response to the input vectors $X$.

To build devices suitable for practical use in fields such as image processing and speech recognition with such neural networks, the number of neurons must be increased to enlarge the network, and the learning process then requires an enormous amount of computation.

Because computation in a neural network is inherently parallel, devices that use neural networks have long attempted to raise computation speed through parallel processing. Parallelization methods include assigning one neuron to one computing element, and coupling together processors that are each responsible for multiple neurons. The former requires large hardware, and realizing a large-scale network this way is not practical with current technology.

Accordingly, most currently proposed systems are based on the latter method, using digital signal processors (DSPs), general-purpose microprocessors, or dedicated chips as the processors. In every case, multiple processors execute the neural network computation in parallel while communicating with one another. Two parallel processing schemes for neural networks are known: the network division method, in which the network is divided into multiple parts for processing, and the data division method, in which the data are distributed over multiple processors for processing.

D. Problems to Be Solved by the Invention

In a learning processing device that performs learning according to the KFM algorithm as described above, if the network division method is adopted, the communication time between processors grows as the number of network divisions, that is, the number of processors, increases, and no further performance improvement can be expected. If the data division method is adopted, a large amount of learning is required, and once the number of data divisions, that is, the number of processors, reaches for example 100 or more, the communication time between processors grows and no further performance improvement can be expected. Moreover, with the data division method, as illustrated in Fig. 7 for data division across four processors, the processors PE0 to PE3 can each compute inner products independently in the search phase and determine the update neurons. In the update phase, however, the change in the connection weights of an update neuron is computed and applied only by the processor that holds the connection weights $W_i$ of that neuron. The load therefore concentrates on one processor, for example PE1, and the benefit of parallel processing cannot be expected.

In view of these circumstances, an object of the present invention is to provide a learning processing device that can efficiently perform learning according to the KFM algorithm on a neural network, at high speed and with little overhead, through parallel processing on a large number of processors.

E. Means for Solving the Problems

To achieve the above object, the present invention provides a learning processing device that forms a neural network responsive to an input vector by repeating a learning algorithm. In that algorithm, the distances between an input vector consisting of j elements and the connection weights, each consisting of j elements, of the plurality of neurons constituting the network are compared; the neuron with the minimum distance and the neurons adjacent to it are determined as update neurons; and the connection weights of the update neurons are changed in the direction that brings them closer to the input vector. The device comprises N processing elements that are ring-coupled through data transfer memories. The j connection weights of each neuron and the j elements of the input vector are each divided into N parts, and the divided connection weights and input vector elements are assigned to the respective processing elements. The N processing elements compute the inner products of the neurons' connection weights and the input vector in parallel, and the neuron with the maximum inner product and the neurons adjacent to it are determined as the update neurons. The N processing elements then compute the updates of the update neurons' connection weights in parallel, so that a neural network responsive to the input vector is formed by the learning algorithm.

To achieve the above object, the present invention also provides a learning processing device of the same kind, which forms a neural network responsive to an input vector by repeating the learning algorithm described above on an input vector consisting of j elements and on neurons whose connection weights each consist of j elements. This device comprises N×D processing elements that are mesh-coupled through data transfer memories for vertical ring coupling and data transfer memories for horizontal ring coupling. The j connection weights of each neuron and the j elements of the input vector are each divided into N parts, and the divided connection weights and input vector elements are assigned to the respective processing elements. The N processing elements compute the inner products of the neurons' connection weights and the input vector in parallel, the neuron with the maximum inner product and the neurons adjacent to it are determined as the update neurons, and the N processing elements compute the updates of the update neurons' connection weights in parallel. In addition, the plurality of input vectors to be learned are divided into D groups and assigned to D horizontal rings, each consisting of N processing elements coupled through the data transfer memories for horizontal ring coupling. The connection weight updates for all of the input vectors to be learned are computed in parallel, so that a neural network responsive to the input vectors is formed by the learning algorithm.

F. Operation

In the learning processing device according to the present invention, the network is divided into N parts, and the j elements of the input vector $X$ and the j elements of each neuron's connection weights $W$ are assigned to N processing elements $PE_0$ to $PE_{N-1}$ that are ring-coupled through data transfer memories. The processing elements carry out the inner product operation of Equation 5 in parallel, distributed as indicated by Equation 6:

$$O = W \cdot X \qquad \text{(Equation 5)}$$

[Equation 6: assignment of the partial inner products to $PE_0, PE_1, PE_2, \ldots, PE_{N-1}$]

The inner products are obtained by passing the intermediate results produced in each processing period $T_0, T_1, \ldots$ along the ring in turn. The maximum inner product values obtained by the processing elements $PE_0$ to $PE_{N-1}$ are then passed once around the ring, which determines the neuron with the maximum inner product and the neurons adjacent to it as the update neurons. Finally, the processing elements $PE_0$ to $PE_{N-1}$ perform the update computation and the update of the update neurons' connection weights in parallel.
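
The ring-based inner product described above can be simulated sequentially. The sketch below is only one plausible schedule, not a reconstruction of the patent's sixth equation: each processing element holds one slice of the input vector and the matching slice of every neuron's weights, each partial sum starts at one element, and after N periods every partial sum has visited all N elements. All names and example values are assumptions.

```python
def split(vec, n_parts):
    """Split a list into n_parts contiguous slices (length must divide evenly)."""
    size = len(vec) // n_parts
    return [vec[k * size:(k + 1) * size] for k in range(n_parts)]


def ring_inner_products(x, weights, n_pe):
    """Accumulate every neuron's inner product by circulating partial sums."""
    x_parts = split(x, n_pe)                      # x_parts[p] is held by PE_p
    w_parts = [split(w, n_pe) for w in weights]   # w_parts[i][p] is held by PE_p
    # Assumption: the partial sum for neuron i starts at PE number i % n_pe.
    buckets = [
        [[i, 0.0] for i in range(len(weights)) if i % n_pe == p]
        for p in range(n_pe)
    ]
    for _period in range(n_pe):
        # Every PE adds its local contribution to the partial sums it holds.
        for p in range(n_pe):
            for item in buckets[p]:
                i = item[0]
                item[1] += sum(w * xj for w, xj in zip(w_parts[i][p], x_parts[p]))
        # Every PE passes its partial sums to the next PE on the ring.
        buckets = [buckets[(p - 1) % n_pe] for p in range(n_pe)]
    result = [0.0] * len(weights)
    for bucket in buckets:
        for i, value in bucket:
            result[i] = value
    return result


# Hypothetical example: three neurons, four weight elements, two PEs.
x = [1.0, 2.0, 3.0, 4.0]
weights = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.0, 2.0]]
ring = ring_inner_products(x, weights, n_pe=2)
direct = [sum(w * xj for w, xj in zip(ws, x)) for ws in weights]
```

The ring result matches the directly computed inner products, while each simulated element only ever touches its own slice of the data.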

In addition, the learning processing device according to the present invention uses N×D mesh-coupled processing elements PE(0,0) to PE(D-1,N-1), so that learning according to the KFM algorithm is performed by parallel distributed processing that combines the network division method and the data division method.

G. Embodiment

An embodiment of the learning processing device according to the present invention is described in detail below with reference to the drawings.

As shown in Fig. 1, this learning processing device comprises N×D processing elements PE(0,0) to PE(D-1,N-1) that are mesh-coupled through data transfer memories VM(0) to VM(N-1) for vertical ring coupling and data transfer memories HM(0) to HM(D-1) for horizontal ring coupling. The network is divided into N parts, the input data are distributed into D groups, and the KFM algorithm is mapped onto the N×D mesh-coupled processing elements PE(0,0) to PE(D-1,N-1) by a host computer HOST.

As shown in Fig. 2, each of the processing elements PE(0,0) to PE(D-1,N-1) uses, for example, the 64-bit RISC-type general-purpose microprocessor (80860) developed by Intel, and is provided with a 4-Mbyte local memory RAM for storing the connection weights between neurons and the outputs.

FIFO (first-in, first-out) memories are used for each of the data transfer memories VM(0) to VM(N-1) for vertical ring coupling and each of the data transfer memories HM(0) to HM(D-1) for horizontal ring coupling.

Each of the processing elements PE(0,0) to PE(D-1,N-1) is coupled to its four adjacent processing elements PE through the FIFO data transfer memories VM and HM, and can communicate asynchronously with those four adjacent processing elements through the data transfer memories VM and HM.
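
The FIFO coupling described above can be modeled in a few lines. This is a software stand-in only: the `Fifo` class and `mesh_neighbors` helper are hypothetical names, and the wraparound at the mesh edges is an assumption that follows from the vertical and horizontal connections being described as rings.

```python
from collections import deque


class Fifo:
    """First-in, first-out buffer standing in for a data transfer memory."""

    def __init__(self):
        self._buf = deque()

    def write(self, value):
        """The sender can write without waiting for the receiver."""
        self._buf.append(value)

    def read(self):
        """The receiver reads values in the order they were written."""
        if not self._buf:
            raise RuntimeError("FIFO empty: the reader would have to wait")
        return self._buf.popleft()

    def __len__(self):
        return len(self._buf)


def mesh_neighbors(d, n, D, N):
    """Coordinates of the four neighbors of PE(d, n) on a D x N mesh of rings."""
    return {
        "up": (d, (n - 1) % N),      # previous PE on the vertical ring
        "down": (d, (n + 1) % N),    # next PE on the vertical ring
        "left": ((d - 1) % D, n),    # previous PE on the horizontal ring
        "right": ((d + 1) % D, n),   # next PE on the horizontal ring
    }


# The sender queues three values before the receiver consumes any of them.
vm = Fifo()
for value in (1, 2, 3):
    vm.write(value)
first, second = vm.read(), vm.read()
corner = mesh_neighbors(0, 0, D=3, N=4)
```

Because the buffer decouples writer and reader, neither side has to run in lockstep with the other, which is the sense in which the communication is asynchronous.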

That is, in the learning processing device of this embodiment, the ring in the vertical direction (vertical ring) divides the network into N parts. The processing elements PE(0) to PE(N-1) within a vertical ring hold different connection weights, and intermediate results, maximum values, update neuron information, and the like are passed once around the vertical ring for the inner product operation, the maximum value search, and the update neuron search. The ring in the horizontal direction (horizontal ring) distributes the input data into D groups. The processing elements PE(0) to PE(D-1) within a horizontal ring hold different input data and the same connection weights, and each time the host computer HOST has presented all of the data, the connection weight changes and the updated connection weights are passed once around the horizontal ring.
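
The division of weights along the vertical ring and of data along the horizontal ring can be sketched as a layout table. The round-robin grouping of input vectors and the names `split` and `build_layout` are assumptions for illustration; the patent does not specify how the host assigns vectors to groups.

```python
def split(vec, parts):
    """Split a list into contiguous slices (length must divide evenly)."""
    size = len(vec) // parts
    return [vec[k * size:(k + 1) * size] for k in range(parts)]


def build_layout(weights, data, N, D):
    """Return {(d, n): {"weights": ..., "data": ...}} for a D x N mesh.

    PE(d, n) holds slice n of every neuron's weights (network division)
    and slice n of every input vector in data group d (data division).
    """
    groups = [data[d::D] for d in range(D)]  # assumption: round-robin grouping
    layout = {}
    for d in range(D):
        for n in range(N):
            layout[(d, n)] = {
                "weights": [split(w, N)[n] for w in weights],
                "data": [split(x, N)[n] for x in groups[d]],
            }
    return layout


# Hypothetical example: two neurons, four input vectors, a 2 x 2 mesh.
weights = [[1, 2, 3, 4], [5, 6, 7, 8]]
data = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
layout = build_layout(weights, data, N=2, D=2)
```

Elements that share a position n hold identical weight slices but different data, while elements that share a group d hold different weight slices of the same vectors.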

This learning processing device performs learning on a neural network composed of i neurons $n_i$, as shown in Fig. 3, according to the following processing algorithm (1) to (10).

(1) The j elements of the input vector $X$ and of the connection weights $W$ are each divided into N parts, and the processing elements PE(0,0) to PE(D-1,N-1) obtain the inner products while passing the intermediate results along the vertical ring.

(2) Each processing element PE(0,0) to PE(D-1,N-1) finds its local maximum of the inner products, and the global maximum is found by one pass around the vertical ring.

(3) The global maximum is broadcast to each processing element PE(0,0) to PE(D-1,N-1) by one pass around the vertical ring.

(4) For the neurons assigned to each processing element PE(0,0) to PE(D-1,N-1), whether an update is needed is determined, and the information on the update neurons $n_x, n_{x1}, \ldots, n_{x8}$ is broadcast to the processing elements PE(0,0) to PE(D-1,N-1) by one pass around the vertical ring.

(5) Each processing element PE(0,0) to PE(D-1,N-1) computes the change $\Delta W_{ij}$ in the connection weights $W_{ij}$ for the update neurons $n_x, n_{x1}, \ldots, n_{x8}$:

$$\Delta W_{ij}(m) = X_j - W_{ij} + \Delta W_{ij}(m-1)$$

Here, with AD denoting the total number of data, $m = 1, 2, \ldots, AD$.

(6) For all of the divided data, the processing of (1) to (5) above is repeated AD/D times.

(7) Each processing element PE(0,0) to PE(D-1,N-1) uses the horizontal ring to obtain the sum of the connection weight changes over the divided data.

(8) Each processing element PE(0,0) to PE(D-1,N-1) updates the connection weights $W_{ij}(t)$:

$$\Delta W_{ij}(t) = \eta \sum \delta W_{ij}(AD/D)$$

$$W_{ij}(t) = \Delta W_{ij}(t) + W_{ij}(t-1)$$

Here $\eta$ is the learning constant and $t$ is the number of learning iterations.

(9) Using the horizontal ring, the updated connection weights $W_{ij}(t)$ are transferred to the other data-divided processing elements PE(0,0) to PE(D-1,N-1).

(10) The processing of (1) to (9) above is performed repeatedly.
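
The processing steps above can be sketched as a sequential simulation. This is a simplified illustration under several assumptions that are not taken from the patent: the neurons lie on a one-dimensional line with a neighborhood radius instead of a two-dimensional map, input vectors are grouped round-robin, and the loops over d and over weight slices run one after another rather than on separate processing elements. The names `train`, `vertical_ring_inner_product`, `eta`, `epochs`, and `radius` are hypothetical.

```python
def split(vec, parts):
    """Split a list into contiguous slices (length must divide evenly)."""
    size = len(vec) // parts
    return [vec[k * size:(k + 1) * size] for k in range(parts)]


def vertical_ring_inner_product(w, x, N):
    """Step (1): add up the N partial inner products, one per PE on the ring."""
    partial = 0.0
    for w_slice, x_slice in zip(split(w, N), split(x, N)):
        partial += sum(a * b for a, b in zip(w_slice, x_slice))
    return partial


def train(weights, data, N, D, eta, epochs, radius=1):
    """Batch KFM-style learning on a simulated D x N mesh."""
    weights = [list(w) for w in weights]          # work on a copy
    groups = [data[d::D] for d in range(D)]       # data division into D groups
    steps = len(data) // D                        # AD / D
    for _ in range(epochs):                       # step (10)
        deltas = [[[0.0] * len(w) for w in weights] for _ in range(D)]
        for m in range(steps):                    # step (6)
            for d in range(D):                    # parallel on real hardware
                x = groups[d][m]
                dots = [vertical_ring_inner_product(w, x, N) for w in weights]
                winner = max(range(len(weights)), key=dots.__getitem__)  # (2)-(3)
                lo = max(0, winner - radius)
                hi = min(len(weights) - 1, winner + radius)
                for i in range(lo, hi + 1):       # steps (4)-(5)
                    for j, xj in enumerate(x):
                        deltas[d][i][j] += xj - weights[i][j]
        for i, w in enumerate(weights):           # steps (7)-(9)
            for j in range(len(w)):
                w[j] += eta * sum(deltas[d][i][j] for d in range(D))
    return weights


# Hypothetical example: two neurons, two input vectors, a 2 x 2 mesh.
initial = [[1.0, 0.0], [0.0, 1.0]]
samples = [[2.0, 0.0], [0.0, 3.0]]
trained = train(initial, samples, N=2, D=2, eta=0.5, epochs=1, radius=0)
```

The weights stay fixed while one pass over the data accumulates the changes, and they are updated once per pass from the summed changes, which is what lets the D data groups be processed independently between updates.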

That is, in the learning processing device of this embodiment, the elements of the input vector $X$ and of the connection weights $W$ of each neuron $n_i$ are divided and assigned to the N processing elements PE(0) to PE(N-1) on a vertical ring, which are ring-coupled through the data transfer memories VM(0) to VM(N-1) for vertical ring coupling. Learning according to the KFM algorithm is performed while intermediate results, maximum values, update neuron information, and the like are passed once around the vertical ring. As shown in Fig. 4 for the case N = 4, the load during update processing can thereby be shared equally among the processing elements PE(0) to PE(3), and efficient learning through parallel processing becomes possible.
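
The maximum value search and broadcast on the vertical ring can be sketched as follows. The two-lap schedule shown here (one lap to reduce, one lap to broadcast) is an assumption made for clarity; the function name `ring_global_max` and the example values are hypothetical.

```python
def ring_global_max(local_maxima):
    """Find and broadcast the global maximum around a ring of PEs.

    local_maxima[p] is the pair (inner product value, neuron index)
    that PE number p found among the neurons it is responsible for.
    Returns what every PE knows afterwards and the number of transfers.
    """
    n = len(local_maxima)
    token = local_maxima[0]
    hops = 0
    # Lap 1: the token travels PE_0 -> PE_1 -> ... -> PE_{n-1}, and each
    # PE keeps the larger of the incoming token and its own local maximum.
    for p in range(1, n):
        token = max(token, local_maxima[p])
        hops += 1
    # Lap 2: the last PE already holds the global maximum and forwards it
    # around the ring so that every other PE learns it as well.
    known = [None] * n
    known[n - 1] = token
    for p in range(n - 1):
        known[p] = token
        hops += 1
    return known, hops


# Hypothetical local maxima for N = 4 processing elements.
local = [(0.42, 3), (0.91, 7), (0.15, 9), (0.66, 12)]
known, hops = ring_global_max(local)
```

For N elements this schedule needs 2(N - 1) transfers, so the communication cost of the search grows linearly with the length of the ring.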

Furthermore, the learning processing device of this embodiment uses N×D processing elements PE(0,0) to PE(D-1,N-1) that are mesh-coupled through the data transfer memories VM(0) to VM(N-1) for vertical ring coupling and the data transfer memories HM(0) to HM(D-1) for horizontal ring coupling. The vertical rings divide the network into N parts and the horizontal rings divide the data into D parts, so that parallel distributed processing combining the network division method and the data division method raises the processing capacity, and learning according to the KFM algorithm can be performed at high speed.

Effects of the Invention

As described above, in the learning processing device according to the present invention, the elements of the input vector and of each neuron's connection weights are divided and assigned to a plurality of ring-coupled processing elements, and learning is performed according to the KFM algorithm. The load during update processing is thereby equalized, and efficient learning through parallel processing becomes possible.

Furthermore, in the learning processing device according to the present invention, the mesh-coupled processing elements are used to apply the network division method and the data division method at the same time, which raises the processing capacity obtained through parallel processing.

Therefore, according to the present invention, a learning processing device can be realized that performs learning according to the KFM algorithm on a neural network at high speed and with little overhead through parallel processing on a large number of processors.

[Brief explanation of drawings]

Fig. 1 is a block diagram schematically showing the configuration of a learning processing device according to the present invention.
Fig. 2 is a block diagram conceptually showing the configuration of a processing element of the learning processing device.
Fig. 3 is a diagram conceptually showing the configuration of a neural network that undergoes learning according to the KFM algorithm in the learning processing device.
Fig. 4 is a state diagram showing the operating states of the processing elements on a vertical ring in the learning processing device.
Fig. 5 is a conceptual diagram showing the function of a neuron constituting a neural network.
Fig. 6 is a diagram conceptually showing the configuration of a neural network that undergoes learning according to the KFM algorithm.
Fig. 7 is a state diagram showing the operating states of the processing elements in a conventional learning processing device that performs learning according to the KFM algorithm.

PE(0,0), PE(0,1) to PE(D-1,N-1): processing elements
VM(0,0), VM(0,1) to VM(D-1,N-1): data transfer memories for vertical transfer
HM(0,0), HM(0,1) to HM(D-1,N-1): data transfer memories for horizontal transfer

Claims

[Claims]

(1) A learning processing device that forms a neural network responsive to an input vector by repeating a learning algorithm in which the distances between an input vector consisting of j elements and the connection weights, each consisting of j elements, of a plurality of neurons constituting a network are compared, the neuron with the minimum distance and the neurons adjacent to it are determined as update neurons, and the connection weights of the update neurons are changed in the direction that brings them closer to the input vector, the device comprising N processing elements ring-coupled through data transfer memories, wherein the j connection weights of each neuron are divided into N parts and the j elements of the input vector are divided into N parts, the divided connection weights and input vector elements are assigned to the respective processing elements, the N processing elements compute the inner products of the connection weights of the neurons and the input vector in parallel, the neuron with the maximum inner product and the neurons adjacent to it are determined as the update neurons, the N processing elements compute the updates of the connection weights of the update neurons in parallel, and a neural network responsive to the input vector is formed by the learning algorithm.

(2) A learning processing device that forms a neural network responsive to an input vector by repeating a learning algorithm in which the distances between an input vector consisting of j elements and the connection weights, each consisting of j elements, of a plurality of neurons constituting a network are compared, the neuron with the minimum distance and the neurons adjacent to it are determined as update neurons, and the connection weights of the update neurons are changed in the direction that brings them closer to the input vector, the device comprising N×D processing elements mesh-coupled through data transfer memories for vertical ring coupling and data transfer memories for horizontal ring coupling, wherein the j connection weights of each neuron are divided into N parts and the j elements of the input vector are divided into N parts, the divided connection weights and input vector elements are assigned to the respective processing elements, the N processing elements compute the inner products of the connection weights of the neurons and the input vector in parallel, the neuron with the maximum inner product and the neurons adjacent to it are determined as the update neurons, the N processing elements compute the updates of the connection weights of the update neurons in parallel, the plurality of input vectors to be learned are divided into D groups and assigned to D horizontal rings each consisting of N processing elements coupled through the data transfer memories for horizontal ring coupling, the updates of the connection weights for all of the input vectors to be learned are computed in parallel, and a neural network responsive to the input vectors is formed by the learning algorithm.