JPH02100757A

JPH02100757A - Parallel neural network learning system

Info

Publication number: JPH02100757A
Application number: JP63254490A
Authority: JP
Inventors: Sumio Watanabe; 渡辺　澄夫
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-10-07
Filing date: 1988-10-07
Publication date: 1990-04-12

Abstract

PURPOSE: To increase the efficiency and accuracy of learning, and to produce predictable output even for unknown inputs, by connecting neural networks in parallel and training them one after another. CONSTITUTION: A first neural network 1 is trained until it reaches a local minimum or for a fixed number of iterations. If a single network does not give a sufficiently good output, training of the first neural network 1 is stopped and a second neural network 2 is trained. The second neural network 2 learns the difference between the teacher data and the output of the first network 1. Likewise, when the (n+1)-th neural network is trained, training of the networks up to the n-th is stopped, and the (n+1)-th network learns the difference between the teacher data and the sum of the outputs of the first n networks. The difference between the summed network outputs and the teacher data therefore decreases until a sufficiently good output is obtained.

Description

[Detailed Description of the Invention]

[Technical Field]

The present invention relates to a parallel neural network learning method. It can be applied to all kinds of information processing equipment, for example speech recognition, image processing, automatic translation, and associative memory.

[Prior Art]

The backpropagation method is increasingly recognized as an effective learning method for multilayer perceptrons, but it has the following drawbacks.

(a) This method can only find points where the error is locally minimal; once training falls into a local minimum, learning stops progressing.

(b) As the number of data items to be learned grows, learning accuracy drops sharply.

(c) Because the pattern transformation from input to output is nonlinear and complicated, it is hard to predict what the output will be for an unknown input.

Improved versions of the backpropagation method have been devised to compensate for these drawbacks (for example, Computer Today, 1988/9, No. 27, pp. 54-59, "Principles of a learning vocal system"), but no definitive method has yet been found.

[Object]

The present invention was made in view of the circumstances described above. In particular, its object is to construct, with high accuracy and at high speed, a neural network that produces the desired output by learning the correspondence between given input patterns and teacher data.

[Constitution]

To achieve the above object, the present invention is characterized in that, when a neural network is to learn the correspondence between given inputs and the corresponding teacher data, one or more neural networks are connected in parallel and trained one after another, which raises the efficiency and accuracy of learning and makes the output predictable even for unknown inputs.

More specifically, in this learning method, when the parallel-connected neural networks are trained in sequence for n = 1, 2, 3, ..., the invention is characterized in that the (n+1)-th neural network learns the difference between the teacher data and the output that remains unlearned after training up to the n-th network; or in that each neural network is a multilayer neural network whose number of layers varies with n; or in that each neural network is trained by the backpropagation method, and the time to move from training the n-th network to training the (n+1)-th network is decided by determining that training of the n-th network has fallen into a local minimum. The invention is described below on the basis of embodiments.

A neural network can be implemented in software or in hardware; both are covered below.

FIG. 1 is a flowchart of an algorithm illustrating one embodiment of the present invention. An overview of the invention is first given with reference to FIG. 1.

When one or more neural networks connected in parallel are to learn the relationship between given input patterns and teacher data, the first neural network is first trained by the well-known backpropagation method until it reaches a local minimum or for a fixed number of iterations. If the desired output is obtained with that single network, training ends. Otherwise, training of the first network is stopped and training moves on to the second network. The second network learns the difference between the teacher data and the output of the first network. In the same way, when the (n+1)-th network is trained, training of the networks up to the n-th is stopped, and the (n+1)-th network learns the difference between the teacher data and the sum of the outputs of the first n networks. As each network is trained, the difference between the summed network outputs and the teacher data decreases, so training of the parallel network is complete once it is stopped at the point where the desired output is obtained. This method makes it possible to learn with an accuracy that a single neural network cannot reach.
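The sequential procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: `TinyNet`, `train_parallel`, `sum_squared_error`, the hidden-layer size, the learning rate, and the tolerance are all hypothetical names and values, the network takes a scalar input, and a linear output is used in place of the sigmoid-plus-scaling output unit described later in the text. A fixed iteration count serves as the stopping rule for each network.

```python
import math
import random

def sigmoid(x):
    # Numerically safe logistic function f(x) = 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

class TinyNet:
    """Hypothetical scalar-in, scalar-out network with one sigmoid hidden layer."""

    def __init__(self, hidden, rng):
        self.w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(w * x + b) for w, b in zip(self.w1, self.b1)]
        y = sum(v * hj for v, hj in zip(self.w2, h)) + self.b2
        return y, h

    def predict(self, x):
        return self._forward(x)[0]

    def train_epoch(self, xs, targets, lr):
        # One pass of per-sample gradient descent on the squared error.
        for x, t in zip(xs, targets):
            y, h = self._forward(x)
            err = y - t
            for j, hj in enumerate(h):
                grad_hidden = err * self.w2[j] * hj * (1.0 - hj)
                self.w2[j] -= lr * err * hj
                self.w1[j] -= lr * grad_hidden * x
                self.b1[j] -= lr * grad_hidden
            self.b2 -= lr * err

def sum_squared_error(nets, xs, teacher):
    # Error of the whole parallel network: summed outputs versus teacher data.
    return sum((sum(n.predict(x) for n in nets) - t) ** 2
               for x, t in zip(xs, teacher))

def train_parallel(xs, teacher, max_nets=3, epochs=200, lr=0.1, tol=1e-3, seed=0):
    rng = random.Random(seed)
    nets, errors = [], []
    residual = list(teacher)  # network 1 learns the teacher data itself
    for _ in range(max_nets):
        net = TinyNet(hidden=4, rng=rng)
        for _ in range(epochs):  # fixed iteration count as the stopping rule
            net.train_epoch(xs, residual, lr)
        nets.append(net)  # this network is frozen from now on
        # The next network learns what the networks so far failed to learn.
        residual = [r - net.predict(x) for x, r in zip(xs, residual)]
        errors.append(sum(r * r for r in residual))
        if errors[-1] < tol:  # desired output reached
            break
    return nets, errors
```

Calling `train_parallel(xs, teacher)` returns the frozen networks and the total squared error recorded after each one was added; the prediction of the parallel network for an input `x` is the sum of `net.predict(x)` over the returned networks.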

Next, a first embodiment of the present invention is described.

FIG. 2 is a diagram illustrating the first embodiment of the present invention, in which 1, 2, and 3 denote neural networks connected in parallel. The lines in the figure (solid and dotted) are links connecting one unit to another, and they transmit information only in the direction of the arrows. Transmission works as follows.

(a) Dotted line: the value is passed on unchanged.

(b) Solid line: each link has a weight coefficient, and the link outputs the input value multiplied by that weight coefficient.

The triangles, squares, hexagons, and circles in the figure are the units that make up the network. Their input-output relationships are as follows.

(a) Triangle: the output value is the sum of the input values arriving at the unit.

(b) Square: the output value equals the input value entering the unit.

(c) Hexagon: given the input value x entering the unit, a scale factor a, and a bias b, the output value is ax + b.

(d) Circle: the output value is expressed as f(net), using the sum (net) of the input values arriving at the unit, the unit's bias O, and a monotonically increasing function f. Specifically, for example, f(x) = 1/(1 + exp(-x)) may be used.
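The link and unit rules listed above can be written as small Python functions. This is a minimal sketch: the function names are hypothetical, and adding the bias to net before applying f is an assumption, since the text names the bias but does not spell out where it enters.

```python
import math

def link(value, weight=None):
    # Dotted link (weight=None) passes the value unchanged;
    # a solid link multiplies it by its weight coefficient.
    return value if weight is None else weight * value

def triangle(inputs):
    # Triangle unit: sum of the incoming values.
    return sum(inputs)

def square(x):
    # Square unit: output equals input.
    return x

def hexagon(x, a, b):
    # Hexagon unit: scale factor a and bias b give a*x + b.
    return a * x + b

def circle(inputs, bias=0.0):
    # Circle unit: f(net) with f(x) = 1 / (1 + exp(-x)).
    # Assumption: the unit's bias is added to net before f is applied.
    net = sum(inputs) + bias
    return 1.0 / (1.0 + math.exp(-net))
```

For example, `hexagon(circle([link(x, w)]), a, b)` chains a weighted link, a sigmoid unit, and an output-scaling unit, which is the path a value takes through one branch of the parallel network in this sketch.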

A network with this structure is trained as follows.

(1) Training method for neural network 1

When f(x) = 1/(1 + exp(-x)) is used as f, the output values lie in the interval (0, 1). Therefore, before training starts, the scale factor a and the bias b of the hexagonal unit in the figure are chosen so that the output can take the values of the teacher data. For example, if the teacher data values lie in the interval (A, B), it suffices to choose

a = B - A, b = A.
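As a quick numeric check of this choice, the sketch below maps the endpoints and midpoint of the sigmoid range (0, 1) through ax + b. The function names and the sample interval (-2, 3) are hypothetical and chosen only for illustration.

```python
def output_scale(A, B):
    # Scale factor a and bias b that map the interval (0, 1) onto (A, B).
    return B - A, A

def hexagon(x, a, b):
    # Output-conversion unit: a*x + b.
    return a * x + b

a, b = output_scale(-2.0, 3.0)  # a = 5.0, b = -2.0
low = hexagon(0.0, a, b)        # lower end of the sigmoid range maps to A
high = hexagon(1.0, a, b)       # upper end of the sigmoid range maps to B
mid = hexagon(0.5, a, b)        # midpoint maps to (A + B) / 2
```

Because the sigmoid only approaches 0 and 1, the scaled output approaches A and B without reaching them, which matches the open interval (A, B) used in the text.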

Next, with the outputs of all networks other than neural network 1 set to 0, neural network 1 is trained by the well-known backpropagation method. That is, letting w_ij be the weight coefficient of the link from unit i to unit j and o_j the bias of unit j, w_ij and o_j are changed so as to minimize the squared error between the given teacher data and the output values. However, because the backpropagation method searches for the minimum by steepest descent, in practice training stops progressing once it reaches a local minimum rather than the global minimum. Whether training has reached a local minimum can be determined as follows: if the squared error between the teacher data and the output values over the entire data set stays unchanged for a sufficiently long number of training iterations, training is at a local minimum. When a local minimum is reached, or when the number of training iterations reaches a predetermined count, training of neural network 1 is stopped.
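The stopping rule described here can be sketched as a check on the history of total squared errors. The names `reached_local_minimum` and `should_stop`, and the parameters `patience` and `tol`, are hypothetical: the text only says the error must stay unchanged for "a sufficiently long" number of iterations, so the window length and tolerance are assumptions to be tuned.

```python
def reached_local_minimum(error_history, patience=100, tol=1e-6):
    # True if the total squared error has stayed (nearly) unchanged
    # over the last `patience` recorded iterations.
    if len(error_history) < patience:
        return False
    window = error_history[-patience:]
    return max(window) - min(window) <= tol

def should_stop(error_history, max_iterations, patience=100, tol=1e-6):
    # Stop on a plateau, or when the predetermined iteration count is reached.
    return (len(error_history) >= max_iterations
            or reached_local_minimum(error_history, patience, tol))
```

A training loop would append the squared error over the whole data set to `error_history` after each iteration and move on to the next network as soon as `should_stop` returns `True`.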

(2) Training method for neural network 2 and beyond

Next, neural network 2 is trained. For the given input data, neural network 1 produces the output that results from the training above. The teacher data for training neural network 2 is therefore the teacher data for the whole parallel network minus the output of neural network 1. Before training starts, the scale factor and bias of the hexagonal output-conversion unit of neural network 2 are determined from this teacher data for neural network 2, in the same way as in (1), so that the output can take the values of that teacher data. The training method and the local-minimum test are the same as in (1).
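The preparation step for neural network 2 can be sketched as follows. The function names and sample numbers are hypothetical, and taking A and B as the minimum and maximum of the residual teacher data is an assumption; in practice a small margin might be added, since a sigmoid output never reaches the ends of its range exactly, and the case where all residuals are equal (a = 0) would need separate handling.

```python
def residual_targets(teacher, net1_outputs):
    # Teacher data for network 2: overall teacher data minus network 1's output.
    return [t - y for t, y in zip(teacher, net1_outputs)]

def scale_for(targets):
    # Scale factor a = B - A and bias b = A for the output-conversion unit,
    # with A and B taken as the smallest and largest target (an assumption).
    A, B = min(targets), max(targets)
    return B - A, A

teacher = [1.0, 2.0, 4.0]
net1_outputs = [0.5, 2.5, 3.0]          # outputs of the frozen network 1
targets2 = residual_targets(teacher, net1_outputs)
a2, b2 = scale_for(targets2)
```

Here `targets2` is what network 2 is trained on, and `a2`, `b2` are fixed before that training begins.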

Neural networks 3, 4, 5, ... are then trained in the same way. While the (n+1)-th network is being trained, the networks up to the n-th produce the outputs obtained from their training so far.

When the desired output is obtained, training ends at that n.

The number of layers may differ from one neural network to another.

Another embodiment of the present invention is described next. Training proceeds as in the previous embodiment, except that the number of layers of neural network 1 is set to one. That is, the number of layers of units drawn as circles in FIG. 2 is set to one. It is known that such a network (a single-layer network) by itself can learn only linearly separable patterns. This embodiment exploits that property: the first network learns the linear relationship between input and output values, and the second and later networks, which have two or more layers, learn the nonlinear relationship.

In this case, the output of neural network 1 has a large influence on the total output, so the resulting neural network emphasizes the linear relationship between input and output. In addition, because the superposition principle holds for linear functions, the resulting neural network allows one to estimate what the output will be for unknown data.
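The superposition property invoked here can be checked numerically for a purely linear map. This is a sketch under the assumption that the first network is treated as a plain weighted sum with no bias and no squashing function; the function name `linear` and the sample weights and inputs are hypothetical.

```python
def linear(weights, x):
    # Purely linear map: weighted sum of the inputs (no bias, no sigmoid).
    return sum(w * xi for w, xi in zip(weights, x))

weights = [2, -1]
x, y = [1, 3], [4, 0]
alpha, beta = 2, 3

# Response to a combined, previously unseen input ...
combined = [alpha * xi + beta * yi for xi, yi in zip(x, y)]
lhs = linear(weights, combined)
# ... equals the same combination of the responses to the known inputs.
rhs = alpha * linear(weights, x) + beta * linear(weights, y)
```

In this sense the linear part of the output for an unknown input can be predicted from the outputs for known inputs, while the later multilayer networks contribute only the nonlinear correction.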

[Effect]

As is clear from the above description, the present invention yields, with high accuracy and at high speed, a neural network that produces the desired output from given input patterns and teacher data.

[Brief Description of the Drawings]

FIG. 1 shows an algorithm illustrating one embodiment of the present invention, and FIG. 2 is a diagram showing an example of a parallel neural network.

1 to 3: neural networks.

Patent applicant: Ricoh Co., Ltd.

Claims

[Claims]

1. A parallel neural network learning method characterized in that, when a neural network is made to learn the correspondence between given inputs and the corresponding teacher data, one or more neural networks are connected in parallel and each network is trained in sequence, thereby increasing the efficiency and accuracy of learning and producing predictable outputs even for unknown inputs.