JPH0415860A

JPH0415860A - Method for making learning of neural network efficient

Info

Publication number: JPH0415860A
Application number: JP2117490A
Authority: JP
Inventors: Junichi Tateno; 純一舘野; Kazuya Asano; 一哉浅野
Original assignee: Kawasaki Steel Corp
Current assignee: JFE Steel Corp
Priority date: 1990-05-09
Filing date: 1990-05-09
Publication date: 1992-01-21

Abstract

PURPOSE: To improve learning efficiency by enlarging the update step width for a specific pair of input pattern and teacher signal by a prescribed multiple when learning for that pair is not progressing. CONSTITUTION: At every learning iteration using the backpropagation method, the sum of the errors between the desired and actual output values over all output units is computed for a given input pattern. When this sum exceeds a previously set threshold, the update step width for that pair of input pattern and teacher signal is enlarged by a prescribed multiple. A multiple of 1.2 to 1.6 is suitable. Learning efficiency is thereby improved.

Description

[Detailed Description of the Invention]

[Field of Industrial Application] The present invention relates to a method for improving the learning efficiency of neural networks, and more specifically to a method for improving learning efficiency when a neural network used for pattern recognition, such as character recognition or speech recognition, is trained by the backpropagation method.

[Prior Art] The backpropagation learning algorithm that the present invention seeks to improve trains a multilayer neural network made up of an input layer, an output layer, and intermediate layers, as shown in FIG. 2. It is explained below.

The units of the network are connected in the direction from the input layer to the output layer. As shown in FIG. 3, each unit j receives the outputs $O_i$ of the other units i, forms the sum of the products of each $O_i$ and the connection weight coefficient $W_{ji}$ between units i and j, $net_j = \sum_i W_{ji} \cdot O_i$, and passes this sum through an input/output function $f(x)$ to produce the output signal $O_j = f(net_j)$.
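The computation performed by a single unit can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the logistic function is used for f because the embodiment later in this document specifies f(x) = 1 / (1 + exp(-x)).

```python
import math


def sigmoid(x):
    """Logistic input/output function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(weights_j, outputs_prev):
    """Return O_j = f(net_j), where net_j = sum_i W_ji * O_i.

    weights_j    -- weights W_ji into unit j, one per preceding unit i
    outputs_prev -- outputs O_i of the preceding units
    """
    net_j = sum(w * o for w, o in zip(weights_j, outputs_prev))
    return sigmoid(net_j)
```

When the weighted inputs cancel, net_j is zero and the unit outputs f(0) = 0.5.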

That is, when a given input signal pattern is presented to the input layer of the network, the computation described above is carried out in every unit (although units in the input layer usually do not apply the input/output function), and the connection weight coefficients between units are determined so that the signal pattern finally produced by the output layer becomes the desired pattern (hereinafter called the teacher signal). Here the sum of squared errors $E_p$ between the teacher signal and the output signal of the neural network is used as the evaluation function.

$$E_p = \frac{1}{2} \sum_j (t_{pj} - O_{pj})^2 \qquad (1)$$

Here $t_{pj}$ is the teacher signal of output unit j for input pattern p, and $O_{pj}$ is the output signal of output unit j. This error function $E_p$ must be minimized for all input patterns. The problem therefore becomes a minimization problem: determine the connection weight coefficients that minimize $E = \sum_p E_p$.
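As a small sketch of equation (1) and the total error E, the following computes the per-pattern squared error and sums it over patterns. The function names and the list-of-lists data layout are assumptions made for illustration.

```python
def pattern_error(teacher, output):
    """E_p = 1/2 * sum_j (t_pj - O_pj)^2 for one input pattern p."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(teacher, output))


def total_error(teachers, outputs):
    """E = sum_p E_p over all input patterns."""
    return sum(pattern_error(t, o) for t, o in zip(teachers, outputs))
```

For example, `pattern_error([1.0, 0.0], [0.0, 0.0])` evaluates to 0.5, since only one output unit is wrong, by exactly 1.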

To solve this problem, the backpropagation method uses the steepest descent method. Details of the backpropagation method are given in D. E. Rumelhart et al., Parallel Distributed Processing, MIT Press (1986).

In the backpropagation method, the connection weight coefficients are determined as follows.

For the connection weight coefficient $W_{ji}$ from unit i to unit j, the update amount (amount of change) $\Delta_p W_{ji}$ for the p-th input/output data is given by

$$\Delta_p W_{ji} = \eta \cdot \delta_{pj} \cdot O_{pi} \qquad (2)$$

where

$\eta$: update step width (constant)
$\delta_{pj} = O_{pj} (1 - O_{pj}) (t_{pj} - O_{pj})$ (when unit j is an output unit)
$\delta_{pj} = O_{pj} (1 - O_{pj}) \sum_k \delta_{pk} W_{kj}$ (when unit j is an intermediate unit)
$t_{pj}$: teacher signal of unit j for the p-th input pattern
$O_{pj}$: actual output value of unit j for the p-th input pattern
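The delta terms and the weight update of equation (2) can be written out directly. The sketch below assumes the logistic input/output function, whose derivative gives the O(1 - O) factor; all function and argument names are hypothetical.

```python
def output_delta(o_pj, t_pj):
    """delta_pj for an output unit: O_pj * (1 - O_pj) * (t_pj - O_pj)."""
    return o_pj * (1.0 - o_pj) * (t_pj - o_pj)


def hidden_delta(o_pj, deltas_next, weights_from_j):
    """delta_pj for an intermediate unit.

    deltas_next    -- delta_pk of each unit k in the following layer
    weights_from_j -- W_kj, the weight from unit j to each of those units k
    """
    back = sum(d * w for d, w in zip(deltas_next, weights_from_j))
    return o_pj * (1.0 - o_pj) * back


def weight_update(eta, delta_pj, o_pi):
    """Equation (2): Delta_p W_ji = eta * delta_pj * O_pi."""
    return eta * delta_pj * o_pi
```

With an output of 0.5 and a teacher signal of 1.0, `output_delta(0.5, 1.0)` is 0.5 × 0.5 × 0.5 = 0.125.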

Because the basic principle of the backpropagation method is steepest descent, reaching the minimum along the shortest path would require an infinitesimally small update width $\Delta W_{ji}$. In practice, however, this increases the number of iterations and slows convergence. One would therefore like to take $\eta$ in equation (2) large so as to obtain as large an update width $\Delta W_{ji}$ as possible, but the update direction then tends to oscillate. Rumelhart et al. therefore propose adding the previous update amount (amount of change) as an inertia term:

$$\Delta_p W_{ji}(n+1) = \eta \cdot \delta_{pj} \cdot O_{pi} + \alpha \cdot \Delta_p W_{ji}(n) \qquad (3)$$

where $\alpha$ is a constant.
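The inertia-term update of equation (3) adds a fraction of the previous update to the current gradient term. A minimal sketch, with hypothetical names:

```python
def momentum_update(eta, alpha, delta_pj, o_pi, prev_update):
    """Equation (3): eta * delta_pj * O_pi + alpha * (previous update).

    prev_update is Delta_p W_ji(n); the return value is Delta_p W_ji(n+1).
    """
    return eta * delta_pj * o_pi + alpha * prev_update
```

If the gradient term stays constant at some value g, repeated application drives the update toward g / (1 - alpha), which is why the inertia term enlarges the effective step along a consistent direction.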

In general, the values of the constants $\eta$ and $\alpha$ in equation (3) are determined empirically. Japanese Unexamined Patent Publication No. H1-320565 proposes changing the values of $\eta$ and $\alpha$ at every learning iteration, or once every several learning iterations, so that the output error of the neural network takes a minimum or near-minimum value.

[Problems to Be Solved by the Invention] If the values of the constants $\eta$ and $\alpha$ in equation (3) are determined empirically, learning takes a long time and the network easily falls into an inappropriate local minimum.

In Japanese Unexamined Patent Publication No. H1-320565, in order to determine $\eta$ and $\alpha$ so that the output error $E_p$ becomes small at each learning iteration, a finite number of candidate values of $\eta$ and $\alpha$ are prepared, and the pair that gives the smallest $E_p$ is selected. With this method, every time the values of $\eta$ and $\alpha$ are changed, $E_p$ must be computed once for each prepared combination of $\eta$ and $\alpha$, and the smallest $E_p$ must then be found among the results.

Furthermore, during learning the computation is repeated over a set of pairs of input patterns and teacher signals, and it often happens that learning for some particular input pattern progresses less than for the others, so that learning takes a long time. This is thought to be because, within one learning iteration, all pairs of input pattern and teacher signal are treated under the same conditions, that is, with the same values of $\eta$ and $\alpha$ in equation (3). An object of the present invention is to provide a method for improving the learning efficiency of a neural network in which, when learning for a particular pair of input pattern and teacher signal is not progressing, the value of $\eta$ is changed so as to change the learning conditions for that pair and thereby improve learning efficiency.

[Means for Solving the Problems] In general, the teacher signal in the backpropagation method has as many data values as there are output units, each representing the desired output value of one unit. When the values of the output units are observed at each learning iteration, it is often the case that the value of some particular unit does not change and remains far from its desired output value. In other words, learning with the input pattern that should bring that output unit to its desired value is not taking effect.

The present invention is applied to the learning of a hierarchical neural network using the backpropagation method and adopts the following technical means. Namely, it is a method for improving the learning efficiency of a neural network, characterized in that, at each learning iteration, the sum of the errors between the desired and actual output values over all output units is obtained for a given input pattern, and, when this sum is larger than a preset threshold, the update step width for that pair of input pattern and teacher signal is enlarged by a predetermined multiple.

The multiple applied to the update step width is preferably 1.2 to 1.6.

That is, when the multiple lies in the range from 1.2 to 1.6, the number of learning iterations is smallest and learning efficiency improves. Below 1.2, the effect of the update is too small, so the number of learning iterations increases. Above 1.6, the update direction tends to oscillate and the update amount $\Delta W_{ji}$ is slow to converge to a constant value, so learning efficiency instead declines.

[Operation] In the present invention, the update amount $\Delta_p W_{ji}$ of the connection weight coefficient $W_{ji}$ for the p-th input/output data is determined by

$$\Delta_p W_{ji}(n+1) = \eta \cdot \eta' \cdot \delta_{pj} \cdot O_{pi} \qquad (4)$$

or

$$\Delta_p W_{ji}(n+1) = \eta \cdot \eta' \cdot \delta_{pj} \cdot O_{pi} + \alpha \cdot \Delta_p W_{ji}(n) \qquad (5)$$

where

$\eta' = t$ when $\sum_j |O_{pj} - t_{pj}| > S$
$\eta' = 1$ when $\sum_j |O_{pj} - t_{pj}| \le S$

Here S is the threshold and t is the multiple applied to the update step width. Strictly speaking, appropriate values of S and t depend on the task, but in the experimental example learning was made more efficient with threshold S = 0.7 and 1.2 ≤ t ≤ 1.6.
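The per-pattern rule above can be sketched in a few lines: compute the summed absolute output error for the pattern, choose the multiplier from it, and apply the multiplier to the gradient term of the update. This is an illustrative reading of equations (4) and (5), not the patent's own code. The function names are hypothetical, and reading the threshold test as a sum of absolute differences over output units is an interpretation of the garbled source formula, consistent with the claim wording.

```python
def step_multiplier(teacher, output, threshold, t):
    """Return eta': t if sum_j |O_pj - t_pj| > threshold, otherwise 1."""
    total_abs_error = sum(abs(o - tj) for o, tj in zip(output, teacher))
    return t if total_abs_error > threshold else 1.0


def boosted_update(eta, eta_prime, delta_pj, o_pi, alpha=0.0, prev_update=0.0):
    """Equation (5); with alpha = 0 it reduces to equation (4)."""
    return eta * eta_prime * delta_pj * o_pi + alpha * prev_update


# Usage with the experimental values S = 0.7 and t = 1.4:
# a poorly learned pattern (summed error 1.0) gets the enlarged step,
# a well learned one (summed error 0.2) keeps the normal step.
eta_prime_bad = step_multiplier([1.0, 0.0], [0.1, 0.1], threshold=0.7, t=1.4)
eta_prime_good = step_multiplier([1.0, 0.0], [0.9, 0.1], threshold=0.7, t=1.4)
```

Because the multiplier is chosen per pattern, only the pairs whose error stays above the threshold receive the larger step; the others are updated exactly as in equation (3).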

[Embodiment] FIG. 1 is an explanatory diagram showing the structure of a hierarchical neural network for character recognition according to an embodiment of the present invention. The experimental conditions were as follows.

(1) The neural network has 160 units in the input layer, 20 units in the intermediate layer, and 10 units in the output layer, and the input/output function of each unit is $f(x) = 1 / (1 + \exp(-x))$.

(2) The recognition targets are the 10 digit characters 0 through 9, and each output unit corresponds to one character.

(3) Each character is divided into 160 segments, each segment corresponding to one input unit, with a feature value of 1 for a character portion and 0 for a background portion.

(4) Learning is performed by the backpropagation method, and the connection weight coefficients before learning are given by random numbers.

The connection weight coefficients were updated using equations (3) and (5).

In each equation, $\eta = 0.75$ and $\alpha = 0.8$. The convergence condition that ends learning was $E_p < 0.08$.

(5) The following experiments were each run three times with different initial states:
No. 1: equation (5) with S = 0.7, t = 1.2
No. 2: equation (5) with S = 0.7, t = 1.4
No. 3: equation (5) with S = 0.7, t = 1.6
No. 4: equation (3)
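The experimental conditions listed above can be collected into a small setup sketch. The layer sizes, the 0/1 segment encoding, the constants 0.75 and 0.8, and the four experiment settings come from the text. The one-hot form of the teacher signal, the uniform range used for the random initial weights, and all names are assumptions made for illustration, since the source does not specify them.

```python
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 160, 20, 10  # layer sizes from condition (1)
ETA, ALPHA = 0.75, 0.8                     # constants used in equations (3) and (5)

# Experiment settings from condition (5); No. 4 uses plain equation (3).
EXPERIMENTS = {
    "No. 1": {"S": 0.7, "t": 1.2},
    "No. 2": {"S": 0.7, "t": 1.4},
    "No. 3": {"S": 0.7, "t": 1.6},
    "No. 4": None,
}


def encode_segments(segments):
    """Map 160 segment flags to inputs: 1.0 for character, 0.0 for background."""
    if len(segments) != N_INPUT:
        raise ValueError("expected %d segments" % N_INPUT)
    return [1.0 if s else 0.0 for s in segments]


def teacher_signal(digit):
    """Assumed one-hot teacher signal: 1.0 on the unit for `digit`, else 0.0."""
    return [1.0 if k == digit else 0.0 for k in range(N_OUTPUT)]


def random_weights(n_out, n_in, scale=0.5, seed=None):
    """Random initial weights W[j][i]; the range [-scale, scale] is an assumption."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
```

Running each experiment with three different `seed` values would correspond to the three runs with different initial states described in condition (5).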
In formula (5), s = 0.7, t = 1.4, and Shimono No. 3:
In formula (5), s = 0.7 and t = 1.6 No.4:
Three experiments were conducted using the formula (3), each with a different initial state.

Table 1 shows the results of these experiments. The number of learning iterations is counted per pair of input pattern and teacher signal, so one learning repetition comprises 10 learning steps, one for each of the 10 characters.

Table 1

As shown above, according to the present invention, by choosing the values of S and t in equation (4) or (5) appropriately as described above, and thereby setting a suitable update step width for each learning step with a pair of input pattern and teacher signal, learning was made more efficient. Determining this update step width requires, at each presentation of an input pattern, computing the sum over all output units and comparing it with the preset value S, but this amount of computation is small enough to pose no practical problem.

[Effects of the Invention] The present invention is highly effective in improving the learning efficiency of neural networks.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram showing the structure of a hierarchical neural network according to an embodiment of the present invention; FIG. 2 is an explanatory diagram showing the general structure of a hierarchical neural network; and FIG. 3 is an explanatory diagram showing the input/output relationship of each unit of the network.

$W_{ji}$: connection weight coefficient

Claims

[Claims]

1. A method for improving the learning efficiency of a neural network, characterized in that, in learning a hierarchical neural network using the backpropagation method, the sum of the errors between the desired output values and the actual output values of all output units for a given input pattern is obtained at each learning iteration, and, when the sum is larger than a preset threshold, the update step width for that pair of input pattern and teacher signal is increased by a predetermined multiple.

2. The method for improving the learning efficiency of a neural network according to claim 1, wherein the update step width is increased by a multiple of 1.2 to 1.6.