JPH0363870A

JPH0363870A - Learning method and neural network structure

Info

Publication number: JPH0363870A
Application number: JP2150083A
Authority: JP
Inventors: Sherif Makram-Ebeid; シエリフ　マクラム‐エビード
Original assignee: Philips Gloeilampenfabrieken NV
Current assignee: Koninklijke Philips NV
Priority date: 1989-06-09
Filing date: 1990-06-11
Publication date: 1991-03-19
Also published as: EP0401927B1; FR2648251B1; US5630020A; DE69029538T2; DE69029538D1; FR2648251A1; EP0401927A1

Abstract

PURPOSE: To accelerate learning in the input layer and slow learning in the output layer by making the correction applied to each neuron state proportional to the gradient component multiplied by a constant that is fixed for each layer and that decreases with the layer number from the input layer to the output layer. CONSTITUTION: In a learning method that includes, for each layer, updating the synaptic coefficients on the basis of changes $\Delta X_{j,l}$, the gradient component $g_{j,l}$ is multiplied by a parameter $\theta_{j,l}$ in order to determine the next change of the neuron state, so that a change $\Delta X_{j,l}$ proportional to $-\theta_{j,l} \cdot g_{j,l}$ is calculated. Here $\theta_{j,l}$ depends on the state of neuron j of layer l: with $0 \le \theta_l^+ \le 1$, $\theta_{j,l} = 1$ when $-g_{j,l}$ and $X_{j,l}$ have different signs, and $\theta_{j,l} = \theta_l^+$ when $-g_{j,l}$ and $X_{j,l}$ have the same sign. The corrections applied to the neuron states can thus accelerate learning in the input layer and slow it in the output layer.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical field

The present invention relates to a learning method for a neural network that carries out a learning phase in which the synaptic coefficients are determined from examples by means of the error gradient back-propagation algorithm. The invention also relates to a neural network structure and to a computer programmed to carry out this method.

Background art

Neural networks are used in image processing, speech processing and similar applications.

Neural networks are made up of automata interconnected by synapses, each of which has an associated synaptic coefficient. They make it possible to solve problems that are difficult to solve with conventional sequential computers.

To carry out a given processing operation, a neural network must first learn how to perform it. This learning phase uses examples: for given input data, the result that should appear at the output is known in advance.

At first, the neural network is not yet adapted to the desired task and delivers incorrect results.

An error $E^p$ between the results obtained and the results that should have been obtained is then determined and, on the basis of an adaptation criterion, the synaptic coefficients are modified so that the neural network learns the chosen example. This step is repeated for as many examples as are considered necessary for the network to learn satisfactorily.

A widely used method for carrying out this adaptation is gradient back-propagation. The components $g_{j,L}$ of the gradient of the error $E^p$ (computed on the last layer L) are first determined with respect to each neuron state $X_{j,L}$.

These components are then back-propagated through the neural network, starting from its output, in order to determine first all internal components $g_{j,l}$ ($l \ne L$) and then the corrections to be applied to the synaptic coefficients $W_{ij,l}$ of the neurons concerned. This method is described, for example, in the following documents:

- D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Internal Representations by Error Propagation", in D. E. Rumelhart and J. L. McClelland (eds.), "Parallel Distributed Processing: Explorations in the Microstructure of Cognition", Vol. 1, Foundations, MIT Press (1986).
- D. J. Burr, "Experiments on neural net recognition of spoken and written text", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 36, No. 7, July 1988, p. 1162.

However, when these methods are applied in a neural network, the learning time can become very long for certain applications. This difficulty arises, for example, with the parity problem.

The parity problem arises, for example, in a neural network whose inputs are connected to binary signals and whose output must deliver state 1 when the number of inputs at "1" is odd and state 0 when that number is even. The learning difficulty stems from the requirement that the output must change state when a single input changes state, whereas it must not change when an even number of inputs change state.

Similarly, when a neural network is used for classification, it can be difficult to separate classes between which the minimum distance is small, because the network needs a long time to learn to discriminate between them. This drawback makes it difficult to separate continuously coded input data, particularly when examples that belong to different classes have inputs that differ only slightly from one another.

Disclosure of the invention

The object of the invention is therefore to reduce the learning time of a neural network while keeping the additional hardware required to a minimum.

To this end, the invention provides a learning method carried out in a neural network consisting of L layers, comprising the steps of:

- determining the states $X_{j,l}$ of the neurons of a layer l from the output potentials $Y_{i,l-1}$ delivered by the neurons of the preceding layer, to which they are connected by synaptic coefficients $W_{ij,l}$, or from the input data $Y_{i,0}$ for layer l = 1, according to

$$X_{j,l} = \sum_i W_{ij,l} \cdot Y_{i,l-1}$$

- determining the potentials $Y_{j,l}$ of the output neurons by applying a non-linear function F:

$$Y_{j,l} = F(X_{j,l})$$

where l is the index of the layer under consideration ($1 \le l \le L$), j is the index of a neuron of the output layer l, and i is the index of a neuron of the input layer l-1.

The method comprises learning phases carried out by iterating over P examples presented successively at the inputs of the neural network, with:

- initialization of the synaptic coefficient matrices $W_{ij,l}$ of the neural network;
- introduction of the input data $Y_{i,0}^p$ of each example p to be learned;
- comparison of the results $Y_{j,L}$ obtained on the output layer L with the output $Y_j^p$ expected for the example p presented at the input, in order to define a partial error $E_j^p$;
- determination of the sum $E^p$ of all the partial errors $E_j^p$ observed for each output neuron and for each example p;
- determination of the components $g_{j,L} = \partial E^p / \partial X_{j,L}$ of the gradient of the error $E^p$ with respect to the states $X_{j,L}$ of the output layer L;
- back-propagation of the components $g_{j,L}$ so that the neural network, using the transposed synaptic coefficient matrices, determines the gradient components $g_{j,l}$ for the other layers;
- determination of the subsequent changes $\Delta X_{j,l}$, of sign opposite to that of the corresponding component $g_{j,l}$, in order to adapt the neural network;
- updating of the synaptic coefficients on the basis of the changes $\Delta X_{j,l}$.

The method is characterized in that, in order to determine the subsequent changes of the neuron states, it comprises a step of multiplying the gradient components $g_{j,l}$ by parameters $\theta_{j,l}$ so as to compute changes $\Delta X_{j,l}$ proportional to $-\theta_{j,l} \cdot g_{j,l}$, where $\theta_{j,l}$ depends on the state of neuron j of layer l: $\theta_{j,l} = 1$ when $-g_{j,l}$ and $X_{j,l}$ have different signs, and $\theta_{j,l} = \theta_l^+$ when $-g_{j,l}$ and $X_{j,l}$ have the same sign, with $0 \le \theta_l^+ \le 1$.
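The characterizing step can be illustrated with a minimal Python sketch. The function names `theta` and `state_change`, the proportionality factor `k` and the numeric values are assumptions made for illustration; the text only fixes the rule for the parameter and the fact that the change is proportional to minus the parameter times the gradient component. A zero product is treated like a sign difference, as in the flowchart step that tests for "negative or zero".

```python
def theta(g, x, theta_plus):
    """Sign-strategy parameter for one neuron.

    Returns 1 when -g and x have different signs (or their product is zero),
    and theta_plus when -g and x have the same sign.
    """
    if not 0.0 <= theta_plus <= 1.0:
        raise ValueError("theta_plus must lie in [0, 1]")
    return theta_plus if (-g) * x > 0 else 1.0


def state_change(g, x, theta_plus, k=1.0):
    """Change of the neuron state, proportional to -theta * g.

    k is an assumed positive proportionality factor.
    """
    return -k * theta(g, x, theta_plus) * g


# -g = -0.5 and x = 2.0 differ in sign: the full correction is applied.
print(state_change(g=0.5, x=2.0, theta_plus=0.1))   # -0.5
# -g = 0.5 and x = 2.0 share a sign: the correction is scaled by theta_plus.
print(state_change(g=-0.5, x=2.0, theta_plus=0.1))  # 0.05
```

With `theta_plus` close to zero, only corrections that push a state towards the opposite sign are applied at full strength.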

During learning, a given example p is presented.

The data of example p, introduced at the inputs of the neural network, produce a result $Y_{j,L}$ on a given output neuron j of the last layer L. The result $Y_j^p$ that should be obtained is known from the outset. The error can therefore be computed, for example, as follows.
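The expression announced here is not reproduced in this text. A form consistent with the partial error $E_j^p$ and with the output-layer gradient given elsewhere in the description would be the following; it is a reconstruction, and the upper bound $I(L)$ (the number of neurons of the last layer) is taken from the condition $1 \le j \le I(L)$ stated in the description of the host computer:

```latex
E^p = \frac{1}{2} \sum_{j=1}^{I(L)} \left( Y_j^p - Y_{j,L} \right)^2
```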

This expression is the mean-square error. Other comparison criteria may also be used.

In the known gradient back-propagation method, the components $g_{j,l}$ of the error gradient are determined for each contribution of a neuron state $X_{j,l}$.

Thus $g_{j,l} = \partial E^p / \partial X_{j,l}$, where $X_{j,l}$ denotes the state of the neuron before the non-linear function is applied.

The components $g_{j,L} = \partial E^p / \partial X_{j,L}$ relating to the output layer L are therefore $g_{j,L} = (Y_{j,L} - Y_j^p) \cdot F'_{j,L}$.

Here $F'_{j,L}$ is the derivative of the non-linear output function.
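As a rough illustration of the output-layer component, the sketch below evaluates it for one neuron. The choice of tanh with a temperature for F anticipates the form given later in the description; the function name `output_gradient` and the numeric values are assumptions.

```python
import math


def output_gradient(x, target, temperature=1.0):
    """g = (Y - target) * F'(x) for the assumed F(x) = tanh(x / T)."""
    y = math.tanh(x / temperature)
    f_prime = (1.0 - y * y) / temperature  # derivative of tanh(x / T) w.r.t. x
    return (y - target) * f_prime


# A neuron at x = 0 outputs Y = 0; with target +1 the component is -1,
# so reducing the error requires increasing x.
print(output_gradient(x=0.0, target=1.0))  # -1.0
```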

The neural network is then loaded with the transposed synaptic coefficient matrices $W_{ji,l}$, and the components $g_{j,L}$ are back-propagated from the output of the network. The network thereby determines the other gradient components $g_{j,l}$ for $l \ne L$. These components $g_{j,l}$ are used to determine the changes $\Delta X_{j,l}$, which in turn are used to correct the synaptic coefficients $W_{ij,l}$ so that the network adapts to the example concerned.
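The back-propagation through the transposed matrix can be sketched as follows. The relation used, $g_{i,l-1} = F'_{i,l-1} \sum_j W_{ij,l}\, g_{j,l}$, is the standard chain-rule consequence of the state and potential definitions and is not written out in the text; the function name and the list-of-lists layout are assumptions.

```python
def backpropagate(weights, g_next, f_prime_prev):
    """Gradient components of layer l-1 from those of layer l.

    weights[j][i] is the coefficient linking neuron i of layer l-1 to
    neuron j of layer l; summing over j for a fixed i applies the transpose.
    """
    return [
        f_prime_prev[i] * sum(weights[j][i] * g_next[j] for j in range(len(g_next)))
        for i in range(len(f_prime_prev))
    ]


weights = [[1.0, 2.0],   # coefficients into neuron 0 of layer l
           [3.0, 4.0]]   # coefficients into neuron 1 of layer l
print(backpropagate(weights, g_next=[1.0, -1.0], f_prime_prev=[1.0, 0.5]))
# [-2.0, -1.0]
```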

In general, this known correction is carried out according to:

$$W_{ij,l}(\text{new}) = W_{ij,l}(\text{old}) + k \cdot \Delta X_{j,l} \cdot Y_{i,l-1}$$

According to the invention, the components $g_{j,l}$ are not used as they stand, as in the known method. Each of them is first multiplied by an associated parameter $\theta_{j,l}$ which depends, for a given neuron j, on:

- the sign of the state $X_{j,l}$ of this neuron, and
- the sign of the gradient component $g_{j,l}$.

These parameters are $\theta_{j,l} = 1$ when $-g_{j,l}$ and $X_{j,l}$ have different signs, and $\theta_{j,l} = \theta_l^+$ when $-g_{j,l}$ and $X_{j,l}$ have the same sign, with $0 \le \theta_l^+ \le 1$.
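The update rule stated above, applied to one layer, might look like the sketch below. Under the invention the entries of `delta_x` would be the changes already scaled by the sign-strategy parameter; the function name, the `weights[j][i]` layout and the value of `k` are assumptions.

```python
def update_weights(weights, delta_x, y_prev, k=0.1):
    """W_ij(new) = W_ij(old) + k * delta_x_j * y_i, stored as weights[j][i]."""
    return [
        [w + k * delta_x[j] * y_prev[i] for i, w in enumerate(row)]
        for j, row in enumerate(weights)
    ]


old = [[0.0, 0.0]]
new = update_weights(old, delta_x=[1.0], y_prev=[1.0, -1.0], k=0.5)
print(new)  # [[0.5, -0.5]]
```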

To speed up learning, $\theta_l^+$ is preferably chosen close to zero or equal to zero for each given example during the first learning iterations.

In addition, during later learning iterations, $\theta_l^+$ can be increased towards 1 for each given example.
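The text does not say how the parameter should grow from zero towards 1. Purely as an assumed example, a linear ramp over the learning iterations could be written as:

```python
def theta_plus_schedule(iteration, n_iterations):
    """Assumed linear ramp of theta_plus from 0 (first iteration) to 1 (last)."""
    if n_iterations < 2:
        return 1.0
    return min(1.0, max(0.0, iteration / (n_iterations - 1)))


print([theta_plus_schedule(n, 5) for n in range(5)])
# [0.0, 0.25, 0.5, 0.75, 1.0]
```

Any other non-decreasing schedule that stays within [0, 1] would fit the description equally well.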

The sign strategy of the invention makes it possible, at the start of learning, to favour rough corrections based on the sign of the observed error and then, as learning advances, to move gradually to finer, more accurate corrections.

The non-linear functions that determine the output potentials may be chosen to be slightly or strongly non-linear. To increase the efficiency of the sign strategy of the invention, the choice of these non-linear functions can be changed during learning.

However, the changes $\Delta X_{j,l}$ resulting from gradient back-propagation must not lead to excessive changes of the synaptic coefficients. Therefore, in a complementary version of the invention, a normalization is carried out so that the sum of the squares of the synaptic coefficients remains quasi-constant.

The non-linear functions are thus chosen to be slightly non-linear at the start of learning and to approach sign-type functions at the end of learning. To make this choice possible, the synaptic coefficients converging on a given neuron j keep a quasi-constant norm $\sum_i (W_{ij,l})^2$.

The non-linear function F is, for example, of the type $Y_{j,l} = \tanh(X_{j,l} / T_l)$, where $T_l$ is a parameter attached to layer l, called the temperature of layer l.

The changes made during learning to the degree of non-linearity of the non-linear functions are obtained by varying the parameter $T_l$ of each layer.
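The effect of the temperature can be seen in a short sketch: a high temperature gives an almost linear response, while a low one approaches a sign function. The function name `potential` and the sample values are assumptions.

```python
import math


def potential(x, temperature):
    """Output potential Y = tanh(X / T) for a layer at temperature T."""
    return math.tanh(x / temperature)


# Lowering T sharpens the response to the same state X = 0.5.
for t in (10.0, 1.0, 0.01):
    print(t, potential(0.5, t))
```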

The sign strategy of the invention first favours rough corrections based on the sign of the error ($\theta^+$ small and positive) and then refines them, with parameters $\theta^+$ close to 1, to obtain more accurate corrections. A similar effect can be obtained at the level of the overall structure. To this end, the components $E_j^p$ of the error are multiplied by parameters $\eta_{j,L}$. Corrections that concern all neuron states simultaneously (parameter $\eta_{j,L}$) are thus superimposed on the individual corrections made for each state (parameter $\theta^+$).

To take advantage of the sign strategy presented above, a correction factor $\eta_{j,L}$ is introduced which depends on each output neuron j of the last layer L. The error $E^p$ is then determined as $E^p = \sum_{j=1}^{I(L)} \eta_{j,L} \cdot E_j^p$, with $E_j^p = \tfrac{1}{2} (Y_j^p - Y_{j,L})^2$.

This error is a quadratic function.

More generally, for each output neuron j of layer L (for a given example p), this error $E_j^p$ is $E_j^p = H(Y_j^p - Y_{j,L})$, where H is a function of the difference between the result obtained $Y_{j,L}$ and the expected result $Y_j^p$.

The error $E^p$ thus determined is then used to determine the gradient components $g_{j,L}$ and $g_{j,l}$ (with $l \ne L$) as described above.

Thus, in order to determine the gradient components $g_{j,l}$, the method comprises a step of determining the error $E^p$ by first applying a correction factor $\eta_{j,L}$, which depends on the neuron j of the last layer L, so that $E^p = \sum_{j=1}^{I(L)} \eta_{j,L} \cdot E_j^p$, in order to favour the start of learning. Here $\eta_{j,L} = 1$ when $Y_j^p$ and $Y_{j,L}$ have different signs, and $\eta_{j,L} = \eta^+$ when $Y_j^p$ and $Y_{j,L}$ have the same sign, with $0 \le \eta^+ \le 1$.
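A small sketch of the corrected error follows. It assumes that the weighted error is the sum of the partial errors, each multiplied by its correction factor, and that each partial error is the quadratic form given earlier; the text implies this combination without spelling it out completely. The function names and sample values are hypothetical.

```python
def eta(target, output, eta_plus):
    """1 when target and output differ in sign (or their product is zero), else eta_plus."""
    return eta_plus if target * output > 0 else 1.0


def weighted_error(targets, outputs, eta_plus):
    """Assumed form: E = sum_j eta_j * E_j with E_j = 0.5 * (target_j - output_j)**2."""
    return sum(
        eta(t, y, eta_plus) * 0.5 * (t - y) ** 2
        for t, y in zip(targets, outputs)
    )


# Neuron 0 has the wrong sign (full weight); neuron 1 already has the right sign.
print(weighted_error([1.0, 1.0], [-1.0, 0.5], eta_plus=0.0))  # 2.0
print(weighted_error([1.0, 1.0], [-1.0, 0.5], eta_plus=1.0))  # 2.125
```

With a small correction factor, outputs that already have the correct sign contribute little to the error, so early learning concentrates on sign errors.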

In the present case, one may take $\eta_{j,L} = \theta_{j,L}$.

According to a complementary version of the invention, the strategy can also be developed at the level of each layer of the neural network. In view of the dominant role played by the input layers, learning is accelerated for the input layers and slowed for the output layers.

In the usual application of the gradient back-propagation method, the state $X_{j,l}$ of a neuron is modified by a quantity $-\Delta X_{j,l}$ that takes the corresponding gradient component $g_{j,l}$ into account. This is done by multiplying each gradient component $g_{j,l}$ by a proportionality coefficient that is the same for all layers of the neural network.

In the complementary version, the invention proposes carrying out the correction by assigning to each neuron of each layer a proportionality coefficient $\beta_{j,l}$, so that each correction $-\Delta X_{j,l}$ is proportional to $\beta_{j,l} \cdot g_{j,l}$.

The parameter $\beta_{j,l}$ is tied to the sign strategy described above by making it proportional to the parameter $\theta_{j,l}$ that is used to determine the corrections $\Delta X_{j,l}$.

Thus $\beta_{j,l}$ is proportional to $\beta_l \cdot \theta_{j,l}$, where $\beta_l$ is the same for all neurons of a given layer l. In this complementary version, each layer l is assigned a parameter $\beta_l$ that makes it possible to control the learning speed of the input layers relative to that of the output layers. The parameter $\beta_l$ therefore decreases as l increases from the input layer towards the output layer.

The method thus comprises a step of multiplying the components $\theta_{j,l} \cdot g_{j,l}$ by constants $\beta_l$ that depend on each layer, so that $-\Delta X_{j,l}$ is proportional to $\beta_l \cdot \theta_{j,l} \cdot g_{j,l}$, where $\beta_l$ decreases strictly with the layer number from the input layer to the output layer. The corrections applied to the neuron states thereby accelerate learning in the input layers and slow it in the output layers.
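The text requires only that the layer constants decrease strictly from the input layer to the output layer; it gives no values. The sketch below uses a geometric sequence purely as an assumed example, and the names `layer_constants` and `scaled_change` are hypothetical.

```python
def layer_constants(n_layers, beta_first=1.0, ratio=0.5):
    """Assumed geometric sequence beta_1 > beta_2 > ... > beta_L."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    return [beta_first * ratio ** l for l in range(n_layers)]


def scaled_change(g, theta, beta):
    """Change of a neuron state: delta_x = -beta_l * theta * g."""
    return -beta * theta * g


betas = layer_constants(3)
print(betas)  # [1.0, 0.5, 0.25]
# The same gradient component produces a smaller change in later layers.
print([scaled_change(g=1.0, theta=1.0, beta=b) for b in betas])  # [-1.0, -0.5, -0.25]
```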

〔Example〕

The invention will now be described with reference to the drawings.

Fig. 1 shows the general operation carried out by a basic neural network formed by an input layer comprising several neurons $10_1, \ldots, 10_{I(l-1)}$, which supply the respective input signals $Y_{1,l-1}, Y_{2,l-1}, \ldots, Y_{I(l-1),l-1}$ to a single output neuron whose state is $X_{j,l}$. This state is determined by computing means 11 according to:

$$X_{j,l} = \sum_i W_{ij,l} \cdot Y_{i,l-1}$$

This state $X_{j,l}$ is subjected to a non-linear function (block 12) which, after application of the function F, delivers the output potential $Y_{j,l}$:

$$Y_{j,l} = F(X_{j,l})$$

This output potential $Y_{j,l}$ can then serve as an input state for a following layer.
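The operation of Fig. 1 for a single output neuron can be sketched in a few lines. The use of tanh for F and the function names are assumptions; the computation itself is the weighted sum followed by the non-linear function.

```python
import math


def neuron_state(weights_j, y_prev):
    """X_j = sum_i W_ij * Y_i for one neuron j of layer l."""
    return sum(w * y for w, y in zip(weights_j, y_prev))


def neuron_potential(x, f=math.tanh):
    """Y_j = F(X_j); tanh is an assumed choice of F."""
    return f(x)


x = neuron_state([0.5, -0.5, 1.0], [1.0, 1.0, 0.0])
print(x, neuron_potential(x))  # 0.0 0.0
```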

In this way an arrangement of layers such as that shown in Fig. 2 is obtained, with an input layer l = 1, hidden layers l = 2, 3 and an output layer l = L. The neurons of a layer are linked exclusively to those of the following layer by synaptic coefficients $W_{ij,l}$.

The state of each neuron is determined by the above expressions, starting with layer l = 1.

To carry out learning, that is, to adapt the synaptic coefficients to a given task, examples for which the results $Y_j^p$ on the output layer are known in advance are presented at the input. For each example, the error $E^p$ is computed over all output states, and its variation with respect to small variations $\partial X_{j,l}$ of each intermediate state is then determined. The gradient components $g_{j,l}$ are given by:

$$g_{j,l} = \partial E^p / \partial X_{j,l}$$

The components $g_{j,L}$ of the output layer are computed and then back-propagated through the neural network, which yields the other components $g_{j,l}$ of the error gradient. From these components the changes $\Delta X_{j,l}$ of the states $X_{j,l}$ are deduced so that the neural network adapts to the task at hand. As described above, this operation precedes the updating of the synaptic coefficients $W_{ij,l}$.

These steps of the method are carried out in a dedicated neural network structure such as that shown in Fig. 3, or in a computer programmed to carry out the method.

The memory 30 stores the synaptic coefficient matrices $W_{ij,l}$ and the transposed matrices $W_{ji,l}$, which are initially supplied, for example, by input means 29.

The synaptic coefficients are supplied to computing means 31, which receive the input potentials $Y_{i,l-1}$ from the preceding layer and determine $X_{j,l} = \sum_i W_{ij,l} \cdot Y_{i,l-1}$.

Initially, examples $Y_{i,0}$ are presented at the input of the network as the input neuron states $Y_{i,l-1}$. These examples are supplied by an example memory 32.

A selector 33 makes this selection possible.

The example memory 32 also stores the results $Y_j^p$ that should be obtained for each example p and each output potential j.

The states $X_{j,l}$ of the output neurons are subjected to a non-linear function in member 34, which supplies, for each example, the output potentials $Y_{j,L}$ of the last layer L as delivered by the system. For the intermediate computation steps from one layer to the next, the output potentials $Y_{j,l}$ of a layer l are temporarily stored in a state memory 37 so that they can be used as input states for the following layer. Each potential $Y_{j,L}$ is compared with the intended state $Y_j^p$ in a comparator 35, which also stores all detected errors $E_j^p$ and sums them to give the error $E^p$ for each example.

The components $g_{j,L}$ of the gradient are determined by the host computer 36. To this end it receives the error $E^p$, the output potentials $Y_{j,L}$ and the intended states $Y_j^p$. The host computer 36 determines the components $g_{j,L}$ such that:

$$g_{j,L} = \theta_{j,L} \cdot (Y_{j,L} - Y_j^p) \cdot F'_{j,L}, \quad 1 \le j \le I(L)$$

where $F'_{j,L}$ is the derivative of each non-linear function of the output layer.

These components $g_{j,L}$ are supplied to the computing means 31, which make it possible to carry out gradient back-propagation: the components $g_{j,L}$ are applied at the output layer and their effect is propagated back to the input layers.

The gradient components $g_{j,l} = \partial E^p / \partial X_{j,l}$ (with $l \ne L$) are thus determined by the computing means 31 through back-propagation of the gradient of the error $E^p$.

The components $g_{j,l}$ are supplied to the host computer 36, which determines the subsequent changes $\Delta X_{j,l}$ for each neuron state. For this purpose, according to the invention, the computer 36 multiplies each component $g_{j,l}$ by its parameter $\theta_{j,l}$.

All changes $\Delta X_{j,l}$ are supplied to the update member 38, which determines the new synaptic coefficients $W_{ij,l}$ and supplies them to the memory 30.

This process is repeated to carry out the whole learning phase. In its course, the host computer 36 can apply a correction parameter $\theta_l^+$ equal or approximately equal to zero for the first iterations, and can then increase it towards the value 1 in the course of later iterations. In addition, before computing the components $g_{j,L}$ so that gradient back-propagation can be carried out in the computing means 31, the host computer 36 multiplies $E_j^p$ by the parameter $\eta_{j,L}$.

When constants $\beta_l$ specific to each layer are applied to the corrections $\theta_{j,l} \cdot g_{j,l}$ in order to determine changes $-\Delta X_{j,l}$ proportional to $\beta_l \cdot \theta_{j,l} \cdot g_{j,l}$, the host computer does this before the update member 38 updates the synaptic coefficients $W_{ij,l}$.

The layered neural network structure according to the invention thus comprises means for carrying out the learning method described above, namely:

- means for storing the synaptic coefficients;
- means for storing the examples to be learned and introducing them into the neural network;
- means for comparing, for each example, the neuron potentials obtained at the output with the results expected for that example, and for supplying an error according to the differences observed;
- means for computing the output neuron states from the input neuron potentials and for carrying out gradient back-propagation of this error, delivering the gradient components $g_{j,l}$;
- means for applying the non-linear functions at the output;
- means for computing new synaptic coefficients from the gradient components $g_{j,l}$ and from the multiplier parameters specific to the method, which make it possible to control the significance assigned to given iterations of the iteration cycle, or the significance assigned to a given layer or to given neurons of the neural network.

The system of Fig. 3 is presented as a neural network structure made up of functional blocks controlled by a host computer. The functions to be carried out may, however, all be integrated in the computer itself. In that case the invention also relates to a computer programmed to carry out the steps of the method described above.

Table 1 shows a flowchart containing the main steps of an example program according to the invention.

- Step 1 initializes $\eta^+$ and $\theta^+$ to small positive values and fixes the temperatures $T_l$. For layer l = 1, the value $T_l$ is approximately equal to the mean of the absolute values of the inputs over the examples p; for layers $l \ne 1$, the value $T_l$ is of the order of 1 (loop over l). The synaptic coefficients $W_{ij,l}$ are initialized by random draw or set to known values (loop over i and j).

- Step 2 introduces the input values $Y_{i,0}$ of an example p into the neural network.

- Step 3 computes the states $X_{j,l}$ and the output potentials $Y_{j,l}$. The computation of the states $X_{j,l}$ may include a threshold $S_{j,l}$, which can also be incorporated into the non-linear function F.

- Step 4 applies the sign strategy to the output error. To this end the product $Y_j^p \cdot Y_{j,L}$ is formed and its sign is considered. If the product is negative or zero, $\eta_{j,L}$ takes the value 1; otherwise $\eta_{j,L}$ takes the value $\eta^+$.

The error E^p in the output layer is determined and the gradient components g_{j,L} are calculated.

- Step 5 calculates the derivatives F'_{j,l} of the non-linear function. The gradient components g_{j,l-1} are then calculated by backpropagation of the gradient. The product -g_{j,l} · X_{j,l} is checked. If this product is negative or zero, θ_{j,l} is equal to 1; if it is positive, θ_{j,l} is equal to θ⁺ (where 0 ≤ θ⁺ ≤ 1). Next, β_{j,l} is calculated.

- Step 6 uses the gradient components g_{j,l} to determine the next changes ΔX_{j,l}. This step offers a choice of an example of an auto-adaptive function that allows the components g_{j,l} to influence the changes ΔX_{j,l}.

This function involves the modulus G² of the gradient g_{j,l}, the factors γ and ξ that control the amplitude of the correction, and the mean value of the terms β_{j,l} associated with the various neurons.

- Step 7 distributes the changes ΔX_{j,l} calculated for example p between the synaptic coefficients W_{ij,l} and the thresholds S_{j,l}. The distribution factor is controlled by a parameter σ_l that is applied to the norm Σ_i (Y_{i,l-1})². Step 7 represents an example of a distribution that allows the norm of the synaptic coefficients to be held quasi-constant for a given output neuron. The changes should be realized with the smallest possible variations of the thresholds and weights.

- Step 8: if the error accumulated over all examples is smaller than a predetermined value ε, learning is terminated. If this error is greater than ε, the procedure continues with the following steps.

- Step 9 slightly lowers the temperatures T_l. To this end, the initial value is multiplied by a parameter between 0 and 1.

- Step 10 readjusts the values of η⁺ and θ⁺.

- Step 11 selects another example p′ and resumes operation at step 2.

[Brief explanation of drawings]

FIG. 1 shows the mechanism of the processing performed by a structure comprising a layer of input neurons and a single output neuron.
FIG. 2 shows a structure comprising several layers: an input layer, hidden layers and an output layer.
FIG. 3 shows a neural network structure that carries out the method of the invention.

10_1, 10_2, ..., 10_I(l-1): neurons
11, 31: calculation means
12: non-linear function
29: input means
30: memory
32: example memory
33: selector
34: non-linear member
35: comparator
36: host computer
37: state memory
38: update member

Claims

1. A learning method carried out in a neural network consisting of L layers, comprising the steps of:
- determining the states X_{j,l} of the neurons j of a layer l on the basis of the output potentials Y_{i,l-1} supplied by the neurons i of the preceding layer, which are connected to them by synaptic coefficients W_{ij,l}, or on the basis of the input data Y_{i,0} for the layer l = 1, such that X_{j,l} = Σ_i W_{ij,l} · Y_{i,l-1};
- determining the potentials Y_{j,l} of the output neurons by applying a non-linear function F, such that Y_{j,l} = F(X_{j,l}),
where l is the index of the layer considered, with 1 ≤ l ≤ L, j is the index of the neuron in the output layer l, and i is the index of the neuron in the input layer l-1;
the method comprising learning phases by iteration over examples p that are successively applied to the inputs of the neural network, these learning phases comprising:
- initialization of the matrix of synaptic coefficients W_{ij,l} of the neural network;
- introduction of the input data Y_{j,0} of each example p to be learned;
- comparison of the results Y_{j,L} obtained in the output layer L with the output Y_j^p expected for the example p presented at the input, in order to define a partial error E_j^p;
- determination of the sum E^p of all partial errors E_j^p observed for each output neuron and for each example p;
- determination of the components g_{j,L} = ∂E^p/∂X_{j,L} of the gradient of the error E^p with respect to the states X_{j,L} for the output layer L;
- backpropagation of the gradient components g_{j,L} in order to determine the gradient components g_{j,l} for the other layers;
- determination of the next changes ΔX_{j,l}, with a sign opposite to that of the corresponding components g_{j,l}, for application to the neural network;
- updating of the synaptic coefficients on the basis of these changes ΔX_{j,l};
characterized in that, in order to determine the next changes ΔX_{j,l} of the neuron states, the method comprises a step of multiplying the gradient components g_{j,l} by parameters θ_{j,l}, so as to calculate changes ΔX_{j,l} proportional to -θ_{j,l} · g_{j,l}, where θ_{j,l} is determined according to the state of neuron j of layer l, with θ_{j,l} = 1 when -g_{j,l} and X_{j,l} have different signs, and θ_{j,l} = θ_l⁺ when -g_{j,l} and X_{j,l} have the same sign, where 0 ≤ θ_l⁺ ≤ 1.

2. The method according to claim 1, wherein, during the first learning iterations, θ_l⁺ is chosen for each given example to be equal to zero or approximately zero.

3. The method according to claim 2, wherein, during subsequent learning iterations, θ_l⁺ increases toward 1 for each given example.

4. The method according to claim 1, wherein the non-linear function is chosen to be only slightly non-linear at the start of learning and to approach a sign-type function at the end of learning, and wherein, to permit this choice, the synaptic coefficients converging toward a given neuron j keep the norm Σ_i (W_{ij,l})² quasi-constant.

5. The method according to claim 4, wherein the non-linear function F is of the type Y_{j,l} = tanh(X_{j,l} / T_l), where T_l is a layer-related parameter referred to as the temperature of layer l.

6. The method according to claim 5, wherein the changes in the degree of non-linearity of the non-linear function during learning are obtained by changing the parameter T_l for each layer.

7. The method according to any one of claims 1 to 6, wherein, in order to determine the gradient components g_{j,L}, the method comprises a step of determining the error E^p by first applying a correction factor η_{j,L} that depends on the neuron j of the last layer L, so that E^p = Σ_{j=1}^{I(L)} η_{j,L} · E_j^p, in order to favor the start of learning, where η_{j,L} = 1 when Y_j^p and Y_{j,L} have different signs, and η_{j,L} = η⁺ when Y_j^p and Y_{j,L} have the same sign, with 0 ≤ η⁺ ≤ 1.

8. The method according to claim 7, wherein η_{j,L} = θ_{j,L}.

9. The method according to any one of claims 1 to 8, wherein the partial error E_j^p is the squared error ½ (Y_j^p - Y_{j,L})².

10. The method according to any one of claims 1 to 9, comprising a step of multiplying the components θ_{j,l} · g_{j,l} by constants β_l that depend on each layer, so that -ΔX_{j,l} is proportional to β_l · θ_{j,l} · g_{j,l}, where β_l decreases strictly with the layer number from the input layer to the output layer, so that the corrections applied to the neuron states ensure acceleration of learning in the input layers and deceleration of learning in the output layers.

11. A neural network structure comprising means for carrying out the learning method according to any one of claims 1 to 10, comprising:
- means for storing the synaptic coefficients;
- means for storing the examples to be learned and for introducing these examples into the neural network;
- means for comparing, for each example, the neuron potentials obtained at the output with the result expected for that example, and for providing an error corresponding to the observed difference;
- means for computing the output neuron states from the input neuron potentials and for performing gradient backpropagation of the error to provide the gradient components g_{j,l};
characterized in that it comprises means for calculating new synaptic coefficients, taking into account the gradient components g_{j,l} and the multiplier parameters of the method, thereby controlling the significance assigned to a given iteration of the iterative cycle, or to a given layer or a given neuron of the neural network.

12. A computer programmed to carry out the learning method according to any one of claims 1 to 10.