JPH01271888A

JPH01271888A - Learning method for pattern recognition

Info

Publication number: JPH01271888A
Application number: JP63100846A
Authority: JP
Inventors: Masanori Mizoguchi; 正典溝口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-04-22
Filing date: 1988-04-22
Publication date: 1989-10-30

Abstract

PURPOSE: To realize a learning method for pattern recognition efficiently and without degrading the resolution of discrimination, by dynamically changing the parameter of the back propagation learning method during learning and by monitoring the outputs of units to change their weight vectors. CONSTITUTION: In the structural model of a single intermediate-layer unit, each unit calculates the inner product s of the input vector (x1-x4) and the weight vector (w1-w4) with a multiplier 10 and an accumulator 11. The output value h of each unit is then calculated by the sigmoid function 12 as h = 1/[1 + exp(-s/u0)], where u0 is a parameter. A controller 13 receives s, tests the condition |s| < ε against a previously specified set value ε (a positive value close to 0, e.g., 0.01), and outputs Δu0, the change in u0. Δu0 may be, for example, a constant, or a function of u0 such as a value proportional to u0.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a network, and to a method for determining the parameters of that network, for recognizing the categories to which input patterns belong.

(Prior Art) For pattern recognition problems there is, for example, a method called back propagation. It was introduced in 1986 by Rumelhart, Hinton, et al. in Volume 2 of "Parallel Distributed Processing", published by MIT Press. Back propagation is briefly described below.

Let an n-dimensional input pattern vector X_a be X_a = (x_a1, x_a2, x_a3, ..., x_an), and let a be the category to which X_a belongs. Define the s-dimensional category vector C_a as C_a = (c_a1, c_a2, c_a3, ..., c_as), with c_aa = 1 and c_aj = 0 (j ≠ a). Performing recognition then amounts to constructing a function Φ, C_a = Φ(X_a), that gives the category vector C_a for an arbitrary input pattern vector X_a. Back propagation uses the following hierarchical network structure.

The explanation here uses three layers as an example. The input layer has as many units as the dimension of the input pattern vector, and the output value of each unit corresponds to one component of that vector.

The intermediate layer has m units. There is no particular restriction on m, but a value comparable to or smaller than the number of categories or the dimension of the input pattern vector is normally used. The output layer has as many units as the dimension of the category vector. Let H be the vector of output values of the intermediate-layer units and Y the vector of output values of the output-layer units.

H = (h_1, h_2, h_3, ..., h_m)
Y = (y_1, y_2, y_3, ..., y_s)
H and Y are then calculated by the following equations.

H = F(W, X)
Y = F(V, H)
Here W and V are called weight (coefficient) matrices.

F is calculated as follows. When A is a matrix with m rows and n columns and B is an n-dimensional vector B = (b_1, b_2, b_3, ..., b_n), F(A, B) is the m-dimensional vector

F(A, B) = ( f(a_11*b_1 + a_12*b_2 + ... + a_1n*b_n),
            f(a_21*b_1 + a_22*b_2 + ... + a_2n*b_n),
            f(a_31*b_1 + a_32*b_2 + ... + a_3n*b_n),
            ...,
            f(a_m1*b_1 + a_m2*b_2 + ... + a_mn*b_n) )

Here f is a once-differentiable, monotonically increasing function called the sigmoid function, given with u0 as a parameter by f(x) = 1/(1 + exp(-x/u0)).
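The forward calculation H = F(W, X), Y = F(V, H) can be sketched in a few lines of Python. This is only an illustration of the formulas above: the function names, the layer sizes (n = 4, m = 3, s = 2), and all numeric values are assumptions chosen for the example and do not come from the patent.

```python
import math

def sigmoid(x, u0=1.0):
    """f(x) = 1 / (1 + exp(-x / u0)), with u0 > 0."""
    return 1.0 / (1.0 + math.exp(-x / u0))

def layer_output(A, B, u0=1.0):
    """F(A, B): apply f to the inner product of each row of A with B."""
    return [sigmoid(sum(a * b for a, b in zip(row, B)), u0) for row in A]

# Hypothetical values: 4 inputs, 3 intermediate units, 2 output units.
X = [1.0, 0.0, 0.5, -0.5]
W = [[0.1, -0.2, 0.3, 0.0],
     [0.0, 0.4, -0.1, 0.2],
     [-0.3, 0.1, 0.2, 0.1]]
V = [[0.2, -0.1, 0.3],
     [-0.2, 0.3, 0.1]]

H = layer_output(W, X)  # H = F(W, X)
Y = layer_output(V, H)  # Y = F(V, H)
```

Each row of W (or V) holds the weights of one unit, so the length of the result equals the number of rows of the matrix.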

The recognition result is judged correct if the output vector Y matches the category vector C for the input pattern. Back propagation is a method that obtains the network's parameter matrices W and V in a self-organizing manner by learning from pairs of an input pattern vector X and a category vector C.

First, once the vector H in the intermediate layer and the vector Y in the output layer have been calculated for an input X, the error signal vector Dy of the output-layer units is calculated by

Dy_i = (c_i - y_i) * y_i * (1 - y_i), (i = 1, ..., s)

and then the error vector Dh of the intermediate-layer units, defined as

Dh_i = (Dy_1*v_1i + ... + Dy_s*v_si) * h_i * (1 - h_i), (i = 1, ..., m)

is calculated.

Each element of the weight coefficient matrices is then corrected according to the following equations.

v_ij ← γ*v_ij + α*Dy_i*h_j

w_ij ← γ*w_ij + α*Dh_i*x_j

Here γ and α are parameters that control the amount of correction; γ ≈ 1 and 0 < α < 1 are often used.

When the above correction is repeated for various input patterns, V and W converge and the output Y comes to match the correct category vector C. This is the back propagation (learning) method.
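As a rough illustration, one correction step of the procedure described above (error signals Dy and Dh followed by the updates of V and W) might look like the following Python sketch. The function names, the default values α = 0.5 and γ = 1.0, the initial weights, and the single training pair are all assumptions made for the example.

```python
import math

def f(x, u0=1.0):
    """Sigmoid with parameter u0."""
    return 1.0 / (1.0 + math.exp(-x / u0))

def forward(W, V, X, u0=1.0):
    """Return (H, Y) with H = F(W, X) and Y = F(V, H)."""
    H = [f(sum(w * x for w, x in zip(row, X)), u0) for row in W]
    Y = [f(sum(v * h for v, h in zip(row, H)), u0) for row in V]
    return H, Y

def learn_step(W, V, X, C, alpha=0.5, gamma=1.0, u0=1.0):
    """Correct W and V in place for one pair (X, C)."""
    H, Y = forward(W, V, X, u0)
    # Dy_i = (c_i - y_i) * y_i * (1 - y_i)
    Dy = [(c - y) * y * (1.0 - y) for c, y in zip(C, Y)]
    # Dh_i = (Dy_1*v_1i + ... + Dy_s*v_si) * h_i * (1 - h_i), using V before its update
    Dh = [sum(Dy[k] * V[k][i] for k in range(len(V))) * H[i] * (1.0 - H[i])
          for i in range(len(H))]
    for i, row in enumerate(V):      # v_ij <- gamma*v_ij + alpha*Dy_i*h_j
        for j in range(len(row)):
            row[j] = gamma * row[j] + alpha * Dy[i] * H[j]
    for i, row in enumerate(W):      # w_ij <- gamma*w_ij + alpha*Dh_i*x_j
        for j in range(len(row)):
            row[j] = gamma * row[j] + alpha * Dh[i] * X[j]

def squared_error(C, Y):
    return sum((c - y) ** 2 for c, y in zip(C, Y))

# Hypothetical network and a single training pair.
W = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.4, -0.1, 0.2], [-0.3, 0.1, 0.2, 0.1]]
V = [[0.2, -0.1, 0.3], [-0.2, 0.3, 0.1]]
X, C = [1.0, 0.0, 0.5, -0.5], [1.0, 0.0]

before = squared_error(C, forward(W, V, X)[1])
for _ in range(200):
    learn_step(W, V, X, C)
after = squared_error(C, forward(W, V, X)[1])
```

In this toy run the squared error between Y and C is smaller after the repeated corrections than before, which is the convergence behaviour the text describes. A real training run would cycle through many pattern pairs rather than one.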

(Problems to be Solved by the Invention) Constructing a discriminant function by back propagation has the following problems.

The learning system has the parameters u0, α, and γ. These should be set to optimal values, but no method for finding those values has yet been found. For γ, it is known empirically that 1.0, or a slightly smaller value such as 0.99, works well.

However, the values of u0 and α are closely related to the time learning takes to converge (the convergence speed). The larger u0 is and the smaller α is, the lower the convergence speed, so that learning takes an enormous amount of time.

In addition, u0 is a parameter of the resolution with which patterns are discriminated. There should be an optimal value that depends on the statistical properties of the set of patterns to be discriminated and on the number of intermediate-layer units, but no method for determining it has yet been found.

An object of the present invention is to provide a method that, in back propagation learning, dynamically changes the parameter u0 during learning and changes weight vectors by monitoring unit outputs, thereby speeding up the convergence of learning without lowering the resolution of discrimination, and adaptively increasing the number of intermediate-layer units so that learning proceeds efficiently.

(Means for Solving the Problems) The first invention of the present application is a pattern recognition learning method for a system in which units are connected to one another by directional connections having excitatory and inhibitory weight values, and in which each unit outputs the value obtained by applying a sigmoid function to the sum of products x of the output values of the units from which it receives input connections and the weight values corresponding to those connections. The system consists of three kinds of unit groups: an input unit group that receives input from the outside, an output unit group that outputs recognition results to the outside, and an intermediate unit group that has no direct input or output with the outside and is connected to the input unit group, to the output unit group, or among its own units. Equation (1),

f(x) = 1/(1 + exp(-x/u))   (1)

is used as the sigmoid function of a unit for the sum of products x, the weight values are initialized randomly, and the weight values are learned in a self-organizing manner from input patterns and teacher patterns. The method is characterized in that u is provided for each unit, and u is increased when the absolute value of the sum of products x is smaller than a previously specified value and decreased otherwise.

The second invention of the present application is characterized in that, in the first invention, the output value of each unit is monitored at every pattern input; as long as the difference from the output value for the immediately preceding pattern is no greater than a previously specified set value, the absolute value of the output value minus 0.5 is accumulated to calculate a fixation coefficient; and when the fixation coefficient exceeds a second set value, all or part of the weight values of that unit are randomly initialized again.

The third invention of the present application is characterized in that, in the first invention, the output value of each unit is monitored at every pattern input; as long as the difference from the output value for the immediately preceding pattern is no greater than a previously specified set value, the absolute value of the output value minus 0.5 is accumulated to calculate a fixation coefficient; and when the fixation coefficient exceeds a second set value, a plurality of mutually orthogonal weight vectors are created by setting parts of the weight values of that unit to zero, and the unit is replaced by a group of units having those weight vectors.

(Embodiment) The present invention is described next with reference to the drawings.

An embodiment in which the present invention is applied to a pattern recognition system is described below.

FIG. 1 shows the network structure of units to which three-layer back propagation learning can be applied. For simplicity, the input layer has 4 units, the intermediate layer 3 units, and the output layer 2 units. The input-layer units are denoted x_1, x_2, x_3, x_4 in order, the intermediate-layer units h_1, h_2, h_3, and the output-layer units y_1, y_2. The weight of the connection from input-layer unit x_j to intermediate-layer unit h_i is denoted w_ij, and similarly the weight of the connection from intermediate-layer unit h_j to output-layer unit y_k is denoted v_kj. The teacher signal for output unit y_k is denoted c_k.

Next, a model of the structure of one intermediate-layer unit in FIG. 1 is shown and the present invention is described in detail. Exactly the same unit can be used for the output-layer units and for any additional intermediate layers. For the input-layer units, however, a structure that simply holds and outputs a value is sufficient, since in practice they only output to the intermediate-layer units.

FIG. 3 shows the structural model of a unit used in conventional back propagation. In the following, the input values are treated as elements of a vector.

In each unit, the inner product s of the input vector (x_1, x_2, x_3, x_4) and the weight vector (w_1, w_2, w_3, w_4) is first calculated by the multiplier 10 and the accumulator 11. A weight vector is provided for each unit. It is also common to add a constant input at this point: a fifth input x_5 = 1 (constant 1) is added together with a corresponding weight w_5, and this can be handled in exactly the same way, as if the dimension had increased by one.

The output value h of this unit is calculated by the sigmoid function 12 as h = 1/(1 + exp(-s/u0)). Conventionally, u0 has been the same value for the entire network.

FIG. 2 shows the first invention as a structural model of one intermediate-layer unit. As this model shows, the first invention sets the parameter of the sigmoid function independently for each unit and changes its value adaptively using the inner product s. Here u0 is increased if the value of s is sufficiently close to zero and decreased otherwise. As the formula shows, making the sigmoid parameter u0 (a positive value) smaller makes the function approach the so-called step function, or threshold function. Making u0 larger makes it nearly linear (h proportional to s) near s = 0.
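The effect of u0 on the shape of the sigmoid can be checked numerically. The short Python sketch below is only an illustration; the sample value s = 0.1 and the two settings u0 = 0.01 and u0 = 10 are assumptions picked to make the contrast visible. The comparison value 0.5 + s/(4*u0) is the first-order expansion of the sigmoid around s = 0.

```python
import math

def unit_output(s, u0):
    """h = 1 / (1 + exp(-s / u0)) for a positive u0."""
    return 1.0 / (1.0 + math.exp(-s / u0))

s = 0.1
h_sharp = unit_output(s, 0.01)       # small u0: output jumps almost to 1 just above s = 0
h_sharp_neg = unit_output(-s, 0.01)  # and almost to 0 just below s = 0
h_flat = unit_output(s, 10.0)        # large u0: output stays close to 0.5
h_linear = 0.5 + s / (4.0 * 10.0)    # first-order (linear) approximation near s = 0
```

With the small u0 the two outputs sit near 1 and 0, like a threshold function, while with the large u0 the output is nearly indistinguishable from the linear approximation.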

Accordingly, when discrimination between patterns is good, in other words when the absolute value of s is sufficiently large, u0 is made smaller so that patterns with noise or deformation can be tolerated. Conversely, when there are similar patterns the absolute value of s becomes small, so u0 is made larger to make h proportional to s, so that small differences in s can be exploited. In the present invention, therefore, the controller 13 receives s, tests the condition |s| < ε against a previously specified set value ε (a positive value close to 0, e.g., 0.01), and outputs Δu0, the change in u0. Δu0 may be, for example, a constant, or it may be a function of u0, such as a value proportional to u0. However, u0 must remain positive.
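A minimal sketch of what the controller 13 might compute is given below in Python. It assumes the option in which Δu0 is proportional to u0; the function name, the proportionality factor `rate`, and the lower bound `u0_min` used to keep u0 positive are assumptions for illustration and are not specified in the patent.

```python
def update_u0(u0, s, eps=0.01, rate=0.1, u0_min=1e-3):
    """Return the per-unit parameter u0 after observing the inner product s.

    u0 is increased when |s| < eps and decreased otherwise.
    rate and u0_min are hypothetical values chosen for this sketch.
    """
    delta_u0 = rate * u0          # delta-u0 proportional to u0
    if abs(s) < eps:
        u0 = u0 + delta_u0        # s close to zero: make the sigmoid flatter
    else:
        u0 = u0 - delta_u0        # s well away from zero: make it sharper
    return max(u0, u0_min)        # keep u0 a positive value

u0_near = update_u0(1.0, 0.001)   # |s| below eps, so u0 grows
u0_far = update_u0(1.0, 0.5)      # |s| above eps, so u0 shrinks
```

Because each unit keeps its own u0, a function like this would be called once per unit for every pattern presented during learning.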

FIG. 4 shows the second invention as a structural model of one intermediate-layer unit. It is one way of monitoring the output value h, finding units that always output the same fixed value, and putting those units to effective use.

That is, the output value of each unit is monitored at every pattern input. As long as the difference from the output value for the immediately preceding pattern is no greater than a previously specified set value, the absolute value of the output value minus 0.5 is accumulated to calculate a fixation coefficient. When the fixation coefficient exceeds a second set value, all or part of the weight values of that unit are randomly initialized again.

In FIG. 4, the fixation coefficient calculation circuit 14 calculates the fixation coefficient q according to, for example, the following equation, where t denotes the order in which the learning patterns are presented.

q(t) = (q(t-1) + |h(t) - 0.5|) * δ(h(t) - h(t-1))

The function δ can be, for example, the so-called delta function. In that case accumulation proceeds only as long as the output for the immediately preceding pattern exactly matches the output for the current pattern. δ may also be a function that, for a first set value A, equals 1 when |h(t) - h(t-1)| ≤ A and 0 otherwise; the delta function is then included as the case A = 0.

Next, the re-initialization circuit 15 re-initializes the unit's weight vector W when the condition q(t) > B holds for a second set value B. It is sufficient to set B to, for example, 0.5*N, where N is the number of patterns in one set when the learning patterns for all categories are taken as one set.
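The fixation coefficient and the re-initialization test can be sketched as follows in Python. The function names, the uniform range used for the new random weights, the choice to reset q to 0 after re-initialization, and the example output sequences are all assumptions made for illustration.

```python
import random

def delta(diff, A=0.0):
    """1 when |diff| <= A, otherwise 0; A = 0 gives the delta function."""
    return 1.0 if abs(diff) <= A else 0.0

def update_fixation(q_prev, h_now, h_prev, A=0.0):
    """q(t) = (q(t-1) + |h(t) - 0.5|) * delta(h(t) - h(t-1))."""
    return (q_prev + abs(h_now - 0.5)) * delta(h_now - h_prev, A)

def maybe_reinitialize(weights, q, B, rng, scale=0.5):
    """Draw new random weights when q > B (resetting q to 0 is an assumption)."""
    if q > B:
        return [rng.uniform(-scale, scale) for _ in weights], 0.0
    return weights, q

def run(outputs, A=0.0):
    """Accumulate q over the successive outputs of one unit."""
    q, h_prev = 0.0, outputs[0]
    for h in outputs[1:]:
        q = update_fixation(q, h, h_prev, A)
        h_prev = h
    return q

N = 4                                         # hypothetical number of patterns in one set
B = 0.5 * N                                   # second set value suggested in the text
q_stuck = run([0.98] * 6)                     # output never changes, so q keeps growing
q_live = run([0.2, 0.8, 0.2, 0.8, 0.2, 0.8])  # output changes, so q is reset each time
new_w, q_after = maybe_reinitialize([0.3, -0.7, 0.2, 0.5], q_stuck, B, random.Random(0))
```

Since |h - 0.5| is at most 0.5, a threshold of 0.5*N can only be exceeded after the unit has stayed fixed for more than one full pass over the N patterns.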

For a unit to contribute to pattern discrimination, its output must vary with the input pattern. With the present invention, a unit whose output has happened to become fixed and therefore useless, whether because of poor initialization or because learning has converged, is put to effective use again.

Next, the third invention is described as another way of finding a unit that outputs a fixed value over a long period during learning and putting that unit to effective use.

The third invention is the same as the second up to the judgment step: the output value of each unit is monitored at every pattern input; as long as the difference from the output value for the immediately preceding pattern is no greater than a previously specified set value, the absolute value of the output value minus 0.5 is accumulated to calculate the fixation coefficient q; and it is judged whether the fixation coefficient has exceeded a second set value.

In the third invention, the set value against which the fixation coefficient is judged is made smaller than in the second invention.

Rather than re-initializing randomly, the third invention sets parts of the unit's weight values to zero to create a plurality of mutually orthogonal weight vectors, and replaces the unit with a group of units having those weight vectors.

Next, the replacement of the unit is described with reference to FIG. 5. FIG. 5 shows an example of creating two weight vectors W' = (w'_1, w'_2, w'_3, w'_4) and W'' = (w''_1, w''_2, w''_3, w''_4) from a four-dimensional weight vector W = (w_1, w_2, w_3, w_4). Here W' and W'' are orthogonal, that is, their inner product is 0.

For example, when there are too few intermediate-layer units for discrimination, the output of the intermediate layer ceases to change even when the input patterns differ. This must be detected and the number of intermediate-layer units increased. In that case the learned weight vector has captured what is common to the patterns to be discriminated, so if that weight vector is split into orthogonal vectors, the learning results obtained so far can be used to add intermediate-layer units efficiently.

The present invention obtains orthogonal vectors by setting different parts of the vector to zero, so orthogonal vectors can be obtained easily. Uncorrelated random components may of course be added to the zeroed elements.
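One way to split a weight vector into orthogonal vectors by zeroing different parts is sketched below in Python. The patent does not state here which elements are zeroed in FIG. 5, so the choice of contiguous blocks, the function names, and the example weights are assumptions for illustration only.

```python
def split_orthogonal(w, parts=2):
    """Split w into `parts` vectors whose non-zero positions do not overlap.

    Each result keeps one contiguous block of w and is zero elsewhere
    (the block layout is an assumption of this sketch).
    """
    n = len(w)
    result = []
    for p in range(parts):
        lo = p * n // parts
        hi = (p + 1) * n // parts
        result.append([w[i] if lo <= i < hi else 0.0 for i in range(n)])
    return result

def inner(a, b):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

W = [0.3, -0.7, 0.2, 0.5]            # hypothetical learned weight vector
W1, W2 = split_orthogonal(W, parts=2)
```

Because the non-zero positions of the resulting vectors never overlap, their inner product is zero whatever the values in W, and adding them together gives back the original W. How the outgoing connections of the new units are set is not covered by this sketch.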

(Effects of the Invention) As described above, according to the present invention, when pattern recognition is performed using back propagation learning, learning can be accelerated without sacrificing recognition accuracy, and weights can be learned efficiently through effective use of the intermediate-layer units. The first, second, and third inventions can also be used in combination.

[Brief Description of the Drawings]

FIG. 1 shows the network structure of units to which three-layer back propagation learning can be applied. FIG. 2 shows the first invention as a structural model of one intermediate-layer unit. FIG. 3 shows the structural model of a unit used in conventional back propagation. FIG. 4 shows the second invention as a structural model of one intermediate-layer unit. FIG. 5 illustrates the replacement of units in the third invention. In the figures: 10, multiplier; 11, accumulator; 12, sigmoid function; 13, controller; 14, fixation coefficient calculation circuit; 15, re-initialization circuit.

Claims

(1) A pattern recognition learning method for a system in which units are connected to one another by directional connections having excitatory and inhibitory weight values, each unit outputting the value obtained by applying a sigmoid function to the sum of products x of the output values of the units from which it receives input connections and the weight values corresponding to those connections, the system consisting of three kinds of unit groups, namely an input unit group that receives input from the outside, an output unit group that outputs recognition results to the outside, and an intermediate unit group that has no direct input or output with the outside and is connected to the input unit group, to the output unit group, or among its own units, wherein f(x) = 1/(1 + exp(-x/u)) is used as the sigmoid function f(x) of a unit for the sum of products x, the weight values are initialized randomly, and the weight values are learned in a self-organizing manner from input patterns and teacher patterns, the method being characterized in that u is provided for each unit, and u is increased when the absolute value of the sum of products x is smaller than a previously specified value and is decreased otherwise.

(2) The pattern recognition learning method of claim (1), characterized in that the output value of each unit is monitored at every pattern input; as long as the difference from the output value for the immediately preceding pattern is no greater than a previously specified set value, the absolute value of the output value minus 0.5 is accumulated to calculate a fixation coefficient; and when the fixation coefficient exceeds a second set value, all or part of the weight values of that unit are randomly initialized again.

(3) The pattern recognition learning method of claim (1), characterized in that the output value of each unit is monitored at every pattern input; as long as the difference from the output value for the immediately preceding pattern is no greater than a previously specified set value, the absolute value of the output value minus 0.5 is accumulated to calculate a fixation coefficient; and when the fixation coefficient exceeds a second set value, a plurality of mutually orthogonal weight vectors are created by setting parts of the weight values of that unit to zero, and the unit is replaced by a group of units having those weight vectors.