JPH03252782A

JPH03252782A - Pattern learning device

Info

Publication number: JPH03252782A
Application number: JP2049257A
Authority: JP
Inventors: Seiji Yoshimoto; 誠司吉本
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-03-02
Filing date: 1990-03-02
Publication date: 1991-11-12

Abstract

PURPOSE: To obtain a learning result with high generalization ability by training so as to compress the distribution of the learning patterns of each category to a single point. CONSTITUTION: By training all parameters of a neural network so as to reduce the distance in the output layer between pairs of learning patterns that belong to the same category, the distribution of each category is compressed to a single point, which improves recognition not only of the learning patterns but also of unlearned patterns distributed between them. Training so as to increase the distance in the output layer between learning patterns that belong to different categories improves the separation between categories. Both can be achieved, for example, by training so as to minimize a loss function. High generalization ability is thereby obtained regardless of the number of intermediate layers.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a pattern learning device using a neural network.

[Conventional technology and issues]

One of the functions expected of neural networks is generalization ability: the ability to recognize unlearned patterns after learning a number of patterns. Generalization ability can be described as the ability to estimate the continuous pattern distribution itself from a finite number of learning patterns given discretely in pattern space. The capacity of a neural network to represent functions grows with the number of units in the intermediate layer, so the ability to memorize learning patterns, the so-called learning ability, also grows with the number of intermediate-layer units. Whether adding intermediate-layer units improves generalization ability, however, depends on the learning method. When there are few intermediate-layer units, some generalization ability can be expected from the constraint imposed by the small number of parameters representing the function, that is, the small number of degrees of freedom. As the number of intermediate-layer units increases, the degrees of freedom increase, so the learning method itself must be designed to raise generalization ability.

Conventionally, error backpropagation has been the most widely used learning method for multilayer feedforward neural networks. It is described in detail in "Parallel Distributed Processing", Vol. 1 (D. E. Rumelhart, G. E. Hinton and R. J. Williams, in "Parallel Distributed Processing", vol. 1, eds. J. L. McClelland, D. E. Rumelhart and The PDP Research Group, MIT Press, Cambridge, MA, 1986). Error backpropagation modifies the network parameters so as to reduce the distance between the teacher signal for the output layer and the actual output signal, but with this method generalization ability is known to decrease when the number of intermediate-layer units is made too large. This is because backpropagation learning is based solely on reducing the distance between teacher signal and output signal for each individual learning pattern, and includes no measure for raising generalization ability. To raise generalization ability, a learning method is needed that is effective at estimating a continuous pattern distribution from discretely given learning data.

Furthermore, when a feedforward neural network is used for pattern recognition, each layer can be regarded as a representation space for pattern transformations, and the aim of learning is to determine the network parameters so that the output layer becomes a representation space invariant to pattern transformations, that is, so that the output does not change when the input pattern is transformed. Pattern transformations are combinations of transformations that do not depend on the individual pattern, such as affine transformations, and transformations that do, such as local distortions, and it is more economical to learn the invariance to each transformation in a separate part of the network. Conventional error backpropagation, however, learns from a teacher signal for the output layer, so a network cannot be trained part by part.

The object of the first invention is to provide a pattern learning device that obtains high generalization ability regardless of the number of intermediate layers, by using a learning method that emphasizes raising generalization ability.

The object of the second invention is to provide a pattern learning device that shortens the time required for learning, by using a learning method that runs with O(N) computation.

The object of the third invention is to provide a pattern learning device in which each partial network can be assigned its own function, by exploiting the fact that the learning method of the first or second invention can train partial neural networks.

[Means to solve the problem]

The first invention is a pattern learning device using a feedforward neural network, characterized by having means that, when two learning patterns belong to the same category, modifies the learnable parameters of the neural network so as to reduce the distance between the two learning patterns in the output layer and, when the two learning patterns belong to different categories, modifies the learnable parameters so as to increase the distance between the two learning patterns in the output layer.

The second invention is the pattern learning device of the first invention, characterized in that the means for modifying the parameters modifies the distance between two learning patterns so as to minimize a potential expressed as an even-order polynomial of the inter-pattern distance.

The third invention is a pattern learning device using a feedforward neural network consisting of N partial networks, characterized in that each partial network is constituted by the pattern learning device of the first or second invention.

[Operation]

In the pattern learning device of the first invention, all parameters of the neural network are trained so as to reduce the distance in the output layer between pairs of learning patterns belonging to the same category. This compresses the distribution of each category to a single point and improves recognition not only of the learning patterns but also of unlearned patterns distributed between them. In addition, training so as to increase the distance in the output layer between learning patterns belonging to different categories improves the separation between categories. This can be achieved, for example, by training so as to minimize the following loss function.

$$V = \sum_{c \in C} \; \sum_{x(1),\, x(2) \in c} V_{att}(|x(1) - x(2)|) \; + \sum_{c \neq c'} \; \sum_{x(1) \in c,\; x(2) \in c'} V_{rep}(|x(1) - x(2)|) \qquad (1)$$

(Equation (1) is illegible in the source; the form above is reconstructed from the description that follows.)

Here $x(1)$ and $x(2)$ are $N^{(out)}$-dimensional vectors giving the output of the output layer when learning data are input ($N^{(out)}$ is the number of output-layer units), and $C$ denotes the set of all categories. $V_{att}$ in the first term of equation (1) is a potential that exerts an attractive force between pairs of patterns belonging to the same category, and $V_{rep}$ in the second term is a potential that exerts a repulsive force between pairs of patterns belonging to different categories. By choosing as the loss function a potential with attraction between same-category patterns and repulsion between different-category patterns, and training so as to minimize it, the pattern distribution of each category is compressed to a single point and the separation between patterns of different categories is improved.
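The attraction and repulsion terms described above can be sketched in a few lines of Python. This is a minimal illustration and not the patent's implementation: the function names, the choice to count each unordered pair once, and the two example potentials are assumptions made for this sketch.

```python
import math

def distance(u, v):
    """Euclidean distance between two output vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_loss(outputs, labels, v_att, v_rep):
    """Sum v_att over same-category pairs and v_rep over different-category pairs.

    Each unordered pair is counted once (an assumption of this sketch).
    """
    total = 0.0
    n = len(outputs)
    for i in range(n):
        for j in range(i + 1, n):
            r = distance(outputs[i], outputs[j])
            if labels[i] == labels[j]:
                total += v_att(r)   # attraction: penalize same-category spread
            else:
                total += v_rep(r)   # repulsion: penalize different-category proximity
    return total

# Hypothetical potentials chosen only for illustration.
def v_att(r):
    return r * r

def v_rep(r):
    return math.exp(-r)

labels = ["A", "A", "B", "B"]
tight = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]   # each category at one point
loose = [[0.0, 0.0], [0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]   # categories overlap
loss_tight = pairwise_loss(tight, labels, v_att, v_rep)
loss_loose = pairwise_loss(loose, labels, v_att, v_rep)
```

In this toy case the configuration with each category collapsed to one point has the lower loss, which is the behaviour the text attributes to minimizing the potential.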

In the pattern learning device of the second invention, the potentials $V_{att}(r)$ and $V_{rep}(r)$ in the evaluation function (1) are taken to be even-order polynomials in $r$. In this case, relations such as equation (7) make it unnecessary to compute the sum over all data pairs directly (equations (2) to (7) are illegible in the source). Here $N_{c}$ denotes the number of data belonging to category $c$.

This reduces the computation required for learning from O(N^2) to O(N).
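The reduction from a pairwise sum to a single pass can be illustrated with the standard identity for squared distances, $\sum_{i<j} |x_i - x_j|^2 = N \sum_i |x_i|^2 - |\sum_i x_i|^2$. The sketch below checks that identity numerically; it is offered as an illustration of why an even-order (here quadratic) potential permits an O(N) evaluation, not as the patent's own relations, and the function names are hypothetical.

```python
def pairwise_sq_sum(points):
    """O(N^2): sum of squared distances over all unordered pairs."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return total

def moment_sq_sum(points):
    """O(N): the same quantity from N, the vector sum and the sum of squared norms."""
    n = len(points)
    dim = len(points[0])
    vec_sum = [0.0] * dim      # first moment: sum of the vectors
    sq_norm_sum = 0.0          # second moment: sum of squared norms
    for p in points:
        for k in range(dim):
            vec_sum[k] += p[k]
        sq_norm_sum += sum(c * c for c in p)
    return n * sq_norm_sum - sum(c * c for c in vec_sum)

pts = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0], [1.0, 1.0]]
slow = pairwise_sq_sum(pts)
fast = moment_sq_sum(pts)
```

Both functions return 67.0 for these four points; only the second avoids the double loop, so its cost grows linearly with the number of data.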

In the pattern learning device of the third invention, the neural network is divided into partial networks and the learning method of the first or second invention is applied to each partial network. Unlike error backpropagation, the learning method of the first or second invention requires no teacher signal for the output layer, so each partial network can be trained in the same way as an independent network. The learning device of the third invention can therefore be trained so that each partial network takes on its own function.

[Example]

FIG. 1 is a diagram showing the configuration of an embodiment of the first invention.

The present invention is applicable to any feedforward neural network, but this embodiment is described using a neural network with two layers, not counting the input layer.

The operation performed by each neuron is given, for example, by equation (14) (illegible in the source). As the attractive potential $V_{att}$ and the repulsive potential $V_{rep}$ in the evaluation function (1), the following can be used, for example:

$$V_{att}(r) = a\, r^{2} \qquad (15)$$

$$V_{rep}(r) = b \exp(-r / r_{0}) \qquad (16)$$

where $a$, $b$ and $r_{0}$ are positive constants. When steepest descent is used as the learning method, the parameter corrections $\delta W_{ij}^{(s)}$ and $\delta h_{i}^{(s)}$ for each data pair are given by equations (17) and (18) for the output layer ($s = 2$) and by equations (19) and (20) for the intermediate layer ($s = 1$), combined with equation (21) for a same-category pair or equation (22) for a different-category pair, with $i = 1, \dots, N^{(s)}$ and $s = 1, 2$ (equations (17) to (22) are illegible in the source). Here $N^{(s)}$, $x^{(s)}$ and $h^{(s)}$ are the number of units, the output and the threshold of the $s$-th layer, and $W_{ij}^{(s)}$ is the connection weight between the $(s-1)$-th layer and the $s$-th layer, the input layer being layer 0. The distance $r$ in the evaluation function (1) is

$$r = |x^{(2)}(1) - x^{(2)}(2)| \qquad (23)$$

and $\Delta$ denotes the difference between the two members of a data pair.
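A short sketch of the two example potentials may help show why one attracts and the other repels. It assumes the repulsive potential decays as exp(-r / r_0), since the sign is not legible in the source, and the constant values are hypothetical. The derivative functions are added for this sketch only: under steepest descent a positive slope pulls a pair together and a negative slope pushes it apart.

```python
import math

A, B, R0 = 1.0, 1.0, 0.5   # hypothetical values for the positive constants a, b, r_0

def v_att(r):
    """Attractive potential of equation (15): grows with distance."""
    return A * r * r

def v_rep(r):
    """Repulsive potential of equation (16), assumed to decay as exp(-r / r_0)."""
    return B * math.exp(-r / R0)

def dv_att(r):
    """Slope of v_att; positive, so reducing r lowers the potential."""
    return 2.0 * A * r

def dv_rep(r):
    """Slope of v_rep; negative, so increasing r lowers the potential."""
    return -(B / R0) * math.exp(-r / R0)

def numeric_derivative(f, r, eps=1e-6):
    """Central finite difference, used only to check the analytic slopes."""
    return (f(r + eps) - f(r - eps)) / (2.0 * eps)

r_test = 0.7
att_error = abs(numeric_derivative(v_att, r_test) - dv_att(r_test))
rep_error = abs(numeric_derivative(v_rep, r_test) - dv_rep(r_test))
```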

In FIG. 1, the input layer storage unit 12 stores two $N^{(0)}$-dimensional vectors $x^{(0)}(1)$ and $x^{(0)}(2)$, the intermediate layer storage unit 14 stores two $N^{(1)}$-dimensional vectors $x^{(1)}(1)$ and $x^{(1)}(2)$, and the output layer storage unit 16 stores two $N^{(2)}$-dimensional vectors $x^{(2)}(1)$ and $x^{(2)}(2)$ together with the potential $V$. The intermediate layer parameter storage unit 17 stores the intermediate-layer weights $W_{ij}^{(1)}$, thresholds $h_{i}^{(1)}$ and their corrections $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$; the output layer parameter storage unit 18 stores the output-layer weights $W_{ij}^{(2)}$, thresholds $h_{i}^{(2)}$ and their corrections $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$. Learning is executed by the following procedure.

(a) Initialize $W_{ij}^{(1)}$ and $h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 17 and $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ stored in the output layer parameter storage unit 18, for example with random numbers.

(b) Initialize to 0 the corrections $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 17, the corrections $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ stored in the output layer parameter storage unit 18, and $V$ stored in the output layer storage unit 16.

(c) Input one pair of learning data, given as $N^{(0)}$-dimensional vectors, from the input terminal 11 and store them in the input layer storage unit 12 as $x^{(0)}(1)$ and $x^{(0)}(2)$.

(d) In the intermediate layer calculation unit 13, calculate $x^{(1)}(1)$ and $x^{(1)}(2)$ according to equation (14) from $x^{(0)}(1)$ and $x^{(0)}(2)$ stored in the input layer storage unit 12 and $W_{ij}^{(1)}$ and $h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 17, and store them in the intermediate layer storage unit 14. In the output layer calculation unit 15, calculate $x^{(2)}(1)$ and $x^{(2)}(2)$ according to equation (14) from $x^{(1)}(1)$ and $x^{(1)}(2)$ stored in the intermediate layer storage unit 14 and $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ stored in the output layer parameter storage unit 18, and store them in the output layer storage unit 16.

(e) In the intermediate layer calculation unit 13, calculate $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ from $x^{(0)}(1)$ and $x^{(0)}(2)$ stored in the input layer storage unit 12, $x^{(1)}(1)$ and $x^{(1)}(2)$ stored in the intermediate layer storage unit 14, $x^{(2)}(1)$ and $x^{(2)}(2)$ stored in the output layer storage unit 16, and $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ stored in the output layer parameter storage unit 18, using equations (19), (20) and (21) if the data pair belongs to the same category and equations (19), (20) and (22) if it belongs to different categories, and add them to $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 17. In the output layer calculation unit 15, calculate $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ from the values stored in the intermediate layer storage unit 14 and the output layer storage unit 16, using equations (17), (18) and (21) if the data pair belongs to the same category and equations (17), (18) and (22) if it belongs to different categories, and add them to $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ stored in the output layer parameter storage unit 18. Further, from $x^{(2)}(1)$ and $x^{(2)}(2)$ stored in the output layer storage unit 16, calculate the potential using equations (15) and (23) if the data pair belongs to the same category and equations (16) and (23) if it belongs to different categories, and add it to $V$ stored in the output layer storage unit 16.

(f) If unprocessed learning data pairs remain, repeat steps (c) to (e). Otherwise continue from step (g).

(g) If $V$ stored in the output layer storage unit 16 is smaller than the threshold $V_{0}$, terminate. Otherwise continue from step (h).

(h) Add $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 17 to $W_{ij}^{(1)}$ and $h_{i}^{(1)}$ respectively. Likewise add $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ stored in the output layer parameter storage unit 18 to $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ respectively.

(i) Repeat from step (b).
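The control flow of steps (a) to (i) can be sketched as follows. To stay short, the sketch makes several assumptions that are not in the source: it trains a single sigmoid layer rather than the two-layer network of the embodiment, it assumes units of the form sigmoid(sum_j W_ij x_j - h_i), it assumes the repulsive potential b exp(-r / r_0), and it derives the gradient by the chain rule rather than from the patent's own correction formulas. All names and constants are hypothetical.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(W, h, x):
    """One sigmoid layer: y_i = sigmoid(sum_j W[i][j] * x[j] - h[i]) (assumed form)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) - hi) for row, hi in zip(W, h)]

def epoch(W, h, data, a=1.0, b=1.0, r0=1.0):
    """Steps (b) to (f): accumulate the potential V and its gradient over all pairs."""
    n_out, n_in = len(W), len(W[0])
    gW = [[0.0] * n_in for _ in range(n_out)]        # step (b): zero the accumulators
    gh = [0.0] * n_out
    V = 0.0
    for p in range(len(data)):
        for q in range(p + 1, len(data)):            # steps (c) and (f): every pair once
            (x1, c1), (x2, c2) = data[p], data[q]
            y1, y2 = forward(W, h, x1), forward(W, h, x2)   # step (d)
            r = math.sqrt(sum((u - v) ** 2 for u, v in zip(y1, y2)))
            if c1 == c2:
                V += a * r * r
                coef = 2.0 * a                       # V'(r) / r for a * r^2
            else:
                V += b * math.exp(-r / r0)
                coef = -(b / r0) * math.exp(-r / r0) / r if r > 0.0 else 0.0
            for i in range(n_out):                   # step (e): chain rule through the sigmoid
                s1 = y1[i] * (1.0 - y1[i])
                s2 = y2[i] * (1.0 - y2[i])
                diff = coef * (y1[i] - y2[i])
                for j in range(n_in):
                    gW[i][j] += diff * (s1 * x1[j] - s2 * x2[j])
                gh[i] += -diff * (s1 - s2)
    return V, gW, gh

def train(data, n_out=2, eps=0.05, v0=1e-3, max_epochs=200, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]  # step (a)
    h = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    history = []
    for _ in range(max_epochs):                      # step (i): repeat from (b)
        V, gW, gh = epoch(W, h, data)
        history.append(V)
        if V < v0:                                   # step (g): stop below the threshold
            break
        for i in range(n_out):                       # step (h): apply the accumulated update
            for j in range(n_in):
                W[i][j] -= eps * gW[i][j]
            h[i] -= eps * gh[i]
    return W, h, history

data = [([0.0, 0.1], "A"), ([0.1, 0.0], "A"), ([1.0, 0.9], "B"), ([0.9, 1.0], "B")]
W, h, history = train(data)
```

Because corrections are accumulated over all pairs before being applied, this is batch steepest descent, matching the order of steps (b) to (h). The `max_epochs` cap is an addition of this sketch; the procedure in the text stops only on the threshold test.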

By the above procedure, in the $N^{(2)}$-dimensional output vector space, the distribution of data belonging to the same category is compressed to a single point and the distributions of data belonging to different categories are separated.
The data distributions belonging to different categories are separated.

Next, an embodiment of the second invention is described. FIG. 2 is a diagram showing the configuration of this embodiment. As before, the description uses a neural network with two layers, not counting the input layer, in which the operation performed by each neuron is given by equation (14).

As the attractive potential $V_{att}$ and the repulsive potential $V_{rep}$ in the evaluation function (1), quadratic potentials can be used, for example:

$$V_{att}(r) \propto r^{2} \qquad (24)$$

$$V_{rep}(r) \propto -r^{2} \qquad (25)$$

(the coefficients in equations (24) and (25) are illegible in the source). The potential $V$ is then given by equation (26) in terms of $N_{c}$, $X_{1c}$ and $S_{2c}$ (equation (26) is illegible in the source). Here $N_{c}$ is the number of learning data belonging to category $c$, and $N$, $X_{1c}$, $X_{1}$, $S_{2c}$ and $S_{2}$ are given by equations (8) to (10). When steepest descent is used as the learning method, the parameter corrections for each learning datum are

$$\delta W_{ij}^{(s)} = -\varepsilon\, d_{i}^{(s)} (1 - x_{i}^{(s)})\, x_{i}^{(s)}\, x_{j}^{(s-1)} \qquad (27)$$

$$\delta h_{i}^{(s)} = -\varepsilon\, d_{i}^{(s)} (1 - x_{i}^{(s)})\, x_{i}^{(s)} \qquad (28)$$

where $\varepsilon$ is a positive constant and $d^{(s)}$ ($s = 1, 2$) is given by

$$d^{(2)} = ((a + b) N_{c} - b N)\, x^{(2)} - (a + b) X_{1c} + b X_{1} \qquad (29)$$

for the output layer and by equation (30) for the intermediate layer, with $i = 1, \dots, N^{(s)}$ (equation (30) is illegible in the source).

In FIG. 2, the input layer storage unit 22 stores the $N^{(0)}$-dimensional vector $x^{(0)}$; the intermediate layer storage unit 24 stores the $N^{(1)}$-dimensional vectors $x^{(1)}$ and $d^{(1)}$; and the output layer storage unit 26 stores $S_{2c}$ ($c \in C$), $N_{c}$ ($c \in C$) and the $N^{(2)}$-dimensional vectors $x^{(2)}$ and $X_{1c}$ ($c \in C$). The intermediate layer parameter storage unit 27 stores the intermediate-layer weights $W_{ij}^{(1)}$, thresholds $h_{i}^{(1)}$ and their corrections $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$; the output layer parameter storage unit 28 stores the output-layer weights $W_{ij}^{(2)}$, thresholds $h_{i}^{(2)}$ and their corrections $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$. Learning is executed by the following procedure.

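(a) Initialize $W_{ij}^{(1)}$ and $h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 27 and $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ stored in the output layer parameter storage unit 28, for example with random numbers. Also store in $N_{c}$ ($c \in C$), held in the output layer storage unit 26, the number of data in the learning data set that belong to category $c$.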
ij''ゝl, +11 and Wij''', h'1 (1) stored in the output layer parameter storage unit 28 are initialized with, for example, random numbers. Further, the number of data belonging to category C included in the learning data set is stored in Nc (cεC) stored in the output layer storage unit 26.

(b) Initialize to 0 the corrections $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 27, the corrections $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ stored in the output layer parameter storage unit 28, and $S_{2c}$ and $X_{1c}$ ($c \in C$) stored in the output layer storage unit 26.

(c) Input a learning datum, given as an $N^{(0)}$-dimensional vector, from the input terminal 21 and store it in the input layer storage unit 22 as $x^{(0)}$.

(d) In the intermediate layer calculation unit 23, calculate $x^{(1)}$ according to equation (14) from $x^{(0)}$ stored in the input layer storage unit 22 and $W_{ij}^{(1)}$ and $h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 27, and store it in the intermediate layer storage unit 24. In the output layer calculation unit 25, calculate $x^{(2)}$ according to equation (14) from $x^{(1)}$ stored in the intermediate layer storage unit 24 and $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ stored in the output layer parameter storage unit 28, and store it in the output layer storage unit 26.

(e) In the output layer calculation unit 25, add $x^{(2)}$ stored in the output layer storage unit 26 to $X_{1c}$ stored in the output layer storage unit 26, and update $S_{2c}$ in the same way (the expression added to $S_{2c}$ is illegible in the source). Here $c$ is the category to which the input learning datum belongs.

(f) If unprocessed learning data remain, repeat steps (c) to (e). Otherwise continue from step (g).

(g) In the output layer calculation unit 25, calculate the potential $V$ according to equation (26) from $N_{c}$, $X_{1c}$ and $S_{2c}$ ($c \in C$) stored in the output layer storage unit 26. If $V$ is smaller than the threshold $V_{0}$, terminate. Otherwise continue from step (h) on the same learning data set.

(h) Execute steps (c) and (d). In the output layer calculation unit 25, calculate $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ according to equations (27) to (29) from $x^{(1)}$ stored in the intermediate layer storage unit 24 and the values stored in the output layer storage unit 26, and add them to $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ stored in the output layer parameter storage unit 28. Also calculate $d^{(1)}$ according to equations (29) and (30) from the values stored in the output layer storage unit 26 and $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ stored in the output layer parameter storage unit 28, and store it in the intermediate layer storage unit 24.

(i) In the intermediate layer calculation unit 23, calculate $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ according to equations (27) and (28) from $x^{(0)}$ stored in the input layer storage unit 22, the values stored in the intermediate layer storage unit 24 and the parameters stored in the intermediate layer parameter storage unit 27, and add them to $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 27.

(j) If unprocessed learning data remain, repeat steps (h) and (i). Otherwise continue from step (k).

(k) Add $\delta W_{ij}^{(1)}$ and $\delta h_{i}^{(1)}$ stored in the intermediate layer parameter storage unit 27 to $W_{ij}^{(1)}$ and $h_{i}^{(1)}$ respectively. Likewise add $\delta W_{ij}^{(2)}$ and $\delta h_{i}^{(2)}$ stored in the output layer parameter storage unit 28 to $W_{ij}^{(2)}$ and $h_{i}^{(2)}$ respectively.

(l) Repeat from step (b).

By the above procedure, in the $N^{(2)}$-dimensional output vector space, the distribution of data belonging to the same category is compressed to a single point, the distributions of data belonging to different categories are separated, and the computation time is only O(N).
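The output-layer term of equation (29) can be checked against an explicit pairwise sum. The sketch below assumes that the quadratic potentials are (a/2) r^2 for attraction and -(b/2) r^2 for repulsion, with each unordered pair counted once; those coefficients are not legible in the source and are chosen here because they reproduce equation (29) exactly. Function names and the sample data are hypothetical.

```python
def direction_fast(outputs, labels, a=1.0, b=0.5):
    """Equation (29) per sample, using only N, N_c and the vector sums X_1 and X_1c."""
    n, dim = len(outputs), len(outputs[0])
    count = {}                  # N_c for each category
    sum_c = {}                  # X_1c for each category
    sum_all = [0.0] * dim       # X_1 over all data
    for x, c in zip(outputs, labels):        # single O(N) pass over the data
        count[c] = count.get(c, 0) + 1
        acc = sum_c.setdefault(c, [0.0] * dim)
        for k in range(dim):
            acc[k] += x[k]
            sum_all[k] += x[k]
    result = []
    for x, c in zip(outputs, labels):
        scale = (a + b) * count[c] - b * n
        result.append([scale * x[k] - (a + b) * sum_c[c][k] + b * sum_all[k]
                       for k in range(dim)])
    return result

def direction_pairwise(outputs, labels, a=1.0, b=0.5):
    """O(N^2) reference: a * (x - y) for same-category partners, -b * (x - y) otherwise."""
    dim = len(outputs[0])
    result = []
    for x, c in zip(outputs, labels):
        d = [0.0] * dim
        for y, cy in zip(outputs, labels):   # the self term contributes zero
            w = a if cy == c else -b
            for k in range(dim):
                d[k] += w * (x[k] - y[k])
        result.append(d)
    return result

outs = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.7, 0.6], [0.5, 0.5]]
cats = ["A", "A", "B", "B", "B"]
fast = direction_fast(outs, cats)
slow = direction_pairwise(outs, cats)
max_gap = max(abs(f - s) for fv, sv in zip(fast, slow) for f, s in zip(fv, sv))
```

The two results agree to rounding error, which illustrates how per-category sums replace the loop over pairs.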
The data distributions belonging to different categories are separated, and the calculation time is 0 (N).

FIG. 3 is a diagram showing the configuration of an embodiment of the third invention.

In this embodiment, two-dimensional image data are input to the neural network as an N×M-dimensional vector, and the network learns invariance to pattern transformations. The invention is applicable to any pattern transformation, but this embodiment describes the case where the transformation is a composition of translation and rotation.

First, the partial neural network 31 in FIG. 3 is trained so that its output remains unchanged when the input pattern is translated vertically. Once training of the partial neural network 31 is complete, the partial neural network 32 is trained so that its output remains unchanged when the input pattern is translated horizontally. Once training of the partial neural network 32 is complete, the partial neural network 33 is finally trained so that its output remains unchanged when the input pattern is rotated.
After completing the learning in step 2, the partial neural network 33 is finally trained so that when the input pattern is rotated in the partial neural network 32, the output of the partial neural network 33 remains unchanged.

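With training carried out in this way, the output of the partial neural network 33 becomes invariant to pattern transformations composed of translation and rotation. Moreover, training need not cover every transformation obtained by combining translation and rotation; it is enough for each partial neural network to learn invariance to its own transformation.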
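The staged scheme above needs, for each partial network, pairs of patterns that differ only by that stage's transformation and that should therefore map to the same output. A minimal sketch of generating such same-category pairs is given below. The cyclic wrap-around for translations and the restriction to 90-degree rotation are simplifications assumed for this sketch, and all names are hypothetical; pairs from different categories, which the repulsive term also needs, are omitted.

```python
def shift_rows(image, k):
    """Cyclic vertical shift by k rows (wrap-around is an assumption of this sketch)."""
    k %= len(image)
    return image[-k:] + image[:-k]

def shift_cols(image, k):
    """Cyclic horizontal shift by k columns."""
    k %= len(image[0])
    return [row[-k:] + row[:-k] for row in image]

def rotate90(image):
    """Rotate a 2-D list clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]

def flatten(image):
    """Turn an N x M image into the N*M-dimensional vector fed to the network."""
    return [v for row in image for v in row]

def same_category_pairs(images, transform):
    """Pairs (original, transformed) that one stage should map to the same output."""
    return [(flatten(img), flatten(transform(img))) for img in images]

# One entry per partial network, in the order in which they are trained.
stages = [
    ("vertical translation", lambda im: shift_rows(im, 1)),
    ("horizontal translation", lambda im: shift_cols(im, 1)),
    ("rotation", rotate90),
]

image = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]
pairs_per_stage = {name: same_category_pairs([image], t) for name, t in stages}
```

Each stage sees only its own transformation, which reflects the point made in the text: the combined set of translated and rotated patterns never has to be enumerated.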
The output of No. 3 becomes invariant to pattern transformation consisting of a combination of translation and rotation. Furthermore, learning does not need to be performed for all transformations that combine translation and rotation, and it is sufficient to have each partial neural network learn invariance for each transformation.

[Effect of the invention]

As explained above, in the first invention training compresses the distribution of the learning patterns to a single point, so a learning result with high generalization ability is obtained. In the second invention, the computation time is further reduced from O(N^2) to O(N). In the third invention, the fact that partial networks can be trained independently is used to assign a separate function to each partial network.

[Brief explanation of drawings]

FIG. 1 is a diagram showing the configuration of an embodiment of the first invention, FIG. 2 is a diagram showing the configuration of an embodiment of the second invention, and FIG. 3 is a diagram showing the configuration of an embodiment of the third invention.

11: input terminal
12: input layer storage unit
13: intermediate layer calculation unit
14: intermediate layer storage unit
15: output layer calculation unit
16: output layer storage unit
17: intermediate layer parameter storage unit
18: output layer parameter storage unit
19: output terminal
21: input terminal
22: input layer storage unit
23: intermediate layer calculation unit
24: intermediate layer storage unit
25: output layer calculation unit
26: output layer storage unit
27: intermediate layer parameter storage unit
28: output layer parameter storage unit
29: output terminal
31, 32, 33: partial networks
34: input terminal
35: output terminal

Claims

[Claims]

(1) A pattern learning device using a feedforward neural network, characterized by having means for modifying the learnable parameters of the neural network so as to reduce the distance between two learning patterns in the output layer when the two learning patterns belong to the same category, and so as to increase the distance between the two learning patterns in the output layer when the two learning patterns belong to different categories.

(2) The pattern learning device according to claim 1, characterized in that the means for modifying the parameters modifies the distance between the two learning patterns so as to minimize a potential expressed as an even-order polynomial of the inter-pattern distance.

(3) A pattern learning device using a feedforward neural network consisting of N partial networks, characterized in that each partial network is constituted by the pattern learning device according to claim 1 or 2.