JP3238178B2

JP3238178B2 - Learning method for a learning machine

Info

Publication number: JP3238178B2
Application number: JP34538791A
Authority: JP
Inventors: 貢己山田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-12-27
Filing date: 1991-12-26
Publication date: 2001-12-10
Anticipated expiration: 2016-12-10
Also published as: JPH05334276A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a learning method for a learning machine applied to a neural network, a nonlinear data converter, or the like.

[0002]

[Prior Art] In recent years, learning machines that perform nonlinear data conversion, typified by neural networks, have attracted attention. Neural networks were proposed as models of the neural circuits of the brain. Because they combine a learning capability with nonlinearity, they are used for pattern recognition, stock price prediction, diagnosis, and similar tasks, and have begun to be applied to optimal motion control such as robot control.

[0003] Before such a learning machine is actually used, for example for robot control, the control parameters that determine its behavior must be adjusted in advance with known input/output data so that it operates properly. In other words, the learning machine must be trained.

[0004] "Supervised learning" is known as one method of training such neural networks. In supervised learning, typified by back-propagation learning, the learning data consist of pairs of an input vector and a teacher signal, which is the target for the corresponding output vector. Given these data, control parameters such as the coupling coefficients between neurons are fine-tuned so that a predetermined loss function becomes small. Learning methods based on supervised learning are also applied to learning machines other than neural networks that perform nonlinear data conversion.

[0005] That is, in the conventional learning method, a teacher signal that the machine should output, that is, an exemplary teacher signal (hereinafter called a "model teacher"), is set, and the learning machine is trained so that the difference between its output vector and the model teacher becomes small.

[0006] On the other hand, in robot control and other applications that actually use learning machines, it is often necessary, for example when specifying the motion of a robot arm, to impose conditions that forbid the arm from passing through certain regions in order to avoid obstacles. Likewise, in medical diagnosis, when a doctor determines a patient's disease by examining an X-ray image or an electrocardiogram, the doctor rules out impossible diseases by elimination and finally arrives at the patient's true disease. Thus, in industrial fields that use learning machines, there are many occasions on which a teacher signal that should not be output, that is, one that must be avoided (hereinafter called a "negative teacher"), is given.

[0007]

[Problems to be Solved by the Invention] However, when a negative teacher was given to a learning machine as part of the learning data, the machine could not be trained so as not to output it as the output vector. Therefore, whenever an operator was given a negative teacher, the operator had to re-specify a model teacher on the basis of vague knowledge and experience, give this re-specified model teacher to the learning machine as part of the learning data, and train the machine so that the difference between its output vector and that model teacher became small.

[0008] Consequently, a learning method that relies on model teachers is laborious.

[0009] Moreover, when a model teacher is given to a learning machine as part of the learning data, the output vector must converge to the model teacher whether the model teacher is complete or incomplete data, so convergence is sometimes impossible. In addition, when many model teachers are given, the control parameters may be adjusted so that the output vector converges in a way biased toward particular model teachers. By contrast, when negative teachers are given, the range into which the output vector may converge is wider than with model teachers, so convergence is easier. Furthermore, giving a variety of negative teachers makes it possible to obtain a well-balanced output vector.

[0010] In view of the above problems, an object of the present invention is to provide a learning method for a learning machine in which convergence is easy and the control parameters can be adjusted so as to output a well-balanced output vector, without being adjusted so that the output vector converges in a way biased toward particular model teachers.

[0011]

[Means for Solving the Problems] To achieve the above object, the present invention provides a method of training, with teacher signals, the control parameters of a learning machine that outputs an output vector from an input vector using predetermined control parameters. The method comprises: defining, as a teacher signal, a negative teacher, meaning data to be avoided that should not be output as the output vector; giving the learning machine the value of an input vector and the value of the teacher signal corresponding to that input vector; computing the value of the output vector from the value of the input vector using initial control parameters; computing, from the difference x between the negative teacher and the output vector, a loss function r that decreases as x increases, according to

$r = \exp(-x^2)$;

when the value of the loss function is larger than a predetermined value, repeatedly updating the control parameters so that the value of the loss function decreases and recomputing that value, until it becomes smaller than the predetermined value; and, when the computed loss function reaches a value smaller than the predetermined value, incorporating the latest updated control parameters into the learning machine.
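The claimed loop can be sketched as follows. This is a minimal illustration only: it assumes a hypothetical one-input linear model $o = w \cdot i + b$ standing in for the learning machine, and plain gradient descent as the update rule; the claim fixes neither. The initial output should not coincide exactly with the negative teacher, because the derivative of $\exp(-x^2)$ is zero at $x = 0$.

```python
import math

def train_against_negative_teacher(i, y_neg, w=0.5, b=0.0, eta=0.5,
                                   r_threshold=1e-3, max_iter=10_000):
    """Update (w, b) until r = exp(-x**2) < r_threshold, where x = o - y_neg.

    The linear model o = w*i + b is a hypothetical stand-in for the learning machine.
    """
    for step in range(max_iter):
        o = w * i + b                 # output computed with the current parameters
        x = o - y_neg                 # difference from the negative teacher
        r = math.exp(-x * x)          # loss r = exp(-x^2): decreases as |x| grows
        if r < r_threshold:           # loss below the predetermined value: stop
            return w, b, r, step
        dr_dx = -2.0 * x * r          # d/dx exp(-x^2)
        w -= eta * dr_dx * i          # gradient step; dx/dw = i
        b -= eta * dr_dx              # gradient step; dx/db = 1
    return w, b, math.exp(-(w * i + b - y_neg) ** 2), max_iter

w, b, r, steps = train_against_negative_teacher(i=1.0, y_neg=0.6)
print(f"r = {r:.2e} after {steps} updates; output = {w + b:.3f}")
```

The returned parameters play the role of the "latest updated control parameters" that the claim incorporates into the machine.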

[0012]

[Operation] In the above configuration, the loss for the difference x between the output vector and the negative teacher was assumed to follow the loss function $r = \exp(-x^2)$. This function was chosen on the basis of many calculations performed by the applicant.

[0013] The parameters inside the learning machine are then adjusted so that the loss function reaches its minimum.

[0014] By adjusting the parameters in this way, the internal parameters of a learning machine that learns from negative teachers can be adjusted to optimal values.

[0015]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a conceptual diagram of a learning machine according to a first embodiment of the present invention.

[0016] This learning machine comprises: an input terminal 1 to which an L-dimensional input vector $I_1, \ldots, I_L$ is input; a converter 2 in which parameters $\theta_j$ ($j = 1, \ldots, M$) are defined; an output terminal 3 from which an N-dimensional output vector $O_1, \ldots, O_N$ is output; a parameter adjusting means 7; and a loss function calculating means 8 consisting of a loss function calculating means 9 for model teachers and a loss function calculating means 10 for negative teachers.

[0017] During learning, the parameter adjusting means 7 adjusts the parameters $\theta_j$ of the converter 2 using the loss function computed by the loss function calculating means 8 and the values of the derivatives of the loss function with respect to the parameters $\theta_j$, thereby training the learning machine so that the value of the loss function decreases. When a model teacher is given, the loss function calculating means 9 for model teachers computes the loss function and its derivatives by the conventional method; when a negative teacher is given, the loss function calculating means 10 for negative teachers computes them.
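The split between calculating means 9 and 10 can be sketched as a single dispatch on the teacher type. The loss forms are assumptions for illustration: half the squared error for a model teacher (a common back-propagation choice) and $\gamma_- \exp(-(R/\beta)^2)$ for a negative teacher, with $R$ the Euclidean distance between output and teacher. The latter matches the $\exp(-R_p^2)$ dependence described later for equations (9) and (10) but is not copied from them. The function name `teacher_loss` is hypothetical.

```python
import math
from typing import List, Tuple

def teacher_loss(output: List[float], teacher: List[float], kind: str,
                 gamma_neg: float = 1.0, beta: float = 0.05
                 ) -> Tuple[float, List[float]]:
    """Return (loss, d loss / d output) for one teacher signal.

    kind "+" (model teacher): assumed conventional 0.5 * ||O - y||^2.
    kind "-" (negative teacher): assumed gamma_neg * exp(-(R / beta)^2), R = ||O - y||.
    """
    diff = [o - y for o, y in zip(output, teacher)]
    if kind == "+":
        return 0.5 * sum(d * d for d in diff), diff
    if kind == "-":
        loss = gamma_neg * math.exp(-sum(d * d for d in diff) / beta ** 2)
        return loss, [-2.0 * loss * d / beta ** 2 for d in diff]
    raise ValueError("kind must be '+' (model teacher) or '-' (negative teacher)")

loss, grad = teacher_loss([0.55], [0.5], "-")
print(loss, grad)
```

For a negative teacher the returned derivative points toward the teacher, so a gradient-descent step (subtracting η times it) pushes the output away from it.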

[0018] Next, a case is described in which this embodiment is applied to back-propagation learning of a three-layer feedforward neural network.

[0019] FIG. 2 illustrates this concept. This learning machine comprises a three-layer feedforward neural network, a parameter adjusting means 17, and a loss function calculating means 18 consisting of a loss function calculating means 19 for model teachers and a loss function calculating means 20 for negative teachers. The network consists of neuron elements 11; an input layer 12 to which input vectors $I_1, \ldots, I_L$ are input; a hidden layer 13; an output layer 14 from which output vectors $O_1, \ldots, O_N$ are output; a coupling unit 15 in which coupling coefficients $w^{HI}_{ji}$ are defined between the input layer 12 and the hidden layer 13; and a coupling unit 16 in which coupling coefficients $w^{OH}_{kj}$ are defined between the hidden layer 13 and the output layer 14. A threshold $\theta^H_j$ is defined for each neuron element 11 of the hidden layer 13, and a threshold $\theta^O_k$ for each neuron element 11 of the output layer 14.

[0020] During learning, the loss function calculating means 18 computes the loss function, its derivatives with respect to the coupling coefficients, and its derivatives with respect to the thresholds. Using these results, the parameter adjusting means 17 adjusts the coupling coefficients $w^{HI}_{ji}$ and $w^{OH}_{kj}$ of the coupling units 15 and 16 and the thresholds $\theta^H_j$ and $\theta^O_k$ of the neuron elements 11 so that the value of the loss function decreases.

[0021] This learning method is described concretely below.

[0022] Let $I_i$ be the input data to the i-th neuron element 11 of the input layer 12; $I^H_j$ and $O^H_j$ the input to and output of the j-th neuron element 11 of the hidden layer 13; and $I^O_k$ and $O^O_k$ the input to and output data of the k-th neuron element 11 of the output layer 14. The input/output relations are then given by equations (1) to (4) below.

[0023]

[Equation 1: equations (1) to (4)] Here, the function f(x) in equations (2) and (4) is given by

[Equation 2: equation (5)]

and its derivative is given by equation (6) below.

[0024] $f'(I^H_j) = O^H_j\,(1 - O^H_j)$ ... (6)

M' in equation (3) is the number of neuron elements 11 in the hidden layer 13, and P is the number of learning data. When the input and outputs corresponding to the p-th learning datum must be indicated explicitly, they are written $I_i^p$, $I^{H,p}_j$, $O^{H,p}_j$, and $O_k^p$, and the p-th teacher signal is written $y^p_k$ ($k = 1, \ldots, N$). The type of teacher signal is denoted $T_p$: $T_p = +$ if the p-th teacher signal is a model teacher, and $T_p = -$ if it is a negative teacher.
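Equations (1) to (5) are not reproduced in this text. The sketch below assumes the usual three-layer form, in which each neuron's input is a weighted sum minus its threshold, and assumes f is the logistic sigmoid $f(x) = 1/(1 + e^{-x})$, which is consistent with the derivative in equation (6). The helper names are hypothetical. It also checks (6) numerically.

```python
import math

def f(x: float) -> float:
    """Logistic sigmoid: assumed form of f in equations (2) and (4), consistent with (6)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hi, theta_h, w_oh, theta_o):
    """Three-layer forward pass; thresholds are assumed to be subtracted."""
    o_h = [f(sum(w * x for w, x in zip(row, inputs)) - th)   # O^H_j = f(I^H_j)
           for row, th in zip(w_hi, theta_h)]
    o_o = [f(sum(w * h for w, h in zip(row, o_h)) - th)      # O_k = f(I^O_k)
           for row, th in zip(w_oh, theta_o)]
    return o_h, o_o

# Check equation (6), f'(I) = O(1 - O), with a central difference at I = 0.3.
I, h = 0.3, 1e-6
numeric = (f(I + h) - f(I - h)) / (2 * h)
print(numeric, f(I) * (1 - f(I)))
```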

[0025] If the loss function r is written as the sum of a model-teacher term $r_+$ and a negative-teacher term $r_-$, equation (7) is obtained.

[0026] $r = r_+ + r_-$ ... (7)

The model-teacher term $r_+$ and the negative-teacher term $r_-$ are expressed by equations (8) to (10) below.

[0027]

[Equation 3: equations (8) to (10)] The symbols are defined as follows.

[0028]
$r_+^p$: contribution of the p-th teacher signal to $r_+$
$r_-^p$: contribution of the p-th teacher signal to $r_-$
$R_p$: difference between the p-th teacher signal and the output data
$\Sigma_{p:D}$: sum over all model teachers
$\Sigma_{p:U}$: sum over all negative teachers
$\gamma_-$: parameter that adjusts the ratio in which model teachers and negative teachers contribute to the loss function
$\beta$: parameter that adjusts the distance over which a negative teacher affects the output data

By adjusting the parameter $\gamma_-$, one can choose whether model teachers or negative teachers mainly drive the learning: in equation (9), increasing $\gamma_-$ increases the negative-teacher term $r_-$ of the loss function and hence the influence of the negative teachers. By adjusting the parameter $\beta$, one can control how far the output vector obtained by learning must stay from the negative teachers.
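A sketch of how $\gamma_-$, $\beta$, and a $1/P$ scaling act on $r = r_+ + r_-$. Equations (8) to (10) are not reproduced here, so the per-sample forms ($\tfrac12 R_p^2$ for model teachers and $\gamma_- \exp(-(R_p/\beta)^2)$ for negative teachers) are assumptions for illustration, and the data are hypothetical.

```python
import math

def total_loss(samples, gamma_neg=1.0, beta=0.05):
    """Return (r_plus, r_minus) over P samples of (output, teacher, kind), scaled by 1/P.

    Assumed forms: model teacher 0.5*R_p^2; negative teacher gamma_neg*exp(-(R_p/beta)^2).
    """
    P = len(samples)
    r_plus = r_minus = 0.0
    for out, teach, kind in samples:
        R2 = sum((o - y) ** 2 for o, y in zip(out, teach))          # R_p squared
        if kind == "+":
            r_plus += 0.5 * R2 / P                                  # sum over p:D
        else:
            r_minus += gamma_neg * math.exp(-R2 / beta ** 2) / P    # sum over p:U
    return r_plus, r_minus

data = [([0.20], [0.25], "+"), ([0.52], [0.50], "-")]
print(total_loss(data), total_loss(data, gamma_neg=5.0), total_loss(data + data))
```

Raising $\gamma_-$ scales only the negative-teacher term, shrinking $\beta$ narrows the range over which a negative teacher matters, and duplicating every sample leaves r unchanged because of the $1/P$ factor.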

[0029] Equations (9) and (10) show that, for each learning datum $y^p_k$, the negative-teacher term of the loss function r is a function of the form $\exp(-R_p^2)$. This gives a curve with the characteristics shown in FIG. 3. Specifically, the derivative of the negative-teacher loss with respect to $R_p$ (hereinafter simply the "derivative") is set so that it approaches 0 as $R_p$ becomes sufficiently large; distant negative teachers therefore have little influence, and the stability of learning is ensured. Furthermore, because the derivative is 0 at $R_p = 0$, general-purpose and stable negative-teacher learning can be realized. The reason is that when the output data and a negative teacher are almost equal ($R_p \approx 0$), it does not matter in which direction learning proceeds as long as $R_p$ increases, so keeping the derivative small there preserves the possibility of moving in every direction. When the derivative for a given learning datum is small, learning is expected to proceed mainly under the influence of other learning data for which $R_p$ is on the order of $\beta$. In addition, because the loss function decays faster than exponentially in $R_p$, the influence of a negative teacher extends only over a short distance.
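The properties claimed for the $\exp(-R_p^2)$ form can be checked numerically. This sketch assumes the negative-teacher contribution is $\exp(-(R/\beta)^2)$ with $\beta = 0.05$, the value used in [0043]; how equation (9) actually scales $R_p$ by $\beta$ is an assumption. The derivative is zero at $R = 0$, its magnitude peaks at $R = \beta/\sqrt{2}$ (on the order of $\beta$), and it is negligible a few multiples of $\beta$ away.

```python
import math

BETA = 0.05  # beta from the example in [0043]

def neg_term(R, beta=BETA):
    """Negative-teacher contribution, assumed exp(-(R/beta)^2)."""
    return math.exp(-(R / beta) ** 2)

def neg_term_derivative(R, beta=BETA):
    """d/dR exp(-(R/beta)^2) = -2R/beta^2 * exp(-(R/beta)^2)."""
    return -2.0 * R / beta ** 2 * neg_term(R, beta)

grid = [k * BETA / 1000 for k in range(1, 5001)]          # R from 0 to 5*beta
R_peak = max(grid, key=lambda R: abs(neg_term_derivative(R)))
print(neg_term_derivative(0.0), R_peak, BETA / math.sqrt(2))
print(neg_term_derivative(5 * BETA), neg_term(5 * BETA), math.exp(-5))
```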

[0030] When a teacher signal is regarded as an N-dimensional vector, each of its components may individually be a model teacher or a negative teacher; that is, model-teacher components and negative-teacher components may be mixed within a single teacher signal. Even in that case, the loss function can be computed by slightly modifying equations (7) to (10). In this embodiment, the formulation considers teacher signals consisting only of model-teacher components or only of negative-teacher components.

[0031] Learning is performed by changing the coupling coefficients as shown in equations (11) and (12) below.

[0032]

[Equation 4: equations (11) and (12)] Here, θ is a generic symbol for the parameters of equations (1) to (4), such as the coupling coefficients $w^{HI}_{ji}$ and $w^{OH}_{kj}$ or the thresholds $\theta^H_j$ and $\theta^O_k$. $\Delta\theta(t)$ is the update applied to θ at the t-th fine adjustment in iterative learning, η is the learning coefficient, and α is the coefficient of the inertia (momentum) term.
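Equations (11) and (12) are not reproduced here. Given that η is a learning coefficient and α an inertia term, this sketch assumes the standard momentum update $\Delta\theta(t) = -\eta\,\partial r/\partial\theta + \alpha\,\Delta\theta(t-1)$ and $\theta(t+1) = \theta(t) + \Delta\theta(t)$, with any $1/P$ factor taken to be inside $\partial r/\partial\theta$. The toy quadratic loss and its target are hypothetical.

```python
def momentum_step(theta, grad, prev_delta, eta, alpha):
    """One update: delta(t) = -eta * dr/dtheta + alpha * delta(t-1) (assumed form)."""
    delta = [-eta * g + alpha * d for g, d in zip(grad, prev_delta)]
    return [t + d for t, d in zip(theta, delta)], delta

# Toy loss r(theta) = (theta0 - 3)^2 + (theta1 + 1)^2, gradient 2 * (theta - target).
target = [3.0, -1.0]
theta, delta = [0.0, 0.0], [0.0, 0.0]
for t in range(200):
    grad = [2.0 * (th - c) for th, c in zip(theta, target)]
    theta, delta = momentum_step(theta, grad, delta, eta=0.1, alpha=0.5)
print(theta)
```

The α term carries part of the previous step forward, which smooths the trajectory across iterations.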

[0033] In such learning, the coupling coefficients and thresholds can be updated each time a learning datum is presented; in this embodiment, however, learning uses an algorithm that computes the loss function over all learning data, together with its derivatives with respect to the parameters θ, and then updates the parameter values. Accordingly, equation (12) is multiplied as a whole by 1/P so that, for example, the loss function for P learning data has the same value as the loss function for 2P learning data obtained by simply duplicating the same data.

[0034] The learning end condition is set as

$r < E_L$ ... (13)

where $E_L$ is a parameter specifying the learning end condition.

[0035] Learning may be impossible, or r may not become sufficiently small within a practical amount of time, because of the nonlinearity of the neural network or insufficient capacity of the model. Therefore, a maximum number of iterations is usually fixed in advance; learning is stopped when that number is reached, and several further trials are made with different initial values of the coupling coefficients and other parameters. In some cases the model itself is reconsidered.

[0036] Next, the above learning method is described systematically with reference to the flowchart shown in FIG. 4.

[0037] In the loss function calculating means 18 shown in FIG. 2, the parameter p indicating the number of the learning datum is first set to p = 1 (step ST1), and the learning data $\{I_i^p\ (i = 1, \ldots, L),\ y_k^p\ (k = 1, \ldots, N),\ T_p\}$ are read (step ST2). Next, the neural network outputs the output vector $O_k^p$ according to equations (1) to (6) (step ST3).

[0038] At the same time, it is determined whether the input teacher signal is a model teacher or a negative teacher (step ST4). If it is a model teacher ("$T_p = +$" in step ST4), [Equation 5] is computed using equations (1) to (6), (8), and (10) and stored in memory (step ST5). If it is a negative teacher ("$T_p = -$" in step ST4), [Equation 6] is computed using equations (1) to (6), (9), and (10) and stored in memory (step ST6).

[0039] It is then determined whether the learning datum number p equals the number of data P (step ST7). If p ≠ P (NO in step ST7), p is incremented by 1 (step ST8) and the operations from step ST2 are repeated. When p = P (YES in step ST7), the coupling coefficients and thresholds are updated by equations (11) and (12) on the basis of the results computed and stored in steps ST5 and ST6 (step ST9).

[0040] If the learning end condition is satisfied, learning ends (YES in step ST10); if not (NO in step ST10), the operations from step ST1 are repeated. Learning is performed in this way.
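The flowchart of FIG. 4 can be sketched end to end for a small network. This is a minimal pure-Python illustration, not the patented implementation. It assumes the sigmoid forward pass with subtracted thresholds, half the squared error for model teachers, $\gamma_- \exp(-(R_p/\beta)^2)$ for negative teachers, and the momentum update $\Delta\theta(t) = -\eta\,\partial r/\partial\theta + \alpha\,\Delta\theta(t-1)$. The flat parameter layout, the function names, and the training data are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def unpack(theta, n_in, n_h, n_out):
    """Split a flat parameter list into w^HI, theta^H, w^OH and theta^O."""
    a, b = n_h * n_in, n_h * n_in + n_h
    w_hi = [theta[j * n_in:(j + 1) * n_in] for j in range(n_h)]
    w_oh = [theta[b + k * n_h:b + (k + 1) * n_h] for k in range(n_out)]
    return w_hi, theta[a:b], w_oh, theta[b + n_out * n_h:]

def forward(theta, x, n_h, n_out):
    w_hi, th_h, w_oh, th_o = unpack(theta, len(x), n_h, n_out)
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) - t) for row, t in zip(w_hi, th_h)]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) - t) for row, t in zip(w_oh, th_o)]
    return h, o

def loss_and_grad(theta, data, n_h, n_out, gamma_neg=1.0, beta=0.05):
    """ST1-ST8: sweep all P samples, accumulating r and dr/dtheta, both scaled by 1/P."""
    P, n_in = len(data), len(data[0][0])
    a, b = n_h * n_in, n_h * n_in + n_h
    w_oh = unpack(theta, n_in, n_h, n_out)[2]
    r, grad = 0.0, [0.0] * len(theta)
    for x, y, kind in data:                          # ST2: read {I^p, y^p, T_p}
        h, o = forward(theta, x, n_h, n_out)         # ST3: output vector O^p
        d = [ok - yk for ok, yk in zip(o, y)]
        if kind == "+":                              # ST5: model teacher (assumed 0.5*R^2)
            loss, dl_do = 0.5 * sum(v * v for v in d), d
        else:                                        # ST6: negative teacher (assumed Gaussian)
            loss = gamma_neg * math.exp(-sum(v * v for v in d) / beta ** 2)
            dl_do = [-2.0 * loss * v / beta ** 2 for v in d]
        r += loss / P
        # Back-propagation with f'(I) = O(1 - O); thresholds enter with a minus sign.
        delta_o = [g * ok * (1.0 - ok) / P for g, ok in zip(dl_do, o)]
        for k, dk in enumerate(delta_o):
            grad[b + n_out * n_h + k] -= dk
            for j, hj in enumerate(h):
                grad[b + k * n_h + j] += dk * hj
        for j, hj in enumerate(h):
            dj = sum(delta_o[k] * w_oh[k][j] for k in range(n_out)) * hj * (1.0 - hj)
            grad[a + j] -= dj
            for i, xi in enumerate(x):
                grad[j * n_in + i] += dj * xi
    return r, grad

def train(theta, data, n_h, n_out, eta=1.0, alpha=0.5, e_l=1e-5, max_epochs=2000, **kw):
    """ST9: momentum update of all parameters; ST10: stop when r < E_L (or at the cap)."""
    theta, step = list(theta), [0.0] * len(theta)
    for epoch in range(max_epochs):
        r, grad = loss_and_grad(theta, data, n_h, n_out, **kw)
        if r < e_l:
            return theta, r, epoch
        step = [-eta * g + alpha * s for g, s in zip(grad, step)]
        theta = [t + s for t, s in zip(theta, step)]
    return theta, loss_and_grad(theta, data, n_h, n_out, **kw)[0], max_epochs

# Hypothetical data: three model teachers and one negative teacher, L = N = 1, M' = 3.
data = [([0.0], [0.1], "+"), ([0.5], [0.5], "+"), ([1.0], [0.9], "+"), ([0.25], [0.4], "-")]
rng = random.Random(0)
theta0 = [rng.uniform(-1.0, 1.0) for _ in range(3 * 1 + 3 + 1 * 3 + 1)]
theta, r, epochs = train(theta0, data, n_h=3, n_out=1, eta=1.0, alpha=0.5)
print(f"r = {r:.3g} after {epochs} epochs")
```

Before changing any of the assumed loss forms, it is worth checking the gradient from `loss_and_grad` against central finite differences.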

[0041] FIG. 5 is a graph showing the results of learning actually performed by the inventor according to this embodiment.

[0042] The parameter values used were as follows.

[0043] $L = 1$, $M' = 5$, $N = 1$, $\gamma_- = 1$, $\beta = 0.05$, $\eta = 10$, $\alpha = 0.5$, $E_L = 10^{-5}$. The initial values of the coupling coefficients and thresholds were as follows.

[0044]
$\theta^H_1(0) = 0$, $w^{HI}_{11}(0) = 10$
$\theta^H_2(0) = +2.5$, $w^{HI}_{21}(0) = 10$
$\theta^H_3(0) = +5$, $w^{HI}_{31}(0) = 10$
$\theta^H_4(0) = +7.5$, $w^{HI}_{41}(0) = 10$
$\theta^H_5(0) = +10$, $w^{HI}_{51}(0) = 10$
$\theta^O_1(0) = 0$, $w^{OH}_{11}(0) = w^{OH}_{12}(0) = w^{OH}_{13}(0) = w^{OH}_{14}(0) = w^{OH}_{15}(0) = 0$

In FIG. 5, the dotted curve is the result of learning with the five model teachers marked ○, and the solid curve is the result of learning with seven teacher signals in total: the five model teachers marked ○ and the two negative teachers marked ×.
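As a check on these starting conditions: because $w^{OH}_{1j}(0)$ and $\theta^O_1(0)$ are all zero, the initial network outputs $f(0)$ for every input, whatever convention equations (1) to (4) use. This sketch additionally assumes f is the logistic sigmoid (so $f(0) = 0.5$) and that thresholds are subtracted from the weighted sums; under that assumption the five hidden units switch at $I_1 = \theta^H_j / w^{HI}_{j1}$, evenly spaced across [0, 1]. That reading is an inference, not something the source states.

```python
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))   # assumed logistic sigmoid

# Initial values listed in [0044] (L = 1, M' = 5, N = 1).
w_hi = [10.0] * 5
theta_h = [0.0, 2.5, 5.0, 7.5, 10.0]
w_oh = [0.0] * 5
theta_o = 0.0

def initial_output(i1):
    hidden = [f(w * i1 - th) for w, th in zip(w_hi, theta_h)]   # threshold sign assumed
    return f(sum(w * h for w, h in zip(w_oh, hidden)) - theta_o)

print([initial_output(x / 4) for x in range(5)])   # flat at f(0) = 0.5
print([th / w for th, w in zip(theta_h, w_hi)])    # hidden-unit transition points
```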

[0045] As is clear from the figure, the dotted curve passes through the model teachers but makes no attempt to move away from the negative teachers, whereas the solid curve passes through the model teachers and also moves away from the negative teachers.

[0046] Thus, in this embodiment, when the teacher signal is a model teacher, learning is performed so that the difference between the teacher signal and the output vector decreases, and when the teacher signal is a negative teacher, learning is performed so that this difference increases.

[0047] This improves the flexibility and workability of learning, and enables learning adapted to the learning conditions as well as natural learning that suits the characteristics of the learning machine.

[0048] Next, modifications of this embodiment are described with reference to FIGS. 6 to 8.

[0049] FIG. 6 shows an example in which the negative-teacher term $r_-$ of equation (9) is replaced by equation (16) below.

[0050]

[Equation 7: equation (16)] In this example, learning is fast in the initial stage, but the directions in which learning can proceed are limited.

[0051] FIG. 7 shows an example in which the negative-teacher term $r_-$ of equation (9) is replaced by equation (17) below.

[0052]

[Equation 8: equation (17)] In this example, the derivative diverges at R = 0, so the repulsion between the output vector and the negative teachers is strong, and learning can achieve a data conversion in which the output vector is kept well away from all negative teachers. However, stability is not very good. The strength of the repulsion can be adjusted by changing the value of n. Also, because the loss function decays algebraically in R, the influence of a negative teacher extends over long distances.

[0053] FIG. 8 shows an example in which the negative-teacher term $r_-$ of equation (9) is replaced by equation (18) below.

[0054]

[Equation 9: equation (18)] In this example, as with equation (17), learning moves the output away from all negative teachers, but stability is poor.

[0055] In this example, $r_-^p$ does not approach 0 asymptotically as $R_p$ increases, so the learning end condition must be designed with care. For example, the end of learning can be judged independently for $r_+$ and $r_-$: for $r_+$ the same test as in the first embodiment is applied, for $r_-$ it is required to become smaller than some predetermined value, and learning ends when both conditions are satisfied.
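The two-part end condition can be sketched as below. The threshold for $r_-$ is a hypothetical value, since the text only calls it a predetermined value; the test for $r_+$ reuses the form of condition (13).

```python
def learning_finished(r_plus, r_minus, e_l=1e-5, e_minus=1e-3):
    """Independent end tests: r_plus < E_L (as in (13)) and r_minus < e_minus (hypothetical)."""
    return r_plus < e_l and r_minus < e_minus

print(learning_finished(1e-6, 1e-4), learning_finished(1e-6, 0.5))
```

Judging the terms separately keeps a small $r_+$ from masking a negative-teacher term that has not yet fallen, or the reverse.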

[0056] FIG. 9 is a block diagram showing the configuration of a second embodiment of the present invention. In this example, the loss function calculating means 18 consists only of the loss function calculating means 20 for negative teachers. That is, only negative-teacher signals are taken in, and learning is performed so that the difference between these teacher signals and the output vector increases.

[0057] The learning method of this embodiment corresponds to the flowchart of FIG. 4 with steps ST4 and ST5 omitted. Since only negative teachers are accepted as learning data, $T_p$ is unnecessary. Everything else is the same as in the first embodiment.

[0058] FIG. 10 is a graph showing the results of learning with this embodiment; the solid lines show the input/output function when the number of learning steps t is 1, 10, 100, 1000, and 11487. Learning ended at t = 11487.

[0059] In this result, several negative teachers correspond to the same input value $I_1$, and the learned function stays away from all of them. With model teachers, every teacher signal strongly influences learning, whereas with negative teachers there is a marked difference between teacher signals that strongly influence learning and those that have almost no influence.
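The effect described in [0059] can be reproduced in a reduced, hypothetical setting: the network's output at one input value is treated as a free scalar o, two negative teachers share that input, and each contributes the assumed loss $\exp(-((o - y)/\beta)^2)$ with $\beta = 0.05$. The nearer teacher dominates the gradient while the farther one has almost no influence, and gradient descent drives o away from both.

```python
import math

BETA = 0.05

def influence(o, y_neg, beta=BETA):
    """d/do exp(-((o - y)/beta)^2): one negative teacher's contribution to the gradient."""
    d = o - y_neg
    return -2.0 * d / beta ** 2 * math.exp(-(d / beta) ** 2)

negatives = [0.45, 0.55]        # hypothetical negative teachers sharing one input value
o = 0.58                        # hypothetical current output at that input
print([influence(o, y) for y in negatives])      # the nearer teacher dominates

for _ in range(500):            # gradient descent on the summed negative-teacher loss
    o -= 0.0025 * sum(influence(o, y) for y in negatives)
print(o)
```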

[0060] Next, a third embodiment of the present invention is described.

[0061] According to Chapter 3 of Sakamoto et al., "Information Statistics" (Kyoritsu, 1983), the quantity

[Equation 10: equation (19)]

is called entropy and is closely related to the criterion for how well a model approximates. In this equation, $p_i$ is the probability that event $\omega_i$ ($i = 1, \ldots, m$) occurs, with $p_i \ge 0$ and $p_1 + \cdots + p_m = 1$; in other words, $p_i$ is the true probability distribution, while $q_i$ is a probability distribution model. The entropy is approximately equal to 1/n times the logarithm of the probability that the distribution of n realizations drawn from our assumed model $q_i$ matches the true distribution. In short, a model that maximizes equation (19) can be called a good model. A good model is one with excellent "ability to output an appropriate output vector even for unlearned input vectors", that is, excellent generalization ability. When learning must rely on a limited number of learning data, a model that is as good as possible must be chosen skillfully. The "model" here comprises both the functional form the learning machine uses to learn the input/output relation and the values of its control parameters. Here, however, $p_i$ is regarded as the true distribution of the difference between the teacher signal and the output vector.

[0062] Equation (19) can be rewritten as

$$-\sum_{i=1}^{m} p_i \ln\frac{p_i}{q_i} = \sum_{i=1}^{m} p_i \ln q_i - \sum_{i=1}^{m} p_i \ln p_i \tag{20}$$

The second term depends only on the true distribution, so it need not be considered when comparing models. Define the log likelihood as

$$l(\mathbf{q}) = \sum_{i=1}^{m} n_i \ln q_i \tag{21}$$

By the law of large numbers, as $n \to \infty$, $l(\mathbf{q})/n$ converges to the first term of equation (20); that is, $l(\mathbf{q})$ approaches $n$ times that term. Here $n_i$ is the number of times the event $\omega_i$ occurred, and $n_1 + \cdots + n_m = n$.

In short, the larger the log likelihood $l(\mathbf{q})$ of equation (21), the larger the corresponding entropy and the better the model. The model that maximizes $l(\mathbf{q})$ is called the maximum likelihood model. Maximum likelihood models are common in statistics and engineering, and the widely used least squares method is also derived from the maximum likelihood idea. Everything up to this point is known art.
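The following sketch illustrates equations (19)–(21) numerically. It draws samples from an assumed three-event distribution `p_true` and compares two hypothetical candidate models. The model with the larger entropy also has the larger log likelihood, and $l(\mathbf{q})/n$ lands close to the first term of equation (20). The distributions, sample size and seed are arbitrary choices made only for this example.

```python
import math
import random

def entropy(p, q):
    """Equation (19): -sum_i p_i ln(p_i / q_i). It is <= 0 and equals 0 only when q == p."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_likelihood(counts, q):
    """Equation (21): l(q) = sum_i n_i ln q_i, with counts[i] = n_i."""
    return sum(n * math.log(qi) for n, qi in zip(counts, q) if n > 0)

def first_term(p, q):
    """First term of equation (20): sum_i p_i ln q_i."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q))

p_true = [0.5, 0.3, 0.2]                      # assumed "true" distribution
models = {"close": [0.45, 0.35, 0.2], "far": [0.2, 0.3, 0.5]}

rng = random.Random(0)
n = 20000
samples = rng.choices(range(3), weights=p_true, k=n)
counts = [samples.count(i) for i in range(3)]

for name, q in models.items():
    print(name, round(entropy(p_true, q), 4),
          round(log_likelihood(counts, q) / n, 4), round(first_term(p_true, q), 4))
```

Because the second term of equation (20) is the same for every candidate, ranking models by `log_likelihood` gives the same order as ranking them by `entropy`. The difference is that `log_likelihood` never needs `p_true`.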

[0063] Next, equation (21) is extended to negative-teacher learning; what follows was clarified by the inventor's research. Equation (21) uses only the number of times the event $\omega_i$ occurred, which corresponds to learning with $\omega_i$ only as a model teacher. Even if it becomes known that "some event other than $\omega_i$ occurred", the prior art cannot learn from that fact.

Therefore, "some event other than $\omega_i$ occurred" is treated as an event of probability $1 - p_i$, whose probability distribution model is $1 - q_i$. Equation (21) is then rewritten as

$$l(\mathbf{q}) = \sum_{i=1}^{m}\left[ n_i \ln\!\left(\frac{N_i}{N_o}\,q_i\right) + (N_i - n_i)\ln\!\left(\frac{N_i}{N_o}\,(1-q_i)\right)\right] \tag{22}$$

Here $N_o$ is the total number of observations and $N_i$ is the number of observations in which attention was focused on the event $\omega_i$, so that $N_1 + \cdots + N_m = N_o$. $N_i - n_i$ is the number of times attention was focused on $\omega_i$ but some other event occurred. In short, the policy is to focus on one event at each observation and to determine the model from whether that event or some other event occurred.

[0064] In equation (22), the factor $N_i/N_o$ inside $\ln(\cdot)$ does not depend on the model, so it is dropped for simplicity. The following quantity is then defined:

$$l_E(\mathbf{q}) = \sum_{i=1}^{m}\left[ n_i \ln q_i + (N_i - n_i)\ln(1-q_i)\right] \tag{23}$$

$l_E(\mathbf{q})$ equals the log likelihood apart from a model-independent constant term.
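The bookkeeping behind equation (23) can be sketched as a simulation. In each observation, one event is chosen as the focus, and only "it occurred" or "some other event occurred" is recorded. The focus is chosen uniformly at random here, which is purely an assumption for the simulation; the patent does not say how the focus is chosen. The true distribution `p_true` and the function names are likewise hypothetical.

```python
import math
import random

def extended_log_likelihood(hits, focus_counts, q):
    """Equation (23): l_E(q) = sum_i [n_i ln q_i + (N_i - n_i) ln(1 - q_i)].

    hits[i]         -> n_i, observations focused on event i in which event i occurred
    focus_counts[i] -> N_i, observations focused on event i
    """
    return sum(n_i * math.log(q_i) + (N_i - n_i) * math.log(1.0 - q_i)
               for n_i, N_i, q_i in zip(hits, focus_counts, q))

def simulate(p_true, n_obs, seed=0):
    """Focus on one event per observation (uniform choice: an assumption for this demo)
    and record only whether that event or some other event occurred."""
    rng = random.Random(seed)
    m = len(p_true)
    hits, focus = [0] * m, [0] * m
    for _ in range(n_obs):
        i = rng.randrange(m)                                # event we pay attention to
        outcome = rng.choices(range(m), weights=p_true)[0]  # event that actually occurred
        focus[i] += 1
        hits[i] += int(outcome == i)
    return hits, focus

p_true = [0.5, 0.3, 0.2]
hits, focus = simulate(p_true, 30000)
mle = [h / f for h, f in zip(hits, focus)]  # maximizes each bracket of (23) separately
for q in (p_true, [0.2, 0.3, 0.5], mle):
    print([round(v, 3) for v in q], round(extended_log_likelihood(hits, focus, q), 1))
```

Each bracket of equation (23) is maximized at $q_i = n_i / N_i$. Nothing forces these per-bracket maximizers to sum to 1, but with enough observations they approach the true $p_i$. Note that the "other event occurred" counts $N_i - n_i$ contribute to the estimate just as the hits do.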

[0065] Next, equation (23) is extended to a continuous-variable model to derive the loss function used in learning. Replace $q_i$ by a density distribution function $g(x \mid \Theta)$, and let $S$ be a region around $x_i$ that contains $x_i$ itself. Let $G(x_i \mid \Theta)$ be the value obtained by integrating $g(x \mid \Theta)$ over $S$ with integration variable $x$. Then

$$l_E(\Theta) = \sum_{i}\left[ n_i \ln G(x_i \mid \Theta) + (N_i - n_i)\ln\bigl(1 - G(x_i \mid \Theta)\bigr)\right] \tag{24}$$

In deriving equation (24), "event $x$ occurs" was defined to mean that an event contained in the region $S$ around $x$ occurs. "An event other than $x$ occurs" was defined to mean that an event outside $S$ occurs. Only with this definition do the probabilities of these two statements add up to 1, which makes it possible to derive equation (24) as an extension of equation (23).

The shape and size of $S$ cannot be determined theoretically; suitable values must be chosen for the situation at hand. Taking $S$ to be, for example, a hypersphere or hypercube centered on $x$ does not, however, lose validity: it suffices that $x$ can reasonably represent every point of $S$. Nor does $S$ need to be a closed region.
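For a one-dimensional, zero-mean normal density and an interval region $S = [x - s/2,\ x + s/2]$, $G(x \mid \Theta)$ has a closed form through the error function. This is the density later chosen in equation (29) and the region shape later adopted in [0070]. The sketch below uses arbitrary illustrative values of $s$ and $\sigma$, and the function names are hypothetical. It prints $G$ and $1 - G$, the probabilities of "event $x$" and "an event other than $x$", next to $s\,g(x)$, which $G$ approaches when $s$ is small.

```python
import math

def normal_pdf(x, sigma):
    """Zero-mean normal density g(x | sigma)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * sigma)))

def region_probability(x, s, sigma):
    """G(x | sigma): probability mass of the interval S = [x - s/2, x + s/2]."""
    return normal_cdf(x + s / 2, sigma) - normal_cdf(x - s / 2, sigma)

sigma, s = 1.0, 0.1
for x in (0.0, 1.0, 3.0):
    G = region_probability(x, s, sigma)
    # P("event x"), P("an event other than x"), and the small-s approximation s * g(x)
    print(x, round(G, 6), round(1.0 - G, 6), round(s * normal_pdf(x, sigma), 6))
```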

[0066] $\Theta$ denotes the set of parameters that determine the density distribution function. It combines the control parameters of the learning machine with the parameters that determine the distribution of the difference between the model teacher and the output vector.

[0067] In equation (24), instead of multiplying by the counts $n_i$ and $N_i - n_i$, the sum can be taken that many times. The equation then becomes

$$l_E(\Theta) = \sum_{p:D} \ln G(x_p \mid \Theta) + \sum_{p:U} \ln\bigl(1 - G(x_p \mid \Theta)\bigr) \tag{25}$$

Here $p$ is an index identifying a teacher signal, $\sum_{p:D}$ is the sum over model teachers, and $\sum_{p:U}$ is the sum over negative teachers. In short, the model that maximizes $l_E(\Theta)$ in equation (25) is the maximum likelihood model. It therefore suffices to obtain, by learning, the parameters $\Theta$ that give the maximum likelihood model.
_It is a model that maximizes _E (Θ) or a maximum likelihood model, and it is sufficient that the parameter 与える that gives the maximum likelihood model can be obtained by learning.

[0068]

Means for Solving the Problems: The learning of the present invention uses the loss function

$$r = -l_E(\Theta) \tag{26}$$

As a result, all parameters other than the shape and size of the region $S$ can be set optimally and automatically. That is, a maximum likelihood model can be constructed by learning, and a learning machine with the best generalization ability can be obtained.

[0069] The $p$-th teacher signal is denoted $y^p_k$ ($k = 1, \ldots, N$), and its type is denoted $T_p$. $T_p = +$ if the $p$-th teacher signal is a model teacher, and $T_p = -$ if it is a negative teacher; $+$ and $-$ are merely symbols distinguishing the types. Conventional learning data consisted of a pair of an input vector and a teacher signal. In this embodiment $T_p$ is added to these two, so each item of learning data consists of three elements.

[0070] For simplicity, let $N = 1$; the extension to $N > 1$ is straightforward. Let the region $S$ be $x_p - s/2 \le x \le x_p + s/2$. Two approximations are used:

- If $s$ is small enough that $g(x \mid \Theta)$ varies little within $S$, then $G(x_p \mid \Theta) = s\,g(x_p \mid \Theta)$.
- If $s\,g(x_p \mid \Theta)$ is small, then $\ln\bigl(1 - s\,g(x_p \mid \Theta)\bigr) = -s\,g(x_p \mid \Theta)$.

With these, equation (25) becomes

$$l_E(\Theta) = \sum_{p:D} \ln g(x_p \mid \Theta) - s \sum_{p:U} g(x_p \mid \Theta) \tag{27}$$

Substituting equation (27) into equation (26) gives the loss function

$$r = -\sum_{p:D} \ln g(x_p \mid \Theta) + s \sum_{p:U} g(x_p \mid \Theta) \tag{28}$$

In equations (27) and (28), constant terms that do not depend on the model parameters $\Theta$ (here $\ln s$ for each model teacher) have been ignored. The parameter $\Theta$ may be regarded as a vector whose elements are all the parameters: the coupling coefficients $w^{HI}_{ji}$, $w^{OH}_{kj}$ and the thresholds $\theta^H_j$, $\theta^O_k$ in equations (1)–(4), together with $\sigma$ and the other parameters in equations (30)–(33) described below.
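The effect of the two small-$s$ approximations behind equations (27) and (28) can be checked numerically. The sketch below assumes the normal density of equation (29) and uses made-up differences $x_p$ for three model teachers and two negative teachers. It compares $r = -l_E(\Theta)$ computed exactly from equation (25) with equation (28). The constant $-N_D \ln s$ that equation (28) drops is added back, so the two numbers are directly comparable.

```python
import math

def pdf(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def G(x, s, sigma):
    """Integral of pdf over [x - s/2, x + s/2]."""
    c = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((x + s / 2) / c) - math.erf((x - s / 2) / c))

def loss_exact(model_diffs, negative_diffs, s, sigma):
    """r = -l_E(Theta) with l_E taken from equation (25)."""
    return (-sum(math.log(G(x, s, sigma)) for x in model_diffs)
            - sum(math.log(1.0 - G(x, s, sigma)) for x in negative_diffs))

def loss_approx(model_diffs, negative_diffs, s, sigma):
    """Equation (28), plus the Theta-independent constant -N_D * ln s it omits."""
    r = (-sum(math.log(pdf(x, sigma)) for x in model_diffs)
         + s * sum(pdf(x, sigma) for x in negative_diffs))
    return r - len(model_diffs) * math.log(s)

model_diffs = [0.1, -0.3, 0.2]   # x_p = O^p - y^p for model teachers (made up)
negative_diffs = [0.5, -1.2]     # x_p for negative teachers (made up)
for s in (0.05, 0.01):
    e = loss_exact(model_diffs, negative_diffs, s, 1.0)
    a = loss_approx(model_diffs, negative_diffs, s, 1.0)
    print(s, round(e, 6), round(a, 6), abs(e - a))
```

The gap between the two values shrinks as $s$ shrinks, which is the regime in which equation (28) is intended to be used.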

[0071] The density distribution function $g(x \mid \Theta)$ of the difference between the teacher signal and the output vector is chosen as follows. Learning is first performed on the assumption that $g(x \mid \Theta)$ is a normal distribution with mean 0 and variance $\sigma^2$. Once learning has progressed to some extent, the distribution of the difference between the teacher signals and the output data becomes known. The functional form of $g(x \mid \Theta)$ can then be reset with reference to that distribution and learning continued.

[0072] Suppose $g(x \mid \Theta)$ is the normal distribution with mean 0 and variance $\sigma^2$:

$$g(x \mid \Theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^2}{2\sigma^2}\right) \tag{29}$$

Substituting $x_p = O^p_1 - y^p_1$, the loss function $r$ follows from equations (28) and (29) as

$$r = r_+ + r_- \tag{30}$$

$$r_+ = \sum_{p:D} r_+^p, \qquad r_+^p = \frac{R_p^2}{2\sigma^2} + \ln\sigma \tag{31}$$

$$r_- = \sum_{p:U} r_-^p, \qquad r_-^p = \frac{s}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{R_p^2}{2\sigma^2}\right) \tag{32}$$

$$R_p = \left| O^p_1 - y^p_1 \right| \tag{33}$$

Here $r_+$ is the model-teacher term of the loss function $r$, and $r_-$ is the negative-teacher term. $r_+^p$ and $r_-^p$ are the contributions of the $p$-th teacher signal to $r_+$ and $r_-$, respectively. $R_p$ is the distance between the $p$-th teacher signal and the output vector. Each teacher signal is assumed to carry, in advance, a mark ($T_p$) indicating whether it is a model teacher or a negative teacher. Equations (30)–(33) show that $s$ ends up being a parameter that adjusts the relative contributions of model teachers and negative teachers to the loss function.
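The sketch below shows how the two terms act during learning. For $N = 1$, it evaluates $r_+^p$ and $r_-^p$ from equations (31)–(33), together with their derivatives with respect to the output $O^p_1$, which are the quantities back-propagation would need. The derivatives use the signed difference $O - y$; this gives the same values as $R_p = |O - y|$ because only $R_p^2$ appears. The numeric values and function names are illustrative only.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)

def model_term(O, y, sigma):
    """r_+^p of equation (31) and its derivative d r_+^p / dO."""
    d = O - y
    return d * d / (2 * sigma**2) + math.log(sigma), d / sigma**2

def negative_term(O, y, sigma, s):
    """r_-^p of equation (32) and its derivative d r_-^p / dO."""
    d = O - y
    value = s / (SQRT_2PI * sigma) * math.exp(-d * d / (2 * sigma**2))
    return value, -value * d / sigma**2

O, y, sigma, s = 0.3, 0.0, 1.0, 0.5
print(model_term(O, y, sigma))            # positive slope: descent lowers O toward y
print(negative_term(O, y, sigma, s))      # negative slope: descent raises O away from y
print(negative_term(4.0, y, sigma, s))    # distant negative teacher: almost no effect
```

The model-teacher gradient grows with distance and always pulls toward $y$. The negative-teacher gradient pushes away from $y$ but decays like $\exp(-R_p^2/2\sigma^2)$. This is consistent with the observation for FIG. 10 that some negative teachers barely influence learning.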

[0073] For $N > 1$, the equations corresponding to equations (27)–(33) are obtained by substituting an $N$-dimensional normal distribution for $g(x \mid \Theta)$.

[0074] In this embodiment, the formulation considers only teacher signals that consist entirely of model-teacher components or entirely of negative-teacher components. If the teacher signal is regarded as an $N$-dimensional vector, however, each vector component may individually be a model teacher or a negative teacher. This case can be handled by a simple extension of the above discussion; that is, model-teacher and negative-teacher components may be mixed within one teacher signal.

First, consider the case in which the $N$-dimensional output vector can be split into two independent vectors in two spaces: an $N_A$-dimensional vector in space A and an $N_B$-dimensional vector in space B. This covers, for example, a teacher signal that is a model teacher in space A and a negative teacher in space B. How to construct the loss function is described concretely below.

[0075] Here $N_A + N_B = N$. The following definitions are used:

- The difference between the teacher signal and the output vector in space A is the vector $x_A$, whose density distribution function is assumed to be $g_A(x_A \mid \Theta_A)$.
- The corresponding difference in space B is the vector $x_B$, whose density distribution function is assumed to be $g_B(x_B \mid \Theta_B)$.
- $\sum_{p:DD}$ is the sum over teacher signals that are model teachers in both spaces A and B.
- $\sum_{p:DU}$ is the sum over those that are model teachers in space A and negative teachers in space B.
- $\sum_{p:UD}$ is the sum over those that are negative teachers in space A and model teachers in space B.
- $\sum_{p:UU}$ is the sum over those that are negative teachers in both spaces.
- $G_A(x^p_A \mid \Theta_A)$ is the integral of $g_A(x_A \mid \Theta_A)$, with integration variable $x_A$, over a region $S_A$ around $x^p_A$ that contains $x^p_A$.
- $G_B(x^p_B \mid \Theta_B)$ is the integral of $g_B(x_B \mid \Theta_B)$, with integration variable $x_B$, over a region $S_B$ around $x^p_B$ that contains $x^p_B$.

The loss function $r$ can then be constructed as

$$\begin{aligned}
r = &-\sum_{p:DD}\Bigl[\ln G_A(x^p_A \mid \Theta_A) + \ln G_B(x^p_B \mid \Theta_B)\Bigr] \\
    &-\sum_{p:DU}\Bigl[\ln G_A(x^p_A \mid \Theta_A) + \ln\bigl(1 - G_B(x^p_B \mid \Theta_B)\bigr)\Bigr] \\
    &-\sum_{p:UD}\Bigl[\ln\bigl(1 - G_A(x^p_A \mid \Theta_A)\bigr) + \ln G_B(x^p_B \mid \Theta_B)\Bigr] \\
    &-\sum_{p:UU}\Bigl[\ln\bigl(1 - G_A(x^p_A \mid \Theta_A)\bigr) + \ln\bigl(1 - G_B(x^p_B \mid \Theta_B)\bigr)\Bigr]
\end{aligned} \tag{34}$$

The case in which three or more parts switch independently between model teacher and negative teacher can be extended in the same way.
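The sketch below illustrates the split-space construction for $N_A = N_B = 1$. It assumes that the two spaces are treated independently, so each space contributes $-\ln G$ when the teacher is a model teacher there and $-\ln(1 - G)$ when it is a negative teacher there. A zero-mean normal density and an interval region of width $s$ are also assumed in each space. The function names and the kind labels "DD", "DU", "UD" and "UU" are hypothetical.

```python
import math

def G(x, s, sigma):
    """Mass of a zero-mean normal density over [x - s/2, x + s/2]."""
    c = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((x + s / 2) / c) - math.erf((x - s / 2) / c))

def split_loss(samples, s_a, s_b, sigma_a, sigma_b):
    """samples: (x_a, x_b, kind) with kind in {"DD", "DU", "UD", "UU"}; the first letter
    refers to space A, the second to space B (D = model teacher, U = negative teacher)."""
    r = 0.0
    for x_a, x_b, kind in samples:
        g_a, g_b = G(x_a, s_a, sigma_a), G(x_b, s_b, sigma_b)
        r -= math.log(g_a) if kind[0] == "D" else math.log(1.0 - g_a)
        r -= math.log(g_b) if kind[1] == "D" else math.log(1.0 - g_b)
    return r

# A "DU" teacher: the output should match it in space A but avoid it in space B.
near = split_loss([(0.0, 0.0, "DU")], 0.1, 0.1, 1.0, 1.0)
away_in_b = split_loss([(0.0, 2.0, "DU")], 0.1, 0.1, 1.0, 1.0)
away_in_a = split_loss([(2.0, 0.0, "DU")], 0.1, 0.1, 1.0, 1.0)
print(near, away_in_b, away_in_a)
```

For a "DU" teacher, moving the output away in space B lowers the loss, while moving it away in space A raises it. This is the mixed behaviour the construction is meant to capture.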

[0076] Learning is performed for each item of learning data by changing the parameters according to

$$\Delta\theta(t) = -\eta\,\frac{\partial r}{\partial \theta} + \alpha\,\Delta\theta(t-1), \qquad \theta \leftarrow \theta + \Delta\theta(t) \tag{35}$$

This method is called the stochastic descent method. Updating $\theta$ according to equation (35) decreases the loss function $r$. It is known from the theory of the stochastic descent method (Amari, S.: IEEE Trans. EC-16, 279 (1967)) that the $\theta$ obtained by equation (35) minimizes, on average, the loss function $r$ of equations (30)–(33).

Here $\theta$ denotes any of the following: the coupling coefficients $w^{HI}_{ji}$ and $w^{OH}_{kj}$; the thresholds $\theta^H_j$ and $\theta^O_k$ in equations (1)–(4); or parameters such as $\sigma$ in equations (30)–(33). $\Delta\theta(t)$ is the amount by which $\theta$ is updated at the $t$-th fine adjustment of the parameters in iterative learning. $\eta$ is the learning coefficient, and $\alpha$ is the inertia term.
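Equation (35) is gradient descent with an inertia (momentum) term. Below is a minimal sketch of one update, assuming the gradient $\partial r/\partial\theta$ is supplied by the caller. The quadratic function in the usage example is only an illustration, and it is minimized with a deterministic gradient rather than the per-sample gradients of the stochastic descent method.

```python
def momentum_step(theta, grad, prev_delta, eta=0.1, alpha=0.9):
    """Equation (35): delta(t) = -eta * dr/dtheta + alpha * delta(t-1); theta += delta(t)."""
    delta = [-eta * g + alpha * d for g, d in zip(grad, prev_delta)]
    new_theta = [t + d for t, d in zip(theta, delta)]
    return new_theta, delta

# Illustration: r(theta) = (theta_0 - 3)^2 + 2 * (theta_1 + 1)^2
def grad_r(theta):
    return [2 * (theta[0] - 3), 4 * (theta[1] + 1)]

theta, delta = [0.0, 0.0], [0.0, 0.0]
for t in range(300):
    theta, delta = momentum_step(theta, grad_r(theta), delta)
print([round(v, 4) for v in theta])
```

The returned `delta` must be fed back in as `prev_delta` on the next call. Without that, $\alpha$ has no effect and the update reduces to plain gradient descent.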

[0077] Learning continues until $r$ becomes smaller than a predetermined small value. Sometimes, however, learning is impossible because of the nonlinearity of the neural network or insufficient model capacity. In other cases, $r$ does not become small enough within a practical time. A maximum number of iterations is therefore usually fixed in advance. When it is reached, learning is terminated, and learning is tried several more times with different initial values of the coupling coefficients and other parameters. In some cases the model itself may be reconsidered.

[0078] The learning method of this embodiment for $N = 1$ proceeds as follows, with reference to the flowchart of FIG. 11:

- Step S21: the parameter $p$ indicating the serial number of the learning data is set to $p = 1$.
- Step S22: the learning data $\{I^p_i\ (i = 1, \ldots, L),\ y^p_1,\ T_p\}$ are read.
- Step S23: the neural network computes its output vector $O^p_1$ by equations (1)–(5).
- Step S24: it is determined whether the teacher signal is a model teacher or a negative teacher.
- Step S25 (model teacher): $\Delta\theta(t)$ is calculated by equations (1)–(5), (31), (33) and (35), and the parameter values are updated.
- Step S26 (negative teacher): $\Delta\theta(t)$ is calculated by equations (1)–(5), (32), (33) and (35), and the parameter values are updated.
- Step S27: it is determined whether $p = P$. If $p \ne P$, $p$ is incremented by 1 (step S28) and the process returns to step S22; if $p = P$, the process proceeds to step S29.
- Step S29: if the learning end condition is satisfied, learning ends; otherwise the process returns to step S21.

If $T_p = +$ always holds in step S24 (that is, if every teacher is a model teacher), the method reduces to the prior art.
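The loop of FIG. 11, including the iteration cap described in [0077], can be sketched as below. This is a simplified illustration, not the embodiment itself:

- A linear map $O = wI + c$ stands in for the neural network of equations (1)–(5).
- $\sigma$ and $s$ are held fixed, so the constant $\ln\sigma$ of equation (31) is dropped.
- The names `train_fig11`, `total_loss`, `r_target` and `max_epochs`, and the data, are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)

def total_loss(data, w, c, sigma, s):
    """r = r_+ + r_- from equations (30)-(33), dropping ln(sigma) since sigma is fixed here."""
    r = 0.0
    for I, y, T in data:
        R = w * I + c - y
        if T == "+":
            r += R * R / (2 * sigma**2)
        else:
            r += s / (SQRT_2PI * sigma) * math.exp(-R * R / (2 * sigma**2))
    return r

def train_fig11(data, sigma=0.5, s=0.1, eta=0.01, alpha=0.9, max_epochs=500, r_target=1e-3):
    """Sketch of the FIG. 11 loop for N = 1.

    data: list of (I, y, T), T == "+" for a model teacher and "-" for a negative teacher.
    Simplifications: a linear map O = w * I + c replaces the network, sigma is not learned.
    """
    w = c = 0.0
    dw_prev = dc_prev = 0.0
    r = total_loss(data, w, c, sigma, s)
    for _ in range(max_epochs):                   # S29 loops back to S21 until done
        for I, y, T in data:                      # S21/S22/S27/S28: p = 1..P
            R = (w * I + c) - y                   # S23: output O^p compared with y^p
            if T == "+":                          # S24 -> S25: model teacher, eq. (31)
                dr_dO = R / sigma**2
            else:                                 # S24 -> S26: negative teacher, eq. (32)
                g = s / (SQRT_2PI * sigma) * math.exp(-R * R / (2 * sigma**2))
                dr_dO = -g * R / sigma**2
            dw = -eta * dr_dO * I + alpha * dw_prev   # eq. (35); dO/dw = I
            dc = -eta * dr_dO + alpha * dc_prev       # eq. (35); dO/dc = 1
            w, c, dw_prev, dc_prev = w + dw, c + dc, dw, dc
        r = total_loss(data, w, c, sigma, s)
        if r < r_target:                          # S29: learning end condition
            break
    return w, c, r

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
data = ([(I, 2 * I + 1.0, "+") for I in grid]     # model teachers on y = 2I + 1
        + [(I, 2 * I + 1.5, "-") for I in grid])  # negative teachers just above them
w, c, r = train_fig11(data)
print(round(w, 3), round(c, 3), round(r, 4))
```

With these data, the fitted line stays on the model teachers, shifted very slightly away from the negative teachers just above them. If every item had `T == "+"`, the loop would reduce to ordinary per-sample least squares with momentum, matching the remark about the prior art.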

[0079] As described above, according to the present invention, the optimal loss function for learning with negative teachers can be chosen on a theoretical basis. This makes possible learning with the best generalization ability.

[0080] Next, a fourth embodiment of the present invention is described with reference to the drawings. This embodiment differs from the third embodiment in two ways: the probability density function $g(x \mid \Theta)$ is replaced by a generalized Gaussian distribution function, and learning is performed in two stages. Parts identical to the third embodiment are not described again.

[0081] $g(x \mid \Theta)$ is taken to be the following generalized Gaussian distribution function:

[formula (36) not reproduced]

Here $\Gamma(x)$ is the gamma function and $|\cdot|$ denotes the absolute value. Compared with the normal distribution, a parameter $b$ has been added, increasing the number of control parameters by one; $b = 2$ corresponds to the normal distribution with mean 0 and variance $\sigma^2$. This one extra control parameter makes it possible to handle distributions that deviate somewhat from the normal distribution. FIG. 12 shows the shape of $g(x \mid \Theta)$ for $b = 1$, $b = 2$, $b = 10$ and $b = \infty$; each curve is drawn so that the maximum values are equal.
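Formula (36) itself is not reproduced here, so the sketch below uses one common parameterization of the generalized Gaussian (exponential-power) density:

$$g(x) = \frac{b}{2^{1+1/b}\,\sigma\,\Gamma(1/b)}\exp\!\left(-\frac{|x|^b}{2\sigma^b}\right)$$

This form is an assumption. It was chosen only because it matches the properties stated in the text: it contains a gamma function and an absolute value, and $b = 2$ gives the normal distribution with mean 0 and variance $\sigma^2$. The patent's own formula may be parameterized differently. In this form, $b = 1$ is a Laplace density, and as $b$ grows the density flattens toward a uniform shape on $[-\sigma, \sigma]$.

```python
import math

def generalized_gaussian(x, sigma, b):
    """Assumed form: b / (2^(1 + 1/b) sigma Gamma(1/b)) * exp(-|x|^b / (2 sigma^b))."""
    norm = b / (2 ** (1 + 1 / b) * sigma * math.gamma(1 / b))
    return norm * math.exp(-abs(x) ** b / (2 * sigma ** b))

def normal_pdf(x, sigma):
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule; adequate for these fast-decaying densities."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

for b in (1, 2, 10):
    mass = integrate(lambda x: generalized_gaussian(x, 1.0, b), -30.0, 30.0)
    var = integrate(lambda x: x * x * generalized_gaussian(x, 1.0, b), -30.0, 30.0)
    print(b, round(mass, 6), round(var, 4))
```

Each density integrates to 1, and only $b = 2$ has variance $\sigma^2$. For other values of $b$, $\sigma$ acts as a scale rather than a standard deviation.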

[0082] Using equation (36), this embodiment can learn with even higher generalization ability than the third embodiment. However, there are more control parameters, processing becomes more complicated, and more learning data are needed. $g(x \mid \Theta)$ should therefore be chosen to suit the situation of the learning target.

[0083] The loss function is given by

$$r = -\sum_{p:D} \ln g(x_p \mid \Theta) + s \sum_{p:U} g(x_p \mid \Theta) \tag{37}$$

The loss function $r$ is then obtained as follows, by substituting $x_p = O^p_1 - y^p_1$ into equation (36) and then substituting equation (36) into equation (37).

[0084]

$$r = r_+ + r_- \tag{38}$$

[formulas (39)–(41) not reproduced]

[0085] For $N > 1$, the equations corresponding to equations (36)–(41) are obtained by substituting an $N$-dimensional generalized Gaussian distribution for $g(x \mid \Theta)$.

[0086] Learning is performed for each item of learning data by changing the control parameters according to

$$\Delta\theta(t) = -\eta\,\frac{\partial r}{\partial \theta} + \alpha\,\Delta\theta(t-1), \qquad \theta \leftarrow \theta + \Delta\theta(t) \tag{42}$$

Here $\theta$ denotes any of the following: the coupling coefficients $w^{HI}_{ji}$ and $w^{OH}_{kj}$; the thresholds $\theta^H_j$ and $\theta^O_k$ in equations (1)–(4); or control parameters such as $\sigma$ and $b$ in equations (38)–(41). In the first stage of learning, $b$ is fixed at $b = 2$.

[0087] First-stage learning continues until $r$ becomes smaller than a predetermined small value $E_{L1}$. Second-stage learning begins once $r < E_{L1}$. In the second stage, $b$, which was fixed in the first stage, is also learned according to equation (42), like the other control parameters. Second-stage learning continues until $r < E_{L2}$.

[0088] It is known empirically that learning by the steepest descent method is fast when $b = 2$. However, as explained in the section on operation, high generalization ability requires determining the control parameters with a distribution close to the true distribution. Moreover, unlike the coupling coefficients and thresholds, $\sigma$ and $b$ require complicated, time-consuming computations to learn. Splitting learning into two stages in this way can therefore be expected to make learning both fast and better at generalization. Depending on the situation, $b$ might be adjusted only a few times in the second stage and then fixed again for further learning.

[0089] The learning method of this embodiment for $N = 1$ proceeds as follows, with reference to the flowchart of FIG. 13:

- Step S31: the variable phase, which indicates the learning stage, is set to phase = 1, and the parameter $b$ is set to $b = 2$. The process proceeds to step S32.
- Step S32: the parameter $p$ indicating the serial number of the learning data is set to $p = 1$.
- Step S33: the learning data $\{I^p_i\ (i = 1, \ldots, L),\ y^p_1,\ T_p\}$ are read.
- Step S34: the neural network computes its output vector $O^p$ by equations (1)–(4).
- Step S35: it is determined whether the teacher signal is a model teacher or a negative teacher.
- Step S36 (model teacher): $\Delta\theta(t)$ is calculated by equations (1)–(4), (39), (41) and (42), and the control parameter values are updated; if phase = 2, $b$ is updated in the same way as the other control parameters. The process proceeds to step S38.
- Step S37 (negative teacher): $\Delta\theta(t)$ is calculated by equations (1)–(4) and (40)–(42), and the control parameter values are updated; if phase = 2, $b$ is updated as well. The process proceeds to step S38.
- Step S38: it is determined whether $p = P$. If $p \ne P$, $p$ is incremented by 1 (step S39) and the process returns to step S33; if $p = P$, the process proceeds to step S40.
- Step S40: if $r < E_{L1}$, the process proceeds to step S41; otherwise it returns to step S32.
- Step S41: phase is set to phase = 2, and the process proceeds to step S42.
- Step S42: if $r < E_{L2}$, learning ends; otherwise the process returns to step S32.
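The two-stage schedule of FIG. 13 can be sketched on a reduced problem. The sketch rests on several assumptions made only for illustration:

- The residuals $x_p = O^p_1 - y^p_1$ are held fixed, with no network weights and model teachers only, so only $\sigma$ and $b$ are learned.
- The generalized Gaussian is taken in the assumed form $g(x) = \frac{b}{2^{1+1/b}\sigma\Gamma(1/b)}\exp(-|x|^b/(2\sigma^b))$, which reduces to equation (29) at $b = 2$.
- Central finite differences stand in for the analytic derivatives of formulas (39)–(41), which are not reproduced.
- Plain gradient steps replace the inertia term.
- Each phase is capped by `max_steps` in case the thresholds $E_{L1}$ and $E_{L2}$ (here `e_l1` and `e_l2`) are never reached.

The residuals are deterministic quantiles of a Laplace distribution, so phase 2 is expected to move $b$ below 2.

```python
import math

def mean_nll(residuals, sigma, b):
    """Mean of -ln g(x | sigma, b), with g assumed to be
    b / (2^(1 + 1/b) sigma Gamma(1/b)) * exp(-|x|^b / (2 sigma^b)); b = 2 is N(0, sigma^2)."""
    log_norm = math.log(b) - (1 + 1 / b) * math.log(2) - math.log(sigma) - math.lgamma(1 / b)
    return -log_norm + sum(abs(x) ** b for x in residuals) / (2 * sigma ** b * len(residuals))

def two_stage_fit(residuals, sigma=1.0, eta=0.02, e_l1=None, e_l2=None,
                  max_steps=2000, h=1e-5):
    """Phase 1 learns sigma with b fixed at 2; phase 2 learns sigma and b together."""
    b = 2.0                                            # S31: phase = 1, b = 2
    r = mean_nll(residuals, sigma, b)
    history = {}
    for phase in (1, 2):                               # S41 switches to phase 2
        target = e_l1 if phase == 1 else e_l2
        for _ in range(max_steps):
            # central differences stand in for the analytic derivatives
            d_sigma = (mean_nll(residuals, sigma + h, b)
                       - mean_nll(residuals, sigma - h, b)) / (2 * h)
            sigma -= eta * d_sigma
            if phase == 2:                             # S36/S37: b learned only in phase 2
                d_b = (mean_nll(residuals, sigma, b + h)
                       - mean_nll(residuals, sigma, b - h)) / (2 * h)
                b = min(max(b - eta * d_b, 0.5), 10.0)  # keep b in a safe range
            r = mean_nll(residuals, sigma, b)
            if target is not None and r < target:      # S40 (r < E_L1) / S42 (r < E_L2)
                break
        history[phase] = (sigma, b, r)
    return history

# Deterministic Laplace-distributed residuals (quantiles at u = (k + 0.5) / n).
n = 200
residuals = [math.log(2 * u) if u < 0.5 else -math.log(2 * (1 - u))
             for u in ((k + 0.5) / n for k in range(n))]
hist = two_stage_fit(residuals)
print(hist[1], hist[2])
```

With the default `e_l1 = e_l2 = None`, each phase simply runs `max_steps` updates. Phase 1 then settles at the normal-distribution fit, and phase 2 lowers both $b$ and the loss.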

[0090] Next, a fifth embodiment of the present invention is described with reference to the drawings. This embodiment extends the first embodiment so that learning also uses a boundary teacher. A boundary teacher is a teacher signal that specifies the boundary between the region in which output vectors should gather and the region they should avoid; in a three-dimensional space, the boundary is a plane. Parts identical to the first embodiment are not described again.

[0091] FIG. 14 is a conceptual diagram of the learning machine according to the fifth embodiment. It is obtained by adding boundary-teacher loss function calculating means 21 to FIG. 1. That is, the loss function calculating means 8 consists of three parts:

- model-teacher loss function calculating means 9;
- negative-teacher loss function calculating means 10;
- boundary-teacher loss function calculating means 21.

When a boundary teacher is given, the loss function and its derivatives are calculated by the boundary-teacher loss function calculating means 21.

[0092] Next, the case in which this embodiment is applied to back-propagation learning of a three-layer feed-forward neural network is described.

[0093] FIG. 15 is a conceptual diagram of this case. The learning machine is obtained by adding boundary-teacher loss function calculating means 31 to FIG. 2. That is, the loss function calculating means 18 consists of model-teacher loss function calculating means 19, negative-teacher loss function calculating means 20, and boundary-teacher loss function calculating means 31.

[0094] The $p$-th teacher signal is denoted $y^p_k$, $a^p_k$ ($k = 1, \ldots, N$), and its type is denoted $T_p$:

- $T_p = +$ for a model teacher;
- $T_p = -$ for a negative teacher;
- $T_p = >$ for a boundary teacher.

"$+$", "$-$" and "$>$" are merely symbols distinguishing the types. $a^p_k$ is meaningful only when $T_p = >$; when $T_p \ne >$, it is a dummy signal with no meaning. A conventional teacher signal was a single vector with the same dimension as the output data. In this embodiment, the teacher signal is the pair of the two vectors $\mathbf{y}^p$ and $\mathbf{a}^p$.

[0095] Using equations (8)–(10) and the following equations, the loss function $r$ is written as

$$r = r_+ + r_- + r_> \tag{43}$$

[formula (44) not reproduced]

$$d_p = \mathbf{a}^p \cdot (\mathbf{O}^p - \mathbf{y}^p) \tag{45}$$

Here $r_>$ is the boundary-teacher term, and $r_>^p$ is the contribution of the $p$-th teacher signal to $r_>$. $d_p$ represents the difference between the $p$-th teacher signal (a boundary teacher) and the output data. $\sum_{p:B}$ denotes the sum over all boundary teachers. Each teacher signal is assumed to carry, in advance, a mark ($T_p$) indicating whether it is a model teacher, a negative teacher or a boundary teacher.

$\gamma_>$ is a parameter that adjusts the contribution of boundary teachers to the loss function. $a$ is a parameter that adjusts the distance over which a boundary teacher affects the output data. By adjusting $\gamma_-$ and $\gamma_>$, one can choose which of the three kinds of teachers mainly drives learning. By adjusting $\beta$ and $a$, one can set how far the output data must stay from negative teachers and from boundary teachers, respectively, during learning.
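The three kinds of learning data and the offset $d_p$ of equation (45) can be sketched as below. This sketch assumes that $\mathbf{y}^p$ is a point on the boundary plane and $\mathbf{a}^p$ is the plane's normal vector. With that reading, $d_p$ is a signed offset of the output from the plane, and it equals the signed distance when $\mathbf{a}^p$ has unit length. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class TeacherSignal:
    y: Sequence[float]                   # y^p
    kind: str                            # T_p: "+" model, "-" negative, ">" boundary
    a: Optional[Sequence[float]] = None  # a^p: meaningful only when kind == ">"

    def __post_init__(self):
        if self.kind not in ("+", "-", ">"):
            raise ValueError(f"unknown teacher type {self.kind!r}")
        if self.kind == ">" and self.a is None:
            raise ValueError("a boundary teacher needs the vector a^p")

def boundary_offset(teacher: TeacherSignal, output: Sequence[float]) -> float:
    """Equation (45): d_p = a^p . (O^p - y^p)."""
    return sum(ai * (oi - yi) for ai, oi, yi in zip(teacher.a, output, teacher.y))

# Boundary: the plane z = 1 in 3-D, with a^p pointing in +z.
plane = TeacherSignal(y=[0.0, 0.0, 1.0], kind=">", a=[0.0, 0.0, 1.0])
print(boundary_offset(plane, [5.0, -2.0, 3.0]))   # 2.0: output lies 2 units on the +z side
print(boundary_offset(plane, [0.0, 0.0, 0.5]))    # -0.5: output lies on the other side
```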

[0096] From equations (44) and (45), the boundary-teacher term of the loss function has the following functional form for each item of learning data:

[inline formula not reproduced]

FIG. 16 is a graph showing the characteristics of this functional form.

[0097] If the teacher signal is regarded as an $N$-dimensional vector, each component may individually be a model teacher, a negative teacher or a boundary teacher. This case can be realized by only slightly modifying equations (8)–(10) and (43)–(45); that is, model-teacher, negative-teacher and boundary-teacher components may be mixed within one teacher signal. In this embodiment, however, the formulation considers only teacher signals that consist entirely of model-teacher components, entirely of negative-teacher components, or entirely of boundary-teacher components.
(10), (43)-(45) can be realized by slightly changing the expressions. That is, the model teacher component, the reverse teacher component, and the boundary teacher component may be mixed in one teacher. However, in the present embodiment, the formulation is performed in consideration of only the teacher signal composed only of the model teacher component, only the teacher component, or only the boundary teacher component.

[0098] Learning is performed by changing the coupling coefficients according to equation (11) and the following equation:

[formula (46) not reproduced]

[0099] Next, the learning method described above will be explained systematically with reference to the flowchart shown in FIG. 17.

[0100] The loss function calculating means 18 shown in FIG. 15 first sets the parameter representing the serial number of the learning data to $p = 1$ (step ST51). It then reads the learning data $\{I_i^p\ (i = 1, \dots, L),\ y_k^p,\ a_k^p\ (k = 1, \dots, N),\ T_p\}$ (step ST52). Next, the neural network outputs the output vector $O_k^p$ $(k = 1, \dots, N)$ according to equations (1) to (6) (step ST53), and the type of the teacher signal is determined (step ST54). The term stored in memory depends on that type:

- Model teacher: [Outside 2] is calculated by equations (1) to (6), (8) and (10) (step ST55).
- Contrary teacher: [Outside 3] is calculated by equations (1) to (6), (8) and (10) (step ST56).
- Boundary teacher: [Outside 4] is calculated by equations (1) to (6), (44) and (45) (step ST57).

[0101] Next, it is determined whether $p = P$ (step ST58). If $p \neq P$, $p$ is incremented by 1 (step ST59) and the process returns to step ST52. If $p = P$, the coupling coefficients and thresholds are updated by equations (11) and (46), using the results calculated and stored in steps ST55, ST56 and ST57 (step ST60). If the learning end condition is satisfied, learning ends; otherwise, the process returns to step ST51 (step ST61). Learning is performed in this way.
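The batch procedure of FIG. 17 (steps ST51 to ST61) can be sketched as follows. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions:

- It uses a single linear output unit $y = \mathbf{w}\cdot\mathbf{u} - \theta$ instead of the three-layer network of FIG. 15.
- Plain gradient descent stands in for the update rule of equations (11) and (46).
- The model and contrary teacher terms are the ones stated in claim 2: $r_+ = x_+^2$ and $r_- = \gamma_- \exp(-x_-^2)$.
- The boundary teacher term of equations (44) and (45) is not reproduced in this section, so no boundary branch is registered. A caller could add one under a hypothetical `"boundary"` key.

The names `train`, `model_term` and `make_contrary_term` are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A loss term maps (output y, teacher value t) to (r, dr/dy).
Term = Callable[[float, float], Tuple[float, float]]


def model_term(y: float, t: float) -> Tuple[float, float]:
    """Model-teacher term r_+ = x^2 with x = y - t (form taken from claim 2)."""
    x = y - t
    return x * x, 2.0 * x


def make_contrary_term(gamma: float) -> Term:
    """Contrary-teacher term r_- = gamma * exp(-x^2) with x = y - t (claim 2)."""
    def term(y: float, t: float) -> Tuple[float, float]:
        x = y - t
        e = math.exp(-x * x)
        return gamma * e, -2.0 * gamma * x * e
    return term


def train(data: Sequence[Tuple[List[float], float, str]],
          terms: Dict[str, Term],
          w: List[float], theta: float,
          lr: float = 0.2, tol: float = 1e-3,
          max_epochs: int = 1000) -> Tuple[List[float], float, float]:
    """Batch loop modeled on FIG. 17 for one linear unit y = w.u - theta.

    Each sample is (u, t, kind); `kind` plays the role of the mark T_p.
    Returns the updated weights, threshold and the last total loss.
    """
    total = float("inf")
    for _ in range(max_epochs):
        total = 0.0                       # ST51: start again from p = 1
        grad_w = [0.0] * len(w)
        grad_theta = 0.0
        for u, t, kind in data:           # ST52, ST58, ST59: loop over p
            y = sum(wi * ui for wi, ui in zip(w, u)) - theta  # ST53
            r, dr_dy = terms[kind](y, t)  # ST54-ST57: term chosen by T_p
            total += r
            for i, ui in enumerate(u):
                grad_w[i] += dr_dy * ui
            grad_theta -= dr_dy           # dy/dtheta = -1
        w = [wi - lr * g for wi, g in zip(w, grad_w)]  # ST60: weights
        theta -= lr * grad_theta                      # ST60: threshold
        if total < tol:                   # ST61: end condition
            break
    return w, theta, total


if __name__ == "__main__":
    data = [([1.0], 1.0, "model"), ([0.0], 0.2, "contrary")]
    terms = {"model": model_term, "contrary": make_contrary_term(1.0)}
    print(train(data, terms, w=[0.5], theta=0.0, max_epochs=300))
```

The sketch accumulates every sample's gradient before a single update. This mirrors the flowchart, in which the coupling coefficients and thresholds change only once $p = P$.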

[0102]

[Effects of the Invention] As described above, the present invention handles the two kinds of teacher signal in opposite ways. When a teacher signal to be avoided is given, learning is performed so that the difference between the teacher signal and the output data increases. When a model teacher signal is given, learning is performed so that the difference between the teacher signal and the output data decreases.

[0103] Consequently, learning flexibility and workability are improved. Learning can be adapted to the learning conditions, and it can proceed naturally in a way that suits the characteristics of the learning machine.

[0104] Furthermore, when learning with contrary teachers, a loss function suited to that learning can be selected, so that stable, high-speed learning can be realized.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of a learning machine according to the present invention.

FIG. 2 is a functional block diagram of a learning machine using a three-layer feedforward neural network in the first embodiment of the invention shown in FIG. 1.

FIG. 3 is a diagram showing the relationship between the contrary-teacher loss function $r_-^p$ used in the learning machine shown in FIG. 2 and the difference $R_p$ between the teacher signal and the output vector.

FIG. 4 is a flowchart showing the flow of the learning method performed on the learning machine shown in FIG. 2.

FIG. 5 is a characteristic diagram showing the input-output relationship obtained by actually applying the learning method of the learning machine shown in FIG. 3.

FIG. 6 is a characteristic diagram showing the relationship between the contrary-teacher loss function $r_-^p$ and $R_p$ in a modification of the first embodiment of the invention shown in FIG. 1.

FIG. 7 is a characteristic diagram showing the relationship between the contrary-teacher loss function $r_-^p$ and $R_p$ in a modification of the first embodiment of the invention shown in FIG. 1.

FIG. 8 is a characteristic diagram showing the relationship between the contrary-teacher loss function $r_-^p$ and $R_p$ in a modification of the first embodiment of the invention shown in FIG. 1.

FIG. 9 is a functional block diagram of a learning machine according to the second embodiment of the invention shown in FIG. 1.

FIG. 10 is a characteristic diagram showing the input-output relationship obtained by actually applying the learning machine shown in FIG. 9.

FIG. 11 is a flowchart showing the flow of the learning method of the third embodiment, performed on the learning machine shown in FIG. 2.

FIG. 12 is a characteristic diagram showing a probability density function in the learning method according to the fourth embodiment of the present invention.

FIG. 13 is a flowchart showing the flow of the learning method of the fourth embodiment, performed on the learning machine shown in FIG. 2.

FIG. 14 is a functional block diagram of the learning machine according to the present invention as extended in the fifth embodiment.

FIG. 15 is a functional block diagram of a learning machine using a three-layer feedforward neural network in the fifth embodiment.

FIG. 16 is a characteristic diagram showing the relationship between the contribution $r_>^p$ to the boundary teacher term and $d_p$ in the fifth embodiment.

FIG. 17 is a flowchart showing the flow of the learning method performed on the learning machine shown in FIG. 15.

[Explanation of symbols]

1 Input terminal
2 Conversion unit
3 Output terminal
7, 17 Parameter adjustment means
8, 18 Loss function calculation means
9, 19 Loss function calculation means for model teachers
10, 20 Loss function calculation means for contrary teachers
11 Neuron element
12 Input layer
13 Hidden layer
14 Output layer
15, 16 Coupling units
21, 31 Loss function calculation means for boundary teachers

Continuation of the front page

(56) References cited:
JP-A-2-170265 (JP, A)
JP-A-5-20291 (JP, A)
JP-A-5-165801 (JP, A)
Akiyama, Furuya, "Backpropagation learning rule introducing an entropy term into the loss function," IEICE Technical Report, Vol. 91, No. 25 (NC91-1 to 18), The Institute of Electronics, Information and Communication Engineers (May 8, 1991), pp. 39-46 (JPO CSDB document number: CSNT 199900623006)
Yamada, Tanaka, "A learning method that increases the difference between the output and the teacher signal," Proceedings of the 1991 IEICE Autumn Conference, Vol. 6, The Institute of Electronics, Information and Communication Engineers (published August 15, 1991; received by the Japan Patent Office Information Center on November 13, 1991)

(58) Fields searched (Int. Cl.⁷, DB name):
G06N 1/00 - 7/00
G06G 7/60
JICST file (JOIS)
INSPEC (DIALOG)
CSDB (Japan Patent Office)
IEEE Xplore (http://ieeexplore.ieee.org/)

Claims

(57) [Claims]

1. A learning method for a learning machine, in a method of learning control parameters using a teacher signal in a learning machine that outputs an output vector for an input vector using predetermined control parameters, the learning method comprising: defining, as the teacher signal, a contrary teacher that means data to be avoided, which should not be output as an output vector; giving the learning machine the value of an input vector and the value of the teacher signal corresponding to that input vector, and calculating the value of the output vector for the value of the input vector using initial control parameters; calculating, according to the difference $x$ between the contrary teacher and the output vector, a loss function $r$ that decreases as the difference $x$ increases, by the equation $r = \exp(-x^2)$; when the value of the loss function is larger than a predetermined value, updating the control parameters so that the value of the loss function decreases and repeatedly calculating the value of the loss function until it becomes smaller than the predetermined value; and, when the calculated loss function reaches a value smaller than the predetermined value, incorporating the latest updated control parameters into the learning machine.
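As a minimal sketch of claim 1, the loop below pushes the output away from a contrary teacher $t$ until $r = \exp(-x^2)$ falls below a threshold. It assumes a scalar output $y = w u$ with a single control parameter $w$, and it uses plain gradient descent as the update rule, which claim 1 does not specify. The function name, learning rate and threshold are hypothetical. Because $dr/dw = 0$ at $x = 0$, the starting output must differ from the teacher.

```python
import math
from typing import Tuple


def learn_away(u: float, t: float, w: float,
               lr: float = 0.5, threshold: float = 0.05,
               max_iter: int = 1000) -> Tuple[float, float, int]:
    """Move y = w * u away from the contrary teacher t until exp(-x^2) < threshold.

    Returns (final w, final loss r, number of updates performed).
    """
    for it in range(max_iter):
        x = w * u - t              # signed difference between output and teacher
        r = math.exp(-x * x)       # claim-1 loss; shrinks as |x| grows
        if r < threshold:          # small enough: keep the latest parameter
            return w, r, it
        dr_dw = -2.0 * x * r * u   # derivative of exp(-(w*u - t)^2) w.r.t. w
        w -= lr * dr_dw            # gradient descent step on r
    return w, math.exp(-(w * u - t) ** 2), max_iter


if __name__ == "__main__":
    print(learn_away(u=1.0, t=0.0, w=0.1))
```

Stopping as soon as $r$ drops below the threshold corresponds to the claim's condition for incorporating the latest parameters into the machine.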

2. A learning method for a learning machine, in a method of learning control parameters using teacher signals in a learning machine that outputs an output vector for an input vector using predetermined control parameters, the learning method comprising: defining, as teacher signals, a model teacher that means data to be output as an output vector and a contrary teacher that means data to be avoided, which should not be output as an output vector; giving the learning machine the value of an input vector and the value of the teacher signal corresponding to that input vector; defining a model teacher term $r_+$, calculated from the difference $x_+$ between the model teacher and the output vector by $r_+ = x_+^2$; defining a contrary teacher term $r_-$, calculated from the difference $x_-$ between the contrary teacher and the output vector by $r_- = \gamma_- \exp(-x_-^2)$, where the parameter $\gamma_- > 0$ determines the ratio of the contributions of the contrary teacher and the model teacher; calculating, when a model teacher is given to the learning machine, the model teacher term $r_+$, which decreases as the difference $x_+$ decreases, and, when a contrary teacher is given to the learning machine, the contrary teacher term $r_-$, which decreases as the difference $x_-$ increases; when the value of the loss function $r$ obtained by adding the model teacher term $r_+$ and the contrary teacher term $r_-$ is larger than a predetermined value, updating the control parameters so that the value of the loss function decreases and repeatedly calculating the value of the loss function until it becomes smaller than the predetermined value; and, when the calculated loss function reaches a value smaller than the predetermined value, incorporating the latest updated control parameters into the learning machine.

3. A learning method for a learning machine in which connection weights and thresholds are set using teacher signals. The learning machine comprises a neural network in which a plurality of neuron elements are interconnected. It multiplies an input vector given to the neuron element group of an input unit by predetermined connection weights and transmits the weighted data to the neuron element group connected to the input neuron element group. There it subtracts thresholds, converts the results by a predetermined monotonically increasing function, multiplies the converted values by predetermined connection weights, and transmits them to the neuron element group of an output unit. Finally it subtracts thresholds again, converts the results by a predetermined monotonically increasing function, and outputs an output vector from the neuron element group of the output unit. The learning method comprises: defining, as teacher signals, a model teacher that means data to be output as an output vector and a contrary teacher that means data to be avoided, which should not be output as an output vector; giving the value of an input vector to the neuron element group of the input unit of the learning machine, and giving the learning machine the model teacher or contrary teacher corresponding to that input vector; converting the input vector using the connection weights and thresholds and calculating the value of the output vector from the neuron element group of the output unit; defining a model teacher term $r_+$, calculated from the difference $x_+$ between the model teacher and the output vector, and a contrary teacher term $r_-$, calculated from the difference $x_-$ between the contrary teacher and the output vector; calculating, when a model teacher is given to the learning machine, the model teacher term $r_+$, which decreases as the difference $x_+$ decreases, and, when a contrary teacher is given to the learning machine, the contrary teacher term $r_-$, which decreases as the difference $x_-$ increases; calculating the loss function $r$ obtained by adding the model teacher term $r_+$ and the contrary teacher term $r_-$; when the value of the loss function is larger than a predetermined value, updating the connection weights and thresholds so that the value of the loss function decreases and repeatedly calculating the value of the loss function until it becomes smaller than the predetermined value; and, when the calculated loss function reaches a value smaller than the predetermined value, incorporating the latest updated connection weights and thresholds into the learning machine.
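The forward computation recited in claim 3 (weighted sum, threshold subtraction and monotonic conversion, applied once at the hidden group and once at the output group) can be sketched as below. The sketch assumes the logistic sigmoid as the "predetermined monotonically increasing function" and stores the connection weights as plain Python lists, with rows indexing the receiving neurons. The claim fixes neither choice, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, one possible monotonically increasing conversion."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                # avoids overflow for large negative z
    return e / (1.0 + e)


def layer(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          thresholds: Sequence[float]) -> List[float]:
    """Weighted sum of the inputs, minus each neuron's threshold, through sigmoid.

    weights[j][i] is the connection weight from input i to neuron j.
    """
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) - th)
            for row, th in zip(weights, thresholds)]


def forward(u: Sequence[float],
            w_hidden: Sequence[Sequence[float]], th_hidden: Sequence[float],
            w_out: Sequence[Sequence[float]], th_out: Sequence[float]) -> List[float]:
    """Input unit -> connected (hidden) neuron group -> output unit."""
    hidden = layer(u, w_hidden, th_hidden)
    return layer(hidden, w_out, th_out)


if __name__ == "__main__":
    print(forward([0.0, 0.0], [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0],
                  [[1.0, 1.0]], [1.0]))
```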

4. A learning method for a learning machine, in a method of learning control parameters $\Theta$ using teacher signals in a learning machine that outputs an output vector for an input vector using predetermined control parameters $\Theta$, the learning method comprising: defining, as teacher signals, a model teacher that means data to be output as an output vector and a contrary teacher that means data to be avoided, which should not be output as an output vector; defining the probability distribution associated with the model teacher as $G(x_+) = \int_s g(x)\,dx$, using the density distribution function $g(x_+)$ of the difference $x_+$ between the model teacher and the output vector and the area $s$ occupied by the model teacher; defining the probability distribution associated with the contrary teacher as $1 - G(x_-) = 1 - \int_s g(x)\,dx$, using the difference $x_-$ between the contrary teacher and the output vector, the density distribution function $g(x)$ and the area $s$ occupied by the model teacher; giving the learning machine the value of an input vector and the value of the teacher signal corresponding to that input vector, and calculating the value of the output vector for the value of the input vector using initial control parameters $\Theta$; determining the model teacher term $r_+$ and the contrary teacher term $r_-$ from the log-likelihood $l_E(\Theta)$ as $-l_E(\Theta) = -\ln G(x_+) - \ln[1 - G(x_-)]$, $r_+ = -\ln G(x_+)$ and $r_- = -\ln[1 - G(x_-)]$; calculating, when a model teacher is given to the learning machine, the model teacher term $r_+$, which decreases as the difference $x_+$ decreases, and, when a contrary teacher is given to the learning machine, the contrary teacher term $r_-$, which decreases as the difference $x_-$ increases; when the value of the loss function $r$ obtained by adding the model teacher term $r_+$ and the contrary teacher term $r_-$ is larger than a predetermined value, updating the control parameters $\Theta$ so that the value of the loss function decreases and repeatedly calculating the value of the loss function until it becomes smaller than the predetermined value; and, when the calculated loss function reaches a value smaller than the predetermined value, incorporating the latest updated control parameters $\Theta$ into the learning machine.
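One way to read the step from claim 4 to the closed forms of claim 5 ($r_+ = \ln\sigma + x_+^2/(2\sigma^2)$ and $r_- = s\,g(x_-)$) is the approximation below. It assumes the region $s$ is small compared with the spread of $g$ and that $s\,g(x_-) \ll 1$. This is an editorial reading, not a derivation given in the patent.

$$
\begin{aligned}
G(x_\pm) &= \int_s g(x)\,dx \approx s\,g(x_\pm), \\
r_+ &= -\ln G(x_+) \approx -\ln s - \ln g(x_+), \\
r_- &= -\ln\bigl[1 - G(x_-)\bigr] \approx -\ln\bigl[1 - s\,g(x_-)\bigr] \approx s\,g(x_-), \\
g(x) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)
\;\Rightarrow\;
-\ln g(x_+) = \tfrac{1}{2}\ln 2\pi + \ln\sigma + \frac{x_+^2}{2\sigma^2}.
\end{aligned}
$$

The constants $-\ln s$ and $\tfrac{1}{2}\ln 2\pi$ depend neither on the control parameters nor on $\sigma$, so dropping them leaves the gradients used for updating unchanged.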

5. A learning method for a learning machine, in a method of learning in advance, using teacher signals, the control parameters of a learning machine that multiplies an input vector by predetermined control parameters to output an output vector, the learning method comprising:

- defining, as teacher signals, a model teacher that means data to be output as an output vector and a contrary teacher that means data to be avoided, which should not be output as an output vector;
- defining the distribution function associated with the model teacher as $s\,g(x_+) = s\,\frac{1}{(2\pi)^{1/2}\sigma}\exp\left[-\frac{x_+^2}{2\sigma^2}\right]$, using the area $s$ occupied by the model teacher and the normal distribution $g(x_+)$ defined by the difference $x_+$ between the model teacher and the output vector and the variance $\sigma^2$;
- defining the distribution function associated with the contrary teacher as $1 - s\,g(x_-) = 1 - s\,\frac{1}{(2\pi)^{1/2}\sigma}\exp\left[-\frac{x_-^2}{2\sigma^2}\right]$, using the area $s$ occupied by the model teacher and the normal distribution $g(x_-)$ defined by the difference $x_-$ between the contrary teacher and the output vector and the variance $\sigma^2$;
- giving the learning machine the value of an input vector and the value of the teacher signal corresponding to that input vector, and calculating the value of the output vector for the value of the input vector using initial control parameters;
- calculating the model teacher term $r_+$ and the contrary teacher term $r_-$ from the log-likelihood $l_E(x)$ based on the entropy described above, as $-l_E(x) = -\ln g(x_+) + s\,g(x_-)$, $r_+ = -\ln g(x_+) = \ln\sigma + \frac{x_+^2}{2\sigma^2}$ (the constant $\tfrac{1}{2}\ln 2\pi$ being omitted) and $r_- = s\,g(x_-) = s\,\frac{1}{(2\pi)^{1/2}\sigma}\exp\left[-\frac{x_-^2}{2\sigma^2}\right]$;
- calculating, when a model teacher is given to the learning machine, the model teacher term $r_+$, which decreases as the difference $x_+$ decreases, and, when a contrary teacher is given to the learning machine, the contrary teacher term $r_-$, which decreases as the difference $x_-$ increases;
- calculating the loss function $r$ obtained by adding the model teacher term $r_+$ and the contrary teacher term $r_-$;
- when the value of the loss function is larger than a predetermined value, updating the control parameters and the variance $\sigma^2$ so that the value of the loss function decreases, and repeatedly calculating the value of the loss function until it becomes smaller than the predetermined value; and
- when the calculated loss function reaches a value smaller than the predetermined value, incorporating the updated control parameters and variance $\sigma^2$ into the learning machine.
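The two claim-5 terms can be evaluated directly. The sketch below uses illustrative function names. It computes $r_+$ and $r_-$ for scalar differences, and also the unapproximated $-\ln[1 - s\,g(x_-)]$, so the size of the small-$s$ approximation can be checked numerically. Treating $x_\pm$ as scalars is an assumption; for vector outputs one would use a norm of the difference. The `"model"`/`"contrary"` labels stand in for the mark $T_p$ and are hypothetical.

```python
import math


def g(x: float, sigma: float) -> float:
    """Zero-mean normal density with variance sigma**2, evaluated at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def r_plus(x_plus: float, sigma: float) -> float:
    """Model-teacher term of claim 5: ln(sigma) + x_+^2 / (2 sigma^2)."""
    return math.log(sigma) + x_plus * x_plus / (2.0 * sigma * sigma)


def r_minus(x_minus: float, sigma: float, s: float) -> float:
    """Contrary-teacher term of claim 5: s * g(x_-)."""
    return s * g(x_minus, sigma)


def r_minus_exact(x_minus: float, sigma: float, s: float) -> float:
    """-ln[1 - s * g(x_-)] before the small-s approximation (needs s * g < 1)."""
    return -math.log(1.0 - s * g(x_minus, sigma))


def loss(kind: str, x: float, sigma: float, s: float) -> float:
    """Per-sample contribution to r, selected by the teacher kind."""
    if kind == "model":
        return r_plus(x, sigma)
    if kind == "contrary":
        return r_minus(x, sigma, s)
    raise ValueError(f"unknown teacher kind: {kind}")


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0):
        print(x, loss("model", x, 1.0, 0.1), loss("contrary", x, 1.0, 0.1))
```

Since $r_+$ contains $\ln\sigma$, the loss also has a nontrivial gradient with respect to $\sigma$. This is consistent with the claim updating $\sigma^2$ together with the control parameters.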