JP3312042B2

JP3312042B2 - Learning machine

Info

Publication number: JP3312042B2
Application number: JP20846992A
Authority: JP
Inventors: 健次福水
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1992-07-13
Filing date: 1992-07-13
Publication date: 2002-08-05
Anticipated expiration: 2017-08-05
Also published as: JPH0635887A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a learning machine for use in devices in general that obtain a desired output from a given input, such as recognition devices for character recognition, image recognition, and speech recognition; prediction devices for stock price prediction and the like; and control devices for motion control, machine operation control, and the like.

【０００２】[0002]

[Prior Art] In general learning machines, including neural networks, parameters are learned so that, when an input is given, a desired output is obtained for that input. A conventionally known learning method is the backpropagation method, as described for example in the reference "Backpropagation," Hideki Aso, Computrol No. 24, special issue on neurocomputers (Corona Publishing).

【０００３】[0003]

[Problem to Be Solved by the Invention] When a learning machine of this kind is applied to, for example, character recognition, and a certain character is misrecognized during actual recognition after the parameters have been learned, it is desirable to re-learn the parameters so that this character is also recognized correctly. Conventionally, however, no consideration has been given to re-learning the parameters. One likely reason is the following problem: in the above example, when re-learning is performed so that the misrecognized character is recognized correctly (that is, when the data x₀ that produces an undesired output is taken as the data to be newly learned and re-learning is performed using only this data), other characters that had been recognized correctly until then are often misrecognized as a side effect. The inventor judged that this problem arises because re-learning uses only the data x₀ to be newly learned as input data, so that the weight of x₀ relative to the overall distribution of the initial learning data is unknown. To solve this problem, the inventor first considered preparing again the learning data used for the initial learning and using the union of the initial learning data and the specific data x₀ for re-learning, regarding this union as a more accurate sample from the overall distribution, that is, as the input data. With this method, however, the initial learning data must be retained at all times, which in engineering applications is disadvantageous in terms of storage capacity and makes the method impractical.

【０００４】[0004]

An object of the present invention is to provide a learning machine capable of re-learning that has little adverse effect on other data, without having to retain the initial learning data at all times.

【０００５】[0005]

[Means for Solving the Problem and Operation] To achieve the above object, the invention according to claim 1 is characterized by comprising: parameter storage means for holding a parameter representing a probability distribution on the input/output space; output probability calculation means for calculating, when an input is given, the probability distribution of the output in accordance with the parameter held in the parameter storage means; parameter learning means for learning the parameter using pairs of an input and its desired output as learning data and setting the learned parameter in the parameter storage means; input probability calculation means for calculating, when an output is given, the probability distribution of the input in accordance with the parameter held in the parameter storage means; and re-learning means for causing the parameter learning means to re-learn the parameter using, as re-learning data, the union of input samples drawn according to the probability distribution calculated by the input probability calculation means and the data to be newly learned, together with the desired outputs for them, and for setting the re-learned parameter in the parameter storage means again. Because the input samples generated using the input probability calculation circuit closely approximate the initial learning data, re-learning with the union of these input samples and the data x₀ to be newly learned makes it possible to perform re-learning that has little adverse effect on other data without retaining the initial learning data at all times.

【０００６】[0006]

In the invention according to claim 2, the output probability calculation means and the input probability calculation means are realized by a single neural network with a reversible configuration. This makes it easy to obtain good input samples with a simple configuration.

【０００７】[0007]

The invention according to claim 3 further comprises learning data number storage means for holding the number of initial learning data, and is characterized in that the re-learning means extracts, from the input samples drawn according to the probability distribution calculated by the input probability calculation means, the same number of samples as the number held in the learning data number storage means, and causes the parameter learning means to re-learn the parameter using, as re-learning data, the union of these samples and the data to be newly learned, together with the desired outputs for them. With this configuration, as many input samples are obtained as there were initial learning data, so the input samples approximate the learning data used in the initial learning more closely. Re-learning with the union of these input samples and the data to be newly learned, together with the desired outputs for them, as re-learning data therefore gives better re-learning.

【０００８】[0008]

The invention according to claim 4 further comprises learning data number storage means for holding, after re-learning has been performed by the re-learning means, the number of re-learning data that were used, and is characterized in that the re-learning means extracts, from the input samples drawn according to the probability distribution calculated by the input probability calculation means, the same number of samples as the number held in the learning data number storage means, and causes the parameter learning means to re-learn the parameter using, as re-learning data, the union of these samples and the data to be newly learned, together with the desired outputs for them. As a result, the ratio between the learning frequencies of the initial learning data and the data to be newly learned can be determined automatically at re-learning time, enabling learning that is more faithful to the distribution of data in the outside world.

【０００９】[0009]

The invention according to claim 5 further comprises learning rate storage means that holds a value representing the learning strength of the re-learning data, and is characterized in that the re-learning means determines the re-learning data so that the ratio between the number of input samples drawn according to the probability distribution calculated by the input probability calculation means and the number of data to be newly learned matches a ratio determined from the value held in the learning rate storage means, and causes the parameter learning means to re-learn the parameter using this re-learning data. As a result, an operator or the like can freely set the ratio between the learning frequencies of the initial learning data and the data to be newly learned at re-learning time, so that specific data alone can be learned strongly.

【００１０】[0010]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a configuration diagram of a first embodiment of the learning machine according to the present invention. The learning machine of FIG. 1 comprises: a parameter storage unit 10 that holds a parameter w representing a probability distribution on the input/output space (x, y); a hierarchical neural network NW that, when an input x is given, outputs the probability distribution of the output y in accordance with the parameter w held in the parameter storage unit 10; a parameter learning unit 11 that learns the parameter w based on learning data consisting of pairs of an input and its desired output and sets the learned parameter w in the parameter storage unit 10; and a re-learning unit 12 that re-learns the parameter w and sets the re-learned parameter w in the parameter storage unit 10.

【００１１】[0011]

Here, the hierarchical neural network NW comprises an input element 1, a first intermediate layer 2, a normalization cell 3, a second intermediate layer 4, and an output element 5, and is configured to be reversible. That is, when an input x is applied to the input element 1, the output element 5 yields the probability distribution of the output y given x, and when an output y is applied to the output element 5, the input element 1 yields the probability distribution of the input x given y. The neural network NW therefore functions as an output probability calculation circuit that, given an input x, calculates the probability distribution of the output y in accordance with the parameter w held in the parameter storage unit 10, and also as an input probability calculation circuit that, given an output y, calculates the probability distribution of the input x in accordance with the same parameter w.

【００１２】[0012]

During normal operation, the neural network NW functions as the output probability calculation circuit: when an input x is applied to the input element 1, it calculates the probability distribution of the output y in accordance with the parameter w held in the parameter storage unit 10 and outputs it from the output element 5. When the parameter w is learned (initial learning), the parameter learning unit 11 learns the parameter w based on learning data consisting of pairs of an input and its desired output, and updates the parameter w in the parameter storage unit 10 so that the desired output y is produced for the input x. Once the parameter w has been learned by the parameter learning unit 11 and a desirable parameter w has been set in the parameter storage unit 10, the neural network NW can be used for actual character recognition, speech recognition, and the like to obtain an output y for an unknown input x.

【００１３】[0013]

However, even after the parameter w has been learned and a seemingly desirable parameter w has been set in the parameter storage unit 10, misrecognized characters and the like may still occur in character recognition and similar tasks. For example, the output for a particular input x₀ may be inappropriate. In such a case, the learning machine of FIG. 1 is intended to have the re-learning unit 12 re-learn the parameter w so that an appropriate output is returned for the particular input x₀ as well, and during this re-learning the neural network NW is also made to function as the input probability calculation circuit.

【００１４】[0014]

That is, as described above, it would be desirable when re-learning to use as input data the union of the learning data used in the initial learning and the specific data (the data to be newly learned) x₀, but retaining the learning data used in the initial learning is disadvantageous in terms of storage capacity. In the first embodiment, therefore, instead of retaining the learning data used in the initial learning, the re-learning unit 12 generates the teacher signals (the desired outputs for the inputs) that were taught as outputs during the initial learning, applies them to the output element 5, and causes the input element 1 to output samples of the input vector (input samples). The parameter learning unit 11 then re-learns using, as re-learning data, the union of the input samples output from the input element 1 and the specific data (the data to be newly learned) x₀, together with the desired outputs for them. In this way, the neural network NW is made to function as the input probability calculation circuit when generating input samples that take the place of the initial learning data.

【００１５】[0015]

Next, the processing operation of the learning machine configured in this way is described in more detail. During normal processing, the neural network NW functions as the output probability calculation circuit, and, for example, M-dimensional input vector data x is applied to the input element 1 from an external device (not shown). When the machine is applied to character recognition, for example, a feature vector of a character extracted by the external device is used as the input. When the input vector data x is applied to the input element 1, the neural network, that is, the output probability calculation circuit, calculates the conditional probability distribution of the N-dimensional output vector data y given x. Here, the probability distribution that the input/output pair (x, y) follows is estimated by the probability density function p(w; x, y), using the parameter w held in the parameter storage unit 10. The output probability calculation circuit uses the parameter w held in the parameter storage unit 10 to numerically compute the conditional probability p(w; y | x) of the output y given the input x according to the following equation.

【００１６】[0016]

【数１】 (Equation 1)
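
The image for Equation 1 is not reproduced in this text. Assuming it states the usual definition of a conditional density in terms of the joint density p(w; x, y) introduced above, it would take the following form (a reconstruction from context, not the published formula):

```latex
p(w;\, y \mid x) \;=\; \frac{p(w;\, x, y)}{\int p(w;\, x, y')\, dy'}
```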

【００１７】[0017]

Now, for example, let R(x) and S(y) be probability density functions on R^M and R^N, respectively, and for ξ ∈ R^M, η ∈ R^N, and ρ, σ ≥ 0, define the following.

【００１８】[0018]

【数２】 (Equation 2)

【００１９】[0019]

In this case, the following expression can be used as the probability density function p(w; x, y), with w = (θ_h, ξ_h, ρ_h, η_h, σ_h) (1 ≤ h ≤ H) as the parameter.

【００２０】[0020]

【数３】 (Equation 3)

【００２１】[0021]

In the following, R and S are assumed to be Gaussian distributions of the following form.

【００２２】[0022]

【数４】 (Equation 4)

【００２３】[0023]

The conditional probability p(w; y | x) is then given by the following equation.

【００２４】[0024]

【数５】 (Equation 5)

【００２５】[0025]

To generate the output y according to the probability distribution p(w; y | x) expressed by Equation 5, the neural network NW specifically performs the following processing. First, the first intermediate layer 2 uses the parameters ξ_h, ρ_h, θ_h and the input vector x to calculate the output O_h⁽¹⁾(x) of each element according to the following equation.

【００２６】[0026]

【数６】 (Equation 6)

【００２７】[0027]

Next, the normalization cell 3 computes the sum O(x) of the outputs O_h⁽¹⁾(x) of the elements of the first intermediate layer 2 according to the following equation and outputs it.

【００２８】[0028]

【数７】 (Equation 7)

【００２９】[0029]

Here, α_h(x) = O_h⁽¹⁾(x) / O(x) satisfies the following equation.

【００３０】[0030]

【数８】 (Equation 8)
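
The image for Equation 8 is not reproduced in this text. Since O(x) is described as the sum of the outputs O_h⁽¹⁾(x) and α_h(x) is their ratio to O(x), the relation is presumably the normalization below (a reconstruction from context, not the published formula):

```latex
\sum_{h=1}^{H} \alpha_h(x)
  \;=\; \sum_{h=1}^{H} \frac{O_h^{(1)}(x)}{O(x)}
  \;=\; 1,
\qquad
O(x) \;=\; \sum_{h=1}^{H} O_h^{(1)}(x)
```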

【００３１】[0031]

Therefore, {α₁(x), α₂(x), …, α_H(x)} represents a probability distribution on the discrete set {1, 2, …, H} of H elements. Based on this discrete distribution, one element h₀ is selected from the set {1, 2, …, H}, and the output values of the second intermediate layer 4 are set to (0, 0, …, 1, …, 0). That is, only the output value corresponding to the selected element h₀ is set to "1".

【００３２】[0032]

The output element 5 determines the output y of the neural network NW stochastically, as a single sample drawn from the Gaussian distribution S(η_{h₀}, σ_{h₀}; y). In this way, the output probability calculation circuit can generate the output value y as a sample following the probability distribution of Equation 5.
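
The forward pass described in paragraphs [0025] to [0032] can be sketched as follows. This is only an illustrative sketch: the equation images are not available in this text, so the unit activation `exp(theta_h) * N(x; xi_h, rho_h^2 I)` and the isotropic Gaussian form are assumptions, and all names (`gaussian_density`, `sample_output`, and the toy parameter values) are hypothetical.

```python
import numpy as np

def gaussian_density(v, mean, scale):
    """Isotropic Gaussian density N(v; mean, scale^2 I) (assumed form of R and S)."""
    d = v.shape[-1]
    sq = np.sum((v - mean) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * scale ** 2)) / ((2.0 * np.pi * scale ** 2) ** (d / 2.0))

def sample_output(x, theta, xi, rho, eta, sigma, rng):
    """Draw one y given x: first layer, normalization, unit selection, Gaussian sample."""
    # First intermediate layer: activation of each of the H units (assumed form).
    o = np.exp(theta) * gaussian_density(x, xi, rho)   # shape (H,)
    # Normalization cell: divide by the total to get probabilities alpha_h(x).
    alpha = o / o.sum()
    # Second intermediate layer: select one unit h0 according to alpha.
    h0 = rng.choice(len(alpha), p=alpha)
    # Output element: sample y from the Gaussian centered at eta[h0].
    y = rng.normal(loc=eta[h0], scale=sigma[h0])
    return y, alpha, h0

# Toy parameters (H = 2 units, M = 2 input dimensions, N = 2 output dimensions).
rng = np.random.default_rng(0)
theta = np.zeros(2)
xi = np.array([[0.0, 0.0], [3.0, 3.0]])
rho = np.array([1.0, 1.0])
eta = np.array([[1.0, 0.0], [0.0, 1.0]])
sigma = np.array([0.1, 0.1])
y, alpha, h0 = sample_output(np.array([0.1, -0.2]), theta, xi, rho, eta, sigma, rng)
```

Because the toy input lies close to the first unit's center, `alpha[0]` dominates and the sampled `y` will usually fall near `eta[0]`.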

【００３３】[0033]

For the output y to be obtained according to the desired probability distribution, the parameter w of the output probability calculation circuit must be set to an appropriate value. This learning is performed by the parameter learning unit 11. First, for learning, a set of example pairs of an input and its desired output, called learning data, {(x_s, y_s) | 1 ≤ s ≤ S}, is prepared. For example, when the neural network NW is applied to character recognition, the feature vectors for the characters "あ", "い", "う", … may be taken as x_s and the corresponding outputs y_s as (1, 0, 0, …), (0, 1, 0, …), (0, 0, 1, …), and so on.

【００３４】[0034]

After the parameter w has been initialized, for example with random numbers, the parameter learning unit 11 performs maximum likelihood estimation by the steepest ascent method. That is, the value of the parameter w is updated according to the following steepest ascent rule so that the value of the log-likelihood function L(w) becomes as large as possible.

【００３５】[0035]

【数９】 (Equation 9)

【００３６】[0036]

When a Gaussian distribution is used, writing p_s = p(w; x_s, y_s), the parameter w is updated successively according to the following equations; updating stops once the change in the parameter w falls below a given threshold, and the result is written to the parameter storage unit 10.

【００３７】[0037]

【数１０】 (Equation 10)
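
The steepest ascent rule with a stopping threshold can be illustrated on a deliberately simple stand-in problem. The sketch below is not Equation 10, whose image is not reproduced in this text: it maximizes the log-likelihood of a one-dimensional, unit-variance Gaussian over its mean, and the names `steepest_ascent_mle`, `step`, and `tol` are hypothetical.

```python
import numpy as np

def steepest_ascent_mle(samples, w0=0.0, step=0.01, tol=1e-8, max_iter=100_000):
    """Maximize L(w) = sum_s log N(x_s; w, 1) by steepest ascent.

    Each update is w <- w + step * dL/dw; iteration stops once the change
    in w falls below tol, mirroring the stopping rule described in the text.
    """
    w = w0
    for _ in range(max_iter):
        grad = np.sum(samples - w)   # dL/dw for a unit-variance Gaussian
        delta = step * grad
        w += delta
        if abs(delta) < tol:
            break
    return w

data = np.array([1.0, 2.0, 3.0, 6.0])
w_hat = steepest_ascent_mle(data)    # converges toward the sample mean
```

For this toy likelihood the maximizer is the sample mean, which gives an easy check that the loop behaves as intended; the step size must be small relative to the number of samples for the iteration to converge.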

【００３８】[0038]

The parameter w determined as described above and finally stored in the parameter storage unit 10 is used by the output probability calculation circuit during actual operation such as character recognition.

【００３９】[0039]

After the parameter w has been set by learning, misrecognition of characters and the like may occur during actual operation such as character recognition. Suppose that in such a case, that is, when the output for an input vector x₀ is inappropriate, re-learning is desired so that an appropriate output is returned. The re-learning unit 12 then makes the neural network NW function as the input probability calculation circuit, generates the teacher signals that were taught as outputs in the initial learning, and has the input probability calculation circuit generate samples of the input vectors corresponding to them.

【００４０】[0040]

That is, the input probability calculation circuit numerically computes the conditional probability p(w; x | y) of the input vector x given the output vector y according to the following equation.

【００４１】[0041]

【数１１】 (Equation 11)

【００４２】[0042]

To generate the input x according to the probability distribution p(w; x | y) expressed by Equation 11, the neural network NW specifically performs the following processing. First, the second intermediate layer 4 uses the parameters η_h, σ_h, θ_h and the output vector y to calculate the output r_h⁽¹⁾(x) of each element according to the following equation.

【００４３】[0043]

【数１２】 (Equation 12)

【００４４】[0044]

Next, the normalization cell 3 computes the sum r(x) of the outputs of the elements of the second intermediate layer 4 according to the following equation and outputs it.

【００４５】[0045]

【数１３】 (Equation 13)

【００４６】[0046]

Here, β_h(x) = r_h⁽¹⁾(x) / r(x) satisfies the following equation.

【００４７】[0047]

【数１４】 (Equation 14)

【００４８】[0048]

Therefore, {β₁(x), β₂(x), …, β_H(x)} represents a probability distribution on the discrete set {1, 2, …, H} of H elements. Based on this discrete distribution, one element h₀ is selected from the set {1, 2, …, H}, and the output values of the first intermediate layer 2 are set to (0, 0, …, 1, …, 0). That is, only the output value corresponding to the selected element h₀ is set to "1". The input element 1 determines the reverse-direction output of the neural network NW stochastically, as a single sample drawn from the Gaussian distribution R(ξ_{h₀}, ρ_{h₀}; x). In this way, the input probability calculation circuit can generate input samples following the probability distribution of Equation 11 in the same manner as the output probability calculation circuit, but with the order of computation of the elements reversed.
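
The reverse pass described above can be sketched as follows. As the equation images are not available in this text, the activation `exp(theta_h) * N(y; eta_h, sigma_h^2 I)` and the isotropic Gaussians are assumptions made for illustration, and the names `gaussian_density` and `sample_input` and the toy parameter values are hypothetical.

```python
import numpy as np

def gaussian_density(v, mean, scale):
    """Isotropic Gaussian density N(v; mean, scale^2 I) (assumed form of R and S)."""
    d = v.shape[-1]
    sq = np.sum((v - mean) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * scale ** 2)) / ((2.0 * np.pi * scale ** 2) ** (d / 2.0))

def sample_input(y, theta, xi, rho, eta, sigma, rng):
    """Draw one pseudo-input x given a teacher signal y (reverse direction)."""
    # Second intermediate layer: activation of each unit from the output side.
    r = np.exp(theta) * gaussian_density(y, eta, sigma)   # shape (H,)
    # Normalization cell: probabilities beta_h over the H units.
    beta = r / r.sum()
    # First intermediate layer: select one unit h0 according to beta.
    h0 = rng.choice(len(beta), p=beta)
    # Input element: sample x from the Gaussian centered at xi[h0].
    x = rng.normal(loc=xi[h0], scale=rho[h0])
    return x, beta, h0

# Toy parameters (H = 2 units, M = 3 input dimensions, N = 2 output dimensions).
rng = np.random.default_rng(0)
theta = np.zeros(2)
xi = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
rho = np.array([0.5, 0.5])
eta = np.array([[1.0, 0.0], [0.0, 1.0]])
sigma = np.array([0.3, 0.3])
x, beta, h0 = sample_input(np.array([0.0, 1.0]), theta, xi, rho, eta, sigma, rng)
```

The same parameter set serves both directions; only the roles of (ξ, ρ) and (η, σ) are swapped, which is what lets one network act as both calculation circuits.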

【００４９】[0049]

The re-learning unit 12 has the parameter learning unit 11 re-learn the parameter w in the same way as in the initial learning, using as re-learning data the union of these input samples and the vector data x₀ to be newly learned, together with the desired outputs for them.

【００５０】[0050]

In this processing, the input samples generated using the input probability calculation circuit closely approximate the initial learning data. Re-learning with the union of these input samples and the data x₀ to be newly learned, together with the desired outputs for them, as re-learning data therefore makes it possible to perform re-learning that has little adverse effect on other data.
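
The construction of the re-learning data can be sketched as follows. The input sampler is passed in as a function standing in for the input probability calculation circuit; `build_relearning_data`, `toy_sampler`, and the toy class centers are hypothetical and serve only to make the sketch runnable.

```python
import numpy as np

def build_relearning_data(teacher_signals, sample_input_given, new_inputs, new_targets):
    """Return (inputs, targets): one pseudo-sample per teacher signal plus the new data."""
    pseudo_inputs = [sample_input_given(y) for y in teacher_signals]
    inputs = np.vstack(pseudo_inputs + list(new_inputs))
    targets = np.vstack(list(teacher_signals) + list(new_targets))
    return inputs, targets

# Toy stand-in for the reverse pass: each category has a fixed center in input space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0]])

def toy_sampler(y):
    return rng.normal(loc=centers[int(np.argmax(y))], scale=0.1)

teachers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])] * 3   # six teacher signals
x0 = np.array([2.5, 2.5])   # input that was misrecognized
y0 = np.array([0.0, 1.0])   # its desired output
inputs, targets = build_relearning_data(teachers, toy_sampler, [x0], [y0])
```

The resulting arrays can be handed to the same learning routine that was used for the initial learning; no original training inputs are stored anywhere in this procedure.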

【００５１】[0051]

As for the input data of the initial learning data, as described above, input samples can be obtained in its place using the input probability calculation circuit, so it need not be retained at all times. As for the desired outputs (teacher signals) of the initial learning data, when these can be generated automatically by rule using a simple circuit, a CPU, or the like, they need not be retained either. More specifically, in pattern recognition such as the example given above, a commonly used approach is to provide as many output elements as there are categories to be distinguished and to set, as the desired output (teacher signal) for an input x, a vector such as (0, …, 1, …, 0) in which only the output element for the category to which x belongs is set to "1". In such a case, when generating input data (input samples) for re-learning, it suffices to apply (1, 0, …, 0), (0, 1, 0, …, 0), (0, 0, 1, 0, …, 0), … to the output elements as teacher signals, and in practice these can be generated by rule using a simple circuit, a CPU, or the like. According to this embodiment, therefore, re-learning that has little adverse effect on other data can be performed without retaining at all times the initial learning data consisting of pairs of an input and its desired output.
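
Generating the teacher signals by rule amounts to enumerating one-hot vectors. A minimal sketch, in which `teacher_signals` and `repeats` are hypothetical names:

```python
import numpy as np

def teacher_signals(num_categories, repeats=1):
    """Return one-hot rows (1,0,...,0), (0,1,...,0), ..., cycled `repeats` times."""
    return np.tile(np.eye(num_categories), (repeats, 1))

signals = teacher_signals(3, repeats=2)   # six teacher vectors for three categories
```

Each row can be applied to the output elements in turn to obtain one input sample for the corresponding category.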

【００５２】[0052]

In the above example, the data x₀ to be newly learned is not limited to a single item; there may be several. In that case too, exactly the same procedure allows re-learning that has little adverse effect on other data.

【００５３】[0053]

FIG. 2 is a configuration diagram of a second embodiment of the learning machine according to the present invention. In FIG. 2, parts identical to those in FIG. 1 are given the same reference numerals. The second embodiment additionally includes a learning data number storage unit 13 that stores the number of learning data used in the initial learning. The re-learning unit 12 generates as many of the teacher signals taught as outputs during the initial learning as the number of learning data held in the learning data number storage unit 13, has the input probability calculation circuit generate the corresponding input samples, and has the parameter learning unit 11 perform re-learning.

【００５４】[0054]

With this configuration, as many input samples are obtained as the number of data held in the learning data number storage unit 13, so the input samples approximate the learning data of the initial learning more closely. Re-learning with the union of these input samples and the data x₀ to be newly learned, together with the desired outputs for them, as re-learning data then gives a better parameter w.

【００５５】図３は本発明に係る学習機械の第３の実施
例の構成図である。この第３の実施例では、再学習部１
２で再学習が行なわれた後に、再学習データの個数を記
憶する学習データ数記憶部１４が設けられており、再学
習部１２は、初期学習時に出力として教えた教師信号を
学習データ数記憶部１４に保持された個数だけ発生さ
せ、これらに対する入力標本を入力確率計算回路に発生
させ、パラメータ学習部１１によって再学習を行なわせ
るようになっている。FIG. 3 is a configuration diagram of a third embodiment of the learning machine according to the present invention. In this third embodiment, a learning data number storage unit 14 is provided that stores the number of re-learning data items after re-learning has been performed by the re-learning unit 12. The re-learning unit 12 generates the teacher signals that were taught as outputs during the initial learning, as many as the number held in the learning data number storage unit 14, causes the input probability calculation circuit to generate input samples for them, and causes the parameter learning unit 11 to perform re-learning.

【００５６】このような構成では、再学習が終了したと
きに、再学習データの個数を学習データ数記憶部１４に
書き込む。これにより、学習データ数記憶部１４に保持
された値は、今までに学習したデータの総数になる。こ
れにより、再学習部１２では、初期学習時に出力として
教えた教師信号を今までに学習したデータの総数だけ発
生させ、これらに対する入力標本と新たに学習したいデ
ータｘ₀の和集合および、それらに対する望ましい出力
の組を再学習データとして、再学習を行なう。このよう
に、第３の実施例では、一旦学習が行なわれた後のニュ
ーラルネットワークを再学習する際に、今までに学習し
たデータと新たに学習したいデータとの学習頻度の比率
を自動的に決定することができ、より外界のデータの分
布に忠実な学習が可能となる。In such a configuration, when re-learning is completed, the number of re-learning data items is written into the learning data number storage unit 14. As a result, the value held in the learning data number storage unit 14 becomes the total number of data items learned so far. The re-learning unit 12 then generates the teacher signals taught as outputs during the initial learning, as many as the total number of data items learned so far, and performs re-learning using, as the re-learning data, the union of the input samples for these teacher signals and the data x₀ to be newly learned, together with the set of desired outputs for them. Thus, in the third embodiment, when a neural network that has already been trained is re-learned, the ratio of learning frequency between the data learned so far and the data to be newly learned can be determined automatically, and learning that is more faithful to the distribution of data in the outside world becomes possible.
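The bookkeeping of the third embodiment can be illustrated with a small sketch. It assumes that the learning data number storage unit 14 initially holds the number of initial learning data items, which the text above does not state explicitly; the function and variable names are hypothetical.

```python
def relearning_rounds(initial_count, new_counts):
    """Trace the value held in the learning data number storage unit 14.

    initial_count: assumed starting value (number of initial learning data).
    new_counts: number of newly learned data items in each re-learning round.
    Returns the final stored value and, per round, (generated samples, new data).
    """
    stored = initial_count
    history = []
    for k in new_counts:
        n_generated = stored              # input samples generated this round
        history.append((n_generated, k))  # learning frequency ratio old : new
        stored = n_generated + k          # size of the re-learning data is written back
    return stored, history

final_count, ratios = relearning_rounds(100, [1, 2])
```

With 100 initial data items, a first round with one new item uses 100 generated samples, and a second round with two new items uses 101, so the old-to-new ratio follows the accumulated total without operator input.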

【００５７】図４は本発明に係る学習機械の第４の実施
例の構成図である。この第４の実施例では、オペレータ
の指示などにより決定された学習強度Ｇを保持する学習
率記憶部１５が設けられており、再学習部１２は、初期
学習時に出力として教えた教師信号を発生するが、この
とき、新たに学習したいデータｘ₀と発生する教師信号
との比率がＧ：１になるように教師信号の発生個数を決
定し、これら教師信号に対する入力標本を入力確率計算
回路に発生させ、パラメータ学習部１１によって再学習
を行なわせるようになっている。FIG. 4 is a configuration diagram of a fourth embodiment of the learning machine according to the present invention. In this fourth embodiment, a learning rate storage unit 15 is provided that holds a learning intensity G determined, for example, by an operator's instruction. The re-learning unit 12 generates the teacher signals that were taught as outputs during the initial learning, and at this time determines the number of teacher signals to generate so that the ratio of the data x₀ to be newly learned to the generated teacher signals is G:1. It then causes the input probability calculation circuit to generate input samples for these teacher signals and causes the parameter learning unit 11 to perform re-learning.

【００５８】このような構成では、一旦学習が行なわれ
た後のニューラルネットワークを再学習する際に、初期
の学習データと新たに学習したいデータとの学習頻度の
比率をオペレータなどが任意に決定することができ、特
定のデータ，すなわち新たに学習したいデータだけを強
く学習することが可能となる。In such a configuration, when a neural network that has already been trained is re-learned, an operator or the like can arbitrarily determine the ratio of learning frequency between the initial learning data and the data to be newly learned. This makes it possible to learn specific data, namely only the data to be newly learned, with particular strength.
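As a rough illustration of the G:1 ratio, the number of teacher signals to generate can be derived from the number of new data items and the learning intensity G. The rounding rule and the lower bound of one teacher signal are assumptions made for this sketch; the description only fixes the ratio itself.

```python
def teacher_signal_count(n_new, G):
    """Number of teacher signals so that new data : teacher signals is about G : 1.

    n_new: number of data items to be newly learned.
    G: learning intensity held in the learning rate storage unit 15.
    """
    if G <= 0:
        raise ValueError("learning intensity G must be positive")
    # Assumption: round to the nearest integer and generate at least one signal.
    return max(1, round(n_new / G))

weak = teacher_signal_count(4, 0.5)    # small G: many generated samples per new item
strong = teacher_signal_count(10, 2)   # large G: new data dominates
```

A larger G yields fewer generated samples relative to the new data, which corresponds to learning the new data more strongly.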

【００５９】[0059]

【発明の効果】以上に説明したように、請求項１記載の
発明によれば、再学習を行なうときに、再学習手段は、
入力確率計算手段に出力を与えてパラメータ記憶手段に
保持されているパラメータに従って入力の確率分布を計
算させるようになっており、この場合、入力確率計算手
段を用いて発生させた入力標本は、初期の学習データを
良好に近似したものとなるので、この入力標本と新たに
学習したいデータとの和集合を用いて再学習を行なうこ
とにより、初期の学習データを常時保持することなく、
かつ他のデータに対して悪影響を及ぼすことの少ない再
学習を行なうことができる。また、請求項２記載のよう
に、出力確率計算手段および入力確率計算手段が、可逆
構成の１つのニューラルネットワークによって実現され
ていることにより、簡単な構成の下で、良好な入力標本
を容易に得ることができる。As described above, according to the invention of claim 1, when re-learning is performed, the re-learning means gives an output to the input probability calculation means and causes it to calculate the probability distribution of the input according to the parameters held in the parameter storage means. The input samples generated using the input probability calculation means closely approximate the initial learning data. Therefore, by performing re-learning using the union of these input samples and the data to be newly learned, re-learning that has little adverse effect on other data can be performed without permanently holding the initial learning data. Further, as recited in claim 2, because the output probability calculation means and the input probability calculation means are realized by a single neural network having a reversible configuration, good input samples can be obtained easily with a simple configuration.

【００６０】また、請求項３記載の発明によれば、初期
の学習データの個数を保持する学習データ数記憶手段を
さらに有し、再学習手段は、入力確率計算手段で計算さ
れた確率分布に従う入力標本のうち、学習データ数記憶
手段に保持されている個数と同数の標本を抽出し、該標
本と新たに学習したいデータとの和集合およびそれらに
対する望ましい出力の組を再学習データとして用いてパ
ラメータ学習手段によりパラメータの再学習を行なわせ
るようにしているので、初期学習時の学習データをより
良好に近似した入力標本を得ることができ、この入力標
本を用いてより良好な再学習を行なうことができる。According to the invention of claim 3, learning data number storage means for holding the number of initial learning data items is further provided. The re-learning means extracts, from among the input samples following the probability distribution calculated by the input probability calculation means, the same number of samples as the number held in the learning data number storage means, and causes the parameter learning means to re-learn the parameters using, as re-learning data, the union of those samples and the data to be newly learned, together with the set of desired outputs for them. Input samples that better approximate the learning data of the initial learning can therefore be obtained, and better re-learning can be performed using these input samples.

【００６１】また、請求項４記載の発明によれば、再学
習手段で再学習が行なわれた後に、用いられた再学習デ
ータの個数を保持する学習データ数記憶手段をさらに有
し、再学習手段は、入力確率計算手段で計算された確率
分布に従う入力標本のうち、学習データ数記憶手段に保
持されている個数と同数の標本を抽出し、該標本と新た
に学習したいデータとの和集合およびそれらに対する望
ましい出力の組を再学習データとして用いてパラメータ
学習手段によりパラメータの再学習を行なわせるように
しているので、再学習の際に、初期の学習デ−タと新た
に学習したいデ−タとの学習頻度の比率を自動的に決定
することができ、より外界のデ−タの分布に忠実な学習
が可能となる。According to the invention of claim 4, learning data number storage means is further provided that holds the number of re-learning data items used, after re-learning has been performed by the re-learning means. The re-learning means extracts, from among the input samples following the probability distribution calculated by the input probability calculation means, the same number of samples as the number held in the learning data number storage means, and causes the parameter learning means to re-learn the parameters using, as re-learning data, the union of those samples and the data to be newly learned, together with the set of desired outputs for them. Therefore, at the time of re-learning, the ratio of learning frequency between the initial learning data and the data to be newly learned can be determined automatically, and learning that is more faithful to the distribution of data in the outside world becomes possible.

【００６２】また、請求項５記載の発明によれば、再学
習データの学習強度を表わす学習率記憶手段をさらに有
し、再学習手段は、入力確率計算手段で計算された確率
分布に従う入力標本の個数と新たに学習したいデータの
個数との比が、学習率記憶手段に保持されている値に従
って決定された比率に一致するように再学習デ−タを決
定し該再学習データを用いてパラメータ学習手段により
パラメータの再学習を行なわせるようにしているので、
再学習の際に、初期の学習デ−タと新たに学習したいデ
−タとの学習頻度の比率をオペレ−タなどが任意に決定
することができ、これによって、特定のデ−タだけを強
く学習することが可能となる。According to the invention of claim 5, learning rate storage means representing the learning intensity of the re-learning data is further provided. The re-learning means determines the re-learning data so that the ratio between the number of input samples following the probability distribution calculated by the input probability calculation means and the number of data items to be newly learned matches the ratio determined according to the value held in the learning rate storage means, and causes the parameter learning means to re-learn the parameters using that re-learning data. Therefore, at the time of re-learning, an operator or the like can arbitrarily determine the ratio of learning frequency between the initial learning data and the data to be newly learned, which makes it possible to learn only specific data with particular strength.

【図面の簡単な説明】[Brief description of the drawings]

【図１】本発明に係る学習機械の第１の実施例の構成図
である。FIG. 1 is a configuration diagram of a first embodiment of a learning machine according to the present invention.

【図２】本発明に係る学習機械の第２の実施例の構成図
である。FIG. 2 is a configuration diagram of a second embodiment of the learning machine according to the present invention.

【図３】本発明に係る学習機械の第３の実施例の構成図
である。FIG. 3 is a configuration diagram of a third embodiment of the learning machine according to the present invention.

【図４】本発明に係る学習機械の第４の実施例の構成図
である。FIG. 4 is a configuration diagram of a fourth embodiment of the learning machine according to the present invention.

[Explanation of symbols]

１入力素子２第１の中間層３正規化細胞４第２の中間層５出力素子１０パラメ−タ記憶部１１パラメ−タ学習部１２再学習部１３，１４学習データ数記憶部１５学習率記憶部ＮＷニュ−ラルネットワ−ク
1: input element
2: first intermediate layer
3: normalized cell
4: second intermediate layer
5: output element
10: parameter storage unit
11: parameter learning unit
12: re-learning unit
13, 14: learning data number storage unit
15: learning rate storage unit
NW: neural network

───────────────────────────────────────────────────── フロントページの続き (56)参考文献渡辺澄夫、福水健次，「ニューラルネットワークの統一理論と新しいモデルの提案」，電子情報通信学会技術研究報告，日本，社団法人電子情報通信学会・発行，1992年３月18日，Ｖｏｌ．91, Ｎｏ．529（ＮＣ91−98〜131），ｐｐ. 179−186 (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06N 1/00 - 7/08 G06F 9/44 G06F 19/00 G06K 9/00 - 9/82 G10L 15/00 - 17/00 G06T 7/00 - 7/60 ＪＳＴファイル（ＪＯＩＳ) ＣＳＤＢ（日本国特許庁)
Continuation of the front page
(56) References: Sumio Watanabe and Kenji Fukumizu, "Unified Theory of Neural Networks and Proposal of a New Model," IEICE Technical Report, published by The Institute of Electronics, Information and Communication Engineers, Japan, March 18, 1992, Vol. 91, No. 529 (NC91-98 to 131), pp. 179-186
(58) Fields searched (Int. Cl.⁷, DB name): G06N 1/00 - 7/08; G06F 9/44; G06F 19/00; G06K 9/00 - 9/82; G10L 15/00 - 17/00; G06T 7/00 - 7/60; JST file (JOIS); CSDB (Japan Patent Office)

Claims

(57) [Claims]

1. A learning machine comprising: parameter storage means for storing parameters representing a probability distribution on an input/output space; output probability calculation means for calculating, when an input is given, a probability distribution of the output in accordance with the parameters stored in the parameter storage means; parameter learning means for learning the parameters using, as learning data, sets of an input and a desired output corresponding thereto, and for setting the parameters in the parameter storage means; input probability calculation means for calculating, when an output is given, a probability distribution of the input in accordance with the parameters held in the parameter storage means; and re-learning means for causing the parameter learning means to re-learn the parameters using, as re-learning data, a union of input samples following the probability distribution calculated by the input probability calculation means and data to be newly learned, together with a set of desired outputs for them, and for resetting the re-learned parameters in the parameter storage means.

2. The learning machine according to claim 1, wherein said output probability calculating means and said input probability calculating means are realized by one neural network having a reversible configuration.

3. The learning machine according to claim 1, further comprising learning data number storage means for holding the number of initial learning data items, wherein the re-learning means extracts, from among input samples following the probability distribution calculated by the input probability calculation means, the same number of samples as the number held in the learning data number storage means, and causes the parameter learning means to re-learn the parameters using, as re-learning data, a union of those samples and the data to be newly learned, together with a set of desired outputs for them.

4. The learning machine according to claim 1, further comprising learning data number storage means for holding the number of re-learning data items used, after re-learning has been performed by the re-learning means, wherein the re-learning means extracts, from among input samples following the probability distribution calculated by the input probability calculation means, the same number of samples as the number held in the learning data number storage means, and causes the parameter learning means to re-learn the parameters using, as re-learning data, a union of those samples and the data to be newly learned, together with a set of desired outputs for them.

5. The learning machine according to claim 1, further comprising learning rate storage means representing a learning intensity of the re-learning data, wherein the re-learning means determines the re-learning data so that the ratio between the number of input samples following the probability distribution calculated by the input probability calculation means and the number of data items to be newly learned matches a ratio determined according to the value held in the learning rate storage means, and causes the parameter learning means to re-learn the parameters using that re-learning data.