JPH03105662A - Self-learning processing system for learning device - Google Patents

Self-learning processing system for learning device

Info

Publication number
JPH03105662A
JPH03105662A (application number JP1244409A)
Authority
JP
Japan
Prior art keywords
pattern
input
output
teacher
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP1244409A
Other languages
Japanese (ja)
Inventor
Tamami Sugasaka
菅坂 玉美
Minoru Sekiguchi
実 関口
Kazushige Saga
一繁 佐賀
Shigemi Osada
茂美 長田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP1244409A priority Critical patent/JPH03105662A/en
Publication of JPH03105662A publication Critical patent/JPH03105662A/en
Pending legal-status Critical Current


Abstract

PURPOSE: To enable the learning device itself to generate teacher patterns, by providing an evaluating part that judges, on the basis of predetermined rules, whether the correspondence between an output pattern to which noise has been added and the input pattern at that time is purposeful. CONSTITUTION: An input part 1 generates an input pattern (a) from external information; a learning device 2 processes the input pattern (a), and noise from a noise part 3 is added to the result to produce an output pattern (b). An input/output pattern correspondence part 5 associates the input pattern (a) with the output pattern (b) to generate an input/output pattern (c); an evaluating part 6 evaluates the input/output pattern (c) and separates the patterns to be held in a teacher pattern table part 7 from unnecessary ones. At learning time, the learning device 2 learns the input/output patterns held in the teacher pattern table part 7 as teacher patterns (d). In this way it becomes unnecessary to prepare teacher patterns in advance.

Description

[Detailed Description of the Invention]

[Summary] The present invention concerns a self-learning processing method for a learning device, in a learning processing apparatus that has a learning device which processes input patterns on the basis of presented teacher patterns, arranged so that self-learning is performed. Its object is to make the learning device automatically extract purposeful input/output patterns. To this end, an evaluating part is provided that judges, on the basis of predetermined rules, whether the correspondence between an output pattern obtained by adding noise to the learning device's output and the input pattern at that time is purposeful, and the purposeful input/output patterns are held as teacher patterns.

[Industrial Application Field] The present invention relates to a self-learning processing method for a learning device in a learning processing apparatus having a learning device that processes input patterns on the basis of presented teacher patterns. In recent years, devices that learn from presented teacher patterns (neural networks and the like) have come to be applied to pattern recognition such as alphabet font recognition and image recognition, to adaptive filters, and to various kinds of robot control.

[Prior Art] In conventional learning devices that require teacher patterns, learning is performed using teacher patterns prepared in advance by humans.

[Problem to Be Solved by the Invention] In actual applications, however, there are patterns involving time series, patterns in which the teacher pattern itself changes, and patterns for unpredictable states, so it is difficult to decide the kind and amount of patterns to prepare as teacher patterns. Creating teacher patterns therefore takes a great deal of time. To put a learning device to practical use in an actual application, an algorithm is needed by which the learning device itself creates its teacher patterns. The object of the present invention is to make the learning device automatically extract purposeful input/output patterns.

[Means for Solving the Problem] Fig. 1 illustrates the principle of the present invention. In the figure, [1] is an input part that creates an input pattern (a) (generally multi-bit information) from stimuli detected in the external environment. [2] is a learning device that takes in a teacher pattern (d); it processes the input pattern (a) on the basis of that teacher pattern and produces output in accordance with it. [3] is a noise part that adds, so to speak, random noise to the output of the learning device [2], in the expectation that an output pattern more purposeful than the learning device's own output may be obtained. [4] is an output part that performs operations such as driving a motor (not shown). [5] is an input/output pattern correspondence part provided by the present invention; it associates the input pattern (a) with the corresponding output pattern (b) and outputs an input/output pattern (c). [6] is an evaluating part provided by the present invention; for each of the many input/output patterns (c), it holds the immediately preceding input/output pattern and the current one, evaluates from their comparison whether the current input/output pattern is purposeful for the learning device [2], and selectively extracts the purposeful input/output patterns. [7] is a teacher pattern table part; it temporarily stores the many input/output patterns (c) obtained by the input/output pattern correspondence part [5], allows reference by the evaluating part [6], retains the selectively extracted purposeful input/output patterns, and supplies them to the learning device [2] as teacher patterns (d) for the next try.

[Function] The input part [1] creates an input pattern (a) from external information. This input pattern (a) is sent to the learning device [2] and to the input/output pattern correspondence part [5]. The learning device [2] processes the input pattern (a); noise from the noise part [3] is, for example, added to the result, yielding the output pattern (b).
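The role of the noise part [3] — perturbing the learning device's result before it becomes the output pattern (b) — can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the list representation, the function name, and the amplitude are assumptions; only the choice between uniform random numbers and a Gaussian distribution comes from the embodiment described later.

```python
import random

def add_noise(outputs, amplitude=0.1, dist="uniform"):
    """Noise part [3]: perturb each output unit of the learning device
    before the result becomes output pattern (b)."""
    noisy = []
    for y in outputs:
        if dist == "uniform":
            n = random.uniform(-amplitude, amplitude)  # uniform random numbers
        else:
            n = random.gauss(0.0, amplitude)           # e.g. a Gaussian, also allowed
        noisy.append(y + n)
    return noisy

raw = [0.2, 0.8, 0.5]        # hypothetical learning-device output
pattern_b = add_noise(raw)   # output pattern (b) after noise
assert len(pattern_b) == len(raw)
assert all(abs(b - y) <= 0.1 for b, y in zip(pattern_b, raw))
```

Whether such a perturbed output is actually more purposeful than the raw output is then decided downstream by the evaluating part [6].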
The output part [4] creates the external output from the output pattern (b). Meanwhile, the input/output pattern correspondence part [5] associates the input pattern (a) with the output pattern (b) to create the input/output pattern (c). The evaluating part [6] evaluates the input/output pattern (c) and separates the patterns to be held in the teacher pattern table part [7] from unnecessary ones. At learning time, the learning device [2] learns the input/output patterns held in the teacher pattern table part [7] as teacher patterns (d).

[Embodiment] A concrete example follows. Fig. 2 shows a robot example: a robot with visual sensor inputs and a target are assumed, and the robot is made to learn the behavior of approaching the target. The robot has a self-learning capability using the self-learning processing method of the present invention. The robot has 11 visual sensor inputs; a visual sensor that captures the target outputs "1", and the others output "0". The visual sensor input in Fig. 2 is therefore "00000111000". This visual sensor input is sent to the input part [1] as external information. The input part [1] creates the input pattern (a) from the visual sensor input and sends it to the learning device [2] and the input/output pattern correspondence part [5]. The learning device [2] processes the input pattern (a); noise created by the noise part [3] is added to the learning device's result to give the final output pattern (b). The noise part [3] creates noise using uniform random numbers; of course, noise with a Gaussian or other distribution may also be used. The output part [4] computes the angle θ of the direction of travel (Fig. 2) from the output pattern (b) and sends it to the motor as the external output; as a result, the robot moves (Fig. 2). This single movement is one step. Meanwhile, the input/output pattern correspondence part [5] associates the output pattern (b) with the input pattern (a) to create the input/output pattern (c), which is temporarily stored in the teacher pattern table part [7]. The evaluating part [6] evaluates the input/output patterns according to the evaluation method described later, and the purposeful input/output patterns it selects are held in the teacher pattern table part [7].

Although not shown in Fig. 2, a wall is assumed around the robot to limit its range of movement, and a try ends when the robot reaches the wall. A maximum movement time during which the robot may move freely is also set, and a try likewise ends when this time is reached. That is, the above is repeated until one of the learning conditions is met: (i) the robot captures the target, (ii) it hits the surrounding wall, or (iii) the maximum movement time is reached. The period until one of these conditions is met is one try. After a try ends, learning for the next try is performed using the input/output patterns held in the teacher pattern table part [7] as teacher patterns (d). After learning, the robot and the target are, for example, placed at different positions and the above is repeated.

Fig. 3 is an explanatory diagram of the evaluation method. It shows the series of behavior trajectories of one try, the series of input/output patterns of one try, and their evaluation values. In Fig. 3, the past input is the previous visual sensor input, the current input is the current visual sensor input, the output is the final output pattern, and the evaluation value is the result produced by the evaluating part. The evaluating part [6] stores the immediately preceding input/output pattern (f) (Fig. 3) and, from it and the current input/output pattern (e) (Fig. 3), evaluates input/output patterns according to conditions (A) and (B) below. Evaluation uses ○ and × (called reinforcement signals); an input/output pattern that satisfies condition (A) or (B) is evaluated as ○, and the patterns evaluated as ○ are held in the teacher pattern table part [7]. The evaluation value is "1" if condition (A) is satisfied, "-1" if condition (B) is satisfied, and "0" if neither is satisfied.

(A) There is some input in the current input of the preceding input/output pattern ((f) in block A of Fig. 3), and the current input of the current input/output pattern ((e) in block A of Fig. 3) reacts more strongly than that of (f). For example, in block A of Fig. 3 the current input of input/output pattern A (f) contains one "1" while the current input of input/output pattern A (e) contains three "1"s; when the number of "1"s increases in this way, the target is captured more widely — that is, the robot has moved closer.

(B) There is some input in the past input of the preceding input/output pattern ((f) in block B of Fig. 3), no input at all in its current input, and some input in the current input of the current input/output pattern ((e) in block B of Fig. 3). For example, in block B of Fig. 3 the past input of input/output pattern B (f) contains one "1", the current input of B (f) is all "0", and the current input of B (e) contains one "1"; this means the target was lost sight of for a moment and then recovered. Evaluations of this kind are performed while the behavior trajectory shown in Fig. 3 is obtained.

Fig. 4 shows how, according to the present invention, the robot comes to catch the target through learning: the behavior trajectory in the unlearned state and the trajectories after learning of the first, second, and third tries. It can be seen from Fig. 4 that with repeated learning the robot approaches the target faster, and finally comes to move along a route close to the shortest path. Fig. 5 shows the trajectory of one try from the unlearned state, together with the input/output patterns at that time and their evaluation values. Although only ten kinds of teacher patterns were created in the first try, teacher patterns could nevertheless be obtained to a reasonable extent.

[Effects of the Invention] As explained above, according to the present invention the device creates teacher patterns as it runs and can learn using those teacher patterns. It is therefore unnecessary to prepare teacher patterns in advance, which saves a great deal of labor. A human need only monitor the operation of the device and need not know its detailed internal structure.
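The evaluation rules (A) and (B) above reduce to a small comparison between the preceding input/output pattern (f) and the current one (e). The sketch below uses an assumed encoding — each pattern as a (past input, current input) pair of 0/1 sensor bits, which Fig. 3 only shows graphically — and is not code from the patent.

```python
def evaluate(f, e):
    """Evaluating part [6]: compare the preceding I/O pattern f with the
    current pattern e. Each pattern is (past_input, current_input), lists
    of 0/1 visual-sensor bits (representation assumed).
    Returns 1 for condition (A), -1 for condition (B), else 0."""
    f_past, f_cur = f
    e_cur = e[1]
    # (A): the target was seen before and is now seen by more sensors
    if sum(f_cur) > 0 and sum(e_cur) > sum(f_cur):
        return 1
    # (B): the target was momentarily lost and has been reacquired
    if sum(f_past) > 0 and sum(f_cur) == 0 and sum(e_cur) > 0:
        return -1
    return 0

# Block A of Fig. 3: one "1" grows to three "1"s -> evaluation value 1
f = ([0] * 11, [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
e = (f[1],    [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0])
assert evaluate(f, e) == 1
```

Patterns scoring 1 or -1 correspond to the ○ reinforcement signal and would be kept in the teacher pattern table part [7]; a 0 pattern is discarded.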

Note that although patterns involving time series were learned in this embodiment, it goes without saying that patterns in which the teacher pattern itself changes, and patterns for unpredictable states, can also be learned. Of course, known patterns may also be prepared in advance. Moreover, although the embodiment above describes learning with 11 visual sensors, various other sensors may be attached as well.

[Brief explanation of drawings]

Fig. 1 is a diagram explaining the principle of the present invention, Fig. 2 shows a robot example, and Figs. 3 to 5 are explanatory diagrams. In the figures: [1]: input part; [2]: learning device; [3]: noise part; [4]: output part; [5]: input/output pattern correspondence part; [6]: evaluating part; [7]: teacher pattern table part; (a): input pattern; (b): output pattern; (c): input/output pattern; (d): teacher pattern.

Claims (1)

[Claims] A self-learning processing method for a learning device, characterized in that, in a learning processing apparatus comprising an input part [1] that creates an input pattern (a) from external information, a learning device [2] that processes the input pattern (a) on the basis of a presented teacher pattern, a noise part [3] that creates noise to be added to the processing result of the learning device [2], and an output part [4] that receives the output pattern (b) constituting the final result, there are provided an input/output pattern correspondence part [5] that associates the input pattern (a) with the output pattern (b) to create an input/output pattern (c), an evaluating part [6] that evaluates the input/output pattern (c), and a teacher pattern table part [7] that stores the input/output patterns sent from the evaluating part [6]; the correspondence of the input/output pattern (c) is evaluated by the evaluating part, input/output patterns are held in the teacher pattern table part [7] according to the evaluation criterion, and the held input/output patterns are supplied to the learning device [2] as teacher patterns (d).
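Read as an algorithm, the claim describes a closed loop: sense, process, perturb, pair up, evaluate, store, retrain. The sketch below wires the claimed parts together for one try of the Fig. 2 robot. Every concrete choice here is hypothetical — the random sensor stand-in, the use of condition (A) only, the noise amplitude, and all names — since the claim leaves these components abstract.

```python
import random

def sense():
    """Input part [1]: stand-in for the 11 visual-sensor bits of Fig. 2;
    here the 'target' lights up between zero and three random sensors."""
    bits = [0] * 11
    for i in random.sample(range(11), random.randint(0, 3)):
        bits[i] = 1
    return bits

def purposeful(prev, cur):
    """Evaluating part [6], condition (A) only for brevity:
    more sensors see the target now than in the preceding pattern."""
    return sum(prev[0]) > 0 and sum(cur[0]) > sum(prev[0])

def run_try(process, max_steps=50):
    """One try of the claimed loop: build patterns (c) from input (a) and
    the noise-added output (b), keep purposeful ones as teacher data (d).
    `process` plays the role of the learning device [2]; its form is assumed."""
    table = []                                # teacher pattern table part [7]
    prev = None
    for _ in range(max_steps):                # ends at max movement time (iii)
        a = sense()                           # input pattern (a)
        b = [y + random.uniform(-0.1, 0.1)    # noise part [3] -> pattern (b)
             for y in process(a)]
        c = (a, b)                            # correspondence part [5] -> (c)
        if prev is not None and purposeful(prev, c):
            table.append(c)                   # kept for the next try
        prev = c
    return table                              # teacher patterns (d)

# Trivial stand-in learning device: echo the input as the output.
teacher = run_try(lambda a: [float(x) for x in a])
assert all(len(a) == 11 and len(b) == 11 for a, b in teacher)
```

After a try, the patterns in `table` would be used to retrain the learning device, and the loop would restart from a new robot/target placement, as the description outlines.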
JP1244409A 1989-09-20 1989-09-20 Self-learning processing system for learning device Pending JPH03105662A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP1244409A JPH03105662A (en) 1989-09-20 1989-09-20 Self-learning processing system for learning device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP1244409A JPH03105662A (en) 1989-09-20 1989-09-20 Self-learning processing system for learning device

Publications (1)

Publication Number Publication Date
JPH03105662A true JPH03105662A (en) 1991-05-02

Family

ID=17118236

Family Applications (1)

Application Number Title Priority Date Filing Date
JP1244409A Pending JPH03105662A (en) 1989-09-20 1989-09-20 Self-learning processing system for learning device

Country Status (1)

Country Link
JP (1) JPH03105662A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0429494A (en) * 1990-05-23 1992-01-31 Matsushita Electric Ind Co Ltd Automatic adjusting device


Similar Documents

Publication Publication Date Title
Guély et al. Gradient descent method for optimizing various fuzzy rule bases
Chen et al. Deep reinforcement learning to acquire navigation skills for wheel-legged robots in complex environments
Hiraga et al. An acquisition of operator's rules for collision avoidance using fuzzy neural networks
Lin et al. Nonlinear system control using compensatory neuro-fuzzy networks
JPH03189856A (en) Learning system for external evaluation criterion
Gu et al. Path following with supervised deep reinforcement learning
JPH03105662A (en) Self-learning processing system for learning device
Zhou Neuro-fuzzy gait synthesis with reinforcement learning for a biped walking robot
Saridis et al. Hierarchically intelligent control of a bionic arm
CN111984000A (en) Method and device for automatically influencing an actuator
Sun et al. A recurrent fuzzy neural network based adaptive control and its application on robotic tracking control
Ho et al. A novel fuzzy inferencing methodology for simulated car racing
Algabri et al. Optimization of fuzzy logic controller using PSO for mobile robot navigation in an unknown environment
Xiao et al. A reinforcement learning approach for robot control in an unknown environment
Nimoto et al. Improvement of Agent Learning for a Card Game Based on Multi-channel ART Networks.
Raj et al. Optimized Fuzzy Controller for Cable Robot.
Hong et al. Formation control based on artificial intelligence for multi-agent coordination
Hamavand et al. Trajectory control of robotic manipulators by using a feedback-error-learning neural network
Katic et al. Decomposed connectionist architecture for fast and robust learning of robot dynamics
Song et al. Reinforcement learning and its application to force control of an industrial robot
ULUSOY et al. A neural network system with reinforcement learning to control a dynamic arm model
Hong et al. Cooperative algorithm and group behavior in multirobot
Son et al. Neuro-fuzzy control of a robot manipulator for a trajectory design
Stonier et al. Intelligent hierarchical control for obstacle-avoidance
ZHOU et al. Intelligent robotic control using reinforcement learning agents with fuzzy evaluative feedback