JP3827037B2

JP3827037B2 - Learning method and apparatus, robot, and recording medium

Info

Publication number: JP3827037B2
Application number: JP13338197A
Authority: JP
Inventors: 淳谷
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1997-05-23
Filing date: 1997-05-23
Publication date: 2006-09-27
Anticipated expiration: 2017-05-23
Also published as: JPH10326265A

Description

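[0001]
[Technical Field of the Invention]
The present invention relates to a learning method and apparatus, a robot, and a recording medium. More particularly, it relates to a learning method and apparatus, a robot, and a recording medium that allow a recurrent neural network to be trained quickly.
[0002]
[Prior Art]
A recurrent neural network can be used to make predictions. For example, as shown in FIG. 10, two routes can be stored in the recurrent neural network of a robot 11. One route takes the robot counterclockwise around an obstacle 10A. The other takes it counterclockwise around both obstacle 10A and obstacle 10B. Once these routes are stored, the robot 11 knows the order in which the landmarks appear on each route:
- around obstacle 10A: landmark 1, landmark 2, landmark 5;
- around obstacles 10A and 10B: landmark 1, landmark 2, landmark 3, landmark 4, landmark 5.

By recognizing these landmarks, the robot 11 can travel around obstacle 10A, or around obstacles 10A and 10B.
[0003]
Suppose that, after this learning has been performed, landmark 4 is removed from landmarks 1 to 5. The recurrent neural network of the robot 11 must then be trained again.
[0004]
FIG. 11 shows the conventional learning method for such a case. First, in step S31, new learning data is input and the recurrent neural network is trained on it. In step S32, the learning result is evaluated to determine whether a sufficient evaluation has been obtained. If it has not, learning is judged incomplete and the process returns to step S31, where new learning data is again input and learned.
[0005]
In this way, the learning process is repeated until it is determined in step S32 that the new learning data has been learned.
[0006]
[Problems to Be Solved by the Invention]
The conventional learning method therefore redoes the entire learning process from scratch, even though the only change is the removal of a single landmark. The result is the same as training the robot 11 from a state in which it has learned nothing, so learning takes a long time.
[0007]
The present invention was made in view of these circumstances. It makes it possible to complete learning more quickly.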
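[0008]
[Means for Solving the Problems]
The learning method according to claim 1 comprises a learning step that alternates learning with original learning data and learning with new learning data. The original learning data consists of a predetermined number of output values, obtained as follows: a random initial value is input to an input layer; output values are generated from that initial value and output from an output layer; and the process of feeding the output values from the output layer back to the input layer is repeated a predetermined number of times.
The learning apparatus according to claim 2 comprises learning means that alternates learning with original learning data and learning with new learning data. The original learning data consists of a predetermined number of output values, obtained as follows: a random initial value is input to an input layer; output values are generated from that initial value and output from an output layer; and the process of feeding the output values from the output layer back to the input layer is repeated a predetermined number of times.
[0009]
The recording medium according to claim 3 records a program that causes a computer to execute a learning step alternating learning with original learning data and learning with new learning data. The original learning data consists of a predetermined number of output values, obtained as follows: a random initial value is input to an input layer; output values are generated from that initial value and output from an output layer; and the process of feeding the output values from the output layer back to the input layer is repeated a predetermined number of times.
[0010]
The robot according to claim 4 comprises learning means and moving means. The learning means learns a travel route by alternating learning with original learning data and learning with new learning data. The original learning data consists of a predetermined number of output values, obtained as follows: a random initial value is input to the input layer of a recurrent neural network; output values are generated from that initial value and output from the output layer; and the process of feeding the output values from the output layer back to the input layer is repeated a predetermined number of times. The moving means moves the robot in the direction of the route output on the basis of the learning result.
[0011]
In the learning method according to claim 1, the learning apparatus according to claim 2, and the recording medium according to claim 3, a random initial value is input to the input layer. Output values are generated from that initial value and output from the output layer. The process of feeding the output values from the output layer back to the input layer is repeated a predetermined number of times, and the resulting predetermined number of output values forms the original learning data. Learning with this original learning data alternates with learning with new learning data.
In the robot according to claim 4, a random initial value is input to the input layer, and output values are generated from that initial value and output from the output layer. Original learning data is obtained in the same way. Learning with it alternates with learning with new learning data, so that the travel route is learned. The robot is then moved in the direction of the route output on the basis of the learning result.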
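[0012]
[Embodiments of the Invention]
Embodiments of the present invention are described below. To make clear how each means recited in the claims corresponds to the embodiments, the features of the invention are described with the corresponding embodiment (one example only) added in parentheses after each means. This description is, of course, not intended to limit each means to what is described.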
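[0014]
FIG. 1 shows the external configuration of a robot to which the learning method of the present invention is applied. In this embodiment, a television camera 12 is mounted on top of the robot 11 to capture images of its surroundings. Wheels 13 are mounted on the underside of the robot 11 so that it can move to any position. A display 14 is mounted on a side of the robot 11 to show predetermined characters and images.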
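[0015]
FIG. 2 shows an example of the internal configuration of the robot 11.
- The television camera 12 captures its surroundings as a color image. It outputs the color image data to a control circuit 24 and to a quantization circuit 25.
- The quantization circuit 25 quantizes the color image data and outputs it to a neural network recognition device 23.
- The neural network recognition device 23 recognizes landmarks (described later) in the data from the quantization circuit 25. It outputs the recognition result to the control circuit 24.
- The control circuit 24 is formed, for example, by a microcomputer. It notifies the neural network recognition device 23 of the robot's direction of travel. It also sends the device's prediction of the next landmark to the display 14, such as a CRT or LCD, for display.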
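[0016]
The control circuit 24 also drives a motor 21 to point the television camera 12 in a predetermined direction. In addition, it drives a motor 22 to rotate the wheels 13 and move the robot 11 to a predetermined position.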
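[0017]
FIG. 3 is a plan view of the space in which the robot 11 moves. In this embodiment, landmarks 1 to 5 are arranged around obstacles 10A and 10B. The robot 11 takes one of two routes:
- counterclockwise around obstacle 10A, via landmark 1, landmark 2, landmark 5;
- counterclockwise around obstacles 10A and 10B, via landmark 1, landmark 2, landmark 3, landmark 4, landmark 5.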
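[0018]
In this case, the control circuit 24 executes the process shown in FIG. 4.

First, in step S1, the control circuit 24 determines whether a landmark has been found. The television camera 12 captures images of the surroundings and outputs the color image data to the neural network recognition device 23 via the quantization circuit 25. As described later, when the neural network recognition device 23 recognizes a landmark, it outputs the recognition result to the control circuit 24. The control circuit 24 monitors this output.

If no landmark has been found yet, the process proceeds to step S2. There the control circuit 24 either drives the motor 21 to turn the television camera 12 in a predetermined direction, or drives the motor 22 to rotate the wheels 13 so that the robot 11 travels counterclockwise around obstacles 10A and 10B in FIG. 3. Steps S1 and S2 repeat until a landmark is found in step S1.
[0019]
When a landmark is found in step S1, the process proceeds to step S3. The control circuit 24 uses the output of the television camera 12 to determine whether there is an obstacle in the direction of the found landmark. If there is, the process proceeds to step S4, where the control circuit 24 drives the motor 22 to turn the wheels 13 to the right (clockwise).
[0020]
The process then returns to step S3, which checks again for an obstacle in the direction the robot 11 now faces. If one is found, step S4 turns the robot 11 further clockwise. When step S3 finds no obstacle, the process proceeds to step S5. There the control circuit 24 drives the motor 22 to rotate the wheels 13 and move the robot 11 toward the landmark found in step S1.
[0021]
Next, in step S6, the control circuit 24 determines whether the landmark has been reached. It judges that the landmark has been reached when two conditions hold:
- the neural network recognition device 23 is supplying a signal indicating that the landmark is detected;
- the television camera 12 is outputting a sufficiently large image of the landmark.

If the landmark has not yet been reached, the process returns to step S3 and repeats the subsequent steps. If it has been reached, the process returns to step S1, finds a new landmark, and performs the same processing toward that landmark.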
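The control loop of FIG. 4 (steps S1 to S6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The `RobotIO` interface, its field names, and the `max_steps` safety limit are all hypothetical. They stand in for the television camera 12, the neural network recognition device 23, and the motors 21 and 22.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RobotIO:
    """Hypothetical sensor/actuator interface; none of these names come from the patent."""
    landmark_found: Callable[[], bool]    # S1: recognition device 23 reports a landmark
    obstacle_ahead: Callable[[], bool]    # S3: obstacle check from the camera image
    landmark_reached: Callable[[], bool]  # S6: landmark detected and its image is large enough
    search: Callable[[], None]            # S2: pan camera 12 or drive counterclockwise
    turn_right: Callable[[], None]        # S4: rotate clockwise
    advance: Callable[[], None]           # S5: drive toward the landmark


def go_to_next_landmark(io: RobotIO, max_steps: int = 1000) -> List[str]:
    """Run steps S1-S6 once and return the trace of actions taken."""
    trace: List[str] = []

    def record(step: str) -> None:
        trace.append(step)
        if len(trace) > max_steps:  # safety limit added for the sketch (assumption)
            raise RuntimeError("step limit exceeded")

    # S1/S2: keep searching until a landmark is found.
    while not io.landmark_found():
        io.search()
        record("S2")

    while True:
        # S3/S4: turn clockwise as long as an obstacle blocks the landmark direction.
        while io.obstacle_ahead():
            io.turn_right()
            record("S4")
        # S5: move toward the landmark found in S1.
        io.advance()
        record("S5")
        # S6: stop when the landmark is reached; otherwise go back to S3.
        if io.landmark_reached():
            return trace
```

Calling `go_to_next_landmark` repeatedly corresponds to the return from S6 to S1. Each call covers one leg of the landmark-to-landmark travel of FIG. 3.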
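[0022]
In this way, the robot 11 travels toward landmark 1 without colliding with obstacle 10A. On reaching landmark 1, it travels toward landmark 2. On reaching landmark 2, it continues toward landmark 5. On reaching landmark 5, it travels from there toward landmark 1.
[0023]
Alternatively, on reaching landmark 2 the robot 11 moves toward landmark 3 instead of landmark 5. On reaching landmark 3 it proceeds to landmark 4, and on reaching landmark 4 it proceeds to landmark 5.
[0024]
Which of the two routes the robot 11 takes can be programmed in advance in the control circuit 24.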
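[0025]
Next, the configuration of the neural network recognition device 23 is described. As shown in FIG. 5, it basically consists of three networks:
- an associative memory network 41, formed by a Hopfield neural network;
- a winner-take-all neural network 42;
- a recurrent neural network 43.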
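[0026]
The color image data output by the television camera 12 passes through the quantization circuit 25, where it is quantized, before it reaches the associative memory network 41. As shown in FIG. 6, the quantization circuit 25 holds a space (table) spanned by hue and saturation. All color image data whose hue and saturation fall within region A1 are quantized to, for example, red data. Likewise, all color image data in region A2 are quantized to green data, and all color image data in region A3 are quantized to blue data.
[0027]
The names red, green, and blue are used here only for convenience; other names could be used. They are simply codes for the respective regions (names of the quantized data).
[0028]
Color image data in the space defined by hue and saturation can take infinitely many values. In this embodiment they are quantized into three values. Reducing a large number of color values to a small enough number of quantized values is what makes it possible for a neural network to learn and recognize objects. The quantization circuit 25 supplies the quantized data to the neural network recognition device 23. Each value that device receives is therefore one of the three values defined by the space shown in FIG. 6.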
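Below is a minimal sketch of the hue-based quantization performed by the quantization circuit 25. The text does not give the boundaries of regions A1 to A3 in FIG. 6. The region centres used here (red at hue 0, green at 1/3, blue at 2/3 of the hue circle) are therefore assumptions, as is the nearest-centre rule. For brevity the sketch uses hue only, whereas the patent's table is spanned by hue and saturation.

```python
import colorsys
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical region table: code name -> hue centre on the [0, 1) hue circle.
# As the patent notes, the names are only labels for the regions.
REGION_CENTRES = {"red": 0.0, "green": 1 / 3, "blue": 2 / 3}


def hue_distance(a: float, b: float) -> float:
    """Distance between two hues on the circular [0, 1) scale."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def quantize_pixel(r: int, g: int, b: int) -> str:
    """Map an 8-bit RGB pixel to the code of the nearest (assumed) hue region."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return min(REGION_CENTRES, key=lambda code: hue_distance(h, REGION_CENTRES[code]))


def quantize_image(pixels: Iterable[Tuple[int, int, int]]) -> Counter:
    """Count how many pixels fall into each of the three quantized codes."""
    return Counter(quantize_pixel(*p) for p in pixels)
```

For example, `quantize_pixel(250, 10, 20)` returns `"red"`. That pixel's hue is about 0.993, which lies next to the red centre on the circular hue scale.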
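[0029]
As shown in FIG. 5, the associative memory network 41 has one field per quantization level in FIG. 6, so three fields in this embodiment:
- field 41R corresponds to the red quantized data of region A1;
- field 41G corresponds to the green quantized data of region A2;
- field 41B corresponds to the blue quantized data of region A3.

The input pattern formed by the three kinds of quantized data from the quantization circuit 25 is input to (recalled in) the neurons of the corresponding fields of the associative memory network 41.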
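[0030]
Letting U denote the internal state of each neuron, the following equation holds.
[0031]
[Expression 1]

[0032]
Here, i is the neuron index and t is a given time. U_i^{t+1} is therefore the internal state of the i-th neuron at time t+1.
[0033]
k is a constant representing damping, and α is also a predetermined constant.
[0034]
W_ij is the connection weight from the i-th neuron to the j-th neuron. a_j^t is the output of the j-th neuron at time t, given by the following equation.
[0035]
[Expression 2]
$$a_j^t = \mathrm{logistic}\left(\frac{U_j^t}{T}\right)$$
[0036]
Here, logistic(A) denotes applying the sigmoid (logistic) function to A, and T is a constant. In other words, a neuron's output is its internal state divided by the constant T and passed through the sigmoid function.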
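The recall dynamics can be sketched numerically. Expression 2 is implemented exactly as the text states it. Expression 1 is missing from this copy, so its form here is an assumption built only from the symbols the text defines:

U_i^{t+1} = k·U_i^t + α·Σ_j W_ij·a_j^t

In this assumed form, k acts as a damping factor and α as a gain; the patent's actual equation may differ. The weight matrix is assumed symmetric, as is usual for a Hopfield network.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def logistic(x: float) -> float:
    """Numerically stable sigmoid 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def outputs(U: Sequence[float], T: float = 1.0) -> List[float]:
    """Expression 2: a_j = logistic(U_j / T)."""
    return [logistic(u / T) for u in U]


def recall_step(U: Sequence[float], W: Matrix, k: float = 0.5,
                alpha: float = 1.0, T: float = 1.0) -> List[float]:
    """One step of the ASSUMED form of Expression 1 (W taken as symmetric)."""
    a = outputs(U, T)
    return [k * U[i] + alpha * sum(W[i][j] * a[j] for j in range(len(U)))
            for i in range(len(U))]


def recall(U0: Sequence[float], W: Matrix, steps: int = 50, **params: float) -> List[float]:
    """Iterate the recall dynamics and return the final neuron outputs."""
    T = params.get("T", 1.0)
    U = list(U0)
    for _ in range(steps):
        U = recall_step(U, W, **params)
    return outputs(U, T)
```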
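[0037]
The equations above describe the recall dynamics. The learning dynamics are given by the following equation.
[0038]
[Expression 3]

[0039]
The value 0.5 in the above equation acts as a threshold. Each neuron's output lies between 0 and 1. An output below 0.5 makes the connection weight negative, and an output above 0.5 makes it positive.
[0040]
When the neural network learns landmarks 1 to 5 as reference objects for recognition, the learning result is stored in the connection weights W_ij.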
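As with Expression 1, Expression 3 is missing from this copy. The sketch below therefore assumes a common Hebbian rule for Hopfield networks that uses 0.5 as the threshold in the way described above:

ΔW_ij = η·(a_i − 0.5)·(a_j − 0.5), with no self-connections.

Both this form and the learning rate η are assumptions. The code also checks that a stored binary pattern is reproduced when each neuron's input field is thresholded.

```python
from typing import List, Sequence


def hebbian_weights(patterns: Sequence[Sequence[float]], eta: float = 1.0) -> List[List[float]]:
    """ASSUMED form of Expression 3: W_ij += eta * (a_i - 0.5) * (a_j - 0.5), W_ii = 0."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for a in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += eta * (a[i] - 0.5) * (a[j] - 0.5)
    return W


def threshold_recall(W: Sequence[Sequence[float]], a: Sequence[float]) -> List[int]:
    """Fire neuron i (output 1) if its field sum_j W_ij * (a_j - 0.5) is positive."""
    n = len(a)
    return [1 if sum(W[i][j] * (a[j] - 0.5) for j in range(n)) > 0 else 0
            for i in range(n)]
```

Storing the pattern `[1, 0, 1]` makes the weight between the two active neurons positive (0.25) and the weights to the inactive neuron negative (-0.25). The pattern is then a fixed point of `threshold_recall`.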
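[0041]
The winner-take-all neural network 42 has at least as many neurons as there are landmarks to recognize (five in this embodiment). It is trained so that, when the fields of the associative memory network 41 supply an input, the neuron with the largest value outputs 1.0 and the other four output 0.0. A single landmark is thus determined from the pattern defined by the three fields of the associative memory network 41, and the neuron corresponding to that landmark fires.
[0042]
Because only one neuron fires in the winner-take-all neural network 42, the recurrent neural network 43 that processes its output in the next stage can be kept simple.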
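The target behaviour that the winner-take-all neural network 42 is trained to produce can be written directly. This sketch computes the one-hot output with an explicit argmax instead of a trained network. The tie-breaking rule, where the lowest index wins, is an assumption.

```python
from typing import List, Sequence


def winner_take_all(scores: Sequence[float]) -> List[float]:
    """Return 1.0 for the largest score and 0.0 elsewhere (first index wins ties)."""
    if not scores:
        raise ValueError("need at least one score")
    winner = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == winner else 0.0 for i in range(len(scores))]


def recognized_landmark(scores: Sequence[float]) -> int:
    """Landmark number (1-based) of the single firing neuron."""
    return winner_take_all(scores).index(1.0) + 1
```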
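[0043]
The recurrent neural network 43 basically consists of an input layer 51, an intermediate layer 52, and an output layer 53. The input layer 51 has three nodes:
- a pattern node 51A, with five neurons corresponding to the winner-take-all neural network 42;
- a context node 51B, with neurons that hold the internal state of the recurrent neural network 43;
- a direction node 51C, with neurons through which the control circuit 24 specifies the next direction of travel.
[0044]
The output layer 53 has two nodes:
- a pattern node 53A, with neurons corresponding to the five landmarks;
- a context node 53B, corresponding to context node 51B of the input layer 51.

The neurons of the intermediate layer 52 connect the nodes of the input layer 51 to those of the output layer 53. The outputs of the neurons of context node 53B are fed back to the neurons of context node 51B.
[0045]
When the winner-take-all neural network 42 supplies an input corresponding to one landmark to pattern node 51A, the recurrent neural network 43 predicts the landmark that will appear next and outputs it from the output layer 53.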
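A forward pass through a network with the layout of FIG. 5 can be sketched as follows. Each step takes three inputs:
- a one-hot landmark on pattern node 51A;
- the fed-back context on 51B;
- a left/right code on direction node 51C.

After each step, context node 53B is copied back to 51B. The layer sizes, sigmoid units, bias inputs, initial context of 0.5, direction encoding, and random initialization are all assumptions. The weights are untrained, so the predictions mean nothing until the network is trained.

```python
import math
import random
from typing import List

DIRECTIONS = {"left": [1.0, 0.0], "right": [0.0, 1.0]}  # assumed encoding for node 51C


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))


def dense(W: List[List[float]], x: List[float]) -> List[float]:
    """Sigmoid layer; the last weight in each row of W multiplies a constant bias input."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in W]


class LandmarkRNN:
    def __init__(self, n_landmarks: int = 5, n_context: int = 4,
                 n_hidden: int = 10, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_in = n_landmarks + n_context + len(DIRECTIONS)  # nodes 51A + 51B + 51C
        n_out = n_landmarks + n_context                  # nodes 53A + 53B
        self.W_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.W_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        self.n_landmarks = n_landmarks
        self.context = [0.5] * n_context

    def step(self, landmark: int, direction: str) -> List[float]:
        """Feed a landmark (1-based) and a direction; return pattern-node (53A) outputs."""
        pattern = [1.0 if i == landmark - 1 else 0.0 for i in range(self.n_landmarks)]
        x = pattern + self.context + DIRECTIONS[direction]
        hidden = dense(self.W_ih, x)          # intermediate layer 52
        y = dense(self.W_ho, hidden)          # output layer 53
        self.context = y[self.n_landmarks:]   # feedback: 53B -> 51B
        return y[:self.n_landmarks]

    def predict(self, landmark: int, direction: str) -> int:
        """Number of the landmark whose output neuron is most active."""
        out = self.step(landmark, direction)
        return max(range(len(out)), key=lambda i: out[i]) + 1
```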
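[0046]
When color image data is input from the quantization circuit 25, the neural network recognition device 23 executes the process shown in the flowchart of FIG. 7.
[0047]
First, in step S11, the device waits until a landmark is found. In this embodiment, landmarks 1 to 5 are each painted a predetermined color. When color image data is input, the neural network recognition device 23 makes a YES determination in step S11 and proceeds to step S12.
[0048]
In step S12, the neural network recognition device 23 recognizes the landmark just found (the current landmark). This recognition is performed in the associative memory network 41.
[0049]
The process then proceeds to step S13, where the winner-take-all neural network 42 narrows down the recognition result of step S12, that is, determines which of the five landmarks has been recognized. In step S14, the recurrent neural network 43 predicts the landmark that will appear after the current one, and the prediction is shown on the display 14. This processing is repeated every time a landmark is found.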
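[0050]
Assume that the images of landmarks 1 to 5, as reference objects to be recognized, have been stored (learned) as connection weights in the associative memory network 41. In this state, suppose the pattern of landmark 2, captured by the television camera 12 and quantized by the quantization circuit 25, is input to the associative memory network 41. The quantized red, green, and blue data of landmark 2 then fire at predetermined positions in fields 41R, 41G, and 41B, respectively. The winner-take-all neural network 42 determines the corresponding landmark from the firing states of the fields and fires the neuron for that single landmark. In this case, the neuron corresponding to landmark 2 fires.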
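[0051]
Accordingly, in pattern node 51A of the input layer 51 of the recurrent neural network 43, the neuron corresponding to landmark 2 fires, matching the winner-take-all neural network 42. At the same time, the control circuit 24 decides whether the next direction of travel is left or right. It inputs a signal for that direction to direction node 51C of the input layer 51; in the embodiment of FIG. 5, the neuron for "left" is fired.

The recurrent neural network 43 then predicts the landmark that comes after landmark 2 and outputs the prediction to pattern node 53A of the output layer 53. As shown in FIG. 3, when landmark 2 has been detected and the next direction is left, the next landmark is landmark 5. Neuron number 5, corresponding to landmark 5, therefore fires in the output layer 53, as shown in FIG. 5.
[0052]
When the control circuit 24 receives the prediction of the next landmark from the neural network recognition device 23, it sends the corresponding number to the display 14. In this case, for example, the number 5 is displayed, which tells the user that the next landmark will be landmark 5.
[0053]
Now suppose the neuron for landmark 2 fires in pattern node 51A, but the neuron for "right" is fired in direction node 51C. As shown in FIG. 3, the next landmark when moving right from landmark 2 is landmark 3. Neuron number 3, corresponding to landmark 3, therefore fires in pattern node 53A of the output layer 53.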
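The prediction targets described above can be tabulated from FIG. 3. The text states that the robot has a left/right choice at landmark 2, where left leads to landmark 5 and right to landmark 3. This sketch assumes that every other transition ignores the direction input. Walking the table yields the two learned routes. It also yields the (landmark, direction, next landmark) triples that a network such as the recurrent neural network 43 would be trained on.

```python
from typing import Dict, List, Tuple

# (current landmark, direction) -> next landmark; the branch is at landmark 2 (from the text).
BRANCHES: Dict[Tuple[int, str], int] = {(2, "left"): 5, (2, "right"): 3}
# Assumption: at every other landmark the next one does not depend on the direction.
FIXED: Dict[int, int] = {1: 2, 3: 4, 4: 5, 5: 1}


def next_landmark(current: int, direction: str) -> int:
    if (current, direction) in BRANCHES:
        return BRANCHES[(current, direction)]
    return FIXED[current]  # KeyError for an unknown landmark


def lap(choice_at_2: str, start: int = 1) -> List[int]:
    """One lap starting and ending at `start`, taking `choice_at_2` at landmark 2."""
    route = [start]
    for _ in range(10):  # a lap in FIG. 3 visits at most five landmarks
        route.append(next_landmark(route[-1], choice_at_2))
        if route[-1] == start:
            return route
    raise ValueError(f"landmark {start} is not on the {choice_at_2} route")


def training_triples(choice_at_2: str) -> List[Tuple[int, str, int]]:
    """(input landmark, direction, target next landmark) for one lap."""
    r = lap(choice_at_2)
    return [(cur, choice_at_2, nxt) for cur, nxt in zip(r, r[1:])]
```

`lap("left")` gives `[1, 2, 5, 1]` and `lap("right")` gives `[1, 2, 3, 4, 5, 1]`, the two routes of FIG. 3.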
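[0054]
Suppose further that landmark 4 has a color similar to that of landmark 1. It would then be unclear whether landmark 1 or landmark 4 had been recognized. In this embodiment, however, the recurrent neural network 43 has context nodes, which let it distinguish state transitions as well.
[0055]
As shown in FIG. 3, landmark 4 appears after landmark 3, and landmark 1 appears after landmark 5. The context nodes 51B and 53B let the recurrent neural network 43 identify which state it is currently in. When the most recently recognized landmark is landmark 3, the network therefore recognizes the next input as landmark 4, not landmark 1. Likewise, when the most recently recognized landmark is landmark 5, it recognizes the next landmark as landmark 1, not landmark 4.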
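The role of the context nodes can be illustrated with a lookup-table stand-in. Here landmarks 1 and 4 are assumed to produce the same quantized observation, code "A"; the other observation codes are hypothetical. The observation alone cannot tell the two landmarks apart. Keying on the previously recognized landmark resolves the ambiguity; that landmark plays the part of the context held in nodes 51B and 53B. This is a simplification: the recurrent network learns the distinction in its context state rather than storing an explicit table.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical observations: landmarks 1 and 4 look alike (same code "A").
OBSERVATION = {1: "A", 2: "B", 3: "C", 4: "A", 5: "D"}
ROUTES = [[1, 2, 5, 1], [1, 2, 3, 4, 5, 1]]  # the two learned routes of FIG. 3


def build_tables(routes: List[List[int]]) -> Tuple[Dict[str, Set[int]], Dict[Tuple[int, str], int]]:
    by_obs: Dict[str, Set[int]] = defaultdict(set)   # observation -> candidate landmarks
    by_context: Dict[Tuple[int, str], int] = {}      # (previous landmark, observation) -> landmark
    for route in routes:
        for prev, cur in zip(route, route[1:]):
            by_obs[OBSERVATION[cur]].add(cur)
            key = (prev, OBSERVATION[cur])
            if by_context.setdefault(key, cur) != cur:
                raise ValueError(f"context {key} is still ambiguous")
    return by_obs, by_context


BY_OBS, BY_CONTEXT = build_tables(ROUTES)


def identify(prev: int, observed: str) -> int:
    """Identify the current landmark from the previous one and the observation."""
    return BY_CONTEXT[(prev, observed)]
```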
[0056]
Assume that the recurrent neural network 43 of the robot 11 has already learned two routes, as described above:
- counterclockwise around obstacle 10A, visiting landmark 1, landmark 2, landmark 5;
- counterclockwise around obstacles 10A and 10B, visiting landmark 1, landmark 2, landmark 3, landmark 4, landmark 5.

In this state, suppose landmark 4 is removed, as shown in FIG. 8. The workspace of the robot 11 then contains four landmarks: landmarks 1, 2, 3, and 5. After a small change of this kind, the recurrent neural network 43 is trained as shown in the flowchart of FIG. 9.
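[0057]
First, in step S21, a predetermined initial value is input to the recurrent neural network 43. This initial value may be random. The network's neurons already hold coefficients learned from the original learning data, so it produces some output for the initial value.
[0058]
In step S22, the recurrent neural network 43 is made to recall the original learning data by rehearsal. The network feeds the output generated from the initial value back to its input, recalls a new output from the fed-back output, and repeats this. Because the neurons hold the coefficients learned from the original learning data, a number of such rehearsal steps makes the network recall and output the original learning data.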
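Steps S21 and S22 can be sketched as a closed loop: start from a random input, then feed each output back in as the next input and record the outputs.
- `model` stands for an already-trained network whose output has the same size as its input.
- The toy `cyclic_model` is a hypothetical stand-in for such a network. It snaps to the strongest unit and advances to the next landmark.
- Ignoring the direction input is a simplification.
- Turning the recalled outputs into (input, target) pairs is also an assumption about how they would be fed to training.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def rehearse(model: Callable[[Vector], Vector], size: int,
             n_steps: int, seed: int = 0) -> List[Vector]:
    """S21: random initial input; S22: feed each output back as the next input."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(size)]
    recalled: List[Vector] = []
    for _ in range(n_steps):
        x = model(x)
        recalled.append(x)
    return recalled


def as_training_pairs(seq: List[Vector]) -> List[Tuple[Vector, Vector]]:
    """Consecutive recalled outputs become (input, target) pairs of the original data."""
    return list(zip(seq, seq[1:]))


def cyclic_model(x: Vector) -> Vector:
    """Hypothetical trained network: strongest unit i -> one-hot vector for unit i + 1."""
    i = max(range(len(x)), key=lambda j: x[j])
    return [1.0 if j == (i + 1) % len(x) else 0.0 for j in range(len(x))]
```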
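[0059]
Next, in step S23, new learning data is input to the recurrent neural network 43 and learned. The network learns the route landmark 1, landmark 2, landmark 5 and the route landmark 1, landmark 2, landmark 3, landmark 5.
[0060]
Next, in step S24, the original learning data is input and learned. The network learns the route landmark 1, landmark 2, landmark 5 and the route landmark 1, landmark 2, landmark 3, landmark 4, landmark 5. The original learning data used here is the data the recurrent neural network 43 itself recalled through the rehearsal of step S22.
[0061]
Next, in step S25, it is determined whether learning is sufficient. If not, the process returns to step S23 and repeats the subsequent steps.
[0062]
Training in this way, by adding learning with the original learning data to learning with the new learning data, completes learning in a shorter time than training with the new learning data alone. In this embodiment the two kinds of learning are interleaved.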
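Steps S23 to S25 amount to the loop below. `train_step`, `is_sufficient`, and the toy one-parameter regressor used to exercise the loop are hypothetical placeholders, not part of the patent. In the patent's setting, `train_step` would update the recurrent neural network 43 on one data set, and `old_data` would be the sequence rehearsed in step S22.

```python
from typing import Callable, Optional, Sequence, Tuple

Sample = Tuple[float, float]


def interleaved_training(train_step: Callable[[Sequence[Sample]], None],
                         is_sufficient: Callable[[], bool],
                         new_data: Sequence[Sample], old_data: Sequence[Sample],
                         max_rounds: int = 1000) -> Optional[int]:
    """Alternate new-data (S23) and original-data (S24) passes until S25 succeeds."""
    for round_no in range(1, max_rounds + 1):
        train_step(new_data)   # S23: learn the new learning data
        train_step(old_data)   # S24: learn the rehearsed original learning data
        if is_sufficient():    # S25: stop once learning is judged sufficient
            return round_no
    return None


class ToyRegressor:
    """Hypothetical one-weight model y = w * x trained by gradient descent on squared error."""

    def __init__(self, lr: float = 0.05) -> None:
        self.w = 0.0
        self.lr = lr

    def train_step(self, data: Sequence[Sample]) -> None:
        grad = sum(2 * (self.w * x - y) * x for x, y in data) / len(data)
        self.w -= self.lr * grad

    def loss(self, data: Sequence[Sample]) -> float:
        return sum((self.w * x - y) ** 2 for x, y in data) / len(data)
```

With `new = [(1.0, 2.0), (2.0, 4.0)]` and `old = [(3.0, 6.0)]`, the loop reaches a loss below 1e-3 on the combined data after two rounds.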
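[0063]
The original learning data could instead be kept in a memory provided in the robot 11. That, however, requires extra components, which make the apparatus larger and more expensive. Such a method is therefore not very practical.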
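[0064]
In the example of FIG. 9, learning with the new learning data and learning with the original learning data alternate one pass at a time. They may also alternate two or three passes at a time, for example.

A long block should be avoided, however. Suppose 3000 learning passes are performed in total, with the first 1500 on the new learning data and the next 1500 on the original learning data. The result then lies somewhere between what learning each data set alone would give, which is not desirable. It is therefore preferable to switch between the two kinds of learning relatively often. With frequent switching, it makes little difference which of the two is performed first.
[0065]
Alternatively, after the two kinds of learning have alternated for a while, the number of passes with the new learning data may be increased progressively relative to the passes with the original learning data.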
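The scheduling options in the two preceding paragraphs can be compared by generating the order of passes explicitly.
- `block_schedule` alternates blocks of `block` passes. `block = 1` is the FIG. 9 case, and `block = 1500` with 3000 passes is the arrangement the text advises against.
- `ramped_schedule` raises the number of new-data passes per round while keeping one original-data pass.

The function names and the linear ramp are illustrative assumptions.

```python
from typing import List


def block_schedule(total: int, block: int = 1) -> List[str]:
    """Alternate `block` new-data passes with `block` original-data passes."""
    if block < 1:
        raise ValueError("block must be >= 1")
    return ["new" if (i // block) % 2 == 0 else "old" for i in range(total)]


def ramped_schedule(n_rounds: int, max_new: int) -> List[str]:
    """Each round: k new-data passes, then one original-data pass; k rises from 1 to max_new."""
    schedule: List[str] = []
    for r in range(n_rounds):
        k = 1 if n_rounds == 1 else 1 + r * (max_new - 1) // (n_rounds - 1)
        schedule += ["new"] * k + ["old"]
    return schedule
```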
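[0066]
This learning method is not limited to a recurrent neural network used in a robot; it can be applied to many other kinds of devices. It is most effective, however, when the already-learned state and the state to be newly learned are relatively similar.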
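[0067]
[Effects of the Invention]
As described above, the learning method according to claim 1, the learning apparatus according to claim 2, the recording medium according to claim 3, and the robot according to claim 4 make it possible to complete learning in a shorter time than training on the new learning data alone from the start.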
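[Brief Description of the Drawings]
[FIG. 1] A view showing the external configuration of a robot to which the learning method of the present invention is applied.
[FIG. 2] A block diagram showing the internal configuration of the embodiment of FIG. 1.
[FIG. 3] A diagram illustrating the space in which the embodiment of FIG. 1 moves.
[FIG. 4] A flowchart illustrating the operation of the control circuit of FIG. 2.
[FIG. 5] A diagram showing a detailed configuration example of the neural network recognition device 23 of FIG. 2.
[FIG. 6] A diagram illustrating the operation of the quantization circuit 25 of FIG. 2.
[FIG. 7] A flowchart illustrating the operation of the neural network recognition device 23 of FIG. 2.
[FIG. 8] A diagram illustrating another space in which the embodiment of FIG. 1 moves.
[FIG. 9] A flowchart illustrating the learning method.
[FIG. 10] A diagram illustrating the space in which a conventional robot moves.
[FIG. 11] A flowchart illustrating a conventional learning method.
[Explanation of Reference Numerals]
11 robot, 12 television camera, 13 wheels, 14 display, 23 neural network recognition device, 24 control circuit, 25 quantization circuit, 41 associative memory network, 42 winner-take-all neural network, 43 recurrent neural network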
BACKGROUND OF THE INVENTION
  The present invention relates to a learning method and apparatus,robot,And a learning method and apparatus, particularly for a recurrent neural network, capable of performing learning quickly, with respect to a recording medium,robot,And a recording medium.
[0002]
[Prior art]
Prediction can be performed by a recurrent neural network. For example, as shown in FIG. 10, a route for moving the periphery of the obstacle 10A in the counterclockwise direction to the robot 11 and a route for moving the periphery of the obstacle 10A and the obstacle 10B in the counterclockwise direction are It can be stored in a recurrent neural network. For example, when the robot 11 moves around the obstacle 10 </ b> A, the landmark 11 appears in the order of the landmark 1, the landmark 2, and the landmark 5. In addition, when moving around the obstacle 10A and the obstacle 10B, it is stored that the landmarks appear in the order of the landmark 1, the landmark 2, the landmark 3, the landmark 4, and the landmark 5. Therefore, the robot 11 can move around the obstacle 10A or around the obstacle 10A and the obstacle 10B while recognizing these landmarks.
[0003]
By the way, for example, when the landmark 4 is deleted from the landmarks 1 to 5 in a state where such learning has already been performed, the learning is performed again on the recurrent neural network of the robot 11. Need to be performed.
[0004]
FIG. 11 shows a conventional learning method in such a case. That is, first, in step S31, new learning data is input, and this is learned by the recurrent neural network. In step S32, the learning result is evaluated to determine whether or not sufficient evaluation has been obtained. If sufficient evaluation has not been obtained yet, it is determined that learning has not yet been performed, and the process returns to step S31. Again, new learning data is input and a process of learning is executed.
[0005]
As described above, the learning process is repeatedly executed until it is determined in step S32 that new learning data has been learned.
[0006]
[Problems to be solved by the invention]
In the conventional learning method, the learning process is restarted from 1 even though it is a change in which only one landmark is removed. As a result, it is the same as when the robot 11 learns from a state in which no learning is performed, and there is a problem that it takes a long time to learn.
[0007]
The present invention has been made in view of such a situation, and makes it possible to complete learning more quickly.
[0008]
[Means for Solving the Problems]
  The learning method according to claim 1 comprises:Random initial values are input to the input layer, output values are generated based on the input initial values and output from the output layer, and processing for inputting the output values output from the output layer to the input layer is predetermined. A learning step of alternately performing learning using original learning data consisting of a predetermined number of output values obtained by repeating the number of times and learning using new learning dataIt is characterized by providing.
The learning device according to claim 2 inputs a random initial value to the input layer, generates an output value based on the input initial value, outputs the output value from the output layer, and outputs output from the output layer It is characterized by comprising learning means for alternately performing learning using original learning data consisting of a predetermined number of output values obtained by repeating a process of inputting a value to the input layer a predetermined number of times and learning using new learning data. .
[0009]
  The recording medium according to claim 3 is:Random initial values are input to the input layer, output values are generated based on the input initial values and output from the output layer, and processing for inputting the output values output from the output layer to the input layer is predetermined. For causing a computer to execute a learning step process that alternately performs learning using original learning data consisting of a predetermined number of output values obtained by repeating the number of times and learning using new learning data.The program is recorded.
[0010]
  Claim 4robotIsRandom initial values are input to the input layer of the recurrent neural network, output values are generated based on the input initial values and output from the output layer, and output values output from the output layer are input to the input layer. Learning means for learning a movement route by alternately performing learning with original learning data consisting of a predetermined number of output values obtained by repeating input processing a predetermined number of times and learning with new learning data; Moving means for moving the robot in the direction of the route to be output based on the result;It is characterized by providing.
[0011]
  The learning method according to claim 1,In the learning device according to claim 2 and the recording medium according to claim 3, a random initial value is input to the input layer, and an output value is generated based on the input initial value to generate an output layer. Is output from. In addition, learning with the original learning data consisting of a predetermined number of output values obtained by repeating the process of inputting the output value output from the output layer into the input layer a predetermined number of times and learning with new learning data are alternately performed.Done.
According to a fourth aspect of the present invention, a random initial value is input to the input layer, and an output value is generated and output from the output layer based on the input initial value. In addition, learning with the original learning data consisting of a predetermined number of output values obtained by repeating the process of inputting the output value output from the output layer into the input layer a predetermined number of times and learning with new learning data are alternately performed. And the travel route is learned. Further, the robot moves in the direction of the route that is output based on the learning result.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below, but in order to clarify the correspondence between each means of the invention described in the claims and the following embodiments, in parentheses after each means, The features of the present invention will be described with the corresponding embodiment (however, an example) added. However, of course, this description does not mean that each means is limited to the description.
[0014]
FIG. 1 shows an external configuration of a robot to which the learning method of the present invention is applied. In this embodiment, a television camera 12 is attached to the upper part of the robot 11 so as to capture surrounding images. A wheel 13 is attached to the lower side of the robot 11 so that it can move to an arbitrary position. A display 14 is attached to the side surface of the robot 11 so that predetermined characters and images are displayed.
[0015]
FIG. 2 shows an internal configuration example of the robot 11. The television camera 12 captures surrounding video as a color image and outputs the captured color image data to the control circuit 24 and the quantization circuit 25. The quantization circuit 25 quantizes the input color image data and outputs it to the neural network recognition device 23. The neural network recognizing device 23 recognizes a landmark described later from the color image data input from the quantization circuit 25 and outputs the recognition result to the control circuit 24. For example, the control circuit 24 composed of a microcomputer or the like notifies the neural network recognition device 23 of the moving direction of the robot, and displays the prediction result of the next landmark supplied from the neural network recognition device 23 as CRT, The information is output and displayed on a display 14 such as an LCD.
[0016]
The control circuit 24 drives the motor 21 and directs the television camera 12 in a predetermined direction. Further, the control circuit 24 drives the motor 22 and rotates the wheels 13 to move the robot 11 to a predetermined position.
[0017]
FIG. 3 shows the movement space of the robot 11 in a plan view. In this embodiment, landmarks 1 to 5 are arranged around the obstacle 10A and the obstacle 10B. In this embodiment, the robot 11 moves in the counterclockwise direction around the obstacle 10A along the path of the landmark 1, the landmark 2, and the landmark 5, or the landmark 11, the landmark 2, the land It is assumed that the path around the obstacle 10A and the obstacle 10B moves counterclockwise along the path of the mark 3, the landmark 4, and the landmark 5.
[0018]
In this case, the control circuit 24 executes the process shown in FIG. First, in step S1, the control circuit 24 determines whether or not a landmark has been found. That is, the television camera 12 captures surrounding images and outputs color image data obtained as a result of the capturing to the neural network recognition device 23 via the quantization circuit 25. As will be described later, when the neural network recognition device 23 recognizes a landmark, it outputs the recognition result to the control circuit 24. The control circuit 24 monitors the output of the neural network recognizing device 23 and determines whether or not a landmark has been found. If the landmark has not been found, the control circuit 24 proceeds to step S2 to drive the motor 21 to The camera 12 is rotated in a predetermined direction or the motor 22 is driven to rotate the wheel 13 so that the robot 1 moves counterclockwise around the obstacles 10A and 10B in FIG. The processes in steps S1 and S2 are repeatedly executed until it is determined in step S1 that a landmark has been found.
[0019]
If it is determined in step S1 that a landmark has been found, the process proceeds to step S3, and the control circuit 24 determines whether an obstacle exists in the direction of the found landmark. That is, the control circuit 24 determines the presence / absence of an obstacle from the output of the television camera 12, and if it is determined that an obstacle exists, the control circuit 24 proceeds to step S4 and executes a process of rotating the wheel 13 in the right direction. That is, at this time, the control circuit 24 drives the motor 22 to rotate the wheel 13 in the right direction (clockwise).
[0020]
Then, it returns to step S3 and it is determined again whether an obstacle exists in the direction where the robot 11 newly pointed. If it is determined that there is an obstacle, the process proceeds to step S4 again, and processing for further rotating the robot 11 in the clockwise direction is performed. If it is determined in step S3 that there is no obstacle, the process proceeds to step S5, and the control circuit 24 executes a process of moving the robot 11 in the direction of the landmark found in step S1. That is, at this time, the control circuit 24 drives the motor 22, rotates the wheel 13, and moves the robot 11 in the direction of the landmark.
[0021]
In step S6, the control circuit 24 determines whether or not the landmark has been reached from the output of the television camera 12 and the output of the neural network recognition device 23. That is, when the signal indicating that the landmark is detected is input from the neural network recognition device 23 and the TV camera 12 outputs a sufficiently large landmark image, the control circuit 24 receives the landmark. It is determined that has been reached. If the landmark has not yet been reached, the process returns to step S3 and the subsequent processing is repeatedly executed. If it is determined that the landmark has been reached, the process returns to step S1 to find a new landmark, Processing similar to that described above is executed toward the landmark.
[0022]
As described above, the robot 11 travels toward the landmark 1 so as not to collide with the obstacle 10A. When the robot 11 reaches the landmark 1, the robot 11 travels from the landmark 1 toward the landmark 2. When the landmark 2 is reached, the vehicle further travels toward the landmark 5. When the landmark 5 is reached, the vehicle travels toward the landmark 1.
[0023]
Alternatively, when the robot 11 reaches the landmark 2, the robot 11 moves in the direction of the landmark 3, not in the direction of the landmark 5. When the robot 11 reaches the landmark 3, the robot 11 proceeds to the landmark 4 and proceeds to the landmark 4. When you reach, go to landmark 5.
[0024]
It is possible to program in advance by the control circuit 24 which of the two routes the robot 11 moves.
[0025]
Here, the configuration of the neural network recognition device 23 will be described. As shown in FIG. 5, the neural network recognition device 23 includes a correlation storage network 41 configured by a hop field type neural network, a winner-take-all type neural network 42, and a recurrent type neural network 43. Basically composed.
[0026]
The color image data output from the television camera 12 is input to the quantization circuit 25 and quantized before being input to the correlation storage net 41. That is, as shown in FIG. 6, the quantization circuit 25 has a space (table) composed of hue and saturation, and the range of the area A1 in the color image data defined by the predetermined hue and saturation. For example, all the color image data belonging to is red data. Similarly, all color image data belonging to the area A2 is quantized as green data, and further, all color image data belonging to the area A3 is quantized as blue data.
[0027]
Note that the names of red, green, and blue here are merely for convenience, and other names may be used. That is, these names are merely codes (names of quantized data) of each area.
[0028]
The color image data existing in the space defined by the hue and saturation exists infinitely. In the case of this embodiment, this is quantized into three quantized data. As described above, by quantizing a large number of color image data into a sufficiently small number of quantized data, an object can be learned and recognized by a neural network. Thus, the quantized data obtained by quantizing the color image data by the quantizing circuit 25 is supplied to the neural network recognition device 23. Therefore, the quantized data input to the neural network recognizing device 23 is data represented by one of the three data defined by the space shown in FIG.
[0029]
As shown in FIG. 5, the correlation storage net 41 has a number of fields corresponding to the number of quantization steps shown in FIG. 6 (three in this embodiment). The field 41R is a field corresponding to the red quantized data in the area A1 in FIG. 6, the field 41G is a field corresponding to the green quantized data in the area A2 in FIG. 6, and the field 41B is This is a field corresponding to the blue quantized data in the area A3 in FIG. The input pattern constituted by the three quantized data output from the quantizing circuit 25 is input (recollected) to the neurons in the corresponding fields of the correlation storage net 41.
[0030]
That is, when U is the internal state of each neuron, the following equation is established.
[0031]
[Expression 1]

[0032]
Here, i represents a neuron number, and t represents a predetermined time. Therefore, U_i ^{t + 1}Represents the internal state of the i-th neuron at time t + 1.
[0033]
Here, k is a constant representing a damper, and α is also a predetermined constant.
[0034]
W_ijRepresents a connection weight coefficient from the i-th neuron to the j-th neuron. a_j ^tRepresents the output of the neuron at the j-th time t. This output is defined by the following equation.
[0035]
[Expression 2]

[0036]
Here, logistic (A) represents multiplying A by a sigmoid function. T represents a constant. That is, the above expression means that the result of dividing the internal state of the neuron by the constant T and multiplying the result by the sigmoid function is the output of the neuron.
[0037]
As described above, recall dynamics are performed, whereas learning dynamics are expressed by the following equations.
[0038]
[Equation 3]

[0039]
0.5 in the above formula functions as a threshold value. That is, the output of each neuron is a value between 0 and 1, but when the value is less than 0.5, the connection weight coefficient is negative, and when it is greater than 0.5, the connection weight coefficient is positive. have.
[0040]
When the neural network is made to learn landmarks 1 to 5 as objects for recognition, the result of the learning is the connection weight coefficient W_ijWill be stored as
[0041]
The winner take-all neural network 42 has a number of neurons (five neurons in this embodiment) corresponding to at least the number of landmarks to be recognized. When input is performed, learning is performed such that the output of one neuron that outputs the largest value among the five neurons is set to 1.0 and the output of the other four neurons is set to 0.0. . As a result, one landmark is determined from the pattern defined by the three fields of the correlation storage net 41, and the neuron corresponding to the landmark is fired.
[0042]
Thus, in the winner-take-all type neural network 42, only one neuron fires, so the configuration of the recurrent neural network 43 in the subsequent stage that processes the output can be simplified.
[0043]
The recurrent neural network 43 basically includes an input layer 51, an intermediate layer 52, and an output layer 53. The input layer 51 includes a pattern node 51A having five neurons corresponding to the winner take-all neural network 42, a context node 51B having neurons holding the internal state of the recurrent neural network 43, and the control circuit 24. And a direction node 51C having a neuron to which the direction of movement is commanded.
[0044]
The output layer 53 includes a pattern node 53A having neurons corresponding to five landmarks and a context node 53B corresponding to the context node 51B in the input layer 51. Each neuron in the intermediate layer 52 connects each node of the input layer 51 and the output layer 53. Further, the output of the neuron of the context node 53B of the output layer 53 is fed back to the neuron of the context node 51B of the input layer 51.
[0045]
When an input corresponding to one landmark is input from the winner take-all type neural network 42 to the pattern node 51A of the input layer 51, the recurrent type neural network 43 predicts the next appearing landmark and outputs the output layer 53. Output from.
[0046]
When the color image data is input from the quantization circuit 25, the neural network recognition device 23 executes the process shown in the flowchart of FIG.
[0047]
First, in step S11, the process waits until a landmark is found. In this embodiment, landmarks 1 to 5 are all painted in a predetermined color, so the neural network recognition device 23 determines YES in step S11 when color image data is input, and the process proceeds to step S12.
[0048]
In step S12, the neural network recognition device 23 executes recognition processing for the landmark just found (the current landmark). This recognition processing is executed in the correlation storage net 41 of the neural network recognition device 23.
[0049]
When the recognition of the current landmark in step S12 is completed, the process proceeds to step S13, in which the winner-take-all neural network 42 narrows down the recognition result obtained in step S12; that is, it determines which of the five landmarks has been recognized. The process then proceeds to step S14, in which the recurrent neural network 43 predicts the landmark that will appear after the current landmark. The prediction is displayed on the display 14. This processing is repeated each time a landmark is found.
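Steps S11 to S14 form a simple pipeline. In the sketch below, the three stages are passed in as hypothetical callables standing in for the correlation storage net 41, the winner-take-all network 42, and the recurrent network 43. The direction input and context that the recurrent stage also uses are assumed to be handled inside `predict`, and a frame of `None` stands for "no colored landmark in view".

```python
from typing import Callable, Iterable, List, Optional, Sequence


def run_recognition(
    frames: Iterable[Optional[Sequence[float]]],
    recognize: Callable[[Sequence[float]], List[float]],  # S12: correlation storage net 41
    narrow: Callable[[List[float]], int],                 # S13: winner-take-all net 42
    predict: Callable[[int], int],                        # S14: recurrent net 43
) -> List[int]:
    """Return the predicted next landmark for every frame that shows a landmark."""
    predictions: List[int] = []
    for frame in frames:
        if frame is None:          # S11: keep waiting until a landmark is found
            continue
        scores = recognize(frame)  # S12: recognize the current landmark
        current = narrow(scores)   # S13: settle on exactly one landmark number
        predictions.append(predict(current))  # S14: value sent to display 14
    return predictions


# Toy stand-ins: a "frame" is already a score vector, landmark = argmax + 1.
preds = run_recognition(
    frames=[None, [0.1, 0.9, 0.0], [0.7, 0.2, 0.1]],
    recognize=lambda f: list(f),
    narrow=lambda s: s.index(max(s)) + 1,
    predict=lambda lm: {1: 2, 2: 5}[lm],
)
print(preds)  # [5, 2]
```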
[0050]
Now, assume that images of landmarks 1 to 5, as the reference objects to be recognized, have been stored (learned) in the correlation storage net 41 as connection weight coefficients. In this state, when, for example, the pattern of landmark 2 captured by the television camera 12 and quantized by the quantization circuit 25 is input to the correlation storage net 41, the quantized red, green, and blue data of landmark 2 fire at predetermined positions in the fields 41R, 41G, and 41B, respectively. The winner-take-all neural network 42 determines the corresponding landmark from the firing state of each field and, based on that determination, fires the neuron corresponding to one landmark. In this case, the neuron corresponding to landmark 2 fires.
[0051]
Accordingly, in the pattern node 51A of the input layer 51 of the recurrent neural network 43, the neuron corresponding to landmark 2 fires, matching the neuron that fired in the winner-take-all neural network 42. At this time, the control circuit 24 determines whether the next direction is left or right and inputs a signal corresponding to that direction to the direction node 51C of the input layer 51. In the embodiment of FIG. 5, the neuron corresponding to the left direction fires. The recurrent neural network 43 therefore predicts the landmark that follows landmark 2 and outputs the prediction to the pattern node 53A of the output layer 53. As shown in FIG. 3, when landmark 2 is detected and the next moving direction is left, the next landmark to appear is landmark 5. Therefore, in this case, as shown in FIG. 5, neuron number 5, corresponding to landmark 5, fires in the output layer 53.
[0052]
When the control circuit 24 receives from the neural network recognition device 23 the data predicting the next landmark, it outputs the corresponding number to the display 14 for display. In this case, for example, the number 5 is displayed on the display 14, so the user can tell that the next landmark to appear is landmark 5.
[0053]
When the neuron corresponding to landmark 2 fires in the pattern node 51A of the input layer 51 of the recurrent neural network 43 and the neuron corresponding to the right direction fires in the direction node 51C, as shown in FIG., the next landmark to appear when moving right from landmark 2 is landmark 3. Neuron number 3, corresponding to landmark 3, therefore fires in the pattern node 53A of the output layer 53.
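The mapping that the recurrent network 43 is expected to reproduce in these two examples can be written out as a lookup table built from the two learned routes. This is a hypothetical illustration of the training targets, not the network itself. Labelling each route by the turn taken at landmark 2 and treating both routes as closed loops that return from landmark 5 to landmark 1 are assumptions.

```python
from typing import Dict, List, Tuple

# Assumption: each route is labelled by the turn taken at landmark 2,
# the only branch point, and every route is a closed loop.
ROUTES: Dict[str, List[int]] = {
    "left": [1, 2, 5],           # around obstacle 10A
    "right": [1, 2, 3, 4, 5],    # around obstacles 10A and 10B
}


def build_transitions(routes: Dict[str, List[int]]) -> Dict[Tuple[int, str], int]:
    """Map (current landmark, direction) to the landmark that appears next."""
    table: Dict[Tuple[int, str], int] = {}
    for direction, route in routes.items():
        for i, here in enumerate(route):
            nxt = route[(i + 1) % len(route)]  # wrap around: landmark 5 -> landmark 1
            if table.setdefault((here, direction), nxt) != nxt:
                raise ValueError(f"conflicting successor for {(here, direction)}")
    return table


TABLE = build_transitions(ROUTES)
print(TABLE[(2, "left")], TABLE[(2, "right")])  # 5 3
```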
[0054]
Suppose, for example, that landmark 4 has a color similar to that of landmark 1. It is then unclear whether landmark 1 or landmark 4 has been recognized. In this embodiment, however, the recurrent neural network 43 has context nodes, so it can also identify the state transition and thereby distinguish the two.
[0055]
That is, as shown in FIG. 3, landmark 4 appears after landmark 3, and landmark 1 appears after landmark 5. The recurrent neural network 43 can identify the current state through the context nodes 51B and 53B. Therefore, when the landmark recognized immediately before is landmark 3, the input landmark is recognized as landmark 4, not landmark 1. Similarly, when the landmark recognized immediately before is landmark 5, the next landmark can be recognized as landmark 1, not landmark 4.
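The disambiguation in this paragraph can be illustrated with an explicit successor table. This is a deliberate simplification: the recurrent network 43 encodes the history implicitly in its context nodes 51B and 53B, whereas the hypothetical `resolve` function below keeps only the previous landmark.

```python
from typing import Dict, Optional, Set

# Successors implied by the two learned routes (1-2-5 and 1-2-3-4-5, both closed loops).
SUCCESSORS: Dict[int, Set[int]] = {1: {2}, 2: {3, 5}, 3: {4}, 4: {5}, 5: {1}}


def resolve(candidates: Set[int], previous: int) -> Optional[int]:
    """Pick the one candidate that can follow `previous`; None if still ambiguous."""
    consistent = candidates & SUCCESSORS[previous]
    return next(iter(consistent)) if len(consistent) == 1 else None


# Landmarks 1 and 4 look alike, so the camera alone yields {1, 4}.
print(resolve({1, 4}, previous=3))  # 4
print(resolve({1, 4}, previous=5))  # 1
```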
[0056]
As described above, the recurrent neural network 43 of the robot 11 has already learned a route that circles the obstacle 10A counterclockwise by finding the landmarks in the order landmark 1, landmark 2, landmark 5, and a route that circles the obstacles 10A and 10B counterclockwise by finding the landmarks in the order landmark 1, landmark 2, landmark 3, landmark 4, landmark 5. Suppose that, in this state, landmark 4 is removed, as shown in FIG. The landmarks present in the workspace in which the robot 11 moves are then the four landmarks 1, 2, 3, and 5. When such a slight change is made, the recurrent neural network 43 is made to learn as shown in the flowchart of FIG.
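The effect of removing landmark 4 on the route data can be sketched as follows. The list `ORIGINAL_ROUTES` is written out here only to show what the network has learned; in the method described, the original learning data is not stored but recalled by rehearsal. The helper names are hypothetical.

```python
from typing import List, Tuple

ORIGINAL_ROUTES: List[List[int]] = [[1, 2, 5], [1, 2, 3, 4, 5]]


def remove_landmark(routes: List[List[int]], landmark: int) -> List[List[int]]:
    """Drop one landmark from every route, keeping the order of the rest."""
    return [[x for x in route if x != landmark] for route in routes]


def to_pairs(route: List[int]) -> List[Tuple[int, int]]:
    """(current, next) training pairs for one closed-loop route."""
    return [(route[i], route[(i + 1) % len(route)]) for i in range(len(route))]


NEW_ROUTES = remove_landmark(ORIGINAL_ROUTES, 4)
print(NEW_ROUTES)               # [[1, 2, 5], [1, 2, 3, 5]]
print(to_pairs(NEW_ROUTES[1]))  # [(1, 2), (2, 3), (3, 5), (5, 1)]
```

Only one transition pair actually changes ((3, 4) and (4, 5) collapse into (3, 5)), which illustrates why the already learned state and the new one are close.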
[0057]
That is, first, in step S21, a predetermined initial value is input to the recurrent neural network 43. This initial value may be random. Because each neuron of the recurrent neural network 43 already holds coefficients learned from the original learning data, the network produces some output in response to this initial value.
[0058]
In step S22, the recurrent neural network 43 is made to recall the original learning data by rehearsal. That is, the output that the recurrent neural network 43 generates from the initial value is fed back to its input, a new output is recalled from the fed-back output, and this operation is repeated. Because the neurons of the recurrent neural network 43 hold coefficients learned from the original learning data, as described above, repeating this rehearsal a number of times causes the recurrent neural network 43 to recall and output the original learning data.
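Steps S21 and S22 resemble what is often called pseudo-rehearsal. The sketch below assumes the trained network is available as a function `step` that maps one output vector to the next, and that consecutive recalled outputs are paired as (input, target) examples for relearning. Both the pairing and the function names are assumptions, and the toy `rotate` function merely stands in for a trained network.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def rehearse(
    step: Callable[[Vector], Vector],
    dim: int,
    n_steps: int,
    n_sequences: int,
    seed: int = 0,
) -> List[List[Vector]]:
    """Recall pseudo-data by repeatedly feeding the network's output back in."""
    rng = random.Random(seed)
    sequences: List[List[Vector]] = []
    for _ in range(n_sequences):
        x = [rng.random() for _ in range(dim)]  # S21: random initial value
        outputs: List[Vector] = []
        for _ in range(n_steps):                # S22: output becomes the next input
            x = step(x)
            outputs.append(x)
        sequences.append(outputs)
    return sequences


def to_training_pairs(sequences: List[List[Vector]]) -> List[Tuple[Vector, Vector]]:
    """Pair each recalled output with the one recalled after it."""
    return [(seq[t], seq[t + 1]) for seq in sequences for t in range(len(seq) - 1)]


# Toy stand-in for a trained network: rotates the vector by one position.
def rotate(v: Vector) -> Vector:
    return v[-1:] + v[:-1]


recalled = rehearse(rotate, dim=3, n_steps=4, n_sequences=2)
pairs = to_training_pairs(recalled)
print(len(pairs))  # 6
```

Only output values are kept, matching the claims' description of the original learning data as a predetermined number of output values.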
[0059]
In step S23, new learning data is input to the recurrent neural network 43 and learned. That is, the movement path through landmarks 1, 2, and 5 in that order and the movement path through landmarks 1, 2, 3, and 5 in that order are learned.
[0060]
Next, in step S24, the original learning data is input and learned. That is, the movement path through landmarks 1, 2, and 5 and the movement path through landmarks 1, 2, 3, 4, and 5 are learned. The data recalled by the recurrent neural network 43 through the rehearsal processing in step S22 is used as this original learning data.
[0061]
The process then proceeds to step S25, where it is determined whether sufficient learning has been achieved. If not, the process returns to step S23 and the subsequent processing is repeated.
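The loop of steps S23 to S25 can be sketched as below. `train_epoch` and `good_enough` are hypothetical hooks standing in for one learning pass over a dataset and for the evaluation in step S25; the `max_rounds` cap is an added safeguard not mentioned in the source.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def alternate_training(
    train_epoch: Callable[[Sequence[T]], None],
    new_data: Sequence[T],
    original_data: Sequence[T],
    good_enough: Callable[[], bool],
    max_rounds: int = 1000,
) -> Optional[int]:
    """Return the number of rounds needed, or None if max_rounds is reached."""
    for round_no in range(1, max_rounds + 1):
        train_epoch(new_data)       # S23: learn the new routes
        train_epoch(original_data)  # S24: relearn the rehearsed original data
        if good_enough():           # S25: evaluate; otherwise go back to S23
            return round_no
    return None


# Usage with a logging stub: "learning is sufficient" after six passes.
log = []
rounds = alternate_training(log.append, ["new"], ["orig"], lambda: len(log) >= 6, 10)
print(rounds)  # 3
```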
[0062]
When learning with the new learning data is combined with learning with the original learning data in this way (alternated, in this embodiment), learning can be completed in a shorter time than when only the new learning data is learned.
[0063]
The original learning data could instead be stored in a memory provided in the robot 11. However, this requires an additional component, which increases both the size and the cost of the apparatus, so such a method is not very practical.
[0064]
In the processing example of FIG. 9, learning with the new learning data and learning with the original learning data alternate one pass at a time, but they may instead alternate, for example, two or three passes at a time. However, if, for example, 3000 passes are performed in total and the first 1500 use only the new learning data while the remaining 1500 use only the original learning data, the result is intermediate between the result of learning the new data and the result of learning the original data, which is undesirable. It is therefore desirable to switch between learning with the new learning data and learning with the original learning data relatively frequently. Because the switching is frequent, it makes little difference to the result whichever of the two is performed first.
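The difference between frequent alternation and the undesirable 1500/1500 split can be made concrete with a schedule generator. `block_schedule` is a hypothetical helper: a block size of 1 reproduces the one-pass alternation of FIG. 9, and a block size equal to half the total reproduces the sequential split described above.

```python
from typing import List


def block_schedule(total_passes: int, block: int) -> List[str]:
    """Alternate `block` passes of new data with `block` passes of original data."""
    if block < 1:
        raise ValueError("block must be >= 1")
    schedule: List[str] = []
    source = "new"
    while len(schedule) < total_passes:
        run = min(block, total_passes - len(schedule))  # last block may be shorter
        schedule.extend([source] * run)
        source = "original" if source == "new" else "new"
    return schedule


print(block_schedule(6, 1))  # ['new', 'original', 'new', 'original', 'new', 'original']
print(block_schedule(6, 3))  # ['new', 'new', 'new', 'original', 'original', 'original']
```

With `block_schedule(3000, 1500)` every new-data pass precedes every original-data pass, which is the case the text advises against.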
[0065]
Alternatively, for example, after learning with the new learning data and learning with the original learning data have been alternated for a while, the number of passes with the new learning data may be gradually increased relative to the number of passes with the original learning data.
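One possible way to increase the share of new learning data gradually is sketched below. The source does not specify the rate of increase, so the linear ramp (one extra new-data pass per round) and the helper name `ramped_schedule` are assumptions.

```python
from typing import List


def ramped_schedule(alternating_rounds: int, ramp_rounds: int) -> List[str]:
    """Alternate 1:1 first, then add one extra new-data pass per ramp round."""
    schedule: List[str] = []
    for _ in range(alternating_rounds):
        schedule += ["new", "original"]
    for r in range(1, ramp_rounds + 1):
        schedule += ["new"] * (r + 1) + ["original"]  # r + 1 new passes, then 1 original
    return schedule


print(ramped_schedule(alternating_rounds=1, ramp_rounds=2))
# ['new', 'original', 'new', 'new', 'original', 'new', 'new', 'new', 'original']
```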
[0066]
This learning method can be applied not only to a recurrent neural network used in a robot but also to recurrent neural networks used in various other devices. The present invention is, however, particularly effective when the already learned state and the state to be newly learned are relatively similar.
[0067]
[Effect of the invention]
As described above, with the learning method according to claim 1, the learning device according to claim 2, the recording medium according to claim 3, and the robot according to claim 4, learning can be completed in a shorter time than when learning is performed from the beginning using only the new learning data.
[Brief description of the drawings]
FIG. 1 is a diagram showing the external configuration of a robot to which the learning method of the present invention is applied.
FIG. 2 is a block diagram showing the internal configuration of the embodiment of FIG. 1.
FIG. 3 is a diagram for explaining the space in which the embodiment of FIG. 1 moves.
FIG. 4 is a flowchart for explaining the operation of the control circuit of FIG. 2.
FIG. 5 is a diagram showing a detailed configuration example of the neural network recognition device 23 in FIG. 2.
FIG. 6 is a diagram for explaining the operation of the quantization circuit 25 of FIG. 2.
FIG. 7 is a flowchart for explaining the operation of the neural network recognition device 23 of FIG. 2.
FIG. 8 is a diagram illustrating another space in which the embodiment of FIG. 1 moves.
FIG. 9 is a flowchart illustrating the learning method.
FIG. 10 is a diagram illustrating the space in which a conventional robot moves.
FIG. 11 is a flowchart illustrating a conventional learning method.
[Explanation of symbols]
11 robot, 12 television camera, 13 wheel, 14 display, 23 neural network recognition device, 24 control circuit, 25 quantization circuit, 41 correlation storage net, 42 winner-take-all neural network, 43 recurrent neural network

Claims

A learning method for a recurrent neural network, comprising:
a learning step of alternately performing learning using original learning data and learning using new learning data, wherein the original learning data consists of a predetermined number of output values obtained by inputting a random initial value to an input layer and repeating, a predetermined number of times, a process of generating an output value based on the input value, outputting it from an output layer, and inputting the output value output from the output layer back to the input layer.

A learning device for a recurrent neural network, comprising:
learning means for alternately performing learning using original learning data and learning using new learning data, wherein the original learning data consists of a predetermined number of output values obtained by inputting a random initial value to an input layer and repeating, a predetermined number of times, a process of generating an output value based on the input value, outputting it from an output layer, and inputting the output value output from the output layer back to the input layer.

A recording medium on which is recorded a program for causing a computer to train a recurrent neural network,
the program causing the computer to execute processing of a learning step of alternately performing learning using original learning data and learning using new learning data, wherein the original learning data consists of a predetermined number of output values obtained by inputting a random initial value to an input layer and repeating, a predetermined number of times, a process of generating an output value based on the input value, outputting it from an output layer, and inputting the output value output from the output layer back to the input layer.

A robot that moves based on learning by a recurrent neural network, comprising:
  learning means for causing the recurrent neural network to learn by alternating learning using original learning data and learning using new learning data, wherein the original learning data consists of a predetermined number of output values obtained by inputting a random initial value to an input layer of the recurrent neural network and repeating, a predetermined number of times, a process of generating an output value based on the input value, outputting it from an output layer, and inputting the output value output from the output layer back to the input layer; and
  moving means for moving the robot in the direction of a route output on the basis of the learning result.