
JP4160677B2 - Computer system with distributed learning function

Info

Publication number: JP4160677B2
Application number: JP36417698A
Authority: JP
Inventors: 謙元寺内
Original assignee: 謙元寺内
Priority date: 1998-12-22
Filing date: 1998-12-22
Publication date: 2008-10-01
Anticipated expiration: 2018-12-22
Also published as: JP2000187505A

Description

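[0001]
[Technical Field of the Invention]
The present invention relates to a computer system, used for controlling robots and the like, that has a learning function adapted to the controlled object and its environment.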
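[0002]
[Prior Art]
Among conventional computer systems used to control robots and the like, there are neurocomputers, which operate on the biological mechanism built from neurons and their interconnection network. A neuron, also called a nerve cell, is a cell specialized to carry out information processing in a living body. Neurons connected to one another form the neural network of the brain. A neurocomputer learns by using a neural network model that models the principle of parallel learning and information processing in a neural network.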
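[0003]
[Problem to Be Solved by the Invention]
However, a conventional neurocomputer such as the one described above uses only one CPU for learning. The system is therefore difficult to expand when the number of controlled objects increases, and learning cannot be sped up. In addition, the connections between neurons in the human brain include excitatory connections, which promote the firing of other cells, and inhibitory connections, which conversely try to suppress the firing of other cells. The brain combines these connections to achieve complex parallel learning. A conventional neurocomputer, by contrast, uses only excitatory connections. In these, an ON input signal from a sensor or other element corresponding to a neuron counts as a positive evaluation factor for the CPU on the output side. As a result, a conventional neurocomputer cannot perform complex learning.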
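[0004]
The present invention was made to solve the problems described above. Its object is to provide a computer system with a learning function that meets three goals. It can perform complex learning and thus adapt to a complex environment. It is easy to expand when the number of controlled objects increases. It can speed up learning.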
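[0005]
[Means for Solving the Problem]
To achieve the above object, the present invention provides a computer system with a learning function adapted to controlled objects and their environment. The system consists of a unit group in which a plurality of units, each having a CPU, are connected. The units in the group are connected both horizontally, in parallel, and vertically, in a hierarchy. Each unit in the group, other than the units in the highest and lowest layers, comprises evaluation means, storage means, search means and decision means.

The evaluation means evaluates whether the command the unit output last time, in response to the command previously input from the unit in the layer above, had a positive effect for the unit. The evaluation is based on two kinds of information about the controlled object and environment. The first is information acquired from the sensors connected to the unit itself (hereinafter, first input information). The second is information acquired from sensors by other units connected to it horizontally (hereinafter, second input information).

When the evaluation means judges that the effect was positive, the storage means stores combination data. This data consists of the previously input command, the previous first and second input information, and the previously output command.

The search means searches the past combination data held in the storage means. It looks for combination data containing the current first and second input information and the current command input from the higher-layer unit.

The decision means decides the command to output to the unit in the lower layer. If the search finds such combination data, it chooses the output command in that data. If not, it searches the stored past combination data for data similar to the combination of the current input command and the current first and second input information. It then chooses the output command in the similar data if any exists, or a randomly generated output command if none does.

The connection between each unit in the group and each sensor or other unit on its input side is one of three types. The type is defined by what an ON signal from that sensor or unit means for the output-side unit:
an excitatory connection, for which the ON signal is a positive evaluation factor;
an inhibitory connection, for which it is a negative evaluation factor; and
a neutral connection, for which it is neither.

When a signal input to the unit from an input-side sensor or other unit is ON, the evaluation means adjusts the evaluation value. It adds a predetermined value if that connection is excitatory and subtracts a predetermined value if it is inhibitory. It then obtains the total. When the total is positive, it judges that the command the unit output last time, in response to the command previously input from the higher-layer unit, had a positive effect for the unit.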
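The evaluation rule at the end of this paragraph can be restated as a formula. The notation below is introduced only for illustration and does not appear in the patent:

- s_i(t) in {0, 1} is the ON/OFF signal on input connection i.
- E and I are the sets of excitatory and inhibitory connections.
- c > 0 stands for the "predetermined value", assumed here to be the same for every connection.
- u, x^(1), x^(2) and y are the previous cycle's input command, first input information, second input information and output command.

Neutral connections contribute nothing to the sum.

```latex
\begin{aligned}
V_t &= c\left(\sum_{i\in\mathcal{E}} s_i(t) \;-\; \sum_{i\in\mathcal{I}} s_i(t)\right), \qquad c > 0,\\
\text{store }\bigl(u_{t-1},\, x^{(1)}_{t-1},\, x^{(2)}_{t-1},\, y_{t-1}\bigr) &\iff V_t > 0.
\end{aligned}
```

Take c = 1, the value used in [0011]. Two ON excitatory inputs and one ON inhibitory input give V_t = 1 > 0, so the previous combination is stored. One of each gives V_t = 0, so nothing is stored.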
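[0006]
With the above configuration, each unit can evaluate whether the command it output last time, in response to the command previously input from the higher-layer unit, had a positive effect for it. The evaluation uses the information about the controlled object and environment acquired from the unit's own sensors and from sensors of the units connected to it horizontally. By repeating this evaluation, each unit learns, for each type of input command, which output command has a positive effect for it, and outputs that command.

Consequently, when the number of controlled objects increases, units for the added objects are simply connected horizontally, and output commands that benefit those objects can then be produced. The system is therefore easy to expand as controlled objects are added. Even after such expansion, each unit in the group performs its own learning with its own CPU, so learning can be sped up. Furthermore, the units are also connected in a vertical hierarchical structure. A command input from a higher-layer unit can therefore be expanded into commands representing more detailed instructions suited to the controlled object and environment. This expansion is based on the information acquired from the unit's own sensors and from horizontally connected units.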
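[0007]
The connections between each unit in the group and its input-side sensors or other units may be of three types, defined by the effect of an ON signal from the input-side sensor or unit:
excitatory connections, for which it is a positive evaluation factor for the output-side unit;
inhibitory connections, for which it is a negative evaluation factor; and
neutral connections, for which it is neither.

In that case, each output-side unit receives information about the controlled object and environment through inhibitory and neutral connections as well as the excitatory connections used in conventional learning computer systems. It can therefore learn commands that are undesirable for it as well as commands that are desirable, which makes complex learning possible. In addition, each output-side unit can receive information from neutrally connected sensors and the like whose desirability is not yet known at that point. Learning can therefore be based on more information.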
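[0008]
[Embodiments of the Invention]
A computer system with a learning function according to an embodiment of the present invention is described below with reference to the drawings. The computer system of this embodiment is built into a robot or similar machine and is used to control actuators such as an arm. FIG. 1(a) shows an individual unit, the basic building block of the computer system. FIG. 1(b) shows units connected in a vertical/horizontal network. FIG. 1(c) shows the format of the combination data of input data and output commands stored inside a unit.

The unit body 2 of a unit 1 contains a CPU, RAM and ROM (not shown). An input port 3 and an output port 4 are provided on the input and output sides of the unit 1, respectively. Through the input port 3, the CPU in the unit body 2 receives two kinds of input: commands from the unit 1 in the layer above, and information about the controlled object and environment from the sensor systems connected to its own unit and to other units. Using a learning program stored in the ROM, it accumulates in the RAM the combinations of input data and output commands that had a positive effect for the unit, in the format shown in FIG. 1(c). Based on the accumulated data, it then outputs from the output port 4 the command that is preferable for the unit given the current input data.

Each unit 1 of the vertical/horizontal network 10 shown in FIG. 1(b) repeats this processing, so every unit 1 learns to output commands suited to the environment and controlled object. As a result, the units 1 of the uppermost layer (third layer) of the vertical/horizontal network 10 can output, from their output ports 4, commands that are preferable for all units 1. These commands go to the devices that drive the arm and other parts.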
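[0009]
The sensors used in the sensor system shown in FIG. 1(b) may be temperature, humidity, contact, light-intensity, sound or other sensors. There is no particular restriction, as long as the sensor captures the information needed to grasp the state of the environment or controlled object. However, the sensors must have an ON/OFF contact output. The indirect output shown in FIG. 1(b) is used for indicators and the like that show the state of the units 1 in each layer.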
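[0010]
Next, the input terminals of the input port 3 are described. Each input terminal receives binary data, 0 or 1. On a terminal that receives sensor information, 1 means ON and 0 means OFF. A command input terminal receives an instruction code expressed as a binary number. Data from the sensor input terminals is used to evaluate the command output immediately before.

There are three types of sensor input terminal: excitatory, inhibitory and neutral. When the data at an excitatory terminal is 1 (ON), the command that the unit 1 output in response to the previous input data is judged to have had a positive effect for that unit 1. Conversely, when the data at an inhibitory terminal is 1, that command is judged to have had a negative effect for the unit 1. Data at a neutral terminal is not treated as an evaluation value for the unit 1. Neutral terminals are mainly used as connection ports of the vertical/horizontal network 10 for passing commands. They are also used to connect sensors that detect factors whose benefit to the unit 1 is not yet known.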
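The following Python sketch shows one way an input word could be split into a command code and typed sensor readings. The port layout, the `Terminal` enum and the `read_port` function are hypothetical. The text specifies only three things: every terminal carries 0 or 1, command terminals occupy predetermined positions, and sensor terminals are excitatory, inhibitory or neutral.

```python
from enum import Enum


class Terminal(Enum):
    """Hypothetical labels for the terminal kinds described in [0010]."""
    COMMAND = "command"        # one bit of a binary instruction code
    EXCITATORY = "excitatory"  # 1 (ON) counts in favour of the previous command
    INHIBITORY = "inhibitory"  # 1 (ON) counts against the previous command
    NEUTRAL = "neutral"        # passed along as information, never scored


def read_port(bits, layout):
    """Split one input word into (command code, typed sensor readings).

    bits   -- the 0/1 value present on each input terminal
    layout -- the terminal kind at the same position (fixed in advance)
    """
    if len(bits) != len(layout):
        raise ValueError("need exactly one terminal kind per input bit")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("input terminals carry binary data only")
    command = "".join(str(b) for b, kind in zip(bits, layout)
                      if kind is Terminal.COMMAND)
    sensors = [(kind, b) for b, kind in zip(bits, layout)
               if kind is not Terminal.COMMAND]
    return command, sensors


# Hypothetical 5-terminal port: a 2-bit command followed by three sensors.
LAYOUT = [Terminal.COMMAND, Terminal.COMMAND,
          Terminal.EXCITATORY, Terminal.INHIBITORY, Terminal.NEUTRAL]
command, sensors = read_port([0, 1, 1, 0, 1], LAYOUT)
print(command)  # 01
```

Keeping the terminal kind next to each sensor bit lets the evaluation step ignore neutral terminals without knowing the port layout.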
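[0011]
Next, the learning program stored in the ROM of each unit 1 is described with reference to the flowchart of FIG. 2. The program shown in the figure is resident in every unit 1 of the vertical/horizontal network 10. The CPU of each unit 1 receives input data consisting of sensor information and a command, and outputs a command that is preferable for its own unit.

First, the CPU reads the ON/OFF state of the input data at the input port 3 (#1). From the sensor information at the excitatory and inhibitory terminals, it then evaluates whether the command the unit 1 output in response to the previous input data brought about an environment that is positive for the unit 1 (#2). Specifically, it adds 1 to the evaluation value for each excitatory terminal whose data is 1, and subtracts 1 for each inhibitory terminal whose data is 1. The combination of the previous input and output data is held in a temporary memory in the unit 1. Only when the total evaluation value is positive does the CPU save this combination into the RAM of the unit 1, in the data format shown in FIG. 1(c).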
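Steps #1 and #2 can be sketched in Python as follows, assuming a step of 1 per ON terminal as described above. The names `Terminal`, `evaluate` and `maybe_store` are illustrative assumptions, not the patent's implementation. So is the representation of the RAM as a Python list.

```python
from enum import Enum


class Terminal(Enum):
    EXCITATORY = "excitatory"
    INHIBITORY = "inhibitory"
    NEUTRAL = "neutral"


def evaluate(readings, step=1):
    """Step #2 of FIG. 2: total evaluation value of the previous command.

    readings -- (terminal kind, bit) pairs read at step #1
    step     -- amount added or subtracted per ON terminal (1 in [0011])
    """
    total = 0
    for kind, bit in readings:
        if bit != 1:
            continue                      # OFF terminals do not count
        if kind is Terminal.EXCITATORY:
            total += step
        elif kind is Terminal.INHIBITORY:
            total -= step
        # neutral terminals are never scored
    return total


def maybe_store(ram, previous, readings):
    """Copy the previous (input data, output command) pair from temporary
    memory into RAM only when the evaluation total is strictly positive."""
    if previous is not None and evaluate(readings) > 0:
        ram.append(previous)
    return ram


ram = []
previous = ("0110", "0100")  # hypothetical (input data, output command) pair
maybe_store(ram, previous, [(Terminal.EXCITATORY, 1), (Terminal.INHIBITORY, 0)])
maybe_store(ram, previous, [(Terminal.EXCITATORY, 1), (Terminal.INHIBITORY, 1)])
print(len(ram))  # 1: the second reading nets to zero, so nothing is stored
```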
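[0012]
Next, the CPU searches the past combination data in the RAM for input data identical to the input data just read (#3).

If identical input data exists (YES at #4), the output command paired with it in the past combination data is set as the command bits of the output data (#5).

If no identical input data exists (NO at #4), the CPU searches the past combination data for similar data (#6). If similar data exists (YES at #7), the output command in the similar data is set as the command bits of the output data (#5). If not (NO at #7), an output command is generated at random and set as the command bits of the output data (#8).

The CPU then edits the sensor-information bits of the output data, based on the sensor information in the input data read at #1. It outputs the output data, consisting of the output command and the sensor information, to the output port 4 while synchronizing output among the units (#9). At this point, the combination of the current input data and output data is stored in the temporary memory in the unit 1.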
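A minimal sketch of steps #3 to #8 follows, under stated assumptions. The patent does not define "similar". This sketch therefore uses Hamming distance between bit strings, with a hypothetical threshold `max_distance`. When several stored entries match, the newest one is preferred, which is also an assumption. The function names are hypothetical.

```python
import random


def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decide(ram, current, out_width, max_distance=1, rng=random):
    """Choose an output command for `current` (steps #3 to #8 of FIG. 2).

    ram     -- stored (input_bits, output_command) pairs, oldest first
    current -- the input bits just read (command bits plus sensor bits)
    Returns (output_command, how), how being "exact", "similar" or "random".
    """
    # #3/#4: look for identical input data, preferring the newest entry.
    for past_in, past_out in reversed(ram):
        if past_in == current:
            return past_out, "exact"                       # -> #5
    # #6/#7: nearest stored input by Hamming distance (ties -> newest entry).
    candidates = [(hamming(past_in, current), -i)
                  for i, (past_in, _) in enumerate(ram)
                  if len(past_in) == len(current)]
    if candidates:
        distance, neg_i = min(candidates)
        if distance <= max_distance:
            return ram[-neg_i][1], "similar"               # -> #5
    # #8: nothing usable, so create an output command at random.
    return "".join(rng.choice("01") for _ in range(out_width)), "random"


ram = [("0110", "0100"), ("1100", "0011")]
print(decide(ram, "0110", 4))                            # ('0100', 'exact')
print(decide(ram, "0111", 4))                            # ('0100', 'similar')
print(decide(ram, "1011", 4, rng=random.Random(0))[1])   # random
```

The random branch is what lets a unit try new commands. The positive-evaluation filter of step #2 then decides whether a tried command is remembered.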
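[0013]
More units 1 must be connected in two cases: when the controlled objects or the environment become wider in scope, and when control of the objects or the environment becomes more complex. Units can be connected vertically or horizontally. The data flow for vertical connection is described first, with reference to FIG. 3, which shows the data flow in a vertical network. The vertical network 11 expands commands input from the unit 1 in the higher layer into more detailed commands. It does so according to the information about the environment and controlled object input from the sensors connected to the unit 1 of each layer. In this way it can create purposeful output commands suited to the environment and controlled object.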
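[0014]
As shown in FIG. 3, the units of the vertical network 11 are connected in series as a hierarchy. Data including commands is output from the unit 1 in a higher layer to the unit 1 in a lower layer, that is, from layer n+2 to layer n+1 and then to layer n. Suppose layer n+2 outputs the command "01" to layer n+1, as shown in the figure. The CPU of layer n+1 receives "01" as input data, searches the past data in its RAM, and outputs to layer n the output command "0100" corresponding to the input "01". The CPU of layer n likewise searches the past data in its own RAM, and outputs to the controlled object the output command "01101110" corresponding to the input "0100".

In other words, the hierarchical structure of the vertical network 11 and the past data in the RAM compress the output pattern that the top layer, layer n+2, has to produce from 8 digits to 2 digits, that is, to one quarter. For example, the action "walk" commanded at layer n+2 is decomposed at layer n+1 into "move the right leg back and forth" and "move the left leg back and forth". Layer n then decomposes these further into the simplest unit output patterns (output commands), such as "contract the right quadriceps", "contract the right soleus", "left ...", and so on.

Layer n+2 may output the same "walk" command ("01") each time. Even so, each layer's unit 1 selects or creates its output command taking into account the information from its own sensors. The lower-layer units 1 can therefore output commands representing a "walk" motion appropriate to the situation. For example, when the walking surface is a slope, an appropriate motion such as "walk while changing the heel angle" can be achieved. Conversely, the higher-layer units 1 can prepare complete behaviors such as "walk" and "turn" as sets. Because the lower-layer units 1 refine the commands according to their sensor input, purposeful behavior suited to the object and environment can be selected.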
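The layer-by-layer expansion of FIG. 3 can be sketched as a chain of table lookups. The codes "01", "0100" and "01101110" come from the text. Each layer also has a second table entry, keyed on a hypothetical slope-sensor bit. These entries are invented solely to show how the same "walk" command can expand differently depending on sensor input.

```python
# Hypothetical learned tables for the vertical network of FIG. 3. Each maps
# (input command, the layer's own slope-sensor bit) to an output command.
# Entries for slope = 0 use the codes shown in FIG. 3; the slope = 1 entries
# are invented purely for illustration.
LAYER_N1 = {("01", 0): "0100", ("01", 1): "0101"}
LAYER_N = {("0100", 0): "01101110", ("0101", 1): "01100111"}


def expand(command, layers, sensor_bits):
    """Pass a command down the hierarchy: one table lookup per layer."""
    trace = [command]
    for table, bit in zip(layers, sensor_bits):
        command = table[(command, bit)]
        trace.append(command)
    return trace


flat = expand("01", [LAYER_N1, LAYER_N], [0, 0])
slope = expand("01", [LAYER_N1, LAYER_N], [1, 1])
print(flat)                            # ['01', '0100', '01101110']
print(slope)                           # ['01', '0101', '01100111']
print(len(flat[0]) / len(flat[-1]))    # 0.25
```

The top layer emits 2 bits while layer n emits 8, which is the 1/4 compression described above.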
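[0015]
The units 1 in different layers of the vertical network 11 differ in the processing time per loop: one loop in a lower layer takes less time than one loop in a higher layer. Each unit 1 sets this time automatically. The position of the command input terminals within the input port 3 of each unit 1 is fixed in advance.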
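[0016]
Next, the data flow for horizontal connection is described with reference to FIG. 4, which shows an example of a horizontal network. Two or more units 1 may have partly the same controlled object or environment. In that case they are connected horizontally in this way and supply information to each other, or in one direction only. This effectively increases the number of sensors connected to each unit 1, so each connected unit 1 can determine its output commands more accurately on the basis of more information.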
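[0017]
In FIG. 4, the unit 1 of station m and the unit 1 of station m+1 are connected horizontally. The output of station m is an input to station m+1, and the output of station m+1 is an input to station m. In FIG. 4, the value "0" is input from station m to station m+1, and the value "1" from station m+1 to station m. Station m does not need to know the full situation of station m+1, that is, all the information from the sensors connected to station m+1. Each unit 1 can cope by autonomously learning the other unit's situation from past data.

Any number of units 1, two or more, may take part in a horizontal connection in the horizontal network 12, as far as the number of terminals allows. However, each unit 1 learns through repeated (input data, output command) relationships. Each unit 1 must therefore keep at least the minimum number of external sensor input terminals it needs to grasp its own situation.

As with sensor connections, any type of terminal (excitatory, inhibitory or neutral) may be used for these connections. The basic rule, though not the only possible one, depends on how input information about a given controlled object or environment affects the two connected units:
if it has a concordant effect on both, it is connected to an excitatory terminal;
if it has opposing effects, it is connected to an inhibitory terminal; and
if neither, it is connected to a neutral terminal.

Even if one unit is connected to an excitatory terminal, the other need not be. It may be connected to a terminal of another type, or to no terminal at all.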
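The following sketch shows that each station chooses its own terminal type for a neighbour's signal. The wiring, the local sensor readings and the function names are hypothetical. Only the exchanged bits, "0" from station m and "1" from station m+1, come from FIG. 4. Here station m treats its neighbour's signal as excitatory, while station m+1 treats its neighbour's signal as inhibitory.

```python
from enum import Enum


class Terminal(Enum):
    """Evaluation weight of an ON (1) signal, per terminal kind."""
    EXCITATORY = 1
    INHIBITORY = -1
    NEUTRAL = 0


def score(readings):
    """Evaluation total: +1 / -1 per ON excitatory / inhibitory terminal."""
    return sum(kind.value for kind, bit in readings if bit == 1)


# Hypothetical wiring for FIG. 4. Each station keeps one local sensor terminal
# and receives the neighbour's output bit on a terminal kind chosen by and for
# that station alone; the two choices need not match.
WIRING = {
    "m":   {"local": Terminal.EXCITATORY, "neighbour": Terminal.EXCITATORY},
    "m+1": {"local": Terminal.EXCITATORY, "neighbour": Terminal.INHIBITORY},
}
NEIGHBOUR = {"m": "m+1", "m+1": "m"}


def station_score(station, outputs, local):
    """Score one station from its local sensor and its neighbour's output."""
    wiring = WIRING[station]
    return score([(wiring["local"], local[station]),
                  (wiring["neighbour"], outputs[NEIGHBOUR[station]])])


local = {"m": 0, "m+1": 1}    # hypothetical local sensor readings
outputs = {"m": 0, "m+1": 1}  # bits exchanged in FIG. 4
print(station_score("m", outputs, local))    # 1
print(station_score("m+1", outputs, local))  # 1
outputs["m"] = 1              # suppose station m switched its output to 1
print(station_score("m+1", outputs, local))  # 0
```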
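[0018]
Next, a method of designing the vertical/horizontal network 10 is described with reference to FIGS. 5(a) and 5(b). These show, respectively, a connection terminal summary table and an example design of the vertical/horizontal network 10 using that table. When designing the vertical/horizontal network 10, the designer must understand how the input and output terminals of the units 1 connect to one another. The connection terminal summary table of FIG. 5(a) sets out the number of input and output connection terminals of each unit 1 and their connection relationships, as an aid to design.

In the design drawing of FIG. 5(b), Command lines are connected following the vertical network 11. Sensor lines are connected following the horizontal network 12, as the number of directly connected sensors plus the wires to horizontally connected units. The table is easier to understand if each unit name indicates its layer and horizontal position. In FIG. 5(b), the layer is given by a number and the horizontal position by a letter.

The connection-count column of the table records numbers of terminals. Where the output of one unit 1 branches, or where the outputs of two or more units 1 merge, the breakdown of the terminal count is recorded. These breakdowns must agree with the numbers of input and output terminals without contradiction.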
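The rule that branch and merge breakdowns must agree with each unit's terminal counts lends itself to a mechanical check. The sketch below assumes a simple representation of the connection terminal summary table: terminal counts per unit, split into external connections and unit-to-unit links. The unit names follow the number-plus-letter convention of FIG. 5(b), but every count is hypothetical.

```python
from collections import defaultdict

# Hypothetical summary table. "ext_in" counts sensors and externally supplied
# commands; "ext_out" counts actuator and indicator outputs.
UNITS = {
    "3A": {"in": 2, "out": 4, "ext_in": 2, "ext_out": 0},
    "2A": {"in": 6, "out": 6, "ext_in": 2, "ext_out": 4},
    "2B": {"in": 6, "out": 6, "ext_in": 2, "ext_out": 4},
}
# (source, destination, terminals): several rows with one source form a
# branch, several rows with one destination form a merge.
LINKS = [
    ("3A", "2A", 2), ("3A", "2B", 2),   # 3A's command output branches 2 + 2
    ("2A", "2B", 2), ("2B", "2A", 2),   # horizontal exchange between 2A and 2B
]


def check(units, links):
    """Return a list of contradictions between the table and the links."""
    incoming, outgoing = defaultdict(int), defaultdict(int)
    problems = []
    for src, dst, n in links:
        for name in (src, dst):
            if name not in units:
                problems.append(f"unknown unit {name}")
        outgoing[src] += n
        incoming[dst] += n
    for name, u in units.items():
        if u["ext_in"] + incoming[name] != u["in"]:
            problems.append(f"{name}: inputs {u['in']} != "
                            f"{u['ext_in']} external + {incoming[name]} linked")
        if u["ext_out"] + outgoing[name] != u["out"]:
            problems.append(f"{name}: outputs {u['out']} != "
                            f"{u['ext_out']} external + {outgoing[name]} linked")
    return problems


print(check(UNITS, LINKS))   # []
broken = {**UNITS, "2B": {**UNITS["2B"], "in": 5}}
print(check(broken, LINKS))  # ['2B: inputs 5 != 2 external + 4 linked']
```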
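[0019]
As described above, the computer system with a distributed learning function according to this embodiment lets each unit in the unit group evaluate its own previous output. Each unit judges whether the command it output last time, in response to the previously input command, had a positive effect for it. The judgment is based on information about its current environment acquired from sensors and other units. By repeating this evaluation, each unit learns, for each type of input command, which output command has a positive effect for it, and outputs that command.

Consequently, when the number of controlled objects increases, units for the added objects are simply connected horizontally, and output commands that benefit those objects can then be produced. The system is therefore easy to expand as controlled objects are added. In addition, the units are connected in a vertical hierarchical structure. A command input from a higher-layer unit can therefore be expanded into, and output as, commands representing more detailed instructions suited to the controlled object and environment. This expansion is based on information acquired from the sensors connected to the unit and from horizontally connected units.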
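[0020]
The present invention is not limited to the embodiment above, and various modifications are possible. In the embodiment, when the previous output command for the previous input data is evaluated as positive, the combination of the previous input data and output command is saved to the RAM of the unit 1 unconditionally. Alternatively, the evaluation value obtained when that output command was issued may also be saved in the RAM along with the combination data. Then, if combination data for the same previous input data is already stored, two evaluation values can be compared: the one recorded for the stored output command and the one for the previous output command. Only one of the two input/output combinations is then kept in the RAM. This allows each unit to learn more accurately which combinations of input data and output command are preferable for it.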
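The modification in this paragraph can be sketched as follows. The patent says only that the stored and new evaluation values are compared and one combination is kept. This sketch makes three further assumptions: the higher-scoring combination is kept, the existing entry wins on a tie, and the RAM is a dictionary keyed by input data. The function name is hypothetical.

```python
def store_with_score(ram, input_data, output_command, score):
    """Variant of [0020]: keep one (output command, score) per input.

    Only positively evaluated pairs are stored, as in the base embodiment.
    If the input is already known, the pair with the higher score is kept.
    """
    if score <= 0:
        return ram
    best = ram.get(input_data)
    if best is None or score > best[1]:
        ram[input_data] = (output_command, score)
    return ram


ram = {}
store_with_score(ram, "0110", "0100", 1)
store_with_score(ram, "0110", "0011", 3)   # better result for the same input
store_with_score(ram, "0110", "1111", 2)   # worse, so ignored
print(ram)  # {'0110': ('0011', 3)}
```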
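[0021]
[Effects of the Invention]
As described above, according to the present invention, each unit in the unit group can evaluate whether the command it output last time, in response to the previously input command, had a positive effect for it. By repeating this evaluation, each unit learns, for each type of input command, which output command has a positive effect for it, and outputs that command.

Consequently, when the number of controlled objects increases, units for the added objects are simply connected horizontally, and output commands that benefit those objects can then be produced. The system is therefore easy to expand as controlled objects are added. Even after such expansion, each unit in the group has its own CPU and performs its own learning. Learning therefore stays fast even when horizontal connections are added and complex learning is performed on information about a wide range of controlled objects and environments.

Furthermore, the units are connected in a vertical hierarchical structure. A command input from a higher-layer unit can therefore be expanded into, and output as, commands representing more detailed instructions suited to the controlled object and environment. This expansion is based on information about the controlled object and environment acquired from sensors and other units.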
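[0022]
In addition, the connections between each unit in the group and its input-side sensors or other units may be of three types, defined by the effect of an ON signal from the input-side sensor or unit:
excitatory connections, for which it is a positive evaluation factor for the output-side unit;
inhibitory connections, for which it is a negative evaluation factor; and
neutral connections, for which it is neither.

With these connections, each output-side unit can learn both the commands that are preferable for it and those that are not. This enables complex learning and therefore adaptation to complex environments. Moreover, each output-side unit can receive information from neutrally connected sensors and the like whose desirability is not yet known at that point. It can therefore learn more accurately on the basis of more information.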
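[Brief Description of the Drawings]
[FIG. 1] (a) shows an individual unit, the basic building block of a computer system according to an embodiment of the present invention; (b) shows units connected in a vertical/horizontal network; (c) shows the format of the combination data of input data and output commands stored inside a unit.
[FIG. 2] A flowchart of the learning program stored in the ROM of each unit.
[FIG. 3] A diagram showing the data flow in a vertical network.
[FIG. 4] A diagram showing an example of connections in a horizontal network.
[FIG. 5] (a) shows a connection terminal summary table; (b) shows an example design of a vertical/horizontal network using that table.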
[Explanation of Reference Numerals]
1 Unit
11 Vertical network (vertical, hierarchical connection)
12 Horizontal network (horizontal, parallel connection)
BACKGROUND OF THE INVENTION
The present invention relates to a computer system having a learning function corresponding to a control object and environment used for controlling a robot or the like.
[0002]
[Prior art]
In a computer system used for controlling a conventional robot or the like, there is a neurocomputer that operates based on a living mechanism based on a neuron and a connection network. A neuron is also called a nerve cell, and is a cell specialized to handle information processing exclusively in a living body. The neuron connected to each other is the neural network of the brain. The neurocomputer performs learning using a neural network model that models the principle of parallel learning information processing of a neural network.
[0003]
[Problems to be solved by the invention]
However, since the conventional neurocomputer as described above has only one CPU used for the learning process, it is difficult to expand the system when the number of control objects increases, and the learning process can be speeded up. There was no problem. In addition, neuronal connections in the neural network of the human brain include excitatory connections that stimulate the excitement of other cells, and conversely, inhibitory connections that try to suppress the excitement of other cells. On the other hand, in the conventional neurocomputer, when the input signal from the sensor corresponding to the neuron is on, it is positive for the output side CPU. There is a problem that complicated learning processing cannot be performed because only excitatory coupling that is an evaluation factor is used.
[0004]
The present invention has been made to solve the above-described problems, and can be adapted to a complicated environment by performing a complicated learning process, and the number of control objects is increased. In such a case, it is an object of the present invention to provide a computer system having a learning function that is easy to expand the system and can speed up the learning process.
[0005]
[Means for Solving the Problems]
In order to achieve the above object, the present invention is a computer system having a learning function according to an object to be controlled and an environment, and is composed of a unit group in which a plurality of units each having a CPU are connected, and each unit in the unit group is connected. The structure includes a horizontal parallel connection structure and a vertical connection structure, and among the units in the unit group, a unit of the highest hierarchy and a unit of the lowest hierarchy are included. Each unit except for the control object and environmental information (hereinafter referred to as first input information) acquired from the sensor connected to the own unit and the control object acquired from the sensor by other units connected in the horizontal direction. Based on the information on the object and the environment (hereinafter referred to as second input information), the previous unit When a force command is an evaluation means for evaluating whether a positive impact for the local unit, the evaluation unit by evaluation of the results was evaluated as for the local unit had a positive effect, Last entered The command, the previous first and second input information, Last output Combination data including storage means for storing combination data with a command, and current input commands input from the first and second input information of this time and a unit of a higher hierarchy is stored in the storage means. Search means for searching whether or not there is past combination data, and combination data including the current input command and the current first and second input information as a result of the search by the search means, When there is past combination data stored in the means, the output command in the corresponding combination data is determined as a command to be output to the lower hierarchy unit. 
As a result of the search by the search means, when combination data including the current input command and the current first and second input information does not exist in the past combination data stored in the storage means, In the past combination data stored in the storage means, a search is performed as to whether or not there is similar data of combination data including the current input command and the current first and second input information. As a result, when similar data exists in the past combination data, an output command in the similar data is determined as a command to be output to a unit in a lower hierarchy, and the similar data is included in the past combination data. If it does not exist, the randomly generated output command is determined as the command to be output to the lower hierarchy unit. And determining the type of coupling between each unit in the unit group and the input side sensor or other unit when the signal from the input side sensor or other unit is on. Excitability coupling that is a positive evaluation factor for each unit, inhibitory coupling that is a negative evaluation factor for each output unit, and a positive evaluation factor and a negative evaluation factor for each output unit And the evaluation means is the type of coupling with the corresponding input side sensor or other unit when the signal input from the input side sensor or other unit to the own unit is on. If the coupling is excitatory, a predetermined value is added to the evaluation value, and if the coupling type with the corresponding input sensor or other unit is inhibitory coupling, the predetermined value is mapped from the evaluation value. If the total of the evaluation values is calculated by negative, and the total is positive, the command that the previous unit output last time for the command that was previously input from the higher-level unit has a positive effect on the local unit. It was made to evaluate that it brought about.
[0006]
In the above configuration, information on the control object and environment acquired from the sensor connected to the own unit, and information on the control object and environment acquired by the other unit connected in the horizontal direction from the sensor, etc. Based on the above, it is possible to evaluate whether or not the command previously output by the own unit with respect to the command previously input from the upper layer unit has a positive effect on the own unit. Therefore, by repeating this evaluation, each unit can learn and output an output command that has a positive effect on the unit for each type of input command. As a result, even if the number of controlled objects increases, an output command that has a positive effect on the increased controlled objects with a simple procedure of adding and connecting units corresponding to the increased controlled objects in the horizontal direction. Therefore, the system can be easily expanded according to the increase in the number of control objects. Even when the system is expanded in this way, each unit in the unit group can perform the learning process for each unit using the CPU in each unit, so that the learning process can be speeded up. Can do. Furthermore, since a connection structure having a vertical hierarchical structure is used as the connection structure of each unit in the unit group, it is obtained from a sensor connected to the own unit or other units connected in the horizontal direction. Based on information such as the control object and the environment, a command input from a higher-level unit can be expanded into a command representing a more detailed command corresponding to the control object and the environment.
[0007]
In addition, the coupling between each unit in the unit group and the input side sensor or other unit is an excitement that becomes a positive evaluation factor for each output side unit when the signal from the input side sensor or other unit is on. The combination of sex, the inhibitory combination that is a negative evaluation factor for each unit on the output side, and the neutral combination that is neither a positive evaluation factor nor a negative evaluation factor for each unit on the output side Also good. As a result, each unit on the output side uses an inhibitory coupling and a neutral coupling in addition to the excitatory coupling that is used in a computer system having a conventional learning function. Since the information about the control object and the environment from the unit can be received, in addition to the commands preferable for each unit on the output side, it is possible to learn commands not preferable for each unit on the output side. Thereby, a complicated learning process can be performed. In addition, since each unit on the output side can receive information from a neutrally-coupled sensor or the like that is unclear as to whether or not it is preferable at that time, learning processing can be performed based on more information.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a computer system having a learning function according to an embodiment of the present invention will be described with reference to the drawings. The computer system according to the present embodiment is incorporated in a robot or the like and used for controlling an actuator such as an arm. FIG. 1A is a diagram individually showing units as basic units of the computer system according to the present embodiment, FIG. 1B is a diagram showing a state in which the units are connected in a vertical / horizontal network, and FIG. It is a figure which shows the format of the combination data of the input data and output command which are stored. A unit body 2 of the unit 1 stores a CPU, RAM, and ROM (not shown), and an input port 3 and an output port 4 are provided on the input side and output side of the unit 1, respectively. ing. The CPU in the unit main body 2 receives from the input port 3 commands input from the higher-level unit 1 and information on the control target and environment from the sensor system connected to the own unit and other units. Then, using a learning processing program stored in the ROM, data of a combination of input data and output command that has a positive effect on the unit shown in FIG. 1C is accumulated in the RAM. Then, based on the accumulated data, an output command preferable for the unit corresponding to the input data is output from the output port 4. Each unit 1 constituting the vertical / horizontal network 10 shown in FIG. 1B repeats such processing, so that each unit 1 learns and outputs a command corresponding to the environment and the controlled object. From the output port 4 of the unit 1 of the uppermost layer (third layer) in the vertical / horizontal network 10, a preferred output command for all the units 1 is sent to a drive device such as an arm. Can be output.
[0009]
As a sensor used in the sensor system shown in FIG. 1 (b), sensors such as temperature, humidity, contact, luminous intensity, and voice can be considered, and information necessary for grasping the environment and the state of a controlled object is captured. There is no particular limitation as long as it can be used. However, sensors having an ON / OFF contact output are used for these sensors. Further, the indirect output shown in FIG. 1B is used as an indicator for grasping the status of the unit 1 in each layer.
[0010]
Next, input terminals of the input port 3 will be described. In each input terminal, binary data represented by 0 or 1 is input. In the case of a terminal for inputting information from a sensor, 1 represents ON and 0 represents OFF. In the case of a command input terminal, an instruction code expressed in binary number is input. Data input from the sensor input terminal is used as an evaluation value of the command output immediately before. The sensor input terminals include three types of input terminals: excitement terminals, suppression terminals, and neutral terminals. When the data input to the excitement terminal is 1 (ON), it is evaluated that the command output by the unit 1 for the previously input data has a positive effect on the unit 1. On the other hand, when the data input to the suppression terminal is 1, it is evaluated that the command output by the unit 1 for the previously input data has a negative effect on the unit 1. Further, data input to the neutral terminal is not treated as an evaluation value for the unit 1. This neutral terminal is mainly used as a connection port of the vertical / horizontal network 10 and is used for command transmission, but for connection with a sensor that detects factors that are not positive for the unit 1 at present. Also used for.
[0011]
Next, a learning program incorporated in the ROM in each unit 1 will be described with reference to the flowchart of FIG. The program shown in the figure resides in all units 1 in the vertical / horizontal network 10. The CPU in each unit 1 receives input data composed of sensor information and a command, and outputs a command preferable for its own unit. First, the CPU reads ON / OFF of the input data input to the input port 3 (# 1), and based on the sensor information input from the excitement terminal and the suppression terminal, the unit 1 detects the previous input data. It is evaluated whether or not the command outputted in this way has brought a positive environment for its own unit 1 (# 2). Specifically, when the data input to the excitement terminal is 1, 1 is added to the evaluation value, and when the data input to the suppression terminal is 1, 1 is subtracted from the evaluation value. As a result, the previous input / output data stored in the temporary memory in the unit 1 according to the data format shown in FIG. 1 (c) only when the sum of the evaluation values of the data input to the terminals is positive. Are stored in the RAM inside the unit 1.
[0012]
Next, the CPU searches whether the same input data as the input data read this time is in the past combination data in the RAM (# 3), and when the same input data exists (# 4) YES), an output command that is paired with the corresponding input data in the past combination data is set as bit data corresponding to the command in the output data (# 5). If the same input data does not exist in the past combination data in the RAM (NO in # 4), a search is performed to determine whether or not similar data exists in the past combination data (# 6). Is present (YES in # 7), the output command in the similar data is set as bit data corresponding to the command in the output data (# 5), and if there is no similar data (# 7), an output command is randomly created and set as bit data corresponding to the command in the output data (# 8). Next, the CPU edits bit information corresponding to the sensor information in the output data based on the sensor information in the input data read in # 1, and outputs the output data including the output command and the sensor information to each unit. The output is output to the output port 4 while synchronizing the output between them (# 9). At this time, the combination data of the current input data and output data is stored in a temporary memory inside the unit 1.
[0013]
When the control object and the environment become wide, and when the control and environment for the control object become complicated, it is necessary to increase the number of units 1 to be connected. This connection method includes a vertical connection and a horizontal connection. First, a data flow at the time of vertical connection will be described with reference to FIG. FIG. 3 is a diagram illustrating a data flow in the vertical network. The vertical network 11 expands the commands input from the upper layer unit 1 into more detailed commands according to the information regarding the environment and the control target input from the sensor connected to the unit 1 of each layer. Thus, it is possible to create a purposeful output command according to the environment and the controlled object.
[0014]
As shown in FIG. 3, the vertical network 11 is connected in series in a hierarchy. The data including the command is output from the upper layer unit 1 to the lower layer unit 1, that is, from the (n + 2) th layer to the (n + 1) th layer and the nth layer. As shown in the figure, if a command “01” is output from the (n + 2) th layer to the (n + 1) th layer, the CPU of the (n + 1) th layer that has received this “01” as input data The data is searched, and an output command “0100” corresponding to the input data “01” is output to the nth layer. Similarly, the n-th layer CPU searches past data in the RAM in its own unit and outputs an output command “01101110” corresponding to the input data “0100” to the control object. This uses the hierarchical structure of the vertical network 11 and past data in the RAM to compress the output pattern output from the n + 2 layer of the top layer from 8 digits to 2 digits to 1/4. It can be said. For example, the operation of “walking” instructed in the (n + 2) th layer is decomposed into operations of “moving the right foot back and forth” and “moving the left foot back and forth” in the (n + 1) th layer. This leads to decomposition into the simplest output unit patterns (output commands) such as “contract the right quadriceps”, “contract the right soleus”, “left...”, Etc. Even when the same “walk” command (“01”) is output from the (n + 2) th layer, it is possible to select and create an output command by taking into account information input from the sensor in the unit 1 of each layer. A command representing an accurate “walking” operation suitable for the user can be output from the unit 1 in the lower hierarchy. For example, when the “walking” place is a slope, “walk with changing heel angle” Thus, it is possible to realize an accurate “walking” operation. 
On the other hand, since unit 1 in the higher hierarchy can prepare a set of actions such as “walking” and “turn”, a command corresponding to sensor input information in unit 1 in the lower hierarchy It is possible to select a purposeful action according to the object and the environment.
[0015]
The difference between the units 1 of each layer in the vertical network 11 is that the length of processing time required for one loop is different. In the lower layer, the time required for one loop is the time in the higher layer. It is shorter than that. Each unit 1 automatically sets this time. The position of the command input terminal in the input port 3 in each unit 1 is determined in advance.
[0016]
Next, the flow of data during horizontal connection will be described with reference to FIG. FIG. 4 is a diagram illustrating a connection example of the horizontal network. When two or more units 1 have partially the same object or environment to be controlled, they are connected horizontally in this way to provide information to each other or unilaterally. The number of sensors connected to the unit 1 is increased, and the combined units 1 can accurately determine the output command based on more information.
[0017]
In FIG. 4, the unit 1 of the m-th station and the unit 1 of the m + 1-th station are connected in the horizontal direction, the output of the m-th station is used as the input of the m + 1-th station, and the output of the m + 1-th station is m-th. They are mutually input as station inputs. In the case of FIG. 4, a value “0” is input from the m-th station to the m + 1-th station, and a value “1” is input from the m + 1-th station to the m-th station. It is not necessary to grasp all the information from the sensor connected to the m + 1st station, that is, the situation where the m + 1st station is located in the mth station. This is because each unit 1 can respond by autonomously learning the status of other units from past data. The number of units 1 related to the horizontal connection in the horizontal network 12 may be two or more, and may be any number as long as the number of terminals permits. However, since each unit 1 repeats learning in the relationship of (input data-output command), a minimum external sensor input terminal for grasping the situation necessary for each unit 1 itself is left. There is a need. As the connection terminal, any type of excitement / suppression / neutral terminal can be used in the same manner as the sensor connection terminal. When input information about a certain control object or environment has an effect to synchronize for each of the two connected units, it is connected to the exciter terminal, and if it has a contradictory effect, it is connected to the suppression terminal. If not, the basic point is to connect to the neutral terminal, but this is not a limitation. Just because one is connected to the excitement terminal, the other need not be connected to the excitement terminal, and may be connected to another type of terminal or not connected to any terminal.
[0018]
Next, a method for designing the vertical/horizontal network 10 will be described with reference to FIGS. 5A and 5B. FIGS. 5A and 5B are diagrams showing a design example of the vertical/horizontal network 10 using a connection terminal total table. When designing the vertical/horizontal network 10, the connection relationships of the input/output terminals between the units 1 must be grasped. The connection terminal total table shown in FIG. 5A clarifies the number of input/output connection terminals and the connection relationships of each unit 1 for use in designing the vertical/horizontal network 10. In FIG. 5B, commands are connected following the structure of the vertical network 11, and sensors are connected following the structure of the horizontal network 12, entering both the sensors directly connected to each unit and the connections to horizontally connected units, together with their numbers of connections. The table is easier to read if each unit name indicates the unit's hierarchical level and horizontal position; in FIG. 5B, the level is indicated by a number and the horizontal position by a letter. The connection number column of the connection terminal total table is for entering numbers of terminals. At each point where the output of one unit 1 branches, and at each point where the outputs of two or more units 1 merge, a breakdown of the number of terminals is entered. These breakdowns must be consistent with the numbers of input terminals and output terminals.
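The consistency rule for the connection terminal total table lends itself to a mechanical check. The following Python sketch is an illustration under assumed data structures, not the table format of FIG. 5: each unit is given an (input terminals, output terminals) pair, each table row is a (source, destination, terminal count) link, and sensors or actuators appear only as non-unit endpoints. Unit names follow the FIG. 5B convention of level number plus horizontal letter; the particular units, counts, and the function name `check_table` are invented.

```python
from collections import defaultdict


def check_table(units: dict[str, tuple[int, int]],
                links: list[tuple[str, str, int]]) -> list[str]:
    """Compare branch/merge breakdowns with each unit's terminal counts.

    units: unit name -> (number of input terminals, number of output terminals)
    links: (source, destination, number of terminals); non-unit endpoints such as
           sensors or actuators are allowed but are not themselves checked.
    """
    branched = defaultdict(int)   # total terminals leaving each source
    merged = defaultdict(int)     # total terminals arriving at each destination
    for src, dst, n in links:
        branched[src] += n
        merged[dst] += n

    problems = []
    for name, (n_in, n_out) in units.items():
        if merged[name] != n_in:
            problems.append(f"{name}: merged inputs {merged[name]} != {n_in} input terminals")
        if branched[name] != n_out:
            problems.append(f"{name}: branched outputs {branched[name]} != {n_out} output terminals")
    return problems


units = {"1A": (2, 4), "2A": (3, 3), "2B": (3, 2)}
links = [
    ("sensor1", "1A", 2),
    ("1A", "2A", 2), ("1A", "2B", 2),    # branch of 1A's output
    ("sensor2", "2A", 1),
    ("2A", "2B", 1),                      # horizontal link
    ("2A", "actuatorA", 2),
    ("2B", "actuatorB", 1),
]
print(check_table(units, links))
# ['2B: branched outputs 1 != 2 output terminals']
```

The deliberate mismatch at unit 2B shows the kind of contradiction between a breakdown and a terminal count that the design table is meant to expose.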
[0019]
As described above, in the computer system having the distributed learning function according to the present embodiment, each unit in the unit group can evaluate, based on information about its current environment obtained from its sensors and from other units, whether the command it output last time in response to the previously input command had a positive effect on itself. By repeating this evaluation, each unit can learn, for each type of input command, to output a command that has a positive effect on it. Consequently, even if the number of controlled objects increases, output commands that have a positive effect on the added controlled objects can be obtained by the simple procedure of adding units corresponding to those objects and connecting them horizontally, so the system can easily be expanded as the number of controlled objects grows. In addition, because the units in the unit group are also connected in a vertical hierarchical structure, a command input from a higher-level unit can be expanded, based on information such as the environment acquired from the sensors connected to the unit itself or from horizontally connected units, into a more detailed command corresponding to the controlled object or environment, and output.
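The vertical expansion of commands can be sketched as follows. This is a simplified, hypothetical Python model: each unit holds an already learned table from (input command, own sensor state) to output command and passes its output down to its lower-level units, which refine it further. The class name `LevelUnit`, the command names, and the sensor states are invented, the network is reduced to a tree, and an unknown situation simply yields "noop" instead of the similar-data or random fallback described in the claims.

```python
from dataclasses import dataclass, field

Situation = tuple[str, tuple[int, ...]]   # (input command, own sensor state)


@dataclass
class LevelUnit:
    name: str
    table: dict[Situation, str]                       # learned: situation -> output command
    children: list["LevelUnit"] = field(default_factory=list)

    def propagate(self, command: str, sensors: dict[str, tuple[int, ...]]) -> dict[str, str]:
        """Expand `command` with this unit's table and pass the result to lower units."""
        out = self.table.get((command, sensors[self.name]), "noop")  # simplified fallback
        result = {self.name: out}
        for child in self.children:
            result.update(child.propagate(out, sensors))
        return result


top = LevelUnit("1A", {("walk", (0,)): "left_first"})
left = LevelUnit("2A", {("left_first", (1,)): "lift"})
right = LevelUnit("2B", {("left_first", (1,)): "hold"})
top.children = [left, right]

print(top.propagate("walk", {"1A": (0,), "2A": (1,), "2B": (1,)}))
# {'1A': 'left_first', '2A': 'lift', '2B': 'hold'}
```

The single coarse command "walk" becomes a distinct, more detailed command at each lower unit, depending on that unit's own sensor state.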
[0020]
The present invention is not limited to the above embodiment, and various modifications are possible. For example, in the above embodiment, when the previous output command for the previous input data is evaluated positively, the combination data of that input data and output command is stored unconditionally in the RAM of the unit 1. Alternatively, the evaluation value obtained when the output command was issued may be stored in the RAM together with the combination data; then, if combination data for the same input data is already stored in the RAM, the evaluation value obtained with the stored output command may be compared with that obtained with the previous output command, and only one of the two combinations of input data and output command retained in the RAM. This allows the unit to learn more accurately which combination of input data and output command is preferable for it.
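The modification in paragraph [0020] can be sketched in Python as follows. It assumes that, of the two competing combinations, the one with the higher evaluation value is kept and that a tie keeps the entry already stored; the function name `store`, the key format, and the command names are hypothetical.

```python
Key = tuple[str, tuple[int, ...]]   # hypothetical: (input command, input bits)


def store(memory: dict[Key, tuple[str, int]], key: Key,
          output_command: str, evaluation: int) -> None:
    """Keep, per input data, the output command with the best evaluation seen so far."""
    if evaluation <= 0:
        return                         # only positively evaluated combinations are kept
    stored = memory.get(key)
    if stored is None or evaluation > stored[1]:
        memory[key] = (output_command, evaluation)   # ties keep the existing entry


memory: dict[Key, tuple[str, int]] = {}
situation = ("walk", (1, 0))
store(memory, situation, "cmd3", 2)
store(memory, situation, "cmd7", 5)
store(memory, situation, "cmd1", 3)
print(memory)   # {('walk', (1, 0)): ('cmd7', 5)}
```

Compared with unconditional overwriting, this keeps a good command from being displaced by a later, merely acceptable one.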
[0021]
[Effect of the Invention]
As described above, according to the present invention, each unit in the unit group can evaluate whether the command it output last time in response to the previously input command had a positive effect on itself. By repeating this evaluation, each unit can learn, for each type of input command, to output a command that has a positive effect on it. Consequently, even if the number of controlled objects increases, output commands that have a positive effect on the added controlled objects can be obtained by the simple procedure of adding units corresponding to those objects and connecting them horizontally, so the system can easily be expanded as the number of controlled objects grows. Even when the system is expanded in this way, each unit in the unit group has its own CPU and can perform its learning process individually, so the learning process can be sped up even when complex learning based on information about controlled objects and the environment is performed. Furthermore, because the units in the unit group are also connected in a vertical hierarchical structure, a command input from a higher-level unit can be expanded, based on information such as the controlled objects and environment acquired from sensors or other units, into a more detailed command corresponding to the controlled objects and environment, and output.
[0022]
In addition, the connections between each unit in the unit group and its input-side sensors or other units consist of excitatory connections, which are a positive evaluation factor for the output-side unit when the signal from the input-side sensor or unit is on; inhibitory connections, which are a negative evaluation factor for the output-side unit; and neutral connections, which are neither a positive nor a negative evaluation factor for the output-side unit. This allows each output-side unit to learn both commands that are preferable for it and commands that are not. Because complex learning processing thus becomes possible, the system can adapt to complex environments. Moreover, since each output-side unit can also receive information from neutrally connected sensors and the like, which is neither favorable nor unfavorable to it at that point, it can perform accurate learning based on more information.
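The role of the three connection types in the evaluation can be shown with a short Python sketch of the summation rule stated in the claims: an "on" signal adds a predetermined value through an excitatory connection, subtracts it through an inhibitory connection, and contributes nothing through a neutral connection, and the previous command counts as beneficial only when the total is positive. The step value of 1 and the names `Coupling` and `evaluate` are assumptions.

```python
from enum import Enum


class Coupling(Enum):
    EXCITATORY = "excitatory"
    INHIBITORY = "inhibitory"
    NEUTRAL = "neutral"


def evaluate(inputs: list[tuple[bool, Coupling]], step: int = 1) -> tuple[int, bool]:
    """Sum contributions of 'on' inputs; return (total, previous command was beneficial)."""
    total = 0
    for is_on, coupling in inputs:
        if not is_on:
            continue                     # off signals never change the evaluation
        if coupling is Coupling.EXCITATORY:
            total += step
        elif coupling is Coupling.INHIBITORY:
            total -= step
        # NEUTRAL: received as information but neither adds nor subtracts
    return total, total > 0


signals = [
    (True, Coupling.EXCITATORY),
    (True, Coupling.INHIBITORY),
    (True, Coupling.EXCITATORY),
    (False, Coupling.INHIBITORY),
    (True, Coupling.NEUTRAL),
]
print(evaluate(signals))   # (1, True)
```

Neutral inputs drop out of the sum, yet they can still form part of the stored input data, which is how they contribute "more information" to learning.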
[Brief description of the drawings]
FIG. 1A is a diagram individually showing units as basic units of a computer system according to an embodiment of the present invention. FIG. 1B is a diagram showing a state in which the units are connected in a vertical/horizontal network. FIG. 1C is a diagram showing the format of the combination data of input data and output commands stored in each unit.
FIG. 2 is a flowchart showing control of a learning program built in a ROM in each unit.
FIG. 3 is a diagram showing a data flow in a vertical network.
FIG. 4 is a diagram illustrating a connection example of a horizontal network.
FIGS. 5A and 5B are diagrams illustrating a design example of a vertical/horizontal network using a connection terminal total table.
[Explanation of symbols]
1 unit
11 Vertical network (concatenation with hierarchical structure in the vertical direction)
12 Horizontal network (horizontal parallel connection)

Claims

A computer system having a learning function adapted to a controlled object and environment, the computer system
consisting of a unit group in which a plurality of units each having a CPU are connected,
wherein the connection structure of the units in the unit group includes a parallel connection structure in the horizontal direction and a connection structure having a hierarchical structure in the vertical direction, and
each unit in the unit group other than the units of the highest hierarchy and the lowest hierarchy comprises:
evaluation means for evaluating, based on information about the controlled object and environment acquired from a sensor connected to the unit itself (hereinafter referred to as "first input information") and information about the controlled object and environment acquired from sensors by other units connected in the horizontal direction (hereinafter referred to as "second input information"), whether the command output by the unit last time in response to the command previously input from a unit of a higher hierarchy had a positive effect on the unit;
storage means for storing, when the evaluation means evaluates that the command had a positive effect on the unit, combination data consisting of the previously input command, the previous first and second input information, and the previously output command;
search means for searching whether combination data including the current first and second input information and the current input command input from the higher-hierarchy unit exists in the past combination data stored in the storage means; and
determining means for determining, when the search by the search means finds that combination data including the current input command and the current first and second input information exists in the past combination data stored in the storage means, the output command in that combination data as the command to be output to a unit of a lower hierarchy, and, when the search finds that no such combination data exists in the past combination data stored in the storage means, searching the past combination data for data similar to the combination data including the current input command and the current first and second input information, and then determining the output command in the similar data as the command to be output to a unit of a lower hierarchy if such similar data exists, or determining a randomly generated output command as the command to be output to a unit of a lower hierarchy if no similar data exists,
wherein the types of connection between each unit in the unit group and the input-side sensors or other units consist of excitatory connections, which are a positive evaluation factor for the output-side unit when the signal from the input-side sensor or other unit is on, inhibitory connections, which are a negative evaluation factor for the output-side unit, and neutral connections, which are neither a positive nor a negative evaluation factor for the output-side unit, and
wherein the evaluation means,
when a signal input to the unit from an input-side sensor or other unit is on, adds a predetermined value to an evaluation value if the type of connection with that input-side sensor or other unit is an excitatory connection, and subtracts a predetermined value from the evaluation value if that type of connection is an inhibitory connection, thereby obtaining a total evaluation value, and, when the total is positive, evaluates that the command output by the unit last time in response to the command previously input from the higher-hierarchy unit had a positive effect on the unit, whereby the computer system has a distributed learning function.
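The search and determining steps of the claim (exact match, then similar data, then a random command) can be sketched as a single Python class. The patent does not define "similar", so this sketch assumes similarity means the same input command with input bits differing in at most `max_distance` positions (Hamming distance). The command repertoire, the seeded random generator, and all names are likewise hypothetical, and the `positive` flag passed to `learn` stands in for the result of the evaluation means.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

Key = tuple[str, tuple[int, ...], tuple[int, ...]]   # (input command, first info, second info)


@dataclass
class LearningUnit:
    commands: list[str]                 # hypothetical repertoire of output commands
    max_distance: int = 1               # hypothetical similarity threshold (Hamming distance)
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    memory: dict[Key, str] = field(default_factory=dict)
    last: Optional[tuple[Key, str]] = None

    def learn(self, positive: bool) -> None:
        """Storage means: keep the previous combination if it was evaluated as positive."""
        if positive and self.last is not None:
            key, out = self.last
            self.memory[key] = out

    def decide(self, command: str, first: tuple[int, ...], second: tuple[int, ...]) -> str:
        """Search and determining means: exact match, else similar data, else random."""
        key = (command, first, second)
        out = self.memory.get(key)
        if out is None:
            out = self._most_similar(key)
        if out is None:
            out = self.rng.choice(self.commands)
        self.last = (key, out)
        return out

    def _most_similar(self, key: Key) -> Optional[str]:
        command, first, second = key
        bits = first + second
        best, best_distance = None, self.max_distance + 1
        for (c, f, s), out in self.memory.items():
            stored_bits = f + s
            if c != command or len(stored_bits) != len(bits):
                continue
            distance = sum(a != b for a, b in zip(stored_bits, bits))
            if distance < best_distance:
                best, best_distance = out, distance
        return best


unit = LearningUnit(commands=["forward", "back", "stop"])
first_out = unit.decide("walk", (1, 0), (1,))   # empty memory: random command
unit.learn(positive=True)                       # suppose the evaluation was positive
print(unit.decide("walk", (1, 0), (1,)) == first_out)   # exact match
print(unit.decide("walk", (1, 1), (1,)) == first_out)   # one bit differs: similar data
```

A stricter or looser notion of similarity only changes `_most_similar`; the order of the three fallbacks follows the claim.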